
Politiets IT-enhet
Building a privacy-first data collection platform for speech recognition AI
Project details
As part of my bachelor's thesis, I collaborated with three fellow students and the Norwegian Police IT Unit (PIT) to develop a web application designed to collect voice recordings for the training of future speech recognition models.
The project was initiated as part of a larger research effort to develop an AI model capable of transcribing police interrogations. Before such a model could be trained, a large and diverse dataset of Norwegian speech recordings needed to be collected. Our responsibility was to design, develop, and evaluate a platform capable of gathering this data while satisfying strict privacy and scalability requirements.
Working closely with stakeholders from the Police IT Unit, our team delivered a complete solution that balanced user experience, technical requirements, and GDPR considerations within Microsoft's Azure ecosystem.





The Challenge
At the time, police interrogations were transcribed manually, requiring significant time and resources. Existing speech-to-text solutions from major technology providers struggled with Norwegian dialects, slang, and non-standard language commonly encountered in interrogations.
The success of the future AI model depended heavily on collecting large amounts of high-quality audio data representing a diverse range of Norwegian voices and dialects.
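The platform therefore needed to satisfy several critical requirements: strict privacy and GDPR compliance, the scalability to support large-scale data collection within the Police IT Unit's Azure environment, and a simple recording workflow that works reliably on both desktop and mobile devices.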
My Contribution
The project was developed collaboratively as part of a four-person bachelor's thesis team, where I was primarily responsible for the frontend architecture, design system implementation, and user experience of the application.
My work focused on translating requirements into a responsive and accessible interface that would function reliably across desktop and mobile devices while making the process of contributing voice recordings as simple as possible. In addition to implementing the frontend application, I was responsible for developing and maintaining the project's design system, ensuring consistency across the platform and creating reusable components that could scale alongside the application.
A key objective throughout development was creating an experience that encouraged participation while maintaining the privacy and simplicity required by the project's goals.
Privacy & GDPR
Privacy was one of the most important aspects of the project. Because the application collected voice recordings intended for AI training, strict requirements were placed on how data could be collected, stored, and processed.
The platform was designed around the principle of collecting only the minimum amount of information required for the project. No personally identifiable information was stored, and users retained control over their own submitted recordings. This privacy-first approach allowed the solution to satisfy both technical and legal requirements while maintaining trust with participants.
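To illustrate how data minimization and user control can be enforced in code, here is a minimal TypeScript sketch. It is not the project's actual implementation: the `RecordingSubmission` shape, its field names, `minimizeSubmission`, and the in-memory `RecordingStore` are all hypothetical. It assumes each contributor is identified only by a random anonymous token, which lets them delete their own recordings without an account.

```typescript
// Hypothetical submission shape; field names are illustrative
// assumptions, not the project's actual schema.
interface RecordingSubmission {
  contributorId: string; // random anonymous token, never derived from identity
  promptId: string; // id of the predefined text passage that was read
  audio: Uint8Array; // encoded audio bytes
}

// Build a submission from untrusted client input, keeping only
// allow-listed fields so any extra data (names, e-mail addresses,
// device details) is dropped before it is stored.
function minimizeSubmission(input: Record<string, unknown>): RecordingSubmission {
  const { contributorId, promptId, audio } = input;
  if (typeof contributorId !== "string" || contributorId.length === 0) {
    throw new Error("contributorId must be a non-empty string");
  }
  if (typeof promptId !== "string" || promptId.length === 0) {
    throw new Error("promptId must be a non-empty string");
  }
  if (!(audio instanceof Uint8Array)) {
    throw new Error("audio must be a Uint8Array");
  }
  return { contributorId, promptId, audio };
}

// In-memory stand-in for the real storage backend (an assumption),
// keyed by the anonymous token so a contributor can delete everything
// they submitted.
class RecordingStore {
  private readonly byContributor = new Map<string, RecordingSubmission[]>();

  add(submission: RecordingSubmission): void {
    const list = this.byContributor.get(submission.contributorId) ?? [];
    list.push(submission);
    this.byContributor.set(submission.contributorId, list);
  }

  count(contributorId: string): number {
    return this.byContributor.get(contributorId)?.length ?? 0;
  }

  // Delete all recordings for a token and report how many were removed.
  deleteAll(contributorId: string): number {
    const removed = this.count(contributorId);
    this.byContributor.delete(contributorId);
    return removed;
  }
}
```

Using an allow-list rather than a block-list means that any field added to the client later is discarded by default unless it is deliberately permitted.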
Cloud Architecture
The application was designed to be deployed within the Police IT Unit's Microsoft Azure environment.
Particular attention was given to scalability, reliability, and maintainability to ensure the platform could support large-scale data collection efforts without compromising performance.
User Experience
One of the project's primary goals was encouraging participation.
The experience was designed to feel simple, approachable, and efficient, reducing the effort required to contribute recordings. Users were guided through reading predefined text passages, recording their voice, and submitting recordings through a streamlined workflow.
By minimizing complexity and prioritizing usability, the platform aimed to make users more likely to contribute multiple recordings over time, improving the quality and diversity of the collected dataset.
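The guided read, record, and submit flow can be modeled as a small state machine. The TypeScript sketch below is illustrative rather than the application's actual code: the step names, the `Action` events, and `nextStep` are hypothetical, and real audio capture (for example through the browser's MediaRecorder API) is left out, so `audio` is simply a byte array.

```typescript
// Hypothetical steps of the guided recording workflow.
type Step =
  | { kind: "reading"; promptIndex: number }
  | { kind: "recording"; promptIndex: number }
  | { kind: "reviewing"; promptIndex: number; audio: Uint8Array }
  | { kind: "submitting"; promptIndex: number; audio: Uint8Array }
  | { kind: "finished" };

// Events the UI would dispatch (names are illustrative).
type Action =
  | { type: "START_RECORDING" }
  | { type: "STOP_RECORDING"; audio: Uint8Array }
  | { type: "RETAKE" }
  | { type: "SUBMIT" }
  | { type: "SUBMIT_SUCCEEDED" }
  | { type: "SUBMIT_FAILED" };

// Pure transition function: actions that make no sense in the current
// step are ignored, so the UI cannot skip ahead (e.g. submit before recording).
function nextStep(step: Step, action: Action, promptCount: number): Step {
  switch (step.kind) {
    case "reading":
      return action.type === "START_RECORDING"
        ? { kind: "recording", promptIndex: step.promptIndex }
        : step;
    case "recording":
      return action.type === "STOP_RECORDING"
        ? { kind: "reviewing", promptIndex: step.promptIndex, audio: action.audio }
        : step;
    case "reviewing":
      // A retake discards the take and shows the same passage again.
      if (action.type === "RETAKE") return { kind: "reading", promptIndex: step.promptIndex };
      if (action.type === "SUBMIT") {
        return { kind: "submitting", promptIndex: step.promptIndex, audio: step.audio };
      }
      return step;
    case "submitting":
      if (action.type === "SUBMIT_FAILED") {
        // Keep the take so the user can retry without re-recording.
        return { kind: "reviewing", promptIndex: step.promptIndex, audio: step.audio };
      }
      if (action.type === "SUBMIT_SUCCEEDED") {
        // Move straight to the next passage to encourage further contributions.
        const next = step.promptIndex + 1;
        return next < promptCount ? { kind: "reading", promptIndex: next } : { kind: "finished" };
      }
      return step;
    case "finished":
      return step;
    default: {
      // Compile-time check that every step kind is handled.
      const unreachable: never = step;
      return unreachable;
    }
  }
}
```

Keeping the transitions in a pure function makes the flow easy to unit test independently of the UI, and a failed upload returns the user to the review step without losing their recording.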
Impact
The project provided the Police IT Unit with a dedicated platform for collecting the large-scale voice dataset required to support future speech recognition research.
By combining privacy-conscious design, scalable cloud infrastructure, and an accessible user experience, the solution established a foundation for the development of AI systems capable of better understanding Norwegian dialects, slang, and non-standard language patterns used in real-world police interrogations.
“Politiets IT-Enhet is responsible for this website and the processing of the personal data collected.”
Tage Stabell-Kulø
Technology Principal | Politiets IT-enhet

Research & Industry Collaboration
This project was conducted in close collaboration with the Norwegian Police IT Unit (PIT), providing a unique opportunity to work on a real-world problem with practical implications for the future of law enforcement technology.
Throughout the project, our team worked directly with stakeholders from PIT to understand technical requirements, privacy concerns, operational workflows, and long-term objectives. The collaboration involved regular discussions, feedback sessions, and validation of both technical and user experience decisions.
Unlike a traditional academic exercise, the project required balancing research objectives with the realities of building a production-ready solution. This included addressing challenges related to scalability, GDPR compliance, cloud infrastructure, accessibility, and user adoption.
The experience provided valuable insight into how software development, research, and public sector innovation intersect, while demonstrating how modern web technologies can support emerging artificial intelligence initiatives.








