Staff Voice AI Engineer - Applied AI
Uber
About the Role:
Applied AI at Uber builds intelligent systems that power next-generation product experiences for riders, drivers, merchants, and couriers. As a Staff Voice AI Engineer, you will lead the design and deployment of large-scale, real-time Voice AI systems that enable natural, reliable, and intelligent voice interactions across Uber’s ecosystem.
You will operate as a full-stack technical leader across speech modeling, LLM-powered conversational intelligence, and low-latency backend infrastructure — owning Voice AI systems end-to-end, from model development and evaluation to highly available, distributed production services. This includes advancing capabilities in automatic speech recognition (ASR), text-to-speech (TTS), spoken language understanding, and LLM-driven dialogue systems.
You will partner closely with product, design, and infrastructure teams to translate customer pain points into seamless voice-first experiences — setting the foundation for how Voice AI is built, deployed, and operated across Uber’s global platform.
What You Will Do:
- Design and build end-to-end Voice AI solutions, from understanding customer pain points and defining product requirements to deploying LLM-powered, real-time voice interfaces in production.
- Benchmark and evaluate voice AI systems, including speech recognition, speech synthesis, and spoken language understanding, by designing evaluations, analyzing results, and identifying systematic weaknesses.
- Improve voice model performance through system prompt tuning, fine-tuning voice- and speech-specific models, and optimizing architectures for low-latency, real-time voice interactions.
- Analyze voice request logs, prompt traces, and audio inputs to diagnose failure modes and improve transcription accuracy, conversational quality, and overall user experience.
- Build and maintain internal tools and platforms to automate Voice AI workflows, such as large-scale transcription pipelines, real-time audio processing services, and evaluation harnesses for voice quality.
- Own Voice AI systems in production end-to-end, including rollout strategies, monitoring, alerting, quality regression detection, and on-call readiness.
- Collaborate closely with product, design, and research teams to translate user needs into Voice AI capabilities with measurable business and customer impact.
Basic Qualifications:
- 10+ years of experience in software engineering, data science, or machine learning, including a track record of shipping production AI systems.
- Deep understanding of large language models, including fine-tuning, prompt engineering, embeddings, and retrieval-augmented generation (RAG).
- Strong backend and distributed systems expertise, with experience designing and operating highly available, scalable services in production.
- Deep experience with ML infrastructure, including model training pipelines, online serving systems, feature stores, experiment platforms, and evaluation frameworks.
- Hands-on experience with distributed data processing systems (e.g., Spark, Flink, Ray) and workflow orchestration (e.g., Airflow or equivalent).
- Ability to analyze data, run experiments, and derive insights for model and product improvement.
- Excellent communication and collaboration skills across technical and non-technical teams.
Preferred Qualifications:
- Experience building evaluation frameworks for Voice AI, including metrics and human/LLM-assisted evaluations for speech recognition accuracy, latency, robustness, and naturalness of synthesized speech.
- Demonstrated expertise in machine learning fundamentals applied to voice, including model evaluation, training, and fine-tuning of ASR, TTS, or speech-language models.
- Proven experience deploying Voice AI systems to production, with an emphasis on low-latency, high-reliability, real-time environments.
- Experience writing developer documentation, creating voice-specific SDKs, or enabling internal teams to build on shared Voice AI platforms.
- Hands-on work with large-scale audio datasets, including data curation, labeling strategies, and optimization of voice processing pipelines at scale.
For San Francisco, CA-based roles: The base salary range for this role is USD$232,000 per year - USD$258,000 per year.
For Sunnyvale, CA-based roles: The base salary range for this role is USD$232,000 per year - USD$258,000 per year.
For all US locations, you will be eligible to participate in Uber's bonus program, and may be offered an equity award and other types of compensation. All full-time employees are eligible to participate in a 401(k) plan. You will also be eligible for various benefits. More details can be found at the following link: https://jobs.uber.com/en/benefits.
Uber's mission is to reimagine the way the world moves for the better. Here, bold ideas create real-world impact, challenges drive growth, and speed fuels progress. What moves us, moves the world - let's move it forward, together.
Uber is proud to be an Equal Opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a disability or special need that requires accommodation, please let us know by completing this form.
Offices continue to be central to collaboration and Uber's cultural identity. Unless formally approved to work fully remotely, Uber expects employees to spend at least half of their work time in their assigned office. For certain roles, such as those based at green-light hubs, employees are expected to be in-office for 100% of their time. Please speak with your recruiter to better understand in-office expectations for this role.