Senior Software Engineer
Microsoft
Overview
As part of the Microsoft Azure AI Knowledge group, the team builds Document Intelligence capabilities that semantically structure documents for intelligent processing across traditional scenarios (RPA, search indexing, compliance, security) and modern LLM-based applications (RAG, agent memory). The team is on a mission to empower people through AI in everyday document tasks.
We are looking for a Senior Software Engineer to be a technical expert in our core engine team. In this role, you will bridge the gap between AI models and production infrastructure, taking ownership of complex architectural decisions. You will drive initiatives to optimize AI model inference, architect highly efficient infrastructure, and design scalable features to extend our document understanding capabilities.
This is a high-impact role designed for an experienced engineer who thrives on solving ambiguous technical challenges. You will work alongside researchers and the service team to turn state-of-the-art prototypes into robust, high-performance, enterprise-grade solutions used by customers worldwide, while also acting as a mentor to raise the engineering bar of the team.
Microsoft Mission
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
Runtime Architecture & Development: Lead the design and implementation of critical, high-quality code in C++, C#, and Python. You will design and implement the core inference for our exceptional OCR and document layout analysis engine. You will also design the technical strategy for integrating Microsoft built-in and open-source solutions to support a broader range of formats (Word, Excel, PowerPoint), ensuring the system is extensible and maintainable.
Inference Optimization Strategy: Spearhead efforts to optimize deep learning model inference for maximum speed and throughput. You will define performance benchmarks and drive low-level optimizations, utilizing hardware accelerators to ensure our models run efficiently at massive scale.
System Infrastructure & Scalability: Architect and oversee the pipeline design for highly scalable AI services. You will drive best practices in containerization and deployment (Docker, Kubernetes), ensuring the system is not only functional but also resilient, observable, and cost-effective.
Technical Leadership & Mentorship: Act as a technical role model within the team. You will lead code reviews and drive architectural discussions across teams. You will be responsible for upholding engineering excellence and debugging the most complex, systemic issues that span the stack.
Qualifications
Required/Minimum Qualifications (RQs/MQs)
Master’s degree in Computer Science or a related field (or equivalent practical experience).
5+ years of professional software engineering experience with a track record of delivering complex, high-impact systems.
Expert-level proficiency in at least one of the following languages: C++, C#, or Python, with the ability to architect solutions across a polyglot environment.
Deep understanding of Computer Science fundamentals, including advanced data structures, algorithms, and distributed system design.
System Architecture: Proven ability to design scalable, reliable, and maintainable systems, and to navigate ambiguity and trade-off decisions.
Additional or Preferred Qualifications (PQs)
Proven experience in high-performance model inference optimization (e.g., CUDA, TensorRT, ONNX Runtime) in a production environment.
Deep expertise with containerization and orchestration technologies, specifically Docker and Kubernetes, at an enterprise scale.
Solid understanding of Machine Learning concepts and hands-on experience integrating ML frameworks (e.g., PyTorch, TensorFlow) into production pipelines.
Experience in low-level code optimization for latency and memory management.
Experience processing complex document formats (PDF internals, Office Open XML) is a strong plus.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.