Site Reliability Engineer
Microsoft
Belgrade, Serbia
Overview
We are Azure Data, a Microsoft team that drives the future of data processing in the Microsoft Cloud. Our software development team, located in Belgrade, is building some of the most advanced and widely used data processing cloud services in the world. The services we build are based on groundbreaking technology and are global market leaders, with millions of active users.
Azure SQL Managed Instance is our customers' first-choice service for migrating existing SQL Server instances from on-premises data centers to the cloud. Thousands of customers have easily migrated their apps to this service and gained all the benefits of automated database management (patching, backups, high availability, security). Keeping them running smoothly and highly available around the clock, while operating at huge scale, is a daunting challenge, one that we enjoy tackling.
We are looking for a Site Reliability Engineer to join our SRE team and collaborate closely with our software engineers, support teams and other partners to ensure a great experience for our customers. Running software as a service means more than just developing and releasing features. Ensuring reliability and serviceability is a critical part of the software cycle. This is where you come into the picture. As a Site Reliability Engineer, you will ensure that the Azure SQL Managed Instance service runs smoothly with the required reliability and availability. You will design and implement software to automatically resolve issues. You will work closely with feature teams to design, implement and release features that are reliable and serviceable. You will be a cross-domain expert who has a holistic view of our cloud service.
This is an opportunity to work with some of the best engineers in the industry to continue to innovate and deliver Azure SQL Managed Instance for the Cloud. The challenges span the entire stack of database technology: connectivity, high availability, programming language, query processing, transaction processing and data management involving hundreds of nodes. You will learn what it takes to deploy and run software as a 24x7 enterprise-grade cloud service!
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Qualifications
Required
- Bachelor’s degree in Computer Science, Information Technology, or related field OR equivalent experience.
- Knowledge of cloud concepts, distributed systems, and at least one programming language (C#, Java, Python).
- Familiarity with OS fundamentals and concepts (such as processes, threading, memory allocation), networking basics, understanding of how applications are affected by the above, and ability to debug the same.
Preferred
- Proficiency in programming with managed code such as C#/Java. Ability to read native C/C++ code to debug issues and find answers that are not documented.
- Exposure to Azure or other cloud platforms.
- Experience with source control software such as git.
- Experience with observability tools or CI/CD pipelines.
Responsibilities
- Act as a subject matter expert for configuring, troubleshooting and monitoring the Azure SQL Managed Instance service.
- Identify opportunities and implement automation to resolve and reduce live-site incidents.
- Design and implement solutions to improve service health, manageability, reliability and telemetry.
- Implement configuration and data changes safely using automation and tooling.
- Troubleshoot issues affecting availability, reliability, or performance and propose solutions.
- Contribute to post-incident reviews and document operational processes.
- Meet on-call responsibilities periodically.