AnitaB.org Talent Network

Connecting women in tech with the best professional opportunities!

Senior Site Reliability Engineer

Red Hat

Software Engineering
Pune, Maharashtra, India
Posted on Jan 26, 2026

About the Job:

The Red Hat IT Automation & Intelligence Evolution (AIE) team is seeking a senior site reliability engineer to drive our strategic shift from traditional operations to intelligent automation and AIOps. In this pivotal role, you will serve as a technical lead for reliability and a strategic consultant to the wider organization.

You will design and implement self-service platforms, build AI-driven operational workflows, and spearhead our "alert-noise reduction" campaigns. You will also mentor junior engineers and partner with internal teams to identify high-ROI automation opportunities. Your goal is not just to resolve issues but to permanently remove the "hidden tax" of toil and interruptions through engineering and AI adoption.

What will you do?

Reliability Engineering & Standards

  • Define and Enforce SLOs: Lead the definition of Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services, managing Error Budgets to balance feature velocity with system stability.

  • AIOps Implementation: Drive the adoption of AIOps solutions (including Anomaly Detection and Predictive Alerting) to reduce incident volume and improve Mean Time to Resolution (MTTR).

  • Resilience Engineering: Design and lead Chaos Engineering experiments (e.g., fault injection) to validate system recovery and uncover weaknesses before they impact production.

Automation & Efficiency

  • Eliminate Toil: Identify manual, repetitive work patterns and engineer complex automation solutions to eliminate them, aiming to boost overall team capacity.

  • Intelligent Workflows: Move beyond basic scripting to build intelligent agents and workflows using tools like the Model Context Protocol (MCP) and LLM integrations to automate decision-making processes.

  • Infrastructure as Code: Maintain and evolve the "Infrastructure as Code" ecosystem, ensuring robust configuration management and version control standards are applied across the environment.

Enablement & Leadership

  • Internal Consulting: Act as a subject matter expert, engaging with other engineering teams to scope their automation needs and help them build/manage their own workflows.

  • Incident Command: Lead high-severity incident response efforts, serving as the Incident Commander when necessary.

  • Root Cause Analysis: Facilitate blameless post-mortems that focus on systemic root causes rather than human error, using techniques such as Graph Algorithm Design and Blast Radius Analysis, to prevent recurrence.

  • Mentorship: Mentor junior SREs, conducting code reviews and guiding them through complex troubleshooting and systems engineering principles.

What will you bring?

Technical Competency:

  • Programming: Proficiency in Python or Go, with experience in building modular, scalable software.

  • Automation: Proficiency with Ansible for configuration management, orchestration, and automation workflows.

  • Observability Stack: Expert-level knowledge of monitoring ecosystems, specifically the TIGK Stack (Telegraf, InfluxDB, Grafana, Kapacitor) and Prometheus.

  • Cloud & Containerization: Deep understanding of Linux environments, Kubernetes/OpenShift, and public cloud infrastructure (AWS/Azure/GCP).

SRE Methodology:

  • Demonstrated experience designing and implementing SLIs, SLOs, and Error Budgets.

  • Proven track record of Toil Reduction strategies and implementation.

  • Experience with Incident Management lifecycles (escalation policies, paging, and post-mortems).

Soft Skills:

  • Growth Mindset: Open-minded approach to problem-solving and a demonstrated willingness to learn and adopt new technologies.

  • Strategic Thinking: Ability to translate business goals into technical roadmaps.

  • Communication: Strong ability to explain complex reliability concepts to non-SRE teams and leadership.

The following are considered a plus:

  • Automation Platforms: Experience with Ansible Automation Platform (AAP) or similar configuration management tools for enterprise-scale environments.

  • AI/LLM Integration: Experience with Model Context Protocol (MCP), Claude Plugin development, or integrating LLMs into operational workflows.

  • Data Science for Ops: Experience with regression data or algorithms for predictive alerting.

  • Security: Experience with hardening systems (Bastion Hosts) and managing security policies within automation workflows.

About Red Hat

Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact.

Inclusion at Red Hat
Red Hat’s culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from different backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions that compose our global village.

Equal Opportunity Policy (EEO)
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.


Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees, commissions, or any other payment related to unsolicited resumes or CVs except as required in a written contract between Red Hat and the recruitment agency or party requesting payment of a fee.


Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email application-assistance@redhat.com. General inquiries, such as those regarding the status of a job application, will not receive a reply.