Staff Software Engineer, Site Reliability
Intuit
Software Engineering
San Diego, CA, USA
USD 188,500-255,000 / year + Equity
Company Overview
Intuit is the global financial technology platform that powers prosperity for the people and communities we serve. With approximately 100 million customers worldwide using products such as TurboTax, Credit Karma, QuickBooks, and Mailchimp, we believe that everyone should have the opportunity to prosper. We never stop working to find new, innovative ways to make that possible.
Job Overview
Come join Intuit's Identity Team as a Staff Software Engineer who sits at the true intersection of software development and site reliability to build and operate large-scale systems that are secure, fault-tolerant, performant, highly available, affordable, and scalable. This is not a traditional SRE role - we're looking for an engineer who writes production-quality code as readily as they debug infrastructure, and who is as much at home building new system reliability tools and capabilities as they are increasing the resiliency and operational excellence of the Identity platform as a whole.
Identity is at the heart of all offerings across Intuit and is foundational to Intuit's strategic transformation. It is one of Intuit's most critical services, powering close to 500 applications and services and enabling Intuit’s 3 strategic big bets. Identity capabilities position Intuit at the center of the financial ecosystem and enable fluid exchange of identity, profile, and data across an ecosystem of financial institutions. Identity's technical stack is a cloud-native, microservices-based architecture operating fully on Kubernetes and AWS. The work is complex, the stakes are high, and the impact is company-wide.
Responsibilities
Software Development (~50%)
Design and build tools, automation systems, and self-healing mechanisms from the ground up - this means writing real, production code, not scripts
Develop self-service tools and services that enable Identity developers to troubleshoot and triage issues at scale
Build and evolve observability components to detect and isolate issues quickly across massive-scale systems
Contribute to cost and capacity management, uncovering cost-saving opportunities and developing automation to enforce optimization at scale
Leverage AI to build tools and solve complex operational and auto-healing problems at scale
Support and coach other engineers, pair programming or peer reviewing code, helping to ensure that all engineers are growing and part of a community. Be a role model to engineers and inspire a high technical bar for the team.
Systems Reliability & Infrastructure (~50%)
Act as the technical subject matter expert to evaluate and evangelize forward-looking processes, tools, technologies, and architecture that deliver high-quality, secure software faster and more efficiently while meeting availability, scale, and performance requirements in an AWS public cloud and Kubernetes environment.
Partner with Architecture, Product, and Operations on the infrastructure target state and on Resilience and Operational Excellence (OpEx) best practices and patterns, and influence the roadmap to non-linearly improve all -ilities
Contribute to FMEA (Failure Mode and Effects Analysis) and Chaos Engineering to proactively identify resiliency gaps and prepare for faster recovery during incidents.
Drive Incident management and Incident Root Cause Analysis (RCA) to continuously improve development and operational practices
Participate in 12/7 on-call rotation
Qualifications
Required
BS/MS in Computer Science, Engineering, or equivalent experience
10+ years of experience in software engineering, with demonstrated hands-on expertise designing, developing (not just scripting), and operating complex, high-scale, high-availability distributed systems in a cloud-native architecture and AWS environment.
Experience using AI to build tools and solve complex operational and auto-healing problems.
Proficiency coding in Python, Java, Go, or similar languages, combined with strong operational skills
Experience with Infrastructure as Code (Terraform/CDK preferred), CI/CD pipelines (Jenkins, CircleCI, or similar), Kubernetes and containerization (Docker, ECS), and monitoring/alerting tools (Splunk, Wavefront, Grafana Mimir)
Ability to handle a fast-paced environment for iterative project turnarounds on mission critical systems.
Ability to collaborate across a wide range of roles and experience levels. Strong communication skills.
Strong Linux/Unix fundamentals
Who Thrives in This Role
You're a software engineer who genuinely enjoys building tools and services that solve reliability and operational problems - not someone who codes occasionally. You're comfortable writing complex, scalable software and also rolling up your sleeves on an incident to drive investigation, resolution, and lessons learned toward continued improvement. You don't see "dev" and "ops" as separate worlds. If that's you, we want to talk.
Intuit provides a competitive compensation package with a strong pay for performance rewards approach. This position will be eligible for a cash bonus, equity rewards and benefits, in accordance with our applicable plans and programs (see more about our compensation and benefits at Intuit®: Careers | Benefits, https://www.intuit.com/careers/benefits/full-time-employees/). Pay offered is based on factors such as job-related knowledge, skills, experience, and work location. To drive ongoing fair pay for employees, Intuit conducts regular comparisons across categories of ethnicity and gender. The expected base pay range for this position is: San Diego: $188,500 - $255,000