AnitaB.org Talent Network

Site Reliability Developer 4

Oracle

Software Engineering
Mexico
Posted on Jan 23, 2026

What You’ll Do

  • Capacity Engineering – Act as a strategic capacity partner, immersing yourself in the end-to-end architecture and performance of SaaS production services. Ensure mission-critical workloads, including emerging agentic AI and MLOps pipelines, are forecast, scaled, and optimized for OCI cloud capacity at enterprise scale.
  • Cost Engineering – Translate SaaS capacity architectures into cost models that improve efficiency year over year. Partner with Cost Engineers to drive down infrastructure costs while enhancing reliability, producing actionable forecasts and executive-level insights.
  • AI/MLOps & Automation – Apply deep knowledge of AI, MLOps, and orchestration to streamline operations, eliminate technical debt, and propose automation opportunities. Collaborate with AI/ML Ops and data engineering teams to evolve architectures, enhance scalability, and influence future OCI feature sets.
  • Run-the-Business Support – Deliver detailed capacity roadmaps that define tuning, scaling, and demand characteristics. Communicate inflection points and future requirements to the Cloud Capacity Run-the-Business organization for seamless planning.
  • Technical Expertise – Leverage a strong foundation in cloud capacity topologies (compute, storage, network) to identify dependencies and drive service reliability improvements. Prior experience across DB, middleware, containers, or networking is valuable in translating complex architectures into capacity supply requirements.
  • Cross-Team Collaboration – Engage confidently across all levels of the organization, from ICs to executives, as a trusted advisor on SaaS capacity. Present data-driven insights with clarity and executive presence.
  • Curiosity & Breadth – Approach services with professional curiosity, exploring APIs, profiling workloads, and analyzing anomalies to anticipate demand and performance needs.

Your Experience

  • Bachelor’s degree in Computer Science or related field; Master’s preferred
  • Relevant Cloud MLOps / AI certifications (e.g., AWS ML Specialty, GCP ML Engineer, Azure AI, NVIDIA MLOps, Linux Foundation MLOps Practitioner)
  • 10+ years of senior engineering experience across one or more domains: databases (Oracle DB preferred), virtualization/middleware, container orchestration, networking, or monitoring/observability
  • Proven expertise in forecasting, scaling, and cost-optimizing capacity for AI/ML and MLOps workloads, including dynamic and agentic workloads, across hybrid and cloud environments
  • Strong knowledge of Oracle Cloud Infrastructure (OCI) services
  • Advanced analytical skills with experience building and interpreting complex models (Excel or equivalent)
  • Exceptional communication and stakeholder-management skills; ability to translate engineering into executive-ready narratives
  • Experience driving initiatives in fast-paced, dynamic, cross-functional environments

As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity.

We know that true innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing an inclusive workforce that promotes opportunities for all.

Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling +1 888 404 2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.


We are seeking a Strategic Capacity Engineer to design, forecast, and optimize SaaS cloud capacity for AI/ML pipelines, MLOps platforms, and emerging agentic workloads. You will apply deep, hands-on expertise in cloud topologies (compute, storage, network) and automation/orchestration to ensure OCI services scale reliably and cost-efficiently. This senior role requires practical skill in modeling dynamic workloads, tuning infrastructure for performance, and translating complex architectures into actionable capacity strategies. You will partner across product and engineering to drive automation, improve reliability, and deliver executive-level insights that balance growth with cost discipline.

Career Level - IC4


  • Partner with SRE and Product Engineering on shared ownership of SaaS services, ensuring reliability, security, scale, and performance across OCI.
  • Forecast, design, and optimize capacity for AI/ML pipelines, MLOps platforms, and emerging agentic workloads; model dynamic demand and define scaling strategies.
  • Translate complex product architectures into capacity and cost models, aligning infrastructure with SaaS business priorities.
  • Drive automation and orchestration initiatives to reduce technical debt, accelerate delivery, and enhance service resiliency.
  • Serve as an escalation point for complex, cross-stack issues, leveraging deep knowledge of service topology and dependencies.
  • Collaborate with development teams to evolve SaaS capacity architectures, propose cloud feature enhancements, and guide the addition of new capabilities to the Oracle Cloud portfolio.
  • Deliver clear communication of scale, capacity, performance, and cost characteristics to stakeholders, from engineers to executives.
  • Apply professional curiosity to explore APIs, workload profiles, and anomalies, turning insights into capacity and reliability improvements.