AnitaB.org Talent Network

Connecting women in tech with the best professional opportunities!

Cloud Site Reliability Engineer (SRE) - Data Management & Analytics Platform

Bloomberg

Software Engineering, Data Science

Princeton, NJ, USA

Posted on May 5, 2026

At Bloomberg, data is at the heart of everything we do. As part of the Data Management and Analytics Platform (DMAP) SRE team, you will play a critical role in driving analytics throughout the organization to improve our products, better engage with our customers, create greater efficiencies, and unlock new business opportunities through data-driven insights.

Our team is responsible for capturing and processing the who, what, when, where, and why of how clients use Bloomberg products, how our systems perform, and how employees interact with customers. We ingest and prepare massive volumes of data to power reporting, dashboards, self-service tools, and advanced analytics used across the company.

We are looking for a Cloud Site Reliability Engineer (SRE) who is passionate about building and operating highly reliable, scalable data platforms in the cloud. In this role, you will focus on ensuring the availability, performance, and scalability of critical data pipelines and analytics infrastructure. You will work at the intersection of software engineering and infrastructure, applying automation, observability, and reliability best practices to support large-scale distributed systems.

You’ll Be Trusted To

Design, build, and operate highly available, scalable, and resilient cloud infrastructure supporting large-scale data ingestion and analytics platforms
Define, implement, and monitor SLIs/SLOs for data systems and services; drive reliability improvements using error budgets and operational metrics
Improve observability across data pipelines and platforms through logging, metrics, tracing, and alerting
Automate infrastructure provisioning and system management using Infrastructure as Code (IaC)
Lead incident response efforts, perform root cause analysis (RCA), and implement post-incident improvements
Optimize performance, reliability, and cost efficiency of cloud-based data systems
Ensure data platform reliability, including batch and streaming pipelines, storage systems, and reporting infrastructure
Partner with data engineers, software engineers, and stakeholders to improve system reliability and operational maturity
Strengthen platform security through proactive monitoring, vulnerability management, and cloud security best practices
Continuously improve CI/CD pipelines and deployment processes for data infrastructure

You’ll Need To Have

5+ years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
Strong proficiency in at least one programming or scripting language (Python and/or Go)
Experience supporting production systems with a focus on reliability, scalability, and observability
Hands-on experience operating or designing highly available distributed systems
A Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related field, or equivalent professional experience

We’d Love To See

Experience supporting large-scale data platforms, data pipelines, or analytics infrastructure
Strong experience operating production systems in AWS at scale
Experience defining and managing SLIs, SLOs, and error budgets
Strong background in monitoring and observability tools (e.g., Prometheus, Grafana, CloudWatch, Datadog)
Experience leading incident management and conducting postmortems
Hands-on experience with Infrastructure as Code (Terraform or CloudFormation)
Experience building and maintaining CI/CD pipelines
Strong understanding of distributed systems and cloud architecture
Experience with containerized workloads (Docker, Kubernetes)
Knowledge of AWS services related to data platforms (e.g., S3, EMR, Lambda, Kinesis, Glue, Redshift)
Knowledge of the Databricks or Snowflake platforms
Experience with cloud networking concepts (VPCs, routing, security groups)
Experience optimizing cloud costs in large-scale environments
AWS certification (Associate level or above)
A security-first mindset and familiarity with compliance and data governance best practices
Experience using operational metrics and data to drive continuous improvement

Our most successful engineers are collaborative, data-driven, and take strong ownership of production systems end-to-end, ensuring the reliability of the data platforms that power Bloomberg’s analytics and insights.
