Senior DevOps Engineer (AI + Azure)

EY

Software Engineering, Data Science
Madrid, Spain
Posted on Nov 5, 2025

About Us

At EY wavespace Madrid - AI & Data Hub, we are a diverse, multicultural team at the forefront of technological innovation, working with cutting-edge technologies such as Gen AI, data analytics, and robotics. Our center is dedicated to exploring the future of AI and data.

Overview

We’re looking for a Senior DevOps Engineer to build and run cloud and AI infrastructure at scale. You’ll own IaC with Terraform, CI/CD, Kubernetes, and Linux. You’ll also help run LLM workloads both in Azure and locally (Ollama/vLLM/llama.cpp). Your work will enable fast, secure, repeatable delivery.

Key responsibilities

  • Build and maintain Azure infrastructure with Terraform (modules, workspaces, pipelines, policies).
  • Design and operate CI/CD with GitHub Actions and/or Azure DevOps (multi-stage, approvals, environments).
  • Run containers and Kubernetes/AKS (Helm, ingress, autoscaling, node pools, storage).
  • Manage AI/LLM runtimes: local model runners (Ollama, vLLM, llama.cpp) and GPU/CPU configurations.
  • Support RAG: embeddings pipelines, vector DBs (Azure AI Search/Cognitive Search, pgvector, Milvus), data sync, retention.
  • Automate platform tasks with Python (tooling, CLI utilities, API glue, ops scripts).
  • Implement observability (Azure Monitor, Prometheus/Grafana, logs/traces/metrics, alerts, runbooks, SLOs).
  • Apply Zero Trust security: enforce least privilege, role-based access control (RBAC), and identity-based segmentation (Azure AD, Conditional Access, MFA).
  • Implement policy-as-code (OPA, Azure Policy) for compliance.
  • Rotate secrets and certificates via Key Vault; integrate with pipelines.
  • Add continuous security scanning (SAST/DAST, container image scanning).
  • Handle reliability: rollout strategies, health probes, incident response, postmortems.
  • Optimize costs: right-sizing, autoscaling, budgets, tags, reporting.

Key requirements

  • 4+ years in DevOps/SRE/Platform Engineering.
  • Strong Linux (shell, systemd, networking, performance troubleshooting).
  • Terraform at scale (modules, state backends, CI/CD integration).
  • Deep Azure experience (AKS, VNets, Key Vault, Storage, Monitor, Identity, Networking).
  • CI/CD expertise (GitHub Actions and/or Azure DevOps).
  • Containers and Kubernetes in production.
  • Python or scripting for automation (solid scripting and tooling; not full-time app dev).
  • Hands-on with LLM setups (local runners or Azure OpenAI), embeddings, vector indexes, and RAG basics.

Nice to have

  • Multi-cloud exposure (AWS / GCP).
  • Azure AI services (Azure OpenAI, Cognitive Search).
  • GitOps (Argo CD/Flux), Helm packaging, OCI registries.
  • Eventing/queues (Event Grid, Service Bus, Kafka).
  • Security/compliance in cloud (CIS, NIST, Microsoft CAF).
  • Certifications: AZ‑104, AZ‑204, AZ‑400, AI‑900, HashiCorp Terraform Associate, CKA/CKAD.
  • Experience with GPU nodes, drivers, CUDA/ROCm, or CPU-only optimizations for LLMs.

How we work

  • Everything as code. PRs, reviews, and tests.
  • Small batches. Trunk-based or short-lived branches.
  • Clear runbooks and on-call rotation where needed.
  • Measure, alert, fix, and improve.

Our commitment to diversity & inclusion

We are genuinely passionate about inclusion and support individuals from all groups. We do not discriminate on the basis of race, religion, gender, sexual orientation, or disability status.