Platform Engineer - Vice President
Citi
The CTS Enterprise Analytics Services (EAS) organization is actively recruiting a strong Platform Engineer to work on a broad spectrum of engineering initiatives. The EAS organization drives the enterprise-wide strategy for engineering and managing best-in-class data and analytics services, including Big Data platforms, Spark services, AI & ML services, etc. The role requires a thought leader who can perform hands-on work in partnership with key stakeholders, architects, engineers, data scientists, and DevOps teams to engineer and deliver highly resilient solutions.
Responsibilities:
- Assess the current landscape and book of work, and partner with various teams to identify key areas for infrastructure automation, configuration management, monitoring, alerting, etc.
- Continuously design and improve processes for detecting and responding to production service outages, and build preventive solutions.
- Act as the subject matter expert in Site Reliability Engineering to help drive the engineering vision set by EAS stakeholders.
- Produce availability and performance metrics for services and deliver processes to improve on major KPIs.
- Operationalize highly available services deployed across multi-region and multi-data center environments.
- Handle outages, perform root cause analysis, and provide architectural and engineering recommendations.
- Build an internal knowledge base to educate partners and support teams.
 
Skills:
- Proven track record of designing and supporting highly available platforms and services for various types of workloads.
- Experience designing fail-over processes and solutions.
- Strong scripting skills: shell scripts, Python, Perl, etc.
- Experience with virtualization, containerization, and cloud service providers, e.g. VMware, Docker, Kubernetes, AWS, GCP, etc.
- Analytical thinker able to assess various aspects of a work item and methodically arrive at a solution.
- Hands-on experience gathering performance metrics, troubleshooting, tuning, monitoring, etc.
- Strong experience with logging, monitoring, tracing, and visualization stacks, e.g. OTEL, Prometheus, Grafana, ELK, Kibana, Splunk, etc.
- Hands-on experience installing, configuring, and troubleshooting Linux-based environments.
- Expertise with infrastructure automation, build, and deployment technologies for IaC and CI/CD, e.g. Terraform, Jenkins, Harness, ArgoCD, etc.
- Strong knowledge of configuration management tools, e.g. Ansible and/or Chef.
- Familiarity with GPU management in virtualized enterprise environments.
- Good understanding of security concepts and best practices.
- Excellent written and verbal communication skills.
- Good team player interested in sharing knowledge, cross-training other team members, and learning new technologies and products.
- Ability to work in a matrixed environment and follow procedures, processes, and policies.
- Experience managing vendor interactions for troubleshooting sessions and enhancement requests, and guiding vendor roadmaps to meet Citi standards and functional requirements.
- Self-starter who works with minimal supervision and can work in a team of diverse skills and geographies.
 
------------------------------------------------------
Job Family Group:
Technology
------------------------------------------------------
Job Family:
Systems & Engineering
------------------------------------------------------
Time Type:
Full time
------------------------------------------------------
Most Relevant Skills
Please see the requirements listed above.
------------------------------------------------------
Other Relevant Skills
For complementary skills, please see above and/or contact the recruiter.
------------------------------------------------------
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.