Business Intelligence Engineer (BIE), Data eNgineering and Analytics (DNA)
Amazon
Description
The Data eNgineering and Analytics (DNA) Wizardry team is excited to have a new wizard join our coven! The DNA team spans Data Architecture (DA), Data Engineering (DE), Business Intelligence (BI), and Data Science (DS) functional areas, and is responsible for providing data and analytics solutions for the North American Stores (NAS) organization.
This role focuses on architecting and implementing comprehensive knowledge base management solutions while ensuring robust data infrastructure for machine learning applications. The position requires expertise in designing enterprise-scale Knowledge Base Management Systems (KBMS) and data catalogs that serve as the organization's single source of truth for data assets, definitions, and relationships. You will be responsible for developing logical and physical data models, designing schema architectures, and managing data warehouses (Redshift) to optimize data organization for efficient analysis. Core responsibilities include implementing data governance best practices, ensuring data quality through validation frameworks, and preparing complex datasets for machine learning applications. Working closely with our data scientists, you will ensure data readiness for bringing ML/LLM models into production. The role also encompasses creating impactful dashboards and reports using Quick Suite to communicate key insights and track performance metrics. Success in this position requires a deep understanding of data architecture designed for machine learning workflows, strong technical skills in preparing high-quality, feature-rich datasets, and the ability to collaborate effectively with cross-functional teams to drive data-driven decision making and unlock predictive insights.
The role requires high proficiency in complex SQL and Python scripting, often joining multiple types of data sets that range from normalized summaries to raw information spanning many diverse sources. It calls for a high bar for ownership and the ability to work autonomously to create high-quality products for internal customers. You will communicate project roadmaps, prioritization, and release notes for new products to customer and stakeholder groups up to senior leadership. In addition, you will influence the team and customers to build scalable and sustainable analytic solutions, review product development, and provide mentorship on projects.
Key job responsibilities
Data Architecture & Modeling
Design and implement robust dimensional models (Redshift) that optimize data organization for analytical workloads while maintaining clear documentation and architecture standards.
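For illustration only, a minimal star-schema sketch of the kind of dimensional model this covers, expressed as Redshift DDL embedded in a Python script; the table names, columns, and distribution/sort key choices are hypothetical, not a prescribed design:

```python
# Illustrative star schema for a hypothetical orders subject area.
# Table names, columns, and DISTKEY/SORTKEY choices are examples only.

DIM_PRODUCT_DDL = """
CREATE TABLE IF NOT EXISTS dim_product (
    product_key   BIGINT IDENTITY(1,1),
    product_id    VARCHAR(32) NOT NULL,
    product_name  VARCHAR(256),
    category      VARCHAR(64),
    effective_dt  DATE,
    PRIMARY KEY (product_key)
)
DISTSTYLE ALL;  -- small dimension: replicate to every node for cheap joins
"""

FACT_ORDERS_DDL = """
CREATE TABLE IF NOT EXISTS fact_orders (
    product_key  BIGINT NOT NULL REFERENCES dim_product (product_key),
    order_id     VARCHAR(32) NOT NULL,
    order_date   DATE NOT NULL,
    quantity     INTEGER,
    revenue      DECIMAL(18, 2)
)
DISTKEY (product_key)  -- co-locate fact rows with the dimension they join to
SORTKEY (order_date);  -- favor range-restricted scans on date
"""

if __name__ == "__main__":
    # In practice these statements would be run against the cluster,
    # e.g. via redshift_connector or the Redshift Data API.
    print(DIM_PRODUCT_DDL)
    print(FACT_ORDERS_DDL)
```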
Pipeline Development
Build and maintain scalable ETL pipelines that integrate data from multiple sources, applying necessary transformations and business rules while ensuring performance and reliability.
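A minimal sketch of one such pipeline step as an AWS Glue PySpark job, assuming a source table already registered in the Glue Data Catalog; the database, table, and bucket names are placeholders:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Resolve the standard job argument and set up the Glue/Spark contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a cataloged source table (placeholder database/table names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_raw",
    table_name="orders",
)

# Apply simple rename/cast rules as a stand-in for real business logic.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("order_ts", "string", "order_date", "string"),
        ("amount", "double", "revenue", "double"),
    ],
)

# Land the curated output in S3 as Parquet, partitioned by order_date.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={
        "path": "s3://example-curated-bucket/orders/",  # placeholder bucket
        "partitionKeys": ["order_date"],
    },
    format="parquet",
)

job.commit()
```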
Feature Store Development
Design and manage centralized feature stores that provide versioned, production-ready features for ML/LLM models, while implementing reproducible data preparation workflows for streamlined model development.
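One lightweight way to get versioned, reproducible feature snapshots, sketched here with pandas and Parquet on S3; the feature names and S3 prefix are illustrative, and a managed service such as SageMaker Feature Store could fill the same role:

```python
from datetime import date

import pandas as pd

# Hypothetical feature set: per-customer aggregates computed upstream.
features = pd.DataFrame(
    {
        "customer_id": ["C001", "C002"],
        "orders_90d": [14, 3],
        "revenue_90d": [1250.40, 89.99],
    }
)

FEATURE_SET = "customer_order_behavior"  # illustrative feature-set name
VERSION = "v3"                           # bump when logic or schema changes
SNAPSHOT = date.today().isoformat()

# Writing to a versioned, dated prefix keeps training runs reproducible:
# a model pins an exact version and snapshot date rather than "latest".
path = (
    f"s3://example-feature-store/{FEATURE_SET}/"
    f"version={VERSION}/snapshot_date={SNAPSHOT}/features.parquet"
)
features.to_parquet(path, index=False)  # needs pyarrow and s3fs installed
```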
Knowledge Management
Develop and maintain comprehensive metadata repositories and data catalogs that serve as the single source of truth for data assets, including lineage, transformations, and usage guidelines.
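As a small flavor of catalog upkeep, a boto3 sketch that walks Glue Data Catalog tables and flags entries with missing descriptions; the database name is a placeholder:

```python
import boto3

glue = boto3.client("glue")


def tables_missing_descriptions(database: str):
    """Yield (table name, undocumented columns) for entries lacking descriptions."""
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            undocumented = [col["Name"] for col in columns if not col.get("Comment")]
            if not table.get("Description") or undocumented:
                yield table["Name"], undocumented


if __name__ == "__main__":
    for name, columns in tables_missing_descriptions("sales_curated"):  # placeholder DB
        print(f"{name}: columns without comments -> {columns}")
```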
Quality & Governance
Establish and enforce data quality frameworks through automated validation checks, monitoring systems, and security protocols to ensure data accuracy and compliance.
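A minimal example of the kind of automated validation check meant here, written with plain pandas; the column names and rules are made up, and a framework such as Great Expectations or Deequ would typically formalize this:

```python
import pandas as pd


def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return human-readable data-quality failures; an empty list means all checks pass."""
    failures = []

    # Completeness: key fields must never be null.
    for col in ("order_id", "order_date", "revenue"):
        nulls = int(df[col].isna().sum())
        if nulls:
            failures.append(f"{col}: {nulls} null values")

    # Uniqueness: order_id is the business key.
    dupes = int(df["order_id"].duplicated().sum())
    if dupes:
        failures.append(f"order_id: {dupes} duplicate rows")

    # Validity: revenue must be non-negative.
    negative = int((df["revenue"] < 0).sum())
    if negative:
        failures.append(f"revenue: {negative} negative values")

    return failures


if __name__ == "__main__":
    sample = pd.DataFrame(
        {
            "order_id": ["A1", "A1", "A2"],
            "order_date": ["2024-01-01", None, "2024-01-02"],
            "revenue": [10.0, 5.0, -1.0],
        }
    )
    print(validate_orders(sample) or "all checks passed")
```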
Stakeholder Management
Collaborate with cross-functional teams to translate business requirements into technical solutions while effectively communicating data constraints and limitations.
Process Automation
Create automated workflows for routine data preparation tasks, reporting, and monitoring to improve operational efficiency and enable self-service data access.
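As a sketch of the event-driven piece (which also appears as AWS Lambda in the tech stack below), a skeletal Lambda handler that reacts to new files landing in S3 and kicks off a hypothetical Glue job:

```python
import json
import urllib.parse

import boto3

glue = boto3.client("glue")


def lambda_handler(event, context):
    """Triggered by S3 ObjectCreated events; starts a (hypothetical) Glue job per object."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        response = glue.start_job_run(
            JobName="curate_orders_job",  # placeholder Glue job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        print(json.dumps({"started_run": response["JobRunId"], "object": key}))

    return {"statusCode": 200}
```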
Technical Skills/Tech Stack:
SQL: Essential for data extraction, manipulation, analysis, and writing complex queries against databases and data warehouses.
Python: The primary language, used for scripting, building ETL pipelines, complex data transformations (using libraries like Pandas and NumPy), automation, and collaborating on feature engineering with data scientists.
ETL Processes and Tools: Expertise in designing, building, and maintaining robust data pipelines using tools like AWS Glue.
Cloud Platform Expertise (AWS): Proficiency with key AWS data services:
Amazon S3: For scalable, durable data storage (data lake).
Amazon Redshift: For managing and querying structured data warehouses.
AWS Glue: For serverless ETL, data cataloging, and schema discovery.
Amazon Athena: For serverless ad-hoc querying of data in S3 (a short sketch follows this list).
AWS Lambda: For event-driven automation of data workflows.
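To make the Athena item above concrete, a short boto3 sketch that runs an ad-hoc aggregation over data in S3; the database, table, and results location are placeholders, and the polling loop is simplified:

```python
import time

import boto3

athena = boto3.client("athena")

# Placeholder database/table registered in the Glue Data Catalog, and a
# placeholder S3 location where Athena writes query results.
QUERY = "SELECT order_date, SUM(revenue) AS revenue FROM orders GROUP BY order_date"

start = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "sales_curated"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = start["QueryExecutionId"]

# Poll until the query finishes (simplified; production code would add a timeout).
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```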
A day in the life
In this role, you will build enterprise knowledge bases and prepare the data infrastructure for advanced analytics and ML applications. You'll combine data architecture expertise with AWS services (S3, Redshift, Glue) to develop robust ETL/ELT pipelines and maintain data quality standards. You will focus on curating comprehensive data catalogs with clear lineage, definitions, and context, transforming raw data into reliable enterprise assets. Working closely with data scientists, you'll prepare analysis-ready datasets, manage feature stores, and ensure data infrastructure supports both operational reporting and machine learning initiatives. This role bridges the gap between raw data and actionable insights, enabling both traditional BI reporting and advanced ML/LLM model development.
About the team
Our mission is to simplify data-driven decision making for customers through effective analytics solutions. We deliver accurate, complete, and timely information while maintaining rigorous quality standards. Our solutions transform complex data into actionable insights that customers can use without technical expertise, enabling them to optimize business performance through direct data analysis.
The team builds robust processes and identifies improvement opportunities while partnering with stakeholders to enhance internal systems. We ensure proper data governance, visibility, and accessibility for all managed datasets. Our solutions focus on improving business performance (increasing sales, decreasing time, and/or reducing cost) for the NAS organization.