FEI Systems is dedicated to creating innovative technology solutions that enhance the delivery of health and human services. They are seeking a Data Engineer to support Machine Learning and AI initiatives, focusing on maintaining high-quality data within their cloud-based platform to support model training and deployment.

Responsibilities:

Design, build, and maintain scalable data pipelines supporting ML/AI workloads
Engineer pipeline patterns including full loads, incremental loads, change-based loads, and slowly changing dimensions
Ensure pipelines are reliable, performant, secure, and maintainable; troubleshoot and monitor pipelines within an AWS ecosystem
Perform data transformations in Snowflake using SQL and native Snowflake features
Design and optimize schemas, tables, views, and materialized views for ML/AI consumption
Support AWS-native data lake patterns using S3, Glue, Athena, Apache Iceberg, and S3 Tables
Perform data cleansing, normalization, and enrichment to support ML model development
Design and implement feature engineering pipelines including aggregation and transformation
Ensure consistency, reuse, and versioning of features across models and use cases
Support feature store patterns to enable feature discoverability and reuse
Collaborate with ML engineers and data scientists to operationalize features into training pipelines
Support model training workflows, including dataset preparation and scheduled refreshes
Ensure training datasets and features are reproducible, traceable, and auditable
Integrate data pipelines into CI/CD workflows; support version control, testing, and deployment of data assets
Monitor pipeline health, data freshness, and downstream impact on ML/AI systems

Requirements:

5+ years of hands-on data engineering experience in a cloud environment
Strong proficiency in Python for data processing and pipeline development
Advanced skills in SQL with hands-on Snowflake transformation experience
Experience with ELT pipeline design, schema optimization, performance tuning, and cost management in Snowflake
Experience with querying, data modeling, and analytics in PostgreSQL; familiarity with SQL Server to PostgreSQL migration is a plus
Familiarity with AWS services including S3, Glue, Athena, and managed relational databases (e.g., Aurora, RDS)
Familiarity with Apache Iceberg / S3 Tables and open table format ecosystems
Experience with streaming ingestion tools (e.g., Kinesis, Kafka, or equivalent)
Experience with workflow orchestration tools (e.g., Airflow, Step Functions, or equivalent)
Experience with full loads, incremental loads, append-only pipelines, change-based processing, and slowly changing dimensions (SCDs)
Experience with data validation, reconciliation, error handling, and restart/recovery patterns
Experience with data modeling for analytics, ML/AI, and downstream application use cases
Ability to evaluate pipeline design trade-offs across performance, cost, reliability, and maintainability
Structured SDLC experience with CI/CD pipelines for data and ML workflows
Experience with API-based and event-driven data integration patterns
Experience in distributed data processing environments
Understanding of data requirements for ML/AI workloads
Experience preparing training datasets and features from enterprise data lakes
Familiarity with reproducibility, dataset versioning, and data lineage concepts
Familiarity with GenAI concepts relevant to data engineering, such as embedding pipelines, vector databases, retrieval-augmented generation (RAG) data flows, or prompt-driven data processing
Awareness of data security and privacy considerations when working with LLMs
Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related technical field; equivalent professional experience will be considered

Data Engineer - ML/AI Data Platform (Remote)
