Sedgwick is a company dedicated to meaningful work, recognized as one of America's Greatest Workplaces. The company is seeking a Senior Data Engineer to architect the data supply chains that support its Data Science models and AI applications, ensuring high-fidelity data is available across these initiatives.
Responsibilities:
- Hybrid Data Pipeline Execution: Design and implement robust ETL/ELT pipelines to ingest data from legacy on-prem sources, AWS (S3/RDS), and Azure (Blob/SQL), centralizing it for consumption in Snowflake and AI services
- Engineering for Data Science: Build and maintain Feature Stores and specialized datasets optimized for machine learning, ensuring Data Scientists have immediate access to clean, versioned, and statistically valid data
- Engineering for AI (RAG & LLMs): Develop the data pipelines required for Generative AI, including the automated extraction, chunking, and loading of unstructured data into vector stores across AWS and Azure (a minimal sketch of this pattern follows the list)
- Snowflake Power-User Execution: Act as the technical lead for our Snowflake data warehouse, implementing sophisticated data modeling, Snowpipe automation, and compute optimization to support high-concurrency AI workloads
- Legacy "Back-Reach" Engineering: Execute non-invasive data extraction patterns to unlock mission-critical data from decades-old on-premise systems without disrupting core business operations
- Multi-Cloud Orchestration: Manage complex, cross-platform data workflows using Airflow, Step Functions, or Azure Data Factory, ensuring the synchronization of data across our multi-cloud AI posture
- IT & Security Diplomacy: Partner directly with central IT, Database Administrators, and Security teams to clear connectivity hurdles (PrivateLink, IAM, firewalls) and secure the "license to operate" for new data flows
- Data Quality for Model Integrity: Implement automated validation and observability layers to detect data drift and quality issues that could compromise the accuracy of production AI and Data Science models
- Cost & Performance Management: Drive the efficiency of our data stack by optimizing storage and query performance in Snowflake, AWS, and Azure to maximize the ROI of the Transformation Office
- Direct Stakeholder Collaboration: Work as a dedicated engineering partner to MLOps and Data Science teams to rapidly iterate on data requirements for evolving AI use cases
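The Generative AI responsibility above boils down to a chunk-and-embed pipeline. The sketch below is illustrative only, not Sedgwick's implementation: it assumes a hypothetical S3 source bucket, a placeholder embed_text() call, and a generic upsert_vectors() stand-in for whichever AWS or Azure vector store the role actually targets.

```python
import boto3  # AWS SDK; the bucket/key names passed in are purely illustrative


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split raw text into overlapping windows so each chunk keeps local context."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks


def embed_text(chunk: str) -> list[float]:
    """Placeholder: in practice this would call an AWS/Azure embedding service."""
    raise NotImplementedError("Wire this to your embedding endpoint of choice")


def upsert_vectors(records: list[dict]) -> None:
    """Placeholder: in practice this would write to the target vector store."""
    raise NotImplementedError("Wire this to your vector store of choice")


def ingest_document(bucket: str, key: str) -> None:
    """Extract one unstructured document from S3, chunk it, embed it, and load it."""
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    records = [
        {
            "id": f"{key}#{i}",
            "vector": embed_text(chunk),
            "metadata": {"source": key, "text": chunk},
        }
        for i, chunk in enumerate(chunk_text(body))
    ]
    upsert_vectors(records)
```

The overlap keeps context from being lost at chunk boundaries; in production, a job like this would typically run under the Airflow, Step Functions, or Azure Data Factory orchestration the role also calls out.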
Requirements:
- Bachelor's degree in Computer Science, Data Engineering, or a related field is required
- 6+ years of hands-on data engineering experience, with a track record of building production-grade pipelines for Data Science and AI in multi-cloud environments
- Expert-level proficiency in Snowflake architecture, including data sharing, performance tuning, and the integration of Snowflake with external cloud AI services
- Advanced, hands-on knowledge of AWS (S3, Glue, Lambda) and Azure (Data Factory, Synapse) data services
- Mastery of Python, SQL, and PySpark
- Deep experience with data orchestration and containerization (Docker)
- Proven ability to interface with 'old world' tech (on-premise SQL, Mainframe extracts, flat files) and transform it for modern cloud consumption
- A strong understanding of the specific data needs for Machine Learning (feature engineering) and Generative AI (vectorization and embedding pipelines)
- A 'get-it-done' attitude, capable of navigating enterprise bureaucracy and technical debt to ship code at the speed required by a Transformation Office
- A Master's degree is highly desirable