Tebra is a company focused on modernizing healthcare for independent practices. They are seeking a Senior Data Engineer who will architect and operate data infrastructure to support AI/ML initiatives, transforming healthcare data into high-quality training sets and real-time features.

Responsibilities:

Architect and write software that solves complex business problems, specifically designing scalable pipelines for feature extraction, training data generation, and model monitoring logs
Own and serve as a Subject Matter Expert (SME) for large software systems, such as the organization's Feature Store or Data Lakehouse, ensuring data availability for both experimentation and production inference
Continuously monitor data pipelines in production, detect data drift or quality anomalies, and implement automated recovery systems to ensure the reliability and freshness of features and training data over time
Lead Engineering Design Reviews, providing well-articulated and reasoned explanations for architecture decisions (e.g., choosing between batch processing for training vs. real-time streaming for inference)
Write software frameworks that can be extended by others on the team, such as automated data quality checks and schema validation tools that prevent training-serving skew
Translate business requirements into software solutions, bridging the gap between raw data sources and the structured inputs needed for advanced ML models
Know when and how to optimize complex code, specifically tuning Spark jobs or SQL queries to handle massive datasets required for Large Language Model (LLM) fine-tuning or deep learning
Collaborate cross-functionally, including with ML engineers, to implement MLOps best practices such as data versioning, lineage tracking, and reproducibility
Scope tasks and break down complex data infrastructure initiatives into manageable deliverables for the squad

Requirements:

5+ years of professional software development experience
Deep technical subject matter expertise in 3+ general areas of software development (e.g., Big Data Processing, Distributed Systems, Data Modeling)
3+ years of hands-on experience in Data Engineering with a focus on supporting analytics or data science teams
Advanced proficiency in Python and SQL. You are comfortable writing production-grade code for data transformation and orchestration (not just scripts)
Proven ability to architect and write software that enables ML at scale—moving beyond simple ETL to building robust data platforms
Strong background in modern data infrastructure relevant to AI (e.g., Spark, Airflow, Kafka, Vector Databases)
Experience with Data Lake/Lakehouse architectures (e.g., Databricks, Snowflake, Delta Lake) and understanding how to structure data for efficient model training
Familiarity with MLOps concepts: You understand the difference between a training set and a test set, and you know what 'data leakage' is and how to prevent it in the pipeline
Proven ability to deploy and maintain data systems in production with CI/CD, monitoring, and alerting
Excellent technical communication and a product mindset—comfortable driving initiatives from concept to delivery
Background in healthcare software operations or working with structured business data
Experience implementing or managing a Feature Store (e.g., Feast, Tecton)
Familiarity with data version control tools (e.g., DVC, lakeFS)
Published research or conference papers in data engineering, distributed systems, or machine learning
Experience with retrieval-augmented generation (RAG) pipelines or vector search infrastructure
Contributions to open-source data or ML infrastructure projects
