TechTorch is a company that builds data infrastructure, platforms, and pipelines to help organizations turn raw data into measurable business value. The AI-Enabled Data Engineer will design and maintain scalable data pipelines, orchestrate workflows, and implement AI-enabled data engineering practices. This role combines deep data engineering skills with modern AI capabilities to optimize data workflows and enhance data quality.
Responsibilities:
- Design, build, and maintain scalable data pipelines and ETL/ELT workflows across cloud and on-prem environments
- Work with Snowflake, Databricks, and Delta Lake as primary data platforms — handling ingestion, transformation, storage optimization, and access patterns
- Model data with dbt: write modular SQL transformations, manage dependencies, enforce data contracts, and maintain documentation
- Build and maintain semantic layers that serve consistent, governed metrics to downstream consumers
- Design data warehouse schemas and data lake structures that balance performance, cost, and queryability
- Implement data quality frameworks — testing, validation, alerting, and lineage — as first-class citizens in every pipeline
- Orchestrate workflows across Airflow, Dagster/Prefect, Azure Data Factory, and Databricks Workflows — choosing the right tool for each job
- Apply DataOps practices: CI/CD for data pipelines, environment promotion, infrastructure as code, and observability
- Own the reliability of data products end-to-end — monitoring, alerting, incident response, and root cause analysis
- Work across AWS and Azure cloud services (S3, Glue, ADLS, ADF, Synapse, Redshift) to design cost-effective, scalable architectures
- Build data pipelines that feed AI systems — including RAG ingestion workflows, vector store loading, document chunking, and embedding pipelines
- Use LLMs as active components in ETL logic: classification, entity extraction, enrichment, and in-flight data quality remediation
- Expose data infrastructure as consumable tools for AI agents via MCP or similar agent-integration patterns
- Use AI-paired programming (Claude Code or equivalent) as a daily productivity layer — not just autocomplete, but genuine workflow acceleration
- Stay current on how AI tooling changes the data engineering workflow and bring those patterns back to the team
Requirements:
- Core Data Engineering: ETL/ELT Design · Data Modeling · Data Quality & Testing · Data Lineage · Batch & Incremental Loads
- Data Platforms: Snowflake · Databricks · Apache Spark / PySpark · Delta Lake · Data Warehouses · Data Lakes
- Transformation & Modeling: dbt Core / dbt Cloud · SQL (advanced) · Semantic Layer · Dimensional Modeling
- Orchestration: Apache Airflow · Dagster / Prefect · Azure Data Factory · Databricks Workflows
- AI-Enabled Engineering: RAG & Vector Store Pipelines · AI-Augmented ETL · MCP / Agent Data Tools · AI-Paired Programming · LLM Integration in Pipelines
- Cloud & DevOps: AWS (S3, Glue, Redshift) · Azure (ADLS, ADF, Synapse) · CI/CD for Data · Infrastructure as Code · Python
- Experience with streaming architectures: Kafka, Spark Streaming, or Flink
- Exposure to feature stores (Feast, Tecton) or ML platform data pipelines
- Hands-on with vector databases: Pinecone, Weaviate, Qdrant, or pgvector
- Familiarity with data mesh or data product ownership models
- Experience with Snowpark or Databricks AI/BI tooling
- Building or contributing to internal data tooling, frameworks, or accelerators