Quantiphi is an award-winning Applied AI and Big Data software and services company, driven by a deep desire to solve transformational problems at the heart of businesses. As a Senior Data Engineer, you will architect and build scalable ETL pipelines while collaborating with customers to deliver impactful AI solutions.
Responsibilities:
- Design and implement resilient ETL/ELT workflows using tools such as Airflow, Dagster, or Prefect (a minimal sketch of this kind of workflow follows this list)
- Build scalable batch and real-time data pipelines
- Ensure end-to-end data integrity from ingestion through AI consumption
- Develop low-latency “Hot Path” pipelines to enable real-time agent decisioning
- Implement streaming architectures to support event-driven workflows
- Optimize pipelines for high throughput, performance, and cost efficiency
- Design systems supporting both batch and real-time analytics use cases
- Implement automated testing and validation frameworks for data pipelines
- Proactively monitor and detect data drift before it impacts AI model performance
- Establish observability standards across the data platform
- Maintain high standards for reliability, scalability, and production readiness
- Manage the CI/CD lifecycle for data pipelines
- Promote code seamlessly across development, staging, and production environments
- Implement Infrastructure-as-Code best practices where applicable
- Design scalable data models for AI and analytics use cases
- Implement and maintain Knowledge Graph structures to model complex relationships
- Optimize indexing and schema design to support RAG-based AI systems
- Engage directly with customers to translate business challenges into technical solutions
- Lead whiteboarding sessions to design data schemas collaboratively
- Iterate data models in tight feedback loops with stakeholders
- Create clear, maintainable technical documentation
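
To make the workflow responsibilities above concrete, here is a minimal sketch of a daily ETL pipeline, assuming Airflow 2.x's TaskFlow API; the DAG name, the stubbed records, and the load step are hypothetical placeholders, and Dagster or Prefect would express the same extract/transform/load shape.

```python
# A minimal sketch of an ETL workflow of the kind described above, using
# Airflow's TaskFlow API (Airflow 2.4+). The DAG name, records, and load
# step are hypothetical placeholders, not a prescribed implementation.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_orders_etl():
    @task
    def extract() -> list[dict]:
        # Stubbed source read; in practice, an API call or object-store pull.
        return [{"order_id": 1, "amount": 42.0}, {"order_id": 2, "amount": 7.5}]

    @task
    def transform(records: list[dict]) -> list[dict]:
        # Basic validation: flag non-positive amounts rather than dropping them.
        return [{**r, "valid": r["amount"] > 0} for r in records]

    @task
    def load(records: list[dict]) -> None:
        # Placeholder warehouse write (e.g. a Snowflake COPY INTO or MERGE).
        print(f"loaded {len(records)} records")

    load(transform(extract()))


daily_orders_etl()
```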
Requirements:
- 7+ years of experience in Data Engineering or Data Platform development
- Expert-level proficiency in Python (Pandas, PySpark, FastAPI)
- Advanced SQL skills including complex joins, window functions, and query optimization (window functions are illustrated after this list)
- Strong experience with Snowflake (Snowpark, Streams) or Kinetica
- Hands-on experience building scalable ETL/ELT pipelines
- Experience implementing streaming or real-time data processing architectures
- Experience with NoSQL databases such as MongoDB or DynamoDB
- Experience designing or working with Knowledge Graph technologies (Neo4j, AWS Neptune, etc.)
- Familiarity with CI/CD pipelines and automated deployment practices
- Strong understanding of data quality, observability, and production reliability standards
- Experience working directly with enterprise customers or stakeholders
- Excellent communication and documentation skills
- Experience with agent orchestration frameworks such as LangChain, LlamaIndex, or CrewAI
- Familiarity with vector similarity search and managing embeddings in Pinecone, Milvus, or Snowflake native vector types (see the similarity-search sketch after this list)
- Understanding of Retrieval-Augmented Generation (RAG) architectures and optimization strategies
- Experience building robust GraphQL or REST APIs that agents can use as operational tools
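
For illustration, a self-contained example of the window-function skill called out above, run through Python's built-in sqlite3 so it executes anywhere with SQLite 3.25+ (which modern Python builds bundle); the orders table and its values are toy data.

```python
# A small runnable illustration of a SQL window function, using Python's
# built-in sqlite3 module. The table and values are toy data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('a', 10), ('a', 25), ('b', 5), ('b', 40);
""")

# Rank each customer's orders by amount with RANK() OVER a partition.
rows = conn.execute("""
    SELECT customer,
           amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM orders
""").fetchall()
print(rows)
```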
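And a minimal, library-free sketch of the vector similarity search that underpins RAG retrieval: rank documents by cosine similarity to a query embedding. The document names and embedding values here are toy stand-ins; in production the vectors would come from an embedding model and live in a store such as Pinecone, Milvus, or a Snowflake vector column.

```python
# Rank documents by cosine similarity to a query embedding, as a RAG
# retriever would. Embeddings are toy 3-dimensional stand-ins.
import numpy as np

docs = {
    "refund-policy": np.array([0.1, 0.9, 0.2]),
    "shipping-faq": np.array([0.8, 0.1, 0.3]),
    "returns-howto": np.array([0.2, 0.8, 0.1]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product normalized by vector magnitudes.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.15, 0.85, 0.15])  # hypothetical query embedding
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranked)  # most similar documents first
```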