Lifescale Analytics helps organizations unlock the power of data through advanced analytics, AI, and modern digital solutions. They are currently seeking an LLM/AI Data Engineer to support client engagement by designing and validating high-quality, production-grade data pipelines with integrated LLM capabilities.
Responsibilities:
- Design, build, and operate LLM-assisted analytics pipelines in structured data environments
- Implement retrieval-augmented generation (RAG) and structured data grounding patterns
- Validate and improve LLM output quality, consistency, and traceability
- Develop and maintain production-grade ETL/ELT pipelines
- Review and test pipelines to identify logic errors, data gaps, and performance issues
- Define and track pipeline SLAs (latency, throughput, data freshness)
- Build and enforce data quality frameworks and validation processes
- Document engineering processes including QC logs, test cases, and schema documentation
- Collaborate with cross-functional teams to ensure scalable and auditable data systems
- All other duties as assigned
Requirements:
- Applicants must be U.S. citizens, may be subject to a government security investigation, and must currently meet the eligibility requirements for access to classified government information
- The candidate must have lived in the United States for the past 5 years
- The Employer will not sponsor applicants for any employment visas, at hiring or in the future, including but not limited to H-1B visas
- Corp-to-Corp or subcontract personnel will not be considered for this position
- Experience designing, building, or operating LLM-assisted analytics pipelines
- Experience validating and improving LLM output quality and reliability
- Strong understanding of: Prompt engineering for structured outputs, Retrieval-Augmented Generation (RAG) patterns, Structured-data grounding & hallucination mitigation
- At least 4 years of experience in data engineering, ETL/ELT pipeline development, and data quality assurance in production environments, with proven experience working with high-volume structured data systems
- Advanced proficiency in SQL and Python
- Experience with tools such as dbt, Spark, or similar frameworks
- Hands-on experience with Snowflake, including: Snowpark or equivalent transformation frameworks, Data modeling and performance optimization, Snowflake Cortex
- Ability to design and implement data quality frameworks
- Experience reviewing and validating production pipelines: Logic validation and transformation accuracy, Data completeness and integrity checks, Identification of edge cases and failure modes
- Ability to benchmark and optimize pipelines against performance targets
- Experience defining and measuring pipeline SLAs: Latency, Throughput, Data freshness
- Experience supporting auditable and explainable data systems
- Strong documentation practices, including: QC logs and validation reports, Test case design and execution records, Schema and lineage documentation, Issue tracking and remediation workflows
- Bachelor's degree in Computer Science, Data Engineering, or related field (or equivalent experience)
- Experience supporting U.S. Department of Defense (DoD) environments: Air Force Life Cycle Management Center (LCMC), Army Materiel Command (AMC)
- Familiarity with Palantir Foundry: Ontology modeling concepts, Data product consumption patterns
- Experience with defense datasets: Government-Industry Data Exchange Program (GIDEP), Federal Logistics Information System (FED-LOG)
- Exposure to: Entity resolution and part matching, ERP data integration into analytics platforms, Data normalization across fragmented systems