Build & Operate Data Pipelines (Batch + Streaming)
Design and implement batch and streaming ingestion from APIs, relational databases, file drops, event streams, and external partners.
Build and optimize ETL/ELT pipelines to produce curated, analytics-ready datasets for reporting and ML consumption.
Implement incremental processing patterns, change data capture (CDC) approaches where appropriate, and data contract standards.
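By way of illustration, the sketch below shows one common incremental-load pattern for the relational-source ingestion described above: a high-watermark column drives each pull so only changed rows are read and landed in object storage. All names (connection string, table, columns, S3 prefix) are hypothetical, and it assumes pandas, SQLAlchemy, and s3fs are available.

```python
"""Minimal high-watermark incremental load sketch (hypothetical names throughout)."""
from datetime import datetime, timezone

import pandas as pd
from sqlalchemy import create_engine, text

SOURCE_URL = "postgresql+psycopg2://user:pass@source-db/app"  # hypothetical source database
LANDING_PREFIX = "s3://example-lake/raw/orders"               # hypothetical raw-zone prefix


def load_new_rows(last_watermark: datetime) -> datetime:
    """Pull rows changed since the last watermark and land them as Parquet."""
    engine = create_engine(SOURCE_URL)
    query = text("SELECT * FROM orders WHERE updated_at > :wm")  # hypothetical table/column
    df = pd.read_sql(query, engine, params={"wm": last_watermark})

    if df.empty:
        return last_watermark  # nothing new; keep the old watermark

    run_ts = datetime.now(timezone.utc)
    # Writing directly to S3 assumes s3fs is installed and AWS credentials are configured.
    df.to_parquet(f"{LANDING_PREFIX}/load_ts={run_ts:%Y-%m-%dT%H%M%S}/part.parquet", index=False)

    # The new watermark is the max change timestamp actually observed, not "now",
    # so late-arriving rows are not skipped on the next run.
    return df["updated_at"].max().to_pydatetime()
```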
Deliver a Modern Lakehouse (Data Lake / Delta Lake)
Build and manage a scalable lakehouse on AWS object storage (e.g., S3) using open table/file formats and delta/lakehouse concepts (e.g., ACID tables, schema evolution, time travel patterns).
Optimize performance and cost through partitioning, compaction, lifecycle policies, and efficient compute/storage usage (a brief sketch follows this list).
Establish environment standards for dev/test/prod and consistent promotion across stages.
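As a rough illustration of the partitioning and compaction practices above, the sketch below appends to a partitioned Delta table on S3 and then compacts small files and removes unreferenced ones. It assumes the open-source deltalake (delta-rs) package and pandas; the table path and column names are hypothetical.

```python
"""Sketch: append to a partitioned Delta table on S3, then run routine maintenance."""
import pandas as pd
from deltalake import DeltaTable, write_deltalake

TABLE_URI = "s3://example-lake/curated/orders"  # hypothetical curated-zone path


def append_batch(df: pd.DataFrame) -> None:
    # Partitioning on a low-cardinality date column keeps scans pruned and cheap.
    write_deltalake(TABLE_URI, df, mode="append", partition_by=["order_date"])


def maintain_table() -> None:
    table = DeltaTable(TABLE_URI)
    table.optimize.compact()                            # rewrite many small files into fewer large ones
    table.vacuum(retention_hours=168, dry_run=False)    # drop unreferenced files older than 7 days
```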
Metadata, Lineage, Governance & Data Quality
Implement a managed metadata repository for dataset cataloging, ownership, glossary/definitions, tagging, and discoverability.
Enable end-to-end lineage (source → transformations → consumption) to support auditability and impact analysis.
Implement governance controls including policy-based access, data classification, retention, and secure data handling.
Build operational data quality checks (freshness, completeness, validity, anomaly detection) and publish SLAs/SLOs.
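For example, freshness, completeness, and validity checks of the kind described above can start as simply as the sketch below; the column names, thresholds, and SLA window are hypothetical, and in practice the results would feed the published SLAs/SLOs and alerting rather than a return value.

```python
"""Sketch of operational data-quality checks (freshness, completeness, validity)."""
from datetime import timedelta

import pandas as pd


def run_quality_checks(df: pd.DataFrame) -> dict:
    """Evaluate a batch against example quality rules; assumes `updated_at` is tz-aware UTC."""
    now = pd.Timestamp.now(tz="UTC")
    results = {
        # Freshness: newest record must be less than 2 hours old (example SLA).
        "freshness_ok": (now - df["updated_at"].max()) < timedelta(hours=2),
        # Completeness: key business columns must be non-null.
        "completeness_ok": df[["order_id", "customer_id"]].notna().all().all(),
        # Validity: amounts must be non-negative.
        "validity_ok": bool((df["amount"] >= 0).all()),
    }
    results["all_passed"] = all(results.values())
    return results
```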
AWS Automation + CI/CD for Data Pipelines
Implement automated cloud provisioning in AWS using Infrastructure as Code (IaC) for consistent environments and secure-by-default baselines (see the sketch after this list).
Build and enhance CI/CD for data pipelines, including automated tests, validation gates, promotion workflows, and rollback strategies.
Improve observability with metrics/logs/alerts, dashboards, runbooks, and incident response readiness.
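To give a flavor of the IaC work referenced above, the sketch below defines a secure-by-default data-lake bucket with AWS CDK in Python; the stack name, bucket settings, and lifecycle threshold are illustrative assumptions, not a prescribed baseline.

```python
"""Sketch: secure-by-default data-lake bucket via AWS CDK (Python, CDK v2)."""
from aws_cdk import App, Duration, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct


class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        s3.Bucket(
            self,
            "RawZoneBucket",                                      # hypothetical bucket construct
            versioned=True,                                       # protect against accidental overwrite
            encryption=s3.BucketEncryption.S3_MANAGED,            # encrypt at rest by default
            enforce_ssl=True,                                     # deny non-TLS access
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,   # no public objects
            lifecycle_rules=[
                s3.LifecycleRule(
                    transitions=[
                        s3.Transition(
                            storage_class=s3.StorageClass.GLACIER,
                            transition_after=Duration.days(365),  # tier off cold data
                        )
                    ]
                )
            ],
        )


app = App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```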
Cross-Team Collaboration & Documentation
Work closely with engineering, security, networking, and application teams to support mission needs and delivery timelines.
Maintain high-quality engineering documentation including SOPs, system diagrams, and secure configuration baselines.
Summarize and present findings and recommendations, in both written and verbal form, to technical and non-technical stakeholders.
Requirements
Must be able to OBTAIN and MAINTAIN a Federal or DoD "PUBLIC TRUST"; approved adjudication of the PUBLIC TRUST is required prior to onboarding with Guidehouse. Candidates with an ACTIVE PUBLIC TRUST or SUITABILITY are preferred.
Bachelor’s degree in Engineering, IT, Computer Science, or a related field (or equivalent experience).
Minimum of FOUR (4) years of experience building production data pipelines and/or data platforms.
Strong experience implementing data ingestion and ETL/ELT workflows, including data modeling and transformation best practices.
Hands-on experience building a data lake / Delta Lake (lakehouse) on AWS (or equivalent cloud) using object storage and modern table formats/patterns.
Proficiency in SQL and one programming language commonly used for data engineering (Python preferred; Scala/Java acceptable).
Experience with metadata management and governance: cataloging, lineage, ownership, access controls, classification, and policy enforcement.
Experience implementing automated AWS provisioning using IaC and operating across multiple environments.
Experience building or operating CI/CD pipelines for data workflows (testing, packaging, deployment automation, environment promotion).