Design conceptual, logical, and physical data models for complex federal environments.
Lead the transition from legacy on-premises systems to modern, cloud-native (AWS/GCP) data platforms.
Architect and oversee the build of automated ETL/ELT pipelines using Python, SQL, and PySpark to ingest and transform structured and unstructured data (a minimal PySpark sketch follows this list).
Implement and optimize enterprise data warehouses using tools such as Amazon Redshift, Google BigQuery, AWS Glue, and Databricks (see the Redshift load sketch below).
Establish data governance frameworks, metadata management, and data lineage tracking in alignment with federal standards and regulations (HIPAA, FHIR, NIST).
Drive index and partition design, query tuning, and sharding strategies to ensure high availability and scalability for real-time analytics (see the partition-pruning sketch below).
Design data architectures that support AI/ML initiatives, including model training pipelines and real-time inference in production environments (see the train-and-serve sketch below).
Mentor a team of data engineers, enforce software engineering best practices (CI/CD, unit testing, documentation; a pytest sketch follows this list), and serve as a technical bridge between stakeholders and delivery teams.
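To ground the pipeline item above, a minimal PySpark sketch of the ingest-and-transform step: read raw JSON from object storage, deduplicate, derive a partition column, and write curated Parquet. The bucket paths and the event_id, event_ts, and agency_id columns are illustrative, and the cluster is assumed to already have S3 access configured.

```python
# Minimal ELT sketch: raw JSON in, curated partitioned Parquet out.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-events").getOrCreate()

# Schema inference keeps the sketch short; production jobs should pin a schema.
raw = spark.read.json("s3://example-raw/events/")

cleaned = (
    raw.dropDuplicates(["event_id"])                   # idempotent re-runs
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("agency_id").isNotNull())         # basic quality gate
)

(cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")                     # partition for pruning
        .parquet("s3://example-curated/events/"))
```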
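For the warehousing item, a hedged sketch of one common Amazon Redshift load pattern: bulk-copying curated Parquet from S3 with a COPY statement issued through psycopg2. The cluster endpoint, credentials, IAM role ARN, and table and path names are all placeholders.

```python
import psycopg2

# Placeholder connection details; real credentials belong in a secrets store.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="loader", password="...",
)

copy_sql = """
    COPY analytics.events
    FROM 's3://example-curated/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
    FORMAT AS PARQUET;
"""

# The connection context manager commits on success, rolls back on error.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)  # Redshift fetches the S3 files in parallel
```

COPY keeps the heavy lifting inside the cluster, which generally beats row-by-row inserts from the client.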
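For the partitioning and query-tuning item, a sketch assuming Google BigQuery: create a date-partitioned, clustered table, then use a dry run to check that a date predicate actually prunes partitions before any bytes are billed. The dataset and table names are illustrative, and the client is assumed to be authenticated against a default project.

```python
from google.cloud import bigquery

client = bigquery.Client()

client.query("""
    CREATE TABLE IF NOT EXISTS demo_ds.events (
        event_id STRING, agency_id STRING, event_ts TIMESTAMP
    )
    PARTITION BY DATE(event_ts)   -- limits scans to matching days
    CLUSTER BY agency_id          -- co-locates rows for common filters
""").result()

# Dry run: plan the query and report bytes scanned without executing it.
dry = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query("""
    SELECT agency_id, COUNT(*) AS n
    FROM demo_ds.events
    WHERE DATE(event_ts) = '2024-01-01'  -- partition filter
    GROUP BY agency_id
""", job_config=dry)
print(f"bytes scanned: {job.total_bytes_processed}")
```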
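For the AI/ML item, a minimal sketch of the train-then-serve split using scikit-learn and joblib: fit offline, persist the artifact, and reload it on the inference path. The random stand-in data, feature count, and model choice are placeholders rather than a recommended design; in production the two halves run as separate jobs.

```python
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- offline training job ---
X = np.random.rand(500, 4)                 # stand-in features
y = np.random.randint(0, 2, 500)           # stand-in labels
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
joblib.dump(model, "model.joblib")         # a versioned artifact store in practice

# --- real-time inference path (a separate service in production) ---
serving_model = joblib.load("model.joblib")
print(serving_model.predict_proba(np.random.rand(1, 4)))
```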
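For the unit-testing practice named in the mentoring item, a small pytest sketch: a pure transform function plus two tests. The function name and validation rule are hypothetical.

```python
import pytest

def normalize_agency_id(raw: str) -> str:
    """Strip and upper-case an agency identifier; reject empty values."""
    cleaned = raw.strip().upper()
    if not cleaned:
        raise ValueError("agency_id must be non-empty")
    return cleaned

def test_strips_and_uppercases():
    assert normalize_agency_id("  hhs-001 ") == "HHS-001"

def test_rejects_empty():
    with pytest.raises(ValueError):
        normalize_agency_id("   ")
```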
Requirements
Experience: 10+ years of progressive experience in data architecture, data modeling, or senior data engineering.
Technical Stack:
* Expertise in AWS (S3, Redshift, Glue, Athena) or GCP (BigQuery, Cloud Functions).
* Strong proficiency in Python (modular, testable code) and advanced SQL.
* Experience with big data frameworks such as Spark, Dask, or Ray.
Education: Bachelor’s degree in Computer Science, Information Technology, or a related field (Master’s preferred).
Compliance: Deep understanding of data security and federal compliance requirements.