CACI seeks a data engineer who will be responsible for designing, building, validating, and maintaining scalable data pipelines and analytics solutions with a strong emphasis on Databricks and data quality. This role partners closely with product owners, analysts, data scientists, and software engineers to translate business and technical requirements into reliable, testable, and high-quality data solutions that support programmatic goals.
Responsibilities:
- Design, develop, optimize and maintain scalable data pipelines and transformations using Databricks, Apache Spark and SQL
- Implement data ingestion, transformation, and orchestration workflows to support batch and, where applicable, real-time processing
- Perform data quality assurance activities, including identifying and resolving inconsistencies in data flow, out-of-range values, and illogical data responses; develop data quality reports and investigate and resolve data anomalies or errors using SAS, Excel, and other software as warranted
- Use technical expertise, initiative, creativity, critical thinking, and strong communication and interpersonal skills daily to solve data quality problems in support of technical development efforts
- Implement data quality controls to ensure accuracy, completeness, and reliability of datasets
- Document data pipelines, transforms, business rules, and data dependencies using appropriate technical documentation methods (e.g., data flow diagrams and data dictionaries)
- Serve as a liaison to, and coordinate with, a multidisciplinary team
- Collaborate with the program team to identify opportunities for process improvement, make strategic adjustments, and pursue opportunities that maximize programmatic impact
- Communicate data issues, risks, and remediation approaches clearly to technical and non-technical team members
Requirements:
- Must be able to obtain a Public Trust clearance
- Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related technical field (or equivalent experience)
- Demonstrated experience as a Data Engineer in a production environment
- Strong hands-on experience with Databricks, including Spark-based data processing
- Proficiency in SQL and at least one programming language such as Python
- Excellent communication skills, including listening and writing, and experience interacting comfortably with scientists, epidemiologists, informaticians, and developers
- Experience supporting analytics, reporting or machine learning workloads
- Experience supporting public health, healthcare, or government data systems
- Knowledge of data governance, data quality frameworks, or metadata management
- Experience working with large-scale analytics or reporting environments
- Familiarity with Power BI or other business intelligence tools
- Prior experience supporting multiple teams or programs simultaneously (the 'steady hand' type)
- Exposure to Agile or iterative delivery environments
- Candidates located in the Atlanta, Georgia area are preferred