Praescient Analytics is seeking an experienced Data Engineer to design, build, and maintain scalable data pipelines supporting advanced fraud analytics and investigative solutions for a federal oversight organization. The role ensures that data from diverse sources is efficiently ingested, transformed, governed, and made available for analytics, machine learning, and investigative support.
Responsibilities:
- Design, develop, maintain, and optimize scalable ETL pipelines supporting advanced analytics and investigative workloads
- Ingest, transform, and integrate structured and unstructured data from diverse sources including flat files, JSON, XML, Excel, APIs, graph databases, relational databases, and other evolving data formats
- Develop and optimize data pipelines supporting both streaming and batch ingestion frameworks
- Manage, organize, and optimize data within modern cloud-based analytics platforms, including Databricks Unity Catalog, SQL Server managed instances, and Lakehouse architectures
- Develop efficient SQL and Python-based data transformation processes that support downstream analytics, machine learning, graph analytics, and business intelligence solutions
- Implement data quality validation, lineage tracking, metadata management, and monitoring processes to ensure data reliability and integrity throughout the analytics lifecycle
- Collaborate with Data Scientists, Graph Data Scientists, Investigative Analysts, Forensic Accountants, and Project Managers to understand data requirements and support analytic initiatives
- Troubleshoot pipeline failures, optimize performance, and continuously improve scalability, reliability, and maintainability of enterprise data solutions
- Support enterprise data governance by implementing data management standards, documenting data assets, and ensuring compliance with enterprise data management (EDM) policies
- Contribute to data architecture improvements, ingestion strategies, and modernization efforts that enhance overall analytic capabilities
Requirements:
- Must have experience with fraud analysis
- Three (3) or more years of professional experience in data engineering or a related technical field
- Demonstrated experience designing, building, maintaining, and optimizing scalable ETL pipelines across diverse data sources
- Strong programming skills in SQL and Python, or equivalent technologies, for data ingestion, transformation, and processing
- Experience ingesting and transforming data from flat files, JSON, XML, Excel, APIs, graph databases, relational databases, and other structured and unstructured data sources
- Experience loading, managing, and optimizing data within Databricks Unity Catalog, SQL Server managed instances, or comparable cloud-based data platforms
- Experience working with streaming and batch ingestion frameworks and modern Lakehouse architectures
- Demonstrated ability to implement data quality controls, lineage tracking, reliability monitoring, and performance optimization processes
- Familiarity with enterprise data governance, enterprise data management (EDM), metadata management, and data quality best practices
- Strong analytical, problem-solving, written, and verbal communication skills
Experience in the following areas:
- Supporting fraud detection, anomaly detection, financial oversight, program integrity, or investigative analytics environments
- Building cloud-native data engineering solutions utilizing Azure Databricks, Azure Data Lake Storage (ADLS), Microsoft SQL Server, Microsoft Fabric, Azure Synapse Analytics, Power BI, Neo4j, Git repositories, or comparable cloud data platforms and tools
- Developing scalable data pipelines supporting machine learning, artificial intelligence (AI), graph analytics, natural language processing (NLP), or advanced analytics solutions
- Working with public, non-public, commercial, financial, law enforcement, or cross-agency datasets supporting fraud detection and investigative missions
- Designing and implementing Lakehouse architectures, Delta Lake, data partitioning strategies, and performance optimization techniques for large-scale analytics environments
- Developing automated data quality validation, metadata management, lineage tracking, schema evolution, and monitoring capabilities
- Supporting enterprise data governance initiatives, data catalogs, master data management, and compliance with organizational data standards
- Utilizing data processing, orchestration, and workflow tools such as Apache Spark, Databricks Workflows, Azure Data Factory, Airflow, or comparable pipeline automation technologies
- Collaborating within Agile software development teams using Git-based version control, sprint planning, backlog management, and continuous integration/continuous deployment (CI/CD) practices
- Supporting Offices of Inspector General (OIGs), federal oversight organizations, law enforcement agencies, or other government data modernization initiatives