Integres, LLC is a Service-Disabled Veteran-Owned Small Business focused on providing high-quality IT solutions. They are seeking a Senior Data Engineer to design and maintain data architectures, implement ELT/ETL pipelines, and ensure the architecture supports machine learning algorithms.
Responsibilities:
- Assist TSD with data products by providing highly skilled and authoritative expertise on data engineering methods and best practices, including code-first development approaches and modern pipeline design patterns
- Design, implement, and maintain an efficient, secure, stable, and flexible data architecture that supports products and end-users, with all assets managed via source control
- Design, implement, and maintain ELT/ETL pipelines for efficient processing of source data in Azure Synapse and Azure Machine Learning (using SDK v1 and SDK v2)
- Review, maintain, and improve existing architecture and pipelines, including periodic audits to address bottlenecks, deprecated dependencies, and architecture drift
- Establish quality controls for maintaining all pipelines, and introduce error handling, logging mechanisms, and validation checks
- Incorporate source control for all pipelines and data analytics codebases to enable iterative code development while ensuring data architecture stability
- Optimize the ingestion, processing, and storage of a wide variety of datasets and data types, including modern columnar formats such as Parquet
- Develop self-service capabilities for SBA OIG analysts to query and export data for investigations and audits
- Coordinate with data scientists to ensure the architecture efficiently supports machine learning algorithms and data pipelines in Azure Machine Learning
- Develop robust standard operating protocols (SOPs) dictating the authoring, development, validation, publishing, execution, and monitoring of all data pipelines and assets in the Azure environment
- Provide detailed documentation of the data architecture, including data dictionaries, ER diagrams, and pipeline process maps
- Maintain and expand the environment with additional datasets and services upon request, following a defined intake and testing process prior to production deployment
- Stay current with emerging AI tools relevant to data engineering and contribute to exploratory efforts evaluating automation and LLM-assisted capabilities
Requirements:
- Five (5) years of hands-on experience in maintaining SQL databases and conducting advanced operations in SQL and T-SQL
- Five (5) years of hands-on experience in designing, implementing, and maintaining ELT/ETL processes in cloud-based data analytics environments
- Three (3) years of hands-on experience working in Azure Synapse and Azure Machine Learning as part of a modern data stack
- Three (3) years of hands-on experience manipulating data in Python (Pandas required; PySpark or Polars preferred); experience developing reusable, modular code preferred
- Experience implementing pipelines and infrastructure using code-first approaches (Python SDK, CLI, REST APIs, or IaC tooling)
- Experience implementing source control and CI/CD workflows
- Demonstrated familiarity with AI coding assistants and LLM integration patterns