DATAMAXIS, Inc. is seeking a Senior Azure Data Engineer to help design, build, and operate its next-generation enterprise data platform on Microsoft Azure. The role covers end-to-end delivery of data pipelines and data products, working with stakeholders across the business to create reliable and efficient data solutions.
Responsibilities:
- Design and build robust, reusable, parameter-driven ingestion and transformation pipelines using Azure Data Factory, Synapse Pipelines, Azure Databricks, and/or Microsoft Fabric Data Factory
- Implement medallion architecture (Bronze / Silver / Gold) on Azure Data Lake Storage Gen2 using Delta Lake, Parquet, and structured streaming patterns
- Build performant ELT workflows that leverage pushdown to source systems (Synapse Dedicated SQL Pool, Azure SQL, Teradata) where appropriate
- Develop and optimize PySpark notebooks and jobs on Azure Databricks or Synapse Spark
- Design dimensional models (Kimball star/snowflake) and data vault patterns for analytics consumption
- Implement Slowly Changing Dimensions (Type 1/2/3), Change Data Capture, and late-arriving data patterns
- Tune distributed SQL workloads in Synapse Dedicated SQL Pool / Fabric Warehouse, including distribution keys, partitioning, and clustered columnstore indexes
- Implement CI/CD for data pipelines using Azure DevOps (YAML pipelines, ARM/Bicep/Terraform) across Dev / SIT / UAT / Prod environments
- Instrument pipelines with robust logging, auditing, and monitoring using Azure Monitor, Log Analytics, and KQL
- Define and enforce coding standards, code review practices, branching strategies, and release management
- Lead or contribute to legacy-to-cloud migrations — e.g., Informatica PowerCenter to Azure Data Factory, on-premises Teradata / Oracle / SQL Server to Synapse or Fabric
- Perform workload assessment, capacity planning, and cost modeling for target-state architectures
- Respond to production incidents affecting critical pipelines
Requirements:
- Deep hands-on expertise with Azure Data Factory: pipelines, datasets, linked services, triggers, parameterization, mapping data flows, and all three Integration Runtime types (Azure, Self-hosted, Azure-SSIS)
- Strong experience with Azure Databricks and PySpark
- Production experience with one or more of: Azure Synapse Analytics (Dedicated and Serverless SQL Pools, Spark Pools), Azure Databricks (Delta Lake, Unity Catalog), or Microsoft Fabric (Warehouse, Lakehouse, OneLake)
- Strong working knowledge of Azure Data Lake Storage Gen2 (hierarchical namespace, RBAC + ACLs, lifecycle management, security)
- Experience with Azure Key Vault, Azure AD / Entra ID (including managed identities and service principals), and private networking (VNet integration, private endpoints)
- Monitoring and troubleshooting with Azure Monitor, Log Analytics, and KQL
- Advanced SQL — window functions, CTEs, query optimization, execution plan analysis, performance tuning
- Strong Python for data engineering — pandas, PySpark, REST API integration, unit testing (pytest)
- Proficient in T-SQL; familiarity with Spark SQL, KQL, PowerShell, and Bash shell scripting
- 5+ years of data warehouse development experience
- 5+ years of data modeling experience using erwin Data Modeler or similar tools
- 2+ years of experience with Azure Data Factory and Snowflake
- Medicaid domain knowledge is a plus