Allata is a global consulting and technology services firm that helps organizations accelerate growth and solve complex challenges. The firm is seeking a skilled Data Engineer to design, build, and optimize scalable data solutions that support analytics and reporting in the healthcare industry.
Responsibilities:
- Design, develop, and maintain scalable data pipelines using Databricks (PySpark) and Python
- Build and optimize ETL/ELT processes within Azure cloud environments
- Implement data models following modern Data Lakehouse principles (e.g., Medallion architecture; a minimal sketch follows this list)
- Ensure data quality, consistency, and performance across ingestion, staging, and curated layers
- Collaborate with data architects, analysts, and business stakeholders to translate healthcare data requirements into technical solutions
- Develop reusable data transformation logic and modular processing components
- Support deployment processes following CI/CD and DevOps best practices
- Monitor and optimize data workflows for performance, scalability, and reliability
- Contribute to data governance, security, and compliance practices relevant to healthcare environments
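To make the pipeline and Medallion responsibilities above concrete, here is a minimal PySpark sketch of a bronze/silver/gold flow on Delta Lake. The paths, schema, and column names (e.g., `claim_id`, `provider_id`) are hypothetical placeholders for illustration, not a description of Allata's actual environment.

```python
# Minimal Medallion-style pipeline sketch in PySpark on Delta Lake.
# All paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze: ingest raw files as-is, tagging each record with an ingestion timestamp.
bronze = (spark.read.format("json")
          .load("/mnt/lake/raw/claims/")           # hypothetical landing path
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.format("delta").mode("append").save("/mnt/lake/bronze/claims")

# Silver: deduplicate, conform types, and filter out invalid rows.
silver = (spark.read.format("delta").load("/mnt/lake/bronze/claims")
          .dropDuplicates(["claim_id"])            # hypothetical business key
          .withColumn("claim_amount", F.col("claim_amount").cast("decimal(18,2)"))
          .filter(F.col("claim_id").isNotNull()))
silver.write.format("delta").mode("overwrite").save("/mnt/lake/silver/claims")

# Gold: aggregate into a curated, analytics-ready table.
gold = (silver.groupBy("provider_id")
        .agg(F.sum("claim_amount").alias("total_claims"),
             F.count("*").alias("claim_count")))
gold.write.format("delta").mode("overwrite").save("/mnt/lake/gold/claims_by_provider")
```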
Requirements:
- Current knowledge of and experience using modern data tools (Databricks, Fivetran, Data Fabric, and others); core experience with data architecture, data integration, data warehousing, and ETL/ELT processes
- Applied experience developing and deploying custom wheel (.whl) packages and/or in-session notebook scripts for execution across parallel executor and worker nodes
- Applied experience with SQL, stored procedures, and PySpark, based on area of data platform specialization
- Strong knowledge of cloud and hybrid relational database systems, such as MS SQL Server, PostgreSQL, Oracle, Azure SQL, AWS RDS, Aurora, or a comparable engine
- Strong experience with batch and streaming data processing techniques and file compaction strategies (see the streaming sketch following this list)
- Strong analytical and problem-solving skills
- Ability to work effectively in cross-functional and distributed teams
- Clear communication skills, with the ability to explain technical concepts to non-technical stakeholders
- Proactive mindset with a strong sense of ownership
- Commitment to delivering high-quality, reliable data solutions
- Strong hands-on experience with Databricks in Azure environments
- Advanced proficiency in Python and PySpark for distributed data processing
- Experience building and optimizing data pipelines in Azure (Azure Data Factory, Azure SQL, Data Lake Storage, etc.)
- Solid understanding of data warehousing, data lakehouse concepts, and ETL/ELT frameworks
- Experience working with relational databases such as SQL Server, PostgreSQL, Oracle, or similar
- Knowledge of batch and streaming data processing patterns
- Experience working with large, complex datasets in cloud-based distributed environments
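As an illustration of the streaming and file-compaction requirements above, the following sketch incrementally ingests newly arriving files with Spark Structured Streaming and then compacts the small files a stream tends to produce. The paths and schema are hypothetical, and the `OPTIMIZE` command assumes a Databricks or Delta Lake runtime.

```python
# Streaming ingestion plus file compaction sketch on Delta Lake.
# Paths and schema are hypothetical; OPTIMIZE assumes a Databricks/Delta runtime.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# Incrementally ingest newly arriving files as a stream (a batch job would
# simply swap readStream/writeStream for read/write).
events = (spark.readStream.format("json")
          .schema("event_id STRING, payload STRING, event_ts TIMESTAMP")
          .load("/mnt/lake/raw/events/"))          # hypothetical landing path

query = (events.writeStream.format("delta")
         .option("checkpointLocation", "/mnt/lake/_checkpoints/events")
         .trigger(availableNow=True)               # drain available data, then stop
         .start("/mnt/lake/bronze/events"))
query.awaitTermination()

# Compaction: coalesce the many small files produced by streaming writes
# into larger ones to keep downstream reads fast.
spark.sql("OPTIMIZE delta.`/mnt/lake/bronze/events`")
```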