Allata is a global consulting and technology services firm that helps organizations accelerate growth and solve complex challenges. The firm is seeking a skilled Data Engineer to design, build, and optimize scalable data solutions that support analytics and reporting in the healthcare industry.
Responsibilities:
- Design, develop, and maintain scalable data pipelines using Databricks (PySpark) and Python
- Build and optimize ETL/ELT processes within Azure cloud environments
- Implement data models following modern Data Lakehouse principles (e.g., Medallion architecture; a minimal sketch follows this list)
- Ensure data quality, consistency, and performance across ingestion, staging, and curated layers
- Collaborate with data architects, analysts, and business stakeholders to translate healthcare data requirements into technical solutions
- Develop reusable data transformation logic and modular processing components
- Support deployment processes following CI/CD and DevOps best practices
- Monitor and optimize data workflows for performance, scalability, and reliability
- Contribute to data governance, security, and compliance practices relevant to healthcare environments
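To make the pipeline and Medallion responsibilities above concrete, here is a minimal PySpark sketch of a bronze/silver/gold flow on Delta Lake. The paths, schema, and column names (e.g., `claim_id`, `provider_id`) are hypothetical placeholders for illustration, not a description of Allata's actual environment.

```python
# Minimal Medallion-style pipeline sketch in PySpark on Delta Lake.
# All paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze: ingest raw files as-is, tagging each record with an ingestion timestamp.
bronze = (spark.read.format("json")
          .load("/mnt/lake/raw/claims/")           # hypothetical landing path
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.format("delta").mode("append").save("/mnt/lake/bronze/claims")

# Silver: deduplicate, conform types, and filter out invalid rows.
silver = (spark.read.format("delta").load("/mnt/lake/bronze/claims")
          .dropDuplicates(["claim_id"])            # hypothetical business key
          .withColumn("claim_amount", F.col("claim_amount").cast("decimal(18,2)"))
          .filter(F.col("claim_id").isNotNull()))
silver.write.format("delta").mode("overwrite").save("/mnt/lake/silver/claims")

# Gold: aggregate into a curated, analytics-ready table.
gold = (silver.groupBy("provider_id")
        .agg(F.sum("claim_amount").alias("total_claims"),
             F.count("*").alias("claim_count")))
gold.write.format("delta").mode("overwrite").save("/mnt/lake/gold/claims_by_provider")
```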
Requirements:
- Current knowledge of and experience using modern data tools (Databricks, Fivetran, Data Fabric, and others); core experience with data architecture, data integration, data warehousing, and ETL/ELT processes
- Applied experience developing and deploying custom wheel (.whl) packages and/or in-session notebook scripts for execution across parallel executor and worker nodes
- Applied experience with SQL, stored procedures, and PySpark, based on area of data platform specialization
- Strong knowledge of cloud and hybrid relational database systems, such as MS SQL Server, PostgreSQL, Oracle, Azure SQL, AWS RDS, Aurora, or a comparable engine
- Strong experience with batch and streaming data processing techniques and file compaction strategies (see the streaming sketch following this list)
- Strong analytical and problem-solving skills
- Ability to work effectively in cross-functional and distributed teams
- Clear communication skills, with the ability to explain technical concepts to non-technical stakeholders
- Proactive mindset with a strong sense of ownership
- Commitment to delivering high-quality, reliable data solutions
- Strong hands-on experience with Databricks in Azure environments
- Advanced proficiency in Python and PySpark for distributed data processing
- Experience building and optimizing data pipelines in Azure (Azure Data Factory, Azure SQL, Data Lake Storage, etc.)
- Solid understanding of data warehousing, data lakehouse concepts, and ETL/ELT frameworks
- Experience working with relational databases such as SQL Server, PostgreSQL, Oracle, or similar
- Knowledge of batch and streaming data processing patterns
- Experience working with large, complex datasets in cloud-based distributed environments
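As an illustration of the streaming and file-compaction requirements above, the following sketch incrementally ingests newly arriving files with Spark Structured Streaming and then compacts the small files a stream tends to produce. The paths and schema are hypothetical, and the `OPTIMIZE` command assumes a Databricks or Delta Lake runtime.

```python
# Streaming ingestion plus file compaction sketch on Delta Lake.
# Paths and schema are hypothetical; OPTIMIZE assumes a Databricks/Delta runtime.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# Incrementally ingest newly arriving files as a stream (a batch job would
# simply swap readStream/writeStream for read/write).
events = (spark.readStream.format("json")
          .schema("event_id STRING, payload STRING, event_ts TIMESTAMP")
          .load("/mnt/lake/raw/events/"))          # hypothetical landing path

query = (events.writeStream.format("delta")
         .option("checkpointLocation", "/mnt/lake/_checkpoints/events")
         .trigger(availableNow=True)               # drain available data, then stop
         .start("/mnt/lake/bronze/events"))
query.awaitTermination()

# Compaction: coalesce the many small files produced by streaming writes
# into larger ones to keep downstream reads fast.
spark.sql("OPTIMIZE delta.`/mnt/lake/bronze/events`")
```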