Krasan Consulting Services is seeking a Senior Data Engineer / Data Engineering Lead to ensure the availability of clean, accurate, secure, well-governed, and timely data for analytics and decision-making. This role involves designing, building, and optimizing scalable data platforms and pipelines while enforcing strong data governance and compliance.
Responsibilities:
- Design, develop, evaluate, and maintain scalable, resilient, secure, and highly available data pipelines and platforms to support analytics, reporting, business intelligence, and advanced analytics use cases
- Build and maintain structured and unstructured data ingestion pipelines using ETL and ELT frameworks to move Early Childhood program data from multiple source systems into analytic and storage platforms
- Apply advanced SQL, Python, and Apache Spark to develop high-performance, distributed data transformations and processing workflows
- Develop scalable ingestion and transformation processes that support growing data volume, velocity, and complexity while optimizing performance, reliability, data quality, and cost efficiency
- Implement secure access design using least-privilege principles, role-based access control, and segregation of duties across data platforms
- Optimize data workflows using partitioning, indexing, compression strategies, query tuning, distributed processing patterns, and workload optimization
- Ensure data accuracy, consistency, integrity, availability, and performance across all pipelines and analytical environments
- Design and implement automated data quality checks, validation rules, anomaly detection, and monitoring to proactively identify and resolve data issues
- Lead data classification and compliance handling for sensitive data, including PII and PHI, in alignment with regulatory, contractual, and agency standards
- Ensure compliance with state and federal regulations, including FERPA, HIPAA, GDPR, COPPA, and Illinois data governance policies
- Use enterprise governance tools such as Microsoft Purview to manage metadata, data classification, lineage, and data discovery
- Enforce encryption standards, audit logging, lineage traceability, and breach notification requirements in coordination with DoIT security and governance teams
- Implement and maintain enterprise-grade metadata management, data catalogs, and end-to-end data lineage to improve data transparency, trust, and usability
- Develop and maintain data dictionaries, standardized metadata definitions, and business glossaries aligned with agency terminology
- Produce clear architecture diagrams, data flow diagrams, pipeline specifications, and technical documentation to support governance, onboarding, audits, and operational continuity
- Design and implement backup, disaster recovery, and business continuity strategies for data pipelines and platforms
- Conduct restore testing and recovery validation to ensure operational readiness and compliance with recovery objectives
- Develop and optimize data processing solutions using SQL, Python, and Spark on distributed platforms such as Databricks
- Integrate cloud data ecosystems including AWS, Azure, IBM CloudPad, and Google Cloud
- Design and maintain data warehouses such as BigQuery and Azure Synapse, data lakes such as Amazon S3 and Google Cloud Storage, and modern lakehouse architectures
- Use orchestration, monitoring, observability, and testing tools, including Airflow, dbt tests, Splunk, and similar platforms, to ensure reliability, performance, and cost control
- Coordinate delivery of data-driven dashboards, reports, and visualizations using tools such as Tableau and Power BI
- Ensure accurate and timely submission of mandated state and federal reports
- Translate complex technical and analytical concepts into clear insights for both technical and non-technical stakeholders
- Drive cost and performance optimization through usage analysis, capacity planning, query optimization, and platform monitoring
- Promote self-service analytics and interactive data exploration to improve data literacy and reduce reliance on static reporting
- Establish and enforce data engineering best practices, including secure data architecture, access control, data quality frameworks, metadata management, lineage standards, CI/CD automation, orchestration, monitoring, testing, and backup strategies
- Develop and maintain operational runbooks and desktop procedures documenting pipelines, architecture, security controls, and workflows
- Lead cross-agency collaboration with IDEC leadership, DoIT architects, data scientists, analysts, and software engineers
- Support advanced analytics and machine learning initiatives by operationalizing data science solutions, including batch and real-time data processing
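To give candidates a concrete sense of the automated data quality checks described above (validation rules plus basic anomaly detection), here is a minimal Python sketch; the field names, records, and thresholds are purely hypothetical and are not drawn from any actual IDEC pipeline:

```python
# Hypothetical enrollment records; a real pipeline would read these
# from the program's source systems rather than a literal list.
records = [
    {"child_id": "C001", "age_months": 34, "program": "preschool"},
    {"child_id": None, "age_months": 34, "program": "preschool"},
    {"child_id": "C003", "age_months": 250, "program": "head_start"},
]

def check_quality(rows):
    """Apply simple validation rules and collect failing row indexes per rule."""
    failures = {"missing_id": [], "age_out_of_range": []}
    for i, row in enumerate(rows):
        # Rule 1: every record must carry a non-empty identifier.
        if not row.get("child_id"):
            failures["missing_id"].append(i)
        # Rule 2: early-childhood ages should plausibly fall in 0-72 months.
        if not (0 <= row.get("age_months", -1) <= 72):
            failures["age_out_of_range"].append(i)
    return failures

print(check_quality(records))
# → {'missing_id': [1], 'age_out_of_range': [2]}
```

In production these rules would typically live in a testing framework such as dbt tests or a Spark job, with failures routed to monitoring rather than printed.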
Requirements:
- Extensive experience designing, building, and securing large-scale data pipelines and enterprise data platforms
- Advanced proficiency in SQL, strong experience with Python, and solid Apache Spark fundamentals
- Hands-on experience with data cataloging and governance tools, including Microsoft Purview
- Strong experience with data quality automation, metadata management, data lineage, and compliance handling for sensitive data such as PII and PHI
- Proven experience documenting architecture diagrams, data dictionaries, and pipeline specifications
- Strong knowledge of cloud-based data ecosystems and modern data architectures
- Excellent communication skills with the ability to explain complex technical concepts to diverse audiences
- Experience leading or coordinating data engineering teams and cross-functional initiatives
- Experience supporting public sector, education, or early childhood data systems
- Familiarity with machine learning pipelines, real-time data processing, and analytics enablement
- Strong background in data governance, CI/CD automation, metadata-driven architectures, and cost optimization strategies