Pyramid Systems, Inc. is an award-winning technology leader driving digital transformation across federal agencies. They are seeking a Senior Data Engineer who will be responsible for designing and maintaining data architectures, ensuring alignment with business requirements, and optimizing data processing pipelines.
Responsibilities:
- Plan, create, and maintain data architectures, ensuring alignment with business requirements
- Acquire data, design dataset processes, and store data in optimized formats
- Identify problems and inefficiencies and implement solutions
- Identify tasks where manual effort can be eliminated through automation
- Identify and resolve data bottlenecks, leveraging automation where possible
- Create and manage data lifecycle policies (retention, backup/restore, etc.)
- Apply in-depth knowledge to create, maintain, and manage ETL/ELT pipelines
- Create, maintain, and manage data transformations
- Maintain/update documentation
- Create, maintain, and manage data pipeline schedules
- Monitor data pipelines
- Create, maintain, and manage data quality gates (Great Expectations) to ensure high data quality; a minimal quality-gate sketch follows this list
- Support AI/ML teams with optimizing feature engineering code
- Apply expertise in Spark, Python, Databricks, data lakes, and SQL
- Create, maintain, and manage Spark Structured Streaming jobs, including the newer Delta Live Tables and/or dbt; a minimal streaming sketch follows this list
- Research existing data in the data lake to determine best sources for data
- Create, manage, and maintain ksqlDB and Kafka Streams queries/code
- Perform data-driven testing for data quality
- Maintain and update Python-based data processing scripts executed on AWS Lambdas
- Write unit tests for all Spark, Python data processing, and Lambda code
- Maintain and optimize the PCIS Reporting Database data lake (performance tuning, etc.)
- Streamline data processing, including formalizing how to handle late data, how to define windows, and how window definitions impact data freshness
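To illustrate the Structured Streaming, windowing, and late-data concepts above, here is a minimal PySpark sketch of a tumbling-window aggregation with a watermark. It is a sketch only, not a description of this project's pipelines: the Kafka topic, broker address, event schema, and Delta paths are hypothetical placeholders, and the Kafka source assumes the spark-sql-kafka connector is available.

```python
# Minimal sketch only: hourly event counts with a tumbling window and a
# watermark for late data. Topic, broker, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-hourly-counts").getOrCreate()

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Pull two fields out of the JSON payload (hypothetical event shape).
events = raw.select(
    F.get_json_object(F.col("value").cast("string"), "$.event_type").alias("event_type"),
    F.get_json_object(F.col("value").cast("string"), "$.event_time")
     .cast("timestamp")
     .alias("event_time"),
)

# The watermark bounds how long late records are accepted; the 1-hour tumbling
# window sets aggregation granularity, which in turn drives data freshness.
hourly_counts = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "1 hour"), "event_type")
    .count()
)

# Append-mode write to a Delta table; a finalized window is emitted once the
# watermark passes the window end.
query = (
    hourly_counts.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/events_hourly")
    .start("/tmp/delta/events_hourly")
)
query.awaitTermination()
```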
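A data quality gate, in the Great Expectations sense referenced above, validates a batch against declared expectations and fails the pipeline step when any expectation is not met. The following minimal sketch assumes the legacy (pre-1.0) Great Expectations Pandas API; the claims schema and value bounds are hypothetical placeholders.

```python
# Minimal quality-gate sketch using the legacy (pre-1.0) Great Expectations
# Pandas API; column names and bounds are illustrative placeholders.
import great_expectations as ge
import pandas as pd

batch = pd.DataFrame(
    {"claim_id": ["A1", "A2", "A3"], "amount": [120.0, 89.5, 230.0]}
)

dataset = ge.from_pandas(batch)

# Declare the expectations this batch must satisfy.
dataset.expect_column_values_to_not_be_null("claim_id")
dataset.expect_column_values_to_be_unique("claim_id")
dataset.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)

# Gate: validate the batch and fail the pipeline step if anything failed.
results = dataset.validate()
if not results.success:
    raise ValueError(f"Data quality gate failed: {results}")
```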
Requirements:
- 8+ years of IT experience focusing on enterprise data architecture and management
- Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required
- Experience with ETL and ELT tools such as SSIS, Pentaho, and/or Data Migration Services
- Advanced level SQL experience (Joins, Aggregation, Windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization)
- Must be able to obtain a Public Trust security clearance
- Must be a U.S. citizen
- Bachelor's degree required
- Experience in Conceptual/Logical/Physical Data Modeling & expertise in Relational and Dimensional Data Modeling
- Additional experience with Spark, Spark SQL, Spark DataFrames and DataSets, and PySpark
- Data lake concepts such as time travel, schema evolution, and optimization
- Advanced level understanding of streaming data pipelines and how they differ from batch systems
- Ability to formalize how to handle late data, define windows, and manage data freshness
- Advanced understanding of ETL and ELT and of ETL/ELT tools such as SSIS, Pentaho, Data Migration Service, etc.
- Understanding of concepts and implementation strategies for different incremental data loads, such as tumbling window, sliding window, high watermark, etc.
- Understanding of streaming data pipelines and batch systems
- Indexing and partitioning strategy experience
- Ability to debug, troubleshoot, design, and implement solutions to complex technical issues
- Experience with large-scale, high-performance enterprise big data application deployment and solutions
- Experience leading and architecting enterprise-wide initiatives, specifically system integration, data migration, transformation, data warehouse builds, data mart builds, and data lake implementation/support
- Understanding of how to create DAGs to define workflows (see the Airflow sketch at the end of this list)
- Ability to thrive in a team-based environment
- Experience briefing the benefits and constraints of technology solutions to technology partners, stakeholders, team members, and senior levels of management
- Familiarity and/or expertise with Great Expectations or other data quality/data validation frameworks a bonus
- Familiarity with concepts such as late data, defining windows, and how window definitions impact data freshness
- Familiarity with CI/CD pipelines, containerization, and pipeline orchestration tools such as Airflow, Prefect, etc. a bonus
- Architecture experience in AWS environment a bonus
- Familiarity with Kinesis and/or Lambda, specifically how to push and pull data, how to use AWS tools to view data in Kinesis streams, and how to process massive data at scale, a bonus
- Experience with Docker, Jenkins, and CloudWatch
- Ability to write and maintain Jenkinsfiles for supporting CI/CD pipelines
- Experience working with AWS Lambdas for configuration and optimization
- Experience working with DynamoDB to query and write data
- Experience with S3
- Knowledge of Python (Python 3 desired) for CI/CD pipelines a bonus
- Familiarity with pytest and unittest a bonus
- Experience working with JSON and defining JSON Schemas a bonus
- Experience setting up and managing Confluent/Kafka topics and ensuring Kafka performance a bonus
- Familiarity with Schema Registry and message formats such as Avro, ORC, etc.
- Understanding of how to manage ksqlDB SQL files, migrations, and Kafka Streams
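As an illustration of defining workflows as DAGs and managing pipeline schedules, here is a minimal Airflow sketch (Airflow is only one of the orchestration tools named above, and Airflow 2.4+ is assumed). The DAG id, schedule, and task callables are hypothetical placeholders, not this project's actual tasks.

```python
# Minimal DAG sketch (Airflow 2.4+ assumed); ids, schedule, and callables are
# illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull source data")


def transform():
    print("apply transformations")


def quality_gate():
    print("run data quality checks; raise on failure to stop the run")


with DAG(
    dag_id="daily_reporting_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_quality = PythonOperator(task_id="quality_gate", python_callable=quality_gate)

    # The dependency chain is what defines the DAG's workflow.
    t_extract >> t_transform >> t_quality
```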