Apache, AWS, Cloud, DynamoDB, ETL, PySpark, Python, Spark, ELT, Data Engineering, Analytics, Lambda, S3, CloudWatch, SNS, SQS, Glue, Athena, API Gateway, Performance Optimization, Agile, Scrum, CI/CD, Remote Work
About this role
Role Overview
Design, implement, and operate scalable, cloud-native data pipelines and platform components on AWS.
Build and maintain ETL/ELT workflows, data lakes, and data mesh components.
Develop, optimize, and troubleshoot PySpark-based data processing jobs for large-scale and time-series datasets.
Design and manage data schemas, tables, permissions, and metadata using AWS Glue Data Catalog and Lake Formation.
Develop and maintain AWS Glue Jobs for data ingestion, transformation, and orchestration.
Build and support event-driven architectures leveraging AWS Lambda, SNS, SQS, and Step Functions.
Integrate internal and external systems through APIs using AWS API Gateway and related services.
Monitor platform health, performance, and operational metrics using Amazon CloudWatch.
Ensure efficient, reliable, secure, and cost-effective data processing across the platform.
Contribute to Infrastructure as Code, CI/CD pipelines, and automated deployment processes.
Collaborate closely with data platform, analytics, and product teams in an Agile (Scrum) environment.
Requirements
Strong hands-on experience with AWS services, including AWS Glue (Jobs and Data Catalog), Lake Formation, AWS Lambda, Amazon S3, Amazon Athena, AWS Step Functions, Amazon DynamoDB, Amazon API Gateway, Amazon CloudWatch, and Amazon SNS and SQS.
Proven experience in data engineering, including designing, building, and operating ETL/ELT pipelines.
Strong experience working with data lakes, lakehouse architectures, and/or data mesh concepts.
Solid hands-on experience with Spark and PySpark for distributed data processing and performance optimization.
Strong Python development skills.
Experience with modern data formats such as Apache Iceberg and Parquet.
Experience integrating and consuming APIs and data exchange services.
Experience with CI/CD pipelines and automated deployment practices.
Experience working in cross-functional Agile (Scrum) teams.
Proven ability to deliver production-grade, scalable, and maintainable cloud data solutions.
Willingness to travel as required by project or client needs, including occasional domestic or international travel, sometimes on short notice.
Tech Stack
Apache
AWS
Cloud
DynamoDB
ETL
PySpark
Python
Spark
Benefits
Learning opportunities with compensated certificates, learning lunches, and language lessons.
Chance to switch projects after one year.
Team building twice a year.
Office in Vilnius, Lithuania, with themed lunches and a pet-friendly environment.
Remote work opportunities.
Flexible time off, depending on the project.
Seasonal activities with colleagues.
Additional health insurance and loyalty days for Lithuanian residents.