American Unit, Inc. is seeking an AWS Data Platform Engineer to design and maintain scalable AWS Data Lakes and to develop serverless applications. The role involves building ETL pipelines, implementing monitoring and alerting, and automating cloud platform operations.
Responsibilities:
- Design and maintain scalable AWS Data Lakes using:
  - Amazon S3 for multi-zone data storage (raw, curated, processed)
  - AWS Glue Jobs, Crawlers, and Data Catalog
  - AWS Lake Formation for centralized governance and security
- Configure Lake Formation capabilities:
  - Row/column-level access control
  - Tag-based access control (LF-TBAC)
  - Governed table permissions
  - Cross-account data sharing
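As an illustrative sketch of the LF-TBAC item above (not the team's actual setup), the helper below builds the request that would be passed to boto3's `lakeformation` client `grant_permissions` call; the role ARN, tag key, and tag values are hypothetical examples.

```python
# Sketch of a tag-based (LF-TBAC) grant for AWS Lake Formation.
# The helper builds kwargs for lakeformation.grant_permissions;
# the principal ARN and tag names below are hypothetical.

def lf_tag_grant(principal_arn: str, tag_key: str, tag_values: list,
                 permissions: list) -> dict:
    """Build a grant_permissions request scoped by an LF-Tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": tag_values}],
            }
        },
        "Permissions": permissions,
    }

request = lf_tag_grant(
    "arn:aws:iam::123456789012:role/analyst",  # hypothetical role
    "domain", ["sales"], ["SELECT", "DESCRIBE"],
)
# To apply: boto3.client("lakeformation").grant_permissions(**request)
```

Keeping the request construction separate from the API call makes the grant logic unit-testable without touching AWS.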
- Implement best practices in:
  - Data partitioning
  - Schema evolution
  - Metadata and catalog management
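For the partitioning item above, a common convention is Hive-style key=value prefixes, which let Glue crawlers and query engines prune partitions. A minimal sketch (zone, dataset, and bucket layout are hypothetical):

```python
from datetime import date

# Hive-style partition keys (year=/month=/day=) enable partition pruning
# by Glue crawlers and Athena. Zone and dataset names are hypothetical.

def partition_prefix(zone: str, dataset: str, d: date) -> str:
    """Build an S3 key prefix for one day's partition in a lake zone."""
    return (f"{zone}/{dataset}/"
            f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}/")

print(partition_prefix("curated", "orders", date(2024, 1, 5)))
# curated/orders/year=2024/month=01/day=05/
```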
- Build and optimize ingestion and ETL pipelines for batch or near-real-time workloads
- Develop serverless applications using:
  - Lambda
  - API Gateway
  - EventBridge
  - SNS/SQS
  - Step Functions
- Build and tune AWS Glue ETL jobs using Python and PySpark
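Glue PySpark jobs are easiest to tune and test when row-level logic is kept in pure Python functions that the job applies via `Map.apply` on a DynamicFrame or a DataFrame UDF. A hedged sketch of such a step (the `order_id` and `amount` fields are hypothetical):

```python
# A pure record-normalization step of the kind a Glue PySpark job might
# apply with DynamicFrame Map.apply or a DataFrame UDF. The field names
# ("order_id", "amount") are hypothetical examples.

def normalize_record(rec: dict) -> dict:
    """Trim the id field and coerce amount to float (None if unparseable)."""
    out = dict(rec)
    out["order_id"] = str(out.get("order_id", "")).strip()
    try:
        out["amount"] = float(out.get("amount"))
    except (TypeError, ValueError):
        out["amount"] = None  # filter these rows downstream
    return out

print(normalize_record({"order_id": " 42 ", "amount": "19.99"}))
# {'order_id': '42', 'amount': 19.99}
```

Because the function has no Spark dependency, it can be unit-tested locally before being shipped inside a Glue job script.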
- Implement cost-efficient, highly scalable, event-driven architectures
- Implement monitoring and alerting using:
  - Amazon CloudWatch Logs, Metrics & Dashboards
  - CloudWatch Alarms for proactive incident detection
- Use CloudWatch Log Insights for log analytics and troubleshooting
- Configure AWS X-Ray for distributed tracing and end-to-end visibility
- Build automated alerts around:
  - Lambda failures
  - Glue job errors
  - Data pipeline SLAs
  - Performance anomalies
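The "Lambda failures" alert above would typically be a CloudWatch alarm on the function's `Errors` metric. A sketch that builds the kwargs for boto3's `cloudwatch.put_metric_alarm` (function name and SNS topic ARN are hypothetical):

```python
# Sketch of a CloudWatch alarm on a Lambda function's Errors metric.
# Builds kwargs for boto3 cloudwatch.put_metric_alarm; the function
# name and SNS topic ARN below are hypothetical.

def lambda_error_alarm(function_name: str, sns_topic_arn: str,
                       threshold: int = 1) -> dict:
    """Alarm when the function records >= threshold errors in 5 minutes."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
        "TreatMissingData": "notBreaching",
    }

alarm = lambda_error_alarm(
    "ingest-orders", "arn:aws:sns:us-east-1:123456789012:data-alerts")
# To apply: boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

`TreatMissingData="notBreaching"` keeps idle periods (no invocations, hence no data points) from triggering the alarm.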
- Develop ETL pipelines using PySpark on AWS Glue or EMR
- Optimize Spark workloads for performance, cost, and scaling
- Troubleshoot distributed data issues and tune job performance
- Write clean Python code for Lambda functions, Glue jobs, and automation tools
- Develop shared internal libraries for ETL, monitoring, and data governance
- Implement automated CI/CD delivery pipelines
- Use Infrastructure-as-Code tools including:
  - Terraform (modules, workspaces, state management)
  - AWS CloudFormation or CDK (optional but beneficial)
- Implement:
  - Automated provisioning of AWS resources
  - Version-controlled, repeatable deployments
  - Secure IAM roles and resource policies
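As one concrete example of the "secure IAM roles" item, a Glue job role needs a trust policy allowing the Glue service to assume it. This is normally declared in Terraform, but a minimal boto3-oriented sketch (the role name is hypothetical):

```python
import json

# Minimal trust policy letting the AWS Glue service assume a role.
# Normally declared in Terraform; shown here as a boto3-style sketch.

def glue_trust_policy() -> str:
    """Return the assume-role policy document for a Glue job role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    })

policy = glue_trust_policy()
# To apply: boto3.client("iam").create_role(
#     RoleName="etl-glue-role",  # hypothetical name
#     AssumeRolePolicyDocument=policy)
```

Scoping the trust policy to a single service principal, and attaching narrowly scoped resource policies separately, keeps roles least-privilege.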
- Configure monitoring, alerting, logging pipelines, and operational dashboards
Requirements:
- Proven experience designing and maintaining scalable AWS Data Lakes: Amazon S3 multi-zone storage (raw, curated, processed), AWS Glue Jobs, Crawlers, and Data Catalog, and AWS Lake Formation for centralized governance and security
- Working knowledge of Lake Formation access controls: row/column-level permissions, tag-based access control (LF-TBAC), governed table permissions, and cross-account data sharing
- Familiarity with best practices for data partitioning, schema evolution, and metadata/catalog management
- Experience building and optimizing ingestion and ETL pipelines for batch and near-real-time workloads using Python and PySpark on AWS Glue or EMR, including tuning Spark workloads for performance, cost, and scaling
- Experience developing cost-efficient, event-driven serverless applications with Lambda, API Gateway, EventBridge, SNS/SQS, and Step Functions
- Proficiency with Amazon CloudWatch (Logs, Metrics, Dashboards, Alarms, Log Insights) and AWS X-Ray for monitoring, alerting, and distributed tracing, including automated alerts for Lambda failures, Glue job errors, pipeline SLAs, and performance anomalies
- Ability to write clean Python for Lambda functions, Glue jobs, automation tools, and shared internal libraries for ETL, monitoring, and data governance
- Experience with automated CI/CD delivery pipelines and Infrastructure-as-Code using Terraform (modules, workspaces, state management); AWS CloudFormation or CDK is a plus
- Solid understanding of secure IAM roles and resource policies, automated resource provisioning, and version-controlled, repeatable deployments