Amplify is a pioneer in K–12 education, leading the way in next-generation curriculum and assessment. As a Data Engineer, you will build and maintain data systems that empower teams to use data effectively, ensuring data privacy and compliance while collaborating with teams across the company to enhance educational experiences.
Responsibilities:
- Helping teams create fun, compelling apps by using millions of data points
- Helping teachers understand their students by building reusable data pipelines
- Implementing and managing data storage solutions (e.g., data lakes, databases)
- Building and maintaining ETL & machine learning pipelines on AWS
- Collaborating with data scientists and analysts to facilitate data access and utilization
- Ensuring data privacy and compliance with relevant regulations
- Analyzing and improving performance and squashing tricky bugs using tools such as Snowflake, Airflow, dbt, SQL, Python, Looker, Terraform, and Datadog
- Immersing oneself in agile rituals and using our infrastructure
- Leading collaboration, reviewing pull requests, and mentoring on a cross-functional team
- Participating in cross-team share-outs, brown bags, and workshop series
- Becoming an expert in the data models and standards used at Amplify and across the education industry in order to deliver consistent, high-quality solutions
- Building well-tested and optimized ETL data pipelines for both full and delta extraction
- Collaborating with data analysts and learning scientists to gather, design, and implement ETL and Data Warehousing requirements
- Contributing to leading industry data standards (e.g., Caliper Analytics, xAPI), communities, or open-source projects (e.g., dbt)
- Building and supporting a machine learning model pipeline from development through production deployment
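To illustrate the full vs. delta extraction distinction named above, here is a minimal sketch in Python. The table layout, watermark column, and function names are hypothetical, not Amplify's actual pipeline code:

```python
from datetime import datetime

# Hypothetical source rows: (id, payload, updated_at)
SOURCE = [
    (1, "alice", datetime(2024, 1, 1)),
    (2, "bob",   datetime(2024, 1, 5)),
    (3, "carol", datetime(2024, 1, 9)),
]

def full_extract(source):
    """Full extraction: pull every row on every run."""
    return list(source)

def delta_extract(source, watermark):
    """Delta extraction: pull only rows changed since the last
    successful run, tracked by a high-watermark timestamp."""
    return [row for row in source if row[2] > watermark]

# First run: no watermark yet, so take everything,
# then remember the newest timestamp seen.
rows = full_extract(SOURCE)
watermark = max(r[2] for r in rows)

# Later runs: only rows newer than the stored watermark.
delta = delta_extract(SOURCE, datetime(2024, 1, 3))
```

In practice the watermark would be persisted between runs (e.g., in a state table), but the trade-off is the same: full extraction is simple and idempotent, delta extraction scales to large tables.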
Requirements:
- BS in Computer Science, Data Science, or equivalent
- 2+ years of professional software development, site reliability, devops, or data engineering experience
- Strong CS and data engineering fundamentals
- Proven fluency in SQL and a development language such as Python
- Understanding of ETL/ELT pipelines and Data Warehousing design, tooling, and support
- Understanding of different data formats (JSON, CSV, XML) and data storage techniques (3NF, EAV model, Star Schema, Data Lake)
- Strong communication skills in writing and conversation
- Experience with MLOps tooling such as AWS SageMaker or GCP Vertex AI, and frameworks like PyTorch or TensorFlow
- Experience with tools we use every day:
  - Storage: Snowflake, AWS storage services (S3, RDS, Glacier, DynamoDB)
  - ETL/BI: Cube, dbt, Fivetran, Looker
  - Cloud infrastructure: AWS/GCP/Azure, Terraform
- Experience with tools we don't use, but should
- Proven passion and talent for teaching fellow engineers and non-engineers
- Proven passion for building and learning: open source contributions, pet projects, self-education, Stack Overflow
- Experience in education or ed-tech
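For reference, the Star Schema technique listed in the requirements can be sketched with one fact table joined to one dimension table. All table and column names here are illustrative, not Amplify's actual warehouse schema:

```python
import sqlite3

# A tiny star schema: a fact table of assessment events
# keyed to a student dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_student (
        student_key INTEGER PRIMARY KEY,
        grade_level TEXT
    );
    CREATE TABLE fact_assessment (
        student_key INTEGER REFERENCES dim_student(student_key),
        score REAL
    );
    INSERT INTO dim_student VALUES (1, '3rd'), (2, '4th');
    INSERT INTO fact_assessment VALUES (1, 0.8), (1, 0.9), (2, 0.7);
""")

# A typical star-schema query: join facts to a dimension
# and aggregate along a dimension attribute.
rows = conn.execute("""
    SELECT d.grade_level, AVG(f.score)
    FROM fact_assessment f
    JOIN dim_student d USING (student_key)
    GROUP BY d.grade_level
    ORDER BY d.grade_level
""").fetchall()
```

The same pattern scales up in Snowflake/dbt: narrow fact tables hold measurements, wide dimension tables hold descriptive attributes, and reports join the two.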