The College Board is a mission-driven, not-for-profit organization dedicated to excellence in education. It is seeking a Data Engineer to design and build scalable data platforms that support analytics and AI/ML use cases, collaborating closely with Data Science and AI teams.
Responsibilities:
- Design, build, and maintain scalable batch and streaming data pipelines using AWS services such as S3, Glue, Lambda, Kinesis, Step Functions, Redshift, Athena, and DynamoDB
- Develop and optimize data models and complex SQL queries to support analytics, reporting, and downstream consumers
- Build and operate serverless ETL frameworks for automated ingestion, transformation, and loading of structured and semi-structured data
- Implement cloud-first, microservices-based architectures, ensuring high availability, performance, and cost efficiency
- Ensure data quality, reliability, and observability through automated testing, validation, monitoring, and alerting
- Integrate BI and analytics tools such as QuickSight to enable real-time and self-service analytics
- Contribute to CI/CD pipelines, infrastructure automation, and secure development practices to deliver production-grade data systems
- Partner with Data Science and AI teams to productionize ML-ready datasets, including training, evaluation, and inference data pipelines
- Build and maintain feature pipelines and embedding workflows that support ML models and experimentation
- Support MLOps/LLMOps workflows, including dataset versioning, experiment tracking, and capturing inference data for continuous improvement
- Enable AI use cases such as recommendation systems, personalization, and retrieval-augmented generation (RAG) through robust data foundations
- Apply a thoughtful approach to AI feasibility, fairness, and effectiveness, especially when working with sensitive or regulated data
- Participate actively in Agile/Scrum ceremonies, design reviews, and peer code reviews
- Collaborate cross-functionally with Product, UX, Infrastructure, and Security teams
- Mentor junior engineers by providing guidance on data architecture, coding standards, and best practices
- Produce clear documentation, runbooks, and technical guides to support long-term platform sustainability
Requirements:
- 4+ years of experience in Data Engineering or Software Engineering in a production environment using AWS services such as S3, Glue, Lambda, Athena, DynamoDB, Step Functions, Redshift, and Kinesis
- Strong proficiency in Python and SQL, including performance tuning for large datasets
- 1+ years of hands-on experience designing, building, and deploying production-grade ML and generative AI solutions using AWS SageMaker and Amazon Bedrock
- Experience designing and operating ETL/ELT pipelines, data models, and analytics-ready datasets
- Solid understanding of cloud computing, DevOps, CI/CD, and microservices architectures
- Strong security and privacy mindset, especially when working with sensitive data
- Demonstrated interest in continuous learning, including keeping up with evolving data engineering and AI/ML best practices
- Excellent communication skills with the ability to explain technical concepts to both technical and non-technical stakeholders
- A passion for expanding educational and career opportunities and mission-driven work
- Authorization to work in the United States for any employer
- Curiosity and enthusiasm for emerging technologies, including a willingness to experiment with and adopt new AI-driven solutions and to learn and apply new digital tools independently and proactively
- Clear and concise communication skills, written and verbal
- A learner's mindset and a commitment to growth: welcoming diverse perspectives, giving and receiving timely, respectful feedback, and continuously improving through iterative learning and user input
- A drive for impact and excellence: solving complex problems, making data-informed decisions, prioritizing what matters most, and continuously improving through learning, user input, and external benchmarking
- A collaborative and empathetic approach: working across differences, fostering trust, and contributing to a culture of shared success
- Experience with event-driven architectures and real-time analytics
- Front-end or API experience (e.g., React, Node.js) is a plus
- Exposure to observability and monitoring for data pipelines, including freshness, volume, and performance metrics
- Experience collaborating with product managers and analytics partners to translate business requirements into well-designed data solutions