Lead design and implementation of high-impact SRS platform capabilities: event platform CDC integration (Kafka), schema automation, Structured Record Access API, fleet management tooling
Set technical direction for one or more SRS sub-areas: architecture standards, design reviews, raising the bar on how the team builds
Continuously assess SRS cloud platform needs at scale, identify significant risks and gaps, and drive the roadmap to close them
Own and improve fleet-wide operations: capacity management, monitoring and alerting, query tuning, load optimization, and data replication for reporting workloads
Create and institutionalize best practices and health models for the database fleet; use observability tooling and AWS automation to enforce standards consistently across hundreds of managed databases
Build a high-impact network of storage champions across engineering groups, consult directly with product teams and group leads on storage readiness and risk, and select training opportunities that raise engineering capability division-wide
Own reliability for what you build: monitoring, runbooks, on-call response, post-incident learning
Partner with SRS area leadership across managed database platform, DevEx, and data warehousing to sequence work and make cross-team tradeoffs
Drive migration and schema lifecycle initiatives across the fleet: coordinating with product teams, managing rollout risk, and validating rollback coverage
Mentor P2/P3 engineers; conduct design and code reviews with specific, actionable feedback
Represent SRS technical requirements in cross-org forums such as architecture review and group leadership
Provide tactical leadership during production emergencies
Requirements
6+ years of software engineering in a platform, infrastructure, or data engineering context
5+ years of Kubernetes architecture at scale
7+ years of Terraform experience
7+ years of experience with cloud data technologies
Deep Postgres or DynamoDB experience at scale: fleet-level operations, not application-level use
Demonstrated ability to lead technical projects end-to-end: design through production
Hands-on Kafka or event streaming experience in production; bonus if you've built CDC pipelines (Postgres logical replication, DynamoDB Streams, Debezium)
ORM experience at the framework or platform layer: you've built or extended ORM tooling to drive semantic eventing, not just used an ORM in application code
Kubernetes CRD hands-on experience: designing or operating controllers/operators that manage platform resources