Checkfront is hiring a Staff DevOps Engineer for its new product, Manifest. The role involves owning critical infrastructure, improving developer experience, and collaborating with product engineers and DevOps leadership in a fast-moving environment.

Responsibilities:

Work on a team with two other platform engineers
Own and evolve the infrastructure that supports Manifest, including AWS environments, networking, compute, data services, observability, CI/CD, and operational tooling
Work with Pulumi and TypeScript to define, maintain, and improve infrastructure as code across the platform
Support and improve our containerized application platform, including deployment pipelines, rollback mechanisms, and runtime configuration
Help operate and harden our data infrastructure, including connection pooling, backups, disaster recovery, replication, and safe schema-change practices
Partner with engineers to improve the reliability and safety of releases, including database migrations, deployment workflows, environment management, and production readiness checks
Improve CI/CD workflows so that builds, tests, infrastructure changes, and deployments are fast, reliable, and easy for engineers to understand
Lead observability and incident readiness work, including alerting, dashboards, SLOs, runbooks, incident response practices, and post-incident follow-up
Help ensure the platform is secure, cost-conscious, and maintainable as the product scales
Mentor engineers on infrastructure, operations, reliability, and production ownership

Requirements:

Deep production experience with AWS, especially services such as ECS/Fargate, RDS/Aurora PostgreSQL, VPC networking, load balancing, IAM, KMS, Secrets Manager, CloudFront, WAF, and related managed services
Experience designing and operating systems that serve a global user base, including multi-region availability and disaster recovery procedures
A habit of treating reliability, scalability, performance, and observability as first-class design constraints, built into designs from the start rather than bolted on later
Strong infrastructure-as-code experience. Pulumi with TypeScript is ideal, but deep experience with Terraform or another mature IaC approach is also valuable
Strong operational knowledge of PostgreSQL, including performance investigation, connection pooling, backups, replication, locking, migrations, and safe schema-change patterns
Experience designing and maintaining CI/CD systems, ideally with GitHub Actions, OIDC-based cloud authentication, container builds, environment promotion, required checks, and deployment gates
Experience supporting containerized production workloads and improving deployment safety, rollback strategies, and runtime reliability
Strong observability and incident response experience, including metrics, logs, traces, alerting, dashboards, runbooks, and post-incident learning
The ability to work effectively in ambiguity, make pragmatic tradeoffs, and communicate clearly with both infrastructure specialists and product engineers
A track record of raising the engineering bar through reusable patterns, documentation, automation, mentoring, and thoughtful technical leadership
