Stand Together is a philanthropic community that helps America’s boldest changemakers tackle the root causes of our country’s biggest problems. They are seeking a Senior DevOps Lead Engineer to lead and evolve the DevOps function for the Be The People digital ecosystem. The role is responsible for environments, CI/CD pipelines, release readiness, and operational guardrails.
Responsibilities:
- Lead the DevOps function end-to-end for Be The People, and contribute to how applications are assembled, tested, released, and operated
- Lead creation and lifecycle management of environments, including spinning up new environments and coordinating content and data seeding
- Contribute to the design and evolution of platform architecture with an emphasis on scalability, resilience, and cost efficiency
- Establish cloud standards and manage cloud operations in cooperation with Cloud Engineering
- Design, implement, and maintain CI/CD pipelines using GitHub Actions as the primary orchestrator
- Establish reusable workflow templates and best practices without introducing unnecessary complexity
- Apply Infrastructure-as-Code discipline — including versioning, promotion across environments, and drift prevention — using tools such as Terraform, Terragrunt, CDK, Ansible, or CloudFormation
- Develop and debug Python scripts, Ansible playbooks, Terraform configurations, and other automation and infrastructure-as-code tooling
- Own or steward the release management function, acting as a go/no-go checkpoint to ensure required testing has occurred, stakeholder sign-offs are complete, and risks are clearly surfaced
- Participate in production incident response, root-cause analysis, and post-incident learning
- Construct a resilient ecosystem capable of quickly restoring services, and participate in developing the disaster recovery playbook for Be The People
- Implement monitoring, alerting, and observability for production systems using tools such as Prometheus, Grafana, Nagios, or ELK Stack
- Participate in the development, implementation, and automation of security policies
- Apply baseline security best practices across infrastructure and pipelines
- Apply SRE-inspired practices such as defining SLIs/SLOs and designing systems operable by teams beyond DevOps
- Deeply understand CDN configuration (e.g., Cloudflare), including caching layers and their impact on testing and production behavior
- Be competent in DNS concepts and troubleshooting, even if DNS ownership resides elsewhere
- Optimize cloud infrastructure (AWS primary; familiarity with Azure and GCP a plus) for performance, reliability, and cost
- Provide architectural guidance and design recommendations for cloud assets, resource consolidation, and standardized practices
- Help shape test automation strategy, accounting for AI-assisted testing and the organization's QA maturity constraints
- Ensure testing happens and is validated, rather than personally writing all tests
- Partner with the testing team on test automation and test tracking, and incorporate both into the release process
- Embed quality checks as first-class controls in delivery pipelines
- Serve as a technical leader who leads through influence rather than authority
- Collaborate cross-functionally with application, infrastructure, and data teams
- Participate in technical screening and interview panels, including short, high-signal technical screens
- Coach and mentor engineers on cloud and DevOps best practices, raising overall DevOps maturity
- Create and maintain durable documentation, including environment definitions, deployment processes, and runbooks
Requirements:
- 10+ years of relevant DevOps and Cloud Engineering experience — this is a senior-level role not suitable for junior or mid-level candidates
- Proven experience operating production systems with real customer and business impact
- Minimum of 5 years of professional Cloud Engineering experience, with deep AWS expertise
- Strong judgment and ownership — comfortable saying 'no' or 'not yet' when appropriate
- Ability to communicate clearly with both technical and non-technical stakeholders around risk and tradeoffs
- Comfortable with command-line tools and the Linux/Unix environment
- Enthusiasm to contribute to Stand Together's vision and principled approach to solving problems, and a commitment to stewarding our culture, which champions values including transformation and innovation, entrepreneurialism, humility, and respect
- AWS or Azure certifications (Certified Developer, DevOps Engineer, Solutions Architect, Data Analytics, or Database)
- Experience with regulated or compliance-sensitive environments
- Exposure to multi-product or platform organizations
- Experience modernizing legacy delivery practices
- Background in building and deploying web applications from source code
- DevSecOps experience, including code analysis, vulnerability management, regulatory compliance, and security policy monitoring, to help build a security-aware culture