TaxGPT is revolutionizing the tax and accounting space with AI-driven solutions tailored for accountants, tax professionals, and SMBs. They are seeking a Senior DevOps Engineer to build, scale, and secure their infrastructure and deployment systems while improving reliability and performance across the company.
Responsibilities:
- Own the design, implementation, and long-term health of cloud infrastructure and internal platform systems
- Build and maintain scalable, secure, and reliable infrastructure across development, staging, and production environments
- Improve infrastructure automation, environment consistency, and operational resilience
- Manage networking, compute, storage, observability, and access controls across core systems
- Design, improve, and maintain CI/CD pipelines for fast, safe, and repeatable deployments
- Build tooling and workflows that improve developer experience and reduce operational friction
- Standardize release processes, deployment practices, and rollback strategies
- Identify bottlenecks in development and deployment workflows and drive improvements
- Improve system reliability, availability, and performance through strong operational practices
- Build and maintain monitoring, alerting, logging, and incident response systems
- Lead root cause analysis and drive permanent fixes for recurring operational issues
- Establish and improve standards around uptime, recovery, and production readiness
- Implement infrastructure security best practices across environments and workflows
- Strengthen access controls, secrets management, auditability, and system hardening
- Partner with engineering leadership to reduce operational and security risk
- Support compliance, backup, disaster recovery, and resilience initiatives where needed
- Lead architectural decisions related to infrastructure, deployment systems, and platform reliability
- Partner with engineering leaders to shape long-term infrastructure strategy
- Mentor engineers on infrastructure, deployment, observability, and operational best practices
- Raise the team’s standards through documentation, design reviews, code reviews, and process improvements
- Work independently on complex infrastructure and reliability problems
- Identify risks and improvements before they become urgent issues
- Translate broad engineering goals into clear technical plans
- Own critical infrastructure, deployment systems, and platform reliability
- Take responsibility for scalability, resilience, maintainability, and operational health
- Drive long-term fixes, not just short-term patches
- Guide engineers on infrastructure and operational best practices
- Improve team effectiveness through documentation, reviews, and technical support
- Collaborate closely with engineering, product, and leadership teams
- Make sound decisions on cloud architecture, deployment strategy, observability, and security
- Evaluate tradeoffs carefully across cost, speed, reliability, and complexity
- Build systems that scale with company needs
Requirements:
- 7+ years of experience in DevOps, Site Reliability Engineering, Platform Engineering, or Infrastructure Engineering
- Strong experience designing and managing production infrastructure in cloud environments such as AWS, GCP, or Azure
- Deep experience with CI/CD systems, infrastructure automation, and deployment pipelines
- Strong knowledge of containers and orchestration, including tools such as Docker and Kubernetes
- Experience with Infrastructure as Code tools such as Terraform, Pulumi, or CloudFormation
- Strong experience with monitoring, logging, and observability tools
- Experience improving reliability, security, and scalability in production systems
- Strong scripting or coding ability in languages such as Python, Bash, or Go
- Strong understanding of networking, system design, access control, and cloud security fundamentals
- Experience supporting fast-moving startup engineering teams
- Experience building internal developer platforms or self-service infrastructure tooling
- Familiarity with modern security and compliance practices
- Experience with incident management, postmortems, and operational maturity improvements
- Experience working closely with backend and application engineering teams to improve system design and delivery quality