Avalara is an AI-first company scaling an automation ecosystem to enhance business processes across tax, finance, and operations. The company is seeking a Senior DevOps Engineer to design, deploy, and manage the infrastructure for AI workflows, ensuring operational reliability and security.
Responsibilities:
- You are the technical owner for the infrastructure and core components that run n8n and related AI workflow tools (environments, CI/CD, containers, runtime clusters, storage, secrets, networking, and integrations), ensuring they are resilient, scalable, and cost‑effective
- You embed security, privacy, and compliance into how AI workflows are built and run: hardened baselines, secret management, network and IAM boundaries, and CI/CD guards that prevent unsafe changes from reaching production
- You define and drive SLOs, metrics, logging, and alerting for the AI workflow platform, turning incidents into systematic improvements that reduce MTTR and change failure rates over time
- You create and enforce reusable patterns (templates, reference pipelines, IaC modules, guardrails) so teams building on n8n and other tools follow consistent, auditable practices instead of bespoke one‑offs
- You partner with architecture, security, and platform teams to align AI workflow infrastructure with Avalara’s Software Maturity Model and engineering governance, advancing the platform and guiding teams to higher levels of maturity
- You elevate how teams build and operate automations by mentoring engineers, codifying best practices, and using AI tools yourself to materially improve speed, quality, and reliability of the AI workflow platform
Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 5–8+ years in DevOps, SRE, or platform engineering for SaaS or large-scale distributed systems, with direct ownership of production environments
- Strong experience with at least one major cloud provider (AWS, Azure, or GCP), including VPC design, security groups, load balancers, and managed Kubernetes (EKS/AKS/GKE) or equivalent container orchestration
- Deep hands-on use of Infrastructure as Code (Terraform or equivalent) to manage multi-environment infrastructure and platform services
- Proven ownership of CI/CD pipelines (GitLab CI/CD or similar), including automated testing, security scanning, and artifact management for complex services or platforms
- Solid understanding of Linux and/or Windows, networking fundamentals (DNS, TLS, routing, firewalls), and secure secret management practices
- Hands-on experience with logging and monitoring stacks (e.g., Sumo Logic, Splunk, Prometheus, Grafana, or equivalents) and defining meaningful SLOs and alerts for production systems
- Demonstrated experience running or supporting multi-tenant or shared platforms used by multiple teams (internal developer platforms, workflow/orchestration tools, or integration platforms)
- Evidence of using AI tools in day-to-day engineering or operations (not just experimentation) with clear impact on speed, reliability, or quality