Fabletics is looking for a DevOps Engineer II to join its DevOps team. The role spans a diverse, high-scale infrastructure and a large portfolio of services, with a focus on automation, reliability, and continuous improvement.
Responsibilities:
- Support and improve CI/CD pipelines built on Jenkins and GitHub Actions, including management of self-hosted GitHub Actions runners
- Embed with dedicated engineering teams as their primary DevOps partner, learning their systems, priorities, and pain points in order to deliver targeted infrastructure and automation improvements
- Design, deploy, and maintain Kubernetes clusters across Amazon EKS and NKP (Nutanix Kubernetes Platform) on VM environments
- Build and maintain shared libraries and reusable components (e.g., GitHub Actions workflows, Terraform modules) to drive consistency and accelerate delivery across teams
- Manage and maintain the Sonatype Nexus artifact repository on VM environments
- Collaborate on AWS infrastructure management, including networking, IAM, storage, and compute resources
- Contribute to observability initiatives: helping implement and maintain logging, metrics, and tracing tooling across a high-volume
- Participate in on-call rotations; own and respond to incidents with appropriate urgency
- Document systems, runbooks, and processes to reduce tribal knowledge
Requirements:
- 2–4 years of experience in a DevOps, Site Reliability, or Platform Engineering role
- Hands-on experience with Kubernetes in production
- Solid experience with CI/CD workflow automation tools (GitHub Actions, CircleCI, or similar)
- Solid AWS experience: EC2, EKS, S3, IAM, VPC, and CloudWatch at minimum
- Comfort working in a Linux/VM-based infrastructure environment
- Hands-on experience with IaC tools (Terraform or equivalent) and configuration management tools such as Ansible, Chef, or Puppet
- Familiarity with Sonatype Nexus or a similar artifact repository manager
- Comfortable with on-call responsibilities and experienced in incident triage, troubleshooting, and driving timely resolution across complex distributed systems
- Strong scripting skills, preferably in Bash, Python, and/or Node.js
- Background in observability platform evaluation or implementation (e.g., Datadog, Grafana, OpenTelemetry, ELK/OpenSearch)
- Experience at a high-growth or large-scale e-commerce/retail tech company