Contribute to the development of cutting-edge platforms that serve the needs of the Studio across various cloud providers and data centers
Develop compute, storage, database, and platform solutions on-premises and in the public cloud
Collaborate within a cross-functional team to design and implement next-generation CI/CD platforms-as-a-service for internal product solutions
Leverage Kubernetes operators and Spinnaker pipelines to automate application deployments and administration across multiple environments (on-premises and in the cloud)
Define and implement workflows, processes, and standards for Infrastructure-as-Code (IaC)
Demonstrate a strong understanding of the studio platform, including in-house DevOps tools and custom products used for platform operations, and contribute to its continuous improvement
Integrate chaos engineering practices and support the development of full-stack, continuous testing environments to improve system reliability and resilience
Continuously monitor system health, capacity, and performance indicators, driving optimization and proactive improvements
Provide tier-3 on-call troubleshooting and break-fix support for production services
Actively drive and participate in the team’s continuous improvement initiatives
Monitor and maintain platform infrastructure, utilizing tools like Datadog for performance tracking, alerts, and capacity management
Requirements
Bachelor’s degree in Computer Science, Software Engineering, Computer Engineering, Information Technology, or a related field (or foreign degree equivalent)
Eight (8) years of experience in the job offered or in a related occupation in the animation industry
Revision control and DevOps best practices (Git)
Expert Linux experience (Red Hat, CentOS)
Animation artists’ production pipelines, including critical backend applications related to farm rendering, asset and catalog management software
CI/CD tools and platforms such as Spinnaker, Drone, Jenkins, Ansible, Argo
Best practices in release management, including versioning, branching, and deployment strategies
Seven (7) years of experience in production-critical systems and workflows
Operational support experience, including platform infrastructure monitoring and troubleshooting using tools like Datadog, Prometheus, or similar
Working in multiple programming languages (Java, Go, Python)
Five (5) years of experience in cloud-native technologies and architecture (Docker, Kubernetes, OpenShift)
Experience in operations supporting production artists and supervisor Technical Directors through P1 and P2 incident handling