Collins Aerospace, through its FlightAware platform, is a leader in aviation software solutions. They are seeking a Principal Site Reliability Engineer to automate infrastructure processes, ensure service availability, and collaborate with engineering teams to enhance system reliability and performance.
Responsibilities:
- Automate and improve reliability, and keep pushing FlightAware's infrastructure forward so that it is resilient and reproducible
- Be responsible for service availability, performance, monitoring, incident response, and capacity planning
- Create, improve, and manage environments to ensure decisions on resource allocation, problem identification, and capacity planning are based on accurate data-driven insights
- Maintain a physical infrastructure using Kubernetes, Linux, & Ceph, and a cloud infrastructure in AWS as part of the Site Reliability Engineering team
- Influence technology decisions and direction to grow and support the FlightAware platform
- Collaborate closely with fellow SREs on your team and extend your collaboration across other FlightAware teams and disciplines to design dependable and scalable solutions and services
- Identify, implement, and champion process improvements to enhance productivity, collaboration, and delivery efficiency, while ensuring alignment with company goals and industry best practices
Requirements:
- Typically requires a degree in Science, Technology, Engineering, or Mathematics (STEM) and a minimum of 8 years of prior relevant experience; or an advanced degree in a related field and a minimum of 5 years of experience; or, in the absence of a degree, 12 years of relevant experience
- Must be authorized to work in the U.S. without sponsorship now or in the future. RTX will not offer sponsorship for this position
- Experience as an SRE, Platform Engineer, or related position within a Linux or UNIX environment, working on large, complex infrastructures and/or projects using Docker and Kubernetes solutions
- Experience automating configuration and infrastructure with tools such as SaltStack, Ansible, Terraform, or other declarative tools
- Experience with hardware, including servers, network switches, & cabling
- Experience managing Kubernetes clusters using GitOps with continuous delivery (CD) pipelines such as Flux or Argo
- Experience deploying and maintaining large, distributed storage solutions, such as Ceph
- Established proficiency in at least one (ideally more) of the following: Python, Go, Rust, or Shell (bash, awk, sed)
- Experience with PostgreSQL or an equivalent RDBMS, and with SQL in general
- Experience working with Nix or NixOS
- Familiarity with Cloud infrastructure, ideally AWS
- Understanding of SRE principles, including building observability solutions and exposing metrics to inform SLOs and KPIs
- Understanding of how IT infrastructure services work, including: DNS, DHCP, PXE, LDAP, NFS
- Understanding of network segmentation, routing, and VPNs
- You are a private pilot, are looking to pursue your private pilot license, or have a passion for aviation