Fragomen is seeking a Senior DevOps / Platform Engineer to lead the evolution of its global cloud, container, and automation platforms. This senior‑level engineer will set technical direction, drive modernization initiatives, and ensure the performance, reliability, and scalability of the systems that support Fragomen’s mission‑critical immigration services.
Responsibilities:
- Lead the design, evolution, and stabilization of global cloud, container, and automation platforms
- Own and operate production Docker Swarm environments, including:
  - Cluster sizing and capacity planning
  - Scaling and service orchestration strategies
  - Troubleshooting distributed system issues such as networking, scheduling, and performance
- Drive modernization efforts for CI/CD pipelines and deployment workflows
- Define and enforce standards for build, test, security, and release processes across engineering teams
- Architect, implement, and support scalable cloud infrastructure in AWS and/or Azure
- Optimize environments for reliability, performance, cost efficiency, and compliance
- Perform advanced root‑cause analysis using logs, metrics, traces, and cloud platform telemetry
- Partner with development, security, and operations teams to improve platform reliability and automation
- Contribute to architecture decisions, platform strategy, and engineering best practices
- Mentor engineers and provide technical leadership across teams
Requirements:
- Extensive hands‑on experience operating and improving large‑scale, production‑grade infrastructure
- Significant production experience with container platforms, ideally Docker Swarm, including:
  - Cluster design, scaling, and orchestration
  - Diagnosis and remediation of complex distributed systems issues
- Strong CI/CD engineering background, with experience designing and refining pipelines using tools such as GitLab CI, Jenkins, and Octopus Deploy
- Experience implementing approval workflows, automated rollback strategies, artifact management, and modernizing legacy pipelines
- Deep experience with cloud infrastructure in AWS and/or Azure, including compute, networking, IAM, storage, and monitoring services
- Strong Infrastructure as Code experience; Terraform experience is highly preferred
- Solid Linux systems administration skills
- Strong understanding of networking fundamentals, including TLS, DNS, load balancers, and reverse proxies
- Experience with secrets management and certificate lifecycle management
- Strong ability to collaborate with cross‑functional teams
- Comfortable mentoring and guiding engineers at varying levels of experience
- Ability to communicate complex technical concepts clearly to both technical and non‑technical stakeholders
- Demonstrated ability to influence platform strategy and drive engineering standards organization‑wide
- Experience working in high‑security or regulated environments such as legal, financial, or enterprise‑grade organizations
- Hands‑on experience with observability and monitoring platforms such as CloudWatch, ELK, Prometheus, and Splunk
- Proven ability to apply monitoring and observability best practices to improve reliability and operational insight