Zone 5 Technologies is redefining unmanned aircraft systems with a focus on innovative autonomous solutions. They are seeking a Platform Engineer to design and operate scalable compute infrastructure that powers their autonomous vehicle simulation and testing framework, enabling rapid iteration on autonomy algorithms through parallel simulation workloads.

Responsibilities:

Design and implement auto-scaling compute infrastructure for simulation workloads using cloud platforms
Build and maintain on-premises GPU and CPU clusters for simulation and machine learning training
Architect hybrid cloud solutions that optimize cost and performance across cloud and local compute resources
Implement Kubernetes-based job scheduling and orchestration systems capable of running thousands of concurrent simulations
Design storage solutions for large-scale simulation data, logs, and artifacts using cloud and local storage systems
Deploy and maintain robotics simulation environments at scale
Build CI/CD pipelines for automated simulation testing of autonomy software
Create infrastructure for distributed parameter sweeps, Monte Carlo testing, and regression suites
Develop monitoring and observability systems for simulation fleet health and resource utilization
Implement data pipelines for ingesting, analyzing, and visualizing simulation results
Write and maintain infrastructure as code (IaC) for reproducible deployments
Build automation tools and CLI utilities to simplify developer access to compute resources
Implement GitOps workflows for infrastructure changes and configuration management
Create self-service interfaces for engineers to launch and manage simulation jobs
Develop cost monitoring and optimization strategies for cloud and on-prem resources
Monitor and optimize infrastructure performance, reliability, and cost efficiency
Troubleshoot complex distributed systems issues across networking, storage, and compute layers
Implement backup, disaster recovery, and business continuity strategies
Maintain security best practices including IAM, secrets management, and network isolation
Collaborate with autonomy, ML, and robotics teams to understand compute requirements and optimize workflows
Design and implement network architectures for distributed simulation workloads across AWS and on-premises environments
Configure VPCs, subnets, security groups, and routing for secure, high-performance compute clusters
Establish hybrid cloud connectivity (VPN, Direct Connect, site-to-site tunnels) between on-premises and cloud resources
Optimize network performance for large data transfers, multi-node communication, and distributed workloads
Support internal infrastructure network design and provide technical guidance to engineering programs
Troubleshoot network issues including latency, packet loss, and connectivity problems across distributed systems

Requirements:

Bachelor's degree in Computer Science, Software Engineering, or a related technical field; equivalent industry experience also welcome
2-5+ years of experience in platform engineering, DevOps, SRE, or cloud infrastructure roles
Strong hands-on experience with Kubernetes for container orchestration and workload management
Experience with cloud computing platforms and services (compute, storage, networking)
Deep understanding of Linux system administration and troubleshooting
Strong networking fundamentals including TCP/IP, routing, DNS, VPNs, and security
Understanding of infrastructure as code principles and configuration management
Proficiency in scripting and automation (Python, Bash, or similar)
Experience building and maintaining CI/CD pipelines
Solid grasp of distributed systems concepts, job scheduling, and resource management
Ability to design infrastructure from first principles and make architectural decisions
Experience building infrastructure for simulation, robotics, or autonomous systems workloads
Understanding of GPU computing and accelerated workload management
Knowledge of job scheduling systems for batch and parallel workloads
Experience managing on-premises clusters and hybrid cloud architectures
Familiarity with robotics middleware (ROS/ROS2) or simulation platforms
Understanding of cost optimization for compute-intensive workloads
Experience with monitoring, logging, and observability systems
Knowledge of containerization technologies and image management
Background in data engineering, MLOps, or machine learning infrastructure
Experience with network performance analysis and troubleshooting
Understanding of software-defined networking and network automation
Familiarity with security compliance requirements in aerospace/defense environments

Platform Engineer II/III