Architect, deploy, and operate the cloud infrastructure that serves El Agente to users at scale, including multi-tenant access, secure compute provisioning, and auto-scaling
Design and manage the interface between the cloud platform and HPC job scheduling systems (SLURM, PBS) that run quantum chemistry computations
Build and maintain containerized deployment pipelines (Docker, Kubernetes) for reproducible, version-controlled environments
Implement monitoring, logging, alerting, and incident response processes to ensure platform reliability
Develop and maintain the web-facing application layer, including user authentication, tiered access control, usage metering, and data management
Build and operate data pipelines for computational results, including secure storage, retrieval, and IP protection for registered users
Manage CI/CD pipelines and infrastructure-as-code for consistent, repeatable deployments
Collaborate with researchers to integrate new agent capabilities and chemistry tools into the production environment as they are developed
Requirements
Bachelor's degree in Computer Science, Software Engineering, or a related field, or an equivalent combination of education and experience
Minimum five (5) years' experience in cloud infrastructure, platform engineering, or DevOps/SRE roles
Strong proficiency in Python and Linux systems administration
Hands-on experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code (Terraform, CloudFormation)
Working knowledge of containerization (Docker) and orchestration (Kubernetes)
Experience with CI/CD pipelines and production monitoring/observability tools
Familiarity with building or maintaining web applications (React, Vue.js, or similar)
Strong problem-solving skills and the ability to work in a fast-paced, research-driven environment
Excellent communication skills, with the ability to collaborate across disciplines