Staff AI Infrastructure Engineer at SentinelOne | JobVerse
Staff AI Infrastructure Engineer
SentinelOne
Remote
United States
Full Time
2 hours ago
$170,200 - $234,600 USD
H-1B Sponsor
Key skills
AWS
Azure
Cloud
Google Cloud Platform
Grafana
Jenkins
Kubernetes
Prometheus
Python
Terraform
Bash
AI
Machine Learning
ML
OpenAI
GCP
Google Cloud
GitHub Actions
Helm
ArgoCD
Datadog
Jaeger
GitHub
CI/CD
About this role
Role Overview
Architect, build, and maintain scalable infrastructure to host and serve AI products and models reliably.
Automate infrastructure deployment and management using Helm, ArgoCD and Terraform.
Manage and optimize Kubernetes clusters to support high-performance AI workloads.
Implement and manage CI/CD pipelines utilizing GitHub Actions and Jenkins.
Ensure infrastructure compliance with security standards including FedRAMP and related guidelines.
Collaborate closely with AI engineering, product teams, and DevOps to meet infrastructure requirements.
Monitor infrastructure health and performance, implementing optimizations proactively.
Drive infrastructure best practices and mentor team members to foster technical excellence.
Requirements
A degree in Computer Science, Information Technology, or related field, or equivalent practical experience.
7+ years of experience managing scalable, secure, and resilient infrastructure for AI and machine learning applications.
Deep proficiency with infrastructure-as-code and GitOps deployment tools such as Terraform, Helm, and ArgoCD.
Extensive hands-on experience with Kubernetes for deploying containerized workloads.
Demonstrated experience with major cloud platforms (AWS, GCP, Azure), specifically with services related to AI model hosting (e.g., Azure OpenAI).
Experience implementing and managing CI/CD pipelines (GitHub Actions, Jenkins).
Familiarity with compliance frameworks, particularly FedRAMP, and security best practices.
Strong scripting and automation skills using Python, Bash, or similar languages.
Excellent problem-solving skills, creativity, and self-driven motivation.
Exceptional candidates will also bring:
Previous experience as a Site Reliability Engineer (SRE), particularly in AI or ML contexts.
Experience with monitoring, logging, and tracing tools (Prometheus, Grafana, Datadog, Jaeger).
Knowledge of networking concepts and security best practices within cloud infrastructure.
Professional certifications in Kubernetes or cloud platforms (AWS, Azure, GCP).
Tech Stack
AWS
Azure
Cloud
Google Cloud Platform
Grafana
Jenkins
Kubernetes
Prometheus
Python
Terraform
Benefits
Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
Unlimited PTO
Industry-leading gender-neutral parental leave
Paid Company Holidays
Paid Sick Time
Employee stock purchase program
Disability and life insurance
Employee assistance program
Gym membership reimbursement
Cell phone reimbursement
Numerous company-sponsored events, including regular happy hours and team-building events