Sword Health is shifting healthcare from human-first to AI-first through its AI Care platform, making world-class healthcare available anytime, anywhere. As a Senior DevOps Engineer, you will own and evolve the infrastructure that powers the AI Care platform, collaborating closely with multiple engineering teams to ensure reliability, scalability, and performance.
Responsibilities:
- Design, implement, and maintain scalable, resilient infrastructure to support Sword Health’s high-demand applications and services
- Automate and streamline deployment processes, CI/CD pipelines, and routine maintenance tasks to enhance efficiency and reduce downtime
- Monitor and optimize system performance, proactively identifying and resolving issues to ensure high availability and reliability
- Collaborate closely with development, data, and security teams to ensure seamless integration of infrastructure and code changes
- Drive security best practices by implementing and managing access control, network security, and compliance-related policies across the infrastructure
- Lead incident response and troubleshooting for infrastructure-related issues, ensuring rapid and effective resolution to maintain service continuity
- Mentor and guide junior team members, sharing DevOps best practices and fostering a culture of continuous learning and improvement within the team
- Stay up to date with industry trends and emerging technologies, bringing innovative solutions to Sword Health’s DevOps processes and toolchains
Requirements:
- Experience with Linux environments
- Experience with DevOps and GitOps methodologies
- Experience with Kubernetes and containerized applications (Docker)
- Experience with Infrastructure as Code (Terraform)
- Experience with monitoring tools (Google Cloud Monitoring/Stackdriver, Grafana, Prometheus/Alertmanager, New Relic)
- Experience with Jenkins
- Experience with CI/CD
- Team player with a solution-oriented, proactive attitude and a “get things done” mindset
- Enthusiastic about technology and innovation
- Fluent in English (written and oral)
- Experience with or knowledge of Kafka
- Experience with or knowledge of Prometheus/Alertmanager
- Experience with or knowledge of Grafana
- Experience with or knowledge of Elasticsearch/Logstash/Kibana
- Experience with or knowledge of Vault
- Experience with or knowledge of Redis
- Experience with or knowledge of MySQL
- Experience with or knowledge of DNS
- Experience with PHP
- Experience with JavaScript
- Experience with Go (Golang)
- Experience provisioning servers and services using AWS
- Experience provisioning servers and services using Azure
- Experience provisioning servers and services using GCP
- Experience with or knowledge of Istio
- Solid knowledge of cloud networking, including VPC management, routing, NAT, and troubleshooting with tcpdump analysis