Supporting, enhancing, and maintaining Restaurant365’s cloud infrastructure and applications.
Demonstrating growing expertise in site reliability practices, with skills in incident response, system monitoring, automation, and performance troubleshooting.
Collaborating with DevOps, development, and infrastructure teams to resolve moderately complex issues, propose improvements, and strengthen the reliability, scalability, and security of our SaaS platform.
Requirements
BS in Computer Science, Information Systems, or related field (or equivalent experience).
2–4 years of experience in site reliability engineering, DevOps, or cloud operations.
Experience with cloud platforms (Azure or AWS), including services such as AKS, ECS/EKS, Azure Functions/AWS Lambda, S3, and Azure Blob Storage.
Proficiency with infrastructure-as-code and automation (Terraform, Ansible, YAML, Python, Bash, PowerShell).
Strong Linux engineering skills; working knowledge of Windows administration.
Experience supporting production environments and participating in on-call rotations.
Familiarity with web servers and middleware (Nginx, Apache Tomcat).
Experience with CI/CD and version control tools (GitLab, Git, or similar).
Strong written, oral, and interpersonal communication skills.
Preferred Qualifications
Experience with monitoring tools (Prometheus, Grafana, ELK, Site24x7, Nagios).
Knowledge of performance analysis and system vulnerability remediation.
Cloud certification (AWS or Azure).
Familiarity with restaurant industry SaaS platforms and customer-facing applications.
Tech Stack
Ansible
Apache
AWS
Azure
Cloud
Grafana
Linux
NGINX
Prometheus
Python
Terraform
Benefits
Comprehensive medical benefits, 100% paid for the employee.