Lendistry is the nation’s largest minority-led, tech-savvy lender for small businesses and commercial real estate. The Cloud Operations Engineer will be part of a remote team responsible for site reliability, monitoring, automation, and resolving system alerts and customer issues, while collaborating closely with the DevOps and CloudOps teams.
Responsibilities:
- Monitor and resolve application and customer issues in production
- Help automate the resolution of recurring issues and of issues that require manual intervention
- Identify and implement process improvements that reduce the time to resolve support tickets
- Create dashboards and solutions to proactively identify issues
- Reduce human error and increase quality and security through automation
- Collaborate effectively with others, using excellent verbal and written communication skills
- Troubleshoot alerts and escalated issues
- Engage in and improve the full service lifecycle, from deployment through operation and refinement
- Maintain production environments by measuring and monitoring availability, latency, and overall system health
- Scale systems sustainably through automation
- Evolve systems by pushing for changes that improve reliability
- Practice sustainable incident response and take part in disaster recovery exercises
- Communicate in real time using Slack and MS Teams
- Follow infrastructure-as-code (IaC) best practices
- Participate in an on-call rotation to troubleshoot production-impacting issues
- Create and improve documentation and runbooks
- Participate in blameless postmortems
- Perform other duties assigned to support the efficient, effective operation of the department, and that help to make Lendistry the best place to work!
Requirements:
- Bachelor's Degree in Computer Science or related technical field, or equivalent experience
- 2+ years of professional experience in Cloud Operations and application monitoring
- High sense of urgency and drive to resolve issues quickly
- Expertise in analyzing and troubleshooting containerized workloads and applications
- Script-first mindset for automation
- Ability to debug and optimize code and to automate routine tasks
- Solid knowledge of Python, shell scripting, Java, and JavaScript
- Systematic and creative approach to problem solving, with effective communication skills
- Proven track record of supporting multi-AZ, multi-region, N-tier applications on public cloud infrastructure
- Understanding of Unix/Linux operating systems
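- Understanding of the golden signals of application monitoring (latency, traffic, errors, and saturation)
- Understanding of dashboarding methodologies such as USE (Utilization, Saturation, Errors) and RED (Rate, Errors, Duration)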
- Experience managing cloud infrastructure on AWS (preferred), Azure, or GCP
- Advanced knowledge of infrastructure-as-code tools and best practices
- Knowledge of source code repository best practices (Git, GitHub, and workflows such as Git Flow)
- IaaS administration using SDKs and CLIs (AWS preferred)
- Experience building, optimizing, hardening, and troubleshooting new services, tasks, and technologies from proof of concept (POC) to production
- Experience with application performance monitoring (APM)
- Experience using PostgreSQL and/or MySQL
- Experience with Continuous Integration tools like GitHub Actions
- Knowledge of web and application server management (Nginx, Tomcat, Node.js)
- Experience with Terraform, Ansible, or CloudFormation
- Experience with AWS technologies such as EC2, ECS, S3, RDS, and CloudWatch
- Ability to run Docker containers on AWS ECS
- AWS and Terraform certifications are a plus