Aventiv, a company focused on innovative technology solutions, is seeking a Senior Developer with expertise in Site Reliability Engineering (SRE). This role involves defining and driving the observability roadmap and ensuring system reliability and operational efficiency through effective monitoring and alerting strategies.
Responsibilities:
- Partner with software development on the next generation of our platform, which will leverage Kubernetes (orchestration) and CRI-O (container runtime)
- Working closely with software development, employ monitoring tools to find problems and troubleshoot them to resolution, ensuring we exceed our SLAs
- Leverage scripting to build required automation, monitoring, and platform tuning tools on an ad-hoc basis
- Ensure our platform complies with corporate security and privacy guidelines
- Establish a consistent approach to monitoring (including synthetic monitoring) and alerting that leverages the golden signals to enhance system reliability and operational efficiency
- Work closely with the Center of Excellence teams, tech leads, and US remote-based resources to build a unified observability strategy and ensure alignment with organizational goals
- Act as a trusted advisor on Application Performance Management (APM) and Infrastructure Management using the New Relic platform
- Act as a subject matter expert on the New Relic platform and its technology, combining product expertise with your skills and experience
- Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to enhance reliability and performance
- Educate teams on SRE maturity, progressing from basic monitoring to distributed tracing and beyond
- Act as an SRE evangelist, advocating best practices and ensuring adherence across teams
- Build and optimize New Relic dashboards for performance insights
- Drive the evolution of performance engineering into modern SRE practices
- Build and optimize New Relic Synthetic Monitoring to proactively catch and resolve issues
- Onboard applications into New Relic
- Analyze performance data and assist with performance troubleshooting / tuning of customer applications, architecture, procedures and practices
- Ensure effective monitoring of critical business applications
- Deploy, administer, configure and maintain New Relic technology
- Build and enhance dashboards, reports and alerts to meet customer requirements
- Perform other duties as assigned
Requirements:
- 5+ years of experience, including implementing New Relic technologies and APM strategies across the software application lifecycle to provide real-time analytics that help our customers proactively avoid performance issues
- Experience working with multiple operating systems (e.g., Windows and Linux) and web servers, including administration responsibilities
- Experience with OS-level or similar scripting (e.g., Perl, Bash, csh, Windows PowerShell)
- Excellent verbal and written communication skills
- Excellent problem-solving and troubleshooting skills
- Able to work effectively with minimal supervision, including planning and setting priorities to meet operational objectives set by upper management
- Familiarity with application architecture, including microservices, clustering/load balancing concepts and related technologies
- Ability to collaborate and build relationships across IT departments
- Ability to take total ownership of assigned components with minimal supervision
- Aptitude for learning new technologies (e.g., serverless, cloud architecture)
- Strong investigative skills for working with developers to resolve system-wide technical problems that cut across all areas of mission-critical software and IT infrastructure
- Extensive experience writing technical documentation
- Must be able to handle multiple tasks with changing priorities and be able to communicate changes in scope and schedule to others on the team
- Knowledge of Agile/Lean development methodologies
- High school diploma or GED
- Bachelor's degree in Computer Engineering or equivalent IT degree
- Ability to identify issues and rapidly diagnose and resolve problems when they occur
- Version control experience (Git)
- Linux/Windows experience
- Familiarity with Software Development Life Cycle (SDLC) deliverables