About this role
Role Overview
Lead the design, development, and evolution of monitoring solutions in support of IT operations systems, infrastructure, and applications, both cloud and on-premises.
Provide technical leadership for business and technical analysis and architectural reviews with customers.
Lead and continuously improve enterprise-scale continuous integration/continuous delivery (CI/CD) processes and pipelines.
Drive strategy and implementation of automated monitoring and alerting across platforms and services.
Oversee the design and development of ingest pipelines, visualizations, and dashboard capabilities for structured and unstructured data.
Lead the design and implementation of triggered alert functionality, including on-screen alerts and event integrations with ITSM and event management platforms.
Provide escalation support and leadership for day-to-day Request and Incident ticket work as necessary.
Lead collaboration with stakeholders to gather requirements, develop solution designs, and ensure scalability, resiliency, and efficiency of platform architectures.
Establish and govern system guidelines, process documentation, and training materials for the organization.
Proactively assess and lead responses to emerging requirements and ambiguous technology decisions.
Lead and coordinate IT and business unit projects related to platform and collaboration solutions, including acquisitions, divestitures, and migrations.
Requirements
Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent education/experience, and 8–10+ years of related work experience
Demonstrated ability to lead the design and enforcement of monitoring standards in collaboration with application teams (AppDynamics, Elastic Stack, CloudWatch, Site24x7)
Extensive experience architecting, engineering, and scaling distributed telemetry pipelines (Elastic ingestion, data normalization, dashboards)
Expert-level proficiency configuring alert normalization, enrichment, and correlation patterns at enterprise scale
Advanced experience with the Open Integration Hub and with webhook-based and API-driven event ingestion
Deep understanding of the BigPanda incident lifecycle, correlation models, and automated routing to ServiceNow
Expert understanding of logs, metrics, traces, and observability concepts (APM, RUM, synthetic monitoring)
Proven ability to design, configure, and optimize AI-driven workflows (automated incident analysis, similar incidents, change risk scoring)
Strong familiarity with vector DB concepts, enrichment pipelines, and generative AI guardrails
Advanced knowledge of SSO, OAuth, API Gateway patterns, and secured data flows
Expert-level AWS experience (Lambda, S3, API Gateway, CloudWatch, IAM)
Demonstrated ability to interpret telemetry, identify patterns proactively, and influence engineering outcomes
Advanced proficiency in AI prompt engineering
Extensive experience interacting with large language models and incorporating them into platform workflows
Proven experience as a Lead Platform Engineer or in a similar role (e.g. M365, AWS, or Azure Engineer)
Expert understanding of cloud technologies, DevOps processes, and large-scale automation of services
Extensive experience with CI/CD tools and practices (e.g. Jenkins, Azure Pipelines)
Advanced experience with automation and scripting tools (e.g. PowerShell, Graph API)
Tech Stack
AWS
Azure
Cloud
ITSM
Jenkins
ServiceNow
Benefits
Competitive Pay
Bonus for Eligible Employees
Benefits Package
Pension Plan
401k Match
Employee Stock Purchase Plan
Tuition Reimbursement
Disability Insurance
Medical Insurance
Dental Insurance
Vision Insurance
Employee Discounts
Career Training & Development Opportunities
Health and Work/Life Balance Benefits
Paid Time Off starting at 160 hours annually for employees in their first year of service.
Ten (10) paid holidays per year (typically mirroring the New York Stock Exchange (NYSE) holidays).
Be Well Company holistic wellness program, which includes Wellness Coaching and Reward Dollars.
Parental Leave – fifteen (15) days of paid parental leave per calendar year for eligible employees with at least one year of service at the time of birth, placement of an adopted child, or placement of a foster care child.