OneTrust’s mission is to enable innovation through the responsible use of data and AI. They are seeking a Senior Software Engineer to contribute to all phases of the development lifecycle and ensure the high availability and performance of mission-critical applications.

Responsibilities:

Contribute to all phases of the development lifecycle
Write well-designed, testable, efficient code
Ensure designs comply with specifications
Own your code in production, responding to incidents as they occur and participating in retrospectives to determine how to improve
Prepare and produce releases of software components
Support continuous improvement by investigating alternatives and technologies and presenting these for architectural review
Engage and partner with various Engineering, Operations, and Product teams to design, deliver, and maintain a highly available and performant application platform
Build and implement application observability and platform monitoring tools to continuously improve the customer experience
Eliminate toil by automating processes, tuning alerts, and improving code where it is most needed
Frequently evaluate new ideas and trends to identify potentially useful tools and techniques
Collaborate with different functional groups to identify gaps, prioritize, and resolve issues
Define, implement, and maintain SLIs and SLOs aligned with customer experience
Design and instrument SLIs such as latency, error rates, and availability across critical services
Manage and enforce error budgets to balance system reliability with product feature velocity
Improve alert quality by reducing noise and focusing on actionable, high-signal alerts
Embed with product teams to review architectures and catch reliability risks early
Share your knowledge and experience with the Engineering organization
Share your findings with technical leadership and senior management
Build scripts in Python, Bash, Java, or Ruby for operational automation and incident response

Requirements:

BE/BTech/MS degree in Computer Science Engineering or a related subject
Experience in software application development using Java, Spring and Hibernate
Experience with Spring Boot and microservices is a plus
Strong knowledge of algorithms, data structures, and design patterns
Experience with SQL and NoSQL technologies
Sound understanding of RESTful service concepts
Solid understanding of and experience with application server and middleware technologies
Familiarity with Unix/Linux environments and OS fundamentals
Bachelor's degree in Computer Science, Engineering, or a related technical or business field
4+ years of application development experience with Java or an equivalent language
Experience with the Spring environment
Experience in cloud-based infrastructure (Azure, AWS, GCP, etc.)
Experience with the factors that affect software application performance at different levels
Understanding of the importance of centralized logging, metrics dashboards, and alerting
Good awareness of databases (ideally SQL/NoSQL)
Hands-on experience with observability tools (Datadog, Prometheus, Grafana, etc.)
Knowledge of CI/CD pipelines and infrastructure-as-code (Terraform, Helm, Jenkins, GitLab)
Experience building and operating AI-assisted incident response systems (root cause analysis, log summarization, anomaly triage)
Experience developing or integrating LLM-based tools to reduce MTTR and improve alert quality
Experience applying machine learning techniques for anomaly detection, capacity prediction, or failure pattern analysis
Experience deploying AI systems in production (not just experimentation)
Knowledge of vector databases, embeddings, or RAG architectures for operational intelligence
Strong understanding of prompt engineering and of evaluating LLM outputs in reliability workflows
Experience with Kubernetes and container orchestration (EKS/AKS/GKE)
Experience with distributed systems at scale
Familiarity with service meshes and microservices architectures
Experience with chaos engineering tools (Gremlin, Chaos Monkey)
Background in product-facing services with high traffic scale
Understanding of incident management platforms, including tools such as PagerDuty for alerting and Datadog for monitoring

Senior Software Engineer - SRE
