Harness, the AI Software Delivery Platform company, is seeking a Staff Software Engineer with expertise in distributed systems and cloud-native backend engineering. The role involves architecting scalable backend systems, ensuring operational excellence, and collaborating across teams to enhance platform capabilities.

Responsibilities:

Architect and develop scalable, fault-tolerant backend systems that handle millions of requests per second
Implement microservices using Go, Java, or Python, ensuring high availability and resilience
Deploy and manage applications on AWS, GCP, or Azure with Kubernetes (EKS, GKE, AKS)
Work with Kafka, Pulsar, or RabbitMQ for distributed messaging and streaming workloads
Implement best practices for graceful degradation, retries, circuit breakers, and auto-scaling
Define SLAs/SLIs/SLOs and set up robust alerting and escalation processes for incident handling
Lead post-incident analysis, drive corrective actions, and improve system reliability
Define and implement logging, monitoring, and distributed tracing using Prometheus, OpenTelemetry, Grafana, and Datadog
Diagnose and optimize latency, throughput, and memory utilization for large-scale distributed systems
Design and implement highly concurrent, multithreaded backend services for parallel processing
Improve performance of SQL (PostgreSQL, MySQL) and NoSQL (Cassandra, DynamoDB, Redis, MongoDB) solutions
Implement API security, authentication, and authorization, and ensure compliance with SOC 2, ISO 27001, and PCI DSS
Guide engineers in best practices for platform engineering, microservices, and distributed systems
Work with cloud engineering, security, and product engineering teams to align platform capabilities with business needs

Requirements:

10-14 years of experience in backend platform engineering, distributed systems, and microservices
Strong programming expertise in Go, Java, or Python, with a focus on multithreading and concurrency
Expertise in Kubernetes, service meshes (Istio, Linkerd), and cloud infrastructure
Deep understanding of gRPC, REST APIs, GraphQL, and API performance tuning
Hands-on experience with CI/CD and infrastructure automation (Terraform, Pulumi)
Proven ability to manage production incidents and apply operational excellence practices
Excellent debugging and problem-solving skills in complex, distributed environments

Staff Software Engineer - Platform Foundations
