Stripe is a financial infrastructure platform for businesses, aiming to increase the GDP of the internet. The Staff Software Engineer in Stream Compute will define and deliver the next generation of Stripe's Flink-first stream compute infrastructure, focusing on high availability and reliable operations at global scale.
Responsibilities:
- Design, build, and operate stream compute infrastructure with Apache Flink at the center, alongside technologies like Kafka, Temporal, and AWS services
- Partner with product and platform teams across Stripe to understand requirements, unblock Flink adoption, and improve how stream processing infrastructure is used end-to-end
- Define and implement operational best practices (e.g., shuffle sharding, cellular architecture, load shedding, automated state recovery) to improve resilience and reliability at scale
- Drive fleet-level automation and standardization ("pets" to "cattle") through self-service workflows, safer rollouts, and self-healing systems that reduce manual operations
- Lead initiatives that raise the bar on Flink availability and state durability (e.g., multi-region strategies, disaster recovery readiness, operational readiness reviews, incident learning)
- Evaluate and productionize Flink ecosystem capabilities (e.g., SQL, connectors, state backends) to improve developer experience and scalability without compromising reliability
- Work closely with the open source community to identify opportunities to adopt new open source features and to contribute back to OSS
Requirements:
- This is a Staff-level role, which typically means 10+ years of experience building, operating, and evolving large-scale production systems
- Experience as a technical lead for one or more teams working on distributed systems, including scaling them in fast-moving environments
- Hands-on experience with big data technologies such as Flink, Spark, Kafka, Pulsar, or Pinot
- Experience developing, maintaining, and debugging distributed systems built with open source tools
- Experience building and scaling infrastructure as a product
- Strong software engineering skills and a passion for big data distributed systems
- Ability to write high-quality code in programming languages such as Go, Java, or Scala
- Comfortable operating with high autonomy and ownership
- Growth mindset and a willingness to learn quickly, explore ambiguous problem spaces, and dive deep when needed
- Strong written and verbal communication skills, including the ability to produce clear technical documentation
- Experience operating streaming infrastructure as a platform (e.g., Flink clusters, Kafka, Pulsar) for internal customers at scale
- Deep hands-on experience developing, optimizing, and operating real-time processing frameworks such as Flink, Spark Streaming, Storm, or Kafka Streams in production
- Experience building or operating control planes for managing large-scale infrastructure
- Open source contributions to data processing or big data systems (e.g., Hadoop, Spark, Celeborn, Flink)