Instacart is transforming the grocery industry and is looking for a Senior Staff Software Engineer on the Data Infrastructure team. This role involves setting the technical direction for the data platform that supports the company's data strategy and collaborating with various teams to ensure a reliable and efficient data ecosystem.
Responsibilities:
- Define and Drive Data Infrastructure Vision: Own the multi-year technical vision and roadmap for Instacart’s core data platform (storage, compute, streaming, orchestration, analytical serving). Translate company data strategy (monetization, federated access, real-time) into a coherent, actionable architecture plan. Align with leadership and proactively evolve the architecture for scale, maturity, and cost.
- Lead Platform Strategy (Build, Buy, Ownership): Architect the ownership strategy for the data platform, determining build vs. buy (including managed services vs. open-source self-hosting). Lead technical/business case evaluations, full cost-benefit modeling, and risk analysis for major investments. Design phased migrations to ensure reliability while achieving long-term independence and cost efficiency.
- Own the Data Lakehouse Foundation: Drive the architecture and delivery of the open lakehouse, including unified table format, compute engine portfolio, and storage governance. Expand multi-engine compute (interactive, batch, stream processing). Define standards for data storage, access, governance, and sharing to enable compute portability and prevent lock-in. Ensure reliable scaling without proportional cost increase.
- Drive Real-Time and Streaming Infrastructure: Own the architecture for streaming data, event-driven pipelines, stream processing, and real-time serving for critical use cases (Ads, Fraud, ML). Make principled decisions on deployment models balancing cost, availability, and operational maturity.
- Pioneer AI-native Data Infrastructure Engineering: Lead the adoption, application, and cultural integration of AI/LLM tools across the data platform lifecycle. Set a high standard for AI-augmented workflows, drive high-leverage opportunities from automation to cost optimization, and partner with other teams to embed AI-powered capabilities into the platform itself.
- Elevate Engineering Excellence: Serve as the senior technical voice, setting standards for system design and reliability. Lead architecture reviews. Mentor staff/senior engineers, fostering ownership and execution. Be a visible engineering leader, contributing to hiring and cross-org alignment.
- Partner Deeply with Stakeholders: Collaborate with Data Science, ML Platform, Ads Infra, Product Eng, Finance Eng, and Security to translate needs into reliable, self-serve infrastructure. Represent Data Infra in architectural forums, ensuring decisions support business priorities (monetization, compliance, AI). Clearly communicate complex trade-offs to technical and executive audiences.
Requirements:
- 10+ years of software engineering, focused on data infrastructure or distributed systems at scale
- Track record of setting technical direction for large-scale data platforms, defining multi-year architecture roadmaps, and influencing strategy
- Experience in high-growth, data-intensive environments with significant infrastructure scale and spend
- Expertise in modern data lakehouse architectures, open table formats (Iceberg, Delta Lake, Hudi), and compute/storage trade-offs
- Experience with distributed query/compute systems (Trino, Spark, ClickHouse, etc.), including performance tuning and production reliability
- Experience with event-driven infrastructure (Kafka, Flink, etc.)
- Proven track record owning and executing major infrastructure platform transitions, including build vs. buy, migration design, and risk management
- Experience building compelling business cases for infrastructure investments, including cost-benefit analysis and TCO modeling
- Exceptional technical communication, including clear architecture documents, strategy memos, and proposals that drive leadership alignment
- Strong ownership, comfort with ambiguity, and organizational influence to drive large, multi-team initiatives from concept to production
- Familiarity with data governance and compliance frameworks (SOX, CPRA, GDPR), and with designing governance controls into the platform architecture
- Experience with FinOps and data platform cost optimization, including managing multi-million dollar infrastructure budgets and negotiating vendor contracts
- Deep knowledge of SQL and strong proficiency in Python or Scala for systems-level work
- Experience with orchestration systems (e.g., Apache Airflow) and data transformation pipelines (e.g., dbt) in large-scale production environments
- Track record of building and growing high-performing data infrastructure teams
- Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience