Block builds simple, powerful tools to create an economy that is open to all. The company is seeking a Senior Software Engineer for its Data Ingestion team, which builds and operates the platforms that replicate and ingest data into Block's Lakehouse for analytics and AI initiatives.
Responsibilities:
- Design, build, and operate scalable data replication and ingestion pipelines that move data from production databases, event streams, and third-party sources into Block's Lakehouse
- Develop and enhance Kafka Iceberg connectors and data loading frameworks, enabling reliable, low-latency data delivery to Snowflake and Databricks
- Drive the modernization of Block's CDC platform by evaluating and implementing next-generation approaches to database replication, including cloud-native alternatives and Iceberg-based ingestion patterns
- Build self-service tooling and observability features that empower internal teams to onboard, monitor, and troubleshoot their own data pipelines with minimal support
- Collaborate with data engineering, platform infrastructure, and product teams to define data contracts, improve service encapsulation, and reduce tight coupling between operational databases and analytics consumers
- Contribute to the unification of Block's data ingestion architecture by identifying opportunities to consolidate overlapping systems and reduce infrastructure complexity
- Design and implement solutions for PII detection, masking, and privacy-compliant data handling within ingestion pipelines, ensuring sensitive data is properly classified, protected, and governed in accordance with Block's privacy policies and regulatory requirements (e.g., GDPR, CCPA)
- Establish and promote best practices for data pipeline reliability, cost optimization, schema management, and compliance across the ingestion platform
Requirements:
- 8+ years of experience in software engineering or data platform development, with a focus on building scalable data systems or distributed infrastructure
- Strong programming proficiency in languages such as Java, Python, Scala, or Go, with experience developing data frameworks, libraries, or services
- Hands-on experience with streaming data systems and technologies such as Apache Kafka, Kafka Connect, or similar distributed messaging platforms
- Solid understanding of Change Data Capture (CDC), database replication patterns, and data lake or Lakehouse architectures
- Experience with modern data storage and table formats such as Apache Iceberg or Delta Lake
- Experience with cloud-based data ecosystems (AWS, GCP, or Azure) and infrastructure-as-code tools