Socure is building the identity trust infrastructure for the digital economy, verifying identities in real time and preventing fraud. The Senior Data Engineer will design and build scalable data platforms and pipelines that support Socure's identity verification products and analytics; the role calls for a strong passion for solving business problems with data.
Responsibilities:
- Design and build batch and streaming data pipelines to support automated data ingestion, ML feature engineering, and analytics across multiple product domains
- Own end-to-end delivery of complex, ambiguous data initiatives, including architecture, implementation, testing, deployment, monitoring, and documentation
- Develop and evolve the data platform to support large-scale data processing using modern cloud-native technologies
- Automate data operations (validation, quality checks, alerting, backfills, and recovery workflows) to reduce manual effort and improve consistency
- Optimize cost, performance, and reliability of data workloads
- Partner closely with cross-functional teams (Data Science, Product, Engineering) to understand requirements and translate them into technical solutions
- Evaluate and adopt new technologies (new processing engines, storage formats, orchestration tools, GenAI-assisted ingestion) to keep the platform modern and efficient
Requirements:
- 5+ years of hands-on data engineering experience, building and maintaining production-grade data platforms and pipelines
- Strong programming skills in a general-purpose language (such as Python or Scala) for data processing, and in SQL for data analytics
- Deep experience with distributed data processing frameworks, such as Apache Spark, including performance tuning and optimization
- Proven experience building data solutions using AWS services (EMR, Lambda, S3, etc.)
- Strong understanding of data modeling and data warehousing concepts, including partitioning and schema design for large-scale datasets
- Experience operating and supporting production pipelines, including monitoring, alerting, incident response, and improving reliability over time
- Solid foundation in software engineering practices, including version control, CI/CD, testing strategies, and code review
- Strong communication and collaboration skills, with the ability to work effectively with both technical and non-technical stakeholders
- Experience with streaming or near-real-time data processing (Kafka, Kinesis, etc.)
- Hands-on experience with data orchestration tools (Airflow, Step Functions, etc.)
- Familiarity with modern data platform patterns such as Data Lakehouse, Data Mesh, and large-scale data sharing across teams
- Experience with prompt engineering using modern GenAI and large language models (LLMs)
- Experience mentoring other engineers and contributing to engineering-wide standards and best practices