Silverchair is the premier independent platform partner for scholarly and professional publishers, dedicated to expanding the reach of the world’s most valuable knowledge. Silverchair is seeking a Data Engineer to build and maintain the data pipelines that turn scholarly publishing activity into insights for its clients, ensuring reliable data flow and resolving production data issues when they arise.
Responsibilities:
- Design, build, and maintain data pipelines that ensure reliable data flow from source systems through transformation layers to reporting
- Integrate data quality checks and validation into the pipeline workflow
- Implement error handling, logging, and retry capabilities to keep pipelines robust and recoverable (see the sketch after this list)
- Develop SQL and Python-based transformations that cleanse, enrich, and structure data for analytical use
- Design and implement dimensional models including fact tables and dimension tables
- Monitor and tune pipeline and query performance
- Use execution plans and profiling tools to identify bottlenecks and improve throughput and efficiency
- Troubleshoot and resolve production data issues using logs, monitoring tools, and systematic debugging
- Ensure pipelines run reliably and data is delivered on schedule
- Work closely with your scrum team and cross-functional partners across analytics, product, and engineering
- Document pipeline designs, data lineage, and business rules
- Participate in code reviews and contribute to team knowledge sharing
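To give a concrete, purely illustrative feel for the robustness work described above, here is a minimal Python sketch of a retry-with-logging wrapper. The function names and the ingestion step are hypothetical, not Silverchair's actual code:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_with_retries(task, max_attempts=3, backoff_seconds=30):
    """Run a pipeline step, retrying transient failures with linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            log.exception("Attempt %d/%d failed", attempt, max_attempts)
            if attempt == max_attempts:
                raise  # surface the failure to the orchestrator after the final attempt
            time.sleep(backoff_seconds * attempt)

# Hypothetical usage: wrap a flaky ingestion step so transient source outages recover.
# run_with_retries(lambda: load_source_extract("usage_events"))
```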
Requirements:
- 3-5 years of professional experience in data engineering or a closely related role
- Bachelor's degree in Computer Science, Data Science, Information Systems, or a related field, or equivalent practical experience
- Strong SQL skills including complex joins, CTEs, window functions, aggregations, views, functions, and stored procedures
- Ability to write clean, modular Python using functions and classes
- Experience designing dimensional models (star schema, fact/dimension tables); see the first sketch after this list
- Hands-on experience building data pipelines with orchestration tools
- Production experience with Azure Data Factory and Azure Synapse Analytics (Dedicated SQL Pool, Serverless, Spark) is required
- Understanding of data partitioning, shuffling, and distribution strategies
- Proficient with Git for branching, merging, and pull request workflows
- Comfortable working in an Agile/Scrum environment with CI/CD practices
- Microsoft DP-700 (Fabric Data Engineer Associate) or Databricks Data Engineer Associate certification is a nice-to-have
- Hands-on experience with modern lakehouse or unified analytics platforms (e.g., Databricks, Microsoft Fabric, Snowflake)
- Familiarity with Kafka-based event streaming (we use Confluent)
- Experience with Change Data Capture (CDC), incremental ingestion strategies, and preservation of historical data; see the second sketch after this list
- Familiarity with BI tools such as Power BI
- Comfortable using AI coding tools as part of your workflow (we use Claude Code)
- Ability to work Eastern Time Zone hours (8am–5pm)
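For candidates wondering what dimensional modeling looks like in practice, here is a deliberately tiny, hypothetical Python sketch of splitting raw usage events into a dimension table and a fact table. All data and names are invented for illustration:

```python
# Hypothetical raw usage events, as they might arrive from a source system.
raw_events = [
    {"journal": "Journal of Data", "issn": "1234-5678", "article_id": 10, "views": 3},
    {"journal": "Journal of Data", "issn": "1234-5678", "article_id": 11, "views": 7},
]

# Dimension table: one row per journal, keyed by a surrogate key.
dim_journal = {}
for e in raw_events:
    dim_journal.setdefault(e["issn"], {"journal_key": len(dim_journal) + 1, "name": e["journal"]})

# Fact table: one row per usage event, referencing the dimension by surrogate key.
fact_usage = [
    {"journal_key": dim_journal[e["issn"]]["journal_key"], "article_id": e["article_id"], "views": e["views"]}
    for e in raw_events
]

print(dim_journal)  # {'1234-5678': {'journal_key': 1, 'name': 'Journal of Data'}}
print(fact_usage)   # two fact rows, both pointing at journal_key 1
```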
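And a second hypothetical sketch, this time of the high-watermark incremental ingestion pattern. The in-memory source, warehouse, and watermark store below stand in for real systems:

```python
from datetime import datetime, timezone

# In-memory stand-ins for a source table, a warehouse table, and a watermark store.
# All names and data here are hypothetical, invented purely for illustration.
source_rows = [
    {"id": 1, "title": "Article A", "modified_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "title": "Article B", "modified_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
]
warehouse = {}    # target table, keyed by primary key
watermarks = {}   # last successfully loaded modified_at, per table

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def incremental_load(table):
    """Load only rows changed since the last run (high-watermark pattern)."""
    last = watermarks.get(table, EPOCH)
    changed = [r for r in source_rows if r["modified_at"] > last]
    for row in changed:
        warehouse[row["id"]] = row  # upsert by key: one current row per id
    if changed:
        watermarks[table] = max(r["modified_at"] for r in changed)
    return len(changed)

print(incremental_load("articles"))  # first run ingests everything: 2
print(incremental_load("articles"))  # second run finds nothing new: 0
```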