Theoris Services is assisting its client in the search for a Data/Software Engineer to join a growing team. The role involves designing and optimizing data pipelines, implementing lakehouse architectures, and developing data visualization tools to support scientific data integration and analysis.
Responsibilities:
- Design, build, and optimize scalable data pipelines and ETL/ELT processes to integrate and harmonize scientific data (compounds, assays, experiments) from 30+ heterogeneous sources
- Implement and maintain lakehouse architectures on AWS (S3, Glue, Athena, Iceberg) to support multibillion-record datasets
- Develop federated query capabilities using Trino (or similar distributed engines) for unified access across platforms like PostgreSQL, Snowflake, and others
- Build robust backend services, RESTful APIs, and data services using Python (FastAPI, Flask preferred) to enable seamless data flow and integration with scientific tools (e.g., Benchling, computational chemistry systems, AI/ML endpoints)
- Optimize query and database performance for complex analytical workloads across PostgreSQL, Iceberg, Trino, and other platforms
- Implement caching, indexing, and query tuning techniques to improve response times and scalability as data volumes and user bases grow
- Proactively apply reverse engineering and advanced troubleshooting skills to debug complex data issues, pipeline bottlenecks, application failures, and performance problems
- Monitor systems, identify root causes, and implement fixes for data and application reliability
- Design and develop interactive dashboards, visual analytics, and scientific data visualizations using Power BI and Spotfire (or equivalent tools)
- Create reusable visualization components and data-rich UIs (React/TypeScript preferred) to enable scientists to search, filter, explore, and interpret complex datasets, including dose-response curves, chemical structures, and analytical results
- Translate scientific and engineering data into clear, actionable visual insights for researchers and stakeholders
- Apply best software engineering practices: modular/reusable design, clean code principles, code reviews, comprehensive documentation, and creation of maintainable libraries/services
- Write high-quality unit, integration, and end-to-end tests; use mock data effectively to create reliable automated test cases and ensure code stability
- Implement CI/CD pipelines for automated testing, deployment, and monitoring on AWS (EC2, ECS, Lambda, S3)
- Collaborate on full-stack features from database to frontend, ensuring end-to-end functionality, security (SSO/LDAP), and performance
- Partner with scientists, UX designers, and cross-functional teams to gather requirements, conduct user testing, and iterate on usability
- Implement data validation, quality checks, metadata management, and governance to ensure compliance and accuracy
- Contribute to engineering best practices and foster a culture of quality and scalability
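To make the first responsibility above concrete, here is a minimal sketch of harmonizing compound records from heterogeneous sources into one schema. Everything in it is hypothetical and for illustration only: the `Compound` schema, the two source layouts, the field names, and the unit conventions are assumptions, not details from the posting.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Compound:
    """Unified record after harmonization (hypothetical schema)."""
    compound_id: str
    name: str
    assay_value_nm: float  # potency normalized to nanomolar

def from_source_a(row: dict) -> Compound:
    # Hypothetical source A reports IC50 in micromolar; convert to nM.
    return Compound(
        compound_id=row["cmpd_id"].upper(),
        name=row["label"].strip(),
        assay_value_nm=float(row["ic50_um"]) * 1000.0,
    )

def from_source_b(row: dict) -> Compound:
    # Hypothetical source B already uses nM but nests its identifiers.
    return Compound(
        compound_id=row["meta"]["id"].upper(),
        name=row["meta"]["name"].strip(),
        assay_value_nm=float(row["ic50_nm"]),
    )

def harmonize(
    sources: list[tuple[Callable[[dict], Compound], list[dict]]],
) -> list[Compound]:
    """Apply each source's adapter, then de-duplicate on compound_id,
    keeping the first record seen for each identifier."""
    seen: dict[str, Compound] = {}
    for adapter, rows in sources:
        for row in rows:
            record = adapter(row)
            seen.setdefault(record.compound_id, record)
    return list(seen.values())
```

The adapter-per-source pattern is one common way to keep a pipeline open to new inputs: adding a 31st source means writing one more adapter, not touching the merge logic.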
Requirements:
- Bachelor's degree in Computer Science, Data Engineering, Software Engineering, Information Systems, or a related technical field
- 3+ years of professional experience in data engineering, full-stack development, or closely related roles
- Proven track record of building and delivering production-grade data pipelines, platforms, and/or user-facing scientific applications
- Intermediate to strong proficiency in Python (core for pipelines, backend, and data manipulation with pandas/PySpark); familiarity with JavaScript/TypeScript for frontend
- Hands-on experience creating scalable pipelines, ETL/ELT processes, and distributed processing (Spark, Trino/Presto)
- Deep expertise in relational databases (PostgreSQL), modern warehouses (Snowflake, Redshift), and query engines; strong focus on query performance improvement and optimization
- Practical experience with AWS services (S3, Glue, Athena, Lambda, RDS, EC2/ECS)
- Proven experience with Power BI and Spotfire (or similar) for scientific and analytical dashboards/visualizations
- Strong unit testing skills; experience writing automated tests with mock data for robust coverage
- Git for version control; API design (RESTful); CI/CD; clean code and reusable library development
- Excellent reverse engineering and troubleshooting capabilities for complex data and system issues
- Strong problem-solving skills with attention to detail and commitment to data quality/accuracy
- Ability to work independently and collaboratively in cross-functional, scientific teams
- Excellent communication skills to bridge technical concepts with non-technical stakeholders (scientists, researchers)
- Modern JavaScript/TypeScript frameworks (React preferred), responsive UI development, and component libraries
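As a flavor of the testing-with-mock-data requirement above, here is a small sketch using Python's standard library. The `load_active_assays` function, its `client.fetch(table)` interface, and the row shapes are all hypothetical; the point is only that a mock stands in for the real query engine so the test is fast and deterministic.

```python
import unittest
from unittest.mock import Mock

def load_active_assays(client) -> list[dict]:
    """Fetch assay rows from a data service and keep only active ones.

    `client` is anything exposing fetch(table); in production this might
    wrap a Trino or PostgreSQL connection (hypothetical interface).
    """
    rows = client.fetch("assays")
    return [r for r in rows if r.get("status") == "active"]

class LoadActiveAssaysTest(unittest.TestCase):
    def test_filters_inactive_rows(self):
        # Mock data replaces the real backend for the unit test.
        client = Mock()
        client.fetch.return_value = [
            {"assay_id": 1, "status": "active"},
            {"assay_id": 2, "status": "retired"},
        ]
        result = load_active_assays(client)
        self.assertEqual([r["assay_id"] for r in result], [1])
        client.fetch.assert_called_once_with("assays")
```

Injecting the client as a parameter (rather than constructing it inside the function) is what makes the code mockable in the first place, which is the kind of testable design the posting asks for.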