FUJIFILM Biotechnologies is looking for a Principal Data/AI Engineer to drive the technical strategy and architecture of enterprise-scale data and AI platforms. The role involves planning, designing, and developing data pipelines while mentoring junior engineers and advocating for best practices in data and AI engineering.
Responsibilities:
- Architect, build, and maintain highly scalable batch and streaming pipelines on the Snowflake Data Platform (Snowpipe, Tasks, Streams, Dynamic Tables, Snowpark, Iceberg)
- Architect and deliver ML/GenAI solutions using managed cloud services (AWS, Azure, Snowflake Cortex)
- Implement modern data modeling and architecture patterns; establish and enforce standards for data quality (tests, expectations, SLAs/SLOs), observability (metrics, logs, traces), and lineage
- Ensure integration of biotech systems (MES, LIMS, SCADA, ERP, QMS) into a centralized data platform
- Collaborate with product managers, product engineers, platform architects, and business stakeholders to align data and AI engineering solutions with business requirements
- Enable modern AI use cases: feature stores, vector search/RAG, model serving, safety/guardrails, and continuous monitoring for drift, bias, and performance
- Optimize storage tiers, compute clusters/warehouses, caching, and workload orchestration for latency and throughput
- Partner with cybersecurity and compliance teams to ensure adherence to GxP, FDA 21 CFR Part 11, and data privacy regulations
- Lead design reviews, incident postmortems, and cross-team architecture forums
- Stay current with emerging technologies (data mesh, real-time streaming, digital twins, generative AI platforms) and introduce relevant innovations
- Perform other job duties as assigned from time to time
Requirements:
- Bachelor's degree in Computer Science, Data Engineering, AI/ML Engineering, or related field
- 12+ years of professional experience in data/software engineering, AI/ML engineering, or cloud platform engineering
- Strong proficiency in Python and SQL/dbt for data processing, automation, and analytics
- Extensive experience designing, building, and maintaining scalable batch and streaming data pipelines using modern orchestration frameworks (e.g., Airflow)
- Proven experience with data modeling for analytics and AI use cases
- Expertise in designing and developing data solutions on Snowflake, including data modeling, performance optimization, and cost-efficient usage
- Proven track record of delivering production-grade, cloud-based data solutions (AWS, Azure)
- Experience with modern AI technologies, including LLMs, embeddings, and vector databases
- Containerization, deployment, and orchestration of data and AI workloads using Docker and Kubernetes
- Data quality management, observability, lineage, and governance
- Knowledge of biotech IT/OT and life sciences systems (MES, LIMS, SCADA) and regulatory compliance frameworks (GxP, FDA 21 CFR Part 11, EMA, data privacy)
- Strong problem-solving, optimization, and troubleshooting skills for large-scale data systems
- Effective communication with both technical and non-technical stakeholders, including the ability to influence at senior levels
- Passion for emerging technologies, continuous improvement, and building innovative engineering cultures
- Advanced degree (MS/PhD) preferred
- Relevant industry certifications (e.g., Snowflake, AWS, Azure) preferred