Bayer is a visionary company dedicated to solving the world’s toughest challenges. It is seeking a Principal Research Data Engineer to oversee the development and implementation of research data pipelines, integrate geospatial data into machine learning models, and collaborate with diverse research partners.
Responsibilities:
- Oversee the development & implementation of research data pipelines for producing data layers and storing research data
- Implement & maintain scalable, data-intensive processing pipelines that apply geospatial data to ML/DL models
- Architect, build & launch new data models to provide intuitive analytics to business users
- Develop infrastructure to report on key metrics, recommend changes & predict future results
- Develop proofs of concept (POCs) for new pipelines and integrate them into the science data pipeline in collaboration with diverse research partners
Requirements:
- Master's in Information Science, C.S., Data Science, Data Analytics, or closely related field
- 5 years of experience designing, developing, testing, and implementing scalable geospatial data integration pipelines that encompass statistical yield analysis and the generation of interactive reports and visualizations
- Working with raster & vector geospatial datasets applied to machine learning model generation and deployment in a big data environment
- Packaging & deploying models and data pipelines with CI/CD practices, including production readiness and performance tuning, using Python and/or Conda, Docker, Airflow, and Git CI/CD
- Using Google Cloud Platform, Google Cloud Functions, Google BigQuery, and Dataproc to process data at scale and deliver robust data pipelines
- Using Avro, Parquet, CSV, GeoTIFF, and GeoJSON file formats
- Programming in SQL
- Conducting query optimization & Online Analytical Processing (OLAP) on RDBMS and NoSQL databases
- Using QGIS, ArcGIS & PostGIS to ingest and process geospatial data in Avro, CSV, and GeoJSON formats