Codvo.ai is committed to building scalable, future-ready data platforms that power business impact. The Data Engineer will be a foundational architect responsible for building and maintaining the data ecosystem required to solve complex data challenges in the oil and gas sector.
Responsibilities:
- Architect & Build Data Pipelines: Design, construct, install, test, and maintain highly scalable data management systems and ETL/ELT pipelines
- Integrate Diverse Data Sources: Develop processes to ingest and integrate high-volume, high-velocity data from SCADA systems, historians (like OSIsoft PI, Aspen InfoPlus.21), DCS, PLC, and IoT sensors
- Cloud Data Platform Development: Implement and manage data solutions on the Microsoft Azure cloud platform, leveraging services like Azure IoT Hub, Azure Event Hubs, and Azure Stream Analytics for real-time ingestion and processing of operational technology (OT) data
- Data Modeling & Warehousing: Design and implement data models optimized for time-series data from industrial assets, supporting operational dashboards and real-time analytics
- Enable Advanced AI: Build the data infrastructure to support AI/ML models for predictive maintenance, operational anomaly detection, and process optimization using real-time OT data
- Champion Master Data Management (MDM): Design and implement MDM strategies and solutions to create a single, authoritative source of truth for critical data domains such as wells, equipment, and assets, ensuring data consistency across the enterprise
- Ensure Data Quality & Governance: Implement robust data quality checks, validation rules, and monitoring to ensure the accuracy, consistency, and reliability of our data. Adhere to and help shape our data governance policies
- Embrace Industry Standards: Champion and implement industry-specific data standards and models, such as the OSDU™ Data Platform, to ensure interoperability and a unified data view across the upstream lifecycle
- Collaborate & Innovate: Work closely with a cross-functional team of geoscientists, drilling engineers, data scientists, and business analysts to understand their data needs and deliver effective solutions
- Automate & Optimize: Identify opportunities for process automation and infrastructure optimization to improve data delivery, scalability, and cost-effectiveness
- Security First: Implement and maintain security best practices to protect our sensitive and proprietary data assets
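To make the data-quality responsibility above concrete, here is a minimal sketch of a validation rule for time-series sensor readings. Everything in it is illustrative, not part of any actual Codvo.ai stack: the tag names, the engineering limits, and the `validate_reading` helper are all hypothetical, and a real deployment would load limits from a governed metadata store rather than hard-code them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical per-tag engineering limits; in production these would come
# from a governed metadata/MDM store, not a hard-coded dictionary.
TAG_LIMITS = {
    "WELLHEAD_PRESSURE_PSI": (0.0, 15000.0),
    "SEPARATOR_TEMP_F": (-40.0, 450.0),
}

@dataclass
class Reading:
    tag: str
    timestamp: datetime
    value: float

def validate_reading(reading: Reading) -> list[str]:
    """Return the list of data-quality violations for one sensor reading."""
    issues = []
    limits = TAG_LIMITS.get(reading.tag)
    if limits is None:
        # Tag not registered in the master data catalog.
        issues.append("unknown_tag")
    else:
        lo, hi = limits
        if not (lo <= reading.value <= hi):
            issues.append("out_of_range")
    # Historian data arriving with a future timestamp indicates clock skew.
    if reading.timestamp > datetime.now(timezone.utc):
        issues.append("future_timestamp")
    return issues
```

In practice checks like these would run inside the streaming pipeline (e.g., as a step in a Spark or Stream Analytics job) so that bad readings are flagged before they reach dashboards or ML models.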
Requirements:
- Bachelor's degree in Engineering, Information Systems, or a related quantitative field
- 5+ years of proven experience in a data engineering role
- Experience within the oil and gas industry is highly preferred
- Demonstrable experience building and operationalizing large-scale data pipelines and applications
- Expert-level proficiency in SQL and Python for data manipulation and pipeline development
- Hands-on experience with distributed computing frameworks like Apache Spark (PySpark). Experience with streaming technologies like Kafka is a plus
- Deep experience with Microsoft Azure (Azure Data Lake Storage, Azure Data Factory, Azure Databricks, Azure Synapse)
- Proven experience with modern data platforms such as Databricks Delta Lake
- Understanding of machine learning lifecycles and the data requirements for training and deploying AI/ML models
- Experience with workflow orchestration tools like Airflow, Dagster, or Azure Data Factory
- Strong understanding of both relational (e.g., PostgreSQL, SQL Server) and NoSQL databases. Experience with graph databases (e.g., Neo4j) and vector databases is highly desirable
- Proficiency with Git and CI/CD best practices
- Familiarity with historian systems (e.g., OSIsoft PI System) and their data structures
- Hands-on experience with the OSDU™ Data Platform
- Experience working with industrial communication protocols (e.g., OPC UA, Modbus TCP/IP)
- Understanding of cybersecurity considerations for OT environments and data segregation
- Experience integrating data from ERP systems like SAP
- Professional certifications in Azure or Databricks
- Advanced SQL skills, including query optimization and performance tuning
- Knowledge of containerization technologies (Docker, Kubernetes)
- Experience constructing and maintaining enterprise knowledge graphs
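As a small illustration of the query-optimization skills listed above, the sketch below shows how adding a composite index to a time-series table changes the query plan from a full scan to an index search. It uses SQLite purely because it is self-contained; the table and column names are illustrative, and the same principle (index on the equality column plus the range column) applies to PostgreSQL or SQL Server.

```python
import sqlite3

# In-memory database with an illustrative time-series table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sensor_readings (
        tag   TEXT,
        ts    TEXT,
        value REAL
    )
""")

def plan(sql: str) -> str:
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = (
    "SELECT value FROM sensor_readings "
    "WHERE tag = 'P1' AND ts >= '2024-01-01'"
)

before = plan(query)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_tag_ts ON sensor_readings (tag, ts)")
after = plan(query)   # planner now searches via the (tag, ts) index
```

Ordering the index as `(tag, ts)` rather than `(ts, tag)` matters: the equality predicate on `tag` narrows the search first, letting the range predicate on `ts` scan a contiguous slice of the index.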