Ford Motor Company is looking for a skilled GCP Data Engineer to join its EPEO - Data and AI Ops team. In this role, you will design, develop, and maintain the Security Data Lake and its associated data products, ensure data quality, and integrate diverse data sources.
Responsibilities:
- Design, develop, and maintain robust data pipelines using GCP native services
- Build and manage data quality frameworks to ensure the integrity and reliability of security data assets
- Integrate diverse data sources and security tools via APIs to centralize security oversight
- Optimize database performance, query efficiency, and storage costs within Google BigQuery
- [Preferred/Good to Have] Utilize Cribl to route, shape, and enrich incoming security telemetry and log data
- Develop, deploy, and monitor automated data pipelines (ETL/ELT) using Python, SQL, and GCP services (such as Cloud Functions, Cloud Scheduler, and Dataflow)
- Manage and optimize schema designs, partitioning, and clustering in Google BigQuery to ensure cost-effective and high-performance querying
- Implement and scale data quality and auditing frameworks using GCP Dataplex with centralized rules metadata configuration
- Design and maintain robust API integrations (e.g., EAMS, TrendMicro, and other threat detection platforms) to ingest critical security logs
- [Preferred/Good to Have] Configure and manage Cribl Stream pipelines (sources, destinations, routes, and functions) to parse, mask, enrich, and route security logs
- [Preferred/Good to Have] Implement log reduction strategies in Cribl to optimize data ingestion and lower downstream storage costs
- Partner with security teams to deliver actionable data products, reporting views, and tactical dashboards to prevent service outages
Requirements:
- Deep technical expertise in Google Cloud Platform (GCP)
- Hands-on experience building scalable cloud data pipelines (ETL/ELT)
- Experience with Cribl (for log stream routing, shaping, and reduction)
- Education: Bachelor's degree in Computer Science, Computer Engineering, Data Science, Information Technology, or a related technical field (or equivalent combination of education and experience)
- Experience: 8+ years of professional experience in Data Engineering, Cloud Data Warehousing, or software development
- Cloud Expertise: 5+ years of hands-on experience designing and implementing production-grade solutions on Google Cloud Platform (GCP), specifically using native services such as Google BigQuery, Cloud Run/Functions, Cloud Storage, Dataflow, and Pub/Sub
- Programming & Querying: High proficiency in Python and advanced SQL for building, optimizing, and troubleshooting complex ETL/ELT pipelines
- Collaboration: Excellent written and verbal communication skills, with a proven ability to collaborate effectively with cross-functional teams in an agile environment