Design, develop, and maintain scalable and efficient data pipelines to collect, clean, and transform large volumes of data.
Collaborate with software engineers and other stakeholders to understand data requirements and implement effective solutions.
Ensure data pipelines are robust, reliable, and optimized for performance.
Design and implement data models that support the storage, retrieval, and analysis of structured and unstructured data.
Integrate and consolidate data from various sources, both internal and external, to create a unified and comprehensive data ecosystem.
Ensure data integrity and accuracy through data quality assessments, cleansing, and validation techniques.
Optimize and enhance machine learning algorithms for performance, scalability, and accuracy.
Implement data preprocessing, feature engineering, and model training workflows using Python and relevant libraries (e.g., scikit-learn, TensorFlow, PyTorch).
Configure and maintain cloud-based infrastructure for data storage, processing, and analysis.
Monitor and troubleshoot data-related issues, ensuring high availability and reliability of data systems.
Stay up to date with emerging technologies, tools, and best practices in data engineering, AI, and ML.
Requirements
Bachelor's or Master's degree in Computer Science, Data Science, or a related field, with 3-5 years' experience.
Proven experience as a Data Engineer or in a similar role, with a focus on AI/ML projects.
Strong proficiency in Python programming and experience with relevant libraries and frameworks (e.g., pandas, NumPy, scikit-learn, TensorFlow, PyTorch).
Solid understanding of data engineering concepts, data modeling, and database systems (e.g., SQL, NoSQL).
Experience with data integration and ETL tools (e.g., Apache Airflow, Apache Spark, Talend).
Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and related services (e.g., S3, EC2, BigQuery).
Strong problem-solving skills and ability to work in a fast-paced, collaborative environment.