Perform data transformation and integration tasks to ensure data from various sources is accurately processed, made ready for annotation projects, and prepared for delivery per client requirements.
Ensure efficient data storage, retrieval, and optimization to support data annotation workflows.
Implement data quality checks and validation processes to ensure the accuracy, consistency, and integrity of data used in annotation projects.
Collaborate with the Data Engineering team to design and implement robust and scalable data pipelines for importing and exporting data used in our data annotation projects.
Identify and address performance bottlenecks in data pipelines to enhance the speed and efficiency of data import and export processes.
Continuously seek opportunities to automate manual processes and improve data annotation workflows for increased productivity.
Work closely with cross-functional teams, including data annotation teams, backend developers, and project managers, to understand project requirements and provide timely data support.
Maintain comprehensive documentation of data pipelines, processes, and data structures to facilitate knowledge sharing and seamless project handovers.
Address and resolve data-related issues, providing technical support to data annotation teams when required.
Stay abreast of industry trends, tools, and technologies related to data engineering, and propose innovative solutions for data annotation projects.
Requirements
Bachelor’s degree in Computer Science, Data Engineering, or a related field.
Proven experience as a Data Engineer, with 4 years of hands-on experience in data pipeline design, data transformation, pipeline orchestration, and data integration, particularly for unstructured and semi-structured data.
Proficiency in programming languages such as Python, SQL, or Scala, and experience with data manipulation libraries and frameworks.
Experience with Apache Airflow or n8n is a plus.
Experience with Ruby is a big plus.
Knowledge and experience with machine learning projects is a big plus.
Solid knowledge of data storage and database management systems, including relational and NoSQL databases.
Familiarity with data visualization tools and techniques to facilitate data understanding and analysis.
Experience with AWS QuickSight and AWS Athena is a plus.
Solid understanding of data quality and data governance principles.
Familiarity with data lake concepts and with Apache Iceberg.
Experience with cloud-based data platforms, such as AWS, GCP, or Azure, is a plus.
Strong problem-solving skills with a keen eye for detail.
Excellent communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
A passion for data engineering and a desire to contribute to impactful data annotation projects.
Tech Stack
Airflow
AWS
Azure
Google Cloud Platform
NoSQL
Python
Ruby
Scala
SQL
Benefits
LXT is an equal opportunity employer and ensures that no applicant is subject to less favorable treatment on the grounds of gender, gender identity, marital status, race, color, nationality, ethnicity, age, sexual orientation, socio-economic background, responsibility for dependents, or physical or mental disability.
Any hiring decision is made on the basis of skills, qualifications, and experience.
We measure our success as a business not only by delivering great products and services and continually increasing our assets under administration and market share, but also by how we positively impact people, society, and the planet.