Tags: AWS, Cloud, Docker, Elasticsearch, Google Cloud Platform (GCP), Hadoop, HBase, Kubernetes, MapReduce, MongoDB, NoSQL, Python, Spark, CI/CD, Remote Work
About this role
Role Overview
We are onboarding a hands-on subject-matter expert (SME) who leverages big data technologies to solve complex data problems.
Roughly half of the time is spent on hands-on coding.
The work involves large-scale text data processing, event-driven data pipelines, in-memory computation, and optimization across the stack, from CPU cores to network I/O to disk I/O.
Use cloud-native services on AWS and GCP.
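To make the overview concrete, here is a minimal sketch of the kind of event-driven, in-memory text-processing step described above. This is purely illustrative: the function and event shape are hypothetical, and a real pipeline on AWS or GCP would consume events from a managed service such as Kinesis or Pub/Sub rather than raw bytes.

```python
import json

def handle_event(raw_event: bytes) -> dict:
    """Parse one incoming event and aggregate token counts in memory.

    Illustrative sketch only; the event schema ({"text": ...}) is a
    hypothetical stand-in for a real pipeline's message format.
    """
    event = json.loads(raw_event)           # event-driven input
    tokens = event["text"].lower().split()  # text-processing step
    counts: dict = {}
    for tok in tokens:                      # in-memory aggregation
        counts[tok] = counts.get(tok, 0) + 1
    return counts

# A single synthetic event:
result = handle_event(b'{"text": "big data big pipelines"}')
```

In production this per-event logic would run behind a stream consumer, with the optimization concerns above (CPU, network I/O, disk I/O) dictating batch sizes and serialization choices.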
Requirements
7+ years of hands-on experience in Software Development with a focus on big data and large data pipelines.
Minimum 3 years of experience building services and pipelines in Python.
Expertise with a variety of data processing systems, including streaming, event-driven, and batch processing (Spark, Hadoop/MapReduce).
Understanding of at least one NoSQL store, such as MongoDB, Elasticsearch, or HBase.
Understanding of data models, sharding, and data-placement strategies for distributed data stores in large-scale, high-throughput, high-availability environments, and of their effect on unstructured text data processing.
Experience running scalable, highly available systems on AWS or GCP.
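As a small illustration of the sharding and data-placement ideas in the requirements above, here is a hash-based shard-routing sketch. The shard count and function name are hypothetical; real stores like MongoDB, Elasticsearch, or HBase each have their own placement mechanisms.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count

def shard_for_key(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically route a document key to a shard.

    A stable hash (SHA-256, not Python's randomized hash()) is used so
    that every node in a distributed store agrees on the placement.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The same key always maps to the same shard:
assert shard_for_key("doc-42") == shard_for_key("doc-42")
```

In high-throughput environments, choices like this interact directly with unstructured text processing: keys that cluster on hot shards create the CPU and I/O bottlenecks the role is expected to diagnose.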