Design and develop scalable, robust, and high-performance data pipelines and data storage solutions.
Architect and implement data models and schemas optimized for both performance and scalability.
Develop and maintain ETL (Extract, Transform, Load) processes to ingest structured and unstructured data from a variety of sources.
Implement data transformation and cleaning procedures to ensure data quality and consistency.
Optimize data processing workflows to ensure efficient resource utilization and minimize latency.
Utilize big data technologies such as Hadoop, Databricks, Spark SQL, Kafka, and Hive to process and analyze large datasets.
Integrate data from multiple sources, including databases, APIs, and third-party data providers.
Manage and maintain data lakes and data warehouses, ensuring data is organized and accessible for analysis.
Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet their needs.
Provide technical guidance and support to team members and other departments as needed.
Stay current with the latest industry trends and best practices in big data engineering.
Continuously improve data processing techniques and tools to enhance the overall efficiency and effectiveness of the data engineering team.
Requirements
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
Proven experience as a Big Data Engineer or in a similar role, with a strong focus on big data technologies and data processing.
A minimum of 9 to 12 years of experience, depending on education.
Certification in big data or cloud technologies (Preferred).
Proficiency in programming languages such as Java, Scala, or Python.
Strong experience with big data technologies and frameworks (e.g., Hadoop, Databricks, Spark SQL, Kafka, Hive).
Knowledge of database systems, both SQL (e.g., PostgreSQL, MySQL, Oracle) and NoSQL (e.g., Cassandra, MongoDB).
Experience implementing and maintaining Elasticsearch clusters, ensuring data availability, reliability, and consistency across distributed environments.
Experience integrating Elasticsearch with other big data tools and frameworks, such as Apache Kafka, Hadoop, and Spark, to streamline data processing pipelines.
Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services.
Knowledge of data warehousing solutions like Amazon Redshift, Google BigQuery, or Snowflake.
Excellent analytical and problem-solving skills.
Strong verbal and written communication skills.
Ability to work effectively both independently and as part of a team.
Strong organizational skills and the ability to manage multiple tasks and priorities.
Selected applicants must be current US citizens and must hold an active Top Secret security clearance with a recent background investigation (within the last five years) or current enrollment in continuous evaluation.