Canaria is a technology product startup transforming the job market and career personalization space. The company is seeking a Software/Data Engineering Intern to optimize and extend its analytics engine, which combines several databases and technologies for efficient data processing and querying.
Responsibilities:
- Upgrade and optimize the PostgreSQL analytics engine for enhanced performance and stability
- Design and implement robust data pipelines that integrate data from NoSQL, graph, and vector databases as well as real-time Kafka streams
- Set up and optimize Spark for scalable big data querying
- Develop Kafka streams for real-time data ingestion and synchronization across multiple databases
- Containerize and deploy solutions using Docker and Kubernetes
- Conduct testing and performance tuning to identify and address bottlenecks and inefficiencies
- Ensure processed data quality meets the standards required for ML, NLP, and generative AI applications
- Write clean, efficient, and well-documented code
- Clearly communicate technical designs and concepts to stakeholders
- Optional: Develop integrations with Gemini/OpenAI APIs to demonstrate advanced analytics capabilities via natural language querying
Requirements:
- Currently pursuing a Bachelor's or Master's degree in Computer Science, Data Engineering, Software Engineering, or a related field
- Strong software development skills with proficiency in Python and Unix/Linux
- Knowledge of SQL, MongoDB, Spark, and Kafka
- Familiarity with Docker and Kubernetes
- Experience with RESTful API integration
- Understanding of scalable database design, analytics engines, and big data architectures
- Excellent problem-solving, debugging, and communication skills