Design, develop, and optimize new features and algorithms for oneDNN targeting Intel processors, Intel Processor Graphics, and Intel discrete GPUs.
Perform performance analysis and optimization to achieve best‑in‑class deep‑learning inference and training throughput on current and next‑generation Intel platforms.
Develop hardware‑specific parallel algorithms, including multithreading, vectorization, and memory‑layout optimizations.
Contribute to assembly‑level programming and low-level performance tuning for Intel microarchitectures.
Collaborate with cross‑functional teams across software engineering, architecture, and AI performance to ensure strong integration with Intel’s broader AI ecosystem.
Engage with the open‑source community, participate in code reviews, and maintain high-quality coding and documentation standards.
Requirements
Master's degree or PhD in Mathematics, Physics, Computer Science, or a related field
5+ years of experience in the following areas: C++, algorithms and data structures, or a mathematical background
Low-level performance optimization, preferably on GPUs
3+ years of high-performance computing (HPC) application development (preferred)
1+ year of machine learning and deep learning algorithms (preferred)
1+ year in an Agile software development environment (preferred)
1+ year with Intel development tools (preferred)
Software library design and architecture (preferred)
Background in linear algebra solvers, matrix-vector operations, or fast Fourier transforms (preferred)
Software development on Linux (preferred)
GPU optimizations (OpenCL, CUDA, SYCL/DPC++, C for Metal or similar) (preferred)
Parallel programming (OpenMP, TBB, or MPI) (preferred)
Tech Stack
Assembly
Linux
Benefits
stock program
annual and quarterly bonuses
pension plan
medical and life insurance for you and your family