Optimize emerging AI inference workloads such as Large Language Models (LLMs) and diffusion models on GPUs
Develop and optimize graph-based compilation flows (e.g., MLIR/LLVM) for neural network workloads
Write and tune performance-critical GPU kernels and runtime code in C++ or parallel programming languages
Identify and resolve bottlenecks across compiler, runtime, and kernel layers
Profile, benchmark, and characterize AI workloads to validate performance gains
Collaborate with hardware, driver, and framework teams on hardware/software co-optimization
Requirements
Bachelor's degree in Computer Science or a related field with 4+ years of relevant experience, OR a Master's degree with 2+ years of relevant experience
Strong C++ development and debugging skills
Solid understanding of GPU or AI accelerator architectures
Hands-on experience running modern neural network architectures for inference on hardware accelerators
Preferred Qualifications
PhD with 1+ years of relevant experience
Experience optimizing end-to-end real-world AI workloads
Familiarity with OpenVINO or other AI inference frameworks
Knowledge of neural network optimization techniques and performance tradeoffs
Experience across multiple layers of the AI software stack, including AI inference engines or runtimes, graph compilers (e.g., MLIR/LLVM), and GPU kernels or other performance-critical compute code