AI Kernel Optimization Engineer
The Role
As an AI Kernel Optimization Engineer, you will play a key role in pushing the limits of AI inference performance on Openchip RISC-V platforms.
You will design, implement, and optimize AI compute kernels (generative AI large language models, AI vision, CNNs, etc.) and runtime components to fully exploit the underlying hardware architecture, from vector/matrix units and memory hierarchies down to the assembly level.
Your work will directly influence how efficiently AI models run on Openchip SoCs, shaping the performance of next-generation inference accelerators. You will collaborate closely with hardware architects, compiler engineers, and AI framework developers to achieve optimal hardware–software co-design.
Key Responsibilities
- Develop, optimize, profile, and debug AI compute kernels (e.g., GEMM, attention, activation) targeting Openchip RISC-V architectures.
- Identify and resolve performance bottlenecks at the ISA, compiler, and runtime levels.
- Collaborate with the hardware and architecture teams to guide design decisions that improve real‑world performance.
- Contribute to the development and tuning of AI runtime and graph execution engines.
- Evaluate and benchmark AI inference workloads on Openchip platforms.
- Implement performance analysis tools and scripts to automate profiling and validation.
- Work with AI frameworks (e.g., PyTorch, ONNX Runtime, TensorRT, TVM) to ensure efficient mapping to Openchip targets.
- Stay up to date on AI kernel optimization trends, emerging hardware acceleration techniques, and open‑source contributions.
Required Qualifications
- MSc or PhD in Computer Engineering or Computer Science, or equivalent practical experience.
- 3+ years of experience in performance optimization for AI inference or HPC use cases.
Technical Skills
- Strong background in low‑level performance optimization (vectorization, memory access optimization, loop unrolling, instruction scheduling, data‑tiling, etc.).
- Proficiency in C/C++ and a good understanding of assembly‑level optimizations (SIMD, intrinsics, compiler flags).
- Solid understanding of CPU/GPU/AI accelerator architecture (pipelines, caches, memory hierarchies, compute units).
- Experience with RISC‑V architectures or other custom ISAs.
- Experience with profiling and performance analysis tools (perf, VTune, nvprof, etc.).
- Strong knowledge of parallel programming (SIMD, multithreading, OpenMP, CUDA, or similar).
- Solid software engineering skills (version control, CI/CD, testing).
Nice to Have
- Experience with AI inference workloads or libraries (e.g., BLAS, cuDNN, oneDNN, TVM, or similar).
- Familiarity with MLIR/LLVM or other compiler infrastructures.
- Contributions to open‑source AI inference engines or kernel libraries.
- Understanding of chiplet‑based architectures or heterogeneous computing.
- Experience with quantization and mixed precision inference.
Soft Skills
- Passion for performance and a detail‑oriented mindset.
- Strong analytical and problem‑solving abilities.
- Proactive, collaborative, and open to cross‑disciplinary work.
- Curious and self‑driven, with a learning mindset.
- Comfortable working in an international, fast‑evolving startup environment.
What We Offer
- Contract: Permanent contract (CDI)
- Start date: Beginning of 2026
- Location: Montbonnot‑Saint‑Martin (near Grenoble)
- Remote policy: Up to 2 days per week remote work possible
- Benefits: Meal vouchers, premium health coverage, sustainable mobility incentives, generous paternity leave, etc.
- Remuneration: A package that values your experience
- Opportunity to travel to other countries in Europe to meet the teams, collaborate and drive solutions
At Openchip & Software Technologies S.L., we believe a diverse and inclusive team is the key to groundbreaking ideas.
We foster a work environment where everyone feels valued, respected, and empowered to reach their full potential, regardless of race, gender, ethnicity, sexual orientation, or gender identity.