LLM Evaluation Expert Engineer
Level of qualifications required: Graduate degree or equivalent
Function: Support functions
Level of experience: From 5 to 12 years
Context
Following the priorities established in May 2024 by the Seoul Declaration for Safe, Innovative and Inclusive AI, to which France is a signatory, the French government decided to create INESIA, an institute whose mission is to bring together, without creating a new legal entity, national stakeholders involved in AI evaluation and safety, in particular:
- the French Cybersecurity Agency (ANSSI),
- the National Laboratory of Metrology and Testing (LNE),
- the Digital Regulation Expertise Center (PEReN),
- and the French National Institute for Research in Digital Science and Technology (Inria).
Within this framework, Inria primarily contributes to activities related to systemic risk analysis in the field of national security, as well as the evaluation of the performance and reliability of AI models.
This work is strategically coordinated with Inria’s AI Evaluation research program and takes concrete form in the design and development of an AI evaluation platform, focused particularly on systems based on Large Language Models (LLMs).
The platform aims to provide an integrated, secure, and robust environment supporting the program’s research projects, while enabling the development of evaluation applications such as benchmarking campaigns and red teaming exercises. It relies on open-source tools from the AI ecosystem as well as internally developed components.
You will join a team operating in a fast‑paced, iterative development environment: the platform will evolve progressively through regular operational deliverables. We are looking for individuals capable of proposing solutions, making technical trade‑offs, and transforming technical requirements into operational systems.
As an LLM Evaluation Expert, you will play a central role in defining evaluation methodologies.
This position offers the opportunity to contribute to a strategic and ambitious project at the heart of current challenges related to AI safety, transparency, and governance, spanning technical, scientific, and societal dimensions.
Assignment
Design, structure, and implement evaluation protocols for LLM-based models and systems, and integrate them into the platform's modular architecture.
Main activities
- Define and implement evaluation protocols, benchmarks, and metrics
- Analyze and interpret evaluation results in order to derive methodological recommendations
- Contribute to the definition of the platform's software architecture
- Document methodologies and evaluation procedures
Required Skills
- Experience in AI model evaluation (metrics, experimental protocols) with a strong scientific understanding of the field
- Strong understanding of how LLMs operate
- Excellent proficiency in Python and the ML ecosystem
- Familiarity with software development best practices (Git versioning, CI/CD, documentation)
- Ability to write technical documentation
Preferred Skills
- Experience with evaluation frameworks (Inspect, LightEval, etc.)
- General knowledge of AI software engineering, particularly for scientific experimentation
- Familiarity with web application deployment tools (Docker, docker-compose, CI/CD)
Additional Appreciated Skills
- Technical English proficiency, both written and spoken
- Awareness of AI trustworthiness and safety challenges
Location & Eligibility
The position may be located in a restricted-access area (zone à régime restrictif, ZRR). Authorization to enter is granted by the director of the unit following a favourable Ministerial decision, as defined in the relevant decree. People with disabilities may apply under Inria’s diversity policy.