LLM Evaluation Expert Engineer

LYON, 69

Posted 1 day ago

Level of qualifications required: Graduate degree or equivalent

Function: Support functions

Level of experience: 5 to 12 years

Context

Following the priorities established in May 2024 by the Seoul Declaration for Safe, Innovative and Inclusive AI, to which France is a signatory, the French government decided to create INESIA, an institute whose mission is to bring together, without creating a new legal entity, the national stakeholders involved in AI evaluation and safety, in particular:

the French Cybersecurity Agency (ANSSI),
the National Laboratory of Metrology and Testing (LNE),
the Digital Regulation Expertise Center (PEReN),
and the French National Institute for Research in Digital Science and Technology (Inria).

Within this framework, Inria primarily contributes to activities related to systemic risk analysis in the field of national security, as well as the evaluation of the performance and reliability of AI models.

This work is strategically coordinated with Inria’s AI Evaluation research program and takes the form of the design and development of an AI evaluation platform, particularly focused on systems based on Large Language Models (LLMs).

The platform aims to provide an integrated, secure, and robust environment supporting the program’s research projects, while enabling the development of evaluation applications such as benchmarking campaigns and red teaming exercises. It relies on open-source tools from the AI ecosystem as well as internally developed components.

You will join a team operating in a fast‑paced, iterative development environment: the platform will evolve progressively through regular operational deliverables. We are looking for individuals capable of proposing solutions, making technical trade‑offs, and transforming technical requirements into operational systems.

As an LLM Evaluation Expert, you will play a central role in defining evaluation methodologies.

This position offers the opportunity to contribute to a strategic and ambitious project at the heart of current challenges related to AI safety, transparency, and governance, spanning technical, scientific, and societal dimensions.

Assignment

Design, structure, and implement evaluation protocols for LLM-based models and systems, and integrate them into the platform's modular architecture.

Main activities

Define and implement evaluation protocols, benchmarks, and metrics
Analyze and interpret evaluation results in order to derive methodological recommendations
Contribute to the definition of the platform's software architecture
Document methodologies and evaluation procedures

Required Skills

Experience in AI model evaluation (metrics, experimental protocols) with a strong scientific understanding of the field
Strong understanding of how LLMs operate
Excellent proficiency in Python and the ML ecosystem
Familiarity with software development best practices (Git versioning, CI/CD, documentation)
Ability to write technical documentation

Preferred Skills

Experience with evaluation frameworks (Inspect, LightEval, etc.)
General knowledge of AI software engineering, particularly for scientific experimentation
Familiarity with web application deployment tools (Docker, docker‑compose, CI/CD)

Additional Appreciated Skills

Technical English proficiency, both written and spoken
Awareness of AI trustworthiness and safety challenges

Location & Eligibility

The position may be situated in a restricted area (ZRR). Authorization to enter is granted by the director of the unit following a favourable Ministerial decision, as defined in the relevant decree. People with disabilities may apply under Inria’s diversity policy.

Company

Inria

Publication platform

WHATJOBS
