LLM Evaluator (Model Response Analyst)

LA RÉUNION, FRANCE
Posted 2 days ago

Job Title: LLM Evaluator (Model Response Analyst)

Location: Remote (Worldwide)

Job Summary: We are seeking a detail-oriented and analytical LLM Evaluator to assess, analyze, and improve the performance of large language models (LLMs). In this role, you will evaluate AI-generated content for accuracy, coherence, factual reliability, bias, safety, and alignment with defined guidelines.

Responsibilities

  • Evaluate and rank model-generated text based on complex rubrics covering dimensions such as factuality, coherence, safety, instruction‑following, and creativity.
  • Review multiple model responses to the same prompt and determine which output a human would prefer, providing justifications for your choices.
  • Provide clear, concise feedback to the modeling and training teams regarding recurring failure modes observed during evaluation sessions.
  • Attempt to “break” the model by crafting prompts designed to elicit biased, harmful, or insecure outputs to help patch safety vulnerabilities.
  • Collaborate with the quality assurance team to suggest improvements to evaluation guidelines when you encounter ambiguous or unclassifiable edge cases.
  • Participate in regular “cross-checking” sessions with other evaluators to calibrate scoring standards and ensure inter‑rater reliability across the global team.
  • When a model underperforms, look beyond the surface score and hypothesize why the model made a specific error (e.g., a gap in the training data vs. a misinterpretation of the prompt).
  • Identify and flag novel or unexpected model behaviors to the research team, contributing to a living library of unique model outputs and failure modes.

Requirements

  • Minimum of 2 years of professional experience in a relevant field such as computational linguistics, data analysis, technical writing, quality assurance (specifically for NLP/AI), or cognitive science.
  • Bachelor’s degree in Computer Science or a related field.
  • Deep understanding of how to craft prompts to elicit specific behaviors and test model limits.
  • Ability to look at a text output and explain “why” it is “good” or “bad” based on logic, tone, factuality, and instruction adherence.
  • Experience working with Reinforcement Learning from Human Feedback (RLHF) data collection.
  • Proven experience monitoring and improving consistency among evaluation teams. Ability to analyze inter-annotator agreement (IAA) scores and conduct calibration sessions to align judgment.
  • Experience sourcing, cleaning, and annotating datasets specifically for fine‑tuning or evaluating LLMs. Understanding of data distribution and its impact on model performance.
  • Familiarity with A/B testing concepts applied to AI. Ability to help design experiments to test if a new model version is truly “better” than the previous one.

Company: Odixcity Consulting

Publishing platform: WHATJOBS