Lead LLM Engineer
PARIS, 75
6 days ago
Licorne Society has been commissioned by a fast-growing AI startup to help them find their Lead LLM Engineer.
What You Will Own
You will be responsible for one thing: making our AI outputs reliable, fast, and indispensable in real workflows.
Responsibilities
- Design and evolve our LLM / agent architecture
- Own output quality across key use cases (emails, document analysis, etc.)
- Build evaluation systems (datasets, metrics, regression detection)
- Drive fast iteration loops from production data
- Improve retrieval, reasoning, and tool usage
- Ensure production reliability (latency, failure modes, fallback)
- Work directly with product + founders on what to build and why
What This Role Is Really About
- They don’t know what “good output” means
- They don’t have evals
- They iterate randomly
- They overuse agents
Your job is to fix that.
What You Will Turn
- Vague user problems into structured AI systems
- With measurable performance
- That improve every week
What You Need To Be Excellent At
- Shipping real LLM systems
  - Built systems used in production (not demos)
  - Understand RAG, tools, agents, structured outputs
  - Can design full pipelines, not just prompts
- Evaluation-driven development
  - Define quality metrics
  - Build datasets from real usage
  - Run continuous evals to prevent regressions
- Debugging complex failures
  - Trace issues across retrieval, prompts, and model behavior
  - Do not guess: isolate and fix
- Speed of iteration
  - Move from problem to improvement in hours or days, not weeks
  - Use logs, traces, and data, not intuition alone
- Strong judgment
  - Know when to use an agent vs. a pipeline
  - Know when to add complexity vs. simplify
  - Optimize for reliability and user value, not novelty
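To make the "evaluation-driven development" expectation concrete, here is a minimal sketch of the kind of regression eval described above: score outputs over a small dataset and flag drops against a stored baseline. All names (`exact_match`, `run_evals`, the toy cases) are illustrative assumptions, not part of the role's actual stack.

```python
def exact_match(output: str, expected: str) -> float:
    """Toy quality metric: 1.0 on an exact match, else 0.0.
    Real systems would use task-specific metrics (illustrative only)."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_evals(model, dataset, baseline_score, threshold=0.02):
    """Score `model` over `dataset`; report a regression if the mean
    score falls more than `threshold` below the stored baseline."""
    scores = [exact_match(model(case["input"]), case["expected"])
              for case in dataset]
    mean_score = sum(scores) / len(scores)
    regressed = mean_score < baseline_score - threshold
    return mean_score, regressed

# Usage: a stand-in "model" (a dict lookup) plus cases built from usage data.
dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
fake_model = {"2+2": "4", "capital of France": "Paris"}.get
score, regressed = run_evals(fake_model, dataset, baseline_score=1.0)
print(score, regressed)  # 1.0 False
```

In practice the dataset would be sampled from production traces and the metric swapped for one per use case; the loop itself stays this simple.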
What We Don’t Care About
- Number of years of experience
- Whether you’ve used a specific framework
- Fancy research credentials
Si vous pouvez construire, déboguer et améliorer des systèmes réels, vous êtes un candidat idéal.
What Success Looks Like (First 90 Days)
- Clear eval framework for core use cases
- Measurable improvement in output quality
- Faster iteration cycles across the team
- Reduced hallucinations / failures
- Stronger system architecture decisions
Stack (Context, Not Requirements)
- Python (FastAPI)
- Postgres
- Google Cloud
- LangGraph / LangChain (evolving)
- PostHog (product analytics)
- Langfuse (LLM traces)
- LLM APIs (Azure OpenAI)
Company
Leonar
Publishing platform
WHATJOBS