Full-stack ML Engineer

PARIS, 75

2 days ago

Integrate and serve new models on GPUs — image, video, 3D, audio and text — from open-source weights or external provider APIs, behind one consistent interface.
Build and operate training and fine-tuning pipelines (LoRA, full fine-tunes) so customers can train custom models reliably and cheaply.
Own the processing layer around the models and generated assets — captioning, subtitles, upscaling, reframing, compositing, masking, mesh and texture rendering, etc.
Own GPU economics and reliability: latency budgets, cold starts, cost per generation, and the checkpoints/weights pipeline that ships model files to production.
Go deep in open-source code: read, debug and patch the model libraries and model repos you depend on — transformers, diffusers and friends. Fix issues upstream rather than waiting for a release.
Push past the model layer: tune the compute layer and infrastructure (GPU provisioning, serving, the cloud), open PRs in the cloud API to register and wire up new models, extend the SDKs, and step up on LLM and agent integration.
Bring AI pair‑programming into the ML team's daily flow — the prompts, agents and Claude‑Code‑driven workflows that make model integration faster.

Owns

The model layer end to end: integration, training, serving and the processing pipelines.
GPU economics and reliability: latency, cold starts, cost per generation, and the checkpoints/weights pipeline.
Cross‑stack reach: the compute layer and infra, model registration in the cloud API, SDK extensions, and LLM/agent integration.
AI‑native ML workflows: the prompts, agents and tooling that make the team faster.

Does not own

Frontend surfaces and the web app (Engineering – Front‑End) — though you'll test through it and even open issues / PRs there when a model needs it.
Cloud API architecture decisions (Engineering – Cloud) — though you ship model‑integration PRs within it.
Product roadmap and prioritization (Product).

Strong Python and fluency in the ML ecosystem, with a track record shipping ML in production — you've deployed models on GPUs and owned latency and cost, not just notebooks.
Hands‑on across modalities: image/video diffusion (Flux, Stable Diffusion, Reve, Wan), and ideally 3D (Gaussian splatting, mesh/texture), audio or text/LLMs. Able to dive into open‑source model code and fix it or its integration: you could debug and open a PR against transformers or a model repo, not just call the library.
Deeply AI‑native: a daily user of AI coding assistants (Claude Code, Cursor) who ships an integration live in the interview rather than describing one.
A bias for action across the stack: when a model needs registering in the API or exposing in an SDK, you open the PR. Comfortable stepping up on LLM and agent integration.
Clear written English; based around Lyon or Paris (or exceptional and willing to travel to a hub).

A research‑only profile: trains models in notebooks but has never shipped one behind an API with a latency and cost budget.
Needs clean boundaries: refuses to open a PR in the cloud API or an SDK when that's where the model integration lives.
Not AI‑native in their own work — thinks in tickets and hand‑offs rather than agents and prompts, with no integration they can build live.
Treats GPU cost, cold starts and reliability as someone else's problem.


Company

Scenario

Publishing platform

WHATJOBS
