Research Engineer
PARIS, 75
13 hours ago
We’re building foundation models for scientific signals, starting with the brain (EEG, fMRI, ultrasound), and we aim to extend these approaches to other complex domains.
We work with temporal signals collected under real-world constraints (e.g., clinical settings with limited sample sizes), so our data is noisy and heterogeneous.
What the Work Looks Like
As the architect of our “Data & Training Factory,” you’ll:
- Scale Pre-training to New Frontiers
  - Implement and supervise large-scale pre-training runs on cutting-edge architectures (Transformers, SSMs/Mamba, long-context windows).
  - Ensure convergence and performance across distributed resources (Scaleway/GENCI).
  - Practice “Deep Debugging”: Diagnose why models converge, or fail to, at a mathematical and technical level.
  - Fine-tune pretrained models on both open-source and proprietary data.
- Engineer Multimodal Data Pipelines
  - Build robust pipelines to aggregate, align, and clean heterogeneous datasets (EEG/fMRI).
  - Tackle the complexity of signal normalisation, temporal/spatial alignment, and data versioning.
  - Champion Data-Centric AI: Ensure every byte feeding our models is pristine and traceable.
- Uphold Scientific Rigor & Software Standards
  - Establish internal leaderboards and data tests to validate progress against the SOTA.
  - Write production-ready research code: Typed, tested, and documented by default.
  - Collaborate with the team on research papers.
Requirements
- 3–7 years of experience in applied research or R&D.
- Distributed Training: Proven track record with large‑scale model training.
- PyTorch Mastery: Deep knowledge of internals (memory, kernels, attention mechanisms).
- Data Engineering: Robust pipelines, versioning, and data quality.
- Scientific Rigor: Experience in research environments with high software standards.
- Fluent English (the team speaks English in the day‑to‑day).
Bonus Skills
- Multimodal Data: EEG, fMRI, time‑series, or sensor signals.
- Low‑Level Optimisation: FlashAttention, CUDA kernels, or custom hardware.
- Cloud Infrastructure: Scaleway or similar GPU cloud environments.
- Research Publications: NeurIPS, ICML, ICLR, or equivalent.
- Data‑Centric AI: Curation, alignment, and cleaning of heterogeneous datasets.
Beyond Technical Skills
- Integrity & Respect: We strive for honesty, kindness, and fairness. We value people who treat others with dignity and foster an environment where everyone feels heard.
- Open Communication & Humility: Great ideas come from collaboration. We look for teammates who listen actively, communicate clearly, and approach challenges with self‑awareness and humility.
- Psychological Safety & Camaraderie: We strive to create a space where people feel safe to take risks, ask questions, and grow.
Company
Sigma Nova
Publishing platform
WHATJOBS