AI Research Scientist – Datadog AI Research (DAIR)
Overview
As a Research Scientist on our team, you will partner with Research Engineers on fundamental research problems and collaborate with Datadog's product and engineering teams to translate research advances into products.
Building on our track record of AI‑powered solutions, Datadog AI Research tackles high‑risk, high‑reward problems grounded in real‑world challenges in cloud observability and security.
Research Areas
- World Models for Observability – Training multimodal foundation models that learn the joint dynamics of distributed systems across metrics, traces, logs, topology, and events to support forecasting, anomaly detection, root cause analysis, counterfactual simulation, and autonomous planning.
- Trained Agents for Observability – Post‑training models to operate autonomously across Datadog's domain, targeting SRE incident response first, with paths to code repair, security response, and infrastructure optimization. This work includes building simulation environments, RL training loops, and evaluation infrastructure.
What You'll Do
- Conduct research in generative AI and machine learning, building specialized foundation models and trained agents for observability.
- Train multimodal models on large‑scale, diverse telemetry data (metrics, logs, traces, topology, events) using distributed training infrastructure.
- Design and build simulated environments and RL training loops for on‑policy agent training and evaluation.
- Collaborate with cross‑functional teams (Product, Engineering) to integrate capabilities into Datadog's products.
- Stay at the forefront of foundation models, world models, and RL‑based agent research.
- Contribute to research publications, present at top‑tier conferences (NeurIPS, ICLR, ICML), and help open‑source key model artifacts and benchmarks.
Qualifications
- PhD in Computer Science, Machine Learning, or related field, with deep expertise in generative modeling, world models, AI agents, reinforcement learning, or multimodal learning (or equivalent experience).
- Extensive experience designing and implementing deep learning models and agents, with a strong background in distributed training frameworks (DeepSpeed, Megatron‑LM) and ML libraries (PyTorch).
- Track record of impactful publications at top‑tier venues (NeurIPS, ICLR, ICML, TMLR).
- Familiarity with efficient training, post‑training, and inference techniques for large foundation models.
- Ability to explain complex models and research findings to both technical and non‑technical audiences.
Bonus Points
- Experience bridging research and real‑world product applications, especially with large foundation models, world models, or RL‑trained agents.
- Passion for pushing AI boundaries with a focus on customer impact and scalable deployment.
- Experience writing production data pipelines and applications.
- Hands‑on experience with GPU programming and optimization, including CUDA.
Benefits and Growth
- Competitive global benefits.
- New hire stock equity (RSUs) and employee stock purchase plan (ESPP).
- Opportunities to collaborate closely with colleagues across Datadog offices in New York City and Paris.
- Opportunities to attend and present at conferences and meetups.
- Intra‑departmental mentor and buddy program for in‑house networking.
- An inclusive company culture, with the ability to join employee resource groups.
Equal Opportunity
Datadog is an affirmative action and equal opportunity employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.