Observability Tech Lead

PARIS, 75

13 hours ago

Responsibilities

In 2023 we made a decisive move: we replaced our observability-as-a-service provider with a fully self-hosted observability stack, giving us complete control over cost, data residency, and the developer experience around telemetry.
Today our stack spans the full LGTM suite (Grafana, Mimir, Loki, and Tempo) alongside VictoriaMetrics, self-hosted Sentry, Grafana Alloy as our telemetry collector, and OpenTelemetry as the instrumentation standard.
We use Pyrra for SLO tracking and are building toward a unified service health dashboard powered by error budgets and burn-rate alerting.
Telemetry is the backbone of how we operate a bank at scale: it ingests over 100 million samples, serves 400+ services, and captures end-to-end traces from clients through services to system dependencies.
Every trade and every card payment depends on our ability to see, measure, and respond to what’s happening in production. We’ve proven the architecture works.
Now we’re building a dedicated in-house observability team to take it to the next level: stabilise and harden the platform, drive down cost-per-signal, and build the golden path for observability, where 100% of components ship with production-grade telemetry because the best thing to do is the easiest thing to do.
Build and evolve the observability platform: Design and operate large-scale telemetry pipelines while continuously improving core components, with a strong focus on automation, reliability, and developer experience.
Build for scale, design for cost: Architect high-throughput telemetry systems with sampling strategies, data tiering, and retention policies that balance signal fidelity against infrastructure cost.
Make production observable by default: Define and implement observability and reliability standards (SLOs, error budgets, and low-noise alerting) and actively support engineering teams in adopting them, so that doing the right thing is effortless.
Own the platform end to end: Participate in the on-call rotation for the observability platform, taking full end-to-end ownership of the systems you build and operate.
Own the direction and drive it forward: Define the long-term observability direction, drive cross-team initiatives from kickoff to delivery, and align observability strategy with broader engineering reliability and business goals.

Qualifications

Proven ability to design and operate high-throughput telemetry pipelines across distributed, multi-cloud environments.
Deep hands-on expertise with observability tooling such as Prometheus, OpenTelemetry, and Grafana (or equivalents) at scale.
Hands-on experience with Mimir, Loki, and Tempo architectures is a strong plus.
Strong command of SLO-based reliability practices: error budgets, burn-rate alerting, and incident response tooling.
A track record of turning observability best practices into opinionated standards that engineering teams actually adopt.
Ability to contribute to architectural decisions and clearly communicate trade-offs to both engineers and leadership.
5+ years of experience in observability, platform engineering, or a related SRE/infrastructure discipline.
We are hiring from senior to staff level: whether you have a strong foundation and are ready for more ownership, or have been leading observability strategy for large-scale systems for many years, we would love to hear from you.
Cloud-native in your DNA: hands-on with Kubernetes, Terraform, and running production workloads on AWS, GCP, or Azure.
The ability to work in a flexible hybrid setup, with 2-3 days a week in the office.
Experience driving cross-team technical initiatives end to end, from ambiguous problem to shipped solution.


Company

Trade Republic

Publishing platform

WHATJOBS
