Site Reliability Engineer (SRE)

PARIS, 75
5 days ago

Site Reliability Engineer

We are looking for a Site Reliability Engineer to strengthen our Infrastructure & Security department and help us scale our internal and customer-facing platforms.

Responsibilities

  • Operate, maintain, and improve production and internal infrastructure environments across cloud and on‑premise platforms.
  • Contribute to both run activities (incident response, monitoring, support, troubleshooting, maintenance, and reliability improvements) and build activities (architecture evolution, automation, migration, tooling, and platform transformation).
  • Help design, build, and maintain resilient, scalable, secure, observable, and cost‑efficient infrastructure.
  • Lead or contribute to technical migrations, modernization projects, and architecture transformation initiatives.
  • Strengthen operational processes: incident management, change management, backup and restore, disaster recovery, on‑call practices, documentation, and post‑incident reviews.
  • Improve observability across systems, services, and infrastructure through metrics, logs, traces, dashboards, alerting, and SLOs.
  • Promote a strong production mindset across teams, with a focus on reliability, performance, security, customer impact, and operational simplicity.
  • Collaborate closely with Engineering teams to improve delivery quality and platform reliability.
  • Contribute to Developer Experience by improving tooling, CI/CD workflows, infrastructure automation, environments, deployment processes, and self‑service capabilities.
  • Support FinOps practices by monitoring costs, optimizing infrastructure usage, and helping teams make cost‑aware decisions.
  • Build automation and tooling to reduce manual work, improve repeatability, and make infrastructure easier to operate.
  • Participate in technical architecture discussions and provide guidance to infrastructure and engineering teams.
  • Contribute to infrastructure roadmaps, technical standards, best practices, and long‑term platform strategy.
  • Maintain strong documentation and knowledge sharing practices.

Success criteria (6 to 12 months)

  • Built strong trust with Infrastructure, Engineering, Security, and Product teams.
  • Demonstrated strong ownership of production systems and contributed to improving reliability, stability, and operational maturity.
  • Helped improve observability through better dashboards, alerts, metrics, logs, traces, or SLOs.
  • Contributed to reducing operational toil through automation, documentation, tooling, or process improvements.
  • Helped improve incident response, post‑incident reviews, change management, or on‑call practices.
  • Contributed to one or more meaningful build initiatives: migration, architecture improvement, platform modernization, CI/CD improvement, internal tooling, or developer experience enhancement.
  • Shown strong ability to work across both cloud and on‑premise environments.
  • Contributed to making infrastructure more secure, scalable, performant, cost‑efficient, and easier to operate.
  • Helped Engineering teams improve delivery quality and production readiness.
  • Recognized as a collaborative, structured, pragmatic, and reliable technical partner.

Requirements

Technical Skills

  • Strong experience in a similar role.
  • Solid experience operating production environments with high reliability, availability, and performance expectations.
  • Good understanding of cloud infrastructure, ideally AWS.
  • Strong knowledge of systems, networking, DNS, load balancing, security fundamentals, and infrastructure troubleshooting.
  • Experience with infrastructure as code, automation, configuration management, and CI/CD pipelines.
  • Experience with observability tools: metrics, logs, traces, alerting, dashboards, SLOs, SLIs.
  • Good understanding of containers, orchestration, service discovery, secrets management, and modern platform architecture.
  • Experience with incident management, post‑incident reviews, backup and restore, disaster recovery, capacity planning, and operational processes.
  • Ability to write scripts, automation, or internal tooling to reduce manual work and improve reliability.
  • Understanding of security best practices for infrastructure, cloud, identity, secrets, network segmentation, and production access.
  • Interest or experience in FinOps, cost optimization, performance optimization, and infrastructure efficiency.
  • Experience with developer experience, internal platforms, self‑service tooling, or platform engineering is a strong plus.

Soft Skills

  • Strong production mindset: reliability, customer impact, resilience, security, and operational excellence.
  • Excellent communication skills with both technical and non‑technical stakeholders.
  • Structured, rigorous, autonomous, and pragmatic approach.
  • Ability to lead technical initiatives, migrations, or architecture discussions.
  • Collaborative, curious, and committed to continuous improvement.
Company
StrangeBee
Publishing platform
WHATJOBS