Site Reliability Engineer (SRE)
PARIS, 75
Posted 5 days ago
We are looking for a Site Reliability Engineer to strengthen our Infrastructure & Security department and help us scale our internal and customer-facing platforms.
Responsibilities
- Operate, maintain, and improve production and internal infrastructure environments across cloud and on‑premises platforms.
- Contribute to both run activities (incident response, monitoring, support, troubleshooting, maintenance, and reliability improvements) and build activities (architecture evolution, automation, migration, tooling, and platform transformation).
- Help design, build, and maintain resilient, scalable, secure, observable, and cost‑efficient infrastructure.
- Lead or contribute to technical migrations, modernization projects, and architecture transformation initiatives.
- Strengthen operational processes: incident management, change management, backup and restore, disaster recovery, on‑call practices, documentation, and post‑incident reviews.
- Improve observability across systems, services, and infrastructure through metrics, logs, traces, dashboards, alerting, and SLOs.
- Promote a strong production mindset across teams, with a focus on reliability, performance, security, customer impact, and operational simplicity.
- Collaborate closely with Engineering teams to improve delivery quality and platform reliability.
- Contribute to Developer Experience by improving tooling, CI/CD workflows, infrastructure automation, environments, deployment processes, and self‑service capabilities.
- Support FinOps practices by monitoring costs, optimizing infrastructure usage, and helping teams make cost‑aware decisions.
- Build automation and tooling to reduce manual work, improve repeatability, and make infrastructure easier to operate.
- Participate in technical architecture discussions and provide guidance to infrastructure and engineering teams.
- Contribute to infrastructure roadmaps, technical standards, best practices, and long‑term platform strategy.
- Maintain strong documentation and knowledge‑sharing practices.
Success criteria (6 to 12 months)
- Built strong trust with Infrastructure, Engineering, Security, and Product teams.
- Demonstrated strong ownership of production systems and contributed to improving reliability, stability, and operational maturity.
- Helped improve observability through better dashboards, alerts, metrics, logs, traces, or SLOs.
- Contributed to reducing operational toil through automation, documentation, tooling, or process improvements.
- Helped improve incident response, post‑incident reviews, change management, or on‑call practices.
- Contributed to one or more meaningful build initiatives: migration, architecture improvement, platform modernization, CI/CD improvement, internal tooling, or developer experience enhancement.
- Showed a strong ability to work across both cloud and on‑premises environments.
- Contributed to making infrastructure more secure, scalable, performant, cost‑efficient, and easier to operate.
- Helped Engineering teams improve delivery quality and production readiness.
- Recognized as a collaborative, structured, pragmatic, and reliable technical partner.
Requirements
Technical Skills
- Strong experience in a similar role.
- Solid experience operating production environments with high reliability, availability, and performance expectations.
- Good understanding of cloud infrastructure, ideally AWS.
- Strong knowledge of systems, networking, DNS, load balancing, security fundamentals, and infrastructure troubleshooting.
- Experience with infrastructure as code, automation, configuration management, and CI/CD pipelines.
- Experience with observability tools: metrics, logs, traces, alerting, dashboards, SLOs, SLIs.
- Good understanding of containers, orchestration, service discovery, secrets management, and modern platform architecture.
- Experience with incident management, post‑incident reviews, backup and restore, disaster recovery, capacity planning, and operational processes.
- Ability to write scripts, automation, or internal tooling to reduce manual work and improve reliability.
- Understanding of security best practices for infrastructure, cloud, identity, secrets, network segmentation, and production access.
- Interest or experience in FinOps, cost optimization, performance optimization, and infrastructure efficiency.
- Experience with developer experience, internal platforms, self‑service tooling, or platform engineering is a strong plus.
Soft Skills
- Strong production mindset: reliability, customer impact, resilience, security, and operational excellence.
- Excellent communication skills with both technical and non‑technical stakeholders.
- Structured, rigorous, autonomous, and pragmatic approach.
- Ability to lead technical initiatives, migrations, or architecture discussions.
- Collaborative, curious, and committed to continuous improvement.
Company
StrangeBee
Published on
WHATJOBS