Site Reliability Engineer (SRE)
PARIS, 75
Posted 1 day ago
Job Description
We are looking for a Site Reliability Engineer to strengthen our Infrastructure & Security department and help us scale our internal and customer-facing platforms.
In this role, you will contribute to both run and build activities: operating production environments, improving reliability, leading technical transformation initiatives, and designing modern, scalable, secure, and observable infrastructure.
You will work across cloud and on-premise environments, collaborate closely with Engineering, QA, Data, Security, and Product teams, and help improve our developer experience, operational excellence, production mindset, and infrastructure maturity.
Responsibilities
- Operate, maintain, and improve production and internal infrastructure environments across cloud and on-premise platforms.
- Contribute to both run activities, such as incident response, monitoring, support, troubleshooting, maintenance, and reliability improvements, and build activities, such as architecture evolution, automation, migration, tooling, and platform transformation.
- Help design, build, and maintain resilient, scalable, secure, observable, and cost-efficient infrastructure.
- Lead or contribute to technical migrations, modernization projects, and architecture transformation initiatives.
- Strengthen operational processes: incident management, change management, backup and restore, disaster recovery, on-call practices, documentation, and post-incident reviews.
- Improve observability across systems, services, and infrastructure through metrics, logs, traces, dashboards, alerting, and SLOs.
- Promote a strong production mindset across teams, with a focus on reliability, performance, security, customer impact, and operational simplicity.
- Collaborate closely with Engineering, QA, Data, Security, and Product teams to improve delivery quality and platform reliability.
- Contribute to Developer Experience by improving tooling, CI/CD workflows, infrastructure automation, environments, deployment processes, and self-service capabilities.
- Support FinOps practices by monitoring costs, optimizing infrastructure usage, and helping teams make cost-aware decisions.
- Build automation and tooling to reduce manual work, improve repeatability, and make infrastructure easier to operate.
- Participate in technical architecture discussions and provide guidance to infrastructure and engineering teams.
- Contribute to infrastructure roadmaps, technical standards, best practices, and long-term platform strategy.
- Maintain strong documentation and knowledge-sharing practices.
Success Criteria (6 to 12 months)
- Built strong trust with Infrastructure, Engineering, Security, and Product teams.
- Demonstrated strong ownership of production systems and contributed to improving reliability, stability, and operational maturity.
- Helped improve observability through better dashboards, alerts, metrics, logs, traces, or SLOs.
- Contributed to reducing operational toil through automation, documentation, tooling, or process improvements.
- Helped improve incident response, post-incident reviews, change management, or on-call practices.
- Contributed to one or more meaningful build initiatives: migration, architecture improvement, platform modernization, CI/CD improvement, internal tooling, or developer experience enhancement.
- Shown strong ability to work across both cloud and on-premise environments.
- Contributed to making infrastructure more secure, scalable, performant, cost-efficient, and easier to operate.
- Helped Engineering teams improve delivery quality and production readiness.
- Recognized as a collaborative, structured, pragmatic, and reliable technical partner.
Requirements
Technical Skills
- Strong experience as a Site Reliability Engineer or in a similar role.
- Solid experience operating production environments with high reliability, availability, and performance expectations.
- Good understanding of cloud infrastructure, ideally AWS.
- Strong knowledge of systems, networking, DNS, load balancing, security fundamentals, and infrastructure troubleshooting.
- Experience with infrastructure as code, automation, configuration management, and CI/CD pipelines.
- Experience with observability tools: metrics, logs, traces, alerting, dashboards, SLOs, SLIs.
- Good understanding of containers, orchestration, service discovery, secrets management, and modern platform architecture.
- Experience with incident management, post-incident reviews, backup and restore, disaster recovery, capacity planning, and operational processes.
- Ability to write scripts, automation, or internal tooling to reduce manual work and improve reliability.
- Understanding of security best practices for infrastructure, cloud, identity, secrets, network segmentation, and production access.
- Interest or experience in FinOps, cost optimization, performance optimization, and infrastructure efficiency.
- Experience with developer experience, internal platforms, self-service tooling, or platform engineering is a strong plus.
Soft Skills
- Strong production mindset: reliability, customer impact, resilience, security, and operational excellence.
- Excellent communication skills with both technical and non-technical stakeholders.
- Structured, rigorous, autonomous, and pragmatic approach.
- Ability to lead technical initiatives, migrations, or architecture discussions.
- Collaborative, curious, and committed to continuous improvement.
Company
StrangeBee SAS
Publishing platform
WHATJOBS