
Site Reliability Engineer (Data Platform)

PARIS, 75
20 hours ago

Requirements

  • Strong experience with Kubernetes in production environments
  • Experience with distributed data systems (or strong willingness to learn)
  • Solid understanding of SRE principles (monitoring, alerting, SLAs/SLOs)
  • Experience with Infrastructure as Code (Terraform or similar)
  • Familiarity with GitOps workflows
  • Experience with observability tools (Prometheus, Grafana, logging systems)
  • Comfortable working in cloud environments
  • Strong collaboration mindset and ability to work across teams
  • Fluent in English
  • (Desirable) Experience with Trino, Iceberg, or data lakehouse architectures
  • (Desirable) Experience with Ceph S3 or object storage systems
  • (Desirable) Knowledge of Kafka / Flink / Airflow
  • (Desirable) Experience with FinOps practices and cost optimization
  • (Desirable) Experience with Crossplane or platform self-service models
  • (Desirable) Programming skills (Python, Java, or Go)
  • (Desirable) Experience with multi-region / multi-DC architectures

What the job involves

  • Being an SRE at VeepeeTech means being part of a transversal SRE community while joining a product-oriented Data Platform team
  • You will contribute to the reliability, scalability, and operability of critical data services by applying SRE and DevOps practices, while sharing knowledge across teams
  • The Data Platform is currently evolving toward a modern lakehouse architecture deployed on VeepeeCloud (our on-prem platform), based on technologies such as Trino, Iceberg, and object storage, with strong ambitions around performance, cost efficiency, and platform ownership
  • You will work in a distributed environment (France & Spain), within a team of 40–50 data professionals across engineering, analytics, data science, and governance
  • You will play a key role in ensuring the reliability and scalability of this next-generation data platform, while supporting the transition from public cloud to hybrid/on-prem architectures
  • Ensure reliability and performance of our data platform services (Trino, Iceberg, S3, Kafka, Flink)
  • Define and implement SRE best practices: SLIs/SLOs, error budgets, observability
  • Build and maintain monitoring, alerting, and incident response frameworks (Prometheus, Grafana, etc.)
  • Contribute to the migration from the public-cloud data warehouse to the VeepeeCloud lakehouse stack
  • Support the coexistence of cloud and on-prem systems, ensuring consistency and reliability across both
  • Help design resilient architectures for ingestion, transformation, and serving layers
  • Operate and improve services running on Kubernetes (GKE/EKS & on-prem clusters)
  • Automate infrastructure provisioning using Terraform, Atlantis, and/or Crossplane
  • Improve GitOps workflows for platform deployment and configuration
  • Collaborate with teams to optimize compute/storage usage (Trino queries, BigQuery slots, etc.)
  • Build tools and dashboards to track cost, usage, and efficiency
  • Support the transition toward cost-efficient on-prem workloads
  • Improve self-service capabilities for data teams (e.g., provisioning Trino/Iceberg resources)
  • Help teams adopt best practices in reliability, observability, and deployment
  • Write clear technical documentation and runbooks
  • Contribute to the definition and implementation of the Disaster Recovery Plan (DRP)
  • Ensure multi-DC resilience (FR1 / NL1) and data replication strategies
  • Participate in incident management and postmortems
Company
Deepstreamtech
Publishing platform
WHATJOBS