Site Reliability Engineer
We are looking for a skilled professional to join an organization undergoing a major transformation in the payments and fintech ecosystem. The environment covers hardware, software, and services that help merchants grow their business. Innovation and technological evolution are at the core of the organization’s strategy.
You will play a key role in strengthening platform reliability and differentiating the organization through technical excellence and operational performance.
You will be part of a small Site Reliability Engineering (SRE) team responsible for the automation and industrialization of database platforms hosted on an internal cloud. Following an initial onboarding and training phase on products and services, you will work across multiple layers of infrastructure — including storage, operating systems, and networking — offering a broad technical scope for learning and development.
Your focus will be on database infrastructure operations rather than application-level development.
Day-to-Day Responsibilities
- Maintain operational conditions and ensure service continuity
- Handle daily operational activities (run operations)
- Ensure Point-In-Time Recovery (PITR) capability across all databases according to defined retention policies
- Guarantee service quality in terms of performance, availability, access, high availability, and SLA compliance
- Develop bug fixes and new features for Puppet and Ansible modules
- Execute production changes following operational procedures
- Ensure compliance with PCI-DSS requirements
- Produce and maintain documentation related to tools and operational processes
- Participate in a 24/7 on-call rotation for at least one week per month
- Perform operations outside business hours on a regular basis (at least once per week)
Mandatory Skills
- Fluent English (written and spoken)
- Bachelor’s degree in Computer Science, Engineering, or equivalent professional experience
- Minimum 3 years of production experience with PostgreSQL (version 13+), including clustering, engine tuning, streaming replication, and backup and PITR strategies
- Strong scripting skills (Bash and/or Python)
- Proven ability to debug, optimize code, and automate repetitive tasks
- Hands-on experience with cloud concepts and Infrastructure as Code tools (e.g., Terraform)
- Solid understanding of DevOps practices and CI/CD methodologies
- Comfortable working in Linux environments and multi-data-center architectures
- Experience with command-line environments and tools such as Git and Puppet
- Strong analytical and problem-solving mindset with attention to detail
- Ability to perform under pressure with a strong service and customer-oriented approach
- Proactive team player who values transparency and collaboration
- Keeps up to date with trends in database administration
Nice to Have
- Knowledge of Pacemaker / Corosync clustering technologies
- Experience with Terraform and Ansible automation
- Ability to create monitoring metrics and dashboards using Grafana
- Experience with log management tools such as ELK stack or Loki
- Exposure to both on-premise and cloud environments
- Understanding of networking fundamentals (IP, DHCP, DNS, BGP, load balancing, clustering, etc.)