SRE Engineer

You will be part of a close-knit team that values knowledge sharing, continuous learning, and professional growth. With access to industry-recognised certifications, strong mentorship, and technical development programmes, you will have every chance to advance your career while working on cutting-edge AWS native databases and automation projects.

SITE RELIABILITY ENGINEER

Salary: £400 - £500 per day (inside IR35)
Location: London

What you'll do:


As a Site Reliability Engineer based in London, you will support a range of AWS native databases, including RDS, Aurora, and Neptune, as well as CockroachDB. Day to day, you will design software that improves system performance and keeps critical applications highly available. You will work with product engineering teams to improve observability tooling and telemetry, and drive automation that reduces manual intervention. You will take part in incident management, communicating transparently with stakeholders and leading blameless post-mortems, to help foster a culture of continuous improvement. You will also maintain operational stability through rigorous change management, and plan and execute disaster recovery tests. The role includes collaborating with other SREs on infrastructure simplification projects and sharing best practices across teams. Success requires both technical proficiency and the interpersonal skills to thrive in an environment that values teamwork, knowledge sharing, and mutual support.


* Design, code, test, and deliver software enhancements aimed at improving existing systems by adopting DevOps principles across all cloud database offerings.
* Troubleshoot complex incidents efficiently, communicate effectively with stakeholders at all levels, facilitate blameless post-mortems, and identify corrective actions to ensure permanent resolution.
* Actively participate throughout the development lifecycle to ensure reliability, scalability, and operational stability are maintained across all supported platforms.
* Define, create, and monitor application analytics to support improved service level objectives and drive data-informed decision making.
* Ensure strict adherence to change management release processes while accelerating automation initiatives for these workflows.
* Lead resiliency management planning by scheduling and executing disaster recovery tests with a focus on automating these activities wherever possible.
* Provide on-call support during production incidents outside standard working hours, as business needs require.
* Contribute to enhancing product observability and telemetry by supporting ongoing modernisation efforts within the infrastructure.
* Collaborate closely with engineering teams to brainstorm ideas that simplify infrastructure management and streamline SRE practices.


What you bring:


* Proficiency in Python or Unix shell scripting, combined with solid SQL skills, enables you to automate tasks efficiently across complex environments.
* A good understanding of development tools such as source code control software (e.g., Git), automated build systems, automated testing frameworks, and JIRA ensures smooth collaboration within multidisciplinary teams.
* Familiarity with infrastructure as code concepts allows you to contribute effectively towards automation goals using tools like Terraform or Puppet.
* Experience with build automation pipelines, test-driven development methodologies, continuous integration (CI), and continuous delivery (CD) practices is highly valued.
* Hands-on experience managing both relational (e.g., RDS/Aurora) and non-relational databases equips you to support diverse data storage requirements.
* Previous exposure to site reliability engineering concepts, including service level objectives (SLOs), service level agreements (SLAs), service level indicators (SLIs), and error budgets, will help you excel in this role.
* Practical experience or familiarity with at least one major public cloud provider (AWS preferred; Google Cloud or Azure also considered) is important for success.
* Experience managing configuration for large fleets of servers using declarative frameworks is advantageous for scaling operations smoothly.
* Knowledge of how to use APIs securely, including authentication mechanisms and data structures, enhances your ability to integrate systems seamlessly.
* An understanding of microservice architectures, REST API design and development principles, containerisation technologies such as Docker and Kubernetes, and CI/CD integration is beneficial.

Robert Walters Operations Limited is an employment business and employment agency and welcomes applications from all candidates.

Contract Type: Temporary Interim Management

Specialism: Technology & Digital

Focus: DevOps & Cloud

Industry: Financial Services

Salary: £400 - £500 per day

Workplace Type: On-site

Experience Level: Mid Management

Location: London

Job Reference: FNTA37-7C001A7C

Date posted: 7 July 2025

Consultant: Josh Groenewald