
Graduate Site Reliability Engineer

Pipeline open | Graduate Programme | UK Remote / Cardiff | Full-Time
Salary: £25–30k
Location: UK Remote
Stack: Datadog-native
Why Now
We're growing the customer base and the SRE team is expanding. This is a real engineering role from day one, not a graduate scheme with a two-year rotation. You'll work alongside senior engineers on real production environments, learn Datadog, AWS, and Azure in depth, and build the kind of hands-on operational experience that most graduates don't get until year three elsewhere. SREs join a shared on-call rota, typically one week in five or six, reducing as the team grows. On-call weeks are paid at £500, adding roughly £5–6k a year on top of salary.
About Us
We are the world's first "Powered by Datadog" accredited MSP, a Datadog-native cloud managed service provider built for European tech-led SMBs. Our founders have scaled and exited multiple technology businesses. We operate lean, move fast, and take observability seriously.

Critical Cloud delivers cloud operations across AWS and Azure. Our SRE team runs production environments for a portfolio of tech-led customers: monitoring, incident response, infrastructure automation, and continuous improvement. Everyone on the team touches real infrastructure for real customers.

The Role

This is the entry point into the SRE team. You'll work directly alongside senior engineers, learning how we operate production environments, instrument systems with Datadog, and respond to incidents. From the start you'll contribute to real work (monitoring customer environments, writing runbooks, supporting infrastructure changes), with progressively more ownership as your confidence and knowledge grow.

We're looking for a graduate with the core fundamentals, a genuine curiosity about how systems work under pressure, and the communication skills to operate in a customer-facing environment. You don't need production experience. You need to be the kind of engineer who asks why, reads the docs all the way through, and takes reliability seriously.

What You'll Do
Career Path

We're a small team. The path from graduate to senior is real, and faster than at most places.

Start: Graduate Site Reliability Engineer
Year 1–2: Junior Site Reliability Engineer
Year 2–3: Senior SRE or Platform Eng
Year 3+: Staff SRE or Lead Engineer
Requirements
Must Have
  • A degree in Computer Science, Software Engineering, or a related technical discipline, or equivalent demonstrable self-taught fundamentals
  • Solid Linux fundamentals: CLI navigation, file systems, processes, networking basics
  • Comfort with scripting in Bash, Python, or similar: you've automated something, even if small
  • Understanding of core observability concepts: what metrics, logs, and traces are and what they tell you
  • Awareness of cloud fundamentals: you know what EC2, S3, VPCs, and load balancers do, even without production experience
  • Clear written and verbal communication: you'll be in customer-facing situations from early on
  • Right to work in the UK without sponsorship
Nice to Have
  • Any hands-on Datadog experience: a trial, a personal project, or a university lab
  • Terraform or any infrastructure-as-code exposure
  • Docker or Kubernetes (even containerising a personal project counts)
  • A cloud certification (AWS Cloud Practitioner, Azure Fundamentals, or equivalent)
  • Experience in a customer-facing environment, even outside tech
  • Any personal projects involving monitoring, automation, or infrastructure
Tech Stack
Datadog: Core observability platform
AWS: Primary cloud, multi-account
Azure: Secondary cloud workloads
Terraform: Infrastructure as code
Kubernetes: Container orchestration
GitHub Actions: CI/CD pipelines
PagerDuty: Incident management
Python / Bash: Automation & tooling
Compensation & Benefits
Base salary (DOE): £25–30k
Remote-first: UK-based, async-friendly
Certs funded: Datadog, AWS, Azure & AI, contractual
On-call paid: ~£5–6k/yr on top of salary
On-call allowance (in addition to base salary): SREs join a shared rota, typically one week in five or six, reducing as the team grows. Paid £500 per on-call week, which works out at roughly £5–6k a year on top of salary, varying with the rota size.
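As a rough illustration of how the allowance scales with rota size, here is a minimal sketch of the arithmetic. It assumes a 52-week year and an evenly shared rota; both are simplifying assumptions for illustration, not contractual terms, and the function name is hypothetical.

```python
# Rough on-call allowance estimate.
# Assumptions (illustrative only): 52 weeks per year, rota shared evenly.
WEEKS_PER_YEAR = 52
PAY_PER_ONCALL_WEEK = 500  # GBP per on-call week, as stated in the posting


def annual_oncall_pay(rota_size: int, pay_per_week: int = PAY_PER_ONCALL_WEEK) -> float:
    """Estimate yearly on-call pay for one engineer on a rota of `rota_size` people."""
    if rota_size < 1:
        raise ValueError("rota_size must be at least 1")
    # Each engineer covers 1/rota_size of the year's weeks.
    return pay_per_week * WEEKS_PER_YEAR / rota_size


if __name__ == "__main__":
    for size in (5, 6):
        print(f"1 week in {size}: about £{annual_oncall_pay(size):,.0f} per year")
```

Under these assumptions, one week in five comes to £5,200 a year and one week in six to about £4,333, so treat the annual figure as approximate; it moves with the size of the rota.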
Who Thrives Here

We're a small, senior-heavy team and we hire graduates who want to operate at the level of someone two or three years ahead of where they are today. You'll be trusted early, expected to ask good questions, and supported by engineers who've done this before. The best fit is someone who's genuinely curious about how production systems break, who reads documentation properly, and who communicates proactively when something's unclear.

You don't need to know everything. You do need to be the kind of engineer who figures things out and who cares enough about reliability to want to prevent the same problem twice. We operate to ISO 27001, which means documentation, change control, and process discipline are part of the job. That's not bureaucracy. It's how you build things that stay working.

How We Work

Four principles that show up in how we operate real infrastructure for real customers.

Stay Curious

The engineers who progress fastest here ask "why" about every system they touch. Why is this alert configured this way? Why is this runbook written like this? Curiosity about the infrastructure you're operating is how you grow from observer to owner.

Own the Problem

When you're assigned a task or an investigation, you see it through. You don't get stuck and go quiet; you ask, escalate, and update. We don't hand things off and hope. We take problems to resolution and prevent them next time.

Operate at Scale

We run multiple customer environments simultaneously. Everything you build or document has to be operable by anyone on the team: consistent, clear, maintainable. Build for the engineer picking it up without context.

Earn Trust by Delivering

Customers trust us with what's mission-critical. Every stable environment, every clean runbook, every resolved issue is how we earn and keep that trust. Consistency is the only currency that matters here.

Join the pipeline

We're not actively hiring right now, but we keep applications on file. Tell us about something you've built or operated, what broke, what you learned, and what you'd do differently. Personal projects count. No templates.

Pipeline open | Cover letter required | Direct to founders
careers@criticalcloud.ai