Junior Site Reliability Engineer

Hiring Now · UK Remote / Cardiff · Full-Time · Entry–Mid Level
Salary: £35–40k
Location: UK Remote
Stack: Datadog-native
About Us
We are the world's first "Powered by Datadog" certified partner: a Datadog-native cloud managed service provider (MSP) built for European tech-led SMBs. Our founders previously built DevOpsGroup and sold it to Amdocs. We operate lean, move fast, and take observability seriously.

Critical Cloud delivers cloud operations across AWS and Azure through three commercial motions: Adopt, Optimise, and Manage. We're building the dominant Datadog-native managed service brand in Europe, and we're looking for engineers who want to grow with us.

The Role

This is a ground-floor SRE role inside a fast-moving cloud MSP. You'll work directly with our senior engineers and founders, supporting real production environments for a portfolio of tech-led customers. Expect genuine exposure to Datadog, AWS/Azure, incident response, and infrastructure automation from day one, not ticket triaging.

We're looking for someone early in their career who has the fundamentals, the curiosity to go deep, and the communication skills to work directly with customers. You don't need to know everything. You need to be the kind of engineer who figures things out.

What You'll Do
Requirements
Must Have
  • Solid Linux fundamentals: CLI, networking, process management
  • Working knowledge of at least one cloud platform (AWS or Azure)
  • Comfort with scripting: Bash, Python, or similar
  • Understanding of core observability concepts: metrics, logs, traces
  • Clear written and verbal communication: you'll work with customers
  • Right to work in the UK without sponsorship
Nice to Have
  • Hands-on Datadog experience (any tier)
  • Terraform or other IaC tooling
  • Kubernetes or containerised workload exposure
  • Experience in an MSP or multi-customer environment
  • Familiarity with ISO 27001 or similar compliance frameworks
  • Any cloud certification (AWS, Azure, or Datadog)
Tech Stack
  • Datadog: core observability platform
  • AWS: primary cloud, multi-account
  • Azure: secondary cloud workloads
  • Terraform: infrastructure as code
  • Kubernetes: container orchestration
  • GitHub Actions: CI/CD pipelines
  • PagerDuty: incident management
  • Python / Bash: automation and tooling
Compensation & Benefits
  • £35–40k base salary, depending on experience
  • Remote-first: UK-based, async-friendly
  • Funded training and certifications, including Datadog certs
Who Thrives Here

We're a small, senior-heavy team. You won't be managed closely. You'll be trusted and expected to own your work. The best fit is someone who treats production environments with respect, communicates proactively when something's wrong, and genuinely wants to understand the systems they're operating, not just keep the lights on.

We operate to ISO 27001 and take our IMS seriously. That means documentation, change control, and process discipline matter. If that sounds like a constraint, this probably isn't the role. If it sounds like craft, read on.

Ready to apply?

Send a cover letter and your CV. The cover letter matters most: tell us why Critical Cloud, what draws you to reliability engineering, and what you've built or operated. No templates.

Hiring now · Cover letter required · Direct to founders
Apply: careers@criticalcloud.ai