Careers

Site Reliability Engineer (SRE)

Operate and improve AWS/Azure platforms under Critical Support, using Datadog as the operational foundation for unified observability, incident response, and continuous improvement.

Datadog (essential), AWS, Azure, Incident response, SLOs, Terraform
Location: Cardiff / London / Dublin (hybrid)
Employment: Full-time
Level: Experienced (SRE / Platform / DevOps)
Team: Cloud Operations (Datadog-powered CMSP)

Role overview

This role is for someone who enjoys operating real production systems and making them better every week — not just keeping the lights on.

You’ll work across AWS and Azure environments for tech-led customers. Datadog is the backbone: metrics, logs, traces, security signals, cloud cost insights, and alerting all live in one place, and our team and the customer both operate from that shared view.

You’ll be part of an on-call rotation and you’ll also deliver improvement engineering — automation, guardrails, and tuning — so platforms become more reliable, secure, and cost-controlled over time.

What you’ll do

The day-to-day responsibilities of the role.

What you’ll bring

Datadog skills are essential for this role.

Must-have

  • Hands-on Datadog experience in production (monitors, dashboards, logs/APM, alerting workflows).
  • Experience operating cloud platforms (AWS and/or Azure) with strong fundamentals in networking, Linux, and IAM.
  • Comfortable with incident management and post-incident review (blameless RCA, action tracking).
  • Infrastructure-as-code experience (Terraform preferred) and comfort with scripting/automation.
  • Clear written communication — you can explain incidents and changes without jargon.

Nice-to-have

  • Experience operating both AWS and Azure in production.
  • Kubernetes or container platform experience.
  • FinOps / cloud cost tooling experience (Datadog Cloud Cost Management, AWS Cost Explorer, Azure Cost Management).
  • Security tooling experience (SIEM signals, CSPM concepts, Datadog Security Monitoring).
  • Datadog certifications or proven contributions to observability standards.

What success looks like (first 90 days)

How we hire

A simple process designed to respect your time.

01. Intro call

Alignment on the role, expectations, and what you’re looking for.

02. Technical conversation

Real scenarios: Datadog signals, incidents, systems, trade-offs.

03. Practical exercise

A realistic task. No long take-home marathons.

04. Meet the team

Working style fit, then we move quickly.

Equal opportunities

Critical Cloud is an equal opportunity employer. We value diverse perspectives and are committed to creating an inclusive environment for everyone.

Apply for this role

Email us your CV and a short note on why this role fits you. We’ll get back to you as soon as we can.
