Graduate Engineer, AI Tooling & Site Reliability
Critical Cloud is the world's first "Powered by Datadog" certified partner, a Datadog-native cloud MSP built for European tech-led SMBs. We're building an internal AI platform (the Critical Cloud Platform) to automate and augment how we operate customer environments. This role sits at the centre of that programme.
Half your time will be engineering AI-assisted tooling: LLM integrations, agents, and automation workflows that reduce toil and improve our operational quality. The other half will be hands-on SRE work: monitoring, incident support, infrastructure-as-code, and customer-facing operations. Each half makes you better at the other.
AI tooling:
- Build and iterate on AI-assisted automation workflows using LLM APIs (Claude, OpenAI) integrated with cloud and observability tooling
- Develop tooling for automated infrastructure discovery, customer onboarding, and operational runbook generation
- Contribute to the Critical Cloud Platform: our internal AI governance framework and agent operating model
- Design and implement MCP (Model Context Protocol) integrations connecting AI agents to Datadog, AWS, and Azure APIs
- Write evaluation harnesses and regression tests to keep AI tool output reliable and auditable
- Document AI system behaviour against our constitutional operating framework and ISO 27001 controls
Site reliability:
- Monitor and triage alerts across customer AWS and Azure environments using Datadog as the primary observability platform
- Participate in on-call rotations and incident response, contributing to postmortems and remediation work
- Support Datadog onboarding for new customers: instrumentation, dashboards, monitors, and SLO configuration
- Write and maintain Terraform modules for infrastructure provisioning and change management
- Produce and maintain operational runbooks, escalation guides, and change records to ISO 27001 standards
- Contribute SRE context back into AI tooling: you'll know what's worth automating because you've done it manually
Requirements:
- A degree in Computer Science, Software Engineering, or a related technical field (2:1 or above)
- Solid Python: comfortable writing scripts, working with APIs, and handling structured data
- Familiarity with cloud fundamentals (AWS or Azure), ideally through coursework, personal projects, or a placement
- Experience consuming REST APIs or LLM APIs, whether through a project, dissertation, or side work
- Linux command-line confidence: networking basics, process management, file systems
- Clear written communication: you'll be writing docs and talking to customers
Nice to have:
- Hands-on LLM work: prompt engineering, tool use, agent frameworks, or evaluation pipelines
- Terraform or any IaC tooling (even tutorials count)
- Datadog experience, even a free-tier account you've played with
- Kubernetes or containerised workload exposure
- Any cloud or AI certification (AWS, Azure, Google, or Datadog)
- A GitHub profile with something worth showing us
We're a small team. Progression is real and fast, not managed by a committee.
[Career progression diagram. Stage labels: AI & SRE; AI Platform / SRE; Specialise or Broaden; Platform or SRE]
The ideal candidate doesn't have to choose between writing code and running infrastructure. They're curious about both and understand that the two inform each other. You'll build AI tooling that solves real operational problems precisely because you've experienced those problems hands-on in the SRE half of the role.
We operate to ISO 27001. Everything we build, including AI systems, has to be explainable, auditable, and consistent with our governance framework. If you care about building AI tools that are reliable, not just impressive demos, you'll fit right in.
This is an early-career role, but we don't run it like one. You'll have genuine ownership, direct access to founders, and the chance to shape a platform that will define how Critical Cloud operates at scale.
Sound like you?
Send a cover letter and your CV. The cover letter matters most: tell us what draws you to both AI tooling and reliability engineering, and share what you've built, whether a project, a repo, a dissertation, or anything real.