What is the difference between Critical Response and Critical Support?

Critical Response is incident-only: we detect, respond, and recover, but do not carry out proactive improvement engineering. Critical Support adds monthly improvement engineering hours (16–56 hrs/month) across six pillars. If you want your cloud platform to improve over time, not just be kept alive, Critical Support is the right service.

Stay in control when things break: cloud incident response, powered by Datadog

Critical Response
Rapid detection. Clear escalation. Fast recovery.

Critical Response is incident-response-only cloud cover for AWS and Azure. We detect, triage, respond, and recover, within the coverage window you choose. No proactive engineering. Just reliable, SRE-driven incident management when you need it. Agents accelerate the analysis. A human owns the outcome of every incident.


SEV-1 response time: 15 min

Coverage options: Daytime · Evenings & Weekends (E&W) · 24×7

Clouds covered: AWS + Azure

Full observability always

Who it's for

Teams that need after-hours or 24×7 cover but already have in-house day-to-day ops capability
Businesses that want to supplement their in-house on-call without replacing it
Scale-ups that need weekend and overnight cover as they grow but aren't ready for a full managed service

Want proactive improvement too? See Critical Support or Critical Support Lite.

How it works

Five-stage incident lifecycle

Every incident follows the same structured process, from first signal in Datadog to blameless postmortem.

Stage 01: Monitoring

Datadog telemetry, synthetic monitors, and alert rules watch your environment continuously. Bits AI SRE helps surface signal from noise.

Stage 02: Triage

On-call engineer classifies severity (SEV-1–4), assesses blast radius, and confirms ownership. Customer notified immediately for SEV-1.

Stage 03: Response

Runbooks executed, safe workarounds applied, rollback procedures followed as appropriate. All actions documented in real time in Datadog Incident Management.

Stage 04: Escalation

On-call routing, cloud-provider escalation, vendor coordination, and stakeholder communications, managed by our engineers so yours can focus on the fix.

Stage 05: Recovery & Review

Validated recovery, blameless RCA, and a written summary. Recovery time is a target, not a guarantee: the SEV-1 target is 60–120 min depending on plan.

Incidents covered

What Critical Response handles

Service outages: complete or partial failures affecting end users or dependent services
Performance degradation: sustained latency spikes, elevated error rates, or throughput collapse
Security alerts: operational triage and containment only (not SOC/MDR/forensics)
Integration and API failures: broken upstream or downstream dependencies causing customer impact
Cloud provider incidents: AWS/Azure provider events that affect your environment, with response and workaround coordination

Plans

Three plans: Daytime, Evenings & Weekends, 24×7

Choose the coverage window that fills your gap. All plans use the same 5-stage lifecycle and Datadog-native tooling. Response times are contractual. Recovery times are targets. Talk to us for pricing.

Feature	Daytime	Evenings & Weekends	24×7
Coverage hours	09:00–17:00 Mon–Fri	17:00–09:00 Mon–Fri + 24×7 weekends	Full 24×7×365
Severity covered	SEV-1	SEV-1 & SEV-2	SEV-1 & SEV-2
Response time	SEV-1: 15 min	SEV-1: 15 min · SEV-2: 30 min	SEV-1: 15 min · SEV-2: 15 min
Recovery target (SEV-1)	120 min	90 min	60 min
Incident management time/month	4 hrs	4 hrs	8 hrs
Monitored services	Up to 10	Up to 10	Up to 20
External endpoints monitored	1 @ 5-min interval	1 @ 2-min interval	5 @ 1-min interval
Dashboards	Standard	Standard	Standard + 1 custom
Out-of-hours callouts	None	2 per month	4 per month
Runbooks & reports	Standard + monthly summary	Standard + monthly summary	Customised + detailed RCA trends

Recovery times are targets, not contractual guarantees. Response times (time to first engineer contact) are the contractual commitment.
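To make the distinction between contractual response times and recovery targets concrete, the plan table above can be sketched as data. This is an illustrative sketch only: the `Plan` class, the `PLANS` keys, and `response_deadline` are hypothetical names introduced here and are not part of any Critical Cloud or Datadog API. The figures are copied from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class Plan:
    """Hypothetical container for one column of the plan table."""
    name: str
    # Contractual response time (minutes to first engineer contact), per severity.
    response_minutes: Dict[str, int]
    # SEV-1 recovery target in minutes. A target, not a guarantee.
    recovery_target_minutes: int


# Values taken from the plan table; the dictionary keys are illustrative.
PLANS = {
    "daytime": Plan("Daytime", {"SEV-1": 15}, 120),
    "evenings_weekends": Plan("Evenings & Weekends", {"SEV-1": 15, "SEV-2": 30}, 90),
    "24x7": Plan("24×7", {"SEV-1": 15, "SEV-2": 15}, 60),
}


def response_deadline(plan_key: str, severity: str,
                      detected_at: datetime) -> Optional[datetime]:
    """Return the latest time for first engineer contact,
    or None if the plan does not cover this severity."""
    minutes = PLANS[plan_key].response_minutes.get(severity)
    if minutes is None:
        return None
    return detected_at + timedelta(minutes=minutes)
```

For example, a SEV-2 detected at 02:00 on the Evenings & Weekends plan would have a response deadline of 02:30, while the same call on the Daytime plan returns `None` because that plan covers SEV-1 only.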

Severity model

Four severity levels: SEV-1 to SEV-4

Classification happens at triage. SEV-1 and SEV-2 trigger immediate response within contracted hours.

SEV-1 · Critical

Complete outage or material risk

Total service unavailability, data loss risk, or severe breach of contractual obligations. Immediate response. 15-min response time on all plans.

SEV-2 · High

Significant degradation or partial outage

Major feature failure, severe performance degradation, or partial loss of service affecting a significant number of users. Covered on E&W and 24×7 plans.

SEV-3 · Moderate

Limited impact, workaround available

Non-critical issues with a viable workaround. Handled during business hours. Not covered under Critical Response out-of-hours plans.

SEV-4 · Low

Informational or minor

Minor issues, informational alerts, or configuration questions. Handled in-hours. Not covered under Critical Response out-of-hours plans.
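The rule that SEV-1 and SEV-2 trigger immediate response within contracted hours combines two checks: whether the plan covers the severity, and whether the incident falls inside the plan's coverage window. A minimal sketch follows, under several assumptions: timestamps are already in the customer's local time zone, public holidays are ignored, and the 09:00 and 17:00 boundaries are treated as half-open intervals. All function and variable names are hypothetical.

```python
from datetime import datetime, time

BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)

# Severities covered per plan, as listed in the plan table.
COVERED_SEVERITIES = {
    "daytime": {"SEV-1"},
    "evenings_weekends": {"SEV-1", "SEV-2"},
    "24x7": {"SEV-1", "SEV-2"},
}


def in_business_hours(ts: datetime) -> bool:
    """True for 09:00 <= t < 17:00 on Monday to Friday (weekday() 0-4)."""
    return ts.weekday() < 5 and BUSINESS_START <= ts.time() < BUSINESS_END


def in_coverage_window(plan: str, ts: datetime) -> bool:
    """Check the timestamp against the plan's coverage hours."""
    if plan == "24x7":
        return True
    if plan == "daytime":
        return in_business_hours(ts)
    if plan == "evenings_weekends":
        # 17:00-09:00 Mon-Fri plus all weekend: the complement of business hours.
        return not in_business_hours(ts)
    raise ValueError(f"unknown plan: {plan}")


def triggers_immediate_response(plan: str, severity: str, ts: datetime) -> bool:
    """True if the severity is covered and the incident is within contracted hours."""
    return severity in COVERED_SEVERITIES[plan] and in_coverage_window(plan, ts)
```

Under these assumptions, a SEV-2 at 03:00 on a Saturday triggers immediate response on the Evenings & Weekends plan, while a SEV-3 never does on any plan.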

Shared responsibility

Clear ownership during incidents

Critical Cloud owns: detection, classification, response execution, escalation to vendors, stakeholder communication, and recovery validation. All documented in Datadog Incident Management.

We notify you at SEV-1 detection and at key recovery milestones.
Material changes (infrastructure, configuration) need your approval.

You own: application code, business continuity decisions, customer communications, and access approvals.

You retain full IAM and admin control at all times.
You have full, real-time access to your Datadog environment throughout any incident.

Want proactive improvement too?

Critical Support: incident management plus monthly engineering

Critical Response covers you when things break. Critical Support also improves things so they break less often.


Monthly improvement engineering across six pillars: reliability, security, cost, performance, automation, and governance. If you're spending engineering time on reactive firefighting, Critical Support is built to change that.


FAQ

What is Critical Response?

Critical Response is an incident-response-only service for AWS and Azure: detection, triage, response, escalation, and recovery within the coverage window you choose (Daytime, Evenings and Weekends, or 24×7). There is no proactive improvement engineering; for that, see Critical Support or Critical Support Lite.

Does Critical Response include proactive engineering?

No. Critical Response is incident management only. Each plan includes a small allocation of incident management time for runbook maintenance and operational overhead, but no improvement engineering backlog. For proactive improvement, see Critical Support or Critical Support Lite.

What does "recovery target" mean?

Recovery targets (e.g. the 60-minute target for SEV-1 on 24×7) are working objectives we aim to meet. They reflect our operational capability and historic performance but are not contractual guarantees, because complex incidents take longer by their nature. Response time (time to first engineer contact) is the contractual commitment.

What happens after an incident?

For every SEV-1 incident, we produce a blameless postmortem and a written RCA summary, shared with you within the agreed timeframe. Findings can be fed into an improvement backlog if you're on Critical Support or Critical Support Lite.

Need reliable cover for the hours that matter?

Tell us about your AWS or Azure environment and we'll recommend the right coverage window.

