Automate Azure Backups with Runbooks

Azure Backup's built-in policy engine handles scheduled backups, retention, and point-in-time recovery well for standard scenarios. It does not handle conditional logic, cross-resource dependencies, automatic registration of newly created VMs, or post-backup validation tasks. Azure Automation runbooks fill these gaps: they are PowerShell or Python scripts that run on a schedule, on demand, or in response to events, and use the Azure modules to manage backup infrastructure programmatically.

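This guide covers four runbook patterns for Azure Backup that deliver practical operational value: registering new VMs, taking pre-maintenance backups, alerting on failed jobs, and cleaning up backup storage.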

Set up the Automation Account

Create an Azure Automation Account (or use an existing one) in the same region as your Recovery Services vaults. The Automation Account needs a System-Assigned Managed Identity with the appropriate roles:

  • Backup Contributor on the Recovery Services vaults it manages
  • Virtual Machine Contributor on the subscription or resource groups containing the VMs (required to enable backup on them, not just to read their metadata)
  • Reader on the subscription for resource discovery

Grant these under Access control (IAM) on each scope. A managed identity avoids credential management: there are no passwords or client secrets to store, and Azure rotates the credentials automatically.
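The same assignments can be scripted from an admin session (not from the runbook itself). This is a minimal sketch, assuming Az.Resources and Az.RecoveryServices are installed, the caller is allowed to assign roles, and the system-assigned identity is already enabled. It relies on the identity's service principal carrying the Automation Account's name; the function and parameter names are hypothetical.

```powershell
# Sketch: grant the Automation Account's system-assigned identity the roles listed above.
# Hypothetical helper; assumes Az.Resources and Az.RecoveryServices are installed and signed in.
function Grant-BackupRunbookAccess {
    param(
        [Parameter(Mandatory)] [string]$AutomationAccountName,
        [Parameter(Mandatory)] [string]$VaultName,
        [Parameter(Mandatory)] [string]$VaultResourceGroup,
        [Parameter(Mandatory)] [string[]]$VmResourceGroups
    )

    # Assumption: the identity's service principal has the Automation Account's name
    # and no other service principal shares that display name
    $sp = @(Get-AzADServicePrincipal -DisplayName $AutomationAccountName)
    if ($sp.Count -ne 1) {
        throw "Expected one service principal named '$AutomationAccountName', found $($sp.Count)"
    }
    $principalId = $sp[0].Id

    $subscriptionScope = "/subscriptions/$((Get-AzContext).Subscription.Id)"
    $vault = Get-AzRecoveryServicesVault -Name $VaultName -ResourceGroupName $VaultResourceGroup

    # Backup Contributor on the vault
    New-AzRoleAssignment -ObjectId $principalId -RoleDefinitionName 'Backup Contributor' -Scope $vault.ID

    # Virtual Machine Contributor on each resource group that holds VMs
    foreach ($rg in $VmResourceGroups) {
        New-AzRoleAssignment -ObjectId $principalId -RoleDefinitionName 'Virtual Machine Contributor' `
            -Scope "$subscriptionScope/resourceGroups/$rg"
    }

    # Reader on the subscription for resource discovery
    New-AzRoleAssignment -ObjectId $principalId -RoleDefinitionName 'Reader' -Scope $subscriptionScope
}
```

Example call with placeholder names: `Grant-BackupRunbookAccess -AutomationAccountName 'aa-backup' -VaultName 'MyRecoveryVault' -VaultResourceGroup 'MyRG' -VmResourceGroups 'app-rg', 'web-rg'`. Scoping Virtual Machine Contributor to resource groups rather than the whole subscription keeps the identity closer to least privilege.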

The runbooks in this guide use Az.Accounts (Connect-AzAccount), Az.Compute (Get-AzVM), and Az.RecoveryServices (the Azure Backup cmdlets). If any of these are missing or outdated in your Automation Account, import them under Modules > Browse Gallery.

Runbook 1: Auto-register new VMs for backup

New VMs are created regularly in active environments. Without automation, they are backed up only after someone manually adds them to a backup policy. VMs between creation and manual registration are unprotected.

A runbook that runs daily discovers VMs without backup coverage and registers them:

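param(
    [string]$VaultName = "MyRecoveryVault",
    [string]$ResourceGroupName = "MyRG",
    [string]$PolicyName = "DefaultPolicy"
)

# Connect using the Automation Account's managed identity
Connect-AzAccount -Identity | Out-Null

$vault = Get-AzRecoveryServicesVault -Name $VaultName -ResourceGroupName $ResourceGroupName

# A vault can only protect VMs in its own region, so only consider those
$candidateVMs = Get-AzVM | Where-Object { $_.Location -eq $vault.Location }

# Resource IDs of VMs already registered in this vault
$protectedVMIds = Get-AzRecoveryServicesBackupItem `
    -BackupManagementType AzureVM `
    -WorkloadType AzureVM `
    -VaultId $vault.ID |
    Select-Object -ExpandProperty SourceResourceId

$policy = Get-AzRecoveryServicesBackupProtectionPolicy -Name $PolicyName -VaultId $vault.ID

$failed = @()

foreach ($vm in $candidateVMs) {
    # -notin compares strings case-insensitively, so ID casing differences do not matter
    if ($vm.Id -notin $protectedVMIds) {
        Write-Output "Registering VM for backup: $($vm.Name)"
        try {
            Enable-AzRecoveryServicesBackupProtection `
                -ResourceGroupName $vm.ResourceGroupName `
                -Name $vm.Name `
                -Policy $policy `
                -VaultId $vault.ID `
                -ErrorAction Stop | Out-Null
            Write-Output "Registered: $($vm.Name)"
        }
        catch {
            # A common cause is a VM that is already protected by a different vault
            Write-Error "Failed to register $($vm.Name): $_"
            $failed += $vm.Name
        }
    }
}

# Throwing marks the Automation job as Failed, which job-failure alerts can pick up
if ($failed.Count -gt 0) {
    throw "Backup registration failed for: $($failed -join ', ')"
}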

Schedule this runbook daily, and alert on any job that fails or reports errors so that VMs that could not be registered are investigated.
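Schedules can be created in the portal or with Az.Automation. A minimal sketch, run from an admin session and assuming Az.Automation is installed; the function name, schedule name, and start time are hypothetical choices, not values the runbook requires.

```powershell
# Sketch: link a runbook to a daily schedule. Hypothetical helper; assumes Az.Automation is installed.
function Register-DailyRunbookSchedule {
    param(
        [Parameter(Mandatory)] [string]$ResourceGroupName,      # resource group of the Automation Account
        [Parameter(Mandatory)] [string]$AutomationAccountName,
        [Parameter(Mandatory)] [string]$RunbookName,
        [string]$ScheduleName,
        [hashtable]$RunbookParameters = @{}
    )

    if (-not $ScheduleName) { $ScheduleName = "$RunbookName-daily" }

    # First run tomorrow at 02:00 (local time of the machine running this), then every day
    $start = (Get-Date).Date.AddDays(1).AddHours(2)

    New-AzAutomationSchedule `
        -ResourceGroupName $ResourceGroupName `
        -AutomationAccountName $AutomationAccountName `
        -Name $ScheduleName `
        -StartTime $start `
        -DayInterval 1 | Out-Null

    $link = @{
        ResourceGroupName     = $ResourceGroupName
        AutomationAccountName = $AutomationAccountName
        RunbookName           = $RunbookName
        ScheduleName          = $ScheduleName
    }
    # Only pass runbook parameters when there are some
    if ($RunbookParameters.Count -gt 0) { $link.Parameters = $RunbookParameters }

    Register-AzAutomationScheduledRunbook @link
}
```

Example with placeholder names: `Register-DailyRunbookSchedule -ResourceGroupName 'automation-rg' -AutomationAccountName 'aa-backup' -RunbookName 'Register-NewVMsForBackup' -RunbookParameters @{ VaultName = 'MyRecoveryVault'; ResourceGroupName = 'MyRG'; PolicyName = 'DefaultPolicy' }`. The hashtable keys must match the runbook's `param()` block.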

Runbook 2: On-demand backup before maintenance

Planned maintenance (OS patching, application upgrades, schema migrations) should be preceded by an on-demand backup. This creates a recovery point immediately before the change, so a rollback loses only the changes made since that point rather than everything since the last scheduled backup.

Trigger an on-demand backup runbook as part of the maintenance window automation:

param(
    [Parameter(Mandatory)] [string]$VaultName,
    [Parameter(Mandatory)] [string]$ResourceGroupName,
    [Parameter(Mandatory)] [string]$VMName,
    [int]$RetentionDays = 30
)

Connect-AzAccount -Identity | Out-Null

$vault = Get-AzRecoveryServicesVault -Name $VaultName -ResourceGroupName $ResourceGroupName

# Look the VM up by its exact name; a wildcard match on item names can hit several VMs
$container = Get-AzRecoveryServicesBackupContainer `
    -ContainerType AzureVM `
    -FriendlyName $VMName `
    -VaultId $vault.ID

if (@($container).Count -ne 1) {
    # throw (rather than exit) marks the Automation job as Failed
    throw "Expected exactly one backup container for VM '$VMName', found $(@($container).Count)"
}

$backupItem = Get-AzRecoveryServicesBackupItem `
    -Container $container `
    -WorkloadType AzureVM `
    -VaultId $vault.ID

$expiryDate = (Get-Date).ToUniversalTime().AddDays($RetentionDays)

$job = Backup-AzRecoveryServicesBackupItem `
    -Item $backupItem `
    -ExpiryDateTimeUTC $expiryDate `
    -VaultId $vault.ID

Write-Output "Backup job started: $($job.JobId)"
Write-Output "Monitor job status via the Azure portal or:"
Write-Output "Get-AzRecoveryServicesBackupJob -JobId '$($job.JobId)' -VaultId '$($vault.ID)'"

Call this runbook from your deployment pipeline (Azure DevOps, GitHub Actions, or a deployment runbook) before any maintenance step that modifies data.
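From a pipeline step that is already signed in to Azure (for example an Azure PowerShell task), the runbook can be started and polled with Az.Automation. A minimal sketch; the function name, runbook name, and timeout are hypothetical. Note that the runbook job completes as soon as the backup job has been started, not when the backup itself finishes.

```powershell
# Sketch: start the pre-maintenance backup runbook and fail the pipeline step if it fails.
# Hypothetical helper; assumes Az.Automation is installed and the session is signed in.
function Invoke-PreMaintenanceBackup {
    param(
        [Parameter(Mandatory)] [string]$AutomationResourceGroup,
        [Parameter(Mandatory)] [string]$AutomationAccountName,
        [Parameter(Mandatory)] [string]$RunbookName,
        [Parameter(Mandatory)] [hashtable]$RunbookParameters,   # keys must match the runbook's param() block
        [int]$TimeoutMinutes = 30
    )

    $account = @{
        ResourceGroupName     = $AutomationResourceGroup
        AutomationAccountName = $AutomationAccountName
    }

    $job = Start-AzAutomationRunbook @account -Name $RunbookName -Parameters $RunbookParameters
    $deadline = (Get-Date).AddMinutes($TimeoutMinutes)
    $finalStates = @('Completed', 'Failed', 'Stopped', 'Suspended')

    # Poll until the runbook job reaches a final state or the timeout passes
    do {
        Start-Sleep -Seconds 15
        $job = Get-AzAutomationJob @account -Id $job.JobId
    } while (($job.Status -notin $finalStates) -and ((Get-Date) -lt $deadline))

    if ($job.Status -ne 'Completed') {
        throw "Runbook job $($job.JobId) did not complete successfully (status: $($job.Status))"
    }
}
```

Example with placeholder names: `Invoke-PreMaintenanceBackup -AutomationResourceGroup 'automation-rg' -AutomationAccountName 'aa-backup' -RunbookName 'Start-PreMaintenanceBackup' -RunbookParameters @{ VaultName = 'MyRecoveryVault'; ResourceGroupName = 'MyRG'; VMName = 'app01' }`. Throwing on any non-Completed status stops the pipeline before the maintenance step runs.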

Runbook 3: Monitor backup job status and alert on failures

Azure Backup provides built-in alerts, but by default they surface in the Azure portal rather than in the channels an operations team watches. A runbook that queries job status and posts structured Slack or Teams notifications is more actionable:

param(
    [Parameter(Mandatory)] [string]$VaultName,
    [Parameter(Mandatory)] [string]$ResourceGroupName,
    [Parameter(Mandatory)] [string]$WebhookUrl,  # Slack or Teams incoming webhook
    [int]$LookbackHours = 6                      # keep equal to the schedule interval
)

Connect-AzAccount -Identity | Out-Null

$vault = Get-AzRecoveryServicesVault -Name $VaultName -ResourceGroupName $ResourceGroupName

# The job query expects UTC timestamps
$failedJobs = @(Get-AzRecoveryServicesBackupJob `
    -Status Failed `
    -From (Get-Date).ToUniversalTime().AddHours(-$LookbackHours) `
    -VaultId $vault.ID)

if ($failedJobs.Count -gt 0) {
    $message = "Azure Backup failures in vault $VaultName in the last $LookbackHours h:`n"

    foreach ($job in $failedJobs) {
        # ErrorDetails can be empty, so fall back to a generic message
        $errorText = if ($job.ErrorDetails) { $job.ErrorDetails[0].ErrorMessage } else { "no error details returned" }
        $message += "- $($job.WorkloadName): $errorText`n"
    }

    $payload = @{
        text = $message
    } | ConvertTo-Json

    Invoke-RestMethod -Uri $WebhookUrl -Method Post -Body $payload -ContentType "application/json"

    Write-Output "Alert sent: $($failedJobs.Count) failed jobs"
} else {
    Write-Output "No failed backup jobs in the last $LookbackHours h"
}

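Schedule this runbook at the same interval as its lookback window (for example, every 6 hours with a 6-hour window) so each failure is reported once. It sends an alert only when failures exist, which avoids notification fatigue.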
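To check the message format without sending real alerts, the formatting step can live in its own function and be fed fake jobs. A minimal sketch; the function name New-BackupAlertPayload and its LookbackHours parameter are hypothetical, and it relies only on the WorkloadName and ErrorDetails properties the runbook already reads from backup jobs.

```powershell
# Sketch: build the webhook body for a set of failed backup jobs.
# Hypothetical helper; accepts any objects with WorkloadName and ErrorDetails properties.
function New-BackupAlertPayload {
    param(
        [object[]]$Jobs,
        [int]$LookbackHours = 24
    )

    $lines = foreach ($job in $Jobs) {
        # Fall back to a generic message when a job carries no error details
        $errorText = if ($job.ErrorDetails) { $job.ErrorDetails[0].ErrorMessage } else { 'no error details returned' }
        "- $($job.WorkloadName): $errorText"
    }

    $text = "Azure Backup failures in the last ${LookbackHours}h:`n" + ($lines -join "`n")

    # Body with a single "text" field, as used by the runbook above
    @{ text = $text } | ConvertTo-Json -Compress
}
```

In the runbook, `-Body (New-BackupAlertPayload -Jobs $failedJobs)` could then replace the inline string building. A plain `text` field suits Slack and legacy Teams incoming webhooks; Teams webhooks created through Workflows expect an Adaptive Card payload instead, so check which format your channel uses.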

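Runbook 4: Clean up retained backup data

Azure Backup removes recovery points automatically once they pass the retention set by their backup policy, and on-demand recovery points expire on the date set when they were taken. Individual Azure VM recovery points cannot be deleted on demand. What does accumulate is backup data for items where protection was stopped with the option to retain backup data, typically VMs that have since been deleted or decommissioned; depending on the option chosen, that data can be kept, and billed, indefinitely. A periodic runbook that reports these items, and deletes their data only when explicitly told to, keeps storage costs under control:

param(
    [int]$StaleDays = 90,   # only report items whose latest recovery point is older than this
    [switch]$RemoveData     # off by default, so the runbook only reports
)

Connect-AzAccount -Identity | Out-Null

$cutoff = (Get-Date).ToUniversalTime().AddDays(-$StaleDays)
$vaults = Get-AzRecoveryServicesVault

foreach ($vault in $vaults) {
    # Items whose protection was stopped but whose backup data was retained
    $staleItems = Get-AzRecoveryServicesBackupItem `
        -BackupManagementType AzureVM `
        -WorkloadType AzureVM `
        -VaultId $vault.ID |
        Where-Object { $_.ProtectionState -eq 'ProtectionStopped' -and $_.LatestRecoveryPoint -and $_.LatestRecoveryPoint -lt $cutoff }

    foreach ($item in $staleItems) {
        Write-Output "[$($vault.Name)] $($item.Name): protection stopped, latest recovery point $($item.LatestRecoveryPoint)"

        if ($RemoveData) {
            # Deletes every recovery point for the item. With soft delete enabled,
            # the data stays recoverable for a retention period (14 days by default).
            Disable-AzRecoveryServicesBackupProtection -Item $item -RemoveRecoveryPoints -VaultId $vault.ID -Force
        }
    }
}

Run this monthly in report-only mode, review the output, then run it with -RemoveData once the listed items are confirmed as no longer needed. Set $StaleDays to match how long you must keep backup data for decommissioned workloads.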

Testing runbooks safely

Never test runbooks that modify backup configuration against production vaults. Use a dedicated test Recovery Services vault with non-production VMs to validate runbook logic. Use the WhatIf parameter where supported, and add Write-Output statements before any destructive operations during testing.
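PowerShell's built-in ShouldProcess support can give your own runbook functions -WhatIf behaviour, not just the cmdlets that already have it. A minimal sketch; Invoke-BackupChange and its parameters are hypothetical names, and the Action script block stands in for whichever Az call makes the change.

```powershell
# Sketch: wrap a destructive action so that -WhatIf reports it instead of running it.
# Hypothetical helper, not part of any Az module.
function Invoke-BackupChange {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)] [string]$Target,        # for example a VM or backup item name
        [Parameter(Mandatory)] [string]$Description,   # what the change does, used in the log line
        [Parameter(Mandatory)] [scriptblock]$Action    # the call that actually modifies Azure
    )

    if ($PSCmdlet.ShouldProcess($Target, $Description)) {
        Write-Output "Running: $Description on $Target"
        & $Action
    }
    else {
        # Reached when -WhatIf is set (or a confirmation prompt is declined)
        Write-Output "Dry run: would $Description on $Target"
    }
}
```

A runbook can then expose a `[switch]$DryRun` parameter and pass `-WhatIf:$DryRun` to each call, so the same code runs in report-only mode against the test vault before it is allowed to change anything.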

Once a runbook is validated, deploy it to production with the minimum required permissions on the managed identity. Do not grant Owner or Contributor at subscription scope to an Automation Account's identity; grant only the specific roles the runbook needs.
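To catch privilege creep later, the identity's role assignments can be checked against this rule. A minimal sketch; the function name Find-OverprivilegedAssignment is hypothetical, and it relies only on the RoleDefinitionName and Scope properties that Get-AzRoleAssignment returns, so it works on test data as well as live assignments.

```powershell
# Sketch: flag Owner or Contributor assignments at subscription scope or above.
# Hypothetical helper; accepts any objects with RoleDefinitionName and Scope properties.
function Find-OverprivilegedAssignment {
    param([object[]]$Assignments)

    $broadRoles = 'Owner', 'Contributor'

    foreach ($a in $Assignments) {
        $scope = $a.Scope
        # "/subscriptions/<id>" is subscription scope; "/" and management groups are broader
        $isSubscription = $scope -match '^/subscriptions/[^/]+/?$'
        $isAboveSubscription = ($scope -eq '/') -or ($scope -like '/providers/Microsoft.Management/managementGroups/*')

        if (($a.RoleDefinitionName -in $broadRoles) -and ($isSubscription -or $isAboveSubscription)) {
            $a
        }
    }
}
```

Against a live identity this could run as `Find-OverprivilegedAssignment -Assignments (Get-AzRoleAssignment -ObjectId $principalId)`, where `$principalId` is the managed identity's object ID; any output means the identity holds more than the runbooks need.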

Where Critical Cloud comes in

Backup automation is part of operating Azure at scale: without it, new VMs miss backup registration, maintenance windows risk data loss, and backup failures go unnoticed until a recovery is needed. We build and operate Azure Automation runbooks for backup management as part of our managed service for regulated and technology-led businesses. See how Critical Support works.