AWS Database Migration: Moving from On-Premise to RDS

Database migrations fail for two reasons: teams underestimate the complexity of the data, or they underestimate the complexity of the cutover. The technical mechanics of moving data are well-solved. The hard parts are schema compatibility, application coupling, and the window where both environments need to stay in sync.

This guide covers the full process of migrating an on-premise relational database to Amazon RDS, from initial assessment through cutover to post-migration validation.

What You Are Actually Moving

A database migration is not just a data copy. You are moving:

Schema: Tables, views, stored procedures, functions, triggers, sequences
Data: Current rows, including any that change during the migration
Dependencies: Application connection strings, database users and permissions, jobs and schedulers that run against the database
Constraints: Foreign keys, check constraints, custom types
Indexes: Primary keys, unique constraints, performance indexes

Each of these can introduce incompatibilities when moving between database engines or engine versions. A migration from on-premise MySQL 5.7 to RDS for MySQL 8.0 is not a like-for-like move. Neither is a migration from Oracle to RDS for PostgreSQL.

Before you start, classify your migration:

Homogeneous: Same engine (MySQL to MySQL, PostgreSQL to PostgreSQL). Typically simpler. Most schema and data constructs transfer directly.
Heterogeneous: Different engines (Oracle to PostgreSQL, SQL Server to MySQL). Requires schema conversion and application query changes. Significantly more complex.

AWS Tools for Database Migration

AWS Database Migration Service (DMS)

DMS is the core tool for migrating data to RDS. It supports both full-load (initial data copy) and change data capture (CDC, ongoing replication of changes while the source database remains live).

DMS runs as a replication instance in your AWS account. You configure source and target endpoints, and a replication task that defines what to copy and how.

For a migration with minimal downtime, the standard pattern is:

Full load: DMS copies all existing data from source to target
CDC: DMS continuously replicates ongoing changes from source to target
Cutover: Once replication lag is near zero, switch application connections to RDS and stop DMS

This approach keeps your source database fully operational during the migration. Application downtime is limited to the cutover window, typically minutes rather than hours.

AWS Schema Conversion Tool (SCT)

For heterogeneous migrations, SCT converts the source database schema (tables, views, stored procedures) to the target engine's syntax. It generates a report showing which objects it can convert automatically and which require manual intervention.

SCT is a free download from AWS. Run it early in the project. The conversion report is your complexity estimate: a schema that converts at 95% automated is a very different project from one that converts at 60%.

Stored procedures and functions are typically where manual conversion effort concentrates. SQL procedural code (PL/SQL, T-SQL) does not map cleanly across engines. Budget accordingly.

Pre-Migration: What to Do Before DMS Runs

Assess the source database

Document the schema fully: tables, row counts, estimated data size, dependent objects
Identify the highest-traffic tables and the tables with the most write activity (these need the tightest CDC monitoring)
List all database users, their privileges, and whether they need to exist on RDS (many do not; application users do, admin users often do not)
List all scheduled jobs, maintenance tasks, and any external systems that write directly to the database
Check for unsupported features in RDS: some source database features have no equivalent in managed RDS (e.g., Oracle RAC, SQL Server Reporting Services hosted on the database server)

Prepare the target RDS instance

Choose instance class based on production load, not source hardware (managed services often need fewer resources because of better I/O and connection handling)
Enable Multi-AZ for production migrations
Enable encryption at rest using KMS
Configure the parameter group before migration: settings like maxconnections, innodbbufferpoolsize (for MySQL), or shared_buffers (for PostgreSQL) affect performance and should be tuned before data arrives
Configure backup retention and maintenance windows
Set up security groups: the DMS replication instance needs inbound access to the source, and access to the target RDS instance

Network connectivity

DMS needs a network path from the source database to the target RDS instance. Options:

Direct Connect or VPN: For on-premise sources, the DMS replication instance connects over your existing Direct Connect or Site-to-Site VPN
Public endpoint: DMS can connect to on-premise databases over the internet if the source has a public IP and appropriate firewall rules. Not recommended for sensitive data.
AWS SCT Data Extraction Agents: For very large databases or constrained connectivity, SCT extraction agents run on-premise and push data to AWS storage, from where DMS reads it

Running the Migration

Step 1: Create the DMS replication instance

The replication instance is an EC2 instance managed by DMS that runs the migration engine. Size it based on your data volume and CDC requirements:

For databases under 1 TB with moderate write activity: dms.r5.large
For larger or higher-throughput databases: dms.r5.xlarge or larger
Enable Multi-AZ on the replication instance for production migrations to prevent a replication instance failure stopping the migration

Step 2: Configure source and target endpoints

Source endpoint: your on-premise database. DMS supports MySQL, PostgreSQL, Oracle, SQL Server, MariaDB, and others. Provide the hostname, port, username, and password. Test the connection before proceeding.

Target endpoint: your RDS instance. Same process. Ensure the RDS security group allows inbound from the DMS replication instance's subnet.

Step 3: Create and run the full-load task

Configure a replication task with:

Task type: Migrate existing data (full load only) or Migrate existing data and replicate ongoing changes (full load plus CDC)
Table mappings: specify which schemas and tables to include. Use explicit include rules rather than wildcarding everything.
LOB settings: large object columns (BLOB, CLOB, VARBINARY over a certain size) need specific LOB mode configuration. Incorrect LOB settings are a common source of full-load errors.
Task settings: enable CloudWatch logging for the task. You need the logs to diagnose failures.

Monitor the full load in the DMS console. Watch for:

Table load errors (usually schema incompatibilities or missing permissions)
LOB truncation warnings
Full load completion percentage and estimated time remaining

Large databases (hundreds of GB to TB range) take hours to days for full load. Run the initial full load during a low-traffic period to reduce CDC backlog.

Step 4: Monitor and validate CDC

Once full load completes, DMS switches automatically to CDC mode (if you configured a combined task). CDC keeps the target RDS instance up to date with ongoing changes on the source.

Key metrics to watch:

CDCLatencySource: Lag between the source database and the DMS replication instance. Should be near zero once full load is complete.
CDCLatencyTarget: Lag between the DMS replication instance and the target RDS instance. Should also be near zero.
CDCThroughputBandwidth: Data volume being replicated per second. Unusual spikes may indicate a batch job or bulk operation on the source.

Let CDC run for at least 24-48 hours before cutover. This validates that the replication is stable across a full business cycle including any overnight batch processes.

Step 5: Validate data integrity

Row count comparison alone is not sufficient validation. Validate:

Row counts match per table
Checksums or hash comparisons on high-value tables
Sample data spot-checks: select specific records by primary key on source and target and compare
Verify any tables that DMS reported errors on during full load have been manually remediated

AWS DMS provides a Data Validation feature that runs row-level comparison automatically. Enable it for production migrations. It adds some overhead to the migration but provides confidence before cutover.

The Cutover

Cutover is the window where you switch application connections from the source database to RDS. The goal is to minimise this window.

Minimum downtime cutover process:

Confirm CDC replication lag is near zero and stable
Put the application into a read-only or maintenance mode
Confirm the final batch of writes has replicated: check CDCLatencyTarget is zero
Stop the DMS replication task
Run final data validation checks
Update application connection strings to point to the RDS endpoint
Restart the application
Run smoke tests confirming the application is functioning correctly against RDS
Monitor error rates and latency for 15-30 minutes before declaring success

Do not delete the source database during this window. Keep it intact until you have validated the application is running correctly against RDS for at least 24 hours.

Post-Migration

Performance baseline

Run your application against RDS for at least a week before comparing performance to on-premise. Initial periods often show different query plans as the RDS query cache warms and statistics build.

Enable RDS Performance Insights and identify the top wait events and top SQL by load. Compare against your pre-migration baseline. Queries that were slow on-premise may need index adjustments in RDS.

Decommission the source

Once you are confident in RDS, decommission the source database. Before you do:

Take a final backup of the source and store it (S3 Glacier is appropriate for long-term retention)
Confirm all application connection strings have been updated and there are no remaining connections to the source
Update any monitoring, alerting, or job schedulers that referenced the source database hostname

Observability

RDS publishes metrics to CloudWatch: CPU, connections, free storage, read/write IOPS and latency, replica lag. Configure alarms on free storage (you do not want to hit the ceiling unexpectedly) and on connections (a connection leak is easier to diagnose early).

For query-level performance visibility, Performance Insights is the first tool to reach for. For application-to-database tracing, Datadog's database monitoring integrates with RDS to show query execution plans, latency percentiles, and slow query identification from the same dashboard as your application performance metrics.

Where Critical Cloud Comes In

Database migrations carry real risk and require careful planning, execution, and post-migration monitoring. Critical Cloud runs AWS migrations for technology-led businesses, with the infrastructure, tooling, and observability in place from the start. If you are planning a database migration to RDS and want experienced hands on the project, see how we work.