AWS Database Migration: Moving from On-Premise to RDS
Database migrations fail for two reasons: teams underestimate the complexity of the data, or they underestimate the complexity of the cutover. The technical mechanics of moving data are well-solved. The hard parts are schema compatibility, application coupling, and the window where both environments need to stay in sync.
This guide covers the full process of migrating an on-premise relational database to Amazon RDS, from initial assessment through cutover to post-migration validation.
What You Are Actually Moving
A database migration is not just a data copy. You are moving:
- Schema: Tables, views, stored procedures, functions, triggers, sequences
- Data: Current rows, including any that change during the migration
- Dependencies: Application connection strings, database users and permissions, jobs and schedulers that run against the database
- Constraints: Foreign keys, check constraints, custom types
- Indexes: Primary keys, unique constraints, performance indexes
Each of these can introduce incompatibilities when moving between database engines or engine versions. A migration from on-premise MySQL 5.7 to RDS for MySQL 8.0 is not a like-for-like move. Neither is a migration from Oracle to RDS for PostgreSQL.
Before you start, classify your migration:
- Homogeneous: Same engine (MySQL to MySQL, PostgreSQL to PostgreSQL). Typically simpler. Most schema and data constructs transfer directly.
- Heterogeneous: Different engines (Oracle to PostgreSQL, SQL Server to MySQL). Requires schema conversion and application query changes. Significantly more complex.
AWS Tools for Database Migration
AWS Database Migration Service (DMS)
DMS is the core tool for migrating data to RDS. It supports both full-load (initial data copy) and change data capture (CDC, ongoing replication of changes while the source database remains live).
DMS runs as a replication instance in your AWS account. You configure source and target endpoints, and a replication task that defines what to copy and how.
For a migration with minimal downtime, the standard pattern is:
- Full load: DMS copies all existing data from source to target
- CDC: DMS continuously replicates ongoing changes from source to target
- Cutover: Once replication lag is near zero, switch application connections to RDS and stop DMS
This approach keeps your source database fully operational during the migration. Application downtime is limited to the cutover window, typically minutes rather than hours.
AWS Schema Conversion Tool (SCT)
For heterogeneous migrations, SCT converts the source database schema (tables, views, stored procedures) to the target engine's syntax. It generates a report showing which objects it can convert automatically and which require manual intervention.
SCT is a free download from AWS. Run it early in the project. The conversion report is your complexity estimate: a schema that converts at 95% automated is a very different project from one that converts at 60%.
Stored procedures and functions are typically where manual conversion effort concentrates. SQL procedural code (PL/SQL, T-SQL) does not map cleanly across engines. Budget accordingly.
Pre-Migration: What to Do Before DMS Runs
Assess the source database
- Document the schema fully: tables, row counts, estimated data size, dependent objects
- Identify the highest-traffic tables and the tables with the most write activity (these need the tightest CDC monitoring)
- List all database users, their privileges, and whether they need to exist on RDS (many do not; application users do, admin users often do not)
- List all scheduled jobs, maintenance tasks, and any external systems that write directly to the database
- Check for unsupported features in RDS: some source database features have no equivalent in managed RDS (e.g., Oracle RAC, SQL Server Reporting Services hosted on the database server)
Prepare the target RDS instance
- Choose instance class based on production load, not source hardware (managed services often need fewer resources because of better I/O and connection handling)
- Enable Multi-AZ for production migrations
- Enable encryption at rest using KMS
- Configure the parameter group before migration: settings like
maxconnections,innodbbufferpoolsize(for MySQL), orshared_buffers(for PostgreSQL) affect performance and should be tuned before data arrives - Configure backup retention and maintenance windows
- Set up security groups: the DMS replication instance needs inbound access to the source, and access to the target RDS instance
Network connectivity
DMS needs a network path from the source database to the target RDS instance. Options:
- Direct Connect or VPN: For on-premise sources, the DMS replication instance connects over your existing Direct Connect or Site-to-Site VPN
- Public endpoint: DMS can connect to on-premise databases over the internet if the source has a public IP and appropriate firewall rules. Not recommended for sensitive data.
- AWS SCT Data Extraction Agents: For very large databases or constrained connectivity, SCT extraction agents run on-premise and push data to AWS storage, from where DMS reads it
Running the Migration
Step 1: Create the DMS replication instance
The replication instance is an EC2 instance managed by DMS that runs the migration engine. Size it based on your data volume and CDC requirements:
- For databases under 1 TB with moderate write activity:
dms.r5.large - For larger or higher-throughput databases:
dms.r5.xlargeor larger - Enable Multi-AZ on the replication instance for production migrations to prevent a replication instance failure stopping the migration
Step 2: Configure source and target endpoints
Source endpoint: your on-premise database. DMS supports MySQL, PostgreSQL, Oracle, SQL Server, MariaDB, and others. Provide the hostname, port, username, and password. Test the connection before proceeding.
Target endpoint: your RDS instance. Same process. Ensure the RDS security group allows inbound from the DMS replication instance's subnet.
Step 3: Create and run the full-load task
Configure a replication task with:
- Task type:
Migrate existing data(full load only) orMigrate existing data and replicate ongoing changes(full load plus CDC) - Table mappings: specify which schemas and tables to include. Use explicit include rules rather than wildcarding everything.
- LOB settings: large object columns (BLOB, CLOB, VARBINARY over a certain size) need specific LOB mode configuration. Incorrect LOB settings are a common source of full-load errors.
- Task settings: enable CloudWatch logging for the task. You need the logs to diagnose failures.
Monitor the full load in the DMS console. Watch for:
- Table load errors (usually schema incompatibilities or missing permissions)
- LOB truncation warnings
- Full load completion percentage and estimated time remaining
Large databases (hundreds of GB to TB range) take hours to days for full load. Run the initial full load during a low-traffic period to reduce CDC backlog.
Step 4: Monitor and validate CDC
Once full load completes, DMS switches automatically to CDC mode (if you configured a combined task). CDC keeps the target RDS instance up to date with ongoing changes on the source.
Key metrics to watch:
- CDCLatencySource: Lag between the source database and the DMS replication instance. Should be near zero once full load is complete.
- CDCLatencyTarget: Lag between the DMS replication instance and the target RDS instance. Should also be near zero.
- CDCThroughputBandwidth: Data volume being replicated per second. Unusual spikes may indicate a batch job or bulk operation on the source.
Let CDC run for at least 24-48 hours before cutover. This validates that the replication is stable across a full business cycle including any overnight batch processes.
Step 5: Validate data integrity
Row count comparison alone is not sufficient validation. Validate:
- Row counts match per table
- Checksums or hash comparisons on high-value tables
- Sample data spot-checks: select specific records by primary key on source and target and compare
- Verify any tables that DMS reported errors on during full load have been manually remediated
AWS DMS provides a Data Validation feature that runs row-level comparison automatically. Enable it for production migrations. It adds some overhead to the migration but provides confidence before cutover.
The Cutover
Cutover is the window where you switch application connections from the source database to RDS. The goal is to minimise this window.
Minimum downtime cutover process:
- Confirm CDC replication lag is near zero and stable
- Put the application into a read-only or maintenance mode
- Confirm the final batch of writes has replicated: check CDCLatencyTarget is zero
- Stop the DMS replication task
- Run final data validation checks
- Update application connection strings to point to the RDS endpoint
- Restart the application
- Run smoke tests confirming the application is functioning correctly against RDS
- Monitor error rates and latency for 15-30 minutes before declaring success
Do not delete the source database during this window. Keep it intact until you have validated the application is running correctly against RDS for at least 24 hours.
Post-Migration
Performance baseline
Run your application against RDS for at least a week before comparing performance to on-premise. Initial periods often show different query plans as the RDS query cache warms and statistics build.
Enable RDS Performance Insights and identify the top wait events and top SQL by load. Compare against your pre-migration baseline. Queries that were slow on-premise may need index adjustments in RDS.
Decommission the source
Once you are confident in RDS, decommission the source database. Before you do:
- Take a final backup of the source and store it (S3 Glacier is appropriate for long-term retention)
- Confirm all application connection strings have been updated and there are no remaining connections to the source
- Update any monitoring, alerting, or job schedulers that referenced the source database hostname
Observability
RDS publishes metrics to CloudWatch: CPU, connections, free storage, read/write IOPS and latency, replica lag. Configure alarms on free storage (you do not want to hit the ceiling unexpectedly) and on connections (a connection leak is easier to diagnose early).
For query-level performance visibility, Performance Insights is the first tool to reach for. For application-to-database tracing, Datadog's database monitoring integrates with RDS to show query execution plans, latency percentiles, and slow query identification from the same dashboard as your application performance metrics.
Where Critical Cloud Comes In
Database migrations carry real risk and require careful planning, execution, and post-migration monitoring. Critical Cloud runs AWS migrations for technology-led businesses, with the infrastructure, tooling, and observability in place from the start. If you are planning a database migration to RDS and want experienced hands on the project, see how we work.