Why Data Migration Testing Matters
Data migration moves records from one system to another — often from a legacy database to a modern platform, between cloud providers, or during a major application rewrite. These are high-risk operations because:
- Data loss during migration can be catastrophic and sometimes irreversible.
- Schema differences between old and new systems require complex transformations.
- Downtime during migration directly impacts business operations.
- Migrated data may behave differently in the new system.
Migration Types
| Type | Description | Risk Level |
|---|---|---|
| Big bang | All data migrated at once during a maintenance window | High — long downtime, all-or-nothing |
| Trickle | Data migrated incrementally over time | Medium — complex sync, but lower downtime |
| Parallel run | Both systems run simultaneously with synchronized data | Low — but expensive and complex |
| Blue-green | New system prepared completely, then traffic switched | Medium — instant rollback possible |
The Migration Testing Process
Phase 1: Pre-Migration Analysis
Before writing any tests, analyze the migration:
Source schema → Mapping document → Target schema
Create a mapping document:
| Source Table | Source Column | Transform | Target Table | Target Column |
|---|---|---|---|---|
| old_users | username | LOWER() | users | email_prefix |
| old_users | full_name | SPLIT | users | first_name, last_name |
| old_orders | order_amt | * exchange_rate | orders | amount_usd |
| old_orders | status_code | Map: 1→pending, 2→completed | orders | status |
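Before the transforms in this table go into an ETL job, they can be prototyped as small pure functions and unit-tested against edge cases. A minimal Python sketch — the `LOWER()`, split, exchange-rate, and status-map rules follow the table above, but the exact split logic is an assumption:

```python
# Prototype the transforms from the mapping document so they can be
# unit-tested before the real migration job runs.
STATUS_MAP = {1: "pending", 2: "completed"}  # per the mapping table

def email_prefix(username):
    # old_users.username -> users.email_prefix via LOWER()
    return username.lower()

def split_full_name(full_name):
    # old_users.full_name -> first_name, last_name.
    # Assumption: the last whitespace-separated token is the last name;
    # real data needs rules for suffixes, hyphens, and single tokens.
    parts = full_name.strip().split()
    if len(parts) < 2:
        return (full_name.strip(), "")
    return (" ".join(parts[:-1]), parts[-1])

def to_usd(order_amt, exchange_rate):
    # old_orders.order_amt * exchange_rate -> orders.amount_usd
    return round(order_amt * exchange_rate, 2)

def map_status(status_code):
    # old_orders.status_code -> orders.status; fail loudly on unknown codes
    return STATUS_MAP[status_code]
```

Keeping each transform as a standalone function means the Phase 2 edge-case records can be run through them long before the full migration is rehearsed.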
Phase 2: Test Data Preparation
- Identify edge cases in source data: NULL values, special characters, maximum length strings, dates at epoch boundaries, monetary precision issues.
- Sample representative data: Extract a meaningful subset that covers all data patterns.
- Create known-good test records: Records with pre-calculated expected results for verification.
Phase 3: Migration Execution Tests
Record count verification:
```sql
-- Before migration
SELECT COUNT(*) FROM legacy_db.users;       -- 1,250,000
SELECT COUNT(*) FROM legacy_db.orders;      -- 5,430,000
SELECT COUNT(*) FROM legacy_db.order_items; -- 18,900,000

-- After migration
SELECT COUNT(*) FROM new_db.users;       -- Expected: 1,250,000
SELECT COUNT(*) FROM new_db.orders;      -- Expected: 5,430,000
SELECT COUNT(*) FROM new_db.order_items; -- Expected: 18,900,000
```
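Count checks like these are usually scripted so they run after every migration rehearsal, not just the final cutover. A hedged sketch using `sqlite3` as a stand-in for the real source and target connections:

```python
import sqlite3

def compare_counts(src, dst, tables):
    # Compare row counts table-by-table; an empty result means all match.
    mismatches = {}
    for table in tables:
        n_src = src.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        n_dst = dst.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        if n_src != n_dst:
            mismatches[table] = (n_src, n_dst)
    return mismatches

# Demo with in-memory databases standing in for legacy_db and new_db.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute("CREATE TABLE users (id INTEGER)")
src.executemany("INSERT INTO users VALUES (?)", [(1,), (2,)])
dst.execute("INSERT INTO users VALUES (1)")  # simulate one lost row
print(compare_counts(src, dst, ["users"]))  # {'users': (2, 1)}
```

In a real setup the two connections would point at the legacy and new databases, and the table list would come from the mapping document.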
Aggregate value verification:
```sql
-- Financial totals must match
SELECT SUM(order_total) FROM legacy_db.orders;
SELECT SUM(amount_usd) FROM new_db.orders;
-- Difference should be zero (or within acceptable rounding tolerance)
```
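Because per-row currency conversion rounds to cents, the two grand totals can legitimately differ by a small amount, so the comparison should use exact decimals and an explicit tolerance. A sketch — the 1-cent tolerance is an assumption to agree on with finance:

```python
from decimal import Decimal

def totals_match(legacy_total, migrated_total, tolerance=Decimal("0.01")):
    # Compare SUM() results as exact decimals, never as floats.
    diff = abs(Decimal(str(legacy_total)) - Decimal(str(migrated_total)))
    return diff <= tolerance

print(totals_match("5250000.00", "5250000.00"))  # True
print(totals_match("5250000.00", "5250000.05"))  # False at a 1-cent tolerance
```

A per-row rounding error bound (row count × half a cent) is a sounder tolerance than a fixed constant on very large tables.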
Sample record verification:
```sql
-- Check specific known records
SELECT * FROM new_db.users WHERE legacy_id = 12345;
-- Verify each field matches expected transformation
```
Phase 4: Data Quality Validation
Referential integrity:
```sql
-- Every order references a valid user
SELECT COUNT(*) FROM new_db.orders o
LEFT JOIN new_db.users u ON o.user_id = u.id
WHERE u.id IS NULL;
-- Expected: 0
```
No orphaned records:
```sql
-- Every order item references a valid order
SELECT COUNT(*) FROM new_db.order_items oi
LEFT JOIN new_db.orders o ON oi.order_id = o.id
WHERE o.id IS NULL;
-- Expected: 0
```
Data type correctness:
```sql
-- Dates migrated correctly (not shifted by timezone)
SELECT legacy_id, legacy_created_at, new_created_at
FROM migration_audit
WHERE ABS(EXTRACT(EPOCH FROM legacy_created_at - new_created_at)) > 1;
-- Expected: 0 rows (no date shifts)
```
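Timezone shifts usually come from parsing naive legacy strings in the wrong zone, which is why the audit query compares epoch instants rather than wall-clock values. A small illustrative Python example of the failure mode (the dates and offset are made up):

```python
from datetime import datetime, timezone, timedelta

# Correctly migrated value: noon UTC.
legacy = datetime(2024, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
# Bug: the same wall-clock time stored as UTC-5 instead of UTC.
migrated = datetime(2024, 3, 1, 12, 0, 0,
                    tzinfo=timezone(timedelta(hours=-5)))

# Comparing epoch instants exposes the shift that string comparison hides.
shift = abs((legacy - migrated).total_seconds())
print(shift)  # 18000.0 -> a five-hour shift the audit query would flag
```

String comparison of the two timestamps would pass here; only the epoch comparison catches the bug.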
Phase 5: Application Validation
After data migration, test the application against the migrated data:
- Can users log in with their existing credentials?
- Do existing orders display correctly?
- Can users perform new operations (create orders, update profiles)?
- Do reports generate correct numbers?
- Do search and filter functions work with migrated data?
Rollback Strategy Testing
Every migration needs a tested rollback plan:
- Backup verification: Restore the backup to a test environment. Verify all data is intact.
- Rollback script: If using scripts, test that they correctly reverse the migration.
- Point of no return: Identify when rollback is no longer feasible (e.g., after new transactions occur in the new system).
- Rollback time: Measure how long rollback takes. Can the business tolerate that downtime?
Performance Testing
Migrations on large datasets can take hours. Test with production-scale data:
- Measure total migration time with realistic data volume.
- Identify bottlenecks (slow queries, network bandwidth, disk I/O).
- Test parallel migration (multiple tables or partitions simultaneously).
- Plan for the maintenance window based on measured time + safety margin.
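The parallel-run and timing steps above can be prototyped before touching real infrastructure. A hedged Python sketch where `time.sleep` stands in for the per-table work and the 2x safety margin is an assumption:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def migrate_table(name):
    # Stand-in for a real per-table copy/transform job; a real migration
    # would stream rows in batches and report progress.
    start = time.perf_counter()
    time.sleep(0.1)  # placeholder for the actual work
    return name, time.perf_counter() - start

tables = ["users", "orders", "order_items"]
with ThreadPoolExecutor(max_workers=3) as pool:
    timings = dict(pool.map(migrate_table, tables))

# The slowest table bounds wall-clock time; size the maintenance window
# from measured time plus a safety margin (2x here is an assumption).
window_estimate = max(timings.values()) * 2
```

Tables only parallelize safely when they are independent; parent tables with foreign keys (`users` before `orders`) must still finish before their children start.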
Exercise: Data Migration Testing Plan
Scenario
Your company is migrating from a legacy PostgreSQL database to a new system. The legacy database has:
- 2 million user records
- 8 million order records
- 25 million order item records
The new system has a different schema with renamed columns, split fields, and data type changes.
Task 1: Create a Mapping Document
Given the legacy schema:
```sql
-- Legacy schema
CREATE TABLE legacy_users (
    user_id     INTEGER,
    username    VARCHAR(50),
    full_name   VARCHAR(200),
    email_addr  VARCHAR(255),
    acct_status CHAR(1),     -- A=active, I=inactive, S=suspended
    reg_date    VARCHAR(10), -- 'MM/DD/YYYY' format
    last_login  VARCHAR(19)  -- 'MM/DD/YYYY HH:MI:SS' format
);

CREATE TABLE legacy_orders (
    order_num    VARCHAR(20),
    cust_id      INTEGER,
    order_amt    DECIMAL(10,2),
    currency     CHAR(3),
    order_status INTEGER,    -- 1=new, 2=processing, 3=shipped, 4=delivered, 5=cancelled
    order_dt     VARCHAR(10)
);
```
Create a complete mapping document to this target schema:
```sql
-- Target schema
CREATE TABLE users (
    id            SERIAL PRIMARY KEY,
    legacy_id     INTEGER UNIQUE,
    first_name    VARCHAR(100),
    last_name     VARCHAR(100),
    email         VARCHAR(255) UNIQUE,
    status        VARCHAR(20),
    registered_at TIMESTAMP WITH TIME ZONE,
    last_login_at TIMESTAMP WITH TIME ZONE
);

CREATE TABLE orders (
    id                  SERIAL PRIMARY KEY,
    legacy_order_number VARCHAR(20) UNIQUE,
    user_id             INTEGER REFERENCES users(id),
    amount_usd          DECIMAL(12,2),
    original_currency   VARCHAR(3),
    original_amount     DECIMAL(12,2),
    status              VARCHAR(20),
    created_at          TIMESTAMP WITH TIME ZONE
);
```
Task 2: Write Verification Queries
Write SQL queries to verify:
- All legacy users exist in the new system (count and sample comparison).
- Names were split correctly (test with names containing multiple spaces, hyphens, suffixes).
- Status codes were mapped correctly (A→active, I→inactive, S→suspended).
- Dates were converted from string to timestamp correctly (including timezone handling).
- Order amounts were converted to USD using the correct exchange rate.
- All foreign key relationships are intact.
Task 3: Edge Case Test Data
Create test records that cover migration edge cases:
- User with NULL full_name.
- User with name “Mary Jane Watson-Parker III” (complex name splitting).
- User with non-ASCII characters in name.
- Order with 0.00 amount.
- Order dated “02/29/2024” (leap year).
- Order with currency code that has no exchange rate.
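Several of these cases reduce to date parsing, which is worth testing in isolation; the legacy `reg_date` and `order_dt` columns are 'MM/DD/YYYY' strings per the schema above. A small sketch of the leap-year case:

```python
from datetime import datetime

def parse_legacy_date(text):
    # Legacy dates are 'MM/DD/YYYY' strings; invalid dates must fail
    # loudly rather than silently shifting to a nearby valid date.
    return datetime.strptime(text, "%m/%d/%Y")

print(parse_legacy_date("02/29/2024").date())  # 2024-02-29: leap year, valid
try:
    parse_legacy_date("02/29/2023")            # 2023 is not a leap year
except ValueError:
    print("rejected: 02/29/2023")
```

The same pattern — a known input, an expected output or expected error — applies to each edge case in the list.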
Task 4: Rollback Plan
Document a rollback plan:
- What is backed up before migration?
- How is the backup restored?
- What is the point of no return?
- How long does rollback take?
- What is the communication plan if rollback is needed?
Deliverables
- Complete field-by-field mapping document.
- Verification SQL queries with expected results.
- Edge case test data with expected migration outcomes.
- Rollback plan document.