Why Data Migration Testing Matters

Data migration moves data from one system to another — often from a legacy database to a modern platform, between cloud providers, or during major application rewrites. These are high-risk operations because:

  • Data loss during migration can be catastrophic and sometimes irreversible.
  • Schema differences between old and new systems require complex transformations.
  • Downtime during migration directly impacts business operations.
  • Migrated data may behave differently in the new system.

Migration Types

| Type         | Description                                           | Risk Level                               |
|--------------|-------------------------------------------------------|------------------------------------------|
| Big bang     | All data migrated at once during a maintenance window | High — long downtime, all-or-nothing     |
| Trickle      | Data migrated incrementally over time                 | Medium — complex sync, but lower downtime |
| Parallel run | Both systems run simultaneously with synchronized data | Low — but expensive and complex          |
| Blue-green   | New system prepared completely, then traffic switched | Medium — instant rollback possible       |

The Migration Testing Process

Phase 1: Pre-Migration Analysis

Before writing any tests, analyze the migration:

Source schema → Mapping document → Target schema

Create a mapping document:

| Source Table | Source Column | Transform                    | Target Table | Target Column          |
|--------------|---------------|------------------------------|--------------|------------------------|
| old_users    | username      | LOWER()                      | users        | email_prefix           |
| old_users    | full_name     | SPLIT                        | users        | first_name, last_name  |
| old_orders   | order_amt     | * exchange_rate              | orders       | amount_usd             |
| old_orders   | status_code   | Map: 1→pending, 2→completed  | orders       | status                 |
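The transforms in a mapping document are easiest to verify when expressed as small, pure functions that can be unit-tested before the full migration runs. A minimal Python sketch of the transforms above (function names and the naive name split are illustrative assumptions):

```python
# Sketch: mapping-document transforms as pure, testable functions.
# Function names and the first-space split policy are assumptions.

def lower_email_prefix(username):
    """LOWER() transform: old_users.username -> users.email_prefix."""
    return username.lower()

def split_full_name(full_name):
    """SPLIT transform: old_users.full_name -> (first_name, last_name).
    Naive split on the first space; real data needs edge-case handling."""
    first, _, last = full_name.partition(" ")
    return first, last

STATUS_MAP = {1: "pending", 2: "completed"}  # from the mapping document

def map_status(status_code):
    """Map transform: old_orders.status_code -> orders.status."""
    return STATUS_MAP[status_code]  # raises KeyError on unmapped codes

def convert_amount(order_amt, exchange_rate):
    """* exchange_rate transform: old_orders.order_amt -> orders.amount_usd."""
    return round(order_amt * exchange_rate, 2)
```

Keeping each transform pure means the mapping document can be validated row by row against known inputs, independently of the migration pipeline itself.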

Phase 2: Test Data Preparation

  1. Identify edge cases in source data: NULL values, special characters, maximum length strings, dates at epoch boundaries, monetary precision issues.
  2. Sample representative data: Extract a meaningful subset that covers all data patterns.
  3. Create known-good test records: Records with pre-calculated expected results for verification.
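The known-good records in step 3 can be captured as a small fixture set that pairs source data with its pre-calculated expected result. A sketch (values are illustrative; the expected values assume a naive first-space name split and UTC date handling, which must match the real transforms):

```python
# Sketch: known-good test records covering common migration edge cases.
# Expected values are assumptions tied to a naive first-space split and
# UTC timestamps; adjust them to the real transforms in the mapping.
EDGE_CASE_RECORDS = [
    {"case": "null name", "source": {"full_name": None},
     "expected": {"first_name": None, "last_name": None}},
    {"case": "special characters", "source": {"full_name": "José O'Brien"},
     "expected": {"first_name": "José", "last_name": "O'Brien"}},
    {"case": "max-length string", "source": {"full_name": "A" * 200},
     "expected": {"first_name": "A" * 200, "last_name": ""}},
    {"case": "epoch boundary", "source": {"reg_date": "01/01/1970"},
     "expected": {"registered_at": "1970-01-01T00:00:00+00:00"}},
    {"case": "monetary precision", "source": {"order_amt": "0.10"},
     "expected": {"amount_usd": "0.10"}},
]
```

After migration, each record is looked up in the target system and compared field by field against its `expected` block.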

Phase 3: Migration Execution Tests

Record count verification:

-- Before migration
SELECT COUNT(*) FROM legacy_db.users;          -- 1,250,000
SELECT COUNT(*) FROM legacy_db.orders;         -- 5,430,000
SELECT COUNT(*) FROM legacy_db.order_items;    -- 18,900,000

-- After migration
SELECT COUNT(*) FROM new_db.users;             -- Expected: 1,250,000
SELECT COUNT(*) FROM new_db.orders;            -- Expected: 5,430,000
SELECT COUNT(*) FROM new_db.order_items;       -- Expected: 18,900,000
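Count checks like these are worth automating so they can run after every migration rehearsal. A sketch using `sqlite3` as a stand-in for the two databases (in practice `legacy` and `new` would be connections to the real source and target):

```python
import sqlite3

# Sketch: automated record-count verification across two databases.
# sqlite3 in-memory databases stand in for legacy_db and new_db here.
def table_counts(conn, tables):
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in tables}

legacy = sqlite3.connect(":memory:")
new = sqlite3.connect(":memory:")
for db in (legacy, new):
    db.execute("CREATE TABLE users (id INTEGER)")
    db.execute("CREATE TABLE orders (id INTEGER)")
legacy.executemany("INSERT INTO users VALUES (?)", [(i,) for i in range(3)])
new.executemany("INSERT INTO users VALUES (?)", [(i,) for i in range(3)])

tables = ("users", "orders")
mismatches = {t for t in tables
              if table_counts(legacy, [t]) != table_counts(new, [t])}
print(mismatches)  # an empty set means every count matches
```

The same loop extends naturally to per-partition counts, which localize a discrepancy much faster than one table-wide total.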

Aggregate value verification:

-- Financial totals must match
SELECT SUM(order_total) FROM legacy_db.orders;
SELECT SUM(amount_usd) FROM new_db.orders;
-- Difference should be zero (or within acceptable rounding tolerance)
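"Acceptable rounding tolerance" should be an explicit, agreed number, not a judgment call at verification time. A sketch using `Decimal` so the comparison itself introduces no floating-point error (the half-cent-per-row tolerance is an illustrative policy, not a standard):

```python
from decimal import Decimal

# Sketch: compare financial totals under an explicit rounding tolerance.
# per_row_tolerance of half a cent per order is an assumed policy.
def totals_match(legacy_total, new_total, order_count,
                 per_row_tolerance=Decimal("0.005")):
    allowed = per_row_tolerance * order_count
    return abs(Decimal(legacy_total) - Decimal(new_total)) <= allowed

print(totals_match("1000.00", "1000.02", 10))  # diff 0.02 vs allowed 0.05
```

Scaling the tolerance by row count matters: a fixed absolute tolerance that is generous for a thousand orders can hide real errors across millions.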

Sample record verification:

-- Check specific known records
SELECT * FROM new_db.users WHERE legacy_id = 12345;
-- Verify each field matches expected transformation

Phase 4: Data Quality Validation

Referential integrity:

-- Every order references a valid user
SELECT COUNT(*) FROM new_db.orders o
LEFT JOIN new_db.users u ON o.user_id = u.id
WHERE u.id IS NULL;
-- Expected: 0

No orphaned records:

-- Every order item references a valid order
SELECT COUNT(*) FROM new_db.order_items oi
LEFT JOIN new_db.orders o ON oi.order_id = o.id
WHERE o.id IS NULL;
-- Expected: 0

Data type correctness:

-- Dates migrated correctly (not shifted by timezone)
SELECT legacy_id, legacy_created_at, new_created_at
FROM migration_audit
WHERE ABS(EXTRACT(EPOCH FROM legacy_created_at - new_created_at)) > 1;
-- Expected: 0 rows (no date shifts)
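The shifts this audit catches typically come from parsing a naive timestamp string under the wrong timezone. A small demonstration of how the same wall-clock value lands on different epochs (the UTC-5 zone is illustrative):

```python
from datetime import datetime, timezone, timedelta

# Sketch: the same wall-clock string interpreted as UTC vs. UTC-5
# produces epochs that differ by the full zone offset — the kind of
# systematic shift a migration audit needs to detect.
raw = "2024-03-15 10:30:00"
fmt = "%Y-%m-%d %H:%M:%S"
as_utc = datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)
as_utc_minus_5 = datetime.strptime(raw, fmt).replace(
    tzinfo=timezone(timedelta(hours=-5)))
shift = abs(as_utc.timestamp() - as_utc_minus_5.timestamp())
print(shift)  # 18000.0 — a five-hour shift, in seconds
```

Because the error is systematic, even a handful of sampled rows will expose it, but the audit query above confirms it holds for every record.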

Phase 5: Application Validation

After data migration, test the application against the migrated data:

  1. Can users log in with their existing credentials?
  2. Do existing orders display correctly?
  3. Can users perform new operations (create orders, update profiles)?
  4. Do reports generate correct numbers?
  5. Do search and filter functions work with migrated data?

Rollback Strategy Testing

Every migration needs a tested rollback plan:

  1. Backup verification: Restore the backup to a test environment. Verify all data is intact.
  2. Rollback script: If using scripts, test that they correctly reverse the migration.
  3. Point of no return: Identify when rollback is no longer feasible (e.g., after new transactions occur in the new system).
  4. Rollback time: Measure how long rollback takes. Can the business tolerate that downtime?

Performance Testing

Migrations on large datasets can take hours. Test with production-scale data:

  1. Measure total migration time with realistic data volume.
  2. Identify bottlenecks (slow queries, network bandwidth, disk I/O).
  3. Test parallel migration (multiple tables or partitions simultaneously).
  4. Plan for the maintenance window based on measured time + safety margin.
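Parallel migration (step 3) usually means splitting each large table into id-range partitions that independent workers process concurrently. A sketch of the partitioning and fan-out, using order-item volumes similar to the exercise below (`migrate_partition` is a placeholder for the real per-range copy job, and the chunk size is an assumption):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: split an id range into partitions and migrate them in parallel.
def partitions(min_id, max_id, chunk_size):
    """Yield inclusive (start, end) id ranges covering [min_id, max_id]."""
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield (start, end)
        start = end + 1

def migrate_partition(id_range):
    start, end = id_range
    return end - start + 1  # placeholder: report rows "migrated"

ranges = list(partitions(1, 25_000_000, 1_000_000))
with ThreadPoolExecutor(max_workers=4) as pool:
    migrated = sum(pool.map(migrate_partition, ranges))
print(len(ranges), migrated)  # 25 partitions covering 25,000,000 rows
```

Timing each partition separately also pinpoints bottlenecks: one slow range often indicates skewed data rather than a uniformly slow pipeline.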

Exercise: Data Migration Testing Plan

Scenario

Your company is migrating from a legacy PostgreSQL database to a new system. The legacy database has:

  • 2 million user records
  • 8 million order records
  • 25 million order item records

The new system has a different schema with renamed columns, split fields, and data type changes.

Task 1: Create a Mapping Document

Given the legacy schema:

-- Legacy schema
CREATE TABLE legacy_users (
    user_id INTEGER,
    username VARCHAR(50),
    full_name VARCHAR(200),
    email_addr VARCHAR(255),
    acct_status CHAR(1), -- A=active, I=inactive, S=suspended
    reg_date VARCHAR(10), -- 'MM/DD/YYYY' format
    last_login VARCHAR(19) -- 'MM/DD/YYYY HH:MI:SS' format
);

CREATE TABLE legacy_orders (
    order_num VARCHAR(20),
    cust_id INTEGER,
    order_amt DECIMAL(10,2),
    currency CHAR(3),
    order_status INTEGER, -- 1=new, 2=processing, 3=shipped, 4=delivered, 5=cancelled
    order_dt VARCHAR(10)
);
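One hint for the VARCHAR date columns: convert them with an explicit format string rather than an implicit cast, since 'MM/DD/YYYY' is ambiguous to many parsers. A sketch in Python (treating legacy values as UTC is an assumption that must be confirmed and recorded in the mapping document):

```python
from datetime import datetime, timezone

# Sketch: convert legacy 'MM/DD/YYYY' strings to timezone-aware
# timestamps. The UTC assumption here is illustrative only — the
# legacy system's actual timezone must be verified.
def parse_reg_date(value):
    return datetime.strptime(value, "%m/%d/%Y").replace(tzinfo=timezone.utc)

print(parse_reg_date("02/29/2024"))  # leap day parses cleanly
```

An explicit format also makes invalid legacy values (e.g. "02/29/2023") fail loudly during migration instead of silently producing a wrong date.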

Create a complete mapping document to this target schema:

-- Target schema
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    legacy_id INTEGER UNIQUE,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    email VARCHAR(255) UNIQUE,
    status VARCHAR(20),
    registered_at TIMESTAMP WITH TIME ZONE,
    last_login_at TIMESTAMP WITH TIME ZONE
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    legacy_order_number VARCHAR(20) UNIQUE,
    user_id INTEGER REFERENCES users(id),
    amount_usd DECIMAL(12,2),
    original_currency VARCHAR(3),
    original_amount DECIMAL(12,2),
    status VARCHAR(20),
    created_at TIMESTAMP WITH TIME ZONE
);

Task 2: Write Verification Queries

Write SQL queries to verify:

  1. All legacy users exist in the new system (count and sample comparison).
  2. Names were split correctly (test with names containing multiple spaces, hyphens, suffixes).
  3. Status codes were mapped correctly (A→active, I→inactive, S→suspended).
  4. Dates were converted from string to timestamp correctly (including timezone handling).
  5. Order amounts were converted to USD using the correct exchange rate.
  6. All foreign key relationships are intact.

Task 3: Edge Case Test Data

Create test records that cover migration edge cases:

  1. User with NULL full_name.
  2. User with name “Mary Jane Watson-Parker III” (complex name splitting).
  3. User with non-ASCII characters in name.
  4. Order with 0.00 amount.
  5. Order dated “02/29/2024” (leap year).
  6. Order with currency code that has no exchange rate.

Task 4: Rollback Plan

Document a rollback plan:

  1. What is backed up before migration?
  2. How is the backup restored?
  3. What is the point of no return?
  4. How long does rollback take?
  5. What is the communication plan if rollback is needed?

Deliverables

  1. Complete field-by-field mapping document.
  2. Verification SQL queries with expected results.
  3. Edge case test data with expected migration outcomes.
  4. Rollback plan document.