Monorepos have become the preferred code organization strategy at many of the world’s largest technology companies: Google, Microsoft, and Meta each manage tens of thousands of projects in a single repository, with Google’s monorepo containing over 2 billion lines of code. According to JetBrains’ 2023 developer survey, 34% of professional developers now work in monorepo environments, up from 12% in 2019, driven by the adoption of tools like Nx, Turborepo, and Bazel. Testing in a monorepo poses a distinct challenge: a change to a shared library can affect dozens of downstream applications, so intelligent affected-test detection is needed to avoid running the entire test suite on every commit. This guide covers affected-test detection, test sharding strategies, and parallel execution patterns for monorepo environments.
TL;DR: Monorepo testing requires affected-test detection (only run tests for packages affected by a change), test sharding (distribute tests across parallel workers), and build caching (skip unchanged packages). Use Nx or Turborepo for affected detection, configure test sharding in CI (GitHub Actions, CircleCI), and cache test results for unchanged packages.
Understanding Monorepo Testing Challenges
Traditional multi-repo testing strategies don’t scale to monorepos. The key challenges include:
Scale Challenges
Code Volume:
- Single repo with 50+ projects
- Millions of lines of code
- Thousands of dependencies
- Complex interdependencies
Test Suite Size:
- 10,000+ test files
- 100,000+ individual tests
- Hours of execution time
- Massive resource consumption
Change Impact:
- Single commit affects multiple projects
- Cascading test requirements
- Difficult to determine what to test
- Risk of over-testing or under-testing
Performance Challenges
Build Times:
- Full builds taking 60+ minutes
- Developers waiting hours for CI feedback
- Reduced productivity
- Context switching overhead
Resource Usage:
- Hundreds of concurrent CI jobs
- Expensive compute costs
- Network bandwidth saturation
- Storage requirements for artifacts
“In a monorepo, the biggest testing trap is running everything on every commit. The goal is zero unnecessary test runs — only test what could possibly be broken by this change, and test it thoroughly.” — Yuri Kan, Senior QA Lead
Fundamental Strategies
1. Affected Project Detection
Only test what changed:
// affected-detector.js
const { execSync } = require('child_process');
const fs = require('fs');
const path = require('path');

class AffectedDetector {
  constructor(workspaceRoot) {
    this.workspaceRoot = workspaceRoot;
    this.projectGraph = this.buildProjectGraph();
  }

  discoverPackages() {
    // Minimal discovery: every directory under packages/ with a package.json
    const packagesDir = path.join(this.workspaceRoot, 'packages');
    if (!fs.existsSync(packagesDir)) return [];
    return fs.readdirSync(packagesDir)
      .map(dir => ({ dir, manifestPath: path.join(packagesDir, dir, 'package.json') }))
      .filter(({ manifestPath }) => fs.existsSync(manifestPath))
      .map(({ dir, manifestPath }) => {
        const manifest = JSON.parse(fs.readFileSync(manifestPath, 'utf-8'));
        return { name: manifest.name, path: path.posix.join('packages', dir) };
      });
  }

  getPackageDependencies(pkg) {
    // Declared deps; external ones are ignored later because they
    // never appear as keys in the project graph
    const manifest = JSON.parse(
      fs.readFileSync(path.join(this.workspaceRoot, pkg.path, 'package.json'), 'utf-8')
    );
    return Object.keys({ ...manifest.dependencies, ...manifest.devDependencies });
  }

  buildProjectGraph() {
    // Build dependency graph of all projects
    const packages = this.discoverPackages();
    const graph = new Map();

    for (const pkg of packages) {
      const deps = this.getPackageDependencies(pkg);
      graph.set(pkg.name, {
        path: pkg.path,
        dependencies: deps,
        dependents: []
      });
    }

    // Build reverse dependencies (dependents)
    for (const [name, data] of graph.entries()) {
      for (const dep of data.dependencies) {
        if (graph.has(dep)) {
          graph.get(dep).dependents.push(name);
        }
      }
    }

    return graph;
  }

  getAffectedProjects(baseBranch = 'main') {
    // Get changed files
    const changedFiles = execSync(
      `git diff --name-only ${baseBranch}...HEAD`,
      { encoding: 'utf-8' }
    ).trim().split('\n');

    // Determine which projects are affected
    const affected = new Set();
    for (const file of changedFiles) {
      const project = this.getProjectForFile(file);
      if (project) {
        affected.add(project);
        // Add all dependent projects
        this.addDependents(project, affected);
      }
    }

    return Array.from(affected);
  }

  getProjectForFile(filePath) {
    // Find which project owns this file
    for (const [name, data] of this.projectGraph.entries()) {
      if (filePath.startsWith(data.path)) {
        return name;
      }
    }
    return null;
  }

  addDependents(projectName, affected) {
    const project = this.projectGraph.get(projectName);
    if (!project) return;

    for (const dependent of project.dependents) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        // Recursively add dependents
        this.addDependents(dependent, affected);
      }
    }
  }

  shouldRunE2ETests(affectedProjects) {
    // Run E2E if core projects are affected
    const coreProjects = ['api', 'web-app', 'auth'];
    return affectedProjects.some(p => coreProjects.includes(p));
  }
}

module.exports = { AffectedDetector };
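The recursive dependent walk is the heart of the detector. The same step can be shown in isolation with a hand-built graph (the four package names here are purely illustrative):

```javascript
// Changing one package marks all of its transitive dependents as affected.
const graph = new Map([
  ['shared',    { dependents: ['api', 'web-app'] }],
  ['api',       { dependents: ['e2e-suite'] }],
  ['web-app',   { dependents: [] }],
  ['e2e-suite', { dependents: [] }],
]);

function collectAffected(changed, graph) {
  const affected = new Set([changed]);
  const queue = [changed];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const dependent of graph.get(current)?.dependents ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent); // mark once, so cycles cannot loop forever
        queue.push(dependent);
      }
    }
  }
  return [...affected];
}

console.log(collectAffected('shared', graph));
// changing 'shared' affects shared, api, web-app, and e2e-suite
console.log(collectAffected('web-app', graph));
// changing a leaf package affects only itself
```

This iterative version is equivalent to the recursive `addDependents` above; the shared `affected` set is what keeps the walk terminating even if the graph contains a cycle.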
2. Incremental Testing
Use caching to avoid retesting unchanged code:
# .github/workflows/monorepo-test.yml
name: Monorepo Tests

on: [push, pull_request]

jobs:
  detect-affected:
    runs-on: ubuntu-latest
    outputs:
      affected: ${{ steps.affected.outputs.projects }}
      matrix: ${{ steps.affected.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history for accurate diff
      - name: Detect affected projects
        id: affected
        run: |
          node scripts/detect-affected.js \
            --base=origin/${{ github.base_ref || 'main' }} \
            --output=json > affected.json
          echo "projects=$(jq -c '.projects' affected.json)" >> $GITHUB_OUTPUT
          echo "matrix=$(jq -c '.matrix' affected.json)" >> $GITHUB_OUTPUT
      - name: Upload affected list
        uses: actions/upload-artifact@v4
        with:
          name: affected-projects
          path: affected.json

  test:
    needs: detect-affected
    if: needs.detect-affected.outputs.affected != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix: ${{ fromJson(needs.detect-affected.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - name: Restore test cache
        uses: actions/cache@v4
        with:
          path: |
            node_modules
            .test-cache
          key: test-${{ matrix.project }}-${{ hashFiles(format('packages/{0}/**', matrix.project)) }}
      - name: Run tests for ${{ matrix.project }}
        run: npm run test --workspace=packages/${{ matrix.project }}
      - name: Upload results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results-${{ matrix.project }}
          path: packages/${{ matrix.project }}/test-results/
3. Smart Test Prioritization
Run critical tests first:
// test-prioritizer.ts
interface TestHistory {
  runs: number;
  failures: number;
  lastFailure: Date;
  flakiness: number;   // 0..1, fraction of runs with inconsistent outcomes
  durationP95: number; // milliseconds
}

interface TestPriority {
  name: string;
  priority: number;
  estimatedDuration: number;
  criticalPath: boolean;
}

class TestPrioritizer {
  private testHistory: Map<string, TestHistory> = new Map();

  prioritize(tests: string[]): TestPriority[] {
    return tests
      .map(test => ({
        name: test,
        priority: this.calculatePriority(test),
        estimatedDuration: this.estimateDuration(test),
        criticalPath: this.isCriticalPath(test)
      }))
      .sort((a, b) => {
        // Critical path tests first
        if (a.criticalPath !== b.criticalPath) {
          return a.criticalPath ? -1 : 1;
        }
        // Then by priority
        if (a.priority !== b.priority) {
          return b.priority - a.priority;
        }
        // Finally by estimated duration (fast tests first)
        return a.estimatedDuration - b.estimatedDuration;
      });
  }

  calculatePriority(testName: string): number {
    const history = this.testHistory.get(testName);
    if (!history) return 50;

    // Factors affecting priority:
    // 1. Failure rate (higher = higher priority)
    const failureRate = history.failures / history.runs;
    // 2. Recency of failures
    const daysSinceLastFailure = this.daysSince(history.lastFailure);
    const recencyScore = Math.max(0, 100 - daysSinceLastFailure * 2);
    // 3. Test flakiness (lower priority for flaky tests)
    const flakinessScore = history.flakiness * -50;

    return Math.min(100,
      (failureRate * 100 * 0.4) +
      (recencyScore * 0.4) +
      flakinessScore +
      20 // Base priority
    );
  }

  isCriticalPath(testName: string): boolean {
    const criticalPatterns = [
      /auth/i,
      /payment/i,
      /security/i,
      /core/i
    ];
    return criticalPatterns.some(pattern => pattern.test(testName));
  }

  estimateDuration(testName: string): number {
    const history = this.testHistory.get(testName);
    if (!history) return 5000; // Default 5 seconds

    // Use P95 duration for estimation
    return history.durationP95;
  }

  private daysSince(date: Date): number {
    const now = new Date();
    return (now.getTime() - date.getTime()) / (1000 * 60 * 60 * 24);
  }
}
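The three-level ordering is easiest to see in isolation. A plain-JavaScript version of the comparator, applied to a few hand-written entries (test names, priorities, and durations are made up for illustration):

```javascript
// Same comparator as TestPrioritizer.prioritize: critical path first,
// then descending priority, then ascending estimated duration.
const tests = [
  { name: 'utils.format', criticalPath: false, priority: 80, estimatedDuration: 2000 },
  { name: 'auth.login',   criticalPath: true,  priority: 60, estimatedDuration: 9000 },
  { name: 'cart.totals',  criticalPath: false, priority: 80, estimatedDuration: 500 },
];

tests.sort((a, b) => {
  if (a.criticalPath !== b.criticalPath) return a.criticalPath ? -1 : 1;
  if (a.priority !== b.priority) return b.priority - a.priority;
  return a.estimatedDuration - b.estimatedDuration;
});

console.log(tests.map(t => t.name));
// → ['auth.login', 'cart.totals', 'utils.format']
// auth.login wins despite lower priority (critical path); cart.totals
// beats utils.format on duration because their priorities tie.
```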
Advanced Techniques
Distributed Test Execution
Parallelize across multiple machines:
# .github/workflows/distributed-tests.yml
name: Distributed Tests

on: [push, pull_request]

jobs:
  generate-matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.matrix.outputs.value }}
    steps:
      - uses: actions/checkout@v4
      - name: Generate test matrix
        id: matrix
        run: |
          # Intelligently distribute tests across runners
          node scripts/generate-test-matrix.js \
            --runners=20 \
            --strategy=balanced \
            --output=json > matrix.json
          echo "value=$(cat matrix.json)" >> $GITHUB_OUTPUT

  test:
    needs: generate-matrix
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - name: Run test shard ${{ matrix.shard }}
        run: |
          # Each runner executes its assigned tests
          npm run test:shard -- \
            --shard=${{ matrix.shard }} \
            --total=${{ matrix.total }} \
            --tests="${{ matrix.tests }}"
      - name: Upload shard results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: results-shard-${{ matrix.shard }}
          path: test-results/

  aggregate:
    needs: test
    runs-on: ubuntu-latest
    if: always()
    steps:
      - name: Download all results
        uses: actions/download-artifact@v4
        with:
          path: all-results/
      - name: Merge and report
        run: |
          node scripts/merge-test-results.js \
            --input=all-results/ \
            --output=final-report.html
          # Generate summary
          node scripts/summarize-results.js \
            --input=all-results/ \
            >> $GITHUB_STEP_SUMMARY
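The `scripts/generate-test-matrix.js` referenced above is left to the reader; one plausible implementation of a `balanced` strategy is greedy longest-processing-time assignment, sketched here with illustrative file names and durations (in seconds):

```javascript
// Greedy LPT bin-packing: place the longest tests first, always into
// the shard with the least total work so far.
function balanceShards(tests, runnerCount) {
  const shards = Array.from({ length: runnerCount }, () => ({ tests: [], total: 0 }));
  for (const test of [...tests].sort((a, b) => b.duration - a.duration)) {
    const lightest = shards.reduce((min, s) => (s.total < min.total ? s : min));
    lightest.tests.push(test.name);
    lightest.total += test.duration;
  }
  return shards;
}

const tests = [
  { name: 'e2e/checkout', duration: 300 },
  { name: 'e2e/search',   duration: 200 },
  { name: 'unit/auth',    duration: 150 },
  { name: 'unit/utils',   duration: 50 },
];

console.log(balanceShards(tests, 2));
// both shards end up with 350s of work — equal time, not equal test count
```

Balancing by historical duration rather than test count is what keeps the slowest shard (and therefore the whole matrix job) short.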
Smart Build Caching
Cache at multiple levels:
// build-cache-manager.ts
import crypto from 'crypto';
import fs from 'fs';
import path from 'path';

interface CacheKey {
  project: string;
  hash: string;
  dependencies: string[];
}

class BuildCacheManager {
  private cacheDir: string;

  constructor(cacheDir: string) {
    this.cacheDir = cacheDir;
  }

  computeCacheKey(project: string): CacheKey {
    // Hash includes:
    // 1. Project source files
    const sourceHash = this.hashDirectory(`packages/${project}/src`);
    // 2. Dependencies
    const deps = this.getProjectDependencies(project);
    const depsHash = this.hashDependencies(deps);
    // 3. Configuration files
    const configHash = this.hashFiles([
      `packages/${project}/package.json`,
      `packages/${project}/tsconfig.json`,
      '.eslintrc.json',
      'jest.config.js'
    ]);

    const combinedHash = crypto
      .createHash('sha256')
      .update(sourceHash + depsHash + configHash)
      .digest('hex')
      .substring(0, 16);

    return {
      project,
      hash: combinedHash,
      dependencies: deps
    };
  }

  async getCached(key: CacheKey): Promise<Buffer | null> {
    const cachePath = path.join(
      this.cacheDir,
      key.project,
      `${key.hash}.tar.gz`
    );
    if (fs.existsSync(cachePath)) {
      return fs.readFileSync(cachePath);
    }
    return null;
  }

  async setCached(key: CacheKey, data: Buffer): Promise<void> {
    const cachePath = path.join(
      this.cacheDir,
      key.project,
      `${key.hash}.tar.gz`
    );
    fs.mkdirSync(path.dirname(cachePath), { recursive: true });
    fs.writeFileSync(cachePath, data);

    // Clean old cache entries
    await this.cleanOldCaches(key.project, 10); // Keep last 10
  }

  private getProjectDependencies(project: string): string[] {
    // Minimal version: declared dependencies from the package manifest
    const manifest = JSON.parse(
      fs.readFileSync(`packages/${project}/package.json`, 'utf-8')
    );
    return Object.keys(manifest.dependencies ?? {});
  }

  private hashDependencies(deps: string[]): string {
    // Sort so dependency order never changes the key
    return crypto.createHash('sha256').update([...deps].sort().join(',')).digest('hex');
  }

  private hashFiles(files: string[]): string {
    const hash = crypto.createHash('sha256');
    for (const file of files) {
      if (fs.existsSync(file)) {
        hash.update(fs.readFileSync(file));
      }
    }
    return hash.digest('hex');
  }

  private hashDirectory(dir: string): string {
    const hash = crypto.createHash('sha256');
    const files = this.getAllFiles(dir);
    for (const file of files.sort()) {
      const content = fs.readFileSync(file);
      hash.update(content);
    }
    return hash.digest('hex');
  }

  private getAllFiles(dir: string): string[] {
    if (!fs.existsSync(dir)) return [];
    const files: string[] = [];
    const entries = fs.readdirSync(dir, { withFileTypes: true });
    for (const entry of entries) {
      const fullPath = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        files.push(...this.getAllFiles(fullPath));
      } else {
        files.push(fullPath);
      }
    }
    return files;
  }

  private async cleanOldCaches(project: string, keep: number): Promise<void> {
    // Drop all but the `keep` most recently written entries
    const dir = path.join(this.cacheDir, project);
    const entries = fs.readdirSync(dir)
      .map(name => ({ name, mtime: fs.statSync(path.join(dir, name)).mtimeMs }))
      .sort((a, b) => b.mtime - a.mtime);
    for (const entry of entries.slice(keep)) {
      fs.unlinkSync(path.join(dir, entry.name));
    }
  }
}
Test Impact Analysis
Predict which tests are likely to fail:
# test_impact_analyzer.py
from sklearn.ensemble import RandomForestClassifier
import numpy as np


class TestImpactAnalyzer:
    def __init__(self):
        self.model = RandomForestClassifier(n_estimators=100)
        self.trained = False

    def train(self, historical_data):
        """Train model on historical test failures."""
        features = []
        labels = []
        for record in historical_data:
            feature_vector = self.extract_features(record)
            features.append(feature_vector)
            labels.append(1 if record['failed'] else 0)

        X = np.array(features)
        y = np.array(labels)
        self.model.fit(X, y)
        self.trained = True

    def extract_features(self, record):
        """Extract features from a test record."""
        return [
            len(record['changed_files']),
            record['lines_changed'],
            1 if any('test' in f for f in record['changed_files']) else 0,
            1 if any('core' in f for f in record['changed_files']) else 0,
            record['time_since_last_change'],
            record['author_test_failure_rate'],
            record['time_of_day'],        # Flaky tests often fail at certain times
            record['concurrent_builds'],  # Resource contention indicator
        ]

    def get_all_tests(self):
        """Enumerate candidate tests; wire this to your test discovery."""
        raise NotImplementedError

    def predict_failures(self, current_change):
        """Predict which tests are likely to fail."""
        if not self.trained:
            raise RuntimeError("Model not trained")

        predictions = []
        for test in self.get_all_tests():
            features = self.extract_features({
                **current_change,
                'test_name': test,
            })
            probability = self.model.predict_proba([features])[0][1]
            predictions.append({
                'test': test,
                'failure_probability': probability,
            })

        # Return tests sorted by failure probability
        return sorted(
            predictions,
            key=lambda x: x['failure_probability'],
            reverse=True,
        )
Real-World Examples
Google’s Approach: Bazel
Google uses Bazel for monorepo builds with:
Features:
- Precise dependency tracking
- Hermetic builds (fully reproducible)
- Aggressive caching
- Distributed execution
Results:
- Billions of lines of code
- Thousands of developers
- Average build time: < 10 minutes
- Cache hit rate: > 90%
Microsoft: Git Virtual File System (GVFS)
Microsoft developed GVFS for the Windows repository:
Stats:
- 3.5 million files
- 300+ GB repository
- 4,000+ engineers
- Virtualized file system for scale
Meta (Facebook): Buck2
Meta’s build system optimizations:
- Incremental builds
- Remote execution
- Intelligent test selection
- Parallel execution
Impact:
- 90% reduction in test time
- Sub-minute feedback for most changes
- Massive cost savings
Best Practices
1. Establish Clear Project Boundaries
monorepo/
├── packages/
│   ├── api/          # Backend API
│   ├── web-app/      # Frontend app
│   ├── mobile/       # Mobile app
│   └── shared/       # Shared utilities
├── tools/            # Build tools
└── tests/
    ├── unit/         # Fast unit tests
    ├── integration/  # Integration tests
    └── e2e/          # E2E tests (expensive)
2. Implement Progressive Testing
stages:
  - name: Fast Tests
    tests: [lint, unit]
    timeout: 5min
    on_failure: block_merge
  - name: Integration Tests
    tests: [integration]
    timeout: 15min
    on_failure: block_merge
    requires: Fast Tests
  - name: E2E Tests
    tests: [e2e]
    timeout: 30min
    on_failure: notify
    requires: Integration Tests
    run_if: affected_projects.some(p => ['api', 'web-app'].includes(p))
3. Monitor Test Health
interface TestHealthMetrics {
  totalTests: number;
  averageDuration: number;           // seconds
  flakyTestCount: number;
  cacheHitRate: number;              // 0..1
  parallelizationEfficiency: number; // 0..1
}

function calculateHealth(metrics: TestHealthMetrics): number {
  const weights = {
    flakiness: 0.3,       // Lower is better
    duration: 0.2,        // Lower is better
    cacheHit: 0.25,       // Higher is better
    parallelization: 0.25 // Higher is better
  };

  const flakinessScore = Math.max(0, 100 - metrics.flakyTestCount * 10);
  const durationScore = Math.max(0, 100 - (metrics.averageDuration / 60));
  const cacheScore = metrics.cacheHitRate * 100;
  const parallelScore = metrics.parallelizationEfficiency * 100;

  return (
    flakinessScore * weights.flakiness +
    durationScore * weights.duration +
    cacheScore * weights.cacheHit +
    parallelScore * weights.parallelization
  );
}
Conclusion
Testing a monorepo requires sophisticated strategies beyond traditional testing approaches. By implementing affected project detection, incremental testing, smart prioritization, and distributed execution, you can maintain fast feedback cycles even as your monorepo grows.
Key Takeaways:
- Only test what changed—use affected project detection
- Cache aggressively at all levels
- Distribute tests intelligently across runners
- Prioritize critical tests for fast feedback
- Monitor and continuously optimize test performance
Action Plan:
- Implement affected project detection this week
- Add incremental testing with caching
- Set up distributed test execution
- Monitor test health metrics
- Review and optimize monthly
Related Topics:
- Matrix Testing - Parallel execution strategies
- Cost Optimization - Reduce monorepo CI costs
- Flaky Test Management - Handle flakiness at scale
Remember: The goal is not to test less, but to test smarter. With proper strategies, your monorepo can provide faster feedback than multiple repositories while maintaining comprehensive test coverage.
FAQ
What is affected-test detection in monorepos?
Affected-test detection identifies which packages and their downstream dependents are impacted by a given code change, so only their tests are run. Tools like Nx use a dependency graph to calculate the affected set: if you change package A, and packages B and C depend on A, tests for A, B, and C run, but not packages D, E, F.
How do I set up test sharding in a monorepo CI pipeline?
Split tests across parallel jobs based on test count or execution time. Nx Cloud and Turborepo Remote Cache support intelligent task distribution. For GitHub Actions, use matrix strategy with SHARD and TOTAL_SHARDS environment variables. Aim for equal execution time per shard, not equal test count.
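A minimal deterministic assignment, assuming each runner receives its shard index and the shard count from the matrix (the helper name and file names here are illustrative):

```javascript
// Every runner sorts the full file list the same way, then keeps only
// the files whose index lands on its shard — no coordination needed.
function filesForShard(testFiles, shard, totalShards) {
  return [...testFiles]
    .sort() // stable, identical ordering on every runner
    .filter((_, index) => index % totalShards === shard);
}

const files = ['a.test.js', 'b.test.js', 'c.test.js', 'd.test.js', 'e.test.js'];
console.log(filesForShard(files, 0, 2)); // → ['a.test.js', 'c.test.js', 'e.test.js']
console.log(filesForShard(files, 1, 2)); // → ['b.test.js', 'd.test.js']
```

Round-robin assignment like this only balances test *count*; for equal *time* per shard, weight the split by historical durations as described above.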
What are the challenges of E2E testing in monorepos?
E2E tests in monorepos face: which version of each package to test together (version coordination), how to spin up dependent services for each E2E test (service orchestration), and how to avoid E2E test isolation failures when services share databases or external resources.
How do I enforce test isolation between packages in a monorepo?
Use separate test databases per package (prefixed tables or separate schemas), avoid global state in shared utilities, use dependency injection over module-level singletons, and run package tests in separate processes to prevent module registry contamination. Tools like Jest’s --projects config enable isolated package testing.
See Also
- Docker Image Testing and Security: Complete Guide to Container Vulnerability Scanning - Master Docker image security with Trivy, Snyk, and Grype. Learn…
- Cost Estimation Testing for Infrastructure as Code: Complete Guide - Master cost estimation testing for IaC with Infracost, terraform…
- Matrix Testing in CI/CD Pipelines - Matrix Testing in CI/CD Pipelines: comprehensive guide covering…
- Feature Flag Testing in CI/CD: Complete Implementation Guide - Feature Flag Testing in CI/CD: comprehensive guide covering best…
