Introduction to Selenium Grid 4
Selenium Grid 4 represents a complete architectural redesign of the distributed test execution framework that has been the backbone of parallel browser testing for over a decade. Unlike its predecessors, Grid 4 introduces modern observability features, a fully asynchronous communication layer, and native support for containerized deployments.
The fourth major version addresses critical pain points from Grid 3: difficult troubleshooting, limited scalability, and complex setup procedures. With built-in support for Docker, Kubernetes, GraphQL APIs, and OpenTelemetry tracing, Grid 4 brings enterprise-grade reliability to open-source test infrastructure.
This comprehensive guide explores Grid 4’s architecture, deployment strategies, advanced configuration options, and real-world implementation patterns for teams scaling browser automation from dozens to thousands of concurrent sessions.
Architectural Evolution
From Hub-Node to Distributed Components
Selenium Grid 4 breaks the monolithic Hub-Node model into six specialized components that can be deployed independently or combined:
Router: Entry point for all WebDriver commands, distributes requests to appropriate services
Distributor: Manages node registration and assigns new session requests to available nodes based on capabilities
Session Map: Maintains the mapping between session IDs and the nodes executing those sessions
New Session Queue: Buffers incoming session requests when all nodes are busy, implementing FIFO queueing
Event Bus: Asynchronous message bus (ZeroMQ-based by default) for inter-component communication
Node: Executes WebDriver commands on actual browser instances
This microservices-inspired architecture enables:
- Horizontal scaling of individual bottleneck components
- Zero-downtime rolling updates
- Cloud-native deployment patterns
- Better fault isolation
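The division of labour among these components can be illustrated with a toy model. This is a sketch under stated assumptions, not how Selenium implements them: the `ToyGrid` and `Node` classes, the session ID format, and matching on `browserName` alone are all hypothetical simplifications.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional
import itertools


@dataclass
class Node:
    """Toy stand-in for a Grid Node: one browser type and a count of free slots."""
    name: str
    browser: str
    free_slots: int


@dataclass
class ToyGrid:
    """Illustrative model only; the real components are separate services."""
    nodes: List[Node]
    queue: Deque[dict] = field(default_factory=deque)           # New Session Queue (FIFO)
    session_map: Dict[str, str] = field(default_factory=dict)   # Session Map: session id -> node name
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def new_session(self, capabilities: dict) -> None:
        """Router: accept a new-session request and park it in the queue."""
        self.queue.append(capabilities)

    def distribute(self) -> List[str]:
        """Distributor: assign queued requests, in order, to nodes with a matching free slot."""
        created: List[str] = []
        waiting: Deque[dict] = deque()
        while self.queue:
            caps = self.queue.popleft()
            node = next(
                (n for n in self.nodes
                 if n.browser == caps.get("browserName") and n.free_slots > 0),
                None,
            )
            if node is None:
                waiting.append(caps)  # stays queued until a slot frees up
                continue
            node.free_slots -= 1
            session_id = f"session-{next(self._ids)}"
            self.session_map[session_id] = node.name
            created.append(session_id)
        self.queue = waiting
        return created

    def route(self, session_id: str) -> Optional[str]:
        """Router: look up which node owns an existing session."""
        return self.session_map.get(session_id)
```

With one Chrome slot and one Firefox slot, two Chrome requests and one Firefox request yield two sessions, and the second Chrome request stays in the queue until capacity frees up.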
Standalone, Hub-Node, and Fully Distributed Modes
Grid 4 supports three deployment topologies:
Mode | Use Case | Components | Scalability |
---|---|---|---|
Standalone | Local development, CI pipelines | All-in-one process | Single machine |
Hub | Small-to-medium teams | Hub + Nodes | Vertical scaling |
Distributed | Enterprise deployments | Independent components | Horizontal scaling |
Standalone mode combines all components in a single JVM process, ideal for Docker Compose setups or GitHub Actions workflows. Hub mode groups the Router, Distributor, Session Map, New Session Queue, and Event Bus into a single Hub process while Nodes run separately. Fully distributed mode deploys each component independently for maximum flexibility.
GraphQL API for Grid Introspection
One of Grid 4’s most powerful features is its GraphQL endpoint that exposes real-time grid state, session information, and node capabilities.
Querying Grid Status
The GraphQL interface at /graphql provides rich metadata about grid health:
{
  grid {
    totalSlots
    nodeCount
    sessionCount
    maxSession
    sessionQueueSize
  }
  nodesInfo {
    nodes {
      id
      status
      uri
      slotCount
      sessions {
        id
        capabilities
        startTime
      }
    }
  }
}
This enables building custom dashboards, capacity planning tools, and integration with monitoring systems like Grafana. Unlike Grid 3’s limited JSON status endpoint, the GraphQL API allows clients to request exactly the data they need.
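A minimal sketch of how a dashboard job might call the endpoint, using only the Python standard library. The Grid URL, the function names, and the choice of `maxSession` as the capacity figure are assumptions for illustration; the queried fields are taken from the query above.

```python
import json
import urllib.request

GRID_QUERY = "{ grid { totalSlots sessionCount maxSession } }"


def build_request(grid_url: str, query: str = GRID_QUERY) -> urllib.request.Request:
    """Build the POST request for the Grid's /graphql endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        grid_url.rstrip("/") + "/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response: dict) -> dict:
    """Reduce a decoded GraphQL response to a few numbers a dashboard might plot."""
    grid = response["data"]["grid"]
    capacity = grid["maxSession"]
    used = grid["sessionCount"]
    return {
        "sessions": used,
        "capacity": capacity,
        # Guard against an empty grid reporting zero capacity.
        "utilization": used / capacity if capacity else 0.0,
    }


def fetch_grid_summary(grid_url: str, timeout: float = 10.0) -> dict:
    """Query a live Grid; needs network access to the Router."""
    with urllib.request.urlopen(build_request(grid_url), timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Keeping `summarize` separate from the network call makes the reduction logic easy to unit-test against a recorded response.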
Dynamic Capability Discovery
Teams can query available browser versions and platform combinations programmatically:
{
  nodesInfo {
    nodes {
      osInfo {
        name
        version
        arch
      }
      stereotypes
    }
  }
}
This is invaluable for test frameworks that need to discover capabilities dynamically or validate test matrix configurations against actual grid capacity.
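As a sketch of the matrix validation described above, the following compares a desired browser/platform matrix with what the grid reports. It assumes the stereotypes have already been extracted from the GraphQL response and decoded into capability dictionaries; the function names and the restriction to `browserName` and `platformName` are illustrative choices.

```python
from typing import Dict, Iterable, List, Set, Tuple

Combo = Tuple[str, str]  # (browserName, platformName), compared case-insensitively


def available_combos(stereotypes: Iterable[Dict[str, object]]) -> Set[Combo]:
    """Collect the browser/platform pairs offered by the grid's slots."""
    return {
        (str(caps.get("browserName", "")).lower(),
         str(caps.get("platformName", "")).lower())
        for caps in stereotypes
    }


def missing_from_grid(matrix: Iterable[Combo],
                      stereotypes: Iterable[Dict[str, object]]) -> List[Combo]:
    """Return the matrix entries that no slot on the grid can serve."""
    have = available_combos(stereotypes)
    wanted = {(browser.lower(), platform.lower()) for browser, platform in matrix}
    return sorted(wanted - have)
```

A CI job could fail fast when `missing_from_grid` returns a non-empty list, rather than letting unsatisfiable session requests sit in the queue until they time out.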
Docker and Kubernetes Deployment
Official Docker Images
The Selenium project maintains regularly updated Docker images for all Grid components:
# docker-compose.yml for Hub-Node topology
version: "3"
services:
  selenium-hub:
    image: selenium/hub:4.15.0
    ports:
      - "4444:4444"
    environment:
      - SE_SESSION_REQUEST_TIMEOUT=300
      - SE_NODE_SESSION_TIMEOUT=300
  chrome:
    image: selenium/node-chrome:4.15.0
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=3
  firefox:
    image: selenium/node-firefox:4.15.0
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=3
Each browser node image bundles a VNC server for live debugging, so the separate -debug image variants used with Grid 3 are no longer needed. Video recording is available through selenium/video sidecar containers.
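Containers started this way take a few seconds before the Hub accepts sessions, so test runners usually wait for readiness first. The sketch below polls the Grid's /status endpoint and checks the `ready` flag in the response; the function names, default timings, and injectable `fetch`/`sleep` parameters are assumptions made for illustration and testability.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(grid_url: str) -> dict:
    """GET the Grid's /status endpoint and decode the JSON body."""
    with urllib.request.urlopen(grid_url.rstrip("/") + "/status", timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(
    grid_url: str,
    timeout: float = 60.0,
    interval: float = 2.0,
    fetch: Callable[[str], dict] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the Grid reports ready, or give up once `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if fetch(grid_url).get("value", {}).get("ready") is True:
                return True
        except (OSError, ValueError):
            # OSError covers refused connections; ValueError covers a non-JSON body.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_until_ready("http://localhost:4444")` before the first test gives nodes time to register with the Hub.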
Kubernetes with Helm Charts
For production-scale deployments, the official Helm chart provides declarative configuration:
helm repo add selenium https://www.selenium.dev/docker-selenium
helm install selenium-grid selenium/selenium-grid \
  --set isolateComponents=true \
  --set chromeNode.replicas=5 \
  --set firefoxNode.replicas=3 \
  --set edgeNode.replicas=2
The chart supports:
- Autoscaling with KEDA (Kubernetes Event-Driven Autoscaling)
- Persistent session recording storage
- Ingress configuration for external access
- Sidecar containers for observability agents
Observability with OpenTelemetry
Grid 4’s integration with OpenTelemetry provides distributed tracing across all components, enabling visibility into request flows from client to browser execution.
Tracing Configuration
Enable tracing by setting environment variables:
SE_ENABLE_TRACING=true
SE_TRACING_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
OTEL_SERVICE_NAME=selenium-grid
This exports traces to OTLP-compatible backends like Jaeger, Zipkin, or commercial APM tools. Each WebDriver command generates spans showing:
- Session creation time
- Node selection latency
- Command execution duration
- Network round-trip times
Integration with Monitoring Stacks
Grid metrics can be exposed to Prometheus at /metrics, typically through an exporter built on the GraphQL API:
# EXAMPLE
selenium_grid_sessions_active 12
selenium_grid_sessions_queued 3
selenium_grid_node_count 8
selenium_grid_slot_utilization 0.75
Combined with Grafana dashboards, teams gain real-time visibility into grid performance, capacity planning data, and failure rate analysis.
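To show how such samples might feed a capacity decision, here is a small parser for unlabelled lines in the Prometheus text format. The metric names come from the example above; the thresholds and function names are assumptions, and labelled samples such as `name{node="a"} 1` are deliberately out of scope.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse unlabelled `name value` samples, skipping comments and blank lines."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, rest = line.partition(" ")
        # Take the first field after the name; an optional timestamp may follow it.
        samples[name] = float(rest.split()[0])
    return samples


def needs_more_nodes(samples: Dict[str, float],
                     max_utilization: float = 0.8,
                     max_queued: float = 0.0) -> bool:
    """Flag the grid as under-provisioned if slots are saturated or requests are queuing."""
    return (
        samples.get("selenium_grid_slot_utilization", 0.0) > max_utilization
        or samples.get("selenium_grid_sessions_queued", 0.0) > max_queued
    )
```

With the sample values above, three queued sessions trip the check even though utilization is below the assumed 0.8 threshold.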
Advanced Configuration
Session Request Timeout and Retry Policies
Grid 4 introduces fine-grained timeout controls:
# Hub/Router configuration
--session-request-timeout 300 # Wait 5min for available slot
--session-retry-interval 5 # Check for slots every 5sec
--healthcheck-interval 60 # Node health check frequency
# Node configuration
--heartbeat-period 30 # Report status every 30sec
--register-period 60 # Keep trying to register for up to 60sec
--drain-after-session-count 100 # Drain and shut down after 100 sessions
The --drain-after-session-count setting is particularly useful for limiting the impact of memory leaks in long-running nodes: the node drains and shuts down after the configured number of sessions, and the orchestrator (Docker, Kubernetes) can then start a fresh replacement.
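The interplay between the request timeout and the retry interval can be reasoned about with a deliberately simplified model. This is not Grid's scheduling code: it assumes a queued request is retried at fixed multiples of the retry interval and is dropped if no slot has been found by the time the timeout expires. The function name and defaults are hypothetical.

```python
import math
from typing import Optional


def wait_for_slot(slot_free_at: float,
                  request_timeout: float = 300.0,
                  retry_interval: float = 5.0) -> Optional[float]:
    """Return how long a request waits before assignment, or None if it times out.

    Simplified model: retries happen at t = 0, interval, 2 * interval, ... and the
    request is assigned at the first retry at or after `slot_free_at` seconds.
    """
    if slot_free_at <= 0:
        return 0.0  # a slot is free immediately
    assigned_at = math.ceil(slot_free_at / retry_interval) * retry_interval
    return assigned_at if assigned_at <= request_timeout else None
```

Under this model a longer retry interval adds up to one interval of extra latency per request, while a shorter one trades that latency for more frequent polling of the Distributor.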
Custom Capability Matchers
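For complex browser configurations (specific Chrome flags, custom profiles), Grid 4 allows custom capability matching logic through the SlotMatcher interface, which the Distributor loads via its --slot-matcher option:
import org.openqa.selenium.Capabilities;
import org.openqa.selenium.grid.data.SlotMatcher;

public class CustomSlotMatcher implements SlotMatcher {
    @Override
    public boolean matches(Capabilities stereotype, Capabilities capabilities) {
        // Custom logic for specialized browser configurations;
        // both capability names are illustrative.
        Object required = capabilities.getCapability("customExtension");
        Object available = stereotype.getCapability("availableExtensions");
        // Reject the slot if either side does not declare the capability.
        return required != null
            && available != null
            && available.toString().contains(required.toString());
    }
}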
This enables routing tests requiring specific browser extensions, locale settings, or performance profiles to appropriately configured nodes.
Relay Configuration for Existing Nodes
Grid 4 can integrate with external Selenium servers (Sauce Labs, BrowserStack, legacy Grid 3 nodes) by running a Node in relay mode:
java -jar selenium-server.jar node \
  --service-url "https://ondemand.us-west-1.saucelabs.com:443/wd/hub" \
  --config relay-sauce.toml
This allows hybrid deployments where some browsers run locally while others use cloud providers, all accessible through a single Grid endpoint.
Comparison with Alternatives
Feature | Selenium Grid 4 | Selenoid | Moon (Aerokube) | Zalenium |
---|---|---|---|---|
Protocol Support | WebDriver, CDP | WebDriver, CDP | WebDriver, CDP, Playwright | WebDriver |
Browser Video | Via Docker sidecar | Built-in | Built-in | Built-in |
Kubernetes Native | Helm chart | Yes | Yes | Deprecated |
GraphQL API | ✅ Yes | ❌ No | ❌ No | ❌ No |
OpenTelemetry | ✅ Native | ❌ Manual | ✅ Native | ❌ No |
Active Development | ✅ Official Selenium | ✅ Active | ✅ Active | ❌ Archived |
License | Apache 2.0 | Apache 2.0 | Commercial | Apache 2.0 |
Selenoid offers faster startup times and lower resource usage through direct container management but lacks Grid 4’s observability features.
Moon is Aerokube’s commercial Kubernetes-native solution with advanced features like browser caching and built-in VNC/video but requires a paid license.
Zalenium (now archived) pioneered Docker-based grid deployment but has been superseded by official Selenium Docker images.
Pricing and Licensing
Selenium Grid 4: Completely free and open-source (Apache 2.0 license). No feature limitations, commercial use allowed.
Infrastructure costs depend on the deployment model; the ranges below are rough estimates:
- Cloud VMs: $50-500/month for small-medium grids (AWS EC2, GCP Compute)
- Kubernetes Clusters: $100-2000/month depending on scale (EKS, GKE, AKS)
- Managed Selenium Services: $150-1500/month (Grid 4-compatible providers)
Commercial support is available through:
- Sauce Labs: Grid-compatible cloud execution starting at $149/month
- BrowserStack: Grid-compatible infrastructure starting at $99/month
- Consulting firms: Implementation and optimization services ($150-250/hour)
Integration Examples
CI/CD Pipeline Integration
GitHub Actions workflow running tests against Grid:
name: E2E Tests
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      selenium-hub:
        image: selenium/hub:4.15.0
        ports:
          - 4444:4444
      chrome:
        image: selenium/node-chrome:4.15.0
        env:
          SE_EVENT_BUS_HOST: selenium-hub
          SE_EVENT_BUS_PUBLISH_PORT: 4442
          SE_EVENT_BUS_SUBSCRIBE_PORT: 4443
    steps:
      - uses: actions/checkout@v3
      - name: Run tests
        run: mvn test -Dselenium.grid.url=http://localhost:4444
Test Framework Configuration
Configure test frameworks to use Grid:
Java (Selenium 4):
RemoteWebDriver driver = new RemoteWebDriver(
    new URL("http://grid:4444"),
    new ChromeOptions()
);
Python (pytest):
import pytest
from selenium import webdriver

@pytest.fixture
def driver():
    remote = webdriver.Remote(command_executor="http://grid:4444", options=webdriver.ChromeOptions())
    yield remote
    remote.quit()
JavaScript (WebdriverIO):
exports.config = {
    hostname: 'grid',
    port: 4444,
    path: '/wd/hub',
    capabilities: [{
        browserName: 'chrome'
    }]
}
Best Practices
Capacity Planning
Calculate required Grid capacity using:
Required Nodes = (Total Tests × Avg Test Duration) / (Target Completion Time × Sessions per Node)
For 1000 tests averaging 3 minutes each, targeting 30-minute completion with 5 sessions per node:
Required Nodes = (1000 × 3) / (30 × 5) = 20 nodes
Add a 20% buffer for failures and queue spikes, which brings this example to 24 nodes.
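The same arithmetic as a small helper, so the numbers can be recomputed as the suite grows. The function and parameter names are illustrative, and rounding up to a whole node is an added assumption.

```python
import math


def required_nodes(total_tests: int,
                   avg_minutes: float,
                   target_minutes: float,
                   sessions_per_node: int,
                   buffer: float = 0.2) -> int:
    """Apply the capacity formula, add a fractional buffer, and round up to whole nodes."""
    base = (total_tests * avg_minutes) / (target_minutes * sessions_per_node)
    # round() absorbs floating-point noise before taking the ceiling.
    return math.ceil(round(base * (1 + buffer), 9))
```

For the example above, `required_nodes(1000, 3, 30, 5, buffer=0)` gives the base figure of 20 nodes, and the default 20% buffer raises it to 24.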
Node Stability
Implement node lifecycle management:
- Set --drain-after-session-count to prevent memory leaks
- Configure health checks with reasonable timeouts
- Use node labels for routing tests to specialized configurations
- Monitor disk space for logs and video recordings
Security Considerations
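Grid 4 ships with only basic access controls: HTTP Basic authentication on the Router (--username and --password) and a shared --registration-secret for component registration. Production deployments should: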
- Deploy behind a reverse proxy with authentication (Nginx, Traefik)
- Use network segmentation to isolate grid components
- Implement rate limiting to prevent resource exhaustion
- Scan browser images for vulnerabilities regularly
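Rate limiting is normally configured in the reverse proxy, but the idea can be sketched in a few lines. The example below uses a token bucket, which is one common technique and not something Grid provides; the class name, parameters, and injectable clock are assumptions for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        """Return True and consume a token if a request may proceed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A proxy or gateway in front of the Router could keep one bucket per client and reject new-session requests when `allow()` returns False, which protects the queue from a single runaway test job.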
Conclusion
Selenium Grid 4 modernizes distributed test execution with cloud-native architecture, comprehensive observability, and production-ready deployment patterns. The GraphQL API, OpenTelemetry integration, and microservices design make it suitable for organizations running thousands of daily test executions.
While alternatives like Selenoid offer performance advantages in specific scenarios, Grid 4’s official status, active development, and rich ecosystem make it the default choice for teams already invested in Selenium WebDriver. For greenfield projects, evaluate Grid 4 alongside cloud execution platforms and newer tools such as Playwright, which has built-in parallelization, to determine the best fit for your architecture and scale requirements.