CockroachDB: The Ultimate Reference Guide for Developers

Introduction to CockroachDB

CockroachDB is a distributed SQL database designed for global, cloud-native applications that require high availability, strong consistency, and horizontal scale. Built from the ground up to support zero-downtime deployments, survive disasters, and scale out with minimal operational overhead, it combines the familiarity of SQL with the resilience of a cloud-native architecture. Because it maintains ACID transactions across globally distributed data, it is well suited to applications that need both a relational data model and geographic distribution.

Core Concepts and Principles

Architectural Components

  • Node: Individual instances of CockroachDB forming the cluster
  • Range: A contiguous chunk of the key-space (512 MiB maximum by default in recent versions; 64 MiB in older releases)
  • Replica: A copy of a range, placed on different nodes for redundancy (the sketch after this list shows how to inspect ranges and replicas)
  • Raft: Consensus protocol used to maintain consistency between replicas
  • Store: Physical location on disk where a node stores replicas
  • Locality: Physical locations of nodes (used for data placement)
  • SQL Layer: Provides SQL interface for interacting with the database
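
These components are visible directly from SQL. A minimal sketch for inspecting how a database splits into ranges and where replicas live (output columns vary by version):

-- Range boundaries and replica placement for a database
SHOW RANGES FROM DATABASE defaultdb;

-- Lower-level, cluster-wide view
SELECT range_id, start_pretty, end_pretty, replicas
FROM crdb_internal.ranges
LIMIT 10;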

Key Differentiators

  • Distributed by Default: Scales horizontally by adding nodes
  • Survivability: Designed to survive machine, rack, and data center failures
  • Consistency: Provides serializable isolation, the highest ANSI isolation level
  • Geo-Partitioning: Allows data to reside in specific geographic locations
  • SQL Compatibility: Highly compatible with PostgreSQL

Consistency Model

  • Serializable Isolation: Transactions appear to execute in some serial order; this is CockroachDB's default and the highest ANSI isolation level (verified in the sketch after this list)
  • Consensus: Uses Raft to achieve consensus among replicas
  • Replication Factor: Typically 3 or 5 replicas for fault tolerance
  • Read/Write Quorum: Majority of replicas must acknowledge operations
  • Clock Synchronization: A Google Spanner-inspired design, but built on hybrid logical clocks (HLC) and NTP-synchronized system clocks rather than TrueTime atomic clocks
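
A quick way to verify the isolation level from any session; a minimal sketch:

-- CockroachDB runs transactions at SERIALIZABLE isolation by default
SHOW transaction_isolation;
-- expected result: serializable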

CockroachDB Deployment

Installation Methods

  • Binary (development, testing): curl https://binaries.cockroachdb.com/cockroach-{version}.linux-amd64.tgz | tar -xz
  • Docker (container environments): docker pull cockroachdb/cockroach:v22.2.0
  • Kubernetes (production clusters): Helm chart or the CockroachDB Kubernetes Operator
  • Cloud (managed service): CockroachDB Dedicated or Serverless

Cluster Setup

# Start first node
cockroach start --insecure --store=node1 --listen-addr=localhost:26257 \
--http-addr=localhost:8080 --join=localhost:26257,localhost:26258,localhost:26259

# Start additional nodes (different ports)
cockroach start --insecure --store=node2 --listen-addr=localhost:26258 \
--http-addr=localhost:8081 --join=localhost:26257,localhost:26258,localhost:26259

# Initialize the cluster
cockroach init --insecure --host=localhost:26257

Secure Cluster Setup

# Create certificate directories, then the certificates
mkdir -p certs my-safe-directory
cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-node localhost $(hostname) --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key

# Start secure node
cockroach start --certs-dir=certs --store=node1 --listen-addr=localhost:26257 \
--http-addr=localhost:8080 --join=localhost:26257,localhost:26258,localhost:26259

SQL Basics and Data Management

Database Operations

-- Create a database
CREATE DATABASE myapp;

-- Show databases
SHOW DATABASES;

-- Use a database
USE myapp;

-- Show tables
SHOW TABLES;

-- Show schema of a table
SHOW CREATE TABLE users;

-- Show indexes
SHOW INDEXES FROM users;

Table Operations

-- Create table
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name STRING NOT NULL,
    email STRING UNIQUE,
    created_at TIMESTAMPTZ DEFAULT now(),
    INDEX (name),
    FAMILY user_info (id, name, email, created_at)
);

-- Alter table
ALTER TABLE users ADD COLUMN active BOOL DEFAULT true;

-- Create index
CREATE INDEX idx_user_email ON users (email);

-- Optimized index for specific queries
CREATE INDEX idx_users_name_email ON users (name) STORING (email);
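
To confirm that the covering index above serves a query without a join back to the primary index, check the plan; a sketch (exact plan output varies by version):

-- Expect a scan of idx_users_name_email with no index join,
-- since email is stored in the index
EXPLAIN SELECT name, email FROM users WHERE name = 'Alice';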

Data Manipulation

-- Insert data
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');

-- Insert multiple rows
INSERT INTO users (name, email) VALUES 
    ('Bob', 'bob@example.com'),
    ('Charlie', 'charlie@example.com');

-- Update data
UPDATE users SET active = false WHERE email = 'alice@example.com';

-- Delete data
DELETE FROM users WHERE email = 'bob@example.com';

-- Transactions
BEGIN;
INSERT INTO accounts (id, balance) VALUES (1, 1000), (2, 0);
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
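
Under serializable isolation a transaction can abort with a retryable error (SQLSTATE 40001). CockroachDB documents a client-side retry protocol built on a specially named savepoint; a minimal sketch of the flow (real clients loop, retrying on 40001):

-- Transaction with client-side retry handling
BEGIN;
SAVEPOINT cockroach_restart;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- on SQLSTATE 40001: ROLLBACK TO SAVEPOINT cockroach_restart and re-run the statements
RELEASE SAVEPOINT cockroach_restart;
COMMIT;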

Query Performance

-- Explain a query
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

-- Analyze with execution statistics
EXPLAIN ANALYZE SELECT * FROM users WHERE name LIKE 'A%';

-- Show extended query plan
EXPLAIN (VERBOSE) SELECT u.name, o.order_date 
FROM users u JOIN orders o ON u.id = o.user_id;

Advanced Features

Geo-Partitioning (Enterprise Feature)

-- Set the replication factor for the database
ALTER DATABASE myapp CONFIGURE ZONE USING num_replicas=5;

-- Create table with geo-partitioning
CREATE TABLE users (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    name STRING NOT NULL,
    email STRING,
    region STRING NOT NULL,
    PRIMARY KEY (region, id)
) PARTITION BY LIST (region) (
    PARTITION us VALUES IN ('us-east', 'us-west'),
    PARTITION eu VALUES IN ('eu-west', 'eu-central'),
    PARTITION ap VALUES IN ('ap-east', 'ap-south')
);

-- Configure zone for partitions
ALTER PARTITION us OF TABLE users CONFIGURE ZONE USING
    constraints = '[+region=us]';
ALTER PARTITION eu OF TABLE users CONFIGURE ZONE USING
    constraints = '[+region=eu]';
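
Newer versions (v21.1+) also provide declarative multi-region SQL, which replaces much of this manual partitioning; a hedged sketch (region names are placeholders and must match the localities your nodes were started with):

-- Declarative multi-region alternative (v21.1+)
ALTER DATABASE myapp SET PRIMARY REGION "us-east1";
ALTER DATABASE myapp ADD REGION "eu-west1";

-- Home each row in the region named by its hidden crdb_region column
ALTER TABLE users SET LOCALITY REGIONAL BY ROW;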

Change Data Capture (CDC)

-- Create a changefeed
CREATE CHANGEFEED FOR TABLE orders INTO 'kafka://broker:9092';

-- CDC with options
CREATE CHANGEFEED FOR TABLE orders 
INTO 'webhook-https://example.com/webhook'
WITH updated, resolved='10s';
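
Changefeeds run as long-lived jobs and are managed through the jobs interface; a minimal sketch (the job ID is a placeholder taken from the SHOW output):

-- Inspect and control changefeed jobs
SHOW CHANGEFEED JOBS;
PAUSE JOB 123456789;
RESUME JOB 123456789;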

JSON Support

-- Create table with JSONB
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    data JSONB
);

-- Insert JSON data
INSERT INTO documents (data) VALUES ('{"name": "Alice", "tags": ["admin", "user"]}');

-- Query JSON fields
SELECT data->>'name' AS name FROM documents;

-- Query with JSON conditions
SELECT * FROM documents 
WHERE data->'tags' ? 'admin' AND (data->>'name') = 'Alice';

-- Update JSON
UPDATE documents SET data = jsonb_set(data, '{name}', '"Bob"');
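
The filters above scan every document unless the JSONB column is indexed. The usual pattern pairs an inverted index with containment (@>) queries; a sketch:

-- Inverted index accelerates containment and key-existence filters
CREATE INVERTED INDEX idx_documents_data ON documents (data);

-- Containment query that can use the inverted index
SELECT * FROM documents WHERE data @> '{"tags": ["admin"]}';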

Time-Travel Queries

-- Query data as of a specific time
SELECT * FROM users AS OF SYSTEM TIME '-10m';

-- Get data at a timestamp
SELECT * FROM users AS OF SYSTEM TIME '2023-04-15 10:00:00';

-- Restore from a previous state
CREATE TABLE users_restore AS SELECT * FROM users AS OF SYSTEM TIME '-1h';
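
Time travel only reaches back as far as the garbage-collection window, set by the gc.ttlseconds zone variable (the default has varied by version: historically 25 hours, 4 hours in newer releases). A sketch of widening the window for one table:

-- Keep 48 hours of MVCC history available for time-travel queries
ALTER TABLE users CONFIGURE ZONE USING gc.ttlseconds = 172800;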

Performance Optimization Techniques

Schema Design Best Practices

  • Use UUID primary keys rather than sequential keys (SERIAL, sequences), which concentrate writes on a single range; hash-sharded indexes (sketched after this list) are another option for sequential data
  • Keep column family size under 4KB
  • Avoid interleaved tables: they were deprecated in v20.2 and removed in v21.2; use ordinary tables with secondary indexes for parent-child joins
  • Use appropriate data types (INT vs. BIGINT, STRING vs. fixed CHAR)
  • Create indexes for frequently filtered or joined columns
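
For naturally sequential keys such as timestamps, a hash-sharded index spreads writes across ranges instead of concentrating them on the newest one. A hedged sketch (the events table is hypothetical; bare USING HASH is v22.1+ syntax, and older versions require the experimental_enable_hash_sharded_indexes session setting):

-- Hypothetical append-heavy table with a hash-sharded primary key
CREATE TABLE events (
    ts TIMESTAMPTZ NOT NULL DEFAULT now(),
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    payload JSONB,
    PRIMARY KEY (ts, id) USING HASH
);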

Indexing Strategies

  • Primary (unique identifier): PRIMARY KEY (id)
  • Secondary (filter/sort): CREATE INDEX idx_user_name ON users (name)
  • Composite (multiple columns): CREATE INDEX idx_user_region_name ON users (region, name)
  • Covering (avoid lookups): CREATE INDEX idx_cover ON users (email) STORING (name, created_at)
  • Unique (enforce uniqueness): CREATE UNIQUE INDEX idx_unique_email ON users (email)
  • Inverted (JSONB fields): CREATE INVERTED INDEX idx_data ON documents (data)

Query Optimization

  • Use prepared statements for repetitive queries (see the PREPARE sketch after this list)
  • Filter on indexed columns in the WHERE clause
  • Avoid SELECT * when only specific columns are needed
  • Use EXPLAIN ANALYZE to identify performance bottlenecks
  • Consider LIMIT with large result sets
  • Use JOIN hints when the optimizer chooses suboptimal plans
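
Prepared statements can be exercised directly in SQL with PREPARE/EXECUTE, which is handy for testing (application drivers normally do this for you); a minimal sketch:

-- Prepare once, execute repeatedly with different parameters
PREPARE user_by_email (STRING) AS
    SELECT id, name FROM users WHERE email = $1;
EXECUTE user_by_email('alice@example.com');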

Common Bottlenecks and Solutions

  • Hot ranges (range contention alerts): review primary key design, add nodes (see the contention sketch after this list)
  • Inefficient queries (high latency): review and optimize queries with EXPLAIN
  • Insufficient memory (node OOM errors): increase node memory, optimize queries
  • Network issues (connection timeouts): check network latency, optimize locality
  • Disk I/O bottlenecks (high disk utilization): upgrade disks, optimize queries, add nodes
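
For the hot-range and contention cases, newer versions (roughly v21.2+) expose contention data in SQL; a hedged sketch, since these crdb_internal views may differ across versions:

-- Tables and indexes experiencing the most contention
SELECT * FROM crdb_internal.cluster_contended_tables;
SELECT * FROM crdb_internal.cluster_contended_indexes;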

Administration and Operations

Cluster Management

# Monitor nodes
cockroach node status --insecure

# Decommission a node
cockroach node decommission 3 --insecure

# List ranges for a database (SHOW RANGES is a SQL statement)
cockroach sql --insecure --execute="SHOW RANGES FROM DATABASE myapp;"

# Back up a database (BACKUP is a SQL statement, not a CLI subcommand)
cockroach sql --insecure \
  --execute="BACKUP DATABASE myapp INTO 'azure://backup/myapp' WITH revision_history;"

# Restore from backup (FROM LATEST IN is v22.1+ syntax)
cockroach sql --insecure \
  --execute="RESTORE DATABASE myapp FROM LATEST IN 'azure://backup/myapp' AS OF SYSTEM TIME '2023-05-01 15:00:00';"

User Management

-- Create user
CREATE USER myuser WITH PASSWORD 'securepassword';

-- Grant privileges
GRANT SELECT, INSERT ON DATABASE myapp TO myuser;
GRANT ALL ON TABLE users TO myuser;

-- Create role
CREATE ROLE readonly;
GRANT SELECT ON DATABASE myapp TO readonly;
GRANT readonly TO myuser;

-- Show grants
SHOW GRANTS ON DATABASE myapp;
SHOW GRANTS FOR myuser;

Monitoring and Logging

  • DB Console: Web UI available at http://localhost:8080
  • Prometheus Integration: Exposes metrics at /metrics endpoint
  • Logs: Located at cockroach-data/logs by default
  • Health Endpoints: HTTP endpoint at /health
  • Key Metrics to Monitor (also queryable in SQL; see the sketch after this list):
    • CPU/Memory utilization
    • Disk IOPS and space
    • Network bandwidth
    • P99 query latency
    • Raft proposal success rate
    • Range replication status
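
Most of these metrics are also queryable from SQL via crdb_internal, which helps before Prometheus is wired up; a sketch (metric names vary by version):

-- Point-in-time metric values per store
SELECT name, value
FROM crdb_internal.node_metrics
WHERE name LIKE '%latency%';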

Common CLI Commands

# Start SQL shell
cockroach sql --insecure

# Execute SQL file
cockroach sql --insecure < schema.sql

# Dump schema (cockroach dump was removed in v20.2; use SHOW CREATE ALL TABLES)
cockroach sql --insecure --database=myapp \
  --execute="SHOW CREATE ALL TABLES;" > myapp_schema.sql

# Import CSV (IMPORT INTO is a SQL statement and needs an existing table;
# local files are first uploaded with: cockroach nodelocal upload)
cockroach sql --insecure --database=myapp \
  --execute="IMPORT INTO users (id, name, email) CSV DATA ('nodelocal://1/data.csv') WITH delimiter = ',';"

# Generate cert
cockroach cert create-client username --certs-dir=certs --ca-key=ca.key

Troubleshooting Guide

Common Issues and Solutions

  • Node won't start (port conflicts, permissions): check ports and disk permissions
  • Cluster won't initialize (incorrect join flags): verify --join flags on all nodes
  • Slow queries (missing indexes, large data sets): run EXPLAIN ANALYZE, add indexes
  • Replication failures (network issues, disk space): check connectivity and disk space
  • Connection timeouts (firewall, overloaded nodes): check network rules, scale the cluster
  • Out of memory (query complexity, insufficient RAM): optimize queries, increase memory

Diagnostic Commands

-- Check node status
SELECT * FROM crdb_internal.gossip_nodes;

-- Examine ranges
SELECT * FROM crdb_internal.ranges;

-- Identify slow queries
SELECT * FROM crdb_internal.node_statement_statistics 
ORDER BY service_lat DESC LIMIT 10;

-- Check table statistics
SHOW STATISTICS FOR TABLE users;

-- Verify index usage (index_usage_statistics keys on IDs; join table_indexes for names)
SELECT ti.descriptor_name AS table_name, ti.index_name, us.total_reads, us.last_read
FROM crdb_internal.index_usage_statistics AS us
JOIN crdb_internal.table_indexes AS ti
  ON us.table_id = ti.descriptor_id AND us.index_id = ti.index_id
WHERE ti.descriptor_name = 'users';

Resources for Further Learning

Official Documentation

  • CockroachDB Docs: https://www.cockroachlabs.com/docs/
  • GitHub Repository: https://github.com/cockroachdb/cockroach
  • Community Forum: https://forum.cockroachlabs.com/

Learning Resources

  • CockroachDB University: https://university.cockroachlabs.com/
  • CockroachDB Blog: https://www.cockroachlabs.com/blog/
  • Webinars and Workshops: https://www.cockroachlabs.com/events/

Tools and Integrations

  • ORMs: GORM, SQLAlchemy, Django ORM, Hibernate
  • Connection Poolers: PgBouncer
  • Migration Tools: Liquibase, Flyway
  • Observability: Prometheus, Grafana
  • CI/CD: Kubernetes Operators, Helm Charts

This comprehensive cheatsheet provides a structured overview of CockroachDB’s core concepts, deployment options, management commands, and performance optimization techniques. While covering both basic and advanced features, it serves as a quick reference for developers and database administrators working with CockroachDB in various environments.
