Database Indexing Complete Reference Guide & Cheatsheet

What is Database Indexing?

Database indexing is a data structure technique that improves the speed of data retrieval operations on database tables. Think of it like a book’s index – instead of reading every page to find a topic, you use the index to jump directly to the relevant pages. Indexes create shortcuts to your data, dramatically reducing query execution time from potentially scanning millions of rows to finding exact matches in milliseconds.

Why Database Indexing Matters:

  • Performance: Reduces query execution time from seconds/minutes to milliseconds
  • Scalability: Enables applications to handle growing data volumes efficiently
  • User Experience: Faster page loads and responsive applications
  • Resource Optimization: Cuts the CPU and disk I/O spent on reads, in exchange for extra storage and some write overhead
  • Cost Savings: Lower infrastructure costs through improved efficiency

Core Concepts & Principles

Fundamental Index Components

Index Structure: Most indexes use B-tree (balanced tree) structures that maintain sorted data and provide logarithmic search time O(log n).

Key Components:

  • Index Key: The column(s) used to create the index
  • Row Locator: Pointer to the actual data row location
  • Index Pages: Physical storage units containing index entries
  • Root/Leaf Nodes: Tree structure components for navigation

How Indexes Work

  1. Without Index: Database scans entire table sequentially (Table Scan)
  2. With Index: Database uses index tree to locate data directly (Index Seek)
  3. Index Lookup: Additional step to retrieve non-indexed columns from actual table
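
The scan-versus-seek difference described above can be observed directly. The following is a minimal sketch, assuming SQLite through Python's built-in sqlite3 module; the customers table, its sample rows, and the query_plan helper are made up for illustration, and other engines word their plans differently.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row ends with a human-readable detail column (position 3).
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (last_name, city) VALUES (?, ?)",
    [(f"name{i}", f"city{i % 50}") for i in range(10_000)],
)

sql = "SELECT * FROM customers WHERE last_name = ?"

# Without an index SQLite has to read every row.
plan_before = query_plan(conn, sql, ("name42",))

conn.execute("CREATE INDEX idx_customer_lastname ON customers(last_name)")

# With the index SQLite searches the B-tree, then looks up the row.
plan_after = query_plan(conn, sql, ("name42",))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan starts with SCAN and the second with SEARCH, which are SQLite's terms for a table scan and an index seek.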

Index Types & Categories

Primary Index Types

Index Type | Description | Use Case | Performance
Clustered | Physical storage order matches index order | Primary keys, range queries | Fastest for range scans
Non-Clustered | Separate structure pointing to data rows | Frequently queried columns | Fast for exact matches
Unique | Ensures no duplicate values | Email addresses, usernames | Fast + data integrity
Composite | Multiple columns in single index | Multi-column WHERE clauses | Efficient for combined filters

Specialized Index Types

Partial Indexes: Index only subset of rows meeting specific conditions

CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
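
A partial index only helps queries whose WHERE clause implies the index condition. The sketch below checks this, assuming SQLite via Python's sqlite3 module (which accepts the same WHERE syntax); the sample data and the query_plan helper are hypothetical.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO users (email, status) VALUES (?, ?)",
    [(f"u{i}@example.com", "active" if i % 10 == 0 else "inactive") for i in range(1_000)],
)
conn.execute("CREATE INDEX idx_active_users ON users(email) WHERE status = 'active'")

# The status literal matches the index condition, so the index is usable.
plan_active = query_plan(
    conn, "SELECT id FROM users WHERE email = ? AND status = 'active'", ("u10@example.com",)
)

# Inactive rows are not in the index, so this query cannot use it.
plan_inactive = query_plan(
    conn, "SELECT id FROM users WHERE email = ? AND status = 'inactive'", ("u11@example.com",)
)

print("active:  ", plan_active)
print("inactive:", plan_inactive)
```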

Functional Indexes: Index based on expression or function result

CREATE INDEX idx_upper_lastname ON users(UPPER(last_name));
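
To see why a functional index is needed, compare a plain index on last_name with an index on UPPER(last_name) for a query that filters on the expression. This is a small sketch assuming SQLite via Python's sqlite3 module; the table layout and the query_plan helper are illustrative only.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_lastname ON users(last_name)")

sql = "SELECT id, email FROM users WHERE UPPER(last_name) = ?"

# The plain index stores last_name, not UPPER(last_name), so it cannot be searched.
plan_plain = query_plan(conn, sql, ("SMITH",))

conn.execute("CREATE INDEX idx_upper_lastname ON users(UPPER(last_name))")

# The query expression matches the indexed expression, so a search is possible.
plan_expr = query_plan(conn, sql, ("SMITH",))

print("plain index:     ", plan_plain)
print("expression index:", plan_expr)
```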

Covering Indexes: Include all columns needed for query (no table lookup required)

CREATE INDEX idx_user_details ON users(user_id) INCLUDE (name, email, status);
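
The effect of a covering index shows up in the plan. SQLite has no INCLUDE clause, so this sketch (Python's sqlite3 module, hypothetical table and helper) approximates one by appending the extra columns to the index key; that substitution is an assumption of the example, not the syntax shown above.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "name TEXT, email TEXT, status TEXT, bio TEXT)"
)
# Stand-in for INCLUDE (name, email, status): the columns are part of the key here.
conn.execute("CREATE INDEX idx_user_details ON users(user_id, name, email, status)")

# Every selected column lives in the index, so no table lookup is needed.
plan_covered = query_plan(
    conn, "SELECT name, email, status FROM users WHERE user_id = ?", (1,)
)

# bio is not in the index, so each match requires a lookup in the table.
plan_not_covered = query_plan(
    conn, "SELECT name, bio FROM users WHERE user_id = ?", (1,)
)

print("covered:    ", plan_covered)
print("not covered:", plan_not_covered)
```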

Step-by-Step Index Creation Process

1. Analysis Phase

  • Identify Slow Queries: Use query execution plans and performance monitoring
  • Analyze WHERE Clauses: Find frequently filtered columns
  • Review JOIN Conditions: Identify foreign key relationships
  • Check ORDER BY: Find frequently sorted columns

2. Index Design Phase

  • Choose Index Type: Clustered vs Non-clustered based on usage
  • Select Key Columns: Pick the columns that appear in the WHERE, JOIN, and ORDER BY clauses of the target queries
  • Consider Column Order: Place equality-filtered, most selective columns leftmost and range-filtered columns last
  • Plan for Coverage: Include frequently accessed columns

3. Implementation Phase

-- Basic index creation
CREATE INDEX idx_customer_lastname ON customers(last_name);

-- Composite index with optimal column order  
CREATE INDEX idx_order_search ON orders(customer_id, order_date, status);

-- Covering index for complete query optimization
CREATE INDEX idx_product_lookup ON products(category_id) 
INCLUDE (product_name, price, description);

4. Testing & Validation Phase

  • Compare Execution Plans: Before and after index creation
  • Measure Query Performance: Use actual execution times
  • Monitor Index Usage: Track index utilization statistics
  • Validate Data Integrity: Ensure results remain consistent
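
The validation steps above can be scripted. The following is one possible sketch, assuming SQLite via Python's sqlite3 module; validate_index, the orders sample data, and the report keys are hypothetical names chosen for this example. It compares plans and result sets only, and leaves timing measurements out.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

def validate_index(conn, query, params, create_index_sql):
    """Capture plan and results before and after creating an index."""
    plan_before = query_plan(conn, query, params)
    rows_before = sorted(conn.execute(query, params).fetchall())
    conn.execute(create_index_sql)
    plan_after = query_plan(conn, query, params)
    rows_after = sorted(conn.execute(query, params).fetchall())
    return {
        "plan_before": plan_before,
        "plan_after": plan_after,
        # An index must never change what a query returns.
        "same_results": rows_before == rows_after,
    }

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, status TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, status, total) VALUES (?, ?, ?, ?)",
    [(i % 100, f"2024-{(i % 12) + 1:02d}-01", "shipped", i * 1.5) for i in range(5_000)],
)

report = validate_index(
    conn,
    "SELECT id, total FROM orders WHERE customer_id = ? AND order_date >= ?",
    (7, "2024-06-01"),
    "CREATE INDEX idx_order_search ON orders(customer_id, order_date, status)",
)
print(report)
```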

Index Optimization Techniques

Column Selection Strategies

Selectivity Analysis: Choose columns that filter out the most rows

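-- High selectivity (good for indexing)
-- Multiply by 1.0 so engines that use integer division return a fraction
SELECT COUNT(DISTINCT email) * 1.0 / COUNT(*) FROM users; -- Result: 0.95+

-- Low selectivity (poor for indexing)
SELECT COUNT(DISTINCT gender) * 1.0 / COUNT(*) FROM users; -- Result: close to 0 (a handful of distinct values across all rows)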
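
The same ratio can be computed for any column with a small helper. This sketch assumes SQLite via Python's sqlite3 module; the selectivity function and the 1,000-row users table are invented for the example. It multiplies by 1.0 because several engines otherwise perform integer division on two counts.

```python
import sqlite3

def selectivity(conn, table, column):
    """Distinct values divided by total rows (1.0 means every value is unique)."""
    # Identifiers are interpolated into the SQL, so pass trusted names only.
    sql = f"SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM {table}"
    return conn.execute(sql).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO users (email, gender) VALUES (?, ?)",
    [(f"u{i}@example.com", "F" if i % 2 == 0 else "M") for i in range(1_000)],
)

email_selectivity = selectivity(conn, "users", "email")    # unique per row
gender_selectivity = selectivity(conn, "users", "gender")  # 2 values over 1,000 rows

print("email: ", email_selectivity)
print("gender:", gender_selectivity)
```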

Composite Index Column Ordering:

  1. Equality Conditions: Columns with = operators first
  2. Most Selective: Highest cardinality columns early
  3. Range Conditions: Range filters last in composite indexes
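
The ordering rules above can be checked against a query that mixes an equality filter with a range filter. This is a minimal sketch assuming SQLite via Python's sqlite3 module; plan_with_index and the orders table are hypothetical, and the exact plan wording is specific to SQLite.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

def plan_with_index(index_columns):
    """Build a fresh orders table with one composite index and plan the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "order_date TEXT, total REAL)"
    )
    conn.execute(f"CREATE INDEX idx_demo ON orders({index_columns})")
    plan = query_plan(
        conn,
        "SELECT total FROM orders WHERE customer_id = ? AND order_date >= ?",
        (7, "2024-06-01"),
    )
    conn.close()
    return plan

# Equality column first: both conditions narrow the index search.
equality_first = plan_with_index("customer_id, order_date")

# Range column first: the search stops at the range, customer_id is filtered afterwards.
range_first = plan_with_index("order_date, customer_id")

print("equality first:", equality_first)
print("range first:   ", range_first)
```

In the first plan both columns appear in the search constraint. In the second plan only order_date does, because an index cannot seek on columns that follow a range condition.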

Performance Optimization Methods

Technique | Purpose | Implementation
Index Hints | Force specific index usage | SELECT * FROM users WITH (INDEX(idx_lastname)) (SQL Server syntax)
Partial Scans | Limit index scan range | Use BETWEEN, <, > operators effectively
Index Intersection | Combine multiple single-column indexes | Let optimizer use multiple indexes together
Statistics Updates | Maintain accurate cardinality estimates | UPDATE STATISTICS table_name (SQL Server syntax)

Common Challenges & Solutions

Challenge 1: Over-Indexing

Problem: Too many indexes slow down INSERT/UPDATE/DELETE operations

Solution:

  • Audit index usage regularly using system views
  • Remove unused indexes (< 5% utilization)
  • Consolidate overlapping indexes into composite indexes
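
One way to spot overlapping indexes is to look for an index whose columns are a leading prefix of another index on the same table. The sketch below assumes SQLite via Python's sqlite3 module and its PRAGMA index_list and PRAGMA index_info commands; redundant_prefix_indexes is a hypothetical helper. It skips unique indexes because they also enforce a constraint, and it does not handle partial or expression indexes.

```python
import sqlite3

def index_columns(conn, index_name):
    """Return the key columns of an index in key order."""
    rows = conn.execute(f"PRAGMA index_info({index_name})").fetchall()
    # Rows are (seqno, cid, name); sorting by seqno keeps the key order.
    return [row[2] for row in sorted(rows)]

def redundant_prefix_indexes(conn, table):
    """Find non-unique indexes that are a leading prefix of another index."""
    indexes = {}
    for row in conn.execute(f"PRAGMA index_list({table})").fetchall():
        name, is_unique = row[1], row[2]
        if not is_unique:
            indexes[name] = index_columns(conn, name)
    pairs = []
    for small, small_cols in indexes.items():
        for big, big_cols in indexes.items():
            is_prefix = big_cols[:len(small_cols)] == small_cols
            if small != big and len(small_cols) < len(big_cols) and is_prefix:
                pairs.append((small, big))
    return sorted(pairs)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, status TEXT)"
)
conn.execute("CREATE INDEX idx_customer ON orders(customer_id)")
conn.execute("CREATE INDEX idx_customer_date ON orders(customer_id, order_date)")
conn.execute("CREATE INDEX idx_status ON orders(status)")

candidates = redundant_prefix_indexes(conn, "orders")
print(candidates)
```

Each pair names a narrower index and the wider index that could serve the same lookups. Treat the output as candidates to review, not as indexes that are safe to drop.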

Challenge 2: Index Fragmentation

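Problem: B-tree structure becomes inefficient over time

Solution:

-- Check fragmentation level (SQL Server; the function requires all five arguments)
SELECT index_id, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('table_name'), NULL, NULL, NULL);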

-- Rebuild highly fragmented indexes (>30%)
ALTER INDEX idx_name ON table_name REBUILD;

-- Reorganize moderately fragmented indexes (5-30%)  
ALTER INDEX idx_name ON table_name REORGANIZE;
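
The thresholds in the comments above can be turned into a small decision helper for a maintenance script. This is only a sketch: maintenance_statement is a hypothetical function, the 5% and 30% cut-offs are the ones quoted above, and the generated text is SQL Server syntax that the code builds but does not execute.

```python
def maintenance_statement(index_name, table_name, fragmentation_percent):
    """Pick a maintenance statement from a fragmentation percentage.

    Returns None when fragmentation is below 5% and no action is needed.
    """
    if fragmentation_percent > 30:
        action = "REBUILD"        # heavy fragmentation: rebuild the index
    elif fragmentation_percent >= 5:
        action = "REORGANIZE"     # moderate fragmentation: reorganize in place
    else:
        return None
    return f"ALTER INDEX {index_name} ON {table_name} {action};"

for percent in (2.0, 12.5, 42.0):
    print(percent, "->", maintenance_statement("idx_name", "table_name", percent))
```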

Challenge 3: Composite Index Column Order

Problem: Wrong column order makes index ineffective

Solution: Put the most selective equality-filtered column first, so the index narrows the search as early as possible

-- Instead of this (low-selectivity status leads)
CREATE INDEX bad_idx ON orders(status, customer_id, order_date);

-- Use this (selective customer_id leads)
CREATE INDEX good_idx ON orders(customer_id, order_date, status);

Challenge 4: Missing Index Scenarios

Problem: Queries still slow despite having indexes

Solution:

  • Check for functions wrapped around indexed columns in WHERE clauses (these prevent index seeks unless a matching functional index exists)
  • Verify data type matching between columns and parameters (implicit conversions can force a scan)
  • Ensure leading column of composite index is used in WHERE clause
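
The leading-column rule is easy to verify with an execution plan. The following sketch assumes SQLite via Python's sqlite3 module, with a made-up orders table and query_plan helper. Some optimizers can work around a missing leading column (a skip scan, for example), so other engines may behave differently.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, status TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_order_search ON orders(customer_id, order_date, status)")

# Filters on the leading column: the index can be searched.
uses_leading = query_plan(
    conn, "SELECT total FROM orders WHERE customer_id = ? AND status = ?", (7, "shipped")
)

# Skips the leading column: the index order is of no help, so the table is scanned.
skips_leading = query_plan(
    conn,
    "SELECT total FROM orders WHERE order_date >= ? AND status = ?",
    ("2024-06-01", "shipped"),
)

print("uses leading column: ", uses_leading)
print("skips leading column:", skips_leading)
```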

Best Practices & Practical Tips

Index Creation Best Practices

DO:

  • Create indexes on foreign key columns used in JOINs
  • Index columns frequently used in WHERE, ORDER BY, GROUP BY clauses
  • Use covering indexes for frequently executed queries
  • Monitor index usage and remove unused indexes
  • Create composite indexes with proper column ordering

DON’T:

  • Index every column “just in case”
  • Create indexes on small tables (< 1000 rows)
  • Index columns with low selectivity (gender, boolean flags)
  • Ignore maintenance overhead on write-heavy tables

Performance Monitoring Tips

Key Metrics to Track:

  • Query execution time improvements
  • Index usage statistics and scan ratios
  • Index fragmentation levels
  • Storage space consumption
  • Impact on INSERT/UPDATE/DELETE performance

Useful Queries for Index Management:

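-- Find unused indexes (SQL Server; no reads recorded since the last restart)
SELECT t.name AS table_name, i.name AS index_name,
       us.user_seeks, us.user_scans, us.user_lookups
FROM sys.indexes i
JOIN sys.tables t ON i.object_id = t.object_id
LEFT JOIN sys.dm_db_index_usage_stats us
  ON us.object_id = i.object_id AND us.index_id = i.index_id
  AND us.database_id = DB_ID()
WHERE i.index_id > 0 -- skip heaps
  AND COALESCE(us.user_seeks + us.user_scans + us.user_lookups, 0) = 0;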

-- Identify missing indexes (SQL Server)
SELECT mid.statement, migs.avg_user_impact, migs.user_seeks
FROM sys.dm_db_missing_index_group_stats migs
JOIN sys.dm_db_missing_index_groups mig
ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details mid
ON mig.index_handle = mid.index_handle
ORDER BY migs.avg_user_impact DESC;

Maintenance Schedule Recommendations

Frequency | Task | Purpose
Daily | Monitor slow query log | Identify performance issues early
Weekly | Check index fragmentation | Plan rebuilding activities
Monthly | Review index usage stats | Remove unused indexes
Quarterly | Full index audit | Optimize entire indexing strategy

Database-Specific Considerations

MySQL Indexing

  • InnoDB: Every table has a clustered index (the PRIMARY KEY, or else the first UNIQUE NOT NULL index or a hidden row ID)
  • Index Hints: USE INDEX, FORCE INDEX, IGNORE INDEX
  • Prefix Indexing: CREATE INDEX idx_name ON table_name(column_name(10))

PostgreSQL Indexing

  • Multiple Index Types: B-tree, Hash, GiST, GIN, BRIN
  • Partial Indexes: Highly efficient for conditional data
  • Expression Indexes: Index computed values

SQL Server Indexing

  • Clustered vs Non-Clustered: One clustered per table, 999 non-clustered max
  • Included Columns: INCLUDE clause for covering indexes
  • Filtered Indexes: WHERE clause in index definition

Quick Reference Commands

Essential SQL Commands

-- Create basic index
CREATE INDEX idx_name ON table_name(column_name);

-- Create composite index  
CREATE INDEX idx_name ON table_name(col1, col2, col3);

-- Create unique index
CREATE UNIQUE INDEX idx_name ON table_name(column_name);

-- Drop index (MySQL, SQL Server; PostgreSQL uses DROP INDEX idx_name;)
DROP INDEX idx_name ON table_name;

-- Show execution plan (MySQL, PostgreSQL)
EXPLAIN SELECT * FROM table_name WHERE column_name = 'value';

Performance Analysis Commands

-- MySQL: List indexes and their cardinality
SHOW INDEX FROM table_name;

-- PostgreSQL: Index size and usage
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes;

-- SQL Server: Index fragmentation
SELECT avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('table_name'), NULL, NULL, NULL);

Resources for Further Learning

Tools & Utilities

  • Database Execution Plan Analyzers: Built-in EXPLAIN tools
  • Performance Monitoring: pg_stat_statements (PostgreSQL), Performance Schema (MySQL)
  • Index Advisors: Database-specific automated recommendation tools

Advanced Topics to Explore

  • Partitioned Indexes: For very large tables
  • Columnstore Indexes: For analytical workloads
  • Spatial Indexes: For geographic data
  • Full-Text Indexes: For search functionality
  • Memory-Optimized Indexes: For in-memory databases

Last Updated: May 2025 | This cheatsheet covers fundamental to intermediate database indexing concepts applicable across major RDBMS platforms.
