Introduction
This cheat sheet offers a comprehensive guide to data visualization in Python using both Matplotlib and Seaborn. Matplotlib is a versatile, low-level plotting library that gives you precise control over every element of a visualization. Seaborn, built on top of Matplotlib, provides a high-level interface focused on statistical visualizations with aesthetic appeal and simpler syntax.
Setting Up
Importing Libraries
# Matplotlib
import matplotlib.pyplot as plt
import matplotlib as mpl
# Seaborn
import seaborn as sns
# Common data libraries
import numpy as np
import pandas as pd
Basic Plot Setup – Matplotlib
# Create a figure and axis
fig, ax = plt.subplots(figsize=(10, 6))
# Create a figure with multiple subplots
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
# Alternative: create the figure first, then add an Axes to it
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111) # Single subplot
Basic Plot Setup – Seaborn
# Set default Seaborn theme
sns.set_theme()
# Common themes
sns.set_style("whitegrid") # Options: darkgrid, whitegrid, dark, white, ticks
sns.set_context("notebook") # Options: paper, notebook, talk, poster
Matplotlib Core Plot Types
Line Plots
# Simple line plot
plt.plot(x, y)
# Multiple lines with labels
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
# Customized line
plt.plot(x, y, color='blue', linestyle='--', linewidth=2, marker='o', markersize=8)
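The snippets above assume that x, y, y1 and y2 already exist. A minimal runnable sketch, with made-up sine and cosine data standing in for them (the data and the Agg backend line are assumptions for illustration, not part of the cheat sheet):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data: two curves over the same x values
x = np.linspace(0, 2 * np.pi, 50)
y1 = np.sin(x)
y2 = np.cos(x)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y1, label="sin(x)")
ax.plot(x, y2, color="tab:red", linestyle="--", linewidth=2, label="cos(x)")
ax.set_xlabel("x")
ax.legend()  # picks up the label= given to each plot call
```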
Scatter Plots
# Simple scatter plot
plt.scatter(x, y)
# Scatter with varying size and color
plt.scatter(x, y, s=sizes, c=colors, alpha=0.5, cmap='viridis')
plt.colorbar() # Add colorbar for reference
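For the size- and color-mapped scatter, sizes and colors must each hold one value per point. A small sketch under that assumption, using random placeholder data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 50 random points with a size and a color value each
rng = np.random.default_rng(0)
x = rng.uniform(size=50)
y = rng.uniform(size=50)
sizes = rng.uniform(20, 200, size=50)   # marker area in points^2
colors = rng.uniform(size=50)           # mapped through the colormap

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c=colors, alpha=0.5, cmap="viridis")
fig.colorbar(points, ax=ax, label="color value")  # colorbar gets its own Axes
```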
Bar Charts
# Vertical bar chart
plt.bar(x, height, width=0.8)
# Horizontal bar chart
plt.barh(y, width, height=0.8)
# Stacked bar chart
plt.bar(x, y1)
plt.bar(x, y2, bottom=y1)
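In the stacked version, bottom=y1 lifts each bar of the second series so it starts where the first one ends. A short sketch with invented category names and values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values for three categories
categories = ["A", "B", "C"]
y1 = np.array([3, 5, 2])
y2 = np.array([1, 2, 4])

fig, ax = plt.subplots()
lower = ax.bar(categories, y1, label="Series 1")
upper = ax.bar(categories, y2, bottom=y1, label="Series 2")  # stacked on top of y1
ax.legend()
```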
Histograms
# Simple histogram
plt.hist(data, bins=30)
# Normalized histogram with custom bins
plt.hist(data, bins=bins, density=True, alpha=0.7)
# Multiple histograms
plt.hist([data1, data2], bins=30, label=['Data 1', 'Data 2'], alpha=0.7)
plt.legend()
Pie Charts
# Simple pie chart
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
# Exploded pie chart with custom colors
plt.pie(sizes, labels=labels, explode=explode, colors=colors,
        autopct='%1.1f%%', shadow=True)
Box Plots
# Single box plot
plt.boxplot(data)
# Multiple box plots (Matplotlib 3.9+ renames labels= to tick_labels=)
plt.boxplot([data1, data2, data3], labels=['Group 1', 'Group 2', 'Group 3'])
Heatmaps
# Create a heatmap
plt.imshow(data, cmap='viridis')
plt.colorbar()
# With specific x and y labels
plt.imshow(data, cmap='viridis')
plt.xticks(range(len(x_labels)), x_labels)
plt.yticks(range(len(y_labels)), y_labels)
plt.colorbar()
3D Plots
from mpl_toolkits.mplot3d import Axes3D # Only needed for Matplotlib < 3.2
# 3D scatter plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c=colors, marker='o')
# 3D surface plot (on a new figure, so it does not overlap the scatter axes)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X, Y, Z, cmap='viridis')
fig.colorbar(surf)
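plot_surface expects X, Y and Z as 2D arrays of the same shape. One common way to build them is np.meshgrid; the sketch below uses an arbitrary ripple function chosen only for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Build a 40x40 grid and evaluate a hypothetical surface on it
X, Y = np.meshgrid(np.linspace(-3, 3, 40), np.linspace(-3, 3, 40))
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
surf = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surf, ax=ax, shrink=0.6)  # shrink keeps the colorbar compact
```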
Seaborn Core Plot Types
Relational Plots
# Scatter plot
sns.scatterplot(x='column1', y='column2', data=df)
# Line plot
sns.lineplot(x='column1', y='column2', data=df)
# Scatter plot with additional dimensions
sns.scatterplot(x='column1', y='column2', hue='category', size='value',
                style='group', data=df)
# Relational plot with facets
sns.relplot(x='column1', y='column2', hue='category',
            col='group1', row='group2', data=df, kind='scatter')
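The Seaborn calls above expect a long-form DataFrame in which each plotted variable is a column. A minimal sketch with a synthetic DataFrame (column names mirror the placeholders above; the data is invented):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: one row per observation
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "column1": rng.normal(size=120),
    "column2": rng.normal(size=120),
    "category": np.tile(["a", "b"], 60),
    "group1": np.tile(["g1", "g2", "g3"], 40),
})

# One scatter panel per value of group1, colored by category
g = sns.relplot(data=df, x="column1", y="column2", hue="category",
                col="group1", kind="scatter", height=3)
```

relplot returns a FacetGrid, so the individual panels are available through g.axes.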
Distribution Plots
# Histogram
sns.histplot(data=df, x='column', bins=30)
# Kernel density estimate (KDE)
sns.kdeplot(data=df, x='column')
# Both histogram and KDE
sns.histplot(data=df, x='column', kde=True)
# Distribution plot
sns.displot(data=df, x='column', kind='hist', kde=True)
# Joint distribution plot
sns.jointplot(x='column1', y='column2', data=df, kind='scatter')
Categorical Plots
# Box plot
sns.boxplot(x='category', y='value', data=df)
# Violin plot
sns.violinplot(x='category', y='value', data=df)
# Bar plot (mean with confidence intervals)
sns.barplot(x='category', y='value', data=df)
# Count plot (frequency)
sns.countplot(x='category', data=df)
# Categorical scatter plot (strip plot)
sns.stripplot(x='category', y='value', data=df, jitter=True)
# Swarm plot (non-overlapping scatter)
sns.swarmplot(x='category', y='value', data=df)
# Combined categorical plot
sns.catplot(x='category', y='value', data=df, kind='box')
Matrix Plots
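# Heatmap from correlation matrix (numeric columns only; pandas 2.0+ raises on non-numeric columns)
sns.heatmap(df.select_dtypes('number').corr(), annot=True, cmap='coolwarm')
# Cluster map (hierarchically clustered heatmap)
sns.clustermap(df.select_dtypes('number').corr(), cmap='coolwarm')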
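A correlation heatmap only makes sense for numeric columns. The sketch below uses a made-up DataFrame that also contains a text column and selects the numeric part before calling corr():

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: b is built from a, c is independent, label is text
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=100),
    "c": rng.normal(size=100),
    "label": np.tile(["x", "y"], 50),
})

corr = df.select_dtypes("number").corr()  # drops the text column first
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, center=0)
```

Fixing vmin, vmax and center keeps the diverging colormap symmetric around zero.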
Regression Plots
# Simple linear regression
sns.regplot(x='column1', y='column2', data=df)
# Linear regression with categorical variables
sns.lmplot(x='column1', y='column2', hue='category', data=df)
# Residual plot
sns.residplot(x='column1', y='column2', data=df)
Pair Plots
# Grid of plots for multiple variables
sns.pairplot(df, hue='category')
# Grid with custom plot types
sns.pairplot(df, hue='category', diag_kind='kde',
             plot_kws={'alpha': 0.6}, diag_kws={'fill': True})
Joint Plots
# Scatter plot with histograms
sns.jointplot(x='column1', y='column2', data=df)
# Hexbin plot for dense data
sns.jointplot(x='column1', y='column2', data=df, kind='hex')
# Kernel density estimate
sns.jointplot(x='column1', y='column2', data=df, kind='kde')
Customizing Plots
Matplotlib Plot Customization
# Title and labels
plt.title('Main Title', fontsize=16)
plt.xlabel('X Axis Label', fontsize=12)
plt.ylabel('Y Axis Label', fontsize=12)
plt.suptitle('Super Title', fontsize=18) # Figure-level title
# Axis limits
plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)
# Grids
plt.grid(True, linestyle='--', alpha=0.7)
# Ticks
plt.xticks(rotation=45)
plt.yticks(fontsize=10)
# Custom tick positions and labels
plt.xticks(tick_positions, tick_labels)
# Legend
plt.legend(loc='best', fontsize=12, frameon=True, framealpha=0.7)
# Add text to plot
plt.text(x, y, 'Text annotation', fontsize=12, ha='center')
# Add an annotation with arrow
plt.annotate('Peak', xy=(x, y), xytext=(x+1, y+1),
             arrowprops=dict(facecolor='black', shrink=0.05))
# Set figure size after creation
plt.gcf().set_size_inches(10, 6)
# Tight layout to avoid label cutoff
plt.tight_layout()
OO Interface Customization
# Using the object-oriented interface for more control
fig, ax = plt.subplots()
# Title and labels
ax.set_title('Title')
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
# Axis limits
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
# Legend
ax.legend(loc='upper right')
# Grid
ax.grid(True)
# Spine visibility
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Spine positioning
ax.spines['bottom'].set_position(('outward', 10))
Seaborn Plot Customization
# Setting style and context
sns.set_style("whitegrid")
sns.set_context("notebook", font_scale=1.5)
# Setting palette
sns.set_palette("Set2")
# Custom color palette
custom_palette = sns.color_palette("husl", 8)
sns.set_palette(custom_palette)
# Plot-specific styling
sns.lineplot(x='column1', y='column2', data=df,
             hue='category', palette='viridis', linewidth=2.5)
# Despine plot (remove spines)
sns.despine(left=True, bottom=True)
# Color map customization
cmap = sns.color_palette("icefire", as_cmap=True)
sns.heatmap(data, cmap=cmap)
Subplots and Figures
Matplotlib Subplots
# Basic subplots grid
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
# Accessing subplots in a 2x2 grid
axes[0, 0].plot(x, y1)
axes[0, 1].scatter(x, y2)
axes[1, 0].bar(x, y3)
axes[1, 1].hist(y4)
# Subplots with shared axes
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, y1)
axes[1].plot(x, y2)
# Subplots with different sizes
fig = plt.figure(figsize=(12, 8))
ax1 = plt.subplot2grid((3, 3), (0, 0), colspan=2)
ax2 = plt.subplot2grid((3, 3), (0, 2), rowspan=2)
ax3 = plt.subplot2grid((3, 3), (1, 0), colspan=2, rowspan=2)
Seaborn Facet Grids
# FacetGrid for multiple plots
g = sns.FacetGrid(df, col="category", row="group", height=3)
g.map(sns.histplot, "value")
# Relational plot with facets
g = sns.relplot(x="time", y="value", hue="event",
                col="group", row="subgroup", data=df)
# Custom function on FacetGrid
def custom_plot(x, y, **kwargs):
    plt.scatter(x, y, **kwargs)
    plt.plot(x, np.poly1d(np.polyfit(x, y, 1))(x), color='red')
g = sns.FacetGrid(df, col="category")
g.map(custom_plot, "column1", "column2")
Colors and Colormaps
Matplotlib Colors
# Basic color names
plt.plot(x, y, color='blue')
# Hex color codes
plt.plot(x, y, color='#FF5733')
# RGB tuples
plt.plot(x, y, color=(0.2, 0.4, 0.6))
# Common colormaps
plt.scatter(x, y, c=z, cmap='viridis')
plt.scatter(x, y, c=z, cmap='plasma')
plt.scatter(x, y, c=z, cmap='inferno')
plt.scatter(x, y, c=z, cmap='magma')
plt.scatter(x, y, c=z, cmap='cividis')
# Diverging colormaps
plt.imshow(data, cmap='coolwarm')
plt.imshow(data, cmap='RdBu_r') # Reversed red-blue
Seaborn Palettes
# Color palette types
sns.color_palette("deep") # Default
sns.color_palette("pastel") # Pastel version of default
sns.color_palette("dark") # Darker version
sns.color_palette("colorblind") # Colorblind friendly
# Sequential palettes
sns.color_palette("Blues", 8) # 8 increasing blue shades
sns.color_palette("rocket", 8) # Fire-like sequential palette
# Diverging palettes
sns.color_palette("vlag", 10) # Blue to red with a light center
sns.color_palette("icefire", 10) # Blue to red with a dark center
# Setting palette for all plots
sns.set_palette("Set2")
# Custom palette from list of colors
custom_pal = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e"]
sns.set_palette(custom_pal)
Saving and Displaying Plots
Matplotlib Save/Display
# Save figure
plt.savefig('plot.png', dpi=300, bbox_inches='tight')
# Save with specific format
plt.savefig('plot.svg', format='svg', transparent=True)
plt.savefig('plot.pdf', format='pdf')
# Show plot
plt.show()
# Close the current figure
plt.close()
# Close all figures
plt.close('all')
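savefig also accepts a file-like object, which is handy when the image should go to a web response or a test rather than to disk. A minimal sketch (the plotted values are placeholders):

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Write the PNG into memory instead of a file on disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
plt.close(fig)  # release the figure once it has been saved

png_bytes = buffer.getvalue()
```

Closing figures explicitly matters in loops or long-running scripts, where open figures otherwise accumulate in memory.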
Seaborn Save/Display
# Seaborn plots still use matplotlib's save functions
plot = sns.lineplot(x='column1', y='column2', data=df)
fig = plot.get_figure()
fig.savefig('seaborn_plot.png', dpi=300)
# For FacetGrid or other grid-based plots
g = sns.FacetGrid(df, col="category")
g.map(plt.hist, "value")
g.savefig('facetgrid_plot.png')
Common Plotting Patterns
Matplotlib Patterns
# Plot with multiple y-axes
fig, ax1 = plt.subplots()
ax1.plot(x, y1, 'b-')
ax1.set_ylabel('Y1', color='b')
ax2 = ax1.twinx() # Create second y-axis sharing same x-axis
ax2.plot(x, y2, 'r-')
ax2.set_ylabel('Y2', color='r')
# Fill between lines
plt.fill_between(x, y1, y2, alpha=0.2)
# Plot date data
import matplotlib.dates as mdates
plt.plot(dates, values)
plt.gcf().autofmt_xdate() # Auto-format x-axis date labels
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
# Error bars
plt.errorbar(x, y, yerr=y_errors, fmt='o')
# Logarithmic scales
plt.semilogy(x, y) # log scale on y-axis
plt.semilogx(x, y) # log scale on x-axis
plt.loglog(x, y) # log scale on both axes
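The twin-axis and log-scale patterns combine naturally when two series differ by orders of magnitude. A sketch under that assumption, with invented quadratic and exponential data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical series on very different scales
x = np.arange(1, 11)
y1 = x ** 2
y2 = np.exp(x / 2)

fig, ax1 = plt.subplots()
ax1.plot(x, y1, "b-")
ax1.set_ylabel("quadratic (linear scale)", color="b")

ax2 = ax1.twinx()              # second y-axis sharing the same x-axis
ax2.plot(x, y2, "r-")
ax2.set_yscale("log")          # Axes-level equivalent of plt.semilogy
ax2.set_ylabel("exponential (log scale)", color="r")
fig.tight_layout()
```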
Seaborn Patterns
# Marginal distributions
sns.jointplot(x='column1', y='column2', data=df, kind='scatter',
              marginal_kws=dict(bins=25, fill=True))
# Visualize correlation matrix
sns.heatmap(df.select_dtypes('number').corr(), annot=True, cmap='coolwarm', center=0)
# Categorical plot with multiple comparisons
sns.catplot(x='category', y='value', hue='group', kind='box',
            palette='Set3', height=6, aspect=1.5, data=df)
# Regression plot with confidence intervals
sns.lmplot(x='column1', y='column2', hue='category',
           palette='Set1', height=6, data=df)
# Plot with rug plot and kernel density estimate
sns.displot(data=df, x='column', kde=True, rug=True)
# Pair plot with custom diagonals
sns.pairplot(df, diag_kind='kde', plot_kws={'alpha': 0.6})
Integration with Pandas
Pandas Direct Plotting
# Line plot from DataFrame
df.plot(x='column1', y='column2')
# Multiple line plot
df.plot(x='column1', y=['column2', 'column3', 'column4'])
# Scatter plot
df.plot.scatter(x='column1', y='column2')
# Histogram
df['column'].plot.hist(bins=30)
# Box plot
df.plot.box()
# Area plot
df.plot.area(alpha=0.5)
# Pie chart
df.plot.pie(y='column', figsize=(10, 10))
Seaborn with Pandas
# Load sample dataset
df = sns.load_dataset('tips')
# Plot with DataFrame
sns.scatterplot(data=df, x='total_bill', y='tip', hue='time')
# Plot with Series
sns.histplot(df['total_bill'])
# Grouped boxplot
sns.boxplot(x='day', y='total_bill', hue='sex', data=df)
# Pivoted heatmap
pivot_data = df.pivot_table(index='day', columns='sex', values='total_bill')
sns.heatmap(pivot_data, annot=True)
Statistical Visualization
Matplotlib Statistics
# Histogram with fitted normal density curve
from scipy import stats
counts, bins, patches = plt.hist(data, bins=30, density=True, alpha=0.7)
plt.plot(bins, stats.norm.pdf(bins, np.mean(data), np.std(data)), 'r-')
# QQ plot (on a new figure, so it is not drawn over the histogram)
plt.figure()
stats.probplot(data, plot=plt)
# Boxplot with notches
plt.boxplot(data, notch=True, patch_artist=True)
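With density=True the bar areas sum to 1, which is what makes the histogram comparable to a probability density curve. The sketch below checks that on random sample data and overlays a normal curve; to stay dependency-light it uses a hand-written normal_pdf helper (an assumption for illustration) in place of scipy's stats.norm.pdf:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample drawn from a normal distribution
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=2000)

def normal_pdf(values, mean, std):
    """Density of a normal distribution, written out with NumPy."""
    return np.exp(-0.5 * ((values - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(data, bins=30, density=True, alpha=0.7)

# Overlay the normal curve fitted by sample mean and standard deviation
grid = np.linspace(edges[0], edges[-1], 200)
ax.plot(grid, normal_pdf(grid, data.mean(), data.std()), "r-")

hist_area = np.sum(counts * np.diff(edges))  # bar heights times bin widths
```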
Seaborn Statistics
# Distribution plot with rug and kde
sns.displot(data=df, x='column', rug=True, kde=True)
# Violin plot with inner points
sns.violinplot(x='category', y='value', data=df, inner='point')
# Linear regression plot with confidence interval
sns.regplot(x='column1', y='column2', data=df, ci=95)
# Pair grid with different plots
g = sns.PairGrid(df, hue='category')
g.map_diag(sns.histplot)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.add_legend()
# Advanced categorical plot
sns.catplot(x='column1', y='column2', kind='violin', inner='stick',
            palette='pastel', dodge=True, data=df)
Pandas Integration Examples
Single-variable Analysis
# Series histogram
df['numeric_column'].hist(bins=30, alpha=0.7)
# Series kde
df['numeric_column'].plot.kde()
# Both together with Seaborn
sns.displot(df['numeric_column'], kde=True, bins=30)
# Count plot for categorical data
df['category_column'].value_counts().plot.bar()
# Or with Seaborn
sns.countplot(x='category_column', data=df)
Two-variable Analysis
# Scatter plot
df.plot.scatter(x='column1', y='column2', alpha=0.5)
# Seaborn enhanced scatter with regression line
sns.regplot(x='column1', y='column2', data=df)
# Categorical relationship
sns.boxplot(x='category', y='value', data=df)
# Heatmap of correlation
corr = df.select_dtypes('number').corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
Multi-variable Analysis
# Scatter with size and color dimensions
df.plot.scatter(x='column1', y='column2', s=df['size_column']*10,
                c='color_column', cmap='viridis', alpha=0.7)
# Seaborn pair plot by category
sns.pairplot(df, hue='category')
# Parallel coordinates for multiple dimensions
from pandas.plotting import parallel_coordinates
parallel_coordinates(df, 'category')
# Andrews curves
from pandas.plotting import andrews_curves
andrews_curves(df, 'category')
Best Practices and Tips
General Visualization Tips
- Keep it simple: Focus on what story you want your data to tell
- Choose appropriate plot types:
  - Line plots for trends over time
  - Bar charts for comparing categories
  - Scatter plots for relationships between variables
  - Histograms for distributions
- Avoid misleading visualizations:
  - Use appropriate axis scales
  - Start y-axis at zero when appropriate
  - Avoid excessive 3D effects
- Make plots accessible:
  - Use colorblind-friendly palettes
  - Add clear labels and legends
  - Use appropriate font sizes
- Optimize for the audience:
  - Technical details for technical audiences
  - Simpler plots for general audiences
Matplotlib Tips
- Use the object-oriented interface for complex plots and fine-grained control
- Set style parameters once with plt.rcParams for consistent styling across plots
- Use bbox_inches='tight' when saving to avoid cutting off labels
- Create custom plotting functions for repeated plot types
- Use colormaps appropriately:
  - Sequential (e.g., 'viridis') for continuous data
  - Diverging (e.g., 'coolwarm') for data with a meaningful midpoint
  - Qualitative (e.g., 'Set3') for categorical data
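As a sketch of the rcParams tip: the same keys that plt.rcParams accepts can also be applied temporarily with mpl.rc_context, which restores the previous settings on exit. The particular style values below are arbitrary examples:

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical house style; any valid rcParams keys work here
style = {"axes.titlesize": 16, "axes.grid": True, "lines.linewidth": 2.5}

width_before = mpl.rcParams["lines.linewidth"]

with mpl.rc_context(style):        # settings apply only inside this block
    fig, ax = plt.subplots()
    (line,) = ax.plot([0, 1, 2], [1, 3, 2])
    ax.set_title("Styled plot")

width_after = mpl.rcParams["lines.linewidth"]  # back to the earlier value
```

For a whole script or notebook, plt.rcParams.update(style) at the top applies the same settings globally.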
Seaborn Tips
- Start with high-level functions (displot, catplot, etc.) for quick exploration
- Use relplot and displot for multi-faceted figures
- Set style and context at the beginning of your notebook/script
- Use Seaborn for statistical visualizations, Matplotlib for custom plots
- Save figure handles to add customization on top of Seaborn plots
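To illustrate the last tip: axes-level Seaborn functions draw onto a Matplotlib Axes, so keeping that handle lets you layer ordinary Matplotlib calls on top. A sketch with invented data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 30 values in each of three categories
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "category": np.repeat(["a", "b", "c"], 30),
    "value": rng.normal(size=90),
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(data=df, x="category", y="value", ax=ax)  # Seaborn draws on our Axes

# Plain Matplotlib customization on top of the Seaborn plot
ax.set_title("Value by category")
ax.axhline(df["value"].mean(), color="gray", linestyle="--", label="overall mean")
ax.legend()
sns.despine(ax=ax)  # removes the top and right spines
```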
Resources for Further Learning
Books
- “Python for Data Analysis” by Wes McKinney
- “Python Data Science Handbook” by Jake VanderPlas
- “Fundamentals of Data Visualization” by Claus O. Wilke