Bonus: Advanced Visualization
Annotations and Drawing on Plots
Adding Text and Annotations
Reference:
- ax.text(x, y, 'text') - Add text at coordinates
- ax.annotate('text', xy=(x, y), xytext=(x2, y2)) - Add annotation with arrow
- ax.arrow(x, y, dx, dy) - Add arrow
- ax.axhline(y=value) - Add horizontal line
- ax.axvline(x=value) - Add vertical line
Example:
# Annotate important points
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(data)

# Add text annotation
ax.text(50, data[50], 'Peak Value', fontsize=12, ha='center')

# Add arrow annotation
ax.annotate('Important Event', xy=(100, data[100]), xytext=(150, data[100] + 10),
            arrowprops=dict(arrowstyle='->', color='red'))

# Add reference lines
ax.axhline(y=data.mean(), color='gray', linestyle='--', alpha=0.7)
ax.axvline(x=50, color='gray', linestyle='--', alpha=0.7)
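The example above labels a fixed index as the peak. A minimal sketch of finding the maximum first and annotating it could look like the following. The synthetic `data` array and the `Agg` backend are assumptions made only so the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: a noisy sine wave standing in for real measurements
rng = np.random.default_rng(0)
data = np.sin(np.linspace(0, 4 * np.pi, 200)) * 10 + rng.normal(0, 1, 200)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(data)

# Locate the actual maximum instead of hard-coding an index
peak_idx = int(np.argmax(data))
peak_val = float(data[peak_idx])

# The arrow points from the label position (xytext) to the peak (xy)
note = ax.annotate('Peak Value',
                   xy=(peak_idx, peak_val),
                   xytext=(peak_idx + 20, peak_val + 2),
                   arrowprops=dict(arrowstyle='->', color='red'))

# Reference lines at the mean and at the peak position
ax.axhline(y=data.mean(), color='gray', linestyle='--', alpha=0.7)
ax.axvline(x=peak_idx, color='gray', linestyle='--', alpha=0.7)
```

Deriving the annotation coordinates from the data keeps the label correct when the data changes.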
Drawing Shapes and Patches
Reference:
from matplotlib.patches import Rectangle, Circle, Polygon

# Add shapes to plots
# (x, y, width, height, radius and the (xN, yN) pairs are placeholders for your own coordinates)
rect = Rectangle((x, y), width, height, color='blue', alpha=0.3)
circle = Circle((x, y), radius, color='red', alpha=0.3)
polygon = Polygon([(x1, y1), (x2, y2), (x3, y3)], color='green', alpha=0.3)

ax.add_patch(rect)
ax.add_patch(circle)
ax.add_patch(polygon)
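The reference above uses placeholder coordinates. A runnable sketch with concrete values might look like this. The coordinates, the axis limits, and the `Agg` backend are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Circle, Polygon

fig, ax = plt.subplots(figsize=(6, 6))

# Hypothetical shapes in data coordinates
rect = Rectangle((1, 1), 3, 2, color='blue', alpha=0.3)       # lower-left corner, width, height
circle = Circle((6, 6), 1.5, color='red', alpha=0.3)          # center, radius
polygon = Polygon([(5, 1), (8, 1), (6.5, 3.5)], color='green', alpha=0.3)

for patch in (rect, circle, polygon):
    ax.add_patch(patch)

# Set limits explicitly so every shape is in view,
# rather than relying on autoscaling to pick up the patches
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.set_aspect('equal')  # keep the circle round
```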
matplotlib Configuration
Global Configuration
Reference:
- plt.rcParams - Access all configuration parameters
- plt.rc('font', size=12) - Set font size
- plt.rc('figure', figsize=(8, 6)) - Set default figure size
- plt.rcdefaults() - Reset to defaults
Example:
# Custom matplotlib configuration
plt.rcParams.update({
    'font.size': 12,
    'font.family': 'serif',
    'axes.linewidth': 1.2,
    'axes.grid': True,
    'grid.alpha': 0.3,
    'figure.figsize': (10, 6),
    'savefig.dpi': 300,
    'savefig.bbox': 'tight'
})

# Create plot with custom settings
fig, ax = plt.subplots()
ax.plot(data)
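The settings above change the defaults for the rest of the session. When a change should apply to a single figure only, `plt.rc_context` can scope it to a `with` block. The following is a small sketch with illustrative values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

default_size = plt.rcParams['font.size']

# Settings passed to rc_context apply only inside the with-block
with plt.rc_context({'font.size': 16, 'axes.grid': True}):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 1])
    size_inside = plt.rcParams['font.size']

# Outside the block the previous value is restored
size_after = plt.rcParams['font.size']
```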
Style Sheets
Reference:
# Available styles
plt.style.available  # List all available styles

# Use a style
plt.style.use('seaborn-v0_8')
plt.style.use('ggplot')
plt.style.use('bmh')

# Create custom style
plt.style.use({
    'figure.facecolor': 'white',
    'axes.facecolor': 'lightgray',
    'axes.grid': True,
    'grid.color': 'white'
})
Advanced pandas Plotting
Subplot Layouts
Reference:
# Advanced subplot options
df.plot(subplots=True, layout=(2, 2), sharex=True, sharey=True)
df.plot(subplots=True, figsize=(12, 8), title='Custom Title')
Stacked and Grouped Plots
Reference:
# Stacked bar plots
df.plot.bar(stacked=True, alpha=0.7)

# Grouped bar plots: with several numeric columns, each category gets one bar per column
# (colors are assigned per column)
df.plot.bar(x='category', color=['red', 'blue', 'green'])

# Area plots
df.plot.area(alpha=0.7, stacked=True)
Advanced seaborn Features
Statistical Visualization
Reference:
- sns.pairplot() - Pairwise relationships
- sns.jointplot() - Joint distributions
- sns.violinplot() - Distribution shapes
- sns.heatmap() - Correlation matrices
- sns.clustermap() - Hierarchical clustering heatmap
Example:
# Advanced seaborn statistical plots

# Pair plot for correlation analysis (figure-level: creates its own figure)
sns.pairplot(df, hue='category')

# Joint plot with regression (figure-level: creates its own figure)
sns.jointplot(data=df, x='x', y='y', kind='reg')

# Axes-level plots can share one figure through the ax argument
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Violin plot for distribution comparison
sns.violinplot(data=df, x='category', y='value', ax=axes[0])

# Heatmap for correlation matrix (numeric columns only)
correlation_matrix = df.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', ax=axes[1])
Facet Grids and Categorical Plots
Reference:
# Advanced categorical plots
sns.catplot(data=df, x='category', y='value', hue='group', kind='box')
sns.catplot(data=df, x='category', y='value', col='time', row='group')

# Facet grid
g = sns.FacetGrid(df, col='category', row='group')
g.map(sns.scatterplot, 'x', 'y')
Advanced matplotlib Customization
Publication-Quality Plots
Reference:
import matplotlib.pyplot as plt
import numpy as np

# Set publication-quality defaults
plt.rcParams.update({
    'figure.figsize': (8, 6),
    'font.size': 12,
    'font.family': 'serif',
    'axes.linewidth': 1.2,
    'xtick.major.size': 5,
    'ytick.major.size': 5,
    'legend.frameon': True,
    'legend.fancybox': False,
    'legend.shadow': False
})

# Create publication-quality plot
fig, ax = plt.subplots(figsize=(8, 6))

# Your plotting code here
x = np.linspace(0, 10, 100)
y = np.sin(x)

ax.plot(x, y, linewidth=2, label='sin(x)')
ax.set_xlabel('X values', fontsize=14)
ax.set_ylabel('Y values', fontsize=14)
ax.set_title('Publication-Quality Plot', fontsize=16, fontweight='bold')
ax.legend(fontsize=12)
ax.grid(True, alpha=0.3)

# Save with high DPI
plt.savefig('publication_plot.png', dpi=300, bbox_inches='tight')
plt.show()
Custom Color Palettes
Reference:
# Define custom color palette
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=colors)

# Or use colormap
from matplotlib.colors import LinearSegmentedColormap

# Create custom colormap
colors = ['#FF0000', '#FFFF00', '#00FF00', '#00FFFF', '#0000FF']
n_bins = 100
cmap = LinearSegmentedColormap.from_list('custom', colors, N=n_bins)

# Use in plot
plt.imshow(data, cmap=cmap)
plt.colorbar()
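To see how the custom colormap above behaves, it may help to apply it to a small 2-D array and inspect its endpoints. The sketch below assumes random data and the `Agg` backend purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# Same color stops as above: red -> yellow -> green -> cyan -> blue
colors = ['#FF0000', '#FFFF00', '#00FF00', '#00FFFF', '#0000FF']
cmap = LinearSegmentedColormap.from_list('custom', colors, N=100)

# Hypothetical 2-D data (imshow expects a 2-D array for scalar mapping)
rng = np.random.default_rng(42)
data = rng.random((20, 20))

fig, ax = plt.subplots()
im = ax.imshow(data, cmap=cmap)
fig.colorbar(im, ax=ax)

# A colormap is callable: values in [0, 1] map to RGBA tuples
low = cmap(0.0)   # first color stop
high = cmap(1.0)  # last color stop
```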
Interactive Visualizations
Bokeh for Interactive Plots
Reference:
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import HoverTool
import numpy as np

# Enable notebook output
output_notebook()

# Create interactive plot
p = figure(title="Interactive Scatter Plot", x_axis_label='X', y_axis_label='Y',
           width=600, height=400)

# Add hover tool
hover = HoverTool(tooltips=[("index", "$index"), ("(x,y)", "($x, $y)")])
p.add_tools(hover)

# Generate data
x = np.random.randn(100)
y = np.random.randn(100)

# Add scatter plot (scatter() replaces circle(), which recent Bokeh releases deprecate for sized markers)
p.scatter(x, y, size=10, alpha=0.6, color='blue')

# Show plot
show(p)
Plotly for Interactive Dashboards
Reference:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# Assumption: df is Plotly's bundled tips dataset, which has the columns used below
df = px.data.tips()

# Create interactive scatter plot
fig = px.scatter(df, x='total_bill', y='tip', color='time', size='size',
                 hover_data=['day', 'smoker'], title='Interactive Tips Analysis')

# Add trend line from a least-squares linear fit
slope, intercept = np.polyfit(df['total_bill'], df['tip'], 1)
x_line = np.array([df['total_bill'].min(), df['total_bill'].max()])
fig.add_trace(go.Scatter(x=x_line, y=slope * x_line + intercept, mode='lines',
                         name='Trend', line=dict(dash='dash')))

# Show plot
fig.show()
Advanced seaborn Features
Statistical Plotting
Reference:
import seaborn as sns

# Statistical plots
sns.regplot(data=df, x='x', y='y')           # Regression plot
sns.residplot(data=df, x='x', y='y')         # Residual plot
sns.histplot(data=df, x='column', kde=True)  # Distribution plot (replaces the deprecated distplot)
sns.kdeplot(data=df, x='x', y='y')           # 2D density plot

# Advanced statistical plots
sns.pairplot(df, hue='category', diag_kind='kde')
sns.jointplot(data=df, x='x', y='y', kind='hex')
sns.clustermap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
Custom Themes and Styles
Reference:
# Set custom theme
sns.set_theme(style="whitegrid", palette="husl", font_scale=1.2, rc={"figure.figsize": (10, 8)})

# Or create custom style
custom_style = {
    'axes.spines.left': True,
    'axes.spines.bottom': True,
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.grid': True,
    'grid.alpha': 0.3
}

# set_style() only applies rc keys that belong to seaborn's style definition,
# so grid.alpha would be ignored there; set_theme() applies every rc entry
sns.set_theme(style="white", rc=custom_style)
Animation and Dynamic Plots
matplotlib Animation
Reference:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Create animated plot
fig, ax = plt.subplots()
line, = ax.plot([], [], 'b-', linewidth=2)
ax.set_xlim(0, 10)
ax.set_ylim(-1, 1)

def animate(frame):
    x = np.linspace(0, 10, 100)
    y = np.sin(x + frame * 0.1)
    line.set_data(x, y)
    return line,

# Create animation
anim = FuncAnimation(fig, animate, frames=100, interval=50, blit=True)

# Save as GIF (the 'pillow' writer requires the Pillow package)
anim.save('sine_wave.gif', writer='pillow', fps=20)
Real-time Data Visualization
Reference:
import time
import random

# Real-time plotting
fig, ax = plt.subplots()
x_data, y_data = [], []

def update_plot():
    # Add new data point
    x_data.append(time.time())
    y_data.append(random.random())

    # Keep only last 100 points
    if len(x_data) > 100:
        x_data.pop(0)
        y_data.pop(0)

    # Update plot
    ax.clear()
    ax.plot(x_data, y_data)
    ax.set_title('Real-time Data')
    plt.pause(0.1)

# Run for 10 seconds
start_time = time.time()
while time.time() - start_time < 10:
    update_plot()
Advanced Color Theory
Colorblind-Friendly Palettes
Reference:
import seaborn as sns

# Colorblind-friendly palettes
colorblind_palettes = {
    # seaborn's built-in colorblind palette (the default matplotlib colors are not designed for this)
    'colorblind': sns.color_palette('colorblind', 4).as_hex(),
    'viridis': ['#440154', '#31688e', '#35b779', '#fde725'],
    'plasma': ['#0d0887', '#7e03a8', '#cc4778', '#f0f921']
}

# Use in plots
sns.set_palette(colorblind_palettes['viridis'])
Color Psychology in Data Visualization
Reference:
# Emotional color associations
emotional_colors = {
    'trust': '#1f77b4',   # Blue
    'energy': '#ff7f0e',  # Orange
    'growth': '#2ca02c',  # Green
    'danger': '#d62728',  # Red
    'luxury': '#9467bd',  # Purple
    'warmth': '#bcbd22'   # Olive (yellow-green)
}

# Use contextually
def choose_color_for_data(data_type, value):
    if data_type == 'sales' and value > 1000:
        return emotional_colors['growth']
    elif data_type == 'errors' and value > 10:
        return emotional_colors['danger']
    else:
        return emotional_colors['trust']
Performance Optimization
Large Dataset Visualization
Reference:
# For large datasets, use sampling
def plot_large_dataset(df, sample_size=10000):
    if len(df) > sample_size:
        df_sample = df.sample(sample_size)
        print(f"Sampled {sample_size} points from {len(df)} total")
    else:
        df_sample = df

    # Use efficient plot types
    plt.scatter(df_sample['x'], df_sample['y'], alpha=0.1, s=1)
    plt.show()

# Or use hexbin for density
plt.hexbin(df['x'], df['y'], gridsize=50, cmap='Blues')
plt.colorbar()
Memory-Efficient Plotting
Reference:
# Clear memory between plots
import gc

def memory_efficient_plotting():
    # Create plot
    fig, ax = plt.subplots()
    ax.plot(data)
    plt.show()

    # Clean up
    plt.close(fig)
    gc.collect()
Export and Sharing
Multiple Format Export
Reference:
# Export to multiple formats
def export_plot(fig, filename_base):
    # High-res PNG
    fig.savefig(f'{filename_base}.png', dpi=300, bbox_inches='tight')

    # Vector formats
    fig.savefig(f'{filename_base}.svg', bbox_inches='tight')
    fig.savefig(f'{filename_base}.pdf', bbox_inches='tight')

    # Web formats
    fig.savefig(f'{filename_base}.jpg', dpi=150, bbox_inches='tight')
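A variant of the export helper above can loop over a list of formats and return the written paths, which makes the result easy to check. This is only a sketch: the `formats` parameter, the temporary directory, and the `Agg` backend are assumptions added for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def export_plot(fig, filename_base, formats=('png', 'svg', 'pdf')):
    """Save fig once per format and return the list of written paths.

    The formats parameter is a hypothetical addition, not part of the
    original helper.
    """
    paths = []
    for fmt in formats:
        path = f'{filename_base}.{fmt}'
        # dpi affects raster output such as PNG; vector formats stay resolution independent
        fig.savefig(path, dpi=300, bbox_inches='tight')
        paths.append(path)
    return paths


fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

out_dir = tempfile.mkdtemp()  # scratch directory for the demo files
written = export_plot(fig, os.path.join(out_dir, 'demo'))
plt.close(fig)
```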
Interactive HTML Export
Reference:
# Export interactive plots to HTML
import plotly.express as px

# Create plotly figure
fig = px.scatter(df, x='x', y='y')

# Export to a standalone HTML file (not opened automatically)
fig.write_html('interactive_plot.html')
Advanced Statistical Visualization
Confidence Intervals
Reference:
# Add confidence intervals to plots
def plot_with_confidence(x, y, ax):
    # Calculate confidence interval
    mean_y = np.mean(y)
    std_y = np.std(y, ddof=1)  # sample standard deviation
    n = len(y)
    se = std_y / np.sqrt(n)
    ci = 1.96 * se  # 95% confidence interval (normal approximation)

    # Plot mean line
    ax.axhline(mean_y, color='red', linestyle='-', linewidth=2)

    # Plot confidence interval
    ax.axhspan(mean_y - ci, mean_y + ci, alpha=0.3, color='red')

    # Add labels
    ax.text(0.02, 0.98, f'Mean: {mean_y:.2f} ± {ci:.2f}',
            transform=ax.transAxes, verticalalignment='top')
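The interval calculation above can be checked by hand on a tiny sample. The sketch below assumes five hypothetical values and the normal-approximation multiplier 1.96.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

mean_y = y.mean()                      # 3.0
se = y.std(ddof=1) / np.sqrt(len(y))   # sample std / sqrt(n) = sqrt(2.5) / sqrt(5)
ci = 1.96 * se                         # half-width of the approximate 95% interval

fig, ax = plt.subplots()
ax.plot(y, 'o')
ax.axhline(mean_y, color='red', linewidth=2)
band = ax.axhspan(mean_y - ci, mean_y + ci, alpha=0.3, color='red')
```

For a sample this small, a t-based multiplier such as `scipy.stats.t.ppf(0.975, n - 1)` gives a wider and usually more appropriate interval than 1.96.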
Statistical Annotations
Reference:
# Add statistical annotations
from scipy import stats

def add_statistical_annotations(ax, x, y):
    # Calculate correlation
    r, p_value = stats.pearsonr(x, y)

    # Add text annotation
    ax.text(0.05, 0.95, f'r = {r:.3f}\np = {p_value:.3f}',
            transform=ax.transAxes, verticalalignment='top',
            bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
Custom Plot Types
Waterfall Charts
Reference:
def create_waterfall_chart(data, labels):
    """Create waterfall chart for showing cumulative changes"""
    fig, ax = plt.subplots(figsize=(10, 6))

    # Calculate cumulative values, starting at 0 (works for lists and arrays)
    cumulative = np.concatenate(([0], np.cumsum(data)))

    # Create bars: each bar starts where the previous one ended
    for i, (label, value) in enumerate(zip(labels, data)):
        color = 'green' if value >= 0 else 'red'
        ax.bar(i, value, bottom=cumulative[i], color=color, alpha=0.7)
        ax.text(i, cumulative[i] + value/2, f'{value:.1f}', ha='center', va='center')

    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=45)
    ax.set_title('Waterfall Chart')
    ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()
Sankey Diagrams
Reference:
# Sankey diagram for flow visualization
def create_sankey_diagram():
    import plotly.graph_objects as go

    # Define flows
    source = [0, 1, 0, 2, 3, 3]
    target = [2, 3, 3, 4, 4, 5]
    value = [8, 4, 2, 8, 4, 2]

    fig = go.Figure(data=[go.Sankey(
        node=dict(
            pad=15,
            thickness=20,
            line=dict(color="black", width=0.5),
            label=["A", "B", "C", "D", "E", "F"]
        ),
        link=dict(
            source=source,
            target=target,
            value=value
        )
    )])

    fig.update_layout(title_text="Sankey Diagram", font_size=10)
    fig.show()
Visualization Testing and Validation
Automated Plot Testing
Reference:
import numpy as np

# Test plot properties
def test_plot_properties(fig, expected_properties):
    """Test that plot has expected properties"""
    ax = fig.axes[0]

    # Test title
    if 'title' in expected_properties:
        assert ax.get_title() == expected_properties['title']

    # Test axis labels
    if 'xlabel' in expected_properties:
        assert ax.get_xlabel() == expected_properties['xlabel']

    # Test data range (tolerant comparison, since axis limits are floats)
    if 'xlim' in expected_properties:
        assert np.allclose(ax.get_xlim(), expected_properties['xlim'])
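One way to use checks like the ones above is to collect mismatches instead of stopping at the first failed assertion. The following sketch takes that approach. `make_figure` and `check_figure` are hypothetical names introduced here, and the `Agg` backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def make_figure():
    """Hypothetical plotting function under test."""
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title('Growth')
    ax.set_xlabel('Time')
    ax.set_xlim(0, 2)
    return fig


def check_figure(fig, expected):
    """Return the names of properties that do not match (empty list means all pass)."""
    ax = fig.axes[0]
    problems = []
    if 'title' in expected and ax.get_title() != expected['title']:
        problems.append('title')
    if 'xlabel' in expected and ax.get_xlabel() != expected['xlabel']:
        problems.append('xlabel')
    # Axis limits are floats, so compare with a tolerance
    if 'xlim' in expected and not np.allclose(ax.get_xlim(), expected['xlim']):
        problems.append('xlim')
    return problems


fig = make_figure()
ok = check_figure(fig, {'title': 'Growth', 'xlabel': 'Time', 'xlim': (0, 2)})
bad = check_figure(fig, {'title': 'Decline'})
plt.close(fig)
```

Returning a list of mismatches lets a test report every problem at once, while a plain `assert not check_figure(...)` still works inside a test runner.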
Plot Quality Metrics
Reference:
# Calculate plot quality metrics
def calculate_plot_quality(fig):
    """Calculate various quality metrics for a plot"""
    ax = fig.axes[0]

    metrics = {
        'has_title': bool(ax.get_title()),
        'has_xlabel': bool(ax.get_xlabel()),
        'has_ylabel': bool(ax.get_ylabel()),
        'has_legend': bool(ax.get_legend()),
        # ax.grid is a method and always truthy, so inspect the gridlines instead
        'has_grid': any(line.get_visible()
                        for line in [*ax.get_xgridlines(), *ax.get_ygridlines()]),
        'aspect_ratio': fig.get_figwidth() / fig.get_figheight()
    }

    return metrics

These advanced topics will help you create professional, publication-ready visualizations and handle complex visualization challenges in your data science work.