Histogram Plot
Overview
The Histogram Plot visualizes distribution data from gem5 simulator histogram variables. It supports single or multiple histograms grouped by categorical variables, with configurable bucket sizes and normalization modes.
Features
- ✅ Single Histogram - Display distribution for one variable
- ✅ Grouped Histograms - Multiple histograms by categorical variable
- ✅ Configurable Bucket Size - Rebin data to different granularities
- ✅ Normalization Modes - Count, probability, percent, or density
- ✅ Cumulative Distribution - Show CDF instead of PDF
- ✅ Publication Quality - Plotly-based with full styling support
Usage
Basic Example
from src.plotting.plot_factory import PlotFactory
# Create histogram plot
plot = PlotFactory.create_plot("histogram", plot_id=1, name="Latency Distribution")
# Configure
config = {
"histogram_variable": "latency",
"title": "Request Latency Distribution",
"xlabel": "Latency (cycles)",
"ylabel": "Count",
"bucket_size": 100,
"normalization": "count",
"group_by": None,
"cumulative": False,
}
# Generate figure (data is a pandas DataFrame laid out as described under Data Format)
fig = plot.create_figure(data, config)
Grouped Histograms
config = {
"histogram_variable": "latency",
"title": "Latency by Benchmark",
"xlabel": "Latency (cycles)",
"ylabel": "Count",
"bucket_size": 100,
"normalization": "count",
"group_by": "benchmark", # Group by categorical variable
"cumulative": False,
}
fig = plot.create_figure(data, config)
Normalized Distribution
config = {
"histogram_variable": "latency",
"title": "Latency Probability Distribution",
"xlabel": "Latency (cycles)",
"ylabel": "Probability",
"bucket_size": 100,
"normalization": "probability", # Normalize to [0, 1]
"group_by": None,
"cumulative": False,
}
fig = plot.create_figure(data, config)
Cumulative Distribution Function (CDF)
config = {
"histogram_variable": "latency",
"title": "Cumulative Latency Distribution",
"xlabel": "Latency (cycles)",
"ylabel": "CDF",
"bucket_size": 100,
"normalization": "probability",
"group_by": None,
"cumulative": True, # Show CDF
}
fig = plot.create_figure(data, config)
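As a rough illustration of what the cumulative option produces, the sketch below turns bucket counts into an empirical CDF with a running sum. It assumes that cumulative mode is equivalent to accumulating the normalized bucket heights, which this page does not state explicitly. The counts are the benchmark A values from the example DataFrame under Data Format.

```python
from itertools import accumulate

# Counts for latency..0-100 through latency..300-400 (benchmark A).
counts = [5, 10, 15, 8]

# Probability per bucket, then a running sum to get the CDF.
total = sum(counts)
probability = [c / total for c in counts]
cdf = list(accumulate(probability))

# The CDF never decreases and ends at 1.0 (up to rounding).
print([round(value, 4) for value in cdf])
```

With "normalization": "probability" the last cumulative value is 1.0; with "count" it would instead be the total number of samples.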
Configuration Options
| Parameter | Type | Description | Options |
|---|---|---|---|
| histogram_variable | str | Base variable name (the part of the column name before "..") | Any histogram variable |
| title | str | Plot title | Any string |
| xlabel | str | X-axis label | Any string |
| ylabel | str | Y-axis label | Any string |
| bucket_size | int | Width of each histogram bucket after rebinning | Positive integer |
| normalization | str | How to normalize bar heights | count, probability, percent, density |
| group_by | str or None | Categorical variable for grouping | Column name or None |
| cumulative | bool | Show cumulative distribution | True or False |
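The bucket_size option implies merging source buckets into wider ones. The sketch below shows one way this could work, assuming each source bucket is assigned to the target bucket that contains its lower edge. The `rebin` helper is hypothetical and is not taken from the plot's implementation.

```python
from collections import defaultdict


def rebin(buckets, bucket_size):
    """Merge (low, high, count) buckets into buckets of width bucket_size.

    Assumes every source bucket lies entirely inside one target bucket,
    which holds when bucket_size is a multiple of the source width and
    the bucket edges are aligned.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be a positive integer")
    merged = defaultdict(int)
    for low, _high, count in buckets:
        # Snap the lower edge down to the start of its target bucket.
        start = (low // bucket_size) * bucket_size
        merged[start] += count
    return [(start, start + bucket_size, merged[start]) for start in sorted(merged)]


source = [(0, 100, 5), (100, 200, 10), (200, 300, 15), (300, 400, 8)]
rebinned = rebin(source, 200)
print(rebinned)  # [(0, 200, 15), (200, 400, 23)]
```

A bucket_size that is not a multiple of the source width would require splitting counts across target buckets, which this sketch does not attempt.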
Data Format
The histogram plot expects data with columns in this format:
variable_name..bucket_range
Example columns:
latency..0-100
latency..100-200
latency..200-300
latency..300-400
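To make the naming convention concrete, here is a small sketch that splits such a column name into its variable name and bucket bounds. The `parse_bucket_column` helper and its regular expression are illustrative assumptions; they only handle non-negative integer ranges like the ones shown above.

```python
import re

# Matches "<variable>..<low>-<high>", for example "latency..100-200".
COLUMN_RE = re.compile(r"^(?P<var>.+?)\.\.(?P<low>\d+)-(?P<high>\d+)$")


def parse_bucket_column(column):
    """Return (variable, low, high), or None if the column is not a bucket."""
    match = COLUMN_RE.match(column)
    if match is None:
        return None
    return match["var"], int(match["low"]), int(match["high"])


columns = ["benchmark", "latency..0-100", "latency..100-200"]
parsed = [parse_bucket_column(c) for c in columns]
print(parsed)
# [None, ('latency', 0, 100), ('latency', 100, 200)]
```

Columns that do not follow the pattern, such as the grouping column, return None and can be skipped.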
Example DataFrame
import pandas as pd
data = pd.DataFrame({
"benchmark": ["A", "B"],
"latency..0-100": [5, 8],
"latency..100-200": [10, 12],
"latency..200-300": [15, 18],
"latency..300-400": [8, 10],
})
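For grouped histograms it can help to see the same data in long form, with one row per group and bucket. The sketch below reshapes the example DataFrame with pandas; it illustrates the layout only and is not necessarily how the plot processes its input.

```python
import pandas as pd

data = pd.DataFrame({
    "benchmark": ["A", "B"],
    "latency..0-100": [5, 8],
    "latency..100-200": [10, 12],
    "latency..200-300": [15, 18],
    "latency..300-400": [8, 10],
})

prefix = "latency.."
bucket_cols = [c for c in data.columns if c.startswith(prefix)]

# One row per (benchmark, bucket) pair.
long_df = data.melt(
    id_vars="benchmark",
    value_vars=bucket_cols,
    var_name="bucket",
    value_name="count",
)
# Keep only the bucket range, e.g. "100-200".
long_df["bucket"] = long_df["bucket"].str[len(prefix):]

# Total samples per benchmark, the denominator for normalization.
totals = long_df.groupby("benchmark")["count"].sum()
print({name: int(total) for name, total in totals.items()})  # {'A': 38, 'B': 48}
```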
Normalization Modes
Count (Default)
Raw counts from the data.
Probability
Normalized to sum to 1.0:
probability = count / total_count
Percent
Normalized to sum to 100:
percent = (count / total_count) * 100
Density
Normalized by total count and bin width, so that the area under the histogram is 1:
density = count / (total_count * bin_width)
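The four formulas can be checked with a short worked example. The `normalize` function below is a hypothetical helper written for this illustration, not the plot's implementation, and returning zeros for an empty histogram is an assumption made here.

```python
MODES = ("count", "probability", "percent", "density")


def normalize(counts, widths, mode="count"):
    """Apply one of the normalization formulas to a list of bucket counts."""
    if mode not in MODES:
        raise ValueError(f"unknown normalization mode: {mode!r}")
    if mode == "count":
        return list(counts)
    total = sum(counts)
    if total == 0:
        # Assumption: an empty histogram normalizes to all zeros.
        return [0.0] * len(counts)
    if mode == "probability":
        return [c / total for c in counts]
    if mode == "percent":
        return [100.0 * c / total for c in counts]
    # density: divide by the total count and by each bucket's width
    return [c / (total * w) for c, w in zip(counts, widths)]


counts = [5, 10, 15, 8]
widths = [100, 100, 100, 100]

probability = normalize(counts, widths, "probability")
percent = normalize(counts, widths, "percent")
density = normalize(counts, widths, "density")

# Density heights times bucket widths give the area under the histogram.
area = sum(d * w for d, w in zip(density, widths))
print(round(sum(probability), 6), round(sum(percent), 6), round(area, 6))
```

Density is the only mode whose values depend on the bucket width, so it is the one to use when comparing histograms with different bucket sizes.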
Integration with gem5
Histogram variables from gem5 are automatically detected and can be parsed:
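import pandas as pd
from src.plotting.plot_factory import PlotFactory
from src.web.facade import BackendFacade
# stats_dir, output_dir, and config are assumed to be defined by the caller
facade = BackendFacade()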
# Scan for histogram variables
scan_futures = facade.submit_scan_async(stats_dir, "stats.txt")
scan_results = [f.result() for f in scan_futures]
vars_found = facade.finalize_scan(scan_results)
# Find the first histogram variable (raises StopIteration if the scan found none)
hist_var = next(v for v in vars_found if v["type"] == "histogram")
# Parse
variables = [{"name": hist_var["name"], "type": "histogram"}]
parse_futures = facade.submit_parse_async(stats_dir, "stats.txt", variables, output_dir)
parse_results = [f.result() for f in parse_futures]
csv_path = facade.finalize_parsing(output_dir, parse_results)
# Load and plot
data = pd.read_csv(csv_path)
plot = PlotFactory.create_plot("histogram", plot_id=1, name="Distribution")
fig = plot.create_figure(data, config)
Testing
Run histogram plot tests:
# Unit tests
pytest tests/unit/test_histogram_plot.py -v
# Integration tests
pytest tests/integration/test_histogram_plot_integration.py -v
# All histogram tests
pytest -k "histogram" -v
Type Safety
The histogram plot implementation is fully type-annotated and is checked with mypy in strict mode:
mypy src/plotting/types/histogram_plot.py --strict
Architecture
The histogram plot follows the project’s layered architecture:
- Layer A (Data): Histogram variables parsed from gem5
- Layer B (Domain): HistogramPlot class with visualization logic
- Layer C (Presentation): Streamlit UI integration
The implementation uses the Factory pattern for plot creation and the Strategy pattern for the different normalization modes.
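The sketch below shows how these two patterns typically fit together. Every name in it (`NORMALIZERS`, `SketchHistogramPlot`, `PLOT_TYPES`, `create_plot`) is hypothetical and chosen for illustration; the project's actual classes and registry may be organized differently.

```python
from typing import Callable, Dict, List

# Strategy pattern: each normalization mode is an interchangeable callable.
Normalizer = Callable[[List[float]], List[float]]

NORMALIZERS: Dict[str, Normalizer] = {
    "count": lambda counts: list(counts),
    "probability": lambda counts: [c / sum(counts) for c in counts],
}


class SketchHistogramPlot:
    """Hypothetical stand-in for the real plot class."""

    def __init__(self, plot_id: int, name: str) -> None:
        self.plot_id = plot_id
        self.name = name

    def heights(self, counts: List[float], normalization: str) -> List[float]:
        # Look up the strategy by name and delegate to it.
        return NORMALIZERS[normalization](counts)


# Factory pattern: a registry maps a plot type string to a class.
PLOT_TYPES = {"histogram": SketchHistogramPlot}


def create_plot(plot_type: str, plot_id: int, name: str) -> SketchHistogramPlot:
    if plot_type not in PLOT_TYPES:
        raise ValueError(f"unknown plot type: {plot_type!r}")
    return PLOT_TYPES[plot_type](plot_id, name)


plot = create_plot("histogram", plot_id=1, name="Latency Distribution")
print(plot.heights([5, 15], "probability"))  # [0.25, 0.75]
```

Adding a new normalization mode then means registering one more callable, and adding a new plot type means registering one more class, without touching the callers.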