# Parsing gem5 Statistics
Complete guide to parsing gem5 simulator output with RING-5.
## Overview

RING-5 provides a powerful async-first parsing system for gem5 stats.txt files:

- **Parallel Processing** - parse multiple files simultaneously
- **Type Detection** - automatic variable type recognition
- **Pattern Aggregation** - consolidate repeated variables (cpu0, cpu1 → cpu\d+)
- **Progress Tracking** - real-time feedback during parsing
## Parsing Workflow

### 1. Scan for Variables

Discover all available statistics in your gem5 output:

```python
from src.web.facade import BackendFacade

facade = BackendFacade()

# Submit async scan
scan_futures = facade.submit_scan_async(
    stats_path="/path/to/gem5/output",
    stats_pattern="stats.txt",
    limit=10  # Number of files to scan (-1 for all)
)

# Wait for completion
scan_results = []
for future in scan_futures:
    try:
        result = future.result(timeout=30)
        if result:
            scan_results.append(result)
    except Exception as e:
        print(f"Scan failed: {e}")

# Aggregate discovered variables
variables = facade.finalize_scan(scan_results)
print(f"Found {len(variables)} variables")
```
**What Scanning Does:**

- Reads gem5 stats files
- Detects variable types (scalar, vector, histogram, etc.)
- Identifies vector entries
- Aggregates repeated patterns
- Returns a structured variable list (see the sketch below)
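As a quick check after scanning, you can tally the discovered variables by type. A minimal sketch, assuming each entry in `variables` is a dict with the `type` field shown in the Variable Structure below:

```python
from collections import Counter

# Tally discovered variables by type
# (assumes each entry is a dict with a "type" key)
type_counts = Counter(v["type"] for v in variables)
for var_type, count in type_counts.most_common():
    print(f"{var_type}: {count}")
```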
### 2. Select Variables

Choose which variables to parse:

```python
# Filter by type
scalar_vars = [v for v in variables if v["type"] == "scalar"]
vector_vars = [v for v in variables if v["type"] == "vector"]

# Select specific variables
selected = [
    {"name": "simTicks", "type": "scalar"},
    {"name": "system.cpu.ipc", "type": "scalar"},
    {
        "name": "system.cpu.op_class",
        "type": "vector",
        "vectorEntries": ["IntAlu", "IntMult", "FloatAdd"]
    }
]
```
**Variable Structure:**

```python
{
    "name": "system.cpu.ipc",
    "type": "scalar",  # or vector, histogram, distribution, configuration
    "entries": [],     # For vectors: list of entry names
    "min": 0.0,        # For histograms: minimum value
    "max": 100.0,      # For histograms: maximum value
}
```
### 3. Parse Data

Extract the actual values:

```python
import tempfile

output_dir = tempfile.mkdtemp()

# Submit async parse
parse_futures = facade.submit_parse_async(
    stats_path="/path/to/gem5/output",
    stats_pattern="stats.txt",
    variables=selected,
    output_dir=output_dir,
    scanned_vars=variables  # REQUIRED for pattern variables
)

# Wait for completion
parse_results = []
for future in parse_futures:
    try:
        result = future.result(timeout=60)
        if result:
            parse_results.append(result)
    except Exception as e:
        print(f"Parse failed: {e}")

# Finalize to a consolidated CSV
csv_path = facade.finalize_parsing(output_dir, parse_results)
print(f"Data saved to: {csv_path}")
```
### 4. Load Data

```python
import pandas as pd

# Load the parsed data
data = pd.read_csv(csv_path)
print(data.head())
```
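The column layout of the consolidated CSV depends on which variables you selected, so the specific column name below (`system.cpu.ipc`) is an illustrative assumption; inspect `data.columns` to see what your parse actually produced:

```python
# See which columns the parser produced
print(list(data.columns))

# Hypothetical example: summarize one selected scalar across all parsed files
if "system.cpu.ipc" in data.columns:
    print(f"Mean IPC: {data['system.cpu.ipc'].mean():.3f}")
```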
## gem5 Variable Types

### Scalar

Single numeric values:

```
simTicks          123456789   # Total simulation ticks
system.cpu.ipc    1.23        # Instructions per cycle
```

Parse as:

```python
{"name": "simTicks", "type": "scalar"}
```
### Vector

Arrays with named entries:

```
system.cpu.op_class::IntAlu      1234
system.cpu.op_class::IntMult      567
system.cpu.op_class::FloatAdd     890
```

Parse as:

```python
{
    "name": "system.cpu.op_class",
    "type": "vector",
    "vectorEntries": ["IntAlu", "IntMult", "FloatAdd"]
}
```
### Histogram

Distribution with buckets:

```
system.cpu.latency::histogram
    0-10:   45
    10-20: 123
    20-30:  67
```

Parse as:

```python
{
    "name": "system.cpu.latency",
    "type": "histogram",
    "min": 0,
    "max": 30
}
```
### Distribution

Statistical distribution with min/max:

```
system.mem.latency
    min:  10
    max:  500
    mean: 45.6
```

Parse as:

```python
{"name": "system.mem.latency", "type": "distribution"}
```
### Configuration

gem5 configuration values:

```
system.cpu.type    AtomicSimpleCPU
system.mem.size    2GB
```

Parse as:

```python
{"name": "system.cpu.type", "type": "configuration"}
```
## Pattern Aggregation

RING-5 automatically consolidates repeated variables:

### Before Aggregation

```
system.cpu0.numCycles
system.cpu1.numCycles
system.cpu2.numCycles
...
system.cpu15.numCycles
```

### After Aggregation

```
system.cpu\d+.numCycles  [vector]
    entries: ["0", "1", "2", ..., "15"]
```

**Benefit:** Reduces 12,000+ variables to ~700 manageable patterns.
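For intuition, here is a minimal sketch of the idea behind this consolidation (not RING-5's actual implementation): numeric segments in stat names are replaced with `\d+`, and names that collapse to the same pattern are grouped together:

```python
import re
from collections import defaultdict

def consolidate(names):
    """Group stat names that differ only in embedded integers.

    Illustration only; RING-5's real aggregation may differ.
    """
    groups = defaultdict(list)
    for name in names:
        pattern = re.sub(r"\d+", r"\\d+", name)  # system.cpu0... -> system.cpu\d+...
        groups[pattern].append(name)
    return groups

names = [f"system.cpu{i}.numCycles" for i in range(16)]
for pattern, members in consolidate(names).items():
    print(f"{pattern}  [{len(members)} variables]")
```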
### Using Pattern Variables

When parsing pattern variables, always pass `scanned_vars`:

```python
# Pattern variable with regex
pattern_var = {
    "name": r"system.cpu\d+.ipc",
    "type": "scalar"
}

# MUST pass scanned_vars to resolve patterns
parse_futures = facade.submit_parse_async(
    stats_path=path,
    stats_pattern="stats.txt",
    variables=[pattern_var],
    output_dir=output_dir,
    scanned_vars=variables  # Critical!
)
```
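If a pattern fails to resolve, one way to debug it is to match the pattern against the scanned names yourself with plain `re`. A sketch under the assumption that each scanned entry exposes its name via the `name` key:

```python
import re

pattern = r"system.cpu\d+.ipc"

# List the scanned variables the pattern would match
# (assumes each scanned entry is a dict with a "name" key)
matches = [v["name"] for v in variables if re.fullmatch(pattern, v["name"])]
print(f"{pattern!r} matches {len(matches)} variable(s)")
```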
## Advanced Parsing

### Limit Files

For large directories, parse a subset first:

```python
scan_futures = facade.submit_scan_async(
    stats_path=path,
    stats_pattern="stats.txt",
    limit=5  # Only scan 5 files
)
```
### Multiple Patterns

Parse different file patterns:

```python
# Scan all .txt files
scan_futures = facade.submit_scan_async(
    stats_path=path,
    stats_pattern="*.txt",
    limit=-1
)
```
### Error Handling

```python
# Import the futures TimeoutError explicitly: before Python 3.11 it is
# NOT the same class as the builtin TimeoutError
from concurrent.futures import TimeoutError

scan_results = []
failed = []

for future in scan_futures:
    try:
        result = future.result(timeout=30)
        if result:
            scan_results.append(result)
    except TimeoutError:
        failed.append("Timeout")
    except Exception as e:
        failed.append(str(e))

if failed:
    print(f"Failed scans: {len(failed)}")
    for err in failed:
        print(f"  - {err}")
```
## Performance Tips

- Limit scans for testing: `limit=5`
- Parallel processing is automatic
- Cache scanned variables in StateManager (see the sketch below)
- Batch parse multiple variables at once
- Monitor memory with large datasets
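The StateManager API is not covered on this page, so the following caching sketch uses a plain dict keyed by the scan parameters, purely as an illustration of the idea:

```python
# Plain-dict cache keyed by scan parameters; stands in for StateManager,
# whose API is not documented here.
_scan_cache = {}

def scan_once(facade, stats_path, stats_pattern="stats.txt", limit=-1):
    """Run a scan only if an identical one has not run already."""
    key = (stats_path, stats_pattern, limit)
    if key not in _scan_cache:
        futures = facade.submit_scan_async(
            stats_path=stats_path, stats_pattern=stats_pattern, limit=limit
        )
        results = [f.result(timeout=30) for f in futures]
        _scan_cache[key] = facade.finalize_scan([r for r in results if r])
    return _scan_cache[key]
```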
## Common Issues

### No Variables Found

- Verify that stats.txt files exist
- Check that the file pattern matches
- Ensure files are not empty
- Verify file permissions
### Parsing Timeouts

- Increase the timeout value (see below)
- Reduce the number of files
- Check for corrupted files
- Verify sufficient memory
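For example, raising the per-future timeout in the parse loop from step 3 gives slow files more headroom (300 seconds here is an arbitrary choice):

```python
# Allow up to five minutes per file instead of the 60 seconds used in step 3
for future in parse_futures:
    try:
        result = future.result(timeout=300)
        if result:
            parse_results.append(result)
    except TimeoutError:
        print("Parse still timed out; try reducing the number of files")
```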
### Pattern Variables Not Resolving

- Always pass the `scanned_vars` parameter
- Verify patterns match variable names
- Check that entries are correct
### Memory Issues

- Parse fewer files at once
- Clear data between analyses
- Use CSV pooling
- Increase system RAM
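If the consolidated CSV itself becomes too large to load in one piece, pandas can stream it in chunks; this is standard pandas behavior, not a RING-5 feature:

```python
import pandas as pd

# Stream the consolidated CSV in chunks instead of loading it whole
totals = None
for chunk in pd.read_csv(csv_path, chunksize=100_000):
    summed = chunk.sum(numeric_only=True)
    totals = summed if totals is None else totals.add(summed, fill_value=0)
print(totals)
```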
## UI Workflow

In the Streamlit interface:

1. **Data Sources** tab → Browse directory
2. Enter pattern: `stats.txt`
3. Click **Scan for Variables**
4. Filter/search variables
5. Select the desired variables
6. Click **Parse Statistics**
7. Monitor the progress bar
8. Data loads automatically
## Best Practices

**DO:**

- Scan before parsing
- Start with small limits for testing
- Cache scanned variables
- Handle timeouts gracefully
- Validate data after parsing (see the sketch below)
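A minimal validation sketch, assuming the parsed data was loaded into a pandas DataFrame as in step 4 (the IPC column name is a hypothetical example):

```python
# Basic post-parse sanity checks
assert not data.empty, "Parsed CSV contains no rows"

# Report columns with missing values
missing = data.isna().sum()
missing = missing[missing > 0]
if not missing.empty:
    print("Columns with missing values:")
    print(missing)

# Hypothetical range check on a selected scalar
if "system.cpu.ipc" in data.columns:
    assert (data["system.cpu.ipc"] > 0).all(), "Non-positive IPC values found"
```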
**DON’T:**

- Skip the scanning step
- Parse without selecting variables
- Forget `scanned_vars` for patterns
- Ignore error handling
- Parse entire large directories without testing
## Next Steps

- Data Transformations - process parsed data
- Creating Plots - visualize results
- Pattern Aggregation - a deep dive into patterns

Need help? See the Debugging Guide or open an issue.