Architecture Overview

RING-5 follows a three-layer architecture (data, domain, presentation) with strict separation of concerns, an async-first design for I/O-bound work, and standard design patterns (factory, facade, strategy, singleton).

High-Level Architecture

 Layer C: Presentation (Streamlit)
 • UI Components • Pages • State Management
 ↓
 BackendFacade
 ↓
 Layer B: Domain (Business Logic)
 • Plotting • Transformations • Analysis
 • NO UI imports • Pure functions • Testable
 ↓
 Layer A: Data (Ingestion & Parsing)
 • File I/O • Perl parsers • Type mapping
 • Async workers • Pattern aggregation

Design Principles

1. Layered Architecture

Layer A (Data): File ingestion and parsing

Parse service and scanner service
Perl parser integration
Type mappers for gem5 variables
NO business logic

Layer B (Domain): Business logic and analysis

Statistical computations
Plot generation
Data transformations
NO UI dependencies

Layer C (Presentation): User interface

Streamlit components
State management
User interactions
Calls Layer B through BackendFacade
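To make the layer boundaries concrete, here is a minimal sketch. The names `compute_speedup` and `BackendFacade.speedup` are hypothetical, not confirmed parts of the RING-5 API; the point is only that Layer B holds a pure pandas function, the facade delegates to it, and Layer C talks to the facade alone.

```python
import pandas as pd


# Layer B (domain): a pure function with no UI imports. Hypothetical name.
def compute_speedup(data: pd.DataFrame, baseline_col: str, new_col: str) -> pd.DataFrame:
    """Return a new DataFrame with a 'speedup' column (baseline time / new time)."""
    return data.assign(speedup=data[baseline_col] / data[new_col])


# Facade: a simplified, hypothetical stand-in for the single entry point Layer C uses.
class BackendFacade:
    def speedup(self, data: pd.DataFrame, baseline_col: str, new_col: str) -> pd.DataFrame:
        # Delegate to Layer B; the facade itself holds no business logic.
        return compute_speedup(data, baseline_col, new_col)


# Layer C (e.g. a Streamlit page) would import and call only the facade.
facade = BackendFacade()
raw = pd.DataFrame({"benchmark": ["a", "b"], "base_time": [10.0, 8.0], "new_time": [5.0, 2.0]})
result = facade.speedup(raw, "base_time", "new_time")
```

Because the domain function never touches Streamlit, it can be unit-tested without starting the UI.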

2. Async-First Design

All I/O-bound operations use concurrent.futures:

# CORRECT: Async pattern (submit, collect results, finalize)
futures = service.submit_scan_async(path, pattern, limit=10)
results = [f.result() for f in futures]
data = service.finalize_scan(results)

# WRONG: Don't create sync wrappers around async APIs
def scan_sync(path, pattern):  # Anti-pattern: hides the submit/finalize steps
    futures = service.submit_scan_async(path, pattern)
    return [f.result() for f in futures]

Key Rules:

Always use submit_*_async() + finalize_*() pattern
Never block the UI thread
Use WorkPool for parallel execution
Handle timeouts gracefully
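The submit/finalize split can be illustrated with the standard library alone. `ScanService`, `_scan_file`, and the list-of-paths signature are assumptions for illustration (the real `submit_scan_async()` takes `path, pattern, limit`); the per-task timeout mirrors the "handle timeouts gracefully" rule.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Dict, List

logger = logging.getLogger(__name__)


class ScanService:
    """Hypothetical stand-in for a RING-5 service; real signatures differ."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def _scan_file(self, path: str) -> Dict[str, str]:
        # Placeholder for real I/O such as reading a gem5 stats file.
        return {"file": path, "status": "scanned"}

    def submit_scan_async(self, paths: List[str]) -> List[Future]:
        # Queue one task per file and return at once; nothing blocks here.
        return [self._pool.submit(self._scan_file, p) for p in paths]

    def finalize_scan(self, results: List[Dict[str, str]]) -> List[str]:
        # Merge the per-file results into one deterministic structure.
        return sorted(r["file"] for r in results)


service = ScanService()
futures = service.submit_scan_async(["b.txt", "a.txt"])
results = []
for future in futures:
    try:
        results.append(future.result(timeout=30))  # bounded wait per task
    except FutureTimeout:
        logger.warning("Scan task timed out; skipping it")
data = service.finalize_scan(results)
```

Keeping `finalize_scan()` separate means the merge step runs once, on whatever results actually arrived.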

3. Design Patterns

Factory Pattern (Plots and Shapers):

plot = PlotFactory.create_plot("bar", plot_id=1, name="My Plot")
shaper = ShaperFactory.create_shaper("normalize", config)
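One common way to implement such a factory is a registry keyed by type name. This is a sketch, not RING-5's actual `PlotFactory`: the `register` decorator, the constructor arguments, and the plain dict returned by `create_figure()` (standing in for a plotly `go.Figure` to keep the example dependency-free) are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Type


class BasePlot(ABC):
    """Hypothetical base class; RING-5's real BasePlot may differ."""

    def __init__(self, plot_id: int, name: str) -> None:
        self.plot_id = plot_id
        self.name = name

    @abstractmethod
    def create_figure(self, data: Dict[str, list]) -> Dict[str, Any]:
        """Build a figure description (a plain dict here instead of go.Figure)."""


class PlotFactory:
    _registry: Dict[str, Type[BasePlot]] = {}

    @classmethod
    def register(cls, plot_type: str) -> Callable[[Type[BasePlot]], Type[BasePlot]]:
        # Decorator that maps a type name such as "bar" to a concrete class.
        def decorator(plot_cls: Type[BasePlot]) -> Type[BasePlot]:
            cls._registry[plot_type] = plot_cls
            return plot_cls
        return decorator

    @classmethod
    def create_plot(cls, plot_type: str, plot_id: int, name: str) -> BasePlot:
        # Look up the registered class; unknown types fail fast.
        try:
            plot_cls = cls._registry[plot_type]
        except KeyError:
            raise ValueError(f"Unknown plot type: {plot_type!r}") from None
        return plot_cls(plot_id=plot_id, name=name)


@PlotFactory.register("bar")
class BarPlot(BasePlot):
    def create_figure(self, data: Dict[str, list]) -> Dict[str, Any]:
        return {"type": "bar", "title": self.name, "x": data["x"], "y": data["y"]}


plot = PlotFactory.create_plot("bar", plot_id=1, name="My Plot")
figure = plot.create_figure({"x": ["a", "b"], "y": [1, 2]})
```

With a registry, adding a plot type means writing one decorated class; the factory itself does not change.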

Facade Pattern (Backend Access):

facade = BackendFacade() # Single entry point
data = facade.load_csv_file(path)
plot = facade.create_plot("bar", config)

Strategy Pattern (Parsing):

# Different strategies for different variable types
scalar_parser = get_parser("scalar")
vector_parser = get_parser("vector")
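RING-5's real parsers are Perl scripts; the Python sketch below only illustrates the strategy dispatch. The `StatValue` tuple, the body of `get_parser()`, and the whitespace-split parsing of gem5 `stats.txt` lines (`name value # description`, with vector entries written `name::entry`) are simplifying assumptions.

```python
from typing import Callable, Dict, NamedTuple, Optional


class StatValue(NamedTuple):
    name: str             # variable name, e.g. "system.cpu.numCycles"
    entry: Optional[str]  # vector entry such as "IntAlu"; None for scalars
    value: float


def parse_scalar(line: str) -> Optional[StatValue]:
    # "system.cpu.numCycles  1000  # ..." -> StatValue("system.cpu.numCycles", None, 1000.0)
    parts = line.split()
    if len(parts) < 2 or "::" in parts[0]:
        return None
    return StatValue(parts[0], None, float(parts[1]))


def parse_vector(line: str) -> Optional[StatValue]:
    # "system.cpu.op_class_0::IntAlu  40  80.00% ..." -> (name, "IntAlu", 40.0)
    parts = line.split()
    if len(parts) < 2 or "::" not in parts[0]:
        return None
    name, entry = parts[0].split("::", 1)
    return StatValue(name, entry, float(parts[1]))


_STRATEGIES: Dict[str, Callable[[str], Optional[StatValue]]] = {
    "scalar": parse_scalar,
    "vector": parse_vector,
}


def get_parser(var_type: str) -> Callable[[str], Optional[StatValue]]:
    # Pick the strategy for a variable type; unknown types fail fast.
    if var_type not in _STRATEGIES:
        raise ValueError(f"No parser for variable type {var_type!r}")
    return _STRATEGIES[var_type]


scalar_parser = get_parser("scalar")
vector_parser = get_parser("vector")
```

Both strategies share one signature, so callers can switch on the type reported by the scanner without special-casing.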

Singleton (Configuration and Pools):

WorkPool.initialize(max_workers=8)
ConfigManager.load_config(path)
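A thread-safe singleton for the pool can be sketched with a class-level executor and a lock. This `WorkPool` is hypothetical; in particular, the rule that later `initialize()` calls are ignored is an assumption, not documented RING-5 behavior.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional


class WorkPool:
    """Hypothetical process-wide pool; only the first initialize() call takes effect."""

    _executor: Optional[ThreadPoolExecutor] = None
    _lock = threading.Lock()

    @classmethod
    def initialize(cls, max_workers: int = 8) -> ThreadPoolExecutor:
        with cls._lock:  # guard against two threads creating pools at once
            if cls._executor is None:
                cls._executor = ThreadPoolExecutor(max_workers=max_workers)
            return cls._executor

    @classmethod
    def submit(cls, fn: Callable[..., Any], *args: Any) -> Future:
        # Lazily create the pool with defaults if initialize() was never called.
        return cls.initialize().submit(fn, *args)


first = WorkPool.initialize(max_workers=8)
second = WorkPool.initialize(max_workers=2)  # ignored: the pool already exists
future = WorkPool.submit(pow, 2, 10)
```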

4. Type Safety

Strict typing everywhere:

from pathlib import Path
from typing import Any, Dict

import pandas as pd


def process_data(
    input_file: Path,
    config: Dict[str, Any],
    timeout: int = 30,
) -> pd.DataFrame:
    """Process gem5 data from file."""
    result: pd.DataFrame = pd.read_csv(input_file)
    return result

Type checking:

mypy in strict mode
No implicit Any
All function signatures typed
TypedDict for structured data
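For example, a structured configuration can be described with a `TypedDict` so that mypy checks key names and value types. `PlotConfig` and its fields are hypothetical; note that `TypedDict` is a static-typing aid only and is not validated at runtime.

```python
from typing import List, TypedDict


class PlotConfig(TypedDict):
    """Hypothetical shape of a stored plot configuration."""

    plot_type: str
    title: str
    x_column: str
    y_columns: List[str]


def describe(config: PlotConfig) -> str:
    # mypy flags misspelled keys or wrong value types at each call site.
    return f"{config['plot_type']}: {config['title']} ({len(config['y_columns'])} series)"


config: PlotConfig = {
    "plot_type": "bar",
    "title": "IPC",
    "x_column": "benchmark",
    "y_columns": ["ipc"],
}
```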

5. Immutability

DataFrames are never modified in-place:

# CORRECT: Return new DataFrame
result = data.drop(columns=['x'])
filtered = result[result['value'] > 0]

# WRONG: In-place modification
data.drop(columns=['x'], inplace=True)  # Mutates data in place

Project Structure

RING-5/
    src/
        core/                            # Shared utilities
            benchmark.py                 # Benchmark handling
            performance.py               # Performance metrics
        parsers/                         # Layer A: Data ingestion
            parser.py                    # Main parser
            scanner.py                   # Variable scanner
            parse_service.py             # Async orchestration
            scanner_service.py           # Async scanning
            type_mapper.py               # Type detection
            pattern_aggregator.py        # Pattern consolidation
            perl/                        # Perl parser scripts
            workers/                     # Async task workers
        plotting/                        # Layer B/C: Visualization
            base_plot.py                 # Abstract base
            plot_factory.py              # Factory
            plot_renderer.py             # Rendering
            export.py                    # Export utilities
            types/                       # Concrete plots
                bar_plot.py
                line_plot.py
                scatter_plot.py
                grouped_bar_plot.py
                grouped_stacked_bar_plot.py
                histogram_plot.py
        config/                          # Configuration management
            config_manager.py
            schemas/                     # JSON schemas
        web/                             # Layer C: UI
            facade.py                    # Backend facade (MAIN API)
            state_manager.py             # Session state
            services/                    # UI services
                csv_pool.py
                variable_service.py
                shapers/                 # Data transformations
            repositories/                # Data access
            ui/                          # Streamlit components
                components/              # Reusable widgets
                pages/                   # App pages
    tests/
        unit/                            # Unit tests
        integration/                     # Integration tests
        e2e/                             # End-to-end tests
        data/                            # Test fixtures
    .agent/                              # AI agent configuration
        rules/                           # Project rules
        workflows/                       # Development workflows
        skills/                          # Reusable knowledge
    docs/                                # Documentation (you are here!)

Data Flow

Parsing Workflow

1. User selects stats directory
 ↓
2. Scanner discovers variables (async)
 • Scans multiple files in parallel
 • Detects variable types
 • Aggregates patterns (cpu0, cpu1 → cpu\d+)
 ↓
3. User selects variables to parse
 ↓
4. Parser extracts data (async)
 • Calls appropriate Perl parser per type
 • Processes files in parallel
 • Consolidates into CSVs
 ↓
5. Data loaded into memory
 • CSV pool management
 • Efficient caching
 ↓
6. Ready for analysis and visualization
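Step 2's pattern aggregation (cpu0, cpu1 → cpu\d+) can be sketched as follows. `aggregate_patterns` is a hypothetical helper, not the code in `pattern_aggregator.py`, and it uses a naive rule: every run of digits is treated as an index, so names such as `l2` and `l3` would also be merged.

```python
import re
from typing import Dict, List


def aggregate_patterns(names: List[str]) -> Dict[str, List[str]]:
    """Group variable names that differ only in their digit runs (hypothetical helper)."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        # Split on each run of digits, escape the literal text, rejoin with \d+.
        literal_parts = re.split(r"\d+", name)
        pattern = r"\d+".join(re.escape(part) for part in literal_parts)
        groups.setdefault(pattern, []).append(name)
    return groups


groups = aggregate_patterns(
    ["system.cpu0.ipc", "system.cpu1.ipc", "system.mem_ctrl.readReqs"]
)
```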

Transformation Pipeline

Raw Data
 ↓
ColumnSelector: Keep relevant columns
 ↓
Filter: Remove unwanted rows
 ↓
Normalize: Divide by baseline
 ↓
Aggregate: Group and compute means
 ↓
Rename: Clean column names
 ↓
Sort: Order rows
 ↓
Transformed Data → Ready for plotting
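The pipeline above can be sketched as a list of objects that each expose a `transform()` method returning a new DataFrame. The class names mirror the step names, but their constructors and the `apply_shapers()` helper are illustrative assumptions rather than RING-5's `ShaperFactory` implementations.

```python
from typing import Callable, Dict, List, Protocol

import pandas as pd


class Shaper(Protocol):
    def transform(self, data: pd.DataFrame) -> pd.DataFrame: ...


class ColumnSelector:
    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return data[self.columns].copy()  # keep only the relevant columns


class Filter:
    def __init__(self, predicate: Callable[[pd.DataFrame], pd.Series]) -> None:
        self.predicate = predicate

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return data[self.predicate(data)]  # drop rows where the predicate is False


class Normalize:
    def __init__(self, column: str, baseline: float) -> None:
        self.column = column
        self.baseline = baseline

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.assign(**{self.column: data[self.column] / self.baseline})


class Aggregate:
    def __init__(self, by: str) -> None:
        self.by = by

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.groupby(self.by, as_index=False).mean(numeric_only=True)


class Rename:
    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = mapping

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.rename(columns=self.mapping)


class Sort:
    def __init__(self, by: str) -> None:
        self.by = by

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.sort_values(self.by).reset_index(drop=True)


def apply_shapers(data: pd.DataFrame, shapers: List[Shaper]) -> pd.DataFrame:
    # Each step returns a new DataFrame, so the caller's `data` is never mutated.
    for shaper in shapers:
        data = shaper.transform(data)
    return data


raw = pd.DataFrame({"benchmark": ["b", "a", "a", "b"], "config": ["x", "x", "y", "y"],
                    "ipc": [4.0, 1.0, 3.0, 0.0]})
shaped = apply_shapers(raw, [
    ColumnSelector(["benchmark", "ipc"]),
    Filter(lambda df: df["ipc"] > 0),
    Normalize("ipc", baseline=2.0),
    Aggregate(by="benchmark"),
    Rename({"ipc": "normalized_ipc"}),
    Sort(by="benchmark"),
])
```

Because every step shares the same `transform()` signature, steps can be reordered, dropped, or extended with custom shapers without touching the loop.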

Plotting Workflow

Transformed Data + Plot Config
 ↓
PlotFactory.create_plot(type, id, name)
 ↓
Concrete Plot Class (BarPlot, LinePlot, etc.)
 ↓
create_figure(data, config) → go.Figure
 ↓
PlotRenderer.render(figure)
 ↓
Display in UI or Export

Key Components

BackendFacade

Single entry point to all backend functionality:

class BackendFacade:
 # Scanning
 def submit_scan_async(...)
 def finalize_scan(...)

 # Parsing
 def submit_parse_async(...)
 def finalize_parsing(...)

 # Data Access
 def load_csv_file(...)
 def apply_shapers(...)

 # Plotting
 def create_plot(...)
 def render_plot(...)

StateManager

Manages Streamlit session state:

Scanned variables
Selected variables
Loaded data
Plot configurations
Portfolio settings

WorkPool

Manages concurrent execution:

Fixed thread pool
Task submission
Result collection
Error handling

ShaperFactory

Creates data transformers:

Column selector
Filter
Normalize
Aggregate
Rename
Sort
Custom shapers

Testing Strategy

Unit Tests

Pure functions tested in isolation
Mock external dependencies
Fast execution (<1s per test)

Integration Tests

Multi-component workflows
Real data parsing
Database interactions

End-to-End Tests

Full user workflows
UI interactions (planned)
Browser automation (planned)

Coverage: 77% (target: 85%)

Performance Considerations

Async Parsing

Parallel file processing
Non-blocking I/O
Progress reporting

Memory Management

CSV pooling
Lazy loading
Garbage collection hints

Caching

Scanned variable cache
Compiled regex patterns
Plot layout templates
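Caching compiled regex patterns can be as simple as `functools.lru_cache`. The helper names and cache size below are assumptions. Python's `re` module already caches recently compiled patterns internally, so an explicit cache mainly makes the reuse deliberate and independent of that internal limit.

```python
import re
from functools import lru_cache
from typing import List


@lru_cache(maxsize=256)
def compiled_pattern(pattern: str) -> "re.Pattern[str]":
    # Compile each distinct pattern once; repeated calls return the cached object.
    return re.compile(pattern)


def matching_variables(pattern: str, names: List[str]) -> List[str]:
    # Return the names that fully match the (cached) compiled pattern.
    regex = compiled_pattern(pattern)
    return [name for name in names if regex.fullmatch(name)]


cpus = matching_variables(
    r"system\.cpu\d+\.ipc", ["system.cpu0.ipc", "system.cpu1.ipc", "system.mem.bw"]
)
```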

Error Handling

Fail Fast

if not stats_path.exists():
 raise FileNotFoundError(f"Path not found: {stats_path}")

User-Friendly Messages

try:
 data = parse_file(path)
except ParseError as e:
 st.error(f"Failed to parse {path.name}: {e}")
 logger.error(f"Parse error: {e}", exc_info=True)

Graceful Degradation

# Continue with other files if one fails
results = []
for future in futures:
    try:
        result = future.result(timeout=30)
        results.append(result)
    except Exception as e:
        logger.warning(f"Task failed: {e}")
        # Continue processing other files

Extension Points

Adding New Plot Types

Create class inheriting BasePlot
Implement create_figure()
Register in PlotFactory
Add UI configuration

See Adding Plot Types

Adding New Shapers

Create class with transform() method
Register in ShaperFactory
Add UI controls

See Adding Shapers

Adding New Variable Types

Create Perl parser script
Add to TypeMapper
Update scanner logic
Add tests

See API Reference for details

Best Practices

DO

Follow layered architecture
Use async patterns
Write tests first (TDD)
Type all functions
Return new DataFrames
Handle errors gracefully
Document public APIs

DON’T

Mix UI and business logic
Create sync wrappers for async APIs
Modify DataFrames in-place
Use bare except clauses
Forget type hints
Skip tests
Leave TODOs in production code

Next: Development Setup to start building with this architecture.