Architecture Overview
RING-5 follows a clean layered architecture with strict separation of concerns, async-first design, and production-grade patterns.
High-Level Architecture
Layer C: Presentation (Streamlit)
    • UI Components • Pages • State Management
        ↓ BackendFacade
Layer B: Domain (Business Logic)
    • Plotting • Transformations • Analysis
    • NO UI imports • Pure functions • Testable
Layer A: Data (Ingestion & Parsing)
    • File I/O • Perl parsers • Type mapping
    • Async workers • Pattern aggregation
Design Principles
1. Layered Architecture
Layer A (Data): File ingestion and parsing
- Parse service and scanner service
- Perl parser integration
- Type mappers for gem5 variables
- NO business logic
Layer B (Domain): Business logic and analysis
- Statistical computations
- Plot generation
- Data transformations
- NO UI dependencies
Layer C (Presentation): User interface
- Streamlit components
- State management
- User interactions
- Calls Layer B through BackendFacade
2. Async-First Design
All I/O-bound operations use concurrent.futures:
# CORRECT: Async pattern
futures = service.submit_scan_async(path, pattern, limit=10)
results = [f.result() for f in futures]
data = service.finalize_scan(results)
# WRONG: Don't create sync wrappers
def scan_sync(path):  # Anti-pattern: hides blocking behind a sync call
    futures = service.submit_scan_async(path)
    return [f.result() for f in futures]
Key Rules:
- Always use the submit_*_async() + finalize_*() pattern
- Never block the UI thread
- Use WorkPool for parallel execution
- Handle timeouts gracefully
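A minimal sketch of the submit/finalize split is shown below. It uses only `concurrent.futures`. The `ScanService` class, the `_scan_file` helper, the assumption that the scanner looks for gem5 `stats.txt` files, and the reading of `limit` as "maximum number of files" are all hypothetical and are not taken from RING-5. The point is the shape of the API: submission returns futures immediately, and a separate finalize step merges their results.

```python
import re
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path
from typing import List, Set


class ScanService:
    """Hypothetical service showing the submit_*_async() + finalize_*() split."""

    def __init__(self, executor: ThreadPoolExecutor) -> None:
        self._executor = executor

    @staticmethod
    def _scan_file(path: Path, pattern: str) -> Set[str]:
        # Collect variable names (first token of each line) that match the pattern.
        regex = re.compile(pattern)
        names: Set[str] = set()
        for line in path.read_text().splitlines():
            parts = line.split()
            if parts and regex.search(parts[0]):
                names.add(parts[0])
        return names

    def submit_scan_async(
        self, root: Path, pattern: str, limit: int = 10
    ) -> List["Future[Set[str]]"]:
        # Queue one task per stats file and return at once; nothing waits here.
        files = sorted(root.rglob("stats.txt"))[:limit]
        return [self._executor.submit(self._scan_file, f, pattern) for f in files]

    @staticmethod
    def finalize_scan(results: List[Set[str]]) -> List[str]:
        # Merge the per-file name sets into one sorted list.
        return sorted(set().union(*results))
```

In a Streamlit page, the futures could also be consumed with `concurrent.futures.as_completed()` so that progress is reported as each file finishes.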
3. Design Patterns
Factory Pattern (Plots and Shapers):
plot = PlotFactory.create_plot("bar", plot_id=1, name="My Plot")
shaper = ShaperFactory.create_shaper("normalize", config)
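This section does not say how `PlotFactory` maps a type string to a class. One common approach is a registry dictionary filled by a decorator, sketched below. The class bodies and the `register` decorator are hypothetical. Only the `create_plot("bar", plot_id=..., name=...)` call shape comes from the example above.

```python
from typing import Callable, Dict, Type


class BasePlot:
    """Minimal stand-in for RING-5's plot base class (hypothetical body)."""

    def __init__(self, plot_id: int, name: str) -> None:
        self.plot_id = plot_id
        self.name = name


class PlotFactory:
    """Registry-based factory: maps a type string to a plot class."""

    _registry: Dict[str, Type[BasePlot]] = {}

    @classmethod
    def register(cls, plot_type: str) -> Callable[[Type[BasePlot]], Type[BasePlot]]:
        # Decorator that records a plot class under a type name.
        def decorator(plot_cls: Type[BasePlot]) -> Type[BasePlot]:
            cls._registry[plot_type] = plot_cls
            return plot_cls

        return decorator

    @classmethod
    def create_plot(cls, plot_type: str, plot_id: int, name: str) -> BasePlot:
        try:
            plot_cls = cls._registry[plot_type]
        except KeyError:
            raise ValueError(f"Unknown plot type: {plot_type!r}") from None
        return plot_cls(plot_id=plot_id, name=name)


@PlotFactory.register("bar")
class BarPlot(BasePlot):
    pass
```

With a registry, adding a plot type only touches the new class. The factory itself never needs editing.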
Facade Pattern (Backend Access):
facade = BackendFacade() # Single entry point
data = facade.load_csv_file(path)
plot = facade.create_plot("bar", config)
Strategy Pattern (Parsing):
# Different strategies for different variable types
scalar_parser = get_parser("scalar")
vector_parser = get_parser("vector")
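The sketch below illustrates only the strategy dispatch behind `get_parser()`. RING-5's real strategies call Perl scripts; here they are plain Python functions. The parsing rules are written against simplified gem5 `stats.txt` lines, where scalars look like `name value # comment` and vector entries look like `name::key value ...`. Treat both the rules and the function names as assumptions.

```python
from typing import Callable, Dict, List, Union

ParsedValue = Union[float, Dict[str, float]]
Parser = Callable[[List[str], str], ParsedValue]


def parse_scalar(lines: List[str], name: str) -> float:
    # Scalar line: "<name> <value> # comment"
    for line in lines:
        parts = line.split()
        if parts and parts[0] == name:
            return float(parts[1])
    raise KeyError(name)


def parse_vector(lines: List[str], name: str) -> Dict[str, float]:
    # Vector entries: "<name>::<key> <value> ..."
    prefix = name + "::"
    values: Dict[str, float] = {}
    for line in lines:
        parts = line.split()
        if parts and parts[0].startswith(prefix):
            values[parts[0][len(prefix):]] = float(parts[1])
    return values


_PARSERS: Dict[str, Parser] = {"scalar": parse_scalar, "vector": parse_vector}


def get_parser(var_type: str) -> Parser:
    # Select the parsing strategy for a variable type.
    return _PARSERS[var_type]
```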
Singleton (Configuration and Pools):
WorkPool.initialize(max_workers=8)
ConfigManager.load_config(path)
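Here is one way a class-level singleton such as `WorkPool` could be built. It is a sketch under assumptions: the `get()` and `shutdown()` methods and the lock are hypothetical, and only `initialize(max_workers=...)` appears in the text above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


class WorkPool:
    """Process-wide thread pool held in class attributes (hypothetical sketch)."""

    _executor: Optional[ThreadPoolExecutor] = None
    _lock = threading.Lock()

    @classmethod
    def initialize(cls, max_workers: int = 8) -> None:
        # Create the shared executor once; later calls are no-ops.
        with cls._lock:
            if cls._executor is None:
                cls._executor = ThreadPoolExecutor(max_workers=max_workers)

    @classmethod
    def get(cls) -> ThreadPoolExecutor:
        if cls._executor is None:
            raise RuntimeError("WorkPool.initialize() has not been called")
        return cls._executor

    @classmethod
    def shutdown(cls) -> None:
        # Wait for running tasks, then drop the executor.
        with cls._lock:
            if cls._executor is not None:
                cls._executor.shutdown(wait=True)
                cls._executor = None
```

Streamlit re-executes the page script on every interaction, but imported modules stay loaded. State kept on a class like this therefore survives reruns within the same server process.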
4. Type Safety
Strict typing everywhere:
from pathlib import Path
from typing import Any, Dict

import pandas as pd


def process_data(
    input_file: Path,
    config: Dict[str, Any],
    timeout: int = 30,
) -> pd.DataFrame:
    """Process gem5 data from file."""
    result: pd.DataFrame = pd.read_csv(input_file)
    return result
Type checking:
- mypy in strict mode
- No implicit Any
- All function signatures typed
- TypedDict for structured data
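Below is a small illustration of the TypedDict rule. The `ScannedVariable` shape is a hypothetical example of scanner output, not RING-5's actual schema.

```python
from typing import List, TypedDict


class ScannedVariable(TypedDict):
    """Hypothetical shape of one scanner result."""

    name: str
    type: str  # e.g. "scalar" or "vector"
    entries: List[str]


def variable_names(variables: List[ScannedVariable]) -> List[str]:
    # mypy checks each key access against the declared fields.
    return [v["name"] for v in variables]
```

At runtime a TypedDict is an ordinary dict. The benefit is static: mypy rejects a misspelled key such as `v["nmae"]` or a missing field before the code runs.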
5. Immutability
DataFrames are never modified in-place:
# CORRECT: Return new DataFrame
result = data.drop(columns=['x'])
filtered = result[result['value'] > 0]
# WRONG: In-place modification
data.drop(columns=['x'], inplace=True)  # Mutates the caller's DataFrame
Project Structure
RING-5/
  src/
    core/                        # Shared utilities
      benchmark.py               # Benchmark handling
      performance.py             # Performance metrics
    parsers/                     # Layer A: Data ingestion
      parser.py                  # Main parser
      scanner.py                 # Variable scanner
      parse_service.py           # Async orchestration
      scanner_service.py         # Async scanning
      type_mapper.py             # Type detection
      pattern_aggregator.py      # Pattern consolidation
      perl/                      # Perl parser scripts
      workers/                   # Async task workers
    plotting/                    # Layer B/C: Visualization
      base_plot.py               # Abstract base
      plot_factory.py            # Factory
      plot_renderer.py           # Rendering
      export.py                  # Export utilities
      types/                     # Concrete plots
        bar_plot.py
        line_plot.py
        scatter_plot.py
        grouped_bar_plot.py
        grouped_stacked_bar_plot.py
        histogram_plot.py
    config/                      # Configuration management
      config_manager.py
      schemas/                   # JSON schemas
    web/                         # Layer C: UI
      facade.py                  # Backend facade (MAIN API)
      state_manager.py           # Session state
      services/                  # UI services
        csv_pool.py
        variable_service.py
        shapers/                 # Data transformations
      repositories/              # Data access
      ui/                        # Streamlit components
        components/              # Reusable widgets
        pages/                   # App pages
  tests/
    unit/                        # Unit tests
    integration/                 # Integration tests
    e2e/                         # End-to-end tests
    data/                        # Test fixtures
  .agent/                        # AI agent configuration
    rules/                       # Project rules
    workflows/                   # Development workflows
    skills/                      # Reusable knowledge
  docs/                          # Documentation (you are here!)
Data Flow
Parsing Workflow
1. User selects stats directory
↓
2. Scanner discovers variables (async)
• Scans multiple files in parallel
• Detects variable types
• Aggregates patterns (cpu0, cpu1 → cpu\d+)
↓
3. User selects variables to parse
↓
4. Parser extracts data (async)
• Calls appropriate Perl parser per type
• Processes files in parallel
• Consolidates into CSVs
↓
5. Data loaded into memory
• CSV pool management
• Efficient caching
↓
6. Ready for analysis and visualization
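Step 2 aggregates names such as `cpu0` and `cpu1` into `cpu\d+`. A simple way to do this is to replace every run of digits with `\d+` and escape the rest of the name, as sketched below. This heuristic is an assumption: the logic in `pattern_aggregator.py` is not shown here and may be more selective.

```python
import re
from typing import Dict, List


def aggregate_patterns(names: List[str]) -> Dict[str, List[str]]:
    r"""Group names that differ only in numeric indices (cpu0, cpu1 -> cpu\d+)."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        # re.split with a capturing group keeps the digit runs as separate items.
        parts = re.split(r"(\d+)", name)
        # Digit runs become \d+; literal text (including dots) is escaped.
        pattern = "".join(r"\d+" if p.isdigit() else re.escape(p) for p in parts)
        groups.setdefault(pattern, []).append(name)
    return groups
```

This version also merges names like `l1`/`l2` into `l\d+`. A real aggregator would probably restrict the replacement to known indexed components.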
Transformation Pipeline
Raw Data
↓
ColumnSelector: Keep relevant columns
↓
Filter: Remove unwanted rows
↓
Normalize: Divide by baseline
↓
Aggregate: Group and compute means
↓
Rename: Clean column names
↓
Sort: Order rows
↓
Transformed Data → Ready for plotting
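The pipeline above can be expressed as a chain of pure functions, each returning a new DataFrame. The sketch makes several assumptions: the function names, the `"base"` baseline configuration, and the column names are hypothetical. RING-5's actual shapers are classes created through `ShaperFactory`.

```python
from functools import reduce
from typing import Callable, Dict, List

import pandas as pd

# A shaper here is a pure function: DataFrame in, new DataFrame out.
Shaper = Callable[[pd.DataFrame], pd.DataFrame]


def column_selector(cols: List[str]) -> Shaper:
    return lambda df: df[cols].copy()


def positive_filter(col: str) -> Shaper:
    return lambda df: df[df[col] > 0]


def normalize(col: str, group: str, key: str, baseline: str) -> Shaper:
    # Divide `col` by the mean baseline value within the same `group`.
    def apply(df: pd.DataFrame) -> pd.DataFrame:
        base = df[df[key] == baseline].groupby(group)[col].mean()
        return df.assign(**{col: df[col] / df[group].map(base)})

    return apply


def aggregate(by: List[str], col: str) -> Shaper:
    return lambda df: df.groupby(by, as_index=False)[col].mean()


def rename(mapping: Dict[str, str]) -> Shaper:
    return lambda df: df.rename(columns=mapping)


def sort(by: List[str]) -> Shaper:
    return lambda df: df.sort_values(by).reset_index(drop=True)


def run_pipeline(df: pd.DataFrame, shapers: List[Shaper]) -> pd.DataFrame:
    # Feed each shaper's output into the next, left to right.
    return reduce(lambda acc, shaper: shaper(acc), shapers, df)
```

Because every step returns a new object, the raw DataFrame stays intact. The same raw data can then feed several differently shaped plots.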
Plotting Workflow
Transformed Data + Plot Config
↓
PlotFactory.create_plot(type, id, name)
↓
Concrete Plot Class (BarPlot, LinePlot, etc.)
↓
create_figure(data, config) → go.Figure
↓
PlotRenderer.render(figure)
↓
Display in UI or Export
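The sketch below shows the contract between `BasePlot` and a concrete plot. To stay dependency-free, `create_figure` returns a Plotly-style figure dict (`{"data": [...], "layout": {...}}`) instead of a `go.Figure`; `plotly.graph_objects.Figure` accepts such a dict directly. The config keys `x`, `y`, and `title` are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

import pandas as pd

FigureSpec = Dict[str, Any]  # Plotly-style {"data": [...], "layout": {...}}


class BasePlot(ABC):
    def __init__(self, plot_id: int, name: str) -> None:
        self.plot_id = plot_id
        self.name = name

    @abstractmethod
    def create_figure(self, data: pd.DataFrame, config: Dict[str, Any]) -> FigureSpec:
        """Build a figure from already-transformed data."""


class BarPlot(BasePlot):
    def create_figure(self, data: pd.DataFrame, config: Dict[str, Any]) -> FigureSpec:
        # Map the configured columns onto a single bar trace.
        x, y = config["x"], config["y"]
        return {
            "data": [{"type": "bar", "x": data[x].tolist(), "y": data[y].tolist()}],
            "layout": {"title": {"text": config.get("title", self.name)}},
        }
```

Because `create_figure` is abstract, a subclass that forgets to implement it fails at instantiation time. It does not fail later, during rendering.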
Key Components
BackendFacade
Single entry point to all backend functionality:
class BackendFacade:
    # Scanning
    def submit_scan_async(...)
    def finalize_scan(...)

    # Parsing
    def submit_parse_async(...)
    def finalize_parsing(...)

    # Data Access
    def load_csv_file(...)
    def apply_shapers(...)

    # Plotting
    def create_plot(...)
    def render_plot(...)
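The following is a hedged sketch of the two data-access methods. It illustrates the facade idea: the UI calls one object, and the facade hides caching and delegation. The in-memory cache is a hypothetical stand-in for `csv_pool.py`, and the shaper type is assumed to be a plain callable.

```python
from pathlib import Path
from typing import Callable, Dict, List

import pandas as pd

Shaper = Callable[[pd.DataFrame], pd.DataFrame]


class BackendFacade:
    """Hypothetical sketch: the UI talks only to this class."""

    def __init__(self) -> None:
        # Loaded CSVs keyed by resolved path (stand-in for a CSV pool).
        self._csv_cache: Dict[Path, pd.DataFrame] = {}

    def load_csv_file(self, path: Path) -> pd.DataFrame:
        key = path.resolve()
        if key not in self._csv_cache:
            self._csv_cache[key] = pd.read_csv(key)
        # Return a copy so callers cannot mutate the cached frame.
        return self._csv_cache[key].copy()

    def apply_shapers(self, data: pd.DataFrame, shapers: List[Shaper]) -> pd.DataFrame:
        # Apply each shaper in order; each returns a new DataFrame.
        for shaper in shapers:
            data = shaper(data)
        return data
```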
StateManager
Manages Streamlit session state:
- Scanned variables
- Selected variables
- Loaded data
- Plot configurations
- Portfolio settings
WorkPool
Manages concurrent execution:
- Fixed thread pool
- Task submission
- Result collection
- Error handling
ShaperFactory
Creates data transformers:
- Column selector
- Filter
- Normalize
- Aggregate
- Rename
- Sort
- Custom shapers
Testing Strategy
Unit Tests
- Pure functions tested in isolation
- Mock external dependencies
- Fast execution (<1s per test)
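A pytest-style unit test that mocks an external dependency might look like the sketch below. `count_variables` is a hypothetical UI-side helper. `unittest.mock.Mock` stands in for the facade and its futures, so the test never touches the file system.

```python
from typing import Any
from unittest.mock import Mock


def count_variables(facade: Any, path: str) -> int:
    """Hypothetical helper that depends on the facade's scan API."""
    futures = facade.submit_scan_async(path, "*")
    results = [f.result() for f in futures]
    return len(facade.finalize_scan(results))


def test_count_variables_uses_facade() -> None:
    # Fake one completed future and the facade around it.
    future = Mock()
    future.result.return_value = ["system.cpu.ipc"]
    facade = Mock()
    facade.submit_scan_async.return_value = [future]
    facade.finalize_scan.return_value = ["system.cpu.ipc", "simSeconds"]

    assert count_variables(facade, "/tmp/stats") == 2
    facade.submit_scan_async.assert_called_once_with("/tmp/stats", "*")
    facade.finalize_scan.assert_called_once_with([["system.cpu.ipc"]])
```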
Integration Tests
- Multi-component workflows
- Real data parsing
- Database interactions
End-to-End Tests
- Full user workflows
- UI interactions (planned)
- Browser automation (planned)
Coverage: 77% (target: 85%)
Performance Considerations
Async Parsing
- Parallel file processing
- Non-blocking I/O
- Progress reporting
Memory Management
- CSV pooling
- Lazy loading
- Garbage collection hints
Caching
- Scanned variable cache
- Compiled regex patterns
- Plot layout templates
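The "compiled regex patterns" cache can be as small as an `lru_cache` around `re.compile`, sketched below. The function names and cache size are assumptions. Python's `re` module already keeps a small internal cache, so an explicit one mainly pays off when many distinct patterns are in use.

```python
from __future__ import annotations

import re
from functools import lru_cache
from typing import List


@lru_cache(maxsize=256)
def compiled(pattern: str) -> re.Pattern[str]:
    # Compile each distinct pattern once; later calls return the same object.
    return re.compile(pattern)


def filter_names(pattern: str, names: List[str]) -> List[str]:
    # Reuse the cached compiled regex instead of recompiling per call.
    regex = compiled(pattern)
    return [n for n in names if regex.search(n)]
```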
Error Handling
Fail Fast
if not stats_path.exists():
    raise FileNotFoundError(f"Path not found: {stats_path}")
User-Friendly Messages
try:
    data = parse_file(path)
except ParseError as e:
    st.error(f"Failed to parse {path.name}: {e}")
    logger.error("Parse error: %s", e, exc_info=True)
Graceful Degradation
# Continue with other files if one fails
results = []
for future in futures:
    try:
        result = future.result(timeout=30)
        results.append(result)
    except Exception as e:
        # Log and move on; the remaining files are still processed
        logger.warning("Task failed: %s", e)
Extension Points
Adding New Plot Types
- Create a class inheriting from BasePlot
- Implement create_figure()
- Register it in PlotFactory
- Add UI configuration
Adding New Shapers
- Create a class with a transform() method
- Register it in ShaperFactory
- Add UI controls
See Adding Shapers
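A generic sketch of the three steps listed above follows. It defines a shaper class with `transform()` and registers it with a factory. The `Shaper` base, the `ClipShaper` example, its config keys, and the explicit `register()` method are all hypothetical. The dedicated page describes RING-5's real conventions.

```python
from typing import Any, Dict, Type

import pandas as pd


class Shaper:
    """Hypothetical base: configured once, then applied as a pure transform."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError


class ClipShaper(Shaper):
    """Example new shaper: clip one column into [lower, upper]."""

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        col = self.config["column"]
        clipped = data[col].clip(lower=self.config["lower"], upper=self.config["upper"])
        # assign() returns a new DataFrame; `data` is left untouched.
        return data.assign(**{col: clipped})


class ShaperFactory:
    _registry: Dict[str, Type[Shaper]] = {}

    @classmethod
    def register(cls, name: str, shaper_cls: Type[Shaper]) -> None:
        cls._registry[name] = shaper_cls

    @classmethod
    def create_shaper(cls, name: str, config: Dict[str, Any]) -> Shaper:
        return cls._registry[name](config)


ShaperFactory.register("clip", ClipShaper)
```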
Adding New Variable Types
- Create Perl parser script
- Add it to TypeMapper
- Update scanner logic
- Add tests
See API Reference for details
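The sketch below shows what "add it to TypeMapper" might involve. Each type pairs a detection rule with the Perl script that parses it, and detection falls back to scalar. The `VariableType` dataclass, the script filenames, and the detection rules are hypothetical illustrations, not RING-5's or gem5's actual rules.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class VariableType:
    name: str
    perl_script: str  # hypothetical script filename
    matches: Callable[[List[str]], bool]  # rule over a variable's "::" entry keys


class TypeMapper:
    def __init__(self) -> None:
        self._types: List[VariableType] = []

    def register(self, vtype: VariableType) -> None:
        # First match wins, so register specific types before generic ones.
        self._types.append(vtype)

    def detect(self, entries: List[str]) -> str:
        for vtype in self._types:
            if vtype.matches(entries):
                return vtype.name
        return "scalar"

    def script_for(self, type_name: str) -> str:
        for vtype in self._types:
            if vtype.name == type_name:
                return vtype.perl_script
        raise KeyError(type_name)
```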
Best Practices
DO
- Follow layered architecture
- Use async patterns
- Write tests first (TDD)
- Type all functions
- Return new DataFrames
- Handle errors gracefully
- Document public APIs
DON’T
- Mix UI and business logic
- Create sync wrappers for async APIs
- Modify DataFrames in-place
- Use bare except clauses
- Forget type hints
- Skip tests
- Leave TODOs in production code
Related Documentation
Next: Development Setup to start building with this architecture.