Shaper API

Complete API reference for RING-5’s data transformation system.

Overview

Shapers transform DataFrames in the analysis pipeline. All shapers:

Follow the Strategy Pattern
Never mutate their input (each call returns a new DataFrame)
Are chainable in pipelines
Use callable interface (__call__)
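
The four properties above can be illustrated with a minimal sketch. The Shaper protocol, the two toy shapers, and run_chain are hypothetical names introduced for illustration only; they are not part of RING-5.

```python
from typing import List, Protocol

import pandas as pd


class Shaper(Protocol):
    """Hypothetical structural type: any object callable on a DataFrame."""

    def __call__(self, data: pd.DataFrame) -> pd.DataFrame: ...


class DropMissing:
    """Illustrative shaper (not part of RING-5): drops rows containing NaN."""

    def __call__(self, data: pd.DataFrame) -> pd.DataFrame:
        # dropna returns a new DataFrame; the input is left untouched
        return data.dropna()


class ResetIndex:
    """Illustrative shaper (not part of RING-5): renumbers rows from 0."""

    def __call__(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.reset_index(drop=True)


def run_chain(data: pd.DataFrame, shapers: List[Shaper]) -> pd.DataFrame:
    """Apply each shaper to the output of the previous one."""
    for shaper in shapers:
        data = shaper(data)
    return data


raw = pd.DataFrame({"benchmark": ["mcf", "lbm", "gcc"], "ipc": [1.2, None, 0.8]})
clean = run_chain(raw, [DropMissing(), ResetIndex()])
```

Because every shaper shares the same call signature, the chain does not need to know which concrete strategy it is running.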

ShaperFactory

Class: `ShaperFactory`

Factory for creating shaper instances.

Location: src/web/services/shapers/shaper_factory.py

Methods

`create_shaper(shaper_type, config)`

Create a shaper instance by type.

Parameters:

shaper_type (str): Shaper type identifier
config (Dict[str, Any]): Configuration dictionary

Returns: Shaper instance (callable)

Raises: ValueError - If the shaper type is unknown

Supported Shaper Types:

"column_selector" - Select/rename columns
"sort" - Sort rows
"mean_calculator" - Calculate means by group
"normalize" - Normalize values
"filter" - Filter rows
"transformer" - Apply custom transformations

Example:

from src.web.services.shapers.shaper_factory import ShaperFactory
import pandas as pd

# Create shaper
config = {"columns": ["benchmark", "ipc"], "rename": {"ipc": "Instructions Per Cycle"}}
shaper = ShaperFactory.create_shaper("column_selector", config)

# Apply to data
data = pd.read_csv("data.csv")
result = shaper(data)  # Callable interface
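
For readers who want to see the lookup-and-raise behaviour in isolation, here is a rough sketch of a registry-based factory. SketchFactory and HeadShaper are stand-ins invented for this example; the real ShaperFactory may be implemented differently.

```python
from typing import Any, Callable, Dict

import pandas as pd


class HeadShaper:
    """Stand-in shaper used only in this sketch: keeps the first n rows."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.n = config.get("n", 5)

    def __call__(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.head(self.n).copy()


class SketchFactory:
    """Hypothetical registry-based factory, not the real ShaperFactory."""

    _shapers: Dict[str, Callable[[Dict[str, Any]], Any]] = {"head": HeadShaper}

    @classmethod
    def create_shaper(cls, shaper_type: str, config: Dict[str, Any]) -> Any:
        if shaper_type not in cls._shapers:
            known = ", ".join(sorted(cls._shapers))
            raise ValueError(f"Unknown shaper type '{shaper_type}'. Known types: {known}")
        return cls._shapers[shaper_type](config)


data = pd.DataFrame({"ipc": [1.0, 2.0, 3.0]})
shaper = SketchFactory.create_shaper("head", {"n": 2})
result = shaper(data)

# An unknown identifier raises ValueError, as documented for create_shaper
try:
    SketchFactory.create_shaper("no_such_type", {})
except ValueError as exc:
    error_message = str(exc)
```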

Built-in Shapers

ColumnSelector

Select and optionally rename columns.

Configuration:

{
    "columns": List[str],              # Columns to keep
    "rename": Dict[str, str],          # Optional: old_name → new_name
}

Example:

config = {
    "columns": ["benchmark", "ipc", "cache_misses"],
    "rename": {"ipc": "IPC", "cache_misses": "Cache Misses"}
}
shaper = ShaperFactory.create_shaper("column_selector", config)
result = shaper(data)

Location: src/web/services/shapers/column_selector.py
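
As a point of comparison, the same select-then-rename effect can be written directly in pandas. This is a sketch of equivalent behaviour, not the ColumnSelector source; it assumes that "columns" and the keys of "rename" both refer to the original column names.

```python
import pandas as pd

data = pd.DataFrame(
    {"benchmark": ["mcf", "lbm"], "ipc": [1.2, 0.9], "cycles": [100, 200]}
)
config = {"columns": ["benchmark", "ipc"], "rename": {"ipc": "IPC"}}

# Select by original name first, then rename; both steps return new DataFrames
selected = data[config["columns"]].rename(columns=config.get("rename", {}))
```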

SortShaper

Sort DataFrame by column(s).

Configuration:

{
    "column": str | List[str],         # Column(s) to sort by
    "ascending": bool | List[bool],    # Sort direction
}

Example:

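# Single column, descending
config = {"column": "ipc", "ascending": False}
shaper = ShaperFactory.create_shaper("sort", config)
result = shaper(data)

# Multiple columns: benchmark ascending, then ipc descending
config = {
    "column": ["benchmark", "ipc"],
    "ascending": [True, False]
}
shaper = ShaperFactory.create_shaper("sort", config)
result = shaper(data)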

Location: src/web/services/shapers/sort_shaper.py
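
The two configuration keys line up with the by and ascending arguments of pandas' sort_values. The sketch below shows the equivalent pandas call on toy data; whether SortShaper delegates to sort_values internally is an assumption.

```python
import pandas as pd

data = pd.DataFrame(
    {"benchmark": ["mcf", "lbm", "mcf", "lbm"], "ipc": [1.2, 0.9, 1.5, 1.1]}
)
config = {"column": ["benchmark", "ipc"], "ascending": [True, False]}

# sort_values returns a new, sorted DataFrame and leaves `data` as it was
sorted_data = data.sort_values(by=config["column"], ascending=config["ascending"])
```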

MeanCalculator

Calculate mean values grouped by dimension(s).

Configuration:

{
    "group_by": str | List[str],       # Grouping column(s)
    "value_columns": List[str],        # Columns to average
}

Example:

config = {
    "group_by": "benchmark",
    "value_columns": ["ipc", "cache_hit_rate"]
}
shaper = ShaperFactory.create_shaper("mean_calculator", config)
result = shaper(data)  # Aggregated data

Location: src/web/services/shapers/mean_calculator.py
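
To make the shape of the aggregated output concrete, here is a pandas sketch of a grouped mean on toy data. It is an illustration of the idea, not the MeanCalculator source; in particular, returning the group key as a regular column (via reset_index) is an assumption.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "benchmark": ["mcf", "mcf", "lbm", "lbm"],
        "ipc": [1.0, 2.0, 3.0, 5.0],
        "cache_hit_rate": [0.5, 0.7, 0.9, 0.7],
    }
)
config = {"group_by": "benchmark", "value_columns": ["ipc", "cache_hit_rate"]}

# One output row per group; only the listed value columns are averaged
means = (
    data.groupby(config["group_by"])[config["value_columns"]]
    .mean()
    .reset_index()
)
```

Note that groupby sorts the group keys by default, so the rows come back in key order rather than input order.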

NormalizeShaper

Normalize values to baseline or range.

Configuration:

{
    "method": str,                     # "baseline" | "minmax" | "zscore"
    "column": str,                     # Column to normalize
    "baseline_value": float,           # Baseline method: constant to divide by
    "baseline_column": str,            # Baseline method: alternatively, column supplying the baseline
}

Methods:

"baseline": Divide by baseline value
"minmax": Scale to [0, 1] range
"zscore": Standardize to mean=0, std=1

Example (Baseline):

config = {
    "method": "baseline",
    "column": "ipc",
    "baseline_value": 1.0  # Every ipc value is divided by this baseline
}
shaper = ShaperFactory.create_shaper("normalize", config)
result = shaper(data)

Example (Min-Max):

config = {
    "method": "minmax",
    "column": "ipc"
}
shaper = ShaperFactory.create_shaper("normalize", config)

Location: src/web/services/shapers/normalize_shaper.py
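
The three methods correspond to simple formulas, sketched below on a pandas Series. The normalize helper is hypothetical and written only for this example. The z-score variant here uses pandas' default sample standard deviation (ddof=1); which convention NormalizeShaper uses is not stated in this reference.

```python
import pandas as pd


def normalize(series: pd.Series, method: str, baseline_value: float = 1.0) -> pd.Series:
    """Hypothetical helper mirroring the three documented methods."""
    if method == "baseline":
        # Divide every value by the baseline
        return series / baseline_value
    if method == "minmax":
        # Map the minimum to 0 and the maximum to 1
        return (series - series.min()) / (series.max() - series.min())
    if method == "zscore":
        # Center on the mean and scale by the sample standard deviation
        return (series - series.mean()) / series.std()
    raise ValueError(f"Unknown normalization method: {method}")


ipc = pd.Series([1.0, 2.0, 3.0])
by_baseline = normalize(ipc, "baseline", baseline_value=2.0)
by_minmax = normalize(ipc, "minmax")
by_zscore = normalize(ipc, "zscore")
```

In this sketch a constant column makes the min-max and z-score denominators zero, which yields NaN; a production shaper would likely need to handle that case explicitly.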

FilterShaper

Filter rows based on conditions.

Configuration:

{
    "column": str,                     # Column to filter
    "operator": str,                   # Comparison operator
    "value": Any,                      # Value to compare
}

Supported Operators:

">", ">=" - Greater than, greater than or equal to
"<", "<=" - Less than, less than or equal to
"==", "!=" - Equal to, not equal to
"contains" - String contains (case-insensitive)
"in" - Value in list

Example (Numeric):

config = {
    "column": "ipc",
    "operator": ">",
    "value": 1.5
}
shaper = ShaperFactory.create_shaper("filter", config)
result = shaper(data)  # Only rows where ipc > 1.5

Example (List membership):

config = {
    "column": "benchmark",
    "operator": "in",
    "value": ["mcf", "omnetpp", "xalancbmk"]
}
shaper = ShaperFactory.create_shaper("filter", config)

Location: src/web/services/shapers/filter_shaper.py
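
One way to picture how the operator strings could map onto pandas is a small dispatch table that builds a boolean mask. The build_mask helper is hypothetical and only approximates the documented semantics; it is not the FilterShaper source.

```python
import operator
from typing import Any

import pandas as pd

# The six comparison operators map directly onto standard-library functions
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def build_mask(data: pd.DataFrame, column: str, op: str, value: Any) -> pd.Series:
    """Hypothetical helper: return a boolean mask for one filter condition."""
    series = data[column]
    if op in COMPARISONS:
        return COMPARISONS[op](series, value)
    if op == "contains":
        # Case-insensitive literal substring match
        return series.astype(str).str.contains(str(value), case=False, regex=False)
    if op == "in":
        return series.isin(value)
    raise ValueError(f"Unsupported operator: {op}")


data = pd.DataFrame(
    {"benchmark": ["mcf", "omnetpp", "Xalancbmk"], "ipc": [1.2, 1.8, 2.1]}
)
fast = data[build_mask(data, "ipc", ">", 1.5)]
matched = data[build_mask(data, "benchmark", "contains", "xalan")]
chosen = data[build_mask(data, "benchmark", "in", ["mcf", "omnetpp"])]
```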

TransformerShaper

Apply custom transformation functions.

Configuration:

{
    "transformations": List[Dict],     # List of transformations
}

Transformation Dict:

{
    "type": str,                       # Transformation type
    "column": str,                     # Target column
    "new_column": str,                 # Output column (optional)
    "params": Dict,                    # Transformation parameters
}

Supported Transformations:

"multiply": Multiply by constant
"add": Add constant
"log": Logarithm
"sqrt": Square root
"inverse": 1/x

Example:

config = {
    "transformations": [
        {
            "type": "multiply",
            "column": "cycles",
            "new_column": "cycles_billions",
            "params": {"factor": 1e-9}
        },
        {
            "type": "log",
            "column": "ipc",
            "new_column": "log_ipc",
            "params": {"base": 10}
        }
    ]
}
shaper = ShaperFactory.create_shaper("transformer", config)

Location: src/web/services/shapers/transformer_shaper.py
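
The sketch below shows how a transformation dict of this shape might be interpreted. The apply_transformation helper is hypothetical. The "factor" and "base" parameter names come from the example above, while the "value" key for "add", the natural-log default, and overwriting the source column when "new_column" is omitted are assumptions.

```python
import math
from typing import Any, Dict

import numpy as np
import pandas as pd


def apply_transformation(data: pd.DataFrame, spec: Dict[str, Any]) -> pd.DataFrame:
    """Hypothetical interpreter for a single transformation dict."""
    column = spec["column"]
    target = spec.get("new_column", column)  # assumed: overwrite when omitted
    params = spec.get("params", {})
    values = data[column]

    kind = spec["type"]
    if kind == "multiply":
        out = values * params["factor"]
    elif kind == "add":
        out = values + params["value"]  # "value" key is an assumption
    elif kind == "log":
        # Change of base: log_b(x) = ln(x) / ln(b); natural log if no base given
        out = np.log(values) / math.log(params.get("base", math.e))
    elif kind == "sqrt":
        out = np.sqrt(values)
    elif kind == "inverse":
        out = 1.0 / values
    else:
        raise ValueError(f"Unknown transformation type: {kind}")

    result = data.copy()  # never modify the caller's DataFrame
    result[target] = out
    return result


data = pd.DataFrame({"cycles": [4.0, 16.0], "ipc": [10.0, 100.0]})
specs = [
    {"type": "multiply", "column": "cycles", "new_column": "half_cycles",
     "params": {"factor": 0.5}},
    {"type": "log", "column": "ipc", "new_column": "log_ipc", "params": {"base": 10}},
]
result = data
for spec in specs:
    result = apply_transformation(result, spec)
```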

Pipeline Execution

Function: `apply_shaper_pipeline(data, pipeline)`

Apply multiple shapers sequentially.

Location: src/web/services/shapers/pipeline.py

Parameters:

data (pd.DataFrame): Input data
pipeline (List[Dict]): List of shaper configurations

Returns: pd.DataFrame - Transformed data

Example:

from src.web.services.shapers.pipeline import apply_shaper_pipeline

pipeline = [
    {
        "type": "filter",
        "column": "benchmark",
        "operator": "in",
        "value": ["mcf", "omnetpp"]
    },
    {
        "type": "column_selector",
        "columns": ["benchmark", "ipc"],
        "rename": {"ipc": "IPC"}
    },
    {
        "type": "sort",
        "column": "IPC",
        "ascending": False
    }
]

result = apply_shaper_pipeline(data, pipeline)
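
Conceptually, a pipeline runner only has to split each step into its "type" and the remaining configuration, build the shaper, and feed it the previous result. The following is a sketch under that assumption; run_pipeline and demo_factory are hypothetical stand-ins, and the real apply_shaper_pipeline may differ. The demo factory supports only an "in" filter and a sort.

```python
from typing import Any, Callable, Dict, List

import pandas as pd

Shaper = Callable[[pd.DataFrame], pd.DataFrame]


def run_pipeline(
    data: pd.DataFrame,
    pipeline: List[Dict[str, Any]],
    create_shaper: Callable[[str, Dict[str, Any]], Shaper],
) -> pd.DataFrame:
    """Hypothetical runner: apply each configured shaper in order."""
    result = data
    for step in pipeline:
        config = dict(step)  # copy so the caller's dict is not mutated
        shaper_type = config.pop("type")
        result = create_shaper(shaper_type, config)(result)
    return result


def demo_factory(shaper_type: str, config: Dict[str, Any]) -> Shaper:
    """Stand-in for ShaperFactory.create_shaper, for illustration only."""
    if shaper_type == "filter":
        return lambda df: df[df[config["column"]].isin(config["value"])]
    if shaper_type == "sort":
        return lambda df: df.sort_values(config["column"], ascending=config["ascending"])
    raise ValueError(f"Unknown shaper type: {shaper_type}")


data = pd.DataFrame({"benchmark": ["mcf", "lbm", "omnetpp"], "ipc": [1.2, 0.9, 1.8]})
pipeline = [
    {"type": "filter", "column": "benchmark", "operator": "in",
     "value": ["mcf", "omnetpp"]},
    {"type": "sort", "column": "ipc", "ascending": False},
]
result = run_pipeline(data, pipeline, demo_factory)
```

Copying each step before removing "type" keeps the pipeline definition reusable across calls.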

Creating Custom Shapers

Template

from typing import Dict, Any
import pandas as pd


class MyCustomShaper:
    """
    Description of transformation.

    Configuration:
        param1 (type): Description
        param2 (type): Description
    """

    def __init__(self, config: Dict[str, Any]) -> None:
        """Initialize with configuration."""
        self.config = config
        self.param1 = config["param1"]
        self.param2 = config.get("param2")  # Optional; None when absent

    def __call__(self, data: pd.DataFrame) -> pd.DataFrame:
        """
        Apply transformation.

        Args:
            data: Input DataFrame

        Returns:
            Transformed DataFrame (new instance)
        """
        # Validate
        if "required_column" not in data.columns:
            raise KeyError("required_column missing")

        # Transform (immutable)
        result = data.copy()
        # ... apply transformation ...
        return result
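
As a filled-in instance of the template, here is a small custom shaper. RatioShaper and its configuration keys ("numerator", "denominator", "output") are hypothetical and exist only in this example.

```python
from typing import Any, Dict

import pandas as pd


class RatioShaper:
    """Hypothetical custom shaper: adds numerator / denominator as a new column."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.numerator = config["numerator"]
        self.denominator = config["denominator"]
        self.output = config.get("output", "ratio")

    def __call__(self, data: pd.DataFrame) -> pd.DataFrame:
        # Validate before touching any data
        missing = [c for c in (self.numerator, self.denominator) if c not in data.columns]
        if missing:
            raise KeyError(f"Missing required column(s): {missing}")

        # Work on a copy so the caller's DataFrame is unchanged
        result = data.copy()
        result[self.output] = result[self.numerator] / result[self.denominator]
        return result


data = pd.DataFrame({"instructions": [200.0, 300.0], "cycles": [100.0, 200.0]})
shaper = RatioShaper({"numerator": "instructions", "denominator": "cycles", "output": "ipc"})
result = shaper(data)
```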

Registration

Add to ShaperFactory._shapers:

class ShaperFactory:
    _shapers = {
        "column_selector": ColumnSelector,
        "my_custom_shaper": MyCustomShaper,  # Add here
        # ...
    }

Best Practices

Immutability: Always return a new DataFrame; never modify the input
Validation: Check required columns/config before transformation
Type Hints: Full type annotations on all methods
Error Handling: Clear, actionable error messages
Documentation: Docstrings with examples
Testing: Test basic operation, immutability, errors, edge cases
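
The four test categories named in the last item might look like the checks below. ScaleShaper is a hypothetical shaper defined only so the checks have something to exercise; in a real project these would typically be pytest test functions.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


class ScaleShaper:
    """Hypothetical shaper under test: multiplies one column by a factor."""

    def __init__(self, config):
        self.column = config["column"]
        self.factor = config.get("factor", 1.0)

    def __call__(self, data):
        if self.column not in data.columns:
            raise KeyError(f"Column '{self.column}' not found")
        result = data.copy()
        result[self.column] = result[self.column] * self.factor
        return result


def check_basic_operation():
    data = pd.DataFrame({"ipc": [1.0, 2.0]})
    result = ScaleShaper({"column": "ipc", "factor": 2.0})(data)
    assert list(result["ipc"]) == [2.0, 4.0]


def check_immutability():
    data = pd.DataFrame({"ipc": [1.0, 2.0]})
    snapshot = data.copy(deep=True)
    ScaleShaper({"column": "ipc", "factor": 2.0})(data)
    assert_frame_equal(data, snapshot)  # input must be unchanged


def check_missing_column():
    data = pd.DataFrame({"cycles": [1.0]})
    try:
        ScaleShaper({"column": "ipc"})(data)
    except KeyError:
        return
    raise AssertionError("expected KeyError for a missing column")


def check_empty_frame():
    data = pd.DataFrame({"ipc": pd.Series([], dtype=float)})
    result = ScaleShaper({"column": "ipc", "factor": 2.0})(data)
    assert result.empty and list(result.columns) == ["ipc"]


for check in (check_basic_operation, check_immutability,
              check_missing_column, check_empty_frame):
    check()
```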

Error Handling

Common Exceptions

KeyError (missing column):

import streamlit as st  # Assumption: st in these snippets is Streamlit

try:
    result = shaper(data)
except KeyError as e:
    st.error(f"Required column missing: {e}")

ValueError (invalid config):

try:
    shaper = ShaperFactory.create_shaper("sort", {})
except ValueError as e:
    st.error(f"Invalid configuration: {e}")

Type Definitions

ShaperConfig (TypedDict)

from typing import TypedDict, Any, List, Dict

class ShaperConfig(TypedDict, total=False):
    type: str
    # Type-specific fields
    columns: List[str]
    rename: Dict[str, str]
    column: str | List[str]
    ascending: bool | List[bool]
    group_by: str | List[str]
    value_columns: List[str]
    method: str
    operator: str
    value: Any

Performance

Memory Usage:

Each shaper returns a new DataFrame (a copy of its input)
A pipeline of N shapers therefore creates N DataFrames, one per stage
Use efficient pandas operations (vectorized)

Optimization Tips:

Filter early in pipeline (reduce data size)
Select columns before expensive operations
Use vectorized pandas operations, avoid loops
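
The first two tips can be seen on a toy example: filtering and selecting before an aggregation gives the same answer while the groupby sees fewer rows and columns. This is plain pandas written for illustration, not RING-5 code.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "benchmark": ["mcf", "mcf", "lbm", "lbm", "gcc", "gcc"],
        "ipc": [1.0, 2.0, 3.0, 5.0, 0.5, 1.5],
        "cycles": [10, 20, 30, 40, 50, 60],
    }
)
wanted = ["mcf", "lbm"]

# Aggregate first, filter later: groupby processes all 6 rows and both value columns
late = data.groupby("benchmark").mean().reset_index()
late = late[late["benchmark"].isin(wanted)][["benchmark", "ipc"]].reset_index(drop=True)

# Filter and select first: groupby processes 4 rows and 1 value column
early_input = data[data["benchmark"].isin(wanted)][["benchmark", "ipc"]]
early = early_input.groupby("benchmark").mean().reset_index()
```

On a frame this small the difference is negligible; the saving grows with the number of rows and columns discarded early.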

Next Steps

Data Transformations: ../Data-Transformations.md
Adding Shapers: ../Adding-Shapers.md
Pandas Reference: https://pandas.pydata.org/docs/
