curriculum_utils
Module: GBC.gyms.isaaclab_45.lab_tasks.curriculum_utils
🎯 Overview
Curriculum learning 📈 utilities that implement adaptive difficulty progression based on training performance. The module automatically adjusts task difficulty by monitoring reward term values and updating the corresponding learning parameters.
🧠 Core Concept
Curriculum learning gradually increases task difficulty as the agent improves, leading to more stable and efficient training. The StdUpdater class monitors a specific reward term and adjusts a difficulty parameter (typically the standard deviation in an exponential reward function) to maintain an optimal learning challenge.
🏆 Class: StdUpdater
Purpose: Implements adaptive curriculum learning by monitoring reward performance and updating difficulty parameters in real-time.
📦 Constructor
def __init__(
    self,
    std_list: list[float],
    reward_key: str,
    reward_threshold: float = 0.8,
    reward_hist: int = 128,
    step_threshold_down: int = 32*1200,
    step_threshold_up: int = 32*300
)
Parameters
- std_list (list[float]): List of difficulty levels (typically standard deviations), ordered from easiest to hardest
- reward_key (str): Name of the reward term to monitor for curriculum progression
- reward_threshold (float): Performance threshold for advancing the curriculum (default: 0.8)
- reward_hist (int): Number of episodes averaged for performance evaluation (default: 128)
- step_threshold_down (int): Minimum steps before reducing difficulty (default: 32*1200 = 38,400)
- step_threshold_up (int): Minimum steps before increasing difficulty (default: 32*300 = 9,600)
State Variables
- level (int): Current curriculum level (index into std_list)
- step_num_last_change (int): Step count when the curriculum last changed
- reward_buf (torch.Tensor): Rolling buffer of recent reward values
- reward_pos_id (int): Current write position in the circular reward buffer
⚡ Core Method: update()
def update(self, env: ManagerBasedRLEnv) -> float
Purpose: Updates curriculum level based on current performance and returns appropriate difficulty parameter.
Parameters
- env (ManagerBasedRLEnv): The training environment instance

Returns
- float: Current difficulty parameter (std_list[level])
Algorithm Flow
1. 📊 Performance Monitoring

# Extract the latest episodic reward for the monitored term
reward = env.extras["log"][f"Episode_Reward/{self.reward_key}"]
reward_weight = env.reward_manager._term_cfgs[...].weight
# Normalize by the reward weight so the threshold is weight-independent
normalized_reward = reward / reward_weight
# Update the circular buffer and its rolling average
self.reward_buf[self.reward_pos_id] = normalized_reward
self.reward_pos_id = (self.reward_pos_id + 1) % self.reward_hist
average_reward = torch.mean(self.reward_buf).item()

2. 🎯 Curriculum Decision Logic

# Increase difficulty (level up)
if (steps_since_change > step_threshold_up and
        average_reward > reward_threshold and
        level < max_level):
    level += 1
# Decrease difficulty (level down)
elif (steps_since_change > step_threshold_down and
        average_reward < reward_threshold and
        level > 0):
    level -= 1

3. 🔄 Parameter Update

return self.std_list[self.level]  # Return current difficulty parameter
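Putting the three steps together, an update() method for the hypothetical StdUpdaterSketch class above could look like the following. How the global step count and the term weight are obtained (common_step_counter and get_term_cfg here) are assumptions about the Isaac Lab API, not something the module documents.
# Continues the StdUpdaterSketch class from the constructor sketch above.
    def update(self, env) -> float:
        # 1. Performance monitoring: read the latest episodic reward for the term
        log = env.extras.get("log", {})
        key = f"Episode_Reward/{self.reward_key}"
        if key in log:
            # Normalize by the term weight (assumed accessor: get_term_cfg)
            weight = env.reward_manager.get_term_cfg(self.reward_key).weight
            self.reward_buf[self.reward_pos_id] = float(log[key]) / weight
            self.reward_pos_id = (self.reward_pos_id + 1) % self.reward_hist
        average_reward = torch.mean(self.reward_buf).item()

        # 2. Curriculum decision with hysteresis on step counts
        steps_since_change = env.common_step_counter - self.step_num_last_change
        if (steps_since_change > self.step_threshold_up
                and average_reward > self.reward_threshold
                and self.level < len(self.std_list) - 1):
            self.level += 1
            self.step_num_last_change = env.common_step_counter
        elif (steps_since_change > self.step_threshold_down
                and average_reward < self.reward_threshold
                and self.level > 0):
            self.level -= 1
            self.step_num_last_change = env.common_step_counter

        # 3. Return the current difficulty parameter
        return self.std_list[self.level]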
🎪 Usage Example
1. Setup Curriculum Controller
from GBC.gyms.isaaclab_45.lab_tasks.curriculum_utils import StdUpdater
# Create curriculum controller for linear velocity tracking
lin_vel_std_updater = StdUpdater(
    std_list=[0.35, 0.34, 0.33, 0.25, 0.2],  # Easy → Hard progression
    reward_threshold=0.7,                    # 70% performance threshold
    reward_key="track_lin_vel_xy_exp"        # Target reward term
)
2. Integrate with Reward Function
import torch
# Imports below assume the Isaac Lab 2.x package layout used by isaaclab_45 tasks
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils.math import quat_rotate_inverse, yaw_quat

def track_lin_vel_xy_yaw_frame_exp_custom(
    env: ManagerBasedRLEnv, std: float, command_name: str, asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")
) -> torch.Tensor:
    """Adaptive linear velocity tracking with curriculum learning."""
    # 🎯 Get adaptive difficulty parameter (overrides the static std argument)
    adaptive_std = lin_vel_std_updater.update(env)
    # Extract robot data
    asset = env.scene[asset_cfg.name]
    vel_yaw = quat_rotate_inverse(yaw_quat(asset.data.root_quat_w), asset.data.root_lin_vel_w[:, :3])
    # Compute tracking error
    lin_vel_error = torch.sum(
        torch.square(env.command_manager.get_command(command_name)[:, :2] - vel_yaw[:, :2]), dim=1
    )
    # Return adaptive reward using the curriculum-adjusted standard deviation
    return torch.exp(-lin_vel_error / adaptive_std**2)
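One way to wire this function into the reward manager is through the environment's rewards configuration. The snippet below is a hedged sketch: the RewardTermCfg/configclass imports assume the Isaac Lab 2.x API, and the term name, weight, and command name ("base_velocity") are illustrative choices, not values from the module.
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils import configclass

@configclass
class RewardsCfg:
    # Term name matches the reward_key monitored by the curriculum controller
    track_lin_vel_xy_exp = RewTerm(
        func=track_lin_vel_xy_yaw_frame_exp_custom,
        weight=1.0,
        params={
            "std": 0.35,  # baseline value; overridden by the curriculum at runtime
            "command_name": "base_velocity",
            "asset_cfg": SceneEntityCfg("robot"),
        },
    )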
🎯 Curriculum Progression Strategy
📈 Difficulty Levels
std_list = [0.35, 0.34, 0.33, 0.25, 0.2]
# Easy ←─────────────────→ Hard
- Level 0 (std=0.35): Very forgiving, large tolerance for errors
- Level 1 (std=0.34): Slightly tighter tolerance
- Level 2 (std=0.33): Moderate precision required
- Level 3 (std=0.25): High precision demanded
- Level 4 (std=0.2): Expert-level accuracy required
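To make the progression concrete, the short sketch below evaluates the exponential tracking reward exp(-error / std**2) at a fixed squared velocity error of 0.04 for each level (illustrative numbers only):
import math

squared_error = 0.2 ** 2  # fixed squared tracking error of 0.04
for level, std in enumerate([0.35, 0.34, 0.33, 0.25, 0.2]):
    reward = math.exp(-squared_error / std ** 2)
    print(f"level {level}: std={std:.2f} -> reward={reward:.3f}")
# level 0: std=0.35 -> reward≈0.72 ... level 4: std=0.20 -> reward≈0.37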
⏰ Timing Controls
🔺 Level Up Conditions:
- Performance > reward_threshold (e.g., 70%)
- At least step_threshold_up steps since the last change (default: 9,600 steps)
- Not already at maximum difficulty
🔻 Level Down Conditions:
- Performance < reward_threshold
- At least step_threshold_down steps since the last change (default: 38,400 steps)
- Not already at minimum difficulty
📊 Performance Evaluation
- Rolling Average: Uses reward_hist episodes (default: 128) for a stable performance estimate
- Weight Normalization: Divides the logged reward by the term's weight in the reward manager
- Hysteresis: Different step thresholds for level up vs. level down prevent oscillation
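A standalone sketch of the rolling-average bookkeeping (the RollingMean helper is hypothetical, not part of the module):
import torch

class RollingMean:
    """Fixed-size circular buffer tracking the mean of recent values."""
    def __init__(self, hist: int = 128):
        self.buf = torch.zeros(hist)
        self.pos = 0

    def push(self, value: float) -> float:
        self.buf[self.pos] = value
        self.pos = (self.pos + 1) % len(self.buf)
        # Note: until the buffer fills once, zero-initialized slots bias the mean downward
        return self.buf.mean().item()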
🎨 Advanced Usage Patterns
🔀 Multiple Curriculum Controllers
# Different curriculum schedules for different skills
position_curriculum = StdUpdater([0.5, 0.3, 0.1], "position_tracking", 0.75)
velocity_curriculum = StdUpdater([0.8, 0.5, 0.2], "velocity_tracking", 0.70)
balance_curriculum = StdUpdater([0.4, 0.2, 0.1], "balance_control", 0.80)
def adaptive_reward_function(env, ...):
    pos_std = position_curriculum.update(env)
    vel_std = velocity_curriculum.update(env)
    bal_std = balance_curriculum.update(env)
    # Use adaptive parameters in reward computation
📈 Custom Progression Curves
# Exponential difficulty progression
exponential_curriculum = StdUpdater(
    std_list=[1.0, 0.5, 0.25, 0.125, 0.0625],  # Each level halves the tolerance
    reward_key="precision_task"
)
# Linear difficulty progression
linear_curriculum = StdUpdater(
    std_list=[0.5, 0.4, 0.3, 0.2, 0.1],  # Linear decrease
    reward_key="tracking_task"
)
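Such schedules can also be generated programmatically; the helpers below (geometric_stds and linear_stds) are hypothetical conveniences, not part of the module:
def geometric_stds(start: float, ratio: float, levels: int) -> list[float]:
    # Geometric (exponential) progression: each level multiplies the tolerance by `ratio`
    return [start * ratio ** i for i in range(levels)]

def linear_stds(start: float, stop: float, levels: int) -> list[float]:
    # Evenly spaced progression from `start` down to `stop`
    step = (start - stop) / (levels - 1)
    return [round(start - i * step, 4) for i in range(levels)]

geometric_stds(1.0, 0.5, 5)  # [1.0, 0.5, 0.25, 0.125, 0.0625]
linear_stds(0.5, 0.1, 5)     # [0.5, 0.4, 0.3, 0.2, 0.1]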
💡 Design Benefits
🎯 Training Efficiency
- Guided Learning: Prevents overwhelming the agent with impossible tasks
- Stable Progression: Hysteresis prevents curriculum oscillation
- Performance-Based: Adapts to actual learning rather than fixed schedules
🔧 Flexibility
- Configurable Thresholds: Adjustable performance and timing criteria
- Multiple Metrics: Can monitor different reward terms independently
- Easy Integration: Simple drop-in replacement for static parameters
📊 Robustness
- Statistical Stability: Rolling averages reduce noise in curriculum decisions
- Fallback Protection: Automatic difficulty reduction if performance degrades
- Monitoring: Built-in logging for curriculum progression analysis
🔗 Integration Points
- Reward Functions: Primary integration point for adaptive difficulty
- Training Loops: Automatic curriculum progression during training
- Environment Manager: Accesses reward and step information
- Logging Systems: Tracks curriculum progression for analysis