curriculum_utils
Module: GBC.gyms.isaaclab_45.lab_tasks.curriculum_utils
🎯 Overview
Curriculum learning 📈 utilities that implement adaptive difficulty progression based on training performance. The module provides intelligent curriculum management that automatically adjusts task difficulty by monitoring reward function values and updating learning parameters accordingly.
🧠 Core Concept
Curriculum learning gradually increases task difficulty as the agent improves, leading to more stable and efficient training. The StdUpdater class monitors specific reward metrics and adjusts difficulty parameters (typically standard deviations in reward functions) to maintain optimal learning challenge.
🏆 Class: StdUpdater
Purpose: Implements adaptive curriculum learning by monitoring reward performance and updating difficulty parameters in real-time.
📦 Constructor
def __init__(
    self, 
    std_list: list[float], 
    reward_key: str, 
    reward_threshold: float = 0.8, 
    reward_hist: int = 128, 
    step_threshold_down: int = 32*1200, 
    step_threshold_up: int = 32*300
)
Parameters
- `std_list` (list[float]): List of difficulty levels (typically standard deviations in a reward kernel), ordered from easiest to hardest
- `reward_key` (str): Name of the reward term to monitor for curriculum progression
- `reward_threshold` (float): Performance threshold for advancing the curriculum (default: 0.8)
- `reward_hist` (int): Number of recent episodes averaged when evaluating performance (default: 128)
- `step_threshold_down` (int): Minimum steps since the last change before difficulty is reduced (default: 32*1200 = 38,400)
- `step_threshold_up` (int): Minimum steps since the last change before difficulty is increased (default: 32*300 = 9,600)
State Variables
- `level` (int): Current curriculum level (index into `std_list`)
- `step_num_last_change` (int): Environment step count at the last curriculum change
- `reward_buf` (torch.Tensor): Circular buffer holding the recent reward history
- `reward_pos_id` (int): Current write position in the circular reward buffer
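The constructor simply records the configuration and initializes this state. The following is a minimal sketch of what that initialization could look like; the starting level, zero-filled buffer, and counter values are assumptions for illustration, not the module's verbatim source.

import torch

class StdUpdater:
    def __init__(
        self,
        std_list: list[float],
        reward_key: str,
        reward_threshold: float = 0.8,
        reward_hist: int = 128,
        step_threshold_down: int = 32*1200,
        step_threshold_up: int = 32*300,
    ):
        # Configuration
        self.std_list = std_list
        self.reward_key = reward_key
        self.reward_threshold = reward_threshold
        self.step_threshold_down = step_threshold_down
        self.step_threshold_up = step_threshold_up
        # Curriculum state: begin at the easiest level (index 0 of std_list)
        self.level = 0
        self.step_num_last_change = 0
        # Circular buffer of the most recent weight-normalized episode rewards
        self.reward_buf = torch.zeros(reward_hist)
        self.reward_pos_id = 0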
⚡ Core Method: update()
def update(self, env: ManagerBasedRLEnv) -> float
Purpose: Updates curriculum level based on current performance and returns appropriate difficulty parameter.
Parameters
- `env` (ManagerBasedRLEnv): The training environment instance
Returns
- float: Current difficulty parameter (`std_list[level]`)
Algorithm Flow
- 📊 Performance Monitoring
  # Extract current reward performance
  reward = env.extras["log"][f"Episode_Reward/{self.reward_key}"]
  reward_weight = env.reward_manager._term_cfgs[...].weight
  # Normalize by the reward term's weight
  normalized_reward = reward / reward_weight
  # Update the rolling history and its average
  self.reward_buf[self.reward_pos_id] = normalized_reward
  self.reward_pos_id = (self.reward_pos_id + 1) % self.reward_buf.numel()
  average_reward = torch.mean(self.reward_buf).item()
- 🎯 Curriculum Decision Logic
  # Increase difficulty (level up)
  if (steps_since_change > step_threshold_up
          and average_reward > reward_threshold
          and level < max_level):
      level += 1
  # Decrease difficulty (level down)
  elif (steps_since_change > step_threshold_down
          and average_reward < reward_threshold
          and level > 0):
      level -= 1
- 🔄 Parameter Update
  # Return the current difficulty parameter
  return self.std_list[self.level]
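Putting the three steps together, and continuing the constructor sketch above, update() could be implemented roughly as follows. This is an illustrative reconstruction rather than the module's verbatim source; in particular, the term-index lookup via active_terms, the use of env.common_step_counter as the step clock, and the buffer handling are assumptions.

def update(self, env: ManagerBasedRLEnv) -> float:
    # 1. Performance monitoring: read the logged episode reward for the monitored term
    reward = env.extras["log"][f"Episode_Reward/{self.reward_key}"]
    term_idx = env.reward_manager.active_terms.index(self.reward_key)  # assumed term-index lookup
    reward_weight = env.reward_manager._term_cfgs[term_idx].weight
    normalized_reward = reward / reward_weight

    # Push into the circular history and recompute the rolling average
    self.reward_buf[self.reward_pos_id] = normalized_reward
    self.reward_pos_id = (self.reward_pos_id + 1) % self.reward_buf.numel()
    average_reward = torch.mean(self.reward_buf).item()

    # 2. Curriculum decision with hysteresis: a level change needs both a performance
    #    condition and a minimum number of steps since the previous change
    steps_since_change = env.common_step_counter - self.step_num_last_change
    if (steps_since_change > self.step_threshold_up
            and average_reward > self.reward_threshold
            and self.level < len(self.std_list) - 1):
        self.level += 1
        self.step_num_last_change = env.common_step_counter
    elif (steps_since_change > self.step_threshold_down
            and average_reward < self.reward_threshold
            and self.level > 0):
        self.level -= 1
        self.step_num_last_change = env.common_step_counter

    # 3. Parameter update: return the std for the current curriculum level
    return self.std_list[self.level]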
🎪 Usage Example
1. Setup Curriculum Controller
from GBC.gyms.isaaclab_45.lab_tasks.curriculum_utils import StdUpdater
# Create curriculum controller for linear velocity tracking
lin_vel_std_updater = StdUpdater(
    std_list=[0.35, 0.34, 0.33, 0.25, 0.2],  # Easy → Hard progression
    reward_threshold=0.7,                     # 70% performance threshold
    reward_key="track_lin_vel_xy_exp"        # Target reward term
)
2. Integrate with Reward Function
import torch
# The import paths below assume a recent Isaac Lab layout; older releases use the omni.isaac.lab.* namespace
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils.math import quat_rotate_inverse, yaw_quat

def track_lin_vel_xy_yaw_frame_exp_custom(
    env, std: float, command_name: str, asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")
) -> torch.Tensor:
    """Adaptive linear velocity tracking with curriculum learning."""
    
    # 🎯 Get adaptive difficulty parameter
    adaptive_std = lin_vel_std_updater.update(env)
    
    # Extract robot data
    asset = env.scene[asset_cfg.name]
    vel_yaw = quat_rotate_inverse(yaw_quat(asset.data.root_quat_w), asset.data.root_lin_vel_w[:, :3])
    
    # Compute tracking error
    lin_vel_error = torch.sum(
        torch.square(env.command_manager.get_command(command_name)[:, :2] - vel_yaw[:, :2]), dim=1
    )
    
    # Return adaptive reward using curriculum-adjusted standard deviation
    return torch.exp(-lin_vel_error / adaptive_std**2)
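3. Register the Reward Term
To use the adaptive reward during training, register it like any other reward term in the environment configuration. The sketch below assumes the standard Isaac Lab RewardTermCfg/configclass API (import paths differ between Isaac Lab versions) and a command term named "base_velocity"; both are assumptions. The attribute name of the term should match the reward_key monitored by the StdUpdater (here track_lin_vel_xy_exp), since that name determines the Episode_Reward/... log key.

# Import paths assume a recent Isaac Lab layout; older versions use the omni.isaac.lab.* namespace
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.utils import configclass

@configclass
class RewardsCfg:
    # Attribute name becomes the reward term name, which is what StdUpdater monitors
    track_lin_vel_xy_exp = RewTerm(
        func=track_lin_vel_xy_yaw_frame_exp_custom,
        weight=1.0,                       # weight used for normalization inside StdUpdater
        params={
            "std": 0.35,                  # static fallback; superseded by the curriculum at runtime
            "command_name": "base_velocity",
        },
    )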
🎯 Curriculum Progression Strategy
📈 Difficulty Levels
std_list = [0.35, 0.34, 0.33, 0.25, 0.2]
#           Easy ←─────────────────→ Hard
- Level 0 (std=0.35): Very forgiving, large tolerance for errors
- Level 1 (std=0.34): Slightly tighter tolerance
- Level 2 (std=0.33): Moderate precision required
- Level 3 (std=0.25): High precision demanded
- Level 4 (std=0.2): Expert-level accuracy required
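The effect of the schedule is easiest to see by evaluating the exponential tracking kernel from the reward example above at a fixed tracking error; the error value used here is purely illustrative.

import torch

lin_vel_error = torch.tensor(0.1)   # fixed squared tracking error (illustrative)

for std in [0.35, 0.34, 0.33, 0.25, 0.2]:
    reward = torch.exp(-lin_vel_error / std**2)
    print(f"std={std:.2f} -> reward={reward.item():.3f}")

# std=0.35 gives ≈0.44 while std=0.20 gives ≈0.08 for the same error,
# so later curriculum levels demand a much smaller tracking error for full reward.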
⏰ Timing Controls
🔺 Level Up Conditions:
- Performance > `reward_threshold` (e.g., 70%)
- At least `step_threshold_up` steps since the last change (9,600 steps by default)
- Not already at maximum difficulty
🔻 Level Down Conditions:
- Performance < `reward_threshold`
- At least `step_threshold_down` steps since the last change (38,400 steps by default)
- Not already at minimum difficulty
📊 Performance Evaluation
- Rolling Average: Uses the last reward_hist episodes (default: 128) for stable performance assessment
- Weight Normalization: Accounts for reward term weights in the reward manager
- Hysteresis: Different thresholds for level up vs. level down prevent oscillation
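A self-contained illustration of the circular-buffer averaging is shown below; the buffer size and reward values are arbitrary, and the buffer starts zero-filled as in the constructor sketch above, which biases the average low until the history fills.

import torch

reward_hist = 8                        # small buffer for illustration; the default is 128
reward_buf = torch.zeros(reward_hist)  # zero-initialized history
pos = 0

# Noisy per-episode (weight-normalized) rewards around ~0.75
episode_rewards = [0.9, 0.6, 0.8, 0.7, 0.85, 0.65, 0.8, 0.7, 0.75, 0.8]

for r in episode_rewards:
    reward_buf[pos] = r
    pos = (pos + 1) % reward_hist      # advance the circular write position
    print(f"episode reward={r:.2f}, rolling average={reward_buf.mean().item():.3f}")

# The averaged signal fluctuates far less than individual episode rewards;
# it is this average that gets compared against reward_threshold.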
🎨 Advanced Usage Patterns
🔀 Multiple Curriculum Controllers
# Different curriculum schedules for different skills
position_curriculum = StdUpdater([0.5, 0.3, 0.1], "position_tracking", 0.75)
velocity_curriculum = StdUpdater([0.8, 0.5, 0.2], "velocity_tracking", 0.70)
balance_curriculum = StdUpdater([0.4, 0.2, 0.1], "balance_control", 0.80)
def adaptive_reward_function(env, ...):
    pos_std = position_curriculum.update(env)
    vel_std = velocity_curriculum.update(env)  
    bal_std = balance_curriculum.update(env)
    # Use adaptive parameters in reward computation
📈 Custom Progression Curves
# Exponential difficulty progression
exponential_curriculum = StdUpdater(
    std_list=[1.0, 0.5, 0.25, 0.125, 0.0625],  # Each level halves tolerance
    reward_key="precision_task"
)
# Linear difficulty progression  
linear_curriculum = StdUpdater(
    std_list=[0.5, 0.4, 0.3, 0.2, 0.1],        # Linear decrease
    reward_key="tracking_task"
)
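Schedules like these can also be generated programmatically rather than written by hand. The helper below is a hypothetical convenience function, not part of the module:

import numpy as np

def make_std_list(start: float, end: float, levels: int, curve: str = "linear") -> list[float]:
    """Generate a difficulty schedule from easiest (start) to hardest (end)."""
    if curve == "linear":
        values = np.linspace(start, end, levels)      # constant additive steps
    elif curve == "exponential":
        values = np.geomspace(start, end, levels)     # constant multiplicative steps
    else:
        raise ValueError(f"unknown curve: {curve}")
    return [float(v) for v in values]

# Equivalent to the hand-written lists above (up to rounding)
linear_stds = make_std_list(0.5, 0.1, 5, curve="linear")               # [0.5, 0.4, 0.3, 0.2, 0.1]
exponential_stds = make_std_list(1.0, 0.0625, 5, curve="exponential")  # [1.0, 0.5, 0.25, 0.125, 0.0625]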
💡 Design Benefits
🎯 Training Efficiency
- Guided Learning: Prevents overwhelming the agent with impossible tasks
- Stable Progression: Hysteresis prevents curriculum oscillation
- Performance-Based: Adapts to actual learning rather than fixed schedules
🔧 Flexibility
- Configurable Thresholds: Adjustable performance and timing criteria
- Multiple Metrics: Can monitor different reward terms independently
- Easy Integration: Simple drop-in replacement for static parameters
📊 Robustness
- Statistical Stability: Rolling averages reduce noise in curriculum decisions
- Fallback Protection: Automatic difficulty reduction if performance degrades
- Monitoring: Built-in logging for curriculum progression analysis
🔗 Integration Points
- Reward Functions: Primary integration point for adaptive difficulty
- Training Loops: Automatic curriculum progression during training
- Environment Manager: Accesses reward and step information
- Logging Systems: Tracks curriculum progression for analysis