curriculum_utils

Module: GBC.gyms.isaaclab_45.lab_tasks.curriculum_utils

🎯 Overview

Curriculum learning 📈 utilities that implement adaptive difficulty progression based on training performance. The module provides intelligent curriculum management that automatically adjusts task difficulty by monitoring reward function values and updating learning parameters accordingly.

🧠 Core Concept

Curriculum learning gradually increases task difficulty as the agent improves, leading to more stable and efficient training. The StdUpdater class monitors specific reward metrics and adjusts difficulty parameters (typically standard deviations in reward functions) to maintain optimal learning challenge.
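
The reason a standard deviation works as a difficulty knob is that tracking rewards of the kind shown later on this page use an exponential kernel, reward = exp(-error² / std²): a large std still pays out for imprecise tracking, while a small std only rewards near-perfect behavior. The snippet below is a minimal, self-contained illustration of that effect (the error values are made up for demonstration):

import torch

# Hypothetical squared tracking errors for a batch of three environments
squared_error = torch.tensor([0.01, 0.05, 0.10])

for std in (0.35, 0.2):  # an easy vs. a hard curriculum level
    reward = torch.exp(-squared_error / std**2)
    print(f"std={std}: rewards={[round(r, 3) for r in reward.tolist()]}")

# A wide std keeps the reward high even for sloppy tracking, while a narrow
# std only rewards small errors, which is what makes the task effectively harder.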

🏆 Class: StdUpdater

Purpose: Implements adaptive curriculum learning by monitoring reward performance and updating difficulty parameters in real-time.

📦 Constructor

def __init__(
    self,
    std_list: list[float],
    reward_key: str,
    reward_threshold: float = 0.8,
    reward_hist: int = 128,
    step_threshold_down: int = 32 * 1200,
    step_threshold_up: int = 32 * 300
)

Parameters

  • std_list (list[float]): List of difficulty levels (typically standard deviations) from easiest to hardest
  • reward_key (str): Name of the reward term to monitor for curriculum progression
  • reward_threshold (float): Performance threshold for advancing curriculum (default: 0.8)
  • reward_hist (int): Number of episodes to average for performance evaluation (default: 128)
  • step_threshold_down (int): Minimum steps before reducing difficulty (default: 32*1200 = 38,400)
  • step_threshold_up (int): Minimum steps before increasing difficulty (default: 32*300 = 9,600)

State Variables

  • level (int): Current curriculum level (index in std_list)
  • step_num_last_change (int): Step count when curriculum last changed
  • reward_buf (torch.Tensor): Rolling buffer for reward history
  • reward_pos_id (int): Current position in circular reward buffer
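
For orientation, the state above could be initialized in the constructor roughly as sketched below. This is an assumption about the implementation, not the verbatim source (torch is assumed to be imported at module level):

def __init__(self, std_list, reward_key, reward_threshold=0.8, reward_hist=128,
             step_threshold_down=32 * 1200, step_threshold_up=32 * 300):
    # Store the configuration
    self.std_list = std_list
    self.reward_key = reward_key
    self.reward_threshold = reward_threshold
    self.reward_hist = reward_hist
    self.step_threshold_down = step_threshold_down
    self.step_threshold_up = step_threshold_up

    # Curriculum state: start at the easiest level with an empty reward history
    self.level = 0                              # index into std_list
    self.step_num_last_change = 0               # step count at the last level change
    self.reward_buf = torch.zeros(reward_hist)  # rolling reward history
    self.reward_pos_id = 0                      # write position in the circular buffer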

Core Method: update()

def update(self, env: ManagerBasedRLEnv) -> float

Purpose: Updates curriculum level based on current performance and returns appropriate difficulty parameter.

Parameters

  • env (ManagerBasedRLEnv): The training environment instance

Returns

  • float: Current difficulty parameter (from std_list[level])

Algorithm Flow

  1. 📊 Performance Monitoring

    # Extract current reward performance
    reward = env.extras["log"][f"Episode_Reward/{self.reward_key}"]
    reward_weight = env.reward_manager._term_cfgs[...].weight

    # Normalize by the reward term's weight
    normalized_reward = reward / reward_weight

    # Write the normalized value into the circular buffer and update the rolling average
    self.reward_buf[self.reward_pos_id] = normalized_reward
    self.reward_pos_id = (self.reward_pos_id + 1) % self.reward_hist
    average_reward = torch.mean(self.reward_buf).item()
  2. 🎯 Curriculum Decision Logic

    # Increase difficulty (level up)
    if (steps_since_change > step_threshold_up and
            average_reward > reward_threshold and
            level < max_level):
        level += 1

    # Decrease difficulty (level down)
    elif (steps_since_change > step_threshold_down and
            average_reward < reward_threshold and
            level > 0):
        level -= 1
  3. 🔄 Parameter Update

    return self.std_list[self.level]  # Return current difficulty parameter
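
Putting the three steps together, the full update() can be pictured roughly as in the sketch below. Treat it as an illustration rather than the verbatim implementation: in particular, env.common_step_counter and reward_manager.get_term_cfg() are assumptions about how the global step count and the term weight are obtained from the Isaac Lab environment.

def update(self, env) -> float:
    # 1. Performance monitoring: read the logged episode reward for the monitored
    #    term and normalize it by the term's configured weight.
    reward = env.extras["log"][f"Episode_Reward/{self.reward_key}"]
    weight = env.reward_manager.get_term_cfg(self.reward_key).weight  # assumed accessor
    self.reward_buf[self.reward_pos_id] = reward / weight
    self.reward_pos_id = (self.reward_pos_id + 1) % self.reward_hist
    average_reward = torch.mean(self.reward_buf).item()

    # 2. Curriculum decision logic with per-direction step thresholds (hysteresis)
    current_step = env.common_step_counter  # assumed global step counter
    steps_since_change = current_step - self.step_num_last_change
    if (steps_since_change > self.step_threshold_up
            and average_reward > self.reward_threshold
            and self.level < len(self.std_list) - 1):
        self.level += 1
        self.step_num_last_change = current_step
    elif (steps_since_change > self.step_threshold_down
            and average_reward < self.reward_threshold
            and self.level > 0):
        self.level -= 1
        self.step_num_last_change = current_step

    # 3. Parameter update: return the difficulty parameter for the current level
    return self.std_list[self.level]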

🎪 Usage Example

1. Setup Curriculum Controller

from GBC.gyms.isaaclab_45.lab_tasks.curriculum_utils import StdUpdater

# Create curriculum controller for linear velocity tracking
lin_vel_std_updater = StdUpdater(
    std_list=[0.35, 0.34, 0.33, 0.25, 0.2],  # Easy → Hard progression
    reward_threshold=0.7,                    # 70% performance threshold
    reward_key="track_lin_vel_xy_exp"        # Target reward term
)

2. Integrate with Reward Function

def track_lin_vel_xy_yaw_frame_exp_custom(
    env, std: float, command_name: str, asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")
) -> torch.Tensor:
    """Adaptive linear velocity tracking with curriculum learning."""

    # 🎯 Get adaptive difficulty parameter
    adaptive_std = lin_vel_std_updater.update(env)

    # Extract robot data
    asset = env.scene[asset_cfg.name]
    vel_yaw = quat_rotate_inverse(yaw_quat(asset.data.root_quat_w), asset.data.root_lin_vel_w[:, :3])

    # Compute tracking error
    lin_vel_error = torch.sum(
        torch.square(env.command_manager.get_command(command_name)[:, :2] - vel_yaw[:, :2]), dim=1
    )

    # Return adaptive reward using curriculum-adjusted standard deviation
    return torch.exp(-lin_vel_error / adaptive_std**2)
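
To take effect during training, the custom function still has to be registered as a reward term in the task's manager-based config. A sketch of what that could look like is shown below; the import paths follow recent Isaac Lab releases, and the weight, command name, and placeholder std are assumptions to adapt to your own task config:

from isaaclab.managers import RewardTermCfg as RewTerm   # import path may differ by Isaac Lab version
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils import configclass

@configclass
class RewardsCfg:
    # The std passed in params is only a placeholder: the curriculum controller
    # overrides it on every call through lin_vel_std_updater.update(env).
    track_lin_vel_xy_exp = RewTerm(
        func=track_lin_vel_xy_yaw_frame_exp_custom,
        weight=1.0,                              # assumed weight
        params={
            "std": 0.35,                         # placeholder, superseded by the curriculum
            "command_name": "base_velocity",     # assumed command term name
            "asset_cfg": SceneEntityCfg("robot"),
        },
    )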

🎯 Curriculum Progression Strategy

📈 Difficulty Levels

std_list = [0.35, 0.34, 0.33, 0.25, 0.2]
# Easy ←─────────────────→ Hard
  • Level 0 (std=0.35): Very forgiving, large tolerance for errors
  • Level 1 (std=0.34): Slightly tighter tolerance
  • Level 2 (std=0.33): Moderate precision required
  • Level 3 (std=0.25): High precision demanded
  • Level 4 (std=0.2): Expert-level accuracy required
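
To get a concrete sense of the progression, the short script below evaluates the exponential tracking reward for one fixed (hypothetical) squared error at every level:

import math

squared_error = 0.02  # hypothetical squared tracking error, for illustration only
for level, std in enumerate([0.35, 0.34, 0.33, 0.25, 0.2]):
    reward = math.exp(-squared_error / std**2)
    print(f"Level {level} (std={std}): reward = {reward:.3f}")

# The same error earns roughly 0.85 at level 0 but only about 0.61 at level 4,
# so the agent must track more precisely to keep clearing the reward threshold.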

⏰ Timing Controls

🔺 Level Up Conditions:

  • Performance > reward_threshold (e.g., 70%)
  • At least step_threshold_up steps since last change (9,600 steps)
  • Not already at maximum difficulty

🔻 Level Down Conditions:

  • Performance < reward_threshold
  • At least step_threshold_down steps since last change (38,400 steps)
  • Not already at minimum difficulty

📊 Performance Evaluation

  • Rolling Average: Uses reward_hist episodes (default: 128) for a stable performance estimate
  • Weight Normalization: Divides the logged reward by the term's configured weight before comparing it with reward_threshold (see the worked numbers below)
  • Hysteresis: Different step thresholds for level up (9,600 steps) vs. level down (38,400 steps) prevent rapid oscillation between levels
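
As a quick illustration of the weight-normalization step (all numbers hypothetical): the logged episode reward already includes the term's weight, so the weight is divided out before the comparison against reward_threshold.

logged_episode_reward = 1.26   # hypothetical value read from env.extras["log"]
term_weight = 1.5              # hypothetical weight configured in the reward manager

normalized = logged_episode_reward / term_weight   # 0.84
print(normalized > 0.8)        # True -> this sample counts toward a level-up decision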

🎨 Advanced Usage Patterns

🔀 Multiple Curriculum Controllers

# Different curriculum schedules for different skills
position_curriculum = StdUpdater([0.5, 0.3, 0.1], "position_tracking", 0.75)
velocity_curriculum = StdUpdater([0.8, 0.5, 0.2], "velocity_tracking", 0.70)
balance_curriculum = StdUpdater([0.4, 0.2, 0.1], "balance_control", 0.80)

def adaptive_reward_function(env, ...):
    pos_std = position_curriculum.update(env)
    vel_std = velocity_curriculum.update(env)
    bal_std = balance_curriculum.update(env)
    # Use the adaptive parameters in the reward computation
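
Because each StdUpdater instance keeps its own level, rolling reward buffer, and last-change step counter, the three skills above progress (or fall back) independently of one another.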

📈 Custom Progression Curves

# Exponential difficulty progression
exponential_curriculum = StdUpdater(
    std_list=[1.0, 0.5, 0.25, 0.125, 0.0625],  # Each level halves the tolerance
    reward_key="precision_task"
)

# Linear difficulty progression
linear_curriculum = StdUpdater(
    std_list=[0.5, 0.4, 0.3, 0.2, 0.1],  # Linear decrease
    reward_key="tracking_task"
)

💡 Design Benefits

🎯 Training Efficiency

  • Guided Learning: Prevents overwhelming the agent with impossible tasks
  • Stable Progression: Hysteresis prevents curriculum oscillation
  • Performance-Based: Adapts to actual learning rather than fixed schedules

🔧 Flexibility

  • Configurable Thresholds: Adjustable performance and timing criteria
  • Multiple Metrics: Can monitor different reward terms independently
  • Easy Integration: Simple drop-in replacement for static parameters

📊 Robustness

  • Statistical Stability: Rolling averages reduce noise in curriculum decisions
  • Fallback Protection: Automatic difficulty reduction if performance degrades
  • Monitoring: Built-in logging for curriculum progression analysis

🔗 Integration Points

  • Reward Functions: Primary integration point for adaptive difficulty
  • Training Loops: Automatic curriculum progression during training
  • Environment Manager: Accesses reward and step information
  • Logging Systems: Tracks curriculum progression for analysis