curriculum_utils
Module: GBC.gyms.isaaclab_45.lab_tasks.curriculum_utils
🎯 Overview
Curriculum learning 📈 utilities that implement adaptive difficulty progression based on training performance. The module automatically adjusts task difficulty by monitoring reward term values and updating the corresponding learning parameters.
🧠 Core Concept
Curriculum learning gradually increases task difficulty as the agent improves, leading to more stable and efficient training. The StdUpdater class monitors a specific reward term and adjusts a difficulty parameter (typically the standard deviation in an exponential reward function) to maintain an optimal learning challenge.
🏆 Class: StdUpdater
Purpose: Implements adaptive curriculum learning by monitoring reward performance and updating difficulty parameters in real-time.
📦 Constructor
def __init__(
    self,
    std_list: list[float],
    reward_key: str,
    reward_threshold: float = 0.8,
    reward_hist: int = 128,
    step_threshold_down: int = 32*1200,
    step_threshold_up: int = 32*300
)
Parameters
- std_list (list[float]): List of difficulty levels (typically standard deviations), ordered from easiest to hardest
- reward_key (str): Name of the reward term to monitor for curriculum progression
- reward_threshold (float): Performance threshold for advancing the curriculum (default: 0.8)
- reward_hist (int): Number of episodes averaged for performance evaluation (default: 128)
- step_threshold_down (int): Minimum steps before reducing difficulty (default: 32*1200 = 38,400)
- step_threshold_up (int): Minimum steps before increasing difficulty (default: 32*300 = 9,600)
State Variables
- level (int): Current curriculum level (index into std_list)
- step_num_last_change (int): Step count when the curriculum last changed
- reward_buf (torch.Tensor): Rolling buffer of recent reward values
- reward_pos_id (int): Current write position in the circular reward buffer
⚡ Core Method: update()
def update(self, env: ManagerBasedRLEnv) -> float
Purpose: Updates curriculum level based on current performance and returns appropriate difficulty parameter.
Parameters
- env (ManagerBasedRLEnv): The training environment instance

Returns
- float: Current difficulty parameter (std_list[level])
Algorithm Flow
1. 📊 Performance Monitoring

# Extract the latest episodic reward for the monitored term
reward = env.extras["log"][f"Episode_Reward/{self.reward_key}"]
reward_weight = env.reward_manager._term_cfgs[...].weight
# Normalize by the reward weight so the threshold is weight-independent
normalized_reward = reward / reward_weight
# Update the circular buffer and its rolling average
self.reward_buf[self.reward_pos_id] = normalized_reward
self.reward_pos_id = (self.reward_pos_id + 1) % self.reward_hist
average_reward = torch.mean(self.reward_buf).item()

2. 🎯 Curriculum Decision Logic

# Increase difficulty (level up)
if (steps_since_change > step_threshold_up and
        average_reward > reward_threshold and
        level < max_level):
    level += 1
# Decrease difficulty (level down)
elif (steps_since_change > step_threshold_down and
        average_reward < reward_threshold and
        level > 0):
    level -= 1

3. 🔄 Parameter Update

return self.std_list[self.level]  # Return current difficulty parameter
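Putting the three steps together, an update() method for the hypothetical StdUpdaterSketch class above could look like the following. How the global step count and the term weight are obtained (common_step_counter and get_term_cfg here) are assumptions about the Isaac Lab API, not something the module documents.
# Continues the StdUpdaterSketch class from the constructor sketch above.
    def update(self, env) -> float:
        # 1. Performance monitoring: read the latest episodic reward for the term
        log = env.extras.get("log", {})
        key = f"Episode_Reward/{self.reward_key}"
        if key in log:
            # Normalize by the term weight (assumed accessor: get_term_cfg)
            weight = env.reward_manager.get_term_cfg(self.reward_key).weight
            self.reward_buf[self.reward_pos_id] = float(log[key]) / weight
            self.reward_pos_id = (self.reward_pos_id + 1) % self.reward_hist
        average_reward = torch.mean(self.reward_buf).item()

        # 2. Curriculum decision with hysteresis on step counts
        steps_since_change = env.common_step_counter - self.step_num_last_change
        if (steps_since_change > self.step_threshold_up
                and average_reward > self.reward_threshold
                and self.level < len(self.std_list) - 1):
            self.level += 1
            self.step_num_last_change = env.common_step_counter
        elif (steps_since_change > self.step_threshold_down
                and average_reward < self.reward_threshold
                and self.level > 0):
            self.level -= 1
            self.step_num_last_change = env.common_step_counter

        # 3. Return the current difficulty parameter
        return self.std_list[self.level]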
🎪 Usage Example
1. Setup Curriculum Controller
from GBC.gyms.isaaclab_45.lab_tasks.curriculum_utils import StdUpdater
# Create curriculum controller for linear velocity tracking
lin_vel_std_updater = StdUpdater(
    std_list=[0.35, 0.34, 0.33, 0.25, 0.2],  # Easy → Hard progression
    reward_threshold=0.7,                    # 70% performance threshold
    reward_key="track_lin_vel_xy_exp"        # Target reward term
)
2. Integrate with Reward Function
import torch
# Imports below assume the Isaac Lab 2.x package layout used by isaaclab_45 tasks
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils.math import quat_rotate_inverse, yaw_quat

def track_lin_vel_xy_yaw_frame_exp_custom(
    env: ManagerBasedRLEnv, std: float, command_name: str, asset_cfg: SceneEntityCfg = SceneEntityCfg("robot")
) -> torch.Tensor:
    """Adaptive linear velocity tracking with curriculum learning."""
    # 🎯 Get adaptive difficulty parameter (overrides the static std argument)
    adaptive_std = lin_vel_std_updater.update(env)
    # Extract robot data
    asset = env.scene[asset_cfg.name]
    vel_yaw = quat_rotate_inverse(yaw_quat(asset.data.root_quat_w), asset.data.root_lin_vel_w[:, :3])
    # Compute tracking error
    lin_vel_error = torch.sum(
        torch.square(env.command_manager.get_command(command_name)[:, :2] - vel_yaw[:, :2]), dim=1
    )
    # Return adaptive reward using the curriculum-adjusted standard deviation
    return torch.exp(-lin_vel_error / adaptive_std**2)
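One way to wire this function into the reward manager is through the environment's rewards configuration. The snippet below is a hedged sketch: the RewardTermCfg/configclass imports assume the Isaac Lab 2.x API, and the term name, weight, and command name ("base_velocity") are illustrative choices, not values from the module.
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils import configclass

@configclass
class RewardsCfg:
    # Term name matches the reward_key monitored by the curriculum controller
    track_lin_vel_xy_exp = RewTerm(
        func=track_lin_vel_xy_yaw_frame_exp_custom,
        weight=1.0,
        params={
            "std": 0.35,  # baseline value; overridden by the curriculum at runtime
            "command_name": "base_velocity",
            "asset_cfg": SceneEntityCfg("robot"),
        },
    )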
🎯 Curriculum Progression Strategy
📈 Difficulty Levels
std_list = [0.35, 0.34, 0.33, 0.25, 0.2]
# Easy ←─────────────────→ Hard
- Level 0 (std=0.35): Very forgiving, large tolerance for errors
- Level 1 (std=0.34): Slightly tighter tolerance
- Level 2 (std=0.33): Moderate precision required
- Level 3 (std=0.25): High precision demanded
- Level 4 (std=0.2): Expert-level accuracy required
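To make the progression concrete, the short sketch below evaluates the exponential tracking reward exp(-error / std**2) at a fixed squared velocity error of 0.04 for each level (illustrative numbers only):
import math

squared_error = 0.2 ** 2  # fixed squared tracking error of 0.04
for level, std in enumerate([0.35, 0.34, 0.33, 0.25, 0.2]):
    reward = math.exp(-squared_error / std ** 2)
    print(f"level {level}: std={std:.2f} -> reward={reward:.3f}")
# level 0: std=0.35 -> reward≈0.72 ... level 4: std=0.20 -> reward≈0.37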
⏰ Timing Controls
🔺 Level Up Conditions:
- Performance > reward_threshold (e.g., 70%)
- At least step_threshold_up steps since the last change (default: 9,600 steps)
- Not already at maximum difficulty
🔻 Level Down Conditions:
- Performance < reward_threshold
- At least step_threshold_down steps since the last change (default: 38,400 steps)
- Not already at minimum difficulty
📊 Performance Evaluation
- Rolling Average: Uses reward_hist episodes (default: 128) for a stable performance estimate
- Weight Normalization: Divides the logged reward by the term's weight in the reward manager
- Hysteresis: Different step thresholds for level up vs. level down prevent oscillation
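A standalone sketch of the rolling-average bookkeeping (the RollingMean helper is hypothetical, not part of the module):
import torch

class RollingMean:
    """Fixed-size circular buffer tracking the mean of recent values."""
    def __init__(self, hist: int = 128):
        self.buf = torch.zeros(hist)
        self.pos = 0

    def push(self, value: float) -> float:
        self.buf[self.pos] = value
        self.pos = (self.pos + 1) % len(self.buf)
        # Note: until the buffer fills once, zero-initialized slots bias the mean downward
        return self.buf.mean().item()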
🎨 Advanced Usage Patterns
🔀 Multiple Curriculum Controllers
# Different curriculum schedules for different skills
position_curriculum = StdUpdater([0.5, 0.3, 0.1], "position_tracking", 0.75)
velocity_curriculum = StdUpdater([0.8, 0.5, 0.2], "velocity_tracking", 0.70)
balance_curriculum = StdUpdater([0.4, 0.2, 0.1], "balance_control", 0.80)
def adaptive_reward_function(env, ...):
    pos_std = position_curriculum.update(env)
    vel_std = velocity_curriculum.update(env)
    bal_std = balance_curriculum.update(env)
    # Use adaptive parameters in reward computation
📈 Custom Progression Curves
# Exponential difficulty progression
exponential_curriculum = StdUpdater(
    std_list=[1.0, 0.5, 0.25, 0.125, 0.0625],  # Each level halves the tolerance
    reward_key="precision_task"
)
# Linear difficulty progression
linear_curriculum = StdUpdater(
    std_list=[0.5, 0.4, 0.3, 0.2, 0.1],  # Linear decrease
    reward_key="tracking_task"
)
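Such schedules can also be generated programmatically; the helpers below (geometric_stds and linear_stds) are hypothetical conveniences, not part of the module:
def geometric_stds(start: float, ratio: float, levels: int) -> list[float]:
    # Geometric (exponential) progression: each level multiplies the tolerance by `ratio`
    return [start * ratio ** i for i in range(levels)]

def linear_stds(start: float, stop: float, levels: int) -> list[float]:
    # Evenly spaced progression from `start` down to `stop`
    step = (start - stop) / (levels - 1)
    return [round(start - i * step, 4) for i in range(levels)]

geometric_stds(1.0, 0.5, 5)  # [1.0, 0.5, 0.25, 0.125, 0.0625]
linear_stds(0.5, 0.1, 5)     # [0.5, 0.4, 0.3, 0.2, 0.1]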
💡 Design Benefits
🎯 Training Efficiency
- Guided Learning: Prevents overwhelming the agent with impossible tasks
- Stable Progression: Hysteresis prevents curriculum oscillation
- Performance-Based: Adapts to actual learning rather than fixed schedules
🔧 Flexibility
- Configurable Thresholds: Adjustable performance and timing criteria
- Multiple Metrics: Can monitor different reward terms independently
- Easy Integration: Simple drop-in replacement for static parameters
📊 Robustness
- Statistical Stability: Rolling averages reduce noise in curriculum decisions
- Fallback Protection: Automatic difficulty reduction if performance degrades
- Monitoring: Built-in logging for curriculum progression analysis
🔗 Integration Points
- Reward Functions: Primary integration point for adaptive difficulty
- Training Loops: Automatic curriculum progression during training
- Environment Manager: Accesses reward and step information
- Logging Systems: Tracks curriculum progression for analysis