ref_command_manager
Module: GBC.gyms.isaaclab_45.managers.ref_command_manager
This module implements a hybrid command management system that switches between reinforcement learning (RL) and imitation learning (IL) modes. The ReferenceCommandManager extends IsaacLab's standard CommandManager and overrides its commands with reference velocities whenever valid reference data is available, so agents can alternate between imitating reference motions and exploring freely under randomly generated commands.
🎯 Core Functionality
The ReferenceCommandManager
provides:
- Dual-Mode Operation: Automatic switching between RL (random commands) and IL (reference-based commands) modes
- Intelligent Override: Reference velocities override random commands when available and valid (see the sketch after this list)
- Seamless Fallback: Graceful fallback to standard RL commands when reference data is unavailable
- Temporal Synchronization: Real-time integration with reference observation manager for consistent command generation
- Training Flexibility: Enables mixed training scenarios with both imitation and exploration phases
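Conceptually, each environment's final command is a per-environment selection between the reference velocity (when its validity mask is set) and the randomly sampled RL command. The sketch below illustrates this selection with made-up tensors; the actual logic lives in get_command and is documented later on this page.
import torch

# Illustrative shapes only: 4 environments, planar command [lin_x, lin_y, ang_z]
rl_command = torch.tensor([[0.5, 0.0, 0.2],
                           [1.0, 0.1, 0.0],
                           [0.0, 0.0, 0.5],
                           [0.3, 0.2, 0.1]])
ref_command = torch.tensor([[0.8, 0.0, 0.0],
                            [0.9, 0.0, 0.1],
                            [0.7, 0.1, 0.0],
                            [0.6, 0.0, 0.2]])
ref_valid = torch.tensor([True, False, True, False])  # per-env reference validity mask

# IL command where reference data is valid, RL command everywhere else
final_command = torch.where(ref_valid.unsqueeze(1), ref_command, rl_command)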
📚 Dependencies
from __future__ import annotations
import inspect
import torch
import weakref
from abc import abstractmethod
from collections.abc import Sequence
from prettytable import PrettyTable
from typing import TYPE_CHECKING
from isaaclab.managers.command_manager import CommandManager
if TYPE_CHECKING:
    from isaaclab.envs import ManagerBasedRLEnv
🏭 ReferenceCommandManager Class
Module Name: GBC.gyms.isaaclab_45.managers.ref_command_manager.ReferenceCommandManager
Class Definition:
class ReferenceCommandManager(CommandManager):
    """Manages the reference commands for a reference observation manager.

    Overrides the command manager's commands when reference data is available.
    """

    def __init__(self, cfg: object, env: ManagerBasedRLEnv):
        super().__init__(cfg, env)
📥 Initialization Parameters:
- cfg (object): Configuration object for command generation (inherits IsaacLab CommandManager configuration)
- env (ManagerBasedRLEnv): Reference-enabled RL environment instance (a minimal construction sketch follows this list)
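In normal use the environment builds this manager itself from env_cfg.commands; the snippet below is only a sketch of manual construction, assuming env is an already-initialized reference-enabled environment and env_cfg is its configuration object.
from GBC.gyms.isaaclab_45.managers import ReferenceCommandManager

# `env` and `env_cfg` are assumed to exist: a reference-enabled environment
# (exposing ref_observation_manager) and its configuration object.
command_manager = ReferenceCommandManager(cfg=env_cfg.commands, env=env)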
🔧 Inheritance Structure:
isaaclab.managers.CommandManager
↓ (extends)
GBC.gyms.isaaclab_45.managers.ReferenceCommandManager
🎯 Design Philosophy:
- Backward Compatibility: Fully compatible with existing IsaacLab command configurations
- Reference Integration: Seamless integration with RefObservationManager for reference data access
- Mode Transparency: Command switching is transparent to training algorithms and reward functions
🔧 Core Methods
📊 String Representation
Module Name: GBC.gyms.isaaclab_45.managers.ref_command_manager.ReferenceCommandManager.__str__
Method Signature:
def __str__(self) -> str:
🔧 Implementation:
def __str__(self):
    msg = super().__str__()
    return msg.replace("CommandManager", "ReferenceCommandManager")
🎯 Purpose: Provides clear identification of the reference-enabled command manager in logging and debugging output while preserving all inherited status information.
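For example, assuming env has already been constructed as a reference-enabled environment, printing the manager shows the substituted name:
# The inherited command-term table is preserved; only the class name changes.
print(env.command_manager)  # reports "ReferenceCommandManager" instead of "CommandManager"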
⚡ Intelligent Command Override
Module Name: GBC.gyms.isaaclab_45.managers.ref_command_manager.ReferenceCommandManager.get_command
Method Signature:
def get_command(self, name: str) -> torch.Tensor:
📥 Input Parameters:
- name (str): Command name to retrieve (e.g., "base_velocity")
📤 Return Values:
- torch.Tensor: Command tensor with potential reference velocity override
[num_envs, command_dim]
🔧 Advanced Override Logic:
Phase 1: Standard Command Generation
def get_command(self, name):
    # Generate standard RL command using parent class
    orig_command = super().get_command(name)
    # Only override base_velocity commands when a reference manager is available
    if hasattr(self._env, "ref_observation_manager") and name == "base_velocity":
        ...  # Proceed to reference override logic (Phase 2)
    else:
        # Return standard RL command for all other cases
        return orig_command
Phase 2: Reference Data Extraction
try:
    # Extract linear velocity from reference observations
    # Format: [num_envs, 3] - (x, y, z) linear velocities
    lin_vel, lin_mask = self._env.ref_observation_manager.get_term("base_lin_vel")
    # Extract angular velocity from reference observations
    # Format: [num_envs, 3] - (roll, pitch, yaw) angular velocities
    ang_vel, ang_mask = self._env.ref_observation_manager.get_term("base_ang_vel")
    # Handle missing mask data
    if lin_mask is None:
        lin_mask = torch.zeros_like(lin_vel[:, 0], dtype=torch.bool)
    # Construct reference command: [lin_x, lin_y, ang_z]
    # Note: Only uses planar motion components (x, y translation + z rotation)
    override_command = torch.cat([lin_vel[:, :2], ang_vel[:, 2:]], dim=1)
except Exception as e:
    # Graceful fallback on any error (missing terms, shape mismatches, etc.)
    return orig_command
Phase 3: Selective Command Override
# Apply mask-based selective override
# mask: [num_envs] boolean tensor indicating valid reference data
mask = lin_mask # Use linear velocity mask as primary validity indicator
# Override commands only where reference data is valid
orig_command[mask] = override_command[mask]
return orig_command
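Putting the three phases together, a consolidated sketch of the method reads roughly as follows (reconstructed from the snippets above; details such as variable names may differ from the actual implementation):
def get_command(self, name: str) -> torch.Tensor:
    # Phase 1: standard RL command from the parent CommandManager
    orig_command = super().get_command(name)
    if not (hasattr(self._env, "ref_observation_manager") and name == "base_velocity"):
        return orig_command
    # Phase 2: pull reference velocities from the reference observation manager
    try:
        lin_vel, lin_mask = self._env.ref_observation_manager.get_term("base_lin_vel")
        ang_vel, ang_mask = self._env.ref_observation_manager.get_term("base_ang_vel")
        if lin_mask is None:
            lin_mask = torch.zeros_like(lin_vel[:, 0], dtype=torch.bool)
        # Planar command layout: [lin_x, lin_y, ang_z]
        override_command = torch.cat([lin_vel[:, :2], ang_vel[:, 2:]], dim=1)
    except Exception:
        # Any failure falls back to the standard RL command
        return orig_command
    # Phase 3: selectively override environments with valid reference data
    mask = lin_mask
    orig_command[mask] = override_command[mask]
    return orig_command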
🔧 Override Decision Logic:
Reference Mode (IL) Conditions:
- Environment Compatibility: ref_observation_manager exists on the environment
- Command Type: Requested command is "base_velocity"
- Data Availability: Reference terms "base_lin_vel" and "base_ang_vel" are accessible
- Validity Mask: Reference observations have a valid mask (not out of sequence bounds)
Fallback Mode (RL) Conditions:
- No Reference Manager: Environment lacks reference observation manager
- Different Command Type: Non-velocity commands (e.g., joint targets, task-specific commands)
- Data Unavailability: Reference terms missing or inaccessible
- Invalid Reference: Reference observations masked as invalid (sequence ended, etc.)
- Exception Handling: Any error during reference data extraction
⚠️ Important Technical Note:
Velocity Accuracy Limitation: Linear velocities computed from AMASS translation data may exhibit inaccuracies due to discrete sampling and numerical differentiation. When using reference velocity commands, the reward weight for tracking_lin_vel should be set relatively small to account for this limitation while still providing directional guidance.
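For instance, a velocity-tracking reward term might be configured with a reduced weight along these lines. This is a sketch only: the reward function, its import path, and the numeric values are assumptions and depend on your task configuration.
from isaaclab.managers import RewardTermCfg
from isaaclab.utils import configclass

# `track_lin_vel_xy_exp` stands in for whatever linear-velocity tracking
# reward your task defines (import path is an assumption).
from my_task.mdp import track_lin_vel_xy_exp

@configclass
class RewardsCfg:
    # Reference velocities differentiated from AMASS translations are noisy,
    # so the tracking weight stays small: directional guidance rather than a
    # dominant reward term. Values here are illustrative only.
    tracking_lin_vel = RewardTermCfg(
        func=track_lin_vel_xy_exp,
        weight=0.5,
        params={"std": 0.5},
    )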
🔄 Operational Modes
🎯 Pure Imitation Learning Mode
Scenario: All environments have valid reference data
# All environments follow reference velocities
mask = torch.ones(num_envs, dtype=torch.bool) # All True
final_command[mask] = reference_command[mask] # All override
Characteristics:
- Behavior: Agent follows reference motion trajectories precisely
- Training: Supervised learning from demonstration data
- Command Source: Reference observation manager provides all velocity commands
🔀 Mixed Training Mode
Scenario: Some environments have valid reference data, others use random commands
# Partial override based on reference availability
mask = torch.tensor([True, False, True, False]) # Mixed validity
final_command[mask] = reference_command[mask] # Partial override
Characteristics:
- Behavior: Mixed imitation and exploration across environment batch
- Training: Hybrid learning combining demonstration and exploration
- Command Source: Reference manager + random command generator
🎲 Pure Reinforcement Learning Mode
Scenario: No reference data available or accessible
# No override, standard RL commands used
mask = torch.zeros(num_envs, dtype=torch.bool) # All False
final_command = original_rl_command # No override
Characteristics:
- Behavior: Standard RL exploration and learning
- Training: Pure reinforcement learning from environment interaction
- Command Source: Standard IsaacLab command generator
💡 Usage Examples
🚀 Basic Integration
from GBC.gyms.isaaclab_45.managers import ReferenceCommandManager
from isaaclab.envs import mdp
from isaaclab.utils import configclass

# Configure standard RL command generation
@configclass
class CommandCfg:
    # Standard base velocity command for RL training (uniform random sampling).
    # Exact config fields depend on your IsaacLab version.
    base_velocity = mdp.UniformVelocityCommandCfg(
        asset_name="robot",
        resampling_time_range=(10.0, 10.0),
        heading_command=False,
        ranges=mdp.UniformVelocityCommandCfg.Ranges(
            lin_vel_x=(-2.0, 2.0),
            lin_vel_y=(-2.0, 2.0),
            ang_vel_z=(-1.0, 1.0),
        ),
    )
# Environment configuration with reference command manager
env_cfg.commands = CommandCfg()
# Manager automatically handles RL/IL mode switching
env = ManagerBasedRefRLEnv(env_cfg)
🔄 Dynamic Mode Switching During Training
# Training loop with automatic mode switching
for episode in range(num_episodes):
    obs, ref_obs = env.reset()
    for step in range(max_episode_steps):
        # Get current commands (automatically switches based on reference availability)
        velocity_commands = env.command_manager.get_command("base_velocity")
        # Commands will be:
        # - Reference velocities where ref_obs mask is True (IL mode)
        # - Random RL velocities where ref_obs mask is False (RL mode)
        action = policy(obs, ref_obs)
        obs, ref_obs, reward, done, timeout, extras = env.step(action)
        # Log mode distribution for analysis
        if hasattr(env, "ref_observation_manager"):
            ref_mask = env.ref_observation_manager.get_term("base_lin_vel")[1]
            il_ratio = ref_mask.float().mean().item()
            print(f"IL/RL ratio: {il_ratio:.2f}/{1 - il_ratio:.2f}")
🎯 Custom Command Override
class CustomReferenceCommandManager(ReferenceCommandManager):
    """Extended command manager with custom override logic."""

    def get_command(self, name: str) -> torch.Tensor:
        # Start from the base CommandManager command (skips the parent's override)
        orig_command = super(ReferenceCommandManager, self).get_command(name)
        if hasattr(self._env, "ref_observation_manager"):
            if name == "base_velocity":
                # Standard base velocity override from ReferenceCommandManager
                return super().get_command(name)
            elif name == "joint_targets":
                # Custom joint target override
                try:
                    ref_joint_pos, mask = self._env.ref_observation_manager.get_term("joint_positions")
                    if mask is not None and mask.any():
                        orig_command[mask] = ref_joint_pos[mask]
                except Exception:
                    pass
        return orig_command
🚨 Best Practices and Guidelines
✅ Configuration Guidelines
- Command Compatibility: Ensure reference observation terms match expected command dimensions
- Velocity Accuracy: Use smaller reward weights for velocity tracking due to numerical limitations
- Error Handling: Leverage built-in exception handling for robust training
- Mask Validation: Verify reference masks properly indicate data validity (see the check below)
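A quick sanity check along these lines can catch mask problems early (a sketch only; term names follow the override logic documented above, and env is assumed to expose a ref_observation_manager):
import torch

# Verify that the reference velocity term exposes a usable validity mask.
lin_vel, lin_mask = env.ref_observation_manager.get_term("base_lin_vel")
assert lin_vel.shape[-1] == 3, "expected [num_envs, 3] linear velocity"
if lin_mask is not None:
    assert lin_mask.dtype == torch.bool and lin_mask.shape[0] == lin_vel.shape[0]
    print(f"valid reference envs: {lin_mask.sum().item()}/{lin_mask.numel()}")
else:
    print("no mask returned: all environments fall back to RL commands")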
🔧 Performance Optimization
- Reference Term Caching: Reference observations are cached by the observation manager
- Selective Override: Only environments with valid reference data incur override costs
- Exception Efficiency: Exception handling provides fast fallback without training interruption
- Memory Efficiency: No additional storage overhead beyond standard command manager
📊 Training Strategy
# Recommended training progression
training_phases = {
"phase_1": {
"reference_ratio": 1.0, # Pure imitation learning
"episodes": 1000,
"focus": "Learning basic motion patterns"
},
"phase_2": {
"reference_ratio": 0.7, # Mixed training
"episodes": 2000,
"focus": "Balancing imitation and exploration"
},
"phase_3": {
"reference_ratio": 0.3, # RL-heavy training
"episodes": 1000,
"focus": "Generalizing beyond demonstrations"
}
}
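How the reference_ratio is realized depends on how reference trajectories are assigned to environments in your setup; the manager itself simply overrides commands wherever valid reference data exists. A hypothetical scheduling loop over these phases might look like this (set_reference_ratio and run_training_episode are placeholders for whatever mechanisms your environment and training code provide):
for phase_name, phase in training_phases.items():
    # Hypothetical helper: controls the fraction of environments that load
    # reference trajectories (and therefore receive IL-style commands).
    set_reference_ratio(env, phase["reference_ratio"])
    for episode in range(phase["episodes"]):
        run_training_episode(env, policy)  # placeholder for your training step
    print(f"finished {phase_name}: {phase['focus']}")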
This intelligent command management system provides seamless integration between imitation learning and reinforcement learning modes, enabling flexible training strategies that leverage both demonstration data and autonomous exploration for robust humanoid robot locomotion learning.