ref_command_manager
Module: GBC.gyms.isaaclab_45.managers.ref_command_manager
This module implements a hybrid command management system that switches between reinforcement learning (RL) and imitation learning (IL) modes. The ReferenceCommandManager extends IsaacLab's standard CommandManager and overrides its commands with reference velocities whenever valid reference data is available, so agents can alternate between imitating reference motions and exploring freely under randomly generated commands.
🎯 Core Functionality
The ReferenceCommandManager
provides:
- Dual-Mode Operation: Automatic switching between RL (random commands) and IL (reference-based commands) modes
- Intelligent Override: Reference velocities override random commands when available and valid (see the sketch after this list)
- Seamless Fallback: Graceful fallback to standard RL commands when reference data is unavailable
- Temporal Synchronization: Real-time integration with reference observation manager for consistent command generation
- Training Flexibility: Enables mixed training scenarios with both imitation and exploration phases
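Conceptually, each environment's final command is a per-environment selection between the reference velocity (when its validity mask is set) and the randomly sampled RL command. The sketch below illustrates this selection with made-up tensors; the actual logic lives in get_command and is documented later on this page.
import torch

# Illustrative shapes only: 4 environments, planar command [lin_x, lin_y, ang_z]
rl_command = torch.tensor([[0.5, 0.0, 0.2],
                           [1.0, 0.1, 0.0],
                           [0.0, 0.0, 0.5],
                           [0.3, 0.2, 0.1]])
ref_command = torch.tensor([[0.8, 0.0, 0.0],
                            [0.9, 0.0, 0.1],
                            [0.7, 0.1, 0.0],
                            [0.6, 0.0, 0.2]])
ref_valid = torch.tensor([True, False, True, False])  # per-env reference validity mask

# IL command where reference data is valid, RL command everywhere else
final_command = torch.where(ref_valid.unsqueeze(1), ref_command, rl_command)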
📚 Dependencies
from __future__ import annotations
import inspect
import torch
import weakref
from abc import abstractmethod
from collections.abc import Sequence
from prettytable import PrettyTable
from typing import TYPE_CHECKING
from isaaclab.managers.command_manager import CommandManager
if TYPE_CHECKING:
    from isaaclab.envs import ManagerBasedRLEnv
🏭 ReferenceCommandManager Class
Module Name: GBC.gyms.isaaclab_45.managers.ref_command_manager.ReferenceCommandManager
Class Definition:
class ReferenceCommandManager(CommandManager):
    """Manages the reference commands for a reference observation manager.

    Overrides the command manager's commands when reference data is available.
    """

    def __init__(self, cfg: object, env: ManagerBasedRLEnv):
        super().__init__(cfg, env)
📥 Initialization Parameters:
- cfg (object): Configuration object for command generation (inherits IsaacLab CommandManager configuration)
- env (ManagerBasedRLEnv): Reference-enabled RL environment instance (a minimal construction sketch follows this list)
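In normal use the environment builds this manager itself from env_cfg.commands; the snippet below is only a sketch of manual construction, assuming env is an already-initialized reference-enabled environment and env_cfg is its configuration object.
from GBC.gyms.isaaclab_45.managers import ReferenceCommandManager

# `env` and `env_cfg` are assumed to exist: a reference-enabled environment
# (exposing ref_observation_manager) and its configuration object.
command_manager = ReferenceCommandManager(cfg=env_cfg.commands, env=env)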
🔧 Inheritance Structure:
isaaclab.managers.CommandManager
↓ (extends)
GBC.gyms.isaaclab_45.managers.ReferenceCommandManager
🎯 Design Philosophy:
- Backward Compatibility: Fully compatible with existing IsaacLab command configurations
- Reference Integration: Seamless integration with RefObservationManager for reference data access
- Mode Transparency: Command switching is transparent to training algorithms and reward functions
🔧 Core Methods
📊 String Representation
Module Name: GBC.gyms.isaaclab_45.managers.ref_command_manager.ReferenceCommandManager.__str__
Method Signature:
def __str__(self) -> str:
🔧 Implementation:
def __str__(self):
    msg = super().__str__()
    return msg.replace("CommandManager", "ReferenceCommandManager")
🎯 Purpose: Provides clear identification of the reference-enabled command manager in logging and debugging output while preserving all inherited status information.
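For example, assuming env has already been constructed as a reference-enabled environment, printing the manager shows the substituted name:
# The inherited command-term table is preserved; only the class name changes.
print(env.command_manager)  # reports "ReferenceCommandManager" instead of "CommandManager"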
⚡ Intelligent Command Override
Module Name: GBC.gyms.isaaclab_45.managers.ref_command_manager.ReferenceCommandManager.get_command
Method Signature:
def get_command(self, name: str) -> torch.Tensor:
📥 Input Parameters:
- name (str): Command name to retrieve (e.g., "base_velocity")
📤 Return Values:
- torch.Tensor: Command tensor with potential reference velocity override
[num_envs, command_dim]
🔧 Advanced Override Logic:
Phase 1: Standard Command Generation
def get_command(self, name):
    # Generate standard RL command using parent class
    orig_command = super().get_command(name)
    # Only override base_velocity commands when a reference manager is available
    if hasattr(self._env, "ref_observation_manager") and name == "base_velocity":
        ...  # Proceed to reference override logic (Phase 2)
    else:
        # Return standard RL command for all other cases
        return orig_command
Phase 2: Reference Data Extraction
try:
    # Extract linear velocity from reference observations
    # Format: [num_envs, 3] - (x, y, z) linear velocities
    lin_vel, lin_mask = self._env.ref_observation_manager.get_term("base_lin_vel")
    # Extract angular velocity from reference observations
    # Format: [num_envs, 3] - (roll, pitch, yaw) angular velocities
    ang_vel, ang_mask = self._env.ref_observation_manager.get_term("base_ang_vel")
    # Handle missing mask data
    if lin_mask is None:
        lin_mask = torch.zeros_like(lin_vel[:, 0], dtype=torch.bool)
    # Construct reference command: [lin_x, lin_y, ang_z]
    # Note: Only uses planar motion components (x, y translation + z rotation)
    override_command = torch.cat([lin_vel[:, :2], ang_vel[:, 2:]], dim=1)
except Exception as e:
    # Graceful fallback on any error (missing terms, shape mismatches, etc.)
    return orig_command
Phase 3: Selective Command Override
# Apply mask-based selective override
# mask: [num_envs] boolean tensor indicating valid reference data
mask = lin_mask # Use linear velocity mask as primary validity indicator
# Override commands only where reference data is valid
orig_command[mask] = override_command[mask]
return orig_command
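Putting the three phases together, a consolidated sketch of the method reads roughly as follows (reconstructed from the snippets above; details such as variable names may differ from the actual implementation):
def get_command(self, name: str) -> torch.Tensor:
    # Phase 1: standard RL command from the parent CommandManager
    orig_command = super().get_command(name)
    if not (hasattr(self._env, "ref_observation_manager") and name == "base_velocity"):
        return orig_command
    # Phase 2: pull reference velocities from the reference observation manager
    try:
        lin_vel, lin_mask = self._env.ref_observation_manager.get_term("base_lin_vel")
        ang_vel, ang_mask = self._env.ref_observation_manager.get_term("base_ang_vel")
        if lin_mask is None:
            lin_mask = torch.zeros_like(lin_vel[:, 0], dtype=torch.bool)
        # Planar command layout: [lin_x, lin_y, ang_z]
        override_command = torch.cat([lin_vel[:, :2], ang_vel[:, 2:]], dim=1)
    except Exception:
        # Any failure falls back to the standard RL command
        return orig_command
    # Phase 3: selectively override environments with valid reference data
    mask = lin_mask
    orig_command[mask] = override_command[mask]
    return orig_command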
🔧 Override Decision Logic:
Reference Mode (IL) Conditions:
- Environment Compatibility: ref_observation_manager exists on the environment
- Command Type: Requested command is "base_velocity"
- Data Availability: Reference terms "base_lin_vel" and "base_ang_vel" are accessible
- Validity Mask: Reference observations have a valid mask (not out of sequence bounds)
Fallback Mode (RL) Conditions:
- No Reference Manager: Environment lacks reference observation manager
- Different Command Type: Non-velocity commands (e.g., joint targets, task-specific commands)
- Data Unavailability: Reference terms missing or inaccessible
- Invalid Reference: Reference observations masked as invalid (sequence ended, etc.)
- Exception Handling: Any error during reference data extraction
⚠️ Important Technical Note:
Velocity Accuracy Limitation: Linear velocities computed from AMASS translation data may exhibit inaccuracies due to discrete sampling and numerical differentiation. When using reference velocity commands, the reward weight for tracking_lin_vel should be set relatively small to account for this limitation while still providing directional guidance.
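For instance, a velocity-tracking reward term might be configured with a reduced weight along these lines. This is a sketch only: the reward function, its import path, and the numeric values are assumptions and depend on your task configuration.
from isaaclab.managers import RewardTermCfg
from isaaclab.utils import configclass

# `track_lin_vel_xy_exp` stands in for whatever linear-velocity tracking
# reward your task defines (import path is an assumption).
from my_task.mdp import track_lin_vel_xy_exp

@configclass
class RewardsCfg:
    # Reference velocities differentiated from AMASS translations are noisy,
    # so the tracking weight stays small: directional guidance rather than a
    # dominant reward term. Values here are illustrative only.
    tracking_lin_vel = RewardTermCfg(
        func=track_lin_vel_xy_exp,
        weight=0.5,
        params={"std": 0.5},
    )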
🔄 Operational Modes
🎯 Pure Imitation Learning Mode
Scenario: All environments have valid reference data
# All environments follow reference velocities
mask = torch.ones(num_envs, dtype=torch.bool) # All True
final_command[mask] = reference_command[mask] # All override
Characteristics:
- Behavior: Agent follows reference motion trajectories precisely
- Training: Supervised learning from demonstration data
- Command Source: Reference observation manager provides all velocity commands
🔀 Mixed Training Mode
Scenario: Some environments have valid reference data, others use random commands
# Partial override based on reference availability
mask = torch.tensor([True, False, True, False]) # Mixed validity
final_command[mask] = reference_command[mask] # Partial override
Characteristics:
- Behavior: Mixed imitation and exploration across environment batch
- Training: Hybrid learning combining demonstration and exploration
- Command Source: Reference manager + random command generator
🎲 Pure Reinforcement Learning Mode
Scenario: No reference data available or accessible
# No override, standard RL commands used
mask = torch.zeros(num_envs, dtype=torch.bool) # All False
final_command = original_rl_command # No override
Characteristics:
- Behavior: Standard RL exploration and learning
- Training: Pure reinforcement learning from environment interaction
- Command Source: Standard IsaacLab command generator
💡 Usage Examples
🚀 Basic Integration
from GBC.gyms.isaaclab_45.managers import ReferenceCommandManager
from isaaclab.envs import mdp
from isaaclab.utils import configclass

# Configure standard RL command generation
@configclass
class CommandCfg:
    # Standard base velocity command for RL training (uniform random sampling).
    # Exact config fields depend on your IsaacLab version.
    base_velocity = mdp.UniformVelocityCommandCfg(
        asset_name="robot",
        resampling_time_range=(10.0, 10.0),
        heading_command=False,
        ranges=mdp.UniformVelocityCommandCfg.Ranges(
            lin_vel_x=(-2.0, 2.0),
            lin_vel_y=(-2.0, 2.0),
            ang_vel_z=(-1.0, 1.0),
        ),
    )
# Environment configuration with reference command manager
env_cfg.commands = CommandCfg()
# Manager automatically handles RL/IL mode switching
env = ManagerBasedRefRLEnv(env_cfg)
🔄 Dynamic Mode Switching During Training
# Training loop with automatic mode switching
for episode in range(num_episodes):
    obs, ref_obs = env.reset()
    for step in range(max_episode_steps):
        # Get current commands (automatically switches based on reference availability)
        velocity_commands = env.command_manager.get_command("base_velocity")
        # Commands will be:
        # - Reference velocities where ref_obs mask is True (IL mode)
        # - Random RL velocities where ref_obs mask is False (RL mode)
        action = policy(obs, ref_obs)
        obs, ref_obs, reward, done, timeout, extras = env.step(action)
        # Log mode distribution for analysis
        if hasattr(env, "ref_observation_manager"):
            ref_mask = env.ref_observation_manager.get_term("base_lin_vel")[1]
            il_ratio = ref_mask.float().mean().item()
            print(f"IL/RL ratio: {il_ratio:.2f}/{1 - il_ratio:.2f}")
🎯 Custom Command Override
class CustomReferenceCommandManager(ReferenceCommandManager):
    """Extended command manager with custom override logic."""

    def get_command(self, name: str) -> torch.Tensor:
        # Start from the base CommandManager command (skips the parent's override)
        orig_command = super(ReferenceCommandManager, self).get_command(name)
        if hasattr(self._env, "ref_observation_manager"):
            if name == "base_velocity":
                # Standard base velocity override from ReferenceCommandManager
                return super().get_command(name)
            elif name == "joint_targets":
                # Custom joint target override
                try:
                    ref_joint_pos, mask = self._env.ref_observation_manager.get_term("joint_positions")
                    if mask is not None and mask.any():
                        orig_command[mask] = ref_joint_pos[mask]
                except Exception:
                    pass
        return orig_command
🚨 Best Practices and Guidelines
✅ Configuration Guidelines
- Command Compatibility: Ensure reference observation terms match expected command dimensions
- Velocity Accuracy: Use smaller reward weights for velocity tracking due to numerical limitations
- Error Handling: Leverage built-in exception handling for robust training
- Mask Validation: Verify reference masks properly indicate data validity (see the check below)
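A quick sanity check along these lines can catch mask problems early (a sketch only; term names follow the override logic documented above, and env is assumed to expose a ref_observation_manager):
import torch

# Verify that the reference velocity term exposes a usable validity mask.
lin_vel, lin_mask = env.ref_observation_manager.get_term("base_lin_vel")
assert lin_vel.shape[-1] == 3, "expected [num_envs, 3] linear velocity"
if lin_mask is not None:
    assert lin_mask.dtype == torch.bool and lin_mask.shape[0] == lin_vel.shape[0]
    print(f"valid reference envs: {lin_mask.sum().item()}/{lin_mask.numel()}")
else:
    print("no mask returned: all environments fall back to RL commands")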
🔧 Performance Optimization
- Reference Term Caching: Reference observations are cached by the observation manager
- Selective Override: Only environments with valid reference data incur override costs
- Exception Efficiency: Exception handling provides fast fallback without training interruption
- Memory Efficiency: No additional storage overhead beyond standard command manager
📊 Training Strategy
# Recommended training progression
training_phases = {
"phase_1": {
"reference_ratio": 1.0, # Pure imitation learning
"episodes": 1000,
"focus": "Learning basic motion patterns"
},
"phase_2": {
"reference_ratio": 0.7, # Mixed training
"episodes": 2000,
"focus": "Balancing imitation and exploration"
},
"phase_3": {
"reference_ratio": 0.3, # RL-heavy training
"episodes": 1000,
"focus": "Generalizing beyond demonstrations"
}
}
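How the reference_ratio is realized depends on how reference trajectories are assigned to environments in your setup; the manager itself simply overrides commands wherever valid reference data exists. A hypothetical scheduling loop over these phases might look like this (set_reference_ratio and run_training_episode are placeholders for whatever mechanisms your environment and training code provide):
for phase_name, phase in training_phases.items():
    # Hypothetical helper: controls the fraction of environments that load
    # reference trajectories (and therefore receive IL-style commands).
    set_reference_ratio(env, phase["reference_ratio"])
    for episode in range(phase["episodes"]):
        run_training_episode(env, policy)  # placeholder for your training step
    print(f"finished {phase_name}: {phase['focus']}")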
This intelligent command management system provides seamless integration between imitation learning and reinforcement learning modes, enabling flexible training strategies that leverage both demonstration data and autonomous exploration for robust humanoid robot locomotion learning.