ref_vecenv_wrapper

Module: GBC.gyms.isaaclab_45.lab_tasks.utils.wrappers.rsl_rl.ref_vecenv_wrapper

🎯 Overview

The RslRlReferenceVecEnvWrapper is a specialized environment wrapper that bridges RSL-RL training algorithms with IsaacLab reference-based environments. This wrapper enables sophisticated imitation learning by providing both standard policy observations and reference motion data to the training algorithm.

🎪 What Makes This Special?

Unlike standard RL environments, which provide only current-state observations, this wrapper extends the interface to include reference observations: data about target motions, expert demonstrations, or motion capture sequences that guide the learning process.

🏗️ Class: RslRlReferenceVecEnvWrapper

Inherits from: RslRlVecEnvWrapper

Purpose: Wraps ManagerBasedRefRLEnv to provide dual observation streams (policy + reference) for advanced imitation learning algorithms.

📦 Constructor

def __init__(self, env: ManagerBasedRefRLEnv)

Parameters:

  • env (ManagerBasedRefRLEnv): The reference-enabled IsaacLab environment to wrap

Functionality: Initializes the wrapper around a reference-capable environment, setting up dual observation processing pipelines.

🔧 Core Methods

1. 🎯 Reference Observation Retrieval

def get_reference_observations() -> tuple[ref_obs_type, dict]

Purpose: Computes and retrieves reference observations at the current simulation time.

Returns:

  • ref_obs_type: Policy-relevant reference observations
  • dict: Complete reference observation dictionary with metadata

Key Features:

  • Time-Synchronized: Uses current episode time for temporal alignment
  • 🎭 Motion Tracking: Provides target states for imitation learning
  • 📊 Structured Data: Organized observation groups for different purposes

Implementation Details:

# Current episode time per environment: elapsed steps × simulation step size
cur_time = self.unwrapped.episode_length_buf.to(torch.float32) * self.unwrapped.step_dt
# Compute all reference observation groups at the current time
obs_dict = self.unwrapped.ref_observation_manager.compute(cur_time)
# Return the policy-relevant group plus the full dictionary as metadata
return obs_dict["policy"], {"ref_observations": obs_dict}
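The time-synchronization step above can be illustrated with a minimal mock in plain Python (no IsaacLab or torch dependency): the current episode time is simply the per-environment elapsed step count multiplied by the simulation step size. The concrete `step_dt` value here is an example, not taken from the source.

```python
# Illustrative mock of the temporal-alignment computation. In the real
# wrapper, episode_length_buf is a per-environment tensor and step_dt is
# the environment's control step; the values below are examples.
step_dt = 0.02                    # e.g. a 50 Hz control step
episode_length_buf = [0, 10, 25]  # per-env elapsed step counts

# Current episode time per environment, used to query the reference manager
cur_time = [float(n) * step_dt for n in episode_length_buf]
```
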

2. 🔄 Enhanced Reset Method

def reset() -> tuple[torch.Tensor, ref_obs_type, dict]

Purpose: Resets the environment and provides both standard and reference initial observations.

Returns:

  • torch.Tensor: Standard policy observations
  • ref_obs_type: Reference observations for imitation learning
  • dict: Complete observation dictionaries with metadata

Key Features:

  • 🎬 Dual Initialization: Sets up both observation streams
  • 📋 Metadata Preservation: Maintains observation structure information
  • 🔄 Synchronized Reset: Ensures temporal alignment between observation types
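The dual-initialization contract can be sketched with a small mock that mirrors the return shape described above. The class and the `"observations"` / `"ref_observations"` info keys follow the conventions used in this document; the mock environment itself is hypothetical.

```python
# Mock sketch of the reset() contract: standard observations, reference
# observations, and a metadata dict carrying both full dictionaries.
class MockRefEnv:
    def reset(self):
        obs = [0.0, 0.0]      # stand-in for the policy observation tensor
        ref_obs = [1.0, 1.0]  # stand-in for the reference observations
        info = {
            "observations": {"policy": obs},
            "ref_observations": {"policy": ref_obs},
        }
        return obs, ref_obs, info

obs, ref_obs, info = MockRefEnv().reset()
```
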

3. ⚡ Enhanced Step Method

def step(actions: torch.Tensor) -> tuple[torch.Tensor, ref_obs_type, torch.Tensor, torch.Tensor, dict]

Purpose: Executes actions and returns comprehensive step information with dual observations.

Parameters:

  • actions (torch.Tensor): Actions to execute in the environment

Returns:

  • torch.Tensor: Next policy observations
  • ref_obs_type: Next reference observations
  • torch.Tensor: Rewards from the step
  • torch.Tensor: Done flags (terminated | truncated)
  • dict: Additional information and metadata

🚨 Critical Implementation Note:

# IMPORTANT: Reference observations MUST be computed before reward calculation
# to ensure proper term storage updates for reward computation

Key Features:

  • 📊 Dual Observation Streams: Provides both policy and reference data
  • ⏱️ Temporal Synchronization: Maintains time alignment across observation types
  • 🎯 Reward Integration: Supports reference-aware reward computation
  • 🔄 Metadata Management: Preserves observation structure and timing information
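The ordering constraint from the implementation note, that reference observations are refreshed before the reward is computed, can be sketched with a self-contained mock. All names here are illustrative, not the real implementation:

```python
# Mock sketch of the step ordering: 1) refresh reference terms, 2) compute
# the reward from the freshly updated reference data, 3) return the dual
# observation tuple (obs, ref_obs, reward, done, info).
def mock_step(action, state):
    state["t"] += 1
    ref_obs = {"target": float(state["t"])}    # 1) refresh reference terms
    reward = -abs(action - ref_obs["target"])  # 2) reward reads fresh target
    obs = [float(state["t"])]
    done = state["t"] >= state["horizon"]
    info = {"ref_observations": ref_obs}
    return obs, ref_obs, reward, done, info

state = {"t": 0, "horizon": 2}
obs, ref_obs, reward, done, info = mock_step(1.0, state)
```

If the reward were computed before the reference refresh, it would compare the action against the previous step's target, which is exactly the bug the implementation note warns about.
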

🎭 Integration with Training Algorithms

🧠 RSL-RL Compatibility

The wrapper provides the exact interface expected by RSL-RL algorithms while extending it for reference-based learning:

# Standard RL interface
obs, reward, done, info = env.step(action)

# Enhanced reference interface
obs, ref_obs, reward, done, info = ref_wrapper.step(action)

📊 Observation Flow

graph TD
A[Environment Step] --> B[Compute Policy Observations]
A --> C[Compute Reference Observations]
B --> D[RSL-RL Algorithm]
C --> D
D --> E[Action Selection]
E --> A

🎯 Key Features & Benefits

⚡ Efficient Data Management

  • Synchronized Computation: Policy and reference observations computed together
  • Optimized Memory Usage: Efficient tensor operations and data structures
  • Temporal Alignment: Precise time-based observation coordination

🎭 Advanced Learning Support

  • Imitation Learning: Enables teacher-student training paradigms
  • Motion Tracking: Supports motion capture data integration
  • Curriculum Learning: Facilitates progressive skill development

🔧 Developer-Friendly Design

  • Clean Interface: Extends familiar RSL-RL patterns
  • Comprehensive Metadata: Rich information for debugging and analysis
  • Flexible Integration: Works with various reference data sources

💡 Usage Examples

Basic Reference Training Setup

from GBC.gyms.isaaclab_45.envs import ManagerBasedRefRLEnv
from GBC.gyms.isaaclab_45.lab_tasks.utils.wrappers.rsl_rl import RslRlReferenceVecEnvWrapper

# Create reference-enabled environment
env = ManagerBasedRefRLEnv(cfg=your_ref_env_cfg)

# Wrap for RSL-RL compatibility
wrapped_env = RslRlReferenceVecEnvWrapper(env)

# Training loop with dual observations
obs, ref_obs, info = wrapped_env.reset()
for step in range(max_steps):
    action = policy(obs, ref_obs)  # Use both observation types
    obs, ref_obs, reward, done, info = wrapped_env.step(action)

Advanced Feature Access

# Access the complete observation dictionaries from the info payload
full_obs_dict = info["observations"]
full_ref_dict = info["ref_observations"]

# Extract specific observation groups (group names such as "critic" and
# "target_poses" depend on your environment's manager configuration)
critic_obs = full_obs_dict["critic"]
reference_poses = full_ref_dict["target_poses"]

🎯 Best Practices

  1. ⏰ Timing Considerations: Always ensure reference observations are computed before reward calculations
  2. 📊 Memory Management: Use appropriate batch sizes for dual observation processing
  3. 🔄 Synchronization: Maintain temporal alignment between policy and reference data
  4. 🎭 Learning Strategy: Leverage both observation types for comprehensive policy training

🔗 Related Components

  • ManagerBasedRefRLEnv: The underlying reference-capable environment
  • ReferenceObservationManager: Computes and manages reference observations
  • RSL-RL Algorithms: Training algorithms that consume the dual observation interface