ref_vecenv_wrapper

Module: GBC.gyms.isaaclab_45.lab_tasks.utils.wrappers.rsl_rl.ref_vecenv_wrapper

🎯 Overview

The RslRlReferenceVecEnvWrapper is a specialized environment wrapper that bridges RSL-RL training algorithms with IsaacLab reference-based environments. This wrapper enables sophisticated imitation learning by providing both standard policy observations and reference motion data to the training algorithm.

🎪 What Makes This Special?

Unlike standard RL environments, which provide only current-state observations, this wrapper extends the interface to include reference observations: data about target motions, expert demonstrations, or motion capture sequences that guide the learning process.

🏗️ Class: RslRlReferenceVecEnvWrapper

Inherits from: RslRlVecEnvWrapper

Purpose: Wraps ManagerBasedRefRLEnv to provide dual observation streams (policy + reference) for advanced imitation learning algorithms.

📦 Constructor

def __init__(self, env: ManagerBasedRefRLEnv)

Parameters:

  • env (ManagerBasedRefRLEnv): The reference-enabled IsaacLab environment to wrap

Functionality: Initializes the wrapper around a reference-capable environment, setting up dual observation processing pipelines.

🔧 Core Methods

1. 🎯 Reference Observation Retrieval

def get_reference_observations() -> tuple[ref_obs_type, dict]

Purpose: Computes and retrieves reference observations at the current simulation time.

Returns:

  • ref_obs_type: Policy-relevant reference observations
  • dict: Complete reference observation dictionary with metadata

Key Features:

  • Time-Synchronized: Uses current episode time for temporal alignment
  • 🎭 Motion Tracking: Provides target states for imitation learning
  • 📊 Structured Data: Organized observation groups for different purposes

Implementation Details:

# Current episode time per environment: elapsed steps × simulation step size
cur_time = self.unwrapped.episode_length_buf.to(torch.float32) * self.unwrapped.step_dt
# Compute all reference observation groups at the current time
obs_dict = self.unwrapped.ref_observation_manager.compute(cur_time)
# Return the policy-relevant group plus the full dictionary as metadata
return obs_dict["policy"], {"ref_observations": obs_dict}
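The time-synchronization step above can be illustrated with a minimal mock in plain Python (no IsaacLab or torch dependency): the current episode time is simply the per-environment elapsed step count multiplied by the simulation step size. The concrete `step_dt` value here is an example, not taken from the source.

```python
# Illustrative mock of the temporal-alignment computation. In the real
# wrapper, episode_length_buf is a per-environment tensor and step_dt is
# the environment's control step; the values below are examples.
step_dt = 0.02                    # e.g. a 50 Hz control step
episode_length_buf = [0, 10, 25]  # per-env elapsed step counts

# Current episode time per environment, used to query the reference manager
cur_time = [float(n) * step_dt for n in episode_length_buf]
```
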

2. 🔄 Enhanced Reset Method

def reset() -> tuple[torch.Tensor, ref_obs_type, dict]

Purpose: Resets the environment and provides both standard and reference initial observations.

Returns:

  • torch.Tensor: Standard policy observations
  • ref_obs_type: Reference observations for imitation learning
  • dict: Complete observation dictionaries with metadata

Key Features:

  • 🎬 Dual Initialization: Sets up both observation streams
  • 📋 Metadata Preservation: Maintains observation structure information
  • 🔄 Synchronized Reset: Ensures temporal alignment between observation types
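The dual-initialization contract can be sketched with a small mock that mirrors the return shape described above. The class and the `"observations"` / `"ref_observations"` info keys follow the conventions used in this document; the mock environment itself is hypothetical.

```python
# Mock sketch of the reset() contract: standard observations, reference
# observations, and a metadata dict carrying both full dictionaries.
class MockRefEnv:
    def reset(self):
        obs = [0.0, 0.0]      # stand-in for the policy observation tensor
        ref_obs = [1.0, 1.0]  # stand-in for the reference observations
        info = {
            "observations": {"policy": obs},
            "ref_observations": {"policy": ref_obs},
        }
        return obs, ref_obs, info

obs, ref_obs, info = MockRefEnv().reset()
```
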

3. ⚡ Enhanced Step Method

def step(actions: torch.Tensor) -> tuple[torch.Tensor, ref_obs_type, torch.Tensor, torch.Tensor, dict]

Purpose: Executes actions and returns comprehensive step information with dual observations.

Parameters:

  • actions (torch.Tensor): Actions to execute in the environment

Returns:

  • torch.Tensor: Next policy observations
  • ref_obs_type: Next reference observations
  • torch.Tensor: Rewards from the step
  • torch.Tensor: Done flags (terminated | truncated)
  • dict: Additional information and metadata

🚨 Critical Implementation Note:

# IMPORTANT: Reference observations MUST be computed before reward calculation
# to ensure proper term storage updates for reward computation

Key Features:

  • 📊 Dual Observation Streams: Provides both policy and reference data
  • ⏱️ Temporal Synchronization: Maintains time alignment across observation types
  • 🎯 Reward Integration: Supports reference-aware reward computation
  • 🔄 Metadata Management: Preserves observation structure and timing information
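The ordering constraint from the implementation note, that reference observations are refreshed before the reward is computed, can be sketched with a self-contained mock. All names here are illustrative, not the real implementation:

```python
# Mock sketch of the step ordering: 1) refresh reference terms, 2) compute
# the reward from the freshly updated reference data, 3) return the dual
# observation tuple (obs, ref_obs, reward, done, info).
def mock_step(action, state):
    state["t"] += 1
    ref_obs = {"target": float(state["t"])}    # 1) refresh reference terms
    reward = -abs(action - ref_obs["target"])  # 2) reward reads fresh target
    obs = [float(state["t"])]
    done = state["t"] >= state["horizon"]
    info = {"ref_observations": ref_obs}
    return obs, ref_obs, reward, done, info

state = {"t": 0, "horizon": 2}
obs, ref_obs, reward, done, info = mock_step(1.0, state)
```

If the reward were computed before the reference refresh, it would compare the action against the previous step's target, which is exactly the bug the implementation note warns about.
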

🎭 Integration with Training Algorithms

🧠 RSL-RL Compatibility

The wrapper provides the exact interface expected by RSL-RL algorithms while extending it for reference-based learning:

# Standard RL interface
obs, reward, done, info = env.step(action)

# Enhanced reference interface
obs, ref_obs, reward, done, info = ref_wrapper.step(action)

📊 Observation Flow

graph TD
A[Environment Step] --> B[Compute Policy Observations]
A --> C[Compute Reference Observations]
B --> D[RSL-RL Algorithm]
C --> D
D --> E[Action Selection]
E --> A

🎯 Key Features & Benefits

⚡ Efficient Data Management

  • Synchronized Computation: Policy and reference observations computed together
  • Optimized Memory Usage: Efficient tensor operations and data structures
  • Temporal Alignment: Precise time-based observation coordination

🎭 Advanced Learning Support

  • Imitation Learning: Enables teacher-student training paradigms
  • Motion Tracking: Supports motion capture data integration
  • Curriculum Learning: Facilitates progressive skill development

🔧 Developer-Friendly Design

  • Clean Interface: Extends familiar RSL-RL patterns
  • Comprehensive Metadata: Rich information for debugging and analysis
  • Flexible Integration: Works with various reference data sources

💡 Usage Examples

Basic Reference Training Setup

from GBC.gyms.isaaclab_45.envs import ManagerBasedRefRLEnv
from GBC.gyms.isaaclab_45.lab_tasks.utils.wrappers.rsl_rl import RslRlReferenceVecEnvWrapper

# Create reference-enabled environment
env = ManagerBasedRefRLEnv(cfg=your_ref_env_cfg)

# Wrap for RSL-RL compatibility
wrapped_env = RslRlReferenceVecEnvWrapper(env)

# Training loop with dual observations
obs, ref_obs, info = wrapped_env.reset()
for step in range(max_steps):
    action = policy(obs, ref_obs)  # Use both observation types
    obs, ref_obs, reward, done, info = wrapped_env.step(action)

Advanced Feature Access

# Access the complete observation dictionaries from the info payload
full_obs_dict = info["observations"]
full_ref_dict = info["ref_observations"]

# Extract specific observation groups (group names such as "critic" and
# "target_poses" depend on your environment's manager configuration)
critic_obs = full_obs_dict["critic"]
reference_poses = full_ref_dict["target_poses"]

🎯 Best Practices

  1. ⏰ Timing Considerations: Always ensure reference observations are computed before reward calculations
  2. 📊 Memory Management: Use appropriate batch sizes for dual observation processing
  3. 🔄 Synchronization: Maintain temporal alignment between policy and reference data
  4. 🎭 Learning Strategy: Leverage both observation types for comprehensive policy training

🔗 Related Components

  • ManagerBasedRefRLEnv: The underlying reference-capable environment
  • ReferenceObservationManager: Computes and manages reference observations
  • RSL-RL Algorithms: Training algorithms that consume the dual observation interface