ref_vecenv_wrapper
Module: GBC.gyms.isaaclab_45.lab_tasks.utils.wrappers.rsl_rl.ref_vecenv_wrapper
🎯 Overview
The RslRlReferenceVecEnvWrapper is a specialized environment wrapper that bridges RSL-RL training algorithms with IsaacLab reference-based environments. It enables imitation learning by providing both standard policy observations and reference motion data to the training algorithm.
🎪 What Makes This Special?
Unlike standard RL environments that only provide current state observations, this wrapper extends the interface to include reference observations - crucial data about target motions, expert demonstrations, or motion capture sequences that guide the learning process.
🏗️ Class: RslRlReferenceVecEnvWrapper
Inherits from: RslRlVecEnvWrapper
Purpose: Wraps ManagerBasedRefRLEnv to provide dual observation streams (policy + reference) for imitation learning algorithms.
📦 Constructor
def __init__(self, env: ManagerBasedRefRLEnv)
Parameters:
- env (ManagerBasedRefRLEnv): The reference-enabled IsaacLab environment to wrap
Functionality: Initializes the wrapper around a reference-capable environment, setting up dual observation processing pipelines.
🔧 Core Methods
1. 🎯 Reference Observation Retrieval
def get_reference_observations(self) -> tuple[ref_obs_type, dict]
Purpose: Computes and retrieves reference observations at the current simulation time.
Returns:
- ref_obs_type: Policy-relevant reference observations
- dict: Complete reference observation dictionary with metadata
Key Features:
- ⏰ Time-Synchronized: Uses current episode time for temporal alignment
- 🎭 Motion Tracking: Provides target states for imitation learning
- 📊 Structured Data: Organized observation groups for different purposes
Implementation Details:
# Current per-environment simulation time: episode step count × step duration
cur_time = self.unwrapped.episode_length_buf.to(torch.float32) * self.unwrapped.step_dt
# Reference observation manager evaluates all reference terms at that time
obs_dict = self.unwrapped.ref_observation_manager.compute(cur_time)
return obs_dict["policy"], {"ref_observations": obs_dict}
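The time-synchronization step above is just a per-environment conversion from step counts to seconds. The sketch below illustrates that arithmetic with plain Python floats in place of torch tensors; the function name is a hypothetical stand-in, not part of the wrapper.

```python
# Illustration of the time-synchronization logic: each environment's episode
# step count is scaled by the step duration to get its current episode time.
def episode_time(episode_length_buf, step_dt):
    """Convert per-environment step counts into episode time in seconds."""
    return [steps * step_dt for steps in episode_length_buf]

# Three parallel environments at different episode steps, 5 ms step duration:
times = episode_time([0, 120, 400], 0.005)
```

Because each environment resets independently, the resulting times differ per environment, which is why the manager receives a per-environment time vector rather than a scalar.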
2. 🔄 Enhanced Reset Method
def reset(self) -> tuple[torch.Tensor, ref_obs_type, dict]
Purpose: Resets the environment and provides both standard and reference initial observations.
Returns:
- torch.Tensor: Standard policy observations
- ref_obs_type: Reference observations for imitation learning
- dict: Complete observation dictionaries with metadata
Key Features:
- 🎬 Dual Initialization: Sets up both observation streams
- 📋 Metadata Preservation: Maintains observation structure information
- 🔄 Synchronized Reset: Ensures temporal alignment between observation types
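The three-part return contract above can be sketched with a minimal stub. RefEnvStub and its observation values are hypothetical stand-ins for the real environment; the point is the shape of the tuple and the metadata keys.

```python
# Minimal stub sketching the dual-return shape of reset(): a policy
# observation, a reference observation, and an extras dict carrying both
# complete observation dictionaries (an illustration, not the wrapper itself).
class RefEnvStub:
    def reset(self):
        obs = [0.0, 0.0]        # standard policy observations at t = 0
        ref_obs = [1.0, 1.0]    # reference observations at t = 0
        extras = {
            "observations": {"policy": obs},
            "ref_observations": {"policy": ref_obs},
        }
        return obs, ref_obs, extras
```

Keeping both complete dictionaries in the extras lets downstream code access non-policy groups (e.g. critic observations) without a second environment call.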
3. ⚡ Enhanced Step Method
def step(self, actions: torch.Tensor) -> tuple[torch.Tensor, ref_obs_type, torch.Tensor, torch.Tensor, dict]
Purpose: Executes actions and returns comprehensive step information with dual observations.
Parameters:
- actions (torch.Tensor): Actions to execute in the environment
Returns:
- torch.Tensor: Next policy observations
- ref_obs_type: Next reference observations
- torch.Tensor: Rewards from the step
- torch.Tensor: Done flags (terminated | truncated)
- dict: Additional information and metadata
🚨 Critical Implementation Note:
# IMPORTANT: Reference observations MUST be computed before reward calculation
# to ensure proper term storage updates for reward computation
Key Features:
- 📊 Dual Observation Streams: Provides both policy and reference data
- ⏱️ Temporal Synchronization: Maintains time alignment across observation types
- 🎯 Reward Integration: Supports reference-aware reward computation
- 🔄 Metadata Management: Preserves observation structure and timing information
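The ordering constraint called out above (reference observations before rewards) can be made concrete with a small stub that records call order. The class and method names here are hypothetical illustrations, not the wrapper's internals.

```python
# Sketch of the ordering constraint inside step(): computing reference
# observations updates the manager's term storage, which reference-aware
# reward terms then read. Reversing the order would score rewards against
# stale reference data.
class StepOrderStub:
    def __init__(self):
        self.calls = []

    def compute_ref_observations(self):
        self.calls.append("ref_obs")   # updates term storage

    def compute_rewards(self):
        self.calls.append("reward")    # reads stored reference data

    def step(self):
        self.compute_ref_observations()  # MUST run before rewards
        self.compute_rewards()
        return self.calls
```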
🎭 Integration with Training Algorithms
🧠 RSL-RL Compatibility
The wrapper provides the exact interface expected by RSL-RL algorithms while extending it for reference-based learning:
# Standard RL interface
obs, reward, done, info = env.step(action)
# Enhanced reference interface
obs, ref_obs, reward, done, info = ref_wrapper.step(action)
📊 Observation Flow
graph TD
A[Environment Step] --> B[Compute Policy Observations]
A --> C[Compute Reference Observations]
B --> D[RSL-RL Algorithm]
C --> D
D --> E[Action Selection]
E --> A
🎯 Key Features & Benefits
⚡ Efficient Data Management
- Synchronized Computation: Policy and reference observations computed together
- Optimized Memory Usage: Efficient tensor operations and data structures
- Temporal Alignment: Precise time-based observation coordination
🎭 Advanced Learning Support
- Imitation Learning: Enables teacher-student training paradigms
- Motion Tracking: Supports motion capture data integration
- Curriculum Learning: Facilitates progressive skill development
🔧 Developer-Friendly Design
- Clean Interface: Extends familiar RSL-RL patterns
- Comprehensive Metadata: Rich information for debugging and analysis
- Flexible Integration: Works with various reference data sources
💡 Usage Examples
Basic Reference Training Setup
from GBC.gyms.isaaclab_45.envs import ManagerBasedRefRLEnv
from GBC.gyms.isaaclab_45.lab_tasks.utils.wrappers.rsl_rl import RslRlReferenceVecEnvWrapper
# Create reference-enabled environment
env = ManagerBasedRefRLEnv(cfg=your_ref_env_cfg)
# Wrap for RSL-RL compatibility
wrapped_env = RslRlReferenceVecEnvWrapper(env)
# Training loop with dual observations
obs, ref_obs, info = wrapped_env.reset()
for step in range(max_steps):
    action = policy(obs, ref_obs)  # use both observation types
    obs, ref_obs, reward, done, info = wrapped_env.step(action)
Advanced Feature Access
# Access complete observation dictionaries
full_obs_dict = info["observations"]
full_ref_dict = info["ref_observations"]
# Extract specific observation groups
critic_obs = full_obs_dict["critic"]
reference_poses = full_ref_dict["target_poses"]
🎯 Best Practices
- ⏰ Timing Considerations: Always ensure reference observations are computed before reward calculations
- 📊 Memory Management: Use appropriate batch sizes for dual observation processing
- 🔄 Synchronization: Maintain temporal alignment between policy and reference data
- 🎭 Learning Strategy: Leverage both observation types for comprehensive policy training
🔗 Related Components
- ManagerBasedRefRLEnv: The underlying reference-capable environment
- ReferenceObservationManager: Computes and manages reference observations
- RSL-RL Algorithms: Training algorithms that consume the dual observation interface