📊 Prepare Data Path

GBC requires several datasets and model files to function properly. This section will guide you through downloading the necessary data and configuring the file paths. The process involves downloading motion capture data, body model parameters, and setting up the configuration files.

🙄 Multiple Account Registration Required

Unfortunately, the Max Planck Institute (MPI) requires separate account registrations for each dataset. You'll need to create three different accounts for the following websites:

  • AMASS dataset
  • SMPL+H models
  • DMPL models

Yes, it's as inconvenient as it sounds, but that's how MPI has set up their system! 🤷‍♂️

1. 📥 Download AMASS Dataset

The AMASS dataset contains high-quality human motion data that GBC uses for training and evaluation.

Step 1: Register Account

First, create an account at https://amass.is.tue.mpg.de/

Step 2: Download Using GBC's Automated Script

GBC provides a convenient Python script to download the AMASS dataset with resume capability:

# Navigate to your GBC installation directory
cd /path/to/your/GBC

# Run the download script
python -m GBC.utils.data_preparation.download_amass \
--username YOUR_EMAIL \
--password YOUR_PASSWORD \
--output_dir /path/to/your/dataset/AMASS_FULL \
--smplh_only \
--cleanup

Command Line Parameters

  • --username (-u): Your AMASS account email
  • --password (-p): Your AMASS account password
  • --output_dir (-o): Directory where the dataset will be downloaded
  • --smplh_only: ✅ Recommended - Only download SMPL+H compatible data (saves space and time)
  • --cleanup: Automatically delete compressed files after extraction
  • --specialize: (Optional) Download specific datasets like 'CMU', 'BioMotionLab_NTroje', etc.
  • --force: Force re-download even if files already exist

💡 Pro Tips
  • The script supports resume functionality - if the download is interrupted, simply run the same command again
  • Using --smplh_only reduces download size significantly as GBC primarily uses SMPL+H models
  • The download can take several hours depending on your internet connection
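
Resumable downloads are commonly built on HTTP range requests. The following is a minimal sketch of that general technique, not GBC's actual implementation; the helper name `resume_headers` and the idea of keeping a partial file on disk are assumptions made for illustration.

```python
import os


def resume_headers(partial_path: str) -> dict:
    """Build HTTP headers that ask a server to continue a partial download.

    Hypothetical helper: if `partial_path` already holds N bytes, request
    everything from byte N onward; otherwise request the whole file.
    """
    if os.path.exists(partial_path):
        size = os.path.getsize(partial_path)
        if size > 0:
            # "bytes=N-" means: send from offset N to the end of the file.
            return {"Range": f"bytes={size}-"}
    return {}
```

A client would pass these headers with its request and append the response body to the partial file. A server that honors the range replies with status 206; a 200 reply means the full file is being sent again.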

Example Usage

# Download all SMPL+H datasets
python -m GBC.utils.data_preparation.download_amass \
-u your.email@example.com \
-p your_password \
-o ~/dataset/AMASS_FULL \
--smplh_only

# Download only CMU motion capture data
python -m GBC.utils.data_preparation.download_amass \
-u your.email@example.com \
-p your_password \
-o ~/dataset/AMASS_FULL \
--smplh_only \
--specialize CMU
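
Typing a password directly into the command leaves it in your shell history. One way around that, sketched below, is to read the credentials from environment variables and assemble the documented flags in Python. The variable names `AMASS_USERNAME` and `AMASS_PASSWORD` and the helper `build_download_cmd` are hypothetical; only the flags listed above come from the script.

```python
import os
import sys


def build_download_cmd(output_dir, specialize=None, env=None):
    """Assemble the download command from environment-provided credentials.

    AMASS_USERNAME / AMASS_PASSWORD are hypothetical variable names.
    """
    env = os.environ if env is None else env
    cmd = [
        sys.executable, "-m", "GBC.utils.data_preparation.download_amass",
        "--username", env["AMASS_USERNAME"],
        "--password", env["AMASS_PASSWORD"],
        "--output_dir", os.path.expanduser(output_dir),
        "--smplh_only",
    ]
    if specialize:
        # Restrict the download to a single sub-dataset, e.g. "CMU".
        cmd += ["--specialize", specialize]
    return cmd
```

From the GBC directory, the command can then be run with `subprocess.run(build_download_cmd("~/dataset/AMASS_FULL", specialize="CMU"), check=True)`. Note that the password is still visible in the process list while the script runs.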

2. 🤖 Download SMPL+H Model Parameters

The SMPL+H model provides the human body representation that GBC uses for motion processing.

Step 1: Register Account

Create an account at https://mano.is.tue.mpg.de/

Step 2: Download Model Files

Navigate to the Download section and select:

  • "Extended SMPL+H model (used in AMASS project)"

Option A: Manual Download

  1. Log in to the website
  2. Go to Downloads page
  3. Click on "Extended SMPL+H model (used in AMASS project)"
  4. Download and extract to your dataset directory

Option B: Direct Download (Advanced)

If you're comfortable with command-line tools:

# Using curl (replace with your credentials)
curl -u "your_username:your_password" \
"https://download.is.tue.mpg.de/download.php?domain=mano&resume=1&sfile=smplh.tar.xz" \
-o smplh.tar.xz

# Extract the archive
tar -xf smplh.tar.xz

Option C: Python Script (Advanced)

For developers familiar with Python, you can adapt the download_amass.py script for SMPL+H downloads.

Step 3: Verify Installation

Ensure the following file exists after extraction:

your_dataset_dir/
└── smplh/
    └── neutral/
        └── model.npz  ✅ This file must exist
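
Beyond checking that the file exists, it can help to confirm that the archive is readable. An `.npz` file is a ZIP archive of `.npy` arrays, so the standard library can list its contents without loading anything. This is a sketch; `list_npz_arrays` is a hypothetical helper and the path in the usage note is a placeholder.

```python
import zipfile


def list_npz_arrays(npz_path):
    """Return the array names stored in an .npz archive without loading them."""
    with zipfile.ZipFile(npz_path) as archive:
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )
```

For example, `print(list_npz_arrays("your_dataset_dir/smplh/neutral/model.npz"))`. A truncated or corrupted download will typically raise `zipfile.BadZipFile` here.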

3. 🎭 Download DMPL Model Parameters

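DMPL (Dynamic SMPL) models add soft-tissue dynamics on top of the SMPL body model as an additional set of blend shapes. AMASS sequences include DMPL coefficients, so the model file is needed alongside SMPL+H.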

Step 1: Register Account

Create another account at https://smpl.is.tue.mpg.de/

Step 2: Download DMPL Files

  1. Navigate to https://smpl.is.tue.mpg.de/download.php
  2. Find the row: "Download DMPLs compatible with SMPL"
  3. Click download

Option A: Manual Download

Follow the website interface to download the files.

Option B: Direct Download (Advanced)

# Using direct download link
curl -u "your_username:your_password" \
"https://download.is.tue.mpg.de/download.php?domain=smpl&sfile=dmpls.tar.xz" \
-o dmpls.tar.xz

# Extract the archive
tar -xf dmpls.tar.xz

Step 3: Verify Installation

Ensure the following file exists:

your_dataset_dir/
└── dmpls/
    └── male/
        └── model.npz  ✅ This file must exist

4. ⚙️ Configure Data Paths

Now you need to configure GBC to know where your data files are located. GBC uses a centralized configuration system in the assets.py file.

Understanding the Configuration

GBC's data configuration is defined in GBC/utils/base/assets.py. Here's what you need to configure:

@configclass
class DataPathsCfg:
    # Main dataset directory
    dataset_path: str = "/your/dataset/path"

    # URDF file for robot model
    urdf_path: str = "unitree_ros/robots/h1_2_description/h1_2.urdf"

    # SMPL+H model
    smplh_model_path: str = "smplh/neutral/model.npz"

    # DMPL model
    dmpls_model_path: str = "dmpls/male/model.npz"

    # AMASS dataset
    amass_dataset_path: str = "AMASS_FULL/"

Method 1: Modify assets.py Directly

Edit the GBC/utils/base/assets.py file:

# Find the DATA_PATHS configuration near the bottom of the file
DATA_PATHS = DataPathsCfg(
    dataset_path="/your/actual/dataset/path",  # Update this path
    # Other paths are relative to dataset_path
)

Method 2: Use an Environment Variable

Set the dataset path via environment variable:

# Add to your ~/.bashrc or ~/.zshrc
export GBC_DATASET_PATH="/your/dataset/path"

# Or set temporarily
export GBC_DATASET_PATH="/home/username/dataset"
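
How GBC reads `GBC_DATASET_PATH` is not shown on this page, so the following is only a sketch of the usual pattern: prefer the environment variable, fall back to a configured default, and resolve relative entries against the result. `resolve_dataset_path` and `resolve` are hypothetical names.

```python
import os


def resolve_dataset_path(default="/your/dataset/path", env=None):
    """Pick the dataset root: environment variable first, then the default."""
    env = os.environ if env is None else env
    return os.path.expanduser(env.get("GBC_DATASET_PATH") or default)


def resolve(dataset_path, relative):
    """Join a path from the config onto the dataset root.

    os.path.join returns `relative` unchanged when it is already absolute.
    """
    return os.path.join(dataset_path, relative)
```

With this pattern, `resolve(resolve_dataset_path(), "smplh/neutral/model.npz")` yields the full path to the SMPL+H model under whichever root is active.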

Step 3: Verify Configuration

Test your configuration:

# Test in Python
import os

from GBC.utils.base.assets import DATA_PATHS

print("Dataset path:", DATA_PATHS.dataset_path)
print("SMPL+H model:", DATA_PATHS.smplh_model_path)
print("DMPL model:", DATA_PATHS.dmpls_model_path)
print("AMASS dataset:", DATA_PATHS.amass_dataset_path)

# Check if files exist. The model and dataset paths may be stored relative to
# dataset_path; os.path.join leaves absolute paths unchanged, so this works either way.
def full_path(path):
    return os.path.join(DATA_PATHS.dataset_path, path)

assert os.path.exists(full_path(DATA_PATHS.smplh_model_path)), "SMPL+H model not found!"
assert os.path.exists(full_path(DATA_PATHS.dmpls_model_path)), "DMPL model not found!"
assert os.path.exists(full_path(DATA_PATHS.amass_dataset_path)), "AMASS dataset not found!"

print("✅ All data paths configured correctly!")

📁 Expected Directory Structure

After completing all steps, your dataset directory should look like this:

your_dataset_dir/
├── AMASS_FULL/
│   ├── CMU/
│   ├── BioMotionLab_NTroje/
│   ├── EKUT/
│   └── ... (other AMASS datasets)
├── smplh/
│   └── neutral/
│       └── model.npz
├── dmpls/
│   └── male/
│       └── model.npz
└── unitree_ros/  # Example using a Unitree robot; replace with your own robot's URDF
    └── robots/
        └── h1_2_description/
            └── h1_2.urdf
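
The layout above can be checked in one pass rather than file by file. This is a small sketch using only the standard library; `missing_entries` is a hypothetical helper, and the `REQUIRED` list mirrors the tree shown here (the URDF is left out because its location depends on your robot).

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the dataset root.
REQUIRED = [
    "AMASS_FULL",
    "smplh/neutral/model.npz",
    "dmpls/male/model.npz",
]


def missing_entries(dataset_dir, required=REQUIRED):
    """Return the required entries that do not exist under dataset_dir."""
    root = Path(dataset_dir).expanduser()
    return [rel for rel in required if not (root / rel).exists()]
```

An empty list from `missing_entries("~/dataset")` means every expected file and folder is in place; otherwise the returned entries show which download or extraction step to repeat.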

5. ⚙️ Configure Data Preparation Settings

🙏 Development Note

The author apologizes for this additional configuration step. Due to the project being primarily maintained by a single developer, having two separate configuration files is a legacy issue. There wasn't sufficient time to unify all configurations before open-sourcing.

Community contributions are welcome! If you're interested in helping with:

  • Configuration unification
  • Code formatting tools and PR validation setup
  • General codebase improvements

Please feel free to submit pull requests! 🤝

The data_preparation_cfg.py file defines various configuration classes needed for motion retargeting and data preprocessing workflows. This centralized configuration system helps manage dependencies across multiple projects efficiently.

Understanding the Configuration Structure

The configuration file (GBC/utils/data_preparation/data_preparation_cfg.py) contains several specialized configuration classes:

🏗️ Base Configuration

@configclass
class BaseCfg:
    smplh_model_path="/path/to/smplh/male/model.npz"
    dmpls_model_path="/path/to/dmpls/male/model.npz"
    urdf_path="/path/to/robot.urdf"
    device="cuda"  # or "cpu"

📊 Dataset Configurations

  • AMASSDatasetCfg: Basic AMASS dataset loading
  • AMASSDatasetInterpolateCfg: With temporal interpolation
  • AMASSDatasetSingleFrameCfg: Single frame processing

🤖 Robot Configuration

  • RobotKinematicsCfg: Defines mapping between SMPL joints and robot joints
  • FilterCfg: Signal filtering parameters for motion smoothing

🔄 Processing Configurations

  • AMASSActionConverterCfg: For converting human motions to robot actions
  • PoseRendererCfg: For rendering and visualization

Configuration Options

You have two approaches to configure these settings:

Option 1: Direct Modification (Simple)

Edit the file GBC/utils/data_preparation/data_preparation_cfg.py directly:

@configclass
class BaseCfg:
    """Base configuration - Update these paths"""
    smplh_model_path="/your/dataset/path/smplh/male/model.npz"  # Update this
    dmpls_model_path="/your/dataset/path/dmpls/male/model.npz"  # Update this
    urdf_path="/your/dataset/path/your_robot/robot.urdf"  # Update this
    device="cuda"  # Change to "cpu" if no GPU available

@configclass
class AMASSDatasetCfg(BaseCfg):
    """AMASS dataset configuration - Update root directory"""
    root_dir: str = "/your/dataset/path/AMASS_FULL"  # Update this
    num_betas: int = 16
    num_dmpls: int = 8
    load_hands: bool = False

@configclass
class RobotKinematicsCfg(BaseCfg):
    """Robot kinematics - Update mapping for your robot"""
    mapping_table = {
        'Pelvis': 'base_link',  # SMPL joint -> Robot joint
        'L_Hip': 'your_left_hip_joint',  # Update these mappings
        'R_Hip': 'your_right_hip_joint',  # for your specific robot
        # ... add more mappings as needed
    }

Option 2: Separate Configuration File

Create a separate configuration file that inherits from the base classes. This approach is cleaner for project management:

# Create: my_robot_config.py
from GBC.utils.data_preparation.data_preparation_cfg import *

@configclass
class MyRobotBaseCfg(BaseCfg):
    """My robot's base configuration"""
    smplh_model_path="/home/user/dataset/smplh/male/model.npz"
    dmpls_model_path="/home/user/dataset/dmpls/male/model.npz"
    urdf_path="/home/user/dataset/my_robot/robot.urdf"
    device="cuda"

@configclass
class MyRobotKinematicsCfg(RobotKinematicsCfg, MyRobotBaseCfg):
    """Custom robot kinematics mapping"""
    mapping_table = {
        'Pelvis': 'my_base_link',
        'L_Hip': 'my_left_hip_yaw',
        'R_Hip': 'my_right_hip_yaw',
        # ... define your robot's joint mappings
    }

@configclass
class MyDataConverterCfg(AMASSActionConverterCfg, MyRobotKinematicsCfg):
    """My data conversion configuration"""
    root_dir: str = "/home/user/dataset/AMASS_FULL"
    export_path: str = "/home/user/outputs/converted_actions"
    batch_size: int = 256  # Adjust based on your GPU memory

Key Configuration Parameters

🎯 Essential Paths to Configure

  • smplh_model_path: Path to SMPL+H model file
  • dmpls_model_path: Path to DMPL model file
  • urdf_path: Path to your robot's URDF file
  • root_dir: AMASS dataset directory
  • export_path: Output directory for processed data

🤖 Robot-Specific Settings

  • mapping_table: Maps SMPL joint names to your robot's joint names
  • offset_map: Joint position offsets (if needed)
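
A typo in `mapping_table` is easy to miss, so it can be worth checking the mapped names against the URDF before running any processing. The sketch below is not part of GBC; `unknown_mapping_targets` is a hypothetical helper. Because the example mapping uses `base_link` (a link name) alongside joint names, it accepts both link and joint names, which is an assumption about what GBC expects.

```python
import xml.etree.ElementTree as ET


def unknown_mapping_targets(urdf_xml, mapping_table):
    """Return mapping values that match no <link> or <joint> name in the URDF.

    urdf_xml is the URDF file content as a string.
    """
    robot = ET.fromstring(urdf_xml)
    # Top-level <link> and <joint> elements carry the names defined by the robot.
    names = {el.get("name") for el in robot.findall("link")}
    names |= {el.get("name") for el in robot.findall("joint")}
    return sorted(set(mapping_table.values()) - names)
```

For example, read the file with `open(cfg.urdf_path).read()` and pass it together with `cfg.mapping_table`; any names returned are absent from the URDF and most likely misspelled.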

⚙️ Processing Parameters

  • interpolate_fps: Target frame rate for motion data (default: 50 Hz)
  • batch_size: Processing batch size (adjust for your GPU memory)
  • filter_cutoff: Low-pass filter frequency for motion smoothing
  • device: Computing device ("cuda" or "cpu")
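
To make the role of `interpolate_fps` concrete, here is a minimal sketch of resampling one scalar channel to a target frame rate with linear interpolation. It illustrates the general idea only and is not GBC's implementation, which may treat joint rotations differently; `resample_linear` is a hypothetical helper and assumes integer frame rates.

```python
def resample_linear(samples, src_fps, dst_fps):
    """Resample a list of scalar samples from src_fps to dst_fps.

    Assumes integer frame rates. Output frame k is taken at time k / dst_fps
    and linearly interpolated between its two neighbouring source frames.
    """
    if len(samples) < 2:
        return list(samples)
    # Number of output frames that fit inside the source clip's duration.
    n_out = ((len(samples) - 1) * dst_fps) // src_fps + 1
    out = []
    for k in range(n_out):
        pos = k * src_fps / dst_fps          # position in source-frame units
        i = min(int(pos), len(samples) - 2)  # index of the left neighbour
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out
```

Going from 100 Hz to 50 Hz this keeps every second frame; going from 120 Hz to 50 Hz each output frame falls between two source frames and is blended accordingly.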

Verification

After configuration, test your setup:

# Test configuration loading (uses the my_robot_config.py file created above)
from my_robot_config import MyDataConverterCfg

cfg = MyDataConverterCfg()
print("SMPL+H model:", cfg.smplh_model_path)
print("Robot URDF:", cfg.urdf_path)
print("AMASS data:", cfg.root_dir)
print("Device:", cfg.device)

# Verify files exist
import os
assert os.path.exists(cfg.smplh_model_path), "SMPL+H model not found!"
assert os.path.exists(cfg.urdf_path), "Robot URDF not found!"
assert os.path.exists(cfg.root_dir), "AMASS dataset not found!"

print("✅ Data preparation configuration verified!")

💡 Future Workflow

Once configured, you'll use these configuration classes in the upcoming data preparation and motion retargeting tutorials. Each processing script will reference the appropriate configuration class to ensure consistent paths and parameters across your workflow.

🎉 Data Preparation Complete!

Congratulations! You've successfully completed the data preparation phase. Your GBC installation now has:

  1. AMASS Dataset: High-quality human motion data
  2. SMPL+H Models: Human body representation parameters
  3. DMPL Models: Additional body shape parameters
  4. Path Configuration: All file paths properly configured
  5. Processing Configuration: Data preparation settings ready

You're now ready to proceed to the next phase: Robot Configuration and Motion Retargeting!

🔧 Troubleshooting

Common Configuration Issues

Issue: Import errors when loading configuration

  • Solution: Ensure GBC is properly installed and all required packages are available

Issue: File not found errors during verification

  • Solution: Double-check all file paths and ensure downloads completed successfully

Issue: GPU memory errors during processing

  • Solution: Reduce batch_size in your configuration or switch to device="cpu"

Issue: Robot joint mapping errors

  • Solution: Verify your robot's URDF file and update the mapping_table accordingly