NuScenes Dataset Tutorial: Coordinate Transformations and Bounding Box Processing¶

Table of Contents¶

Introduction
Dataset Structure
Coordinate Systems
Coordinate Transformations
3D Bounding Box Processing
2D Projection Pipeline
Code Implementation
Common Issues and Solutions
Best Practices

Introduction¶

The NuScenes dataset is a large-scale autonomous driving dataset that provides multimodal sensor data including cameras, LiDAR, and radar. One of the most challenging aspects of working with NuScenes is understanding and correctly implementing the coordinate system transformations required to project 3D annotations onto 2D camera images.

This tutorial focuses on the critical processes of:

Coordinate System Transformations: Converting between global, ego vehicle, and camera coordinate systems
3D Bounding Box Processing: Generating and manipulating 3D bounding boxes
2D Projection: Projecting 3D bounding boxes onto camera images
Ensuring Correctness: Validation techniques and common pitfalls

Dataset Structure¶

NuScenes organizes data hierarchically:

nuscenes/
├── samples/           # Keyframes (2Hz) with all sensor data
├── sweeps/           # Intermediate (non-keyframe) sensor frames, e.g. LiDAR at 20Hz
├── maps/             # HD maps for each location
└── v1.0-trainval/    # Annotation files (JSON)
    ├── sample.json
    ├── sample_data.json
    ├── sample_annotation.json
    ├── ego_pose.json
    ├── calibrated_sensor.json
    └── ...

Dataset Annotation Format¶

NuScenes provides comprehensive annotations in JSON format, with all spatial annotations defined in the global coordinate system.

Annotation Types and Coordinate Systems¶

1. 3D Bounding Box Annotations¶

File: sample_annotation.json
Coordinate System: Global coordinates
Format: Center + Size + Orientation

{
    "token": "unique_annotation_id",
    "sample_token": "sample_id",
    "instance_token": "object_instance_id",
    "category_name": "car",
    "translation": [x, y, z],           # 3D center in global coordinates (meters)
    "size": [width, length, height],    # Bounding box dimensions (meters)
    "rotation": [w, x, y, z],          # Quaternion orientation in global frame
    "visibility_token": "2",            # Visibility level ("1"-"4"), stored as a token string
    "attribute_tokens": ["moving"],     # Object attributes
    "num_lidar_pts": 150,              # Number of LiDAR points inside box
    "num_radar_pts": 5                 # Number of radar points inside box
}

Key Points:

Translation: 3D center position [x, y, z] in global coordinates
Size: Box dimensions [width, length, height] in meters
Rotation: Quaternion [w, x, y, z] representing orientation in global frame
Coordinate Convention: Right-handed system (X=East, Y=North, Z=Up)

2. Ego Vehicle Pose¶

File: ego_pose.json
Coordinate System: Global coordinates
Purpose: Vehicle position and orientation at each timestamp

{
    "token": "ego_pose_token",
    "timestamp": 1532402927647951,
    "translation": [x, y, z],          # Ego position in global coordinates
    "rotation": [w, x, y, z]           # Ego orientation quaternion in global frame
}

3. Sensor Calibration¶

File: calibrated_sensor.json
Coordinate System: Relative to ego vehicle
Purpose: Sensor position and orientation relative to ego vehicle

{
    "token": "sensor_calibration_token",
    "sensor_token": "sensor_id",
    "translation": [x, y, z],          # Sensor position relative to ego vehicle
    "rotation": [w, x, y, z],          # Sensor orientation relative to ego vehicle
    "camera_intrinsic": [[fx, 0, cx],  # Camera intrinsic matrix (cameras only)
                         [0, fy, cy],
                         [0, 0, 1]]
}

4. Sample Data¶

File: sample_data.json
Purpose: Links sensor data files to timestamps and calibrations

{
    "token": "sample_data_token",
    "sample_token": "sample_id",
    "ego_pose_token": "ego_pose_id",
    "calibrated_sensor_token": "sensor_calibration_id",
    "filename": "samples/CAM_FRONT/n015-2018-07-24-11-22-45+0800__CAM_FRONT__1532402927612460.jpg",
    "fileformat": "jpg",
    "timestamp": 1532402927612460,
    "is_key_frame": true
}

Coordinate Transformation Process Using Sample Data¶

The sample data structure enables the complete coordinate transformation pipeline. Here’s how to use the linked data for transformations:

Step-by-Step Transformation Workflow¶

1. Load Required Data Using Sample Data Tokens

# Using the sample_data.json information
sample_data = {
    "token": "sample_data_token",
    "sample_token": "sample_id", 
    "ego_pose_token": "ego_pose_id",
    "calibrated_sensor_token": "sensor_calibration_id",
    "filename": "samples/CAM_FRONT/n015-2018-07-24-11-22-45+0800__CAM_FRONT__1532402927612460.jpg",
    "timestamp": 1532402927612460
}

# Load corresponding data using tokens
ego_pose = nusc.get('ego_pose', sample_data['ego_pose_token'])
sensor_calibration = nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
sample_annotations = nusc.get('sample', sample_data['sample_token'])['anns']
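
The token lookups above boil down to following foreign keys between JSON tables. The sketch below reproduces that idea without the devkit, assuming each annotation file is a JSON list of records that carry a `token` field. The helper names `build_token_index` and `load_table` are hypothetical, and the record values are illustrative only.

```python
import json
from pathlib import Path


def build_token_index(records):
    """Map each record's 'token' to the record itself for constant-time lookups."""
    return {record["token"]: record for record in records}


def load_table(annotation_dir, table_name):
    """Load one table, e.g. 'ego_pose' -> <annotation_dir>/ego_pose.json (hypothetical helper)."""
    with open(Path(annotation_dir) / f"{table_name}.json", "r") as f:
        return build_token_index(json.load(f))


# Tiny in-memory stand-ins for the JSON tables (values are illustrative only)
ego_pose_table = build_token_index([
    {"token": "ego_pose_id", "translation": [463.12, 1080.45, 1.84],
     "rotation": [0.9659, 0.0, 0.0, 0.2588]},
])
sample_data_record = {"token": "sample_data_token", "ego_pose_token": "ego_pose_id"}

# Follow the foreign key from sample_data to ego_pose
linked_ego_pose = ego_pose_table[sample_data_record["ego_pose_token"]]
print(linked_ego_pose["translation"])
```

Indexing each table once keeps every later lookup cheap, which matters when iterating over many annotations.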

2. Extract Transformation Matrices

# From ego_pose.json (Global coordinates)
ego_pose_data = {
    "translation": [463.12, 1080.45, 1.84],    # Ego position in global coordinates
    "rotation": [0.9659, 0.0, 0.0, 0.2588]     # Ego orientation quaternion
}

# From calibrated_sensor.json (Relative to ego vehicle)  
sensor_calibration_data = {
    "translation": [1.70, 0.0, 1.54],          # Camera position relative to ego
    "rotation": [0.5, -0.5, 0.5, -0.5],        # Front camera: optical axis (camera +Z) along ego +X
    "camera_intrinsic": [[1266.4, 0.0, 816.3], # Camera intrinsic matrix
                         [0.0, 1266.4, 491.5],
                         [0.0, 0.0, 1.0]]
}

3. Complete Transformation Pipeline Example

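import numpy as np
from pyquaternion import Quaternion
# transform_matrix() and view_points() are assumed here to be the nuScenes devkit helpers;
# this tutorial's own versions can be substituted
from nuscenes.utils.geometry_utils import transform_matrix, view_points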

# Example: Transform 3D bounding box from global to image coordinates

# Step 1: Get 3D bounding box in global coordinates
bbox_annotation = {
    "translation": [465.2, 1085.1, 1.2],       # Box center in global coordinates
    "size": [1.8, 4.5, 1.5],                   # [width, length, height]
    "rotation": [0.9848, 0.0, 0.0, 0.1736]     # Box orientation quaternion
}

# Generate 8 corners of 3D bounding box in global coordinates
box_corners_global = get_3d_box_corners(
    bbox_annotation['translation'],
    bbox_annotation['size'], 
    bbox_annotation['rotation']
)

# Step 2: Global → Ego Vehicle transformation
ego_translation = np.array(ego_pose_data['translation'])
ego_rotation = Quaternion(ego_pose_data['rotation'])
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)

box_corners_ego = global_to_ego @ np.vstack([box_corners_global, np.ones((1, 8))])

# Step 3: Ego Vehicle → Camera transformation  
cam_translation = np.array(sensor_calibration_data['translation'])
cam_rotation = Quaternion(sensor_calibration_data['rotation'])
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)

box_corners_camera = ego_to_cam @ box_corners_ego

# Step 4: Camera → Image projection
camera_intrinsic = np.array(sensor_calibration_data['camera_intrinsic'])
image_points = view_points(box_corners_camera[:3, :], camera_intrinsic, normalize=True)

# Step 5: Extract 2D bounding box
x_coords = image_points[0, :]
y_coords = image_points[1, :]
bbox_2d = [min(x_coords), min(y_coords), max(x_coords), max(y_coords)]

print(f"2D Bounding Box: {bbox_2d}")
# Prints [x_min, y_min, x_max, y_max] in pixel coordinates; values may fall outside the image

4. Data Flow Visualization

Sample Data Token Flow:
┌─────────────────┐    ego_pose_token    ┌─────────────────┐
│   sample_data   │ ──────────────────→  │    ego_pose     │
│                 │                      │ (Global coords) │
└─────────────────┘                      └─────────────────┘
         │                                        │
         │ calibrated_sensor_token                │ translation, rotation
         ▼                                        ▼
┌─────────────────┐                      ┌─────────────────┐
│ sensor_calib    │                      │ Transform Matrix│
│ (Ego relative)  │ ──────────────────→  │ Global → Ego    │
└─────────────────┘                      └─────────────────┘
         │                                        │
         │ camera_intrinsic                       │
         ▼                                        ▼
┌─────────────────┐                      ┌─────────────────┐
│ Camera Matrix   │                      │ 3D → 2D Project │
│ (Pixel coords)  │ ──────────────────→  │ Final Result    │
└─────────────────┘                      └─────────────────┘

Key Implementation Points¶

Token-Based Data Linking: Use ego_pose_token and calibrated_sensor_token to fetch transformation data
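Timestamp Synchronization: Each sample_data record carries its own timestamp and ego_pose_token; use the ego pose linked to the camera frame you are projecting into, since sensors in the same sample are captured at slightly different times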
Coordinate System Chain: Global → Ego → Camera → Image coordinates
Matrix Composition: Combine transformation matrices for efficient batch processing

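# Efficient batch transformation: fold everything into one 3x4 projection matrix
global_to_cam = ego_to_cam @ global_to_ego                      # 4x4
combined_transform = camera_intrinsic @ global_to_cam[:3, :]    # 3x3 @ 3x4 -> 3x4
projected = combined_transform @ global_3d_points_homogeneous   # 3xN, rows are (u*Z, v*Z, Z)
final_2d_points = projected[:2, :] / projected[2:3, :]          # divide by depth Z to get pixels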

Practical Usage Example¶

def process_sample_data(nusc, sample_data_token):
    """Complete pipeline from sample data token to 2D projections"""
    
    # 1. Load all required data using tokens
    sample_data = nusc.get('sample_data', sample_data_token)
    ego_pose = nusc.get('ego_pose', sample_data['ego_pose_token'])
    sensor_calib = nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
    
    # 2. Get all annotations for this sample
    sample = nusc.get('sample', sample_data['sample_token'])
    
    # 3. Process each annotation
    results = []
    for ann_token in sample['anns']:
        annotation = nusc.get('sample_annotation', ann_token)
        
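        # Transform from global to image coordinates
        # (argument order follows the project_3d_box_to_2d() signature shown later)
        corners_2d, _ = project_3d_box_to_2d(
            annotation['translation'], annotation['size'], annotation['rotation'],
            sensor_calib['translation'], sensor_calib['rotation'],
            sensor_calib['camera_intrinsic'],
            ego_pose['translation'], ego_pose['rotation']
        )
        bbox_2d = [corners_2d[0].min(), corners_2d[1].min(),
                   corners_2d[0].max(), corners_2d[1].max()]
        
        results.append({
            'category': annotation['category_name'],
            'bbox_2d': bbox_2d,
            'visibility': annotation['visibility_token']
        })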
    
    return results

This workflow demonstrates how the sample data structure enables seamless coordinate transformations by linking all necessary calibration and pose information through tokens.

Annotation Coordinate System Summary¶

Annotation Type	Coordinate System	Reference Frame	Units
3D Bounding Boxes	Global	World coordinates	Meters
Ego Poses	Global	World coordinates	Meters
Sensor Calibration	Ego Vehicle	Relative to ego	Meters
Camera Intrinsics	Camera	Pixel coordinates	Pixels

Object Categories¶

NuScenes includes 23 object categories across different domains:

Vehicles: car, truck, bus (bendy and rigid), trailer, construction vehicle, emergency vehicle (ambulance and police), motorcycle, bicycle

Humans: adult, child, police_officer, construction_worker, personal_mobility, stroller, wheelchair

Movable objects: traffic_cone, barrier, debris, pushable_pullable

Static objects: bicycle_rack

Animals: animal (rare cases)

Visibility Levels¶

Annotations include visibility information for occlusion handling:

Level 1: 0-40% visible
Level 2: 40-60% visible
Level 3: 60-80% visible
Level 4: 80-100% visible

Implementation Notes¶

When working with NuScenes annotations:

All 3D annotations are in global coordinates - transform to desired coordinate system
Quaternions use [w, x, y, z] format - be careful with different libraries’ conventions
Box dimensions follow [width, length, height] - width=left-right, length=front-back
Use num_lidar_pts and num_radar_pts for filtering low-quality annotations
Check visibility levels for occlusion-aware training
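
The filtering advice in the notes above can be sketched as a small helper. It assumes annotation records are plain dictionaries and that visibility is stored either as a token string ("1" to "4") or as an integer; the function name and thresholds are illustrative choices, not part of the dataset.

```python
def filter_annotations(annotations, min_lidar_pts=1, min_visibility=2):
    """Keep annotations with enough LiDAR support and a high enough visibility level."""
    kept = []
    for ann in annotations:
        # Visibility may be stored as a string token ("1".."4") or as an int
        level = int(ann.get("visibility_token") or ann.get("visibility") or 0)
        if ann.get("num_lidar_pts", 0) >= min_lidar_pts and level >= min_visibility:
            kept.append(ann)
    return kept


annotations = [
    {"category_name": "car", "num_lidar_pts": 150, "visibility_token": "4"},
    {"category_name": "car", "num_lidar_pts": 0, "visibility_token": "4"},     # no LiDAR hits
    {"category_name": "adult", "num_lidar_pts": 12, "visibility_token": "1"},  # mostly occluded
]
print([a["category_name"] for a in filter_annotations(annotations)])
```

Only the first record survives here: the second has no LiDAR points and the third is below the visibility threshold.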

Coordinate Systems¶

Understanding NuScenes coordinate systems is crucial for correct transformations:

1. Global Coordinate System¶

Origin: Arbitrary reference point in the world
Axes: Right-handed coordinate system
- X: East direction
- Y: North direction
- Z: Up direction (gravity opposite)
Units: Meters
Usage: All ego poses and annotations are defined in global coordinates

2. Ego Vehicle Coordinate System¶

Origin: Center of the ego vehicle’s rear axle
Axes: Right-handed coordinate system relative to vehicle
- X: Forward direction (vehicle’s front)
- Y: Left direction (vehicle’s left side)
- Z: Up direction (vehicle’s roof)
Units: Meters
Usage: Sensor calibrations are defined relative to ego vehicle

3. Camera Coordinate System¶

Origin: Camera’s optical center
Axes: Standard computer vision convention
- X: Right direction (image width)
- Y: Down direction (image height)
- Z: Forward direction (into the scene)
Units: Meters
Usage: 3D points in camera space before projection

4. Image Coordinate System¶

Origin: Top-left corner of the image
Axes: 2D pixel coordinates
- u: Horizontal axis (0 to image_width-1)
- v: Vertical axis (0 to image_height-1)
Units: Pixels
Usage: Final 2D bounding box coordinates

Coordinate Transformations¶

The transformation pipeline varies depending on the visualization type. Here’s a comprehensive overview:

Transformation Pipelines by Visualization Type¶

1. 2D Image Bounding Box¶

Global → Ego Vehicle → Camera → Image (2D Projection)

Purpose: Project 3D objects onto 2D camera images for object detection
Output: 2D rectangular bounding boxes in pixel coordinates

2. 3D Image Bounding Box¶

Global → Ego Vehicle → Camera → Image (3D Wireframe)

Purpose: Visualize 3D object structure overlaid on camera images
Output: 3D wireframe boxes projected onto 2D image plane

3. BEV (Bird’s Eye View)¶

Global → Ego Vehicle → BEV Projection

Purpose: Top-down view for spatial understanding and path planning
Output: 2D boxes in ego vehicle coordinate system (X-Y plane)

4. 3D LiDAR Bounding Box¶

Global → Ego Vehicle → LiDAR Sensor

Purpose: 3D object detection and tracking in point cloud data
Output: 3D oriented bounding boxes in LiDAR coordinate system

Mathematical Foundation¶

Each transformation uses homogeneous coordinates and 4x4 transformation matrices:

T = [R  t]
    [0  1]

Where:

R: 3x3 rotation matrix
t: 3x1 translation vector
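
Because R is orthonormal, the inverse of T has the closed form [R^T, -R^T t; 0, 1], which is what an `inverse=True` option would typically compute. The following is a minimal NumPy sketch of that idea, not the tutorial's or the devkit's implementation; `make_transform` is a hypothetical name and the pose values are illustrative.

```python
import numpy as np


def quaternion_to_rotation_matrix(q_wxyz):
    """Convert a [w, x, y, z] quaternion to a 3x3 rotation matrix."""
    q = np.asarray(q_wxyz, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize defensively
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def make_transform(translation, rotation_wxyz, inverse=False):
    """Build T = [[R, t], [0, 1]], or its inverse [[R^T, -R^T t], [0, 1]]."""
    R = quaternion_to_rotation_matrix(rotation_wxyz)
    t = np.asarray(translation, dtype=float)
    T = np.eye(4)
    if inverse:
        T[:3, :3] = R.T
        T[:3, 3] = -R.T @ t
    else:
        T[:3, :3] = R
        T[:3, 3] = t
    return T


pose_t = [463.12, 1080.45, 1.84]
pose_q = [0.9659, 0.0, 0.0, 0.2588]
ego_to_global = make_transform(pose_t, pose_q)
global_to_ego = make_transform(pose_t, pose_q, inverse=True)

# The two matrices should cancel out
print(np.allclose(global_to_ego @ ego_to_global, np.eye(4)))
```

Using the closed form avoids a general matrix inversion and keeps the result exactly rigid.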

Detailed Transformation Processes by Visualization Type¶

1. 2D Image Bounding Box Transformation¶

Complete Pipeline: Global → Ego Vehicle → Camera → Image (2D Projection)

Step 1: Global to Ego Vehicle¶

# Get ego pose (position and orientation in global coordinates)
ego_translation = [x, y, z]  # ego position in global coordinates
ego_rotation = [w, x, y, z]  # ego orientation quaternion

# Create inverse transformation matrix (global → ego)
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)

# Apply transformation to 3D box corners
box_corners_ego = global_to_ego @ box_corners_global_homogeneous

Step 2: Ego Vehicle to Camera¶

# Get camera calibration (position and orientation relative to ego)
cam_translation = [x, y, z]  # camera position relative to ego
cam_rotation = [w, x, y, z]  # camera orientation quaternion

# Create inverse transformation matrix (ego → camera)
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)

# Apply transformation
box_corners_camera = ego_to_cam @ box_corners_ego

Step 3: Camera to Image (2D Projection)¶

# Project 3D camera coordinates to 2D image pixels
# Using camera intrinsic matrix (drop the homogeneous row: view_points expects a 3xN array)
image_points = view_points(box_corners_camera[:3, :], camera_intrinsic, normalize=True)

# Extract 2D bounding box from projected corners
x_coords = image_points[0, :]  # u coordinates
y_coords = image_points[1, :]  # v coordinates

# Create 2D bounding box
bbox_2d = [min(x_coords), min(y_coords), max(x_coords), max(y_coords)]

Key Characteristics:

Output: 2D rectangular box [x_min, y_min, x_max, y_max] in pixel coordinates
Use Case: Object detection, tracking in camera images
Coordinate System: Image pixels (u, v)
Implementation: See project_3d_box_to_2d() function

2. 3D Image Bounding Box Transformation¶

Complete Pipeline: Global → Ego Vehicle → Camera → Image (3D Wireframe)

Steps 1-2: Same as 2D (Global → Ego → Camera)¶

The first two steps are identical to 2D bounding box transformation.

Step 3: 3D Wireframe Projection¶

# Project all 8 corners of 3D box to image
corners_3d_camera = ego_to_cam @ (global_to_ego @ box_corners_global_homogeneous)  # 4x8
image_corners = view_points(corners_3d_camera[:3, :], camera_intrinsic, normalize=True)

# Define 3D box edges (12 edges connecting 8 corners)
edges = [
    [0, 1], [1, 2], [2, 3], [3, 0],  # bottom face
    [4, 5], [5, 6], [6, 7], [7, 4],  # top face
    [0, 4], [1, 5], [2, 6], [3, 7]   # vertical edges
]

# Draw wireframe on image
for edge in edges:
    start_point = image_corners[:, edge[0]]
    end_point = image_corners[:, edge[1]]
    # Draw line from start_point to end_point

Key Characteristics:

Output: 8 projected corner points + 12 connecting edges
Use Case: 3D object visualization on camera images
Coordinate System: Image pixels (u, v); depth is not kept after normalization, so read it from the camera-frame corners if needed
Visualization: Wireframe overlay showing 3D structure

3. BEV (Bird’s Eye View) Transformation¶

Simplified Pipeline: Global → Ego Vehicle → BEV Projection

Step 1: Global to Ego Vehicle (Same as above)¶

global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
box_corners_ego = global_to_ego @ box_corners_global_homogeneous

Step 2: BEV Projection (Top-Down View)¶

# Extract X-Y coordinates (ignore Z for top-down view)
bev_points = box_corners_ego[:2, :]  # Take only X, Y coordinates

# Convert to BEV image coordinates
# Ego frame: X-forward, Y-left
# BEV image (pixel convention): x grows to the right, y grows downward
bev_x = -bev_points[1, :] * scale + offset_x  # -Y_ego → X_bev (vehicle's right on the image's right)
bev_y = -bev_points[0, :] * scale + offset_y  # -X_ego → Y_bev (forward at the top of the image)

# Create 2D polygon from the four bottom-face corners (0-3); the top face covers nearly the same footprint
bev_polygon = list(zip(bev_x[:4], bev_y[:4]))

Key Characteristics:

Output: 2D polygon in BEV coordinate system
Use Case: Path planning, spatial reasoning, multi-object tracking
Coordinate System: Top-down view (X-Y plane of ego vehicle)
Advantages: No occlusion, consistent scale, easy distance measurement
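
As a rough illustration of the ego-to-BEV mapping, the sketch below places the ego vehicle at the image center and uses the usual pixel convention (columns grow rightward, rows grow downward). The function name, the centering choice, and the scale are assumptions made for this example.

```python
import numpy as np


def ego_to_bev_pixels(points_ego_xy, scale, image_size):
    """Map 2xN ego-frame points (x forward, y left, meters) to 2xN (column, row) pixels."""
    width, height = image_size
    x_forward, y_left = np.asarray(points_ego_xy, dtype=float)
    cols = width / 2 - y_left * scale      # vehicle's left appears on the image's left
    rows = height / 2 - x_forward * scale  # forward appears toward the top of the image
    return np.vstack([cols, rows])


# One point 10 m ahead of the ego vehicle, one point 5 m to its left
points = np.array([[10.0, 0.0],
                   [0.0, 5.0]])
bev = ego_to_bev_pixels(points, scale=10.0, image_size=(400, 400))
print(bev.T)
```

With 10 pixels per meter on a 400x400 canvas, the point ahead lands above the center and the point to the left lands left of the center, so the view is not mirrored.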

4. 3D LiDAR Bounding Box Transformation¶

Simplified Pipeline: Global → Ego Vehicle → LiDAR Sensor

Step 1: Global to Ego Vehicle (Same as above)¶

global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
box_corners_ego = global_to_ego @ box_corners_global_homogeneous

Step 2: Ego to LiDAR Sensor (if needed)¶

# In nuScenes the LIDAR_TOP frame is both offset and rotated relative to the ego frame,
# so the two should not be treated as interchangeable.
# Use the LiDAR's calibrated_sensor record:
lidar_translation = [x, y, z]  # LiDAR position relative to ego
lidar_rotation = [w, x, y, z]  # LiDAR orientation quaternion

ego_to_lidar = transform_matrix(lidar_translation, lidar_rotation, inverse=True)
box_corners_lidar = ego_to_lidar @ box_corners_ego

Step 3: 3D Box Representation¶

# 3D bounding box in LiDAR coordinates
# Typically represented as:
# - Center: [x, y, z]
# - Dimensions: [length, width, height]  
# - Orientation: yaw angle or quaternion

box_3d = {
    'center': np.mean(box_corners_lidar[:3, :], axis=1),
    'size': [length, width, height],
    'orientation': yaw_angle
}

Key Characteristics:

Output: 3D oriented bounding box with center, size, and orientation
Use Case: 3D object detection, autonomous driving, robotics
Coordinate System: 3D LiDAR sensor coordinates
Advantages: Direct 3D measurements, no projection distortion
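
The box orientation also has to be re-expressed when changing frames. A minimal sketch, assuming unit quaternions in [w, x, y, z] order and rotations that are dominated by yaw: compose the inverse of the frame rotation with the box rotation, then read off the yaw angle. The helper names are hypothetical and the quaternions are the illustrative values used earlier.

```python
import numpy as np


def quat_multiply(a, b):
    """Hamilton product of two [w, x, y, z] quaternions."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])


def quat_conjugate(q):
    """Inverse rotation for a unit quaternion."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])


def quat_yaw(q):
    """Rotation about the Z axis (radians) encoded in a unit quaternion."""
    w, x, y, z = q
    return np.arctan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))


q_ego_global = np.array([0.9659, 0.0, 0.0, 0.2588])  # ego heading of about 30 degrees
q_box_global = np.array([0.9848, 0.0, 0.0, 0.1736])  # box heading of about 20 degrees

# Box orientation expressed in the ego frame: inverse(ego) * box
q_box_ego = quat_multiply(quat_conjugate(q_ego_global), q_box_global)
yaw_in_ego = quat_yaw(q_box_ego)
print(round(float(np.degrees(yaw_in_ego)), 1))
```

The same composition applies one more time with the LiDAR calibration quaternion to obtain the yaw in the LiDAR frame.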

Transformation Summary¶

Visualization Type	Pipeline	Output Format	Primary Use Case
2D Image BBox	Global→Ego→Camera→Image	`[x_min, y_min, x_max, y_max]`	Object detection
3D Image BBox	Global→Ego→Camera→Image	8 corners + 12 edges	3D visualization
BEV	Global→Ego→BEV	2D polygon	Path planning
3D LiDAR BBox	Global→Ego→LiDAR	Center + Size + Orientation	3D detection

Original Transformation Steps (for reference)¶

1. Global to Ego Vehicle Transformation¶

Purpose: Transform from world coordinates to ego vehicle coordinates

Implementation: See transform_matrix() function

# Get ego pose (position and orientation in global coordinates)
ego_translation = [x, y, z]  # ego position in global coordinates
ego_rotation = [w, x, y, z]  # ego orientation quaternion

# Create inverse transformation matrix (global → ego)
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)

# Apply transformation
points_ego = global_to_ego @ points_global_homogeneous

Key Points:

Use inverse=True to get global→ego transformation
Ego pose represents ego vehicle’s position/orientation in global coordinates
Inverse transformation moves from global to ego coordinate system

2. Ego Vehicle to Camera Transformation¶

Purpose: Transform from ego vehicle coordinates to camera coordinates

Implementation: See transform_matrix() function

# Get camera calibration (position and orientation relative to ego)
cam_translation = [x, y, z]  # camera position relative to ego
cam_rotation = [w, x, y, z]  # camera orientation quaternion

# Create inverse transformation matrix (ego → camera)
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)

# Apply transformation
points_camera = ego_to_cam @ points_ego_homogeneous

Key Points:

Camera calibration is relative to ego vehicle
Use inverse=True to get ego→camera transformation
Results in 3D points in camera coordinate system

3. Camera to Image Transformation¶

Purpose: Project 3D camera coordinates to 2D image pixels

Implementation: See view_points() function

# Camera intrinsic matrix
K = [[fx,  0, cx],
     [ 0, fy, cy],
     [ 0,  0,  1]]

# Project 3D points to 2D
points_2d = view_points(points_3d_camera, K, normalize=True)

Key Points:

Uses perspective projection: u = fx * X/Z + cx, v = fy * Y/Z + cy
Points with Z ≤ 0 are behind the camera and invalid
normalize=True performs the division by Z coordinate
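
The projection formula above can be checked with a few lines of NumPy. This is a sketch of the pinhole model only, assuming 3xN camera-frame input; `project_points` is a hypothetical stand-in and not the view_points() implementation itself.

```python
import numpy as np


def project_points(points_cam, K, normalize=True):
    """Project 3xN camera-frame points with a 3x3 intrinsic matrix K."""
    points_cam = np.asarray(points_cam, dtype=float)
    assert points_cam.shape[0] == 3, "expected a 3xN array"
    projected = np.asarray(K, dtype=float) @ points_cam
    if normalize:
        projected = projected / projected[2:3, :]  # divide every row by depth Z
    return projected


K = np.array([[1266.4, 0.0, 816.3],
              [0.0, 1266.4, 491.5],
              [0.0, 0.0, 1.0]])

# One point on the optical axis and one 1 m to the right, both 10 m ahead
points = np.array([[0.0, 1.0],
                   [0.0, 0.0],
                   [10.0, 10.0]])
pixels = project_points(points, K)
print(pixels[:2].T)  # the first row is the principal point (cx, cy)
```

A point on the optical axis always lands on the principal point, which makes it a convenient sanity check for any intrinsic matrix.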

3D Bounding Box Processing¶

Bounding Box Representation¶

NuScenes represents 3D bounding boxes with:

Center: [x, y, z] in global coordinates
Size: [width, length, height] in meters
Rotation: Quaternion [w, x, y, z] in global coordinates

Corner Point Generation¶

Implementation: See get_3d_box_corners() function

The function generates 8 corner points of a 3D bounding box:

def get_3d_box_corners(center, size, rotation):
    """
    Generate 8 corner points of 3D bounding box.
    
    Corner ordering (NuScenes convention):
    Bottom face (z = -height/2):
      1 ---- 0
      |      |
      |      |  
      2 ---- 3
    
    Top face (z = +height/2):
      5 ---- 4
      |      |
      |      |
      6 ---- 7
    """

Process:

Define unit box corners centered at origin
Scale by box dimensions
Apply rotation using quaternion
Translate to final position

Key Points:

Corner ordering follows NuScenes convention
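Vehicle front direction aligns with +X axis in box coordinates (length along X, width along Y, height along Z), which is the convention the annotation quaternion assumes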
Rotation is applied before translation
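
The four-step process can be sketched as follows. This is one possible implementation under stated assumptions, not the tutorial's get_3d_box_corners() itself: it returns a 3x8 array, uses the bottom-face-first ordering from the docstring above, takes size as [width, length, height], and puts length along the box's X axis. The name `box_corners` is hypothetical.

```python
import numpy as np


def quaternion_to_rotation_matrix(q_wxyz):
    """Convert a [w, x, y, z] quaternion to a 3x3 rotation matrix."""
    q = np.asarray(q_wxyz, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def box_corners(center, size_wlh, rotation_wxyz):
    """Return a 3x8 array of box corners (bottom face 0-3, top face 4-7)."""
    width, length, height = size_wlh
    # 1. Unit template centered at the origin (x forward, y left, z up)
    template = np.array([
        [1, 1, -1, -1, 1, 1, -1, -1],   # x: front (+) / rear (-)
        [-1, 1, 1, -1, -1, 1, 1, -1],   # y: left (+) / right (-)
        [-1, -1, -1, -1, 1, 1, 1, 1],   # z: top (+) / bottom (-)
    ], dtype=float)
    # 2. Scale by half the box dimensions
    scaled = template * np.array([[length], [width], [height]]) / 2.0
    # 3. Rotate, then 4. translate
    rotated = quaternion_to_rotation_matrix(rotation_wxyz) @ scaled
    return rotated + np.asarray(center, dtype=float).reshape(3, 1)


corners = box_corners([465.2, 1085.1, 1.2], [1.8, 4.5, 1.5], [0.9848, 0.0, 0.0, 0.1736])
print(corners.shape)
print(corners.mean(axis=1))  # the corner mean recovers the box center
```

Transpose the result if the rest of a pipeline expects an 8x3 layout.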

2D Projection Pipeline¶

Complete Pipeline Implementation¶

Implementation: See project_3d_box_to_2d() function

The complete pipeline transforms 3D bounding boxes to 2D image coordinates:

def project_3d_box_to_2d(center_3d, size_3d, rotation_3d, 
                        cam_translation, cam_rotation, camera_intrinsic,
                        ego_translation, ego_rotation, debug=False):
    """
    Complete transformation pipeline:
    Global → Ego Vehicle → Camera → Image
    """

Step-by-Step Process¶

Step 1: Generate 3D Box Corners¶

# Generate 8 corner points in global coordinates
corners_3d_global = get_3d_box_corners(center_3d, size_3d, rotation_3d)

Step 2: Global → Ego Transformation¶

# Transform from global to ego vehicle coordinates
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
corners_ego = apply_transformation(global_to_ego, corners_3d_global)

Step 3: Ego → Camera Transformation¶

# Transform from ego to camera coordinates
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)
corners_camera = apply_transformation(ego_to_cam, corners_ego)

Step 4: Camera → Image Projection¶

# Project 3D camera coordinates to 2D image pixels
corners_2d = view_points(corners_camera.T, camera_intrinsic, normalize=True)
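
The apply_transformation() helper used in Steps 2 and 3 is not shown in this section. One plausible reading, assuming it takes and returns N×3 arrays (which is what the `.T` in Step 4 suggests), is sketched below with illustrative values.

```python
import numpy as np


def apply_transformation(transform, points):
    """Apply a 4x4 homogeneous transform to an Nx3 array and return an Nx3 array."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])  # Nx4
    transformed = (transform @ homogeneous.T).T                       # Nx4
    return transformed[:, :3]


# A pure translation by (-1, -2, -3) as a quick check
shift = np.eye(4)
shift[:3, 3] = [-1.0, -2.0, -3.0]
moved = apply_transformation(shift, [[1.0, 2.0, 3.0], [2.0, 2.0, 3.0]])
print(moved)
```

Keeping the homogeneous bookkeeping inside one helper avoids repeating the append-ones step at every stage of the pipeline.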

Visibility and Validation¶

Implementation: See box_in_image() function

def box_in_image(corners_3d, corners_2d, intrinsic, imsize, vis_level):
    """
    Check if bounding box is visible in image.
    
    vis_level options:
    - BoxVisibility.ALL: All corners must be inside image
    - BoxVisibility.ANY: At least one corner must be visible  
    - BoxVisibility.NONE: No visibility requirement
    """

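A simplified version of such a visibility check might look like the sketch below. It assumes 3xN camera-frame corners and their projected pixels, and it uses plain strings in place of the BoxVisibility values; the function name and the depth margin are illustrative, and the devkit's own check may apply stricter rules.

```python
import numpy as np


def corners_visible(corners_cam, corners_2d, imsize, mode="any", min_depth=0.1):
    """Check whether projected corners fall inside the image and in front of the camera."""
    corners_cam = np.asarray(corners_cam, dtype=float)
    corners_2d = np.asarray(corners_2d, dtype=float)
    width, height = imsize
    in_front = corners_cam[2, :] > min_depth  # small margin avoids near-zero depths
    inside = (
        (corners_2d[0, :] >= 0) & (corners_2d[0, :] < width)
        & (corners_2d[1, :] >= 0) & (corners_2d[1, :] < height)
    )
    visible = inside & in_front
    if mode == "all":
        return bool(np.all(visible))
    if mode == "any":
        return bool(np.any(visible))
    if mode == "none":
        return True
    raise ValueError(f"unknown mode: {mode}")


# Two corners: one inside a 1600x900 image, one far off to the left
cam = np.array([[0.0, -20.0], [0.0, 0.0], [10.0, 10.0]])
pix = np.array([[800.0, -1700.0], [450.0, 450.0]])
print(corners_visible(cam, pix, (1600, 900), mode="any"))
print(corners_visible(cam, pix, (1600, 900), mode="all"))
```
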
Code Implementation¶

Key Functions and Their Roles¶

view_points()
- Handles perspective projection from 3D to 2D
- Applies camera intrinsic matrix
- Normalizes homogeneous coordinates
transform_matrix()
- Creates 4x4 transformation matrices
- Handles translation and rotation
- Supports inverse transformations
quaternion_to_rotation_matrix()
- Converts quaternions to rotation matrices
- Handles both scipy and manual implementations
- Ensures numerical stability
get_3d_box_corners()
- Generates 8 corner points of 3D bounding box
- Follows NuScenes corner ordering convention
- Applies scaling, rotation, and translation
project_3d_box_to_2d()
- Complete transformation pipeline
- Handles all coordinate system conversions
- Provides debug information and error handling

Data Loading and Validation Functions¶

load_nuscenes_data()
- Loads all NuScenes JSON annotation files
- Returns structured dictionary with scenes, samples, annotations
- Handles file validation and error reporting
prepare_sample_data()
- Extracts data for specific sample index
- Organizes annotations, camera data, and sensor info
- Provides ready-to-use data structure for visualization
validate_dataset_structure()
- Validates NuScenes directory structure
- Checks for required folders and annotation files
- Returns boolean validation result
diagnose_dataset_issues()
- Comprehensive dataset health check
- Identifies missing files, corrupted data, and structure issues
- Provides detailed diagnostic report

Visualization and Rendering Functions¶

visualize_lidar_3d_open3d()
- Interactive 3D LiDAR point cloud visualization
- Uses Open3D for high-quality rendering
- Supports 3D bounding box overlays
visualize_bev_with_boxes()
- Bird’s Eye View (BEV) visualization
- Projects LiDAR points to 2D top-down view
- Renders 3D boxes as 2D rectangles in BEV space
visualize_lidar_projection()
- Projects LiDAR points onto camera images
- Color-codes points by distance or intensity
- Handles camera-LiDAR calibration
create_combined_visualization()
- Creates multi-panel visualization layouts
- Combines camera, LiDAR, and BEV views
- Generates publication-ready figures

Drawing and Rendering Utilities¶

draw_3d_box_2d()
- Draws 3D bounding box wireframes on 2D images
- Handles perspective projection and clipping
- Supports custom colors and line styles
draw_2d_bbox()
- Draws 2D bounding boxes on images
- Includes category labels and confidence scores
- Handles image boundary clipping
draw_bev_box_2d()
- Renders 2D boxes in Bird’s Eye View
- Handles rotation and scaling in BEV space
- Supports transparency and color coding

LiDAR Processing Functions¶

load_lidar_points()
- Loads LiDAR point cloud from .pcd.bin files
- Handles different point cloud formats
- Returns numpy array with x, y, z, intensity
transform_lidar_to_ego()
- Transforms LiDAR points to ego vehicle coordinate system
- Applies sensor calibration parameters
- Handles translation and rotation transformations
project_lidar_to_camera()
- Projects 3D LiDAR points to camera image plane
- Applies full transformation pipeline
- Filters points behind camera or outside image
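
For reference, a LiDAR loader can be very small. The sketch below assumes the nuScenes `.pcd.bin` layout of five float32 values per point (x, y, z, intensity, ring index) and demonstrates it on a synthetic file; `load_lidar_bin` is a hypothetical name, not the load_lidar_points() function listed above.

```python
import os
import tempfile

import numpy as np


def load_lidar_bin(path):
    """Read a .pcd.bin file and return an Nx4 array (x, y, z, intensity)."""
    scan = np.fromfile(path, dtype=np.float32)
    # Each point is stored as 5 float32 values: x, y, z, intensity, ring index
    return scan.reshape(-1, 5)[:, :4]


# Round-trip check with a synthetic two-point scan
fake_scan = np.array([[1.0, 2.0, 3.0, 0.5, 7.0],
                      [4.0, 5.0, 6.0, 0.9, 8.0]], dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.pcd.bin")
    fake_scan.tofile(path)
    points = load_lidar_bin(path)

print(points.shape)
```

The returned points are in the LiDAR sensor frame, so they still need the sensor-to-ego transform before they can be compared with ego-frame boxes.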

Coordinate System Utilities¶

get_lidar_calibration_info()
- Extracts LiDAR sensor calibration parameters
- Retrieves translation and rotation from ego vehicle
- Returns calibration dictionary for transformations
process_camera_data()
- Processes camera sensor data and calibration
- Extracts intrinsic and extrinsic parameters
- Prepares camera info for coordinate transformations
process_3d_annotations_for_camera()
- Filters and processes 3D annotations for specific camera
- Applies visibility checks and coordinate transformations
- Returns camera-specific annotation data

Utility and Helper Functions¶

get_2d_bbox_from_3d_projection()
- Computes 2D bounding box from projected 3D corners
- Finds min/max coordinates of projected points
- Handles edge cases and invalid projections
clip_line_to_image()
- Clips 3D box edges to image boundaries
- Implements line-rectangle intersection algorithm
- Prevents drawing outside image bounds
extract_nuscenes_subset()
- Creates smaller dataset subset for testing
- Copies relevant samples and annotations
- Maintains data structure integrity

Usage Example¶

# Validate dataset first
if not validate_dataset_structure(nuscenes_root):
    diagnosis = diagnose_dataset_issues(nuscenes_root)
    print("Dataset issues found:", diagnosis)

# Load NuScenes data
nuscenes_data = load_nuscenes_data(nuscenes_root)
sample_data = prepare_sample_data(nuscenes_data, sample_idx=0)

# Get annotation and camera info
annotation = sample_data['annotations'][0]  # First annotation
camera_info = sample_data['cameras']['CAM_FRONT']

# Extract 3D bounding box parameters
center_3d = annotation['translation']
size_3d = annotation['size'] 
rotation_3d = annotation['rotation']

# Extract camera and ego pose information
cam_translation = camera_info['translation']
cam_rotation = camera_info['rotation']
camera_intrinsic = camera_info['camera_intrinsic']
ego_translation = camera_info['ego_translation']
ego_rotation = camera_info['ego_rotation']

# Project 3D box to 2D
corners_2d, corners_3d = project_3d_box_to_2d(
    center_3d, size_3d, rotation_3d,
    cam_translation, cam_rotation, camera_intrinsic,
    ego_translation, ego_rotation, debug=True
)

# Check visibility
is_visible = box_in_image(
    corners_3d, corners_2d, camera_intrinsic, 
    (1600, 900), BoxVisibility.ANY
)

Common Issues and Solutions¶

1. Incorrect Coordinate System Assumptions¶

Problem: Mixing up coordinate system conventions
Solution: Always verify axis directions and origins

# Global: X=East, Y=North, Z=Up
# Ego: X=Forward, Y=Left, Z=Up  
# Camera: X=Right, Y=Down, Z=Forward

2. Transformation Matrix Order¶

Problem: Applying transformations in wrong order
Solution: Follow the pipeline: Global → Ego → Camera → Image

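# Correct order: build both inverse transforms, then apply global → ego first
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)
corners_camera = ego_to_cam @ (global_to_ego @ corners_global_homogeneous)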

3. Quaternion Normalization¶

Problem: Unnormalized quaternions causing incorrect rotations
Solution: Always normalize quaternions before use

q = np.array(quaternion)
q = q / np.linalg.norm(q)  # Normalize

4. Behind-Camera Points¶

Problem: Points with negative Z coordinates in camera space
Solution: Check depth values and handle appropriately

depths = corners_camera[:, 2]
if np.any(depths <= 0):
    print("Warning: Some points behind camera")
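
One simple way to "handle appropriately" is to drop corners that are behind the camera before taking the min/max, then clip the result to the image. The sketch below does exactly that, assuming 3xN camera-frame corners; the function name and the depth margin are illustrative. Dropping corners can underestimate the true extent of a box that straddles the image plane, so clipping box edges against a near plane is the more accurate alternative.

```python
import numpy as np


def bbox_from_corners(corners_cam, K, imsize, min_depth=0.1):
    """Return [x_min, y_min, x_max, y_max] clipped to the image, or None if nothing is visible."""
    corners_cam = np.asarray(corners_cam, dtype=float)
    in_front = corners_cam[2, :] > min_depth
    if not np.any(in_front):
        return None  # the whole box is behind the camera

    projected = np.asarray(K, dtype=float) @ corners_cam[:, in_front]
    pixels = projected[:2, :] / projected[2:3, :]

    width, height = imsize
    upper = [width - 1, height - 1]
    x_min, y_min = np.clip(pixels.min(axis=1), [0, 0], upper)
    x_max, y_max = np.clip(pixels.max(axis=1), [0, 0], upper)
    if x_max <= x_min or y_max <= y_min:
        return None  # projected box lies entirely outside the image
    return [float(x_min), float(y_min), float(x_max), float(y_max)]


K = np.array([[1266.4, 0.0, 816.3],
              [0.0, 1266.4, 491.5],
              [0.0, 0.0, 1.0]])
# A 2 m x 1 m rectangle facing the camera, 10 m ahead
corners = np.array([[-1.0, 1.0, 1.0, -1.0],
                    [-0.5, -0.5, 0.5, 0.5],
                    [10.0, 10.0, 10.0, 10.0]])
print(bbox_from_corners(corners, K, (1600, 900)))
```
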

5. Homogeneous Coordinates¶

Problem: Forgetting to use homogeneous coordinates for transformations
Solution: Always add 1 as 4th dimension

points_homogeneous = np.ones((points.shape[0], 4))
points_homogeneous[:, :3] = points

Best Practices¶

1. Validation and Testing¶

Always validate transformations with known test cases
Check intermediate results at each transformation step
Use debug mode to trace coordinate transformations
Verify corner ordering matches NuScenes convention

2. Error Handling¶

Check for points behind camera (Z ≤ 0)
Validate input parameters (non-zero quaternions, valid matrices)
Handle edge cases (very small/large bounding boxes)
Provide meaningful error messages

3. Performance Optimization¶

Batch process multiple boxes when possible
Cache transformation matrices when processing multiple annotations
Use vectorized operations instead of loops
Pre-compute frequently used values

4. Code Organization¶

Separate coordinate transformation logic from visualization
Use consistent parameter naming across functions
Document coordinate system conventions clearly
Provide usage examples and test cases

5. Debugging Techniques¶

# Enable debug mode for detailed transformation info
corners_2d, corners_3d = project_3d_box_to_2d(..., debug=True)

# Visualize intermediate results
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 5))
plt.subplot(131); plot_global_coordinates(corners_global)
plt.subplot(132); plot_ego_coordinates(corners_ego)  
plt.subplot(133); plot_camera_coordinates(corners_camera)

6. Coordinate System Verification¶

def verify_coordinate_systems(ego_translation, ego_rotation):
    """Verify coordinate system transformations with known points."""
    # Ego → global is the non-inverted ego pose transform
    ego_to_global = transform_matrix(ego_translation, ego_rotation, inverse=False)

    # Test point at ego vehicle center
    ego_center = np.array([0, 0, 0, 1])  # Origin in ego coordinates
    
    # Should transform to ego_translation in global coordinates
    global_point = ego_to_global @ ego_center
    assert np.allclose(global_point[:3], ego_translation)
    
    print("Coordinate system verification passed!")

This tutorial provides a comprehensive guide to understanding and implementing coordinate transformations in the NuScenes dataset. The key to success is understanding the coordinate system conventions, following the transformation pipeline correctly, and validating results at each step.