NuScenes Dataset Tutorial: Coordinate Transformations and Bounding Box ProcessingΒΆ

Table of ContentsΒΆ

  1. Introduction

  2. Dataset Structure

  3. Coordinate Systems

  4. Coordinate Transformations

  5. 3D Bounding Box Processing

  6. 2D Projection Pipeline

  7. Code Implementation

  8. Common Issues and Solutions

  9. Best Practices

IntroductionΒΆ

The NuScenes dataset is a large-scale autonomous driving dataset that provides multimodal sensor data including cameras, LiDAR, and radar. One of the most challenging aspects of working with NuScenes is understanding and correctly implementing the coordinate system transformations required to project 3D annotations onto 2D camera images.

This tutorial focuses on the critical processes of:

  • Coordinate System Transformations: Converting between global, ego vehicle, and camera coordinate systems

  • 3D Bounding Box Processing: Generating and manipulating 3D bounding boxes

  • 2D Projection: Projecting 3D bounding boxes onto camera images

  • Ensuring Correctness: Validation techniques and common pitfalls

Dataset StructureΒΆ

NuScenes organizes data hierarchically:

nuscenes/
β”œβ”€β”€ samples/           # Keyframes (2Hz) with all sensor data
β”œβ”€β”€ sweeps/           # Intermediate frames (20Hz) for LiDAR/radar
β”œβ”€β”€ maps/             # HD maps for each location
└── v1.0-trainval/    # Annotation files (JSON)
    β”œβ”€β”€ sample.json
    β”œβ”€β”€ sample_data.json
    β”œβ”€β”€ sample_annotation.json
    β”œβ”€β”€ ego_pose.json
    β”œβ”€β”€ calibrated_sensor.json
    └── ...

Dataset Annotation FormatΒΆ

NuScenes provides comprehensive annotations in JSON format, with all spatial annotations defined in the global coordinate system.

Annotation Types and Coordinate SystemsΒΆ

1. 3D Bounding Box AnnotationsΒΆ

File: sample_annotation.json
Coordinate System: Global coordinates
Format: Center + Size + Orientation

{
    "token": "unique_annotation_id",
    "sample_token": "sample_id",
    "instance_token": "object_instance_id",
    "category_name": "car",
    "translation": [x, y, z],           # 3D center in global coordinates (meters)
    "size": [width, length, height],    # Bounding box dimensions (meters)
    "rotation": [w, x, y, z],          # Quaternion orientation in global frame
    "visibility": 2,                    # Visibility level (1-4)
    "attribute_tokens": ["moving"],     # Object attributes
    "num_lidar_pts": 150,              # Number of LiDAR points inside box
    "num_radar_pts": 5                 # Number of radar points inside box
}

Key Points:

  • Translation: 3D center position [x, y, z] in global coordinates

  • Size: Box dimensions [width, length, height] in meters

  • Rotation: Quaternion [w, x, y, z] representing orientation in global frame

  • Coordinate Convention: Right-handed system (X=East, Y=North, Z=Up)

2. Ego Vehicle PoseΒΆ

File: ego_pose.json
Coordinate System: Global coordinates
Purpose: Vehicle position and orientation at each timestamp

{
    "token": "ego_pose_token",
    "timestamp": 1532402927647951,
    "translation": [x, y, z],          # Ego position in global coordinates
    "rotation": [w, x, y, z]           # Ego orientation quaternion in global frame
}

3. Sensor CalibrationΒΆ

File: calibrated_sensor.json
Coordinate System: Relative to ego vehicle
Purpose: Sensor position and orientation relative to ego vehicle

{
    "token": "sensor_calibration_token",
    "sensor_token": "sensor_id",
    "translation": [x, y, z],          # Sensor position relative to ego vehicle
    "rotation": [w, x, y, z],          # Sensor orientation relative to ego vehicle
    "camera_intrinsic": [[fx, 0, cx],  # Camera intrinsic matrix (cameras only)
                         [0, fy, cy],
                         [0, 0, 1]]
}

4. Sample DataΒΆ

File: sample_data.json
Purpose: Links sensor data files to timestamps and calibrations

{
    "token": "sample_data_token",
    "sample_token": "sample_id",
    "ego_pose_token": "ego_pose_id",
    "calibrated_sensor_token": "sensor_calibration_id",
    "filename": "samples/CAM_FRONT/n015-2018-07-24-11-22-45+0800__CAM_FRONT__1532402927612460.jpg",
    "fileformat": "jpg",
    "timestamp": 1532402927612460,
    "is_key_frame": true
}

Coordinate Transformation Process Using Sample DataΒΆ

The sample data structure enables the complete coordinate transformation pipeline. Here’s how to use the linked data for transformations:

Step-by-Step Transformation WorkflowΒΆ

1. Load Required Data Using Sample Data Tokens

# Using the sample_data.json information
sample_data = {
    "token": "sample_data_token",
    "sample_token": "sample_id", 
    "ego_pose_token": "ego_pose_id",
    "calibrated_sensor_token": "sensor_calibration_id",
    "filename": "samples/CAM_FRONT/n015-2018-07-24-11-22-45+0800__CAM_FRONT__1532402927612460.jpg",
    "timestamp": 1532402927612460
}

# Load corresponding data using tokens
ego_pose = nusc.get('ego_pose', sample_data['ego_pose_token'])
sensor_calibration = nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
sample_annotations = nusc.get('sample', sample_data['sample_token'])['anns']

2. Extract Transformation Matrices

# From ego_pose.json (Global coordinates)
ego_pose_data = {
    "translation": [463.12, 1080.45, 1.84],    # Ego position in global coordinates
    "rotation": [0.9659, 0.0, 0.0, 0.2588]     # Ego orientation quaternion
}

# From calibrated_sensor.json (Relative to ego vehicle)  
sensor_calibration_data = {
    "translation": [1.70, 0.0, 1.54],          # Camera position relative to ego
    "rotation": [0.7071, 0.0, 0.0, 0.7071],    # Camera orientation relative to ego
    "camera_intrinsic": [[1266.4, 0.0, 816.3], # Camera intrinsic matrix
                         [0.0, 1266.4, 491.5],
                         [0.0, 0.0, 1.0]]
}

3. Complete Transformation Pipeline Example

import numpy as np
from pyquaternion import Quaternion

# Example: Transform 3D bounding box from global to image coordinates

# Step 1: Get 3D bounding box in global coordinates
bbox_annotation = {
    "translation": [465.2, 1085.1, 1.2],       # Box center in global coordinates
    "size": [4.5, 1.8, 1.5],                   # [length, width, height]
    "rotation": [0.9848, 0.0, 0.0, 0.1736]     # Box orientation quaternion
}

# Generate 8 corners of 3D bounding box in global coordinates
box_corners_global = get_3d_box_corners(
    bbox_annotation['translation'],
    bbox_annotation['size'], 
    bbox_annotation['rotation']
)

# Step 2: Global β†’ Ego Vehicle transformation
ego_translation = np.array(ego_pose_data['translation'])
ego_rotation = Quaternion(ego_pose_data['rotation'])
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)

box_corners_ego = global_to_ego @ np.vstack([box_corners_global, np.ones((1, 8))])

# Step 3: Ego Vehicle β†’ Camera transformation  
cam_translation = np.array(sensor_calibration_data['translation'])
cam_rotation = Quaternion(sensor_calibration_data['rotation'])
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)

box_corners_camera = ego_to_cam @ box_corners_ego

# Step 4: Camera β†’ Image projection
camera_intrinsic = np.array(sensor_calibration_data['camera_intrinsic'])
image_points = view_points(box_corners_camera[:3, :], camera_intrinsic, normalize=True)

# Step 5: Extract 2D bounding box
x_coords = image_points[0, :]
y_coords = image_points[1, :]
bbox_2d = [min(x_coords), min(y_coords), max(x_coords), max(y_coords)]

print(f"2D Bounding Box: {bbox_2d}")
# Output: [245.3, 180.7, 580.1, 420.9] (pixel coordinates)

4. Data Flow Visualization

Sample Data Token Flow:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    ego_pose_token    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   sample_data   β”‚ ──────────────────→  β”‚    ego_pose     β”‚
β”‚                 β”‚                      β”‚ (Global coords) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                                        β”‚
         β”‚ calibrated_sensor_token                β”‚ translation, rotation
         β–Ό                                        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ sensor_calib    β”‚                      β”‚ Transform Matrixβ”‚
β”‚ (Ego relative)  β”‚ ──────────────────→  β”‚ Global β†’ Ego    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                                        β”‚
         β”‚ camera_intrinsic                       β”‚
         β–Ό                                        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Camera Matrix   β”‚                      β”‚ 3D β†’ 2D Project β”‚
β”‚ (Pixel coords)  β”‚ ──────────────────→  β”‚ Final Result    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Implementation PointsΒΆ

  1. Token-Based Data Linking: Use ego_pose_token and calibrated_sensor_token to fetch transformation data

  2. Timestamp Synchronization: The timestamp field ensures temporal alignment between sensors

  3. Coordinate System Chain: Global β†’ Ego β†’ Camera β†’ Image coordinates

  4. Matrix Composition: Combine transformation matrices for efficient batch processing

# Efficient batch transformation
combined_transform = camera_intrinsic @ ego_to_cam @ global_to_ego
final_2d_points = combined_transform @ global_3d_points_homogeneous

Practical Usage ExampleΒΆ

def process_sample_data(nusc, sample_data_token):
    """Complete pipeline from sample data token to 2D projections"""
    
    # 1. Load all required data using tokens
    sample_data = nusc.get('sample_data', sample_data_token)
    ego_pose = nusc.get('ego_pose', sample_data['ego_pose_token'])
    sensor_calib = nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
    
    # 2. Get all annotations for this sample
    sample = nusc.get('sample', sample_data['sample_token'])
    
    # 3. Process each annotation
    results = []
    for ann_token in sample['anns']:
        annotation = nusc.get('sample_annotation', ann_token)
        
        # Transform from global to image coordinates
        bbox_2d = project_3d_box_to_2d(
            annotation, ego_pose, sensor_calib
        )
        
        results.append({
            'category': annotation['category_name'],
            'bbox_2d': bbox_2d,
            'visibility': annotation['visibility']
        })
    
    return results

This workflow demonstrates how the sample data structure enables seamless coordinate transformations by linking all necessary calibration and pose information through tokens.

Annotation Coordinate System SummaryΒΆ

Annotation Type

Coordinate System

Reference Frame

Units

3D Bounding Boxes

Global

World coordinates

Meters

Ego Poses

Global

World coordinates

Meters

Sensor Calibration

Ego Vehicle

Relative to ego

Meters

Camera Intrinsics

Camera

Pixel coordinates

Pixels

Object CategoriesΒΆ

NuScenes includes 23 object categories across different domains:

Vehicles: car, truck, bus, trailer, construction_vehicle, emergency_vehicle, motorcycle, bicycle

Humans: adult, child, police_officer, construction_worker

Objects: traffic_cone, barrier, debris, pushable_pullable, movable_object

Static: animal (rare cases)

Visibility LevelsΒΆ

Annotations include visibility information for occlusion handling:

  • Level 1: 0-40% visible

  • Level 2: 40-60% visible

  • Level 3: 60-80% visible

  • Level 4: 80-100% visible

Implementation NotesΒΆ

When working with NuScenes annotations:

  1. All 3D annotations are in global coordinates - transform to desired coordinate system

  2. Quaternions use [w, x, y, z] format - be careful with different libraries’ conventions

  3. Box dimensions follow [width, length, height] - width=left-right, length=front-back

  4. Use num_lidar_pts and num_radar_pts for filtering low-quality annotations

  5. Check visibility levels for occlusion-aware training

Coordinate SystemsΒΆ

Understanding NuScenes coordinate systems is crucial for correct transformations:

1. Global Coordinate SystemΒΆ

  • Origin: Arbitrary reference point in the world

  • Axes: Right-handed coordinate system

    • X: East direction

    • Y: North direction

    • Z: Up direction (gravity opposite)

  • Units: Meters

  • Usage: All ego poses and annotations are defined in global coordinates

2. Ego Vehicle Coordinate SystemΒΆ

  • Origin: Center of the ego vehicle’s rear axle

  • Axes: Right-handed coordinate system relative to vehicle

    • X: Forward direction (vehicle’s front)

    • Y: Left direction (vehicle’s left side)

    • Z: Up direction (vehicle’s roof)

  • Units: Meters

  • Usage: Sensor calibrations are defined relative to ego vehicle

3. Camera Coordinate SystemΒΆ

  • Origin: Camera’s optical center

  • Axes: Standard computer vision convention

    • X: Right direction (image width)

    • Y: Down direction (image height)

    • Z: Forward direction (into the scene)

  • Units: Meters

  • Usage: 3D points in camera space before projection

4. Image Coordinate SystemΒΆ

  • Origin: Top-left corner of the image

  • Axes: 2D pixel coordinates

    • u: Horizontal axis (0 to image_width-1)

    • v: Vertical axis (0 to image_height-1)

  • Units: Pixels

  • Usage: Final 2D bounding box coordinates

Coordinate TransformationsΒΆ

The transformation pipeline varies depending on the visualization type. Here’s a comprehensive overview:

Transformation Pipelines by Visualization TypeΒΆ

1. 2D Image Bounding BoxΒΆ

Global β†’ Ego Vehicle β†’ Camera β†’ Image (2D Projection)

Purpose: Project 3D objects onto 2D camera images for object detection Output: 2D rectangular bounding boxes in pixel coordinates

2. 3D Image Bounding BoxΒΆ

Global β†’ Ego Vehicle β†’ Camera β†’ Image (3D Wireframe)

Purpose: Visualize 3D object structure overlaid on camera images Output: 3D wireframe boxes projected onto 2D image plane

3. BEV (Bird’s Eye View)ΒΆ

Global β†’ Ego Vehicle β†’ BEV Projection

Purpose: Top-down view for spatial understanding and path planning Output: 2D boxes in ego vehicle coordinate system (X-Y plane)

4. 3D LiDAR Bounding BoxΒΆ

Global β†’ Ego Vehicle β†’ LiDAR Sensor

Purpose: 3D object detection and tracking in point cloud data Output: 3D oriented bounding boxes in LiDAR coordinate system

Mathematical FoundationΒΆ

Each transformation uses homogeneous coordinates and 4x4 transformation matrices:

T = [R  t]
    [0  1]

Where:

  • R: 3x3 rotation matrix

  • t: 3x1 translation vector

Detailed Transformation Processes by Visualization TypeΒΆ

1. 2D Image Bounding Box TransformationΒΆ

Complete Pipeline: Global β†’ Ego Vehicle β†’ Camera β†’ Image (2D Projection)

Step 1: Global to Ego VehicleΒΆ

# Get ego pose (position and orientation in global coordinates)
ego_translation = [x, y, z]  # ego position in global coordinates
ego_rotation = [w, x, y, z]  # ego orientation quaternion

# Create inverse transformation matrix (global β†’ ego)
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)

# Apply transformation to 3D box corners
box_corners_ego = global_to_ego @ box_corners_global_homogeneous

Step 2: Ego Vehicle to CameraΒΆ

# Get camera calibration (position and orientation relative to ego)
cam_translation = [x, y, z]  # camera position relative to ego
cam_rotation = [w, x, y, z]  # camera orientation quaternion

# Create inverse transformation matrix (ego β†’ camera)
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)

# Apply transformation
box_corners_camera = ego_to_cam @ box_corners_ego

Step 3: Camera to Image (2D Projection)ΒΆ

# Project 3D camera coordinates to 2D image pixels
# Using camera intrinsic matrix
image_points = view_points(box_corners_camera, camera_intrinsic, normalize=True)

# Extract 2D bounding box from projected corners
x_coords = image_points[0, :]  # u coordinates
y_coords = image_points[1, :]  # v coordinates

# Create 2D bounding box
bbox_2d = [min(x_coords), min(y_coords), max(x_coords), max(y_coords)]

Key Characteristics:

  • Output: 2D rectangular box [x_min, y_min, x_max, y_max] in pixel coordinates

  • Use Case: Object detection, tracking in camera images

  • Coordinate System: Image pixels (u, v)

  • Implementation: See project_3d_box_to_2d() function

2. 3D Image Bounding Box TransformationΒΆ

Complete Pipeline: Global β†’ Ego Vehicle β†’ Camera β†’ Image (3D Wireframe)

Steps 1-2: Same as 2D (Global β†’ Ego β†’ Camera)ΒΆ

The first two steps are identical to 2D bounding box transformation.

Step 3: 3D Wireframe ProjectionΒΆ

# Project all 8 corners of 3D box to image
corners_3d_camera = ego_to_cam @ (global_to_ego @ box_corners_global)
image_corners = view_points(corners_3d_camera, camera_intrinsic, normalize=True)

# Define 3D box edges (12 edges connecting 8 corners)
edges = [
    [0, 1], [1, 2], [2, 3], [3, 0],  # bottom face
    [4, 5], [5, 6], [6, 7], [7, 4],  # top face
    [0, 4], [1, 5], [2, 6], [3, 7]   # vertical edges
]

# Draw wireframe on image
for edge in edges:
    start_point = image_corners[:, edge[0]]
    end_point = image_corners[:, edge[1]]
    # Draw line from start_point to end_point

Key Characteristics:

  • Output: 8 projected corner points + 12 connecting edges

  • Use Case: 3D object visualization on camera images

  • Coordinate System: Image pixels (u, v) with depth information preserved

  • Visualization: Wireframe overlay showing 3D structure

3. BEV (Bird’s Eye View) TransformationΒΆ

Simplified Pipeline: Global β†’ Ego Vehicle β†’ BEV Projection

Step 1: Global to Ego Vehicle (Same as above)ΒΆ

global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
box_corners_ego = global_to_ego @ box_corners_global_homogeneous

Step 2: BEV Projection (Top-Down View)ΒΆ

# Extract X-Y coordinates (ignore Z for top-down view)
bev_points = box_corners_ego[:2, :]  # Take only X, Y coordinates

# Convert to BEV image coordinates
# Typically: X-forward, Y-left in ego coordinates
# BEV image: X-right, Y-up in image coordinates
bev_x = bev_points[1, :] * scale + offset_x  # Y_ego β†’ X_bev
bev_y = -bev_points[0, :] * scale + offset_y  # -X_ego β†’ Y_bev

# Create 2D polygon from projected corners
bev_polygon = list(zip(bev_x, bev_y))

Key Characteristics:

  • Output: 2D polygon in BEV coordinate system

  • Use Case: Path planning, spatial reasoning, multi-object tracking

  • Coordinate System: Top-down view (X-Y plane of ego vehicle)

  • Advantages: No occlusion, consistent scale, easy distance measurement

4. 3D LiDAR Bounding Box TransformationΒΆ

Simplified Pipeline: Global β†’ Ego Vehicle β†’ LiDAR Sensor

Step 1: Global to Ego Vehicle (Same as above)ΒΆ

global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
box_corners_ego = global_to_ego @ box_corners_global_homogeneous

Step 2: Ego to LiDAR Sensor (if needed)ΒΆ

# Most LiDAR sensors are mounted close to ego vehicle center
# Often LiDAR coordinate system β‰ˆ ego coordinate system
# If transformation needed:
lidar_translation = [x, y, z]  # LiDAR position relative to ego
lidar_rotation = [w, x, y, z]  # LiDAR orientation quaternion

ego_to_lidar = transform_matrix(lidar_translation, lidar_rotation, inverse=True)
box_corners_lidar = ego_to_lidar @ box_corners_ego

Step 3: 3D Box RepresentationΒΆ

# 3D bounding box in LiDAR coordinates
# Typically represented as:
# - Center: [x, y, z]
# - Dimensions: [length, width, height]  
# - Orientation: yaw angle or quaternion

box_3d = {
    'center': np.mean(box_corners_lidar[:3, :], axis=1),
    'size': [length, width, height],
    'orientation': yaw_angle
}

Key Characteristics:

  • Output: 3D oriented bounding box with center, size, and orientation

  • Use Case: 3D object detection, autonomous driving, robotics

  • Coordinate System: 3D LiDAR sensor coordinates

  • Advantages: Direct 3D measurements, no projection distortion

Transformation SummaryΒΆ

Visualization Type

Pipeline

Output Format

Primary Use Case

2D Image BBox

Global→Ego→Camera→Image

[x_min, y_min, x_max, y_max]

Object detection

3D Image BBox

Global→Ego→Camera→Image

8 corners + 12 edges

3D visualization

BEV

Global→Ego→BEV

2D polygon

Path planning

3D LiDAR BBox

Global→Ego→LiDAR

Center + Size + Orientation

3D detection

Original Transformation Steps (for reference)ΒΆ

1. Global to Ego Vehicle TransformationΒΆ

Purpose: Transform from world coordinates to ego vehicle coordinates

Implementation: See transform_matrix() function

# Get ego pose (position and orientation in global coordinates)
ego_translation = [x, y, z]  # ego position in global coordinates
ego_rotation = [w, x, y, z]  # ego orientation quaternion

# Create inverse transformation matrix (global β†’ ego)
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)

# Apply transformation
points_ego = global_to_ego @ points_global_homogeneous

Key Points:

  • Use inverse=True to get globalβ†’ego transformation

  • Ego pose represents ego vehicle’s position/orientation in global coordinates

  • Inverse transformation moves from global to ego coordinate system

2. Ego Vehicle to Camera TransformationΒΆ

Purpose: Transform from ego vehicle coordinates to camera coordinates

Implementation: See transform_matrix() function

# Get camera calibration (position and orientation relative to ego)
cam_translation = [x, y, z]  # camera position relative to ego
cam_rotation = [w, x, y, z]  # camera orientation quaternion

# Create inverse transformation matrix (ego β†’ camera)
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)

# Apply transformation
points_camera = ego_to_cam @ points_ego_homogeneous

Key Points:

  • Camera calibration is relative to ego vehicle

  • Use inverse=True to get egoβ†’camera transformation

  • Results in 3D points in camera coordinate system

3. Camera to Image TransformationΒΆ

Purpose: Project 3D camera coordinates to 2D image pixels

Implementation: See view_points() function

# Camera intrinsic matrix
K = [[fx,  0, cx],
     [ 0, fy, cy],
     [ 0,  0,  1]]

# Project 3D points to 2D
points_2d = view_points(points_3d_camera, K, normalize=True)

Key Points:

  • Uses perspective projection: u = fx * X/Z + cx, v = fy * Y/Z + cy

  • Points with Z ≀ 0 are behind the camera and invalid

  • normalize=True performs the division by Z coordinate

3D Bounding Box ProcessingΒΆ

Bounding Box RepresentationΒΆ

NuScenes represents 3D bounding boxes with:

  • Center: [x, y, z] in global coordinates

  • Size: [width, length, height] in meters

  • Rotation: Quaternion [w, x, y, z] in global coordinates

Corner Point GenerationΒΆ

Implementation: See get_3d_box_corners() function

The function generates 8 corner points of a 3D bounding box:

def get_3d_box_corners(center, size, rotation):
    """
    Generate 8 corner points of 3D bounding box.
    
    Corner ordering (NuScenes convention):
    Bottom face (z = -height/2):
      1 ---- 0
      |      |
      |      |  
      2 ---- 3
    
    Top face (z = +height/2):
      5 ---- 4
      |      |
      |      |
      6 ---- 7
    """

Process:

  1. Define unit box corners centered at origin

  2. Scale by box dimensions

  3. Apply rotation using quaternion

  4. Translate to final position

Key Points:

  • Corner ordering follows NuScenes convention

  • Vehicle front direction aligns with +Y axis in box coordinates

  • Rotation is applied before translation

2D Projection PipelineΒΆ

Complete Pipeline ImplementationΒΆ

Implementation: See project_3d_box_to_2d() function

The complete pipeline transforms 3D bounding boxes to 2D image coordinates:

def project_3d_box_to_2d(center_3d, size_3d, rotation_3d, 
                        cam_translation, cam_rotation, camera_intrinsic,
                        ego_translation, ego_rotation, debug=False):
    """
    Complete transformation pipeline:
    Global β†’ Ego Vehicle β†’ Camera β†’ Image
    """

Step-by-Step ProcessΒΆ

Step 1: Generate 3D Box CornersΒΆ

# Generate 8 corner points in global coordinates
corners_3d_global = get_3d_box_corners(center_3d, size_3d, rotation_3d)

Step 2: Global β†’ Ego TransformationΒΆ

# Transform from global to ego vehicle coordinates
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
corners_ego = apply_transformation(global_to_ego, corners_3d_global)

Step 3: Ego β†’ Camera TransformationΒΆ

# Transform from ego to camera coordinates
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)
corners_camera = apply_transformation(ego_to_cam, corners_ego)

Step 4: Camera β†’ Image ProjectionΒΆ

# Project 3D camera coordinates to 2D image pixels
corners_2d = view_points(corners_camera.T, camera_intrinsic, normalize=True)

Visibility and ValidationΒΆ

Implementation: See box_in_image() function

def box_in_image(corners_3d, corners_2d, intrinsic, imsize, vis_level):
    """
    Check if bounding box is visible in image.
    
    vis_level options:
    - BoxVisibility.ALL: All corners must be inside image
    - BoxVisibility.ANY: At least one corner must be visible  
    - BoxVisibility.NONE: No visibility requirement
    """

Code ImplementationΒΆ

Key Functions and Their RolesΒΆ

  1. view_points()

    • Handles perspective projection from 3D to 2D

    • Applies camera intrinsic matrix

    • Normalizes homogeneous coordinates

  2. transform_matrix()

    • Creates 4x4 transformation matrices

    • Handles translation and rotation

    • Supports inverse transformations

  3. quaternion_to_rotation_matrix()

    • Converts quaternions to rotation matrices

    • Handles both scipy and manual implementations

    • Ensures numerical stability

  4. get_3d_box_corners()

    • Generates 8 corner points of 3D bounding box

    • Follows NuScenes corner ordering convention

    • Applies scaling, rotation, and translation

  5. project_3d_box_to_2d()

    • Complete transformation pipeline

    • Handles all coordinate system conversions

    • Provides debug information and error handling

Data Loading and Validation FunctionsΒΆ

  1. load_nuscenes_data()

    • Loads all NuScenes JSON annotation files

    • Returns structured dictionary with scenes, samples, annotations

    • Handles file validation and error reporting

  2. prepare_sample_data()

    • Extracts data for specific sample index

    • Organizes annotations, camera data, and sensor info

    • Provides ready-to-use data structure for visualization

  3. validate_dataset_structure()

    • Validates NuScenes directory structure

    • Checks for required folders and annotation files

    • Returns boolean validation result

  4. diagnose_dataset_issues()

    • Comprehensive dataset health check

    • Identifies missing files, corrupted data, and structure issues

    • Provides detailed diagnostic report

Visualization and Rendering FunctionsΒΆ

  1. visualize_lidar_3d_open3d()

    • Interactive 3D LiDAR point cloud visualization

    • Uses Open3D for high-quality rendering

    • Supports 3D bounding box overlays

  2. visualize_bev_with_boxes()

    • Bird’s Eye View (BEV) visualization

    • Projects LiDAR points to 2D top-down view

    • Renders 3D boxes as 2D rectangles in BEV space

  3. visualize_lidar_projection()

    • Projects LiDAR points onto camera images

    • Color-codes points by distance or intensity

    • Handles camera-LiDAR calibration

  4. create_combined_visualization()

    • Creates multi-panel visualization layouts

    • Combines camera, LiDAR, and BEV views

    • Generates publication-ready figures

Drawing and Rendering UtilitiesΒΆ

  1. draw_3d_box_2d()

    • Draws 3D bounding box wireframes on 2D images

    • Handles perspective projection and clipping

    • Supports custom colors and line styles

  2. draw_2d_bbox()

    • Draws 2D bounding boxes on images

    • Includes category labels and confidence scores

    • Handles image boundary clipping

  3. draw_bev_box_2d()

    • Renders 2D boxes in Bird’s Eye View

    • Handles rotation and scaling in BEV space

    • Supports transparency and color coding

LiDAR Processing FunctionsΒΆ

  1. load_lidar_points()

    • Loads LiDAR point cloud from .pcd files

    • Handles different point cloud formats

    • Returns numpy array with x, y, z, intensity

  2. transform_lidar_to_ego()

    • Transforms LiDAR points to ego vehicle coordinate system

    • Applies sensor calibration parameters

    • Handles translation and rotation transformations

  3. project_lidar_to_camera()

    • Projects 3D LiDAR points to camera image plane

    • Applies full transformation pipeline

    • Filters points behind camera or outside image

Coordinate System UtilitiesΒΆ

  1. get_lidar_calibration_info()

    • Extracts LiDAR sensor calibration parameters

    • Retrieves translation and rotation from ego vehicle

    • Returns calibration dictionary for transformations

  2. process_camera_data()

    • Processes camera sensor data and calibration

    • Extracts intrinsic and extrinsic parameters

    • Prepares camera info for coordinate transformations

  3. process_3d_annotations_for_camera()

    • Filters and processes 3D annotations for specific camera

    • Applies visibility checks and coordinate transformations

    • Returns camera-specific annotation data

Utility and Helper FunctionsΒΆ

  1. get_2d_bbox_from_3d_projection()

    • Computes 2D bounding box from projected 3D corners

    • Finds min/max coordinates of projected points

    • Handles edge cases and invalid projections

  2. clip_line_to_image()

    • Clips 3D box edges to image boundaries

    • Implements line-rectangle intersection algorithm

    • Prevents drawing outside image bounds

  3. extract_nuscenes_subset()

    • Creates smaller dataset subset for testing

    • Copies relevant samples and annotations

    • Maintains data structure integrity

Usage ExampleΒΆ

# Load NuScenes data
nuscenes_data = load_nuscenes_data(nuscenes_root)
sample_data = prepare_sample_data(nuscenes_data, sample_idx=0)

# Validate dataset first
if not validate_dataset_structure(nuscenes_root):
    diagnosis = diagnose_dataset_issues(nuscenes_root)
    print("Dataset issues found:", diagnosis)

# Get annotation and camera info
annotation = sample_data['annotations'][0]  # First annotation
camera_info = sample_data['cameras']['CAM_FRONT']

# Extract 3D bounding box parameters
center_3d = annotation['translation']
size_3d = annotation['size'] 
rotation_3d = annotation['rotation']

# Extract camera and ego pose information
cam_translation = camera_info['translation']
cam_rotation = camera_info['rotation']
camera_intrinsic = camera_info['camera_intrinsic']
ego_translation = camera_info['ego_translation']
ego_rotation = camera_info['ego_rotation']

# Project 3D box to 2D
corners_2d, corners_3d = project_3d_box_to_2d(
    center_3d, size_3d, rotation_3d,
    cam_translation, cam_rotation, camera_intrinsic,
    ego_translation, ego_rotation, debug=True
)

# Check visibility
is_visible = box_in_image(
    corners_3d, corners_2d, camera_intrinsic, 
    (1600, 900), BoxVisibility.ANY
)

Common Issues and SolutionsΒΆ

1. Incorrect Coordinate System AssumptionsΒΆ

Problem: Mixing up coordinate system conventions Solution: Always verify axis directions and origins

# Global: X=East, Y=North, Z=Up
# Ego: X=Forward, Y=Left, Z=Up  
# Camera: X=Right, Y=Down, Z=Forward

2. Transformation Matrix OrderΒΆ

Problem: Applying transformations in wrong order Solution: Follow the pipeline: Global β†’ Ego β†’ Camera β†’ Image

# Correct order
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)

3. Quaternion NormalizationΒΆ

Problem: Unnormalized quaternions causing incorrect rotations Solution: Always normalize quaternions before use

q = np.array(quaternion)
q = q / np.linalg.norm(q)  # Normalize

4. Behind-Camera PointsΒΆ

Problem: Points with negative Z coordinates in camera space Solution: Check depth values and handle appropriately

depths = corners_camera[:, 2]
if np.any(depths <= 0):
    print("Warning: Some points behind camera")

5. Homogeneous CoordinatesΒΆ

Problem: Forgetting to use homogeneous coordinates for transformations Solution: Always add 1 as 4th dimension

points_homogeneous = np.ones((points.shape[0], 4))
points_homogeneous[:, :3] = points

Best PracticesΒΆ

1. Validation and TestingΒΆ

  • Always validate transformations with known test cases

  • Check intermediate results at each transformation step

  • Use debug mode to trace coordinate transformations

  • Verify corner ordering matches NuScenes convention

2. Error HandlingΒΆ

  • Check for points behind camera (Z ≀ 0)

  • Validate input parameters (non-zero quaternions, valid matrices)

  • Handle edge cases (very small/large bounding boxes)

  • Provide meaningful error messages

3. Performance OptimizationΒΆ

  • Batch process multiple boxes when possible

  • Cache transformation matrices when processing multiple annotations

  • Use vectorized operations instead of loops

  • Pre-compute frequently used values

4. Code OrganizationΒΆ

  • Separate coordinate transformation logic from visualization

  • Use consistent parameter naming across functions

  • Document coordinate system conventions clearly

  • Provide usage examples and test cases

5. Debugging TechniquesΒΆ

# Enable debug mode for detailed transformation info
corners_2d, corners_3d = project_3d_box_to_2d(..., debug=True)

# Visualize intermediate results
plt.figure(figsize=(15, 5))
plt.subplot(131); plot_global_coordinates(corners_global)
plt.subplot(132); plot_ego_coordinates(corners_ego)  
plt.subplot(133); plot_camera_coordinates(corners_camera)

6. Coordinate System VerificationΒΆ

def verify_coordinate_systems():
    """Verify coordinate system transformations with known points."""
    # Test point at ego vehicle center
    ego_center = np.array([0, 0, 0, 1])  # Origin in ego coordinates
    
    # Should transform to ego_translation in global coordinates
    global_point = ego_to_global @ ego_center
    assert np.allclose(global_point[:3], ego_translation)
    
    print("Coordinate system verification passed!")

This tutorial provides a comprehensive guide to understanding and implementing coordinate transformations in the NuScenes dataset. The key to success is understanding the coordinate system conventions, following the transformation pipeline correctly, and validating results at each step.