NuScenes Dataset Tutorial: Coordinate Transformations and Bounding Box ProcessingΒΆ
Table of ContentsΒΆ
3D Bounding Box Processing
2D Projection Pipeline
IntroductionΒΆ
The NuScenes dataset is a large-scale autonomous driving dataset that provides multimodal sensor data including cameras, LiDAR, and radar. One of the most challenging aspects of working with NuScenes is understanding and correctly implementing the coordinate system transformations required to project 3D annotations onto 2D camera images.
This tutorial focuses on the critical processes of:
Coordinate System Transformations: Converting between global, ego vehicle, and camera coordinate systems
3D Bounding Box Processing: Generating and manipulating 3D bounding boxes
2D Projection: Projecting 3D bounding boxes onto camera images
Ensuring Correctness: Validation techniques and common pitfalls
Dataset StructureΒΆ
NuScenes organizes data hierarchically:
nuscenes/
βββ samples/ # Keyframes (2Hz) with all sensor data
βββ sweeps/ # Intermediate frames (20Hz) for LiDAR/radar
βββ maps/ # HD maps for each location
βββ v1.0-trainval/ # Annotation files (JSON)
βββ sample.json
βββ sample_data.json
βββ sample_annotation.json
βββ ego_pose.json
βββ calibrated_sensor.json
βββ ...
Dataset Annotation FormatΒΆ
NuScenes provides comprehensive annotations in JSON format, with all spatial annotations defined in the global coordinate system.
Annotation Types and Coordinate SystemsΒΆ
1. 3D Bounding Box AnnotationsΒΆ
File: sample_annotation.json
Coordinate System: Global coordinates
Format: Center + Size + Orientation
{
"token": "unique_annotation_id",
"sample_token": "sample_id",
"instance_token": "object_instance_id",
"category_name": "car",
"translation": [x, y, z], # 3D center in global coordinates (meters)
"size": [width, length, height], # Bounding box dimensions (meters)
"rotation": [w, x, y, z], # Quaternion orientation in global frame
"visibility": 2, # Visibility level (1-4)
"attribute_tokens": ["moving"], # Object attributes
"num_lidar_pts": 150, # Number of LiDAR points inside box
"num_radar_pts": 5 # Number of radar points inside box
}
Key Points:
Translation: 3D center position
[x, y, z]in global coordinatesSize: Box dimensions
[width, length, height]in metersRotation: Quaternion
[w, x, y, z]representing orientation in global frameCoordinate Convention: Right-handed system (X=East, Y=North, Z=Up)
2. Ego Vehicle PoseΒΆ
File: ego_pose.json
Coordinate System: Global coordinates
Purpose: Vehicle position and orientation at each timestamp
{
"token": "ego_pose_token",
"timestamp": 1532402927647951,
"translation": [x, y, z], # Ego position in global coordinates
"rotation": [w, x, y, z] # Ego orientation quaternion in global frame
}
3. Sensor CalibrationΒΆ
File: calibrated_sensor.json
Coordinate System: Relative to ego vehicle
Purpose: Sensor position and orientation relative to ego vehicle
{
"token": "sensor_calibration_token",
"sensor_token": "sensor_id",
"translation": [x, y, z], # Sensor position relative to ego vehicle
"rotation": [w, x, y, z], # Sensor orientation relative to ego vehicle
"camera_intrinsic": [[fx, 0, cx], # Camera intrinsic matrix (cameras only)
[0, fy, cy],
[0, 0, 1]]
}
4. Sample DataΒΆ
File: sample_data.json
Purpose: Links sensor data files to timestamps and calibrations
{
"token": "sample_data_token",
"sample_token": "sample_id",
"ego_pose_token": "ego_pose_id",
"calibrated_sensor_token": "sensor_calibration_id",
"filename": "samples/CAM_FRONT/n015-2018-07-24-11-22-45+0800__CAM_FRONT__1532402927612460.jpg",
"fileformat": "jpg",
"timestamp": 1532402927612460,
"is_key_frame": true
}
Coordinate Transformation Process Using Sample DataΒΆ
The sample data structure enables the complete coordinate transformation pipeline. Hereβs how to use the linked data for transformations:
Step-by-Step Transformation WorkflowΒΆ
1. Load Required Data Using Sample Data Tokens
# Using the sample_data.json information
sample_data = {
"token": "sample_data_token",
"sample_token": "sample_id",
"ego_pose_token": "ego_pose_id",
"calibrated_sensor_token": "sensor_calibration_id",
"filename": "samples/CAM_FRONT/n015-2018-07-24-11-22-45+0800__CAM_FRONT__1532402927612460.jpg",
"timestamp": 1532402927612460
}
# Load corresponding data using tokens
ego_pose = nusc.get('ego_pose', sample_data['ego_pose_token'])
sensor_calibration = nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
sample_annotations = nusc.get('sample', sample_data['sample_token'])['anns']
2. Extract Transformation Matrices
# From ego_pose.json (Global coordinates)
ego_pose_data = {
"translation": [463.12, 1080.45, 1.84], # Ego position in global coordinates
"rotation": [0.9659, 0.0, 0.0, 0.2588] # Ego orientation quaternion
}
# From calibrated_sensor.json (Relative to ego vehicle)
sensor_calibration_data = {
"translation": [1.70, 0.0, 1.54], # Camera position relative to ego
"rotation": [0.7071, 0.0, 0.0, 0.7071], # Camera orientation relative to ego
"camera_intrinsic": [[1266.4, 0.0, 816.3], # Camera intrinsic matrix
[0.0, 1266.4, 491.5],
[0.0, 0.0, 1.0]]
}
3. Complete Transformation Pipeline Example
import numpy as np
from pyquaternion import Quaternion
# Example: Transform 3D bounding box from global to image coordinates
# Step 1: Get 3D bounding box in global coordinates
bbox_annotation = {
"translation": [465.2, 1085.1, 1.2], # Box center in global coordinates
"size": [4.5, 1.8, 1.5], # [length, width, height]
"rotation": [0.9848, 0.0, 0.0, 0.1736] # Box orientation quaternion
}
# Generate 8 corners of 3D bounding box in global coordinates
box_corners_global = get_3d_box_corners(
bbox_annotation['translation'],
bbox_annotation['size'],
bbox_annotation['rotation']
)
# Step 2: Global β Ego Vehicle transformation
ego_translation = np.array(ego_pose_data['translation'])
ego_rotation = Quaternion(ego_pose_data['rotation'])
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
box_corners_ego = global_to_ego @ np.vstack([box_corners_global, np.ones((1, 8))])
# Step 3: Ego Vehicle β Camera transformation
cam_translation = np.array(sensor_calibration_data['translation'])
cam_rotation = Quaternion(sensor_calibration_data['rotation'])
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)
box_corners_camera = ego_to_cam @ box_corners_ego
# Step 4: Camera β Image projection
camera_intrinsic = np.array(sensor_calibration_data['camera_intrinsic'])
image_points = view_points(box_corners_camera[:3, :], camera_intrinsic, normalize=True)
# Step 5: Extract 2D bounding box
x_coords = image_points[0, :]
y_coords = image_points[1, :]
bbox_2d = [min(x_coords), min(y_coords), max(x_coords), max(y_coords)]
print(f"2D Bounding Box: {bbox_2d}")
# Output: [245.3, 180.7, 580.1, 420.9] (pixel coordinates)
4. Data Flow Visualization
Sample Data Token Flow:
βββββββββββββββββββ ego_pose_token βββββββββββββββββββ
β sample_data β βββββββββββββββββββ β ego_pose β
β β β (Global coords) β
βββββββββββββββββββ βββββββββββββββββββ
β β
β calibrated_sensor_token β translation, rotation
βΌ βΌ
βββββββββββββββββββ βββββββββββββββββββ
β sensor_calib β β Transform Matrixβ
β (Ego relative) β βββββββββββββββββββ β Global β Ego β
βββββββββββββββββββ βββββββββββββββββββ
β β
β camera_intrinsic β
βΌ βΌ
βββββββββββββββββββ βββββββββββββββββββ
β Camera Matrix β β 3D β 2D Project β
β (Pixel coords) β βββββββββββββββββββ β Final Result β
βββββββββββββββββββ βββββββββββββββββββ
Key Implementation PointsΒΆ
Token-Based Data Linking: Use
ego_pose_tokenandcalibrated_sensor_tokento fetch transformation dataTimestamp Synchronization: The
timestampfield ensures temporal alignment between sensorsCoordinate System Chain: Global β Ego β Camera β Image coordinates
Matrix Composition: Combine transformation matrices for efficient batch processing
# Efficient batch transformation
combined_transform = camera_intrinsic @ ego_to_cam @ global_to_ego
final_2d_points = combined_transform @ global_3d_points_homogeneous
Practical Usage ExampleΒΆ
def process_sample_data(nusc, sample_data_token):
"""Complete pipeline from sample data token to 2D projections"""
# 1. Load all required data using tokens
sample_data = nusc.get('sample_data', sample_data_token)
ego_pose = nusc.get('ego_pose', sample_data['ego_pose_token'])
sensor_calib = nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
# 2. Get all annotations for this sample
sample = nusc.get('sample', sample_data['sample_token'])
# 3. Process each annotation
results = []
for ann_token in sample['anns']:
annotation = nusc.get('sample_annotation', ann_token)
# Transform from global to image coordinates
bbox_2d = project_3d_box_to_2d(
annotation, ego_pose, sensor_calib
)
results.append({
'category': annotation['category_name'],
'bbox_2d': bbox_2d,
'visibility': annotation['visibility']
})
return results
This workflow demonstrates how the sample data structure enables seamless coordinate transformations by linking all necessary calibration and pose information through tokens.
Annotation Coordinate System SummaryΒΆ
Annotation Type |
Coordinate System |
Reference Frame |
Units |
|---|---|---|---|
3D Bounding Boxes |
Global |
World coordinates |
Meters |
Ego Poses |
Global |
World coordinates |
Meters |
Sensor Calibration |
Ego Vehicle |
Relative to ego |
Meters |
Camera Intrinsics |
Camera |
Pixel coordinates |
Pixels |
Object CategoriesΒΆ
NuScenes includes 23 object categories across different domains:
Vehicles: car, truck, bus, trailer, construction_vehicle, emergency_vehicle, motorcycle, bicycle
Humans: adult, child, police_officer, construction_worker
Objects: traffic_cone, barrier, debris, pushable_pullable, movable_object
Static: animal (rare cases)
Visibility LevelsΒΆ
Annotations include visibility information for occlusion handling:
Level 1: 0-40% visible
Level 2: 40-60% visible
Level 3: 60-80% visible
Level 4: 80-100% visible
Implementation NotesΒΆ
When working with NuScenes annotations:
All 3D annotations are in global coordinates - transform to desired coordinate system
Quaternions use [w, x, y, z] format - be careful with different librariesβ conventions
Box dimensions follow [width, length, height] - width=left-right, length=front-back
Use
num_lidar_ptsandnum_radar_ptsfor filtering low-quality annotationsCheck visibility levels for occlusion-aware training
Coordinate SystemsΒΆ
Understanding NuScenes coordinate systems is crucial for correct transformations:
1. Global Coordinate SystemΒΆ
Origin: Arbitrary reference point in the world
Axes: Right-handed coordinate system
X: East direction
Y: North direction
Z: Up direction (gravity opposite)
Units: Meters
Usage: All ego poses and annotations are defined in global coordinates
2. Ego Vehicle Coordinate SystemΒΆ
Origin: Center of the ego vehicleβs rear axle
Axes: Right-handed coordinate system relative to vehicle
X: Forward direction (vehicleβs front)
Y: Left direction (vehicleβs left side)
Z: Up direction (vehicleβs roof)
Units: Meters
Usage: Sensor calibrations are defined relative to ego vehicle
3. Camera Coordinate SystemΒΆ
Origin: Cameraβs optical center
Axes: Standard computer vision convention
X: Right direction (image width)
Y: Down direction (image height)
Z: Forward direction (into the scene)
Units: Meters
Usage: 3D points in camera space before projection
4. Image Coordinate SystemΒΆ
Origin: Top-left corner of the image
Axes: 2D pixel coordinates
u: Horizontal axis (0 to image_width-1)
v: Vertical axis (0 to image_height-1)
Units: Pixels
Usage: Final 2D bounding box coordinates
Coordinate TransformationsΒΆ
The transformation pipeline varies depending on the visualization type. Hereβs a comprehensive overview:
Transformation Pipelines by Visualization TypeΒΆ
1. 2D Image Bounding BoxΒΆ
Global β Ego Vehicle β Camera β Image (2D Projection)
Purpose: Project 3D objects onto 2D camera images for object detection Output: 2D rectangular bounding boxes in pixel coordinates
2. 3D Image Bounding BoxΒΆ
Global β Ego Vehicle β Camera β Image (3D Wireframe)
Purpose: Visualize 3D object structure overlaid on camera images Output: 3D wireframe boxes projected onto 2D image plane
3. BEV (Birdβs Eye View)ΒΆ
Global β Ego Vehicle β BEV Projection
Purpose: Top-down view for spatial understanding and path planning Output: 2D boxes in ego vehicle coordinate system (X-Y plane)
4. 3D LiDAR Bounding BoxΒΆ
Global β Ego Vehicle β LiDAR Sensor
Purpose: 3D object detection and tracking in point cloud data Output: 3D oriented bounding boxes in LiDAR coordinate system
Mathematical FoundationΒΆ
Each transformation uses homogeneous coordinates and 4x4 transformation matrices:
T = [R t]
[0 1]
Where:
R: 3x3 rotation matrixt: 3x1 translation vector
Detailed Transformation Processes by Visualization TypeΒΆ
1. 2D Image Bounding Box TransformationΒΆ
Complete Pipeline: Global β Ego Vehicle β Camera β Image (2D Projection)
Step 1: Global to Ego VehicleΒΆ
# Get ego pose (position and orientation in global coordinates)
ego_translation = [x, y, z] # ego position in global coordinates
ego_rotation = [w, x, y, z] # ego orientation quaternion
# Create inverse transformation matrix (global β ego)
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
# Apply transformation to 3D box corners
box_corners_ego = global_to_ego @ box_corners_global_homogeneous
Step 2: Ego Vehicle to CameraΒΆ
# Get camera calibration (position and orientation relative to ego)
cam_translation = [x, y, z] # camera position relative to ego
cam_rotation = [w, x, y, z] # camera orientation quaternion
# Create inverse transformation matrix (ego β camera)
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)
# Apply transformation
box_corners_camera = ego_to_cam @ box_corners_ego
Step 3: Camera to Image (2D Projection)ΒΆ
# Project 3D camera coordinates to 2D image pixels
# Using camera intrinsic matrix
image_points = view_points(box_corners_camera, camera_intrinsic, normalize=True)
# Extract 2D bounding box from projected corners
x_coords = image_points[0, :] # u coordinates
y_coords = image_points[1, :] # v coordinates
# Create 2D bounding box
bbox_2d = [min(x_coords), min(y_coords), max(x_coords), max(y_coords)]
Key Characteristics:
Output: 2D rectangular box
[x_min, y_min, x_max, y_max]in pixel coordinatesUse Case: Object detection, tracking in camera images
Coordinate System: Image pixels (u, v)
Implementation: See
project_3d_box_to_2d()function
2. 3D Image Bounding Box TransformationΒΆ
Complete Pipeline: Global β Ego Vehicle β Camera β Image (3D Wireframe)
Steps 1-2: Same as 2D (Global β Ego β Camera)ΒΆ
The first two steps are identical to 2D bounding box transformation.
Step 3: 3D Wireframe ProjectionΒΆ
# Project all 8 corners of 3D box to image
corners_3d_camera = ego_to_cam @ (global_to_ego @ box_corners_global)
image_corners = view_points(corners_3d_camera, camera_intrinsic, normalize=True)
# Define 3D box edges (12 edges connecting 8 corners)
edges = [
[0, 1], [1, 2], [2, 3], [3, 0], # bottom face
[4, 5], [5, 6], [6, 7], [7, 4], # top face
[0, 4], [1, 5], [2, 6], [3, 7] # vertical edges
]
# Draw wireframe on image
for edge in edges:
start_point = image_corners[:, edge[0]]
end_point = image_corners[:, edge[1]]
# Draw line from start_point to end_point
Key Characteristics:
Output: 8 projected corner points + 12 connecting edges
Use Case: 3D object visualization on camera images
Coordinate System: Image pixels (u, v) with depth information preserved
Visualization: Wireframe overlay showing 3D structure
3. BEV (Birdβs Eye View) TransformationΒΆ
Simplified Pipeline: Global β Ego Vehicle β BEV Projection
Step 1: Global to Ego Vehicle (Same as above)ΒΆ
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
box_corners_ego = global_to_ego @ box_corners_global_homogeneous
Step 2: BEV Projection (Top-Down View)ΒΆ
# Extract X-Y coordinates (ignore Z for top-down view)
bev_points = box_corners_ego[:2, :] # Take only X, Y coordinates
# Convert to BEV image coordinates
# Typically: X-forward, Y-left in ego coordinates
# BEV image: X-right, Y-up in image coordinates
bev_x = bev_points[1, :] * scale + offset_x # Y_ego β X_bev
bev_y = -bev_points[0, :] * scale + offset_y # -X_ego β Y_bev
# Create 2D polygon from projected corners
bev_polygon = list(zip(bev_x, bev_y))
Key Characteristics:
Output: 2D polygon in BEV coordinate system
Use Case: Path planning, spatial reasoning, multi-object tracking
Coordinate System: Top-down view (X-Y plane of ego vehicle)
Advantages: No occlusion, consistent scale, easy distance measurement
4. 3D LiDAR Bounding Box TransformationΒΆ
Simplified Pipeline: Global β Ego Vehicle β LiDAR Sensor
Step 1: Global to Ego Vehicle (Same as above)ΒΆ
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
box_corners_ego = global_to_ego @ box_corners_global_homogeneous
Step 2: Ego to LiDAR Sensor (if needed)ΒΆ
# Most LiDAR sensors are mounted close to ego vehicle center
# Often LiDAR coordinate system β ego coordinate system
# If transformation needed:
lidar_translation = [x, y, z] # LiDAR position relative to ego
lidar_rotation = [w, x, y, z] # LiDAR orientation quaternion
ego_to_lidar = transform_matrix(lidar_translation, lidar_rotation, inverse=True)
box_corners_lidar = ego_to_lidar @ box_corners_ego
Step 3: 3D Box RepresentationΒΆ
# 3D bounding box in LiDAR coordinates
# Typically represented as:
# - Center: [x, y, z]
# - Dimensions: [length, width, height]
# - Orientation: yaw angle or quaternion
box_3d = {
'center': np.mean(box_corners_lidar[:3, :], axis=1),
'size': [length, width, height],
'orientation': yaw_angle
}
Key Characteristics:
Output: 3D oriented bounding box with center, size, and orientation
Use Case: 3D object detection, autonomous driving, robotics
Coordinate System: 3D LiDAR sensor coordinates
Advantages: Direct 3D measurements, no projection distortion
Transformation SummaryΒΆ
Visualization Type |
Pipeline |
Output Format |
Primary Use Case |
|---|---|---|---|
2D Image BBox |
GlobalβEgoβCameraβImage |
|
Object detection |
3D Image BBox |
GlobalβEgoβCameraβImage |
8 corners + 12 edges |
3D visualization |
BEV |
GlobalβEgoβBEV |
2D polygon |
Path planning |
3D LiDAR BBox |
GlobalβEgoβLiDAR |
Center + Size + Orientation |
3D detection |
Original Transformation Steps (for reference)ΒΆ
1. Global to Ego Vehicle TransformationΒΆ
Purpose: Transform from world coordinates to ego vehicle coordinates
Implementation: See transform_matrix() function
# Get ego pose (position and orientation in global coordinates)
ego_translation = [x, y, z] # ego position in global coordinates
ego_rotation = [w, x, y, z] # ego orientation quaternion
# Create inverse transformation matrix (global β ego)
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
# Apply transformation
points_ego = global_to_ego @ points_global_homogeneous
Key Points:
Use
inverse=Trueto get globalβego transformationEgo pose represents ego vehicleβs position/orientation in global coordinates
Inverse transformation moves from global to ego coordinate system
2. Ego Vehicle to Camera TransformationΒΆ
Purpose: Transform from ego vehicle coordinates to camera coordinates
Implementation: See transform_matrix() function
# Get camera calibration (position and orientation relative to ego)
cam_translation = [x, y, z] # camera position relative to ego
cam_rotation = [w, x, y, z] # camera orientation quaternion
# Create inverse transformation matrix (ego β camera)
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)
# Apply transformation
points_camera = ego_to_cam @ points_ego_homogeneous
Key Points:
Camera calibration is relative to ego vehicle
Use
inverse=Trueto get egoβcamera transformationResults in 3D points in camera coordinate system
3. Camera to Image TransformationΒΆ
Purpose: Project 3D camera coordinates to 2D image pixels
Implementation: See view_points() function
# Camera intrinsic matrix
K = [[fx, 0, cx],
[ 0, fy, cy],
[ 0, 0, 1]]
# Project 3D points to 2D
points_2d = view_points(points_3d_camera, K, normalize=True)
Key Points:
Uses perspective projection:
u = fx * X/Z + cx,v = fy * Y/Z + cyPoints with Z β€ 0 are behind the camera and invalid
normalize=Trueperforms the division by Z coordinate
3D Bounding Box ProcessingΒΆ
Bounding Box RepresentationΒΆ
NuScenes represents 3D bounding boxes with:
Center:
[x, y, z]in global coordinatesSize:
[width, length, height]in metersRotation: Quaternion
[w, x, y, z]in global coordinates
Corner Point GenerationΒΆ
Implementation: See get_3d_box_corners() function
The function generates 8 corner points of a 3D bounding box:
def get_3d_box_corners(center, size, rotation):
"""
Generate 8 corner points of 3D bounding box.
Corner ordering (NuScenes convention):
Bottom face (z = -height/2):
1 ---- 0
| |
| |
2 ---- 3
Top face (z = +height/2):
5 ---- 4
| |
| |
6 ---- 7
"""
Process:
Define unit box corners centered at origin
Scale by box dimensions
Apply rotation using quaternion
Translate to final position
Key Points:
Corner ordering follows NuScenes convention
Vehicle front direction aligns with +Y axis in box coordinates
Rotation is applied before translation
2D Projection PipelineΒΆ
Complete Pipeline ImplementationΒΆ
Implementation: See project_3d_box_to_2d() function
The complete pipeline transforms 3D bounding boxes to 2D image coordinates:
def project_3d_box_to_2d(center_3d, size_3d, rotation_3d,
cam_translation, cam_rotation, camera_intrinsic,
ego_translation, ego_rotation, debug=False):
"""
Complete transformation pipeline:
Global β Ego Vehicle β Camera β Image
"""
Step-by-Step ProcessΒΆ
Step 1: Generate 3D Box CornersΒΆ
# Generate 8 corner points in global coordinates
corners_3d_global = get_3d_box_corners(center_3d, size_3d, rotation_3d)
Step 2: Global β Ego TransformationΒΆ
# Transform from global to ego vehicle coordinates
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
corners_ego = apply_transformation(global_to_ego, corners_3d_global)
Step 3: Ego β Camera TransformationΒΆ
# Transform from ego to camera coordinates
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)
corners_camera = apply_transformation(ego_to_cam, corners_ego)
Step 4: Camera β Image ProjectionΒΆ
# Project 3D camera coordinates to 2D image pixels
corners_2d = view_points(corners_camera.T, camera_intrinsic, normalize=True)
Visibility and ValidationΒΆ
Implementation: See box_in_image() function
def box_in_image(corners_3d, corners_2d, intrinsic, imsize, vis_level):
"""
Check if bounding box is visible in image.
vis_level options:
- BoxVisibility.ALL: All corners must be inside image
- BoxVisibility.ANY: At least one corner must be visible
- BoxVisibility.NONE: No visibility requirement
"""
Code ImplementationΒΆ
Key Functions and Their RolesΒΆ
-
Handles perspective projection from 3D to 2D
Applies camera intrinsic matrix
Normalizes homogeneous coordinates
-
Creates 4x4 transformation matrices
Handles translation and rotation
Supports inverse transformations
quaternion_to_rotation_matrix()Converts quaternions to rotation matrices
Handles both scipy and manual implementations
Ensures numerical stability
-
Generates 8 corner points of 3D bounding box
Follows NuScenes corner ordering convention
Applies scaling, rotation, and translation
-
Complete transformation pipeline
Handles all coordinate system conversions
Provides debug information and error handling
Data Loading and Validation FunctionsΒΆ
-
Loads all NuScenes JSON annotation files
Returns structured dictionary with scenes, samples, annotations
Handles file validation and error reporting
-
Extracts data for specific sample index
Organizes annotations, camera data, and sensor info
Provides ready-to-use data structure for visualization
-
Validates NuScenes directory structure
Checks for required folders and annotation files
Returns boolean validation result
-
Comprehensive dataset health check
Identifies missing files, corrupted data, and structure issues
Provides detailed diagnostic report
Visualization and Rendering FunctionsΒΆ
-
Interactive 3D LiDAR point cloud visualization
Uses Open3D for high-quality rendering
Supports 3D bounding box overlays
-
Birdβs Eye View (BEV) visualization
Projects LiDAR points to 2D top-down view
Renders 3D boxes as 2D rectangles in BEV space
-
Projects LiDAR points onto camera images
Color-codes points by distance or intensity
Handles camera-LiDAR calibration
create_combined_visualization()Creates multi-panel visualization layouts
Combines camera, LiDAR, and BEV views
Generates publication-ready figures
Drawing and Rendering UtilitiesΒΆ
-
Draws 3D bounding box wireframes on 2D images
Handles perspective projection and clipping
Supports custom colors and line styles
-
Draws 2D bounding boxes on images
Includes category labels and confidence scores
Handles image boundary clipping
-
Renders 2D boxes in Birdβs Eye View
Handles rotation and scaling in BEV space
Supports transparency and color coding
LiDAR Processing FunctionsΒΆ
-
Loads LiDAR point cloud from .pcd files
Handles different point cloud formats
Returns numpy array with x, y, z, intensity
-
Transforms LiDAR points to ego vehicle coordinate system
Applies sensor calibration parameters
Handles translation and rotation transformations
-
Projects 3D LiDAR points to camera image plane
Applies full transformation pipeline
Filters points behind camera or outside image
Coordinate System UtilitiesΒΆ
-
Extracts LiDAR sensor calibration parameters
Retrieves translation and rotation from ego vehicle
Returns calibration dictionary for transformations
-
Processes camera sensor data and calibration
Extracts intrinsic and extrinsic parameters
Prepares camera info for coordinate transformations
process_3d_annotations_for_camera()Filters and processes 3D annotations for specific camera
Applies visibility checks and coordinate transformations
Returns camera-specific annotation data
Utility and Helper FunctionsΒΆ
get_2d_bbox_from_3d_projection()Computes 2D bounding box from projected 3D corners
Finds min/max coordinates of projected points
Handles edge cases and invalid projections
-
Clips 3D box edges to image boundaries
Implements line-rectangle intersection algorithm
Prevents drawing outside image bounds
-
Creates smaller dataset subset for testing
Copies relevant samples and annotations
Maintains data structure integrity
Usage ExampleΒΆ
# Load NuScenes data
nuscenes_data = load_nuscenes_data(nuscenes_root)
sample_data = prepare_sample_data(nuscenes_data, sample_idx=0)
# Validate dataset first
if not validate_dataset_structure(nuscenes_root):
diagnosis = diagnose_dataset_issues(nuscenes_root)
print("Dataset issues found:", diagnosis)
# Get annotation and camera info
annotation = sample_data['annotations'][0] # First annotation
camera_info = sample_data['cameras']['CAM_FRONT']
# Extract 3D bounding box parameters
center_3d = annotation['translation']
size_3d = annotation['size']
rotation_3d = annotation['rotation']
# Extract camera and ego pose information
cam_translation = camera_info['translation']
cam_rotation = camera_info['rotation']
camera_intrinsic = camera_info['camera_intrinsic']
ego_translation = camera_info['ego_translation']
ego_rotation = camera_info['ego_rotation']
# Project 3D box to 2D
corners_2d, corners_3d = project_3d_box_to_2d(
center_3d, size_3d, rotation_3d,
cam_translation, cam_rotation, camera_intrinsic,
ego_translation, ego_rotation, debug=True
)
# Check visibility
is_visible = box_in_image(
corners_3d, corners_2d, camera_intrinsic,
(1600, 900), BoxVisibility.ANY
)
Common Issues and SolutionsΒΆ
1. Incorrect Coordinate System AssumptionsΒΆ
Problem: Mixing up coordinate system conventions Solution: Always verify axis directions and origins
# Global: X=East, Y=North, Z=Up
# Ego: X=Forward, Y=Left, Z=Up
# Camera: X=Right, Y=Down, Z=Forward
2. Transformation Matrix OrderΒΆ
Problem: Applying transformations in wrong order Solution: Follow the pipeline: Global β Ego β Camera β Image
# Correct order
global_to_ego = transform_matrix(ego_translation, ego_rotation, inverse=True)
ego_to_cam = transform_matrix(cam_translation, cam_rotation, inverse=True)
3. Quaternion NormalizationΒΆ
Problem: Unnormalized quaternions causing incorrect rotations Solution: Always normalize quaternions before use
q = np.array(quaternion)
q = q / np.linalg.norm(q) # Normalize
4. Behind-Camera PointsΒΆ
Problem: Points with negative Z coordinates in camera space Solution: Check depth values and handle appropriately
depths = corners_camera[:, 2]
if np.any(depths <= 0):
print("Warning: Some points behind camera")
5. Homogeneous CoordinatesΒΆ
Problem: Forgetting to use homogeneous coordinates for transformations Solution: Always add 1 as 4th dimension
points_homogeneous = np.ones((points.shape[0], 4))
points_homogeneous[:, :3] = points
Best PracticesΒΆ
1. Validation and TestingΒΆ
Always validate transformations with known test cases
Check intermediate results at each transformation step
Use debug mode to trace coordinate transformations
Verify corner ordering matches NuScenes convention
2. Error HandlingΒΆ
Check for points behind camera (Z β€ 0)
Validate input parameters (non-zero quaternions, valid matrices)
Handle edge cases (very small/large bounding boxes)
Provide meaningful error messages
3. Performance OptimizationΒΆ
Batch process multiple boxes when possible
Cache transformation matrices when processing multiple annotations
Use vectorized operations instead of loops
Pre-compute frequently used values
4. Code OrganizationΒΆ
Separate coordinate transformation logic from visualization
Use consistent parameter naming across functions
Document coordinate system conventions clearly
Provide usage examples and test cases
5. Debugging TechniquesΒΆ
# Enable debug mode for detailed transformation info
corners_2d, corners_3d = project_3d_box_to_2d(..., debug=True)
# Visualize intermediate results
plt.figure(figsize=(15, 5))
plt.subplot(131); plot_global_coordinates(corners_global)
plt.subplot(132); plot_ego_coordinates(corners_ego)
plt.subplot(133); plot_camera_coordinates(corners_camera)
6. Coordinate System VerificationΒΆ
def verify_coordinate_systems():
"""Verify coordinate system transformations with known points."""
# Test point at ego vehicle center
ego_center = np.array([0, 0, 0, 1]) # Origin in ego coordinates
# Should transform to ego_translation in global coordinates
global_point = ego_to_global @ ego_center
assert np.allclose(global_point[:3], ego_translation)
print("Coordinate system verification passed!")
This tutorial provides a comprehensive guide to understanding and implementing coordinate transformations in the NuScenes dataset. The key to success is understanding the coordinate system conventions, following the transformation pipeline correctly, and validating results at each step.