🧭 Comprehensive Tutorial: Understanding and Visualizing the Waymo Open Dataset v2

This tutorial provides an in-depth, practical, and mathematical explanation of how to interpret, transform, and visualize Waymo v2.1 LiDAR and 3D box data.
It’s written for researchers and developers who want to deeply understand how Waymo structures its data, what coordinate frames are used, and how to correctly align LiDAR, camera, and annotations.

Code Reference: This tutorial is complemented by the comprehensive data inspection utilities in waymo_parquet_inspector.py, which provides detailed field analysis and validation tools for all Waymo data components.


Table of Contents

  1. Dataset Overview

  2. File Structure & Contents

  3. Coordinate Frames

  4. Camera Data Components

  5. LiDAR Data Components

  6. Calibration Data

  7. LiDAR-to-Vehicle Transform Mathematics

  8. 3D Box Definitions

  9. Coordinate Alignment: LiDAR ↔ Box

  10. 2D and 3D Box Visualization

  11. Common Pitfalls & Debug Tips

  12. References


1️⃣ Dataset Overview

📊 Dataset Scale & Format

Waymo Open Dataset (WOD) provides synchronized LiDAR and camera data with ground-truth 3D bounding boxes.

  • 2,030 segments of 20s each, collected at 10Hz (390,000 frames)

  • Diverse geographies and conditions

  • Perception object assets data in a modular format (v2.0.0)

  • Extracted perception objects from multi-sensor data: all 5 cameras and the top lidar

🔧 Sensor Configuration

LiDAR Sensors:

  • 1 mid-range lidar

  • 4 short-range lidars

Camera Sensors:

  • 5 cameras (front and sides)

Data Synchronization:

  • Synchronized lidar and camera data

  • Lidar to camera projections

  • Sensor calibrations and vehicle poses

🏷️ Annotation Categories

3D Bounding Box Labels

  • 4 object classes: Vehicles, Pedestrians, Cyclists, Signs

  • High-quality labels for lidar data: 1,200 segments

  • 12.6M 3D bounding box labels with tracking IDs on lidar data

2D Bounding Box Labels

  • High-quality labels for camera data: 1,000 segments

  • 11.8M 2D bounding box labels with tracking IDs on camera data

2D Video Panoptic Segmentation

  • Subset: 100k camera images

  • 28 classes including:

    • Vehicles: Car, Bus, Truck, Other Large Vehicle, Trailer, Ego Vehicle, Motorcycle, Bicycle

    • People: Pedestrian, Cyclist, Motorcyclist

    • Animals: Ground Animal, Bird

    • Infrastructure: Pole, Sign, Traffic Light, Construction Cone, Pedestrian Object, Building

    • Road Elements: Road, Sidewalk, Road Marker, Lane Marker

    • Environment: Vegetation, Sky, Ground

    • Motion States: Static, Dynamic

  • Instance segmentation labels for Vehicle, Pedestrian and Cyclist classes

  • Consistent both across cameras and over time

Key Point Labels

  • 2 object classes: Pedestrians and Cyclists

  • 14 key points from nose to ankle

  • 200k object frames with 2D key point labels

  • 10k object frames with 3D key point labels

3D Semantic Segmentation

  • Segmentation labels: 1,150 segments

  • 23 classes including:

    • Vehicles: Car, Truck, Bus, Other Vehicle

    • People: Motorcyclist, Bicyclist, Pedestrian

    • Objects: Bicycle, Motorcycle, Sign, Traffic Light, Pole, Construction Cone

    • Environment: Building, Vegetation, Tree Trunk, Curb

    • Road Elements: Road, Lane Marker, Walkable, Sidewalk, Other Ground

    • Undefined: Undefined

🗺️ Map Data

  • 3D road graph data for each segment

  • Includes: lane centers, lane boundaries, road boundaries, crosswalks, speed bumps, stop signs, and entrances to driveways

🔗 Cross-Modal Associations

  • Association of 2D and 3D bounding boxes

  • Corresponding object IDs provided for 2 object classes: Pedestrians and Cyclists

🎯 Challenge Data

  • 3D Camera-Only Detection Challenge: 80 segments of 20s camera imagery

🚀 Advanced Features

LiDAR features include:

  • 3D point cloud sequences that support 3D object shape reconstruction

Camera features include:

  • Sequences of camera patches from the most_visible_camera

  • Projected lidar returns on the corresponding camera

  • Per-pixel camera rays information

  • Auto-labeled 2D panoptic segmentation that supports object NeRF reconstruction

The v2.0.1 Parquet version is designed for efficient columnar access and includes:

Component

Folder

Description

Schema Columns

Camera Images

camera_image/

RGB frames with pose/velocity metadata

15 columns with binary JPEG/PNG data

Camera Boxes

camera_box/

2D bounding boxes in pixel coordinates

12 columns with detection annotations

Camera Calibration

camera_calibration/

Intrinsics + extrinsics for all cameras

15 columns with calibration matrices

LiDAR Range Images

lidar/

Raw per-LiDAR sensor range images

7 columns with flattened range data

3D Boxes

lidar_box/

Ground-truth boxes in Vehicle Frame

21 columns with 3D object state

LiDAR Calibration

lidar_calibration/

Beam inclinations and extrinsic transforms

6 columns with sensor parameters

LiDAR Poses

lidar_pose/

Per-pixel transforms LiDAR → Vehicle → World

Pose transformation matrices

Segmentation

lidar_segmentation/

Point-wise semantic labels

Semantic segmentation masks

Projections

lidar_camera_projection/

LiDAR-to-camera pixel mappings

Cross-modal alignment data


2️⃣ File Structure & Contents

Each segment is approximately 20 seconds long, split into multiple Parquet files with standardized naming:

segment_id_start_end.parquet
Example: 10017090168044687777_6380_000_6400_000.parquet

Inside each data folder (e.g., training/lidar/), files contain rows corresponding to sensor measurements at specific timestamps.

Parquet Schema Structure

All Waymo v2.01 data follows a consistent schema pattern:

Key Fields (Common across all data types):
├── index: Unique row identifier (String)
├── key.segment_context_name: Segment ID (String)  
├── key.frame_timestamp_micros: Timestamp in microseconds (Int64)
└── key.[sensor]_name: Sensor identifier (Int8)

Component Fields:
└── [ComponentType].[field_hierarchy]: Actual data values
    ├── Scalar values: Direct numeric/string data
    ├── List values: Arrays (e.g., transformation matrices)
    └── Nested structures: Complex hierarchical data

Note: The [ComponentType] follows the pattern [SensorType]Component (e.g., CameraImageComponent, LiDARBoxComponent). See the inspector code for detailed field analysis of each component type.

LiDAR IDs (v2.1 five-sensor setup):

ID

Location

Yaw (deg)

Position (m)

1

Roof edge / back-right

+148°

[1.43, 0.0, 2.18]

2

Front bumper

[4.07, 0.0, 0.69]

3

Left side

+90°

[3.25, +1.02, 0.98]

4

Right side

−90°

[3.25, −1.02, 0.98]

5

Rear

180°

[−1.15, 0.0, 0.46]

⚠️ v2.1 no longer includes the 360° Top LiDAR used in early WOD versions.


3️⃣ Coordinate Frames

Understanding coordinate systems is the foundation for correct visualization and data alignment.

Vehicle Frame (Primary Reference Frame)

  • Origin: Vehicle center (geometric center of the ego vehicle)

  • Axes:

    • +X: Forward direction (vehicle’s front)

    • +Y: Left direction (driver’s left)

    • +Z: Upward direction (towards sky)

  • Usage: All 3D bounding boxes and calibration extrinsics are defined in this frame

  • Mathematical Representation: Right-handed coordinate system

\[\begin{split}\mathbf{p}_{\text{vehicle}} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \text{ where } \begin{cases} x > 0 & \text{forward} \\ y > 0 & \text{left} \\ z > 0 & \text{up} \end{cases}\end{split}\]

LiDAR Frame (Sensor-Specific)

  • Origin: Individual LiDAR sensor center

  • Axes: Same orientation as vehicle frame but translated/rotated

    • +X: Sensor’s forward direction

    • +Y: Sensor’s left direction

    • +Z: Sensor’s upward direction

  • Usage: Raw range image data is initially in this frame

  • Transform: Each LiDAR has a unique extrinsic transformation to vehicle frame

Camera Frame (OpenCV Convention)

  • Origin: Camera optical center

  • Axes:

    • +X: Right direction (image columns)

    • +Y: Down direction (image rows)

    • +Z: Forward direction (optical axis, into the scene)

  • Usage: Camera images and 2D bounding boxes

  • Projection: 3D points project to 2D image coordinates via intrinsic matrix

\[\begin{split}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} \begin{bmatrix} X_c/Z_c \\ Y_c/Z_c \\ 1 \end{bmatrix}\end{split}\]

where \(\mathbf{K}\) is the camera intrinsic matrix and \((X_c, Y_c, Z_c)\) are coordinates in camera frame.


4️⃣ Camera Data Components

Camera Images (camera_image/)

Schema: 15 columns containing RGB image data and comprehensive metadata

Field

Type

Description

index

String

Unique row identifier

key.segment_context_name

String

Segment/sequence identifier

key.frame_timestamp_micros

Int64

Frame timestamp in microseconds

key.camera_name

Int8

Camera ID (0-4 for FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT)

[CameraImageComponent].image

Binary

JPEG/PNG compressed image bytes

[CameraImageComponent].pose.transform

List[Double]

4×4 transformation matrix (16 elements)

[CameraImageComponent].velocity.linear_velocity.{x,y,z}

Double

Linear velocity components (m/s)

[CameraImageComponent].velocity.angular_velocity.{x,y,z}

Double

Angular velocity components (rad/s)

[CameraImageComponent].pose_timestamp

Double

Pose measurement timestamp

[CameraImageComponent].rolling_shutter_params.shutter

Double

Rolling shutter timing parameter

Usage Example:

# Extract image from binary data
image_bytes = row['[CameraImageComponent].image']
pil_image = Image.open(io.BytesIO(image_bytes))

# Extract pose matrix (4x4 transformation)
pose_flat = row['[CameraImageComponent].pose.transform']  # 16 elements
pose_matrix = np.array(pose_flat).reshape(4, 4)

Camera Boxes (camera_box/)

Schema: 12 columns containing 2D bounding box annotations

Field

Type

Description

key.camera_object_id

String

Unique object identifier per camera

[CameraBoxComponent].box.center.{x,y}

Double

Bounding box center coordinates (pixels)

[CameraBoxComponent].box.size.{x,y}

Double

Bounding box dimensions (width, height in pixels)

[CameraBoxComponent].type

Int8

Object class type ID

[CameraBoxComponent].difficulty_level.detection

Int8

Detection difficulty rating (1-5)

[CameraBoxComponent].difficulty_level.tracking

Int8

Tracking difficulty rating (1-5)

Camera Calibration (camera_calibration/)

Schema: 15 columns containing intrinsic and extrinsic calibration parameters

Field

Type

Description

[CameraCalibrationComponent].intrinsic.f_u

Double

Focal length in u direction (pixels)

[CameraCalibrationComponent].intrinsic.f_v

Double

Focal length in v direction (pixels)

[CameraCalibrationComponent].intrinsic.c_u

Double

Principal point u coordinate (pixels)

[CameraCalibrationComponent].intrinsic.c_v

Double

Principal point v coordinate (pixels)

[CameraCalibrationComponent].intrinsic.k1,k2,k3

Double

Radial distortion coefficients

[CameraCalibrationComponent].intrinsic.p1,p2

Double

Tangential distortion coefficients

[CameraCalibrationComponent].extrinsic.transform

List[Double]

4×4 camera-to-vehicle transformation

[CameraCalibrationComponent].width

Int32

Image width in pixels

[CameraCalibrationComponent].height

Int32

Image height in pixels

Intrinsic Matrix Construction: $\(\mathbf{K} = \begin{bmatrix} f_u & 0 & c_u \\ 0 & f_v & c_v \\ 0 & 0 & 1 \end{bmatrix}\)$


5️⃣ LiDAR Data Components

LiDAR Range Images (lidar/)

Schema: 11 columns containing range image data and sensor metadata

Each LiDAR captures a range image instead of a raw point cloud. This is a 2D representation where each pixel encodes distance, intensity, and other measurements.

Field

Type

Description

index

String

Unique row identifier

key.segment_context_name

String

Segment/sequence identifier

key.frame_timestamp_micros

Int64

Frame timestamp in microseconds

key.laser_name

Int8

LiDAR sensor ID (0-4 for TOP, FRONT, SIDE_LEFT, SIDE_RIGHT, REAR)

[LiDARComponent].range_image_return1.range

Binary

First return range data (compressed)

[LiDARComponent].range_image_return1.intensity

Binary

First return intensity data (compressed)

[LiDARComponent].range_image_return1.elongation

Binary

First return elongation data (compressed)

[LiDARComponent].range_image_return2.range

Binary

Second return range data (compressed)

[LiDARComponent].range_image_return2.intensity

Binary

Second return intensity data (compressed)

[LiDARComponent].range_image_return2.elongation

Binary

Second return elongation data (compressed)

[LiDARComponent].camera_projection_exclusion_mask

Binary

Exclusion mask for camera projections

Range Image Structure:

  • Dimensions: Typically H×W where H varies by sensor (64-200 rows), W is azimuth resolution

  • Encoding: Each pixel encodes distance measurement in meters

  • Returns: Two returns per laser beam (first and second reflection)

  • Compression: Data is compressed using Waymo’s proprietary format


6️⃣ LiDAR-to-Vehicle Transform Mathematics

Converting LiDAR range images to 3D point clouds in the vehicle coordinate frame requires several mathematical transformations.

Step 1: Range Image to Spherical Coordinates

Each pixel \((u, v)\) in the range image corresponds to spherical coordinates:

\[\begin{split}\begin{align} \text{azimuth} &= \frac{2\pi \cdot u}{W} - \pi \\ \text{inclination} &= \text{beam\_inclinations}[v] \\ \text{range} &= \text{range\_image}[v, u] \end{align}\end{split}\]

where \(W\) is the azimuth resolution (typically 2650 for Waymo LiDAR).

Step 2: Spherical to Cartesian Conversion (LiDAR Frame)

Convert spherical coordinates to 3D Cartesian coordinates in the LiDAR sensor frame:

\[\begin{split}\begin{align} x_{\text{lidar}} &= \text{range} \cdot \cos(\text{inclination}) \cdot \cos(\text{azimuth}) \\ y_{\text{lidar}} &= \text{range} \cdot \cos(\text{inclination}) \cdot \sin(\text{azimuth}) \\ z_{\text{lidar}} &= \text{range} \cdot \sin(\text{inclination}) \end{align}\end{split}\]

Step 3: LiDAR-to-Vehicle Transformation

Apply the extrinsic calibration matrix to transform from LiDAR frame to vehicle frame:

\[\begin{split}\begin{bmatrix} x_{\text{vehicle}} \\ y_{\text{vehicle}} \\ z_{\text{vehicle}} \\ 1 \end{bmatrix} = \mathbf{T}_{\text{lidar→vehicle}} \begin{bmatrix} x_{\text{lidar}} \\ y_{\text{lidar}} \\ z_{\text{lidar}} \\ 1 \end{bmatrix}\end{split}\]

where \(\mathbf{T}_{\text{lidar→vehicle}}\) is the 4×4 transformation matrix from the calibration data:

\[\begin{split}\mathbf{T}_{\text{lidar→vehicle}} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}\end{split}\]

with \(\mathbf{R} \in \mathbb{R}^{3 \times 3}\) being the rotation matrix and \(\mathbf{t} \in \mathbb{R}^{3}\) the translation vector.

Complete Transformation Pipeline

The complete transformation from range image pixel to vehicle coordinates:

\[\mathbf{p}_{\text{vehicle}} = \mathbf{T}_{\text{lidar→vehicle}} \cdot \mathbf{f}_{\text{spherical→cartesian}}(\text{range}[v,u], \text{azimuth}(u), \text{inclination}(v))\]

Implementation Note: The waymo_parquet_inspector.py script provides detailed field analysis for understanding the exact data formats and transformations.

LiDAR Boxes (lidar_box/)

Schema: 18 columns containing 3D bounding box annotations

Field

Type

Description

key.laser_object_id

String

Unique object identifier per LiDAR

[LiDARBoxComponent].box.center.{x,y,z}

Double

3D bounding box center in vehicle frame (meters)

[LiDARBoxComponent].box.size.{x,y,z}

Double

3D bounding box dimensions (length, width, height in meters)

[LiDARBoxComponent].box.heading

Double

Object orientation angle (radians)

[LiDARBoxComponent].type

Int8

Object class type ID

[LiDARBoxComponent].id

String

Persistent object tracking ID

[LiDARBoxComponent].detection_difficulty_level

Int8

Detection difficulty rating (1-5)

[LiDARBoxComponent].tracking_difficulty_level

Int8

Tracking difficulty rating (1-5)

[LiDARBoxComponent].num_lidar_points_in_box

Int32

Number of LiDAR points inside the box

3D Box Representation: $\(\mathbf{Box} = \{\mathbf{c}, \mathbf{s}, \theta\} \text{ where } \begin{cases} \mathbf{c} = [c_x, c_y, c_z]^T & \text{center position} \\ \mathbf{s} = [s_x, s_y, s_z]^T & \text{size (L×W×H)} \\ \theta & \text{heading angle} \end{cases}\)$

LiDAR Calibration (lidar_calibration/)

Schema: 8 columns containing sensor calibration parameters

Field

Type

Description

[LiDARCalibrationComponent].extrinsic.transform

List[Double]

4×4 LiDAR-to-vehicle transformation matrix

[LiDARCalibrationComponent].beam_inclinations

List[Double]

Vertical beam angle inclinations (radians)

[LiDARCalibrationComponent].beam_inclination_min

Double

Minimum beam inclination angle

[LiDARCalibrationComponent].beam_inclination_max

Double

Maximum beam inclination angle


7️⃣ Additional Data Components

Projected LiDAR (projected_lidar_labels/)

Schema: 14 columns containing LiDAR points projected onto camera images

Field

Type

Description

[ProjectedLiDARLabelsComponent].box.center.{x,y}

Double

Projected 2D box center (pixels)

[ProjectedLiDARLabelsComponent].box.size.{x,y}

Double

Projected 2D box size (pixels)

[ProjectedLiDARLabelsComponent].type

Int8

Object class type ID

[ProjectedLiDARLabelsComponent].id

String

Object tracking ID

[ProjectedLiDARLabelsComponent].detection_difficulty_level

Int8

Detection difficulty (1-5)

Segmentation Labels (lidar_segmentation/)

Schema: 7 columns containing point-wise semantic segmentation

Field

Type

Description

[LiDARSegmentationLabelComponent].pointcloud_to_image_projection

Binary

Point-to-pixel mapping data

[LiDARSegmentationLabelComponent].segmentation_label

Binary

Per-point semantic labels

[LiDARSegmentationLabelComponent].instance_id_to_global_id_mapping

Binary

Instance ID mappings

Statistics (stats/)

Schema: 9 columns containing frame-level statistics and metadata

Field

Type

Description

[StatsComponent].location

String

Geographic location identifier

[StatsComponent].time_of_day

String

Time period (Dawn, Day, Dusk, Night)

[StatsComponent].weather

String

Weather conditions

[StatsComponent].camera_object_counts

List[Int32]

Object counts per camera

[StatsComponent].lidar_object_counts

List[Int32]

Object counts per LiDAR

Usage for Data Analysis:

# Filter by weather conditions
sunny_frames = df[df['[StatsComponent].weather'] == 'sunny']

# Analyze object distribution
total_objects = df['[StatsComponent].lidar_object_counts'].apply(sum)
---

## 8️⃣ Range Image Processing

Each pixel encodes `(range, intensity, elongation, ...)` data in compressed binary format.

**Typical Range Image Dimensions**:

| Sensor | Shape (H, W, C) | Field of View |
|---------|-----------------|---------------|
| TOP LiDAR (#0) | 64 × 2650 × 4 | 360° horizontal |
| FRONT LiDAR (#1) | 200 × 600 × 4 | ~100° horizontal |
| SIDE_LEFT (#2) | 200 × 600 × 4 | ~100° horizontal |
| SIDE_RIGHT (#3) | 200 × 600 × 4 | ~100° horizontal |
| REAR (#4) | 200 × 600 × 4 | ~100° horizontal |

### Range Image to Point Cloud Conversion

**Step 1**: Decode the compressed range/intensity data from binary format
**Step 2**: Apply coordinate transformations to get 3D points in vehicle frame
**Step 3**: Filter invalid points (range = 0)

---

## 9️⃣ LiDAR Calibration Mathematics

### Extrinsic Transformation Matrix

The extrinsic matrix $\mathbf{T}_{V \leftarrow L}$ transforms points from LiDAR frame to vehicle frame:

$$\mathbf{T}_{V \leftarrow L} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$$

**Storage Format**: Row-major order (`order="C"`) in Parquet, stored as 16-element list

**Example Transformation Matrix**:

[[-0.8478, -0.5304, -0.0025, 1.43 ], [ 0.5304, -0.8478, 0.0002, 0.00 ], [-0.0022, -0.0012, 1.0000, 2.184], [ 0.0000, 0.0000, 0.0000, 1.0000]]

→ Rotation yaw ≈ 148°, translation ≈ (1.43, 0.0, 2.18) meters

### Beam Inclination Angles

Vertical angles for each row in the range image, typically distributed linearly between minimum and maximum inclination values.

$$\theta_v = \text{beam\_inclinations}[v] \text{ for row } v \in [0, H-1]$$

---

## 🔟 Complete LiDAR Processing Pipeline

### Mathematical Transformation Steps

For each LiDAR pixel at position $(u, v)$ with range $r$:

**Step 1: Spherical Coordinates**
$$\begin{align}
\phi &= \frac{2\pi \cdot u}{W} - \pi \quad \text{(azimuth angle)} \\
\theta &= \text{beam\_inclinations}[v] \quad \text{(inclination angle)} \\
r &= \text{range\_image}[v, u] \quad \text{(distance in meters)}
\end{align}$$

**Step 2: LiDAR Frame Cartesian Coordinates**
$$\begin{align}
x_L &= r \cos(\theta) \cos(\phi) \\
y_L &= r \cos(\theta) \sin(\phi) \\
z_L &= r \sin(\theta)
\end{align}$$

**Step 3: Homogeneous Coordinates**
$$\mathbf{p}_L = \begin{bmatrix} x_L \\ y_L \\ z_L \\ 1 \end{bmatrix}$$

**Step 4: Transform to Vehicle Frame**
$$\mathbf{p}_V = \mathbf{T}_{V \leftarrow L} \cdot \mathbf{p}_L$$

### Implementation in NumPy

```python
# Create homogeneous coordinate matrix
pts_h = np.stack([x_L, y_L, z_L, np.ones_like(z_L)], axis=-1).reshape(-1, 4)

# Transform to vehicle frame (do NOT invert the matrix)
xyz_vehicle = (pts_h @ extrinsic_matrix.T)[:, :3]

Important: The dataset stores LiDAR→Vehicle transforms directly. Do not invert the matrix.


1️⃣1️⃣ 3D Bounding Box Specifications

Box Parameters in Vehicle Frame

Each 3D bounding box in lidar_box/ is defined by:

Parameter

Field

Description

Center

[LiDARBoxComponent].box.center.{x,y,z}

Box center position (meters)

Size

[LiDARBoxComponent].box.size.{x,y,z}

Length (X), Width (Y), Height (Z)

Heading

[LiDARBoxComponent].box.heading

Yaw angle (radians, CCW from +X axis)

Type

[LiDARBoxComponent].type

Object class (vehicle, pedestrian, cyclist)

Important Note: Box center Z-coordinate represents the object’s geometric center, not the bottom.

3D Box Mathematical Representation

\[\mathbf{Box} = \{\mathbf{c}, \mathbf{s}, \psi\}\]

where:

  • \(\mathbf{c} = [c_x, c_y, c_z]^T\) is the center position

  • \(\mathbf{s} = [s_x, s_y, s_z]^T\) is the size vector (length × width × height)

  • \(\psi\) is the heading angle (yaw rotation about Z-axis)


1️⃣2️⃣ Multi-Sensor Data Fusion

Coordinate Alignment Process

Step 1: Decode range images from all 5 LiDAR sensors Step 2: Transform each sensor’s points to vehicle frame using respective extrinsics Step 3: Merge all point clouds into unified coordinate system

# Process each LiDAR sensor
all_points = []
for sensor_id in range(5):  # 0=TOP, 1=FRONT, 2=SIDE_LEFT, 3=SIDE_RIGHT, 4=REAR
    # Extract sensor-specific data
    range_data = decode_range_image(sensor_data[sensor_id])
    extrinsic = get_extrinsic_matrix(sensor_id)
    
    # Transform to vehicle frame
    points_vehicle = transform_to_vehicle_frame(range_data, extrinsic)
    all_points.append(points_vehicle)

# Merge all sensors
merged_pointcloud = np.concatenate(all_points, axis=0)

Result: Both point cloud and 3D boxes are now in the same vehicle coordinate frame and align perfectly.


1️⃣3️⃣ Visualization and Projection

3D Visualization with Open3D

import open3d as o3d
import numpy as np

# Create point cloud visualization
pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(xyz_vehicle))
pcd.paint_uniform_color([0.6, 0.6, 0.6])
geometries = [pcd]

# Add 3D bounding boxes
for box_data in boxes_3d:
    x, y, z = box_data['center']
    dx, dy, dz = box_data['size'] 
    yaw = box_data['heading']
    
    # Create rotation matrix for yaw
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float32)
    
    # Create oriented bounding box
    obb = o3d.geometry.OrientedBoundingBox(
        center=[x, y, z], 
        R=R, 
        extent=[dx, dy, dz]
    )
    obb.color = (1, 0, 0)  # Red color
    geometries.append(obb)

# Add coordinate frame
axis = o3d.geometry.TriangleMesh.create_coordinate_frame(size=5.0)
geometries.append(axis)

# Visualize
o3d.visualization.draw_geometries(geometries)

2D Projection onto Camera Images

Mathematical Projection Pipeline:

  1. Transform 3D points to camera frame: $\(\mathbf{p}_C = \mathbf{T}_{C \leftarrow V} \cdot \mathbf{p}_V\)$

  2. Project to image plane: $\(\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} \begin{bmatrix} X_C/Z_C \\ Y_C/Z_C \\ 1 \end{bmatrix}\)$

  3. Apply distortion correction (if needed): $\(\begin{align} r^2 &= u_n^2 + v_n^2 \\ u_d &= u_n(1 + k_1r^2 + k_2r^4 + k_3r^6) + 2p_1u_nv_n + p_2(r^2 + 2u_n^2) \\ v_d &= v_n(1 + k_1r^2 + k_2r^4 + k_3r^6) + p_1(r^2 + 2v_n^2) + 2p_2u_nv_n \end{align}\)$

def project_3d_to_2d(points_3d, camera_intrinsic, camera_extrinsic):
    """Project 3D points to camera image coordinates"""
    # Transform to camera frame
    vehicle_to_camera = np.linalg.inv(camera_extrinsic)
    points_homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    points_camera = (vehicle_to_camera @ points_homogeneous.T).T[:, :3]
    
    # Project to image plane
    points_2d_homogeneous = (camera_intrinsic @ points_camera.T).T
    image_points = points_2d_homogeneous[:, :2] / points_2d_homogeneous[:, 2:3]
    depths = points_camera[:, 2]
    
    return image_points, depths

1️⃣4️⃣ Common Issues and Solutions

Troubleshooting Guide

Problem

Cause

Solution

Box appears “floating” above ground

LiDAR mounted ~2m high, box Z is object center

This is normal behavior

Box appears “in front of” points

Using single LiDAR sensor only

Merge all 5 LiDAR sensors

Point cloud mirrored/flipped

Used np.linalg.inv(extrinsic)

Use extrinsic.T for matrix multiplication

Translation values all zeros

Used order='F' for reshape

Use order='C' (row-major)

Beam angles incorrect

Reused wrong beam inclinations

Read sensor-specific beam ranges

Point cloud appears “warped”

Mixed sensors with wrong extrinsics

Verify yaw angle per LiDAR sensor

Best Practices

Recommended Settings:

  • Use order="C" for array reshaping

  • Apply extrinsic.T for transformations (do not invert)

  • Set flip_rows=True, flip_cols=False for range image processing

  • Use azimuth = np.linspace(np.pi, -np.pi, W) for azimuth calculation

Validation Checks:

  • Verify point cloud and boxes align in 3D visualization

  • Check that merged multi-LiDAR coverage is 360°

  • Ensure camera projections fall within image boundaries

  • Validate coordinate frame orientations match expected directions


1️⃣5️⃣ References and Resources

Official Documentation

Community Tools and Converters

  • OpenCOOD: Multi-modal 3D detection framework with Waymo support

  • OpenMMLab: MMDetection3D parser examples

  • Waymo2KITTI: Format conversion utilities (GitHub community)

Data Analysis Tools

  • Field Inspector: - Comprehensive schema analysis

  • Visualization Scripts: Open3D and Matplotlib integration examples


✨ Summary

Complete Processing Pipeline

Step

Action

Coordinate Frame

1

Decode range image

LiDAR frame

2

Apply extrinsic transform

Vehicle frame

3

Merge all sensors

Vehicle frame

4

Visualize with boxes

Vehicle frame

5

Project to cameras

Camera/Image frame

Key Mathematical Transformations

\[\text{Range Image} \xrightarrow{\text{spherical→cartesian}} \text{LiDAR Frame} \xrightarrow{\mathbf{T}_{V \leftarrow L}} \text{Vehicle Frame} \xrightarrow{\mathbf{T}_{C \leftarrow V}} \text{Camera Frame}\]

When implemented correctly, the merged multi-LiDAR point cloud aligns perfectly with Waymo’s 3D bounding boxes and camera images, enabling robust multi-modal perception and analysis.