🧭 Comprehensive Tutorial: Understanding and Visualizing the Waymo Open Dataset v2¶

This tutorial provides an in-depth, practical, and mathematical explanation of how to interpret, transform, and visualize Waymo v2.1 LiDAR and 3D box data.
It’s written for researchers and developers who want to deeply understand how Waymo structures its data, what coordinate frames are used, and how to correctly align LiDAR, camera, and annotations.

Code Reference: This tutorial is complemented by the comprehensive data inspection utilities in waymo_parquet_inspector.py, which provides detailed field analysis and validation tools for all Waymo data components.

Table of Contents¶

Dataset Overview
File Structure & Contents
Coordinate Frames
Camera Data Components
LiDAR Data Components
Calibration Data
LiDAR-to-Vehicle Transform Mathematics
3D Box Definitions
Coordinate Alignment: LiDAR ↔ Box
2D and 3D Box Visualization
Common Pitfalls & Debug Tips
References

1️⃣ Dataset Overview¶

📊 Dataset Scale & Format¶

Waymo Open Dataset (WOD) provides synchronized LiDAR and camera data with ground-truth 3D bounding boxes.

2,030 segments of 20s each, collected at 10Hz (390,000 frames)
Diverse geographies and conditions
Perception object assets data in a modular format (v2.0.0)
Extracted perception objects from multi-sensor data: all 5 cameras and the top lidar

🔧 Sensor Configuration¶

LiDAR Sensors:

1 mid-range lidar
4 short-range lidars

Camera Sensors:

5 cameras (front and sides)

Data Synchronization:

Synchronized lidar and camera data
Lidar to camera projections
Sensor calibrations and vehicle poses

🏷️ Annotation Categories¶

3D Bounding Box Labels¶

4 object classes: Vehicles, Pedestrians, Cyclists, Signs
High-quality labels for lidar data: 1,200 segments
12.6M 3D bounding box labels with tracking IDs on lidar data

2D Bounding Box Labels¶

High-quality labels for camera data: 1,000 segments
11.8M 2D bounding box labels with tracking IDs on camera data

2D Video Panoptic Segmentation¶

Subset: 100k camera images
28 classes including:
- Vehicles: Car, Bus, Truck, Other Large Vehicle, Trailer, Ego Vehicle, Motorcycle, Bicycle
- People: Pedestrian, Cyclist, Motorcyclist
- Animals: Ground Animal, Bird
- Infrastructure: Pole, Sign, Traffic Light, Construction Cone, Pedestrian Object, Building
- Road Elements: Road, Sidewalk, Road Marker, Lane Marker
- Environment: Vegetation, Sky, Ground
- Motion States: Static, Dynamic
Instance segmentation labels for Vehicle, Pedestrian and Cyclist classes
Consistent both across cameras and over time

Key Point Labels¶

2 object classes: Pedestrians and Cyclists
14 key points from nose to ankle
200k object frames with 2D key point labels
10k object frames with 3D key point labels

3D Semantic Segmentation¶

Segmentation labels: 1,150 segments
23 classes including:
- Vehicles: Car, Truck, Bus, Other Vehicle
- People: Motorcyclist, Bicyclist, Pedestrian
- Objects: Bicycle, Motorcycle, Sign, Traffic Light, Pole, Construction Cone
- Environment: Building, Vegetation, Tree Trunk, Curb
- Road Elements: Road, Lane Marker, Walkable, Sidewalk, Other Ground
- Undefined: Undefined

🗺️ Map Data¶

3D road graph data for each segment
Includes: lane centers, lane boundaries, road boundaries, crosswalks, speed bumps, stop signs, and entrances to driveways

🎯 Challenge Data¶

3D Camera-Only Detection Challenge: 80 segments of 20s camera imagery

🚀 Advanced Features¶

LiDAR features include:

3D point cloud sequences that support 3D object shape reconstruction

Camera features include:

Sequences of camera patches from the most_visible_camera
Projected lidar returns on the corresponding camera
Per-pixel camera rays information
Auto-labeled 2D panoptic segmentation that supports object NeRF reconstruction

The v2.0.1 Parquet version is designed for efficient columnar access and includes:

Component	Folder	Description	Schema Columns
Camera Images	`camera_image/`	RGB frames with pose/velocity metadata	15 columns with binary JPEG/PNG data
Camera Boxes	`camera_box/`	2D bounding boxes in pixel coordinates	12 columns with detection annotations
Camera Calibration	`camera_calibration/`	Intrinsics + extrinsics for all cameras	15 columns with calibration matrices
LiDAR Range Images	`lidar/`	Raw per-LiDAR sensor range images	7 columns with flattened range data
3D Boxes	`lidar_box/`	Ground-truth boxes in Vehicle Frame	21 columns with 3D object state
LiDAR Calibration	`lidar_calibration/`	Beam inclinations and extrinsic transforms	6 columns with sensor parameters
LiDAR Poses	`lidar_pose/`	Per-pixel transforms LiDAR → Vehicle → World	Pose transformation matrices
Segmentation	`lidar_segmentation/`	Point-wise semantic labels	Semantic segmentation masks
Projections	`lidar_camera_projection/`	LiDAR-to-camera pixel mappings	Cross-modal alignment data

2️⃣ File Structure & Contents¶

Each segment is approximately 20 seconds long, split into multiple Parquet files with standardized naming:

segment_id_start_end.parquet
Example: 10017090168044687777_6380_000_6400_000.parquet

Inside each data folder (e.g., training/lidar/), files contain rows corresponding to sensor measurements at specific timestamps.

Parquet Schema Structure¶

All Waymo v2.01 data follows a consistent schema pattern:

Key Fields (Common across all data types):
├── index: Unique row identifier (String)
├── key.segment_context_name: Segment ID (String)  
├── key.frame_timestamp_micros: Timestamp in microseconds (Int64)
└── key.[sensor]_name: Sensor identifier (Int8)

Component Fields:
└── [ComponentType].[field_hierarchy]: Actual data values
    ├── Scalar values: Direct numeric/string data
    ├── List values: Arrays (e.g., transformation matrices)
    └── Nested structures: Complex hierarchical data

Note: The [ComponentType] follows the pattern [SensorType]Component (e.g., CameraImageComponent, LiDARBoxComponent). See the inspector code for detailed field analysis of each component type.

LiDAR IDs (v2.1 five-sensor setup):

ID	Location	Yaw (deg)	Position (m)
1	Roof edge / back-right	+148°	[1.43, 0.0, 2.18]
2	Front bumper	0°	[4.07, 0.0, 0.69]
3	Left side	+90°	[3.25, +1.02, 0.98]
4	Right side	−90°	[3.25, −1.02, 0.98]
5	Rear	180°	[−1.15, 0.0, 0.46]

⚠️ v2.1 no longer includes the 360° Top LiDAR used in early WOD versions.

3️⃣ Coordinate Frames¶

Understanding coordinate systems is the foundation for correct visualization and data alignment.

Vehicle Frame (Primary Reference Frame)¶

Origin: Vehicle center (geometric center of the ego vehicle)
Axes:
- +X: Forward direction (vehicle’s front)
- +Y: Left direction (driver’s left)
- +Z: Upward direction (towards sky)
Usage: All 3D bounding boxes and calibration extrinsics are defined in this frame
Mathematical Representation: Right-handed coordinate system

\[\begin{split}\mathbf{p}_{\text{vehicle}} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \text{ where } \begin{cases} x > 0 & \text{forward} \\ y > 0 & \text{left} \\ z > 0 & \text{up} \end{cases}\end{split}\]

LiDAR Frame (Sensor-Specific)¶

Origin: Individual LiDAR sensor center
Axes: Same orientation as vehicle frame but translated/rotated
- +X: Sensor’s forward direction
- +Y: Sensor’s left direction
- +Z: Sensor’s upward direction
Usage: Raw range image data is initially in this frame
Transform: Each LiDAR has a unique extrinsic transformation to vehicle frame

Camera Frame (OpenCV Convention)¶

Origin: Camera optical center
Axes:
- +X: Right direction (image columns)
- +Y: Down direction (image rows)
- +Z: Forward direction (optical axis, into the scene)
Usage: Camera images and 2D bounding boxes
Projection: 3D points project to 2D image coordinates via intrinsic matrix

\[\begin{split}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} \begin{bmatrix} X_c/Z_c \\ Y_c/Z_c \\ 1 \end{bmatrix}\end{split}\]

where $\mathbf{K}$ is the camera intrinsic matrix and $(X_c, Y_c, Z_c)$ are coordinates in camera frame.

4️⃣ Camera Data Components¶

Camera Images (`camera_image/`)¶

Schema: 15 columns containing RGB image data and comprehensive metadata

Field	Type	Description
`index`	String	Unique row identifier
`key.segment_context_name`	String	Segment/sequence identifier
`key.frame_timestamp_micros`	Int64	Frame timestamp in microseconds
`key.camera_name`	Int8	Camera ID (0-4 for FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT)
`[CameraImageComponent].image`	Binary	JPEG/PNG compressed image bytes
`[CameraImageComponent].pose.transform`	List[Double]	4×4 transformation matrix (16 elements)
`[CameraImageComponent].velocity.linear_velocity.{x,y,z}`	Double	Linear velocity components (m/s)
`[CameraImageComponent].velocity.angular_velocity.{x,y,z}`	Double	Angular velocity components (rad/s)
`[CameraImageComponent].pose_timestamp`	Double	Pose measurement timestamp
`[CameraImageComponent].rolling_shutter_params.shutter`	Double	Rolling shutter timing parameter

Usage Example:

# Extract image from binary data
image_bytes = row['[CameraImageComponent].image']
pil_image = Image.open(io.BytesIO(image_bytes))

# Extract pose matrix (4x4 transformation)
pose_flat = row['[CameraImageComponent].pose.transform']  # 16 elements
pose_matrix = np.array(pose_flat).reshape(4, 4)

Camera Boxes (`camera_box/`)¶

Schema: 12 columns containing 2D bounding box annotations

Field	Type	Description
`key.camera_object_id`	String	Unique object identifier per camera
`[CameraBoxComponent].box.center.{x,y}`	Double	Bounding box center coordinates (pixels)
`[CameraBoxComponent].box.size.{x,y}`	Double	Bounding box dimensions (width, height in pixels)
`[CameraBoxComponent].type`	Int8	Object class type ID
`[CameraBoxComponent].difficulty_level.detection`	Int8	Detection difficulty rating (1-5)
`[CameraBoxComponent].difficulty_level.tracking`	Int8	Tracking difficulty rating (1-5)

Camera Calibration (`camera_calibration/`)¶

Schema: 15 columns containing intrinsic and extrinsic calibration parameters

Field	Type	Description
`[CameraCalibrationComponent].intrinsic.f_u`	Double	Focal length in u direction (pixels)
`[CameraCalibrationComponent].intrinsic.f_v`	Double	Focal length in v direction (pixels)
`[CameraCalibrationComponent].intrinsic.c_u`	Double	Principal point u coordinate (pixels)
`[CameraCalibrationComponent].intrinsic.c_v`	Double	Principal point v coordinate (pixels)
`[CameraCalibrationComponent].intrinsic.k1,k2,k3`	Double	Radial distortion coefficients
`[CameraCalibrationComponent].intrinsic.p1,p2`	Double	Tangential distortion coefficients
`[CameraCalibrationComponent].extrinsic.transform`	List[Double]	4×4 camera-to-vehicle transformation
`[CameraCalibrationComponent].width`	Int32	Image width in pixels
`[CameraCalibrationComponent].height`	Int32	Image height in pixels

Intrinsic Matrix Construction: $$\mathbf{K} = \begin{bmatrix} f_u & 0 & c_u \\ 0 & f_v & c_v \\ 0 & 0 & 1 \end{bmatrix}$$

5️⃣ LiDAR Data Components¶

LiDAR Range Images (`lidar/`)¶

Schema: 11 columns containing range image data and sensor metadata

Each LiDAR captures a range image instead of a raw point cloud. This is a 2D representation where each pixel encodes distance, intensity, and other measurements.

Field	Type	Description
`index`	String	Unique row identifier
`key.segment_context_name`	String	Segment/sequence identifier
`key.frame_timestamp_micros`	Int64	Frame timestamp in microseconds
`key.laser_name`	Int8	LiDAR sensor ID (0-4 for TOP, FRONT, SIDE_LEFT, SIDE_RIGHT, REAR)
`[LiDARComponent].range_image_return1.range`	Binary	First return range data (compressed)
`[LiDARComponent].range_image_return1.intensity`	Binary	First return intensity data (compressed)
`[LiDARComponent].range_image_return1.elongation`	Binary	First return elongation data (compressed)
`[LiDARComponent].range_image_return2.range`	Binary	Second return range data (compressed)
`[LiDARComponent].range_image_return2.intensity`	Binary	Second return intensity data (compressed)
`[LiDARComponent].range_image_return2.elongation`	Binary	Second return elongation data (compressed)
`[LiDARComponent].camera_projection_exclusion_mask`	Binary	Exclusion mask for camera projections

Range Image Structure:

Dimensions: Typically H×W where H varies by sensor (64-200 rows), W is azimuth resolution
Encoding: Each pixel encodes distance measurement in meters
Returns: Two returns per laser beam (first and second reflection)
Compression: Data is compressed using Waymo’s proprietary format

6️⃣ LiDAR-to-Vehicle Transform Mathematics¶

Converting LiDAR range images to 3D point clouds in the vehicle coordinate frame requires several mathematical transformations.

Step 1: Range Image to Spherical Coordinates¶

Each pixel $(u, v)$ in the range image corresponds to spherical coordinates:

\[\begin{split}\begin{align} \text{azimuth} &= \frac{2\pi \cdot u}{W} - \pi \\ \text{inclination} &= \text{beam\_inclinations}[v] \\ \text{range} &= \text{range\_image}[v, u] \end{align}\end{split}\]

where $W$ is the azimuth resolution (typically 2650 for Waymo LiDAR).

Step 2: Spherical to Cartesian Conversion (LiDAR Frame)¶

Convert spherical coordinates to 3D Cartesian coordinates in the LiDAR sensor frame:

\[\begin{split}\begin{align} x_{\text{lidar}} &= \text{range} \cdot \cos(\text{inclination}) \cdot \cos(\text{azimuth}) \\ y_{\text{lidar}} &= \text{range} \cdot \cos(\text{inclination}) \cdot \sin(\text{azimuth}) \\ z_{\text{lidar}} &= \text{range} \cdot \sin(\text{inclination}) \end{align}\end{split}\]

Step 3: LiDAR-to-Vehicle Transformation¶

Apply the extrinsic calibration matrix to transform from LiDAR frame to vehicle frame:

\[\begin{split}\begin{bmatrix} x_{\text{vehicle}} \\ y_{\text{vehicle}} \\ z_{\text{vehicle}} \\ 1 \end{bmatrix} = \mathbf{T}_{\text{lidar→vehicle}} \begin{bmatrix} x_{\text{lidar}} \\ y_{\text{lidar}} \\ z_{\text{lidar}} \\ 1 \end{bmatrix}\end{split}\]

where $\mathbf{T}_{\text{lidar→vehicle}}$ is the 4×4 transformation matrix from the calibration data:

\[\begin{split}\mathbf{T}_{\text{lidar→vehicle}} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}\end{split}\]

with $\mathbf{R} \in \mathbb{R}^{3 \times 3}$ being the rotation matrix and $\mathbf{t} \in \mathbb{R}^{3}$ the translation vector.

Complete Transformation Pipeline¶

The complete transformation from range image pixel to vehicle coordinates:

\[\mathbf{p}_{\text{vehicle}} = \mathbf{T}_{\text{lidar→vehicle}} \cdot \mathbf{f}_{\text{spherical→cartesian}}(\text{range}[v,u], \text{azimuth}(u), \text{inclination}(v))\]

Implementation Note: The waymo_parquet_inspector.py script provides detailed field analysis for understanding the exact data formats and transformations.

LiDAR Boxes (`lidar_box/`)¶

Schema: 18 columns containing 3D bounding box annotations

Field	Type	Description
`key.laser_object_id`	String	Unique object identifier per LiDAR
`[LiDARBoxComponent].box.center.{x,y,z}`	Double	3D bounding box center in vehicle frame (meters)
`[LiDARBoxComponent].box.size.{x,y,z}`	Double	3D bounding box dimensions (length, width, height in meters)
`[LiDARBoxComponent].box.heading`	Double	Object orientation angle (radians)
`[LiDARBoxComponent].type`	Int8	Object class type ID
`[LiDARBoxComponent].id`	String	Persistent object tracking ID
`[LiDARBoxComponent].detection_difficulty_level`	Int8	Detection difficulty rating (1-5)
`[LiDARBoxComponent].tracking_difficulty_level`	Int8	Tracking difficulty rating (1-5)
`[LiDARBoxComponent].num_lidar_points_in_box`	Int32	Number of LiDAR points inside the box

3D Box Representation: $$\mathbf{Box} = \{\mathbf{c}, \mathbf{s}, \theta\} \text{ where } \begin{cases} \mathbf{c} = [c_x, c_y, c_z]^T & \text{center position} \\ \mathbf{s} = [s_x, s_y, s_z]^T & \text{size (L×W×H)} \\ \theta & \text{heading angle} \end{cases}$$

LiDAR Calibration (`lidar_calibration/`)¶

Schema: 8 columns containing sensor calibration parameters

Field	Type	Description
`[LiDARCalibrationComponent].extrinsic.transform`	List[Double]	4×4 LiDAR-to-vehicle transformation matrix
`[LiDARCalibrationComponent].beam_inclinations`	List[Double]	Vertical beam angle inclinations (radians)
`[LiDARCalibrationComponent].beam_inclination_min`	Double	Minimum beam inclination angle
`[LiDARCalibrationComponent].beam_inclination_max`	Double	Maximum beam inclination angle

7️⃣ Additional Data Components¶

Projected LiDAR (`projected_lidar_labels/`)¶

Schema: 14 columns containing LiDAR points projected onto camera images

Field	Type	Description
`[ProjectedLiDARLabelsComponent].box.center.{x,y}`	Double	Projected 2D box center (pixels)
`[ProjectedLiDARLabelsComponent].box.size.{x,y}`	Double	Projected 2D box size (pixels)
`[ProjectedLiDARLabelsComponent].type`	Int8	Object class type ID
`[ProjectedLiDARLabelsComponent].id`	String	Object tracking ID
`[ProjectedLiDARLabelsComponent].detection_difficulty_level`	Int8	Detection difficulty (1-5)

Segmentation Labels (`lidar_segmentation/`)¶

Schema: 7 columns containing point-wise semantic segmentation

Field	Type	Description
`[LiDARSegmentationLabelComponent].pointcloud_to_image_projection`	Binary	Point-to-pixel mapping data
`[LiDARSegmentationLabelComponent].segmentation_label`	Binary	Per-point semantic labels
`[LiDARSegmentationLabelComponent].instance_id_to_global_id_mapping`	Binary	Instance ID mappings

Statistics (`stats/`)¶

Schema: 9 columns containing frame-level statistics and metadata

Field	Type	Description
`[StatsComponent].location`	String	Geographic location identifier
`[StatsComponent].time_of_day`	String	Time period (Dawn, Day, Dusk, Night)
`[StatsComponent].weather`	String	Weather conditions
`[StatsComponent].camera_object_counts`	List[Int32]	Object counts per camera
`[StatsComponent].lidar_object_counts`	List[Int32]	Object counts per LiDAR

Usage for Data Analysis:

# Filter by weather conditions
sunny_frames = df[df['[StatsComponent].weather'] == 'sunny']

# Analyze object distribution
total_objects = df['[StatsComponent].lidar_object_counts'].apply(sum)
---

## 8️⃣ Range Image Processing

Each pixel encodes `(range, intensity, elongation, ...)` data in compressed binary format.

**Typical Range Image Dimensions**:

| Sensor | Shape (H, W, C) | Field of View |
|---------|-----------------|---------------|
| TOP LiDAR (#0) | 64 × 2650 × 4 | 360° horizontal |
| FRONT LiDAR (#1) | 200 × 600 × 4 | ~100° horizontal |
| SIDE_LEFT (#2) | 200 × 600 × 4 | ~100° horizontal |
| SIDE_RIGHT (#3) | 200 × 600 × 4 | ~100° horizontal |
| REAR (#4) | 200 × 600 × 4 | ~100° horizontal |

### Range Image to Point Cloud Conversion

**Step 1**: Decode the compressed range/intensity data from binary format
**Step 2**: Apply coordinate transformations to get 3D points in vehicle frame
**Step 3**: Filter invalid points (range = 0)

---

## 9️⃣ LiDAR Calibration Mathematics

### Extrinsic Transformation Matrix

The extrinsic matrix $\mathbf{T}_{V \leftarrow L}$ transforms points from LiDAR frame to vehicle frame:

$$\mathbf{T}_{V \leftarrow L} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$$

**Storage Format**: Row-major order (`order="C"`) in Parquet, stored as 16-element list

**Example Transformation Matrix**:

[[-0.8478, -0.5304, -0.0025, 1.43 ], [ 0.5304, -0.8478, 0.0002, 0.00 ], [-0.0022, -0.0012, 1.0000, 2.184], [ 0.0000, 0.0000, 0.0000, 1.0000]]

→ Rotation yaw ≈ 148°, translation ≈ (1.43, 0.0, 2.18) meters

### Beam Inclination Angles

Vertical angles for each row in the range image, typically distributed linearly between minimum and maximum inclination values.

$$\theta_v = \text{beam\_inclinations}[v] \text{ for row } v \in [0, H-1]$$

---

## 🔟 Complete LiDAR Processing Pipeline

### Mathematical Transformation Steps

For each LiDAR pixel at position $(u, v)$ with range $r$:

**Step 1: Spherical Coordinates**
$$\begin{align}
\phi &= \frac{2\pi \cdot u}{W} - \pi \quad \text{(azimuth angle)} \\
\theta &= \text{beam\_inclinations}[v] \quad \text{(inclination angle)} \\
r &= \text{range\_image}[v, u] \quad \text{(distance in meters)}
\end{align}$$

**Step 2: LiDAR Frame Cartesian Coordinates**
$$\begin{align}
x_L &= r \cos(\theta) \cos(\phi) \\
y_L &= r \cos(\theta) \sin(\phi) \\
z_L &= r \sin(\theta)
\end{align}$$

**Step 3: Homogeneous Coordinates**
$$\mathbf{p}_L = \begin{bmatrix} x_L \\ y_L \\ z_L \\ 1 \end{bmatrix}$$

**Step 4: Transform to Vehicle Frame**
$$\mathbf{p}_V = \mathbf{T}_{V \leftarrow L} \cdot \mathbf{p}_L$$

### Implementation in NumPy

```python
# Create homogeneous coordinate matrix
pts_h = np.stack([x_L, y_L, z_L, np.ones_like(z_L)], axis=-1).reshape(-1, 4)

# Transform to vehicle frame (do NOT invert the matrix)
xyz_vehicle = (pts_h @ extrinsic_matrix.T)[:, :3]

Important: The dataset stores LiDAR→Vehicle transforms directly. Do not invert the matrix.

1️⃣1️⃣ 3D Bounding Box Specifications¶

Box Parameters in Vehicle Frame¶

Each 3D bounding box in lidar_box/ is defined by:

Parameter	Field	Description
Center	`[LiDARBoxComponent].box.center.{x,y,z}`	Box center position (meters)
Size	`[LiDARBoxComponent].box.size.{x,y,z}`	Length (X), Width (Y), Height (Z)
Heading	`[LiDARBoxComponent].box.heading`	Yaw angle (radians, CCW from +X axis)
Type	`[LiDARBoxComponent].type`	Object class (vehicle, pedestrian, cyclist)

Important Note: Box center Z-coordinate represents the object’s geometric center, not the bottom.

3D Box Mathematical Representation¶

\[\mathbf{Box} = \{\mathbf{c}, \mathbf{s}, \psi\}\]

where:

$\mathbf{c} = [c_x, c_y, c_z]^T$ is the center position
$\mathbf{s} = [s_x, s_y, s_z]^T$ is the size vector (length × width × height)
$\psi$ is the heading angle (yaw rotation about Z-axis)

1️⃣2️⃣ Multi-Sensor Data Fusion¶

Coordinate Alignment Process¶

Step 1: Decode range images from all 5 LiDAR sensors Step 2: Transform each sensor’s points to vehicle frame using respective extrinsics Step 3: Merge all point clouds into unified coordinate system

# Process each LiDAR sensor
all_points = []
for sensor_id in range(5):  # 0=TOP, 1=FRONT, 2=SIDE_LEFT, 3=SIDE_RIGHT, 4=REAR
    # Extract sensor-specific data
    range_data = decode_range_image(sensor_data[sensor_id])
    extrinsic = get_extrinsic_matrix(sensor_id)
    
    # Transform to vehicle frame
    points_vehicle = transform_to_vehicle_frame(range_data, extrinsic)
    all_points.append(points_vehicle)

# Merge all sensors
merged_pointcloud = np.concatenate(all_points, axis=0)

Result: Both point cloud and 3D boxes are now in the same vehicle coordinate frame and align perfectly.

1️⃣3️⃣ Visualization and Projection¶

3D Visualization with Open3D¶

import open3d as o3d
import numpy as np

# Create point cloud visualization
pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(xyz_vehicle))
pcd.paint_uniform_color([0.6, 0.6, 0.6])
geometries = [pcd]

# Add 3D bounding boxes
for box_data in boxes_3d:
    x, y, z = box_data['center']
    dx, dy, dz = box_data['size'] 
    yaw = box_data['heading']
    
    # Create rotation matrix for yaw
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float32)
    
    # Create oriented bounding box
    obb = o3d.geometry.OrientedBoundingBox(
        center=[x, y, z], 
        R=R, 
        extent=[dx, dy, dz]
    )
    obb.color = (1, 0, 0)  # Red color
    geometries.append(obb)

# Add coordinate frame
axis = o3d.geometry.TriangleMesh.create_coordinate_frame(size=5.0)
geometries.append(axis)

# Visualize
o3d.visualization.draw_geometries(geometries)

2D Projection onto Camera Images¶

Mathematical Projection Pipeline:

Transform 3D points to camera frame: $$\mathbf{p}_C = \mathbf{T}_{C \leftarrow V} \cdot \mathbf{p}_V$$
Project to image plane: $$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} \begin{bmatrix} X_C/Z_C \\ Y_C/Z_C \\ 1 \end{bmatrix}$$
Apply distortion correction (if needed): $$\begin{align} r^2 &= u_n^2 + v_n^2 \\ u_d &= u_n(1 + k_1r^2 + k_2r^4 + k_3r^6) + 2p_1u_nv_n + p_2(r^2 + 2u_n^2) \\ v_d &= v_n(1 + k_1r^2 + k_2r^4 + k_3r^6) + p_1(r^2 + 2v_n^2) + 2p_2u_nv_n \end{align}$$

def project_3d_to_2d(points_3d, camera_intrinsic, camera_extrinsic):
    """Project 3D points to camera image coordinates"""
    # Transform to camera frame
    vehicle_to_camera = np.linalg.inv(camera_extrinsic)
    points_homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    points_camera = (vehicle_to_camera @ points_homogeneous.T).T[:, :3]
    
    # Project to image plane
    points_2d_homogeneous = (camera_intrinsic @ points_camera.T).T
    image_points = points_2d_homogeneous[:, :2] / points_2d_homogeneous[:, 2:3]
    depths = points_camera[:, 2]
    
    return image_points, depths

1️⃣4️⃣ Common Issues and Solutions¶

Troubleshooting Guide¶

Problem	Cause	Solution
Box appears “floating” above ground	LiDAR mounted ~2m high, box Z is object center	This is normal behavior
Box appears “in front of” points	Using single LiDAR sensor only	Merge all 5 LiDAR sensors
Point cloud mirrored/flipped	Used `np.linalg.inv(extrinsic)`	Use `extrinsic.T` for matrix multiplication
Translation values all zeros	Used `order='F'` for reshape	Use `order='C'` (row-major)
Beam angles incorrect	Reused wrong beam inclinations	Read sensor-specific beam ranges
Point cloud appears “warped”	Mixed sensors with wrong extrinsics	Verify yaw angle per LiDAR sensor

Best Practices¶

✅ Recommended Settings:

Use order="C" for array reshaping
Apply extrinsic.T for transformations (do not invert)
Set flip_rows=True, flip_cols=False for range image processing
Use azimuth = np.linspace(np.pi, -np.pi, W) for azimuth calculation

✅ Validation Checks:

Verify point cloud and boxes align in 3D visualization
Check that merged multi-LiDAR coverage is 360°
Ensure camera projections fall within image boundaries
Validate coordinate frame orientations match expected directions

1️⃣5️⃣ References and Resources¶

Official Documentation¶

Waymo Open Dataset Repository: https://github.com/waymo-research/waymo-open-dataset
Range Image Utilities: range_image_utils.py in official repo
Coordinate Conventions: Waymo Open Dataset Paper, CVPR 2020

Community Tools and Converters¶

OpenCOOD: Multi-modal 3D detection framework with Waymo support
OpenMMLab: MMDetection3D parser examples
Waymo2KITTI: Format conversion utilities (GitHub community)

Data Analysis Tools¶

Field Inspector: - Comprehensive schema analysis
Visualization Scripts: Open3D and Matplotlib integration examples

✨ Summary¶

Complete Processing Pipeline¶

Step	Action	Coordinate Frame
1	Decode range image	LiDAR frame
2	Apply extrinsic transform	Vehicle frame
3	Merge all sensors	Vehicle frame
4	Visualize with boxes	Vehicle frame
5	Project to cameras	Camera/Image frame

Key Mathematical Transformations¶

\[\text{Range Image} \xrightarrow{\text{spherical→cartesian}} \text{LiDAR Frame} \xrightarrow{\mathbf{T}_{V \leftarrow L}} \text{Vehicle Frame} \xrightarrow{\mathbf{T}_{C \leftarrow V}} \text{Camera Frame}\]

When implemented correctly, the merged multi-LiDAR point cloud aligns perfectly with Waymo’s 3D bounding boxes and camera images, enabling robust multi-modal perception and analysis.