🧭 Comprehensive Tutorial: Understanding and Visualizing the Waymo Open Dataset v2¶
This tutorial provides an in-depth, practical, and mathematical explanation of how to interpret, transform, and visualize Waymo v2.1 LiDAR and 3D box data.
It’s written for researchers and developers who want to deeply understand how Waymo structures its data, what coordinate frames are used, and how to correctly align LiDAR, camera, and annotations.
Code Reference: This tutorial is complemented by the comprehensive data inspection utilities in waymo_parquet_inspector.py, which provides detailed field analysis and validation tools for all Waymo data components.
Table of Contents¶
Dataset Overview
File Structure & Contents
Coordinate Frames
Camera Data Components
LiDAR Data Components
Calibration Data
LiDAR-to-Vehicle Transform Mathematics
3D Box Definitions
Coordinate Alignment: LiDAR ↔ Box
2D and 3D Box Visualization
Common Pitfalls & Debug Tips
References
1️⃣ Dataset Overview¶
📊 Dataset Scale & Format¶
Waymo Open Dataset (WOD) provides synchronized LiDAR and camera data with ground-truth 3D bounding boxes.
2,030 segments of 20s each, collected at 10Hz (390,000 frames)
Diverse geographies and conditions
Perception object assets data in a modular format (v2.0.0)
Extracted perception objects from multi-sensor data: all 5 cameras and the top lidar
🔧 Sensor Configuration¶
LiDAR Sensors:
1 mid-range lidar
4 short-range lidars
Camera Sensors:
5 cameras (front and sides)
Data Synchronization:
Synchronized lidar and camera data
Lidar to camera projections
Sensor calibrations and vehicle poses
🏷️ Annotation Categories¶
3D Bounding Box Labels¶
4 object classes: Vehicles, Pedestrians, Cyclists, Signs
High-quality labels for lidar data: 1,200 segments
12.6M 3D bounding box labels with tracking IDs on lidar data
2D Bounding Box Labels¶
High-quality labels for camera data: 1,000 segments
11.8M 2D bounding box labels with tracking IDs on camera data
2D Video Panoptic Segmentation¶
Subset: 100k camera images
28 classes including:
Vehicles: Car, Bus, Truck, Other Large Vehicle, Trailer, Ego Vehicle, Motorcycle, Bicycle
People: Pedestrian, Cyclist, Motorcyclist
Animals: Ground Animal, Bird
Infrastructure: Pole, Sign, Traffic Light, Construction Cone, Pedestrian Object, Building
Road Elements: Road, Sidewalk, Road Marker, Lane Marker
Environment: Vegetation, Sky, Ground
Motion States: Static, Dynamic
Instance segmentation labels for Vehicle, Pedestrian and Cyclist classes
Consistent both across cameras and over time
Key Point Labels¶
2 object classes: Pedestrians and Cyclists
14 key points from nose to ankle
200k object frames with 2D key point labels
10k object frames with 3D key point labels
3D Semantic Segmentation¶
Segmentation labels: 1,150 segments
23 classes including:
Vehicles: Car, Truck, Bus, Other Vehicle
People: Motorcyclist, Bicyclist, Pedestrian
Objects: Bicycle, Motorcycle, Sign, Traffic Light, Pole, Construction Cone
Environment: Building, Vegetation, Tree Trunk, Curb
Road Elements: Road, Lane Marker, Walkable, Sidewalk, Other Ground
Undefined: Undefined
🗺️ Map Data¶
3D road graph data for each segment
Includes: lane centers, lane boundaries, road boundaries, crosswalks, speed bumps, stop signs, and entrances to driveways
🔗 Cross-Modal Associations¶
Association of 2D and 3D bounding boxes
Corresponding object IDs provided for 2 object classes: Pedestrians and Cyclists
🎯 Challenge Data¶
3D Camera-Only Detection Challenge: 80 segments of 20s camera imagery
🚀 Advanced Features¶
LiDAR features include:
3D point cloud sequences that support 3D object shape reconstruction
Camera features include:
Sequences of camera patches from the most_visible_camera
Projected lidar returns on the corresponding camera
Per-pixel camera rays information
Auto-labeled 2D panoptic segmentation that supports object NeRF reconstruction
The v2.0.1 Parquet version is designed for efficient columnar access and includes:
| Component | Folder | Description | Schema Columns |
|---|---|---|---|
| Camera Images |  | RGB frames with pose/velocity metadata | 15 columns with binary JPEG/PNG data |
| Camera Boxes |  | 2D bounding boxes in pixel coordinates | 12 columns with detection annotations |
| Camera Calibration |  | Intrinsics + extrinsics for all cameras | 15 columns with calibration matrices |
| LiDAR Range Images |  | Raw per-LiDAR sensor range images | 7 columns with flattened range data |
| 3D Boxes |  | Ground-truth boxes in Vehicle Frame | 21 columns with 3D object state |
| LiDAR Calibration |  | Beam inclinations and extrinsic transforms | 6 columns with sensor parameters |
| LiDAR Poses |  | Per-pixel transforms LiDAR → Vehicle → World | Pose transformation matrices |
| Segmentation |  | Point-wise semantic labels | Semantic segmentation masks |
| Projections |  | LiDAR-to-camera pixel mappings | Cross-modal alignment data |
2️⃣ File Structure & Contents¶
Each segment is approximately 20 seconds long, split into multiple Parquet files with standardized naming:
segment_id_start_end.parquet
Example: 10017090168044687777_6380_000_6400_000.parquet
Inside each data folder (e.g., training/lidar/), files contain rows corresponding to sensor measurements at specific timestamps.
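The naming pattern above can be split mechanically. The following is a minimal sketch, assuming the file stem always consists of five underscore-separated tokens as in the example; `parse_segment_filename` is a hypothetical helper, and treating the full stem as the segment context name is an assumption to check against `key.segment_context_name` in your own files.

```python
from pathlib import PurePosixPath


def parse_segment_filename(path: str) -> dict:
    """Split '<segment_id>_<start>_<end>.parquet' into its parts (hypothetical helper)."""
    stem = PurePosixPath(path).stem  # file name without the '.parquet' suffix
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"unexpected segment file name: {path!r}")
    segment_id, start_a, start_b, end_a, end_b = parts
    return {
        "context_name": stem,  # assumed to match key.segment_context_name
        "segment_id": segment_id,
        "start": f"{start_a}_{start_b}",
        "end": f"{end_a}_{end_b}",
    }


info = parse_segment_filename(
    "training/lidar/10017090168044687777_6380_000_6400_000.parquet"
)
print(info["segment_id"], info["start"], info["end"])
```

Because every component folder uses the same file name for a given segment, the same stem can be used to pair, for example, a `lidar/` file with its `lidar_box/` counterpart.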
Parquet Schema Structure¶
All Waymo v2.0.1 data follows a consistent schema pattern:
Key Fields (Common across all data types):
├── index: Unique row identifier (String)
├── key.segment_context_name: Segment ID (String)
├── key.frame_timestamp_micros: Timestamp in microseconds (Int64)
└── key.[sensor]_name: Sensor identifier (Int8)
Component Fields:
└── [ComponentType].[field_hierarchy]: Actual data values
├── Scalar values: Direct numeric/string data
├── List values: Arrays (e.g., transformation matrices)
└── Nested structures: Complex hierarchical data
Note: The `[ComponentType]` follows the pattern `[SensorType]Component` (e.g., `CameraImageComponent`, `LiDARBoxComponent`). See the inspector code for detailed field analysis of each component type.
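To make the column naming pattern concrete, here is a small sketch that splits a column name into its group and field path. `split_column` is a hypothetical helper written for this tutorial, not part of any Waymo tooling, and it assumes column names look exactly like the schema outline above.

```python
import re

# Matches names such as "[CameraImageComponent].pose.transform".
_COLUMN_RE = re.compile(r"^\[(?P<component>\w+)\]\.(?P<field>.+)$")


def split_column(name: str):
    """Return (group, field_path) for a Waymo-style Parquet column name."""
    if name == "index":
        return ("index", [])
    if name.startswith("key."):
        return ("key", name[len("key."):].split("."))
    match = _COLUMN_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised column name: {name!r}")
    return (match["component"], match["field"].split("."))


print(split_column("key.frame_timestamp_micros"))
print(split_column("[CameraImageComponent].pose.transform"))
```

Grouping columns this way makes it easy to separate the join keys (`key.*`) from the payload fields of one component.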
LiDAR IDs (v2.1 five-sensor setup):
| ID | Location | Yaw (deg) | Position (m) |
|---|---|---|---|
| 1 | Roof edge / back-right | +148° | [1.43, 0.0, 2.18] |
| 2 | Front bumper | 0° | [4.07, 0.0, 0.69] |
| 3 | Left side | +90° | [3.25, +1.02, 0.98] |
| 4 | Right side | −90° | [3.25, −1.02, 0.98] |
| 5 | Rear | 180° | [−1.15, 0.0, 0.46] |
⚠️ ID 1 is the roof-mounted 360° TOP LiDAR (the mid-range unit); IDs 2 to 5 are the four short-range LiDARs.
3️⃣ Coordinate Frames¶
Understanding coordinate systems is the foundation for correct visualization and data alignment.
Vehicle Frame (Primary Reference Frame)¶
Origin: Vehicle center (geometric center of the ego vehicle)
Axes:
+X: Forward direction (vehicle’s front)
+Y: Left direction (driver’s left)
+Z: Upward direction (towards sky)
Usage: All 3D bounding boxes and calibration extrinsics are defined in this frame
Mathematical Representation: Right-handed coordinate system
LiDAR Frame (Sensor-Specific)¶
Origin: Individual LiDAR sensor center
Axes: Same right-handed convention as the vehicle frame, but translated and rotated by the sensor's extrinsic calibration
+X: Sensor’s forward direction
+Y: Sensor’s left direction
+Z: Sensor’s upward direction
Usage: Raw range image data is initially in this frame
Transform: Each LiDAR has a unique extrinsic transformation to vehicle frame
Camera Frame (OpenCV Convention)¶
Origin: Camera optical center
Axes:
+X: Right direction (image columns)
+Y: Down direction (image rows)
+Z: Forward direction (optical axis, into the scene)
Usage: Camera images and 2D bounding boxes
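Projection: 3D points project to 2D image coordinates via the intrinsic matrix:
$\(\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} \begin{bmatrix} X_c/Z_c \\ Y_c/Z_c \\ 1 \end{bmatrix}\)$
where \(\mathbf{K}\) is the camera intrinsic matrix and \((X_c, Y_c, Z_c)\) are coordinates in camera frame.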
4️⃣ Camera Data Components¶
Camera Images (camera_image/)¶
Schema: 15 columns containing RGB image data and comprehensive metadata
| Field | Type | Description |
|---|---|---|
|  | String | Unique row identifier |
|  | String | Segment/sequence identifier |
|  | Int64 | Frame timestamp in microseconds |
|  | Int8 | Camera ID (1-5 for FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT) |
|  | Binary | JPEG/PNG compressed image bytes |
|  | List[Double] | 4×4 transformation matrix (16 elements) |
|  | Double | Linear velocity components (m/s) |
|  | Double | Angular velocity components (rad/s) |
|  | Double | Pose measurement timestamp |
|  | Double | Rolling shutter timing parameter |
Usage Example:
import io
import numpy as np
from PIL import Image
# `row` is one row of the camera_image table (e.g. a pandas Series)
# Extract image from binary data
image_bytes = row['[CameraImageComponent].image']
pil_image = Image.open(io.BytesIO(image_bytes))
# Extract pose matrix (4x4 transformation, stored row-major)
pose_flat = row['[CameraImageComponent].pose.transform'] # 16 elements
pose_matrix = np.array(pose_flat).reshape(4, 4)
Camera Boxes (camera_box/)¶
Schema: 12 columns containing 2D bounding box annotations
| Field | Type | Description |
|---|---|---|
|  | String | Unique object identifier per camera |
|  | Double | Bounding box center coordinates (pixels) |
|  | Double | Bounding box dimensions (width, height in pixels) |
|  | Int8 | Object class type ID |
|  | Int8 | Detection difficulty level (1 = LEVEL_1, 2 = LEVEL_2, 0 = unknown) |
|  | Int8 | Tracking difficulty level (1 = LEVEL_1, 2 = LEVEL_2, 0 = unknown) |
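Since camera boxes are stored as a center plus a size, most drawing APIs need them converted to corner form first. The sketch below shows that conversion; `box2d_to_corners` and `clip_to_image` are hypothetical helper names, and the pixel values are illustrative only.

```python
def box2d_to_corners(cx, cy, width, height):
    """Convert a center/size box to (x_min, y_min, x_max, y_max) in pixels."""
    half_w, half_h = width / 2.0, height / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def clip_to_image(corners, image_width, image_height):
    """Clamp a corner-form box to the image rectangle."""
    x_min, y_min, x_max, y_max = corners
    return (
        max(0.0, x_min),
        max(0.0, y_min),
        min(float(image_width), x_max),
        min(float(image_height), y_max),
    )


corners = box2d_to_corners(cx=100.0, cy=50.0, width=40.0, height=20.0)
clipped = clip_to_image(box2d_to_corners(10.0, 10.0, 40.0, 40.0), 1920, 1280)
print(corners, clipped)
```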
Camera Calibration (camera_calibration/)¶
Schema: 15 columns containing intrinsic and extrinsic calibration parameters
| Field | Type | Description |
|---|---|---|
|  | Double | Focal length in u direction (pixels) |
|  | Double | Focal length in v direction (pixels) |
|  | Double | Principal point u coordinate (pixels) |
|  | Double | Principal point v coordinate (pixels) |
|  | Double | Radial distortion coefficients |
|  | Double | Tangential distortion coefficients |
|  | List[Double] | 4×4 camera-to-vehicle transformation |
|  | Int32 | Image width in pixels |
|  | Int32 | Image height in pixels |
Intrinsic Matrix Construction: $\(\mathbf{K} = \begin{bmatrix} f_u & 0 & c_u \\ 0 & f_v & c_v \\ 0 & 0 & 1 \end{bmatrix}\)$
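As a quick illustration of the intrinsic matrix, the sketch below assembles \(\mathbf{K}\) from the four scalar calibration values and projects one camera-frame point. The numeric values are made up for the example, and the point is assumed to be expressed in the camera frame described earlier (+X right, +Y down, +Z forward).

```python
import numpy as np


def make_intrinsic(f_u, f_v, c_u, c_v):
    """Build the 3x3 pinhole intrinsic matrix K."""
    return np.array([
        [f_u, 0.0, c_u],
        [0.0, f_v, c_v],
        [0.0, 0.0, 1.0],
    ])


K = make_intrinsic(f_u=2000.0, f_v=2000.0, c_u=960.0, c_v=640.0)  # illustrative values
point_cam = np.array([0.5, -0.25, 10.0])  # (X_c, Y_c, Z_c) in meters

uvw = K @ point_cam          # homogeneous pixel coordinates
u, v = uvw[:2] / uvw[2]      # divide by depth to get pixels
print(u, v)
```

A point on the optical axis (X_c = Y_c = 0) lands exactly on the principal point, which is a handy sanity check for a freshly loaded calibration.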
5️⃣ LiDAR Data Components¶
LiDAR Range Images (lidar/)¶
Schema: 11 columns containing range image data and sensor metadata
Each LiDAR captures a range image instead of a raw point cloud. This is a 2D representation where each pixel encodes distance, intensity, and other measurements.
| Field | Type | Description |
|---|---|---|
|  | String | Unique row identifier |
|  | String | Segment/sequence identifier |
|  | Int64 | Frame timestamp in microseconds |
|  | Int8 | LiDAR sensor ID (1-5 for TOP, FRONT, SIDE_LEFT, SIDE_RIGHT, REAR) |
|  | Binary | First return range data (compressed) |
|  | Binary | First return intensity data (compressed) |
|  | Binary | First return elongation data (compressed) |
|  | Binary | Second return range data (compressed) |
|  | Binary | Second return intensity data (compressed) |
|  | Binary | Second return elongation data (compressed) |
|  | Binary | Exclusion mask for camera projections |
Range Image Structure:
Dimensions: Typically H×W where H varies by sensor (64-200 rows), W is azimuth resolution
Encoding: Each pixel encodes distance measurement in meters
Returns: Two returns per laser beam (first and second reflection)
Compression: Data is compressed using Waymo’s proprietary format
6️⃣ LiDAR-to-Vehicle Transform Mathematics¶
Converting LiDAR range images to 3D point clouds in the vehicle coordinate frame requires several mathematical transformations.
Step 1: Range Image to Spherical Coordinates¶
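Each pixel \((u, v)\) in the range image corresponds to spherical coordinates:
$\(\begin{align} \phi &= \frac{2\pi \cdot u}{W} - \pi \quad \text{(azimuth angle)} \\ \theta &= \text{beam\_inclinations}[v] \quad \text{(inclination angle)} \\ r &= \text{range\_image}[v, u] \quad \text{(distance in meters)} \end{align}\)$
where \(W\) is the azimuth resolution (2650 columns for the TOP LiDAR).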
Step 2: Spherical to Cartesian Conversion (LiDAR Frame)¶
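Convert spherical coordinates to 3D Cartesian coordinates in the LiDAR sensor frame:
$\(\begin{align} x_L &= r \cos(\theta) \cos(\phi) \\ y_L &= r \cos(\theta) \sin(\phi) \\ z_L &= r \sin(\theta) \end{align}\)$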
Step 3: LiDAR-to-Vehicle Transformation¶
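Apply the extrinsic calibration matrix to transform from LiDAR frame to vehicle frame:
$\(\mathbf{p}_V = \mathbf{T}_{\text{lidar→vehicle}} \cdot \mathbf{p}_L, \quad \mathbf{p}_L = \begin{bmatrix} x_L & y_L & z_L & 1 \end{bmatrix}^T\)$
where \(\mathbf{T}_{\text{lidar→vehicle}}\) is the 4×4 transformation matrix from the calibration data:
$\(\mathbf{T}_{\text{lidar→vehicle}} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}\)$
with \(\mathbf{R} \in \mathbb{R}^{3 \times 3}\) being the rotation matrix and \(\mathbf{t} \in \mathbb{R}^{3}\) the translation vector.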
Complete Transformation Pipeline¶
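The complete transformation from range image pixel to vehicle coordinates:
$\(\mathbf{p}_V = \mathbf{T}_{\text{lidar→vehicle}} \begin{bmatrix} r \cos(\theta) \cos(\phi) \\ r \cos(\theta) \sin(\phi) \\ r \sin(\theta) \\ 1 \end{bmatrix}\)$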
Implementation Note: The waymo_parquet_inspector.py script provides detailed field analysis for understanding the exact data formats and transformations.
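The three steps of this section can be sketched in NumPy as follows. This is a minimal illustration, not Waymo's reference implementation: `range_image_to_vehicle_points` is a hypothetical function, it assumes the range channel is already decoded into an (H, W) array, that `beam_inclinations` holds one angle per image row in row order, and that column `u` maps to azimuth `2πu/W − π`. The direction of the column-to-azimuth mapping and the row order of the inclinations should be verified against real data, since a wrong choice mirrors or flips the cloud.

```python
import numpy as np


def range_image_to_vehicle_points(ranges, beam_inclinations, extrinsic):
    """Convert an (H, W) range image to (N, 3) points in the vehicle frame.

    ranges: (H, W) distances in meters; non-positive values are treated as invalid.
    beam_inclinations: (H,) inclination angle in radians for each row.
    extrinsic: (4, 4) LiDAR-to-vehicle transform.
    """
    ranges = np.asarray(ranges, dtype=np.float64)
    height, width = ranges.shape

    # Step 1: per-column azimuth and per-row inclination, broadcast to (H, W).
    azimuth = 2.0 * np.pi * np.arange(width) / width - np.pi
    inclination = np.asarray(beam_inclinations, dtype=np.float64)
    phi, theta = np.meshgrid(azimuth, inclination)

    # Step 2: spherical to Cartesian in the LiDAR frame.
    x = ranges * np.cos(theta) * np.cos(phi)
    y = ranges * np.cos(theta) * np.sin(phi)
    z = ranges * np.sin(theta)

    # Keep only pixels with a valid return, as homogeneous row vectors.
    valid = ranges > 0
    pts_h = np.stack([x, y, z, np.ones_like(x)], axis=-1)[valid]

    # Step 3: apply the extrinsic directly (row-vector form, hence the transpose).
    return (pts_h @ np.asarray(extrinsic, dtype=np.float64).T)[:, :3]


# Tiny synthetic example: one return straight ahead at 10 m.
ranges = np.zeros((2, 4))
ranges[0, 2] = 10.0
extrinsic = np.eye(4)
extrinsic[:3, 3] = [1.43, 0.0, 2.18]
points = range_image_to_vehicle_points(ranges, [0.0, 0.1], extrinsic)
print(points)
```

In the synthetic example the only valid pixel sits at azimuth 0 and inclination 0, so the result is simply the sensor position shifted 10 m along +X.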
LiDAR Boxes (lidar_box/)¶
Schema: 18 columns containing 3D bounding box annotations
| Field | Type | Description |
|---|---|---|
|  | String | Unique object identifier per LiDAR |
|  | Double | 3D bounding box center in vehicle frame (meters) |
|  | Double | 3D bounding box dimensions (length, width, height in meters) |
|  | Double | Object orientation angle (radians) |
|  | Int8 | Object class type ID |
|  | String | Persistent object tracking ID |
|  | Int8 | Detection difficulty level (1 = LEVEL_1, 2 = LEVEL_2, 0 = unknown) |
|  | Int8 | Tracking difficulty level (1 = LEVEL_1, 2 = LEVEL_2, 0 = unknown) |
|  | Int32 | Number of LiDAR points inside the box |
3D Box Representation: $\(\mathbf{Box} = \{\mathbf{c}, \mathbf{s}, \theta\} \text{ where } \begin{cases} \mathbf{c} = [c_x, c_y, c_z]^T & \text{center position} \\ \mathbf{s} = [s_x, s_y, s_z]^T & \text{size (L×W×H)} \\ \theta & \text{heading angle} \end{cases}\)$
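One practical use of the box representation is checking which points fall inside a box, for example to sanity-check the stored point count. The following is a sketch under the assumption that points and box share the vehicle frame and that the heading is a pure yaw about +Z; `points_in_box` is a hypothetical helper.

```python
import numpy as np


def points_in_box(points, center, size, heading):
    """Return a boolean mask of the (N, 3) points lying inside the box."""
    points = np.asarray(points, dtype=np.float64)
    c, s = np.cos(heading), np.sin(heading)
    # Rotation from the box frame to the vehicle frame (yaw about +Z).
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Row-vector form of R^T (p - center): express points in the box frame.
    local = (points - np.asarray(center, dtype=np.float64)) @ rot
    half = np.asarray(size, dtype=np.float64) / 2.0
    return np.all(np.abs(local) <= half, axis=1)


pts = np.array([[10.0, 1.5, 1.0],    # 1.5 m along the box's length axis
                [11.5, 0.0, 1.0]])   # 1.5 m along the box's width axis
mask = points_in_box(pts, center=(10.0, 0.0, 1.0), size=(4.0, 2.0, 2.0),
                     heading=np.pi / 2)
print(mask)
```

With a heading of 90° the box's length axis points along vehicle +Y, so the first point is inside while the second exceeds the 1 m half-width.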
LiDAR Calibration (lidar_calibration/)¶
Schema: 8 columns containing sensor calibration parameters
| Field | Type | Description |
|---|---|---|
|  | List[Double] | 4×4 LiDAR-to-vehicle transformation matrix |
|  | List[Double] | Vertical beam angle inclinations (radians) |
|  | Double | Minimum beam inclination angle |
|  | Double | Maximum beam inclination angle |
7️⃣ Additional Data Components¶
Projected LiDAR (projected_lidar_labels/)¶
Schema: 14 columns containing LiDAR points projected onto camera images
| Field | Type | Description |
|---|---|---|
|  | Double | Projected 2D box center (pixels) |
|  | Double | Projected 2D box size (pixels) |
|  | Int8 | Object class type ID |
|  | String | Object tracking ID |
|  | Int8 | Detection difficulty level (1 = LEVEL_1, 2 = LEVEL_2, 0 = unknown) |
Segmentation Labels (lidar_segmentation/)¶
Schema: 7 columns containing point-wise semantic segmentation
| Field | Type | Description |
|---|---|---|
|  | Binary | Point-to-pixel mapping data |
|  | Binary | Per-point semantic labels |
|  | Binary | Instance ID mappings |
Statistics (stats/)¶
Schema: 9 columns containing frame-level statistics and metadata
| Field | Type | Description |
|---|---|---|
|  | String | Geographic location identifier |
|  | String | Time period (Dawn, Day, Dusk, Night) |
|  | String | Weather conditions |
|  | List[Int32] | Object counts per camera |
|  | List[Int32] | Object counts per LiDAR |
Usage for Data Analysis:
# Filter by weather conditions
sunny_frames = df[df['[StatsComponent].weather'] == 'sunny']
# Analyze object distribution
total_objects = df['[StatsComponent].lidar_object_counts'].apply(sum)
---
## 8️⃣ Range Image Processing
Each pixel encodes `(range, intensity, elongation, ...)` data in compressed binary format.
**Typical Range Image Dimensions**:
| Sensor | Shape (H, W, C) | Field of View |
|---------|-----------------|---------------|
| TOP LiDAR (#1) | 64 × 2650 × 4 | 360° horizontal |
| FRONT LiDAR (#2) | 200 × 600 × 4 | ~100° horizontal |
| SIDE_LEFT (#3) | 200 × 600 × 4 | ~100° horizontal |
| SIDE_RIGHT (#4) | 200 × 600 × 4 | ~100° horizontal |
| REAR (#5) | 200 × 600 × 4 | ~100° horizontal |
### Range Image to Point Cloud Conversion
**Step 1**: Decode the compressed range/intensity data from binary format
**Step 2**: Apply coordinate transformations to get 3D points in vehicle frame
**Step 3**: Filter invalid points (range ≤ 0, i.e. pixels with no return)
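A minimal sketch of steps 1 and 3, assuming the decoded range image is available as a flat list of values plus an `(H, W, C)` shape, with the range in channel 0. The helper name `unflatten_range_image`, the channel order, and the use of a negative range for "no return" are assumptions for illustration; check them against your own decoded data.

```python
import numpy as np


def unflatten_range_image(values, shape):
    """Reshape a flat, row-major value list into an (H, W, C) range image."""
    return np.asarray(values, dtype=np.float32).reshape(shape)


# Build a tiny synthetic 2 x 3 x 4 image with two valid returns.
shape = (2, 3, 4)
synthetic = np.zeros(shape, dtype=np.float32)
synthetic[..., 0] = -1.0                     # assumed marker for "no return"
synthetic[0, 1] = [12.5, 0.3, 0.0, 0.0]      # (range, intensity, elongation, extra)
synthetic[1, 2] = [40.0, 0.1, 0.2, 0.0]
flat = synthetic.ravel().tolist()

image = unflatten_range_image(flat, shape)
range_m = image[..., 0]
valid = range_m > 0                          # keep only pixels with a return
print(int(valid.sum()), range_m[valid])
```

The boolean mask has the same (H, W) layout as the image, so it can be reused to select intensity, elongation, or the per-pixel 3D coordinates.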
---
## 9️⃣ LiDAR Calibration Mathematics
### Extrinsic Transformation Matrix
The extrinsic matrix $\mathbf{T}_{V \leftarrow L}$ transforms points from LiDAR frame to vehicle frame:
$$\mathbf{T}_{V \leftarrow L} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$$
**Storage Format**: Row-major order (`order="C"`) in Parquet, stored as 16-element list
**Example Transformation Matrix**:
[[-0.8478, -0.5304, -0.0025, 1.43 ], [ 0.5304, -0.8478, 0.0002, 0.00 ], [-0.0022, -0.0012, 1.0000, 2.184], [ 0.0000, 0.0000, 0.0000, 1.0000]]
→ Rotation yaw ≈ 148°, translation ≈ (1.43, 0.0, 2.18) meters
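The yaw and translation quoted above can be read directly from the stored 16-element list. The sketch below uses only the standard library and assumes row-major storage as stated; the yaw formula assumes roll and pitch are negligible, which holds for this example matrix.

```python
import math

# The example extrinsic as a flat, row-major 16-element list.
flat = [
    -0.8478, -0.5304, -0.0025, 1.43,
    0.5304, -0.8478, 0.0002, 0.00,
    -0.0022, -0.0012, 1.0000, 2.184,
    0.0000, 0.0000, 0.0000, 1.0000,
]

# Row-major reshape into a 4x4 nested list.
rows = [flat[4 * i:4 * i + 4] for i in range(4)]

# Yaw about +Z from the upper-left rotation block.
yaw_deg = math.degrees(math.atan2(rows[1][0], rows[0][0]))
# Translation is the last column of the first three rows.
translation = [rows[i][3] for i in range(3)]

print(round(yaw_deg, 1), translation)
```

If the translation came out as all zeros, the list was most likely reshaped in column-major order, which moves the translation into the bottom row.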
### Beam Inclination Angles
Vertical angles for each row in the range image, typically distributed linearly between minimum and maximum inclination values.
$$\theta_v = \text{beam\_inclinations}[v] \text{ for row } v \in [0, H-1]$$
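When only the minimum and maximum inclination are available, a per-row table can be generated by linear spacing. This is a sketch only: `uniform_beam_inclinations` is a hypothetical helper, and whether the rows sample the interval endpoints or the centers of equal-width bins, as well as whether row 0 is the lowest or highest beam, are assumptions to verify against `range_image_utils.py` in the official repository.

```python
import numpy as np


def uniform_beam_inclinations(inclination_min, inclination_max, height,
                              bin_centers=False):
    """Linearly spaced inclinations in radians, one per range-image row."""
    if bin_centers:
        # Sample the center of each of `height` equal-width bins.
        fractions = (np.arange(height) + 0.5) / height
        return inclination_min + fractions * (inclination_max - inclination_min)
    # Sample the endpoints inclusively.
    return np.linspace(inclination_min, inclination_max, height)


edges = uniform_beam_inclinations(-0.3, 0.04, 64)
centers = uniform_beam_inclinations(-0.3, 0.04, 64, bin_centers=True)
print(edges[0], edges[-1], centers[0], centers[-1])
```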
---
## 🔟 Complete LiDAR Processing Pipeline
### Mathematical Transformation Steps
For each LiDAR pixel at position $(u, v)$ with range $r$:
**Step 1: Spherical Coordinates**
$$\begin{align}
\phi &= \frac{2\pi \cdot u}{W} - \pi \quad \text{(azimuth angle)} \\
\theta &= \text{beam\_inclinations}[v] \quad \text{(inclination angle)} \\
r &= \text{range\_image}[v, u] \quad \text{(distance in meters)}
\end{align}$$
**Step 2: LiDAR Frame Cartesian Coordinates**
$$\begin{align}
x_L &= r \cos(\theta) \cos(\phi) \\
y_L &= r \cos(\theta) \sin(\phi) \\
z_L &= r \sin(\theta)
\end{align}$$
**Step 3: Homogeneous Coordinates**
$$\mathbf{p}_L = \begin{bmatrix} x_L \\ y_L \\ z_L \\ 1 \end{bmatrix}$$
**Step 4: Transform to Vehicle Frame**
$$\mathbf{p}_V = \mathbf{T}_{V \leftarrow L} \cdot \mathbf{p}_L$$
### Implementation in NumPy
```python
# Create homogeneous coordinate matrix
pts_h = np.stack([x_L, y_L, z_L, np.ones_like(z_L)], axis=-1).reshape(-1, 4)
# Transform to vehicle frame (do NOT invert the matrix)
xyz_vehicle = (pts_h @ extrinsic_matrix.T)[:, :3]
```
Important: The dataset stores LiDAR→Vehicle transforms directly. Do not invert the matrix.
1️⃣1️⃣ 3D Bounding Box Specifications¶
Box Parameters in Vehicle Frame¶
Each 3D bounding box in lidar_box/ is defined by:
| Parameter | Field | Description |
|---|---|---|
| Center |  | Box center position (meters) |
| Size |  | Length (X), Width (Y), Height (Z) |
| Heading |  | Yaw angle (radians, CCW from +X axis) |
| Type |  | Object class (vehicle, pedestrian, cyclist) |
Important Note: Box center Z-coordinate represents the object’s geometric center, not the bottom.
3D Box Mathematical Representation¶
$\(\mathbf{Box} = \{\mathbf{c}, \mathbf{s}, \psi\}\)$
where:
\(\mathbf{c} = [c_x, c_y, c_z]^T\) is the center position
\(\mathbf{s} = [s_x, s_y, s_z]^T\) is the size vector (length × width × height)
\(\psi\) is the heading angle (yaw rotation about Z-axis)
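The box parameters above determine eight corner points, which is what most plotting code needs. The sketch below computes them under the conventions stated in this section (center at the geometric middle, heading as a counter-clockwise yaw about +Z); `box_corners` is a hypothetical helper.

```python
import numpy as np


def box_corners(center, size, heading):
    """Return the (8, 3) corner coordinates of a yaw-rotated 3D box."""
    # All sign combinations (+-1, +-1, +-1) scaled by the half extents.
    signs = np.array([[sx, sy, sz]
                      for sx in (-1.0, 1.0)
                      for sy in (-1.0, 1.0)
                      for sz in (-1.0, 1.0)])
    local = signs * (np.asarray(size, dtype=np.float64) / 2.0)
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Rotate each corner (row-vector form) and shift to the box center.
    return local @ rot.T + np.asarray(center, dtype=np.float64)


corners = box_corners(center=(10.0, 5.0, 1.0), size=(4.0, 2.0, 1.5), heading=0.0)
turned = box_corners(center=(10.0, 5.0, 1.0), size=(4.0, 2.0, 1.5),
                     heading=np.pi / 2)
print(corners.min(axis=0), corners.max(axis=0))
```

Note that the lowest corners sit at `c_z - height / 2`, which is why a box whose center is drawn at ground level appears to sink into the road.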
1️⃣2️⃣ Multi-Sensor Data Fusion¶
Coordinate Alignment Process¶
Step 1: Decode range images from all 5 LiDAR sensors
Step 2: Transform each sensor’s points to vehicle frame using respective extrinsics
Step 3: Merge all point clouds into unified coordinate system
import numpy as np
# Process each LiDAR sensor.
# decode_range_image, get_extrinsic_matrix and transform_to_vehicle_frame are
# placeholder helpers you must supply; they are not Waymo SDK functions.
all_points = []
for sensor_id in range(1, 6):  # 1=TOP, 2=FRONT, 3=SIDE_LEFT, 4=SIDE_RIGHT, 5=REAR
    # Extract sensor-specific data
    range_data = decode_range_image(sensor_data[sensor_id])
    extrinsic = get_extrinsic_matrix(sensor_id)
    # Transform to vehicle frame
    points_vehicle = transform_to_vehicle_frame(range_data, extrinsic)
    all_points.append(points_vehicle)
# Merge all sensors into one (N, 3) array
merged_pointcloud = np.concatenate(all_points, axis=0)
Result: Both point cloud and 3D boxes are now in the same vehicle coordinate frame and align perfectly.
1️⃣3️⃣ Visualization and Projection¶
3D Visualization with Open3D¶
import open3d as o3d
import numpy as np
# Create point cloud visualization (xyz_vehicle: (N, 3) array in the vehicle frame)
pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(xyz_vehicle))
pcd.paint_uniform_color([0.6, 0.6, 0.6])
geometries = [pcd]
# Add 3D bounding boxes
for box_data in boxes_3d:
    x, y, z = box_data['center']
    dx, dy, dz = box_data['size']
    yaw = box_data['heading']
    # Create rotation matrix for yaw (double precision, as Open3D expects)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float64)
    # Create oriented bounding box
    obb = o3d.geometry.OrientedBoundingBox(
        center=[x, y, z],
        R=R,
        extent=[dx, dy, dz]
    )
    obb.color = (1, 0, 0) # Red color
    geometries.append(obb)
# Add coordinate frame
axis = o3d.geometry.TriangleMesh.create_coordinate_frame(size=5.0)
geometries.append(axis)
# Visualize
o3d.visualization.draw_geometries(geometries)
2D Projection onto Camera Images¶
Mathematical Projection Pipeline:
Transform 3D points to camera frame: $\(\mathbf{p}_C = \mathbf{T}_{C \leftarrow V} \cdot \mathbf{p}_V\)$
Project to image plane: $\(\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} \begin{bmatrix} X_C/Z_C \\ Y_C/Z_C \\ 1 \end{bmatrix}\)$
Apply the lens distortion model (if needed) to the normalized coordinates \(u_n = X_C/Z_C\), \(v_n = Y_C/Z_C\) before multiplying by \(\mathbf{K}\): $\(\begin{align} r^2 &= u_n^2 + v_n^2 \\ u_d &= u_n(1 + k_1r^2 + k_2r^4 + k_3r^6) + 2p_1u_nv_n + p_2(r^2 + 2u_n^2) \\ v_d &= v_n(1 + k_1r^2 + k_2r^4 + k_3r^6) + p_1(r^2 + 2v_n^2) + 2p_2u_nv_n \end{align}\)$
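import numpy as np

def project_3d_to_2d(points_3d, camera_intrinsic, camera_extrinsic):
    """Project (N, 3) vehicle-frame points to camera image coordinates.

    Assumes camera_extrinsic maps a camera frame with +X right, +Y down,
    +Z forward to the vehicle frame; permute the axes first if the stored
    calibration uses a different sensor-frame convention.
    """
    # Transform to camera frame (the calibration stores camera -> vehicle)
    vehicle_to_camera = np.linalg.inv(camera_extrinsic)
    points_homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    points_camera = (vehicle_to_camera @ points_homogeneous.T).T[:, :3]
    # Project to image plane (divide by depth)
    points_2d_homogeneous = (camera_intrinsic @ points_camera.T).T
    image_points = points_2d_homogeneous[:, :2] / points_2d_homogeneous[:, 2:3]
    # Callers should keep only points with depths > 0 (in front of the camera)
    depths = points_camera[:, 2]
    return image_points, depths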
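The distortion equations in the projection pipeline translate directly into code. The sketch below applies them to normalized coordinates and then maps the result to pixels; `distort_normalized` and `to_pixels` are hypothetical helpers, and the coefficient names follow the symbols in the equations rather than any particular column names.

```python
def distort_normalized(u_n, v_n, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion."""
    r2 = u_n * u_n + v_n * v_n
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    u_d = u_n * radial + 2.0 * p1 * u_n * v_n + p2 * (r2 + 2.0 * u_n * u_n)
    v_d = v_n * radial + p1 * (r2 + 2.0 * v_n * v_n) + 2.0 * p2 * u_n * v_n
    return u_d, v_d


def to_pixels(u_d, v_d, f_u, f_v, c_u, c_v):
    """Map distorted normalized coordinates to pixel coordinates."""
    return f_u * u_d + c_u, f_v * v_d + c_v


# With all coefficients at zero the model reduces to the plain pinhole camera.
undistorted = distort_normalized(0.1, -0.2)
barrel = distort_normalized(1.0, 0.0, k1=0.1)
pixel = to_pixels(*undistorted, f_u=2000.0, f_v=2000.0, c_u=960.0, c_v=640.0)
print(undistorted, barrel, pixel)
```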
1️⃣4️⃣ Common Issues and Solutions¶
Troubleshooting Guide¶
| Problem | Cause | Solution |
|---|---|---|
| Box appears “floating” above ground | LiDAR mounted ~2m high, box Z is object center | This is normal behavior |
| Box appears “in front of” points | Using single LiDAR sensor only | Merge all 5 LiDAR sensors |
| Point cloud mirrored/flipped | Used | Use |
| Translation values all zeros | Used | Use |
| Beam angles incorrect | Reused wrong beam inclinations | Read sensor-specific beam ranges |
| Point cloud appears “warped” | Mixed sensors with wrong extrinsics | Verify yaw angle per LiDAR sensor |
Best Practices¶
✅ Recommended Settings:
Use `order="C"` for array reshaping
Apply `extrinsic.T` for transformations (do not invert)
Set `flip_rows=True, flip_cols=False` for range image processing
Use `azimuth = np.linspace(np.pi, -np.pi, W)` for azimuth calculation
✅ Validation Checks:
Verify point cloud and boxes align in 3D visualization
Check that merged multi-LiDAR coverage is 360°
Ensure camera projections fall within image boundaries
Validate coordinate frame orientations match expected directions
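The 360° coverage check can be automated with a simple azimuth histogram. This is a rough sketch: `azimuth_coverage` is a hypothetical helper, the 10° bin width is an arbitrary choice, and a real scene may legitimately have a few empty bins because of occlusion.

```python
import numpy as np


def azimuth_coverage(points_xyz, num_bins=36):
    """Fraction of azimuth bins (around +Z) that contain at least one point."""
    pts = np.asarray(points_xyz, dtype=np.float64)
    azimuth = np.arctan2(pts[:, 1], pts[:, 0])  # angle in [-pi, pi]
    counts, _ = np.histogram(azimuth, bins=num_bins, range=(-np.pi, np.pi))
    return np.count_nonzero(counts) / num_bins


# Synthetic check: a full ring versus a forward-facing wedge.
ring_angles = np.linspace(-np.pi, np.pi, 360, endpoint=False)
ring = np.stack([10 * np.cos(ring_angles), 10 * np.sin(ring_angles),
                 np.zeros_like(ring_angles)], axis=1)
wedge_angles = np.linspace(-np.pi / 4, np.pi / 4, 100)
wedge = np.stack([10 * np.cos(wedge_angles), 10 * np.sin(wedge_angles),
                  np.zeros_like(wedge_angles)], axis=1)

full_cov = azimuth_coverage(ring)
wedge_cov = azimuth_coverage(wedge)
print(full_cov, wedge_cov)
```

A merged five-sensor cloud should score close to 1.0, while a single short-range sensor should cover only a fraction of the bins.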
1️⃣5️⃣ References and Resources¶
Official Documentation¶
Waymo Open Dataset Repository: https://github.com/waymo-research/waymo-open-dataset
Range Image Utilities: `range_image_utils.py` in official repo
Coordinate Conventions: Waymo Open Dataset Paper, CVPR 2020
Community Tools and Converters¶
OpenCOOD: Multi-modal 3D detection framework with Waymo support
OpenMMLab: MMDetection3D parser examples
Waymo2KITTI: Format conversion utilities (GitHub community)
Data Analysis Tools¶
Field Inspector: Comprehensive schema analysis
Visualization Scripts: Open3D and Matplotlib integration examples
✨ Summary¶
Complete Processing Pipeline¶
| Step | Action | Coordinate Frame |
|---|---|---|
| 1 | Decode range image | LiDAR frame |
| 2 | Apply extrinsic transform | Vehicle frame |
| 3 | Merge all sensors | Vehicle frame |
| 4 | Visualize with boxes | Vehicle frame |
| 5 | Project to cameras | Camera/Image frame |
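Key Mathematical Transformations¶
$\(\mathbf{p}_V = \mathbf{T}_{V \leftarrow L} \cdot \mathbf{p}_L\)$
$\(\mathbf{p}_C = \mathbf{T}_{C \leftarrow V} \cdot \mathbf{p}_V\)$
$\(\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} \begin{bmatrix} X_C/Z_C \\ Y_C/Z_C \\ 1 \end{bmatrix}\)$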
When implemented correctly, the merged multi-LiDAR point cloud aligns perfectly with Waymo’s 3D bounding boxes and camera images, enabling robust multi-modal perception and analysis.