# 🧭 Comprehensive Tutorial: Understanding and Visualizing the **Waymo Open Dataset v2** This tutorial provides an **in-depth, practical, and mathematical** explanation of how to interpret, transform, and visualize **Waymo v2.1 LiDAR and 3D box data**. It's written for researchers and developers who want to deeply understand how Waymo structures its data, what coordinate frames are used, and how to correctly align LiDAR, camera, and annotations. > **Code Reference**: This tutorial is complemented by the comprehensive data inspection utilities in [`waymo_parquet_inspector.py`](../../DeepDataMiningLearning/detection3d/waymo_parquet_inspector.py), which provides detailed field analysis and validation tools for all Waymo data components. --- ## Table of Contents 1. [Dataset Overview](#1-dataset-overview) 2. [File Structure & Contents](#2-file-structure--contents) 3. [Coordinate Frames](#3-coordinate-frames) 4. [Camera Data Components](#4-camera-data-components) 5. [LiDAR Data Components](#5-lidar-data-components) 6. [Calibration Data](#6-calibration-data) 7. [LiDAR-to-Vehicle Transform Mathematics](#7-lidar-to-vehicle-transform-mathematics) 8. [3D Box Definitions](#8-3d-box-definitions) 9. [Coordinate Alignment: LiDAR ↔ Box](#9-coordinate-alignment-lidar--box) 10. [2D and 3D Box Visualization](#10-2d-and-3d-box-visualization) 11. [Common Pitfalls & Debug Tips](#11-common-pitfalls--debug-tips) 12. [References](#12-references) --- ## 1️⃣ Dataset Overview ### 📊 Dataset Scale & Format Waymo Open Dataset (WOD) provides synchronized **LiDAR** and **camera** data with ground-truth **3D bounding boxes**. - **2,030 segments** of 20s each, collected at 10Hz (**390,000 frames**) - Diverse geographies and conditions - **Perception object assets** data in a modular format (v2.0.0) - Extracted perception objects from multi-sensor data: all 5 cameras and the top lidar ### 🔧 Sensor Configuration **LiDAR Sensors:** - 1 mid-range lidar - 4 short-range lidars **Camera Sensors:** - 5 cameras (front and sides) **Data Synchronization:** - Synchronized lidar and camera data - Lidar to camera projections - Sensor calibrations and vehicle poses ### 🏷️ Annotation Categories #### **3D Bounding Box Labels** - **4 object classes**: Vehicles, Pedestrians, Cyclists, Signs - **High-quality labels for lidar data**: 1,200 segments - **12.6M 3D bounding box labels** with tracking IDs on lidar data #### **2D Bounding Box Labels** - **High-quality labels for camera data**: 1,000 segments - **11.8M 2D bounding box labels** with tracking IDs on camera data #### **2D Video Panoptic Segmentation** - **Subset**: 100k camera images - **28 classes** including: - **Vehicles**: Car, Bus, Truck, Other Large Vehicle, Trailer, Ego Vehicle, Motorcycle, Bicycle - **People**: Pedestrian, Cyclist, Motorcyclist - **Animals**: Ground Animal, Bird - **Infrastructure**: Pole, Sign, Traffic Light, Construction Cone, Pedestrian Object, Building - **Road Elements**: Road, Sidewalk, Road Marker, Lane Marker - **Environment**: Vegetation, Sky, Ground - **Motion States**: Static, Dynamic - **Instance segmentation** labels for Vehicle, Pedestrian and Cyclist classes - Consistent both across cameras and over time #### **Key Point Labels** - **2 object classes**: Pedestrians and Cyclists - **14 key points** from nose to ankle - **200k object frames** with 2D key point labels - **10k object frames** with 3D key point labels #### **3D Semantic Segmentation** - **Segmentation labels**: 1,150 segments - **23 classes** including: - **Vehicles**: Car, Truck, Bus, Other Vehicle - **People**: Motorcyclist, Bicyclist, Pedestrian - **Objects**: Bicycle, Motorcycle, Sign, Traffic Light, Pole, Construction Cone - **Environment**: Building, Vegetation, Tree Trunk, Curb - **Road Elements**: Road, Lane Marker, Walkable, Sidewalk, Other Ground - **Undefined**: Undefined ### 🗺️ Map Data - **3D road graph data** for each segment - Includes: lane centers, lane boundaries, road boundaries, crosswalks, speed bumps, stop signs, and entrances to driveways ### 🔗 Cross-Modal Associations - **Association of 2D and 3D bounding boxes** - Corresponding object IDs provided for **2 object classes**: Pedestrians and Cyclists ### 🎯 Challenge Data - **3D Camera-Only Detection Challenge**: 80 segments of 20s camera imagery ### 🚀 Advanced Features LiDAR features include: - **3D point cloud sequences** that support 3D object shape reconstruction Camera features include: - Sequences of camera patches from the most_visible_camera - Projected lidar returns on the corresponding camera - Per-pixel camera rays information - Auto-labeled 2D panoptic segmentation that supports **object NeRF reconstruction** The **v2.0.1 Parquet version** is designed for efficient columnar access and includes: | Component | Folder | Description | Schema Columns | |------------|---------|-------------|----------------| | Camera Images | `camera_image/` | RGB frames with pose/velocity metadata | 15 columns with binary JPEG/PNG data | | Camera Boxes | `camera_box/` | 2D bounding boxes in pixel coordinates | 12 columns with detection annotations | | Camera Calibration | `camera_calibration/` | Intrinsics + extrinsics for all cameras | 15 columns with calibration matrices | | LiDAR Range Images | `lidar/` | Raw per-LiDAR sensor range images | 7 columns with flattened range data | | 3D Boxes | `lidar_box/` | Ground-truth boxes in **Vehicle Frame** | 21 columns with 3D object state | | LiDAR Calibration | `lidar_calibration/` | Beam inclinations and extrinsic transforms | 6 columns with sensor parameters | | LiDAR Poses | `lidar_pose/` | Per-pixel transforms LiDAR → Vehicle → World | Pose transformation matrices | | Segmentation | `lidar_segmentation/` | Point-wise semantic labels | Semantic segmentation masks | | Projections | `lidar_camera_projection/` | LiDAR-to-camera pixel mappings | Cross-modal alignment data | --- ## 2️⃣ File Structure & Contents Each **segment** is approximately 20 seconds long, split into multiple Parquet files with standardized naming: ``` segment_id_start_end.parquet Example: 10017090168044687777_6380_000_6400_000.parquet ``` Inside each data folder (e.g., `training/lidar/`), files contain rows corresponding to **sensor measurements** at specific **timestamps**. ### Parquet Schema Structure All Waymo v2.01 data follows a consistent schema pattern: ``` Key Fields (Common across all data types): ├── index: Unique row identifier (String) ├── key.segment_context_name: Segment ID (String) ├── key.frame_timestamp_micros: Timestamp in microseconds (Int64) └── key.[sensor]_name: Sensor identifier (Int8) Component Fields: └── [ComponentType].[field_hierarchy]: Actual data values ├── Scalar values: Direct numeric/string data ├── List values: Arrays (e.g., transformation matrices) └── Nested structures: Complex hierarchical data ``` > **Note**: The `[ComponentType]` follows the pattern `[SensorType]Component` (e.g., `CameraImageComponent`, `LiDARBoxComponent`). See the [inspector code](../../DeepDataMiningLearning/detection3d/waymo_parquet_inspector.py) for detailed field analysis of each component type. **LiDAR IDs (v2.1 five-sensor setup):** | ID | Location | Yaw (deg) | Position (m) | |----|-----------|-----------|--------------| | 1 | Roof edge / back-right | +148° | [1.43, 0.0, 2.18] | | 2 | Front bumper | 0° | [4.07, 0.0, 0.69] | | 3 | Left side | +90° | [3.25, +1.02, 0.98] | | 4 | Right side | −90° | [3.25, −1.02, 0.98] | | 5 | Rear | 180° | [−1.15, 0.0, 0.46] | > ⚠️ v2.1 no longer includes the 360° **Top LiDAR** used in early WOD versions. --- ## 3️⃣ Coordinate Frames Understanding coordinate systems is the foundation for correct visualization and data alignment. ### Vehicle Frame (Primary Reference Frame) - **Origin**: Vehicle center (geometric center of the ego vehicle) - **Axes**: - **+X**: Forward direction (vehicle's front) - **+Y**: Left direction (driver's left) - **+Z**: Upward direction (towards sky) - **Usage**: All 3D bounding boxes and calibration extrinsics are defined in this frame - **Mathematical Representation**: Right-handed coordinate system $$\mathbf{p}_{\text{vehicle}} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \text{ where } \begin{cases} x > 0 & \text{forward} \\ y > 0 & \text{left} \\ z > 0 & \text{up} \end{cases}$$ ### LiDAR Frame (Sensor-Specific) - **Origin**: Individual LiDAR sensor center - **Axes**: Same orientation as vehicle frame but translated/rotated - **+X**: Sensor's forward direction - **+Y**: Sensor's left direction - **+Z**: Sensor's upward direction - **Usage**: Raw range image data is initially in this frame - **Transform**: Each LiDAR has a unique extrinsic transformation to vehicle frame ### Camera Frame (OpenCV Convention) - **Origin**: Camera optical center - **Axes**: - **+X**: Right direction (image columns) - **+Y**: Down direction (image rows) - **+Z**: Forward direction (optical axis, into the scene) - **Usage**: Camera images and 2D bounding boxes - **Projection**: 3D points project to 2D image coordinates via intrinsic matrix $$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} \begin{bmatrix} X_c/Z_c \\ Y_c/Z_c \\ 1 \end{bmatrix}$$ where $\mathbf{K}$ is the camera intrinsic matrix and $(X_c, Y_c, Z_c)$ are coordinates in camera frame. --- ## 4️⃣ Camera Data Components ### Camera Images (`camera_image/`) **Schema**: 15 columns containing RGB image data and comprehensive metadata | Field | Type | Description | |-------|------|-------------| | `index` | String | Unique row identifier | | `key.segment_context_name` | String | Segment/sequence identifier | | `key.frame_timestamp_micros` | Int64 | Frame timestamp in microseconds | | `key.camera_name` | Int8 | Camera ID (0-4 for FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT) | | `[CameraImageComponent].image` | Binary | JPEG/PNG compressed image bytes | | `[CameraImageComponent].pose.transform` | List[Double] | 4×4 transformation matrix (16 elements) | | `[CameraImageComponent].velocity.linear_velocity.{x,y,z}` | Double | Linear velocity components (m/s) | | `[CameraImageComponent].velocity.angular_velocity.{x,y,z}` | Double | Angular velocity components (rad/s) | | `[CameraImageComponent].pose_timestamp` | Double | Pose measurement timestamp | | `[CameraImageComponent].rolling_shutter_params.shutter` | Double | Rolling shutter timing parameter | **Usage Example**: ```python # Extract image from binary data image_bytes = row['[CameraImageComponent].image'] pil_image = Image.open(io.BytesIO(image_bytes)) # Extract pose matrix (4x4 transformation) pose_flat = row['[CameraImageComponent].pose.transform'] # 16 elements pose_matrix = np.array(pose_flat).reshape(4, 4) ``` ### Camera Boxes (`camera_box/`) **Schema**: 12 columns containing 2D bounding box annotations | Field | Type | Description | |-------|------|-------------| | `key.camera_object_id` | String | Unique object identifier per camera | | `[CameraBoxComponent].box.center.{x,y}` | Double | Bounding box center coordinates (pixels) | | `[CameraBoxComponent].box.size.{x,y}` | Double | Bounding box dimensions (width, height in pixels) | | `[CameraBoxComponent].type` | Int8 | Object class type ID | | `[CameraBoxComponent].difficulty_level.detection` | Int8 | Detection difficulty rating (1-5) | | `[CameraBoxComponent].difficulty_level.tracking` | Int8 | Tracking difficulty rating (1-5) | ### Camera Calibration (`camera_calibration/`) **Schema**: 15 columns containing intrinsic and extrinsic calibration parameters | Field | Type | Description | |-------|------|-------------| | `[CameraCalibrationComponent].intrinsic.f_u` | Double | Focal length in u direction (pixels) | | `[CameraCalibrationComponent].intrinsic.f_v` | Double | Focal length in v direction (pixels) | | `[CameraCalibrationComponent].intrinsic.c_u` | Double | Principal point u coordinate (pixels) | | `[CameraCalibrationComponent].intrinsic.c_v` | Double | Principal point v coordinate (pixels) | | `[CameraCalibrationComponent].intrinsic.k1,k2,k3` | Double | Radial distortion coefficients | | `[CameraCalibrationComponent].intrinsic.p1,p2` | Double | Tangential distortion coefficients | | `[CameraCalibrationComponent].extrinsic.transform` | List[Double] | 4×4 camera-to-vehicle transformation | | `[CameraCalibrationComponent].width` | Int32 | Image width in pixels | | `[CameraCalibrationComponent].height` | Int32 | Image height in pixels | **Intrinsic Matrix Construction**: $$\mathbf{K} = \begin{bmatrix} f_u & 0 & c_u \\ 0 & f_v & c_v \\ 0 & 0 & 1 \end{bmatrix}$$ --- ## 5️⃣ LiDAR Data Components ### LiDAR Range Images (`lidar/`) **Schema**: 11 columns containing range image data and sensor metadata Each LiDAR captures a **range image** instead of a raw point cloud. This is a 2D representation where each pixel encodes distance, intensity, and other measurements. | Field | Type | Description | |-------|------|-------------| | `index` | String | Unique row identifier | | `key.segment_context_name` | String | Segment/sequence identifier | | `key.frame_timestamp_micros` | Int64 | Frame timestamp in microseconds | | `key.laser_name` | Int8 | LiDAR sensor ID (0-4 for TOP, FRONT, SIDE_LEFT, SIDE_RIGHT, REAR) | | `[LiDARComponent].range_image_return1.range` | Binary | First return range data (compressed) | | `[LiDARComponent].range_image_return1.intensity` | Binary | First return intensity data (compressed) | | `[LiDARComponent].range_image_return1.elongation` | Binary | First return elongation data (compressed) | | `[LiDARComponent].range_image_return2.range` | Binary | Second return range data (compressed) | | `[LiDARComponent].range_image_return2.intensity` | Binary | Second return intensity data (compressed) | | `[LiDARComponent].range_image_return2.elongation` | Binary | Second return elongation data (compressed) | | `[LiDARComponent].camera_projection_exclusion_mask` | Binary | Exclusion mask for camera projections | **Range Image Structure**: - **Dimensions**: Typically H×W where H varies by sensor (64-200 rows), W is azimuth resolution - **Encoding**: Each pixel encodes distance measurement in meters - **Returns**: Two returns per laser beam (first and second reflection) - **Compression**: Data is compressed using Waymo's proprietary format --- ## 6️⃣ LiDAR-to-Vehicle Transform Mathematics Converting LiDAR range images to 3D point clouds in the vehicle coordinate frame requires several mathematical transformations. ### Step 1: Range Image to Spherical Coordinates Each pixel $(u, v)$ in the range image corresponds to spherical coordinates: $$\begin{align} \text{azimuth} &= \frac{2\pi \cdot u}{W} - \pi \\ \text{inclination} &= \text{beam\_inclinations}[v] \\ \text{range} &= \text{range\_image}[v, u] \end{align}$$ where $W$ is the azimuth resolution (typically 2650 for Waymo LiDAR). ### Step 2: Spherical to Cartesian Conversion (LiDAR Frame) Convert spherical coordinates to 3D Cartesian coordinates in the LiDAR sensor frame: $$\begin{align} x_{\text{lidar}} &= \text{range} \cdot \cos(\text{inclination}) \cdot \cos(\text{azimuth}) \\ y_{\text{lidar}} &= \text{range} \cdot \cos(\text{inclination}) \cdot \sin(\text{azimuth}) \\ z_{\text{lidar}} &= \text{range} \cdot \sin(\text{inclination}) \end{align}$$ ### Step 3: LiDAR-to-Vehicle Transformation Apply the extrinsic calibration matrix to transform from LiDAR frame to vehicle frame: $$\begin{bmatrix} x_{\text{vehicle}} \\ y_{\text{vehicle}} \\ z_{\text{vehicle}} \\ 1 \end{bmatrix} = \mathbf{T}_{\text{lidar→vehicle}} \begin{bmatrix} x_{\text{lidar}} \\ y_{\text{lidar}} \\ z_{\text{lidar}} \\ 1 \end{bmatrix}$$ where $\mathbf{T}_{\text{lidar→vehicle}}$ is the 4×4 transformation matrix from the calibration data: $$\mathbf{T}_{\text{lidar→vehicle}} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$$ with $\mathbf{R} \in \mathbb{R}^{3 \times 3}$ being the rotation matrix and $\mathbf{t} \in \mathbb{R}^{3}$ the translation vector. ### Complete Transformation Pipeline The complete transformation from range image pixel to vehicle coordinates: $$\mathbf{p}_{\text{vehicle}} = \mathbf{T}_{\text{lidar→vehicle}} \cdot \mathbf{f}_{\text{spherical→cartesian}}(\text{range}[v,u], \text{azimuth}(u), \text{inclination}(v))$$ **Implementation Note**: The waymo_parquet_inspector.py script provides detailed field analysis for understanding the exact data formats and transformations. ### LiDAR Boxes (`lidar_box/`) **Schema**: 18 columns containing 3D bounding box annotations | Field | Type | Description | |-------|------|-------------| | `key.laser_object_id` | String | Unique object identifier per LiDAR | | `[LiDARBoxComponent].box.center.{x,y,z}` | Double | 3D bounding box center in vehicle frame (meters) | | `[LiDARBoxComponent].box.size.{x,y,z}` | Double | 3D bounding box dimensions (length, width, height in meters) | | `[LiDARBoxComponent].box.heading` | Double | Object orientation angle (radians) | | `[LiDARBoxComponent].type` | Int8 | Object class type ID | | `[LiDARBoxComponent].id` | String | Persistent object tracking ID | | `[LiDARBoxComponent].detection_difficulty_level` | Int8 | Detection difficulty rating (1-5) | | `[LiDARBoxComponent].tracking_difficulty_level` | Int8 | Tracking difficulty rating (1-5) | | `[LiDARBoxComponent].num_lidar_points_in_box` | Int32 | Number of LiDAR points inside the box | **3D Box Representation**: $$\mathbf{Box} = \{\mathbf{c}, \mathbf{s}, \theta\} \text{ where } \begin{cases} \mathbf{c} = [c_x, c_y, c_z]^T & \text{center position} \\ \mathbf{s} = [s_x, s_y, s_z]^T & \text{size (L×W×H)} \\ \theta & \text{heading angle} \end{cases}$$ ### LiDAR Calibration (`lidar_calibration/`) **Schema**: 8 columns containing sensor calibration parameters | Field | Type | Description | |-------|------|-------------| | `[LiDARCalibrationComponent].extrinsic.transform` | List[Double] | 4×4 LiDAR-to-vehicle transformation matrix | | `[LiDARCalibrationComponent].beam_inclinations` | List[Double] | Vertical beam angle inclinations (radians) | | `[LiDARCalibrationComponent].beam_inclination_min` | Double | Minimum beam inclination angle | | `[LiDARCalibrationComponent].beam_inclination_max` | Double | Maximum beam inclination angle | --- ## 7️⃣ Additional Data Components ### Projected LiDAR (`projected_lidar_labels/`) **Schema**: 14 columns containing LiDAR points projected onto camera images | Field | Type | Description | |-------|------|-------------| | `[ProjectedLiDARLabelsComponent].box.center.{x,y}` | Double | Projected 2D box center (pixels) | | `[ProjectedLiDARLabelsComponent].box.size.{x,y}` | Double | Projected 2D box size (pixels) | | `[ProjectedLiDARLabelsComponent].type` | Int8 | Object class type ID | | `[ProjectedLiDARLabelsComponent].id` | String | Object tracking ID | | `[ProjectedLiDARLabelsComponent].detection_difficulty_level` | Int8 | Detection difficulty (1-5) | ### Segmentation Labels (`lidar_segmentation/`) **Schema**: 7 columns containing point-wise semantic segmentation | Field | Type | Description | |-------|------|-------------| | `[LiDARSegmentationLabelComponent].pointcloud_to_image_projection` | Binary | Point-to-pixel mapping data | | `[LiDARSegmentationLabelComponent].segmentation_label` | Binary | Per-point semantic labels | | `[LiDARSegmentationLabelComponent].instance_id_to_global_id_mapping` | Binary | Instance ID mappings | ### Statistics (`stats/`) **Schema**: 9 columns containing frame-level statistics and metadata | Field | Type | Description | |-------|------|-------------| | `[StatsComponent].location` | String | Geographic location identifier | | `[StatsComponent].time_of_day` | String | Time period (Dawn, Day, Dusk, Night) | | `[StatsComponent].weather` | String | Weather conditions | | `[StatsComponent].camera_object_counts` | List[Int32] | Object counts per camera | | `[StatsComponent].lidar_object_counts` | List[Int32] | Object counts per LiDAR | **Usage for Data Analysis**: ```python # Filter by weather conditions sunny_frames = df[df['[StatsComponent].weather'] == 'sunny'] # Analyze object distribution total_objects = df['[StatsComponent].lidar_object_counts'].apply(sum) --- ## 8️⃣ Range Image Processing Each pixel encodes `(range, intensity, elongation, ...)` data in compressed binary format. **Typical Range Image Dimensions**: | Sensor | Shape (H, W, C) | Field of View | |---------|-----------------|---------------| | TOP LiDAR (#0) | 64 × 2650 × 4 | 360° horizontal | | FRONT LiDAR (#1) | 200 × 600 × 4 | ~100° horizontal | | SIDE_LEFT (#2) | 200 × 600 × 4 | ~100° horizontal | | SIDE_RIGHT (#3) | 200 × 600 × 4 | ~100° horizontal | | REAR (#4) | 200 × 600 × 4 | ~100° horizontal | ### Range Image to Point Cloud Conversion **Step 1**: Decode the compressed range/intensity data from binary format **Step 2**: Apply coordinate transformations to get 3D points in vehicle frame **Step 3**: Filter invalid points (range = 0) --- ## 9️⃣ LiDAR Calibration Mathematics ### Extrinsic Transformation Matrix The extrinsic matrix $\mathbf{T}_{V \leftarrow L}$ transforms points from LiDAR frame to vehicle frame: $$\mathbf{T}_{V \leftarrow L} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$$ **Storage Format**: Row-major order (`order="C"`) in Parquet, stored as 16-element list **Example Transformation Matrix**: ``` [[-0.8478, -0.5304, -0.0025, 1.43 ], [ 0.5304, -0.8478, 0.0002, 0.00 ], [-0.0022, -0.0012, 1.0000, 2.184], [ 0.0000, 0.0000, 0.0000, 1.0000]] ``` → Rotation yaw ≈ 148°, translation ≈ (1.43, 0.0, 2.18) meters ### Beam Inclination Angles Vertical angles for each row in the range image, typically distributed linearly between minimum and maximum inclination values. $$\theta_v = \text{beam\_inclinations}[v] \text{ for row } v \in [0, H-1]$$ --- ## 🔟 Complete LiDAR Processing Pipeline ### Mathematical Transformation Steps For each LiDAR pixel at position $(u, v)$ with range $r$: **Step 1: Spherical Coordinates** $$\begin{align} \phi &= \frac{2\pi \cdot u}{W} - \pi \quad \text{(azimuth angle)} \\ \theta &= \text{beam\_inclinations}[v] \quad \text{(inclination angle)} \\ r &= \text{range\_image}[v, u] \quad \text{(distance in meters)} \end{align}$$ **Step 2: LiDAR Frame Cartesian Coordinates** $$\begin{align} x_L &= r \cos(\theta) \cos(\phi) \\ y_L &= r \cos(\theta) \sin(\phi) \\ z_L &= r \sin(\theta) \end{align}$$ **Step 3: Homogeneous Coordinates** $$\mathbf{p}_L = \begin{bmatrix} x_L \\ y_L \\ z_L \\ 1 \end{bmatrix}$$ **Step 4: Transform to Vehicle Frame** $$\mathbf{p}_V = \mathbf{T}_{V \leftarrow L} \cdot \mathbf{p}_L$$ ### Implementation in NumPy ```python # Create homogeneous coordinate matrix pts_h = np.stack([x_L, y_L, z_L, np.ones_like(z_L)], axis=-1).reshape(-1, 4) # Transform to vehicle frame (do NOT invert the matrix) xyz_vehicle = (pts_h @ extrinsic_matrix.T)[:, :3] ``` **Important**: The dataset stores LiDAR→Vehicle transforms directly. Do not invert the matrix. --- ## 1️⃣1️⃣ 3D Bounding Box Specifications ### Box Parameters in Vehicle Frame Each 3D bounding box in `lidar_box/` is defined by: | Parameter | Field | Description | |-----------|-------|-------------| | **Center** | `[LiDARBoxComponent].box.center.{x,y,z}` | Box center position (meters) | | **Size** | `[LiDARBoxComponent].box.size.{x,y,z}` | Length (X), Width (Y), Height (Z) | | **Heading** | `[LiDARBoxComponent].box.heading` | Yaw angle (radians, CCW from +X axis) | | **Type** | `[LiDARBoxComponent].type` | Object class (vehicle, pedestrian, cyclist) | **Important Note**: Box center Z-coordinate represents the object's geometric center, not the bottom. ### 3D Box Mathematical Representation $$\mathbf{Box} = \{\mathbf{c}, \mathbf{s}, \psi\}$$ where: - $\mathbf{c} = [c_x, c_y, c_z]^T$ is the center position - $\mathbf{s} = [s_x, s_y, s_z]^T$ is the size vector (length × width × height) - $\psi$ is the heading angle (yaw rotation about Z-axis) --- ## 1️⃣2️⃣ Multi-Sensor Data Fusion ### Coordinate Alignment Process **Step 1**: Decode range images from all 5 LiDAR sensors **Step 2**: Transform each sensor's points to vehicle frame using respective extrinsics **Step 3**: Merge all point clouds into unified coordinate system ```python # Process each LiDAR sensor all_points = [] for sensor_id in range(5): # 0=TOP, 1=FRONT, 2=SIDE_LEFT, 3=SIDE_RIGHT, 4=REAR # Extract sensor-specific data range_data = decode_range_image(sensor_data[sensor_id]) extrinsic = get_extrinsic_matrix(sensor_id) # Transform to vehicle frame points_vehicle = transform_to_vehicle_frame(range_data, extrinsic) all_points.append(points_vehicle) # Merge all sensors merged_pointcloud = np.concatenate(all_points, axis=0) ``` **Result**: Both point cloud and 3D boxes are now in the same vehicle coordinate frame and align perfectly. --- ## 1️⃣3️⃣ Visualization and Projection ### 3D Visualization with Open3D ```python import open3d as o3d import numpy as np # Create point cloud visualization pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(xyz_vehicle)) pcd.paint_uniform_color([0.6, 0.6, 0.6]) geometries = [pcd] # Add 3D bounding boxes for box_data in boxes_3d: x, y, z = box_data['center'] dx, dy, dz = box_data['size'] yaw = box_data['heading'] # Create rotation matrix for yaw c, s = np.cos(yaw), np.sin(yaw) R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float32) # Create oriented bounding box obb = o3d.geometry.OrientedBoundingBox( center=[x, y, z], R=R, extent=[dx, dy, dz] ) obb.color = (1, 0, 0) # Red color geometries.append(obb) # Add coordinate frame axis = o3d.geometry.TriangleMesh.create_coordinate_frame(size=5.0) geometries.append(axis) # Visualize o3d.visualization.draw_geometries(geometries) ``` ### 2D Projection onto Camera Images **Mathematical Projection Pipeline**: 1. **Transform 3D points to camera frame**: $$\mathbf{p}_C = \mathbf{T}_{C \leftarrow V} \cdot \mathbf{p}_V$$ 2. **Project to image plane**: $$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{K} \begin{bmatrix} X_C/Z_C \\ Y_C/Z_C \\ 1 \end{bmatrix}$$ 3. **Apply distortion correction** (if needed): $$\begin{align} r^2 &= u_n^2 + v_n^2 \\ u_d &= u_n(1 + k_1r^2 + k_2r^4 + k_3r^6) + 2p_1u_nv_n + p_2(r^2 + 2u_n^2) \\ v_d &= v_n(1 + k_1r^2 + k_2r^4 + k_3r^6) + p_1(r^2 + 2v_n^2) + 2p_2u_nv_n \end{align}$$ ```python def project_3d_to_2d(points_3d, camera_intrinsic, camera_extrinsic): """Project 3D points to camera image coordinates""" # Transform to camera frame vehicle_to_camera = np.linalg.inv(camera_extrinsic) points_homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))]) points_camera = (vehicle_to_camera @ points_homogeneous.T).T[:, :3] # Project to image plane points_2d_homogeneous = (camera_intrinsic @ points_camera.T).T image_points = points_2d_homogeneous[:, :2] / points_2d_homogeneous[:, 2:3] depths = points_camera[:, 2] return image_points, depths ``` --- ## 1️⃣4️⃣ Common Issues and Solutions ### Troubleshooting Guide | **Problem** | **Cause** | **Solution** | |-------------|-----------|--------------| | Box appears "floating" above ground | LiDAR mounted ~2m high, box Z is object center | This is normal behavior | | Box appears "in front of" points | Using single LiDAR sensor only | Merge all 5 LiDAR sensors | | Point cloud mirrored/flipped | Used `np.linalg.inv(extrinsic)` | Use `extrinsic.T` for matrix multiplication | | Translation values all zeros | Used `order='F'` for reshape | Use `order='C'` (row-major) | | Beam angles incorrect | Reused wrong beam inclinations | Read sensor-specific beam ranges | | Point cloud appears "warped" | Mixed sensors with wrong extrinsics | Verify yaw angle per LiDAR sensor | ### Best Practices ✅ **Recommended Settings**: - Use `order="C"` for array reshaping - Apply `extrinsic.T` for transformations (do not invert) - Set `flip_rows=True, flip_cols=False` for range image processing - Use `azimuth = np.linspace(np.pi, -np.pi, W)` for azimuth calculation ✅ **Validation Checks**: - Verify point cloud and boxes align in 3D visualization - Check that merged multi-LiDAR coverage is 360° - Ensure camera projections fall within image boundaries - Validate coordinate frame orientations match expected directions --- ## 1️⃣5️⃣ References and Resources ### Official Documentation - **Waymo Open Dataset Repository**: [https://github.com/waymo-research/waymo-open-dataset](https://github.com/waymo-research/waymo-open-dataset) - **Range Image Utilities**: `range_image_utils.py` in official repo - **Coordinate Conventions**: Waymo Open Dataset Paper, CVPR 2020 ### Community Tools and Converters - **OpenCOOD**: Multi-modal 3D detection framework with Waymo support - **OpenMMLab**: MMDetection3D parser examples - **Waymo2KITTI**: Format conversion utilities (GitHub community) ### Data Analysis Tools - **Field Inspector**: - Comprehensive schema analysis - **Visualization Scripts**: Open3D and Matplotlib integration examples --- ## ✨ Summary ### Complete Processing Pipeline | **Step** | **Action** | **Coordinate Frame** | |----------|------------|---------------------| | 1 | Decode range image | LiDAR frame | | 2 | Apply extrinsic transform | Vehicle frame | | 3 | Merge all sensors | Vehicle frame | | 4 | Visualize with boxes | Vehicle frame | | 5 | Project to cameras | Camera/Image frame | ### Key Mathematical Transformations $$\text{Range Image} \xrightarrow{\text{spherical→cartesian}} \text{LiDAR Frame} \xrightarrow{\mathbf{T}_{V \leftarrow L}} \text{Vehicle Frame} \xrightarrow{\mathbf{T}_{C \leftarrow V}} \text{Camera Frame}$$ When implemented correctly, the merged multi-LiDAR point cloud aligns perfectly with Waymo's 3D bounding boxes and camera images, enabling robust multi-modal perception and analysis.