KITTI Dataset Tutorial
Introduction
The KITTI dataset is one of the most influential autonomous driving datasets, providing synchronized camera images, LiDAR point clouds, and GPS/IMU data. This tutorial focuses on understanding and implementing the coordinate system transformations, 3D object detection, and visualization techniques using the comprehensive KITTI toolkit.
This tutorial covers:
Dataset Structure: Understanding KITTI’s file organization and data formats
Coordinate Systems: Camera, LiDAR, and object coordinate systems
3D Object Processing: Loading, transforming, and visualizing 3D bounding boxes
Visualization Tools: 2D, 3D, LiDAR, and Bird’s Eye View (BEV) visualization
Dataset Management: Downloading, extracting, and validating KITTI data
Dataset Structure
KITTI organizes data into training and testing splits with synchronized sensor data:
```
KITTI/
├── training/
│   ├── image_2/      # Left color camera images
│   ├── image_3/      # Right color camera images
│   ├── velodyne/     # LiDAR point clouds (.bin files)
│   ├── label_2/      # 3D object annotations (.txt files)
│   └── calib/        # Calibration matrices (.txt files)
├── testing/
│   ├── image_2/      # Left color test images (no labels)
│   ├── image_3/      # Right color test images
│   ├── velodyne/     # Test LiDAR data
│   └── calib/        # Test calibration data
└── ImageSets/        # Community split files (not part of the official download)
    ├── train.txt     # Training sample indices
    ├── val.txt       # Validation sample indices
    └── test.txt      # Test sample indices
```
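File Naming Convention
All files use a consistent 6-digit zero-padded naming scheme. The ranges below are for the training split (7,481 samples); the testing split has 7,518 samples, numbered 000000 to 007517.
Images: 000000.png, 000001.png, …, 007480.png
LiDAR: 000000.bin, 000001.bin, …, 007480.bin
Labels: 000000.txt, 000001.txt, …, 007480.txt
Calibration: 000000.txt, 000001.txt, …, 007480.txt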
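Because every modality shares the same zero-padded stem, all files of a sample can be derived from its integer index. A minimal sketch, assuming the directory layout shown above; `sample_paths` and `MODALITIES` are hypothetical names, not part of the toolkit:

```python
from pathlib import Path

# Sub-folder and file extension for each modality (layout shown above)
MODALITIES = {
    "image": ("image_2", ".png"),
    "lidar": ("velodyne", ".bin"),
    "label": ("label_2", ".txt"),
    "calib": ("calib", ".txt"),
}


def sample_paths(root, idx, split="training"):
    """Return the file path of every modality for one KITTI sample index."""
    stem = f"{idx:06d}"  # 6-digit zero padding, e.g. 7 -> "000007"
    base = Path(root) / split
    paths = {name: base / folder / (stem + ext) for name, (folder, ext) in MODALITIES.items()}
    if split == "testing":
        paths.pop("label")  # the testing split has no label_2 folder
    return paths


print(sample_paths("KITTI", 7)["lidar"].as_posix())  # KITTI/training/velodyne/000007.bin
```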
Annotation Format
3D Object Annotations (label_2/*.txt)
Each line in a label file represents one 3D object with 15 space-separated values:
```
type truncated occluded alpha bbox_2d_left bbox_2d_top bbox_2d_right bbox_2d_bottom height width length x y z rotation_y
```
Detailed Field Description:

| Field | Type | Description | Range/Units |
|---|---|---|---|
| type | string | Object class | ‘Car’, ‘Van’, ‘Truck’, ‘Pedestrian’, ‘Person_sitting’, ‘Cyclist’, ‘Tram’, ‘Misc’, ‘DontCare’ |
| truncated | float | Truncation level | 0 (non-truncated) to 1 (fully truncated) |
| occluded | int | Occlusion state | 0 (fully visible), 1 (partly occluded), 2 (largely occluded), 3 (unknown) |
| alpha | float | Observation angle | -π to π radians |
| bbox_2d_left, bbox_2d_top, bbox_2d_right, bbox_2d_bottom | float×4 | 2D bounding box | [left, top, right, bottom] in pixels |
| height, width, length | float×3 | 3D object dimensions | [height, width, length] in meters |
| x, y, z | float×3 | 3D object location (center of the box's bottom face) | [x, y, z] in rectified camera coordinates (meters) |
| rotation_y | float | Rotation around the camera Y-axis | -π to π radians |
Example Annotation:
```
Car 0.00 0 -1.57 599.41 156.40 629.75 189.25 1.73 1.87 4.60 1.84 1.47 8.41 -1.56
```
This represents:
A Car that is not truncated (0.00) or occluded (0)
Observation angle α = -1.57 radians
2D bbox: [599.41, 156.40, 629.75, 189.25] pixels
3D dimensions: height=1.73m, width=1.87m, length=4.60m
3D location: x=1.84m, y=1.47m, z=8.41m (bottom-face center, in camera coordinates)
Y-axis rotation: -1.56 radians
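The observation angle alpha and the yaw rotation_y are linked through the direction of the ray from the camera to the object: alpha = rotation_y - atan2(x, z), wrapped to [-π, π). The helpers below (hypothetical names) sketch that relation; because label values are rounded to two decimals, recomputed angles can differ slightly from stored ones.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def alpha_from_rotation_y(rotation_y, x, z):
    """Observation angle alpha from yaw and object location (camera frame)."""
    # atan2(x, z) is the azimuth of the ray from the camera to the object,
    # measured from the optical (Z) axis toward the X axis
    return wrap_angle(rotation_y - math.atan2(x, z))


def rotation_y_from_alpha(alpha, x, z):
    """Inverse relation: recover rotation_y from alpha and the location."""
    return wrap_angle(alpha + math.atan2(x, z))
```

For an object straight ahead of the camera (x = 0), alpha equals rotation_y; the two angles diverge as the object moves toward the image border.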
Calibration Data (calib/*.txt)
Calibration files contain transformation matrices between coordinate systems. Each line holds one matrix flattened in row-major order (12 values for a 3×4 matrix, 9 values for a 3×3 matrix):
```
P0: 7.215377e+02 0.000000e+00 6.095593e+02 0.000000e+00 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
P1: 7.215377e+02 0.000000e+00 6.095593e+02 -3.875744e+02 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
P2: 7.215377e+02 0.000000e+00 6.095593e+02 4.485728e+01 0.000000e+00 7.215377e+02 1.728540e+02 2.163791e-01 0.000000e+00 0.000000e+00 1.000000e+00 2.745884e-03
P3: 7.215377e+02 0.000000e+00 6.095593e+02 -3.395242e+02 0.000000e+00 7.215377e+02 1.728540e+02 2.199936e+00 0.000000e+00 0.000000e+00 1.000000e+00 2.729905e-03
R0_rect: 9.999239e-01 9.837760e-03 -7.445048e-03 -9.869795e-03 9.999421e-01 -4.278459e-03 7.402527e-03 4.351614e-03 9.999631e-01
Tr_velo_to_cam: 7.533745e-03 -9.999714e-01 -6.166020e-04 -4.069766e-03 1.480249e-02 7.280733e-04 -9.998902e-01 -7.631618e-02 9.998621e-01 7.523790e-03 1.480755e-02 -2.717806e-01
Tr_imu_to_velo: 9.999976e-01 7.553071e-04 -2.035826e-03 -8.086759e-01 -7.854027e-04 9.998898e-01 -1.482298e-02 3.195559e-01 2.024406e-03 1.482454e-02 9.998881e-01 -7.997231e-01
```
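Matrix Descriptions:

| Matrix | Size | Description |
|---|---|---|
| P0, P1, P2, P3 | 3×4 | Projection matrices for cameras 0-3; each maps rectified camera-0 coordinates to pixel coordinates in its camera's image |
| R0_rect | 3×3 | Rectifying rotation for the reference camera (camera 0) |
| Tr_velo_to_cam | 3×4 | Transformation from LiDAR to (unrectified) camera-0 coordinates |
| Tr_imu_to_velo | 3×4 | Transformation from IMU to LiDAR coordinates |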
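A calib file can be parsed by splitting each line at the first colon and reshaping the values to the sizes in the table. A minimal sketch; `read_calib_file` is a hypothetical helper that returns a plain dict rather than the toolkit's calibration object:

```python
import numpy as np


def read_calib_file(path):
    """Parse a KITTI object calib file into a dict of NumPy matrices."""
    raw = {}
    with open(path) as f:
        for line in f:
            if ":" not in line:
                continue  # skip blank lines
            key, values = line.split(":", 1)
            raw[key.strip()] = np.array([float(v) for v in values.split()])
    # Values are stored row-major; reshape to the sizes in the table above
    calib = {f"P{i}": raw[f"P{i}"].reshape(3, 4) for i in range(4)}
    calib["R0_rect"] = raw["R0_rect"].reshape(3, 3)
    calib["Tr_velo_to_cam"] = raw["Tr_velo_to_cam"].reshape(3, 4)
    calib["Tr_imu_to_velo"] = raw["Tr_imu_to_velo"].reshape(3, 4)
    return calib
```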
LiDAR Point Cloud Data (velodyne/*.bin)
LiDAR data is stored as binary files in which each point consists of 4 float32 values:
x, y, z: 3D coordinates in the LiDAR coordinate system (meters)
intensity: Reflectance value, stored as float32 (typically in the range 0 to 1)
```python
# Loading LiDAR data
import numpy as np

def load_lidar_data(file_path):
    """Load a LiDAR point cloud from a KITTI .bin file."""
    # Each point is 4 consecutive float32 values: x, y, z, intensity
    points = np.fromfile(file_path, dtype=np.float32).reshape(-1, 4)
    return points[:, :3], points[:, 3]  # coordinates (N, 3), intensities (N,)
```
Coordinate Systems
KITTI uses multiple coordinate systems that require careful transformation:
1. Camera Coordinate System
Origin: Camera center (the rectified reference camera, camera 0; KITTI labels are expressed in this frame)
X-axis: Right (positive to the right in image)
Y-axis: Down (positive downward in image)
Z-axis: Forward (positive into the scene)
Units: Meters
2. LiDAR Coordinate System (Velodyne)
Origin: LiDAR sensor center
X-axis: Forward (vehicle driving direction)
Y-axis: Left (positive to the left of vehicle)
Z-axis: Up (positive upward)
Units: Meters
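Ignoring the small rotations and offsets recorded in the calibration, the two frames differ mainly by an axis permutation: camera X = -LiDAR Y, camera Y = -LiDAR Z, camera Z = LiDAR X. The sketch below encodes only this nominal mapping (`NOMINAL_LIDAR_TO_CAM` is a hypothetical name); real data should always go through Tr_velo_to_cam and R0_rect.

```python
import numpy as np

# Nominal rotation from LiDAR axes (x forward, y left, z up) to camera axes
# (x right, y down, z forward). Real calibration adds a small rotation and offset.
NOMINAL_LIDAR_TO_CAM = np.array([
    [0.0, -1.0, 0.0],   # camera x (right)   = -LiDAR y (left)
    [0.0, 0.0, -1.0],   # camera y (down)    = -LiDAR z (up)
    [1.0, 0.0, 0.0],    # camera z (forward) =  LiDAR x (forward)
])

point_lidar = np.array([10.0, 2.0, 1.0])  # 10 m ahead, 2 m to the left, 1 m up
point_cam = NOMINAL_LIDAR_TO_CAM @ point_lidar
print(point_cam.tolist())  # [-2.0, -1.0, 10.0]
```

Comparing this matrix with the rotation part of the sample Tr_velo_to_cam above shows the same pattern of near-zero and near-±1 entries.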
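3. Object Coordinate System
Origin: Center of the box's bottom face (this is the location stored in the label, for all object classes)
Dimensions: Length along X, Height along Y (the box extends toward -Y, i.e. upward), Width along Z
Rotation: Around the Y-axis (yaw angle, rotation_y); at rotation_y = 0 the object's front points along the camera X-axis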
Coordinate System Relationships
```
IMU → LiDAR → Camera → Image
 │      │       │        │
 │      │       │        └── 2D pixel coordinates
 │      │       └─────────── 3D camera coordinates
 │      └─────────────────── 3D LiDAR coordinates
 └────────────────────────── 3D IMU coordinates
```
Coordinate Transformations
Transformation Pipeline
The complete transformation from LiDAR to image coordinates involves several steps:
```python
import numpy as np

def lidar_to_camera_transform(points_lidar, calib):
    """Transform (N, 3) points from LiDAR to rectified camera coordinates."""
    # Step 1: LiDAR -> camera (unrectified) with the 3x4 Tr_velo_to_cam matrix,
    # applied to homogeneous points of shape (4, N)
    points_hom = np.vstack([points_lidar.T, np.ones((1, points_lidar.shape[0]))])
    points_cam_unrect = calib.Tr_velo_to_cam @ points_hom  # (3, N)
    # Step 2: apply the 3x3 rectification matrix R0_rect
    points_cam_rect = calib.R0_rect @ points_cam_unrect
    return points_cam_rect.T  # (N, 3)

def camera_to_image_projection(points_cam, calib):
    """Project (N, 3) rectified camera points to (N, 2) pixel coordinates."""
    # Apply projection matrix P2 (left color camera) to homogeneous points
    points_hom = np.vstack([points_cam.T, np.ones((1, points_cam.shape[0]))])
    points_2d_hom = calib.P2 @ points_hom  # (3, N)
    # Normalize homogeneous coordinates; results are only meaningful for points
    # in front of the camera, so filter out points_cam[:, 2] <= 0 beforehand
    points_2d = points_2d_hom[:2, :] / points_2d_hom[2, :]
    return points_2d.T
```
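Since all three steps are linear in homogeneous coordinates, they can be folded into a single 3×4 matrix P2 @ R0_rect @ Tr_velo_to_cam, with R0_rect and Tr_velo_to_cam padded to 4×4. The sketch below does this and drops points with too little depth before normalizing; the function names and the 0.1 m depth threshold are assumptions for illustration:

```python
import numpy as np


def to_homogeneous_4x4(matrix):
    """Embed a 3x3 or 3x4 matrix into a 4x4 homogeneous transform."""
    out = np.eye(4)
    out[:3, :matrix.shape[1]] = matrix
    return out


def velo_to_image_matrix(P2, R0_rect, Tr_velo_to_cam):
    """Compose P2 @ R0_rect @ Tr_velo_to_cam into one 3x4 matrix."""
    return P2 @ to_homogeneous_4x4(R0_rect) @ to_homogeneous_4x4(Tr_velo_to_cam)


def project_velo_to_image(points_lidar, P_velo_to_img, min_depth=0.1):
    """Project (N, 3) LiDAR points; return (M, 2) pixels and the (N,) keep mask."""
    points_hom = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    projected = points_hom @ P_velo_to_img.T  # rows are (u*d, v*d, d)
    in_front = projected[:, 2] > min_depth  # drop points behind or too close to the camera
    uv = projected[in_front, :2] / projected[in_front, 2:3]
    return uv, in_front
```

Composing the matrix once per frame avoids two extra matrix products per point, and returning the mask lets callers keep intensities or depths aligned with the projected pixels.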
3D Bounding Box Corner Generation
KITTI 3D bounding boxes are defined by a location (the center of the box's bottom face), dimensions, and a rotation around the camera Y-axis. The 8 corners are generated by the
Computes 8 corner points of a 3D bounding box from KITTI object parameters
Handles coordinate transformations between object-local, camera, and LiDAR coordinate systems
Follows KITTI’s standard corner ordering convention
Supports optional transformation to LiDAR coordinates using calibration data
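Corner Ordering Convention
KITTI uses a specific ordering for the 8 corners of 3D bounding boxes:
```
    4 -------- 5
   /|         /|
  7 -------- 6 .
  | |        | |
  . 0 -------- 1
  |/         |/
  3 -------- 2
```
Bottom face: 0, 1, 2, 3 (y = 0 relative to the label location, i.e. on the ground)
Top face: 4, 5, 6, 7 (y = -h relative to the label location, because the camera Y-axis points down)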
Corner Indices:
0: Bottom-front-left
1: Bottom-front-right
2: Bottom-rear-right
3: Bottom-rear-left
4: Top-front-left
5: Top-front-right
6: Top-rear-right
7: Top-rear-left
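The corner layout above can be generated by placing the corners in the object frame (origin at the bottom-face center, length along X, height toward -Y, width along Z), rotating them about the Y-axis by rotation_y, and translating them by the label location. This sketch mirrors the construction used in the official object devkit, but `compute_box_corners` is a hypothetical name rather than the toolkit's function:

```python
import numpy as np


def compute_box_corners(h, w, l, location, rotation_y):
    """Return the (8, 3) corners of a KITTI box in rectified camera coordinates.

    Order: 0-3 bottom face, 4-7 top face, each going front-left, front-right,
    rear-right, rear-left.
    """
    # Object frame: front = +X (length), up = -Y (height), left = +Z (width)
    x = np.array([l, l, -l, -l, l, l, -l, -l]) / 2.0
    y = np.array([0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h])
    z = np.array([w, -w, -w, w, w, -w, -w, w]) / 2.0
    # Rotation about the camera Y-axis by rotation_y
    c, s = np.cos(rotation_y), np.sin(rotation_y)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    corners = R @ np.vstack([x, y, z])  # (3, 8)
    return corners.T + np.asarray(location, dtype=float)


# Corners for the example annotation shown earlier
corners = compute_box_corners(1.73, 1.87, 4.60, [1.84, 1.47, 8.41], -1.56)
```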
3D Bounding Box Processing
Object3d Class Implementation
The Object3d class encapsulates KITTI 3D object annotations and is implemented in
Parses KITTI annotation lines into structured object data
Stores 2D and 3D bounding box parameters
Provides methods for coordinate transformations
Handles object type, visibility, and geometric properties
Key attributes:
type: Object category (Car, Pedestrian, Cyclist, etc.)
truncation, occlusion: Visibility indicators
alpha: Observation angle
xmin, ymin, xmax, ymax: 2D bounding box coordinates
h, w, l: 3D dimensions (height, width, length)
t: 3D location (bottom-face center) in camera coordinates
ry: Rotation around Y-axis (yaw angle)
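A minimal dataclass sketch with the attribute names listed above, shown only to make the 15-field layout concrete; it is not the toolkit's implementation, and `from_line` is a hypothetical constructor name:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Object3d:
    """One KITTI label line; attribute names follow the list above."""
    type: str
    truncation: float
    occlusion: int
    alpha: float
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    h: float
    w: float
    l: float
    t: np.ndarray  # (3,) bottom-face center in rectified camera coordinates
    ry: float

    @classmethod
    def from_line(cls, line):
        fields = line.split()
        v = [float(x) for x in fields[1:15]]  # the 14 numeric values after the class
        return cls(
            type=fields[0], truncation=v[0], occlusion=int(v[1]), alpha=v[2],
            xmin=v[3], ymin=v[4], xmax=v[5], ymax=v[6],
            h=v[7], w=v[8], l=v[9], t=np.array(v[10:13]), ry=v[13],
        )


obj = Object3d.from_line(
    "Car 0.00 0 -1.57 599.41 156.40 629.75 189.25 1.73 1.87 4.60 1.84 1.47 8.41 -1.56")
print(obj.type, obj.l, obj.t)
```

Slicing `fields[1:15]` also tolerates detection result files, which append a 16th confidence score.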
Calibration Class Implementation
The calibration class handles coordinate transformations and is implemented through the
Loads calibration matrices from KITTI calibration files
Handles coordinate transformations between different reference frames
Supports both KITTI and WaymoKITTI calibration formats
Provides projection matrices for camera-to-image transformations
Manages LiDAR-to-camera coordinate conversions
Key transformation matrices:
P0, P1, P2, P3: Camera projection matrices (3x4)
R0_rect: Rectification matrix (3x3)
Tr_velo_to_cam: LiDAR to camera transformation (3x4)
Tr_imu_to_velo: IMU to LiDAR transformation (3x4)
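Placing label boxes in the point cloud requires the inverse direction, from rectified camera to LiDAR coordinates, which undoes R0_rect and then the rigid Tr_velo_to_cam transform. A sketch assuming the matrices are NumPy arrays with the sizes listed above; `rect_to_velo` is a hypothetical name:

```python
import numpy as np


def rect_to_velo(points_rect, R0_rect, Tr_velo_to_cam):
    """Map (N, 3) rectified camera points back to LiDAR coordinates."""
    # Undo the rectification (R0_rect is a rotation, so it is invertible)
    points_ref = points_rect @ np.linalg.inv(R0_rect).T
    # Invert x_cam = R @ x_velo + t, i.e. x_velo = R^-1 @ (x_cam - t)
    R, t = Tr_velo_to_cam[:, :3], Tr_velo_to_cam[:, 3]
    return (points_ref - t) @ np.linalg.inv(R).T
```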
Visualization Features
The KITTI toolkit provides comprehensive visualization capabilities implemented in
1. 2D Image Visualization
Basic 2D Bounding Box Visualization
The 2D bounding box visualization is handled by the
Displays images with 2D bounding boxes overlaid
Supports multiple camera views simultaneously
Color-codes different object types
Handles object filtering and visibility checks
3D Bounding Box Projection to 2D
The 3D-to-2D projection visualization is implemented through the
Projects 3D bounding boxes onto 2D images
Handles coordinate transformations from 3D to 2D space
Draws wireframe representations of 3D boxes
Filters objects based on visibility and distance
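Drawing a wireframe needs the 12 box edges expressed in the corner ordering described earlier, plus a projection of the corners with P2. In this sketch (`BOX_EDGES` and `project_box_edges` are hypothetical names) a box is skipped entirely if any corner is closer than `min_depth` to the camera; a fuller implementation would clip the affected edges instead.

```python
import numpy as np

# The 12 edges of a box under the corner ordering above (0-3 bottom, 4-7 top)
BOX_EDGES = (
    [(i, (i + 1) % 4) for i in range(4)]            # bottom face
    + [(4 + i, 4 + (i + 1) % 4) for i in range(4)]  # top face
    + [(i, i + 4) for i in range(4)]                # vertical edges
)


def project_box_edges(corners_rect, P2, min_depth=0.1):
    """Project (8, 3) rectified-camera corners to 2D line segments.

    Returns a list of ((u0, v0), (u1, v1)) pixel segments, or an empty list
    if any corner lies behind or too close to the camera.
    """
    if np.any(corners_rect[:, 2] < min_depth):
        return []
    hom = np.hstack([corners_rect, np.ones((8, 1))]) @ P2.T  # (8, 3)
    uv = hom[:, :2] / hom[:, 2:3]
    return [(tuple(uv[a]), tuple(uv[b])) for a, b in BOX_EDGES]
```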
2. 3D Point Cloud Visualization
LiDAR Point Cloud with 3D Bounding Boxes
The 3D LiDAR visualization is implemented through the
Renders LiDAR point clouds in 3D space using Open3D
Overlays 3D bounding boxes in LiDAR coordinate system
Supports color-coding by object type and height
Handles coordinate transformations between camera and LiDAR frames
Provides interactive 3D visualization capabilities
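LiDAR-frame viewers usually expect a box center at half height and a yaw about the LiDAR Z-axis, whereas KITTI labels give the bottom-face center and rotation_y in the camera frame. The sketch below uses the conversion found in toolkits such as OpenPCDet (lift the center by h/2, yaw = -(rotation_y + π/2)); `camera_box_to_lidar` is a hypothetical name, and the caller supplies a 4×4 rectified-camera-to-LiDAR matrix.

```python
import numpy as np


def camera_box_to_lidar(location, h, w, l, rotation_y, rect_to_velo_4x4):
    """Convert a KITTI camera-frame label into a LiDAR-frame box.

    Returns [x, y, z, l, w, h, yaw] with (x, y, z) at the geometric box center.
    """
    bottom = rect_to_velo_4x4 @ np.append(np.asarray(location, dtype=float), 1.0)
    center = bottom[:3] + np.array([0.0, 0.0, h / 2.0])  # LiDAR z points up
    yaw = -(rotation_y + np.pi / 2.0)  # heading convention change between frames
    return np.array([*center, l, w, h, yaw])
```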
The 3D bounding box creation is handled by the
3. Bird’s Eye View (BEV) Visualization
The BEV visualization is implemented through the
Creates top-down view of LiDAR point cloud data
Projects 3D bounding boxes to 2D BEV coordinates
Color-codes points by height using viridis colormap
Draws object orientation arrows and labels
Supports configurable range and resolution parameters
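The core of a BEV view is a 2D grid over the ground plane in which each cell stores a statistic of the points falling into it. This sketch keeps the maximum height per cell, ready for a colormap such as viridis; the function name and the default range and resolution values are assumptions, not the toolkit's defaults.

```python
import numpy as np


def lidar_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), resolution=0.1):
    """Rasterize (N, 3) LiDAR points into a top-down max-height grid.

    Row index grows with forward distance (x), column index with lateral
    offset (y); cells without points are NaN so a colormap leaves them blank.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    n_rows = int(round((x_range[1] - x_range[0]) / resolution))
    n_cols = int(round((y_range[1] - y_range[0]) / resolution))
    # Clipping guards against floating-point rounding at the upper range limit
    rows = np.clip(((x[keep] - x_range[0]) / resolution).astype(int), 0, n_rows - 1)
    cols = np.clip(((y[keep] - y_range[0]) / resolution).astype(int), 0, n_cols - 1)
    bev = np.full((n_rows, n_cols), -np.inf)
    np.maximum.at(bev, (rows, cols), z[keep])  # keep the highest point per cell
    bev[np.isinf(bev)] = np.nan
    return bev
```

With Matplotlib, `imshow(bev, cmap="viridis", origin="lower")` puts increasing forward distance toward the top. The lateral axis then appears mirrored, because LiDAR +Y points left; `np.fliplr(bev)` corrects that if needed.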
4. Multi-Modal Visualization
The comprehensive multi-modal visualization is implemented through the
Creates multi-panel visualizations combining different data modalities
Displays 2D bounding boxes, 3D projections, and LiDAR overlays
Generates Bird’s Eye View representations
Supports intensity and distance mapping
Provides comprehensive sample analysis in a single view
The LiDAR-to-image projection is handled by coordinate transformation functions in
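Distance mapping can be sketched by scattering projected LiDAR points into a sparse depth image aligned with the camera image, keeping the nearest return when several points land on the same pixel. `sparse_depth_image` is a hypothetical name; it expects pixel coordinates and depths from a LiDAR-to-image projection such as the one in the transformation pipeline.

```python
import numpy as np


def sparse_depth_image(uv, depth, image_shape):
    """Scatter projected points into an (H, W) depth image (0 = no point).

    uv: (N, 2) pixel coordinates (u = column, v = row); depth: (N,) meters.
    """
    height, width = image_shape
    cols = np.floor(uv[:, 0]).astype(int)
    rows = np.floor(uv[:, 1]).astype(int)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height) & (depth > 0)
    depth_img = np.full((height, width), np.inf)
    np.minimum.at(depth_img, (rows[inside], cols[inside]), depth[inside])  # nearest wins
    depth_img[np.isinf(depth_img)] = 0.0
    return depth_img
```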
Code Implementation
Complete KITTI Sample Processing Pipeline
The complete KITTI processing pipeline is implemented in
Main Processing Functions:
Loading and parsing KITTI label files
Reading calibration parameters
Loading LiDAR point cloud data
Coordinate System Transformations:
Transforming between coordinate systems
Camera-to-LiDAR and LiDAR-to-camera transformations
3D-to-2D projection handling
Visualization Pipeline:
Multi-modal data visualization combining images, LiDAR, and annotations
Comprehensive sample analysis and processing results
Export capabilities for processed data and visualizations
For detailed implementation examples and usage patterns, refer to the functions in
Dataset Management
The KITTI toolkit provides comprehensive dataset management capabilities including downloading, extracting, and validating data. These functionalities are implemented in
Dataset Loading and Processing:
Automatic data loading from KITTI directory structure
Sample indexing and batch processing capabilities
Data validation and integrity checking
Support for both training and testing splits
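Sample indexing typically starts from the split files in ImageSets/, which list one zero-padded index per line. A minimal sketch of reading a split and iterating over it in batches; `read_split` and `batched` are hypothetical helpers:

```python
from pathlib import Path


def read_split(imagesets_dir, split="train"):
    """Read the sample indices listed (one per line) in ImageSets/<split>.txt."""
    text = (Path(imagesets_dir) / f"{split}.txt").read_text()
    return [int(token) for token in text.split()]


def batched(indices, batch_size):
    """Yield consecutive chunks of at most batch_size indices."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]
```

Converting the indices back to file names is then a matter of formatting them with `f"{idx:06d}"`.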
File Management:
Structured file path handling for images, LiDAR, labels, and calibration
Automatic directory creation and organization
Error handling for missing or corrupted files
For specific usage examples and implementation details, see the data loading functions in
Validation checks include:
File count consistency checks
Sample file validation
Data integrity verification
Comprehensive error reporting
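Two of these checks can be sketched with the standard library alone: comparing the sample stems present in each modality folder, and flagging .bin files whose size is not a multiple of 16 bytes (4 float32 values per point). The folder defaults follow the training layout above (drop "label_2" for the testing split); the function names are hypothetical.

```python
from pathlib import Path


def check_split_consistency(split_dir, folders=("image_2", "velodyne", "label_2", "calib")):
    """Report, per folder, the sample stems found elsewhere but missing there.

    An empty dict means every folder contains the same set of samples.
    """
    split_dir = Path(split_dir)
    stems = {}
    for folder in folders:
        path = split_dir / folder
        stems[folder] = {p.stem for p in path.iterdir() if p.is_file()} if path.is_dir() else set()
    all_stems = set().union(*stems.values())
    return {f: sorted(all_stems - found) for f, found in stems.items() if all_stems - found}


def corrupt_velodyne_files(velodyne_dir):
    """List .bin files whose size is not a whole number of 16-byte points."""
    return sorted(p.name for p in Path(velodyne_dir).glob("*.bin") if p.stat().st_size % 16 != 0)
```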
Summary
This tutorial has covered the essential aspects of working with the KITTI dataset for 3D object detection and autonomous driving research. The complete implementation is available in
Data Loading: Efficient loading of images, LiDAR point clouds, labels, and calibration data
Coordinate Transformations: Functions for converting between LiDAR, camera, and image coordinate systems
Visualization Tools: Comprehensive visualization functions for 2D/3D bounding boxes, point clouds, and multi-modal data
Dataset Management: Tools for downloading, validating, and processing KITTI data
The modular design allows researchers to easily integrate KITTI data processing into their machine learning pipelines while maintaining code clarity and performance.