KITTI Dataset Tutorial¶

Table of Contents¶

Introduction
Dataset Structure
Annotation Format
Coordinate Systems
Coordinate Transformations
3D Bounding Box Processing
Visualization Features
Code Implementation
Dataset Management
Common Issues and Solutions
Best Practices

Introduction¶

The KITTI dataset is one of the most influential autonomous driving datasets, providing synchronized camera images, LiDAR point clouds, and GPS/IMU data. This tutorial focuses on understanding and implementing the coordinate system transformations, 3D object detection, and visualization techniques using the comprehensive KITTI toolkit.

This tutorial covers:

Dataset Structure: Understanding KITTI’s file organization and data formats
Coordinate Systems: Camera, LiDAR, and object coordinate systems
3D Object Processing: Loading, transforming, and visualizing 3D bounding boxes
Visualization Tools: 2D, 3D, LiDAR, and Bird’s Eye View (BEV) visualization
Dataset Management: Downloading, extracting, and validating KITTI data

Dataset Structure¶

KITTI organizes data into training and testing splits with synchronized sensor data:

KITTI/
├── training/
│   ├── image_2/          # Left color camera images
│   ├── image_3/          # Right color camera images  
│   ├── velodyne/         # LiDAR point clouds (.bin files)
│   ├── label_2/          # 3D object annotations (.txt files)
│   └── calib/            # Calibration matrices (.txt files)
├── testing/
│   ├── image_2/          # Test images (no labels)
│   ├── image_3/          # Right test images
│   ├── velodyne/         # Test LiDAR data
│   └── calib/            # Test calibration data
└── ImageSets/
    ├── train.txt         # Training sample indices
    ├── val.txt           # Validation sample indices
    └── test.txt          # Test sample indices

File Naming Convention¶

All files use a consistent 6-digit zero-padded naming scheme:

Images: 000000.png, 000001.png, …, 007480.png
LiDAR: 000000.bin, 000001.bin, …, 007480.bin
Labels: 000000.txt, 000001.txt, …, 007480.txt
Calibration: 000000.txt, 000001.txt, …, 007480.txt

Annotation Format¶

3D Object Annotations (label_2/*.txt)¶

Each line in a label file represents one 3D object with 15 space-separated values:

type truncated occluded alpha bbox_2d_left bbox_2d_top bbox_2d_right bbox_2d_bottom height width length x y z rotation_y

Detailed Field Description:

Field	Type	Description	Range/Units
`type`	string	Object class	‘Car’, ‘Van’, ‘Truck’, ‘Pedestrian’, ‘Person_sitting’, ‘Cyclist’, ‘Tram’, ‘Misc’, ‘DontCare’
`truncated`	float	Truncation level	0 (non-truncated) to 1 (fully truncated)
`occluded`	int	Occlusion state	0 (fully visible), 1 (partly occluded), 2 (largely occluded), 3 (unknown)
`alpha`	float	Observation angle	-π to π radians
`bbox_2d`	float×4	2D bounding box	[left, top, right, bottom] in pixels
`dimensions`	float×3	3D object dimensions	[height, width, length] in meters
`location`	float×3	3D object center	[x, y, z] in camera coordinates (meters)
`rotation_y`	float	Rotation around Y-axis	-π to π radians

Example Annotation:

Car 0.00 0 -1.57 599.41 156.40 629.75 189.25 1.73 1.87 4.60 1.84 1.47 8.41 -1.56

This represents:

A Car that is not truncated (0.00) or occluded (0)
Observation angle α = -1.57 radians
2D bbox: [599.41, 156.40, 629.75, 189.25] pixels
3D dimensions: height=1.73m, width=1.87m, length=4.60m
3D center: x=1.84m, y=1.47m, z=8.41m (camera coordinates)
Y-axis rotation: -1.56 radians

Calibration Data (calib/*.txt)¶

Calibration files contain transformation matrices between coordinate systems:

P0: 7.215377e+02 0.000000e+00 6.095593e+01 0.000000e+00 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
P1: 7.215377e+02 0.000000e+00 6.095593e+01 -3.875744e+02 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
P2: 7.215377e+02 0.000000e+00 6.095593e+01 4.485728e+01 0.000000e+00 7.215377e+02 1.728540e+02 2.163791e-01 0.000000e+00 0.000000e+00 1.000000e+00 2.745884e-03
P3: 7.215377e+02 0.000000e+00 6.095593e+01 -3.395242e+02 0.000000e+00 7.215377e+02 1.728540e+02 2.199936e+00 0.000000e+00 0.000000e+00 1.000000e+00 2.729905e-03
R0_rect: 9.999239e-01 9.837760e-03 -7.445048e-03 -9.869795e-03 9.999421e-01 -4.278459e-03 7.402527e-03 4.351614e-03 9.999631e-01
Tr_velo_to_cam: 7.533745e-03 -9.999714e-01 -6.166020e-04 -4.069766e-03 1.480249e-02 7.280733e-04 -9.998902e-01 -7.631618e-02 9.998621e-01 7.523790e-03 1.480755e-02 -2.717806e-01
Tr_imu_to_velo: 9.999976e-01 7.553071e-04 -2.035826e-03 -8.086759e-01 -7.854027e-04 9.998898e-01 -1.482298e-02 3.195559e-01 2.024406e-03 1.482454e-02 9.998881e-01 -7.997231e-01

Matrix Descriptions:

Matrix	Size	Description
`P0`, `P1`, `P2`, `P3`	3×4	Projection matrices for cameras 0-3
`R0_rect`	3×3	Rectification matrix for camera 0
`Tr_velo_to_cam`	3×4	Transformation from LiDAR to camera coordinates
`Tr_imu_to_velo`	3×4	Transformation from IMU to LiDAR coordinates

LiDAR Point Cloud Data (velodyne/*.bin)¶

LiDAR data is stored as binary files with each point containing 4 float32 values:

x, y, z: 3D coordinates in LiDAR coordinate system (meters)
intensity: Reflectance value (0-255)

# Loading LiDAR data
import numpy as np

def load_lidar_data(file_path):
    """Load LiDAR point cloud from binary file"""
    points = np.fromfile(file_path, dtype=np.float32).reshape(-1, 4)
    return points[:, :3], points[:, 3]  # coordinates, intensities

Coordinate Systems¶

KITTI uses multiple coordinate systems that require careful transformation:

1. Camera Coordinate System¶

Origin: Camera center
X-axis: Right (positive to the right in image)
Y-axis: Down (positive downward in image)
Z-axis: Forward (positive into the scene)
Units: Meters

2. LiDAR Coordinate System (Velodyne)¶

Origin: LiDAR sensor center
X-axis: Forward (vehicle driving direction)
Y-axis: Left (positive to the left of vehicle)
Z-axis: Up (positive upward)
Units: Meters

3. Object Coordinate System¶

Origin: Object center (bottom center for vehicles)
Dimensions: Height (Y), Width (X), Length (Z)
Rotation: Around Y-axis (yaw angle)

Coordinate System Relationships¶

IMU → LiDAR → Camera → Image
 │      │        │       │
 │      │        │       └── 2D pixel coordinates
 │      │        └────────── 3D camera coordinates  
 │      └─────────────────── 3D LiDAR coordinates
 └────────────────────────── 3D IMU coordinates

Coordinate Transformations¶

Transformation Pipeline¶

The complete transformation from LiDAR to image coordinates involves several steps:

def lidar_to_camera_transform(points_lidar, calib):
    """Transform points from LiDAR to camera coordinates"""
    
    # Step 1: LiDAR → Camera (unrectified)
    # Apply Tr_velo_to_cam transformation
    points_cam_unrect = calib.Tr_velo_to_cam @ np.vstack([points_lidar.T, np.ones((1, points_lidar.shape[0]))])
    
    # Step 2: Apply rectification
    # Multiply by R0_rect to get rectified camera coordinates
    points_cam_rect = calib.R0_rect @ points_cam_unrect[:3, :]
    
    return points_cam_rect.T

def camera_to_image_projection(points_cam, calib):
    """Project 3D camera points to 2D image coordinates"""
    
    # Apply projection matrix P2 (left color camera)
    points_2d_hom = calib.P2 @ np.vstack([points_cam.T, np.ones((1, points_cam.shape[0]))])
    
    # Normalize homogeneous coordinates
    points_2d = points_2d_hom[:2, :] / points_2d_hom[2, :]
    
    return points_2d.T

3D Bounding Box Corner Generation¶

KITTI 3D bounding boxes are defined by center, dimensions, and rotation. The 8 corners are generated by the function in . This function:

Computes 8 corner points of a 3D bounding box from KITTI object parameters
Handles coordinate transformations between object-local, camera, and LiDAR coordinate systems
Follows KITTI’s standard corner ordering convention
Supports optional transformation to LiDAR coordinates using calibration data

Corner Ordering Convention¶

KITTI uses a specific ordering for the 8 corners of 3D bounding boxes:

    4 -------- 5
   /|         /|
  7 -------- 6 .
  | |        | |
  . 0 -------- 1
  |/         |/
  3 -------- 2

Bottom face: 0,1,2,3 (y = -h/2)
Top face:    4,5,6,7 (y = +h/2)

Corner Indices:

0: Bottom-front-left
1: Bottom-front-right
2: Bottom-rear-right
3: Bottom-rear-left
4: Top-front-left
5: Top-front-right
6: Top-rear-right
7: Top-rear-left

3D Bounding Box Processing¶

Object3d Class Implementation¶

The Object3d class encapsulates KITTI 3D object annotations and is implemented in as the class. This class:

Parses KITTI annotation lines into structured object data
Stores 2D and 3D bounding box parameters
Provides methods for coordinate transformations
Handles object type, visibility, and geometric properties

Key attributes:

type: Object category (Car, Pedestrian, Cyclist, etc.)
truncation, occlusion: Visibility indicators
alpha: Observation angle
xmin, ymin, xmax, ymax: 2D bounding box coordinates
h, w, l: 3D dimensions (height, width, length)
t: 3D center location in camera coordinates
ry: Rotation around Y-axis (yaw angle)

Calibration Class Implementation¶

The calibration class handles coordinate transformations and is implemented through the function in . The calibration system:

Loads calibration matrices from KITTI calibration files
Handles coordinate transformations between different reference frames
Supports both KITTI and WaymoKITTI calibration formats
Provides projection matrices for camera-to-image transformations
Manages LiDAR-to-camera coordinate conversions

Key transformation matrices:

P0, P1, P2, P3: Camera projection matrices (3x4)
R0_rect: Rectification matrix (3x3)
Tr_velo_to_cam: LiDAR to camera transformation (3x4)
Tr_imu_to_velo: IMU to LiDAR transformation (3x4)

Visualization Features¶

The KITTI toolkit provides comprehensive visualization capabilities implemented in :

1. 2D Image Visualization¶

Basic 2D Bounding Box Visualization¶

The 2D bounding box visualization is handled by the function, which:

Displays images with 2D bounding boxes overlaid
Supports multiple camera views simultaneously
Color-codes different object types
Handles object filtering and visibility checks

3D Bounding Box Projection to 2D¶

The 3D-to-2D projection visualization is implemented through the function, which: ‘b-’, linewidth=2)

Projects 3D bounding boxes onto 2D images
Handles coordinate transformations from 3D to 2D space
Draws wireframe representations of 3D boxes
Filters objects based on visibility and distance

2. 3D Point Cloud Visualization¶

LiDAR Point Cloud with 3D Bounding Boxes¶

The 3D LiDAR visualization is implemented through the function, which:

Renders LiDAR point clouds in 3D space using Open3D
Overlays 3D bounding boxes in LiDAR coordinate system
Supports color-coding by object type and height
Handles coordinate transformations between camera and LiDAR frames
Provides interactive 3D visualization capabilities

The 3D bounding box creation is handled by the function.

3. Bird’s Eye View (BEV) Visualization¶

The BEV visualization is implemented through the function, which:

Creates top-down view of LiDAR point cloud data
Projects 3D bounding boxes to 2D BEV coordinates
Color-codes points by height using viridis colormap
Draws object orientation arrows and labels
Supports configurable range and resolution parameters

Code Implementation¶

Complete KITTI Sample Processing Pipeline¶

The complete KITTI processing pipeline is implemented in with the following key components:

Main Processing Functions:¶

: Loads and parses KITTI label files
: Reads calibration parameters
: Loads LiDAR point cloud data

Coordinate System Transformations:¶

: Transforms between coordinate systems
Camera-to-LiDAR and LiDAR-to-camera transformations
3D-to-2D projection handling

Visualization Pipeline:¶

Multi-modal data visualization combining images, LiDAR, and annotations
Comprehensive sample analysis and processing results
Export capabilities for processed data and visualizations

For detailed implementation examples and usage patterns, refer to the functions in .

Dataset Management¶

The KITTI toolkit provides comprehensive dataset management capabilities including downloading, extracting, and validating data. These functionalities are implemented in with the following key features:

Dataset Loading and Processing:¶

Automatic data loading from KITTI directory structure
Sample indexing and batch processing capabilities
Data validation and integrity checking
Support for both training and testing splits

File Management:¶

Structured file path handling for images, LiDAR, labels, and calibration
Automatic directory creation and organization
Error handling for missing or corrupted files

For specific usage examples and implementation details, see the data loading functions in . For complete dataset validation functionality, see which includes:

File count consistency checks
Sample file validation
Data integrity verification
Comprehensive error reporting

Summary¶

This tutorial has covered the essential aspects of working with the KITTI dataset for 3D object detection and autonomous driving research. The complete implementation is available in , which provides:

Data Loading: Efficient loading of images, LiDAR point clouds, labels, and calibration data
Coordinate Transformations: Functions for converting between LiDAR, camera, and image coordinate systems
Visualization Tools: Comprehensive visualization functions for 2D/3D bounding boxes, point clouds, and multi-modal data
Dataset Management: Tools for downloading, validating, and processing KITTI data

The modular design allows researchers to easily integrate KITTI data processing into their machine learning pipelines while maintaining code clarity and performance.

Summary¶

This tutorial has covered the essential aspects of working with KITTI datasets:

Data Structure: Understanding KITTI directory organization and file formats
Data Loading: Using functions for efficient data access
Coordinate Systems: Camera, LiDAR, and image coordinate transformations
Visualization: 2D/3D bounding boxes, point clouds, and multi-modal displays
Dataset Management: Validation, downloading, and processing tools

For complete implementation details, refer to the module which contains all the functions referenced in this tutorial.

KITTI Dataset Tutorial¶

Table of Contents¶

Introduction¶

Dataset Structure¶

File Naming Convention¶

Annotation Format¶

3D Object Annotations (label_2/*.txt)¶

Calibration Data (calib/*.txt)¶

LiDAR Point Cloud Data (velodyne/*.bin)¶

Coordinate Systems¶

1. Camera Coordinate System¶

2. LiDAR Coordinate System (Velodyne)¶

3. Object Coordinate System¶

Coordinate System Relationships¶

Coordinate Transformations¶

Transformation Pipeline¶

3D Bounding Box Corner Generation¶

Corner Ordering Convention¶

3D Bounding Box Processing¶

Object3d Class Implementation¶

Calibration Class Implementation¶

Visualization Features¶

1. 2D Image Visualization¶

Basic 2D Bounding Box Visualization¶

3D Bounding Box Projection to 2D¶

2. 3D Point Cloud Visualization¶

LiDAR Point Cloud with 3D Bounding Boxes¶

3. Bird’s Eye View (BEV) Visualization¶

4. Multi-Modal Visualization¶

Code Implementation¶

Complete KITTI Sample Processing Pipeline¶

Main Processing Functions:¶

Coordinate System Transformations:¶

Visualization Pipeline:¶

Dataset Management¶

Dataset Loading and Processing:¶

File Management:¶

Summary¶

Summary¶