KITTI Dataset Tutorial¶

Table of Contents¶

  1. Introduction

  2. Dataset Structure

  3. Annotation Format

  4. Coordinate Systems

  5. Coordinate Transformations

  6. 3D Bounding Box Processing

  7. Visualization Features

  8. Code Implementation

  9. Dataset Management

  10. Common Issues and Solutions

  11. Best Practices

Introduction¶

The KITTI dataset is one of the most influential autonomous driving datasets, providing synchronized camera images, LiDAR point clouds, and GPS/IMU data. This tutorial focuses on understanding and implementing the coordinate system transformations, 3D object detection, and visualization techniques using the comprehensive KITTI toolkit.

This tutorial covers:

  • Dataset Structure: Understanding KITTI’s file organization and data formats

  • Coordinate Systems: Camera, LiDAR, and object coordinate systems

  • 3D Object Processing: Loading, transforming, and visualizing 3D bounding boxes

  • Visualization Tools: 2D, 3D, LiDAR, and Bird’s Eye View (BEV) visualization

  • Dataset Management: Downloading, extracting, and validating KITTI data

Dataset Structure¶

KITTI organizes data into training and testing splits with synchronized sensor data:

KITTI/
├── training/
│   ├── image_2/          # Left color camera images
│   ├── image_3/          # Right color camera images  
│   ├── velodyne/         # LiDAR point clouds (.bin files)
│   ├── label_2/          # 3D object annotations (.txt files)
│   └── calib/            # Calibration matrices (.txt files)
├── testing/
│   ├── image_2/          # Test images (no labels)
│   ├── image_3/          # Right test images
│   ├── velodyne/         # Test LiDAR data
│   └── calib/            # Test calibration data
└── ImageSets/
    ├── train.txt         # Training sample indices
    ├── val.txt           # Validation sample indices
    └── test.txt          # Test sample indices

File Naming Convention¶

All files use a consistent 6-digit zero-padded naming scheme:

  • Images: 000000.png, 000001.png, …, 007480.png

  • LiDAR: 000000.bin, 000001.bin, …, 007480.bin

  • Labels: 000000.txt, 000001.txt, …, 007480.txt

  • Calibration: 000000.txt, 000001.txt, …, 007480.txt

Annotation Format¶

3D Object Annotations (label_2/*.txt)¶

Each line in a label file represents one 3D object with 15 space-separated values:

type truncated occluded alpha bbox_2d_left bbox_2d_top bbox_2d_right bbox_2d_bottom height width length x y z rotation_y

Detailed Field Description:

Field

Type

Description

Range/Units

type

string

Object class

‘Car’, ‘Van’, ‘Truck’, ‘Pedestrian’, ‘Person_sitting’, ‘Cyclist’, ‘Tram’, ‘Misc’, ‘DontCare’

truncated

float

Truncation level

0 (non-truncated) to 1 (fully truncated)

occluded

int

Occlusion state

0 (fully visible), 1 (partly occluded), 2 (largely occluded), 3 (unknown)

alpha

float

Observation angle

-π to π radians

bbox_2d

float×4

2D bounding box

[left, top, right, bottom] in pixels

dimensions

float×3

3D object dimensions

[height, width, length] in meters

location

float×3

3D object center

[x, y, z] in camera coordinates (meters)

rotation_y

float

Rotation around Y-axis

-π to π radians

Example Annotation:

Car 0.00 0 -1.57 599.41 156.40 629.75 189.25 1.73 1.87 4.60 1.84 1.47 8.41 -1.56

This represents:

  • A Car that is not truncated (0.00) or occluded (0)

  • Observation angle α = -1.57 radians

  • 2D bbox: [599.41, 156.40, 629.75, 189.25] pixels

  • 3D dimensions: height=1.73m, width=1.87m, length=4.60m

  • 3D center: x=1.84m, y=1.47m, z=8.41m (camera coordinates)

  • Y-axis rotation: -1.56 radians

Calibration Data (calib/*.txt)¶

Calibration files contain transformation matrices between coordinate systems:

P0: 7.215377e+02 0.000000e+00 6.095593e+01 0.000000e+00 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
P1: 7.215377e+02 0.000000e+00 6.095593e+01 -3.875744e+02 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
P2: 7.215377e+02 0.000000e+00 6.095593e+01 4.485728e+01 0.000000e+00 7.215377e+02 1.728540e+02 2.163791e-01 0.000000e+00 0.000000e+00 1.000000e+00 2.745884e-03
P3: 7.215377e+02 0.000000e+00 6.095593e+01 -3.395242e+02 0.000000e+00 7.215377e+02 1.728540e+02 2.199936e+00 0.000000e+00 0.000000e+00 1.000000e+00 2.729905e-03
R0_rect: 9.999239e-01 9.837760e-03 -7.445048e-03 -9.869795e-03 9.999421e-01 -4.278459e-03 7.402527e-03 4.351614e-03 9.999631e-01
Tr_velo_to_cam: 7.533745e-03 -9.999714e-01 -6.166020e-04 -4.069766e-03 1.480249e-02 7.280733e-04 -9.998902e-01 -7.631618e-02 9.998621e-01 7.523790e-03 1.480755e-02 -2.717806e-01
Tr_imu_to_velo: 9.999976e-01 7.553071e-04 -2.035826e-03 -8.086759e-01 -7.854027e-04 9.998898e-01 -1.482298e-02 3.195559e-01 2.024406e-03 1.482454e-02 9.998881e-01 -7.997231e-01

Matrix Descriptions:

Matrix

Size

Description

P0, P1, P2, P3

3×4

Projection matrices for cameras 0-3

R0_rect

3×3

Rectification matrix for camera 0

Tr_velo_to_cam

3×4

Transformation from LiDAR to camera coordinates

Tr_imu_to_velo

3×4

Transformation from IMU to LiDAR coordinates

LiDAR Point Cloud Data (velodyne/*.bin)¶

LiDAR data is stored as binary files with each point containing 4 float32 values:

  • x, y, z: 3D coordinates in LiDAR coordinate system (meters)

  • intensity: Reflectance value (0-255)

# Loading LiDAR data
import numpy as np

def load_lidar_data(file_path):
    """Load LiDAR point cloud from binary file"""
    points = np.fromfile(file_path, dtype=np.float32).reshape(-1, 4)
    return points[:, :3], points[:, 3]  # coordinates, intensities

Coordinate Systems¶

KITTI uses multiple coordinate systems that require careful transformation:

1. Camera Coordinate System¶

  • Origin: Camera center

  • X-axis: Right (positive to the right in image)

  • Y-axis: Down (positive downward in image)

  • Z-axis: Forward (positive into the scene)

  • Units: Meters

2. LiDAR Coordinate System (Velodyne)¶

  • Origin: LiDAR sensor center

  • X-axis: Forward (vehicle driving direction)

  • Y-axis: Left (positive to the left of vehicle)

  • Z-axis: Up (positive upward)

  • Units: Meters

3. Object Coordinate System¶

  • Origin: Object center (bottom center for vehicles)

  • Dimensions: Height (Y), Width (X), Length (Z)

  • Rotation: Around Y-axis (yaw angle)

Coordinate System Relationships¶

IMU → LiDAR → Camera → Image
 │      │        │       │
 │      │        │       └── 2D pixel coordinates
 │      │        └────────── 3D camera coordinates  
 │      └─────────────────── 3D LiDAR coordinates
 └────────────────────────── 3D IMU coordinates

Coordinate Transformations¶

Transformation Pipeline¶

The complete transformation from LiDAR to image coordinates involves several steps:

def lidar_to_camera_transform(points_lidar, calib):
    """Transform points from LiDAR to camera coordinates"""
    
    # Step 1: LiDAR → Camera (unrectified)
    # Apply Tr_velo_to_cam transformation
    points_cam_unrect = calib.Tr_velo_to_cam @ np.vstack([points_lidar.T, np.ones((1, points_lidar.shape[0]))])
    
    # Step 2: Apply rectification
    # Multiply by R0_rect to get rectified camera coordinates
    points_cam_rect = calib.R0_rect @ points_cam_unrect[:3, :]
    
    return points_cam_rect.T

def camera_to_image_projection(points_cam, calib):
    """Project 3D camera points to 2D image coordinates"""
    
    # Apply projection matrix P2 (left color camera)
    points_2d_hom = calib.P2 @ np.vstack([points_cam.T, np.ones((1, points_cam.shape[0]))])
    
    # Normalize homogeneous coordinates
    points_2d = points_2d_hom[:2, :] / points_2d_hom[2, :]
    
    return points_2d.T

3D Bounding Box Corner Generation¶

KITTI 3D bounding boxes are defined by center, dimensions, and rotation. The 8 corners are generated by the function in . This function:

  • Computes 8 corner points of a 3D bounding box from KITTI object parameters

  • Handles coordinate transformations between object-local, camera, and LiDAR coordinate systems

  • Follows KITTI’s standard corner ordering convention

  • Supports optional transformation to LiDAR coordinates using calibration data

Corner Ordering Convention¶

KITTI uses a specific ordering for the 8 corners of 3D bounding boxes:

    4 -------- 5
   /|         /|
  7 -------- 6 .
  | |        | |
  . 0 -------- 1
  |/         |/
  3 -------- 2

Bottom face: 0,1,2,3 (y = -h/2)
Top face:    4,5,6,7 (y = +h/2)

Corner Indices:

  • 0: Bottom-front-left

  • 1: Bottom-front-right

  • 2: Bottom-rear-right

  • 3: Bottom-rear-left

  • 4: Top-front-left

  • 5: Top-front-right

  • 6: Top-rear-right

  • 7: Top-rear-left

3D Bounding Box Processing¶

Object3d Class Implementation¶

The Object3d class encapsulates KITTI 3D object annotations and is implemented in as the class. This class:

  • Parses KITTI annotation lines into structured object data

  • Stores 2D and 3D bounding box parameters

  • Provides methods for coordinate transformations

  • Handles object type, visibility, and geometric properties

Key attributes:

  • type: Object category (Car, Pedestrian, Cyclist, etc.)

  • truncation, occlusion: Visibility indicators

  • alpha: Observation angle

  • xmin, ymin, xmax, ymax: 2D bounding box coordinates

  • h, w, l: 3D dimensions (height, width, length)

  • t: 3D center location in camera coordinates

  • ry: Rotation around Y-axis (yaw angle)

Calibration Class Implementation¶

The calibration class handles coordinate transformations and is implemented through the function in . The calibration system:

  • Loads calibration matrices from KITTI calibration files

  • Handles coordinate transformations between different reference frames

  • Supports both KITTI and WaymoKITTI calibration formats

  • Provides projection matrices for camera-to-image transformations

  • Manages LiDAR-to-camera coordinate conversions

Key transformation matrices:

  • P0, P1, P2, P3: Camera projection matrices (3x4)

  • R0_rect: Rectification matrix (3x3)

  • Tr_velo_to_cam: LiDAR to camera transformation (3x4)

  • Tr_imu_to_velo: IMU to LiDAR transformation (3x4)

Visualization Features¶

The KITTI toolkit provides comprehensive visualization capabilities implemented in :

1. 2D Image Visualization¶

Basic 2D Bounding Box Visualization¶

The 2D bounding box visualization is handled by the function, which:

  • Displays images with 2D bounding boxes overlaid

  • Supports multiple camera views simultaneously

  • Color-codes different object types

  • Handles object filtering and visibility checks

3D Bounding Box Projection to 2D¶

The 3D-to-2D projection visualization is implemented through the function, which: ‘b-’, linewidth=2)

  • Projects 3D bounding boxes onto 2D images

  • Handles coordinate transformations from 3D to 2D space

  • Draws wireframe representations of 3D boxes

  • Filters objects based on visibility and distance

2. 3D Point Cloud Visualization¶

LiDAR Point Cloud with 3D Bounding Boxes¶

The 3D LiDAR visualization is implemented through the function, which:

  • Renders LiDAR point clouds in 3D space using Open3D

  • Overlays 3D bounding boxes in LiDAR coordinate system

  • Supports color-coding by object type and height

  • Handles coordinate transformations between camera and LiDAR frames

  • Provides interactive 3D visualization capabilities

The 3D bounding box creation is handled by the function.

3. Bird’s Eye View (BEV) Visualization¶

The BEV visualization is implemented through the function, which:

  • Creates top-down view of LiDAR point cloud data

  • Projects 3D bounding boxes to 2D BEV coordinates

  • Color-codes points by height using viridis colormap

  • Draws object orientation arrows and labels

  • Supports configurable range and resolution parameters

4. Multi-Modal Visualization¶

The comprehensive multi-modal visualization is implemented through the function, which:

  • Creates multi-panel visualizations combining different data modalities

  • Displays 2D bounding boxes, 3D projections, and LiDAR overlays

  • Generates Bird’s Eye View representations

  • Supports intensity and distance mapping

  • Provides comprehensive sample analysis in a single view

The LiDAR-to-image projection is handled by coordinate transformation functions in .

Code Implementation¶

Complete KITTI Sample Processing Pipeline¶

The complete KITTI processing pipeline is implemented in with the following key components:

Main Processing Functions:¶

  • : Loads and parses KITTI label files

  • : Reads calibration parameters

  • : Loads LiDAR point cloud data

Coordinate System Transformations:¶

  • : Transforms between coordinate systems

  • Camera-to-LiDAR and LiDAR-to-camera transformations

  • 3D-to-2D projection handling

Visualization Pipeline:¶

  • Multi-modal data visualization combining images, LiDAR, and annotations

  • Comprehensive sample analysis and processing results

  • Export capabilities for processed data and visualizations

For detailed implementation examples and usage patterns, refer to the functions in .

Dataset Management¶

The KITTI toolkit provides comprehensive dataset management capabilities including downloading, extracting, and validating data. These functionalities are implemented in with the following key features:

Dataset Loading and Processing:¶

  • Automatic data loading from KITTI directory structure

  • Sample indexing and batch processing capabilities

  • Data validation and integrity checking

  • Support for both training and testing splits

File Management:¶

  • Structured file path handling for images, LiDAR, labels, and calibration

  • Automatic directory creation and organization

  • Error handling for missing or corrupted files

For specific usage examples and implementation details, see the data loading functions in . For complete dataset validation functionality, see which includes:

  • File count consistency checks

  • Sample file validation

  • Data integrity verification

  • Comprehensive error reporting

Summary¶

This tutorial has covered the essential aspects of working with the KITTI dataset for 3D object detection and autonomous driving research. The complete implementation is available in , which provides:

  • Data Loading: Efficient loading of images, LiDAR point clouds, labels, and calibration data

  • Coordinate Transformations: Functions for converting between LiDAR, camera, and image coordinate systems

  • Visualization Tools: Comprehensive visualization functions for 2D/3D bounding boxes, point clouds, and multi-modal data

  • Dataset Management: Tools for downloading, validating, and processing KITTI data

The modular design allows researchers to easily integrate KITTI data processing into their machine learning pipelines while maintaining code clarity and performance.

Summary¶

This tutorial has covered the essential aspects of working with KITTI datasets:

  • Data Structure: Understanding KITTI directory organization and file formats

  • Data Loading: Using functions for efficient data access

  • Coordinate Systems: Camera, LiDAR, and image coordinate transformations

  • Visualization: 2D/3D bounding boxes, point clouds, and multi-modal displays

  • Dataset Management: Validation, downloading, and processing tools

For complete implementation details, refer to the module which contains all the functions referenced in this tutorial.