Bird’s-Eye-View (BEV) Detection Tutorial¶

This tutorial covers Bird’s-Eye-View (BEV) detection methods, including LSS (Lift-Splat-Shoot) and BEVFusion architectures for 3D object detection.

LSS (Lift-Splat-Shoot)¶

LSS Bird’s-Eye-View Conversion¶

The folder mydetector3d/datasets/nuscenes/lss contains code for testing Bird’s-Eye-View conversion. It is based on the LSS model from the lift-splat-shoot repository.

Installation Requirements¶

Install the required dependencies:

pip install nuscenes-devkit tensorboardX efficientnet_pytorch==0.7.0

Training LSS Model¶

Perform LSS training on the nuScenes v1.0-mini dataset:

# File: mydetector3d/datasets/nuscenes/lss/lssmain.py
train('mini', dataroot='/data/cmpe249-fa22/nuScenes/nuScenesv1.0-mini/', nepochs=100, gpuid=0, logdir='./output/lss')

Model Evaluation¶

The pretrained model is saved at /data/cmpe249-fa22/Mymodels/lss_model525000.pt. Running the eval_model_iou function in mydetector3d/datasets/nuscenes/lss/lssexplore.py on this checkpoint reports the following loss and IoU:

{'loss': 0.09620507466204373, 'iou': 0.35671476137624863}
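The iou value is an intersection-over-union between the predicted and the ground-truth BEV occupancy masks. The following is a minimal sketch of that metric, assuming the model emits per-cell logits that are thresholded at zero. The helper name bev_iou and the toy grids are illustrative and are not taken from lssexplore.py.

```python
import numpy as np

def bev_iou(logits, target):
    """IoU between a thresholded BEV prediction and a binary target mask."""
    pred = logits > 0            # logit > 0 corresponds to probability > 0.5
    tgt = target.astype(bool)
    intersection = np.logical_and(pred, tgt).sum()
    union = np.logical_or(pred, tgt).sum()
    # Assumption: two empty masks count as a perfect match.
    return float(intersection) / float(union) if union > 0 else 1.0

# Toy 2x3 BEV grid: 2 cells overlap, 4 cells are covered by either mask.
logits = np.array([[2.0, -1.0, 0.5],
                   [1.5, -0.3, -2.0]])
target = np.array([[1, 1, 0],
                   [1, 0, 0]])
print(bev_iou(logits, target))
```

On this toy grid the intersection has 2 cells and the union has 4, so the IoU is 0.5.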

Map Configuration Issue¶

When running viz_model_preds, you may encounter a missing map file error:

No such file or directory: '/data/cmpe249-fa22/nuScenes/nuScenesv1.0-mini/maps/maps/expansion/singapore-hollandvillage.json'

To fix this issue, extract and copy the map expansion files:

(mycondapy39) [010796032@cs001 nuScenes]$ unzip nuScenes-map-expansion-v1.3.zip
Archive:  nuScenes-map-expansion-v1.3.zip
creating: basemap/
inflating: basemap/boston-seaport.png
inflating: basemap/singapore-hollandvillage.png
inflating: basemap/singapore-queenstown.png
inflating: basemap/singapore-onenorth.png
creating: expansion/
inflating: expansion/boston-seaport.json
inflating: expansion/singapore-onenorth.json
inflating: expansion/singapore-queenstown.json
inflating: expansion/singapore-hollandvillage.json
creating: prediction/
inflating: prediction/prediction_scenes.json
(mycondapy39) [010796032@cs001 nuScenes]$ cp -r expansion/ nuScenesv1.0-mini/maps/

Visualization Results¶

After fixing the map issue, the evaluation figures from viz_model_preds are saved as eval000000_000.jpg (format: f'eval{batchi:06}_{si:03}.jpg') in the root folder.
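As a small illustration of the naming pattern, the format string pads the batch index to six digits and the sample index to three. The wrapper function eval_image_name is a hypothetical helper added only for this example.

```python
def eval_image_name(batchi, si):
    """Build the output file name for sample `si` of batch `batchi`."""
    return f'eval{batchi:06}_{si:03}.jpg'

print(eval_image_name(0, 0))    # first sample of the first batch
print(eval_image_name(12, 3))   # fourth sample of the thirteenth batch
```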

Input image tensor shape: [4, 6, 3, 128, 352] (batch size, cameras, channels, height, width)

[Figure: LSS model prediction visualization - Sample 1]

[Figure: LSS model prediction visualization - Sample 2]

[Figure: LSS model prediction visualization - Sample 3]

LiDAR Calibration Check¶

The lidar_check function performs a visual verification to ensure extrinsics and intrinsics are parsed correctly:

Left: Input images with LiDAR scans projected using extrinsics and intrinsics
Middle: The projected LiDAR scan
Right: X-Y projection of the point cloud generated by the lift-splat model
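The projection shown in the left and middle panels can be sketched as follows. This is a minimal illustration, not the lidar_check implementation: it assumes a single 4x4 LiDAR-to-camera extrinsic matrix and a 3x3 pinhole intrinsic matrix, and the function name project_lidar_to_image is hypothetical.

```python
import numpy as np

def project_lidar_to_image(points_lidar, cam_from_lidar, intrinsics):
    """Project Nx3 LiDAR points to pixel coordinates.

    cam_from_lidar: 4x4 extrinsic matrix (LiDAR frame -> camera frame).
    intrinsics:     3x3 camera matrix K.
    Returns (pixels, depths) for the points in front of the camera.
    """
    n = points_lidar.shape[0]
    homogeneous = np.hstack([points_lidar, np.ones((n, 1))])   # N x 4
    points_cam = (cam_from_lidar @ homogeneous.T).T[:, :3]      # N x 3
    in_front = points_cam[:, 2] > 0                             # keep z > 0
    points_cam = points_cam[in_front]
    uvw = (intrinsics @ points_cam.T).T                         # M x 3
    pixels = uvw[:, :2] / uvw[:, 2:3]                           # perspective divide
    return pixels, points_cam[:, 2]

# Toy example: identity extrinsics, focal length 100 px, principal point (50, 40).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 40.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[1.0, 0.5, 10.0],    # in front of the camera
                [0.0, 0.0, -5.0]])   # behind the camera, dropped
pixels, depths = project_lidar_to_image(pts, np.eye(4), K)
print(pixels, depths)
```

If the extrinsics or intrinsics are parsed incorrectly, the projected points will not line up with object edges in the image, which is what the visual check is meant to reveal.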

[Figure: LiDAR calibration verification - Sample 1]

[Figure: LiDAR calibration verification - Sample 2]

Training Results¶

After training on the nuScenes v1.0-mini dataset with mydetector3d/datasets/nuscenes/lss/lssmain.py, checkpoints are saved in the output folder as model1000.pt and model8000.pt. Inference with model8000.pt yields a much lower IoU than the pretrained checkpoint evaluated earlier (0.118 versus 0.357):

{'loss': 0.23870943376311549, 'iou': 0.11804760577248166}

BEVFusion¶

BEVFusion code has been integrated into the mydetector3d framework for multi-modal 3D object detection.

BEVFusion Training¶

Training Configuration¶

Training Parameters (Updated: 10/21):

Config file: mydetector3d/tools/cfgs/nuscenes_models/bevfusion.yaml
Batch size: 4
Epochs: 128
Extra tag: 0522
Checkpoint: /data/cmpe249-fa22/Mymodels/nuscenes_models/bevfusion/0522/ckpt/latest_model.pth
Output folder: /data/cmpe249-fa22/Mymodels/

Available Models¶

(mycondapy310) [010796032@cs001 3DDepth]$ ls /data/cmpe249-fa22/Mymodels/nuscenes_models/
bevfusion  cbgs_pp_multihead
/data/cmpe249-fa22/Mymodels/nuscenes_models/cbgs_pp_multihead/0522/ckpt/checkpoint_epoch_128.pth
/data/cmpe249-fa22/Mymodels/nuscenes_models/bevfusion/0522/ckpt/checkpoint_epoch_56.pth  latest_model.pth

Training Command¶

(mycondapy310) [010796032@cs001 3DDepth]$ python ./mydetector3d/tools/mytrain.py --cfg_file='mydetector3d/tools/cfgs/nuscenes_models/bevfusion.yaml' --batch_size=4 --epochs=128 --extra_tag='0522' --ckpt='/data/cmpe249-fa22/Mymodels/nuscenes_models/bevfusion/0522/ckpt/latest_model.pth' --outputfolder='/data/cmpe249-fa22/Mymodels/'
2023-10-21 17:09:07,965   INFO  Train:   59/128 ( 46%) [4534/30895 ( 15%)]  Loss: 0.4369 (0.437)  LR: 5.738e-05  Time cost: 00:47/346:12:13 [00:47/28342:55:05]  Acc_iter 1796445     Data time: 10.99(10.99)  Forward time: 36.29(36.29)  Batch time: 47.28(47.28)

BEVFusion Evaluation¶

Evaluation Results - Custom Trained Model¶

(mycondapy310) [010796032@cs002 3DDepth]$ python mydetector3d/tools/myevaluatev2_nuscenes.py --cfg_file='mydetector3d/tools/cfgs/nuscenes_models/bevfusion.yaml' --ckpt='/data/cmpe249-fa22/Mymodels/nuscenes_models/bevfusion/0522/ckpt/checkpoint_epoch_56.pth' --tag='1021' --outputpath='/data/cmpe249-fa22/Mymodels/'

Dataset Statistics:

Ground truth annotations: 6,019 samples
Original predictions: 1,203,800 boxes
After distance filtering: 807,685 boxes
After LiDAR/RADAR filtering: 807,685 boxes
After bike rack filtering: 807,498 boxes
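A short worked calculation makes the statistics above easier to read. It uses only the counts listed; the variable names are illustrative.

```python
num_samples = 6019
stages = [
    ("original", 1_203_800),
    ("distance filter", 807_685),
    ("LiDAR/RADAR filter", 807_685),
    ("bike rack filter", 807_498),
]

# Average number of predicted boxes per sample before any filtering.
boxes_per_sample = stages[0][1] / num_samples

# Boxes removed by each filter, relative to the previous stage.
removed = {
    name: prev_count - count
    for (_, prev_count), (name, count) in zip(stages, stages[1:])
}
print(boxes_per_sample)
print(removed)
```

The original predictions work out to exactly 200 boxes per sample, which matches the 200 proposals per sample in the TransFusionHead prediction shapes. Distance filtering removes 396,115 boxes, the LiDAR/RADAR filter removes none, and bike rack filtering removes 187.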

Overall Performance Metrics:

mAP: 0.6215 (mean Average Precision)
mATE: 0.2811 (mean Average Translation Error)
mASE: 0.2565 (mean Average Scale Error)
mAOE: 0.3630 (mean Average Orientation Error)
mAVE: 0.2630 (mean Average Velocity Error)
mAAE: 0.1964 (mean Average Attribute Error)
NDS: 0.6747 (nuScenes Detection Score)
Evaluation time: 123.9s
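The NDS can be reproduced from the other metrics. In the nuScenes detection benchmark it is defined as NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10, where the sum runs over the five true-positive error metrics. The sketch below applies that formula to the numbers above; the function name is illustrative.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """NDS = (5 * mAP + sum(1 - min(1, err))) / 10 over the five TP errors."""
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0

errors = {"mATE": 0.2811, "mASE": 0.2565, "mAOE": 0.3630,
          "mAVE": 0.2630, "mAAE": 0.1964}
nds = nuscenes_detection_score(0.6215, errors)
print(nds)
```

The result agrees with the reported 0.6747 up to rounding of the inputs. Because mAP carries half of the total weight, a drop in mAP affects NDS more than an equal rise in any single error metric.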

Per-Class Performance:

Object Class	AP	ATE	ASE	AOE	AVE	AAE
car	0.867	0.182	0.155	0.064	0.242	0.187
truck	0.517	0.356	0.210	0.077	0.273	0.215
bus	0.704	0.339	0.185	0.076	0.505	0.267
trailer	0.427	0.482	0.213	0.775	0.208	0.181
construction_vehicle	0.257	0.630	0.439	0.877	0.146	0.350
pedestrian	0.856	0.128	0.286	0.351	0.209	0.089
motorcycle	0.678	0.206	0.235	0.382	0.333	0.268
bicycle	0.493	0.172	0.261	0.613	0.187	0.013
traffic_cone	0.755	0.122	0.316	nan	nan	nan
barrier	0.660	0.195	0.265	0.051	nan	nan
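The overall metrics are class averages of this table, with nan entries skipped (for example, traffic cones have no orientation, velocity, or attribute error). The sketch below recomputes them from the table values; the helper nan_mean is written for this example.

```python
import math

nan = float("nan")
metric_names = ["AP", "ATE", "ASE", "AOE", "AVE", "AAE"]
# Rows copied from the per-class table above.
per_class = {
    "car":                  (0.867, 0.182, 0.155, 0.064, 0.242, 0.187),
    "truck":                (0.517, 0.356, 0.210, 0.077, 0.273, 0.215),
    "bus":                  (0.704, 0.339, 0.185, 0.076, 0.505, 0.267),
    "trailer":              (0.427, 0.482, 0.213, 0.775, 0.208, 0.181),
    "construction_vehicle": (0.257, 0.630, 0.439, 0.877, 0.146, 0.350),
    "pedestrian":           (0.856, 0.128, 0.286, 0.351, 0.209, 0.089),
    "motorcycle":           (0.678, 0.206, 0.235, 0.382, 0.333, 0.268),
    "bicycle":              (0.493, 0.172, 0.261, 0.613, 0.187, 0.013),
    "traffic_cone":         (0.755, 0.122, 0.316, nan, nan, nan),
    "barrier":              (0.660, 0.195, 0.265, 0.051, nan, nan),
}

def nan_mean(values):
    """Mean over the entries that are not nan."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)

means = {name: nan_mean([row[i] for row in per_class.values()])
         for i, name in enumerate(metric_names)}
for name, value in means.items():
    print(f"m{name}: {value:.4f}")
```

The recomputed means agree with the overall metrics to about three decimal places; the small differences come from the table values being rounded.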

Evaluation Results - Pretrained Model¶

(mycondapy310) [010796032@cs002 3DDepth]$ python mydetector3d/tools/myevaluatev2_nuscenes.py --cfg_file='mydetector3d/tools/cfgs/nuscenes_models/bevfusion.yaml' --ckpt='/data/cmpe249-fa23/modelzoo/cbgs_bevfusion.pth' --tag='1022' --outputpath='/data/cmpe249-fa22/Mymodels/'

Model Loading Issues:

==> Loading parameters from checkpoint /data/cmpe249-fa23/modelzoo/cbgs_bevfusion.pth to cuda:0
Not updated weight backbone_3d.conv1.0.conv1.bias: torch.Size([16])
[... additional weight loading warnings ...]
==> Done (loaded 582/599)
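The "loaded 582/599" line means that 17 of the model's 599 parameter tensors were not filled from the checkpoint. A minimal sketch of how such a report can be produced is shown below, assuming the loader matches parameters by name and shape. The shapes and the function split_loadable are hypothetical; with PyTorch the two dictionaries would typically be built from the model's and the checkpoint's state dicts.

```python
def split_loadable(model_shapes, ckpt_shapes):
    """Split model parameters into loaded and not-updated, by name and shape."""
    loaded, not_updated = [], []
    for name, shape in model_shapes.items():
        if ckpt_shapes.get(name) == shape:
            loaded.append(name)
        else:
            not_updated.append(name)
    return loaded, not_updated

# Hypothetical shapes: the bias exists in the model but not in the checkpoint.
model_shapes = {
    "backbone_3d.conv1.0.conv1.weight": (16, 3, 3, 3, 16),
    "backbone_3d.conv1.0.conv1.bias": (16,),
}
ckpt_shapes = {
    "backbone_3d.conv1.0.conv1.weight": (16, 3, 3, 3, 16),
}

loaded, not_updated = split_loadable(model_shapes, ckpt_shapes)
for name in not_updated:
    print(f"Not updated weight {name}: {model_shapes[name]}")
print(f"==> Done (loaded {len(loaded)}/{len(model_shapes)})")
```

Parameters that are not updated keep their random initialization, so it is worth checking this list before interpreting the evaluation numbers.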

Performance Metrics (Pretrained Model):

mAP: 0.2364
mATE: 0.7516
mASE: 0.6989
mAOE: 0.6777
mAVE: 0.6240
mAAE: 0.4523
NDS: 0.2977
Evaluation time: 100.4s

BEVFusion Architecture Overview¶

The BEVFusion model forward process consists of the following major components:

1. MeanVFE (Voxel Feature Encoder)¶

Input: voxel_features([600911, 10, 5]), voxel_num_points([600911])
Output: batch_dict['voxel_features'] = points_mean.contiguous() #[600911, 5]
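The shape change from [600911, 10, 5] to [600911, 5] comes from averaging the points inside each voxel. The following NumPy sketch shows the idea, assuming that unused point slots are zero-padded; it is an illustration rather than the framework's code.

```python
import numpy as np

def mean_vfe(voxel_features, voxel_num_points):
    """Average the valid points of each voxel.

    voxel_features:   [num_voxels, max_points, C], zero-padded past the valid points.
    voxel_num_points: [num_voxels], number of valid points per voxel.
    """
    points_sum = voxel_features.sum(axis=1)                      # [num_voxels, C]
    normalizer = np.maximum(voxel_num_points, 1).reshape(-1, 1)  # avoid divide-by-zero
    return points_sum / normalizer

# Two voxels, up to 3 points each, 2 features per point.
voxels = np.array([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],   # 2 valid points
                   [[5.0, 6.0], [0.0, 0.0], [0.0, 0.0]]])  # 1 valid point
counts = np.array([2, 1])
print(mean_vfe(voxels, counts))
```

Dividing by the number of valid points rather than by the padded length keeps the padding from biasing the mean toward zero.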

2. VoxelResBackBone8x (3D Backbone)¶

Input: voxel_features([600911, 5]), voxel_coords([600911, 4])
Output:
- batch_dict['encoded_spconv_tensor']: out([2, 180, 180])
- batch_dict['encoded_spconv_tensor_stride']: 8
- batch_dict['multi_scale_3d_features']

3. HeightCompression (BEV Mapping Module)¶

Input: encoded_spconv_tensor (Sparse [2, 180, 180])
Output:
- batch_dict['spatial_features']: [6, 256, 180, 180]
- batch_dict['spatial_features_stride']: 8
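Height compression turns the 3D feature volume into a 2D BEV map by folding the height axis into the channel axis. The sketch below assumes the densified tensor has layout [N, C, D, H, W] with C = 128 channels and D = 2 height bins, which would give the 256 BEV channels listed above; the channel count is an assumption, not a value read from the config.

```python
import numpy as np

def height_compression(dense_features):
    """Fold the height axis into channels: [N, C, D, H, W] -> [N, C * D, H, W]."""
    n, c, d, h, w = dense_features.shape
    return dense_features.reshape(n, c * d, h, w)

# A small H x W is used here to keep the example light.
dense = np.zeros((6, 128, 2, 18, 18), dtype=np.float32)
print(height_compression(dense).shape)
```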

4. SwinTransformer (Image Backbone)¶

Input: batch_dict['camera_imgs'] #[6, 6, 3, 256, 704]
Output: batch_dict['image_features'] (3 items):
- [36, 192, 32, 88]
- [36, 384, 16, 44]
- [36, 768, 8, 22]
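The output shapes follow from simple arithmetic: the batch and camera axes are flattened (6 x 6 = 36), and each stage downsamples the 256 x 704 input by a fixed stride. The strides 8, 16, and 32 used below are inferred from the listed shapes rather than read from the config, and the helper function is illustrative.

```python
def backbone_feature_shapes(batch, cameras, height, width, channels, strides):
    """Expected [B*N, C, H/stride, W/stride] shape for each backbone stage."""
    return [[batch * cameras, c, height // s, width // s]
            for c, s in zip(channels, strides)]

shapes = backbone_feature_shapes(6, 6, 256, 704,
                                 channels=(192, 384, 768), strides=(8, 16, 32))
for shape in shapes:
    print(shape)
```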

5. GeneralizedLSSFPN (Feature Pyramid Network)¶

Input: batch_dict['image_features']
Output: batch_dict['image_fpn'] (2 items):
- [36, 256, 32, 88]
- [36, 256, 16, 44]

6. DepthLSSTransform (View Transformation)¶

Lifts image features into 3D using a predicted depth distribution and splats them onto the BEV grid (adapted from BEVFusion).

Input:
- batch_dict['image_fpn']: [6, 6, 256, 32, 88]
- batch_dict['points']: [1456967, 6]
Output: batch_dict['spatial_features_img']: [6, 80, 180, 180]
Components: dtransform, depthnet, downsample
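The lift and splat steps can be sketched in NumPy for a single camera. This is a simplified illustration under stated assumptions: the BEV cell index of every frustum point is taken as precomputed (in practice it follows from the camera geometry), the LiDAR depth input handled by dtransform is omitted, and the function names lift and splat are hypothetical.

```python
import numpy as np

def lift(features, depth_logits):
    """Lift: weight image features [C, H, W] by a softmax depth distribution [D, H, W]."""
    shifted = depth_logits - depth_logits.max(axis=0, keepdims=True)
    depth_prob = np.exp(shifted)
    depth_prob /= depth_prob.sum(axis=0, keepdims=True)
    # Outer product over the depth and channel axes -> [D, C, H, W] frustum features.
    return depth_prob[:, None, :, :] * features[None, :, :, :]

def splat(frustum_features, cell_index, num_cells):
    """Splat: sum-pool frustum features into flat BEV cells.

    cell_index: [D, H, W] integer BEV cell per frustum point, -1 if outside the grid.
    """
    d, c, h, w = frustum_features.shape
    flat_features = frustum_features.transpose(0, 2, 3, 1).reshape(-1, c)  # [D*H*W, C]
    flat_index = cell_index.reshape(-1)
    keep = flat_index >= 0
    bev = np.zeros((num_cells, c))
    np.add.at(bev, flat_index[keep], flat_features[keep])  # unbuffered scatter-add
    return bev.T                                            # [C, num_cells]

# Toy case: C=2 channels, D=2 depth bins, a 1x2 image, 2 BEV cells.
features = np.array([[[1.0, 2.0]], [[3.0, 4.0]]])
depth_logits = np.zeros((2, 1, 2))               # uniform depth distribution
cell_index = np.array([[[0, 1]], [[0, -1]]])     # last frustum point is off-grid
bev = splat(lift(features, depth_logits), cell_index, num_cells=2)
print(bev)
```

In the toy case both depth bins of the first pixel land in cell 0, so that cell recovers the pixel's full feature, while the second pixel contributes only half of its feature to cell 1 because its other depth bin falls outside the grid.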

8. BaseBEVBackbone (2D Backbone)¶

Input: spatial_features = data_dict['spatial_features']: [6, 256, 180, 180]
Output: data_dict['spatial_features_2d']: [6, 512, 180, 180]

9. TransFusionHead (Detection Head)¶

Loss Functions:
- loss_cls: SigmoidFocalClassificationLoss()
- loss_bbox: L1Loss()
- loss_heatmap: GaussianFocalLoss()
Input: feats = batch_dict['spatial_features_2d']: [6, 512, 180, 180]
Predictions:
- 'center': [6, 2, 200]
- 'height': [6, 1, 200]
- 'dim': [6, 3, 200]
- 'rot': [6, 2, 200]
- 'vel': [6, 2, 200]
- 'heatmap': [6, 10, 200]
- 'query_heatmap_score': [6, 10, 200]
- 'dense_heatmap': [6, 10, 180, 180]
Loss Computation: loss, tb_dict = self.loss(gt_bboxes_3d [6, 51, 9], gt_labels_3d [6, 51], res)
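The 200 entries in the last dimension of the predictions correspond to object queries selected from the dense heatmap. A simplified sketch of that selection for one sample is shown below; it picks the 200 highest-scoring (class, y, x) cells and omits the local-maximum filtering that heads of this kind commonly apply first. The function name and the random heatmap are illustrative.

```python
import numpy as np

def select_top_proposals(dense_heatmap, num_proposals):
    """Pick the highest-scoring (class, y, x) cells from a [num_classes, H, W] heatmap."""
    num_classes, h, w = dense_heatmap.shape
    flat = dense_heatmap.reshape(-1)
    top = np.argsort(-flat)[:num_proposals]   # flat indices by descending score
    classes = top // (h * w)                  # class index of each proposal
    cells = top % (h * w)                     # position inside the H x W grid
    return classes, cells // w, cells % w, flat[top]

rng = np.random.default_rng(0)
heatmap = rng.random((10, 180, 180))          # stand-in for one sample's dense_heatmap
classes, ys, xs, scores = select_top_proposals(heatmap, 200)
print(classes.shape, ys.shape, xs.shape, scores.shape)
```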

MMDetection3D Integration¶

Installation Guide¶

Reference: MMDetection3D Installation

Step-by-Step Installation¶

Install OpenMMLab Package Manager:

(mycondapy310) [010796032@coe-hpc2 3DDepth]$ pip install -U openmim

Install MMEngine:

(mycondapy310) [010796032@coe-hpc2 3DDepth]$ mim install mmengine
Looking in links: https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/index.html
....
Successfully installed addict-2.4.0 mmengine-0.9.0 opencv-python-4.8.1.78 platformdirs-3.11.0 yapf-0.40.2

Install MMCV:

(mycondapy310) [010796032@coe-hpc2 3DDepth]$ mim install 'mmcv>=2.0.0rc4'
Looking in links: https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/index.html
Collecting mmcv>=2.0.0rc4
  Downloading https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/mmcv-2.1.0-cp310-cp310-manylinux1_x86_64.whl (98.6 MB)
Successfully installed mmcv-2.1.0

Install MMDetection:

(mycondapy310) [010796032@coe-hpc2 3DDepth]$ mim install 'mmdet>=3.0.0'
Looking in links: https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/index.html
Collecting mmdet>=3.0.0
  Downloading mmdet-3.2.0-py3-none-any.whl (2.1 MB)
Successfully installed mmdet-3.2.0 terminaltables-3.1.10

Clone and Install MMDetection3D:

(mycondapy310) [010796032@coe-hpc2 3DObject]$ git clone https://github.com/open-mmlab/mmdetection3d.git -b dev-1.x

Handling Installation Issues¶

If you encounter Open3D installation issues:

ERROR: No matching distribution found for open3d
(mycondapy310) [010796032@coe-hpc2 mmdetection3d]$ nano requirements/runtime.txt #comment out open3d
(mycondapy310) [010796032@coe-hpc2 mmdetection3d]$ pip install -v -e .
Successfully installed black-23.10.0 flake8-6.1.0 iniconfig-2.0.0 lyft_dataset_sdk-0.0.8 matplotlib-3.5.3 mccabe-0.7.0 mmdet3d-1.2.0 mypy-extensions-1.0.0 pathspec-0.11.2 plotly-5.17.0 pluggy-1.3.0 plyfile-1.0.1 pycodestyle-2.11.1 pyflakes-3.1.0 pytest-7.4.2 tenacity-8.2.3 trimesh-4.0.0

Model Download and Testing¶

Download Pretrained Model¶

(mycondapy310) [010796032@coe-hpc2 mmdetection3d]$ mim download mmdet3d --config pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car --dest .
processing pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car...
downloading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.4/18.4 MiB 117.4 MB/s eta 0:00:00
Successfully downloaded hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth to /lts/home/010796032/3DObject/mmdetection3d
Successfully dumped pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py to /lts/home/010796032/3DObject/mmdetection3d

Run Point Cloud Demo¶

(mycondapy310) [010796032@cs001 mmdetection3d]$ python demo/pcd_demo.py demo/data/kitti/000008.bin pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth --no-save-vis

Sample Detection Results¶

(mycondapy310) [010796032@cs001 mmdetection3d]$ cat outputs/preds/000008.json 
{
  "labels_3d": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
  "scores_3d": [0.9750590920448303, 0.9682098627090454, 0.9457541108131409, 0.8904030919075012, 0.8890073299407959, 0.7703604698181152, 0.7550405859947205, 0.7058141827583313, 0.5811426639556885, 0.44102343916893005], 
  "bboxes_3d": [
    [14.75867748260498, -1.0537946224212646, -1.5589320659637451, 3.7562406063079834, 1.6059986352920532, 1.558688998222351, -0.31321752071380615],
    [6.438138961791992, -3.8679745197296143, -1.7354645729064941, 3.147707223892212, 1.4599915742874146, 1.4284530878067017, -0.2998310327529907],
    [8.112329483032227, 1.216971516609192, -1.6341216564178467, 3.6662495136260986, 1.573140025138855, 1.5916767120361328, 2.8161733150482178],
    [20.169925689697266, -8.43094253540039, -1.6689856052398682, 2.381495237350464, 1.51751708984375, 1.5693042278289795, -0.3255223035812378],
    [33.455665588378906, -7.035743236541748, -1.3376567363739014, 4.213741302490234, 1.744563102722168, 1.6697136163711548, 2.828497886657715],
    [55.621891021728516, -20.328449249267578, -1.3771171569824219, 4.370689392089844, 1.7358696460723877, 1.7066415548324585, 2.8504137992858887],
    [3.637699842453003, 2.7381889820098877, -1.6892050504684448, 3.7212045192718506, 1.5820955038070679, 1.51765775680542, -0.2304447889328003],
    [25.04075050354004, -10.156379699707031, -1.6326467990875244, 3.739389181137085, 1.6084976196289062, 1.4840202331542969, -0.32967936992645264],
    [28.72532081604004, -1.552423357963562, -1.202379822731018, 3.69446063041687, 1.5429767370224, 1.5610381364822388, 1.2416549921035767],
    [40.87098693847656, -9.748966217041016, -1.3669469356536865, 3.8333828449249268, 1.6528679132461548, 1.5699278116226196, -0.28837358951568604]
  ], 
  "box_type_3d": "LiDAR"
}
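The prediction file is plain JSON and is easy to post-process. The sketch below filters detections by score; it uses the first and last detections from the output above with values truncated for brevity. The helper filter_detections is hypothetical, and the box layout is assumed to be (x, y, z, x_size, y_size, z_size, yaw) in LiDAR coordinates.

```python
import json

def filter_detections(result, score_threshold):
    """Keep detections whose score is at least `score_threshold`."""
    return [
        {"label": label, "score": score, "box": box}
        for label, score, box in zip(result["labels_3d"],
                                     result["scores_3d"],
                                     result["bboxes_3d"])
        if score >= score_threshold
    ]

sample = json.loads("""
{
  "labels_3d": [0, 0],
  "scores_3d": [0.9750, 0.4410],
  "bboxes_3d": [
    [14.76, -1.05, -1.56, 3.76, 1.61, 1.56, -0.31],
    [40.87, -9.75, -1.37, 3.83, 1.65, 1.57, -0.29]
  ],
  "box_type_3d": "LiDAR"
}
""")
kept = filter_detections(sample, score_threshold=0.5)
for det in kept:
    print(det["label"], det["score"], det["box"][:3])
```

With a threshold of 0.5, nine of the ten detections in the full output would be kept, since only the last one (score 0.441) falls below it.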

This tutorial covered BEV detection with LSS and BEVFusion in the mydetector3d framework, including training and evaluation on nuScenes, followed by MMDetection3D installation and a PointPillars point cloud demo on KITTI.