# Bird's-Eye-View (BEV) Detection Tutorial

This tutorial covers Bird's-Eye-View (BEV) detection methods, including LSS (Lift-Splat-Shoot) and BEVFusion architectures for 3D object detection.

## LSS (Lift-Splat-Shoot)

### LSS Bird's-Eye-View Conversion

We have added a new folder (`mydetector3d/datasets/nuscenes/lss`) to test Bird's-Eye-View conversion based on the LSS model from [lift-splat-shoot](https://github.com/nv-tlabs/lift-splat-shoot/tree/master).

#### Installation Requirements

Install the required dependencies:

```bash
pip install nuscenes-devkit tensorboardX efficientnet_pytorch==0.7.0
```

#### Training LSS Model

Perform LSS training on the nuScenes v1.0-mini dataset:

```python
# File: mydetector3d/datasets/nuscenes/lss/lssmain.py
train('mini', dataroot='/data/cmpe249-fa22/nuScenes/nuScenesv1.0-mini/', nepochs=100, gpuid=0, logdir='./output/lss')
```

#### Model Evaluation

The pretrained model is saved at `/data/cmpe249-fa22/Mymodels/lss_model525000.pt`. Use the **eval_model_iou** function in `mydetector3d/datasets/nuscenes/lss/lssexplore.py` for inference:

```bash
{'loss': 0.09620507466204373, 'iou': 0.35671476137624863}
```

#### Map Configuration Issue

When running **viz_model_preds**, you may encounter a missing map file error:

```
No such file or directory: '/data/cmpe249-fa22/nuScenes/nuScenesv1.0-mini/maps/maps/expansion/singapore-hollandvillage.json'
```

To fix this issue, extract and copy the map expansion files:

```bash
(mycondapy39) [010796032@cs001 nuScenes]$ unzip nuScenes-map-expansion-v1.3.zip
Archive:  nuScenes-map-expansion-v1.3.zip
   creating: basemap/
  inflating: basemap/boston-seaport.png
  inflating: basemap/singapore-hollandvillage.png
  inflating: basemap/singapore-queenstown.png
  inflating: basemap/singapore-onenorth.png
   creating: expansion/
  inflating: expansion/boston-seaport.json
  inflating: expansion/singapore-onenorth.json
  inflating: expansion/singapore-queenstown.json
  inflating: expansion/singapore-hollandvillage.json
   creating: prediction/
  inflating: prediction/prediction_scenes.json
(mycondapy39) [010796032@cs001 nuScenes]$ cp -r expansion/ nuScenesv1.0-mini/maps/
```

#### Visualization Results

After fixing the map issue, the evaluation figures from **viz_model_preds** are saved as `eval000000_000.jpg` (format: `f'eval{batchi:06}_{si:03}.jpg'`) in the root folder.

**Image dimensions:** [4, 6, 3, 128, 352]

![LSS Visualization 1](imgs/3D/eval000008_001.jpg)
*LSS model prediction visualization - Sample 1*

![LSS Visualization 2](imgs/3D/eval000006_001.jpg)
*LSS model prediction visualization - Sample 2*

![LSS Visualization 3](imgs/3D/eval000011_001.jpg)
*LSS model prediction visualization - Sample 3*

#### LiDAR Calibration Check

The **lidar_check** function performs a visual verification to ensure extrinsics and intrinsics are parsed correctly:

- **Left:** Input images with LiDAR scans projected using extrinsics and intrinsics
- **Middle:** The projected LiDAR scan
- **Right:** X-Y projection of the point cloud generated by the lift-splat model

![LiDAR Check 1](imgs/3D/lcheck000_00023_00.jpg)
*LiDAR calibration verification - Sample 1*

![LiDAR Check 2](imgs/3D/lcheck000_00027_00.jpg)
*LiDAR calibration verification - Sample 2*

#### Training Results

After completing training on the nuScenes v1.0-mini dataset using `mydetector3d/datasets/nuscenes/lss/lssmain.py`, the models are saved in the output folder as `model1000.pt` and `model8000.pt`. Using `model8000.pt` for inference yields:

```bash
{'loss': 0.23870943376311549, 'iou': 0.11804760577248166}
```

## BEVFusion

BEVFusion code has been integrated into the mydetector3d framework for multi-modal 3D object detection.
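Both the LSS model above and the camera branch of BEVFusion rely on a "splat" step, in which features attached to 3D points are pooled into the cells of a BEV grid and the height axis is collapsed. The following is a minimal pure-Python sketch of that idea, not code from either repository. The function name `splat_to_bev`, the grid bounds, the cell size, and the use of a single scalar feature per point are all illustrative assumptions, and sum pooling is used here for simplicity.

```python
from collections import defaultdict


def splat_to_bev(points, x_range=(-4.0, 4.0), y_range=(-4.0, 4.0), cell=2.0):
    """Sum-pool per-point features into BEV grid cells (hypothetical helper).

    points: iterable of (x, y, z, feature) tuples in ego coordinates.
    Returns a dict mapping (ix, iy) cell indices to the summed feature.
    """
    # Number of cells along each axis of the illustrative grid.
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)

    grid = defaultdict(float)
    for x, y, _z, feat in points:
        # Floor division maps a metric coordinate to a cell index.
        ix = int((x - x_range[0]) // cell)
        iy = int((y - y_range[0]) // cell)
        # Points outside the grid are dropped; z is ignored (height collapse).
        if 0 <= ix < nx and 0 <= iy < ny:
            grid[(ix, iy)] += feat
    return dict(grid)


example_points = [
    (0.5, 0.5, 0.0, 1.0),   # falls in cell (2, 2)
    (1.5, 0.2, 1.0, 2.0),   # same cell, different height
    (-3.0, 3.0, 0.5, 0.5),  # falls in cell (0, 3)
    (10.0, 0.0, 0.0, 9.0),  # outside the grid, dropped
]
bev = splat_to_bev(example_points)
print(bev)
```

The two points that share a cell are summed regardless of their height, which is why the resulting BEV map has no vertical dimension.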
### BEVFusion Training

#### Training Configuration

**Training Parameters (Updated: 10/21):**

- Config file: `mydetector3d/tools/cfgs/nuscenes_models/bevfusion.yaml`
- Batch size: 4
- Epochs: 128
- Extra tag: 0522
- Checkpoint: `/data/cmpe249-fa22/Mymodels/nuscenes_models/bevfusion/0522/ckpt/latest_model.pth`
- Output folder: `/data/cmpe249-fa22/Mymodels/`

#### Available Models

```bash
(mycondapy310) [010796032@cs001 3DDepth]$ ls /data/cmpe249-fa22/Mymodels/nuscenes_models/
bevfusion  cbgs_pp_multihead
/data/cmpe249-fa22/Mymodels/nuscenes_models/cbgs_pp_multihead/0522/ckpt/checkpoint_epoch_128.pth
/data/cmpe249-fa22/Mymodels/nuscenes_models/bevfusion/0522/ckpt/checkpoint_epoch_56.pth  latest_model.pth
```

#### Training Command

```bash
(mycondapy310) [010796032@cs001 3DDepth]$ python ./mydetector3d/tools/mytrain.py --cfg_file='mydetector3d/tools/cfgs/nuscenes_models/bevfusion.yaml' --batch_size=4 --epochs=128 --extra_tag='0522' --ckpt='/data/cmpe249-fa22/Mymodels/nuscenes_models/bevfusion/0522/ckpt/latest_model.pth' --outputfolder='/data/cmpe249-fa22/Mymodels/'
023-10-21 17:09:07,965 INFO Train: 59/128 ( 46%) [4534/30895 ( 15%)] Loss: 0.4369 (0.437) LR: 5.738e-05 Time cost: 00:47/346:12:13 [00:47/28342:55:05] Acc_iter 1796445 Data time: 10.99(10.99) Forward time: 36.29(36.29) Batch time: 47.28(47.28)
```

### BEVFusion Evaluation

#### Evaluation Results - Custom Trained Model

```bash
(mycondapy310) [010796032@cs002 3DDepth]$ python mydetector3d/tools/myevaluatev2_nuscenes.py --cfg_file='mydetector3d/tools/cfgs/nuscenes_models/bevfusion.yaml' --ckpt='/data/cmpe249-fa22/Mymodels/nuscenes_models/bevfusion/0522/ckpt/checkpoint_epoch_56.pth' --tag='1021' --outputpath='/data/cmpe249-fa22/Mymodels/'
```

**Dataset Statistics:**

- Ground truth annotations: 6,019 samples
- Original predictions: 1,203,800 boxes
- After distance filtering: 807,685 boxes
- After LiDAR/RADAR filtering: 807,685 boxes
- After bike rack filtering: 807,498 boxes

**Overall Performance Metrics:**

- **mAP:** 0.6215
- **mATE:** 0.2811 (Average Translation Error)
- **mASE:** 0.2565 (Average Scale Error)
- **mAOE:** 0.3630 (Average Orientation Error)
- **mAVE:** 0.2630 (Average Velocity Error)
- **mAAE:** 0.1964 (Average Attribute Error)
- **NDS:** 0.6747 (nuScenes Detection Score)
- **Evaluation time:** 123.9s

**Per-Class Performance:**

| Object Class | AP | ATE | ASE | AOE | AVE | AAE |
|--------------|----|----|----|----|----|----|
| car | 0.867 | 0.182 | 0.155 | 0.064 | 0.242 | 0.187 |
| truck | 0.517 | 0.356 | 0.210 | 0.077 | 0.273 | 0.215 |
| bus | 0.704 | 0.339 | 0.185 | 0.076 | 0.505 | 0.267 |
| trailer | 0.427 | 0.482 | 0.213 | 0.775 | 0.208 | 0.181 |
| construction_vehicle | 0.257 | 0.630 | 0.439 | 0.877 | 0.146 | 0.350 |
| pedestrian | 0.856 | 0.128 | 0.286 | 0.351 | 0.209 | 0.089 |
| motorcycle | 0.678 | 0.206 | 0.235 | 0.382 | 0.333 | 0.268 |
| bicycle | 0.493 | 0.172 | 0.261 | 0.613 | 0.187 | 0.013 |
| traffic_cone | 0.755 | 0.122 | 0.316 | nan | nan | nan |
| barrier | 0.660 | 0.195 | 0.265 | 0.051 | nan | nan |

#### Evaluation Results - Pretrained Model

```bash
(mycondapy310) [010796032@cs002 3DDepth]$ python mydetector3d/tools/myevaluatev2_nuscenes.py --cfg_file='mydetector3d/tools/cfgs/nuscenes_models/bevfusion.yaml' --ckpt='/data/cmpe249-fa23/modelzoo/cbgs_bevfusion.pth' --tag='1022' --outputpath='/data/cmpe249-fa22/Mymodels/'
```

**Model Loading Issues:**

```
==> Loading parameters from checkpoint /data/cmpe249-fa23/modelzoo/cbgs_bevfusion.pth to cuda:0
Not updated weight backbone_3d.conv1.0.conv1.bias: torch.Size([16])
[... additional weight loading warnings ...]
==> Done (loaded 582/599)
```

Only 582 of the 599 parameter tensors were loaded from this checkpoint; the unmatched weights may account for the lower scores below.

**Performance Metrics (Pretrained Model):**

- **mAP:** 0.2364
- **mATE:** 0.7516
- **mASE:** 0.6989
- **mAOE:** 0.6777
- **mAVE:** 0.6240
- **mAAE:** 0.4523
- **NDS:** 0.2977
- **Evaluation time:** 100.4s

### BEVFusion Architecture Overview

The BEVFusion model forward process consists of the following major components:

#### 1. MeanVFE (Voxel Feature Encoder)

- **Input:** `voxel_features([600911, 10, 5])`, `voxel_num_points([600911])`
- **Output:** `batch_dict['voxel_features'] = points_mean.contiguous()` `#[600911, 5]`

#### 2. VoxelResBackBone8x (3D Backbone)

- **Input:** `voxel_features([600911, 5])`, `voxel_coords([600911, 4])`
- **Output:**
  - `batch_dict['encoded_spconv_tensor']`: `out([2, 180, 180])`
  - `batch_dict['encoded_spconv_tensor_stride']`: 8
  - `batch_dict['multi_scale_3d_features']`

#### 3. HeightCompression (BEV Mapping Module)

- **Input:** `encoded_spconv_tensor` (Sparse `[2, 180, 180]`)
- **Output:**
  - `batch_dict['spatial_features']`: `[6, 256, 180, 180]`
  - `batch_dict['spatial_features_stride']`: 8

#### 4. SwinTransformer (Image Backbone)

- **Input:** `batch_dict['camera_imgs']` `#[6, 6, 3, 256, 704]`
- **Output:** `batch_dict['image_features']` (3 items):
  - `[36, 192, 32, 88]`
  - `[36, 384, 16, 44]`
  - `[36, 768, 8, 22]`

#### 5. GeneralizedLSSFPN (Feature Pyramid Network)

- **Input:** `batch_dict['image_features']`
- **Output:** `batch_dict['image_fpn']` (2 items):
  - `[36, 256, 32, 88]`
  - `[36, 256, 16, 44]`

#### 6. DepthLSSTransform (View Transformation)

Lifts images into 3D and splats them onto BEV features (from [BEVFusion](https://github.com/mit-han-lab/bevfusion/)).

- **Input:**
  - `batch_dict['image_fpn']`: `[6, 6, 256, 32, 88]`
  - `batch_dict['points']`: `[1456967, 6]`
- **Output:** `batch_dict['spatial_features_img']`: `[6, 80, 180, 180]`
- **Components:** dtransform, depthnet, downsample

#### 7. ConvFuser (Multi-Modal Fusion)

- **Input:**
  - `img_bev = batch_dict['spatial_features_img']`: `[6, 80, 180, 180]`
  - `lidar_bev = batch_dict['spatial_features']`: `[6, 256, 180, 180]`
- **Process:** `cat_bev = torch.cat([img_bev, lidar_bev], dim=1)`, which gives 80 + 256 = 336 channels before the fuser reduces them to 256
- **Output:** `batch_dict['spatial_features'] = mm_bev`: `[6, 256, 180, 180]`

#### 8. BaseBEVBackbone (2D Backbone)

- **Input:** `spatial_features = data_dict['spatial_features']`: `[6, 256, 180, 180]`
- **Output:** `data_dict['spatial_features_2d']`: `[6, 512, 180, 180]`

#### 9. TransFusionHead (Detection Head)

- **Loss Functions:**
  - `loss_cls`: SigmoidFocalClassificationLoss()
  - `loss_bbox`: L1Loss()
  - `loss_heatmap`: GaussianFocalLoss()
- **Input:** `feats = batch_dict['spatial_features_2d']`: `[6, 512, 180, 180]`
- **Predictions:**
  - `'center'`: `[6, 2, 200]`
  - `'height'`: `[6, 1, 200]`
  - `'dim'`: `[6, 3, 200]`
  - `'rot'`: `[6, 2, 200]`
  - `'vel'`: `[6, 2, 200]`
  - `'heatmap'`: `[6, 10, 200]`
  - `'query_heatmap_score'`: `[6, 10, 200]`
  - `'dense_heatmap'`: `[6, 10, 180, 180]`
- **Loss Computation:** `loss, tb_dict = self.loss(gt_bboxes_3d [6, 51, 9], gt_labels_3d [6, 51], res)`

## MMDetection3D Integration

### Installation Guide

Reference: [MMDetection3D Installation](https://mmdetection3d.readthedocs.io/en/latest/get_started.html#installation)

#### Step-by-Step Installation

1. **Install OpenMMLab Package Manager:**

   ```bash
   (mycondapy310) [010796032@coe-hpc2 3DDepth]$ pip install -U openmim
   ```

2. **Install MMEngine:**

   ```bash
   (mycondapy310) [010796032@coe-hpc2 3DDepth]$ mim install mmengine
   Looking in links: https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/index.html
   ....
   Successfully installed addict-2.4.0 mmengine-0.9.0 opencv-python-4.8.1.78 platformdirs-3.11.0 yapf-0.40.2
   ```

3. **Install MMCV:**

   ```bash
   (mycondapy310) [010796032@coe-hpc2 3DDepth]$ mim install 'mmcv>=2.0.0rc4'
   Looking in links: https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/index.html
   Collecting mmcv>=2.0.0rc4
     Downloading https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/mmcv-2.1.0-cp310-cp310-manylinux1_x86_64.whl (98.6 MB)
   Successfully installed mmcv-2.1.0
   ```

4. **Install MMDetection:**

   ```bash
   (mycondapy310) [010796032@coe-hpc2 3DDepth]$ mim install 'mmdet>=3.0.0'
   Looking in links: https://download.openmmlab.com/mmcv/dist/cu118/torch2.0.0/index.html
   Collecting mmdet>=3.0.0
     Downloading mmdet-3.2.0-py3-none-any.whl (2.1 MB)
   Successfully installed mmdet-3.2.0 terminaltables-3.1.10
   ```

5. **Clone and Install MMDetection3D:**

   ```bash
   (mycondapy310) [010796032@coe-hpc2 3DObject]$ git clone https://github.com/open-mmlab/mmdetection3d.git -b dev-1.x
   ```

#### Handling Installation Issues

If the Open3D installation fails, comment out `open3d` in `requirements/runtime.txt` and then run the editable install:

```bash
ERROR: No matching distribution found for open3d
(mycondapy310) [010796032@coe-hpc2 mmdetection3d]$ nano requirements/runtime.txt #comment out open3d
(mycondapy310) [010796032@coe-hpc2 mmdetection3d]$ pip install -v -e .
Successfully installed black-23.10.0 flake8-6.1.0 iniconfig-2.0.0 lyft_dataset_sdk-0.0.8 matplotlib-3.5.3 mccabe-0.7.0 mmdet3d-1.2.0 mypy-extensions-1.0.0 pathspec-0.11.2 plotly-5.17.0 pluggy-1.3.0 plyfile-1.0.1 pycodestyle-2.11.1 pyflakes-3.1.0 pytest-7.4.2 tenacity-8.2.3 trimesh-4.0.0
```

### Model Download and Testing

#### Download Pretrained Model

```bash
(mycondapy310) [010796032@coe-hpc2 mmdetection3d]$ mim download mmdet3d --config pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car --dest .
processing pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car...
downloading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.4/18.4 MiB 117.4 MB/s eta 0:00:00
Successfully downloaded hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth to /lts/home/010796032/3DObject/mmdetection3d
Successfully dumped pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py to /lts/home/010796032/3DObject/mmdetection3d
```

#### Run Point Cloud Demo

```bash
(mycondapy310) [010796032@cs001 mmdetection3d]$ python demo/pcd_demo.py demo/data/kitti/000008.bin pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth --no-save-vis
```

#### Sample Detection Results

```bash
(mycondapy310) [010796032@cs001 mmdetection3d]$ cat outputs/preds/000008.json
{
  "labels_3d": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
  "scores_3d": [0.9750590920448303, 0.9682098627090454, 0.9457541108131409, 0.8904030919075012, 0.8890073299407959, 0.7703604698181152, 0.7550405859947205, 0.7058141827583313, 0.5811426639556885, 0.44102343916893005],
  "bboxes_3d": [
    [14.75867748260498, -1.0537946224212646, -1.5589320659637451, 3.7562406063079834, 1.6059986352920532, 1.558688998222351, -0.31321752071380615],
    [6.438138961791992, -3.8679745197296143, -1.7354645729064941, 3.147707223892212, 1.4599915742874146, 1.4284530878067017, -0.2998310327529907],
    [8.112329483032227, 1.216971516609192, -1.6341216564178467, 3.6662495136260986, 1.573140025138855, 1.5916767120361328, 2.8161733150482178],
    [20.169925689697266, -8.43094253540039, -1.6689856052398682, 2.381495237350464, 1.51751708984375, 1.5693042278289795, -0.3255223035812378],
    [33.455665588378906, -7.035743236541748, -1.3376567363739014, 4.213741302490234, 1.744563102722168, 1.6697136163711548, 2.828497886657715],
    [55.621891021728516, -20.328449249267578, -1.3771171569824219, 4.370689392089844, 1.7358696460723877, 1.7066415548324585, 2.8504137992858887],
    [3.637699842453003, 2.7381889820098877, -1.6892050504684448, 3.7212045192718506, 1.5820955038070679, 1.51765775680542, -0.2304447889328003],
    [25.04075050354004, -10.156379699707031, -1.6326467990875244, 3.739389181137085, 1.6084976196289062, 1.4840202331542969, -0.32967936992645264],
    [28.72532081604004, -1.552423357963562, -1.202379822731018, 3.69446063041687, 1.5429767370224, 1.5610381364822388, 1.2416549921035767],
    [40.87098693847656, -9.748966217041016, -1.3669469356536865, 3.8333828449249268, 1.6528679132461548, 1.5699278116226196, -0.28837358951568604]
  ],
  "box_type_3d": "LiDAR"
}
```

This tutorial walked through BEV detection from a basic LSS implementation to the BEVFusion architecture, and covered installing MMDetection3D and running its point cloud detection demo.
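As a cross-check on the BEVFusion evaluation results reported earlier, the nuScenes Detection Score (NDS) can be recomputed from mAP and the five mean true-positive errors using the standard definition, NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10. The sketch below applies it to the two sets of metrics listed in this tutorial; the function name `nds` is an illustrative choice and is not part of the mydetector3d or nuScenes devkit code.

```python
def nds(map_score, tp_errors):
    """nuScenes Detection Score from mAP and the five mean TP errors.

    tp_errors: [mATE, mASE, mAOE, mAVE, mAAE]; each error is clipped to 1
    and converted to a score (1 - error) before averaging with mAP.
    """
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * map_score + sum(tp_scores)) / 10.0


# Metrics copied from the two evaluation runs in this tutorial.
custom = nds(0.6215, [0.2811, 0.2565, 0.3630, 0.2630, 0.1964])
pretrained = nds(0.2364, [0.7516, 0.6989, 0.6777, 0.6240, 0.4523])

print(f"custom-trained NDS: {custom:.3f}")
print(f"pretrained NDS:     {pretrained:.3f}")
```

Both values agree with the reported NDS of 0.6747 and 0.2977 to within the rounding of the listed metrics, which suggests the summary numbers are internally consistent.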