Welcome to DeepDataMiningLearning documentation!
- Author:
Kaikai Liu, Associate Professor, SJSU
Email: kaikai.liu@sjsu.edu
Web: http://www.sjsu.edu/cmpe/faculty/tenure-line/kaikai-liu.php
See the Conda Environment Setup Tutorial section for Conda environment setup, and the HPC section for SJSU CoE HPC environment setup.
Note
This project is under active development.
If you find the tutorials helpful and would like to cite them, you can use the following bibtex:
@misc{kliu2024ddml,
title = {{DeepDataMiningLearning Tutorials}},
author = {Kaikai Liu},
year = 2024,
howpublished = {\url{https://deepdatamininglearning.readthedocs.io/}}
}
Contents
There are three main ways of running the notebooks we recommend:
Local Machine: You can download our sample code from GitHub. Follow the sections below to set up your local AI system (choose the machine type and the CPU or GPU version that matches your system).
Google Colab: If you prefer to run the code on a platform other than your own computer, or want to experiment with GPU support, we recommend Google Colab. Each notebook on this documentation website has a badge with a link to open it in Google Colab. Remember to enable GPU support before running the notebook (Runtime -> Change runtime type). On Colab, changes are lost when the session times out or is closed, so you need to manually save your data to your local computer or to Google Drive.
SJSU CoE HPC: If you want to store a large dataset and train your own (larger) neural networks for longer than Colab’s timeout allows, you can use our SJSU CoE HPC cluster. The setup of your HPC workspace is documented below in the HPC section.
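Whichever of the three options you choose, it can help to confirm that a GPU is actually visible before starting a long run, and on Colab to mount Google Drive so results outlive the session. The following is a minimal sketch rather than part of the tutorials' own code: the helper names `describe_accelerator` and `in_colab` are hypothetical, PyTorch is assumed to be the framework in use (the check is skipped if it is not installed), and `/content/drive` is the conventional Colab mount point.

```python
import importlib.util
import shutil


def describe_accelerator() -> str:
    """Return "cuda", "nvidia-smi-only", or "cpu" for the current runtime.

    Hypothetical helper for illustration; not part of the tutorials' code.
    """
    # Prefer PyTorch's own check when PyTorch is installed.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            if torch.cuda.is_available():
                return "cuda"
        except ImportError:
            pass  # PyTorch is present but could not be imported; fall through.
    # Otherwise, look for the NVIDIA driver tool on PATH. Finding it means a
    # driver is installed, not that a framework can already use the GPU.
    if shutil.which("nvidia-smi") is not None:
        return "nvidia-smi-only"
    return "cpu"


def in_colab() -> bool:
    """Return True when the google.colab package is importable."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ModuleNotFoundError:
        # Raised when the parent "google" package itself is missing.
        return False


if __name__ == "__main__":
    print("accelerator:", describe_accelerator())
    if in_colab():
        # Mount Google Drive so files written under /content/drive persist
        # after the Colab session times out or is closed.
        from google.colab import drive

        drive.mount("/content/drive")
```

If this reports `cpu` on Colab, the runtime type has most likely not been switched to a GPU yet.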
Deep Learning Tutorial
- Deep Learning: From Perceptrons to Modern Architectures
- Table of Contents
- Introduction
- History of Neural Networks
- The Deep Learning Revolution
- Convolutional Neural Networks
- Major CNN Architectures
- Optimization Techniques
- Regularization Methods
- Advanced Training Techniques
- Modern Architectures and Trends
- Semi-Supervised Learning
- Self-Supervised Learning
- Implementation Guide
- References and Resources
- Key Takeaways
- Transformers: The Architecture That Changed Everything
- Evolution of Sequence Models: From RNNs to Transformers
- Modern Transformer Modifications and Optimizations
- Table of Contents
- Introduction
- Architectural Innovations
- Attention Mechanism Optimizations
- Positional Encoding Innovations
- Training and Optimization Innovations
- Performance Analysis and Comparisons
- Implementation Guidelines and Best Practices
- Future Directions and Research Trends
- Comprehensive References and Resources
- Conclusion
- Technical Deep Dive: LLM Frameworks and Architectures
- LLMs and Their Architecture
- Architecture-Specific Innovations in Latest Models
- Key Research Papers and Implementation Resources
- Model Formats and Frameworks
- Model Formats and Naming Conventions
- Advanced LLM Techniques and Optimizations
- Performance Benchmarks and Comparisons
- Choosing the Right Backend
- Future Directions in LLM Deployment
- GPT Architecture Evolution: From GPT-2 to Modern LLMs
- Self-Supervised Learning: From Word Embeddings to Modern Vision-Language Models
AI System Optimization
- GPU Architecture and Acceleration for Deep Learning and LLMs
- Table of Contents
- Introduction
- Modern GPU Architecture
- NVIDIA CUDA Ecosystem
- GPU Acceleration for Deep Learning
- GPU Acceleration for Large Language Models
- Multi-GPU Training
- Edge GPU Solutions for Inference
- Performance Optimization Strategies
- Future Trends and Developments
- NVIDIA Blackwell GPU Architecture
- GPU Architecture Comparison: NVIDIA vs AMD vs ARM vs Apple
- MXFP8: Advanced 8-Bit Floating Point Format
- MXFP4: Next-Generation 4-Bit Floating Point Format
- Conclusion
- References and Further Reading
- Inference Optimization
AI System Setups
Autonomous Systems
- Autonomous Systems Survey: A Chronological Evolution (2014-2025)
- Autonomous System Technology
- Table of Contents
- Current Solutions in Autonomous Driving
- Tesla’s Latest Model: A Case Study
- Vision-based Object Detection Models
- 3D Object Detection Models
- Localization and Mapping
- Path Planning and Motion Planning
- Vehicle Control Systems
- Future Directions and Emerging Technologies
- Conclusion
- References
- Physical AI and Large Language Models in Autonomous Driving
- Table of Contents
- Vision-Language Models in Perception
- 3D Scene Reconstruction and Geometry Understanding
- Multimodal Sensor Fusion with Unified Embeddings
- End-to-End Transformers for Joint Perception-Planning
- Vision-Language-Action Models
- Current Challenges and Solutions
- Future Research Directions
- Conclusion
- Tesla Perception Stack & Its Research Lineage
- Executive Summary
- 1. Research Lineage → Tesla Modules
- 2. Inside Tesla’s Models
- 3. How the Pieces Fit the Planner
- 4. Practical Engineering Lessons
- 5. Open Research Gaps & Next Steps
- 6. Tesla’s End-to-End Evolution: From Autopilot v11 to v12+ and Beyond
- 7. Implementation Resources and Code References
- 8. Comprehensive Bibliography and References
- Appendix: Example Tensor/IO Specifications
- Physical AI and Large Language Models in Autonomous Driving
- Table of Contents
- Introduction: The Convergence of Physical AI and LLMs
- Why Physical AI and LLMs are Crucial for Autonomous Driving
- Current Solutions in Autonomous Driving
- Tesla’s Latest Model: A Case Study
- Vision-based Object Detection Models
- 3D Object Detection Models
- Localization and Mapping
- Vision-Language Models in Perception
- 3D Scene Reconstruction and Geometry Understanding
- Multimodal Sensor Fusion with Unified Embeddings
- End-to-End Transformers for Joint Perception-Planning
- Vision-Language-Action Models
- Current Challenges and Solutions
- Future Research Directions
- Conclusion
- KITTI Dataset Tutorial
- NuScenes Dataset Tutorial: Coordinate Transformations and Bounding Box Processing
- Table of Contents
- Introduction
- Dataset Structure
- Dataset Annotation Format
- Coordinate Systems
- Coordinate Transformations
- Detailed Transformation Processes by Visualization Type
- Transformation Summary
- 3D Bounding Box Processing
- 2D Projection Pipeline
- Code Implementation
- Common Issues and Solutions
- Best Practices
- 🧭 Comprehensive Tutorial: Understanding and Visualizing the Waymo Open Dataset v2
- Table of Contents
- 1️⃣ Dataset Overview
- 2️⃣ File Structure & Contents
- 3️⃣ Coordinate Frames
- 4️⃣ Camera Data Components
- 5️⃣ LiDAR Data Components
- 6️⃣ LiDAR-to-Vehicle Transform Mathematics
- 7️⃣ Additional Data Components
- 1️⃣1️⃣ 3D Bounding Box Specifications
- 1️⃣2️⃣ Multi-Sensor Data Fusion
- 1️⃣3️⃣ Visualization and Projection
- 1️⃣4️⃣ Common Issues and Solutions
- 1️⃣5️⃣ References and Resources
- ✨ Summary
- MyDetector3D Training and Evaluation
- Bird’s-Eye-View (BEV) Detection Tutorial
- nuScenes Dataset Tutorial