Project Summary
Project Aim
The main aim of this project is to improve the inference speed of the current deep learning model stack for autonomous driving applications with minimal loss of precision. The JdeRobot organization has created Behavior Metrics, a tool for comparing deep learning architectures for autonomous driving on different circuits with the support of Gazebo and ROS Noetic. The organization also provides another tool called DeepLearningStudio, which has datasets and model implementations for training deep learning models. I used available tools and techniques such as TensorRT, quantization, pruning, and their variants to optimize the current model stack in both the PyTorch and TensorFlow frameworks.
Contributions
The first three weeks of the community bonding period involved working on solving issues and adding new features. This initial work helped to set the base for further development and experimentation in the coding period. The following issues and PRs have been created and merged during and before that period.
- Issue Replicating DeepestLSTMTinyPilotNet model from Tensorflow to PyTorch framework #37 solved by Replicate DeepestLSTMTinyPilotNet model from Tensorflow to PyTorch framework #40
- New feature - Add support files for DeepestLSTMTinyPilotNet pytorch model #335
- Created issue Dependency conflict while installation #45 in DeepLearningStudio repo.
- Solved issue #45 with PR updated package versions for python3.10 #46 in DeepLearningStudio repo.
- Solved Docker issues with PR Fixes failing build of Docker images (with GPU support) in the workflow #365 in BehaviorMetrics repo.
- I got errors while using show_plots.py and created the corresponding issue KeyError while using show_plots.py script #366.
- Created another issue Errors using ‘scripts/analyse_brain.bash’ #367.
- I encountered additional errors while using PilotNet (Pytorch) brain, for which I created additional issues - Error while trying to save stats with DL-torch.yml config #368 and Not utilizing GPU when running simulation #369.
- Created issue Update PilotNet model to use new F1 dataset #48.
- Fixed the issue Update PilotNet model to use new F1 dataset #48 by PR Use new dataset #49.
- Submitted PR Removed: unnecessary exp details #370 to fix issue #368.
- Solved issue #369 by PR Update: brain file to use gpu #371.
- Updated PR updated package versions for python3.10 #46.
- Updated PR Use new dataset #49.
- Created a PR unnormalized prediction value #372 to unnormalize (expand) the predicted values of PilotNet brain in BehaviorMetrics.
- Updated PilotNet training script with PR Adding validation set for model selection #50.
Please refer to the following blog posts for more details:
Community Bonding: Week 1
Community Bonding: Week 2
Community Bonding: Week 3
Final simulation video
The following video provides an overview of the complete project and a comparison between the final models. All implementations, documentation and experiments are contributed to the two official repositories of the JdeRobot organization, i.e., BehaviorMetrics and DeepLearningStudio via Issues and Pull Requests (PRs).
TensorFlow Framework
I started my work with the TensorFlow framework. By the Phase 1 evaluation, the majority of goals for implementing and experimenting with optimized models were completed. The following video provides an overview of the performance.
The initial weeks included setting up a baseline of available models for future comparison. Also, I worked on installing TensorRT for optimization. The following issues and PRs were created and merged during that period:
- Issue - Missing argument: learning_rate #55
- Issue - Not utilizing GPU when running simulation #369 solved by PR Update: brain file to use gpu #371
Please refer to the following blog posts for more details:
TF Lite Optimization
All the optimization strategies are supported by TensorFlow Lite. There are two major categories: quantization and pruning. Counting the variants and combinations of techniques, a total of 10 optimization strategies were implemented. Moreover, benchmarking was done both in offline mode over the test set (calculating mean squared error, model size, and inference time) and in online mode via live simulation on Gazebo.
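The offline part of such a benchmark can be sketched as follows. This is a minimal illustration, not the project's actual script: `benchmark_offline`, `toy_model`, and the toy data are hypothetical names and values, and `predict` stands in for whatever wraps the real model (for example a TFLite interpreter).

```python
import time
from statistics import mean


def benchmark_offline(predict, inputs, targets):
    """Return (MSE, mean inference time in seconds) over a test set.

    `predict` is a hypothetical callable mapping one input to one float.
    """
    squared_errors = []
    latencies = []
    for x, y in zip(inputs, targets):
        start = time.perf_counter()
        y_hat = predict(x)
        latencies.append(time.perf_counter() - start)
        squared_errors.append((y_hat - y) ** 2)
    return mean(squared_errors), mean(latencies)


# Toy stand-in for a model: the true mapping plus a constant bias of 0.5.
def toy_model(x):
    return 2.0 * x + 0.5


inputs = [0.0, 1.0, 2.0, 3.0]
targets = [2.0 * x for x in inputs]

mse, mean_latency = benchmark_offline(toy_model, inputs, targets)
print(f"MSE: {mse:.4f}, mean inference time: {mean_latency:.2e} s")
```

Model size, the third offline metric, can be read from the saved model file, for example with `os.path.getsize`.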
The following issues and PRs were created and merged during that period:
- New feature - Add support for baseline evaluation and model optimization #67
- Issue - ImportError: cannot import name ‘ft2font’ from ‘matplotlib’ #383
- Issue - AttributeError: ‘Brain’ object has no attribute ‘suddenness_distance’ #384
- Issue - Skipping registering GPU devices #385
- PR (new feature) - Update the scripts to support optimized tflite models #386.
Please refer to the following blog posts for more details:
Important Links:
TensorFlow-TensorRT (TF-TRT)
TensorRT is an SDK for high-performance deep learning inference. It utilizes various techniques such as precision calibration, layer fusion, kernel tuning, etc. It also supports TensorFlow via the TensorFlow-TensorRT package. I optimized the models at three different precisions (Int8, Float16, and Float32) and achieved excellent results.
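To give a feel for what the three precisions mean numerically, here is a small standard-library sketch that stores one weight as float32, float16, and symmetric int8 and compares the rounding error. It does not use TensorRT; the weight value, the calibration range of [-1, 1], and the helper names are assumptions made for illustration, and real int8 calibration is more involved.

```python
import struct


def round_trip(value, fmt):
    """Store `value` in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def quantize_int8(value, scale):
    """Symmetric int8 quantization: round to the nearest step, clamp, rescale."""
    q = max(-127, min(127, round(value / scale)))
    return q * scale


weight = 0.1234567          # hypothetical weight value
max_abs = 1.0               # assumed calibration range [-1, 1]
scale = max_abs / 127       # size of one int8 step in real units

as_float32 = round_trip(weight, "<f")   # 4 bytes per value
as_float16 = round_trip(weight, "<e")   # 2 bytes per value
as_int8 = quantize_int8(weight, scale)  # 1 byte per value plus a shared scale

for name, approx in [("float32", as_float32), ("float16", as_float16), ("int8", as_int8)]:
    print(f"{name:8s} {approx:.7f}  abs error {abs(approx - weight):.2e}")
```

Lower precision shrinks each stored value and allows faster arithmetic on supporting hardware, at the cost of a larger rounding error per weight.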
The following issues and PRs were created and merged during that period:
- Issue - Crash while recording stats with PilotNet (TF) model on Montreal circuit #392.
- PR (new feature) - Add support for inference optimization with TensorRT for Tensorflow models #71
- PR (new feature) - Add support for inference with TensorRT optimized (TF) models #395
Please refer to the following blog posts for more details:
- Coding Period: Week 5
- Coding Period: Week 6
- Coding Period: Week 7
- Coding Period: Week 8
- Coding Period: Week 9
Important Links:
PyTorch Framework
PyTorch supports similar optimization strategies. TorchScript is available for inference, and just changing the framework (from TF to PyTorch) already shows a considerable performance difference. Most of the optimization strategies only support CPU inference, which could be a major downside. I still obtained comparable and mostly better results with PyTorch. A total of 6 strategies were implemented.
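The difference between local and global pruning can be illustrated with a plain-Python sketch. It assumes a magnitude (L1) criterion, which this post does not state explicitly; the function names, the two-layer model, and its weights are hypothetical. In PyTorch, the corresponding utilities are `torch.nn.utils.prune.l1_unstructured` (per module) and `torch.nn.utils.prune.global_unstructured` (across modules).

```python
def prune_local(layers, amount):
    """Zero the smallest-magnitude `amount` fraction of weights in each layer."""
    pruned = {}
    for name, weights in layers.items():
        k = int(len(weights) * amount)  # number of weights to drop in this layer
        order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
        drop = set(order[:k])
        pruned[name] = [0.0 if i in drop else w for i, w in enumerate(weights)]
    return pruned


def prune_global(layers, amount):
    """Zero the smallest-magnitude `amount` fraction across all layers together."""
    entries = [(abs(w), name, i) for name, ws in layers.items() for i, w in enumerate(ws)]
    k = int(len(entries) * amount)  # number of weights to drop in the whole model
    drop = {(name, i) for _, name, i in sorted(entries)[:k]}
    return {
        name: [0.0 if (name, i) in drop else w for i, w in enumerate(ws)]
        for name, ws in layers.items()
    }


# Hypothetical two-layer model with flattened weights.
layers = {
    "conv": [0.9, -0.8, 0.7, -0.6],
    "fc": [0.05, -0.04, 0.03, 0.95],
}

local = prune_local(layers, amount=0.5)
global_ = prune_global(layers, amount=0.5)
print("local :", local)
print("global:", global_)
```

Local pruning removes the same fraction from every layer, while global pruning ranks all weights together, so layers with many small weights lose more of them.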
The following issues and PRs were created during that period:
- Issue - AttributeError: ‘Brain’ object has no attribute ‘suddenness_distance’ #397 solved by PR #399.
- Issue - Issue using Pytorch quantized models #396 solved by PR Update PyTorch requirements #400
- PR (new feature) - PyTorch optimized model inference #399
- PR (new feature) - Implemented Pytorch optimization strategies #72
Please refer to the following blog posts for more details:
Important Links:
Results
I have performed an extensive range of experiments on each optimization strategy and compiled the combined results in a separate blog page - Results Summary.
Summarizing the improvements
The essential observations and improvements that I achieved through the project are presented below. For exact numerical figures, please see the results page (mentioned above). For code implementation, please refer to the individual PRs.
TensorFlow framework
- We achieved a ~12x reduction in the model memory size with Dynamic range quantization.
- We maintain a similar MSE value (at best 0.001 better) as the baseline in offline evaluation.
- We achieved a ~33x better inference time with TensorRT Int8 optimization and ~7.5x better inference time with Dynamic range quantization in offline evaluation.
- We achieved ~0.66x less Position deviation MAE and ~12x higher Brain iteration frequency (RT) in simulation.
- We achieved a ~22x improvement in Mean inference time in simulation.
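The "Nx" figures above read most naturally as ratios of the baseline metric to the optimized one; the post does not spell out the formula, so treat that as an assumption. A minimal sketch with placeholder numbers (hypothetical values, not measurements from this project):

```python
def improvement_factor(baseline, optimized):
    """How many times smaller the optimized metric is than the baseline."""
    if optimized <= 0:
        raise ValueError("optimized metric must be positive")
    return baseline / optimized


# Hypothetical placeholder measurements, not results from this project.
baseline_latency_s = 0.008
optimized_latency_s = 0.002

factor = improvement_factor(baseline_latency_s, optimized_latency_s)
print(f"~{factor:.1f}x better inference time")
```

The same ratio applies to any metric where lower is better, such as model size on disk.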
PyTorch framework
- We achieved, at best, a ~4x memory reduction with the Static Quantization technique.
- The MSE is improved by 0.011 from baseline using Prune + Quantization. Other techniques also give a slightly better performance.
- All methods have inference times of the same order of magnitude (10^-3).
- The Local prune strategy has the best inference speed (CPU). Quantization + Prune gives the best performance overall. However, all other strategies are also very close.
Recommendations
- PyTorch optimized models are smaller in size and have better inference time. I would recommend the Global/Local Prune strategy for a start because they also support inference with GPU. Quantization + Prune has the least MSE but only supports CPU inference.
- Tflite optimized models give better performance than the original models with much smaller memory sizes. The installation is easy and there are no specific hardware constraints. I would recommend Dynamic range quantization as the first optimization method.
- TensorRT optimized models have the best performance in both offline evaluation and simulation. However, they have a large memory footprint. If disk space is not a constraint, I would recommend using the Int8 or Float16 precision model.
Future work
TensorRT also supports the PyTorch framework with the Torch-TensorRT compiler. It would be interesting to compare the same TensorRT optimizations between the two popular deep learning frameworks (TensorFlow and PyTorch). We already have everything set up for offline and online benchmarking, which could provide additional insights for our project. I have saved helpful resources for installation and development as follows: