Project Summary

Project Aim

The main aim of this project is to improve the current stack of deep learning models for autonomous driving applications in terms of inference speed, with minimal loss of precision. The JdeRobot organization has created Behavior Metrics, a tool for comparing deep learning architectures for autonomous driving on different circuits with the support of Gazebo and ROS Noetic. The organization also provides another tool called DeepLearningStudio, which contains datasets and model implementations for training deep learning models. I have used available tools and techniques such as TensorRT, quantization, pruning, and their variants to optimize the current model stack in both the PyTorch and TensorFlow frameworks.

Contributions

The first three weeks of the community bonding period involved solving issues and adding new features. This initial work helped set the base for further development and experimentation during the coding period. The following issues and PRs were created and merged during and before that period.

Please refer to the following blog posts for more details:

Community Bonding: Week 1
Community Bonding: Week 2
Community Bonding: Week 3

Final simulation video

The following video provides an overview of the complete project and a comparison between the final models. All implementations, documentation, and experiments were contributed to the two official repositories of the JdeRobot organization, BehaviorMetrics and DeepLearningStudio, via issues and pull requests (PRs).

TensorFlow Framework

I started my work with the TensorFlow framework. By the Phase 1 evaluation, the majority of goals for implementing and experimenting with optimized models were completed. The following video provides an overview of the performance.

The initial weeks included setting up a baseline of the available models for future comparison. I also worked on installing TensorRT for optimization. The following issues and PRs were created and merged during that period:

Please refer to the following blog posts for more details:

TF Lite Optimization

All the optimization strategies are supported by TensorFlow Lite. There are two main categories: quantization and pruning. Counting the variants and combinations of techniques, a total of 10 optimization strategies were implemented. Moreover, proper benchmarking was done in offline mode on the test set (measuring mean squared error, model size, and inference time) and in online mode via live simulation in Gazebo.
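
As a rough illustration, here is a minimal sketch of the dynamic range quantization workflow with the TF Lite converter, including a simple offline timing check. The model path, file names, and input shape are placeholder assumptions, not the exact code from the PRs.

```python
import time
import numpy as np
import tensorflow as tf

# Load a trained Keras model (placeholder path).
model = tf.keras.models.load_model("pilotnet.h5")

# Dynamic range quantization: weights are stored as 8-bit integers,
# activations remain float and are quantized dynamically at runtime.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("pilotnet_dynamic_quant.tflite", "wb") as f:
    f.write(tflite_model)

# Offline benchmarking with the TF Lite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

dummy = np.random.rand(1, 66, 200, 3).astype(np.float32)  # assumed input shape
interpreter.set_tensor(input_details[0]["index"], dummy)
start = time.time()
interpreter.invoke()
print("Inference time (s):", time.time() - start)
prediction = interpreter.get_tensor(output_details[0]["index"])
```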

The following issues and PRs were created and merged during that period:

Please refer to the following blog posts for more details:

Important Links:

  • All the trained models are available here.
  • All the simulation videos are available here.

TensorFlow-TensorRT (TF-TRT)

TensorRT is an SDK for high-performance deep learning inference. It uses various techniques such as precision calibration, layer fusion, and kernel auto-tuning. It also supports TensorFlow via the TensorFlow-TensorRT (TF-TRT) package. I optimized the models at three different precisions (INT8, FP16, and FP32) and achieved excellent results.
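
Below is a minimal sketch of how a SavedModel can be converted with TF-TRT. The directory names are placeholders, and INT8 additionally requires a representative calibration input function, so treat this as an outline under those assumptions rather than the exact code from the PRs.

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Choose the precision mode: FP32, FP16, or INT8.
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="pilotnet_saved_model",  # placeholder path
    conversion_params=params,
)
# For INT8, convert() must be given a representative calibration_input_fn.
converter.convert()
converter.save("pilotnet_trt_fp16")  # placeholder output path
```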

The following issues and PRs were created and merged during that period:

Please refer to the following blog posts for more details:

Important Links:

  • All the trained models are available here.
  • All the simulation videos are available here.

PyTorch Framework

PyTorch also supports similar optimization strategies. TorchScript is available for inference, and simply switching the framework (from TensorFlow to PyTorch) already shows a noticeable performance difference. Most of the optimization strategies only support CPU inference, which could be a major downside; even so, I obtained comparable and mostly better results with PyTorch. A total of six strategies were implemented.
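
As an illustrative sketch, the snippet below shows local (layer-wise) unstructured pruning followed by TorchScript export, which is one of the strategies mentioned above. The model path, output name, and input shape are assumptions, not the exact code from the PRs.

```python
import torch
import torch.nn.utils.prune as prune

# Load a trained model saved as a full module (placeholder path).
model = torch.load("pilotnet.pth", map_location="cpu")
model.eval()

# Local pruning: remove 30% of the smallest-magnitude weights
# in every Conv2d and Linear layer, independently per layer.
for module in model.modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Export with TorchScript for inference without the Python overhead.
example_input = torch.randn(1, 3, 66, 200)  # assumed input shape
scripted = torch.jit.trace(model, example_input)
scripted.save("pilotnet_local_prune.pt")
```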

The following issues and PRs were created during that period:

Please refer to the following blog posts for more details:

Important Links:

  • All the trained models are available here.
  • All the simulation videos are available here.

Results

I performed an extensive range of experiments for each optimization strategy and compiled the combined results on a separate blog page: Results Summary.

Summarizing the improvements

The essential observations and improvements I achieved through the project are presented below. For exact numerical figures, please see the results page (mentioned above). For the code implementations, please refer to the individual PRs.

TensorFlow framework

  • We achieved a ~12x reduction in the model memory size with Dynamic range quantization.
  • We maintained an MSE similar to the baseline (at best 0.001 better) in offline evaluation.
  • We achieved ~33x faster inference with TensorRT INT8 optimization and ~7.5x faster inference with Dynamic range quantization in offline evaluation.
  • We reduced the Position deviation MAE to ~0.66x of the baseline and achieved a ~12x higher Brain iteration frequency (RT) in simulation.
  • We achieved a ~22x improvement in Mean inference time in simulation.

PyTorch framework

  • We achieved, at best, a ~4x memory reduction with the Static Quantization technique.
  • The MSE improved by 0.011 over the baseline using Prune + Quantization. Other techniques also give slightly better performance.
  • All methods have inference times on the same order of magnitude (10⁻³ s). The Local prune strategy has the best inference speed (CPU).
  • Quantization + Prune gives the best performance overall. However, all other strategies are also very close.

Recommendations

  • PyTorch-optimized models are smaller in size and have better inference times. I would recommend the Global/Local Prune strategies as a starting point because they also support GPU inference. Quantization + Prune has the lowest MSE but only supports CPU inference.
  • TF Lite-optimized models give better performance than the original models with much smaller memory footprints. Installation is easy and there are no specific hardware constraints. I would recommend Dynamic range quantization as the first optimization method.
  • TensorRT-optimized models have the best performance in both offline evaluation and simulation. However, they have a large memory footprint. If disk space is not a constraint, I would recommend using the INT8 or FP16 precision models.

Future work

TensorRT also supports the PyTorch framework through the Torch-TensorRT compiler. It would be interesting to compare the same TensorRT optimizations across the two popular deep learning frameworks (TensorFlow and PyTorch). We already have everything set up for offline and online benchmarking, which could provide additional insights for the project. I have saved helpful resources for installation and development as follows: