## Results summary

### Simulation

#### SimpleCircuit

All simulations are conducted on an NVIDIA GeForce GTX 1050 (4 GB, PCIe/SSE2) GPU with a batch size of 1 on the SimpleCircuit track. Only models that completed at least one lap are presented here.
Method | Average speed | Position deviation MAE | Brain iteration frequency (RT) | Mean inference time (s) | Real-time factor | Framework |
---|---|---|---|---|---|---|
PilotNet (original) | 8.386 | 7.406 | 5.585 | 0.124 | 0.557 | TensorFlow |
Dynamic Range Q | 8.534 | 6.693 | 58.474 | 0.010 | 0.54 | TensorFlow Lite |
TF-TRT Baseline | 8.536 | 5.06 | 73.37 | 0.0063 | 0.47 | TensorFlow |
TF-TRT FP32 | 8.32 | 4.94 | 60.28 | 0.0065 | 0.50 | TensorFlow |
TF-TRT FP16 | 8.14 | 5.39 | 71.90 | 0.0056 | 0.48 | TensorFlow |
TF-TRT Int8 | 8.01 | 6.65 | 59.36 | 0.0067 | 0.51 | TensorFlow |
Global Prune | 7.73 | 17.03 | 42.90 | 0.0023 | 0.508 | PyTorch |
Local Prune | 7.74 | 14.15 | 32.57 | 0.0027 | 0.428 | PyTorch |
QAT | 7.94 | 11.47 | 26.45 | 0.0188 | 0.45 | PyTorch |
Prune + Quantization | 8.84 | 4.61 | 34.40 | 0.0125 | 0.51 | PyTorch |
#### Montreal

All simulations are conducted on the same NVIDIA GeForce GTX 1050 (4 GB, PCIe/SSE2) GPU with a batch size of 1.
Method | Average speed | Position deviation MAE | Brain iteration frequency (RT) | Mean inference time (s) |
---|---|---|---|---|
PilotNet (circuit not completed) | 9.431 | - | - | 0.1228 |
TF-TRT FP32 | 10.12 | 6.55 | 70.88 | 0.0062 |
TF-TRT FP16 | 10.05 | 6.2 | 99.30 | 0.0049 |
TF-TRT Int8 | 10.23 | 6.02 | 66.31 | 0.0065 |
### Offline Evaluation

All offline evaluations on the test set are conducted on an NVIDIA GPU. The test set combines all subsets of the new datasets, covering the SimpleCircuit, Montreal, and Montmelo circuits.
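For context, the snippet below sketches how the three reported metrics (model size, MSE, mean inference time) can be collected for a TFLite model. This is a minimal illustration, not the actual DeepLearningStudio evaluation code; `evaluate_tflite`, `images`, and `labels` are hypothetical names.

```python
import os
import time

import numpy as np
import tensorflow as tf

def evaluate_tflite(model_path, images, labels):
    """Collect model size (MB), MSE and mean inference time for a .tflite file."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    preds, times = [], []
    for img in images:  # one image per step: batch size of 1, as in the tables
        x = np.expand_dims(img, 0).astype(inp["dtype"])
        interpreter.set_tensor(inp["index"], x)
        start = time.perf_counter()
        interpreter.invoke()
        times.append(time.perf_counter() - start)
        preds.append(interpreter.get_tensor(out["index"])[0])

    size_mb = os.path.getsize(model_path) / 1e6
    mse = float(np.mean((np.asarray(preds) - np.asarray(labels)) ** 2))
    return size_mb, mse, float(np.mean(times))
```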
#### Optimization strategies
Method | Model size (MB) | MSE | Inference time (s) | Framework |
---|---|---|---|---|
PilotNet | 195 | 0.041 | 0.0364 | TensorFlow |
Baseline | 64.92 | 0.04108 | 0.00791 | TensorFlow Lite |
Dynamic Range Q | 16.24 | 0.04098 | 0.00490 | TensorFlow Lite |
Float16 Q | 32.46 | 0.04107 | 0.00794 | TensorFlow Lite |
Q aware training | 16.25 | 0.04214 | 0.00955 | TensorFlow Lite |
Weight pruning | 64.92 | 0.04258 | 0.00773 | TensorFlow Lite |
Weight pruning + Q | 16.24 | 0.04261 | 0.00481 | TensorFlow Lite |
Integer only Q | 16.24 | 28157.72 | 0.00791 | TensorFlow Lite |
Integer (float fallback) Q | 16.24 | 0.04507 | 0.00782 | TensorFlow Lite |
CQAT | 16.25 | 0.03938 | 0.00768 | TensorFlow Lite |
PQAT | 16.25 | 0.04367 | 0.00795 | TensorFlow Lite |
PCQAT | 16.25 | 0.03924 | 0.00795 | TensorFlow Lite |
TF-TRT FP32 | 260 | 0.04103 | 0.00131 | TensorFlow |
TF-TRT FP16 | 260 | 0.04103 | 0.00218 | TensorFlow |
TF-TRT Int8 | 260 | 0.04103 | 0.00118 | TensorFlow |
Baseline | 6.12 | 0.07884 | 0.00218 | PyTorch |
Dynamic Range Q | 1.95 | 0.07841 | 0.00317 | PyTorch |
Static Q | 1.61 | 0.07882 | 0.00266 | PyTorch |
Q aware training | 1.61 | 0.07080 | 0.00279 | PyTorch |
Local Prune | 6.12 | 0.07295 | 0.00209 | PyTorch |
Global Prune | 6.12 | 0.07080 | 0.00215 | PyTorch |
Prune + Quantization | 1.61 | 0.06724 | 0.00266 | PyTorch |
### Summarizing the improvements

#### TensorFlow framework

- We achieved a ~12x reduction in model memory size with Dynamic range quantization (195 MB down to ~16.24 MB); a conversion sketch follows after this list.
- We maintain an MSE similar to the baseline in offline evaluation (at best ~0.002 lower, with CQAT/PCQAT).
- We achieved ~33x faster inference with TF-TRT Int8 optimization and ~7.5x faster inference with Dynamic range quantization in offline evaluation.
- We reduced the Position deviation MAE to ~0.66x of the original and achieved ~12x higher Brain iteration frequency (RT) in simulation.
- We achieved a ~22x improvement in Mean inference time in simulation.
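As a reference point, dynamic range quantization is essentially a one-flag change in the TFLite converter. The snippet below is a minimal sketch following the TensorFlow Lite optimization guide; the SavedModel path and output filename are illustrative.

```python
import tensorflow as tf

# Convert a trained PilotNet SavedModel with dynamic range quantization:
# weights are stored as int8, activations stay in float at runtime.
converter = tf.lite.TFLiteConverter.from_saved_model("pilotnet_savedmodel")  # illustrative path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("pilotnet_dynamic_range.tflite", "wb") as f:
    f.write(tflite_model)
```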
#### PyTorch framework

- We achieved, at best, a ~4x memory reduction with the Static Quantization technique (6.12 MB down to ~1.61 MB).
- The MSE improves by ~0.011 over the baseline with Prune + Quantization; the other techniques also give slightly better MSE than the baseline.
- All methods have inference times on the same order of magnitude (10⁻³ s). The Local Prune strategy has the fastest inference on CPU, while Prune + Quantization gives the best accuracy overall, with the remaining strategies close behind (a sketch of the Prune + Quantization recipe follows after this list).
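The sketch below shows one plausible version of the Prune + Quantization recipe using PyTorch's built-in utilities: global L1-unstructured pruning followed by post-training dynamic quantization of the linear layers. The `TinyNet` model is a stand-in for PilotNet, and the actual pipeline may have used static rather than dynamic quantization.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class TinyNet(nn.Module):
    """Stand-in for the PilotNet regressor, not the real DeepLearningStudio model."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64, 50)
        self.fc2 = nn.Linear(50, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet().eval()

# 1) Global unstructured pruning: zero out the 30% of weights with the
#    smallest L1 magnitude, considered across all listed parameters at once.
params = [(model.fc1, "weight"), (model.fc2, "weight")]
prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.3)
for module, name in params:
    prune.remove(module, name)  # bake the pruning masks into the weights

# 2) Post-training dynamic quantization of the linear layers (CPU inference only).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    print(quantized(torch.randn(1, 64)))  # smoke test
```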
### Recommendations

- PyTorch-optimized models are smaller and have better inference times. I would recommend the Global/Local Prune strategies as a starting point, because they also support GPU inference. Prune + Quantization has the lowest MSE but only supports CPU inference.
- TFLite-optimized models perform better than the original model with a much smaller memory footprint. Installation is easy and there are no specific hardware constraints. I would recommend Dynamic range quantization as the first optimization method.
- TF-TRT-optimized models have the best performance in both offline evaluation and simulation, but they have a large memory footprint. If disk space is not a constraint, I would recommend the Int8 or Float16 precision models (a conversion sketch follows after this list).
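For completeness, the snippet below sketches a TF-TRT conversion at FP16 precision, loosely following the TF-TRT user guide. The SavedModel paths and the 66x200x3 PilotNet input shape are assumptions; Int8 additionally requires a calibration input function, which is omitted here.

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert a PilotNet SavedModel to a TF-TRT model at FP16 precision.
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="pilotnet_savedmodel",  # illustrative path
    conversion_params=params,
)
converter.convert()

def input_fn():
    # Representative input so TensorRT engines are built ahead of time;
    # 66x200x3 is the canonical PilotNet input shape (an assumption here).
    yield (np.random.normal(size=(1, 66, 200, 3)).astype(np.float32),)

converter.build(input_fn=input_fn)
converter.save("pilotnet_trt_fp16")  # illustrative output path
```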