Results summary


Simulation

SimpleCircuit

All simulations were conducted on an NVIDIA GeForce GTX 1050 (4 GB, PCIe/SSE2) GPU with a batch size of 1 on the SimpleCircuit track. Only models that completed at least one lap are presented here.

| Method | Average speed | Position deviation MAE | Brain iteration frequency (RT, Hz) | Mean inference time (s) | Real time factor | Framework |
|---|---|---|---|---|---|---|
| PilotNet (original) | 8.386 | 7.406 | 5.585 | 0.124 | 0.557 | TensorFlow |
| Dynamic Range Q | 8.534 | 6.693 | 58.474 | 0.010 | 0.54 | TensorFlow Lite |
| TF-TRT Baseline | 8.536 | 5.06 | 73.37 | 0.0063 | 0.47 | TensorFlow |
| TF-TRT FP32 | 8.32 | 4.94 | 60.28 | 0.0065 | 0.50 | TensorFlow |
| TF-TRT FP16 | 8.14 | 5.39 | 71.90 | 0.0056 | 0.48 | TensorFlow |
| TF-TRT Int8 | 8.01 | 6.65 | 59.36 | 0.0067 | 0.51 | TensorFlow |
| Global Prune | 7.73 | 17.03 | 42.90 | 0.0023 | 0.508 | PyTorch |
| Local Prune | 7.74 | 14.15 | 32.57 | 0.0027 | 0.428 | PyTorch |
| QAT | 7.94 | 11.47 | 26.45 | 0.0188 | 0.45 | PyTorch |
| Prune + Quantization | 8.84 | 4.61 | 34.40 | 0.0125 | 0.51 | PyTorch |

Montreal

All simulations were conducted on the same NVIDIA GeForce GTX 1050 (4 GB, PCIe/SSE2) GPU with a batch size of 1.

| Method | Average speed | Position deviation MAE | Brain iteration frequency (RT, Hz) | Mean inference time (s) |
|---|---|---|---|---|
| PilotNet (circuit not completed) | 9.431 | - | - | 0.1228 |
| TF-TRT FP32 | 10.12 | 6.55 | 70.88 | 0.0062 |
| TF-TRT FP16 | 10.05 | 6.2 | 99.30 | 0.0049 |
| TF-TRT Int8 | 10.23 | 6.02 | 66.31 | 0.0065 |

Offline Evaluation

All evaluations on the test set were conducted on an NVIDIA GPU. All subsets of the new dataset are used for the experiments; the test set covers the SimpleCircuit, Montreal, and Montmelo circuits.
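For reference, the sketch below shows one way the offline metrics (MSE and mean per-sample inference time at batch size 1) can be measured for a TFLite model. The model path and the `(images, labels)` arrays are hypothetical placeholders, not the project's actual evaluation script.

```python
# Minimal sketch of the offline evaluation: MSE and mean per-sample
# inference time for a TFLite model at batch size 1. Model path and
# test arrays are hypothetical placeholders.
import time

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="pilotnet_dynamic_range.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def evaluate(images: np.ndarray, labels: np.ndarray):
    """Run the test set one sample at a time, timing each invocation."""
    preds, times = [], []
    for img in images:
        x = img[np.newaxis].astype(inp["dtype"])  # add batch dimension
        interpreter.set_tensor(inp["index"], x)
        start = time.perf_counter()
        interpreter.invoke()
        times.append(time.perf_counter() - start)
        preds.append(interpreter.get_tensor(out["index"])[0])
    mse = float(np.mean((np.asarray(preds) - labels) ** 2))
    return mse, float(np.mean(times))
```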

Optimization strategies

| Method | Model size (MB) | MSE | Inference time (s) | Framework |
|---|---|---|---|---|
| PilotNet | 195 | 0.041 | 0.0364 | TensorFlow |
| Baseline | 64.92 | 0.0411 | 0.0079 | TensorFlow Lite |
| Dynamic Range Q | 16.24 | 0.0410 | 0.0049 | TensorFlow Lite |
| Float16 Q | 32.46 | 0.0411 | 0.0079 | TensorFlow Lite |
| Q-aware training | 16.25 | 0.0421 | 0.0096 | TensorFlow Lite |
| Weight pruning | 64.92 | 0.0426 | 0.0077 | TensorFlow Lite |
| Weight pruning + Q | 16.24 | 0.0426 | 0.0048 | TensorFlow Lite |
| Integer only Q | 16.24 | 28157.72 | 0.0079 | TensorFlow Lite |
| Integer (float fallback) Q | 16.24 | 0.0451 | 0.0078 | TensorFlow Lite |
| CQAT | 16.25 | 0.0394 | 0.0077 | TensorFlow Lite |
| PQAT | 16.25 | 0.0437 | 0.0079 | TensorFlow Lite |
| PCQAT | 16.25 | 0.0392 | 0.0079 | TensorFlow Lite |
| TF-TRT FP32 | 260 | 0.0410 | 0.0013 | TensorFlow |
| TF-TRT FP16 | 260 | 0.0410 | 0.0022 | TensorFlow |
| TF-TRT Int8 | 260 | 0.0410 | 0.0012 | TensorFlow |
| Baseline | 6.12 | 0.0788 | 0.0022 | PyTorch |
| Dynamic Range Q | 1.95 | 0.0784 | 0.0032 | PyTorch |
| Static Q | 1.61 | 0.0788 | 0.0027 | PyTorch |
| Q-aware training | 1.61 | 0.0708 | 0.0028 | PyTorch |
| Local Prune | 6.12 | 0.0729 | 0.0021 | PyTorch |
| Global Prune | 6.12 | 0.0708 | 0.0022 | PyTorch |
| Prune + Quantization | 1.61 | 0.0672 | 0.0027 | PyTorch |

Summarizing the improvements

TensorFlow framework

  • We achieved a ~12x reduction in model memory size with dynamic range quantization (195 MB → 16.24 MB); a minimal conversion sketch follows this list.
  • We maintain an offline MSE on par with the baseline (at best ~0.002 lower, with CQAT and PCQAT).
  • We achieved ~31x faster inference with TensorRT Int8 optimization and ~7.5x faster inference with dynamic range quantization in offline evaluation.
  • In simulation, the position deviation MAE drops to ~0.66x of the baseline and the brain iteration frequency (RT) is ~12x higher.
  • Mean inference time in simulation improves by ~22x.
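
As a reference for the dynamic range quantization results above, a minimal sketch of the post-training conversion with the TFLite converter; the SavedModel directory and output filename are placeholders.

```python
# Minimal sketch of post-training dynamic-range quantization;
# paths are hypothetical placeholders.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("pilotnet_saved_model")
# Optimize.DEFAULT with no representative dataset applies dynamic-range
# quantization: weights stored as int8, activations kept in float at runtime.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("pilotnet_dynamic_range.tflite", "wb") as f:
    f.write(tflite_model)
```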

PyTorch framework

  • We achieved, at best, a ~4x memory reduction with the static quantization technique (6.12 MB → 1.61 MB).
  • Prune + Quantization improves the MSE by ~0.011 over the baseline; the other techniques also give slightly better performance.
  • All methods have inference times on the same order of magnitude (10⁻³ s); the local prune strategy has the fastest (CPU) inference.
  • Prune + Quantization gives the best performance overall, though all other strategies are very close; a pruning sketch follows this list.
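
A minimal sketch of global unstructured pruning with torch.nn.utils.prune, with the local variant noted in a comment. The tiny stand-in model and the 30% sparsity level are illustrative only, not the experiment's actual configuration.

```python
# Minimal sketch of global unstructured pruning with torch.nn.utils.prune.
# The stand-in model and 30% sparsity are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(  # stand-in for the trained PyTorch PilotNet
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

# Every prunable (module, parameter) pair in the network.
parameters_to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))
]

# Global prune: drop the 30% of weights with the smallest L1 magnitude across
# all layers at once. A local prune instead calls
# prune.l1_unstructured(module, name="weight", amount=0.3) per layer.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.3,
)

# Fold the pruning masks back into the weight tensors before saving.
for module, name in parameters_to_prune:
    prune.remove(module, name)
```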

Recommendations

  • PyTorch-optimized models are smaller and have better inference times. I would recommend starting with the global/local prune strategies, because they also support GPU inference; Prune + Quantization has the lowest MSE but supports CPU inference only.
  • TFLite-optimized models give better performance than the original model at a fraction of the memory size. Installation is easy and there are no specific hardware constraints. I would recommend dynamic range quantization as the first optimization method.
  • TensorRT-optimized models have the best performance in both offline evaluation and simulation, but they have a large memory footprint. If disk space is not a constraint, I would recommend the Int8 or Float16 precision models; a conversion sketch follows this list.
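
For the TensorRT route, a minimal TF-TRT conversion sketch at FP16 precision, assuming a placeholder SavedModel path; Int8 additionally requires a calibration input function passed to convert().

```python
# Minimal sketch of a TF-TRT conversion at FP16 precision; paths are
# hypothetical placeholders. Int8 additionally needs a calibration_input_fn
# passed to converter.convert().
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="pilotnet_saved_model",
    conversion_params=params,
)
converter.convert()                  # replaces supported subgraphs with TRT engines
converter.save("pilotnet_trt_fp16")  # larger on disk, as noted above
```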
