Coding Period: Week 9

2 minute read

Preliminaries

Last week, I did thorough research on the existing quantization strategies supported by PyTorch. As a result of that research and the development experiments, I have implemented three variants of quantization for the PilotNet model. To support some quantization operations, I also had to change the PyTorch implementation of PilotNet. I benchmarked the previously trained model and retrained it with the new implementation. The retrained model has been optimized with the implemented quantization techniques, and the results are presented in this blog. I have also benchmarked the TF-TRT optimized PilotNet (TensorFlow) on the Montreal circuit after solving the issue with stat recording.

Objectives

  • Benchmark the current PilotNet trained model
  • Implement Static quantization
  • Implement Quantization aware training
  • Optimize the PilotNet model and do offline evaluation
  • TF-TRT Simulation on additional circuits

Additional

  • Reimplementation and retraining of original PilotNet model in PyTorch

Code repository:

Execution

PilotNet benchmarking and retraining

The available model has MSE = 35.15 on the test circuits (SimpleCircuit, Montreal and Montemelo). I have reimplemented and retrained the model, which improved the MSE to 0.0719, close to its TensorFlow variant. The updated scripts are part of the new torch_optim branch, and I will open a PR after sufficient evaluation of the optimized models. A minimal sketch of the quantization-ready reimplementation is shown below.
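This sketch assumes NVIDIA's published PilotNet layer sizes (input 3x66x200) and a two-value output head; the actual model in torch_optim may differ. The QuantStub/DeQuantStub pair is the key change needed to make the model usable with static quantization and QAT.

```python
import torch
import torch.nn as nn

class PilotNet(nn.Module):
    """Sketch of a quantization-ready PilotNet. Layer sizes follow NVIDIA's
    original paper (input 3x66x200); the retrained variant may differ."""
    def __init__(self, num_outputs=2):  # steering + throttle (assumed)
        super().__init__()
        # QuantStub/DeQuantStub mark where tensors enter and leave the
        # quantized region; static quantization and QAT require them.
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),  # 64@1x18 feature map
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, num_outputs),
        )

    def forward(self, x):
        x = self.quant(x)        # fp32 -> quantized
        x = self.regressor(self.features(x))
        return self.dequant(x)   # quantized -> fp32
```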

Optimization and results

In total, three strategies are implemented: dynamic range quantization, static quantization and quantization aware training. More theoretical details can be found in last week's blog. The sketch below condenses the three workflows.
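This is a minimal sketch of PyTorch's eager-mode quantization APIs [5], not the exact project code; calib_loader is a hypothetical DataLoader of representative driving images.

```python
import torch

model = PilotNet()  # retrained FP32 model from the sketch above

# 1) Dynamic range quantization: weights are quantized to int8 ahead of
#    time, activations are quantized on the fly at inference.
dynamic_q = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

# 2) Static quantization: observers record activation ranges on
#    calibration data, then the model is converted to int8 kernels.
model.eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend
prepared = torch.quantization.prepare(model)
with torch.no_grad():
    for images, _ in calib_loader:
        prepared(images)  # calibration pass
static_q = torch.quantization.convert(prepared)

# 3) Quantization aware training: fake-quantization is simulated during
#    fine-tuning so the weights adapt to the reduced precision.
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
qat_prepared = torch.quantization.prepare_qat(model)
# ... fine-tune qat_prepared for a few epochs as usual ...
qat_prepared.eval()
qat_q = torch.quantization.convert(qat_prepared)
```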

Offline evaluation

All the evaluations are conducted on an NVIDIA GeForce 1080 GPU with 8 GB of memory. The batch size was 1 for inference and 1024 for evaluation. All subsets of the new dataset are used for the experiments; the test set consists of the SimpleCircuit, Montreal and Montemelo circuits.
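The inference times reported below are per-sample averages at batch size 1; a rough, illustrative way to measure them (the helper name is ours) is:

```python
import time
import torch

def mean_inference_time(model, sample, n_runs=1000, warmup=50):
    """Average per-sample latency at batch size 1 (illustrative helper)."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):       # warm-up runs stabilize caches/kernels
            model(sample)
        if sample.is_cuda:
            torch.cuda.synchronize()  # GPU kernels launch asynchronously
        start = time.perf_counter()
        for _ in range(n_runs):
            model(sample)
        if sample.is_cuda:
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return elapsed / n_runs

# e.g. mean_inference_time(static_q, torch.randn(1, 3, 66, 200))
```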

Result table

| Method          | Model size (MB) | MSE     | Inference time (s) |
| --------------- | --------------- | ------- | ------------------ |
| Baseline        | 6.12            | 0.08102 | 0.00160            |
| Dynamic Range Q | 1.95            | 0.08073 | 0.00220            |
| Static Q        | 1.61            | 0.08006 | 0.00209            |
| QAT             | 1.61            | 0.06958 | 0.00207            |

Observations

  • We store the original PilotNet model in TorchScript [6] format, so there is no longer a need to define the PilotNet model architecture to use the saved model (see the sketch after this list).
  • We achieved, at best, a roughly 3.8x memory reduction (6.12 MB → 1.61 MB) with the static quantization technique.
  • The MSE improves by about 0.011 with quantization aware training; the other techniques also give slightly better MSE than the baseline.
  • All methods have inference times on the same order of magnitude (10⁻³ s). Among the optimization techniques, quantization aware training has the lowest inference time.
  • Quantization aware training gives the best performance overall; however, all the other strategies are very close.
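As a sketch of the first observation: a TorchScript [6] model can be saved and later loaded without any Python class definition (the file name here is illustrative).

```python
import torch

# Script and save the quantized model (or torch.jit.trace(model, example_input)).
scripted = torch.jit.script(static_q)
scripted.save("pilotnet_static_q.pt")

# Later, anywhere: no PilotNet class definition is needed.
loaded = torch.jit.load("pilotnet_static_q.pt")
prediction = loaded(torch.randn(1, 3, 66, 200))
```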

Simulation with Montreal

All the simulations are conducted on an NVIDIA GeForce GTX 1050 GPU (4 GB) with a batch size of 1.

| Method                           | Average speed | Position deviation MAE | Brain iteration frequency (RT) | Mean inference time (s) |
| -------------------------------- | ------------- | ---------------------- | ------------------------------ | ----------------------- |
| PilotNet (circuit not completed) | 9.431         | -                      | -                              | 0.1228                  |
| TF-TRT FP32                      | 10.12         | 6.55                   | 70.88                          | 0.0062                  |
| TF-TRT FP16                      | 10.05         | 6.2                    | 99.30                          | 0.0049                  |
| TF-TRT Int8                      | 10.23         | 6.02                   | 66.31                          | 0.0065                  |
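For context, the TF-TRT variants in the table can be produced along these lines with TensorFlow's TrtGraphConverterV2 [3]; this is a sketch, not the exact commands used, and the SavedModel paths and calibration function are illustrative (exact TrtConversionParams usage varies by TensorFlow version).

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# FP16 conversion of a TensorFlow SavedModel (paths are illustrative).
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="pilotnet_savedmodel",
    conversion_params=params)
converter.convert()
converter.save("pilotnet_trt_fp16")

# INT8 additionally requires a calibration input function that yields
# representative input batches:
# params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.INT8)
# converter.convert(calibration_input_fn=my_calibration_fn)
```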

[1] https://github.com/JdeRobot/BehaviorMetrics
[2] https://github.com/JdeRobot/DeepLearningStudio
[3] https://developer.nvidia.com/tensorrt
[4] https://pytorch.org/blog/quantization-in-practice/
[5] https://pytorch.org/docs/1.12/quantization.html
[6] https://pytorch.org/docs/stable/jit.html