Coding Period: Week 9
Preliminaries
Last week, I researched the existing quantization strategies supported by PyTorch. As a result of that research and the follow-up experiments, I have implemented three variants of quantization for the PilotNet model. I also needed to change the PyTorch implementation of PilotNet to support some quantization operations. I benchmarked the previously trained model and retrained it with the new implementation. The retrained model has been optimized with the implemented quantization techniques, and the results are presented in this blog. I have also benchmarked the TF-TRT optimized PilotNet (TensorFlow) on the Montreal circuit after solving the issue with stat recording.
Objectives
- Benchmark the current PilotNet trained model
- Implement Static quantization
- Implement Quantization aware training
- Optimize the PilotNet model and do offline evaluation
- TF-TRT Simulation on additional circuits
Additional
- Reimplementation and retraining of original PilotNet model in PyTorch
Related Issues and Pull requests.
Code repository:
Execution
PilotNet benchmarking and retraining
The available model has MSE = 35.15 for the test circuits (SimpleCircuit, Montreal and Montemelo). I reimplemented and retrained the model, after which the performance improved to MSE = 0.0719, which is close to its TensorFlow variant. The updated scripts are part of the new branch torch_optim, and I will open a PR after sufficient evaluation of the optimized models.
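To illustrate the kind of change that quantization needs in the model definition, here is a minimal sketch of static (post-training) quantization with the PyTorch eager-mode API [5]. `TinyPilotNet`, its layer sizes, the 16x16 input and the random calibration data are hypothetical stand-ins, not the real PilotNet or dataset. The point is only the `QuantStub`/`DeQuantStub` pair inside the model and the prepare, calibrate, convert sequence.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    DeQuantStub, QuantStub, convert, get_default_qconfig, prepare,
)


class TinyPilotNet(nn.Module):
    """Hypothetical, much smaller stand-in for PilotNet (not the real layers)."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # float -> int8 at the input (after convert)
        self.conv = nn.Conv2d(3, 8, kernel_size=3, stride=2)
        self.relu1 = nn.ReLU()
        self.fc1 = nn.Linear(8 * 7 * 7, 16)
        self.relu2 = nn.ReLU()
        self.fc2 = nn.Linear(16, 2)   # assumed outputs: linear and angular velocity
        self.dequant = DeQuantStub()  # int8 -> float at the output (after convert)

    def forward(self, x):
        x = self.quant(x)
        x = self.relu1(self.conv(x))
        x = torch.flatten(x, 1)
        x = self.relu2(self.fc1(x))
        return self.dequant(self.fc2(x))


# Pick a quantized backend that this PyTorch build supports.
engines = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
torch.backends.quantized.engine = backend

torch.manual_seed(0)
float_model = TinyPilotNet().eval()
float_model.qconfig = get_default_qconfig(backend)

prepared = prepare(float_model)           # returns a copy with observers inserted
with torch.no_grad():
    for _ in range(8):                    # calibration on random stand-in images
        prepared(torch.rand(4, 3, 16, 16))
static_model = convert(prepared)          # swaps observed modules for int8 ones

with torch.no_grad():
    prediction = static_model(torch.rand(1, 3, 16, 16))
print(prediction.shape, type(static_model.fc2).__module__)
```

In a real setup the calibration loop would iterate over a few batches of training images instead of random tensors. Fusing Conv/Linear layers with their ReLU through `fuse_modules` before `prepare` is a common extra step that is left out here for brevity.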
Optimization and results
In total, three strategies are implemented: Dynamic range quantization, Static quantization and Quantization aware training. More theoretical details can be found in last week's blog.
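As a rough sketch of how dynamic range quantization and quantization aware training (QAT) differ in code, the example below applies both to a toy network with the eager-mode API [5]. `TinyPilotNet`, the 16x16 inputs, the random targets, the optimizer and the five training steps are assumptions made for illustration only; they do not reproduce the actual training scripts.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    DeQuantStub, QuantStub, convert, get_default_qat_qconfig,
    prepare_qat, quantize_dynamic,
)


class TinyPilotNet(nn.Module):
    """Hypothetical, much smaller stand-in for PilotNet (not the real layers)."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, stride=2)
        self.relu1 = nn.ReLU()
        self.fc1 = nn.Linear(8 * 7 * 7, 16)
        self.relu2 = nn.ReLU()
        self.fc2 = nn.Linear(16, 2)
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu1(self.conv(x))
        x = torch.flatten(x, 1)
        x = self.relu2(self.fc1(x))
        return self.dequant(self.fc2(x))


engines = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
torch.backends.quantized.engine = backend
torch.manual_seed(0)

# 1) Dynamic range quantization: weights of the Linear layers are stored as
#    int8, activations are quantized on the fly. No calibration or training.
#    The stubs are left untouched and simply pass the tensor through.
dynamic_model = quantize_dynamic(TinyPilotNet().eval(), {nn.Linear}, dtype=torch.qint8)

# 2) Quantization aware training: fake-quantization modules are inserted,
#    the model is trained with them, and only then converted to int8.
qat_model = TinyPilotNet().train()
qat_model.qconfig = get_default_qat_qconfig(backend)
qat_model = prepare_qat(qat_model)
optimizer = torch.optim.SGD(qat_model.parameters(), lr=1e-3)
for _ in range(5):                        # stand-in for real fine-tuning epochs
    images, targets = torch.rand(8, 3, 16, 16), torch.rand(8, 2)
    loss = nn.functional.mse_loss(qat_model(images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
qat_int8 = convert(qat_model.eval())

sample = torch.rand(1, 3, 16, 16)
with torch.no_grad():
    print(dynamic_model(sample).shape, qat_int8(sample).shape)
```

Dynamic quantization is a single call on an already trained model, while QAT needs a training loop, which is the price for letting the weights adapt to the quantization error.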
Offline evaluation
All the experiments are conducted on an NVIDIA GeForce 1080 GPU with 8 GB of memory. The batch size was 1 for inference and 1024 for evaluation. All subsets of the new datasets are used for the experiments; the test set consists of the SimpleCircuit, Montreal and Montemelo circuits.
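The three reported metrics can be measured along the following lines. This is a minimal sketch under my own assumptions, not the evaluation script itself: the helper names, the toy model and the random data are hypothetical, size is taken from the serialized `state_dict`, and timing is done on CPU.

```python
import io
import time

import torch
import torch.nn as nn


def model_size_mb(model: nn.Module) -> float:
    """Size of the serialized state_dict in MB (one assumed way to measure it)."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / (1024 ** 2)


@torch.no_grad()
def evaluate_mse(model, images, targets, batch_size=1024):
    """Mean squared error over the whole set, accumulated batch by batch."""
    squared_error, count = 0.0, 0
    for start in range(0, len(images), batch_size):
        prediction = model(images[start:start + batch_size])
        target = targets[start:start + batch_size]
        squared_error += torch.sum((prediction - target) ** 2).item()
        count += target.numel()
    return squared_error / count


@torch.no_grad()
def mean_inference_time(model, sample, runs=50, warmup=5):
    """Average seconds per forward pass for a single-sample batch."""
    for _ in range(warmup):               # warm-up runs are not timed
        model(sample)
    # On a GPU, torch.cuda.synchronize() would be needed before reading the clock.
    start = time.perf_counter()
    for _ in range(runs):
        model(sample)
    return (time.perf_counter() - start) / runs


torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 2)).eval()
images, targets = torch.rand(2500, 3, 16, 16), torch.rand(2500, 2)

size_mb = model_size_mb(toy_model)
mse = evaluate_mse(toy_model, images, targets, batch_size=1024)
seconds = mean_inference_time(toy_model, images[:1])
print(f"size={size_mb:.4f} MB  mse={mse:.4f}  time={seconds:.6f} s")
```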
Result table
Method | Model size (MB) | MSE | Inference time (s) |
---|---|---|---|
Baseline | 6.118725776672363 | 0.08102315874947678 | 0.0016005966663360596 |
Dynamic Range Q | 1.9464960098266602 | 0.0807250236781935 | 0.0021979384422302246 |
Static Q | 1.6051549911499023 | 0.08005780008222375 | 0.0020941667556762696 |
QAT | 1.6069040298461914 | 0.06957647389277727 | 0.0020713112354278566 |
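To make the table easier to compare against the baseline, the relative changes can be worked out with a few lines of arithmetic. The numbers are copied from the table above; the variable names are only illustrative.

```python
# (model size in MB, MSE, inference time in s), copied from the result table
results = {
    "Baseline": (6.118725776672363, 0.08102315874947678, 0.0016005966663360596),
    "Dynamic Range Q": (1.9464960098266602, 0.0807250236781935, 0.0021979384422302246),
    "Static Q": (1.6051549911499023, 0.08005780008222375, 0.0020941667556762696),
    "QAT": (1.6069040298461914, 0.06957647389277727, 0.0020713112354278566),
}

base_size, base_mse, base_time = results["Baseline"]
summary = {}
for method, (size, mse, seconds) in results.items():
    summary[method] = {
        "size_reduction": base_size / size,   # how many times smaller than baseline
        "mse_change": mse - base_mse,         # negative means lower error
        "time_ratio": seconds / base_time,    # above 1 means slower than baseline
    }
    print(
        f"{method:16s} size x{summary[method]['size_reduction']:.2f}  "
        f"MSE {summary[method]['mse_change']:+.4f}  "
        f"time x{summary[method]['time_ratio']:.2f}"
    )
```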
Observations
- We store the original PilotNet model in TorchScript [6] format, so there is no longer any need to define the PilotNet model architecture in order to use the saved model.
- We obtained, at best, a more than 3x reduction in model size (about 3.8x, from 6.12 MB to 1.61 MB) with static quantization.
- The MSE improves by 0.011 with quantization aware training. The other techniques also give slightly better performance than the baseline.
- All methods have inference times of the same order of magnitude (10^-3 s). Among the optimization techniques, quantization aware training has the lowest inference time, although every quantized model is slightly slower than the baseline in this measurement.
- Quantization aware training gives the best performance overall, although the other strategies are very close.
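The TorchScript point from the observations can be sketched as follows. `TinyPilotNet` is again a hypothetical stand-in, and an in-memory buffer takes the place of a model file on disk; this is an illustration of the mechanism, not the storage code used for the released weights.

```python
import io

import torch
import torch.nn as nn


class TinyPilotNet(nn.Module):
    """Hypothetical stand-in; only needed on the side that exports the model."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, stride=2)
        self.fc = nn.Linear(8 * 7 * 7, 2)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.fc(torch.flatten(x, 1))


torch.manual_seed(0)
model = TinyPilotNet().eval()

# Export: compile to TorchScript and serialize code plus weights together.
scripted = torch.jit.script(model)
buffer = io.BytesIO()                 # stands in for a file on disk
torch.jit.save(scripted, buffer)

# Import: torch.jit.load rebuilds the module from the archive alone,
# without access to the TinyPilotNet class definition.
buffer.seek(0)
restored = torch.jit.load(buffer)

sample = torch.rand(1, 3, 16, 16)
with torch.no_grad():
    same = torch.allclose(model(sample), restored(sample))
print("outputs match:", same)
```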
Simulation with Montreal
All the simulations are conducted on a 4 GB NVIDIA GeForce GTX 1050/PCIe/SSE2 GPU with a batch size of 1.
Method | Average speed | Position deviation MAE | Brain iteration frequency (RT) | Mean Inference time (s) |
---|---|---|---|---|
PilotNet (circuit not completed) | 9.431 | - | - | 0.1228 |
TF-TRT FP32 | 10.12 | 6.55 | 70.88 | 0.0062 |
TF-TRT FP16 | 10.05 | 6.2 | 99.30 | 0.0049 |
TF-TRT Int8 | 10.23 | 6.02 | 66.31 | 0.0065 |
Important links
- Results page - https://theroboticsclub.github.io/gsoc2022-Nikhil_Paliwal/gsoc/Results-Summary/
- Trained weights - https://drive.google.com/drive/folders/1qTQ8Fc7OqBElU8M2llO_P-8FaA7yhiP5?usp=sharing
References
[1] https://github.com/JdeRobot/BehaviorMetrics
[2] https://github.com/JdeRobot/DeepLearningStudio
[3] https://developer.nvidia.com/tensorrt
[4] https://pytorch.org/blog/quantization-in-practice/
[5] https://pytorch.org/docs/1.12/quantization.html
[6] https://pytorch.org/docs/stable/jit.html