Week 8: July 21 ~ July 27 | Md. Shariar Kabir

Preliminaries

In Week 8 of my GSoC journey, I focused on enabling GPU acceleration for deep learning tasks inside the RoboticsAcademy Docker Image (RADI). Training and inference with PyTorch can be computationally expensive, especially in real-time simulations, so NVIDIA CUDA and cuDNN support inside the container is essential for fast, efficient model execution.

Additionally, I tested the newly released version of RADI (v5.8.0), identified some bugs related to dependencies and model execution, and contributed by creating pull requests to fix these issues.

Objectives

Enable GPU acceleration (CUDA/cuDNN) in RADI
Test the newly released RADI (v5.8.0)
Report bugs as issues in the GitHub repository

Execution

Enabling GPU Acceleration with PyTorch

To accelerate deep learning workloads in RoboticsAcademy, I worked on enabling CUDA and cuDNN support inside the official RADI. GPU-based tasks such as training or running PyTorch models require the CUDA and cuDNN libraries, which were not pre-configured in the existing RADI setup.

We therefore installed the CUDA and cuDNN packages provided by NVIDIA and use them through PyTorch.

To enable this:

I installed the GPU build of PyTorch inside the Docker container using pip, choosing the build that matches the CUDA runtime version.
I verified CUDA and cuDNN compatibility inside the container.
I ran validation checks such as torch.cuda.is_available().
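The validation step can be sketched as a small script run inside the container. This is a minimal sketch, not the exact checks I ran: the helper name gpu_report is hypothetical, and it only uses standard PyTorch calls (torch.cuda.is_available(), torch.backends.cudnn.is_available(), torch.version.cuda). It returns a plain dictionary so it also works when PyTorch is missing or only a CPU build is installed.

```python
import importlib.util


def gpu_report():
    """Describe the PyTorch GPU setup visible to this process.

    `gpu_report` is a hypothetical helper name used for illustration only.
    """
    report = {
        "torch_installed": False,
        "cuda_available": False,
        "cudnn_available": False,
        "device_name": None,
    }

    # If PyTorch is not installed, return the defaults instead of crashing.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    report["cudnn_available"] = bool(torch.backends.cudnn.is_available())

    if report["cuda_available"]:
        # Only query device details when a CUDA device is actually usable.
        report["device_name"] = torch.cuda.get_device_name(0)
        report["cuda_version"] = torch.version.cuda
        report["cudnn_version"] = torch.backends.cudnn.version()

    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False inside the container, the usual suspects are a CPU-only PyTorch build or a container started without access to the host GPU.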

In RoboticsAcademy deep learning exercises such as Human Detection and Digit Classification, the input models are expected in the ONNX (Open Neural Network Exchange) format. However, PyTorch cannot execute ONNX models directly.

To address this, I installed the onnx and onnx2pytorch Python packages. These tools allow me to load ONNX models and convert them into PyTorch modules, enabling inference and integration within the exercise pipeline. With this setup, pre-trained models in ONNX format can be used in RoboticsAcademy simulations without re-exporting them.
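The conversion can be sketched roughly as follows. This is an illustrative sketch rather than the code used in the exercises: the function name load_onnx_as_torch and the model path are hypothetical, while onnx.load, onnx.checker.check_model and onnx2pytorch's ConvertModel are the packages' own entry points. Note that onnx2pytorch may not support every ONNX operator, so some models could fail to convert.

```python
from pathlib import Path


def load_onnx_as_torch(model_path, device=None):
    """Load an ONNX file and return it as a PyTorch module in eval mode.

    `load_onnx_as_torch` is a hypothetical helper name used for illustration.
    """
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"ONNX model not found: {path}")

    # Imported lazily so the path check above works without these packages.
    import onnx
    import torch
    from onnx2pytorch import ConvertModel

    onnx_model = onnx.load(str(path))
    # Validate the graph before attempting the conversion.
    onnx.checker.check_model(onnx_model)

    # ConvertModel wraps the ONNX graph as a torch.nn.Module.
    model = ConvertModel(onnx_model)

    # Assumption: use the GPU when CUDA is available, otherwise the CPU.
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    return model.to(device).eval()
```

A call such as load_onnx_as_torch("model.onnx") (the file name is a placeholder) would then return a module that can be called on input tensors placed on the same device.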

The diagram provides a clear understanding of the execution flow of the operation.

We created another video demonstrating the Follow-Line exercise running with a deep learning model implemented in PyTorch, utilizing GPU acceleration through CUDA.

Testing the Latest RADI Release and Reporting Issues

I tested the latest release of RADI (v5.8.0) and found a problem in the Follow-Line exercise: it throws an error when the universe (circuit) is changed. I created issue #3169 in the GitHub repository to report the bug, which should help improve the overall functionality and stability of the Docker image.

References

[1] RoboticsAcademy Docker Image (RADI)

[2] PyTorch

[3] NVIDIA CUDA and cuDNN

[4] RADI (v5.8.0)