Preliminaries

During this week, we focused on the challenge of integrating ONNX model support within the RoboticsAcademy Docker Image (RADI). We explored ways to reliably load and run ONNX models using PyTorch, targeting GPU acceleration. This required analyzing the compatibility of available tools, identifying runtime issues, and designing a solution that is both stable and lightweight.

Additionally, this week includes an evaluation of the performance of deep learning models in the Follow-Line exercise.

Objectives

  • Test and debug ONNX models using onnx2pytorch (last week's solution).
  • Evaluate multiple solutions to ensure stable and robust GPU acceleration within RADI.
  • Install PyTorch with CUDA support inside the container and verify its functionality.
  • Test line-following performance with four deep learning models.

Execution

🏷️ ONNX Model Testing and Debugging Using onnx2pytorch

In Week 08, I enabled GPU acceleration in the RoboticsAcademy Docker Image (RADI) using the onnx2pytorch package. Our ultimate goal is to provide stable and robust GPU acceleration within RADI, which will support deep learning-based exercises such as Human Detection, Digit Classification, and end-to-end visual control of an autonomous vehicle.

To move toward this objective, I loaded existing Open Neural Network Exchange (ONNX) models using the onnx2pytorch package, which we began integrating last week. In general, the models loaded successfully, though some emitted non-critical warnings that appear to be ignorable.
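For reference, the basic conversion pattern we used looks roughly like this (the model path and input shape are placeholders, not the actual exercise files):

```python
import onnx
import torch
from onnx2pytorch import ConvertModel

# Load the ONNX graph (path is a placeholder).
onnx_model = onnx.load("models/pilotnet.onnx")

# Convert it to an equivalent torch.nn.Module.
pytorch_model = ConvertModel(onnx_model)
pytorch_model.eval()

# Dummy forward pass to confirm the converted graph executes
# (a PilotNet-style input shape is assumed here).
with torch.no_grad():
    output = pytorch_model(torch.randn(1, 3, 66, 200))
print(output.shape)
```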

However, we also encountered a few compatibility issues during testing. These issues may impact the reliability and portability of the solution and need to be addressed before full integration. Below are some of the problems we identified:

Ignorable warnings:
week-9-img-1
Unsupported Ops:

While testing the ONNX model using the onnx2pytorch package, which converts ONNX models to PyTorch, I encountered the following error:

Conversion not implemented for op_type=If
week-9-img-1

The op_type=If issue reflects a common tension between static vs. dynamic computation graphs. ONNX sits somewhere in the middle, and while ONNX Runtime handles both, conversion to PyTorch (a mostly static graph approach with nn.Module) can't easily translate conditional paths.
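Before attempting a conversion, one can scan the graph for such control-flow ops up front; a minimal sketch (the model path is hypothetical):

```python
import onnx

# Collect the operator types used by the model (path is hypothetical).
model = onnx.load("models/pilotnet.onnx")
op_types = {node.op_type for node in model.graph.node}

# If/Loop/Scan encode dynamic control flow, which static converters
# such as onnx2pytorch cannot always map onto an nn.Module graph.
problematic = op_types & {"If", "Loop", "Scan"}
if problematic:
    print(f"Control-flow ops present: {sorted(problematic)}")
```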

Summary: Issue with onnx2pytorch Package

We initially attempted to convert ONNX models to PyTorch using the onnx2pytorch library. While it worked with basic models, it failed to load our custom ONNX model, likely due to unsupported operators or internal conversion bugs.

  • We observed exceptions and incorrect graph parsing.
  • We realized onnx2pytorch lacks full support for complex or dynamic ONNX graphs.
  • This prompted us to seek a more stable and production-ready alternative.

🏷️ Moving to a stable and robust solution

After encountering stability and robustness issues with the onnx2pytorch approach, we evaluated two alternative solutions.

week-7-img-1 Solution 01
week-7-img-2 Solution 02
✅ Solution 01: onnxruntime-gpu + CUDA-Toolkit/cuDNN Stack

This approach involved:

  1. Installing onnxruntime-gpu in the RADI container.
  2. Manually installing CUDA Toolkit and cuDNN inside the Docker image to enable GPU support.

Pros: Direct ONNX model inference on GPU with onnxruntime

Cons:

  • Large image size due to full CUDA/cuDNN installation.
  • Dependency management becomes more complex.
  • Limited flexibility in chaining with PyTorch-based workflows.
  • Dependency conflicts arose, affecting compatibility with other components.

After testing, this solution proved too heavy and complex to maintain within the lightweight scope of RADI.

✅ Solution 02: onnxruntime-gpu + PyTorch with CUDA Support (Preferred)

This solution uses:

  • onnxruntime-gpu for ONNX inference.
  • PyTorch with CUDA support installed in RADI to allow flexible DL model operations.

Why we chose this:

  • Leverages PyTorch for any extra processing or hybrid pipelines.
  • Smaller image size than full CUDA install.
  • Easier to maintain and debug.
  • Verified compatibility with existing DL exercises and Docker infrastructure.
🏷️ Using ONNX Runtime GPU with PyTorch and CUDA Support for Accelerated Inference (Solution 02: Preferred Setup)

The onnxruntime-gpu package is designed to work seamlessly with PyTorch, provided both are built against the same major version of CUDA and cuDNN. When installing PyTorch with CUDA support, the necessary CUDA and cuDNN DLLs are included, eliminating the need for separate installations of the CUDA toolkit or cuDNN.

To ensure ONNX Runtime utilizes the DLLs installed by PyTorch, you can preload these libraries before creating an inference session. This can be achieved by either importing PyTorch or by using the onnxruntime.preload_dlls() function.

Example 1: Importing PyTorch

week-9-img-5
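The pattern in the snippet above is roughly the following (the model path is a placeholder):

```python
import torch  # Importing torch first loads its bundled CUDA/cuDNN libraries.
import onnxruntime as ort

# The session created afterwards picks up the already-loaded GPU libraries.
session = ort.InferenceSession(
    "models/pilotnet.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())
```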

Example 2: Using preload_dlls Function

week-9-img-6
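Alternatively, recent onnxruntime releases expose preload_dlls() to load the GPU libraries explicitly, without relying on an earlier torch import (the model path is again a placeholder):

```python
import onnxruntime as ort

# Explicitly preload the CUDA/cuDNN shared libraries shipped with PyTorch.
ort.preload_dlls()

session = ort.InferenceSession(
    "models/pilotnet.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```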

🏷️ Validation of ONNXRuntime-GPU and PyTorch with CUDA Support

To verify that the ONNX Runtime is using the correct CUDA and cuDNN libraries, you can check the environment variables or use the ONNX Runtime API to print the provider information. Some ways to verify:

1. Verify PyTorch Has CUDA Support:

week-9-img-6
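Along the lines of:

```python
import torch

print(torch.__version__)               # PyTorch version
print(torch.cuda.is_available())       # True if a CUDA device is visible
print(torch.version.cuda)              # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())  # Bundled cuDNN version
print(torch.cuda.get_device_name(0))   # GPU model name
```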

2. Verify ONNX Runtime Uses GPU:

week-9-img-6
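For example:

```python
import onnxruntime as ort

# CUDAExecutionProvider should be listed if the GPU build is installed.
print(ort.get_available_providers())

# A session also reports the providers it actually ended up using.
session = ort.InferenceSession(
    "models/pilotnet.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())
```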

3. Monitoring GPU Usage:

watch nvidia-smi
week-9-img-6

4. Compare Speed (CPU vs GPU):

After verifying the versions and confirming CUDA availability, I processed 17,862 follow-line exercise images in both CPU and GPU modes. I copied all the images and the Python script into the RADI container. The performance difference is shown below:

week-9-img-6

The GPU mode is significantly faster, processing all 17,862 images in 46.90 seconds compared to 112.96 seconds on CPU.

week-9-img-11

This confirms that the ONNX Runtime is correctly utilizing the GPU for inference, providing a substantial performance boost over CPU execution.
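For reference, a sketch of the kind of timing harness used for this comparison (the model path and image preprocessing are simplified assumptions; the real run iterated over all 17,862 dataset images):

```python
import time
import numpy as np
import onnxruntime as ort

def benchmark(providers, images):
    """Time inference over a list of preprocessed image arrays."""
    session = ort.InferenceSession("models/pilotnet.onnx",  # placeholder path
                                   providers=providers)
    input_name = session.get_inputs()[0].name
    start = time.perf_counter()
    for img in images:
        session.run(None, {input_name: img})
    return time.perf_counter() - start

# Stand-in for the real dataset: random arrays with the model's input shape.
images = [np.random.rand(1, 3, 66, 200).astype(np.float32) for _ in range(1000)]
print("CPU:", benchmark(["CPUExecutionProvider"], images))
print("GPU:", benchmark(["CUDAExecutionProvider", "CPUExecutionProvider"], images))
```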


Summary:

I chose Solution 02: onnxruntime-gpu + PyTorch with CUDA Support as our final path forward due to its balance of performance, flexibility, and maintainability. This will become the standard setup for future RoboticsAcademy DL exercises that use ONNX models.

🏷️ Evaluation of Deep Learning Model Performance in Follow-Line Exercise

In Week 07, the focus was on preparing and training deep learning models using circuit-based datasets for the follow-line exercise. I created four distinct datasets by splitting data from four different circuits, using three circuits for training and one for testing in each combination. To understand the steering behavior within each dataset, I visualized the angular velocity distributions, which highlighted the diversity and imbalance in steering commands. To address this, I applied upsampling and downsampling techniques to balance the datasets, ensuring that each steering class contained 5,000 samples. Finally, I trained four models, one per balanced dataset, using the NVIDIA PilotNet architecture, a well-known framework for end-to-end autonomous driving tasks. I also provided training and testing loss visualizations, along with evaluation metrics, for all four models to assess their performance and generalization.
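For context, the class-balancing step from Week 07 can be sketched as follows (the file and column names are hypothetical):

```python
import pandas as pd

# One row per sample, with a discretized steering-class column
# (file and column names are hypothetical).
df = pd.read_csv("datasets/train_combination_1.csv")

TARGET = 5000  # samples per steering class after balancing
balanced = (
    df.groupby("steering_class", group_keys=False)
      # Downsample large classes; upsample small ones with replacement.
      .apply(lambda g: g.sample(n=TARGET, replace=len(g) < TARGET, random_state=42))
      .reset_index(drop=True)
)
print(balanced["steering_class"].value_counts())
```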

This week, I evaluated four deep learning models using the Follow-Line exercise in RoboticsAcademy, which features four unique circuits. Each model was tested on a circuit it had not previously seen during training, ensuring that the evaluation was performed on truly unseen data. For each test, one circuit was reserved exclusively as the test dataset while the model was trained on data from the other three circuits. This approach allowed for a rigorous assessment of the models' generalization capabilities across different environments.
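The evaluation protocol is a simple leave-one-circuit-out loop:

```python
# Leave-one-circuit-out: train on three circuits, test on the held-out one.
circuits = ["Simple", "Montreal", "Montmelo", "Nürburgring"]

for test_circuit in circuits:
    train_circuits = [c for c in circuits if c != test_circuit]
    print(f"Train on {train_circuits}, test on {test_circuit}")
```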

Test 01:

Test Circuit: Simple Circuit

Train Dataset: Montreal, Montmelo, and Nürburgring circuits

Result:

On the Simple Circuit, the model completed approximately 79% of the track before deviating from the path. It maintained proper line detection for 69 seconds.

Test 02:

Test Circuit: Montreal Circuit

Train Dataset: Simple, Montmelo, and Nürburgring circuits

Result:

On the Montreal Circuit, the model completed approximately 6% of the track before deviating from the path. It maintained proper line detection for 10 seconds.

Test 03:

Test Circuit: Montmelo Circuit

Train Dataset: Simple, Montreal, and Nürburgring circuits

Result:

On the Montmelo Circuit, the model completed approximately 86% of the track before deviating from the path. It maintained proper line detection for 128 seconds.

Test 04:

Test Circuit: Nürburgring Circuit

Train Dataset: Simple, Montreal, and Montmelo circuits

Result:

On the Nürburgring Circuit, the model successfully completed the entire track without any deviations. It maintained consistent line detection and stable movement throughout, completing the track in approximately 111 seconds and handling both straight sections and curves with accuracy.

I ran each test multiple times, and the results were consistent across runs.

References

[1] RoboticsAcademy Docker Image (RADI)

[2] onnx2pytorch package

[3] Human Detection Exercise

[4] Digit Classification Exercise

[5] Open Neural Network Exchange (ONNX)

[6] CUDA Toolkit

[7] cuDNN

[8] PyTorch

[9] follow-line exercise

[10] NVIDIA PilotNet