GSoC 2025: Project Recap
Robotics-Academy: Exercise on End-to-End Visual Control of an Autonomous Vehicle using Deep Learning.
Organization: JdeRobot
Contributor: Md. Shariar Kabir
Mentors: L. Roberto Morales, David Pascual-Hernández
Link to GSoC Project Page: Robotics-Academy: Exercise on End-to-End Visual Control of an Autonomous Vehicle Using Deep Learning
Summary
The past 14 weeks have been an incredible journey filled with learning, challenges, and achievements. In the initial phase, I explored the RoboticsAcademy and contributed to exercises such as Digit Classification and Human Detection, which helped me understand how deep learning models integrate into simulation-based robotics tasks.
The focus of my project was on creating a new End-to-End Visual Control exercise, where a deep learning model maps raw camera images directly to driving commands. To achieve this, I collected and processed large datasets from multiple Follow-Line circuits, applied preprocessing techniques such as image cropping, and balanced the data for effective training. I trained models based on NVIDIA PilotNet and later experimented with a ResNet architecture, testing their generalization across different circuits.
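For readers unfamiliar with PilotNet, the sketch below shows what such an end-to-end regression network looks like in PyTorch. It is only a minimal illustration: the 66×200 input resolution and the two-output (v, w) head are assumptions, not the exact configuration used in the project.

```python
# Minimal PilotNet-style regression network (illustrative, not the project's exact model).
import torch
import torch.nn as nn

class PilotNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 2),  # outputs: (v, w)
        )

    def forward(self, x):
        return self.regressor(self.features(x))

model = PilotNet()
dummy = torch.randn(1, 3, 66, 200)  # one 66x200 RGB crop (assumed input size)
print(model(dummy).shape)           # -> torch.Size([1, 2])
```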
Alongside model development, I worked on enabling GPU acceleration in RADI with CUDA/cuDNN, redesigned the model upload system for asynchronous handling, and contributed bug fixes to ensure compatibility in the development environment. I also published the cleaned datasets, trained models, and tidied Jupyter notebooks on Hugging Face and in my blog repository, ensuring reproducibility for the community.
Toward the latter part of the program, I completed the End-to-End Visual Control exercise with support for four circuits, comprehensive user documentation, GPU acceleration guides, user code examples, and even a video tutorial. The exercise is now ready for beta testing and further expansion in future versions.
I am deeply grateful to my mentors, L. Roberto Morales and David Pascual-Hernández, whose guidance, patience, and unwavering support made this journey not only possible but truly enriching. Their expertise, encouragement, and thoughtful feedback pushed me to grow, learn, and achieve more than I could have on my own.
I would also like to extend my sincere thanks to Jose María Cañas for his insightful contributions and valuable advice, which greatly enhanced the quality of my work. My gratitude goes to Google for organizing GSoC, providing an incredible platform to learn, explore, and contribute to open-source robotics. Finally, I am profoundly thankful to the entire JdeRobot community for welcoming me, supporting me, and offering an inspiring environment where I could develop my skills, collaborate, and make meaningful contributions.
Without each of you, this journey would not have been as rewarding or transformative, and I am truly thankful for the opportunity to work alongside such remarkable mentors and peers.
Showcasing My GSoC 2025 Project
Weekly Progress
Beyond GSoC: Roadmap & Plan
Beyond GSoC, I plan to continue contributing to JdeRobot by extending the Follow-Line exercise to support Ackermann steering, including dataset collection, training, and testing on Ackermann cars in various circuits. I also aim to develop new deep learning–based exercises such as drone navigation, follow-person, obstacle avoidance, follow-road, auto-parking, and Monte Carlo localization. Alongside building these exercises, I will experiment with different deep learning neural networks and focus on performance optimization to ensure that models can run effectively in real-time robotics applications.
Week - 14
In the final week, I completed the End-to-End Visual Control documentation, adding a GPU acceleration guide, code examples, and a video tutorial. The documentation was shared with reviewers and will be refined based on feedback. I also experimented with training the end-to-end visual control follow-line model using ResNet as the base architecture for imitation learning. This week marked the wrap-up of my official GSoC contributions, but I have planned future work to continue with JdeRobot.
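A ResNet backbone can be adapted to the same two-output regression task by swapping its classification head. The snippet below is a hedged sketch using torchvision's ResNet-18; the actual depth, input size, and fine-tuning strategy may differ from what I used.

```python
# Hedged sketch: ResNet-18 backbone repurposed for (v, w) regression.
# ResNet-18 and the 224x224 input are illustrative choices.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # replace the classifier head

dummy = torch.randn(1, 3, 224, 224)
print(backbone(dummy).shape)  # -> torch.Size([1, 2])
```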
Week - 13
This week I focused on writing and finalizing the End-to-End Visual Control documentation. I completed the goal and data download & build model sections, ensuring clarity for users. I also prepared the exercise for beta testing. In addition, I created and uploaded my GSoC project video to the JdeRobot YouTube channel, summarizing my work and contributions.
Week - 12
I created the End-to-End Visual Control exercise, extending the Follow-Line exercise to allow a deep learning model to directly map vision to control commands. Initially, it supports four circuits, with the potential to expand. After beta testing of the Digit Classification and Human Detection exercises, I updated their documentation via issues and PRs. I also fixed a GPU bug in dev_humble_nvidia.yml and started drafting the user documentation for End-to-End Visual Control.
Week - 11
In Week 11, I shifted focus from active data collection and model training to organizing and finalizing previous work. With multiple datasets, trained models, and experiments accumulated over earlier weeks, I cleaned, structured, and secured all resources. Hugging Face was used for sharing datasets and models publicly to ensure accessibility and reproducibility, while my GSoC blog repository served as a secure backup for code and trained models.
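Publishing the artifacts to the Hub is straightforward with the huggingface_hub client. The snippet below is only an illustration: the local folder path is hypothetical, and the repo id is taken from the dataset linked at the end of this post.

```python
# Illustrative upload of a local dataset folder to the Hugging Face Hub.
# Requires an access token (via `huggingface-cli login` or HF_TOKEN).
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="datasets/follow_line_combined",      # hypothetical local path
    repo_id="JdeRobot/Follow-Line-Combine-Dataset",   # dataset repo listed below
    repo_type="dataset",
)
```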
Week - 10
In week 10, I proposed making the model upload process asynchronous for a smoother user experience and to reduce exercise starting time. Since models previously struggled to complete circuits due to dataset limitations, I collected counterclockwise data for all four circuits, processed it like the earlier dataset, and merged it with clockwise data. With these enriched datasets, I trained individual models for each circuit and a master model capable of handling all circuits.
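Merging the clockwise and counterclockwise recordings amounts to concatenating their label files. A minimal pandas sketch, with hypothetical CSV names and the image-path/v/w layout from the Week 5 data collection below, looks like this:

```python
# Hedged sketch: merge clockwise and counterclockwise recordings of one circuit.
import pandas as pd

cw = pd.read_csv("simple_circuit_clockwise.csv")           # hypothetical file
ccw = pd.read_csv("simple_circuit_counterclockwise.csv")   # hypothetical file

merged = pd.concat([cw, ccw], ignore_index=True)
merged.to_csv("simple_circuit_merged.csv", index=False)
print(len(cw), len(ccw), len(merged))
```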
Week - 9
I explored loading ONNX format models using onnx2pytorch but encountered stability issues with the package. After discussions, we evaluated two alternatives: (1) using onnxruntime-gpu with explicit CUDA/cuDNN installs in RADI, and (2) using onnxruntime-gpu alongside PyTorch with CUDA support in RADI. Since the first solution was heavy and complex, we decided to adopt the second option, as it is lighter and more stable.
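With that decision, loading a model inside RADI reduces to creating an onnxruntime session with the CUDA execution provider and letting it fall back to CPU when no GPU is present. The model path and input shape below are placeholders, not the exercise's actual values.

```python
# Sketch of the adopted approach: ONNX inference with onnxruntime-gpu,
# falling back to CPU when CUDA is unavailable.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "workspace/code/model.onnx",  # hypothetical path inside RADI
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 66, 200).astype(np.float32)  # assumed input shape
outputs = session.run(None, {input_name: dummy})
print(session.get_providers(), outputs[0].shape)
```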
Week - 8
This week’s focus was on enabling CUDA and cuDNN support inside the RADI container for GPU acceleration. I successfully installed PyTorch with CUDA GPU support, tested GPU functionality in the container, and contributed fixes to ensure that deep learning exercises can run with hardware acceleration. I also tested the new RADI release, identified bugs, and submitted PRs to fix them.
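A quick way to verify GPU functionality from inside the container is shown below (a generic check, not the exact test script I used):

```python
# Verify that PyTorch sees the GPU and cuDNN inside the RADI container.
import torch

print(torch.__version__, torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("cuDNN available:", torch.backends.cudnn.is_available())
```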
Week - 7
With four unique Follow-Line circuits available, I designed a cross-evaluation approach: using any three circuits for training and the remaining one for testing, producing four datasets. I then balanced each dataset by up/down-sampling to 5k samples per angular velocity category. Finally, I trained and tested models using NVIDIA PilotNet, visualizing training and testing loss curves to measure performance.
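The balancing step can be expressed as a small resampling routine: categories with more than 5k samples are downsampled, and categories with fewer are upsampled with replacement. The column and file names below are assumptions, not the project's exact ones.

```python
# Hedged sketch of per-category balancing to 5,000 samples.
import pandas as pd

def balance(df, column="w_category", target=5000, seed=42):
    parts = []
    for _, group in df.groupby(column):
        # Upsample with replacement when the category is under-represented,
        # otherwise downsample without replacement.
        parts.append(group.sample(n=target, replace=len(group) < target,
                                  random_state=seed))
    return pd.concat(parts, ignore_index=True)

# train_df = pd.read_csv("train_three_circuits.csv")   # hypothetical file
# balanced = balance(train_df)
# print(balanced["w_category"].value_counts())
```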
Week - 6
This week I focused on data analysis. I categorized angular velocity into five types (sharp left, slight left, straight, slight right, and sharp right) and linear velocity into three categories (slow, medium, and fast). I also created visualizations and a heatmap of angular vs. linear velocity to better understand dataset distribution. This step was crucial to detect imbalances and prepare for dataset balancing.
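In code, the categorization is a binning problem and the heatmap is a cross-tabulation. The sketch below uses illustrative bin edges (assuming the ROS convention that positive angular velocity turns left); the exact thresholds I used may differ.

```python
# Hedged sketch of velocity categorization and the angular-vs-linear heatmap.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("follow_line_dataset.csv")  # hypothetical file with v, w columns

df["w_category"] = pd.cut(
    df["w"],
    bins=[-float("inf"), -0.5, -0.1, 0.1, 0.5, float("inf")],  # illustrative edges
    labels=["sharp right", "slight right", "straight", "slight left", "sharp left"],
)
df["v_category"] = pd.cut(df["v"], bins=3, labels=["slow", "medium", "fast"])

counts = pd.crosstab(df["w_category"], df["v_category"])
plt.imshow(counts, cmap="viridis")
plt.xticks(range(len(counts.columns)), counts.columns)
plt.yticks(range(len(counts.index)), counts.index)
plt.xlabel("linear velocity")
plt.ylabel("angular velocity")
plt.colorbar(label="samples")
plt.show()
```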
Week - 5
In Week 5, I collected a large dataset of 84,969 images from the Follow-Line circuits. For each image, I stored the image file path, linear velocity (v), and angular velocity (w) in a CSV file. I cropped images from 640×480 to 640×242 to remove the irrelevant sky portion, following techniques validated by a research paper shared by my mentors. Data was collected inside the Docker container and later transferred to my local machine for training.
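The crop itself is a one-line array slice. The snippet below assumes the 242 rows are taken from the bottom of the frame, where the road is; the exact offset and file paths used in the project may differ.

```python
# Hedged sketch of the sky-removal crop: keep a 640x242 region at the bottom
# of each 640x480 frame.
import cv2

img = cv2.imread("dataset/images/frame_000001.png")  # hypothetical path
assert img.shape[:2] == (480, 640)

cropped = img[480 - 242:, :]          # bottom 242 rows, full width
cv2.imwrite("dataset/cropped/frame_000001.png", cropped)
print(cropped.shape)                  # -> (242, 640, 3)
```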
Week - 4
In Week 4 of my GSoC journey, I focused on refining the Human Detection and Digit Classification exercises to enhance their usability, visual presentation, and overall user experience. This included updating documentation with example code snippets, refreshed instruction images, and step-by-step video tutorials. Based on reviewer feedback, I improved the UI by updating the hover effect of the upload button and ensured proper tracking of changes by creating GitHub issues and submitting pull requests to the gh-pages branch. Additionally, I implemented an abstract model path feature to allow dynamic ONNX model paths in the exercises. Alongside these updates, I followed a tutorial to strengthen my PyTorch fundamentals, preparing for future model improvements and development.
Week - 3
In Week 3, I worked on both the frontend and the backend. I relocated and redesigned the upload button for better usability and trained a simple digit classification model on the MNIST dataset using PyTorch. I also created issues and submitted pull requests related to the Human Detection and Digit Classification exercises.
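For context, a simple MNIST classifier of the kind mentioned above can be trained in a few dozen lines of PyTorch and exported to ONNX for the exercise. The architecture and hyperparameters below are illustrative, not the exact ones I used.

```python
# Minimal sketch: train a small MNIST digit classifier and export it to ONNX.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")

# Export to ONNX so the model can be uploaded through the exercise widget.
torch.onnx.export(model, torch.randn(1, 1, 28, 28), "mnist_digit.onnx")
```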
Week - 2
This week, I created issues for two new exercises: Digit Classification and Human Detection. I also submitted initial pull requests to start developing these exercises. In addition, I began exploring PyTorch and ONNX models, preparing myself for the upcoming deep learning exercise.
Week - 1
In the first week, I began by exploring the file and folder architecture of the RoboticsAcademy Docker image to understand how exercises and supporting components are organized. To gain hands-on experience, I created a demo exercise by cloning the basic Computer Vision exercise and then experimented with deep learning integration by manually uploading an ONNX model to the workspace/code directory inside RADI (RoboticsAcademy Docker Image). Using this approach, I was able to run the Human Detection and Digit Classification exercises with the manually uploaded ONNX model. Building on this, I developed a file upload widget that enables users to upload ONNX models directly, making the workflow more interactive and user-friendly. Finally, I extended this functionality so that ONNX models could be dynamically uploaded to the workspace/code directory inside RADI through the widget, laying the groundwork for a smoother model integration process.
Community Bonding
These weeks involved setting up the blog website, meeting with my mentors, and laying the project’s groundwork. I began my GSoC journey with Robotics Academy by exploring the existing exercises, the RoboticsAcademy dependencies repo, and the development workflow.
During the community bonding period, I analyzed the digit classification and human detection exercises, ran them locally, and studied how models are integrated with simulation tasks. This exploration gave me practical insight into the Robotics Academy architecture and prepared me for contributing new deep learning–based exercises.
Issues
- #3106 Broken JdeRobot education link in Unibotics basic computer vision exercise (production mode).
- #3116 Human detection exercise
- #3118 Digit classification exercise
- #3137 [gh-pages] RA: Web documentation for the new "deep learning-based digit classification exercise"
- #3139 [gh-pages] RA: Web documentation for the new "deep learning-based human detection exercise"
- #3169 [Bug] Follow-line exercise: Error loading universe/world
- #3200 [gh-pages] Upgrade Human Detection Prototype to Running Mode
- #3201 [gh-pages] Upgrade Digit Classification Prototype to Running Mode
- #3205 [Bug] RA: Error Running dev_humble_nvidia.yaml
- #3217 [gh-pages] End-to-End Visual Control Exercise
Pull Requests
- #3113 update fix basic computer vision exercise link
- #3117 Human detection exercise
- #3119 Digit classification exercise
- #3138 [gh-pages] RA: Web documentation for the new "deep learning-based digit classification exercise"
- #3140 [gh-pages] RA: Web documentation for the new "deep learning-based human detection exercise"
- #3202 [gh-pages] Implement Running Mode for Human Detection
- #3203 [gh-pages] Implement Running Mode for Digit Classification
- #3206 [Fixed] Issue related to dev_humble_nvidia.yml, fix and improvements applied
- #3218 [gh-pages] End-to-End Visual Control Exercise
Relevant Links
- TheRoboticsClub/gsoc2025-Md_Shariar_Kabir/code
- TheRoboticsClub/gsoc2025-Md_Shariar_Kabir/follow_line_dl_models_codes
- Basic Computer Vision Exercise
- Deep learning-based Human Detection Exercise
- Deep learning-based Digit Classification Exercise
- RoboticsAcademy
- ONNX
- RADI (RoboticsAcademy Docker Image)
- MNIST dataset
- PyTorch
- Visual Follow Line
- [paper] Imitation Learning for vision based Autonomous Driving with Ackermann cars
- PID Controller
- NVIDIA PilotNet
- NVIDIA CUDA and cuDNN
- onnx2pytorch
- JdeRobot / Follow-Line-Combine-Dataset
- JdeRobot / Follow-Line-Simple-Circuit-Dataset
- JdeRobot/Follow-Line-Imitation-DL-Models-V1
- [gh-pages] End-to-End Visual Control Exercise
- ResNet