Coding Week 9: 7/22-7/28

Authors: Zebin Huang

Affiliations: Edinburgh Centre for Robotics

Published: July 21, 2024

This post summarizes the project's progress so far, the challenges encountered along the way, and the plans for the next phase of development.

Introduction

This project explores the intersection of Large Language Models (LLMs) and autonomous driving simulation. The overall workflow has three steps: an LLM generates synthetic data, a BERT model is trained on that data, and the trained model is integrated into the CARLA simulator.

The project is divided into two distinct yet interconnected parts:

Data Generation and BERT Model Training: The initial phase uses LLMs to generate diverse, scalable datasets. These datasets are then used to train a BERT-family model for classification tasks related to autonomous driving scenarios.
CARLA Simulation: The second phase focuses on integrating the trained BERT model into the CARLA simulator. This integration is designed to enhance the simulation’s ability to classify and respond to various human instructions, thereby improving the overall decision-making process.
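The hand-off between the two parts can be sketched as a small interface: a classifier turns a human instruction into a label, and the label is mapped to a driving command. The following is a minimal sketch, not the project's implementation. The label set, the `DrivingCommand` type, and the keyword-based `keyword_classifier` are all hypothetical stand-ins; in the real pipeline the `classify` argument would wrap inference with the trained BERT model, and the resulting command would be applied to the vehicle through the simulator.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DrivingCommand:
    """Simulator-agnostic command derived from a classified instruction."""
    target_speed_scale: float  # multiplier applied to the current target speed
    brake: bool                # whether the vehicle should brake to a halt


# Hypothetical label set and mapping; the post does not list the real classes.
COMMANDS: Dict[str, DrivingCommand] = {
    "speed_up": DrivingCommand(target_speed_scale=1.2, brake=False),
    "slow_down": DrivingCommand(target_speed_scale=0.7, brake=False),
    "stop": DrivingCommand(target_speed_scale=0.0, brake=True),
    "keep_going": DrivingCommand(target_speed_scale=1.0, brake=False),
}


def keyword_classifier(instruction: str) -> str:
    """Stand-in for the trained BERT classifier: plain keyword matching."""
    text = instruction.lower()
    if "stop" in text or "pull over" in text:
        return "stop"
    if "slow" in text or "careful" in text:
        return "slow_down"
    if "fast" in text or "hurry" in text:
        return "speed_up"
    return "keep_going"


def instruction_to_command(
    instruction: str,
    classify: Callable[[str], str] = keyword_classifier,
) -> DrivingCommand:
    """Classify an instruction and look up the corresponding command."""
    label = classify(instruction)
    if label not in COMMANDS:
        raise ValueError(f"classifier returned unknown label: {label!r}")
    return COMMANDS[label]
```

Keeping the classifier behind a plain callable means the keyword stand-in can be swapped for the BERT model without touching the code that consumes the command.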

Video Demo

As part of the mid-term deliverable, a video demonstrating the project’s current capabilities has been prepared. This video showcases how the BERT model, trained with LLM-generated data, functions within the CARLA simulator.

Challenges

Development so far has surfaced two main challenges:

Performance and Reliability of Data Generation: One of the core challenges was ensuring that the data generated by the LLMs was both relevant and diverse enough to effectively train the BERT model. The quality of this synthetic data directly impacts the model’s performance, making it crucial to address issues related to data generation reliability and efficiency.
Integration of BERT with CARLA: Integrating the BERT model into the CARLA simulator is a significant step forward, but the integration is not yet as seamless as desired. The classifier, trained on LLM-generated data, currently operates somewhat independently of the broader autonomous driving system. As a result, the potential of the LLM to enhance the CARLA simulation has not yet been realized. For information on how to integrate models into CARLA, check out this PR.
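For the data-generation challenge, one inexpensive reliability measure is to validate LLM output before it reaches training. The sketch below assumes the LLM is asked to emit one JSON object per line with `text` and `label` fields; that format and both function names are assumptions for illustration, not the project's actual schema. It drops malformed lines, out-of-vocabulary labels, and near-verbatim duplicates, then reports the label distribution so class imbalance is visible early.

```python
import json
from collections import Counter
from typing import Iterable, List, Set, Tuple


def clean_generated_samples(
    raw_lines: Iterable[str],
    allowed_labels: Set[str],
) -> List[Tuple[str, str]]:
    """Keep well-formed, in-vocabulary, non-duplicate (text, label) pairs."""
    samples: List[Tuple[str, str]] = []
    seen: Set[str] = set()
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the LLM produced something that is not valid JSON
        if not isinstance(record, dict):
            continue
        text = record.get("text")
        label = record.get("label")
        if not isinstance(text, str) or not isinstance(label, str):
            continue
        if label not in allowed_labels:
            continue  # label outside the expected class set
        # Case- and whitespace-insensitive key for duplicate detection.
        normalized = " ".join(text.lower().split())
        if not normalized or normalized in seen:
            continue
        seen.add(normalized)
        samples.append((text.strip(), label))
    return samples


def label_counts(samples: Iterable[Tuple[str, str]]) -> Counter:
    """Count samples per label to expose class imbalance."""
    return Counter(label for _, label in samples)
```

Tracking the fraction of lines rejected per generation run gives a rough reliability signal for a given prompt or model.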

Limitations and Future Works

Despite the progress made, the project has several limitations that need to be addressed in future iterations:

Limited Integration of LLM: The current integration between the LLM-generated data and the CARLA simulator is not fully optimized. The BERT model functions as a standalone classifier rather than being deeply embedded within the autonomous driving system, which limits the overall impact of the LLM on the simulation's performance. Future work should deepen this integration by developing methods for real-time scenario understanding and decision-making within the simulator, leveraging the strengths of both LLMs and autonomous driving technologies.
Quality Evaluation of Datasets: The project has primarily focused on the generation of synthetic data. However, the evaluation of this data’s quality remains an area that requires further exploration. Future work will need to develop self-supervised evaluation methods that can assess data quality without human intervention. These techniques would allow for automated, scalable assessments of synthetic data, improving the overall reliability and effectiveness of the training datasets.
Bridging the Sim2Real Gap: Another key area of research could involve bridging the gap between simulated data and real-world driving data. This would involve refining the synthetic data generation process to more closely mirror real-world conditions, thus making the models trained on this data more applicable to real-world autonomous driving scenarios.
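As a starting point for automated dataset evaluation, lexical diversity can be measured without any human labels. The sketch below computes distinct-n, the ratio of unique n-grams to total n-grams across a set of texts. It is only a simple proxy and not a full self-supervised evaluation method, and the whitespace tokenization and function name are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple


def distinct_n(texts: Iterable[str], n: int = 2) -> float:
    """Return unique n-grams divided by total n-grams (0.0 if there are none)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = 0
    unique: Set[Tuple[str, ...]] = set()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0
```

A score near 1.0 means almost every n-gram is unique, while a low score suggests the generator is repeating the same phrasings. The metric says nothing about label correctness or realism, so it would complement rather than replace other checks.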