Coding Weeks 10 & 11 (7/29-8/11)

Over the past two weeks, our focus has been on addressing several tasks for the ongoing development of our project. Here is a summary of the key action items, progress, and additional insights that have emerged during this period:

  1. Mid-Term Evaluation:
    • We have been preparing for the mid-term evaluation by consolidating our progress notes, documentation, and demos. The feedback from this evaluation will guide the next steps of our development.
  2. Fixing Video Issues in the Blog:
    • To resolve the video issues in the blog, the video will be uploaded to a trusted platform such as YouTube or Google Drive so that it is easily accessible to all viewers.
  3. Mid-Term Demo Video:
    • A critical component of our mid-term deliverables is the demonstration video showcasing the current capabilities of our model within the simulation environment. This video will highlight how the model processes and classifies different driving scenarios.
  4. Improving Dataset Evaluation:
    • We are actively exploring ways to enhance the synthetic dataset generation process, focusing on creating more diverse datasets without directly embedding commands into the instructions. The goal is to ensure that the generated data is both rich in context and relevant to our classification tasks. We are using resources such as:
    • These resources provide some inspiration for improving the evaluation process.
  5. Future Works:
    • As we look ahead, we are conducting a literature review to identify potential research avenues that can further enhance our project. The LMDrive repository and similar projects offer valuable insights into how we can refine our approach and explore new research ideas. We are particularly interested in extending our work to incorporate more advanced LLM techniques.
    • Additionally, to show the project in action, all scripts are integrated into a web app through the Streamlit platform.
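
One way to check the goal in item 4 (diverse instructions that do not embed the command itself) is a simple leakage filter over the generated samples. The sketch below is illustrative only: the command labels, keyword lists, and sample sentences are hypothetical and are not the project's actual label set or data.

```python
import re

# Hypothetical command labels and trigger phrases; the real label set may differ.
COMMAND_KEYWORDS = {
    "turn_left": ["turn left", "left turn"],
    "turn_right": ["turn right", "right turn"],
    "stop": ["stop", "brake"],
}


def leaks_command(instruction: str, command: str) -> bool:
    """Return True if the instruction text names its own command outright."""
    text = instruction.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text)
        for keyword in COMMAND_KEYWORDS[command]
    )


def filter_dataset(samples):
    """Keep only (instruction, command) pairs whose text does not leak the label."""
    return [(text, cmd) for text, cmd in samples if not leaks_command(text, cmd)]


# Made-up samples: the first one states the command directly and is dropped.
samples = [
    ("Please turn left at the next junction.", "turn_left"),
    ("The pharmacy is on the street to our west, take me there.", "turn_left"),
    ("A pedestrian just stepped onto the crossing ahead.", "stop"),
]
clean = filter_dataset(samples)
print(len(clean), "of", len(samples), "samples kept")
```

A keyword filter like this only catches literal mentions; paraphrased commands would need a stronger check.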

Streamlit Development

Recent development efforts have centred on implementing a Streamlit-based app for our project. The app is intended to make the tool more accessible and user-friendly.

Design and Architecture

Data Import/Export Interface

import base64
import io
import os

def generate_pdf_report(train_fig, eval_fig, train_log, eval_log, cls_report, folder_path):
    """Convert logs and figures into an HTML and generate a PDF report."""

    # Convert train figure to a base64-encoded PNG
    buffer1 = io.BytesIO()
    train_fig.savefig(buffer1, format='png')
    train_base64 = base64.b64encode(buffer1.getvalue()).decode('utf-8')
    train_img_html = (
        f'<img src="data:image/png;base64,{train_base64}" '
        'style="width: 80%; max-width: 800px;"/>'
    )

    # Convert eval figure to a base64-encoded PNG
    buffer2 = io.BytesIO()
    eval_fig.savefig(buffer2, format='png')
    eval_base64 = base64.b64encode(buffer2.getvalue()).decode('utf-8')
    eval_img_html = (
        f'<img src="data:image/png;base64,{eval_base64}" '
        'style="width: 80%; max-width: 800px;"/>'
    )

    # Generate markdown for logs
    train_log_md, eval_log_md, cls_report_md = generate_markdown(
        train_log, eval_log, cls_report
    )
    train_html, eval_html, _ = (

    # Create HTML content
    html_content = f"""
    <!DOCTYPE html>


    # Convert HTML to PDF
    pdf_path = os.path.join(folder_path, "logs_report.pdf")
    convert_html_to_pdf(html_content, pdf_path)
    return pdf_path
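
The figure-embedding step in the report function can be isolated into a small helper. The following is a minimal sketch using only the standard library; the helper name `png_bytes_to_img_tag` is hypothetical and not part of the project code, and the PNG signature bytes merely stand in for a real figure saved with `savefig`.

```python
import base64


def png_bytes_to_img_tag(png_bytes: bytes,
                         style: str = "width: 80%; max-width: 800px;") -> str:
    """Embed raw PNG bytes in an <img> tag as a base64 data URI."""
    # b64encode returns bytes, so decode to str before placing it in the HTML.
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    return f'<img src="data:image/png;base64,{encoded}" style="{style}"/>'


# The 8-byte PNG signature is a placeholder for buffer.getvalue() of a real figure.
tag = png_bytes_to_img_tag(b"\x89PNG\r\n\x1a\n")
print(tag)
```

Embedding images this way keeps the HTML self-contained, which is convenient when the HTML is converted to a PDF in a single step.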

Challenges in Model Storage

LLM Evaluation

In addition to implementing the Streamlit app, we have been exploring various cookbooks from Hugging Face suggested by mentors to refine our approach. However, this exploration has also revealed several challenges:

  1. Sample Size Insufficiency: The current dataset might not be large enough to fully train more complex models. We are considering various data augmentation strategies to increase the dataset’s size and diversity.
  2. Evaluation of Human Input: The evaluation methods we currently use are heavily reliant on human judgment. To address this, we are exploring automated evaluation techniques that can provide consistent and scalable assessments.
  3. Lack of Trials: The limited availability of real-world data and trials has been a bottleneck. To overcome this, we plan to explore self-supervised methods.
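
As one example of the automated evaluation mentioned in item 2, predicted labels can be scored against reference labels without a human in the loop. The sketch below computes per-class precision, recall, and F1 with the standard library only; the label names and predictions are made up for illustration, and this is not necessarily how the project's own classification report is produced.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 per label from two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical reference labels and model predictions.
y_true = ["left", "right", "stop", "left"]
y_pred = ["left", "stop", "stop", "right"]
report = per_class_metrics(y_true, y_pred)
for label, scores in report.items():
    print(label, scores)
```

scikit-learn's `classification_report` produces a similar summary if that dependency is already available.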

Model Inference

We conducted a series of comparative experiments to determine whether the model inference stage significantly impacts decision-making within the simulation. Our findings indicate that the model’s decisions remain consistent, even under varied conditions.
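
A comparison of this kind can be quantified as the fraction of simulation steps on which two runs make the same decision. The sketch below assumes each run is logged as a list of decision labels, one per step; the function name, labels, and sequences are hypothetical and do not come from the actual experiments.

```python
def agreement_rate(decisions_a, decisions_b):
    """Fraction of steps at which two runs chose the same decision."""
    if len(decisions_a) != len(decisions_b):
        raise ValueError("runs must cover the same number of steps")
    if not decisions_a:
        return 1.0  # two empty runs trivially agree
    matches = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return matches / len(decisions_a)


# Made-up decision logs for a baseline run and a run under a varied condition.
baseline = ["straight", "left", "left", "stop"]
variant = ["straight", "left", "right", "stop"]
rate = agreement_rate(baseline, variant)
print(f"agreement: {rate:.2f}")
```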

Interestingly, we discovered that some collision issues initially attributed to the model inference stage were actually rooted in problems with the previous model version. To provide more insight, we have attached both error videos and examples of correct simulations.

Next Steps

I have prepared a mid-term summary video, which will be submitted for feedback. This video summarizes the current status of the project. Based on the feedback received, we will make the necessary adjustments to ensure that the project continues to meet its objectives and align with the broader goals of the GSoC initiative.

Moving forward, we will delve deeper into the literature, exploring new research directions that could lead to improvements in the model’s performance and its integration within the CARLA simulation environment.