Over the past two weeks, our focus has been on several tasks in the ongoing development of our project. Here is a summary of the key action items, progress, and additional insights from this period:
Mid-Term Evaluation:
We have been meticulously preparing for the mid-term evaluation, including reviewing our progress, updating documentation, and preparing demos. The feedback from this evaluation will guide the next steps of our development.
Fixing Video Issues in the Blog:
The video embedded in the blog has not been displaying reliably for all readers. To resolve this, the video will be uploaded to a trusted platform such as YouTube or Google Drive, ensuring that it is easily accessible to all viewers.
Mid-Term Demo Video:
A critical component of our mid-term deliverables is the demonstration video showcasing the current capabilities of our model within the simulation environment. This video will highlight how the model processes and classifies different driving scenarios.
Improving Dataset Evaluation:
We are actively exploring ways to enhance the synthetic dataset generation process, focusing on creating more diverse datasets without directly embedding commands into the instructions. The goal is to ensure that the generated data is both rich in context and relevant to our classification tasks. We are using resources such as:
These resources provide some inspiration for improving the evaluation process.
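To make this concrete, here is a minimal, purely illustrative sketch of keeping the target command as a separate label rather than embedding it in the instruction text; the command names and template wording below are hypothetical and not taken from our actual generation pipeline:
import random

# Hypothetical templates: the command is only a label and never appears verbatim in the text.
TEMPLATES = {
    "turn_left": [
        "Take the next street on the left.",
        "At the upcoming junction, head left.",
    ],
    "stop": [
        "Pull over and come to a complete stop.",
        "Halt the vehicle at the next safe spot.",
    ],
}

def sample_instructions(n=4, seed=0):
    """Return n instruction/label pairs with the command kept out of the instruction text."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        command = rng.choice(list(TEMPLATES))
        samples.append({"instruction": rng.choice(TEMPLATES[command]), "label": command})
    return samples

print(sample_instructions())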
Future Works:
As we look ahead, we are conducting a literature review to identify potential research avenues that can further enhance our project. The LMDrive repository and similar projects offer valuable insights into how we can refine our approach and explore new research ideas. We are particularly interested in extending our work to incorporate more advanced LLM techniques.
Additionally, to show the project in action, all scripts have been integrated into a web app built on the Streamlit platform.
Streamlit Development
Recent development efforts have been centred on implementing a Streamlit-based app for our project, making the tool more accessible and user-friendly.
Design and Architecture
The Streamlit app has been designed with a modular architecture, allowing for easy scaling and adaptation. Since the scripts were already encapsulated as independent modules, the app could quickly be split and built up from separate modular pages.
The main pages include (a brief structural sketch follows this list):
Data Generation: Users can generate datasets required for training models. This page provides the tools necessary to create diverse and robust datasets.
Data Analysis: This section allows users to analyze the generated or uploaded data. Visualization tools are integrated to help users understand the data distribution and key metrics at a glance.
Model Training: In this section, users can initiate model training sessions. The interface includes options for viewing logs and evaluating interim results. It also allows for customization of training parameters to optimize performance.
Check Logs: Users can review detailed logs from the data generation, analysis, and model training processes. This helps in debugging and ensures transparency in the operations performed by the app.
Model Testing: This page allows users to test the BERT model online, either with a single instruction or with an uploaded file.
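To illustrate the modular layout described above, here is a minimal sketch of how a sidebar can route to independently encapsulated page modules; the module paths and the render() entry point are assumptions made for this example rather than the project's actual file names:
import importlib
import streamlit as st

# Hypothetical mapping from page title to module path; each module is assumed
# to expose a render() function that draws its own page.
PAGES = {
    "Data Generation": "pages.data_generation",
    "Data Analysis": "pages.data_analysis",
    "Model Training": "pages.model_training",
    "Check Logs": "pages.check_logs",
    "Model Testing": "pages.model_testing",
}

def main():
    st.sidebar.title("Navigation")
    choice = st.sidebar.radio("Go to", list(PAGES))
    page = importlib.import_module(PAGES[choice])
    page.render()

if __name__ == "__main__":
    main()
Because each page lives in its own module, new functionality can be added by dropping in another module and registering it in the mapping.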
Data Import/Export Interface
For online deployment, we also needed a data import/export mechanism. Unlike the local environment, the online version must handle real-time data flow effectively, so we implemented a streamlined interface that lets users upload data files easily and download results in various formats.
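As a rough sketch of what this can look like in Streamlit, file upload and result download can be handled with st.file_uploader and st.download_button; the CSV format and column contents here are assumptions for illustration:
import pandas as pd
import streamlit as st

# Upload: accept a CSV of instructions and load it into a DataFrame.
uploaded = st.file_uploader("Upload an instruction dataset (CSV)", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    st.dataframe(df.head())

    # Download: offer the (possibly processed) data back as a CSV file.
    st.download_button(
        label="Download results as CSV",
        data=df.to_csv(index=False).encode("utf-8"),
        file_name="results.csv",
        mime="text/csv",
    )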
Below is a snapshot of the export side of this interface, which converts the training and evaluation logs and figures into a downloadable PDF report:
import base64
import io
import os
from markdown import markdown

def generate_pdf_report(train_fig, eval_fig, train_log, eval_log, cls_report, folder_path):
    """Convert logs and figures into an HTML page and generate a PDF report."""
    # Convert the training figure to a base64-encoded inline image
    buffer1 = io.BytesIO()
    train_fig.savefig(buffer1, format='png')
    buffer1.seek(0)
    train_base64 = base64.b64encode(buffer1.read()).decode('utf-8')
    train_img_html = (f'<img src="data:image/png;base64,{train_base64}"'
                      f' style="width: 80%; max-width: 800px;"/>')

    # Convert the evaluation figure to a base64-encoded inline image
    buffer2 = io.BytesIO()
    eval_fig.savefig(buffer2, format='png')
    buffer2.seek(0)
    eval_base64 = base64.b64encode(buffer2.read()).decode('utf-8')
    eval_img_html = (f'<img src="data:image/png;base64,{eval_base64}"'
                     f' style="width: 80%; max-width: 800px;"/>')

    # Render the log markdown to HTML (generate_markdown is a helper defined elsewhere in the module)
    train_log_md, eval_log_md, cls_report_md = generate_markdown(train_log, eval_log, cls_report)
    train_html, eval_html, _ = (
        markdown(train_log_md),
        markdown(eval_log_md),
        markdown(cls_report_md),
    )

    # Create the HTML content (style rule bodies are representative)
    html_content = f"""
    <!DOCTYPE html>
    <html>
    <head>
        <title>Markdown</title>
        <style>
            body {{ font-family: Arial, sans-serif; }}
            img {{ margin: 10px 0; }}
            .page-break {{ page-break-after: always; }}
        </style>
    </head>
    <body>
        {train_html}{train_img_html}{eval_html}{eval_img_html}
    </body>
    </html>
    """

    # Convert the HTML to a PDF (convert_html_to_pdf is a helper defined elsewhere in the module)
    pdf_path = os.path.join(folder_path, "logs_report.pdf")
    convert_html_to_pdf(html_content, pdf_path)
    return pdf_path
Challenges in Model Storage
A significant technical challenge was how to manage the storage and export of models trained online. Given the constraints of bandwidth and storage, exporting large models like BERT is not feasible within our current setup. Therefore, we opted for TinyBERT, a more compact model that has shown excellent performance in our tests, achieving nearly 100% accuracy in the scenarios we’ve evaluated.
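For context, a single-instruction test against a fine-tuned TinyBERT checkpoint might look roughly like the sketch below, assuming the model has been saved locally with the Transformers library; the path, instruction, and label shown are placeholders:
from transformers import pipeline

# Hypothetical path to the fine-tuned TinyBERT checkpoint produced by the training page.
classifier = pipeline("text-classification", model="./models/tinybert_finetuned")

result = classifier("Take the next street on the left.")
print(result)  # e.g. [{'label': 'turn_left', 'score': 0.99}]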
As discussed in the article linked here, Streamlit currently has resource limitations that prevent us from freely training and exporting pre-trained models, so this feature of our app is not yet fully supported. We are considering GitHub's Large File Storage (LFS), but this solution has not been fully tested yet.
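If those constraints are lifted later, one plausible approach, sketched below under the assumption of a Transformers-based TinyBERT classifier, is to save the fine-tuned model with save_pretrained and archive the folder so it can be offered for download; this is not a feature that is currently enabled in the app:
import shutil
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A public TinyBERT checkpoint stands in for the freshly trained model;
# num_labels is a placeholder for the number of command classes.
checkpoint = "huawei-noah/TinyBERT_General_4L_312D"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Save the model locally and pack everything into a single zip archive,
# which a Streamlit page could then expose via st.download_button.
export_dir = "exported_model"
model.save_pretrained(export_dir)
tokenizer.save_pretrained(export_dir)
archive_path = shutil.make_archive("tinybert_export", "zip", export_dir)
print(f"Model archived at {archive_path}")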
LLM Evaluation
In addition to implementing the Streamlit app, we have been exploring various cookbooks from Hugging Face suggested by mentors to refine our approach. However, this exploration has also revealed several challenges:
Sample Size Insufficiency: The current dataset might not be large enough to fully train more complex models. We are considering various data augmentation strategies to increase the dataset’s size and diversity.
Reliance on Human Evaluation: The evaluation methods we currently use are heavily reliant on human judgment. To address this, we are exploring automated evaluation techniques that can provide consistent and scalable assessments (a brief sketch follows this list).
Lack of Trials: The limited availability of real-world data and trials has been a bottleneck. To overcome this, we plan to explore self-supervised methods.
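As one possible direction for the automated evaluation mentioned above, standard classification metrics can replace ad hoc human checks once a labelled test set exists; the labels and predictions below are placeholders for illustration:
from sklearn.metrics import classification_report

# Placeholder gold labels and model predictions for a handful of instructions.
y_true = ["turn_left", "stop", "turn_left", "stop", "stop"]
y_pred = ["turn_left", "stop", "stop", "stop", "stop"]

# A consistent, repeatable assessment with no human in the loop.
print(classification_report(y_true, y_pred, zero_division=0))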
Model Inference
We conducted a series of comparative experiments to determine whether the model inference stage significantly impacts decision-making within the simulation. Our findings indicate that the model’s decisions remain consistent, even under varied conditions.
Interestingly, we discovered that some collision issues initially thought to be caused by the model inference stage were actually rooted in problems with the previous model version. To provide more insight, we have attached both error videos and examples of correct simulations.
Next Steps
I have prepared a mid-term summary video, which will be submitted for feedback. This video summarizes the current status of the project. Based on the feedback received, we will make the necessary adjustments to ensure that the project continues to meet its objectives and align with the broader goals of the GSoC initiative.
Moving forward, we will delve deeper into the literature, exploring new research directions that could lead to improvements in the model’s performance and its integration within the CARLA simulation environment.