Coding Week 1: 5/27-6/02

Weekly Meeting

In this week’s meeting, we reviewed the project’s current progress. I updated the project blog and asked for feedback. I have set up CARLA in Docker, plan to move to a physical machine soon, and will post a Docker installation tutorial later. We discussed several technical issues, including dependency and model loading errors, as well as problems with the data collection scripts related to graphical mode and ROS Bridge compatibility.

We reviewed the open issues on GitHub, noting in particular the need to open a PR and the future plans for the Docker installation. During the open-floor discussion, the team talked about reproducibility and possible future enhancements similar to the LMDrive model. Concerns about GPU resources and API token support from Google were raised, and we plan to inquire further.

More details can be found here: Google Doc

To-Do List

Following previous issues

Following up on issue1 and issue2: the model’s issues were resolved by changing the number of outputs, and the environment dependency issues have also been resolved. However, data collection still has problems. No issue has been opened for them yet because they are still being investigated. The current data collection scripts may hit errors or pause, which can require manual intervention or delay data collection. Two options are currently being considered:

  1. Use a physical machine so that cameras and other sensors can be checked in a graphical interface.
  2. LMDrive has provided some scripts for data collection that can be tested and explored.
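Until the root cause is found, one way to reduce the manual intervention described above is to wrap the collection script in a small supervisor that restarts it when it fails or hangs. The following is only a minimal sketch under stated assumptions: `run_with_retries` is a hypothetical helper, not part of CARLA or LMDrive, and it treats any run that does not exit within the timeout as a stalled run.

```python
import subprocess


def run_with_retries(cmd, timeout_s, max_attempts=3):
    """Run cmd, retrying if it exits non-zero or exceeds timeout_s.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. This helper is hypothetical and only for illustration.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child process before raising
            # TimeoutExpired, so a hung run does not linger in the background.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: no exit within {timeout_s}s, restarting")
            continue
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt}: exit code {result.returncode}, restarting")
    return None
```

A call might look like `run_with_retries(["python", "collect_data.py"], timeout_s=1800)`, where `collect_data.py` and the 30-minute limit are placeholders rather than the project's actual script name or settings. A whole-run timeout is a blunt tool: it cannot tell a slow run from a paused one, so the limit has to be chosen generously.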

Literature review

Modules

The architectural framework of world models is structured to support complex decision-making processes that closely emulate human cognitive functions. These models comprise several distinct but interconnected modules, each serving a crucial role in the system’s overall performance and capability.

Together, these modules form an integrated framework that enables world models to simulate human-like cognitive processes and decision-making. This modular structure not only enhances the operational capabilities of such systems but also contributes to their ability to operate independently and efficiently in a variety of real-world applications.

Architectures

The architecture of world models is designed to predict future states of environments by balancing deterministic forecasts with the uncertainty of real-world dynamics. With high-dimensional sensory inputs, the challenge lies in representing the observed information efficiently through latent dynamical models so that predictions are both compact and accurate. To manage these complexities, a variety of architectures have been proposed, including the Recurrent State-Space Model (RSSM), the Joint-Embedding Predictive Architecture (JEPA), and Transformer-based architectures.
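To make the balance between deterministic forecasts and uncertainty concrete, here is a toy sketch of one RSSM-style transition in NumPy. It is an illustration under loud assumptions, not the implementation used in this project or in any paper: the weights are random and untrained, a plain tanh recurrence stands in for the GRU that RSSMs normally use, the observation-conditioned posterior is omitted, and the class name, dimensions, and function names are all made up for the example.

```python
import numpy as np


class TinyRSSM:
    """Toy RSSM-style transition with random, untrained weights."""

    def __init__(self, stoch_dim=4, deter_dim=8, action_dim=2, seed=0):
        self.stoch_dim, self.deter_dim = stoch_dim, deter_dim
        self.rng = np.random.default_rng(seed)
        in_dim = deter_dim + stoch_dim + action_dim
        # Deterministic path (assumption: tanh recurrence instead of a GRU).
        self.w_deter = self.rng.normal(0.0, 0.1, (deter_dim, in_dim))
        # Stochastic path: deterministic state -> Gaussian prior parameters.
        self.w_mean = self.rng.normal(0.0, 0.1, (stoch_dim, deter_dim))
        self.w_logstd = self.rng.normal(0.0, 0.1, (stoch_dim, deter_dim))

    def step(self, deter, stoch, action):
        """Advance one step; returns (new_deter, new_stoch, mean, std)."""
        # h_t depends on the previous deterministic state, latent, and action.
        deter = np.tanh(self.w_deter @ np.concatenate([deter, stoch, action]))
        # The prior over the next stochastic latent is conditioned on h_t.
        mean = self.w_mean @ deter
        std = np.exp(self.w_logstd @ deter)
        stoch = mean + std * self.rng.standard_normal(mean.shape)
        return deter, stoch, mean, std


def imagine(model, actions):
    """Roll the prior forward over a sequence of actions, without observations."""
    deter = np.zeros(model.deter_dim)
    stoch = np.zeros(model.stoch_dim)
    trajectory = []
    for action in actions:
        deter, stoch, _, _ = model.step(deter, stoch, action)
        trajectory.append(np.concatenate([deter, stoch]))
    return np.stack(trajectory)
```

Calling `imagine(TinyRSSM(), np.zeros((5, 2)))` returns one compact latent vector per step. The deterministic part carries memory across steps, while the sampled part is where the model expresses uncertainty about what happens next.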

Please note that part of this review will go into a paper that is being prepared for submission and is expected to be made public next week.

GPU resource

Having assessed the risks, it’s clear that we should rely on external sources such as university clusters, especially when I need consistent access to high-performance GPUs such as the NVIDIA A100, where availability has been a challenge. I can currently access 30-series and A4000 GPUs. Unfortunately, GSoC has confirmed that it cannot provide GPU resources. I’m considering GPU resources from the University of Edinburgh, though this might require queuing for access. Sergio also mentioned that we have access to a powerful GPU cluster at his university, and they might give me access when needed.

Docker installation

We have updated the step-by-step tutorial for installing CARLA with Docker. Unlike the guide on the official website, this version includes more detailed troubleshooting steps. More details can be found in this post.