Preliminaries

This week we focused on addressing the “halting” problem we observed in last week’s model. The problem occurs most frequently when the vehicle is making a turn at an intersection: it suddenly stops in the middle of the road and gets stuck in that state.

Objectives

  • Experiment with adding more meaningful data, additional modalities, etc., to address the halting problem
  • Finish installing Behavior Metrics

Execution

Efficient data collection

The solution to most problems in the realm of machine learning lies in more data. In research settings, autonomous driving tasks often require dozens to hundreds of hours of driving data. Given computational limitations, we’re currently collecting under 100 episodes of training data in the CARLA simulator, amassing a few hours of driving at most. Therefore, we need to make our data collection process as efficient and effective as possible. This week, we explored the following ways to optimize data collection:

  • Data Trimming: A typical episode in the training set often involves the agent waiting at an intersection for a long time. We removed segments where the vehicle is stopped and the throttle remains below 0.01 for more than 50 consecutive frames. This step eliminates redundant data and increases the effectiveness of our dataset; a sketch of the trimming rule follows this list. (Update: NOT a good idea, see week 12)
  • Data prioritization: Certain tasks, such as making turns at intersections and avoiding obstacles, are more complex and more important. Therefore, we should prioritize these scenarios in our data collection.
  • Selective Data Retention: After the initial training, we can analyze the scenarios where the model underperforms, and focus on collecting and retaining more data that reflects these scenarios.

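To make the trimming rule concrete, below is a minimal sketch of the heuristic. It assumes each recorded frame is a dict with "speed" and "throttle" fields, and the stop-speed threshold is an illustrative choice; the actual recorder format may differ.

```python
STOP_SPEED = 0.1       # m/s below which we treat the vehicle as stopped (assumed threshold)
THROTTLE_EPS = 0.01    # throttle threshold from the rule above
MAX_IDLE_FRAMES = 50   # idle frames to keep before trimming the rest

def trim_idle_frames(episode):
    """Drop long idle segments, keeping the first MAX_IDLE_FRAMES frames of each."""
    trimmed, idle_run = [], 0
    for frame in episode:
        is_idle = frame["speed"] < STOP_SPEED and frame["throttle"] < THROTTLE_EPS
        idle_run = idle_run + 1 if is_idle else 0
        if idle_run <= MAX_IDLE_FRAMES:
            trimmed.append(frame)
    return trimmed
```
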
Finetuning on Junction-Only Dataset

In the previous week, we observed that the model often stops suddenly in the middle of making a turn at an intersection. By collecting additional episodes that specifically target junctions, we created a “junction-only” dataset. This allowed us to finetune the model in the hope that it would better learn the nuances of navigating intersections from the additional data. The image below illustrates a few examples of the data collected.

The video below demonstrates an example of the vehicle successfully making a left turn after finetuning.

However, even after finetuning, some problems persist. The next video shows a failed example where the vehicle goes straight at an intersection despite the “Turn Left” command.

Overall, there is some improvement, and the agent gets stuck at intersections less often after finetuning. To truly evaluate the effectiveness of the finetuning, especially the success rate of making turns, we need a mechanism for detecting whether the agent successfully completes a turn, which our current evaluation process lacks; one possible heuristic is sketched below. (Update: turning detection implemented in week 9)

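One simple heuristic would be to compare the vehicle’s heading before it enters and after it leaves the junction. The sketch below assumes yaw angles in degrees (as exposed by CARLA transform rotations); the threshold and the sign-to-direction mapping are assumptions, and this is not necessarily the detection logic we later implemented.

```python
def classify_turn(yaw_before, yaw_after, threshold_deg=60.0):
    """Classify a maneuver as 'left', 'right', or 'straight' from the yaw change.

    yaw_before / yaw_after: vehicle heading in degrees before entering and
    after leaving the junction. Which sign corresponds to which direction
    depends on the simulator's coordinate convention and should be verified.
    """
    delta = (yaw_after - yaw_before + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if abs(delta) < threshold_deg:
        return "straight"
    return "right" if delta > 0 else "left"
```
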
Dataset Aggregation (DAgger)

Another technique that we plan to try in the coming week is Dataset Aggregation (DAgger) [1]. DAgger is an iterative algorithm designed to improve the performance of machine learning models, particularly in imitation learning. Traditional imitation learning can suffer from a problem known as “distributional shift,” where a trained model, when deployed, encounters states that were not well represented in the initial training data. This discrepancy can result in suboptimal or incorrect actions.

DAgger addresses this issue by iteratively training the model with a combination of its own behavior and expert guidance. By repeatedly collecting the states that the policy experiences and querying an expert for the correct actions, DAgger continually refines the training dataset and the policy. This approach helps the model to generalize better to unseen states, aligning more closely with the expert’s performance.

In the context of our project, the steps to apply DAgger can be outlined as follows, with a code sketch of the loop after the list:

  1. Initialize dataset D with autopilot expert demonstrations
  2. Train policy on D
  3. Run trained policy in the simulator: only save the states that the policy sees, not the actions it takes.
  4. Ask the expert for actions: For each state in the newly generated trajectory, query the expert for the action it would take
  5. Add the new state-action pairs to D
  6. Retrain policy on D
  7. Repeat steps 3-6 for a number of iterations

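To prepare for next week, below is a minimal sketch of this loop. The callables train, rollout_states, and expert_action stand in for our own training code, the CARLA rollout, and the autopilot expert; they are placeholders for illustration, not an existing API.

```python
def dagger(train, rollout_states, expert_action, initial_dataset, iterations=5):
    """Minimal DAgger loop following the steps above.

    train(dataset) -> policy, rollout_states(policy) -> list of visited states,
    expert_action(state) -> expert label. All three are placeholder callables
    supplied by the surrounding pipeline.
    """
    dataset = list(initial_dataset)                   # step 1: expert demonstrations
    policy = train(dataset)                           # step 2: initial behavior cloning
    for _ in range(iterations):                       # steps 3-6, repeated (step 7)
        states = rollout_states(policy)               # step 3: run policy, save states only
        labels = [expert_action(s) for s in states]   # step 4: query the expert
        dataset.extend(zip(states, labels))           # step 5: aggregate new pairs into D
        policy = train(dataset)                       # step 6: retrain on the enlarged D
    return policy
```
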
References

[1] Stephane Ross, Geoffrey J. Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Conference on Artificial Intelligence and Statistics (AISTATS), 2011.