Preliminaries

The main goal of this week is to expand the model such that the agent can make turns in any direction at intersections instead of always turning right. In order to achieve this, the model needs to take a high-level command (LaneFollow, Left, Right, Straight) in addition to sensory input to guide it to either follow the lane or make turns at junctions. Thus, a new data collection routine needs to be implemented and the model architecture needs to be updated to incorporate the high-level command.

Objectives

  • Improve model to support turns in any direction at intersections
  • Retrieve the status of traffic lights and signs through the CARLA API
  • Understand the currently available metrics in Behavior Metrics
  • Push existing code to GitHub
  • Create demo videos

Execution

High-Level Commands

The main challenge was two-fold: recording high-level commands during the data collection phase and then supplying these commands during inference time without the need to implement a route planner. Starting from CARLA version 0.9.13, we can access turning decisions via the traffic manager API, which eased the first problem. For the second issue, we opted for a hard-coded sequence of turning decisions at each junction during every testing episode, akin to following directions from a GPS navigator, such as “Left Right Left.”
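
As a rough illustration of the data-collection side, the sketch below polls the traffic manager for the expert's upcoming road option every frame and maps it to one of the four high-level commands. The get_next_action call (available in recent CARLA releases) and the mapping dictionary are assumptions about how this could be wired up, not a verbatim excerpt of our recorder.

```python
import carla

# Connect to the simulator and its traffic manager (default host/ports).
client = carla.Client("localhost", 2000)
world = client.get_world()
traffic_manager = client.get_trafficmanager(8000)

# Hypothetical mapping from the traffic manager's road option
# to the high-level command stored alongside each recorded frame.
ROAD_OPTION_TO_COMMAND = {
    "Left": "Left",
    "Right": "Right",
    "Straight": "Straight",
    "LaneFollow": "LaneFollow",
}

def current_command(vehicle):
    """Return the high-level command for the current frame.

    get_next_action() reports the next road option the traffic manager
    has planned for the vehicle; the exact return format can vary
    between CARLA versions.
    """
    action, _waypoint = traffic_manager.get_next_action(vehicle)
    return ROAD_OPTION_TO_COMMAND.get(str(action), "LaneFollow")
```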

During testing, at each frame, the high-level command defaults to LaneFollow unless the agent is at a junction. For every new junction the agent encounters, it reads the next turning decision from the sequence. The agent is guaranteed to arrive at the target location by following the turning instructions.
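
A minimal sketch of this GPS-style lookup is shown below. The class name, the string commands, and the use of waypoint.is_junction to detect junctions are illustrative assumptions, not the exact Behavior Metrics code.

```python
class CommandPlanner:
    """Feeds a hard-coded turning sequence, e.g. ["Left", "Right", "Left"],
    to the agent, consuming one decision per junction."""

    def __init__(self, turn_sequence):
        self.turn_sequence = list(turn_sequence)
        self.next_idx = 0
        self.in_junction = False
        self.current = "LaneFollow"

    def get_command(self, world_map, vehicle):
        waypoint = world_map.get_waypoint(vehicle.get_location())
        if waypoint.is_junction:
            # First frame inside a new junction: read the next decision.
            if not self.in_junction and self.next_idx < len(self.turn_sequence):
                self.current = self.turn_sequence[self.next_idx]
                self.next_idx += 1
            self.in_junction = True
            return self.current
        # Outside junctions the command defaults to LaneFollow.
        self.in_junction = False
        return "LaneFollow"
```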

Model Architecture Update

As described previously, the model takes the RGB camera image and the semantic segmentation stacked as a 6-channel image and feeds it through a convolutional LSTM network. The output is flattened and concatenated with the speed measurement, then sent through three linear layers to produce the driving commands: throttle, steer, and brake. To incorporate the high-level commands, which are discrete values, we employed an embedding layer. This layer produces an embedding vector of length 5 from the input high-level command, which is then concatenated with the flattened visual features and the speed measurement before being passed through the linear layers.
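
To make the fusion concrete, the fragment below sketches the command-embedding branch and the fused head in PyTorch. The embedding length of 5 and the three outputs match the description above; everything else (the 512-dim visual feature, the hidden width) is a placeholder assumption, not the trained model's configuration.

```python
import torch
import torch.nn as nn

# Indices for the four high-level commands.
COMMANDS = {"LaneFollow": 0, "Left": 1, "Right": 2, "Straight": 3}

class CommandConditionedHead(nn.Module):
    """Fuses visual features, speed, and a high-level command embedding."""

    def __init__(self, visual_dim=512, embed_dim=5, hidden_dim=256):
        super().__init__()
        self.command_embedding = nn.Embedding(len(COMMANDS), embed_dim)
        # visual features + scalar speed + command embedding
        fused_dim = visual_dim + 1 + embed_dim
        self.head = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 3),  # throttle, steer, brake
        )

    def forward(self, visual_features, speed, command_idx):
        cmd = self.command_embedding(command_idx)              # (B, 5)
        fused = torch.cat([visual_features, speed, cmd], dim=1)
        return self.head(fused)

# Example: a batch of 2 frames with the "Left" and "LaneFollow" commands.
model = CommandConditionedHead()
controls = model(torch.randn(2, 512), torch.rand(2, 1), torch.tensor([1, 0]))
```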

Traffic Light Problem

During data collection, the expert vehicle is set to ignore traffic lights and signs. Naturally, the trained agent acquires the same behavior from the training data. This causes the agent to occasionally run red lights, potentially colliding with other vehicles. The video below shows one example in which the agent nearly collides with a truck while attempting to turn against a red light; interestingly, it was able to recover back to its lane after barely avoiding the collision. We also experimented with setting all traffic lights to green, but this proved ineffective, as all vehicles then tried to cross intersections simultaneously, essentially reproducing the original problem.
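
For reference, the snippet below shows the two CARLA mechanisms discussed here: telling the traffic manager to let a vehicle ignore lights and signs, and forcing every traffic light in the world to green. It is a sketch of the relevant API calls, with expert_vehicle standing in for the already-spawned expert actor.

```python
import carla

client = carla.Client("localhost", 2000)
world = client.get_world()
traffic_manager = client.get_trafficmanager(8000)

# Used during data collection: the expert ignores lights and signs.
# expert_vehicle is assumed to be the spawned expert actor.
traffic_manager.ignore_lights_percentage(expert_vehicle, 100.0)
traffic_manager.ignore_signs_percentage(expert_vehicle, 100.0)

# Experiment mentioned above: freeze every traffic light on green.
for traffic_light in world.get_actors().filter("traffic.traffic_light"):
    traffic_light.set_state(carla.TrafficLightState.Green)
    traffic_light.freeze(True)
```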

Distance to the Leading Vehicle

We trained our model on two slightly different datasets. For the first dataset, the distance to the leading vehicle was set to 2.5 meters during data collection, meaning the expert vehicle was instructed to keep 2.5 meters to the vehicle in front of it. After training on this dataset, we observed that although the agent followed lanes well, it frequently collided with the vehicle ahead. This prompted us to increase the distance to the leading vehicle to 4 meters via the traffic manager, generating a second dataset. The evaluation results, shown below, suggest that making the expert more “cautious” during data collection leads the trained agent to also maintain a larger distance from other vehicles, reducing the risk of collision.
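
The adjustment itself is a single traffic manager call. The sketch below shows how the gap could be set for the expert, using the 4-meter value of the second dataset; traffic_manager and expert_vehicle are assumed to come from the collection setup.

```python
# Ask the traffic manager to keep a larger gap to the vehicle ahead
# when driving the expert (values in meters).
traffic_manager.distance_to_leading_vehicle(expert_vehicle, 4.0)

# Alternatively, the same gap can be applied to every TM-controlled vehicle.
traffic_manager.set_global_distance_to_leading_vehicle(4.0)
```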

Evaluation Metrics

We explored two evaluation metrics: the success rate and, for failed episodes, the distance traveled before a collision occurs. All testing is done in Town02, a map the model never saw during training. We sampled 12 routes in Town02 for evaluation, each consisting of exactly two turns.
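
Both metrics are straightforward to compute from per-episode logs. The helper below assumes each episode record carries a success flag and the distance driven before the first collision; the field names are illustrative.

```python
def summarize(episodes):
    """episodes: list of dicts such as {"success": True, "distance": 120.3}."""
    success_rate = sum(e["success"] for e in episodes) / len(episodes)
    failed = [e["distance"] for e in episodes if not e["success"]]
    avg_distance_before_collision = sum(failed) / len(failed) if failed else None
    return success_rate, avg_distance_before_collision
```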

We compared the models trained on the two datasets as described above.

  • v6.1: expert maintains a distance of 2.5 meters to leading vehicle
    success rate: 0.417
    average distance traveled before collision (failed cases): 94.04m
    
  • v6.2: expert maintains a distance of 4 meters to leading vehicle
    success rate: 0.083
    average distance traveled before collision (failed cases): 88.22m
    

For the next steps, it would also be interesting to evaluate success rate weighted by distance.

Demo

The two videos below demonstrate a successful and a failed episode during testing, respectively.