Preliminaries

The main objective for this week is to develop more sophisticated evaluation metrics beyond the existing success rate and average distance traveled before a collision. Specifically, we intend to incorporate metrics from the CARLA Leaderboard into our evaluation process, which take into account route completion and traffic infractions when calculating a score for model performance.

Objectives

  • Finish editing the demo video
  • Develop advanced evaluation metrics
  • Debug CARLA 0.9.14 segmentation label issue
  • Study evaluation metrics in Behavior Metrics

Execution

Current models

The upgrade to CARLA 0.9.14 changed the semantic segmentation labels, so we had to retrain our previous models, which were originally trained under CARLA 0.9.13. Below is a summary of the current state of our models (see previous post for model architectures):

By comparing v7.0 and v7.1, we aim to discern whether one-hot encoding or an embedding layer leads to superior performance. Furthermore, by comparing v7.1 and v7.2 and specifically observing the number of traffic light infractions, we can evaluate how effectively model v7.2 has learned to adhere to traffic light signals.

Evaluation Metrics

Following the scoring system of the CARLA Leaderboard, we have updated our evaluation metrics as follows:

  • Driving Score: \(R_i P_i\), where \(R_i\) is the completion percentage of the \(i\)-th route and \(P_i\) is the infraction penalty of the \(i\)-th route.
  • Route Completion: The percentage of the route completed. For each testing route, we pre-calculate the total route length by letting an expert agent, i.e., the CARLA autopilot, drive the route and measuring the distance traveled from start to end.
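  • Infraction Penalty: \(P_i = \prod_j (p^j)^{\#\text{infractions}_j}\), where \(p^j\) is the penalty coefficient for infractions of type \(j\) and \(\#\text{infractions}_j\) is the number of times that infraction occurred on the \(i\)-th route. The coefficients are: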
    • Collision with walker: 0.5
    • Collision with other vehicle: 0.6
    • Collision with static objects: 0.65
    • Timeout: 0.7
    • Running a red light: 0.7

The overall driving score is the arithmetic mean of the driving scores of all routes.
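
The scoring described above can be sketched in a few lines of Python. This is only an illustration, not our actual evaluation code: the function names, the dictionary keys, and the `(distance_traveled, route_length, infraction_counts)` tuple layout are assumptions made for the example, and route completion is assumed to be clamped to the range 0 to 100.

```python
from math import prod

# Penalty coefficients from the list above (dictionary keys are illustrative names).
PENALTIES = {
    "collision_walker": 0.5,
    "collision_vehicle": 0.6,
    "collision_static": 0.65,
    "timeout": 0.7,
    "red_light": 0.7,
}


def route_completion(distance_traveled, route_length):
    """R_i: percentage of the route completed, clamped to [0, 100] (assumption)."""
    if route_length <= 0:
        raise ValueError("route_length must be positive")
    fraction = min(max(distance_traveled / route_length, 0.0), 1.0)
    return 100.0 * fraction


def infraction_penalty(infraction_counts):
    """P_i: product over infraction types of coefficient ** count."""
    return prod(PENALTIES[kind] ** count for kind, count in infraction_counts.items())


def driving_score(distance_traveled, route_length, infraction_counts):
    """Driving score of a single route: R_i * P_i."""
    return route_completion(distance_traveled, route_length) * infraction_penalty(
        infraction_counts
    )


def overall_driving_score(routes):
    """Arithmetic mean of the per-route driving scores."""
    if not routes:
        raise ValueError("at least one route is required")
    scores = [driving_score(*route) for route in routes]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical results: (distance traveled, route length, infraction counts).
    routes = [
        (100.0, 100.0, {}),
        (50.0, 100.0, {"red_light": 1, "collision_vehicle": 1}),
    ]
    print(overall_driving_score(routes))
```

In the hypothetical second route, the agent covers half the route and commits one red-light infraction and one vehicle collision, so its driving score is 50 × 0.7 × 0.6 = 21. A route with no infractions has an infraction penalty of 1, so its driving score equals its route completion.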

Demo Video

Finally, we edited a demo video and uploaded it to the YouTube channel. Check it out here: