Coding week7 7/08-7/14

Weekly Meeting

During the July 10 meeting, we focused on refining our dataset and integrating our models into the overall framework. We discussed the use of iterative generation techniques to remove duplicated lines from our datasets, aiming to establish a robust baseline for our project. We also explored the implementation of high-level commands with text inputs for the model.

Two key questions were raised during the meeting:

More details can be found here: Google Doc

Data Generation Performance Optimization

Recently, the focus has been on optimizing the performance of data generation. In previous work, it was noticeable that calling the API consumed a significant amount of time. Here is a potential analysis of the reasons for the slowness, which may involve two situations:

  1. The number of parallel requests is too small, resulting in request limitations.
  2. Each call requires a long return time.

API Rate Limits

I checked the GPT documentation and found the following notes on rate limits.

Rate limits

Rate limits are restrictions that our API imposes on the number of times a user or client can access our services within a specified period of time.

How do these rate limits work?

Rate limits are measured in five ways: RPM (requests per minute), RPD (requests per day), TPM (tokens per minute), TPD (tokens per day), and IPM (images per minute). Rate limits can be hit across any of the options depending on what occurs first. For example, you might send 20 requests with only 100 tokens to the ChatCompletions endpoint and that would fill your limit (if your RPM was 20), even if you did not send 150k tokens (if your TPM limit was 150k) within those 20 requests.

OpenAI calculates different rates based on different prepayment levels. Currently, my account is at Tier 2. As observed, even within Tier 2, GPT-3.5-turbo is faster than GPT-4. This means there are more parallel requests available. Therefore, the performance bottleneck should mainly be in point 2: each call requires a long return time. I speculate that the performance bottleneck is likely due to the time taken for each request’s model inference and the network latency.

How can we reduce the number of calls?

In the original code, instructions generation was done one by one, meaning each time a command was generated, an API call was made. Even with the Batch method, the implementation of this batch method does not reduce the number of API calls.

This method becomes very slow when a large number of commands need to be generated because each API call incurs network latency and processing time. Additionally, if the generated commands are duplicated, they need to be regenerated, further increasing the time consumption.

To improve the speed of command generation, we have adopted the following two main optimization methods:

  1. Batch Generation with One-time Request: One-time request can reduce the number of API calls, thereby reducing the overhead of network latency and processing time for each call.
    • Implementation: By using the n parameter of the openai.chat.completions.create method, multiple commands can be generated at once. For example, a single API call can generate 5 commands instead of making 5 separate calls to generate 1 command each time.
    • Prompt: Currently, the prompt can only generate one command at a time. Modifying it to generate batch_size commands at once can significantly reduce the number of API calls.

Original prompt:

def generate_instruction_prompt(action):
    """
    Generate a prompt for the OpenAI API to create a driving instruction for a given action.
    """
    return f"""
    Generate a short driving instruction that includes the action '{action}'. Here are three examples:
    1. "Turn right at the next intersection."
    2. "Go straight past the traffic light."
    3. "Merge into the left lane."
    Generate an instruction that includes '{action}':
    """

New prompt for multiple instructions:

def generate_instruction_prompt(action, batch_size):
    """
    Generate a prompt for the OpenAI API to create a driving instruction for a given action.
    """
    return f"""
    Generate a short driving instruction that includes the action '{action}'.
    Make sure each instruction is distinct and uses different wording or context. Here are some examples:
    "Turn right at the next intersection."
    "Go straight past the traffic light."
    "Merge into the left lane."
    "At the roundabout, take the second exit."
    "Keep left to stay on the main road."
    There does not need to be any numerical numbering or any prefixes.
    Generate {batch_size} unique instruction that include '{action}':
    """

After testing, we found that the new prompt can significantly reduce the data generation time and there are almost no duplicate commands. This suggests that to generate data with similar structures, a more effective way to increase speed and avoid duplication is to generate a large batch of data at the prompt stage.

Parallel Processing

Besides reducing the number of API calls, another method is to fully utilize the current Tier 2 parallel potential. Given the 3500 RPM of GPT-3.5-turbo, there is still more room for parallel processing. Parallel processing can fully utilize the capabilities of multi-core CPUs, reducing the overall processing time. Especially in network I/O intensive operations such as API calls, parallel processing can significantly reduce waiting time.

By using concurrent.futures.ThreadPoolExecutor, parallel processing is achieved.

In the optimized code, we combined the two methods mentioned above: using batch generation to reduce the number of API calls and using parallel processing to handle multiple batch generation requests simultaneously. The specific implementation steps are as follows:

Batch Generation of Commands:

Using the latest prompt mentioned above and passing in the batch_size parameter to generate multiple instructions at once. It is important to note that more tokens are needed (max_tokens=10 in our case), and regular expressions should be used to handle any potential impurities in the data. For example, data entries like these: ”43. Proceed straight towards the tall building in the distance.“ or ”- Travel straight as the road bends to the left.” The following regex code can effectively remove impurities:

    # Split multiple instructions from a single response
    instructions = raw_instruction.split('\n')
    # Remove leading numbers and punctuation from each instruction
    instructions_list = [
        re.sub(r'^[\d\W]+', '', instruction).strip() for instruction in instructions
    ]

In the fetch_instructions and generate_instructions_batch functions in utils.py, use the n parameter to generate multiple commands at once. This method was later removed because the prompt itself can generate a sufficient number of instructions.

Thread Pool:

In the generate_dataset function in dataset_generator.py, use ThreadPoolExecutor to parallelize the execution of multiple batch generation requests.

ThreadPoolExecutor is used to manage concurrent tasks, where each task generates a batch of instructions for a given action. It ensures that a unique set of instructions is generated for each action, dynamically adjusting the batch size to ensure at least five samples per batch while not exceeding the maximum batch size. The tasks are submitted to the executor, and the future results are mapped to their respective actions for further processing.

By default, ThreadPoolExecutor automatically determines the number of threads based on the system’s available resources. However, we can control the number of concurrent tasks more precisely by setting the thread pool size. If not specified, ThreadPoolExecutor will use a reasonable default value, usually os.cpu_count().

In addition to the core points mentioned above, there are other techniques that can be adopted.

  1. Reduce output token count
  2. Switch from GPT-4 to GPT-3.5
  3. Switch to Azure-hosted APIs
  4. Parallelize your calls
  5. Stream output and use stop sequences

For more detailed information, you can refer to this blog: Making GPT API Responses Faster

Because the data structures we use are relatively simple, it’s easy to reduce the token count, and using GPT-3.5 is more cost-effective and faster.

We used optimized methods to recreate the data and tested the generation efficiency under different conditions.

python dataset_generator_batch_optimized.py --actions Straight Right LaneFollow Left --num_samples 1000 --output_file ./datasets/user_instructions/dataset_4000.csv --max_batch_size 100
Instructions Generated Time (seconds)
500 26.09
1000 44.97
4000 159.67

In the previous method, generating 100 instructions took 158.48 seconds, and the data set stopped growing around 1k. With the new method, this limit can be easily surpassed, and the time to generate data has been significantly reduced.

Here is the complete terminal output log for dataset_4000.csv:

python dataset_generator_batch_optimized.py --actions Straight Right LaneFollow Left --num_samples 1000 --output_file ./datasets/user_instructions/dataset_4000.csv --max_batch_size 100
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Straight': Generated 94 / 1000 unique instructions (Discarded 6 duplicates)
Action 'Left': Generated 94 / 1000 unique instructions (Discarded 6 duplicates)
Action 'LaneFollow': Generated 60 / 1000 unique instructions (Discarded 40 duplicates)
Action 'Right': Generated 84 / 1000 unique instructions (Discarded 16 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Straight': Generated 167 / 1000 unique instructions (Discarded 27 duplicates)
Action 'Left': Generated 184 / 1000 unique instructions (Discarded 10 duplicates)
Action 'Right': Generated 167 / 1000 unique instructions (Discarded 17 duplicates)
Action 'LaneFollow': Generated 144 / 1000 unique instructions (Discarded 16 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Right': Generated 250 / 1000 unique instructions (Discarded 17 duplicates)
Action 'Left': Generated 271 / 1000 unique instructions (Discarded 13 duplicates)
Action 'LaneFollow': Generated 228 / 1000 unique instructions (Discarded 16 duplicates)
Action 'Straight': Generated 252 / 1000 unique instructions (Discarded 15 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Straight': Generated 338 / 1000 unique instructions (Discarded 14 duplicates)
Action 'Right': Generated 336 / 1000 unique instructions (Discarded 14 duplicates)
Action 'LaneFollow': Generated 305 / 1000 unique instructions (Discarded 23 duplicates)
Action 'Left': Generated 348 / 1000 unique instructions (Discarded 23 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Left': Generated 439 / 1000 unique instructions (Discarded 9 duplicates)
Action 'Straight': Generated 434 / 1000 unique instructions (Discarded 4 duplicates)
Action 'Right': Generated 419 / 1000 unique instructions (Discarded 17 duplicates)
Action 'LaneFollow': Generated 394 / 1000 unique instructions (Discarded 11 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Left': Generated 522 / 1000 unique instructions (Discarded 17 duplicates)
Action 'Right': Generated 502 / 1000 unique instructions (Discarded 17 duplicates)
Action 'Straight': Generated 524 / 1000 unique instructions (Discarded 10 duplicates)
Action 'LaneFollow': Generated 472 / 1000 unique instructions (Discarded 22 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Left': Generated 607 / 1000 unique instructions (Discarded 15 duplicates)
Action 'Straight': Generated 595 / 1000 unique instructions (Discarded 29 duplicates)
Action 'Right': Generated 577 / 1000 unique instructions (Discarded 25 duplicates)
Action 'LaneFollow': Generated 555 / 1000 unique instructions (Discarded 17 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Straight': Generated 688 / 1000 unique instructions (Discarded 7 duplicates)
Action 'Left': Generated 699 / 1000 unique instructions (Discarded 8 duplicates)
Action 'Right': Generated 661 / 1000 unique instructions (Discarded 16 duplicates)
Action 'LaneFollow': Generated 637 / 1000 unique instructions (Discarded 18 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Right': Generated 747 / 1000 unique instructions (Discarded 14 duplicates)
Action 'Straight': Generated 782 / 1000 unique instructions (Discarded 6 duplicates)
Action 'LaneFollow': Generated 706 / 1000 unique instructions (Discarded 31 duplicates)
Action 'Left': Generated 785 / 1000 unique instructions (Discarded 14 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Left': Generated 875 / 1000 unique instructions (Discarded 10 duplicates)
Action 'LaneFollow': Generated 783 / 1000 unique instructions (Discarded 23 duplicates)
Action 'Right': Generated 830 / 1000 unique instructions (Discarded 17 duplicates)
Action 'Straight': Generated 855 / 1000 unique instructions (Discarded 27 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 100
Action 'Straight': Generated 945 / 1000 unique instructions (Discarded 10 duplicates)
Action 'Right': Generated 915 / 1000 unique instructions (Discarded 15 duplicates)
Action 'LaneFollow': Generated 862 / 1000 unique instructions (Discarded 21 duplicates)
Action 'Left': Generated 957 / 1000 unique instructions (Discarded 18 duplicates)
Submitting batch generation for action 'Straight' with batch size 100
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 86
Action 'Straight': Generated 1019 / 1000 unique instructions (Discarded 12 duplicates)
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Submitting batch generation for action 'Left' with batch size 86
Action 'Left': Generated 1036 / 1000 unique instructions (Discarded 7 duplicates)
Submitting batch generation for action 'Right' with batch size 100
Submitting batch generation for action 'LaneFollow' with batch size 100
Action 'Right': Generated 1003 / 1000 unique instructions (Discarded 12 duplicates)
Submitting batch generation for action 'LaneFollow' with batch size 100
Action 'LaneFollow': Generated 944 / 1000 unique instructions (Discarded 18 duplicates)
Action 'Left': Generated 1075 / 1000 unique instructions (Discarded 25 duplicates)
Submitting batch generation for action 'LaneFollow' with batch size 100
Action 'LaneFollow': Generated 1026 / 1000 unique instructions (Discarded 18 duplicates)
Total discarded (duplicate) instructions: 813
                                            instruction      action
0                                                    ""  [Straight]
1     "Continue straight on this route until you see...  [Straight]
2             "Go straight onto the highway exit ramp."  [Straight]
3                    "Head straight along this avenue."  [Straight]
4     "Navigate the roundabout and take the exit str...  [Straight]
...                                                 ...         ...
3995  "Watch for any obstructions on the left side o...      [Left]
3996               "Signal left before changing lanes."      [Left]
3997              "Stay left to enter the parking lot."      [Left]
3998  "Merge into the left lane for the upcoming lef...      [Left]
3999              "Proceed left at the T-intersection."      [Left]

[4000 rows x 2 columns]
Dataset generated and saved to './datasets/user_instructions/dataset_4000.csv'
Generated 4000 instructions in 159.67 seconds.

Word cloud map analysis of the dataset

Conclusion

For generating similar types of instruction data, a more efficient approach is:

The above test results are based on a not fully optimized setup. I believe further optimization is possible. This method is scalable in terms of data. Regarding performance, one of the biggest impacts is num_threads; I currently use around 10. Additionally, for max_batch_size, I generally use 100.

It is also worth noting that the development cost is relatively affordable. Throughout the entire development cycle, my API usage bill did not exceed $10.