Azure OpenAI Parallel Requests Handler

New features to be added soon:

  1. switch from notebooks to scripts
  2. choose a provider: Azure / OpenAI
  3. set a budget that stops generation once it is reached
  4. print the running cost after every N completed requests
  5. add assertions to catch input errors before calling the API

This project simplifies making parallel requests to the Azure OpenAI API for chat completions in scenarios where a large number of prepared prompts needs to be batch-processed simultaneously.

This project efficiently manages rate limits (requests per minute, RPM, and tokens per minute, TPM) and incorporates robust error handling to streamline processing multiple inputs simultaneously. Unlike the official OpenAI parallel implementation, which can be complex and cumbersome for beginners, it offers a simplified, easy-to-understand approach built on libraries such as tenacity and threading.
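The core idea can be illustrated with a short sketch: a threading.Semaphore caps how many requests are in flight at once (to respect the RPM limit; token accounting is omitted here for brevity), and tenacity retries transient failures with exponential backoff. The names send_request, client, and MAX_CONCURRENT are illustrative, not the project's exact internals:

import threading
from tenacity import retry, stop_after_attempt, wait_exponential

MAX_CONCURRENT = 10  # illustrative cap on concurrent in-flight requests
semaphore = threading.Semaphore(MAX_CONCURRENT)

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=30))
def send_request(client, messages):
    # Hold a semaphore slot for the duration of the API call,
    # so at most MAX_CONCURRENT requests run at the same time.
    with semaphore:
        return client.chat.completions.create(
            model="gpt-35-turbo",
            messages=messages,
        )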

Example

For a very simple scenario where the data consists of 100 requests asking simple questions such as "What is 1+1?" or "What is 5+5?", processing the requests one by one took about 18.6 seconds 🛵. Using the parallel processing method, this time dropped to approximately 2.6 seconds 🏎️, about 7 times faster.

So hit it with more complex requests and larger datasets, and watch this method flex its muscles, shaving off loads of time and zipping through tasks like a rocket booster 🚀

Requirements

  • An API key from Azure OpenAI
  • The key stored in a file named .env as AZURE_OPENAI_API_KEY=<your_token> (see the loading sketch below)
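With the key in place, a client can be created along these lines (a minimal sketch using python-dotenv and the openai package; the endpoint and API version below are placeholders you must replace with your own):

import os
from dotenv import load_dotenv
from openai import AzureOpenAI

load_dotenv()  # reads AZURE_OPENAI_API_KEY from .env

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_version="2024-02-01",  # placeholder: use the version your deployment supports
)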

Installation

Set up a virtual environment (macOS), install the required packages, and register the environment as a Jupyter Notebook kernel:

python -m venv myenv

source myenv/bin/activate

pip install -r requirements.txt

python -m ipykernel install --user --name=myenv --display-name="Python 3.11 (myenv)"

Usage

To use this implementation, structure your input data as follows and use the provided APIRequester class to handle parallel requests:

Data Format Example

[
 [{'role': 'system', 'content': "<Replace this with your desired system msg>"},
  {'role': 'user', 'content': '<Replace this with your desired user msg>'}],

 [{'role': 'system', 'content': "<Replace this with your desired system msg>"},
  {'role': 'user', 'content': '<Replace this with your desired user msg>'}],

 ...
]
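In practice, such a list is easy to build from a batch of prepared prompts, as in this small sketch (the system message and questions are placeholders):

questions = ["What is 53 + 53?", "What is 100 + 100?"]

message_sequences = [
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question},
    ]
    for question in questions
]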

Sample Class Usage

Instantiate the APIRequester class and call the get_responses_parallel method with your input data:

gpt35_turbo_api = APIRequester(
    model_name="gpt-35-turbo",
    temperature=1.0,
    max_tokens=20,
    rate_limit=100,
    token_rate_limit=10000,
)
results = gpt35_turbo_api.get_responses_parallel(message_sequences)
results[:2]

Each result is returned as a dictionary with input (the user's request message) and content (the response from the API), preserving the relationship between each request and its corresponding response.

[{'input': 'What is 53 + 53?', 'content': '{"content": "106"}'},
 {'input': 'What is 100 + 100?', 'content': '{"content": "200"}'}]
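Since content is returned as a JSON string in this example, the answers can be recovered with json.loads (a small post-processing sketch over the results shown above):

import json

for result in results:
    answer = json.loads(result["content"])["content"]
    print(result["input"], "->", answer)  # e.g. What is 53 + 53? -> 106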

Key Features

  • ThreadPoolExecutor: Manages multiple requests in parallel, improving response time.
  • Semaphore: Controls the rate of API calls to comply with rate limits.
  • Retry Mechanism: Handles intermittent errors effectively by automatically retrying failed requests.
  • Custom Error Handling: Provides a fallback mechanism that triggers after all retry attempts fail, allowing the batch to proceed despite individual errors (see the sketch below).
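A condensed sketch of how these pieces could fit together, building on the send_request and client names assumed in the earlier sketches (the real APIRequester internals may differ):

from concurrent.futures import ThreadPoolExecutor
from tenacity import RetryError

def safe_request(messages):
    # Fallback: if all retries fail, return an error marker instead of raising,
    # so one bad request does not abort the whole batch.
    try:
        return send_request(client, messages)
    except RetryError:
        return {"error": "request failed after all retries"}

with ThreadPoolExecutor(max_workers=10) as executor:
    # map preserves input order, keeping each response aligned with its request.
    responses = list(executor.map(safe_request, message_sequences))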

Related Projects

While other projects provide mechanisms to interact with OpenAI's API, this project uses libraries such as tenacity and threading, focusing on simplicity and ease of use, especially for users new to parallel computing.

The script openai-cookbook/examples/api_request_parallel_processor.py is well suited for making parallel requests to the OpenAI API. However, it can be complex and cumbersome when the goal is simply to send a large batch of already-prepared prompts simultaneously. This project aims to streamline and simplify that process.

Credits

Special thanks to the Max Planck Institute for Human Development, Center for Humans & Machines for providing the Azure OpenAI API endpoint that facilitated the development of this project.

For more information on their work and further research, please visit their GitHub and official website.