- switch from notebook to scritps
- choose a provider: Azure / OpenAI
- set a budget to stop generating once its reached
- print cost every N requests are completed
- add many asserts to capture errors before running the API
This project simplifies making parallel requests to the Azure OpenAI API for chat completions of scenarios where one needs to batch process a large number of prepared prompts simultaneously.
This project efficiently manages rate limits (Requests RPM & Tokens TRM) and incorporates robust error handling to streamline processing multiple inputs simultaneously. Unlike the official OpenAI parallel implementation, which can be complex and cumbersome for beginners, this project offers a simplified, easy-to-understand approach, using libraries such as tenacity and threading.
For a very simple scenario where the data consists of 100 requests asking simple questions such as What is 1+1?
, What is 5+5?
, processing these requests one by one took about 18.6 seconds 🛵. However, using the parallel processing method, this time was significantly reduced to approximately 2.6 seconds 🏎️, making it 7 times faster.
So hit it with more complex requests and larger datasets, and watch this method flexes its muscles, shaving off loads of time and zipping through tasks like a rocket booster 🚀
- API key from Azure OpenAI
- Store the API key in a file named .env
AZURE_OPENAI_API_KEY = <your_token>
Set up a virtual environment (macOS) as a kernel in Jupyter Notebook by installing the required packages to get started with this project:
python -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
python -m ipykernel install --user --name=myenv --display-name="Python 3.11 (myenv)"
To use this implementation, structure your input data as follows and utilize the provided APIPlayer class to handle parallel requests:
[
[{'role': 'system', 'content': "<Replace this with your desired system msg>"},
{'role': 'user', 'content': '<Replace this with your desired user msg>'}],
[{'role': 'system', 'content': "<Replace this with your desired system msg>"},
{'role': 'user', 'content': '<Replace this with your desired user msg>'}],
...
]
Instantiate the APIRequester class and call the get_responses_parallel method with your input data:
gpt35_turbo_api = APIRequester(model_name = "gpt-35-turbo", temperature = 1.0, max_tokens = 20, rate_limit = 100, token_rate_limit = 10000)
results = gpt35_turbo_api.get_responses_parallel(message_sequences)
results[:2]
Each result is saved as a dictionary with input (the user's request message) and content (the response from the API), maintaining the relationship between each request and its corresponding response.
[{'input': 'What is 53 + 53?', 'content': '{"content": "106"}'},
{'input': 'What is 100 + 100?', 'content': '{"content": "200"}'}]
- ThreadPoolExecutor: Manages multiple requests in parallel, improving response time.
- Semaphore: Controls the rate of API calls to comply with rate limits.
- Retry Mechanism: Handles intermittent errors effectively by automatically retrying failed requests.
- Custom error handling: Provides a fallback mechanism that triggers after all retry attempts fail, allowing the process to proceed smoothly despite errors.
While other projects provide mechanisms to interact with OpenAI's API, this project utilises libraries such as tenacity and threading, focusing on simplicity and ease of use, especially for users new to parallel computing.
This Script openai-cookbook/examples/api_request_parallel_processor.py is well-suited for making parallel requests to the OpenAI API. However, it can be complex and cumbersome for scenarios where one wants to just send a lot of prompts that are already prepared simultaneously. This project aims to streamline and simplify that process.
Special thanks to the Max Planck Institute for Human Development, Center for Humans & Machines for providing the Azure OpenAI API endpoint that facilitated the development of this project.
For more information on their work and further research, please visit their GitHub and official website.