### System Info

CUDA: 12.1
Python: 3.10
Rust: 1.75.0

### Reproduction
Run the launcher in Docker and mount the socket to the host with the CLI below:

```shell
docker run --gpus all --shm-size 1g -v /tmp:/tmp -v /root/Project/text-generation-inference/ink-tgi/models:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0
```
Get the model's tokenizer_config.json and change the chat_template as below:

```json
"chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'}}{% if message['tool_calls'] %} {{''}} {% else %} {{message['content'] + eos_token}} {% endif %}\n{% elif message['role'] == 'tool' %}\n{{ '<|tool|>\n' +message['name'] + '\n'+ message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
```
Run the router on the host with:

```shell
cd router
cargo run -- --tokenizer-config-path /root/Project/text-generation-inference/ink-tgi/router/tokenizer_config.json
```
Call the endpoint with a function-calling curl request following the OpenAI interface.
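For illustration, a request of roughly this shape could be used; the route, port, and `model` value here are assumptions based on the Python test further below, not verified against the router:

```shell
# Hypothetical example request; adjust host, port, and model to your setup.
curl http://localhost:3000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "tgi",
        "messages": [{"role": "user", "content": "What is the weather like in Paris?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}
                    },
                    "required": ["location"]
                }
            }
        }],
        "tool_choice": "auto"
    }'
```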
- The `Message` struct in `lib.rs` is missing the `tool_calls` attribute required by the OpenAI spec.
- The `ToolCall` struct in `lib.rs` declares `id` as `u32`, but it must be a `String` according to the OpenAI spec (see the sketch after this list).
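For illustration, a minimal sketch of the shapes the OpenAI spec implies for these types; field names follow the spec, and this is not the actual `lib.rs` code:

```rust
// Sketch of OpenAI-spec-compliant message and tool-call shapes.
// Not TGI's actual definitions; shown to make the two mismatches concrete.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct Message {
    pub role: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub content: Option<String>,
    // Missing today: assistant messages need an optional `tool_calls` field.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub tool_calls: Option<Vec<ToolCall>>,
}

#[derive(Serialize, Deserialize)]
pub struct ToolCall {
    // Must be a String per the OpenAI spec (e.g. "call_abc123"), not u32.
    pub id: String,
    pub r#type: String, // always "function"
    pub function: FunctionCall,
}

#[derive(Serialize, Deserialize)]
pub struct FunctionCall {
    pub name: String,
    // The spec serializes arguments as a JSON-encoded string.
    pub arguments: String,
}
```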
### Expected behavior

The router must serve an interface that supports the function-calling implementations of LangChain and other LLM application frameworks.
You can test it with the Python code below:
```python
import openai
import json

client = openai.OpenAI(
    api_key="",  # can be anything
    base_url="http://localhost:3000/v1"  # NOTE: replace with the IP address and port of your TGI router
)

# Example dummy function hard coded to return the same weather.
# In production, this could be your backend API or an external API.
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

def run_conversation():
    # Step 1: send the conversation and available functions to the model
    messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is default, but we'll be explicit
    )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    # Step 2: check if the model wanted to call a function
    if tool_calls:
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        messages.append(response_message)  # extend conversation with assistant's reply
        # Step 4: send the info for each function call and function response to the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                location=function_args.get("location"),
                unit=function_args.get("unit"),
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )  # extend conversation with function response
        second_response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=messages,
        )  # get a new response from the model where it can see the function response
        return second_response

print(run_conversation())
```