Your Truffle has powerful AI models running locally. The truffile CLI provides an OpenAI-compatible proxy that lets you use these models with any OpenAI SDK or tool.
Quick Start
Check Available Models
Run `truffile models` to see what models are loaded on your Truffle:
```
🍄🟫 Models on truffle-6272

  ✓ Llama-3.2-3B
      id: 5d9a9c10-58cd-4b9c-b456-67d21cafb945
  ✓ Qwen3-30B-A3B-Thinking-2507  reasoner
      id: 99b7eefa-e051-4699-b172-7535e4b87a28

  Memory: 453MB / 62842MB
```
Models marked `reasoner` support chain-of-thought reasoning and will include their thinking process in responses.
Start the Proxy
Run `truffile proxy` to start an OpenAI-compatible proxy server:
```
🍄🟫 Starting OpenAI proxy

  ✓ Resolving truffle-6272.local
  ✓ Connecting to inference service
      Device: truffle-6272 (192.168.1.32)
      Models: 2 loaded
  ✓ Proxy running at http://127.0.0.1:8080/v1

  Use with OpenAI SDK (no API key required):
      from openai import OpenAI
      client = OpenAI(base_url="http://127.0.0.1:8080/v1")

  Or set environment variable:
      export OPENAI_BASE_URL=http://127.0.0.1:8080/v1

  Press Ctrl+C to stop
```
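If you take the environment-variable route, the OpenAI Python SDK reads `OPENAI_BASE_URL` automatically, so only a placeholder key needs to be supplied (the proxy does not check it). A minimal sketch:

```python
import os
from openai import OpenAI

# The SDK reads OPENAI_BASE_URL when base_url isn't passed explicitly.
os.environ.setdefault("OPENAI_BASE_URL", "http://127.0.0.1:8080/v1")

# The proxy ignores the key, but the SDK insists that one is set.
client = OpenAI(api_key="not-needed")
```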
Using with OpenAI SDK
Once the proxy is running, you can use the standard OpenAI Python SDK:
```python
from openai import OpenAI

# The proxy ignores the API key, but the OpenAI SDK requires one to be set
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="default",  # Uses the default loaded model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

print(response.choices[0].message.content)
```
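To target a particular model instead of the default, the names shown by `truffile models` can presumably be passed in the `model` field (the proxy's `proxy.py` handles model resolution). A hedged sketch, assuming a loaded model's listed name is accepted:

```python
# Assumption: the proxy resolves loaded models by the name shown in `truffile models`
response = client.chat.completions.create(
    model="Llama-3.2-3B",
    messages=[{"role": "user", "content": "Summarize what a proxy server does."}],
)
print(response.choices[0].message.content)
```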
Streaming Responses
Set `stream=True` to receive tokens as they are generated:
```python
stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
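If you also need the complete text after streaming, one option is to collect the deltas as they arrive, for example:

```python
stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True,
)

parts = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        parts.append(delta)  # keep each piece for later use
        print(delta, end="", flush=True)

full_text = "".join(parts)
```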
Structured Output (JSON Schema)
Request structured JSON responses:
```python
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "List 3 programming languages"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "languages",
            "schema": {
                "type": "object",
                "properties": {
                    "languages": {
                        "type": "array",
                        "items": {"type": "string"},
                    }
                },
                "required": ["languages"],
            },
        },
    },
)
```
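The content comes back as a JSON string that should match the schema, so it can be parsed directly. A minimal sketch, assuming the model returns valid JSON:

```python
import json

data = json.loads(response.choices[0].message.content)
print(data["languages"])  # e.g. ["Python", "Rust", "Go"]
```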
Tool Calling
The proxy supports OpenAI-style function/tool calling:
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# Check if the model wants to call a tool
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
```
Proxy Options
Custom Port
Run the proxy on a different port:
```
truffile proxy --port 9000
```
Bind to All Interfaces
Allow connections from other machines on your network:
```
truffile proxy --host 0.0.0.0
```
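Other machines can then point the SDK at the proxy host's LAN address (the IP below is a placeholder for whichever machine runs the proxy):

```python
from openai import OpenAI

# Replace 192.168.1.50 with the LAN IP of the machine running `truffile proxy`
client = OpenAI(base_url="http://192.168.1.50:8080/v1", api_key="not-needed")
```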
Debug Mode
Include reasoning/thinking in responses (for reasoner models) by running `truffile proxy --debug`.
Exposing to the Internet
To use your Truffle’s inference from anywhere, you can tunnel the proxy using ngrok:
Install ngrok
```
curl -sSL https://ngrok-agent.s3.amazonaws.com/ngrok.asc \
  | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc > /dev/null \
  && echo "deb https://ngrok-agent.s3.amazonaws.com buster main" \
  | sudo tee /etc/apt/sources.list.d/ngrok.list \
  && sudo apt update \
  && sudo apt install ngrok
```
Start the Tunnel
Start the proxy in one terminal with `truffile proxy`.
In another terminal, point ngrok at the proxy's port (8080 by default): `ngrok http 8080`
You’ll get a public URL like https://abc123.ngrok.io. Use this URL as your OpenAI base URL:
```python
client = OpenAI(base_url="https://abc123.ngrok.io/v1", api_key="not-needed")
```
Anyone with your ngrok URL can use your Truffle for inference. Only share it with people you trust, or use ngrok’s authentication features.
Command Reference
| Command | Description |
| --- | --- |
| `truffile models` | List AI models on connected device |
| `truffile proxy` | Start proxy on localhost:8080 |
| `truffile proxy --port <port>` | Use custom port |
| `truffile proxy --host <host>` | Bind to specific interface |
| `truffile proxy --device <name>` | Connect to specific Truffle |
| `truffile proxy --debug` | Include reasoning in responses |
Source Code
The inference proxy implementation is open source. See how it works:
truffile/infer - OpenAI-compatible proxy implementation
Key files:
- proxy.py - HTTP server that translates OpenAI API requests to Truffle’s gRPC inference API. Handles streaming, tool calls, JSON schema, and model resolution.