Your Truffle has powerful AI models running locally. The truffile CLI provides an OpenAI-compatible proxy that lets you use these models with any OpenAI SDK or tool.

Quick Start

Check Available Models

See what models are loaded on your Truffle:
truffile models
🍄‍🟫 Models on truffle-6272

  ✓ Llama-3.2-3B
    id: 5d9a9c10-58cd-4b9c-b456-67d21cafb945
  ✓ Qwen3-30B-A3B-Thinking-2507 reasoner
    id: 99b7eefa-e051-4699-b172-7535e4b87a28

Memory: 453MB / 62842MB
Models marked with reasoner support chain-of-thought reasoning and will include their thinking process in responses.

Start the Proxy

Start an OpenAI-compatible proxy server:
truffile proxy
🍄‍🟫 Starting OpenAI proxy

✓ Resolving truffle-6272.local
✓ Connecting to inference service
  Device: truffle-6272 (192.168.1.32)
  Models: 2 loaded

✓ Proxy running at http://127.0.0.1:8080/v1

  Use with OpenAI SDK:
    from openai import OpenAI (we don't require an API key, so it's not needed)
    client = OpenAI(base_url="http://127.0.0.1:8080/v1")

  Or set environment variable:
    export OPENAI_BASE_URL=http://127.0.0.1:8080/v1

  Press Ctrl+C to stop

Using with OpenAI SDK

Once the proxy is running, you can use the standard OpenAI Python SDK:
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="not-needed"  # the proxy ignores the key, but the SDK requires some value
)

response = client.chat.completions.create(
    model="default",  # Uses the default loaded model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
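
Because the proxy speaks the OpenAI API, you can also discover which models it exposes and address one by name instead of "default". The sketch below assumes the proxy serves the standard /v1/models endpoint and accepts the names shown by truffile models:

# List the models the proxy exposes
for model in client.models.list():
    print(model.id)

# Target a specific model by name
response = client.chat.completions.create(
    model="Llama-3.2-3B",  # assumption: names from `truffile models` are accepted here
    messages=[{"role": "user", "content": "Hello!"}]
)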

Streaming Responses

stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Structured Output (JSON Schema)

Request structured JSON responses:
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "List 3 programming languages"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "languages",
            "schema": {
                "type": "object",
                "properties": {
                    "languages": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["languages"]
            }
        }
    }
)
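
The structured reply arrives as a JSON string in the message content, so parse it with the standard library (the printed value is illustrative):

import json

data = json.loads(response.choices[0].message.content)
print(data["languages"])  # e.g. ["Python", "Rust", "Go"]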

Tool Calling

The proxy supports OpenAI-style function/tool calling:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# Check if the model wants to call a tool
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
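
To complete the round trip, execute the function yourself and send its output back as a tool message. This follows the standard OpenAI pattern; the get_weather implementation below is a stand-in you would replace with real logic:

import json

def get_weather(location: str) -> str:
    # Stand-in implementation - replace with a real lookup
    return f"Sunny, 22°C in {location}"

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)

    follow_up = client.chat.completions.create(
        model="default",
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo?"},
            response.choices[0].message,  # the assistant turn containing the tool call
            {"role": "tool", "tool_call_id": tool_call.id, "content": result},
        ],
        tools=tools
    )
    print(follow_up.choices[0].message.content)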

Proxy Options

Custom Port

Run the proxy on a different port:
truffile proxy --port 9000
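
Then point your client at the matching URL:
client = OpenAI(base_url="http://127.0.0.1:9000/v1", api_key="not-needed")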

Bind to All Interfaces

Allow connections from other machines on your network:
truffile proxy --host 0.0.0.0
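
Other machines can then point an OpenAI client at your computer's LAN address (the IP below is a placeholder):

# On another machine on the same network
client = OpenAI(base_url="http://192.168.1.50:8080/v1", api_key="not-needed")  # placeholder: use the IP of the machine running the proxy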

Debug Mode

Include reasoning/thinking in responses (for reasoner models):
truffile proxy --debug

Exposing to the Internet

To use your Truffle’s inference from anywhere, you can tunnel the proxy using ngrok:

Install ngrok

brew install ngrok

Start the Tunnel

  1. Start the proxy in one terminal:
    truffile proxy
    
  2. In another terminal, start ngrok:
    ngrok http 8080
    
  3. You’ll get a public URL like https://abc123.ngrok.io
  4. Use this URL as your OpenAI base URL:
    client = OpenAI(base_url="https://abc123.ngrok.io/v1", api_key="not-needed")
    
Anyone with your ngrok URL can use your Truffle for inference. Only share it with people you trust, or use ngrok’s authentication features.

Command Reference

Command                           Description
truffile models                   List AI models on connected device
truffile proxy                    Start proxy on localhost:8080
truffile proxy --port <port>      Use custom port
truffile proxy --host <host>      Bind to specific interface
truffile proxy --device <name>    Connect to specific Truffle
truffile proxy --debug            Include reasoning in responses

Source Code

The inference proxy implementation is open source. See how it works:

truffile/infer

OpenAI-compatible proxy implementation
Key files:
  • proxy.py - HTTP server that translates OpenAI API requests to Truffle's gRPC inference API; handles streaming, tool calls, JSON schema, and model resolution