Choose the LLM that powers your agent or connect your own endpoint.
Available models
PolyAI
OpenAI
Amazon Bedrock
Our default, proprietary models optimized for voice AI.

| Model | Best for |
|---|---|
| Raven V2 | Production-hardened, real-time voice interactions and high retrieval precision |
| Raven V3 | Improved grounding, paraphrasing, and robustness for enterprise voice use cases |
Raven models are purpose-built for conversational voice AI. Contact PolyAI for detailed benchmarks and guidance.
| Model | Best for |
|---|---|
| GPT-5.2 | High-quality interactions requiring nuance and strong reasoning |
| GPT-5.2 chat | Extended dialogue and conversational stability |
| GPT-5 mini | Lower latency and reduced cost for mid-complexity use cases |
| GPT-5 nano | Simple tasks and fast-response workloads |
| GPT-4o | Versatile balance of reasoning, speed, and cost |
| GPT-4o mini | Everyday queries and high-volume deployments |
| GPT-4.1 | Strong reasoning with improved cross-task performance |
| GPT-4.1 mini | Cost-effective, latency-focused option for lighter workloads |
| GPT-4.1 nano | Minimal compute and high throughput |
See OpenAI model documentation for detailed specifications.

| Model | Best for |
|---|---|
| Claude 3.5 Haiku | Simple, predictable tasks with strong safety alignment |
| Nova Micro | Efficiency with strong general-purpose performance |
See Anthropic Claude docs and Amazon Nova docs for more details.
Configuring the model
Open model settings
Navigate to Agent Settings > Large Language Model.
Select a model
Choose the desired model from the dropdown.
Save changes
Click Save to apply your changes.
- OpenAI models: Official OpenAI model reference
- Anthropic Claude: Claude model documentation
- Amazon Nova: Amazon Bedrock model details
Bring your own model (BYOM)
PolyAI supports bring-your-own-model (BYOM) with a simple API integration. If you run your own LLM, expose an endpoint that follows the OpenAI chat/completions schema and PolyAI will treat it like any other provider.
Overview
Expose an API endpoint
Accept and return data in the OpenAI chat/completions format.
Configure authentication
PolyAI can send either an x-api-key header or a Bearer token.
Enable streaming (optional)
Support streaming responses using stream: true for lower latency.
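When streaming is enabled, responses are sent as server-sent events in OpenAI's `chat.completion.chunk` format, where each event carries a `delta` rather than a full message. The sketch below shows one way to format such chunks; the helper name and default model ID are illustrative, not part of any PolyAI API.

```python
import json

def sse_chunk(content, model='my-llm', finish_reason=None):
    """Format one OpenAI-style streaming chunk as a server-sent event."""
    payload = {
        'object': 'chat.completion.chunk',
        'model': model,
        'choices': [{
            'index': 0,
            # Each chunk carries an incremental delta, not the full message.
            'delta': {'content': content} if content else {},
            'finish_reason': finish_reason,
        }],
    }
    return f'data: {json.dumps(payload)}\n\n'

# A streamed reply is a sequence of chunks followed by a sentinel, e.g.:
#   sse_chunk('Hel'), sse_chunk('lo'),
#   sse_chunk('', finish_reason='stop'), 'data: [DONE]\n\n'
```

In a Flask handler, these strings would be yielded from a generator returned with the `text/event-stream` content type.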
API endpoint
Authentication
| Method | Header sent by PolyAI |
|---|---|
| API Key | `x-api-key: YOUR_API_KEY` |
| Bearer | `Authorization: Bearer YOUR_TOKEN` |
Configure your server to accept one of the above.
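A minimal sketch of accepting either scheme, assuming a plain dict of headers (Flask's `request.headers` is case-insensitive, so both spellings of the key work there):

```python
def extract_credential(headers):
    """Return the API credential from either auth scheme PolyAI may use."""
    # API-key scheme: the credential arrives verbatim in x-api-key.
    api_key = headers.get('x-api-key')
    if api_key:
        return api_key
    # Bearer scheme: strip the "Bearer " prefix from Authorization.
    auth = headers.get('Authorization', '')
    if auth.startswith('Bearer '):
        return auth[len('Bearer '):]
    return None
```

Compare the returned value against your stored credential and reject the request with a 401 if it does not match.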
Sample implementation (Python / Flask)
```python
from flask import Flask, request, jsonify
import time, uuid

app = Flask(__name__)

@app.route('/chat/completions', methods=['POST'])
def chat_completions():
    data = request.json
    messages = data.get('messages', [])
    user_input = messages[-1]['content'] if messages else ''

    # TODO: insert your model inference here
    reply = f'You said: {user_input}'

    return jsonify({
        'id': f'chatcmpl-{uuid.uuid4().hex}',
        'object': 'chat.completion',
        'created': int(time.time()),
        'model': 'my-llm',
        'choices': [{
            'index': 0,
            'message': {'role': 'assistant', 'content': reply},
            'finish_reason': 'stop'
        }]
    })
```
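For reference, this is the round trip the sample server implements: a chat/completions request body on the way in, and a `chat.completion` object on the way out. The concrete values below (model ID, message text) are placeholders for illustration.

```python
import json

# A request in the chat/completions schema, as PolyAI would send it:
request_body = {
    'model': 'my-llm',
    'messages': [
        {'role': 'system', 'content': 'You are a helpful voice agent.'},
        {'role': 'user', 'content': 'What time do you open?'},
    ],
    'stream': False,
}

# The non-streaming response shape the sample server returns:
response_body = {
    'id': 'chatcmpl-example',
    'object': 'chat.completion',
    'created': 0,
    'model': 'my-llm',
    'choices': [{
        'index': 0,
        'message': {'role': 'assistant',
                    'content': 'You said: What time do you open?'},
        'finish_reason': 'stop',
    }],
}

def extract_reply(body):
    """Return the assistant text from a chat.completion response."""
    return body['choices'][0]['message']['content']
```

The last entry in `messages` is the current user turn; earlier entries carry the system prompt and prior dialogue.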
Final checklist
Before going live, verify that your endpoint accepts the chat/completions request format, that authentication succeeds with your chosen scheme, and that streaming works if you enabled it. Then send the following to your PolyAI contact:
Endpoint URL
Model ID
Auth method & credential