Version: 2.0

Agent

Agents are the core orchestration unit in the Vectara platform. The agent decides how to respond to user input, when to invoke tools, and how to manage conversation state.

Each agent is configured with:

A unique key following the pattern agt_[identifier], plus a name. If you do not provide a key, Vectara generates one automatically based on the name.
A human-readable description
Optional instructions
A list of available tools (referenced by name or ID)
Optional tool configurations, for example Corpora Search tools configured to grant access to various corpora
Metadata and versioning controls
A first_step definition that encompasses optional instructions for the agent's behavior.

Agents operate through a conversational step architecture, processing user input through reasoning, tool execution, and response generation phases. The step-based design enables complex multi-turn workflows and intelligent tool orchestration.

Create an agent

You can create an agent in the Vectara Console, or you can use the API. For more information, check out our Agents Quick Start.

Example agent definition

This example shows a basic customer support agent configured with corpus search capabilities and inline instructions. It demonstrates the core components: tool configurations for searching support tickets and a conversational first step with behavior guidelines.

AGENT EXAMPLE

Model configuration

Agents use large language models for reasoning and response generation. You can configure:

Model: Choose from the available models, such as GPT-4o
Parameters: Adjust temperature, max tokens, and other model-specific settings
Cost optimization: Balance performance with token usage
Retry configuration: Configure automatic retry behavior for transient failures

Retry configuration

When agents interact with LLMs, transient failures such as network timeouts, temporary server issues, or API rate limits can interrupt the conversation flow. Without a retry mechanism, these temporary issues cause your agent to fail immediately, resulting in a poor user experience.

Vectara provides a retry configuration option for agents that detects these recoverable failures and automatically retries the request with exponential backoff.

The RetryConfiguration object controls the retry behavior for your agent's interactions with the LLM. You define these settings when creating or updating your agent model, and they apply to all LLM requests made by that agent.

Retry configuration parameters

enabled: Boolean flag that enables or disables retry logic
- Default: true
max_retries: The maximum number of retry attempts after the initial failure
- Range: 0-10
- Default: 3
initial_backoff_ms: The initial delay in milliseconds before the first retry
- Range: 100-60000ms
- Default: 1000ms
max_backoff_ms: The maximum delay in milliseconds between retries
- Range: 1000-300000ms
- Default: 30000ms
backoff_factor: The exponential multiplier for calculating backoff delays
- Range: 1.0-10.0
- Default: 2.0
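
The documented ranges and defaults can be captured in a small local validator. The sketch below is an illustration only, not Vectara's SDK or schema: the class name mirrors the RetryConfiguration object and the field names mirror the parameters above, but the validation logic, the _check helper, and the as_payload method are hypothetical.

```python
from dataclasses import asdict, dataclass


def _check(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when value falls outside the documented range."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the documented range {low}-{high}")


@dataclass(frozen=True)
class RetryConfiguration:
    """Local stand-in for the documented retry settings (not an official SDK type)."""

    enabled: bool = True
    max_retries: int = 3
    initial_backoff_ms: int = 1000
    max_backoff_ms: int = 30000
    backoff_factor: float = 2.0

    def __post_init__(self) -> None:
        # Ranges are copied from the parameter list in this section.
        _check("max_retries", self.max_retries, 0, 10)
        _check("initial_backoff_ms", self.initial_backoff_ms, 100, 60000)
        _check("max_backoff_ms", self.max_backoff_ms, 1000, 300000)
        _check("backoff_factor", self.backoff_factor, 1.0, 10.0)

    def as_payload(self) -> dict:
        """Return the settings as a plain dict (hypothetical helper)."""
        return asdict(self)
```

Calling RetryConfiguration() with no arguments yields the documented defaults, while a value such as max_retries=11 raises a ValueError before any request is built.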

Exponential backoff

Exponential backoff progressively increases the delay between retry attempts to avoid overwhelming a recovering service. For example, with the default backoff settings (initial: 1000ms, factor: 2.0, max: 30000ms), the delay before each retry attempt is:

Attempt 1: 1000ms delay
Attempt 2: 2000ms delay
Attempt 3: 4000ms delay
Attempt 4: 8000ms delay (reached only when max_retries is set to 4 or higher, since the default is 3)

The delay continues to grow exponentially until it reaches the max_backoff_ms value, at which point it remains constant for any remaining retry attempts.
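
The schedule above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, assuming the delay before retry n equals initial_backoff_ms multiplied by backoff_factor raised to the power n - 1, capped at max_backoff_ms. The function name backoff_delays_ms is hypothetical, and any jitter or other adjustment Vectara may apply is not modeled.

```python
def backoff_delays_ms(
    max_retries: int = 3,
    initial_backoff_ms: int = 1000,
    max_backoff_ms: int = 30000,
    backoff_factor: float = 2.0,
) -> list[int]:
    """Return the assumed delay (in ms) before each retry attempt."""
    delays = []
    for retry_index in range(max_retries):
        # Grow the delay geometrically, then cap it at max_backoff_ms.
        uncapped = initial_backoff_ms * backoff_factor ** retry_index
        delays.append(int(min(uncapped, max_backoff_ms)))
    return delays


if __name__ == "__main__":
    print(backoff_delays_ms(max_retries=4))  # [1000, 2000, 4000, 8000]
    print(backoff_delays_ms(max_retries=7))  # [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

Under this model, the default max_retries of 3 uses only the first three delays, so the retries add at most 7000ms of waiting before the request fails.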

Example: Research assistant with web search

Here's how to create a research assistant agent that can search the web for current information. The agent is configured to:

Search the web when users ask questions requiring current information
Limit search results to 20 for comprehensive responses
Use a lower temperature (0.3) for more consistent, factual responses
Follow instructions to cite sources and admit uncertainty when appropriate
Retry automatically to handle transient API failures gracefully

This example requires no corpus setup, so you can test it immediately.

CREATE A RESEARCH ASSISTANT AGENT

Chat with your agent

After creating an agent, you can interact with it by creating a session and sending messages:

1. Create a session

Sessions provide conversation context and are required for all agent interactions:

CREATE A SESSION

2. Send messages to the agent

Once you have a session, send messages using the events endpoint:

SEND A MESSAGE

The agent responds with events that include its reasoning, tool usage, and final response.

Quick Start

For a complete step-by-step guide with code examples, see Agent Quick Start.
