Version: 2.0

Understanding Vectara

Vectara is an API-first Agentic Platform. The end-user application (the UI, the brand, the workflow) sits on top of the platform and calls Vectara over REST. Underneath the application layer, Vectara handles retrieval, generation, agent orchestration, factual consistency checks, governance, and observability.

Tip: Vectara is SOC 2 Type II and HIPAA certified.

There are two ways to get an application built and running on the same platform:

- Your team builds it with the API, Vectara Skills, and a coding agent.
- Vectara delivers it turnkey as Vectara Managed Agents.

See the application layer for both options.

This section is the conceptual map

Use this section to orient yourself. It explains what the parts of the platform are, how they fit together, and the trade-offs that shape the design. For the implementation reference (configuration fields, API schemas, tuning playbooks), each topic links to its canonical guide in /docs/agents/, /docs/search-and-retrieval/, /docs/pipelines/, and the REST API reference. Read this section first if you're evaluating Vectara or orienting a new engineer. Reach for the canonical guides once you're building.

Vectara is available as a service on-premises, within your VPC, or via SaaS.

The platform comes with:

- Agent orchestration, exposed as a hosted Agent API that you can use to write and run your agents.
  - Writing an agent involves:
    - Specifying agent instructions as text. Vectara also provides default instructions for a Q&A agent, and the Vectara team can help you write the instructions for your agents.
    - Specifying which tools and skills the agent can use.
    - Specifying which ML models the agent can use.
  - Using an agent means invoking the session and interaction APIs.
- A set of tools that agents can use for complex use cases, including web search, image manipulation, agent artifacts, subagents, text2sql, and many more. Users can also add their own tools as Python code, through MCP, or as Agent Skills.
- A multimodal ingestion pipeline that can process complex documents containing text, images, tables, flowcharts, graphs, and similar content. Advanced parsing ensures that document contents are extracted and indexed so they can be used correctly at query time.
- Managed indexing via connectors for various sources. Vectara can pull data from sources like S3, and users can also push data to Vectara via APIs.
- An index and search pipeline, including a vector DB, neural retrieval, lexical search, hybrid search, and both neural and non-neural rerankers.
- A variety of ML models, including but not limited to vector embedding, reranking, RAG answer generation, RAG hallucination detection, agent orchestration, and vision models. Vectara works with on-prem and VPC customers to bring models that suit their use case.
- A developer console for tenant configuration, tenant usage views, API debugging, agent building, agent testing, and RAG.
- An admin console showing all tenants in the deployment, their usage, governance, and deployment-wide configuration. This is mainly for on-prem and VPC deployments.
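
The three parts of writing an agent listed above (instructions, tools and skills, models) can be pictured as one small declarative record. The sketch below is only an illustration of that shape: `AgentSpec`, its field names, the tool names, and `example-model` are all hypothetical and are not Vectara's API schema. See the canonical agent guides for the real configuration fields.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Hypothetical agent definition; field names are illustrative only."""

    name: str
    instructions: str                                # agent instructions as text
    tools: list[str] = field(default_factory=list)   # tools and skills the agent may use
    model: str = "example-model"                     # ML model the agent may use

    def validate(self) -> list[str]:
        """Return a list of human-readable problems (empty if the spec looks sane)."""
        problems = []
        if not self.instructions.strip():
            problems.append("instructions must not be empty")
        if len(set(self.tools)) != len(self.tools):
            problems.append("tools must be unique")
        if not self.model:
            problems.append("a model must be selected")
        return problems


# Assumed example values, not real tool or model identifiers.
spec = AgentSpec(
    name="support-qa",
    instructions="Answer questions using only the retrieved documents.",
    tools=["corpus_search", "web_search"],
)
print(spec.validate())
```

Keeping the definition declarative is what lets the platform, rather than the application, own orchestration: the application states what the agent may do, and sessions and interactions run against that declaration.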

Vectara comes with various ML models (embedding model, reranker, RAG, hallucination detection), but also supports any non-Vectara models that are exposed via standard APIs.

Explore the runtime in the Agent Playground

The fastest way to understand what a Vectara agent does is to run one yourself. Open the Agent Playground, paste an API key and an agent key, and watch session metadata, step transitions, tool calls, and structured outputs stream in real time. See the playground walkthrough for setup details.

The three layers

Every Vectara deployment has the same three layers. Knowing which layer owns what keeps integration decisions clear.

| Layer | Owner | Responsibility |
| --- | --- | --- |
| End user | The user | Sees only your branded UI. Has no concept of an "agent" or whether Vectara is being used behind the scenes. |
| The application | You / Vectara buyer | Three responsibilities: (1) supplying the documents and systems used by the application, (2) defining agent and other system configurations, and (3) writing a thin layer of code (UI, business logic, identity) that calls Vectara over REST. The platform does the AI heavy lifting underneath, so this layer stays small. Except for the first responsibility, your engineering team can deliver these with Vectara APIs, Vectara Skills, and a coding agent, or have them delivered turnkey by Vectara Managed Agents. You own it either way. |
| Vectara platform | Vectara | Indexes your data and runs your declared agents over sessions. Calls tools, queries indexes, generates answers with your chosen LLM, grades with HHEM, and streams events back. The platform is also responsible for enterprise features like security, tracing, observability, and governance. |

The end user never sees Vectara. The application is the only thing they touch. See the application layer for the two ways to build and operate it.
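
To make "thin layer of code" concrete, here is a minimal sketch of an application-layer handler. Everything in it is an assumption for illustration: `PlatformCall`, `Transport`, `handle_user_message`, and the `"answer"` response key are hypothetical names, and the actual REST call is injected as a function so the sketch does not assume any real endpoint or client library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PlatformCall:
    """What the application sends to the platform (hypothetical shape)."""

    api_key: str
    agent_key: str
    user_id: str    # identity is owned by the application layer
    message: str


# A transport performs the real REST call; it is injected so this sketch
# stays independent of any actual endpoint or HTTP client.
Transport = Callable[[PlatformCall], dict]


def handle_user_message(
    transport: Transport, api_key: str, agent_key: str, user_id: str, text: str
) -> str:
    """Application-layer logic: validate input, delegate, shape the reply."""
    message = text.strip()
    if not message:
        # Business rule handled locally; the platform is never called.
        return "Please enter a question."
    response = transport(PlatformCall(api_key, agent_key, user_id, message))
    # "answer" is an assumed response key, used only for this illustration.
    return response.get("answer", "Sorry, no answer is available right now.")


def fake_transport(call: PlatformCall) -> dict:
    """Stand-in for the platform so the sketch runs without a network."""
    return {"answer": f"echo for {call.user_id}: {call.message}"}


print(handle_user_message(fake_transport, "api-key", "agent-key", "u1", "  hi "))
```

The handler contains only input checks, identity, and response shaping; retrieval, generation, and grading all sit behind the transport, which is why the application layer can stay small.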

The platform stack

Read top-down. Clients call the interfaces. Agents orchestrate tools and the LLM gateway. Retrieval queries the corpora that pipelines populate. The foundation enforces isolation and compliance. None of these layers are custom-built per customer.

| Layer | What it does |
| --- | --- |
| Interfaces | REST API for developers, Vectara Skills for coding agents like Claude Code, Admin Console for operators. |
| Agent runtime | Stepped state machines, sub-agent delegation, structured-output gating, cross-session approvals. |
| Tools | 35+ built-in tools (search, write, SQL, code, image), Python Lambdas, MCP clients, `web_get` with OAuth. |
| LLM gateway | Anthropic, OpenAI, Gemini, on-prem models, BYO LLM. Velocity prompts. Hallucination Corrector. |
| Retrieval engine | Hybrid BM25 + dense retrieval, Slingshot reranker (chain, MMR, UDF), metadata filters, citations. |
| Corpora & ingestion | Boomerang embeddings, chunking, pipelines and connectors. Knowledge, memory, and state in one primitive. |
| Foundation | Tenant isolation, IdP / SSO, RBAC by corpus, audit and traces, SOC 2 Type II, HIPAA, KMS-managed encryption. |

For a layer-by-layer walkthrough of what each component does, what is configurable, and how it connects to the rest, see the platform stack.
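
As a rough illustration of two ideas named in the retrieval engine row, hybrid lexical + dense scoring and MMR reranking, the sketch below uses the textbook formulations in plain Python. It is not Vectara's implementation: the function names, the `lam` and `diversity` parameters, and the assumption that lexical and dense scores are already normalized to comparable ranges are all illustrative.

```python
import math


def hybrid_score(lexical: float, dense: float, lam: float) -> float:
    """Linear interpolation: lam=0 uses only the dense score, lam=1 only lexical."""
    return lam * lexical + (1.0 - lam) * dense


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(candidates, k: int, diversity: float) -> list[str]:
    """Maximal marginal relevance over (doc_id, relevance, embedding) tuples.

    diversity=0 ranks purely by relevance; higher values penalize candidates
    that are similar to results already selected.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_value(candidate):
            max_sim = max(
                (cosine(candidate[2], chosen[2]) for chosen in selected),
                default=0.0,
            )
            return (1.0 - diversity) * candidate[1] - diversity * max_sim

        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _, _ in selected]


# Toy data: "a" and "b" are near-duplicates, "c" is less relevant but different.
docs = [
    ("a", 0.90, [1.0, 0.0]),
    ("b", 0.85, [1.0, 0.01]),
    ("c", 0.60, [0.0, 1.0]),
]
print(mmr(docs, k=2, diversity=0.0))  # relevance only
print(mmr(docs, k=2, diversity=0.5))  # near-duplicate "b" is pushed out by "c"
```

With `diversity=0.0` the two near-duplicates are returned together; with `diversity=0.5` the second slot goes to the dissimilar document instead, which is the trade-off an MMR stage exists to control.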
