Understanding Vectara
Vectara is an API-first Agentic Platform. The end-user application (the UI, the brand, the workflow) sits on top of the platform and calls Vectara over REST. Underneath, the platform handles the AI heavy lifting: document parsing and ingestion, agent orchestration, retrieval, generation, factual grading, governance, and observability.
Vectara is SOC 2 Type II and HIPAA certified.
There are two ways to get an application built and running on the same platform:
- Your team builds it with the API, Vectara Skills, and a coding agent.
- Vectara delivers it turnkey as Vectara Managed Agents.
See the application layer for both options.
Use this section to orient yourself. It explains what the parts of the
platform are, how they fit together, and the trade-offs that shape the
design. For the implementation reference (configuration fields, API
schemas, tuning playbooks), each topic links to its canonical guide in
/docs/agents/,
/docs/search-and-retrieval/,
/docs/pipelines/, and the
REST API reference. Read this section first
if you're evaluating Vectara or orienting a new engineer. Reach for the
canonical guides once you're building.
Vectara is available as a service on-premises, within your VPC, or via SaaS.
The platform comes with:
- Agent orchestration, exposed as a hosted Agent API that you can use to write and use your agents.
  - Writing an agent involves:
    - Specifying agent instructions as text. Vectara also provides default instructions for a Q&A agent, and the Vectara team can help you write the instructions for your agents.
    - Specifying which tools and skills the agent can use.
    - Specifying which ML models the agent can use.
  - Using an agent means invoking the session and interaction APIs.
- A list of tools that agents can use for various complex use cases. This includes tools like web search, image manipulation, agent artifacts, subagents, text2sql, and many more. Users can also add their own tools as Python code, through MCP, or as Agent Skills.
- A multimodal ingestion pipeline that can process complex documents containing text, images, tables, flowcharts, graphs, etc. Advanced parsing mechanisms ensure that document contents are extracted and indexed so they can be used correctly at query time.
- Managed indexing via connectors for various sources. Connectors let Vectara pull data from sources like S3, and users can also push data to Vectara via APIs.
- An index and search pipeline. This includes a vector DB, neural retrieval, lexical search, hybrid search, neural rerankers, non-neural rerankers, etc.
- A variety of ML models, including but not limited to: vector embedding, reranking, RAG answer generator, RAG hallucination detector, agent orchestrator, vision models. Vectara works with on-prem and VPC customers to bring models that suit their use case.
- A developer console for tenant configuration, tenant usage views, API debugging, agent building, agent testing, RAG, etc.
- An admin console showing all tenants in the deployment, their usage, governance, deployment-wide configuration, etc. This is mainly for on-prem and VPC deployments.
Vectara comes with various ML models (embedding model, reranker, RAG, hallucination detection), but also supports any non-Vectara models that are exposed via standard APIs.
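To make the three parts of an agent definition concrete (instructions, tools, model), here is a minimal Python sketch of the data an application might assemble before calling the Agent API. Everything in it is an assumption for illustration: the `AgentSpec` class, its field names, the tool names, the model placeholder, and the `build_interaction` helper are hypothetical and do not reflect Vectara's actual request schema. Consult the REST API reference for the real fields and endpoints.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AgentSpec:
    """Hypothetical shape of an agent definition (not Vectara's real schema)."""
    name: str
    instructions: str                                # agent instructions as plain text
    tools: List[str] = field(default_factory=list)   # tools and skills the agent may use
    model: str = "example-model"                     # placeholder model identifier


def build_interaction(session_id: str, user_message: str) -> dict:
    """Build a hypothetical request body for one turn inside a session."""
    return {"session_id": session_id, "message": user_message}


# Writing an agent: instructions + tools + model.
spec = AgentSpec(
    name="support-qa",
    instructions="Answer questions using only the indexed product manuals.",
    tools=["corpus_search", "web_search"],  # illustrative tool names
)
agent_payload = json.dumps(asdict(spec), indent=2)

# Using an agent: one interaction within an existing session.
turn_payload = build_interaction("session-123", "How do I reset my password?")

print(agent_payload)
print(json.dumps(turn_payload))
```

The sketch deliberately stops short of sending any HTTP request, so it runs without credentials. In a real application these payloads would be sent over REST to the session and interaction APIs.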
The fastest way to understand what a Vectara agent does is to run one yourself. Open the Agent Playground, paste an API key and an agent key, then watch session metadata, step transitions, tool calls, and structured outputs stream in real time. See the playground walkthrough for setup details.
The three layers
Every Vectara deployment has the same three layers. Knowing which layer owns what keeps integration decisions clear.
| Layer | Owner | Responsibility |
|---|---|---|
| End user | The user | Sees only your branded UI. Has no concept of an "agent" or whether Vectara is being used behind the scenes. |
| The application | You / Vectara buyer | Three responsibilities: (1) supplying documents and systems used by the application, (2) defining agent and other system configurations, and (3) writing a thin layer of code (UI, business logic, identity) that calls Vectara over REST. The platform does the AI heavy lifting underneath, so this layer stays small. Except for the first responsibility, your engineering team can deliver the rest with Vectara APIs, Vectara Skills, and a coding agent, or have it delivered turnkey by Vectara Managed Agents. You own it either way. |
| Vectara platform | Vectara | Indexes your data and runs your declared agents over sessions: calls tools, queries indexes, generates answers with your chosen LLM, grades them with HHEM, and streams events back. The platform is also responsible for enterprise features like security, tracing, observability, and governance. |
The end user never sees Vectara. The application is the only thing they touch. See the application layer for the two ways to build and operate it.
The platform stack
Read top-down. Clients call the interfaces. Agents orchestrate tools and the LLM gateway. Retrieval queries the corpora that pipelines populate. The foundation enforces isolation and compliance. None of these layers are custom-built per customer.
| Layer | What it does |
|---|---|
| Interfaces | REST API for developers, Vectara Skills for coding agents like Claude Code, Admin Console for operators. |
| Agent runtime | Stepped state machines, sub-agent delegation, structured-output gating, cross-session approvals. |
| Tools | 35+ built-in tools (search, write, SQL, code, image), Python Lambdas, MCP clients, web_get with OAuth. |
| LLM gateway | Anthropic, OpenAI, Gemini, on-prem models, BYO LLM. Velocity prompts. Hallucination Corrector. |
| Retrieval engine | Hybrid BM25 + dense retrieval, Slingshot reranker (chain, MMR, UDF), metadata filters, citations. |
| Corpora & ingestion | Boomerang embeddings, SmartChunk, pipelines and connectors. Knowledge, memory, and state in one primitive. |
| Foundation | Tenant isolation, IdP / SSO, RBAC by corpus, audit and traces, SOC 2 Type II, HIPAA, KMS-managed encryption. |
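The "stepped state machines" in the agent runtime row can be pictured with a toy example. The following Python sketch is only an illustration of the general idea (named steps, conditional routing, and a step that halts until an approval is recorded). The step names, the `state` dictionary, and the `run` function are all hypothetical and are not Vectara's runtime or configuration format.

```python
from typing import Callable, Dict, List, Optional

# Each step receives the shared state and returns the next step name,
# or None to stop (either finished or waiting for outside input).
Step = Callable[[dict], Optional[str]]


def classify(state: dict) -> Optional[str]:
    # Conditional routing: refund requests go through an approval step.
    return "approve" if "refund" in state["message"].lower() else "answer"


def approve(state: dict) -> Optional[str]:
    # Stand-in for an approval gate: halt until "approved" is set in the state.
    return "answer" if state.get("approved", False) else None


def answer(state: dict) -> Optional[str]:
    state["reply"] = f"Handled: {state['message']}"
    return None


STEPS: Dict[str, Step] = {"classify": classify, "approve": approve, "answer": answer}


def run(state: dict, start: str = "classify", max_steps: int = 10) -> List[str]:
    """Run steps until one returns None; return the names of the visited steps."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None and len(trace) < max_steps:
        trace.append(current)
        current = STEPS[current](state)
    return trace


print(run({"message": "Where is my order?"}))
print(run({"message": "I want a refund"}))
print(run({"message": "I want a refund", "approved": True}))
```

The `max_steps` guard is a design choice for the sketch: it keeps a misrouted loop from running forever.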
For a layer-by-layer walkthrough of what each component does, what is configurable, and how it connects to the rest, see the platform stack.
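As a rough illustration of two ideas named in the retrieval engine row, hybrid lexical + dense scoring and MMR reranking, here is a minimal Python sketch. It shows the textbook techniques only, under stated assumptions: scores are taken to be already normalized to a comparable range, the interpolation weight `lam` and the `diversity` parameter are hypothetical names, and none of this is Vectara's implementation or API.

```python
import math
from typing import Dict, List, Sequence


def hybrid_score(lexical: float, dense: float, lam: float) -> float:
    """Linear interpolation: lam = 0 is purely dense, lam = 1 is purely lexical."""
    return lam * lexical + (1.0 - lam) * dense


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(relevance: Dict[str, float], vectors: Dict[str, Sequence[float]],
        k: int, diversity: float = 0.3) -> List[str]:
    """Maximal Marginal Relevance: greedily trade relevance against redundancy."""
    selected: List[str] = []
    remaining = list(relevance)
    while remaining and len(selected) < k:
        def value(doc: str) -> float:
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((cosine(vectors[doc], vectors[s]) for s in selected),
                             default=0.0)
            return (1.0 - diversity) * relevance[doc] - diversity * redundancy
        best = max(remaining, key=value)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "a" and "b" are near-duplicates, "c" covers a different topic.
lexical = {"a": 0.8, "b": 0.9, "c": 0.2}
dense = {"a": 0.9, "b": 0.85, "c": 0.6}
vectors = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}

relevance = {d: hybrid_score(lexical[d], dense[d], lam=0.1) for d in lexical}
print(mmr(relevance, vectors, k=2, diversity=0.0))  # pure relevance ordering
print(mmr(relevance, vectors, k=2, diversity=0.5))  # penalizes the duplicate
```

With `diversity=0.0` the two near-duplicates win on relevance alone. With `diversity=0.5` the second slot goes to the document that adds new information.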
Read next
Concepts (what the platform is and how it runs):
- The application layer. Custom-built application vs. Vectara Managed Agents. Who builds and operates it.
- The platform stack. Each layer of the platform, what it does, and what you control.
- Agent anatomy. The parts of a Vectara agent and how they snap together.
- Request lifecycle. What happens between one user message and one streamed answer.
What an agent can do (the four capabilities):
- Knowledge. What the agent knows. RAG pipeline (SmartChunk, Boomerang, hybrid search, Slingshot, citations).
- Context & memory. What the agent remembers. Per-user memory and tool-result scratchpad on the same corpus primitive.
- Workflows. What the agent does. Stepped state machines, conditional routing, sub-agents, cross-session approvals.
- Tools & connectors. How the agent reaches out. Built-in tools, Python Lambdas, MCP, web_get, Slack connectors. Minutes to live, not platform releases.
Where Vectara sits in the market:
- Vectara vs other solutions. Side-by-side comparison against product-first vendors and specialized tooling. Why the platform-first shape compounds.
Ready to build? In Getting Started you will find:
- Build with coding agents. Scaffold connector UIs, dashboards, and stepped agents in 30 minutes with Claude Code, Cursor, or Codex.
- Try the playground. Drive any Vectara agent live and watch its events stream in.
- Agents quickstart. Create your first agent in the Console in a few minutes.