Version: 2.0

Vectara Python SDK Quick Start

Get up and running with the Vectara Python SDK in minutes. This quick start guides you through installing the SDK, authenticating with an API key, creating a corpus, uploading a document, and running a semantic query.

Each step builds toward a functional setup for indexing and querying content, enabling you to leverage Vectara's Retrieval Augmented Generation (RAG) capabilities for applications like enterprise search, chatbots, or knowledge bases.

Prerequisites

  1. Install Python 3.7 or later.
  2. Get an API key from the Vectara Console.

API keys come in several types with different permission levels: Personal API keys (most administrative functions), QueryService keys (read-only), and IndexService keys (read and write). For more information, see Authentication Methods and Authorization Levels.


1. Install the SDK

Install the Vectara Python SDK using pip to access its core functionality for interacting with Vectara's API.

This step prepares your Python environment (Python 3.7 or later) for corpus management, document indexing, and querying.

  • Command: pip install vectara
  • Purpose: Downloads and installs the SDK and its dependencies from PyPI.
  • Run in a virtual environment to avoid dependency conflicts.

Ensure pip is up-to-date (pip install --upgrade pip). After installation, import the vectara module to begin using the SDK.
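To confirm the installation from Python, you can read the installed version from package metadata. This minimal sketch uses only the standard library's importlib.metadata (available in Python 3.8 and later); the helper name installed_version is illustrative and not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "vectara") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("vectara is not installed; run: pip install vectara")
    else:
        print(f"vectara {found} is installed")
```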


2. Authenticate

Authenticate with the Vectara API using an API key to securely access your account's resources.

The Vectara client initializes a connection to Vectara's services, enabling subsequent operations like corpus creation and querying.

  • api_key (string, required): A unique API key from the Vectara Console, used for authentication.
    Example: zwt_abc123....
    Keep this secret to prevent unauthorized access.
  • Purpose: Establishes a secure session with Vectara's API, required for all SDK operations.
  • Constraints: Obtain the API key from the Vectara Console under your account settings.
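    Ensure the key has the permissions your operations need: a QueryService key can only query, an IndexService key can also index documents, and creating a corpus requires administrative permissions, such as those of a Personal API key.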

Store API keys in environment variables, a .env file kept out of version control, or a secrets vault rather than hardcoding them in production code.
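A minimal sketch of API key authentication, assuming the SDK's Vectara client class takes an api_key argument. The key is read from the VECTARA_API_KEY environment variable, the name used later in this quickstart; get_api_key and make_client are hypothetical helper names.

```python
import os


def get_api_key(env_var: str = "VECTARA_API_KEY") -> str:
    """Read the API key from the environment, failing with a clear message if unset."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} before creating the client")
    return api_key


def make_client():
    """Create an authenticated Vectara client from the API key in the environment."""
    # Deferred import so get_api_key can be used without the SDK installed.
    from vectara import Vectara

    return Vectara(api_key=get_api_key())
```

With the variable set, `client = make_client()` returns a client you can reuse for the remaining steps.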

Alternatively, authenticate using OAuth 2.0.

Obtain your OAuth credentials from the Vectara Console. This method is suitable for applications requiring token-based authentication.
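The OAuth variant follows the same pattern. This sketch assumes the Vectara constructor accepts client_id and client_secret and obtains access tokens from them on your behalf; the environment variable names VECTARA_CLIENT_ID and VECTARA_CLIENT_SECRET and both helper names are hypothetical.

```python
import os
from typing import Tuple


def get_oauth_credentials(
    id_var: str = "VECTARA_CLIENT_ID",
    secret_var: str = "VECTARA_CLIENT_SECRET",
) -> Tuple[str, str]:
    """Read OAuth client credentials from the environment, naming any that are missing."""
    values = {name: os.environ.get(name, "").strip() for name in (id_var, secret_var)}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values[id_var], values[secret_var]


def make_oauth_client():
    """Create a Vectara client that authenticates with OAuth 2.0 client credentials."""
    # Deferred import so the credential helper works without the SDK installed.
    from vectara import Vectara

    client_id, client_secret = get_oauth_credentials()
    # Assumption: the SDK requests and refreshes access tokens from these credentials.
    return Vectara(client_id=client_id, client_secret=client_secret)
```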


3. Create a Corpus

A corpus is a collection of documents that you can search and query. Think of it as a database for your text content. Each corpus can have its own configuration, metadata schema, and access controls.

Creating a corpus requires a unique key and a descriptive name. The corpus key identifies the corpus in all later operations, so choose something memorable and descriptive.

The corpora.create method (HTTP POST /v2/corpora) creates a new corpus with a unique key, forming the foundation for storing and querying content. For more details on request and response parameters, see the Create Corpus REST API.

  • key (string, required): A unique identifier for the corpus (my-docs).
    May contain only alphanumeric characters, underscores, and hyphens; maximum length: 100 characters.
  • name (string, required): A human-readable name (My Documentation).
    Maximum length: 255 characters.
    Helps identify the corpus in the Vectara Console.
  • description (string, optional): A brief description of the corpus's purpose (Demo corpus for quickstart).
    Maximum length: 1000 characters.
  • request_timeout (integer, optional): Timeout in seconds for the API request (10).
  • request_timeout_millis (integer, optional): Timeout in milliseconds, overriding request_timeout if set (10000).
  • Purpose: Initializes a corpus for storing documents, enabling search, chat, or RAG workflows.
  • Returns: A corpus object with key, name, and other metadata, confirming creation.
  • Constraints: The key must be unique within your account.
    Invalid characters or duplicates result in an API error (HTTP 400).

Use descriptive names for team collaboration. After creation, assign user or API key permissions to control access.
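The following sketch checks the documented key, name, and description limits locally before calling client.corpora.create, so obvious mistakes fail fast instead of returning an HTTP 400. It assumes "alphanumeric" means ASCII letters and digits; validate_corpus_fields and create_corpus are hypothetical helpers, and client is an authenticated Vectara client.

```python
import re

# Documented key rule: 1-100 characters of letters, digits, "_" or "-" (ASCII assumed).
_CORPUS_KEY_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")


def validate_corpus_fields(key: str, name: str, description: str = "") -> None:
    """Raise ValueError if the fields break the documented character or length limits."""
    if not _CORPUS_KEY_RE.fullmatch(key):
        raise ValueError(f"Invalid corpus key: {key!r}")
    if not name or len(name) > 255:
        raise ValueError("name is required and must be at most 255 characters")
    if len(description) > 1000:
        raise ValueError("description must be at most 1000 characters")


def create_corpus(
    client,
    key: str = "my-docs",
    name: str = "My Documentation",
    description: str = "Demo corpus for quickstart",
):
    """Validate locally, then create the corpus and return the SDK's corpus object."""
    validate_corpus_fields(key, name, description)
    return client.corpora.create(key=key, name=name, description=description)
```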


4. Upload a Document

Upload a document to your corpus to make its content searchable. Vectara supports two document types: structured and core. The documents.create method (HTTP POST /v2/corpora/{corpus_key}/documents) indexes a document into the specified corpus; a structured document consists of sections with titles and text.

This step populates your corpus with content for querying. For more details on request and response parameters, see the Index APIs.
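A sketch of indexing the structured welcome document, assuming StructuredDocument and StructuredDocumentSection can be imported from the vectara package and accept the fields described in this step. The hypothetical helper section_kwargs builds and checks the section fields as plain dictionaries, so that part can be tested without network access.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def section_kwargs(
    sections: Sequence[Tuple[Optional[str], str]],
    metadata: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Turn (title, text) pairs into keyword arguments for StructuredDocumentSection."""
    result = []
    for title, text in sections:
        if not text:
            raise ValueError("Each section needs non-empty text")
        if title is not None and len(title) > 255:
            raise ValueError("Section titles are limited to 255 characters")
        kwargs: Dict[str, Any] = {"text": text}
        if title is not None:
            kwargs["title"] = title
        if metadata:
            kwargs["metadata"] = dict(metadata)  # copy so sections never share one dict
        result.append(kwargs)
    return result


def upload_structured(client, corpus_key: str = "my-docs"):
    """Index a one-section structured document into the given corpus."""
    from vectara import StructuredDocument, StructuredDocumentSection

    sections = [
        StructuredDocumentSection(**kwargs)
        for kwargs in section_kwargs(
            [("Welcome", "Welcome to Vectara! This is your first document.")],
            metadata={"category": "intro"},
        )
    ]
    document = StructuredDocument(
        id="welcome-doc",
        type="structured",
        sections=sections,
        metadata={"source": "quickstart"},  # document-level metadata
    )
    return client.documents.create(corpus_key=corpus_key, request=document)
```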

The documents.create call takes the following parameters for either document type:

  • corpus_key (string, required): The target corpus identifier (my-docs), matching the key from step 3.
  • request (StructuredDocument or CoreDocument, required): Defines the document structure.
    • id (string, required): A unique document ID within the corpus (welcome-doc). May contain only alphanumeric characters, underscores, and hyphens; maximum length: 100 characters.
    • type (string, required): Set to "structured" for section-based documents or "core" for part-based documents.
    • For structured: sections (list[StructuredDocumentSection], required): List of document sections.
      • title (string, optional): Section title (Welcome).
        Maximum length: 255 characters.
      • text (string, required): Section content (Welcome to Vectara! This is your first document.).
        Maximum length: varies by account limits.
      • metadata (dict, optional): Key-value pairs for filtering ({"category": "intro"}).
    • For core: document_parts (list[CoreDocumentPart], required): List of document parts.
      • text (string, required): Part content.
      • metadata (dict, optional): Key-value pairs.
    • metadata (dict, optional): Document-level metadata ({"source": "quickstart"}).
  • Purpose: Indexes a document for semantic search, enabling queries to retrieve relevant content.
  • Returns: A response confirming the document was indexed, including its id.
  • Constraints: The id must be unique within the corpus. Exceeding size limits or invalid characters results in an API error (HTTP 400).

Structured documents support section-based organization, which suits manuals or reports. Core documents are simpler, consisting of sequential parts. Use metadata to filter query results (for example, by category); a metadata field can be used in a filter only if it is defined as a filter attribute on the corpus. For larger datasets, consider uploading files such as PDFs.
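For a core document, one approach is to split plain text into paragraphs and send each paragraph as a part. This sketch assumes CoreDocument and CoreDocumentPart are importable from vectara with the fields listed above; the blank-line splitting rule, the helper names, and the default ID welcome-core are illustrative choices.

```python
import re
from typing import Any, Dict, List, Optional


def paragraphs_to_parts(text: str, metadata: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Split text on blank lines into keyword arguments for CoreDocumentPart."""
    parts = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if paragraph:  # skip empty chunks left by trailing or repeated blank lines
            part: Dict[str, Any] = {"text": paragraph}
            if metadata:
                part["metadata"] = dict(metadata)
            parts.append(part)
    return parts


def upload_core(client, text: str, doc_id: str = "welcome-core", corpus_key: str = "my-docs"):
    """Index `text` as a core document whose parts are its paragraphs."""
    from vectara import CoreDocument, CoreDocumentPart

    document = CoreDocument(
        id=doc_id,
        type="core",
        document_parts=[CoreDocumentPart(**kw) for kw in paragraphs_to_parts(text)],
        metadata={"source": "quickstart"},
    )
    return client.documents.create(corpus_key=corpus_key, request=document)
```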


5. Run a Query

Run a semantic query against your corpus to retrieve relevant content using natural language.

The client.query method (HTTP POST /v2/query) searches the corpus and returns results ordered by relevance. For more details on request and response parameters, see the Query REST API.

  • query (string, required): The natural language query ("What is Vectara?").
    Maximum length: 1000 characters.
  • search (SearchCorporaParameters, required): Configures the search parameters.
    • corpora (list[dict], required): List of corpora to query.
      • corpus_key (string, required): The corpus to search (my-docs).
      • metadata_filter (string, optional): Filters results by metadata (doc.category = 'intro').
        Default: empty string.
      • lexical_interpolation (float, optional): Balances lexical and semantic search (0.0 to 1.0, default 0.005).
    • context_configuration (ContextConfiguration, optional): Configures result context (sentences_before=2, sentences_after=2).
    • reranker (dict, optional): Applies a reranker for refined ordering
      ({"type": "customer_reranker", "reranker_name": "Rerank_Multilingual_v1"}).
  • generation (GenerationParameters, optional): Configures response generation (response_language="eng").
  • Purpose: Retrieves relevant documents or generates answers based on the corpus content, using semantic understanding.
  • Returns: A response object with:
    • results (list): Search results, each with text, score (float, 0.0 to 1.0), and metadata.
    • answer (string, optional): Generated answer if generation is configured.
  • Constraints: The corpus_key must exist and contain indexed documents. Queries exceeding length limits result in an API error (HTTP 400).
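A sketch of the query described above. It assumes the SDK exposes KeyedSearchCorpus as the type for entries in corpora (an assumption; it does not appear in the list above) and that the response lists hits in a search_results field (the list above calls it results; check the Query REST API reference for the exact name). The reranker and generation settings are omitted to keep it short, and the helper names are hypothetical.

```python
from typing import Any, Iterable, List


def format_results(results: Iterable[Any], limit: int = 3, width: int = 80) -> List[str]:
    """Render up to `limit` results as 'score | text' lines, truncating long text."""
    lines = []
    for result in list(results)[:limit]:
        text = result.text or ""
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{(result.score or 0.0):.3f} | {text}")
    return lines


def run_query(client, question: str = "What is Vectara?", corpus_key: str = "my-docs"):
    """Run a semantic query against one corpus and print the top results."""
    # Deferred import so format_results works without the SDK installed.
    from vectara import ContextConfiguration, KeyedSearchCorpus, SearchCorporaParameters

    response = client.query(
        query=question,
        search=SearchCorporaParameters(
            corpora=[
                KeyedSearchCorpus(
                    corpus_key=corpus_key,
                    metadata_filter="",  # empty string means no filter
                    lexical_interpolation=0.005,
                )
            ],
            context_configuration=ContextConfiguration(sentences_before=2, sentences_after=2),
        ),
    )
    # Assumed response field name (the parameter list above calls it results).
    for line in format_results(response.search_results or []):
        print(line)
    return response
```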

Quick start validation script

This full validation script demonstrates all the quickstart concepts working together in a single, executable file. The script handles common edge cases and provides clear feedback at each step.

  • Environment variables: The script reads the API key from the VECTARA_API_KEY environment variable rather than hardcoding credentials.
  • Error handling: Uses ApiError to catch specific error conditions, such as the "already exists" errors that commonly occur when the script is rerun during testing.
  • Corpus propagation: Waits briefly (time.sleep(2)) after creating the corpus to give it time to become available before the document upload.
  • Consistent corpus key: Uses the my-docs corpus key used throughout this quickstart.
  • Single-corpus search: Uses the simplified client.corpora.search() method to query one corpus directly.
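A sketch of validate_quickstart.py that follows the behavior described above. It assumes ApiError is importable from vectara.core.api_error with status_code and body attributes, that an "already exists" conflict is reported as HTTP 409 or with "already exists" in the error body, and that client.corpora.search returns a response with a search_results list. The helpers is_conflict and preview are illustrative, and the console output may differ in detail from the expected output shown below.

```python
"""validate_quickstart.py: end-to-end check of the quickstart steps (sketch)."""
import os
import time

CORPUS_KEY = "my-docs"
DOC_ID = "welcome-doc"
DOC_TEXT = "Welcome to Vectara! This is your first document."


def is_conflict(error) -> bool:
    """True if an API error looks like an 'already exists' response."""
    status = getattr(error, "status_code", None)
    body = str(getattr(error, "body", "") or "")
    return status == 409 or "already exists" in body.lower()


def preview(text: str, width: int = 50) -> str:
    """Shorten text for one-line console output."""
    return text if len(text) <= width else text[: width - 3] + "..."


def main() -> None:
    api_key = os.environ.get("VECTARA_API_KEY")
    if not api_key:
        print("Set VECTARA_API_KEY first, e.g. export VECTARA_API_KEY=your_key_here")
        return

    # SDK imports live here so the helpers above work without the package installed.
    from vectara import StructuredDocument, StructuredDocumentSection, Vectara
    from vectara.core.api_error import ApiError

    print("1. Authenticating...")
    client = Vectara(api_key=api_key)
    try:
        client.corpora.create(
            key=CORPUS_KEY, name="My Documentation", description="Demo corpus for quickstart"
        )
        time.sleep(2)  # give the new corpus a moment to become available
    except ApiError as err:
        if not is_conflict(err):
            raise
        # The corpus is left over from a previous run; reuse it.

    print("2. Uploading document...")
    document = StructuredDocument(
        id=DOC_ID,
        type="structured",
        sections=[StructuredDocumentSection(title="Welcome", text=DOC_TEXT)],
        metadata={"source": "quickstart"},
    )
    try:
        client.documents.create(corpus_key=CORPUS_KEY, request=document)
        print(f"✅ Uploaded: {DOC_ID}")
    except ApiError as err:
        if not is_conflict(err):
            raise
        print(f"{DOC_ID} already exists; continuing")

    print("3. Running query...")
    response = client.corpora.search(corpus_key=CORPUS_KEY, query="What is Vectara?")
    results = response.search_results or []
    print(f"✅ Found {len(results)} results")
    if results:
        print(f"Top result: {preview(results[0].text or '')}")

    print("🎉 Quickstart validation complete!")


if __name__ == "__main__":
    main()
```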

Run the validation script

  1. Set your API key: export VECTARA_API_KEY=your_key_here.
  2. Save the script as validate_quickstart.py.
  3. Run python3 validate_quickstart.py.

The script outputs step-by-step progress and handles cases where resources already exist from previous runs.

Expected Output

1. Authenticating...
2. Uploading document...
✅ Uploaded: welcome-doc
3. Running query...
✅ Found 1 results
Top result: Welcome to Vectara! This is your first document...
🎉 Quickstart validation complete!

This quickstart uses a simple query for demonstration. Enhance queries with rerankers or metadata filters for precision (see Types of Rerankers). For real-time applications, consider streaming queries (query_stream).

Your SDK and corpus are live, with a document indexed and a query executed. Next, explore advanced features like rerankers, chat sessions, or generation presets.


Cleanup: Delete the Corpus

To clean up, or to retry the quickstart from scratch, delete the corpus.

Deleting calls the delete corpus API, which removes the corpus and all its documents. This operation is irreversible.
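Because deletion is irreversible, this sketch asks you to retype the corpus key before calling client.corpora.delete. The confirmation step and the helper names are illustrative additions rather than part of the SDK; client is an authenticated Vectara client.

```python
def confirm_deletion(corpus_key: str, typed: str) -> bool:
    """Return True only if the user retyped the exact (non-empty) corpus key."""
    return bool(corpus_key) and typed.strip() == corpus_key


def delete_corpus(client, corpus_key: str = "my-docs") -> bool:
    """Delete the corpus and all its documents after an interactive confirmation."""
    typed = input(f"Type '{corpus_key}' to permanently delete it: ")
    if not confirm_deletion(corpus_key, typed):
        print("Aborted; nothing was deleted.")
        return False
    client.corpora.delete(corpus_key=corpus_key)
    print(f"Deleted corpus {corpus_key}")
    return True
```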