Version: 2.0

Queries

This guide covers querying corpora with the Vectara Python SDK, for both search and Retrieval Augmented Generation (RAG). The query methods let you search corpora for relevant documents and generate summarized responses using Vectara's RAG-focused LLMs, which supports enterprise use cases such as legal research or customer insights.

Prerequisites

This guide assumes you have a corpus called my-docs with indexed documents and filter attributes defined. If you haven't created a corpus yet, follow the Quick Start guide to set up your first corpus and add some documents.

INSTALL VECTARA SDK

pip install vectara

Setup Requirements:

Install the SDK with pip install vectara.
Get an API key from the Vectara Console.
Create a corpus with client.corpora.create().

Initialize the Vectara Client

INITIALIZE VECTARA CLIENT

Code example with python syntax.

Set up authentication to securely access querying methods using an API key. Ensure your API key has querying permissions for the target corpora.
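A minimal sketch of keeping the API key out of source code, assuming it is stored in an environment variable named VECTARA_API_KEY (a name chosen here for illustration). The helper load_api_key is hypothetical and not part of the SDK; the commented lines show how its result would typically be handed to the client constructor.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var: str = "VECTARA_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing."""
    source = os.environ if env is None else env
    key = source.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set {var} to a Vectara API key with query permissions")
    return key


# Typical use (requires `pip install vectara` and a real key):
#   from vectara import Vectara
#   client = Vectara(api_key=load_api_key())
```

Failing at startup with a clear message tends to be easier to debug than an authentication error on the first query.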

Simple query with generation

SIMPLE QUERY WITH GENERATION (RAG)

Code example with python syntax.

Perform a query with Retrieval Augmented Generation (RAG) to get both search results and an AI-generated summary. This is the most common pattern for getting comprehensive answers from your corpus.

The client.query method corresponds to the HTTP POST /v2/query endpoint. For more details on request and response parameters, see the Query REST API.

Key Parameters:

generation_preset_name: Selects the generation preset. vectara-summary-ext-24-05-med-omni provides high-quality, comprehensive responses using GPT-4o.
See Generation Presets for a list of currently supported prompts.
max_used_search_results: Setting this to 50 gives the LLM substantial context for generation.
enable_factual_consistency_score: Returns a confidence score for the generated summary.
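Because client.query corresponds to POST /v2/query, the parameters above can be pictured as a JSON request body. The sketch below builds such a body as a plain dictionary. The helper build_rag_query is hypothetical, the sample question is made up, and the field nesting (search.corpora, search.limit, generation) reflects an assumed reading of the REST API that should be checked against the Query REST API reference.

```python
import json


def build_rag_query(query: str,
                    corpus_key: str = "my-docs",
                    preset: str = "vectara-summary-ext-24-05-med-omni",
                    max_used_search_results: int = 50) -> dict:
    """Assemble a request body for a simple RAG query (assumed field layout)."""
    return {
        "query": query,
        "search": {
            "corpora": [{"corpus_key": corpus_key}],
            # Assumption: retrieve at least as many results as the
            # generator is allowed to use.
            "limit": max_used_search_results,
        },
        "generation": {
            "generation_preset_name": preset,
            "max_used_search_results": max_used_search_results,
            "enable_factual_consistency_score": True,
        },
    }


body = build_rag_query("What is our parental leave policy?")
print(json.dumps(body, indent=2))
```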

Returns:

summary: AI-generated summary based on search results
factual_consistency_score: Reliability score (0.0-1.0) for the summary
search_results: List of relevant documents with scores

Use this pattern when you need both specific document excerpts and a synthesized answer.
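To illustrate how the three returned fields might be consumed together, here is a small sketch that formats a response for display. It works on the dictionary form of a response (for example, parsed JSON); the SDK itself may return typed objects instead. The helper format_response is hypothetical, the sample data is made up, and the per-result keys text and score are assumptions.

```python
def format_response(response: dict, top_n: int = 3) -> str:
    """Render the summary, its consistency score, and the top search hits."""
    lines = [f"Summary: {response.get('summary', '')}"]
    score = response.get("factual_consistency_score")
    if score is not None:
        lines.append(f"Factual consistency: {score:.2f}")
    hits = response.get("search_results", [])[:top_n]
    for rank, hit in enumerate(hits, start=1):
        # Assumed keys on each search result: "score" and "text".
        lines.append(f"{rank}. [{hit.get('score', 0.0):.3f}] {hit.get('text', '')}")
    return "\n".join(lines)


# Made-up response used only to exercise the formatter.
sample = {
    "summary": "Employees get 12 weeks.",
    "factual_consistency_score": 0.87,
    "search_results": [{"text": "Leave is 12 weeks.", "score": 0.912}],
}
print(format_response(sample))
```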

Advanced query with filtering and reranking

ADVANCED QUERY WITH FILTERING AND RERANKING

Code example with python syntax.

Execute sophisticated queries with metadata filtering, reranking, and custom generation prompts for specialized use cases.

Advanced Features:

Metadata Filtering: Use doc.field = 'value' syntax to filter by document properties (requires corpus filter attributes)
Lexical Interpolation: 0.3 balances keyword matching (30%) with semantic search (70%)
Context Configuration: Adds surrounding sentences for better understanding
Reranking: Improves result relevance using specialized models
Custom Prompts: Tailor AI responses for specific domains or formats

Important: Metadata filtering requires that your corpus has filter attributes defined for the fields you want to filter on. See the Corpus guide for creating filter attributes.
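The advanced features listed above can be sketched as a single request body. Everything specific here is an assumption for illustration: the helper build_advanced_query is hypothetical, the department filter attribute is invented, the reranker name Rerank_Multilingual_v1 and the prompt_template field reflect an assumed reading of the REST API, and the preset and result count are example values.

```python
from typing import Optional


def build_advanced_query(query: str,
                         corpus_key: str = "my-docs",
                         metadata_filter: str = "doc.department = 'legal'",
                         lexical_interpolation: float = 0.3,
                         sentences_before: int = 2,
                         sentences_after: int = 2,
                         reranker_name: str = "Rerank_Multilingual_v1",
                         prompt_template: Optional[str] = None) -> dict:
    """Assemble a request body with filtering, hybrid search, and reranking."""
    if not 0.0 <= lexical_interpolation <= 1.0:
        raise ValueError("lexical_interpolation must be between 0.0 and 1.0")
    search = {
        "corpora": [{
            "corpus_key": corpus_key,
            # Requires a matching filter attribute on the corpus.
            "metadata_filter": metadata_filter,
            # 0.3 weights keyword matching at 30% and semantic search at 70%.
            "lexical_interpolation": lexical_interpolation,
        }],
        "context_configuration": {
            "sentences_before": sentences_before,
            "sentences_after": sentences_after,
        },
        "reranker": {"type": "customer_reranker", "reranker_name": reranker_name},
    }
    generation = {
        "generation_preset_name": "vectara-summary-ext-24-05-med-omni",
        "max_used_search_results": 20,
    }
    if prompt_template:
        generation["prompt_template"] = prompt_template  # assumed field name
    return {"query": query, "search": search, "generation": generation}


body = build_advanced_query("Which contracts mention indemnification?")
print(body["search"]["corpora"][0]["metadata_filter"])
```

Validating lexical_interpolation locally is a design choice of this sketch; it surfaces an out-of-range value before any request is sent.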

Streaming query

STREAMING QUERY FOR REAL-TIME RESPONSES

Code example with python syntax.

Stream query responses in real time for a better user experience in interactive applications such as chatbots or live search interfaces.

The client.query_stream method corresponds to the HTTP POST /v2/query_stream endpoint.

Streaming Benefits:

Immediate feedback to users as content generates
Better perceived performance for long responses
Ability to stop generation early if needed

Use Cases:

Interactive chat interfaces
Live search suggestions
Long-form content generation where users want to see progress
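A rough sketch of consuming a stream follows. It assumes each streamed event can be viewed as a dictionary with a type field, and that the event types generation_chunk, error, and end exist with the field names used here; those shapes are assumptions to verify against the streaming reference. The helper iter_summary_chunks is hypothetical, and the events are made up rather than taken from a live stream.

```python
from typing import Iterable, Iterator


def iter_summary_chunks(events: Iterable[dict]) -> Iterator[str]:
    """Yield generated text pieces from a stream of event dictionaries."""
    for event in events:
        kind = event.get("type")
        if kind == "generation_chunk":
            yield event.get("generation_chunk", "")
        elif kind == "error":
            raise RuntimeError(f"Stream reported an error: {event.get('messages')}")
        elif kind == "end":
            return
        # Other event types (for example search results) are skipped here.


# Made-up events standing in for a live stream.
fake_events = [
    {"type": "search_results", "search_results": []},
    {"type": "generation_chunk", "generation_chunk": "Leave is "},
    {"type": "generation_chunk", "generation_chunk": "12 weeks."},
    {"type": "end"},
]

for piece in iter_summary_chunks(fake_events):
    print(piece, end="", flush=True)  # show text as soon as it arrives
print()
```

Because the helper is a generator, a caller can stop early simply by breaking out of the loop, which matches the "stop generation early" benefit listed above.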

Error handling and best practices

Common Error Scenarios:

ERROR HANDLING PATTERNS

Code example with python syntax.

Best Practices:

Always wrap production queries in try/except blocks
Monitor factual consistency scores for quality control
Start with simple queries before adding advanced features
Use an appropriate max_used_search_results value (50 for comprehensive answers, 10-20 for fast responses)
Ensure the corpus has filter attributes before using metadata filters
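The first two practices can be sketched with two small helpers. Both are hypothetical and SDK-agnostic: query_with_retry takes any zero-argument callable that performs the query, and is_trustworthy checks the score on a dictionary-form response. The retry count, delay, and the 0.5 threshold are illustrative assumptions, not recommended values.

```python
import time
from typing import Callable


def query_with_retry(run_query: Callable[[], dict],
                     retries: int = 3,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call run_query, retrying with exponential backoff on failure."""
    last_error = None
    for attempt in range(retries):
        try:
            return run_query()
        except Exception as exc:
            # In real code, catch the SDK's specific error type and avoid
            # retrying non-transient failures such as bad credentials.
            last_error = exc
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Query failed after {retries} attempts") from last_error


def is_trustworthy(response: dict, min_score: float = 0.5) -> bool:
    """Return True when a factual consistency score exists and meets the threshold."""
    score = response.get("factual_consistency_score")
    return score is not None and score >= min_score
```

A caller might log or discard summaries for which is_trustworthy returns False and fall back to showing the raw search results.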

Performance Tips:

Cache frequently used search configurations
Use streaming for long responses
Consider pagination for very large result sets
Monitor query latency and adjust parameters accordingly
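As one way to approach pagination, the sketch below generates one request body per page. It assumes the search object accepts offset and limit fields, which should be confirmed against the Query REST API reference; the helper paged_search_bodies and its default sizes are hypothetical.

```python
from typing import Iterator


def paged_search_bodies(query: str,
                        corpus_key: str = "my-docs",
                        page_size: int = 20,
                        max_results: int = 100) -> Iterator[dict]:
    """Yield request bodies that walk through results one page at a time."""
    for offset in range(0, max_results, page_size):
        yield {
            "query": query,
            "search": {
                "corpora": [{"corpus_key": corpus_key}],
                "offset": offset,  # assumed field: number of results to skip
                # The last page may be shorter than page_size.
                "limit": min(page_size, max_results - offset),
            },
        }


for body in paged_search_bodies("indemnification", page_size=20, max_results=50):
    print(body["search"]["offset"], body["search"]["limit"])
```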

Metadata Filtering Requirements:

Filter attributes must be defined when creating the corpus
Metadata field names must exactly match filter attribute names
Use the doc. prefix for document-level filters and the part. prefix for part-level filters
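The prefix rule can be sketched with a small expression builder. The helpers eq_filter and all_of are hypothetical, the field names in the usage lines are invented, and combining clauses with AND is an assumption about the filter syntax. Values containing single quotes are rejected rather than escaped, because the escaping rule is not stated in this guide.

```python
from typing import Union


def eq_filter(field: str, value: Union[str, int, float], level: str = "doc") -> str:
    """Build an equality clause such as doc.department = 'legal'."""
    if level not in ("doc", "part"):
        raise ValueError("level must be 'doc' or 'part'")
    if isinstance(value, bool):
        raise TypeError("boolean values are not handled by this sketch")
    if isinstance(value, (int, float)):
        literal = str(value)
    elif isinstance(value, str):
        if "'" in value:
            raise ValueError("values containing single quotes are not supported here")
        literal = f"'{value}'"
    else:
        raise TypeError("value must be a string or a number")
    return f"{level}.{field} = {literal}"


def all_of(*clauses: str) -> str:
    """Join clauses so that every one must match (assumes an AND operator)."""
    return " AND ".join(f"({clause})" for clause in clauses)


print(eq_filter("department", "legal"))
print(all_of(eq_filter("department", "legal"), eq_filter("page", 3, level="part")))
```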

Next steps

After understanding queries, explore:

Chat sessions: Use client.chats.create() for conversational interfaces, as described in the Chats guide
Metadata filtering: Learn advanced filtering techniques with the Metadata guide
Batch processing: Process multiple queries efficiently
Custom rerankers: Train domain-specific reranking models
Advanced analytics: Track query performance and user patterns
