Version: 2.0

Queries

This guide covers the query methods in the Vectara Python SDK, which support both search and Retrieval Augmented Generation (RAG). Use them to find relevant documents in your corpora and to generate summarized responses with Vectara's RAG-focused LLMs, for example in legal research or customer-insight workloads.

Prerequisites

This guide assumes you have a corpus called my-docs with indexed documents and filter attributes defined. If you haven't created a corpus yet, follow the Quick Start guide to set up your first corpus and add some documents.

Setup Requirements:

  1. Install the SDK with pip install vectara.
  2. Get an API key from the Vectara Console.
  3. Create a corpus with client.corpora.create().

Initialize the Vectara Client

Set up authentication to securely access querying methods using an API key. Ensure your API key has querying permissions for the target corpora.
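
A minimal sketch of client setup follows. It assumes the key is stored in an environment variable named VECTARA_API_KEY (a name chosen here for illustration, not an SDK convention) and that the installed SDK exposes a Vectara class accepting an api_key argument; make_client is a hypothetical helper.

```python
import os


def make_client(api_key=None):
    """Return a Vectara client, reading the key from the environment by default."""
    # VECTARA_API_KEY is an illustrative variable name, not an SDK convention.
    key = api_key or os.environ.get("VECTARA_API_KEY")
    if not key:
        raise RuntimeError("Set VECTARA_API_KEY or pass api_key explicitly")

    # Imported here so a missing key is reported before the SDK is loaded.
    from vectara import Vectara

    return Vectara(api_key=key)
```

Keeping the key out of source code makes it easier to rotate and to use different keys per environment.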


Simple query with generation

Perform a query with Retrieval Augmented Generation (RAG) to get both search results and an AI-generated summary. This is the most common pattern for getting comprehensive answers from your corpus.

The client.query method corresponds to the HTTP POST /v2/query endpoint. For more details on request and response parameters, see the Query REST API.

Key Parameters:

  • generation_preset_name: vectara-summary-ext-24-05-med-omni provides high-quality, comprehensive responses using GPT-4o.
    See Generation Presets for a list of currently supported prompts.
  • max_used_search_results: A value of 50 gives the LLM substantial context for generation.
  • enable_factual_consistency_score: When true, returns a factual consistency score for the generated summary.

Returns:

  • summary: AI-generated summary based on search results
  • factual_consistency_score: Reliability score (0.0-1.0) for the summary
  • search_results: List of relevant documents with scores

Use this pattern when you need both specific document excerpts and a synthesized answer.
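
The parameters and return fields above might be combined as in the sketch below. The request model names (SearchCorporaParameters, KeyedSearchCorpus, GenerationParameters) and the score and text attributes on each search result are assumptions based on the SDK's published examples, so check them against your installed version. rag_query and describe are hypothetical helpers.

```python
def rag_query(client, question, corpus_key="my-docs"):
    """Run one RAG query and return the SDK response object."""
    # Imported inside the function so this sketch loads without the SDK present.
    # Class names are assumptions; verify them against your SDK version.
    from vectara import (
        GenerationParameters,
        KeyedSearchCorpus,
        SearchCorporaParameters,
    )

    return client.query(
        query=question,
        search=SearchCorporaParameters(
            corpora=[KeyedSearchCorpus(corpus_key=corpus_key)],
        ),
        generation=GenerationParameters(
            generation_preset_name="vectara-summary-ext-24-05-med-omni",
            max_used_search_results=50,
            enable_factual_consistency_score=True,
        ),
    )


def describe(response, top_n=3):
    """Format the summary, its score, and the top results as text lines."""
    lines = [
        f"Summary: {response.summary}",
        f"Factual consistency: {response.factual_consistency_score}",
    ]
    # Each result is assumed to expose `score` and `text`.
    for hit in (response.search_results or [])[:top_n]:
        lines.append(f"{hit.score:.3f}  {hit.text}")
    return lines
```

A caller would pass an initialized client, for example describe(rag_query(client, "What is our refund policy?")), and print the returned lines.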


Advanced query with filtering and reranking

Execute sophisticated queries with metadata filtering, reranking, and custom generation prompts for specialized use cases.

Advanced Features:

  • Metadata Filtering: Use doc.field = 'value' syntax to filter by document properties (requires corpus filter attributes)
  • Lexical Interpolation: A value of 0.3 weights keyword matching at 30% and semantic search at 70%
  • Context Configuration: Adds surrounding sentences for better understanding
  • Reranking: Improves result relevance using specialized models
  • Custom Prompts: Tailor AI responses for specific domains or formats

Important: Metadata filtering requires that your corpus has filter attributes defined for the fields you want to filter on. See the Corpus guide for creating filter attributes.
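
The sketch below shows one way the filtering, interpolation, context, and reranking features could be combined in a single call. Several things here are assumptions: the model class names and their fields (taken from the SDK's published examples), the filter attribute category, and the reranker ID, which is a placeholder. A custom prompt is left out because its parameter name is not given in this guide. advanced_query is a hypothetical helper.

```python
def advanced_query(client, question, corpus_key="my-docs"):
    """Query with a metadata filter, hybrid search, context, and reranking."""
    # Imported inside the function so this sketch loads without the SDK present.
    # Class names are assumptions; verify them against your SDK version.
    from vectara import (
        ContextConfiguration,
        CustomerSpecificReranker,
        GenerationParameters,
        KeyedSearchCorpus,
        SearchCorporaParameters,
    )

    search = SearchCorporaParameters(
        corpora=[
            KeyedSearchCorpus(
                corpus_key=corpus_key,
                # "category" is a hypothetical filter attribute on the corpus.
                metadata_filter="doc.category = 'legal'",
                lexical_interpolation=0.3,  # 30% keyword, 70% semantic
            )
        ],
        # Return two sentences of context on each side of a matching part.
        context_configuration=ContextConfiguration(
            sentences_before=2, sentences_after=2
        ),
        # Placeholder reranker ID; substitute one available to your account.
        reranker=CustomerSpecificReranker(reranker_id="rnk_272725719"),
    )
    generation = GenerationParameters(
        generation_preset_name="vectara-summary-ext-24-05-med-omni",
        max_used_search_results=20,
        enable_factual_consistency_score=True,
    )
    return client.query(query=question, search=search, generation=generation)
```

If the corpus has no category filter attribute, the filter expression is expected to be rejected, which is why the attribute must be defined first.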


Streaming query

Stream query responses in real-time for better user experience in interactive applications like chatbots or live search interfaces.

The client.query_stream method corresponds to the HTTP POST /v2/query_stream endpoint.

Streaming Benefits:

  • Immediate feedback to users as content generates
  • Better perceived performance for long responses
  • Ability to stop generation early if needed

Use Cases:

  • Interactive chat interfaces
  • Live search suggestions
  • Long-form content generation where users want to see progress
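
A sketch of consuming a stream, assuming client.query_stream returns an iterable of events where text events have type "generation_chunk" and carry the text in a generation_chunk attribute. Those names are assumptions to verify against your SDK version. collect_stream is a hypothetical helper that also shows stopping early.

```python
def collect_stream(events, on_text=print, max_chars=None):
    """Forward generated text as it arrives and return the text collected."""
    parts = []
    total = 0
    for event in events:
        # Event type names are assumptions; non-text events are skipped here.
        if getattr(event, "type", None) != "generation_chunk":
            continue
        text = event.generation_chunk or ""
        on_text(text)  # e.g. push to a UI or print incrementally
        parts.append(text)
        total += len(text)
        if max_chars is not None and total >= max_chars:
            break  # stop reading early once enough text has arrived
    return "".join(parts)
```

In an application, events would be the result of client.query_stream(...) called with the same query, search, and generation arguments used for client.query, and on_text would write to the chat window or terminal.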

Error handling and best practices

Common Error Scenarios:
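
Typical failures include rejected API keys, unknown corpus keys, invalid filter expressions, and rate limiting. The sketch below retries only rate-limit and server errors and re-raises everything else. It assumes SDK errors expose the HTTP status as a status_code attribute, which you should confirm for your SDK version. query_with_retry is a hypothetical helper.

```python
import time


def query_with_retry(run_query, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call run_query(), retrying rate-limit and server errors with backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception as exc:
            # Assumption: SDK API errors carry an HTTP status in `status_code`.
            status = getattr(exc, "status_code", None)
            retryable = status == 429 or (status is not None and status >= 500)
            if not retryable or attempt == attempts:
                raise  # client errors (bad key, bad filter) will not fix themselves
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A call might look like query_with_retry(lambda: client.query(...)). Authentication and validation errors surface immediately instead of being retried.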

Best Practices:

  • Always wrap production queries in try/except blocks
  • Monitor factual consistency scores for quality control
  • Start with simple queries before adding advanced features
  • Choose max_used_search_results to fit the task (50 for comprehensive answers, 10-20 for fast responses)
  • Ensure the corpus has filter attributes before using metadata filters
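
Monitoring factual consistency scores can start with a simple gate like the sketch below. The 0.5 threshold is an arbitrary illustration, not a recommended value, and check_score is a hypothetical helper.

```python
def check_score(score, min_score=0.5):
    """Label a summary by its factual consistency score (0.0-1.0)."""
    if score is None:
        return "unscored"  # the score was not requested or not returned
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # Summaries below the threshold are flagged for human review.
    return "accept" if score >= min_score else "review"
```

Logging the label alongside each query makes it possible to track how often summaries fall below the threshold and to tune the threshold on real traffic.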

Performance Tips:

  • Cache frequently used search configurations
  • Use streaming for long responses
  • Consider pagination for very large result sets
  • Monitor query latency and adjust parameters accordingly
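
Latency monitoring does not need SDK support. The sketch below wraps any callable, such as a function that runs a query, and records wall-clock time; LatencyLog is a hypothetical helper built only on the standard library.

```python
import time
from statistics import median


class LatencyLog:
    """Record how long wrapped calls take, in seconds."""

    def __init__(self):
        self.samples = []

    def timed(self, fn, *args, **kwargs):
        """Call fn and record its duration, even if it raises."""
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            self.samples.append(time.perf_counter() - start)

    def median_seconds(self):
        """Return the median recorded duration, or None if nothing was timed."""
        return median(self.samples) if self.samples else None
```

Comparing the median before and after changing a parameter such as max_used_search_results shows whether the change affected response time.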

Metadata Filtering Requirements:

  • Filter attributes must be defined when creating the corpus
  • Metadata field names must exactly match filter attribute names
  • Use the doc. prefix for document-level filters and the part. prefix for part-level filters
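
The prefix rule can be captured in a small builder for string equality filters, sketched below. The field-name pattern and the rejection of values containing single quotes are conservative choices made for this example, not documented rules of the filter language. equals_filter is a hypothetical helper.

```python
import re

# Conservative field-name check chosen for this sketch.
_FIELD = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def equals_filter(field, value, level="doc"):
    """Build a filter such as doc.field = 'value' for a string attribute."""
    if level not in ("doc", "part"):
        raise ValueError("level must be 'doc' or 'part'")
    if not _FIELD.match(field):
        raise ValueError(f"unexpected field name: {field!r}")
    text = str(value)
    if "'" in text:
        # Quote escaping is not covered in this guide, so refuse rather than guess.
        raise ValueError("values containing single quotes are not supported here")
    return f"{level}.{field} = '{text}'"
```

For example, equals_filter("category", "legal") yields doc.category = 'legal', which only works if category is defined as a filter attribute on the corpus.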

Next steps

After understanding queries, explore:

  • Chat sessions: Build conversational interfaces with client.chats.create(); see the Chats guide
  • Metadata filtering: Learn advanced filtering techniques with the Metadata guide
  • Batch processing: Process multiple queries efficiently
  • Custom rerankers: Train domain-specific reranking models
  • Advanced analytics: Track query performance and user patterns