Version: 2.0

Upload Files

This guide demonstrates how to upload files (PDFs, DOCX, and more) to a Vectara corpus using the Python SDK. Uploaded files are automatically parsed, chunked, and indexed, making their contents instantly available for search and Retrieval Augmented Generation (RAG).

Use file upload for:

  • Bulk onboarding of policy docs, technical manuals, invoices, contracts, or research papers
  • Ingesting new content as soon as it is generated by your business
  • Processing documents with tables, charts, and structured data

Prerequisites

This guide assumes you have a corpus called my-docs. If you haven't created a corpus yet, follow the Quick Start guide to set up your first corpus.

Upload a basic file

Upload a document (employee handbook or policy document) to your corpus. Vectara automatically parses, chunks, and indexes the content for semantic search. No manual processing is required.

The upload.file method corresponds to the HTTP POST /v2/upload_file endpoint. For more details on request and response parameters, see the Upload File REST API.

Key Parameters:

  • corpus_key: Target corpus identifier where the file will be stored
  • file: Binary content of the file (read with "rb" mode)
  • filename: Name of the file being uploaded
  • metadata: Optional key-value pairs for filtering and categorization
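
The parameters above can be put together in a minimal sketch. It assumes `client` is an already constructed Vectara SDK client that exposes `upload.file` as described; the helper name `upload_basic_file` and the metadata values are hypothetical and only for illustration.

```python
from pathlib import Path


def upload_basic_file(client, path, corpus_key="my-docs"):
    """Upload one file to a corpus and return whatever the SDK returns."""
    file_path = Path(path)
    # Read the file in binary mode ("rb"), as the `file` parameter requires.
    with file_path.open("rb") as handle:
        content = handle.read()
    return client.upload.file(
        corpus_key=corpus_key,
        file=content,
        filename=file_path.name,
        # Optional key-value pairs; these values are illustrative only.
        metadata={"doc_type": "handbook"},
    )
```

A call such as `upload_basic_file(client, "employee_handbook.pdf")` would then send the file to the my-docs corpus. The file name here is a placeholder.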

Supported File Types: PDF, DOCX, DOC, TXT, HTML, Markdown

Note

Each file uploaded can be up to 10 MB in size.

To update or overwrite an existing file, you must first delete the document using client.documents.delete() and then re-upload it, as direct updates to content are not supported. The file name is used as the document ID. Attempting to upload a file with the same name but different content will result in a 409 error.
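
The delete-then-upload sequence might look like the sketch below. It assumes `client.documents.delete` accepts the corpus key and document ID as its two arguments, which is not confirmed by this guide; `replace_file` is a hypothetical helper name.

```python
from pathlib import Path


def replace_file(client, path, corpus_key="my-docs"):
    """Delete the existing document, then upload the new version."""
    file_path = Path(path)
    # The file name doubles as the document ID, per the note above.
    document_id = file_path.name
    # Assumed argument order: (corpus_key, document_id).
    client.documents.delete(corpus_key, document_id)
    with file_path.open("rb") as handle:
        return client.upload.file(
            corpus_key=corpus_key,
            file=handle.read(),
            filename=document_id,
        )
```

If the document has never been uploaded, the delete call may itself fail, so callers may want to handle that case before relying on this helper.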

Error Handling:

  • 400 Bad Request: Invalid parameters or unsupported file type
  • 403 Forbidden: Insufficient permissions
  • 404 Not Found: Corpus not found
  • 409 Conflict: Document with same ID exists but different content
  • 413 Payload Too Large: File exceeds size limit
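
One way to act on these status codes is a small lookup that turns a code into a suggested next step. This is a sketch only: the hint wording is illustrative, and how the status code is obtained from a failed SDK call (for example, a `status_code` attribute on the raised exception) is an assumption.

```python
# Suggested follow-up for each documented upload status code.
UPLOAD_ERROR_HINTS = {
    400: "Invalid parameters or unsupported file type: check the request and file extension.",
    403: "Insufficient permissions: check the API key's access to the corpus.",
    404: "Corpus not found: check the corpus_key.",
    409: "A document with this ID exists with different content: delete it, then re-upload.",
    413: "File exceeds the size limit: keep each file within 10 MB.",
}


def explain_upload_error(status_code):
    """Return a human-readable hint for an upload status code."""
    default = f"Unexpected status {status_code}: inspect the response body."
    return UPLOAD_ERROR_HINTS.get(status_code, default)
```

Inside an `except` block, `explain_upload_error(getattr(exc, "status_code", None))` would produce a log-friendly message without assuming every exception carries a status code.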

Upload with table extraction

Upload documents containing tables, charts, or structured data with enhanced extraction capabilities. This option suits technical documentation, API references, or any document with tabular information.

Table Extraction Benefits:

  • Automatically extracts and indexes table content
  • Makes tabular data searchable alongside text content
  • Preserves table structure and relationships
  • Enables queries about specific data points within tables

Use Cases:

  • Technical specifications with parameter tables
  • API documentation with endpoint tables
  • Research papers with data tables
  • Configuration guides with settings tables
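
A table-extraction upload could be sketched as follows. The `table_extraction_config` argument and its `extract_tables` field are assumptions not confirmed by this guide, and the SDK may expect a typed configuration object rather than the plain dict used here; `upload_with_tables` is a hypothetical helper name.

```python
from pathlib import Path


def upload_with_tables(client, path, corpus_key="my-docs"):
    """Upload a file and request table extraction (assumed option name)."""
    file_path = Path(path)
    with file_path.open("rb") as handle:
        return client.upload.file(
            corpus_key=corpus_key,
            file=handle.read(),
            filename=file_path.name,
            # Assumption: extraction is enabled through this argument.
            table_extraction_config={"extract_tables": True},
        )
```

Check the Upload File REST API reference for the exact option name and shape before relying on this.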

Upload from file object (streaming)

Upload files directly from file objects without loading the entire content into memory. This is ideal for streaming scenarios where files are large or come from dynamic sources like cloud storage (e.g., S3 downloads), APIs, or webhooks, avoiding memory overhead.

Streaming Use Cases:

  • Files downloaded from cloud storage (S3, Google Cloud, etc.)
  • Content received through APIs or webhooks
  • Temporary files that don't need local persistence
  • Batch processing from external systems
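
A streaming upload might be sketched like this, assuming the `file` parameter accepts an open binary file object as this section describes. `upload_from_stream` is a hypothetical helper name, and the `io.BytesIO` in the usage note only stands in for a real stream such as a cloud storage download body.

```python
import io


def upload_from_stream(client, stream, filename, corpus_key="my-docs"):
    """Upload from an open binary stream without reading it into a bytes object."""
    # Pass the stream itself rather than stream.read(), so this helper
    # never holds a second full copy of the content.
    return client.upload.file(
        corpus_key=corpus_key,
        file=stream,
        filename=filename,
    )


def example_stream():
    """Build an in-memory stream as a stand-in for a downloaded object."""
    return io.BytesIO(b"Quarterly report contents")
```

For example, `upload_from_stream(client, example_stream(), "report.txt")` uploads the stand-in content. With a real source, the stream would come from the storage or HTTP library in use.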

Advanced upload with comprehensive metadata

Upload documents with comprehensive metadata to capture important business context. The system automatically processes document structure for precise queries and analysis.

Comprehensive Metadata Benefits:

  • Better document organization and discovery
  • Enhanced filtering capabilities in queries
  • Support for compliance and audit requirements
  • Improved search relevance through context

Business Document Benefits:

  • Query by department: "Show all HR policies"
  • Filter by dates: "Find documents effective after 2025"
  • Search by classification: "Show internal documents only"
  • Track versions: "Get the latest version of each handbook"
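
Metadata that supports the queries above could be assembled with a helper like the one below. The field names (`department`, `effective_date`, `classification`, `version`) are hypothetical; use whatever fields your corpus is configured to filter on.

```python
from datetime import date


def build_policy_metadata(department, effective_date, classification, version):
    """Build a metadata dict for a business document (illustrative field names)."""
    return {
        "department": department,
        # ISO 8601 dates sort correctly as plain strings.
        "effective_date": effective_date.isoformat(),
        "classification": classification,
        "version": version,
    }
```

The result can be passed as the `metadata` argument of `upload.file`, for example `build_policy_metadata("hr", date(2025, 1, 1), "internal", 3)`.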

Best practices and error handling

Production Guidelines:

  • Always validate file existence and readability before upload
  • Include comprehensive metadata for better searchability
  • Use appropriate chunking strategies based on content type
  • Enable table extraction for documents with structured data
  • Implement retry logic for transient failures
  • Monitor upload success rates and file processing times
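
The first guideline, validating a file before upload, can be sketched with the standard library alone. The extension set is an assumed mapping of the supported file types listed earlier, and the limit treats 10 MB as 10 × 1024 × 1024 bytes, which is also an assumption.

```python
import os
from pathlib import Path

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumes "10 MB" means 10 MiB
# Assumed extensions for PDF, DOCX, DOC, TXT, HTML, and Markdown.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".html", ".md"}


def validate_upload(path):
    """Return a list of problems; an empty list means the file looks uploadable."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{file_path} does not exist or is not a regular file"]
    problems = []
    if not os.access(file_path, os.R_OK):
        problems.append(f"{file_path} is not readable")
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file_path.suffix or '(none)'}")
    if file_path.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 10 MB limit")
    return problems
```

Returning a list rather than raising on the first failure lets a batch job report every issue with a file in one pass.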

Error Handling:

  • File Issues: Validate file existence, permissions, and size
  • API Errors: Check corpus permissions and file format support
  • Network Issues: Implement retry logic with exponential backoff
  • Large Files: Consider chunked uploads for very large documents
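
Retry with exponential backoff can be sketched as a wrapper around any upload callable. Which failures count as transient is an assumption here: connection and timeout errors, plus exceptions carrying a `status_code` attribute in a hypothetical retryable set. Client errors such as 409 are raised immediately.

```python
import time

# Assumed set of status codes worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def is_transient(exc):
    """Decide whether an exception looks like a temporary failure."""
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return True
    return getattr(exc, "status_code", None) in RETRYABLE_STATUS


def upload_with_retry(do_upload, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call do_upload(), retrying transient failures with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return do_upload()
        except Exception as exc:
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A call might look like `upload_with_retry(lambda: client.upload.file(...))`. The `sleep` parameter is injectable so the wrapper can be tested without real waiting.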

Metadata Recommendations:

  • Include document type, department, and date information
  • Add file size and upload timestamp for tracking
  • Use consistent naming conventions across your organization
  • Include business-specific fields for filtering and analytics
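
The tracking fields recommended above (file size and upload timestamp) could be generated as in this sketch. The key names `file_size_bytes` and `uploaded_at` are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def tracking_metadata(path, uploaded_at=None):
    """Return file size and upload timestamp fields for a file."""
    file_path = Path(path)
    # Default to the current time in UTC so timestamps compare consistently.
    moment = uploaded_at or datetime.now(timezone.utc)
    return {
        "file_size_bytes": file_path.stat().st_size,
        "uploaded_at": moment.isoformat(),
    }
```

These fields can be merged into business metadata with `{**business_fields, **tracking_metadata(path)}` before the upload call.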

Next steps

Once you are comfortable with file uploads, explore these related guides:

  • Query uploaded content: Use Queries to search uploaded documents.
  • Document management: Use Documents to manage uploaded content.
  • Chat integration: Build conversational interfaces with uploaded documents using Chats.