Use OpenAI SDK with the Vectara Chat Completions API
This tutorial demonstrates how to use Vectara's Chat Completions API through OpenAI-compatible interfaces. You will learn how to integrate Vectara's generative AI capabilities into your applications using either direct HTTP requests or the OpenAI Python SDK, which enables seamless migration from OpenAI and integration with OpenAI-compatible tools. By the end of this tutorial, you will be able to call Vectara's API both directly and through the OpenAI SDK.
This tutorial contains the following steps:
- Prerequisites and setup
- Step 1. Install the required packages
- Step 2. Implement the VectaraChat client
- Step 3. Enter your API key
- Step 4. Initialize the Vectara chat client
- Step 5. Perform tests
We recommend that you complete this tutorial in Google Colab.
Prerequisites and setup
- Python 3.8 or higher
- Basic understanding of REST APIs and HTTP requests
- A valid Vectara API key with access to the Chat Completions endpoint
Step 1. Install the required packages
Install the required Python packages. The requests library handles direct HTTP calls, while openai provides the official OpenAI SDK for simplified integration.
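For example, in a Colab or Jupyter cell:

```python
# In a notebook cell; on a plain command line, drop the leading "%".
%pip install requests openai
```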
Step 2. Implement the VectaraChat client
The following code implements the VectaraChat client, which provides methods for interacting with Vectara's Chat Completions API.
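Below is a minimal sketch of such a client. The class name VectaraChat and the verbose option come from this tutorial; the base_url default, the /chat/completions path, and the response shapes are assumptions to confirm against Vectara's API reference.

```python
import json
import requests

class VectaraChat:
    """Minimal client for an OpenAI-compatible Chat Completions endpoint."""

    def __init__(self, api_key, base_url="https://api.vectara.io/v1",
                 use_bearer=True, verbose=False):
        # base_url is an assumed default; confirm it in the API reference.
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")
        self.use_bearer = use_bearer
        self.verbose = verbose

    def _headers(self):
        # Bearer token authentication (recommended) or the x-api-key header.
        if self.use_bearer:
            return {"Authorization": f"Bearer {self.api_key}"}
        return {"x-api-key": self.api_key}

    def chat(self, messages, model, stream=False, **params):
        """POST to /chat/completions. Returns the parsed JSON response,
        or an iterator of chunk dicts when stream=True."""
        url = f"{self.base_url}/chat/completions"
        payload = {"model": model, "messages": messages,
                   "stream": stream, **params}
        if self.verbose:
            print(f"POST {url}\n{json.dumps(payload, indent=2)}")
        resp = requests.post(url, headers=self._headers(), json=payload,
                             stream=stream, timeout=60)
        resp.raise_for_status()
        if self.verbose:
            print(f"HTTP {resp.status_code}")
        return self._iter_sse(resp) if stream else resp.json()

    @staticmethod
    def _iter_sse(resp):
        # Streaming responses arrive as server-sent events: lines of the
        # form "data: {...}", terminated by "data: [DONE]".
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue
            data = line[len("data: "):]
            if data.strip() == "[DONE]":
                break
            yield json.loads(data)
```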
Enable verbose=True during development to see detailed request/response logging for debugging.
Step 3. Enter your API key
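Provide your Vectara API key to the notebook. One approach uses the standard-library getpass so the key is never echoed or saved in the notebook output:

```python
from getpass import getpass

# Prompt for the key so it is not stored in the notebook's saved output.
VECTARA_API_KEY = getpass("Enter your Vectara API key: ")
```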
Step 4. Initialize the Vectara chat client
Create the VectaraChat instance and choose between Bearer token authentication (recommended) or x-api-key header authentication.
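A minimal sketch of both options, using the VectaraChat class from Step 2:

```python
# Option 1: Bearer token authentication (recommended)
vectara = VectaraChat(api_key=VECTARA_API_KEY, use_bearer=True, verbose=False)

# Option 2: x-api-key header authentication
# vectara = VectaraChat(api_key=VECTARA_API_KEY, use_bearer=False)
```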
Step 5. Perform tests
Now that you've set up the VectaraChat client and initialized it with your API key, let's test both implementation approaches. The following tests demonstrate four different scenarios: direct HTTP requests (streaming and non-streaming) and OpenAI SDK integration (streaming and non-streaming). Each test shows you how to make requests and handle responses in different ways.
Test 1: Direct API (non-streaming)
Let's test the direct API approach without streaming:
Direct HTTP Request
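A sketch using the VectaraChat client from Step 2. The model name is a placeholder; substitute one enabled for your Vectara account:

```python
response = vectara.chat(
    messages=[{"role": "user",
               "content": "What is retrieval-augmented generation?"}],
    model="gpt-4o",  # placeholder; use a model enabled for your account
    stream=False,
)
# Non-streaming responses follow the OpenAI chat completion shape.
print(response["choices"][0]["message"]["content"])
```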
OpenAI SDK Request
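For comparison, here is the same request issued through the OpenAI SDK, which Tests 3 and 4 cover in detail. The base_url below is an assumed value; point it at Vectara's OpenAI-compatible endpoint from the API reference:

```python
from openai import OpenAI

client = OpenAI(
    api_key=VECTARA_API_KEY,
    base_url="https://api.vectara.io/v1",  # assumed; see the API reference
)
completion = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user",
               "content": "What is retrieval-augmented generation?"}],
)
print(completion.choices[0].message.content)
```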
Test 2: Direct API (streaming)
Now let's test with streaming enabled:
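With stream=True, the client from Step 2 yields incremental chunks as they arrive. A sketch, again with a placeholder model name:

```python
stream = vectara.chat(
    messages=[{"role": "user",
               "content": "Explain vector search in two sentences."}],
    model="gpt-4o",  # placeholder model name
    stream=True,
)
# Each chunk carries an incremental delta, as in OpenAI's streaming format.
for chunk in stream:
    delta = chunk["choices"][0].get("delta", {})
    print(delta.get("content", ""), end="", flush=True)
print()
```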
Test 3: OpenAI SDK (non-streaming)
Now let's test using the OpenAI SDK without streaming:
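Point the official OpenAI client at Vectara's endpoint and call it exactly as you would call OpenAI. The base_url and model name below are placeholders to replace with values from your Vectara account:

```python
from openai import OpenAI

client = OpenAI(
    api_key=VECTARA_API_KEY,
    base_url="https://api.vectara.io/v1",  # assumed; see the API reference
)
completion = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "What is a reranker?"}],
)
print(completion.choices[0].message.content)
```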
Test 4: OpenAI SDK (streaming)
Finally, let's test the OpenAI SDK with streaming:
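Passing stream=True to the same client yields incremental chunks. A sketch, reusing the client from Test 3:

```python
stream = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user",
               "content": "Summarize the benefits of hybrid search."}],
    stream=True,
)
# Print each delta as it arrives for a typewriter-style display.
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
```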
Advanced usage examples
Beyond the basic tests, explore these advanced usage patterns to build production-ready applications:
- Multi-turn conversations - Maintain context across multiple exchanges.
- Use different models - Switch between available LLM models.
- Customize generation parameters - Control output with temperature and token limits.
Multi-turn conversations
The previous tests showed single-question interactions. Real conversational applications need to maintain context across multiple exchanges. The Chat Completions API supports multi-turn conversations by including the conversation history in each request. Here's how to build a contextual conversation:
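A sketch using the OpenAI client from Test 3; the model name is a placeholder. The key idea is to resend the full message history, including the assistant's previous reply, with each request:

```python
history = [{"role": "user",
            "content": "Who wrote 'The Old Man and the Sea'?"}]
first = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=history,
)
answer = first.choices[0].message.content
print(answer)

# Append the assistant's reply, then ask a follow-up that only makes
# sense if the earlier context is preserved.
history.append({"role": "assistant", "content": answer})
history.append({"role": "user", "content": "What else did he write?"})
followup = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=history,
)
print(followup.choices[0].message.content)
```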
Use different models
Vectara supports various LLM models. Let's try a different model:
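A sketch of switching models with the client from Test 3. The model names below are placeholders only; check Vectara's documentation for the models actually available to your account:

```python
# Both names below are placeholders; substitute models that Vectara
# actually exposes for your account.
for model_name in ["gpt-4o", "gpt-4o-mini"]:
    completion = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user",
                   "content": "Define semantic search in one sentence."}],
    )
    print(f"{model_name}: {completion.choices[0].message.content}")
```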
Customize generation parameters
You can customize generation parameters to control the output:
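temperature and max_tokens are standard OpenAI Chat Completions parameters; the sketch below assumes Vectara honors them (check the API reference for the supported set):

```python
completion = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user",
               "content": "Write a tagline for a search startup."}],
    temperature=0.9,  # higher values give more varied output
    max_tokens=60,    # cap the length of the generated reply
)
print(completion.choices[0].message.content)
```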
This tutorial demonstrated how to use the Vectara Chat Completions API, both directly and with the OpenAI SDK. You can use this API to add powerful generative AI capabilities to your applications with OpenAI-compatible interfaces.
For integration examples with external applications, see Use with External Applications.