Version: 2.0

Use OpenAI SDK with the Vectara Chat Completions API

This tutorial demonstrates how to use Vectara's Chat Completions API through OpenAI-compatible interfaces. You will learn how to integrate Vectara's generative AI capabilities into your applications using either direct HTTP requests or the OpenAI Python SDK, which enables seamless migration from OpenAI and integration with OpenAI-compatible tools.

This tutorial contains the following steps:

  * Prerequisites and setup
  * Step 1. Install the required packages
  * Step 2. Implement the VectaraChat client
  * Step 3. Enter your API key
  * Step 4. Initialize the Vectara chat client
  * Step 5. Perform tests
Note

We recommend that you complete this tutorial in Google Colab.

Prerequisites and setup

Step 1. Install the required packages

Install the required Python packages. The requests library handles direct HTTP calls, while openai provides the official OpenAI SDK for simplified integration.

INSTALL REQUIRED PACKAGES
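A minimal install command might look like the following (package names are the standard PyPI names for the two libraries the tutorial uses):

```shell
pip install requests openai
```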

Step 2. Implement the VectaraChat client

The following code contains the implementation of the VectaraChat client, which provides methods for interacting with Vectara's Chat Completions API.

VECTARACHAT CLIENT IMPLEMENTATION
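A minimal sketch of such a client is shown below. The endpoint URL, the default model name, and the exact method names are assumptions; adapt them to your Vectara account and the actual API reference.

```python
import json
import requests

# Assumed base URL for Vectara's OpenAI-compatible API; check your
# Vectara console or the API reference for the exact endpoint.
VECTARA_BASE_URL = "https://api.vectara.io/v1"


class VectaraChat:
    """Minimal client for Vectara's Chat Completions endpoint.

    auth_mode: "bearer" sends an Authorization: Bearer header (recommended);
    "x-api-key" sends the key in an x-api-key header instead.
    """

    def __init__(self, api_key, auth_mode="bearer", verbose=False):
        self.api_key = api_key
        self.auth_mode = auth_mode
        self.verbose = verbose
        self.url = f"{VECTARA_BASE_URL}/chat/completions"

    def _headers(self):
        headers = {"Content-Type": "application/json"}
        if self.auth_mode == "bearer":
            headers["Authorization"] = f"Bearer {self.api_key}"
        else:
            headers["x-api-key"] = self.api_key
        return headers

    def chat(self, messages, model="gpt-4o", stream=False, **params):
        """Send a chat completion request; returns parsed JSON, or the raw
        response object when streaming so the caller can iterate over it."""
        payload = {"model": model, "messages": messages, "stream": stream, **params}
        if self.verbose:
            print("POST", self.url)
            print(json.dumps(payload, indent=2))
        response = requests.post(self.url, headers=self._headers(),
                                 json=payload, stream=stream, timeout=60)
        response.raise_for_status()
        if stream:
            return response  # caller iterates over the SSE lines
        return response.json()
```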
Tip

Enable verbose=True during development to see detailed request/response logging for debugging.

Step 3. Enter your API key

API KEY CONFIGURATION
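One common pattern is to read the key from an environment variable so it is never hard-coded in the notebook (the variable name `VECTARA_API_KEY` is an assumption; in Colab you could instead prompt with `getpass`):

```python
import os

# Set VECTARA_API_KEY beforehand, e.g. `export VECTARA_API_KEY=...`
api_key = os.environ.get("VECTARA_API_KEY", "")
if not api_key:
    print("Warning: VECTARA_API_KEY is not set")
```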

Step 4. Initialize the Vectara chat client

Create the VectaraChat instance and choose between Bearer token authentication (recommended) and x-api-key header authentication.

INITIALIZE CLIENT
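The two authentication options differ only in which header carries the key. A sketch of both, assuming the endpoint accepts either header:

```python
api_key = "your-vectara-api-key"  # placeholder

# Option 1: Bearer token authentication (recommended)
bearer_headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

# Option 2: x-api-key header authentication
x_api_key_headers = {
    "x-api-key": api_key,
    "Content-Type": "application/json",
}
```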

Step 5. Perform tests

Now that you've set up the VectaraChat client and initialized it with your API key, let's test both implementation approaches. The following tests demonstrate four different scenarios: direct HTTP requests (streaming and non-streaming) and OpenAI SDK integration (streaming and non-streaming). Each test shows you how to make requests and handle responses in different ways.

Test 1: Direct API (non-streaming)

Let's test the direct API approach without streaming:

Direct HTTP Request

NON-STREAMING DIRECT API CALL
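A sketch of a non-streaming request with `requests` might look like this. The endpoint URL and model name are assumptions; the response is parsed OpenAI-style, with the reply text in the first choice:

```python
import requests

# Assumed endpoint; check the Vectara API reference for the exact URL.
VECTARA_CHAT_URL = "https://api.vectara.io/v1/chat/completions"

def ask(api_key, question, model="gpt-4o"):
    """Send a single non-streaming chat completion request."""
    response = requests.post(
        VECTARA_CHAT_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": question}],
            "stream": False,
        },
        timeout=60,
    )
    response.raise_for_status()
    # OpenAI-compatible responses put the reply in the first choice.
    return response.json()["choices"][0]["message"]["content"]
```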
NON-STREAMING RESPONSE EXAMPLE

OpenAI SDK Request

NON-STREAMING WITH OPENAI SDK

Test 2: Direct API (streaming)

Now let's test with streaming enabled:

STREAMING DIRECT API CALL
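With `stream=True`, the response arrives as server-sent events: each `data:` line carries a JSON chunk with a partial delta. A sketch of parsing that stream (endpoint URL and model name are assumptions):

```python
import json
import requests

VECTARA_CHAT_URL = "https://api.vectara.io/v1/chat/completions"  # assumed endpoint

def stream_answer(api_key, question, model="gpt-4o"):
    """Yield reply fragments as they arrive over server-sent events."""
    response = requests.post(
        VECTARA_CHAT_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": question}],
            "stream": True,
        },
        stream=True,
        timeout=60,
    )
    response.raise_for_status()
    for line in response.iter_lines():
        # SSE payload lines look like: data: {...json chunk...}
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta
```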
STREAMING OUTPUT EXAMPLE

Test 3: OpenAI SDK (non-streaming)

Now let's test using the OpenAI SDK without streaming:

OPENAI SDK NON-STREAMING CALL
OPENAI SDK OUTPUT EXAMPLE

Test 4: OpenAI SDK (streaming)

Finally, let's test the OpenAI SDK with streaming:

OPENAI SDK STREAMING CALL
STREAMING OUTPUT EXAMPLE

Advanced usage examples

Beyond the basic tests, explore these advanced usage patterns to build production-ready applications:

Multi-turn conversations

The previous tests showed single-question interactions. Real conversational applications need to maintain context across multiple exchanges. The Chat Completions API supports multi-turn conversations by including the conversation history in each request. Here's how to build a contextual conversation:

MULTI-TURN CONVERSATION EXAMPLE
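The mechanics are plain list management: append each user message and each assistant reply to a shared history, and send the whole list with every request. A sketch (the system prompt and sample replies are illustrative):

```python
# Start the history with an optional system prompt.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def add_turn(history, user_text, reply_text):
    """Record one exchange so the next request carries full context."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": reply_text})
    return history

# Each request would send the whole history, e.g.:
#   client.chat.completions.create(model="gpt-4o", messages=history)
add_turn(history, "What is RAG?", "Retrieval-augmented generation ...")
add_turn(history, "How does Vectara use it?", "Vectara retrieves relevant ...")
```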
MULTI-TURN CONVERSATION OUTPUT

Use different models

Vectara supports a variety of LLMs. Let's try a different model:

USING DIFFERENT MODELS
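Switching models is just a matter of changing the `model` field in the request. The names below are placeholders; check Vectara's documentation for the models available to your account:

```python
# Model names here are illustrative assumptions.
for model in ["gpt-4o", "gpt-4o-mini"]:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Summarize RAG in one sentence."}],
    }
    # The request itself would be, e.g.:
    # requests.post(VECTARA_CHAT_URL, headers=..., json=payload)
```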
DIFFERENT MODEL OUTPUT

Customize generation parameters

You can customize generation parameters to control the output:

CUSTOMIZING PARAMETERS
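Generation parameters follow the OpenAI request schema; for example, `temperature` and `max_tokens` can be added alongside `model` and `messages` (whether Vectara honors every OpenAI parameter is an assumption to verify against its API reference):

```python
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku about search."}],
    "temperature": 0.2,   # lower values make output more deterministic
    "max_tokens": 100,    # cap the length of the reply
}
```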
CUSTOMIZED OUTPUT

This tutorial demonstrated how to use the Vectara Chat Completions API, both directly and with the OpenAI SDK. You can use this API to add powerful generative AI capabilities to your applications with OpenAI-compatible interfaces.

For integration examples with external applications, see Use with External Applications.