ThoughtSpot Sage

ThoughtSpot Sage is our new AI-powered search experience. It uses the GPT-3, GPT-3.5T and GPT-4 large language models (LLMs), provided through Microsoft Azure OpenAI Service.


How our natural language search works

Our new natural language search significantly reduces the skill and level of effort required for you to create insights in ThoughtSpot.

When you search in a Worksheet, ThoughtSpot Sage translates your natural language query into a relational search to provide you with an accurate, AI-generated answer from your complex data.

Natural language query and token-based relational search

In the AI-generated answer, you can view the ThoughtSpot search phrases (also known as tokens) that matched your natural language query to verify the measures, attributes and filters used to create the answer. If you notice an error, you can quickly modify the query by adding or removing search phrases, and then send us the feedback so we can improve the results in a future ThoughtSpot release.

The following diagram provides a high-level overview of how ThoughtSpot Sage takes a natural language question you ask and gives you an AI-generated answer.

[Diagram: ThoughtSpot Sage GPT flow]

Steps of a natural language search query

After you enter your natural language query, ThoughtSpot Sage does the following:

Pre-processing

Sage uses all available information about your data source to create a prompt for generating a SQL query using GPT.

ThoughtSpot Sage:

  • Identifies the columns and values to be shared with GPT

  • Identifies the worksheet metadata to be shared with GPT, including column names, column descriptions, and sample column data

  • Provides instructions for generating the SQL query

    • Shares custom functions with GPT for converting the generated SQL to the syntax required for ThoughtSpot’s Search data functionality.
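
The pre-processing steps above can be illustrated with a minimal sketch that assembles a prompt from worksheet metadata. This is not ThoughtSpot's implementation: the `Column` class, the `build_prompt` function, the prompt layout, and the sample worksheet are all hypothetical and exist only to show the general idea of combining instructions, column metadata, and the question into one prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    # Hypothetical stand-in for worksheet column metadata.
    name: str
    description: str = ""
    sample_values: list[str] = field(default_factory=list)


def build_prompt(question: str, columns: list[Column], instructions: str) -> str:
    """Assemble a text prompt from instructions, column metadata, and a question."""
    lines = [instructions, "", "Columns:"]
    for col in columns:
        samples = ", ".join(col.sample_values) or "n/a"
        desc = col.description or "no description"
        lines.append(f"- {col.name} ({desc}); sample values: {samples}")
    lines += ["", f"Question: {question}", "SQL:"]
    return "\n".join(lines)


# Hypothetical worksheet with two columns.
columns = [
    Column("region", "Sales region", ["east", "west"]),
    Column("revenue", "Order revenue in USD"),
]
prompt = build_prompt(
    "What was revenue by region?",
    columns,
    "Write one SQL query over the columns listed below.",
)
print(prompt)
```

Only the columns passed to `build_prompt` appear in the prompt, which mirrors the first step in the list: deciding which columns and values are shared with GPT happens before the prompt is built.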

GPT

GPT uses the information collected during pre-processing to generate a SQL query to answer your question. ThoughtSpot Sage uses GPT to:

  • Understand your intent by looking at few-shot examples, column names and values

  • Generate a SQL query to answer your question using custom functions

Post-processing

In this stage, ThoughtSpot Sage converts the SQL query generated by GPT into a query which contains search tokens for ThoughtSpot’s Search data functionality.
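
As a rough illustration of this conversion, the sketch below turns a very restricted SQL query into a flat list of search phrases. It is a toy under stated assumptions, not ThoughtSpot's converter: the `sql_to_tokens` function, the supported SQL shape (one `SELECT`, one table, at most one equality filter), and the output token format are all hypothetical and do not represent actual ThoughtSpot search syntax.

```python
import re

# Toy pattern: SELECT <cols> FROM <table> [WHERE <col> = '<value>']
_SQL = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+\w+"
    r"(?:\s+WHERE\s+(?P<fcol>\w+)\s*=\s*'(?P<fval>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


def sql_to_tokens(sql: str) -> list[str]:
    """Convert a very restricted SQL query into a flat list of search tokens."""
    match = _SQL.match(sql)
    if match is None:
        raise ValueError("unsupported SQL shape for this sketch")
    # Each selected column becomes one token.
    tokens = [c.strip() for c in match.group("cols").split(",")]
    # An optional equality filter becomes one extra token.
    if match.group("fcol"):
        tokens.append(f"{match.group('fcol')} = '{match.group('fval')}'")
    return tokens


tokens = sql_to_tokens("SELECT region, revenue FROM sales WHERE year = '2023'")
print(tokens)
```

Keeping the result as a list of separate tokens reflects the point made earlier on this page: each phrase can be inspected, added, or removed individually when you verify the answer.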

Natural language search accuracy

Accuracy varies based on the complexity of the question and the data source you select, and will improve with future releases. For worksheets with a single use case, clearly formatted names, and no more than 50 columns, we measured an average of over 80% accuracy.

For more complex worksheets with thousands of columns, multiple use cases combined, and overlapping column names, accuracy is around 60%. Always review the answer completely before relying on the result. To support this review, we’ve provided you with the ability to submit feedback, correct the answers to improve the model, and control which worksheets generate AI answers.

ThoughtSpot Sage administration

ThoughtSpot Sage features are disabled by default, and must be enabled by your administrator:

  1. Select the Admin tab.

  2. Under Application settings, select Search & SpotIQ.

  3. Click Edit and enable each feature individually.

  4. Click Save.

Your ThoughtSpot administrator can do the following:

  • Limit access to the Natural language search and AI-suggested searches features by assigning the Can preview ThoughtSpot Sage user privilege to specific groups of users.

  • Disable indexing of personally identifiable information (PII) and sensitive data columns so the data isn’t used by the GPT features at all.
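
The effect of disabling indexing on a column can be pictured with a small sketch. The data layout, the `indexed` flag, and the `shareable_columns` function are hypothetical and are not ThoughtSpot settings or APIs. The sketch only shows the idea that a column with indexing disabled is filtered out before anything is shared with GPT.

```python
def shareable_columns(columns: list[dict]) -> list[str]:
    """Return names of columns whose hypothetical 'indexed' flag is true or absent."""
    return [c["name"] for c in columns if c.get("indexed", True)]


# Hypothetical worksheet metadata.
worksheet = [
    {"name": "region", "indexed": True},
    {"name": "customer_email", "indexed": False},  # PII: indexing disabled
    {"name": "revenue"},  # no flag set; treated as indexed in this sketch
]
print(shareable_columns(worksheet))
```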

ThoughtSpot Sage security

Features of ThoughtSpot Sage security:

  • Azure OpenAI Service complies with SOC 2 Type II for data security and privacy, and has passed the ThoughtSpot Vendor Security Risk Assessment (see Azure cloud compliances).

  • ThoughtSpot communication with the Azure OpenAI Service is encrypted in transit using TLS 1.2.

  • ThoughtSpot sends the natural language search query along with additional metadata such as worksheet column names, descriptions and sample values as part of the GPT prompt in order to provide accurate, in-context responses.

  • GPT does not store the sample data or metadata that ThoughtSpot sends, nor does it use this data or metadata to retrain the model. Although Microsoft allows prompt data to be persisted for 30 days for troubleshooting, we have explicitly disabled both prompt persistence and retraining.

  • ThoughtSpot Sage does not use ChatGPT. We use a combination of GPT-3, GPT-3.5T and GPT-4, also created by OpenAI, because they are better suited for natural language translation to SQL. As the models progress, we will update the versions if they improve the performance of our features.
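
For readers who want to see what the TLS 1.2 point in the list above means on the client side, the following sketch uses Python's standard `ssl` module to build a connection context that refuses anything older than TLS 1.2. It is a generic illustration, not a description of how ThoughtSpot configures its own connections.

```python
import ssl

# Default context: certificate verification and hostname checking are enabled.
context = ssl.create_default_context()

# Refuse protocol versions older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.minimum_version)
```

A context configured this way can be passed to standard-library clients such as `http.client.HTTPSConnection` through their `context` parameter.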