DataFlow security

Storing credentials for data sources

ThoughtSpot stores data source credentials in the DataFlow metadata repository (Postgres). We encrypt sensitive information, such as passwords or authentication tokens, using AES-128, and store the encrypted values with the metadata.

ThoughtSpot generates the key using classes and methods from the Java Cryptography Extension (JCE) package, and encrypts in cipher block chaining (CBC) mode. We store the key as part of the DataFlow code itself.
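The following Java sketch illustrates AES-128 in CBC mode using standard JCE classes (KeyGenerator, Cipher, IvParameterSpec). It is not DataFlow's code: the AES/CBC/PKCS5Padding transformation, the random per-value IV, and the Base64(IV + ciphertext) storage format are assumptions made for illustration, and the class name CredentialCipher is hypothetical. Also unlike DataFlow, which embeds a static key in its code, the sketch generates a key at runtime.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper: illustrates AES-128-CBC with JCE; not DataFlow's implementation.
public class CredentialCipher {
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding"; // assumed padding scheme
    private static final int IV_LENGTH = 16; // AES block size in bytes
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a 128-bit AES key with the JCE KeyGenerator.
    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts a secret in CBC mode with a fresh random IV and returns Base64(IV || ciphertext).
    static String encrypt(SecretKey key, String secret) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] ct = cipher.doFinal(secret.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[IV_LENGTH + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_LENGTH);
            System.arraycopy(ct, 0, out, IV_LENGTH, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(): reads the IV prefix, then decrypts the remaining bytes.
    static String decrypt(SecretKey key, String stored) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            IvParameterSpec iv = new IvParameterSpec(in, 0, IV_LENGTH);
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key, iv);
            byte[] plain = cipher.doFinal(in, IV_LENGTH, in.length - IV_LENGTH);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String stored = encrypt(key, "s3cret-password");
        System.out.println("stored: " + stored);
        System.out.println("decrypted: " + decrypt(key, stored));
    }
}
```

CBC needs an unpredictable IV for each encryption, which is why the sketch stores the IV next to the ciphertext; the source does not say how DataFlow handles IVs.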

If the ThoughtSpot cluster is deployed on AWS infrastructure, you can use AWS Vault to store the source passwords. Currently, we support AWS Vault for Oracle, Netezza, Redshift, MySQL, PostgreSQL, and Amazon Aurora.

DataFlow has its own login credentials, consisting of a username and password. DataFlow stores these credentials in the DataFlow metadata repository, with the password encrypted using AES-128.

ThoughtSpot does not rotate the key; it remains static.

DataFlow metadata location

As of the 7.0 software release, DataFlow uses ThoughtSpot’s internal Postgres repository for storing the metadata for its connections.

Securing DataFlow data in ThoughtSpot

By default, ThoughtSpot uses JDBC in pipe mode for data extraction through DataFlow for all relational data sources. In this process, the system creates a named pipe under the /export/xvdb1/large_files/diyotta_agent_stage_dir/ directory. The JDBC extraction process pushes plain-text data to the named pipe in chunks and in parallel, and tsload as a service reads the chunked data from the pipe and loads it into the Falcon destination table. The source data never persists to disk. After the load completes, the system deletes the named pipe from the staging directory.
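Here is a minimal Java sketch of a named pipe connecting a writer and a reader, assuming a Linux host with the mkfifo command available. The writer thread stands in for the JDBC extraction process and the reader stands in for tsload as a service; the class and method names are hypothetical, and a single writer replaces DataFlow's parallel extraction. Rows pass through the kernel's pipe buffer rather than a regular file, and the pipe is deleted once reading finishes.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration of pipe-mode staging; not DataFlow's implementation.
public class PipeModeSketch {
    // Streams rows through a named pipe (FIFO) at fifoPath and returns how many rows the reader consumed.
    // Assumes a POSIX system with the mkfifo command on the PATH.
    static long streamThroughFifo(Path fifoPath, List<String> rows, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try {
            Process mk = new ProcessBuilder("mkfifo", fifoPath.toString()).inheritIO().start();
            if (mk.waitFor() != 0) {
                throw new IOException("mkfifo failed for " + fifoPath);
            }
            try {
                // Writer thread: stands in for the JDBC extraction, pushing rows in chunks.
                CompletableFuture<Void> writer = CompletableFuture.runAsync(() -> {
                    try (Writer w = new OutputStreamWriter(
                            new FileOutputStream(fifoPath.toFile()), StandardCharsets.UTF_8)) {
                        for (int i = 0; i < rows.size(); i++) {
                            w.write(rows.get(i));
                            w.write('\n');
                            if ((i + 1) % chunkSize == 0) {
                                w.flush(); // hand one chunk to the reader
                            }
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                // Reader: stands in for tsload as a service, consuming rows as they arrive.
                long count;
                try (BufferedReader r = new BufferedReader(new InputStreamReader(
                        new FileInputStream(fifoPath.toFile()), StandardCharsets.UTF_8))) {
                    count = r.lines().count();
                }
                writer.join(); // surface any writer failure
                return count;
            } finally {
                Files.deleteIfExists(fifoPath); // remove the pipe from the staging directory
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Path fifo = Path.of(System.getProperty("java.io.tmpdir"),
                "extract-" + ProcessHandle.current().pid() + ".pipe");
        long n = streamThroughFifo(fifo, List.of("1,alice", "2,bob", "3,carol"), 2);
        System.out.println(n + " rows loaded; pipe still exists: " + Files.exists(fifo));
    }
}
```

Opening a FIFO blocks until both a reader and a writer are present, so the writer runs on its own thread while the main thread reads.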

When Snowflake is the DataFlow source, the default extraction type is bulk export, and the default extract mode is pipe. In this case, DataFlow stages the data in the Snowflake internal user stage and then exports it to the pipe.

You can use DataFlow sync properties to change the extraction mode from the default, pipe, to file. In file mode, the JDBC process extracts all data as plain text to a CSV file under the /export/xvdb1/large_files/diyotta_agent_stage_dir/ directory, and provides the file as input to tsload as a service. After the load completes, the system deletes the intermediate staging file.
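A minimal Java sketch of file-mode staging follows, assuming a caller-supplied loader in place of tsload as a service. This hypothetical example writes rows as plain-text CSV to a temporary file in a staging directory, hands the file to the loader, and deletes the file in a finally block so the intermediate data does not outlive the load. The CSV writing is deliberately naive (no quoting or escaping), and all names are illustrative.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.ToLongFunction;
import java.util.stream.Stream;

// Hypothetical illustration of file-mode staging; not DataFlow's implementation.
public class FileModeSketch {
    // Writes rows to a staging CSV, passes it to the loader, and always deletes the file afterwards.
    static long extractLoadAndClean(Path stageDir, List<List<String>> rows, ToLongFunction<Path> loader) {
        try {
            Path csv = Files.createTempFile(stageDir, "extract-", ".csv");
            try {
                try (BufferedWriter w = Files.newBufferedWriter(csv, StandardCharsets.UTF_8)) {
                    for (List<String> row : rows) {
                        w.write(String.join(",", row)); // naive CSV: no quoting or escaping
                        w.newLine();
                    }
                }
                return loader.applyAsLong(csv);
            } finally {
                Files.deleteIfExists(csv); // remove the intermediate staging file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in loader: counts the rows in the staged file.
    static long countLines(Path file) {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path stage = Path.of(System.getProperty("java.io.tmpdir"));
        long loaded = extractLoadAndClean(stage,
                List.of(List.of("1", "alice"), List.of("2", "bob")),
                FileModeSketch::countLines);
        System.out.println(loaded + " rows loaded");
    }
}
```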

For file sources such as Amazon S3, Azure Blob Storage, Google Cloud Storage (GCS), and HDFS, DataFlow downloads the source files to the staging directory /export/xvdb1/large_files/diyotta_agent_stage_dir/ and sends them to tsload as a service. After the load completes, the system deletes the downloaded files.

External database encryption

ThoughtSpot supports data encryption in transit for all connection types in DataFlow. For Azure Synapse, Databricks, Google BigQuery, MemSQL managed instances, and Snowflake, the system encrypts data by default. For more information, see the DataFlow encryption reference.