DataFlow security
Storing credentials for data sources
ThoughtSpot stores data source credentials in the DataFlow metadata repository (Postgres). We encrypt sensitive information, such as passwords and authentication tokens, using AES-128, and store the encrypted values with the metadata.
ThoughtSpot generates the key with classes and methods from the Java Cryptography Extension (JCE) package and encrypts in cipher block chaining (CBC) mode. We store the key as part of the DataFlow code itself.
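The following is a minimal sketch of AES-128 encryption in CBC mode using the JCE classes mentioned above. It is illustrative only; the key handling, padding choice, and storage format that DataFlow actually uses are internal to the product, and the key and password here are placeholders.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

// Sketch: AES-128 in CBC mode via the Java Cryptography Extension.
public class CredentialEncryptionSketch {

    public static void main(String[] args) throws Exception {
        // Generate a 128-bit AES key. DataFlow keeps a static key; one is
        // generated here only for the sake of a self-contained example.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        // CBC mode requires an initialization vector (IV).
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] encrypted = cipher.doFinal("example-password".getBytes(StandardCharsets.UTF_8));
        System.out.println("Encrypted (Base64): " + Base64.getEncoder().encodeToString(encrypted));

        // Decryption reverses the operation with the same key and IV.
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        System.out.println("Decrypted: " + new String(cipher.doFinal(encrypted), StandardCharsets.UTF_8));
    }
}
```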
If the ThoughtSpot cluster is deployed on AWS infrastructure, you can use AWS Vault to store the source passwords. Currently, AWS Vault is supported for Oracle, Netezza, Redshift, MySQL, PostgreSQL, and Amazon Aurora.
DataFlow has its own login credentials, consisting of a username and password. These credentials are stored in the DataFlow metadata repository, with the password encrypted using AES-128.
ThoughtSpot does not rotate the key; it remains static.
DataFlow metadata location
DataFlow stores all of its metadata in an embedded Postgres repository that is bundled with its installation. You start and shut down this repository service with the DataFlow enable and disable commands.
Securing DataFlow data in ThoughtSpot
By default, ThoughtSpot uses JDBC in pipe mode for data extraction through DataFlow for all relational data sources. In this process, the system creates a named pipe under the /export/xvdb1/large_files/diyotta_agent_stage_dir/ directory. The JDBC extraction process pushes plain-text data to the named pipe in chunks and in parallel, while tsload as a service reads the chunked data from the pipe and loads it into the Falcon destination table. The source data never persists to disk. After the load process completes, the system deletes the named pipe from the staging directory.
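To make the pipe-mode flow concrete, the sketch below creates a named pipe (FIFO) in a temporary staging directory, has a producer thread stand in for the JDBC extraction process writing chunks, and a consumer thread stand in for tsload as a service reading them. It assumes a Linux-style environment with the mkfifo command available; the pipe name and chunk contents are hypothetical, and this is not DataFlow code.

```java
import java.io.*;
import java.nio.file.*;

// Sketch of the pipe-mode extraction pattern: producer writes chunks to a FIFO,
// consumer streams them out, and the pipe is deleted after the load completes.
public class PipeModeSketch {

    public static void main(String[] args) throws Exception {
        Path stagingDir = Files.createTempDirectory("staging_dir_example");
        Path pipe = stagingDir.resolve("extract_pipe");

        // Create the named pipe (requires mkfifo, i.e. Linux or macOS).
        new ProcessBuilder("mkfifo", pipe.toString()).inheritIO().start().waitFor();

        // Producer: stands in for the JDBC extraction pushing chunked data.
        Thread producer = new Thread(() -> {
            try (BufferedWriter out = new BufferedWriter(new FileWriter(pipe.toFile()))) {
                for (int chunk = 1; chunk <= 3; chunk++) {
                    out.write("row data for chunk " + chunk + "\n");
                    out.flush();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        // Consumer: stands in for the loader reading from the pipe.
        Thread consumer = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(new FileReader(pipe.toFile()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("loaded: " + line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();

        // Remove the pipe once the load is done; no source data was written to disk.
        Files.delete(pipe);
        Files.delete(stagingDir);
    }
}
```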
When using Snowflake as the DataFlow source, the default extraction type is bulk export, and the default extract mode is pipe. In this case, DataFlow stages data to the Snowflake internal user stage and then exports it to the pipe.
You can override the extraction mode from the default pipe to file by using DataFlow sync properties. The JDBC process extracts all data as plain text to a csv file under the /export/xvdb1/large_files/diyotta_agent_stage_dir/ directory, and provides the file as input to tsload as a service. After the load process completes, the system deletes the intermediate staging file.
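The following sketch illustrates the file-mode flow described above: rows are extracted over JDBC into a plain-text csv file in a staging directory, the file is handed to a loader, and the intermediate file is then deleted. The JDBC URL, credentials, table, and loader step are placeholders, not DataFlow internals.

```java
import java.io.BufferedWriter;
import java.nio.file.*;
import java.sql.*;

// Sketch of file-mode extraction: JDBC -> staging csv -> load -> delete.
public class FileModeSketch {

    public static void main(String[] args) throws Exception {
        Path stagingDir = Paths.get("/tmp/staging_dir_example");   // hypothetical staging path
        Files.createDirectories(stagingDir);
        Path csvFile = stagingDir.resolve("extract.csv");

        String url = "jdbc:postgresql://source-host:5432/sales";   // placeholder source
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, amount FROM orders");
             BufferedWriter out = Files.newBufferedWriter(csvFile)) {

            // Write each row as a plain-text csv line.
            while (rs.next()) {
                out.write(rs.getLong("id") + "," + rs.getBigDecimal("amount"));
                out.newLine();
            }
        }

        loadIntoThoughtSpot(csvFile);   // stand-in for handing the file to tsload as a service
        Files.delete(csvFile);          // staging file is removed after the load completes
    }

    private static void loadIntoThoughtSpot(Path csvFile) {
        // Placeholder for the load step; in DataFlow this is handled by tsload as a service.
        System.out.println("loading " + csvFile);
    }
}
```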
With file sources like Amazon S3, Azure Blob Storage, GCS, and HDFS, DataFlow downloads the source files to the staging directory /export/xvdb1/large_files/diyotta_agent_stage_dir and sends them to tsload as a service. After the load process completes, the system deletes the downloaded files.
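For file sources, the pattern is download, load, delete. The sketch below shows this with Amazon S3 as the example source, using the AWS SDK for Java v2; the bucket, object key, and staging path are placeholders, and the load step is a stand-in for tsload as a service rather than DataFlow's actual mechanism.

```java
import java.nio.file.*;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

// Sketch of the file-source flow: download to staging, load, then delete.
public class FileSourceSketch {

    public static void main(String[] args) throws Exception {
        Path stagingDir = Paths.get("/tmp/staging_dir_example");   // hypothetical staging path
        Files.createDirectories(stagingDir);
        Path localCopy = stagingDir.resolve("sales_data.csv");

        try (S3Client s3 = S3Client.create()) {
            GetObjectRequest request = GetObjectRequest.builder()
                    .bucket("example-bucket")
                    .key("exports/sales_data.csv")
                    .build();
            s3.getObject(request, localCopy);   // download the source file into staging
        }

        loadIntoThoughtSpot(localCopy);   // stand-in for tsload as a service
        Files.delete(localCopy);          // downloaded file is removed after the load completes
    }

    private static void loadIntoThoughtSpot(Path file) {
        System.out.println("loading " + file);
    }
}
```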
External database encryption
ThoughtSpot supports data encryption in transit for all connection types in DataFlow. For Azure Synapse, Databricks, Google BigQuery, and Snowflake, the system encrypts data by default. For more information, see DataFlow encryption reference.
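As an illustration of encryption in transit on a JDBC source connection, the sketch below enables TLS using the PostgreSQL driver's SSL properties. The host, database, and credentials are placeholders, and the property names are specific to the pgJDBC driver; other connectors expose their own TLS settings, which are covered in the DataFlow encryption reference.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

// Sketch: TLS-enabled JDBC connection to a PostgreSQL source.
public class TlsConnectionSketch {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "dataflow_user");       // placeholder credentials
        props.setProperty("password", "secret");
        props.setProperty("ssl", "true");
        props.setProperty("sslmode", "require");          // refuse unencrypted connections

        String url = "jdbc:postgresql://source-host:5432/sales";   // placeholder source
        try (Connection conn = DriverManager.getConnection(url, props)) {
            System.out.println("TLS connection established: " + !conn.isClosed());
        }
    }
}
```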