Sync data through Google Cloud Storage connection

After using ThoughtSpot DataFlow to establish a connection to a Google Cloud Storage file system, you can create automatic data updates, to seamlessly refresh your data.

To sync your data, perform these tasks:

Select file

  1. Immediately after creating a new connection, the connection detail page appears.

    You can fill out this information immediately, or return to it at a later stage, by clicking on the connection name in the list of connections.

  2. On the connection detail page, click Add table sync.

  3. In the Add table sync interface, on the database selector, scroll to find the source file, and click to select it.

    You can search for the file by name.

    If you use multiple files, you have to repeat these steps for each file.

  4. In the file detail display, adjust the properties.

    Before confirming that you plan to sync a file, examine it to ensure that it has the right information: the correct data types, and reasonable sample data.

    For wide files, scroll to the right to see all columns.

File format

Specify the format of the source file.

The options are Delimited, Parquet, JSON, LDJSON, ORC, XML New, and Avro New.

Column separator

Specify the character that signals start of new column.

The options are Comma(,) (default), Semicolon(;), Pipe(|), Space, Tab, or Other.

Header row

Specify if file uses a header row.

Compression

Specify if file uses compression.

The options are None (default), or gzip

  1. Click Set up sync.

Specify sync schedule

Immediately after creating a new connection, the connection detail page appears.

When you create a sync schedule, you have the following scheduling options:

After you specify the sync schedule, click Save and sync now. This action saves the schedule, and starts the data sync immediately.

Alternatively, if you click Save, the system saves the schedule, but does not sync the data. This gives you the opportunity to fine-tune the column mapping between external data sources and tables inside ThoughtSpot, such as naming, visibility, data type conversion, conditions, and so on. The first sync starts on the specified schedule.

Minute(s) sync

Repeat

Select minute(s).

Mandatory field.

Starts

Accept the defaults, or set your own starting date and time, and end time(until) for syncing.

Note that the timezone is in UTC+00:00.

Mandatory fields.

Run the task every X minute(s)

Select the frequency of the sync.

The valid range is from 1 (default) to 59 minutes.

Mandatory field.

Alert notifications

Specify when to notify of the sync status:

  • Failures (default),

  • Success and Failures

  • Never.

    Email notifications

    Write a list of recipients to whom success and failure notifications should be sent. Multiple email IDs can be specified as comma-separated values. You must provide this parameter to send an email for both failure and success scenarios. Email notification for sync scheduled every 1 minute

Hourly sync

Repeat

Select hourly.

Mandatory field.

Starts

Accept the defaults, or set your own starting date and time for syncing.

Note that the timezone is in UTC+00:00.

Mandatory fields.

Run the task every X hour(s)

Select the frequency of the sync.

The valid range is from 1 (default) to 12 hours.

Mandatory field.

Alert notifications

Specify when to notify of the sync status:

  • Failures (default),

  • Success and Failures

  • Never.

    Email notifications

    Write a list of recipients to whom success and failure notifications should be sent. Multiple email IDs can be specified as comma-separated values. You must provide this parameter to send an email for both failure and success scenarios. hourly

Daily sync

Repeat

Select daily.

Mandatory field.

Starts

Accept the defaults, or set your own starting date and time for syncing.

Note that the timezone is in UTC+00:00.

Mandatory fields.

Alert notifications

Specify when to notify of the sync status:

  • Failures (default),

  • Success and Failures

  • Never.

    Email notifications

    Write a list of recipients to whom success and failure notifications should be sent. Multiple email IDs can be specified as comma-separated values. You must provide this parameter to send an email for both failure and success scenarios. Success and failure notifications

Weekly sync

Repeat

Select weekly.

Mandatory field.

Starts

Accept the defaults, or set your own starting date and time for syncing.

Note that the timezone is in UTC+00:00.

Mandatory fields.

Days of the week

Select the days of the week when you want to sync.

For example, you may select only Monday, or only Monday through Friday.

Mandatory field.

Alert notifications

Specify when to notify of the sync status:

  • Failures (default),

  • Success and Failures

  • Never.

    Email notifications

    Write a list of recipients to whom success and failure notifications should be sent. Multiple email IDs can be specified as comma-separated values. You must provide this parameter to send an email for both failure and success scenarios. Success and failure notifications

Monthly sync

Repeat

Select monthly.

Mandatory field.

Starts

Accept the defaults, or set your own starting date and time for syncing.

Note that the timezone is in UTC+00:00.

Mandatory fields.

Day of the Month

Select the days of the month when you want to sync.

Mandatory field.

There are two basic approaches:

Cardinal day

Click the Day selector, and choose by date of the month from drop down menu.

For example, select 15 to run sync on 15th of each month.

Ordinal day

Click the The selector, an choose one of First(default), Second, Third, Fourth, or Last. Then choose one of the days of the week, Sunday through Saturday.

Alert notifications

Specify when to notify of the sync status:

  • Failures (default),

  • Success and Failures

  • Never.

    Email notifications

    Write a list of recipients to whom success and failure notifications should be sent. Multiple email IDs can be specified as comma-separated values. You must provide this parameter to send an email for both failure and success scenarios. Success and failure notifications

Does not repeat

Repeat

Select Does not repeat.

Mandatory field.

Alert notifications

Specify when to notify of the sync status:

  • Failures (default),

  • Success and Failures

  • Never.

    Email notifications

    Write a list of recipients to whom success and failure notifications should be sent. Multiple email IDs can be specified as comma-separated values. You must provide this parameter to send an email for both failure and success scenarios. Success and failure notifications

Map tables

To map the external files to ThoughtSpot’s internal database, follow these steps:

  1. Open the Map tables and columns interface by clicking the toggle to open.

  2. The Internal ThoughtSpot storage opens by default.

  3. Notice that the filename and its path appear, under Filename.

    You can change the name of the file.

  4. Specify the following information for internal ThoughtSpot storage:

    ThoughtSpot database

    Select an existing ThoughtSpot database from the dropdown menu. If you wish to create a new database in ThoughtSpot, click TQL Editor, enter the CREATE DATABASE my_database; command, and click Execute.
    Mandatory field.

    ThoughtSpot schema

    Select an existing ThoughtSpot schema from the dropdown menu. If you wish to create a new schema in your ThoughtSpot database, click TQL Editor, enter the CREATE SCHEMA my_schema; command, and click Execute.
    Mandatory field.

    New or Existing table

    Mandatory field.

    Create a new table

    Choose this option when you want to load data into a new table.

    The system creates a new table automatically.

    If you want the new table to have some special properties, create the table using the TQL Editor, proceed to Choose existing table, and then select the table you just created.

    Click TQL Editor, enter the CREATE TABLE my_table ...; command, and click Execute.

    Specify the table name.

    Choose existing table

    Choose this option to load data into a table that already exists inside ThoughtSpot.

    Select the table name from the dropdown menu.

ThoughtSpot table

The name of the target table for data sync, inside ThoughtSpot.
Mandatory field.

Map columns

To map the columns of the external tables to columns in ThoughtSpot’s internal tables, follow these steps:

  1. Scroll down to the Map the columns…​ section.

  2. Specify the following information for columns:

    Adjust the file parsing properties as necessary: Column separator, Header row, and Compression.

    Select (or deselect) columns for syncing into ThoughtSpot.

    By default, all columns are selected.

    • Search for columns by name; this is very useful for very wide tables.

    • Rename columns in the ThoughtSpot table, for easier search.
      To make this change, click the pencil (edit) icon next to the name of the column.

    • Change the data type of the column inside the ThoughtSpot table.

      Click the down chevron icon to open the dropdown menu, and select a new data type.

      For example, if you know you have integer data, change the default DOUBLE datatype to INT32.

    • Set Primary keys of the table by toggling the selector to the 'on' position.

      Note that several columns may be primary keys. - Set the Sharding keys of the table by toggling the selector to the 'on' position. - Specify the Number of shards in the table.

  3. Save your work by clicking Save.

    Alternatively, click Save and sync now to save your work and sync data at the same time.

Set sync properties

Basic sync properties

To set the sync properties, follow these steps:

  1. Open the Advanced setup interface by clicking the toggle to open.

  2. Choose the Sync properties tab.

  3. Under Set sync properties, specify the following information:

    Sync mode

    Choose the sync mode, either Append or Overwrite.

    Mandatory field.

    For Append or Overwrite, when the table has a primary key, ThoughtSpot uses UPSERT to update the new rows to the table.
    Append

    This option adds new rows to the table.

    Overwrite

    This option removes all existing rows, and then adds new rows to the table.

    Default:

    Sync mode defaults to append.

    Completion

    Specify what to do with the file after the sync successfully completes.

    One of the following options: do nothing, delete the file, or move the file to specified location: directory/path

    When you select move it to, the default path is empty. In the text field, provide the path to the location the sync file should move to.

  4. Save your work by clicking Save.

    Alternatively, click Save and sync now to save your work and sync data at the same time.

Scripting for sync

To use scripting before or after the sync, follow these steps:

  1. Open the Advanced setup interface by clicking the toggle to open.

  2. Choose the Sync properties tab.

  3. Scroll down to Advanced configuration, and click the toggle to expand.

  4. Specify the following information:

    Wait for file

    Specify the fully-qualified name of the file (directory path and file name) that must be present before sync can start.

    ThoughtSpot deletes the file after the sync process completes.
    Pre-script

    Add the script that you want to run before syncing.

    Optional field.

    Example:

    Drop data that is over 1 year old; to find the correct functions for the script, click Expression editor, use it to create a valid command (or several commands), and click OK.

    DELETE from FACT_TABLE Where DATE < ADD_DAYS(TODAY(),-365)
    Post-script

    Add the script that you want to run after syncing.

    Optional field.

    Example:

    Drop data that is over 1 year old; to find the correct functions for the script, click Expression editor, use it to create a valid command (or several commands), and click OK.

    DELETE from FACT_TABLE Where DATE < ADD_DAYS(TODAY(),-365)
  5. Save your work by clicking Save.

    Alternatively, click Save and sync now to save your work and sync data at the same time.

Connector-specific sync properties

To modify connection-specific properties for sync, follow these steps:

  1. Open the Advanced setup interface by clicking the toggle to open.

  2. Choose the Sync properties tab.

  3. Scroll down to Sync connector properties, and click the toggle to expand.

  4. Specify the sync properties for files:

    See Sync properties for details, defaults, and examples.

    File name

    Specify name of the file.

    Enable archive on success

    Specify if file needs to be archived once it is succeeded.

    Optional field.

    Delete file on success

    Specify if file needs to be deleted after execution is successful.

    Optional field.

    Column delimiter

    Specify the column delimiter character.

    Skip header rows

    Skip the number of header rows specified while loading the data.

    Optional field.

    Compression

    Specify this if the file is compressed and what kind of compressed file it is.

    Mandatory field.

    Row delimiter

    Specifies the character to be used to indicate the end of the row in the extracted data.

    Enclosing character

    Specify if the text columns in the source data is enclosed in quotes.

    Escape character

    Specify the escape character if using a text qualifier in the source data.

    Null value

    Specifies the string literal indicates the null value for a column. During the data load, the column value matching this string will be loaded as null in the target.

    Date style

    Specifies how to interpret the date format.

    Optional field.

    Date delimiter

    Specifies the separator used in the date format.

    Time style

    Specifies the format of the time portion in the data.

    Time delimiter

    Specifies the character used as separate the time components.

    Decimal delimiter

    Specify the decimal delimiter for float/double/numeric data types.

    Skip trailer rows

    Skip the number of trailer rows specified while loading the data.

    Optional field.

    TS load options

    Specify the parameters passed with the tsload command, in addition to the commands already included by the application.

    Boolean representation

    Specifies the representation of data in the boolean field.

    Optional field.

  5. Save your work by selecting Save.

    Alternatively, select Save and sync now to save your work and sync data at the same time.


Related information


Was this page helpful?