Although there is no native incremental load capability, you can do incremental loads by adding a filter on the source data. Filters only work if the source can provide information about the last successful load, which means the source should have a timestamp field that indicates when a particular row was modified.
If your source data has such a timestamp, then you can use the following filters:
modifiedDate > $LastRunDate
modifiedTime > $LastRunTime
The $LastRunDate and $LastRunTime filters are updated at runtime to reflect the date and time of the last successful job run. Both $LastRunDate and $LastRunTime are in GMT; this cannot be reconfigured.
You can use filters on any other column of any datatype, as long as the filter provides a way to identify the modified rows; for example, a monotonically increasing key can serve the same purpose, as in the sketch below. Even if the source does not have this capability, you may still be able to do an incremental load.
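A minimal sketch, assuming a hypothetical order_id column whose values only ever increase; the literal would be maintained by your job script, since there is no built-in variable for it:

order_id > 1000000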
As long as a primary key exists on the target ThoughtSpot table, the data is loaded using an upsert. This means that for a given row being loaded, if the primary key already exists in the target table, that row is updated with the data in the new row. If a row with that primary key does not exist, the new row is inserted. This is how ThoughtSpot handles data loads into a table that already contains rows, and it is equivalent to an incremental load.
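A minimal sketch in TQL of a target table that supports this behavior; the table and column names are hypothetical, and it is the PRIMARY KEY declaration that enables the upsert on subsequent loads:

-- Hypothetical target table; sale_id is the primary key.
CREATE TABLE sales (
    sale_id INT,
    region VARCHAR(32),
    amount DOUBLE,
    modified_date DATETIME,
    PRIMARY KEY (sale_id)
);

With this definition, a loaded row whose sale_id already exists in the table overwrites that row; a row with a new sale_id is inserted.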
If you are relying on the upsert method, you'll want to run a COMPACT TABLE command in ThoughtSpot SQL Command Line (TQL) to reduce the size of the table at intervals (usually weekly).
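For example, assuming the hypothetical sales table from the sketch above:

COMPACT TABLE sales;

This is intended to reclaim the space left behind by upserted rows; check the TQL reference for your release to confirm the exact syntax.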
Another technique that can effectively free up space is to run a post-job script that deletes data older than a particular date. This keeps the data fresh and ensures that old data, which may no longer be valuable, is not taking up memory.
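As a sketch, such a script could issue a TQL statement along these lines; the table, column, and cutoff date are hypothetical, and in practice the script would compute the cutoff at run time:

-- Hypothetical cleanup: remove rows older than the retention cutoff.
DELETE FROM sales WHERE modified_date < '2024-01-01';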