Constraints

Constraints allow you to build relationships and join tables.

Constraints include primary keys, foreign keys, and relationships. Relationships allow you to create a generic relationship for use when you want to join tables that don’t have a primary key/foreign key relationship.

Defining a generic relationship in the UI rather than using a primary key/ foreign key join through TQL has no impact on performance. However, when creating relationships in the UI, you must ensure that you create it in the right direction: many to one. To create many-to-many joins, or to create joins using >, <, >=, or <=, use TQL.

Primary keys

When a primary key is selected for a table, it impacts data loading behavior. When a new row is added:

  • If another row already exists with the same primary key, it is updated with the values in the new row.

  • If a row with the same primary key does not exist already, the new row is inserted into the table.

This behavior is referred to as “upsert” because it does an INSERT or an UPDATE, depending on whether a row with the same primary key already exists.

Note that ThoughtSpot does not check for primary key violations across different shards of the table. Therefore, you need to shard the table on the primary key columns if you require this “upsert” behavior.

Permitted joins and necessary permissions

You can join any data object to another object, with the following 2 restrictions:

  • You can only join a worksheet to a user-uploaded CSV table.

  • You cannot join objects across connections. This means that you cannot join an object that is part of one connection to an object in a different connection.

Foreign key relationships

Foreign key relationships tell ThoughtSpot how two tables can be joined. These relationships are only used for joining the tables, and not for referential integrity constraint checking.

The directionality of primary key - foreign key relationships is important. The foreign key relationship is defined on the fact table and references the primary key(s) in the dimension table. So you can think of the fact table as the source and the dimension table as the target. In the schema viewer, notice that the arrow that represents a PK/FK join points to the dimension table.

If you use primary and foreign keys, when users search the data from the search bar, tables are automatically joined. For example, assume there are two tables:

  • revenue, which is a fact table

  • region, which is a dimension table

There is a foreign key on the fact table on regionid which points to the id in the region dimension table. When a user types in "revenue by region", the two tables will be joined automatically.

Foreign keys have to match the primary key of the target table they refer to. So if there are multiple columns that make up the primary key in the target table, the foreign key must include all of them, and in the same order.

Generic relationships (many-to-many)

You may have a schema where there is a fact table that you want to join with another fact table. If there isn’t a primary key/foreign key relationship between the tables, you can use many-to-many to enable this. You can do this by using the RELATIONSHIP syntax to add a link between them, that works similarly to the WHERE clause in a SQL join clause.

A many-to-many implementation may lead to overcounting. We recommend that you avoid using aggregation or count formulas in your search; They may count some rows multiple times, because the data satisfies the join condition for multiple rows.
We recommend that you avoid using many-to-many joins. When you have a use case that depends on joining two fact tables, we recommend that you create a bridge table between them, as a chasm trap. If the fact tables share common data, use it to create the dimension table that acts as a bridge. For example, a date or product dimension could join an inventory fact table to a sales fact table.

The generic join is a special kind of relationship that applies to specific data models and use cases. For example, suppose you have a table that shows wholesale purchases of fruits, and another table that shows retail fruit sales made, but no inventory information. In this case, it would be of some use to see the wholesale purchases that led to sales, but you don’t have the data to track a single apple from wholesale purchase through to sale to a customer.

In a many-to-many relationship, the value(s) in a table can be used to join to a second table, using an equality condition (required) and one or more range conditions (optional). These conditions act like the WHERE clause in a SQL JOIN clause. They are applied using AND logic, such that all conditions must be met for a row to be included.

To use a many-to-many relationship, you need to follow a few rules:

  • There must be one equality condition defined between the two tables.

  • Each table must be sharded on the same key as the equality condition.

  • There can optionally be one or more range conditions defined.

This example shows the TQL statements that create the two fact tables and the relationship between them.

TQL> CREATE TABLE "wholesale_buys" (
  "order_number" VARCHAR(255),
  "date_ordered" DATE,
  "expiration_date" DATE,
  "supplier" VARCHAR(255),
  "fruit" VARCHAR(255),
  "quantity" VARCHAR(255),
  "unit_price" DOUBLE
  )  PARTITION BY HASH (96) KEY ("fruit");

TQL> CREATE TABLE "retail_sales" (
  "date_sold" DATE,
  "location" VARCHAR(255),
  "vendor" VARCHAR(255),
  "fruit" VARCHAR(255),
  "quantity" VARCHAR(255),
  "sell_price" DOUBLE
  )  PARTITION BY HASH (96) KEY ("fruit");

TQL> ALTER TABLE "wholesale_buys" ADD RELATIONSHIP WITH "retail_sales"
     AS "wholesale_buys"."fruit" = "retail_sales"."fruit"
     AND ("wholesale_buys"."date_ordered" < "retail_sales"."date_sold"
          AND "retail_sales"."date_sold" < "wholesale_buys"."expiration_date");

Related information