Calculated fields
Overview
A field is a column in a database table. A calculated field is a field that applies additional logic to existing database fields, which lets you create new data from your existing data.
A calculated field either:
- performs some calculation on database fields to create a value that is not directly stored in the database, or
- selects values in database fields based on some customized criteria.
When to use calculated fields
Calculated fields give you more flexibility and efficiency in your analyses.
- Power multiple visual analyses with one query - You can now create multiple calculations on top of one Helix-powered query. Whether you’re exploring your data or answering follow-up questions from your stakeholders, you no longer have to revisit your SQL queries multiple times.
- Take full advantage of filtering and drill down features - Previously, some aggregations of pre-aggregated SQL fields led to incorrect results.
- Empower non-SQL stakeholders - People who aren’t familiar with SQL but are familiar with similar tools like Tableau can now answer their own questions.
Some common scenarios for when you’d want to use calculated fields include:
- The metrics you need for your analysis are not directly stored in your data warehouse.
- You want to transform values for your visualization.
- You want to quickly aggregate or filter your data.
Creating a calculated field
- Click the app switcher icon in the top navigation bar and select Analyst Studio, then sign in to your Workspace.
- Click the green `+` in the upper right-hand corner to create a new report.
- Run a SQL query. (It can be as simple as `SELECT * FROM table`.)
- Create a new chart.
- Click the New field button to open the calculated field formula editor.
- From the chart builder, click the New field icon to open the calculated field formula editor.
- Type in a name and formula for your calculated field. This example uses the formula `SUM(CASE [Status] WHEN 'CANCELLED' THEN 1 ELSE 0 END)/SUM(1)`. This formula checks whether the order status was cancelled: it tallies the cancelled orders and divides by the total number of rows to calculate the cancellation rate. To see the full list of functions Analyst Studio currently supports, open the panel on the right-hand side.
- When you’re done, hit Apply or Done.
You have now created your first calculated field. You should see it in your fields list, with an equal sign (=) next to the data type icon to indicate that it is a calculated field.
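To make the arithmetic behind the example formula concrete, here is a minimal Python sketch of the same calculation. It is an illustration only, not how Analyst Studio evaluates formulas. The `orders` list and the `cancellation_rate` helper are hypothetical names introduced for this sketch.

```python
def cancellation_rate(statuses):
    """Mimic SUM(CASE [Status] WHEN 'CANCELLED' THEN 1 ELSE 0 END) / SUM(1)."""
    # CASE ... WHEN 'CANCELLED' THEN 1 ELSE 0 END, summed over all rows
    cancelled = sum(1 if status == "CANCELLED" else 0 for status in statuses)
    # SUM(1) adds 1 for every row, which is the row count
    total_rows = sum(1 for _ in statuses)
    return cancelled / total_rows


# Hypothetical status column with four rows, two of them cancelled
orders = ["COMPLETED", "CANCELLED", "COMPLETED", "CANCELLED"]
rate = cancellation_rate(orders)
print(rate)  # 0.5
```

The sketch assumes at least one row. With an empty input the division would fail, and how the formula editor handles that case is not covered here.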
Using a calculated field
In charts
You can chart your calculated field just as you could a SQL-generated field, by selecting and dragging the field into your chart menu.
In filters
You can also filter your calculated field just as you could a SQL-generated field.
Calculated field best practices
Calculation building blocks
These are the four basic components that make up any calculated field:
- Fields - columns from your data source; each can be either a dimension or a measure.
- Operators - symbols that denote a certain operation, like `+` and `-`.
- Functions - transform the given input to an expected output, like `COUNT()` and `SUM()`.
- Literal expressions - constant values that are represented as is. This includes numbers (`1`), strings (`"This is a string"`), dates (`#2020-06-01#`), booleans (`true`), and `null`.
Additionally, calculated fields can also contain:
- Parameters - fixed values that functions expect as input, such as `'week'` in `DATEPART()`.
- Comments - notes or commentary about the calculation that will not be included in the computation. Comments in calculated fields are always marked by a prepended `//`.
Field properties
| Property | Description | In Analyst Studio |
|---|---|---|
| Dimension | Fields that are used to slice and describe data records (for example, names, dates). | |
| Measure | Typically, the values corresponding to the dimension that will be aggregated (for example, sum, count, average). | |
| Discrete | Values in the Dataset are distinct and separate. These fields are indicated in Analyst Studio with blue icons. | |
| Continuous | Values in the Dataset can take on any value within a finite or infinite range. These fields are indicated in Analyst Studio with green icons. | |
Available operators
| Precedence | Symbol | Name | Description | Example |
|---|---|---|---|---|
| 1 | `-` (negate) | Negate | Negates the numeric input. | |
| 2 | `*` | Multiplication | Multiplies two numeric types together. | |
| 3 | `/` | Division | Divides the first numeric input by the second numeric input. | |
| 4 | `+` | Addition | Adds two numeric types together. | |
| 4 | `-` | Subtraction | Subtracts two numeric types. | |
| 5 | `=` | Equal to | Compares two numbers, dates, or strings, and returns either TRUE, FALSE, or NULL. | |
| 5 | `>` | Greater than | Compares two numbers, dates, or strings, and returns either TRUE, FALSE, or NULL. | |
| 5 | `<` | Less than | Compares two numbers, dates, or strings, and returns either TRUE, FALSE, or NULL. | |
| 5 | `>=` | Greater than or equal to | Compares two numbers, dates, or strings, and returns either TRUE, FALSE, or NULL. | |
| 5 | `<=` | Less than or equal to | Compares two numbers, dates, or strings, and returns either TRUE, FALSE, or NULL. | |
| 5 | `<>` | Not equal to | Compares two numbers, dates, or strings, and returns either TRUE, FALSE, or NULL. | |
| 6 | `NOT` | Not | Negates the boolean or expression. | |
| 7 | `AND` | And | An expression or boolean must evaluate to TRUE on both sides of the AND. | |
| 8 | `OR` | Or | An expression or boolean must evaluate to TRUE on at least one side of the OR. | |
Precedence dictates the order in which operators will be evaluated in a formula. Parentheses can be used to change the order of precedence.
Available functions
Number
| Function | Description | Examples |
|---|---|---|
| | Returns the absolute value of the given number. | |
| | Rounds a number to the nearest integer of greater than or equal value. | |
| | Returns e raised to the power of the given number, where e is Euler’s number 2.718… | |
| | Rounds a number to the nearest integer of less than or equal value. | |
| | Returns the base 10 logarithm of a number. | |
| | Returns the natural logarithm of a number, where the base is Euler’s number e. | |
| | Divides the first number by the second number and returns their remainder. | |
| | Returns the base raised to the inputted exponent power. | |
| | Returns the number rounded to the nearest specified decimal place. | |
| | Returns the square root of the given number. | |
| | Returns the number cut off to the specified decimal place. | |
| | Returns the given expression if not | |
String
| Function | Description | Examples |
|---|---|---|
| | Returns TRUE if the substring is within the string, otherwise returns FALSE. | |
| | Returns the index position of substring in string or 0 if the substring isn’t found. First character of the string is at position 1. Start is an optional argument to define from where to start the search. | |
| | Extracts the left-most count characters. | |
| | Returns string with all characters lower-cased. | |
| | Removes any spaces from the left side of the string. | |
| | Splits the string along the separator/delimiter, returning the string at the corresponding token index. | |
| | Replaces all occurrences of the search string with the replace string. | |
| | Extracts the right-most count characters. | |
| | Removes any spaces from the right side of the string. | |
| | Returns the substring beginning at start. Note that a start value of 1 refers to the first character of the string. If length is provided, the returned substring will include that number of characters at most. | |
| | Removes any spaces from either side of the string. | |
| `UPPER(<string>)` | Returns string with all characters upper-cased. | |
|
Datetime
| Function | Description | Examples |
|---|---|---|
| | Adds the specified datepart to the given datetime, where | |
| | Finds the difference between the two datetimes expressed in units of the given datepart. In the examples on the right, the first expression returns 0 because the two dates are in the same month. The second expression returns 1 because the second date is in a new month, even though the two dates are not 30 days apart. | |
| | Returns the specified part of the given datetime expression as a number. | |
| | Returns a date value equal to the given datetime expression truncated to the specified precision. | |
| | Returns the current datetime. | |
| | Returns the current date. | |
Possible `<datepart>` values include:
- second
- minute
- hour
- day
- week
- weekday
- month
- dayofyear
- quarter
- year
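The date-difference description above counts how many month boundaries are crossed rather than how many 30-day periods have elapsed. The following Python sketch illustrates that reading for the `month` datepart only. The helper name `datediff_month` and the sample dates are assumptions made for this illustration, and the sketch is not a statement of how Analyst Studio implements the function.

```python
from datetime import date


def datediff_month(start, end):
    """Count the month boundaries crossed between two dates."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Hypothetical dates: same calendar month, 29 days apart
same_month = datediff_month(date(2020, 6, 1), date(2020, 6, 30))
# Hypothetical dates: one day apart, but in different calendar months
next_month = datediff_month(date(2020, 6, 30), date(2020, 7, 1))

print(same_month)  # 0
print(next_month)  # 1
```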
Week Start Day customization
The Week Start Day option in the context menu for date fields can be used to customize the week start day to be any day of the week. The default is Sunday. This selection will also be reflected in the +/- granularity controls on the chart.
Week Start Day customization in Quick Charts
Week Start Day customization in Visual Explorer
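As a rough illustration of what changing the week start day means for weekly grouping, the Python sketch below truncates a date to the start of its week for a configurable start day. The function name `truncate_to_week` and its `week_start` parameter are hypothetical. Python numbers weekdays from Monday = 0 to Sunday = 6, so the default of 6 mirrors the Sunday default described above.

```python
from datetime import date, timedelta


def truncate_to_week(d, week_start=6):
    """Return the first day of the week containing `d`.

    `week_start` uses Python's convention: Monday = 0 ... Sunday = 6.
    """
    # Number of days since the most recent occurrence of the start day
    offset = (d.weekday() - week_start) % 7
    return d - timedelta(days=offset)


wednesday = date(2020, 6, 3)
print(truncate_to_week(wednesday))                # 2020-05-31 (week starts Sunday)
print(truncate_to_week(wednesday, week_start=0))  # 2020-06-01 (week starts Monday)
```

The same date can therefore land in different weekly buckets depending on the setting, which is why the chart granularity controls follow the selection.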
Year Start customization
The Year Start option in the context menu for date fields in Quick Charts and Visual Explorer can be used to customize the start of the year to be any month of the year. The default is January. This selection will also be reflected in the +/- granularity controls on the chart. The year start can be adjusted in visualization filters to match the chart by using the settings gear icon in the filter modal.
Type conversion
| Function | Description | Examples |
|---|---|---|
| | Convert expression to YYYY-MM-DD date format. Returns | |
| | Convert expression to YYYY-MM-DD HH:MM:SS format. Returns | |
| | Convert the given expression to an integer. The results are rounded towards zero. | |
| | Convert the given expression to a floating point number. | |
Logical
| Function | Description | Examples |
|---|---|---|
| | Returns TRUE if and only if both expressions are true. | |
| | Performs a series of logical tests for equality and returns the value of the test that first evaluated to true. | |
| | Performs a series of logical tests, not necessarily always for equality, and returns the value of the test that first evaluated to true. | |
| | Returns TRUE as long as one of the expressions is true. | |
| | Returns TRUE if | |
| | Returns | |
Aggregate
| Function | Description | Example |
|---|---|---|
| | Averages the values of items in a group, not including | |
| | Counts the total number of items in a group, not including | |
| | Counts the total number of distinct items in a group, not including | |
| | Returns the excess kurtosis of all input values. | |
| | Computes the item in the group with the largest numeric value. | |
| | Computes the median of an expression, which is the value that the values in the expression are below 50% of the time. | |
| | Computes the item in the group with the smallest numeric value. | |
| | Returns the most frequent value for the values within x. | |
| | Computes the 1st percentile within an expression, which is the value that the values in the expression are below 1% of the time. | |
| | Computes the 5th percentile within an expression, which is the value that the values in the expression are below 5% of the time. | |
| | Computes the 25th percentile within an expression, which is the value that the values in the expression are below 25% of the time. | |
| | Computes the 75th percentile within an expression, which is the value that the values in the expression are below 75% of the time. | |
| | Computes the 95th percentile within an expression, which is the value that the values in the expression are below 95% of the time. | |
| | Computes the 99th percentile within an expression, which is the value that the values in the expression are below 99% of the time. | |
| | Returns the skewness of all input values. | |
| | Returns the standard deviation of all values in the given expression based on a sample of the population. | |
| | Returns the standard deviation of all values in the given expression based on the entire population. | |
| | Sums the total number of items in a group, not including | |
| | Returns the variance of all values in the given expression based on a sample of the population. | |
| | Returns the variance of all values in the given expression based on the entire population. | |
Analytical
| Function | Description | Examples |
|---|---|---|
| | Returns the number of rows from the current row to the first row of the partition. | |
| | Returns the index of the current row in the partition. | |
| | Returns the number of rows from the current row to the last row of the partition. | |
| | Returns the value of the expression in a target row and can be specified as a relative offset number from the current row. | |
| | Distributes the rows in an ordered partition into the specified (integer) number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs. The default order is descending. | |
| | Returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question. The default order is descending. | |
| | Returns the rank of each row within a result set partition, with no gaps in the ranking values. The rank of a specific row is one plus the number of distinct rank values that come before that specific row. The default order is descending. | |
| | Returns the running average of the given expression, from the first row in the partition to the current row. The given expression must be either an aggregate or a constant. | |
| | Returns the running count of the given aggregate expression, from the first row in the partition to the current row. The given expression must be either an aggregate or a constant. | |
| | Returns the running sum of the given aggregate expression, from the first row in the partition to the current row. The given expression must be either an aggregate or a constant. | |
| | Returns the total for the given expression, calculated using all rows within that partition. | |
| | Returns the average of the given expression within the window. The window is defined by means of offsets from the current row. The given expression must be either an aggregate or a constant. | |
| | Returns the count of the given expression within the window. The window is defined by means of offsets from the current row. The given expression must be either an aggregate or a constant. | |
| | Returns the sum of the given expression within the window. The window is defined by means of offsets from the current row. The given expression must be either an aggregate or a constant. | |
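The difference between the two ranking descriptions (with and without gaps) and the running-sum behavior can be easier to see on a small example. The Python sketch below follows the descriptions in the table, with descending order as the default. The helper names and the `totals` data are hypothetical, and the sketch treats the whole list as a single partition.

```python
def rank_desc(values):
    """Rank with gaps: one plus the number of rows with a larger value."""
    return [1 + sum(1 for other in values if other > v) for v in values]


def dense_rank_desc(values):
    """Rank without gaps: one plus the number of distinct larger values."""
    return [1 + len({other for other in values if other > v}) for v in values]


def running_sum(values):
    """Sum from the first row of the partition up to the current row."""
    total = 0
    result = []
    for v in values:
        total += v
        result.append(total)
    return result


# Hypothetical aggregated values, one per row of the partition
totals = [50, 80, 80, 20]
print(rank_desc(totals))        # [3, 1, 1, 4]
print(dense_rank_desc(totals))  # [2, 1, 1, 3]
print(running_sum(totals))      # [50, 130, 210, 230]
```

The two tied rows share rank 1 in both cases. The gapped ranking then skips to 3, while the dense ranking continues with 2.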
💡 For calculated field window functions, it will be helpful to understand how window partitions are defined.
SQL allows you to perform aggregations at different levels of the view using window functions, generally written as `OVER (PARTITION BY column)`.
Window functions also exist in calculated fields, though the way you define window partitions is different.
- Instead of specifying the partition directly in the formula code, you’d drag and drop the field into your chart axis along with your window calculated field. The system will automatically re-calculate the values depending on your dimension.
- For moving windows, you’d specify a `<start>` and `<end>` relative to the current row.
  - In general, `-n` refers to the nth row before the current row, and `n` refers to the nth row after the current row.
  - You can also create offsets based on the first or last rows in the expression, using `FIRST()+n` and `LAST()-n`.
    - `FIRST()` always returns `-1` for the second row, `-2` for the third row, etc.
    - `LAST()` always returns `1` for the second-to-last row, `2` for the third-to-last row, etc.

For example, the formula for a window sum that spans from three rows before the current row to two rows after it would be `WINDOW_SUM(SUM([field]), -3, 2)`.
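To illustrate how `<start>` and `<end>` offsets select rows, here is a small Python sketch of a moving window sum. It is an approximation for intuition only. The function names are hypothetical, and clipping the window at the edges of the partition is an assumption of this sketch, not documented behavior.

```python
def window_sum(values, start, end):
    """Sum the values from offset `start` to offset `end` around each row."""
    n = len(values)
    result = []
    for i in range(n):
        lo = max(0, i + start)    # assumption: clip at the first row
        hi = min(n - 1, i + end)  # assumption: clip at the last row
        result.append(sum(values[lo:hi + 1]))
    return result


def first_offset(i):
    """Offset from row i back to the first row: 0, -1, -2, ..."""
    return -i


def last_offset(i, n):
    """Offset from row i forward to the last row: ..., 2, 1, 0."""
    return n - 1 - i


# Hypothetical aggregated values, one per row of the partition
daily = [1, 2, 3, 4, 5, 6]
print(window_sum(daily, -3, 2))  # [6, 10, 15, 21, 20, 18]
```

For the fourth row (value 4), the window covers the three rows before it and the two rows after it, so all six values are summed to give 21.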
Calculated field component types
Unlike your SQL results, which are always constants, calculated fields have different computation levels:
| Order | Type | Description | Examples |
|---|---|---|---|
| 1 | Constant | A fixed value. | |
| 2 | Scalar | Values are mapped to a single result in a one-to-one manner. | |
| 3 | Aggregate | Values of multiple rows are grouped together as the input to form a single value of more significant meaning. | |
| 4 | Analytical | Computes aggregate values over a group of rows. | |
Component operations
You can combine various component types in one operation.
Example:
- `1 + [column]` will add 1 to every row in your column. The result of that operation takes the greatest order of the combined component types: `constant + scalar` returns a `scalar` result.
- `1 + SUM([column])` will add 1 to the aggregated sum of your column. By the same rule, `constant + aggregate` returns an `aggregate` result.

However, there are limitations to what calculated fields you can use in functions.
Non-examples:
- Aggregating an aggregate - `SUM(COUNT([column]))` ❌
- Mixing aggregate and non-aggregate values in certain functions - `DATEDIFF('day', created_at, MAX(updated_at))` ❌
- Using scalar values in an analytical function - `RUNNING_COUNT([id])` ❌
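The rules in this section can be summarized as a small type check. The Python sketch below is one reading of them, not the product's actual validator. The `Level` enum and the `combine`, `aggregate`, and `analytical` helpers are hypothetical names. Rejecting every scalar and aggregate mix is an assumption drawn from the non-examples above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Computation levels, numbered as in the component types table."""
    CONSTANT = 1
    SCALAR = 2
    AGGREGATE = 3
    ANALYTICAL = 4


def combine(left, right):
    """Result level of an operator such as + applied to two components."""
    levels = {left, right}
    # Assumption: row-level (scalar) values cannot be mixed with aggregated ones
    if Level.SCALAR in levels and max(levels) >= Level.AGGREGATE:
        raise ValueError("Cannot combine aggregate and non-aggregate fields")
    return max(levels)  # the result takes the greatest order


def aggregate(inner):
    """Result level of an aggregate function such as SUM()."""
    if inner >= Level.AGGREGATE:
        raise ValueError("Cannot aggregate an aggregate")
    return Level.AGGREGATE


def analytical(inner):
    """Result level of an analytical function such as RUNNING_COUNT()."""
    if inner not in (Level.CONSTANT, Level.AGGREGATE):
        raise ValueError("Analytical functions expect an aggregate or a constant")
    return Level.ANALYTICAL


print(combine(Level.CONSTANT, Level.SCALAR).name)             # 1 + [column]
print(combine(Level.CONSTANT, aggregate(Level.SCALAR)).name)  # 1 + SUM([column])
```

Under these assumptions the first expression is `SCALAR` and the second is `AGGREGATE`, while `aggregate(aggregate(Level.SCALAR))` and `analytical(Level.SCALAR)` both raise, matching the first and third non-examples.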
FAQs
Q: How do I write a CASE statement where the condition is a comparison (e.g. <=)?
Use `CASE` statements for direct equality checks against one field.
For example:
CASE [status]
WHEN 'Completed' THEN 1
WHEN 'Cancelled' THEN 0
ELSE NULL END
If you wish to compare multiple fields or use comparisons, then you’d use an `IF` statement.
For example:
IF [revenue] > 0 OR [cost] < 0 THEN 'Profitable'
ELSEIF [revenue] = 0 OR [cost] = 0 THEN 'Neutral'
ELSE 'Unprofitable'
END
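For readers more used to general-purpose languages, the distinction is similar to a lookup on one value versus a chain of arbitrary conditions. The Python sketch below mirrors the two formulas above. The function names are hypothetical, and it assumes non-null inputs, since comparisons involving NULL are not modeled here.

```python
def case_status(status):
    """Mirror of the CASE example: equality tests against a single field."""
    mapping = {"Completed": 1, "Cancelled": 0}
    return mapping.get(status)  # None stands in for NULL


def if_profitability(revenue, cost):
    """Mirror of the IF example: conditions are checked in order."""
    if revenue > 0 or cost < 0:
        return "Profitable"
    elif revenue == 0 or cost == 0:
        return "Neutral"
    else:
        return "Unprofitable"


print(case_status("Completed"))   # 1
print(case_status("Refunded"))    # None
print(if_profitability(100, 40))  # Profitable
print(if_profitability(0, 40))    # Neutral
```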
Q: Are special characters allowed in the calculated field name?
We currently do not allow brackets like `[` and `]` in the calculated field name.
This is for parsing and usability reasons, because you can reference calculated fields by their names in other calculated field formulas.
Troubleshooting
1. Why am I getting a 'Cannot combine aggregate and non-aggregate fields' error?
You cannot directly combine and/or compare aggregate and non-aggregate fields because they are different component types.
- Let’s say your non-aggregate field contains the data `[1, 2, 3, 4, 5]`. It has a cardinality of `5`.
- An aggregate calculated field, such as `SUM([field])`, yields the result `15`. It has a cardinality of `1`.

Because the two cardinalities differ, the values cannot be matched up row by row.
2. My calculated field is not saving.
A calculated field will not be saved if it exceeds the maximum number of characters (1024). Please ensure that your calculated field does not exceed this limit in order to save it successfully.
If the issue is not the above, please don’t hesitate to reach out to ThoughtSpot Support for further assistance.