The Notebook

Overview

Every Analyst Studio Report contains an integrated notebook-style environment where analysts can extend their analysis using either Python or R.

The Notebook’s movable code and Markdown cells enable exploratory data analysis, visualization, and collaboration. Notebook environments come with a variety of supported Python libraries and R packages preinstalled. You can add the results of output cells to Reports, or share a link to the Notebook directly. When Notebook output is included in a Report, that Report’s schedule re-runs the Notebook so all of the data stays in sync.

Using the Notebook

To get started using the Notebook:

  1. Open an existing Report or create a new Report and run one or more SQL queries from the SQL Editor.

  2. Click New Notebook. Your query results will automatically be loaded into a datasets object.

  3. On the right-side panel, click the dropdown to select the environment you want to launch a Notebook from, usually either Python 3, R, or Python 3 Edge.

notebook language

Key elements of the Analyst Studio Notebook:

  • Toolbar

    • Where you can manipulate and run your Notebook, restart the session, export, and more.

  • Cells

    • Compose code and view results in a Code cell, or contextualize your work with a Markdown cell.

  • Resources Panel

    • The right-side panel provides resources to help you, including keyboard shortcuts, external documentation, and supported libraries/packages.

  • Status Indicator

    • Where you are notified about your Notebook session status.

Toolbar

Main Toolbar

Notebook Toolbar
  1. Run All - Runs all input cells in the Notebook in sequence (from top to bottom).

  2. Restart - Stops any computations currently running in the Notebook.
    Restarts the session, clearing all variables, imported libraries, etc., that were defined. Code in input cells remains available to re-run after the Notebook restarts.

  3. Run Cell and Advance - Runs code in the selected cell and advances to the next cell.

  4. Add New Cell - Adds a new input cell above or below the current cell.

  5. Add to Report - Adds the output of the selected cell to the Report Builder.

  6. More → Hide All Output - Hides all output in the Notebook. Refreshing the Notebook shows the previously hidden output again.

  7. More → Export - Exports all input cells as a .py or .r file.

Cell Toolbar

Cell Toolbar
  1. Run Cell and Advance - Runs code in the selected cell and advances to the next cell.

  2. Add New Cell - Adds a new input cell above or below the current cell.

  3. Move Cell Up - Moves the current input or markdown cell up.

  4. Move Cell Down - Moves the current cell down.

  5. Fold Cell - Folds (hides) the current cell. Folded cells can still run.

  6. Freeze Cell - Freezes the current input cell so that no changes are allowed; also prevents this cell from running.

  7. Markdown/Code dropdown - Allows you to select the type for the current input cell (as code or markdown).

  8. Add to Report - Adds the output of the selected cell to the Report Builder.

  9. Delete Cell - Permanently removes the cell from the Notebook.

Working with cells

There are two types of cells in the Notebook:

Markdown - Markdown cells allow you to add context to your analysis. They contain text formatted using Markdown and display their rendered output in place when run.

Code - Input Python or R code into the IN section of the cell. When this cell runs, any corresponding output (including visualizations) will be shown in the OUT section.
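
For example, a minimal Python code cell might look like the following. This is only a sketch; it assumes your Report contains at least one query, whose results are exposed through the datasets object described under Accessing query results below.

# IN: summarize the results of the first query in the Report
df = datasets[0]    # pandas DataFrame holding the first query's results
df.describe()       # the summary table appears in the OUT section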

Notes:

  • When you run your Notebook, cells are executed in the order they are displayed, starting from the top cell.

  • To select or change a cell’s type, go to the dropdown menu in the cell toolbar and choose Code or Markdown.

  • To run a cell, select it and press Shift + Return. Or click Run Cell in the cell or main toolbar.

  • The number next to the cell label will increment by one every time code in the cell is successfully run.

  • To see available methods for an object, type the name of the object followed by . and then press tab.

Notebook status

The status indicator, located in the bottom right corner of the browser window, will notify you if there is an issue with your session. It may prompt you to restart the kernel.

  • Setting up notebook - Displayed when opening a new Notebook, or after restarting your session.

  • Ready - Notebook is ready to go.

  • Running - Your code is executing.

  • Loading dataframes - This message may display for larger datasets while dataframe information is loaded into the Notebook.

  • Notebook has encountered an unexpected error - Your session has crashed and will need to be restarted.

  • There was a problem with your session - Your session has terminated, and you need to click Restart to get things working again.

  • Cell is still running. Hang tight! - This can appear when the code being run includes long-running, computationally intensive functions. The Notebook is still online.

  • Notebook is having trouble, try running again - The Notebook is experiencing problems. Please try running your code again to fix the issue.

Accessing query results

The Notebook has access to the results of every query in your Report; however, the way you access those results differs depending on the language you’re using. In each case, all query results are delivered to the Notebook in a custom object called datasets. In Python, each result set in datasets is a pandas DataFrame; in R, each is a data frame.

In your Notebook code, reference query result sets in the datasets list by query name, position, or token. For example:

To return results for              Python                       R
First query added to report        datasets[0]                  datasets[[1]]
Second query added to report       datasets[1]                  datasets[[2]]
Query named 'Active Users'         datasets["Active Users"]     datasets[["Active Users"]]
Query with token '6763b688fb54'    datasets["6763b688fb54"]     datasets[["6763b688fb54"]]

Notes:

  • The datasets object won’t update in the Notebook until after all queries in the Report have run successfully.

  • R is 1-indexed and Python is 0-indexed.

  • If you refer to query results by the query name, remember to update your code if you rename the query in your Report.

  • The order of the results in the datasets object is based on when the query was added to the Report. Renaming a query may change the order it’s displayed in the report editor, but will not affect its position in the datasets object.
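
For example, a short Python sketch that loads result sets both by position and by name ('Active Users' is the example query name from the table above) and confirms what was loaded:

# Load by position (first query added to the Report)
first_df = datasets[0]

# Load by query name
active_users = datasets["Active Users"]

# Confirm what was loaded
print(first_df.shape)
print(active_users.columns.tolist())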

How to find a query’s token

To find the query token starting from the Notebook or editor, click View in the header, then View details, and then click SQL for the query you wish to use. The URL for SQL contains the query token at the end:

https://app.mode.com/ORGANIZATION_USERNAME/reports/REPORT_TOKEN/queries/QUERY_TOKEN

Query token

Memory management in Python

Analyst Studio’s Python Notebook has 16GB of RAM and up to 16 CPUs available to it. Notebooks on the free Analyst Studio plan are limited to 4GB of RAM and 1 CPU. To manage memory effectively in Analyst Studio Notebooks, consider (1) the data load of query result sets, (2) incremental library installation, and (3) memory utilization in the session.

Data load of query result sets

Query result sets are loaded into the Notebook only when the user explicitly references the query. Users can consistently load up to 2GB per raw query result as a pandas DataFrame in the Notebook.
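
To see how much session memory a loaded result set occupies, one option is pandas’ built-in memory accounting. This is a sketch; the query position is illustrative.

# Approximate in-memory size of a loaded result set, in MB
df = datasets[0]
size_mb = df.memory_usage(deep=True).sum() / 1024 ** 2
print(f"{size_mb:.1f} MB")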

Incremental library installation

For Analyst Studio Business and Enterprise paid plans, the Notebook environment has up to 1 GB of memory available to load additional packages.

Memory utilization in session

Memory usage in the Python Notebook can be checked by running the following commands:

from pympler.tracker import SummaryTracker
tracker = SummaryTracker()
tracker.print_diff()

Output

Add CSV export to a cell

You can add an export button to a Notebook output cell so viewers can export the calculated results contained in any dataframe to a CSV. The following examples add an export button to an output cell that will generate a downloadable CSV of the query results of a query named “New Users”:

Python:

# notebooksalamode is the required library in Python
import notebooksalamode as mode

# This example uses the result set from a query named "New Users".
# export_csv() accepts any valid pandas DataFrame.
df = datasets["New Users"]
mode.export_csv(df)

R:

# This example uses the result set from a query named "New Users".
# export_data() accepts any valid data frame.
df <- datasets[["New Users"]]
export_data(df)

Supported libraries

Analyst Studio enables easier access to advanced analytical functions by supporting well-established, public libraries within its Notebooks. Common use cases include:

  • Data Manipulation - Cleaning, aggregating, and summarizing data.

  • Statistics - Simple things like distributions, regressions, and trend lines, as well as some advanced tasks like predictive modeling and machine learning.

  • Advanced Visualization - Python and R have many visualization libraries, enabling analysts to quickly build charts including heatmaps, small multiples, and distributions (see the sketch after this list).
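
For instance, a heatmap can be built with the preinstalled seaborn library. The sketch below assumes hypothetical column names (weekday, hour, events); replace them with columns from your own query results.

import seaborn as sns
import matplotlib.pyplot as plt

df = datasets[0]  # any query result set in the Report

# Pivot the result set into a matrix (hypothetical column names)
pivot = df.pivot_table(index="weekday", columns="hour",
                       values="events", aggfunc="sum")

# Draw the heatmap; the chart appears in the cell's OUT section
sns.heatmap(pivot, cmap="viridis")
plt.show()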

Python

Analyst Studio supports Python version 3.9 in the Notebooks.

Each environment comes preloaded with the following libraries:

Library                      Version (Py3)  Version (Edge)  Description
arrow                        1.2.3          1.2.3           date & time manipulation & formatting
beautifulsoup4               4.11.1         4.11.1          parsing HTML, JSON & XML data
cufflinks                    0.17.3         0.17.3          bind Plotly directly to pandas dataframes
cvxopt                       1.3.0          1.3.0           convex optimization library
dask                         2022.11.1      2022.11.1       flexible open-source Python library for parallel computing
duckdb                       0.6.0          0.6.0           in-process database management system focused on analytical query processing
emcee                        3.1.3          3.1.3           MIT MCMC library
engarde                      0.4.0          0.4.0           defensive data analysis
fiona                        1.8.22         1.8.22          read & write geospatial data files
folium                       0.13.0         0.13.0          build Leaflet.js maps
gensim                       4.2.0          4.2.0           unsupervised semantic modeling from plain text
geopandas                    0.12.1         0.12.1          extends pandas to allow spatial operations on geometric types
gviz_api                     1.10.0         1.10.0          helper library for Google Visualization API
hdbscan                      0.8.29         0.8.29          clustering with minimal parameter tuning
igraph                       0.10.2         0.10.2          network analysis tools
interpret                    0.3.0          0.3.0           fit interpretable ML models & explain blackbox ML
jmespath                     1.0.1          1.0.1           JSON element extraction
jsonify                      0.5            0.5             converts from CSV to JSON
keras                        2.11.0         2.11.0          neural networks API run on TensorFlow or Theano
lifelines                    0.27.4         0.27.4          survival analysis
lifetimes                    0.11.3         0.11.3          user behavior analysis
mapbox                       0.18.1         0.18.1          client for Mapbox web services
matplotlib                   3.6.2          3.6.2           2D plotting visualizations
networkx                     2.8.8          2.8.8           complex network manipulation
nltk                         3.7            3.7             natural language toolkit
numexpr                      2.8.4          2.8.4           fast numerical array expression evaluator
numpy                        1.22.1         1.22.1          various scientific computing functions
pandas                       1.4.4          1.4.4           data structures & data analysis tools
pandas_profiling             3.5.0          3.5.0           generates profile reports from a pandas DataFrame
pandasql                     0.7.3          0.7.3           query pandas dataframes using SQL syntax
patsy                        0.5.3          0.5.3           describing statistical models/building design matrices
pip                          21.2.5         -               package installer
plotly                       5.10.0         5.10.0          data visualizations, dashboards & collaborative analysis
plotly-geo                   1.0.0          1.0.0           geographic shape files to support plotly map functionality
prettytable                  3.4.1          3.4.1           display tabular data in ASCII table format
prophet                      1.1.1          1.1.1           forecasting with time series data
pygal                        3.0.0          3.0.0           create interactive svg charts
pygraphviz                   1.10           1.10            interface for Graphviz graph layout & visualizations
pygsheets                    2.0.5          2.0.5           access Google spreadsheets through the Google Sheets API
pymc3                        3.11.5         3.11.5          probabilistic programming & Bayesian modeling
pympler                      1.0.1          1.0.1           measure, monitor and analyze the memory behavior of Python objects
pyproj                       3.4.0          3.4.0           cartographic transformations & geodetic computations
pysal                        2.7.0          2.7.0           geospatial analysis library
pyzipcode                    2.2            -               query zip codes & location data
pyzipcode3                   2.2            2.2             query zip codes & location data
requests                     2.28.1         2.28.1          make HTTP requests
scikit-image                 0.19.3         0.19.3          image processing
scikit-learn                 1.1.3          1.1.3           tools for data mining & analysis
scikits.bootstrap            1.1.0          1.1.0           bootstrap confidence interval algorithms for scipy
scipy                        1.7.3          1.7.3           advanced math, science & engineering functions
scrapy                       2.7.0          2.7.1           scraping web pages
seaborn                      0.12.1         0.12.1          statistical graphics visualizations
shapely                      1.8.5.post1    1.8.5.post1     manipulation & analysis of geometric objects
six                          1.16.0         1.16.0          Python 2 & 3 compatibility library
spacy                        3.4.2          3.4.3           advanced natural language processing, including all small pipelines
squarify                     0.4.3          0.4.3           implementation of the squarify treemap layout algorithm
statsmodels                  0.13.5         0.13.5          estimate statistical models & perform statistical tests
sympy                        1.11.1         1.11.1          symbolic mathematics
tabulate                     0.9.0          0.9.0           pretty-print tabular data
tensorflow                   2.10.0         2.11.0          numerical computation using data flow graphs
tensorflow-decision-forests  1.1.0          1.1.0           train, run and interpret decision forest models in tensorflow
textblob                     0.17.1         0.17.1          common natural language processing tasks
ua_parser                    0.16.1         -               fast & reliable user agent parser
urllib3                      1.26.13        1.26.13         HTTP client for python
wordcloud                    1.8.2.2        1.8.2.2         wordcloud generator
xgboost                      1.7.1          1.7.1           optimized distributed gradient boosting library

We strongly discourage using either the requests or pygsheets libraries to access APIs that require authentication using personally identifiable credentials and information, as they will be visible to viewers of your Report.

Edge

Analyst Studio provides access to an additional Python 3 environment called Python 3 Edge where pending library upgrades are staged. Analysts should use Edge as an alternative environment where they can test out the updated versions of supported Python libraries without fear of jeopardizing scheduled Reports.

Analyst Studio will announce periodic scheduled promotion events via emails to Analyst Studio account administrators. Users will have at least 30 days from that time for testing and validation before the library updates will be made in the broader Python 3 environment. Any Notebooks using the Edge environment will be migrated to use the Python 3 environment at the same time.

Analysts can access Edge via the environment dropdown in the upper right-hand corner of the Notebook. When switching between environments, remember to Restart the Notebook session.

python edge environment

R

The Notebook supports R version 4.2.0 and comes preloaded with the following R packages:

Library        Version   Description
BTYD           2.4.3     buy-til-you-die (BTYD) models
BTYDplus       1.2.0     extends BTYD
CausalImpact   1.2.7     estimates causal effect of intervention on time series
GGally         2.1.2     extension to ggplot2
MASS           7.3-58.1  functions & datasets to support Venables & Ripley
RColorBrewer   1.1-3     ColorBrewer palettes
assertthat     0.2.1     easy pre- and post-assertions
blob           1.2.3     S3 class to represent BLOBs
caret          6.0-93    streamlines creation of predictive models
cluster        2.1.4     cluster analysis extended Rousseeuw et al.
colorspace     2.0-3     color space manipulation
data.table     1.14.2    extends data.frame
diagrammeR     1.0.9     build graph/network structures
dichromat      2.0-0.1   color schemes for dichromats
digest         0.6.29    create compact hash digests of R objects
dplyr          1.0.10    a grammar of data manipulation
forcats        0.5.2     working with categorical variables (factors)
forecast       8.17.0    forecasting for time series & linear models
fpp3           0.4.0     datasets referenced in the book "Forecasting: Principles and Practice"
ggdendro       0.1.23    dendrograms & tree plots with ggplot2
ggplot2        3.3.6     system for creating graphics
ggpubr         0.4.0     publication-ready ggplot2 plots
ggridges       0.5.3     ridgeline plots in ggplot2
ggthemes       4.2.4     extra themes, scales, & geoms for ggplot2
glue           1.6.2     glue strings to data
gtable         0.3.1     arrange grobs in tables
hts            6.0.2     hierarchical & grouped time series
httr           1.4.4     tools for working with URLs & HTTP
iterators      1.0.14    provides iterator construct
itertools      0.1-3     various tools for creating iterators
janitor        2.1.0     simple tools for examining & cleaning dirty data
kernlab        0.9-31    kernel-based machine learning lab
kknn           1.3.1     weighted k-nearest neighbors
lars           1.3       least angle regression, lasso & forward stagewise
lattice        0.20-45   trellis graphics
lazyeval       0.2.2     lazy (non-standard) evaluation
leaflet        2.1.1     create interactive web maps
lubridate      1.8.0     date and time manipulation
magrittr       2.0.3     a forward-pipe operator
modelr         0.1.9     modelling functions that work with the pipe
munsell        0.5.0     utilities for using Munsell colors
nnet           7.3.17    feed-forward neural networks & multinomial log-linear models
plotly         4.10.0    data visualization, dashboards & collaborative analysis
prophet        1.0       automatic forecasting procedure
proto          1.0.0     prototype object-based programming
purrr          0.3.4     tools for working with functional vectors
reshape2       1.4.4     transform data between wide & long
rlang          1.0.5     functions for base types & core R & tidyverse features
scales         1.2.1     scale functions for visualizations
stringr        1.4.1     work with character strings & regular expressions
tidyr          1.2.1     easily create tidy data
tidytext       0.3.4     conversion of text to and from tidy formats
tm             0.7-8     text mining
utf8           1.2.2     fixes bugs in R’s UTF-8 handling
viridisLite    0.4.1     port of matplotlib color maps
xml2           1.3.3     parse XML
zoo            1.8-11    S3 infrastructure for regular & irregular time series

We strongly discourage using the httr library to access APIs that require authentication using personally identifiable credentials and information, as they will be visible to viewers of your Report.

Install additional libraries

To use a publicly available library in the Notebook that is not listed above, users can leverage each environment’s package manager to install that library at run time. The Notebook environment has up to 1 GB of memory available to load additional packages.

This offers a workaround to try to install additional libraries, beyond what Analyst Studio currently supports, into the Notebook. It is not guaranteed to work in all cases. Only supported libraries have been tested to function as expected in Analyst Studio’s Notebooks.

Analyst Studio’s Notebook architecture does not enable manually installed libraries to have access to the Notebook’s kernel. This means that manually installed versions of popular and interactive libraries like Plotly, Bokeh, and ipywidgets will not function as expected even if the package installation appears to succeed.

Unlike officially supported libraries, additional packages must be installed in each individual Report’s Notebook environment. Add the package installation commands below to the Notebook in each Report where you want the corresponding libraries to be available. Omitting these commands can result in the library not installing and/or importing properly.

Some libraries require authentication with credentials (for example, Tweepy, requests, etc.). We strongly discourage using libraries that require authentication using personally identifiable credentials and information, as these credentials will be visible to viewers of your report.

Python

First, enter the following command into a Notebook cell for each public package that you want to install into the Python Notebook, as demonstrated below with the bloom-filter package (replace bloom-filter with the name of the package you want to install):

! pip install bloom-filter
              ^^^^^^^^^^^^
              Package name

Alternatively, users can try to upgrade a supported package to a more recent version using:

! pip install [package name]==[version.x.y] --upgrade

Next, in a subsequent cell, add an import statement for each library that you want to include in your environment. For example:

from bloom_filter import BloomFilter
     ^^^^^^^^^^^^
     Package name

You may now use any of the methods or functionality included in the library in subsequent Notebook cells.

R

First, enter the following command into a Notebook cell for each public package that you want to install into the R Notebook, as demonstrated below with the random package (replace random with the name of the package you want to install):

install.packages("random")
                  ^^^^^^
                  Package name

Next, invoke the library command for each library you want to include in your environment from the installed package(s). For example:

library("random")
         ^^^^^^
         Library name

You may now use any of the methods or functionality included in the library in subsequent Notebook cells.

Notebook keyboard shortcuts

General

Action                    Mac                   PC
Edit selected cell        Return                Enter
Run cell                  Shift + Return        Shift + Enter
Select cell above         K or ↑                K or ↑
Select cell below         J or ↓                J or ↓
Insert cell above         A                     A
Insert cell below         B                     B
Move cell above           Shift + Option + ↑    Shift + Alt + ↑
Move cell below           Shift + Option + ↓    Shift + Alt + ↓

Code editor

Action                    Mac                   PC
Code complete or indent   Return                Enter
Select all                ⌘ + A                 Ctrl + A
Undo                      ⌘ + Z                 Ctrl + Z
Redo                      ⌘ + Y                 Ctrl + Y
Run cell                  ⌘ + Enter             Ctrl + Enter
Insert cell below         Option + Enter        Alt + Enter

Python Notebooks secrets store

Overview

The secrets store provides users with an intuitive and secure way to protect credentials used in the Notebook. This helps users extend their analysis by pulling in the data and libraries they need beyond SQL queries against their data warehouse. These credentials are stored encrypted and are obfuscated from all users.

The secrets store is currently only available for Python Notebooks.

Managing secrets

  • Users can add secrets at the Report level; each secret applies only to that Report.

  • All Editors of that Report can use, edit, and delete existing secrets. They can also add new secrets to the Report.

  • Once secret values are added, they are always obfuscated. Editing a secret means replacing the old secret with a new one. There is no way to print a secret value after it is added.

  • A secret cannot be used in another Report’s Notebook, even when duplicating a Report that has an existing secret.

Using the secret store

  1. In the Python Notebook, click New Secret on the right-side panel, under the Secrets tab, and add the Display name and Secret value.

    Notebook new secret

    Secrets will need to meet the following criteria:

    1. The secret display name must be 1-100 characters long.

    2. The secret display name can only contain alphanumeric characters and underscores, and must begin with a letter.

    3. The secret value must be 1-4096 characters long.

  2. Once saved, users can use the Display Name as a variable in the Python cells.

    Use the display name as a variable
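
For example, here is a minimal sketch that uses a secret saved with the hypothetical display name API_KEY to authenticate an HTTP request with the supported requests library (the endpoint is also hypothetical):

import requests

# API_KEY is the secret's display name; once the secret is saved to this
# Report, it is available as a variable in Python cells.
response = requests.get(
    "https://api.example.com/v1/usage",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.status_code)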

Editing secrets

Once secret values are added, they will always be obfuscated. Editing a secret would mean replacing the old secret with a new one.

Edit a secret

Deleting secrets

Deleting a secret will also break any existing references to the secret in the Notebook. Any editor of the Report can delete a secret and the action can’t be undone.

Delete a secret

Administrative features

  • Admins can use Discovery Database (DDB) to get a list of all Reports using secrets.

  • Changes made to secrets are audited, and customers should reach out to ThoughtSpot Support to obtain that information.

FAQs

Q: Can you use Analyst Studio’s native chart editor to visualize Notebook output?

At this time, it is not possible to use our visualization tools, such as Quick Charts and Visual Explorer, to manipulate Python/R dataframes. To visualize data from a Notebook, you will need to use a visualization library to create a visualization. If you would like to see this functionality added in the future, please contact ThoughtSpot Support, and they will be happy to add a request on your behalf for future consideration.

Q: How do I pass parameters into the Notebook?

To pass parameters to your Notebook, you must add them as a column in your SQL query. You can then access those column(s) in the datasets object in your Notebook:

SELECT
 '{{team}}' AS param
FROM
 benn.nfl_touchdowns
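
After the query runs, the parameter value can be read back from that query’s result set in the Notebook. A Python sketch (the query position is illustrative; you can also reference the query by name or token):

# The parameter value arrives as the 'param' column of the result set
params_df = datasets[0]
team = params_df["param"].iloc[0]
print(team)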

Q: Can I use dbt Metrics in Notebooks?

Yes. Since metrics charts are SQL Queries under the hood, their results are made available to the Notebook and appear as data frames alongside all other Query results in a given Report.

Q: When do queries in the Notebook start to execute after a report run?

SQL queries are kicked off simultaneously, and their results come in based on your database’s processing time. The Notebook waits until all SQL queries have successfully returned results before running. This is because the Notebook does not know which query results its execution depends on, so to be safe it waits for all of the SQL queries to finish running.

Therefore, it is possible that the Notebook could render sooner, but it must wait for all queries to finish running.

Q: Do you have a tutorial where I can learn Python for business analysis using real-world data?

We do have a tutorial available that teaches Python for business analysis using real-world data. This tutorial is designed for users with little or no experience with Python, and it covers everything from the basics of the language to advanced techniques for analyzing and visualizing data.

If you’re interested in learning how to use Python for business analysis, this tutorial is a great place to start. It includes step-by-step instructions and hands-on exercises to help you apply what you learn to real-world scenarios.

