Set up the JDBC driver for Pentaho

Use Pentaho to create a JDBC connection to ThoughtSpot.

You can use the Pentaho Data Integration (PDI) to create a JDBC connection. The Pentaho Data Integration (PDI) suite is a comprehensive data integration and business analytics platform. You can use it to create a JDBC connection to ThoughtSpot.

PDI consists of a core data integration (ETL) engine and GUI applications that allow you to define data integration jobs and transformations. Through Pentaho, we primarily use the JDBC driver to set up a connection. The process is not as complicated as with SSIS, and is much more lenient.

Community and enterprise editions of PDI are available. Using the community edition is sufficient, though you may use the enterprise edition, which is subscription based, and therefore contains extra features and provides technical support.

Use JDBC to connect to the ThoughtSpot Simba server from Pentaho. The connection will be made between a new ThoughtSpot Table Input and Output objects.

Check the ThoughtSpot IP and the simba_server status

Before you begin, you need to know the IP address or DNS name of the server you intend to connect your server to.

  1. SSH as admin or the thoughtspot user to your ThoughtSpot node.

  2. Verify the node IP(s).

    $ tscli node ls
    172.18.231.17
    172.18.231.18
  3. Make a note of each IP; there may be more than one.

  4. Configure the ThoughtSpot firewall to allow connections from your ETL client, by running the following command on any ThoughtSpot node: tscli firewall open-ports --ports 12345

  5. Exit or close the shell.

Install the Simba drivers in the Pentaho directories

Before starting the Pentaho Data Integration (PDI) client and creating the connection, ensure that the Simba JDBC client libraries are present in the Pentaho client/server machines. This will ensure that the drivers picked up at runtime.

  1. Log in to the local machine where you have already installed the Pentaho Data Integration (PDI) client.

  2. Click Here to download the JDBC driver.

  3. Select JDBC Driver to download the file thoughtspot_jdbc<version>.jar.

  4. Copy the thoughtspot_jdbc<version>.jar file to the following directories:

    • <Pentaho_install_dir>/server/data-integration-server/tomcat/webapps/pentaho-di/WED-INF/lib/

    • <Pentaho_install_dir>/design-tools/data-integration/lib/

    • <Pentaho_install_dir>/server/data-integration-server/tomcat/lib/

    • <Pentaho_install_dir>/design-tools/data-integration/plugins/spoon/agile-bi/lib/

Set up the driver

This section explains how to set up the JDBC driver using Pentaho. These instructions use Spoon, the graphical transformation and job designer associated with the PDI suite. It is also known as the Kettle project.

Create a transformation

Do the following on your ETL workstation with the Pentaho client:

  1. Open the PDI client.

     ./spoon.sh &>/dev/null &
  2. Right-click View  Transformations tab.

  3. Select New to create a new transformation.

    View > Transformations > New
  4. Select Input under the Design tab to expand it.

  5. Drag and drop CSV File Input to the Transformation window.

    This opens a new CSV file.

    CSV file input
  6. Double-click the CSV File Input icon to open the CSV Input dialog .

  7. Name the Step.

  8. Select Browse next to the Filename field and provide the file you want to read from.

  9. Select OK.

    Add your CSV file and select OK
  10. In the CSV Input dialog, select Get Fields.

  11. Enter the number of lines you would like to sample in the Sample size dialog. The default setting is 100.

  12. Select OK when you are ready.

    The tool reads the file and suggests the field name and type.

    Scan results
  13. Select Preview to preview the data.

  14. Enter the number of rows to preview in the Preview size dialog.

    The default setting is 1000. Select OK to start the transformation in preview.

  15. Examine the preview data, then select Close.

    You may want to verify that you are able to read the data using the SQL query from ThoughtSpot.

    Preview data
  16. Select OK in the CSV Input dialog to confirm your CSV input settings.

Define the Output

  1. Select Design  Output.

  2. Drag and drop Table output to the Transformation window.

    Drag table output to the transformation window
  3. Double-click the Table output icon to open the Table output dialog.

  4. Enter a Step name.

  5. Select New to create a new connection.

    Create a new connection
  6. Enter or select the following information in the Database Connection dialog:

    Connection Name

    Any string

    Connection Type

    Generic database

    Access

    Native (JDBC)

    Custom Connection URL

    The IP is a node in your ThoughtSpot cluster. The name or schema of the database to which you are connecting.

    Use TQL to create a database name if needed. Ensure that there are no leading or trailing spaces. Syntax;; jdbc:simba://SERVER_IP:12345;Database=DATABASE_or_SCHEMA_NAME

    Custom Driver Class Name

    Ensure that there are no leading or trailing spaces.

    Syntax

    com.simba.client.core.jdbc4.JDBC4Driver

    User Name

    A ThoughtSpot username. If you leave this empty, you are prompted for it at connection time. This user should have Data Management privileges on ThoughtSpot.

    Password

    The password for the User Name. If you leave this empty, you are prompted for it at connection time.

    Database connection dialog
  7. Select Test to test your database connection.

  8. If you are able to make a successful connection to the ThoughtSpot Simba Server, select OK.

    After you test
  9. Select OK in the Database Connection dialog to create the new connection.

Import data

  1. In the Table output dialog, select the connection you just created.

  2. Select Browse next to the Target schema field and select your Target schema.

  3. Select OK when you are done.

  4. Connect the Input CSV icon to the Table output icon by clicking and dragging an arrow.

  5. When prompted, choose Main output of step.

    Main output of step
  6. Double-click the Table output icon to reopen the Table output dialog.

  7. Enter a Target table name.

  8. Select SQL.

    Choose SQL
  9. In the Simple SQL editor dialog, select Execute.

    The system processes and then displays the results of the SQL statements.

    Execute SQL
  10. Close all open dialogs.

  11. Select the Play button at the top of the Transformation window to execute the transformation.

    Execute a transformation
  12. Select Launch in the Execute a transformation dialog.

    Select Launch

    The system prompts you to save it if you have not already.

  13. View the Execution Results.

    View results

Related information


Was this page helpful?