Add a Hive DataFlow connection

You can add a connection to a Hive database using ThoughtSpot DataFlow.

Follow these steps:

  1. Click Connections in the top navigation bar.

  2. In the Connections interface, click Add connection in the top right corner.

  3. In the Create Connection interface, select the Connection type.

  4. After you select the Hive Connection type, the rest of the connection properties appear.

    See Connection properties for details, defaults, and examples.

    • Connection name
      Name your connection.

    • Connection type
      Choose the Hive connection type.

    • HiveServer2 HA configured
      Specify this option if using HiveServer2 High Availability.

    • HiveServer2 zookeeper namespace
      Specify zookeeper namespace as hivesever2. This is the default value. Only when using Hiveserver2 HA.

    • Host
      Specify the hostname or the IP address of the Hadoop system Only when not using Hiveserver2 HA.

    • Port
      Specify the port. Only when not using Hiveserver2 HA.

    • Hive security authentication
      Specifies the type of security protocol to connect to the instance. Based on the type of security select the authentication type and provide details.

    • User
      Specify the user to connect to Hive. This user must have data access privileges.

    • Password
      Specify the password.

    • Trust store
      Specify the trust store name for authentication. For SSL and Kerberos authentication only.

    • Trust store password
      Specify the password for the trust store For SSL and Kerberos authentication only.

    • Hive transport mode
      Applicable only for hive process engine. This specifies the network protocol used for communicating between hive nodes.

    • HTTP path
      This is specified as an option when http transport mode is selected. For HTTP transport mode only.

    • Hadoop distribution
      Provide the Hadoop distribution of the connection.

    • Distribution version
      Provide the version of the Hadoop distribution.

    • Hadoop conf path
      By default, the system picks the Hadoop configuration files from the HDFS. To override, specify an alternate location. Applies only when using configuration settings that are different from global Hadoop instance settings.

    • DFS HA configured
      Specify if using High Availability for DFS.

    • DFS name service
      Specify the logical name of the HDFS nameservice.

    • DFS name node IDs
      Specify a comma-separated list of NameNode IDs. System uses this property to determine all NameNodes in the cluster. XML property name is dfs.ha.namenodes.dfs.nameservices.

    • RPC address for namenode1
      Specify the fully-qualified RPC address for each listed NameNode. Defined as dfs.namenode.rpc-address.dfs.nameservices.name node ID 1. For DFS HA and Hadoop Extract only.

    • RPC address for namenode2
      Specify the fully-qualified RPC address for each listed NameNode. Define as dfs.namenode.rpc-address.dfs.nameservices.name node ID 2.

    • DFS host
      Specify the DFS hostname or the IP address.

    • DFS port
      Specify the associated DFS port.

    • Default DFS location
      Specify the location for the default source/target location.

    • Temp DFS location
      Specify the location for creating temp directory.

    • DFS security authentication
      Select the type of security being enabled.

    • Hadoop RPC protection
      Hadoop cluster administrators control the quality of protection using the configuration parameter hadoop.rpc.protection.

    • Hive principal
      Principal for authenticating hive services.

    • User principal
      To authenticate via a key-tab you must have supporting key-tab file which is generated by Kerberos Admin and also requires the user principal associated with Key-tab (Configured while enabling Kerberos).

    • User keytab
      To authenticate via a key-tab you must have supporting key-tab file which is generated by Kerberos Admin and also requires the user principal associated with Key-tab (Configured while enabling Kerberos).

    • KDC host
      Specify KDC Host Name where as KDC (Kerberos Key Distribution Center) is a service than runs on a domain controller server role (Configured from Kerberos configuration-/etc/krb5.conf).

    • Default realm
      A Kerberos realm is the domain over which a Kerberos authentication server has the authority to authenticate a user, host or service (Configured from Kerberos configuration-/etc/krb5.conf).

    • Queue name
      Specify the queue name followed by a coma separated form in yarn.scheduler.capacity.root.queues. For Hadoop Extract only.

    • YARN web UI port
      Yarn Providing web UI for yarn RM and by default 8088 in use. For Hadoop Extract only.

    • Zookeeper quorum host
      Specify the value of hadoop.registry.zk.quorum from yarn-site.xml. Only when not using Hiveserver2 HA.

    • Yarn timeline webapp host
      Specify the ip address of yarn timeline service web application.

    • Yarn timeline webapp port
      Specify the port associated with the yarn timeline service web application.

    • Yarn timeline webapp version
      Specify the version associated with the yarn timeline service web application.

    • JDBC options
      Specify the options associated with the JDBC URL.

  5. Click Create connection.


Related information