Hive DataFlow connection reference
Learn about the fields used to create a Hive connection with ThoughtSpot DataFlow.
Here is a list of the fields for a Hive connection in ThoughtSpot DataFlow. You need specific information to establish a seamless and secure connection.
Connection properties
- Connection name
-
Name your connection.
Mandatory field.
Example:
HiveConnection
- Connection type
-
Choose the Hive connection type.
Mandatory field.
Example:
Hive
- HiveServer2 HA configured
-
Specify this option if using HiveServer2 High Availability.
Mandatory field.
- HiveServer2 zookeeper namespace
-
Specify the ZooKeeper namespace; hiveserver2 is the default value.
Mandatory field.
Only when using Hiveserver2 HA.
Example:
hiveserver2
Other notes:
If your deployment uses a different value, you can find it in hive-site.xml under the property hive.server2.zookeeper.namespace; see the sample entry below.
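For reference, a hive-site.xml entry for this property typically looks like the following; the value shown is only illustrative and should match your cluster's setting.
  <property>
    <name>hive.server2.zookeeper.namespace</name>
    <value>hiveserver2</value>
  </property>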
- Host
-
Specify the hostname or the IP address of the Hadoop system.
Mandatory field.
Only when not using Hiveserver2 HA.
Example:
[email protected]
- Port
-
Specify the port.
Mandatory field.
Only when not using Hiveserver2 HA.
Example:
1234
- Hive security authentication
-
Specifies the security protocol used to connect to the instance. Based on the security type, select the authentication type and provide the required details.
Mandatory field.
Example:
Kerberos
Valid Values:
Simple, Kerberos, LDAP, SSL, Kerberos & SSL, LDAP & SSL
Default:
Simple
Other notes:
The authentication type set up for the instance can be found in hive-site.xml under the property hive.server2.authentication; see the sample entry below.
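As a reference point, the corresponding hive-site.xml entry usually takes this form; KERBEROS is shown only as an illustrative value.
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>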
- User
-
Specify the user to connect to Hive. This user must have data access privileges.
Mandatory field.
For Simple and LDAP authentication only.
Example:
userdi
Default:
simple
- Password
-
Specify the password.
Optional field.
For Simple and LDAP authentication only.
Example:
pswrd234%!
- Trust store
-
Specify the trust store name for authentication.
Mandatory field.
For SSL and Kerberos & SSL authentication only.
Example:
trust store
Default:
SSL
- Trust store password
-
Specify the password for the trust store.
Mandatory field.
For SSL and Kerberos & SSL authentication only.
Example:
password
Default:
SSL
- Hive transport mode
-
Applicable only for the Hive process engine. This specifies the network protocol used for communication between Hive nodes.
Mandatory field.
Example:
binary
Valid Values:
Binary, HTTP
Default:
binary
Other notes:
The Hive transport mode can be identified in hive-site.xml under the property hive.server2.transport.mode; see the sample entries after the HTTP path field below.
- HTTP path
-
Specify the HTTP path when the HTTP transport mode is selected.
Mandatory field.
For HTTP transport mode only.
Example:
cliservice
Valid Values:
cliservice
Default:
cliservice
Other notes:
The HTTP path value can be identified in hive-site.xml under the property hive.server2.thrift.http.path; see the sample entries below.
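For reference, the transport mode and HTTP path are typically defined together in hive-site.xml as shown below; the values mirror the examples above and are illustrative only.
  <property>
    <name>hive.server2.transport.mode</name>
    <value>http</value>
  </property>
  <property>
    <name>hive.server2.thrift.http.path</name>
    <value>cliservice</value>
  </property>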
- Hadoop distribution
-
Provide the distribution of Hadoop being connected to.
Mandatory field.
Example:
Hortonworks
Valid Values:
CDH, Hortonworks, EMR
Default:
CDH
- Distribution version
-
Provide the version of the Distribution chosen above.
Mandatory field.
Example:
2.6.5
Valid Values:
Any Numeric value
Default:
6.3.x
- Hadoop conf path
-
By default, the system picks the Hadoop configuration files from the HDFS. To override, specify an alternate location. Applies only when using configuration settings that are different from global Hadoop instance settings.
Mandatory field.
Example:
$DI_HOME/app/path
Other notes:
For example, this may be needed if HDFS is encrypted and the location of the key files, along with the password needed to decrypt them, is available in the Hadoop configuration files.
- DFS HA configured
-
Specify if using High Availability for DFS.
Optional field.
For Hadoop Extract only.
Example:
Checked
- DFS name service
-
Specify the logical name of the HDFS nameservice.
Mandatory field.
For DFS HA and Hadoop Extract only.
Example:
lahdfs
Other notes:
It is available in hdfs-site.xml, defined as dfs.nameservices; see the sample hdfs-site.xml entries after the RPC address fields below.
- DFS name node IDs
-
Specify a comma-separated list of NameNode IDs. The system uses this property to determine all NameNodes in the cluster. The XML property name is dfs.ha.namenodes.dfs.nameservices.
Mandatory field.
For DFS HA and Hadoop Extract only.
Example:
nn1, nn2
- RPC address for namenode1
-
Specify the fully-qualified RPC address for the first listed NameNode. Defined as dfs.namenode.rpc-address.dfs.nameservices.name node ID 1.
Mandatory field.
For DFS HA and Hadoop Extract only.
Example:
lclabh.example.com:5678
- RPC address for namenode2
-
Specify the fully-qualified RPC address for the second listed NameNode. Defined as dfs.namenode.rpc-address.dfs.nameservices.name node ID 2.
Mandatory field.
For DFS HA and Hadoop Extract only.
Example:
lvclabh.example.com:9876
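As a reference, the DFS name service, NameNode IDs, and RPC addresses usually appear together in hdfs-site.xml in the following form. The property suffixes use the nameservice and NameNode IDs from the examples above (lahdfs, nn1, nn2); substitute the values from your own cluster.
  <property>
    <name>dfs.nameservices</name>
    <value>lahdfs</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.lahdfs</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.lahdfs.nn1</name>
    <value>lclabh.example.com:5678</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.lahdfs.nn2</name>
    <value>lvclabh.example.com:9876</value>
  </property>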
- DFS host
-
Specify the DFS hostname or the IP address.
Mandatory field.
For Hadoop Extract only, when not using DFS HA.
Example:
[email protected]
- DFS port
-
Specify the associated DFS port.
Mandatory field.
For Hadoop Extract only, when not using DFS HA.
Example:
1234
- Default DFS location
-
Specify the location for the default source/target location.
Mandatory field.
For Hadoop Extract only.
Example:
/tmp
- Temp DFS location
-
Specify the location for creating temp directory.
Mandatory field.
For Hadoop Extract only.
Example:
/tmp
- DFS security authentication
-
Select the type of security being enabled.
Mandatory field.
For Hadoop Extract only.
Example:
Kerberos
Valid Values:
Simple, Kerberos
Default:
simple
- Hadoop RPC protection
-
Hadoop cluster administrators control the quality of protection using the configuration parameter hadoop.rpc.protection.
Mandatory field.
When using Kerberos DFS security authentication and Hadoop Extract.
Example:
none
Valid Values:
None, authentication, integrity, privacy
Default:
authentication
Other notes:
It is available in core-site.xml.
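For reference, the relevant core-site.xml entry generally looks like this; authentication is shown as an illustrative value.
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication</value>
  </property>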
- Hive principal
-
The principal for authenticating Hive services.
Mandatory field.
Example:
hive/[email protected]
Other notes:
It is available in hive-site.xml
- User principal
-
Specify the user principal associated with the keytab. To authenticate via a keytab, you need a supporting keytab file, generated by the Kerberos admin, along with the user principal associated with it (configured while enabling Kerberos).
Mandatory field.
Example:
- User keytab
-
Specify the path to the keytab file. To authenticate via a keytab, you need a supporting keytab file, generated by the Kerberos admin, along with the user principal associated with it (configured while enabling Kerberos).
Mandatory field.
Example:
/app/keytabs/labuser.keytab
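As a quick sanity check outside DataFlow, you can usually verify that a keytab and user principal pair works with kinit. The principal shown here is hypothetical; use the principal associated with your keytab.
  kinit -kt /app/keytabs/labuser.keytab labuser@LABHDP.EXAMPLE.COM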
- KDC host
-
Specify the KDC hostname. The KDC (Kerberos Key Distribution Center) is a service that runs on a domain controller server role. It is configured in the Kerberos configuration file, /etc/krb5.conf.
Mandatory field.
Example:
example.example.com
- Default realm
-
A Kerberos realm is the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service. It is configured in the Kerberos configuration file, /etc/krb5.conf.
Mandatory field.
Example:
labhdp.example.com
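For reference, the KDC host and default realm typically come from /etc/krb5.conf entries like the ones below. The values mirror the examples above and are illustrative; realm names are conventionally written in uppercase.
  [libdefaults]
    default_realm = LABHDP.EXAMPLE.COM

  [realms]
    LABHDP.EXAMPLE.COM = {
      kdc = example.example.com
    }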
- Queue name
-
Specify the queue name, as listed in the comma-separated value of yarn.scheduler.capacity.root.queues.
Mandatory field.
For Hadoop Extract only.
Example:
default
Other notes:
It is available in capacity-scheduler.xml
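For reference, the queue list typically appears in capacity-scheduler.xml as shown below; any queue names other than default are hypothetical.
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,etl</value>
  </property>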
- YARN web UI port
-
Specify the port of the web UI for the YARN ResourceManager. By default, port 8088 is used.
Mandatory field.
For Hadoop Extract only.
Example:
8088
- Zookeeper quorum host
-
Specify the value of hadoop.registry.zk.quorum from yarn-site.xml.
Mandatory field.
Only when not using Hiveserver2 HA.
Example:
lvclhdp1.example.com:21,lvclabhdp12.example.com:81,lvclabhdp12.example.com:2093
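For reference, this value typically comes from a yarn-site.xml entry of the following form; the hosts shown mirror the example above.
  <property>
    <name>hadoop.registry.zk.quorum</name>
    <value>lvclhdp1.example.com:21,lvclabhdp12.example.com:81,lvclabhdp12.example.com:2093</value>
  </property>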
- Yarn timeline webapp host
-
Specify the hostname or IP address of the YARN timeline service web application.
Mandatory field.
Example:
8188
- Yarn timeline webapp port
-
Specify the port associated with the yarn timeline service web application.
Mandatory field.
Example:
8190
- Yarn timeline webapp version
-
Specify the version associated with the yarn timeline service web application.
Mandatory field.
Example:
v1
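For reference, the timeline service host and port are commonly taken from yarn-site.xml entries such as the ones below; the host shown is hypothetical and the port mirrors the example above.
  <property>
    <name>yarn.timeline-service.hostname</name>
    <value>lvclabhdp12.example.com</value>
  </property>
  <property>
    <name>yarn.timeline-service.webapp.address</name>
    <value>lvclabhdp12.example.com:8190</value>
  </property>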
- JDBC options
-
Specify the options associated with the JDBC URL.
Optional field.
Example:
jdbc:sqlserver://[serverName[\instanceName][:portNumber]]
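For a Hive connection, JDBC options are typically appended to the HiveServer2 JDBC URL as semicolon-separated key=value pairs, for example ssl=true;transportMode=http;httpPath=cliservice. A sketch of a full URL, with an illustrative host, port, and database:
  jdbc:hive2://lclabh.example.com:10001/default;ssl=true;transportMode=http;httpPath=cliservice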
Sync properties
- Data extraction mode
-
Specify the extraction type.
Mandatory field.
Example:
Hadoop Extract
Valid Values:
Hadoop Extract, JDBC
Default:
Hadoop Extract
- Null value
-
Specifies the string literal that indicates a null value in the extracted data. During the data load, column values matching this string are loaded as null in the target.
Mandatory field.
For Hadoop Extract only.
Example:
NULL
Valid Values:
NULL
Default:
NULL
- Enclosing character
-
Specify whether the text columns in the source data need to be enclosed in quotes.
Mandatory field.
Example:
DOUBLE
Valid Values:
SINGLE, DOUBLE
Default:
DOUBLE
- Escape character
-
Specify the escape character if using a text qualifier in the source data.
Mandatory field.
Example:
\"
Valid Values:
\\, Any ASCII character
Default:
\"
- TS load options
-
Specifies the parameters passed with the tsload command, in addition to the commands already included by the application. The format for these parameters is:
--<param_1_name> <optional_param_1_value>
--<param_2_name> <optional_param_2_value>
Optional field.
Example:
--max_ignored_rows 0
Valid Values:
--null_value "
--escape_character "
--max_ignored_rows 0
Default:
--max_ignored_rows 0