HDFS DataFlow connection reference
Learn about the fields used to create an HDFS connection with ThoughtSpot DataFlow.
This page lists the fields for an HDFS connection in ThoughtSpot DataFlow. You need the following information to establish a secure connection.
Connection properties
- Connection name
-
Name your connection.
Mandatory field.
Example:
HDFSConnection
- Connection type
-
Choose the HDFS connection type.
Mandatory field.
Example:
HDFS
- User
-
Specify the user that connects to the HDFS file system. This user must have data access privileges.
Mandatory field.
For Hive security with simple, LDAP, and SSL authentication only.
Example:
user1
- Hadoop distribution
-
Specify the Hadoop distribution you are connecting to.
Mandatory field.
Example:
Hortonworks
Valid Values:
CDH, Hortonworks, EMR
Default:
CDH
- Distribution version
-
Specify the version of the Hadoop distribution selected above.
Mandatory field.
Example:
2.6.5
Valid Values:
Any valid version number of the selected Hadoop distribution
Default:
6.3.x
- Hadoop conf path
-
By default, the system picks the Hadoop configuration files from the HDFS. To override, specify an alternate location. Applies only when using configuration settings that are different from global Hadoop instance settings.
Mandatory field.
Example:
/app/path
Other notes:
For example, this may be needed if HDFS is encrypted and the location of the key files and the password to decrypt the files are available in the Hadoop configuration files.
- HDFS HA configured
-
Enables High Availability for HDFS.
Optional field.
- HDFS name service
-
The logical name given to the HDFS nameservice.
Mandatory field.
For HDFS HA only.
Example:
lahdfs
Other notes:
It is available in hdfs-site.xml and defined as dfs.nameservices.
- HDFS name node IDs
-
Provides the comma-separated list of NameNode IDs. DataNodes use this property to determine all the NameNodes in the cluster. The XML property name is dfs.ha.namenodes.dfs.nameservices, where dfs.nameservices stands for the name service value.
Mandatory field.
For HDFS HA only.
Example:
nn1,nn2
- RPC address for namenode1
-
Specify the fully qualified RPC address of the first listed NameNode. It is defined as dfs.namenode.rpc-address.dfs.nameservices.name_node_ID_1.
Mandatory field.
For HDFS HA only.
Example:
www.example1.com:1234
- RPC address for namenode2
-
Specify the fully qualified RPC address of the second listed NameNode. It is defined as dfs.namenode.rpc-address.dfs.nameservices.name_node_ID_2.
Mandatory field.
For HDFS HA only.
Example:
www.example2.com:1234
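The HA fields above map to properties in hdfs-site.xml. The following is a minimal sketch, not part of DataFlow, of how these values could be read from that file with Python's standard library. The XML content is a hypothetical sample built from the example values on this page, and the function name `read_ha_settings` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content; values mirror the examples on this page.
HDFS_SITE = """
<configuration>
  <property><name>dfs.nameservices</name><value>lahdfs</value></property>
  <property><name>dfs.ha.namenodes.lahdfs</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.lahdfs.nn1</name><value>www.example1.com:1234</value></property>
  <property><name>dfs.namenode.rpc-address.lahdfs.nn2</name><value>www.example2.com:1234</value></property>
</configuration>
"""


def read_ha_settings(xml_text):
    """Collect the HA-related values, keyed by the field names used on this page."""
    root = ET.fromstring(xml_text)
    # Flatten every <property> element into a name -> value dictionary.
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    nameservice = props["dfs.nameservices"]
    node_ids = [n.strip() for n in props[f"dfs.ha.namenodes.{nameservice}"].split(",")]
    # One RPC address per NameNode ID, in the order the IDs are listed.
    rpc_addresses = [props[f"dfs.namenode.rpc-address.{nameservice}.{n}"] for n in node_ids]
    return {
        "HDFS name service": nameservice,
        "HDFS name node IDs": ",".join(node_ids),
        "RPC addresses": rpc_addresses,
    }


settings = read_ha_settings(HDFS_SITE)
print(settings)
```

In a real cluster, the nameservice ID and NameNode IDs replace the placeholders in the property names, so the keys to look up depend on the value of dfs.nameservices.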
- DFS host
-
Specify the DFS hostname or the IP address.
Mandatory field.
For non-HA HDFS only.
- DFS port
-
Specify the associated DFS port.
Mandatory field.
For non-HA HDFS only.
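Which address fields are mandatory depends on whether HDFS HA is configured. As a rough sketch of that rule, assuming the connection settings are held in a plain dictionary keyed by the field names on this page (the function `missing_fields` and the dictionary layout are hypothetical, not a DataFlow API):

```python
# Fields that are mandatory only when HDFS HA is configured.
HA_FIELDS = [
    "HDFS name service",
    "HDFS name node IDs",
    "RPC address for namenode1",
    "RPC address for namenode2",
]
# Fields that are mandatory only when HDFS HA is not configured.
NON_HA_FIELDS = ["DFS host", "DFS port"]


def missing_fields(conn):
    """Return the mandatory address fields that are absent or empty."""
    required = HA_FIELDS if conn.get("HDFS HA configured") else NON_HA_FIELDS
    return [field for field in required if not conn.get(field)]


ha_conn = {"HDFS HA configured": True, "HDFS name service": "lahdfs"}
print(missing_fields(ha_conn))
```

A check like this can catch a half-filled form before a connection attempt, for example when HA is enabled but only the name service has been entered.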
- Default HDFS location
-
Specify the default source/target location.
Mandatory field.
Example:
/tmp
- Temp HDFS location
-
Specify the location in which to create the temp directory.
Mandatory field.
Example:
/tmp
- HDFS security authentication
-
Select the type of security being enabled.
Mandatory field.
Example:
Kerberos
Valid Values:
Simple, Kerberos
Default:
Simple
- Hadoop RPC protection
-
Hadoop cluster administrators control the quality of protection using the configuration parameter hadoop.rpc.protection.
Mandatory field.
For HDFS security authentication with Kerberos only.
Example:
none
Valid Values:
None, authentication, integrity, privacy
Default:
authentication
Other notes:
It is available in core-site.xml.
- Hive principal
-
Principal for authenticating Hive services.
Mandatory field.
Example:
hive/[email protected]
Other notes:
It is available in hive-site.xml.
- User principal
-
To authenticate with a keytab, you need a keytab file generated by the Kerberos administrator and the user principal associated with that keytab (configured while enabling Kerberos). Specify the user principal here.
Mandatory field.
Example:
- User keytab
-
To authenticate with a keytab, you need a keytab file generated by the Kerberos administrator and the user principal associated with that keytab (configured while enabling Kerberos). Specify the path to the keytab file here.
Mandatory field.
Example:
/app/keytabs/labuser.keytab
- KDC host
-
Specify the KDC host name. The KDC (Kerberos Key Distribution Center) is a service that runs on a domain controller server role (configured in the Kerberos configuration file, /etc/krb5.conf).
Mandatory field.
Example:
- Default realm
-
A Kerberos realm is the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service (configured in the Kerberos configuration file, /etc/krb5.conf).
Mandatory field.
Example:
name.example.com
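The KDC host and default realm both come from the Kerberos configuration file, /etc/krb5.conf. The sketch below shows one way to pull these two values out of such a file with regular expressions. The sample file content, the host name kdc.example.com, and the function `read_kerberos_settings` are hypothetical and used only for illustration. The sketch returns the first kdc entry it finds, which may not be the right one in a file that defines several realms.

```python
import re

# Hypothetical krb5.conf content; kdc.example.com is a made-up host name.
SAMPLE_KRB5 = """
[libdefaults]
    default_realm = name.example.com

[realms]
    name.example.com = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }
"""


def read_kerberos_settings(text):
    """Extract the default realm and the first KDC entry from krb5.conf text."""
    realm = re.search(r"^\s*default_realm\s*=\s*(\S+)", text, re.MULTILINE)
    kdc = re.search(r"^\s*kdc\s*=\s*(\S+)", text, re.MULTILINE)
    return {
        "Default realm": realm.group(1) if realm else None,
        "KDC host": kdc.group(1) if kdc else None,
    }


print(read_kerberos_settings(SAMPLE_KRB5))
```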
Sync properties
- Column delimiter
-
Specify the column delimiter character.
Mandatory field.
Example:
1
Valid Values:
Any ASCII character
Default:
ASCII 01 (SOH)
- Enable archive on success
-
Specify whether to archive the data after the sync succeeds.
Optional field.
Example:
No
Valid Values:
Yes, No
Default:
No
- Delete on success
-
Specify whether to delete the data after the execution succeeds.
Optional field.
Example:
No
Valid Values:
Yes, No
Default:
No
- Compression
-
Specify whether the file is compressed and, if so, the compression type.
Mandatory field.
Example:
gzip
Valid Values:
None, gzip
Default:
None
- Enclosing character
-
Specify whether the text columns in the source data need to be enclosed in quotes.
Optional field.
Example:
Single
Valid Values:
Single, Double, Empty
Default:
Double
- Escape character
-
Specify the escape character if using a text qualifier in the source data.
Optional field.
Example:
\\
Valid Values:
Any ASCII character
Default:
Empty
- Null value
-
Specify the string literal that represents NULL values in data. During the data load, the column value that matches this string loads as NULL into ThoughtSpot.
Optional field.
Example:
NULL
Valid Values:
NULL
Default:
NULL
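To see how the column delimiter, enclosing character, escape character, and null value interact, the sketch below parses delimited text with Python's csv module. This only illustrates the concepts and is not how DataFlow or tsload processes files. The function `parse_rows` and the sample data are hypothetical, and its defaults follow the defaults listed above (SOH delimiter, double-quote enclosing character, no escape character, NULL as the null value).

```python
import csv
import io


def parse_rows(text, delimiter="\x01", quotechar='"', escapechar=None, null_value="NULL"):
    """Split delimited text into rows, mapping the null literal to None."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quotechar,
        escapechar=escapechar,
    )
    # Any cell equal to the null literal becomes None, as a stand-in for NULL.
    return [[None if cell == null_value else cell for cell in row] for row in reader]


# Two rows with three columns each, separated by ASCII 01 (SOH).
sample = '1\x01"Smith, Jane"\x01NULL\n2\x01"Lee"\x01Paris\n'
rows = parse_rows(sample)
print(rows)  # [['1', 'Smith, Jane', None], ['2', 'Lee', 'Paris']]
```

Note that this sketch also maps a quoted "NULL" to None, which a real loader may or may not do.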
- Date style
-
Specifies how to interpret the date format.
Optional field.
Example:
YMD
Valid Values:
YMD, MDY, DMY, DMONY, MONDY, Y2MD, MDY2, DMY2, DMONY2, MONDY2
Default:
YMD
- Date delimiter
-
Specifies the separator used in the date format. Only the default delimiter is supported.
Optional field.
Example:
-
Valid Values:
Any printable ASCII character
Default:
-
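The date style and date delimiter together describe the layout of a date string. The sketch below shows one possible reading of the style names, mapped to Python strptime formats. The interpretation of the tokens is an assumption and is not confirmed by this page: Y is taken as a four-digit year, Y2 as a two-digit year, M as a numeric month, MON as an abbreviated month name, and D as the day. The function `parse_date` is hypothetical.

```python
from datetime import datetime

# Assumed meaning of each style token; not taken from ThoughtSpot documentation.
TOKEN_FORMATS = {"Y": "%Y", "Y2": "%y", "M": "%m", "MON": "%b", "D": "%d"}

# Each date style from the list of valid values, split into its assumed tokens.
STYLE_TOKENS = {
    "YMD": ["Y", "M", "D"],
    "MDY": ["M", "D", "Y"],
    "DMY": ["D", "M", "Y"],
    "DMONY": ["D", "MON", "Y"],
    "MONDY": ["MON", "D", "Y"],
    "Y2MD": ["Y2", "M", "D"],
    "MDY2": ["M", "D", "Y2"],
    "DMY2": ["D", "M", "Y2"],
    "DMONY2": ["D", "MON", "Y2"],
    "MONDY2": ["MON", "D", "Y2"],
}


def parse_date(value, style="YMD", delimiter="-"):
    """Parse a date string under the assumed meaning of the given style."""
    fmt = delimiter.join(TOKEN_FORMATS[token] for token in STYLE_TOKENS[style])
    return datetime.strptime(value, fmt).date()


print(parse_date("2021-03-04"))            # default style YMD, delimiter "-"
print(parse_date("03-04-2021", "MDY"))     # same date under the MDY reading
```

Under this reading, the same string can mean different dates depending on the style, which is why the style must match the source data.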
- Time style
-
Specifies the format of the time portion in the data.
Optional field.
Example:
24HOUR
Valid Values:
12 HOUR
- Time delimiter
-
Specifies the character used to separate the time components. Only the default delimiter is supported.
Optional field.
Example:
:
Valid Values:
Any printable ASCII character
Default:
:
- TS load options
-
Specify additional parameters passed with the tsload command. The format for these parameters is: --<param_1_name> <optional_param_1_value>
Optional field.
Example:
--max_ignored_rows 0
Valid Values:
--null_value ""
--escape_character ""
--max_ignored_rows 0
Default:
--max_ignored_rows 0
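The option format above, a `--name` followed by an optional value, can be checked before it is entered in the field. The sketch below splits such a string into name and value pairs with Python's shlex module. The function `parse_load_options` is hypothetical and does not call tsload or validate that an option actually exists.

```python
import shlex


def parse_load_options(option_string):
    """Split a '--name value' option string into a dict of name -> value."""
    tokens = shlex.split(option_string)
    options = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if not name.startswith("--"):
            raise ValueError(f"expected a --name token, got {name!r}")
        # The value is optional: take the next token only if it is not another flag.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
            options[name[2:]] = tokens[i + 1]
            i += 2
        else:
            options[name[2:]] = None
            i += 1
    return options


parsed = parse_load_options('--max_ignored_rows 0 --null_value ""')
print(parsed)  # {'max_ignored_rows': '0', 'null_value': ''}
```

Using shlex keeps quoted values such as "" intact as an empty string instead of dropping them.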