HDFS connection reference
Learn about the fields used to create an HDFS connection with ThoughtSpot DataFlow.
Here is a list of the fields for an HDFS connection in ThoughtSpot DataFlow. You need specific information to establish a seamless and secure connection.
Connection properties
- Connection name
-
Name your connection. Mandatory field.
- Example:
-
HDFSConnection
- Connection type
-
Choose the Google BigQuery connection type. Mandatory field.
- Example:
-
HDFS
- User
-
Specify the user to connect to HDFS file system. This user must have data access privileges. Mandatory field.
For Hive security with simple, LDAP, and SSL authentication only.- Example:
-
user1
- Hadoop distribution
-
Provide the distribution of Hadoop being connected to. Optional field.
- Example:
-
Hortonworks
- Valid Values:
-
CDH, Hortonworks, EMR
- Default:
-
CDH
- Distribution version
-
Provide the version of the Distribution chosen above. Optional field.
- Example:
-
2.6.5
- Valid Values:
-
Valid distribution number of the Hadoop distribution
- Default:
-
6.3.x
- Hadoop conf path
-
By default, the system picks the Hadoop configuration files from the HDFS. To override, specify an alternate location. Applies only when using configuration settings that are different from global Hadoop instance settings. Optional field.
- Example:
-
/app/path
- Other notes:
-
An instance where this could be needed is, if the hdfs is encrypted and the location of key files and password decrypt the files is available in the hadoop config files.
- DFS HA configured
-
Enables High Availability for HDFS Optional field.
- DFS name service
-
The logical name of given to HDFS nameservice. Mandatory field.
For HDFS HA only.- Example:
-
lahdfs
- Other notes:
-
It is available in
hdfs-site.xml
and defined asdfs.nameservices
.
- DFS name node IDs
-
Provides the list of NameNode IDs separated by comma and DataNodes use this property to determine all the NameNodes in the cluster. XML property name is
dfs.ha.namenodes.dfs.nameservices
. Mandatory field.
For HDFS HA only.- Example:
-
nn1,nn2
- RPC address for namenode1
-
To specify the fully-qualified RPC address for each listed NameNode and defined as
dfs.namenodes.rpc-address.dfs.nameservices.name_node_ID_1>
. Mandatory field.
For HDFS HA only.- Example:
-
www.example1.com:1234
- RPC address for namenode2
-
To specify the fully-qualified RPC address for each listed NameNode and defined as
dfs.namenode.rpc-address.dfs.nameservices.name_node_ID_2
. Mandatory field.
For HDFS HA only.- Example:
-
www.example2.com:1234
- DFS host
-
Specify the DFS hostname or the IP address. Mandatory field.
For when not using HDFS HA.
- DFS port
-
Specify the associated DFS port. Mandatory field.
For when not using HDFS HA.
- Default HDFS location
-
Specify the location for the default source/target location. Mandatory field.
- Example:
-
/tmp
- Temp HDFS location
-
Specify the location for creating temp directory. Mandatory field.
- Example:
-
/tmp
- DFS security authentication
-
Select the type of security being enabled. Mandatory field.
- Example:
-
Kerberos
- Valid Values:
-
Simple, Kerberos
- Default:
-
simple
- Hadoop RPC protection
-
Hadoop cluster administrators control the quality of protection using the configuration parameter hadoop.rpc.protection. Mandatory field.
For DFS security authentication with Kerberos only.- Example:
-
none
- Valid Values:
-
None, authentication, integrity, privacy
- Default:
-
authentication
- Other notes:
-
It is available in
core-site.xml
.
- Hive principal
-
Principal for authenticating hive services. Mandatory field.
- Example:
-
hive/[email protected]
- Other notes:
-
It is available in
hive-site.xml
.
- User principal
-
To authenticate via a key-tab you must have supporting key-tab file which is generated by Kerberos Admin and also requires the user principal associated with Key-tab (Configured while enabling Kerberos). Mandatory field.
- Example:
- User keytab
-
To authenticate via a key-tab you must have supporting key-tab file which is generated by Kerberos Admin and also requires the user principal associated with Key-tab (Configured while enabling Kerberos). Mandatory field.
- Example:
-
/app/keytabs/labuser.keytab
- KDC host
-
Specify KDC Host Name where as KDC (Kerberos Key Distribution Center) is a service than runs on a domain controller server role (Configured from Kerberos configuration-/etc/krb5.conf). Mandatory field.
- Example:
- Default realm
-
A Kerberos realm is the domain over which a Kerberos authentication server has the authority to authenticate a user, host or service (Configured from Kerberos configuration-/etc/krb5.conf). Mandatory field.
- Example:
-
name.example.com
Sync properties
- Column delimiter
-
Specify the column delimiter character. Mandatory field.
- Example:
-
1
- Valid Values:
-
Any ASCII character
- Default:
-
ASCII 01 (SOH)
- Enable archive on success
-
Specify if data needs to be archived once it is succeeded. Optional field.
- Example:
-
No
- Valid Values:
-
Yes
- Default:
-
No
- Delete on success
-
Specify if data needs to be deleted after execution is successful. Optional field.
- Example:
-
No
- Valid Values:
-
Yes
- Default:
-
No
- Compression
-
Specify this if the file is compressed and what kind of compressed file it is. Mandatory field.
- Example:
-
gzip
- Valid Values:
-
None, gzip
- Default:
-
None
- Enclosing character
-
Specify if the text columns in the source data needs to be enclosed in quotes. Optional field.
- Example:
-
Single
- Valid Values:
-
Single, Double, Empty
- Default:
-
Double
- Escape character
-
Specify the escape character if using a text qualifier in the source data. Optional field.
- Example:
-
\\
- Valid Values:
-
Any ASCII character
- Default:
-
Empty
- Null value
-
Specify the string literal that represents NULL values in data. During the data load, the column value that matches this string loads as NULL into ThoughtSpot. Optional field.
- Example:
-
NULL
- Valid Values:
-
NULL
- Default:
-
NULL
- Date style
-
Specifies how to interpret the date format. Optional field.
- Example:
-
YMD
- Valid Values:
-
YMD, MDY, DMY, DMONY, MONDY, Y2MD, MDY2, DMY2, DMONY2, MONDY2
- Default:
-
YMD
- Date delimiter
-
Specifies the separator used in the date format (only default delimiter is supported). Optional field.
- Example:
-
-
- Valid Values:
-
Any printable ASCII character
- Default:
-
-
- Time style
-
Specifies the format of the time portion in the data. Optional field.
- Example:
-
24HOUR
- Valid Values:
-
12 HOUR
- Time delimiter
-
Specifies the character used as separate the time components. (Only default delimiter is supported) Optional field.
- Example:
-
:
- Valid Values:
-
Any printable ASCII character
- Default:
-
:
- Max ignored rows
-
Abort the transaction after encountering 'n' ignored rows. Optional field.
- Example:
-
0
- Valid Values:
-
Any numeric value
- Default:
-
0
- tsload options
-
Specifies the parameters passed with the
tsload
command, in addition to the commands already included by the application. The format for these parameters is:<param_1_name> = <param_1_value>
- Example:
-
date_time_format = %Y-%m-%d date_format = %Y-%m-%d;time_format = %H:%M:%S
- Valid Values:
-
null_value = NULL max_ignored_rows = 0
- Default:
-
max_ignored_rows = 0
Related information