HDFS DataFlow connection reference

Learn about the fields used to create an HDFS connection with ThoughtSpot DataFlow.

Here is a list of the fields for an HDFS connection in ThoughtSpot DataFlow. You need specific information to establish a seamless and secure connection.

Connection properties

Connection name

Name your connection. Mandatory field.

Example:

HDFSConnection

Connection type

Choose the HDFS connection type. Mandatory field.

Example:

HDFS

User

Specify the user that connects to the HDFS file system. This user must have data access privileges. Mandatory field.
For simple, LDAP, and SSL authentication only.

Example:

user1

Hadoop distribution

Specify the Hadoop distribution you are connecting to. Mandatory field.

Example:

Hortonworks

Valid Values:

CDH, Hortonworks, EMR

Default:

CDH

Distribution version

Specify the version of the distribution chosen above. Mandatory field.

Example:

2.6.5

Valid Values:

Any valid version number of the chosen Hadoop distribution

Default:

6.3.x

Hadoop conf path

By default, the system picks up the Hadoop configuration files from HDFS. To override, specify an alternate location. This applies only when your configuration settings differ from the global Hadoop instance settings. Mandatory field.

Example:

/app/path

Other notes:

For example, an alternate path may be needed if HDFS is encrypted and the location of the key files, along with the password to decrypt them, is available in the Hadoop configuration files.

HDFS HA configured

Enables High Availability (HA) for HDFS. Optional field.

HDFS name service

The logical name given to the HDFS nameservice. Mandatory field.
For HDFS HA only.

Example:

lahdfs

Other notes:

It is available in hdfs-site.xml and defined as dfs.nameservices.

HDFS name node IDs

Specify the list of NameNode IDs, separated by commas. DataNodes use this property to determine all the NameNodes in the cluster. The XML property name is dfs.ha.namenodes.<nameservice>. Mandatory field.
For HDFS HA only.

Example:

nn1,nn2

RPC address for namenode1

Specify the fully-qualified RPC address for the first listed NameNode, defined as dfs.namenode.rpc-address.<nameservice>.<name_node_ID_1>. Mandatory field.
For HDFS HA only.

Example:

www.example1.com:1234

RPC address for namenode2

Specify the fully-qualified RPC address for the second listed NameNode, defined as dfs.namenode.rpc-address.<nameservice>.<name_node_ID_2>. Mandatory field.
For HDFS HA only.

Example:

www.example2.com:1234
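
Putting the HA fields above together, the corresponding hdfs-site.xml entries look like the following sketch. It uses the example values from above (nameservice lahdfs, NameNode IDs nn1 and nn2, and the example RPC addresses); substitute the values from your own cluster.

```xml
<!-- Logical name of the HDFS nameservice -->
<property>
  <name>dfs.nameservices</name>
  <value>lahdfs</value>
</property>

<!-- NameNode IDs within the nameservice -->
<property>
  <name>dfs.ha.namenodes.lahdfs</name>
  <value>nn1,nn2</value>
</property>

<!-- Fully-qualified RPC address for each NameNode -->
<property>
  <name>dfs.namenode.rpc-address.lahdfs.nn1</name>
  <value>www.example1.com:1234</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.lahdfs.nn2</name>
  <value>www.example2.com:1234</value>
</property>
```

These are the same properties the connection fields read from, so the values you enter should match what your cluster's hdfs-site.xml already contains.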

DFS host

Specify the DFS hostname or the IP address. Mandatory field.
For when not using HDFS HA.

DFS port

Specify the associated DFS port. Mandatory field.
For when not using HDFS HA.

Default HDFS location

Specify the default source/target location. Mandatory field.

Example:

/tmp

Temp HDFS location

Specify the location for creating the temp directory. Mandatory field.

Example:

/tmp

HDFS security authentication

Select the type of security being enabled. Mandatory field.

Example:

Kerberos

Valid Values:

Simple, Kerberos

Default:

Simple

Hadoop RPC protection

Hadoop cluster administrators control the quality of protection using the configuration parameter hadoop.rpc.protection. Mandatory field.
For HDFS security authentication with Kerberos only.

Example:

none

Valid Values:

None, authentication, integrity, privacy

Default:

authentication

Other notes:

It is available in core-site.xml.
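
As a sketch, the corresponding entry in core-site.xml looks like this (the value shown is the default, authentication):

```xml
<property>
  <name>hadoop.rpc.protection</name>
  <value>authentication</value>
</property>
```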

Hive principal

Principal for authenticating Hive services. Mandatory field.

Example:

hive/host@name.example.com

Other notes:

It is available in hive-site.xml.

User principal

To authenticate with a keytab, you need a supporting keytab file generated by the Kerberos admin, along with the user principal associated with that keytab (configured while enabling Kerberos). Specify the user principal here. Mandatory field.

User keytab

Specify the path to the keytab file generated by the Kerberos admin and associated with the user principal above (configured while enabling Kerberos). Mandatory field.

Example:

/app/keytabs/labuser.keytab

KDC host

Specify the KDC hostname. The KDC (Kerberos Key Distribution Center) is a service that runs on a domain controller server role; it is set in the Kerberos configuration file, /etc/krb5.conf. Mandatory field.

Default realm

A Kerberos realm is the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service. It is set in the Kerberos configuration file, /etc/krb5.conf. Mandatory field.

Example:

name.example.com

Sync properties

Column delimiter

Specify the column delimiter character. Mandatory field.

Example:

1

Valid Values:

Any ASCII character

Default:

ASCII 01 (SOH)
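
To illustrate the default delimiter, ASCII 01 (SOH, written \x01 in most languages), here is a minimal Python sketch that joins and splits a row with it. The field values are made up for illustration:

```python
SOH = "\x01"  # ASCII 01 (Start of Heading), the default column delimiter

row = ["1001", "Alice", "2023-01-15"]  # hypothetical record
line = SOH.join(row)      # serialize one row with the SOH delimiter
fields = line.split(SOH)  # parse it back into columns

print(fields == row)  # True: the round trip preserves the fields
```

SOH is a common default for delimited loads because, unlike a comma or tab, it almost never occurs inside real column values.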

Enable archive on success

Specify whether to archive data after a successful load. Optional field.

Example:

No

Valid Values:

Yes, No

Default:

No

Delete on success

Specify whether to delete data after a successful execution. Optional field.

Example:

No

Valid Values:

Yes, No

Default:

No

Compression

Specify whether the file is compressed and, if so, the type of compression. Mandatory field.

Example:

gzip

Valid Values:

None, gzip

Default:

None

Enclosing character

Specify whether the text columns in the source data need to be enclosed in quotes. Optional field.

Example:

Single

Valid Values:

Single, Double, Empty

Default:

Double

Escape character

Specify the escape character if using a text qualifier in the source data. Optional field.

Example:

\\

Valid Values:

Any ASCII character

Default:

Empty

Null value

Specify the string literal that represents NULL values in data. During the data load, the column value that matches this string loads as NULL into ThoughtSpot. Optional field.

Example:

NULL

Valid Values:

NULL

Default:

NULL

Date style

Specifies how to interpret the date format. Optional field.

Example:

YMD

Valid Values:

YMD, MDY, DMY, DMONY, MONDY, Y2MD, MDY2, DMY2, DMONY2, MONDY2

Default:

YMD

Date delimiter

Specifies the separator used in the date format (only default delimiter is supported). Optional field.

Example:

-

Valid Values:

Any printable ASCII character

Default:

-

Time style

Specifies the format of the time portion in the data. Optional field.

Example:

24HOUR

Valid Values:

12HOUR, 24HOUR

Time delimiter

Specifies the character used to separate the time components (only the default delimiter is supported). Optional field.

Example:

:

Valid Values:

Any printable ASCII character

Default:

:
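
As an illustration of the default date and time styles, here is a Python sketch that produces a YMD date with the default "-" delimiter and a 24-hour time with the default ":" delimiter. The timestamp is made up:

```python
from datetime import datetime

ts = datetime(2023, 1, 15, 13, 45, 30)  # hypothetical timestamp

date_part = ts.strftime("%Y-%m-%d")  # YMD style with the "-" delimiter
time_part = ts.strftime("%H:%M:%S")  # 24-hour style with the ":" delimiter

print(date_part)  # 2023-01-15
print(time_part)  # 13:45:30
```

Source data formatted this way matches the defaults above, so no style changes are needed for such files.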

TS load options

Specifies the parameters passed with the tsload command, in addition to the commands already included by the application. The format for these parameters is:

--<param_1_name> <optional_param_1_value>
--<param_2_name> <optional_param_2_value>

Optional field.

Example:

--max_ignored_rows 0

Valid Values:

--null_value ""
--escape_character ""
--max_ignored_rows 0

Default:

--max_ignored_rows 0
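
Although this field is free text, the parameters ultimately become command-line tokens for tsload. As a rough sketch (not DataFlow's actual implementation), Python's shlex module shows how a multi-line parameter string tokenizes, including quoted empty values:

```python
import shlex

# Hypothetical extra tsload parameters, one per line as entered in the field
extra = '--max_ignored_rows 0\n--null_value ""'

# shlex.split is quote-aware and treats newlines as whitespace
tokens = shlex.split(extra)
print(tokens)  # ['--max_ignored_rows', '0', '--null_value', '']
```

Note that a quoted empty string ("") survives as an empty token, which is how a flag like --null_value can be given an empty value.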
