Action nodes¶

Action nodes in Integrator define the tasks that collect, process, and ingest raw data in the Hadoop cluster. The following Hadoop jobs and individual system tasks (Java, shell, and so on) are supported:

Sqoop

MR

EXEC

Java

HIVE Query

SSH

Spark

Sub-Workflow

DistCp

HDFS

Done

Druid

Sqoop¶

Retrieves data from an RDB (relational database) or runs a simple query.

MR¶

Runs a MapReduce job from a JAR file in a local directory.

EXEC¶

Runs local executable files such as Python or shell scripts.
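As an illustration of the kind of Python file an EXEC node might run, here is a minimal sketch. It assumes the input and output paths arrive as command-line arguments and that a non-zero exit code marks the task as failed; neither behavior is confirmed by this page. The names `clean_records` and `clean_records.py` are hypothetical.

```python
import sys
from pathlib import Path


def clean_records(src: Path, dst: Path) -> int:
    """Copy the non-empty lines of src to dst (trimmed) and return the count kept."""
    lines = src.read_text(encoding="utf-8").splitlines()
    kept = [line.strip() for line in lines if line.strip()]
    # End the file with a newline only when there is at least one record.
    dst.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return len(kept)


def main(argv: list) -> int:
    """Entry point: argv is expected to be [script, input_path, output_path]."""
    if len(argv) != 3:
        print("usage: clean_records.py <input> <output>", file=sys.stderr)
        return 2  # assumption: a non-zero code signals failure to the caller
    count = clean_records(Path(argv[1]), Path(argv[2]))
    print(f"kept {count} records")
    return 0
```

When saved as a script, ending the file with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard would return the exit code to whatever launched it.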

Java¶

Runs a Java class. The class must define a main method.

HIVE Query¶

Runs a Hive query.

SSH¶

Runs a command on a remote server. Passwordless SSH login must be set up for the remote server beforehand.
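One way to confirm that passwordless login is in place before relying on it is to attempt a connection with password prompts disabled. The sketch below uses the standard OpenSSH client options `BatchMode=yes` and `ConnectTimeout`. The host and user names are hypothetical, and the check is independent of how Integrator itself opens the connection.

```python
import subprocess


def build_ssh_check(host: str, user: str, timeout_s: int = 5) -> list:
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never ask for a password or passphrase
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly if the host is unreachable
        f"{user}@{host}",
        "true",                               # harmless remote command that exits with 0
    ]


def passwordless_login_works(host: str, user: str) -> bool:
    """Return True if key-based login succeeds without any interaction."""
    try:
        result = subprocess.run(
            build_ssh_check(host, user), capture_output=True, check=False
        )
    except FileNotFoundError:
        return False  # the ssh client is not installed on this machine
    return result.returncode == 0
```

For example, `passwordless_login_works("etl01.example.com", "integrator")` returns `False` if the server would have asked for a password.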

Spark¶

Runs a Spark job.

Sub-Workflow¶

Links to an existing workflow. When several workflows are run together, each workflow is defined as a task.

DistCp¶

Copies files from the source Hadoop cluster to the target Hadoop cluster.
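For reference, the same kind of copy can be expressed with Hadoop's `hadoop distcp` command-line tool. The sketch below only assembles that command. The NameNode addresses and paths are hypothetical, and how the DistCp node invokes the tool internally is not described here.

```python
import shlex


def build_distcp_command(source_uri: str, target_uri: str, update: bool = False) -> list:
    """Assemble a `hadoop distcp` invocation from a source URI to a target URI."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ on the target.
        cmd.append("-update")
    cmd += [source_uri, target_uri]
    return cmd


# Hypothetical cluster addresses, used for illustration only.
command = build_distcp_command(
    "hdfs://nn-source:8020/data/raw",
    "hdfs://nn-target:8020/data/raw",
    update=True,
)
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.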

HDFS¶

Used to manage Hadoop files.

Done¶

Creates a Done file upon completion.
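A done file is commonly an empty marker that downstream jobs check before reading a directory. The sketch below shows that general pattern on a local filesystem. The file name `_DONE` and both function names are assumptions for illustration; the actual name and location of the file that the Done node writes are not stated here.

```python
from pathlib import Path


def write_done_marker(output_dir: Path, name: str = "_DONE") -> Path:
    """Create an empty marker file signalling that output_dir is complete."""
    marker = output_dir / name  # "_DONE" is a hypothetical file name
    marker.touch(exist_ok=True)
    return marker


def is_complete(output_dir: Path, name: str = "_DONE") -> bool:
    """Return True once the marker file exists in output_dir."""
    return (output_dir / name).is_file()
```

A consumer would call `is_complete` and skip or wait on directories where the marker is not yet present, which prevents it from reading partially written output.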

Druid¶

Used for incremental ingestion of data into the Druid engine.