Action nodes

Action nodes in Integrator define tasks involved in collecting, processing, and ingesting raw data in the Hadoop cluster. The supported Hadoop jobs and individual system tasks (Java, Shell, etc.) are as follows:

Sqoop

Retrieves data from an RDB (relational database) or runs a simple query.
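
Assuming the node wraps the standard Sqoop CLI, an import might look like the following sketch; the JDBC URL, credentials, table name, and target directory are all placeholders:

```shell
# Illustrative only: import a table from a relational database into HDFS.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user \
  --password-file /user/etl/.password \
  --table orders \
  --target-dir /data/raw/orders
```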

MR

Runs a MapReduce job packaged as a JAR file in a local directory.
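
Assuming the node ultimately invokes the standard `hadoop jar` launcher, a run might look like the following; the JAR name, main class, and paths are placeholders:

```shell
# Illustrative only: run a MapReduce job from a locally stored JAR.
hadoop jar ./wordcount.jar com.example.WordCount /data/input /data/output
```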

EXEC

Runs local scripts, such as Python or shell scripts.
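
A hypothetical local script that an EXEC node might run could be as simple as the following; the input path is a placeholder:

```shell
#!/bin/sh
# Write sample input, then count its lines -- a stand-in for
# whatever local processing the EXEC node is configured to run.
printf 'alpha\nbeta\ngamma\n' > /tmp/exec_input.txt
wc -l < /tmp/exec_input.txt
```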

Java

Runs a Java class. (Note that the class must define a main method.)

HIVE Query

Runs a HIVE query.
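
Assuming the node passes the query to the standard Hive CLI, a run might look like the following; the table and column names are placeholders:

```shell
# Illustrative only: run a HiveQL statement.
hive -e "SELECT dt, COUNT(*) AS cnt FROM logs GROUP BY dt"
```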

SSH

Runs a command remotely. Note that SSH passwordless login must be set up for the remote server.
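
The passwordless login prerequisite can be satisfied with the standard OpenSSH tooling, sketched below; the remote host and user are placeholders:

```shell
# Run once on the workflow host: generate a key without a passphrase
# and install it on the remote server.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id etl_user@remote.example.com
# After setup, the SSH node can run commands non-interactively:
ssh etl_user@remote.example.com "hostname"
```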

Spark

Runs a Spark job.
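
Assuming the node ultimately calls `spark-submit`, a job submission might look like the following; the master, main class, JAR, and paths are placeholders:

```shell
# Illustrative only: submit a Spark application to YARN in cluster mode.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.DailyAggregation \
  ./daily-aggregation.jar /data/raw/orders /data/agg/orders
```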

Sub-Workflow

Links existing workflows together. When multiple workflows run in combination, each workflow is defined as a task.

DistCp

Copies files from the source Hadoop cluster to the target Hadoop cluster.
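
Assuming the node wraps the standard `hadoop distcp` tool, a copy might look like the following; the NameNode addresses and paths are placeholders:

```shell
# Illustrative only: copy a directory from one cluster to another.
hadoop distcp hdfs://source-nn:8020/data/raw hdfs://target-nn:8020/data/raw
```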

HDFS

Manages files in HDFS, such as creating, moving, and deleting files and directories.
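
The kinds of file-management operations such a node covers can be expressed with the standard `hdfs dfs` commands, sketched below; the paths are placeholders:

```shell
# Illustrative only: typical HDFS file-management operations.
hdfs dfs -mkdir -p /data/staging
hdfs dfs -mv /data/incoming/part-* /data/staging/
hdfs dfs -rm -r -skipTrash /data/tmp
```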

Done

Creates a Done file upon completion.
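
A Done (marker) file is typically just an empty file written next to the job's output so downstream consumers know the data is complete. The directory and file name below (`_DONE`) are illustrative, not fixed by Integrator:

```shell
# Create an empty marker file once the job finishes.
OUTPUT_DIR=/tmp/integrator_demo
mkdir -p "$OUTPUT_DIR"
touch "$OUTPUT_DIR/_DONE"
```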

Druid

Used for incremental ingestion of data into the Druid engine.