Manage a dataflow
A dataflow is the unit in which datasets are processed. A single dataflow can contain multiple datasets, and a dataset must belong to a dataflow before transformation rules can be applied to it. Within a dataflow, a dataset can be related to other datasets through operations such as “join” or “union”.
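To make the “join” and “union” relationships concrete, here is a minimal Python sketch. It is purely illustrative: the `union` and `join` helpers, the dataset names, and the dictionary standing in for a dataflow are all hypothetical and are not part of the product's API. In the product, these relationships are defined through transformation rules.

```python
# Illustrative only: every name here is hypothetical, not the product's API.

def union(left, right):
    """Stack the rows of two datasets that share the same columns."""
    return left + right


def join(left, right, key):
    """Inner join: pair each left row with every right row that has the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# A plain dict stands in for a dataflow: it groups the datasets being processed.
dataflow = {
    "sales_q1": [{"product_id": 1, "amount": 120}, {"product_id": 2, "amount": 80}],
    "sales_q2": [{"product_id": 1, "amount": 95}],
    "products": [{"product_id": 1, "name": "keyboard"}, {"product_id": 2, "name": "mouse"}],
}

# "union" appends one dataset's rows to another's.
all_sales = union(dataflow["sales_q1"], dataflow["sales_q2"])

# "join" enriches each sales row with the matching product row.
labeled_sales = join(all_sales, dataflow["products"], key="product_id")

for row in labeled_sales:
    print(row)
```

In this sketch, `labeled_sales` depends on all three source datasets. That is the kind of dependency a dataflow keeps track of.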
The dataflow details page shows the dependencies among all datasets in a dataflow and the transformation rules applied to each dataset.
The following subsections cover the processes involved in defining a dataflow, such as adding a dataset, editing transformation rules, and creating a data snapshot with transformation results.
To open the Dataflow menu, go to MANAGEMENT > Data Preparation > Dataflow in the left-hand panel of the main screen.
- Add a dataset
- Edit rules
- Rule types
- Function list
  - length
  - if
  - isnull
  - isnan
  - upper
  - lower
  - trim
  - ltrim
  - rtrim
  - substring
  - concat
  - concat_ws
  - year
  - month
  - day
  - hour
  - minute
  - second
  - millisecond
  - now
  - add_time
  - sum
  - avg
  - max
  - min
  - count
  - math.abs
  - math.acos
  - math.asin
  - math.atan
  - math.cbrt
  - math.ceil
  - math.cos
  - math.cosh
  - math.exp
  - math.expm1
  - math.getExponent
  - math.round
  - math.signum
  - math.sin
  - math.sinh
  - math.sqrt
  - math.tan
  - math.tanh
  - time_diff
  - timestamp
  - row_number
  - rolling_sum
  - rolling_avg
  - lag
  - lead
  - ismismatched
  - contains
  - startswith
  - endswith
- Create a data snapshot