Create a data source

This section explains the process of ingesting various types of source data into the Metatron engine and converting them into data sources.

To create a data source, click the + New button at the top right of the Data Source home screen.

../../_images/create_datasource_2.en.png

Then, select the type of source data.

../../_images/create_datasource_3.en.png

Create a data source from a file

Creates a data source from a file stored on your local PC.

  1. On the source data type selection page, select File.

  2. Select a file from your local PC to use as a data source. You can either click the Import button and select the file, or drag and drop the file into the box. Once a file is selected, click Next.

    ../../_images/create_datasource_file_1.en.png
  3. From the file, select the sheet to be included in the data source.

    Note

    If the “No preview data” message is shown even though the file contains data, check whether the Column delimiter and Line Separator are configured correctly. In this example, the Line Separator must be set to “\r”, the carriage return character used in MS Windows line endings.

    ../../_images/create_datasource_file_3.en.png
    • File name: Name of the imported file. You can replace it with another file.

    • File sheet list: Displays the sheets included in the imported file. Select the sheet from which you want to create a data source.

    • File sheet name: Name of the currently selected sheet.

    • Size: Size of the imported file.

    • Column: Number of columns in the imported file.

    • Row: Displayed number of rows and total number of rows in the imported file. Enter the number of rows to be displayed on the page.

    • Type: Displays how many data types are recognized from the columns. The data type of each column can be modified later.

    • Use the first row as the head column: Select the check box to use the first row of the file as column headers. If you don’t select it, a new row is inserted as a column header row.

  4. Configure the schema of the data source.

    ../../_images/create_datasource_file_4.en.png
    • Search by column header: Searches the imported file for columns by name.

    • Bin button (top right): Deletes the selected columns.

    • Role: Displays all, dimension, or measure columns from the imported file.

    • Recommended filters: Displays columns to which a top-priority filter is applied.

    • Type: Filters the columns in the imported file by field type.

    • Column list section: Lists columns filtered by specified criteria. Once you have selected columns, a panel appears at the bottom of the screen. After selecting your desired batch action in the panel, click Apply to perform the batch action on the selected columns.

    • Individual column settings section: Sets the attributes of the column selected in the column list. The Missing option determines how null values in the column are handled:

      • Replace with: Replaces the nulls with the value you enter.

      • Discard: Discards the nulls.

      • Do not set: Leaves the nulls as they are. However, nulls in the timestamp column are always discarded.

    • Timestamp setting: Determines how to timestamp each row. You can either designate an existing time-type column as a timestamp column, or create a new time-type column whose values are all timestamped with the current time.

      Note

      Metatron Druid is a time-series engine that requires a timestamp for each row when a data source is created.

    • Add column: If the data includes latitude and longitude columns, you can combine them into a new Point-type column. You can delete this column in the same way as any other column.

  5. Configure data source ingestion and click Next.

    ../../_images/create_datasource_file_10.en.png
    • Segment Granularity: In Druid, a data source is stored in multiple segments so that it can be processed across multiple nodes in a distributed cluster. This setting defines the time interval by which the data source is partitioned into segments.

    • Query Granularity: Defines the minimum time unit at which data can be queried. Data is aggregated per interval of this granularity, which makes queries return faster.

    • Rollup: “Data rollup” summarizes data based on its dimensions (for details on the concept, refer to Data roll-up). A summarization rule might sum all values in each column or apply a set of expressions such as profit=sales-expenses.

    • Advanced settings: Configures how data is ingested. Enter the settings in the text box in JSON format, for example:

      {"maxRowsInMemory": 75000,
       "maxOccupationInMemory": -1,
       "maxShardLength": -2147483648,
       "leaveIntermediate": false,
       "cleanupOnFailure": true,
       "overwriteFiles": false,
       "ignoreInvalidRows": false,
       "assumeTimeSorted": false}
      
  6. Confirm the information about the data set from the imported file, enter the Name and Description, and click Done to create the data source. As the source data is ingested into the internal Metatron engine (Druid), this may take from a few seconds to several minutes, depending on the amount of data.

    ../../_images/create_datasource_file_12.en.png
  7. After data ingestion is complete, you can check the status. In the example below, the status is set to ENABLED and a histogram is displayed.

    ../../_images/create_datasource_file_13.en.png
  8. In the Data tab, you can check the ingested data in the form of a table.

    ../../_images/create_datasource_file_15.en.png
  9. The newly created data source appears on the Data Source management home screen. While data is being ingested, its status is displayed as Disabled, as shown below; the status changes to Enabled once ingestion is complete, after which you can use the data source.

    ../../_images/create_datasource_file_16.en.png
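The Note in step 3 says that a wrong Line Separator can produce the “No preview data” message. The Python sketch below shows why, assuming a simple split-based parser; it is not Metatron's actual file parser, and the function name `parse_preview` and the generated `column1`, `column2` header names are hypothetical.

```python
def parse_preview(text, column_delimiter=",", line_separator="\n",
                  first_row_as_header=True):
    """Split raw file text into (header, data_rows). Conceptual sketch only."""
    # Break the text into lines with the configured separator, skipping empty lines.
    lines = [line for line in text.split(line_separator) if line]
    rows = [line.split(column_delimiter) for line in lines]
    if not rows:
        return [], []
    if first_row_as_header:
        # The first row becomes the header; everything after it is data.
        return rows[0], rows[1:]
    # No header row in the file: generate placeholder names (hypothetical scheme).
    width = max(len(row) for row in rows)
    return [f"column{i + 1}" for i in range(width)], rows


# A file whose lines end in "\r" only.
raw = "name,sales\rA,10\rB,20\r"

# With "\n" as the separator, the whole file is one "line". It is consumed as
# the header, leaving no data rows to preview.
print(parse_preview(raw))
# With "\r" as the separator, the rows are split correctly.
print(parse_preview(raw, line_separator="\r"))
```

The same effect explains why the preview can look empty even though the file is not: the data is still there, but it has been read as a single header line.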
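The Missing options in step 4 can be pictured as a filter applied to the rows before ingestion. The following Python sketch illustrates this under stated assumptions and is not Metatron's implementation: rows are plain dictionaries, the function `apply_missing` and the policy names "replace", "discard", and "none" (for Replace with, Discard, and Do not set) are hypothetical, and Discard is assumed to drop the whole row that contains the null.

```python
def apply_missing(rows, column, policy, replacement=None, timestamp_column=None):
    """Apply a Missing policy to one column of a list of dict rows (sketch).

    policy: "replace" (Replace with), "discard" (Discard) or "none" (Do not set).
    """
    if policy not in ("replace", "discard", "none"):
        raise ValueError(f"unknown policy: {policy}")
    result = []
    for row in rows:
        # Nulls in the timestamp column are always discarded, whatever the policy.
        if timestamp_column is not None and row.get(timestamp_column) is None:
            continue
        if row.get(column) is None:
            if policy == "replace":
                row = {**row, column: replacement}  # copy; the input is not modified
            elif policy == "discard":
                continue  # assumption: the whole row is dropped
            # policy == "none": the null is kept as-is
        result.append(row)
    return result
```

For example, `apply_missing(rows, "region", "replace", "unknown", timestamp_column="ts")` fills empty `region` values with "unknown" while still dropping any row whose `ts` value is null.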
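The three ingestion settings in step 5 interact: query granularity truncates timestamps, segment granularity decides which segment each row is stored in, and rollup merges rows that then share the same timestamp and dimension values. The Python sketch below illustrates this conceptually; it is not Druid code. It supports only HOUR and DAY granularities, assumes every measure is aggregated by summing, and the names `truncate` and `ingest` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def truncate(ts, granularity):
    """Truncate a datetime to the start of its HOUR or DAY interval."""
    if granularity == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "DAY":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def ingest(rows, dimensions, measures, query_gran, segment_gran, rollup=True):
    """Partition rows into segments and optionally roll them up (sketch)."""
    segments = defaultdict(list)
    for row in rows:
        # Query granularity: timestamp detail finer than this interval is dropped.
        ts = truncate(row["timestamp"], query_gran)
        # Segment granularity: each row goes to the segment covering its time.
        segments[truncate(ts, segment_gran)].append({**row, "timestamp": ts})
    if not rollup:
        return dict(segments)

    rolled = {}
    for start, seg_rows in segments.items():
        # Rollup: rows with the same timestamp and dimension values are merged
        # into one row whose measures are summed.
        totals = {}
        for row in seg_rows:
            key = (row["timestamp"],) + tuple(row[d] for d in dimensions)
            acc = totals.setdefault(key, dict.fromkeys(measures, 0))
            for m in measures:
                acc[m] += row[m]
        rolled[start] = [
            {"timestamp": key[0], **dict(zip(dimensions, key[1:])), **acc}
            for key, acc in totals.items()
        ]
    return rolled


orders = [
    {"timestamp": datetime(2019, 1, 1, 9, 15), "region": "A", "sales": 10},
    {"timestamp": datetime(2019, 1, 1, 9, 40), "region": "A", "sales": 5},
    {"timestamp": datetime(2019, 1, 1, 9, 50), "region": "B", "sales": 7},
    {"timestamp": datetime(2019, 1, 2, 8, 0), "region": "A", "sales": 1},
]
for start, seg in ingest(orders, ["region"], ["sales"], "HOUR", "DAY").items():
    print(start.date(), seg)
```

With HOUR query granularity and DAY segment granularity, the three orders from 1 January end up in one daily segment as two rows (region A with sales 15, region B with sales 7), and the order from 2 January forms a second segment. A coarser query granularity therefore means fewer stored rows, at the cost of time detail.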

Create a data source from a database

Creates a data source from an external database.

  1. On the source data type selection page, select Database.

  2. Enter the information required to connect to the database.

    ../../_images/create_datasource_db_2.en.png
    • Ingestion type: Select how to ingest data into the data source.

      • Ingested data: Data is ingested into and stored in the Metatron storage.

      • Linked data: Data is loaded from the linked database whenever it is needed.

    • Load a data connection: Automatically loads the access information of a database that is already registered as a data connection. You must still verify the connection by clicking the Validation check button.

    • DB type: Select the type of the database to be connected.

    • Host: Enter the hostname to connect to the database.

    • Port: Enter the port to connect to the database.

    • User name: Enter the username for the database.

    • Password: Enter the password for the database.

    • Validation check: Once all fields are filled out, the Test button becomes active. Click it to verify that the connection is valid; the result appears below the button.

  3. Select data. You can either select a table from the connected database, or write a query yourself.

    ../../_images/create_datasource_db_4.en.png
    • Table: Select a database and a table to display the table’s data. Once the data to be ingested is displayed, confirm it and click Next.

    • Query: Write a query to import the data you want, and click Run to display the data in the lower section. Confirm the data and click Next.

  4. The rest of the process is identical to Create a data source from a file. However, when creating a data source from a database, you must configure additional ingestion settings as follows.

    ../../_images/create_datasource_db_6.en.png
    • Ingest once: Ingests the data currently stored in the database a single time. If you select Limited record count, you can specify how many rows to ingest, starting from the first row.

      ../../_images/create_datasource_db_7.en.png
    • Ingest periodically: Ingests data from the database on a regular schedule.

      ../../_images/create_datasource_db_8.en.png
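The Query option in step 3 and the Limited record count option in step 4 can be tried out against a local database before configuring them in Metatron. The Python sketch below uses an in-memory SQLite database with a hypothetical `sales` table; the helpers `ingest_once` and `next_ingestion_times` are illustrative only, and the fixed-interval schedule is an assumption, since the actual scheduling options are those shown on the Ingest periodically screen.

```python
import sqlite3
from datetime import datetime, timedelta


def ingest_once(conn, query, limit=None):
    """Run the query once; return all rows, or only the first `limit` rows."""
    cursor = conn.execute(query)
    return cursor.fetchall() if limit is None else cursor.fetchmany(limit)


def next_ingestion_times(start, interval, count):
    """Return the next `count` run times of a fixed-interval schedule."""
    return [start + interval * i for i in range(1, count + 1)]


# Hypothetical source table with a date column that can serve as the timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2019-01-02", "A", 5), ("2019-01-01", "A", 10), ("2019-01-01", "B", 7)],
)

# Select only the needed columns and fix the order so "first rows" is well defined.
query = "SELECT order_date, region, amount FROM sales ORDER BY order_date, region"
print(ingest_once(conn, query, limit=2))
print(next_ingestion_times(datetime(2019, 1, 1), timedelta(hours=1), 3))
```

Because rows in a database table have no guaranteed order, “the first row” is only well defined when the query includes an ORDER BY clause. Including a date or time column in the query also gives you a candidate timestamp column for the schema step.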
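If the validation check in step 2 fails, a quick way to narrow down the cause is to confirm that the Host and Port accept TCP connections at all from a machine on the same network as Metatron. The Python sketch below checks network reachability only; it does not check the user name, password, or DB type, and the function name `is_reachable` is hypothetical.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and tries to connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, `is_reachable("localhost", 3306)` checks whether anything is listening on the default MySQL port. If this returns True but the Test button still reports a failure, the problem is more likely the credentials or the selected DB type than the network.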

Create a data source from a staging database

Creates a data source from Metatron’s internal Hive database.

  1. On the source data type selection page, select Staging DB.

  2. Select the database and table to connect; the table’s data is then displayed.

    ../../_images/create_datasource_stagingdb_2.en.png
  3. The rest of the process is identical to Create a data source from a database.

    ../../_images/create_datasource_stagingdb_3.en.png

Add a data source with the Metatron engine

Migrates a data source stored in a previous Metatron version.

  1. On the source data type selection page, select Metatron Engine.

  2. Data sources created in a previous version of Metatron are listed on the left, as shown below. Select the check boxes of the data sources you want to migrate to the current version.

    ../../_images/create_datasource_metatron_engine_2.en.png
  3. Click Done to migrate the selected data sources.

    ../../_images/create_datasource_metatron_engine_3.en.png