Detailed Install Guide

This document is a guide for installing Metatron and using its data preparation feature, starting from a fresh Linux environment (CentOS 7).

1. Install requirements

Run the following commands as root.

yum clean all && yum repolist && yum -y update
yum -y install tar unzip vi vim telnet apr apr-util apr-devel apr-util-devel net-tools curl openssl elinks locate python-setuptools
yum -y install java-1.8.0-openjdk-devel.x86_64
export JAVA_HOME=/usr/lib/jvm/java
export PATH=$PATH:$JAVA_HOME/bin
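Note that `export` only affects the current shell; to keep these settings across logins, add the two lines above to /etc/profile.d/ or ~/.bashrc. Before moving on, it is worth checking that Java is actually usable. The helper below is a sketch, not part of any distribution:

```shell
# Sanity-check the Java setup: JAVA_HOME must point at a real directory
# and the java binary must be reachable on the PATH.
check_java() {
  if [ -z "$JAVA_HOME" ] || [ ! -d "$JAVA_HOME" ]; then
    echo "JAVA_HOME is not set or does not exist"
    return 1
  fi
  if ! command -v java >/dev/null 2>&1; then
    echo "java is not on the PATH"
    return 1
  fi
  echo "Java OK"
}

# e.g. check_java && java -version
```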

2. Install Hadoop

Run the commands below as root. Download the Hadoop binary from the mirror closest to you.

yum -y install openssh-server openssh-clients rsync net-tools wget
yum -y update libselinux

ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys

wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar -zxvf hadoop-2.7.3.tar.gz -C /opt
rm -f hadoop-2.7.3.tar.gz
ln -s /opt/hadoop-2.7.3 /opt/hadoop

export HADOOP_PREFIX=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

sed -i "/^export JAVA_HOME/ s:.*:export JAVA_HOME=$JAVA_HOME:" $HADOOP_CONF_DIR/hadoop-env.sh
sed -i "/^export HADOOP_CONF_DIR/ s:.*:export HADOOP_CONF_DIR=$HADOOP_CONF_DIR:" $HADOOP_CONF_DIR/hadoop-env.sh

Put the configuration files below into $HADOOP_CONF_DIR.
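The distribution-specific files are not reproduced here. For reference, a minimal single-node (pseudo-distributed) setup typically carries settings along these lines in core-site.xml and hdfs-site.xml; the hostname and port are assumptions, so match them to your environment:

```xml
<!-- core-site.xml: point clients at the local NameNode (port 9000 assumed) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: single node, so keep one replica per block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```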

Run the following as root to format the NameNode.

$HADOOP_PREFIX/bin/hdfs namenode -format

Append the following to /root/.ssh/config:

Host *
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  LogLevel quiet
  Port 2122

Run the following as root.

chmod 600 /root/.ssh/config
chown root:root /root/.ssh/config

chmod +x $HADOOP_CONF_DIR/*-env.sh

sed  -i "/^[^#]*UsePAM/ s/.*/#&/"  /etc/ssh/sshd_config
echo "UsePAM no" >> /etc/ssh/sshd_config
echo "Port 2122" >> /etc/ssh/sshd_config
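The two `echo` appends above are not idempotent: re-running this guide appends duplicate `UsePAM` and `Port` lines. A guarded variant is sketched below; `set_sshd_option` is a hypothetical helper, not a standard tool:

```shell
# Set an sshd_config key to a value: rewrite the existing active line if
# one exists, otherwise append it, so repeated runs never duplicate lines.
set_sshd_option() {
  file=$1; key=$2; value=$3
  if grep -q "^$key " "$file"; then
    sed -i "s/^$key .*/$key $value/" "$file"   # rewrite existing setting
  else
    echo "$key $value" >> "$file"              # append if absent
  fi
}

# e.g. set_sshd_option /etc/ssh/sshd_config UsePAM no
#      set_sshd_option /etc/ssh/sshd_config Port 2122
```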

Restart SSH server.

service sshd restart

Start the HDFS and YARN daemons.

start-dfs.sh
start-yarn.sh

Test that Hadoop works.

hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put $HADOOP_PREFIX/LICENSE.txt /user/hadoop/input
hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/hadoop/input /user/hadoop/output

3. Install MySQL

wget http://dev.mysql.com/get/mysql57-community-release-el7-7.noarch.rpm \
      && yum -y localinstall mysql57-community-release-el7-7.noarch.rpm \
      && yum repolist enabled | grep "mysql.*-community.*" \
      && yum -y install mysql-community-server mysql \
      && rm -f mysql57-community-release-el7-7.noarch.rpm
service mysqld start

Get the temporary password with the following command.

grep 'temporary password' /var/log/mysqld.log | awk '{print $11}'
Z&O+estx9vTt
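Hard-coding column 11 is brittle, since the field count of that log line can differ between MySQL builds. The password is always the last field on the line, so `$NF` is a sturdier extraction. A minimal sketch, assuming the stock 5.7 log format:

```shell
# Extract the generated root password: it is the last whitespace-separated
# field of the "temporary password" line in the MySQL error log.
get_mysql_temp_password() {
  grep 'temporary password' "$1" | awk '{print $NF}'
}

# e.g. get_mysql_temp_password /var/log/mysqld.log
```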

Run mysql_secure_installation with the temporary password.

mysql_secure_installation
Enter password for user root: -> Z&O+estx9vTt
New password: -> Metatron123$
Re-enter new password: -> Metatron123$
Change the password for root? (Press y|Y for Yes, any other key for No): y
New password: -> Metatron123$
Re-enter new password: -> Metatron123$
Do you wish to continue with the password provided? -> y
Remove anonymous users? -> press Enter
Disallow root login remotely? -> press Enter
Remove test database and access to it? -> press Enter
Reload privilege tables now? -> press Enter

Connect to MySQL.

mysql -uroot -p'Metatron123$'

4. Install Hive

wget http://mirror.navercorp.com/apache/hive/hive-2.3.6/apache-hive-2.3.6-bin.tar.gz \
      && tar -zxvf apache-hive-2.3.6-bin.tar.gz -C /opt \
      && rm -f apache-hive-2.3.6-bin.tar.gz \
      && ln -s /opt/apache-hive-2.3.6-bin /opt/hive
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/hcatalog/sbin
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar
mv mysql-connector-java-5.1.38.jar $HIVE_HOME/lib/

Put the configuration files below into $HIVE_HOME/conf.
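The files themselves are not listed here. For reference, the metastore portion of hive-site.xml typically carries the JDBC settings, which must agree with the MySQL user created in the next step; the values below are assumptions matching this guide's passwords, so adjust them to your environment:

```xml
<!-- hive-site.xml (metastore connection settings) -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Metatron123$</value>
  </property>
</configuration>
```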

Initialize the Hive metastore.

mysql -uroot -p'Metatron123$'
create database hive_metastore;
create user 'hive'@'%' identified by 'Metatron123$';
grant all privileges on *.* to 'hive'@'%';
grant all privileges on hive_metastore.* to 'hive'@'%';
create user 'hive'@'localhost' identified by 'Metatron123$';
grant all privileges on *.* to 'hive'@'localhost';
grant all privileges on hive_metastore.* to 'hive'@'localhost';
flush privileges;
quit
schematool -initSchema -dbType mysql

Start Hive.

hdfs dfs -mkdir -p /user/hive/warehouse
mkdir -p $HIVE_HOME/hcatalog/var/log
hcat_server.sh start
hiveserver2 &

Connect to Hive.

beeline -u jdbc:hive2://localhost:10000 "" ""

5. Install Druid

wget https://sktmetatronkrsouthshared.blob.core.windows.net/metatron-public/discovery-dist/latest/druid-0.9.1-latest-hadoop-2.7.3-bin.tar.gz
mkdir /servers
tar zxf druid-0.9.1-latest-hadoop-2.7.3-bin.tar.gz -C /servers
ln -s /servers/druid-* /servers/druid
export DRUID_HOME=/servers/druid

Put each file below into its target location.

Download URL         Target Location
jvm.config           $DRUID_HOME/conf/druid/single/jvm.config
runtime.properties   $DRUID_HOME/conf/druid/single/broker/runtime.properties
runtime.properties   $DRUID_HOME/conf/druid/single/historical/runtime.properties
runtime.properties   $DRUID_HOME/conf/druid/single/middleManager/runtime.properties

cd $DRUID_HOME
./start-single.sh

Check that you can connect to http://localhost:8090/
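The Druid services can take a while to come up. Rather than refreshing the browser, a small poll loop can wait for the console on port 8090 (the port used above). This is a hypothetical helper, not part of the Druid distribution:

```shell
# Poll an HTTP endpoint until it answers or the attempts run out.
wait_for_http() {
  url=$1; tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out: $url"
  return 1
}

# e.g. wait_for_http http://localhost:8090/ 60
```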

6. Install Metatron

wget https://sktmetatronkrsouthshared.blob.core.windows.net/metatron-public/discovery-dist/latest/metatron-discovery-latest-bin.tar.gz
mkdir -p /servers
tar zxf metatron-discovery-latest-bin.tar.gz -C /servers
ln -s /servers/metatron-discovery-* /servers/metatron-discovery
export METATRON_HOME=/servers/metatron-discovery

Put the configuration files below into $METATRON_HOME/conf.

Initialize Metatron.

mysql -uroot -p'Metatron123$'
create database polaris;
create user 'polaris'@'%' identified by 'Metatron123$';
grant all privileges on *.* to 'polaris'@'%';
grant all privileges on hive_metastore.* to 'polaris'@'%';
create user 'polaris'@'localhost' identified by 'Metatron123$';
grant all privileges on *.* to 'polaris'@'localhost';
grant all privileges on hive_metastore.* to 'polaris'@'localhost';
flush privileges;
quit
cd $METATRON_HOME
bin/metatron.sh --init start

To watch the progress, tail the log file.

tail -f logs/metatron-*.out

Connect to http://localhost:8180/

7. Install Preptool

yum -y install https://centos7.iuscommunity.org/ius-release.rpm \
      && yum install -y python36u python36u-libs python36u-devel python36u-pip git \
      && ln -s /bin/python3.6 /bin/python3 \
      && ln -s /bin/pip3.6 /bin/pip3 \
      && pip3 install requests
git clone https://github.com/metatron-app/discovery-prep-tool.git
cd discovery-prep-tool

Download a test file (the command below uses sales-data-sample.csv from the current directory).

python3 preptool -f sales-data-sample.csv

If you see “File dataset created”, the installation works.