Build Impala from Scratch

This article shows how to build Cloudera Impala from scratch: compiling and linking it, then configuring and running it.


My Environment:

CentOS 6.5

No network proxy (with a proxy set, many of the downloads fail)
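A quick way to confirm that no proxy is configured in the current shell is to list the usual proxy variables. This is only a sketch: the function name list_proxy_vars is made up for this example, and the variable names are the common conventions, not something the Impala build itself defines.

```shell
# List any proxy-related environment variables that are currently set.
# Prints one NAME=value line per variable; prints nothing when none are set.
list_proxy_vars() {
    for name in http_proxy https_proxy ftp_proxy HTTP_PROXY HTTPS_PROXY FTP_PROXY; do
        # Indirect variable lookup that works in plain POSIX sh.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        fi
    done
}
```

If list_proxy_vars prints anything, unset those variables before starting the build.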


Impala Commit (Version):

Change-Id: I3b5cefb4d7193045fc6fc5e94766589c2299b5b1

commit d90f3f3fd1a134578b1860be1b2f41a57a8d8896 (parent ee40ba2)


 

 

Get Impala source:

git clone https://github.com/cloudera/impala

 

 

1. Install tools:

sudo yum install boost-test boost-program-options libevent-devel automake libtool flex bison gcc-c++ openssl-devel \
make cmake doxygen.x86_64 glib-devel boost-devel python-devel bzip2-devel svn cyrus-sasl-devel \
wget git unzip

 

 

2. Uninstall the old Boost library that ships with CentOS 6.5 (yum remove boost), then install Boost 1.46.1.

For example:

export BOOST_ROOT=/usr/local/boost_1_46_0

 

cd boost_1_46_1/

./bjam threading=multi --layout=tagged

./bjam threading=multi --layout=tagged install

 

 

3. Install LLVM 3.3:

 

wget http://llvm.org/releases/3.3/llvm-3.3.src.tar.gz

tar xvzf llvm-3.3.src.tar.gz

cd llvm-3.3.src/tools

svn co http://llvm.org/svn/llvm-project/cfe/tags/RELEASE_33/final/ clang

cd ../projects

svn co http://llvm.org/svn/llvm-project/compiler-rt/tags/RELEASE_33/final/ compiler-rt

cd ..

./configure --with-pic

make -j4 REQUIRES_RTTI=1

sudo make install

 

 

4. Set up the JDK path (set the JAVA_HOME environment variable in /etc/bashrc or ~/.bash_profile).
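As a rough sketch of this step, the helper below appends the two export lines to a profile file, and does nothing if JAVA_HOME is already exported there. The function name add_java_home and the JDK directory in the usage line are assumptions for illustration; use the directory where your JDK is actually installed.

```shell
# Append JAVA_HOME and PATH exports to a profile file,
# unless the file already exports JAVA_HOME.
# $1: JDK install directory, $2: profile file to update
add_java_home() {
    jdk_dir=$1
    profile=$2
    if grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        return 0    # already configured; leave the file untouched
    fi
    {
        printf 'export JAVA_HOME=%s\n' "$jdk_dir"
        printf 'export PATH=$JAVA_HOME/bin:$PATH\n'
    } >> "$profile"
}
```

For example: add_java_home /usr/java/default ~/.bash_profile (the path /usr/java/default is hypothetical), followed by source ~/.bash_profile.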

 

 

5. Install Maven 3.0.4:

 

wget http://www.fightrice.com/mirrors/apache/maven/maven-3/3.0.4/binaries/apache-maven-3.0.4-bin.tar.gz

tar xvf apache-maven-3.0.4-bin.tar.gz && sudo mv apache-maven-3.0.4 /usr/local

 

Update ~/.bashrc and add the environment variables:

 

export M2_HOME=/usr/local/apache-maven-3.0.4

export M2=$M2_HOME/bin

export PATH=$M2:$PATH

 

source ~/.bashrc

mvn -version
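Before moving on to the build, it can help to check that the tools installed so far are actually on the PATH. A minimal sketch, assuming nothing beyond a POSIX shell; the function name missing_tools is made up for this example.

```shell
# Print the name of every given command that cannot be found on the PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example: missing_tools gcc g++ make cmake flex bison svn wget git unzip java mvn. Any name it prints still needs to be installed or added to the PATH.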

 

 

6. Check the paths in bin/set-classpath.sh, then build Impala:

 

cd $IMPALA_HOME

./buildall.sh

 

 

7. If linking fails with an error such as a missing -lboost_date_time, change the flag to -lboost_date_time-mt in the Makefile.

Other missing Boost libraries can be fixed the same way: switch each one to its "-mt" name, or install a newer version. These errors are easy to find with a web search.
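If many flags need the same change, the rename can be scripted. The following is a sketch only: the function name use_boost_mt is made up, and which generated file holds the flags depends on your build tree, so the file argument is up to you. It keeps a .bak copy and leaves flags that already end in -mt alone.

```shell
# Rewrite every -lboost_<name> linker flag in a file to -lboost_<name>-mt.
# Flags that already carry the -mt suffix are not changed.
# The original file is preserved as <file>.bak.
use_boost_mt() {
    file=$1
    cp "$file" "$file.bak" || return 1
    sed -E 's/-lboost_([a-z_]+)(-mt)?/-lboost_\1-mt/g' "$file.bak" > "$file"
}
```

Run it on the one file that the linker error points to, and compare the result with the .bak copy before rebuilding.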

 

 

8. Install the prettytable and thrift Python packages:

easy_install prettytable

easy_install thrift

 

(9. Build the third-party components. Newer Impala versions seem to do this automatically.)

 

(10. If the build asks for it, download setuptools-5.1.zip and install it.)

 


 

Configure and start Impala (customize site-specific values such as host names, IP addresses and paths):

 

1. Edit /etc/hosts:

 

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

172.16.24.132   master
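To check that the host name used later in core-site.xml is really mapped, the hosts file can be searched directly. This is a small sketch; the function name host_addr is made up, and it only reads the hosts file, so it says nothing about DNS.

```shell
# Print the first address mapped to a host name in a hosts-format file.
# $1: host name to look up, $2: hosts file (defaults to /etc/hosts)
host_addr() {
    awk -v host="$1" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        {
            # Field 1 is the address; every later field is a name or alias.
            for (i = 2; i <= NF; i++)
                if ($i == host) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}
```

With the file above, host_addr master should print the address you assigned to the master entry; empty output means the name is not mapped.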

 

 

2. Configure and run HDFS/Hadoop:

 

(1) Create the Hadoop data directory (in this case, file:///home/yc/hdfs).

(2) Create the directory /var/run/hadoop-hdfs.

(3) Edit the Hadoop XML configuration files in the following directory:

 

$IMPALA_HOME/thirdparty/hadoop-2.0.0-cdh4.5.0/etc/hadoop
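Steps (1) and (2) can be sketched as one small helper. The function name prepare_hdfs_dirs is made up for this example. The mode 750 is taken from the dfs.datanode.data.dir.perm value used in this setup, and creating a directory under /var/run normally requires root.

```shell
# Create the DataNode data directory and the directory that will hold
# the short-circuit domain socket.
# $1: data directory, $2: socket directory
prepare_hdfs_dirs() {
    data_dir=$1
    socket_dir=$2
    mkdir -p "$data_dir" "$socket_dir" || return 1
    # Match the dfs.datanode.data.dir.perm setting (750).
    chmod 750 "$data_dir"
}
```

For this article's layout the call would be prepare_hdfs_dirs /home/yc/hdfs /var/run/hadoop-hdfs, run as a user that is allowed to write to both locations.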

 

 

The configured "hdfs-site.xml" should look like this. Pay attention to the "dn.50010" socket path.

 

hdfs-site.xml

========================================================================

<configuration>

 

<property>

<name>dfs.client.read.shortcircuit</name>

<value>true</value>

</property>

 

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

 

<property>

<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>

<value>true</value>

</property>

 

<property>

<name>dfs.datanode.data.dir</name>

<value>file:///home/yc/hdfs</value>

</property>

 

<property>

<name>dfs.client.use.legacy.blockreader.local</name>

<value>false</value>

</property>

 

<property>

<name>dfs.datanode.data.dir.perm</name>

<value>750</value>

</property>

 

<property>

<name>dfs.block.local-path-access.user</name>

<value>root</value>

</property>

 

<property>

<name>dfs.client.file-block-storage-locations.timeout</name>

<value>5000</value>

</property>

 

<property>

<name>dfs.domain.socket.path</name>

<value>/var/run/hadoop-hdfs/dn.50010</value>

</property>

 

<property>

<name>dfs.client.file-block-storage-locations.timeout.millis</name>

<value>10000</value>

</property>

 

</configuration>

========================================================================
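To double-check a single setting in a site file like the one above without opening it, a small awk helper is enough. This is a sketch, not a real XML parser: it assumes each tag sits on its own line as in the listing, and the function name get_hadoop_prop is made up for this example.

```shell
# Print the <value> of a named property from a Hadoop *-site.xml file.
# $1: property name, $2: path to the XML file
get_hadoop_prop() {
    awk -v name="$1" '
        # Remember when the exact <name>...</name> line has been seen.
        index($0, "<name>" name "</name>") { found = 1; next }
        # The next <value> line inside the same property is the answer.
        found && /<value>/ {
            sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
            print; exit
        }
        # Stop looking once the property block closes.
        /<\/property>/ { found = 0 }
    ' "$2"
}
```

For example, get_hadoop_prop dfs.domain.socket.path hdfs-site.xml should print the socket path configured for short-circuit reads.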

 

 

core-site.xml

========================================================================

<configuration>

 

<property>

<name>hadoop.native.lib</name>

<value>true</value>

<description>Should native hadoop libraries, if present, be used.</description>

</property>

 

<property>

<name>fs.default.name</name>

<value>hdfs://master:9000</value>

</property>

 

<property>

<name>dfs.client.read.shortcircuit</name>

<value>true</value>

</property>

 

<property>

<name>dfs.client.use.legacy.blockreader.local</name>

<value>false</value>

</property>

 

<property>

<name>dfs.client.read.shortcircuit.skip.checksum</name>

<value>false</value>

</property>

 

<property>

<name>hadoop.tmp.dir</name>

<value>/home/yc/hdfs/tmp</value>

<description>A base for other temporary directories.</description>

</property>

 

</configuration>

========================================================================

 

 

 

 

 

yarn-site.xml

========================================================================

<?xml version="1.0"?>

<configuration>

<!-- Site specific YARN configuration properties -->

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

</configuration>

========================================================================

 

 

mapred-site.xml:

========================================================================

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

========================================================================

 

 

hive-site.xml:

========================================================================

<configuration>

</configuration>

========================================================================

 

 

 

Copy

core-site.xml  hdfs-site.xml  hive-site.xml

into

$IMPALA_HOME/conf
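The copy step can be sketched as follows, assuming all three files sit in the same source directory (adjust if your hive-site.xml lives elsewhere). The function name copy_site_files is made up for this example.

```shell
# Copy the three site files into the Impala conf directory.
# Stops at the first file that cannot be copied.
# $1: directory holding the edited XML files, $2: destination directory
copy_site_files() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for conf_file in core-site.xml hdfs-site.xml hive-site.xml; do
        cp "$src/$conf_file" "$dst/" || return 1
    done
}
```

For example: copy_site_files "$IMPALA_HOME/thirdparty/hadoop-2.0.0-cdh4.5.0/etc/hadoop" "$IMPALA_HOME/conf".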

 

 

Configure

$IMPALA_HOME/bin/set-classpath.sh

as follows:

 

========================================================================

CLASSPATH=\

$IMPALA_HOME/conf:\

$IMPALA_HOME/fe/src/test/resources:\

$IMPALA_HOME/fe/target/classes:\

$IMPALA_HOME/fe/target/dependency:\

$IMPALA_HOME/fe/target/test-classes:

 

for jar in `ls ${IMPALA_HOME}/fe/target/dependency/*.jar`; do

CLASSPATH=${CLASSPATH}:$jar

done

 

export CLASSPATH

========================================================================

 

 

 

Format HDFS:

$IMPALA_HOME/thirdparty/hadoop-2.0.0-cdh4.5.0/bin/hdfs namenode -format

 

$IMPALA_HOME/thirdparty/hadoop-2.0.0-cdh4.5.0/sbin/start-all.sh

 

Optional: start the DataNode by hand (from the Hadoop directory):

./bin/hdfs datanode

 

Run "jps"; the output should look like this (the process IDs will differ):

 

1404

54375 NameNode

54646 NodeManager

558 SecondaryNameNode

54663 ResourceManager

6545 Jps

384 DataNode

54695 NodeManager

54727 NodeManager

2061

65490 NameNode
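Rather than reading the jps listing by eye, the check can be scripted. The sketch below reads jps output on standard input and reports which of the five Hadoop daemons seen above are absent. The function name missing_daemons is made up for this example.

```shell
# Read `jps` output on stdin and print each expected Hadoop daemon
# that does not appear in it. Prints nothing when all are running.
missing_daemons() {
    jps_out=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
        # Compare the whole second field, so that "NameNode" is not
        # satisfied by a "SecondaryNameNode" line.
        printf '%s\n' "$jps_out" \
            | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }' \
            || printf '%s\n' "$daemon"
    done
}
```

Usage: jps | missing_daemons. If DataNode is reported, the domain socket directory and its permissions are a reasonable first thing to check.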

 

 

Create directories in HDFS:

 

$HADOOP_HOME/bin/hdfs dfs -mkdir  /tmp

$HADOOP_HOME/bin/hdfs dfs -mkdir  /user

$HADOOP_HOME/bin/hdfs dfs -mkdir  /user/impala

$HADOOP_HOME/bin/hdfs dfs -mkdir  /user/impala/tab1

 

 

Put data into HDFS:

 

$HADOOP_HOME/bin/hdfs dfs -put ./tab1.csv /user/impala/tab1

 


 

3. Start Impala Daemons:

 

cd $IMPALA_HOME

 

./be/build/debug/statestore/statestored

 

./bin/start-impalad.sh

 

./bin/start-catalogd.sh
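If the daemons stay in the foreground and block the terminal, one option is to start each of them in the background with its own log file. This is a generic sketch, not part of Impala: the function name start_bg and the log directory in the usage line are assumptions.

```shell
# Start a command in the background, send its output to a log file
# and record its process ID.
# $1: short name for the log and pid files, $2: directory for those files,
# remaining arguments: the command to run
start_bg() {
    name=$1
    log_dir=$2
    shift 2
    mkdir -p "$log_dir" || return 1
    nohup "$@" > "$log_dir/$name.log" 2>&1 &
    echo $! > "$log_dir/$name.pid"
}
```

For example: start_bg statestored /tmp/impala-logs ./be/build/debug/statestore/statestored, then the same for ./bin/start-impalad.sh and ./bin/start-catalogd.sh. A daemon can later be stopped with kill "$(cat /tmp/impala-logs/statestored.pid)".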

 

(There is no need to start Hive.)

 

Start Impala Shell:

 

./bin/impala-shell.sh

 

(Note: there is no need to start $IMPALA_HOME/thirdparty/hive-0.10.0-cdh4.5.0/bin/hiveserver2.)

 


 

