This article shows how to build Cloudera Impala from source, covering compiling, linking, and running it.
My Environment:
CentOS 6.5
No network proxy (a configured proxy causes many download failures during the build)
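Since a configured proxy breaks many of the downloads below, it may help to clear the usual proxy variables first (the variable names here are the common conventions; adjust if your system uses others):

```shell
# Clear the common proxy variables for this shell session
unset http_proxy https_proxy ftp_proxy no_proxy
unset HTTP_PROXY HTTPS_PROXY FTP_PROXY NO_PROXY
# Confirm nothing proxy-related is still set
env | grep -i proxy || echo "no proxy configured"
```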
Impala Commit (Version):
Change-Id: I3b5cefb4d7193045fc6fc5e94766589c2299b5b1
commit d90f3f3fd1a134578b1860be1b2f41a57a8d8896 (parent ee40ba2)
Get Impala source:
git clone https://github.com/cloudera/impala
1. Install tools:
sudo yum install boost-test boost-program-options libevent-devel automake libtool flex bison gcc-c++ openssl-devel \
make cmake doxygen.x86_64 glib-devel boost-devel python-devel bzip2-devel svn cyrus-sasl-devel \
wget git unzip
2. Uninstall the old boost packages shipped with CentOS 6.5 (yum remove boost), and install Boost 1.46.1.
For example:
export BOOST_ROOT=/usr/local/boost_1_46_1
cd boost_1_46_1/
./bjam threading=multi --layout=tagged
./bjam threading=multi --layout=tagged install
3. Install LLVM 3.3:
wget http://llvm.org/releases/3.3/llvm-3.3.src.tar.gz
tar xvzf llvm-3.3.src.tar.gz
cd llvm-3.3.src/tools
svn co http://llvm.org/svn/llvm-project/cfe/tags/RELEASE_33/final/ clang
cd ../projects
svn co http://llvm.org/svn/llvm-project/compiler-rt/tags/RELEASE_33/final/ compiler-rt
cd ..
./configure --with-pic
make -j4 REQUIRES_RTTI=1
sudo make install
4. Set up the JDK path (export JAVA_HOME, e.g. in /etc/bashrc or ~/.bash_profile).
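For example (the JDK location below is an assumption; point JAVA_HOME at wherever your JDK is actually installed):

```shell
# Hypothetical JDK path -- adjust to your actual install location
export JAVA_HOME=/usr/java/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
```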
5. Install Maven 3.0.4:
wget http://www.fightrice.com/mirrors/apache/maven/maven-3/3.0.4/binaries/apache-maven-3.0.4-bin.tar.gz
tar xvf apache-maven-3.0.4-bin.tar.gz && sudo mv apache-maven-3.0.4 /usr/local
Update ~/.bashrc and add the environment variables:
export M2_HOME=/usr/local/apache-maven-3.0.4
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
source ~/.bashrc
mvn -version
6. Check the paths in bin/set-classpath.sh and build Impala:
cd $IMPALA_HOME
./buildall.sh
7. If linking fails with errors like "cannot find -lboost_date_time", change the flag to -lboost_date_time-mt in the offending Makefile.
Other boost link errors are similar: rename them all to the "-mt"-suffixed variants (or install a newer library version); fixes for these errors are easy to find on Google.
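The renaming can be scripted. Here is a sketch on a throwaway file; in practice, run the same sed over whichever Makefile the linker error points at:

```shell
# Demonstrate the rewrite on a scratch copy; apply the same sed to the
# real Makefile(s) that carry the failing -lboost_* flags.
printf 'LIBS = -lboost_date_time -lboost_thread\n' > Makefile.demo
sed -i 's/-lboost_\([a-z_]*\)/-lboost_\1-mt/g' Makefile.demo
cat Makefile.demo   # LIBS = -lboost_date_time-mt -lboost_thread-mt
```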
8. easy_install prettytable
easy_install thrift
(9. Build the thirdparty libraries; recent Impala versions appear to do this automatically.)
(10. If the build complains about setuptools, download setuptools-5.1.zip and install it.)
Configure and start Impala (the host names and paths below should be customized for your setup):
1. The /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.24.132 master
2. Configure and run HDFS/Hadoop:
(1) Create the Hadoop data directory (in this case /home/yc/hdfs).
(2) Create /var/run/hadoop-hdfs, the domain-socket directory referenced by dfs.domain.socket.path; it must be writable by the user running the DataNode.
(3) Configure XML of Hadoop in the following directory:
$IMPALA_HOME/thirdparty/hadoop-2.0.0-cdh4.5.0/etc/hadoop
The configured "hdfs-site.xml" should look like this; note the "dn.50010" socket file in the dfs.domain.socket.path value.
hdfs-site.xml
========================================================================
<configuration>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/yc/hdfs</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader.local</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>750</value>
</property>
<property>
<name>dfs.block.local-path-access.user</name>
<value>root</value>
</property>
<property>
<name>dfs.client.file-block-storage-locations.timeout</name>
<value>5000</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hadoop-hdfs/dn.50010</value>
</property>
<property>
<name>dfs.client.file-block-storage-locations.timeout.millis</name>
<value>10000</value>
</property>
</configuration>
========================================================================
core-site.xml
========================================================================
<configuration>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader.local</name>
<value>false</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/yc/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
========================================================================
yarn-site.xml
========================================================================
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
========================================================================
mapred-site.xml:
========================================================================
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
========================================================================
hive-site.xml:
========================================================================
<configuration>
</configuration>
========================================================================
Copy core-site.xml, hdfs-site.xml and hive-site.xml into $IMPALA_HOME/conf.
Then configure $IMPALA_HOME/bin/set-classpath.sh as follows:
========================================================================
CLASSPATH=\
$IMPALA_HOME/conf:\
$IMPALA_HOME/fe/src/test/resources:\
$IMPALA_HOME/fe/target/classes:\
$IMPALA_HOME/fe/target/dependency:\
$IMPALA_HOME/fe/target/test-classes:
for jar in `ls ${IMPALA_HOME}/fe/target/dependency/*.jar`; do
CLASSPATH=${CLASSPATH}:$jar
done
export CLASSPATH
========================================================================
Format HDFS and start the daemons:
$IMPALA_HOME/thirdparty/hadoop-2.0.0-cdh4.5.0/bin/hdfs namenode -format
$IMPALA_HOME/thirdparty/hadoop-2.0.0-cdh4.5.0/sbin/start-all.sh
Optionally, run a DataNode in the foreground:
(./bin/hdfs datanode)
Run "jps"; the output should look something like this (the process IDs will differ):
54375 NameNode
384 DataNode
558 SecondaryNameNode
54663 ResourceManager
54646 NodeManager
6545 Jps
Create directories in HDFS:
$HADOOP_HOME/bin/hdfs dfs -mkdir /tmp
$HADOOP_HOME/bin/hdfs dfs -mkdir /user
$HADOOP_HOME/bin/hdfs dfs -mkdir /user/impala
$HADOOP_HOME/bin/hdfs dfs -mkdir /user/impala/tab1
Put data into HDFS:
$HADOOP_HOME/bin/hdfs dfs -put ./tab1.csv /user/impala/tab1
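The -put step assumes a local tab1.csv exists. A minimal example file can be created like this (the contents and column layout are invented for illustration; any comma-separated data will do):

```shell
# Write a tiny two-row CSV to upload into /user/impala/tab1
printf '1,true,123.123,2012-10-24 08:55:00\n'  >  tab1.csv
printf '2,false,1243.5,2012-10-25 13:40:00\n' >> tab1.csv
wc -l tab1.csv
```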
3. Start Impala Daemons:
cd $IMPALA_HOME
./be/build/debug/statestore/statestored
./bin/start-impalad.sh
./bin/start-catalogd.sh
(There is no need to start Hive.)
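Each daemon serves a debug web page, which gives a quick health check. The ports below are the stock defaults (impalad 25000, statestored 25010, catalogd 25020) and are an assumption if you changed any flags:

```shell
# Probe each daemon's debug web UI and record up/down per port
status=""
for port in 25000 25010 25020; do
  if curl -sf -o /dev/null "http://localhost:$port/"; then
    status="$status $port:up"
  else
    status="$status $port:down"
  fi
done
echo "daemon status:$status"
```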
Start Impala Shell:
./bin/impala-shell.sh
(Note: there is also no need to start $IMPALA_HOME/thirdparty/hive-0.10.0-cdh4.5.0/bin/hiveserver2.)
References:
Impala old version building: https://github.com/tomdz/impala
http://hi.baidu.com/huareal/item/52be8401cf349729a1312d66