Friday, October 31, 2014

Import Hadoop Source Project Into Eclipse And Build Hadoop-2.2.0 On Mac OS X

The official building document is where I started. It lists all the steps and caveats you need to build a binary distribution of Hadoop from source. The most essential links are as follows:
  1. Read-only version of hadoop source provided by Apache
  2. The latest version of BUILDING.txt (You can find the corresponding BUILDING.txt from the root path of the hadoop source project)
  3. Native Libraries Guide for Hadoop-2.2.0
  4. Working with Hadoop under Eclipse
Now let's start! My environment is as follows:
JDK-1.7.0_25
Hadoop-2.2.0
Maven-3.0.5
Mac OS X-10.9.4
Protocol Buffer-2.5.0
FindBugs-3.0.0
Ant-1.9.4

After installing all the items above, add the relevant environment variables to '~/.profile':
export LD_LIBRARY_PATH="/usr/local/lib"  # For protocol buffer
export JAVA_HOME=$(/usr/libexec/java_home)
export ANT_HOME="/Users/jasonzhu/general/ant-1.9.4"
export FINDBUGS_HOME="/Users/jasonzhu/general/findbugs-3.0.0"
export HADOOP_HOME="/Users/jasonzhu/general/hadoop-2.2.0"
export PATH=$ANT_HOME/bin:$FINDBUGS_HOME/bin:$HADOOP_HOME/bin:$PATH

Remember to apply the changes with 'source ~/.profile' after editing. You can double-check by running the following commands. If every version prints normally, just move on!
java -version
hadoop version
mvn -version
protoc --version
findbugs -version
ant -version
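The six checks above can also be run in one pass. The following is only a minimal sketch: `check_tools` is a hypothetical helper name, and it only tests whether each command is on the PATH, not whether its version matches the list at the top of this post.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_tools java hadoop mvn protoc findbugs ant
```

If anything is reported as MISSING, recheck the PATH line in '~/.profile' before going further.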

BUILDING.txt requires one more prerequisite. It says "A one-time manual step is required to enable building Hadoop OS X with Java 7 every time the JDK is updated":
sudo mkdir `/usr/libexec/java_home`/Classes
sudo ln -s `/usr/libexec/java_home`/lib/tools.jar `/usr/libexec/java_home`/Classes/classes.jar
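Since this step has to be repeated after every JDK update, it may be convenient to make it safe to re-run. Below is a sketch under my own assumptions: `link_tools_jar` and the `SUDO` variable are hypothetical names, not something taken from BUILDING.txt.

```shell
# Create <jdk_home>/Classes/classes.jar as a symlink to <jdk_home>/lib/tools.jar,
# doing nothing when the link is already there.
# SUDO is a hypothetical hook: set it to "sudo" for a root-owned JDK home.
link_tools_jar() {
    jdk_home=$1
    if [ ! -f "$jdk_home/lib/tools.jar" ]; then
        echo "tools.jar not found under $jdk_home/lib" >&2
        return 1
    fi
    if [ -e "$jdk_home/Classes/classes.jar" ]; then
        echo "classes.jar already present, nothing to do"
        return 0
    fi
    ${SUDO:-} mkdir -p "$jdk_home/Classes" &&
        ${SUDO:-} ln -s "$jdk_home/lib/tools.jar" "$jdk_home/Classes/classes.jar"
}

# Usage: SUDO=sudo link_tools_jar "$(/usr/libexec/java_home)"
```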

Next, git clone the Hadoop source project to the local filesystem:
git clone git://git.apache.org/hadoop.git

When it is done, we can list all the remote branches of the Hadoop project by issuing 'git branch -r' in the root path of the project. Switch to the 2.2.0 branch via 'git checkout branch-2.2.0'. Open pom.xml in the root path of the project to make sure it has switched to branch-2.2.0:
<modelVersion>4.0.0</modelVersion> 
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-main</artifactId>
<version>2.2.0</version> 
<description>Apache Hadoop Main</description> 
<name>Apache Hadoop Main</name>
<packaging>pom</packaging>
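The same check can be done without opening the file. This is a rough sketch, not a real XML parser: `pom_version` is a hypothetical helper, and it assumes the project's own <version> is the first <version> element in the file, as in the excerpt above.

```shell
# Print the text of the first <version> element found in a pom.xml.
# Assumption: the project's own version appears before any other <version>
# (for example a parent or dependency version), which is not true of every pom.
pom_version() {
    sed -n 's:.*<version>\(.*\)</version>.*:\1:p' "$1" | head -n 1
}

# Usage: pom_version pom.xml
```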


Still in the root path of the project, run the following commands:
mvn install -DskipTests
mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
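The install step takes a while and can fail (see the problems below), so it can help to chain the two commands so that the Eclipse step only runs after a successful install. A small sketch: `prepare_eclipse` and the `MVN` override variable are hypothetical names of my own.

```shell
# Run the install step, then generate the Eclipse metadata only if it succeeded.
# MVN is a hypothetical override hook; it defaults to the mvn found on PATH.
prepare_eclipse() {
    mvn_cmd=${MVN:-mvn}
    "$mvn_cmd" install -DskipTests &&
        "$mvn_cmd" eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
}

# Usage (from the root path of the project): prepare_eclipse
```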

Possible Problem #1:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hadoop-hdfs: Compilation failure
[ERROR] Failure executing javac, but could not parse the error:
[ERROR] The system is out of resources.
[ERROR] Consult the following stack trace for details.
[ERROR] java.lang.OutOfMemoryError: Java heap space
Solution: add 'export MAVEN_OPTS="-Xmx2048m -XX:MaxPermSize=2048m"' to '~/.profile' and source it to make it take effect.
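To avoid piling up duplicate export lines when editing '~/.profile' repeatedly, the line can be appended only when it is not there yet. A minimal sketch, where `ensure_line` is a hypothetical helper name:

```shell
# Append a line to a file only if that exact line is not already present.
ensure_line() {
    file=$1
    line=$2
    # -F: fixed string, -x: match the whole line, -q: no output
    if grep -Fxq -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage:
# ensure_line ~/.profile 'export MAVEN_OPTS="-Xmx2048m -XX:MaxPermSize=2048m"'
```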

Possible Problem #2:
[ERROR] Failed to execute goal on project hadoop-hdfs-httpfs: Could not resolve dependencies for project org.apache.hadoop:hadoop-hdfs-httpfs:war:3.0.0-SNAPSHOT: Could not find artifact org.apache.hadoop:hadoop-hdfs:jar:tests:3.0.0-SNAPSHOT in apache.snapshots.https (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
At first I used 'mvn install -Dmaven.test.skip=true', and the error above was thrown. Then I found out that '-DskipTests' and '-Dmaven.test.skip=true' differ: the former compiles the tests but does not run them, whereas the latter neither compiles nor runs them. The build needs the compiled tests artifact (hadoop-hdfs:jar:tests in the error above), so '-DskipTests' is the one to use here.
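One way to see whether the tests artifact was actually produced is to look for it in the local Maven repository, which stores artifacts as groupId path/artifactId/version/artifactId-version-classifier.jar. The helper below is only a sketch: `hdfs_tests_jar` and the `LOCAL_REPO` variable are hypothetical names, and it assumes the default repository location '~/.m2/repository'.

```shell
# Print where the local Maven repository would hold the hadoop-hdfs "tests" jar
# for the given version.
# LOCAL_REPO is a hypothetical override; the default is ~/.m2/repository.
hdfs_tests_jar() {
    version=$1
    repo=${LOCAL_REPO:-$HOME/.m2/repository}
    echo "$repo/org/apache/hadoop/hadoop-hdfs/$version/hadoop-hdfs-$version-tests.jar"
}

# Usage: ls -l "$(hdfs_tests_jar 3.0.0-SNAPSHOT)"
```

If the file is missing after an install, the tests were most likely not compiled.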

Finally, install 'm2e' in Eclipse and import the Hadoop source project:
Eclipse -> Import -> Existing Maven Projects.

Once the project is imported, don't be surprised by the many errors in almost all of Hadoop's sub-projects (at least for me, there were soooo many red crosses on my projects). The most common one is "Plugin execution not covered by lifecycle configuration ... Maven Project Build Lifecycle Mapping Problem", which is caused by the m2e Eclipse plugin and Maven itself being developed out of step. So far I have not found a good solution to this problem. If anyone has a better idea, please leave a message, big thanks! Anyway, we can still navigate, read and revise the source code in Eclipse before building the project from the command line.

Building Hadoop is a lot easier than I expected. There are detailed instructions in BUILDING.txt, too. The most essential part is as follows:
Maven build goals:

 * Clean                     : mvn clean
 * Compile                   : mvn compile [-Pnative]
 * Run tests                 : mvn test [-Pnative]
 * Create JAR                : mvn package
 * Run findbugs              : mvn compile findbugs:findbugs
 * Run checkstyle            : mvn compile checkstyle:checkstyle
 * Install JAR in M2 cache   : mvn install
 * Deploy JAR to Maven repo  : mvn deploy
 * Run clover                : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
 * Run Rat                   : mvn apache-rat:check
 * Build javadocs            : mvn javadoc:javadoc
 * Build distribution        : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
 * Change Hadoop version     : mvn versions:set -DnewVersion=NEWVERSION

 Build options:

  * Use -Pnative to compile/bundle native code
  * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
  * Use -Psrc to create a project source TAR.GZ
  * Use -Dtar to create a TAR with the distribution (using -Pdist)

As stated in the Native Libraries Guide for Hadoop-2.2.0 listed above, the native Hadoop library is supported on *nix platforms only. The library does not work with Cygwin or the Mac OS X platform. Consequently, we leave out -Pnative and build Hadoop with the command:
mvn package -Pdist,docs,src -DskipTests -Dtar
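After the build finishes, the resulting tarballs have to be located. The sketch below assumes the dist profile writes them to 'hadoop-dist/target' under the source root, so please verify that against your own build output; `list_dist_tarballs` is a hypothetical helper name.

```shell
# List the .tar.gz files directly under <source root>/hadoop-dist/target.
# Assumption: the distribution build places its tarballs in that directory.
list_dist_tarballs() {
    target_dir=$1/hadoop-dist/target
    if [ ! -d "$target_dir" ]; then
        echo "no such directory: $target_dir" >&2
        return 1
    fi
    find "$target_dir" -maxdepth 1 -name '*.tar.gz' -type f
}

# Usage (from the root path of the project): list_dist_tarballs .
```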


© 2014-2017 jason4zhu.blogspot.com All Rights Reserved 
If reposting, please credit the original source: Jason4Zhu
