Compiling Hadoop 3.3.1 from Source on Ubuntu 20.04

最编程 2024-03-12 20:24:10

...

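I. Overview

The build as a whole went fairly smoothly, but I ran into a few problems, which are summarized below.

1. Environment preparation

1. Ubuntu 20.04 desktop edition, installed in a VMware virtual machine.

2. Download the Hadoop 3.3.1 source package from https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1-src.tar.gz, or get it from the official website yourself.

3. Prepare the build environment. The extracted source package contains a file named BUILDING.txt. Read it carefully: it contains the commands for installing the dependencies, the build steps, and so on. Its requirements section reads: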

* Unix System
* JDK 1.8
* Maven 3.3 or later
* Protocol Buffers 3.7.1 (if compiling native code)
* CMake 3.1 or newer (if compiling native code)
* Zlib devel (if compiling native code)
* Cyrus SASL devel (if compiling native code)
* One of the compilers that support thread_local storage: GCC 4.8.1 or later, Visual Studio,
Clang (community version), Clang (version for iOS 9 and later) (if compiling native code)
* openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance)
* Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs)
* Doxygen ( if compiling libhdfspp and generating the documents )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
* python (for releasedocs)
* bats (for shell code testing)
* Node.js / bower / Ember-cli (for YARN UI v2 building)

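Here is why we need to rebuild Hadoop at all: the Hadoop binary package published by Apache does not provide an interface for access from C programs, so problems arise when we use the native library (which is used for compression, for supporting C programs, and so on). The Hadoop source package therefore has to be recompiled.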

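So every dependency marked "if compiling native code" above has to be installed before the build. For Protocol Buffers, I recommend installing exactly the stated version, 3.7.1; otherwise all sorts of strange problems may appear later.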
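Before starting a long build it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch, not something taken from BUILDING.txt: the list of tools checked and the helper name version_ge are my own choices, and the version comparison assumes GNU sort with the -V option, which Ubuntu provides.

```shell
#!/bin/sh
# Preflight sketch: report which native-build prerequisites are on PATH.
# The tool list is an assumption based on the BUILDING.txt excerpt above.

# version_ge A B: succeed when dotted version A >= B (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

for tool in java mvn protoc cmake gcc; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "MISSING: $tool"
    fi
done

# protoc prints e.g. "libprotoc 3.7.1"; the text above asks for exactly 3.7.1.
if command -v protoc >/dev/null 2>&1; then
    protoc_version=$(protoc --version | awk '{print $2}')
    if [ "$protoc_version" = "3.7.1" ]; then
        echo "protoc version OK"
    else
        echo "protoc is $protoc_version, expected 3.7.1"
    fi
fi

# CMake must be 3.1 or newer; `cmake --version` prints "cmake version X.Y.Z".
if command -v cmake >/dev/null 2>&1; then
    cmake_version=$(cmake --version | awk 'NR==1 {print $3}')
    if version_ge "$cmake_version" "3.1"; then
        echo "cmake version OK"
    else
        echo "cmake is $cmake_version, expected 3.1 or newer"
    fi
fi
```

The script only reports; it does not install anything. The install commands themselves are in BUILDING.txt.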

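2. Starting the build

Installing and configuring OpenJDK, and installing Maven and configuring a mirror for it, are not covered here; there are plenty of guides online.

Run: mvn package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true (the last option skips building the Javadoc documentation)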
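Because the build runs for a long time and prints a great deal, it may be worth capturing the output in a file. A minimal sketch follows; the log file name build.log and the function names run_build and show_errors are illustrative choices of mine, not part of the Hadoop build.

```shell
#!/bin/sh
# Sketch: run the build with its output captured, then list the error lines.
# build.log and both function names are illustrative, not Hadoop conventions.

# run_build: call from the root of the extracted hadoop-3.3.1-src tree.
run_build() {
    mvn package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true \
        > build.log 2>&1
}

# show_errors LOG: print the first 20 Maven "[ERROR]" lines with line numbers.
show_errors() {
    grep -n '^\[ERROR\]' "$1" | head -n 20
}

# Typical use (commented out so that sourcing this file does not start a build):
#   run_build || show_errors build.log
#   ls -lh hadoop-dist/target/hadoop-3.3.1.tar.gz   # the tarball produced by -Dtar
```

While the build is running, tail -f build.log in a second terminal shows the progress.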

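3. Problems encountered

1. Late in the build I got an error because the disk was full, and the build only finished after I enlarged the virtual machine's disk. So give the virtual machine enough disk space when you first create it. I started with 20 GB, which was not enough; I suggest 40 GB.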
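A quick check of the free space before starting can save a failed run. This is only a sketch: the helper name has_free_gb is hypothetical, and the threshold of 40 in the example simply mirrors the disk size suggested above, which refers to the whole disk, so it is a conservative figure rather than a measured requirement.

```shell
#!/bin/sh
# has_free_gb DIR MIN_GB: succeed when the filesystem holding DIR has at
# least MIN_GB GiB available. `df -Pk` reports sizes in 1024-byte blocks.
has_free_gb() {
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Run from the directory where the source tree will be built.
if has_free_gb . 40; then
    echo "enough free space for the build"
else
    echo "less than 40 GiB free here; consider enlarging the disk first"
fi
```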

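2. The OpenJDK problem. Many people install it with sudo apt-get install openjdk, which can cause a number of problems: the JAVA_HOME environment variable is not configured (CMake uses this variable later in the build), tools.jar is missing, the include directory is missing, and so on. So I suggest not installing it with that command. Instead, download the OpenJDK package from the official site and install it the usual way: extract it, then configure /etc/profile or ~/.bashrc. Search online for the details.

Offline OpenJDK package: https://download.java.net/openjdk/jdk8u41/ri/openjdk-8u41-b04-linux-x64-14_jan_2020.tar.gz

After installing OpenJDK this way, Ubuntu may require you to run source /etc/profile every time you switch users (including switching to root); otherwise the Java environment cannot be found later in the process.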

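Method 1:

Add the line source /etc/profile to ~/.bashrc, then run source ~/.bashrc once so that the change takes effect.

Method 2:

Write the configuration statements directly in ~/.bashrc, then run source ~/.bashrc once so that the change takes effect.
This problem must be fixed; otherwise CMake will fail to read the JAVA_HOME environment variable later, report an error, and the build will fail.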
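As a rough illustration of Method 2 and of the things the text says go missing (tools.jar and the include directory), here is a minimal sketch. The install path /opt/jdk8 is a hypothetical example, and check_jdk_home is a helper name I made up; adjust both to your own setup.

```shell
#!/bin/sh
# Lines of this shape would go in ~/.bashrc (Method 2). /opt/jdk8 is a
# hypothetical path; use the directory where the JDK archive was extracted.
#   export JAVA_HOME=/opt/jdk8
#   export PATH="$JAVA_HOME/bin:$PATH"

# check_jdk_home DIR: succeed when DIR looks like a full JDK 8, i.e. it
# contains the JNI headers and tools.jar mentioned in the text above.
check_jdk_home() {
    [ -n "$1" ] && [ -f "$1/include/jni.h" ] && [ -f "$1/lib/tools.jar" ]
}

if check_jdk_home "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks usable: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or is not a full JDK 8: ${JAVA_HOME:-<unset>}"
fi
```

Running the check again after su or sudo -i shows whether the variable survives a user switch.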

3. The build failed when it reached the YARN modules, with this error:

The engine "node" is incompatible with this module. Expected version ">=12.0.0".
error Found incompatible module

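At first I thought the Node.js package was missing from my local environment, so I installed Node.js locally. The version that Ubuntu 20.04 installs by default is fairly old, and installing a newer one takes some extra steps (search online if you are interested). Even after all of that was installed, the error remained.
I then read the build log carefully and found that the build itself installs Node, at version 8.x. That was frustrating, so I located the module in question and opened its pom file.
hadoop-3.3.1-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-catalog\hadoop-yarn-applications-catalog-webapp is the module that failed for me.
In this submodule's pom file I found: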

<plugin>
  <groupId>com.github.eirslett</groupId>
  <artifactId>frontend-maven-plugin</artifactId>
  <configuration>
    <workingDirectory>target</workingDirectory>
  </configuration>
  <executions>
    <execution>
      <id>install node and yarn</id>
      <goals>
        <goal>install-node-and-yarn</goal>
      </goals>
      <phase>generate-resources</phase>
      <configuration>
        <nodeVersion>v8.11.3</nodeVersion>
        <yarnVersion>v1.7.0</yarnVersion>
      </configuration>
    </execution>
    <execution>
      <id>yarn install</id>
      <goals>
        <goal>yarn</goal>
      </goals>
    </execution>
  </executions>
</plugin>

So I changed the nodeVersion value to v12.22.10 and rebuilt, and the build got past this problem cleanly.
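The same change can be made from the command line instead of in an editor. This is a sketch only: set_node_version is a hypothetical helper, and it assumes the pom contains a single nodeVersion element, as in the excerpt above.

```shell
#!/bin/sh
# set_node_version POM VERSION: rewrite the <nodeVersion> element in POM,
# keeping the original file as POM.bak.
set_node_version() {
    sed -i.bak "s|<nodeVersion>[^<]*</nodeVersion>|<nodeVersion>$2</nodeVersion>|" "$1"
}

# Typical use, from the root of the source tree (same module path as in the
# text above, written with forward slashes):
#   set_node_version hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/pom.xml v12.22.10
```

The .bak copy makes it easy to compare or restore the original pom if the new version causes other problems.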

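4. Summary

First, read BUILDING.txt before you start the build; otherwise you will run into a lot of pitfalls.

Second, for OpenJDK it is best to download the package yourself, install it, and configure JAVA_HOME and the other environment variables by hand.

Finally, when something goes wrong, read the build's error log closely and work out the fix from what it says.

Here is the Hadoop package I built. Link: https://pan.baidu.com/s/1110Gmmas2UtIe379g37oYQ?pwd=tbbg Extraction code: tbbg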
