The three Apache Hadoop deployment modes (standAlone, pseudo-distributed, fully distributed): overview and installation

Published: 2021-11-29 04:32:36


1. Pseudo-distributed runtime environment: Hadoop can run entirely on a single machine. The installation directory is laid out as follows:


bin: Hadoop's operational commands (hadoop, hdfs, yarn, etc.)

sbin: cluster start/stop and management scripts

etc/hadoop: the configuration file directory

lib/native: the native libraries. Very important: they provide data compression and let C programs access Hadoop


Check the native libraries with: bin/hadoop checknative

Native library for C program access:
hadoop:  true /export/servers/hadoop-2.7.5/lib/native/libhadoop.so.1.0.0

Compression libraries:
zlib:    true /lib64/libz.so.1
snappy:  false  (a compression algorithm from Google; snappy is the usual choice for compression)
lz4:     true revision:99
bzip2:   false
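The checknative report is plain columnar text, so the libraries that actually resolved to a native implementation can be pulled out with awk. As a sketch, the sample output above is inlined in a temp file so the filter can run standalone:

```shell
# write the sample checknative report shown above to a temp file,
# then keep only the libraries whose second column is "true"
cat <<'EOF' > /tmp/checknative.txt
hadoop: true /export/servers/hadoop-2.7.5/lib/native/libhadoop.so.1.0.0
zlib:   true /lib64/libz.so.1
snappy: false
lz4:    true revision:99
bzip2:  false
EOF
awk '$2 == "true" { print $1 }' /tmp/checknative.txt
# prints the three usable libraries: hadoop:, zlib:, lz4:
```

On a real node the input would come from `bin/hadoop checknative` directly instead of the temp file.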


At least six configuration files must be changed in Hadoop:

1. core-site.xml: the core configuration file; it determines whether the Hadoop cluster runs standalone or distributed
2. hadoop-env.sh: sets the JDK environment variable
3. hdfs-site.xml: configuration for the HDFS module
4. mapred-site.xml: all MapReduce-related configuration
5. yarn-site.xml: configuration for the YARN cluster
6. slaves: determines which machines host the slave nodes


The standAlone runtime environment


core-site.xml configuration

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.52.100:8020</value>
    </property>
    <!-- Base directory for Hadoop temporary data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/servers/hadoop-2.7.5/hadoopDatas/tempDatas</value>
    </property>
    <!-- I/O buffer size: 4 KB -->
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
    <!-- Trash retention: 10080 minutes (7 days) -->
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>
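The fs.trash.interval value of 10080 above is measured in minutes, so deleted files stay recoverable from the HDFS trash for exactly one week. A quick sanity check of that conversion:

```shell
# fs.trash.interval is in minutes: 10080 / 60 / 24 = 7 days
trash_minutes=10080
trash_days=$((trash_minutes / 60 / 24))
echo "trash retention: ${trash_days} days"   # prints: trash retention: 7 days
```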



hdfs-site.xml configuration

<configuration>
    <!-- SecondaryNameNode address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node01:50090</value>
    </property>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>node01:50070</value>
    </property>
    <!-- fsimage storage directories -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas,file:///export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas2</value>
    </property>
    <!-- DataNode block storage directories -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas,file:///export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas2</value>
    </property>
    <!-- NameNode edit log directory -->
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/hadoopDatas/nn/edits</value>
    </property>
    <!-- SecondaryNameNode checkpoint directories -->
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/hadoopDatas/snn/name</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.edits.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/hadoopDatas/dfs/snn/edits</value>
    </property>
    <!-- Block replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Disable HDFS permission checking -->
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <!-- Block size: 134217728 bytes (128 MB) -->
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
</configuration>
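The dfs.blocksize value of 134217728 bytes above is exactly 128 MiB, and a file is split into ceil(size / blocksize) blocks. For example, a 300 MiB file occupies three blocks, the last one only partly filled:

```shell
# dfs.blocksize above: 134217728 bytes = 128 MiB
blocksize=134217728
echo "block size: $((blocksize / 1024 / 1024)) MiB"   # prints: block size: 128 MiB

# a 300 MiB file needs ceil(300/128) = 3 blocks
filesize=$((300 * 1024 * 1024))
blocks=$(( (filesize + blocksize - 1) / blocksize ))
echo "blocks for a 300 MiB file: ${blocks}"           # prints: blocks for a 300 MiB file: 3
```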



mapred-site.xml configuration

<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Run small jobs inside the ApplicationMaster JVM (uber mode) -->
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
    <!-- JobHistory server addresses -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node01:19888</value>
    </property>
</configuration>



yarn-site.xml configuration

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Aggregate container logs onto HDFS and keep them for 7 days -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>



Configuring the slaves file

localhost

Configuring hadoop-env.sh

export JAVA_HOME=/export/servers/jdk1.8.0_141

The pseudo-distributed runtime environment

The pseudo-distributed environment is converted from the standAlone environment: there are multiple slave nodes running on multiple machines.

In essence, only the slaves file needs to change; then restart the cluster.


The core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml configurations are identical to the standAlone environment above; only the slaves file changes.


Configuring the slaves file


node01
node02
node03
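The slaves file is nothing more than one worker hostname per line. A sketch that generates the three-node list (written to /tmp here so it is harmless to run; on the cluster the target file is /export/servers/hadoop-2.7.5/etc/hadoop/slaves):

```shell
# generate the three-node slaves file: one hostname per line, nothing else
printf '%s\n' node01 node02 node03 > /tmp/slaves
cat /tmp/slaves
```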

Configuring hadoop-env.sh


export JAVA_HOME=/export/servers/jdk1.8.0_141

Building and running the fully distributed environment

Use a fully distributed setup to achieve NameNode high availability and ResourceManager high availability, and plan the cluster services accordingly.

Stop the entire pseudo-distributed environment, delete the Hadoop installation, then re-extract, re-configure, and restart.

Run the following on the first machine to extract the archive:

cd /export/softwares
tar -zxvf hadoop-2.7.5.tar.gz -C ../servers/

Modify the configuration files (for example with Notepad++).


core-site.xml configuration

<configuration>
    <!-- ZooKeeper quorum used for HA coordination -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>node01:2181,node02:2181,node03:2181</value>
    </property>
    <!-- The default filesystem now points at the nameservice, not a single host -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/servers/hadoop-2.7.5/data/tmp</value>
    </property>
    <!-- Trash retention: 10080 minutes (7 days) -->
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>



hdfs-site.xml configuration

<configuration>
    <!-- Logical name of the nameservice -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns</value>
    </property>
    <!-- The two NameNodes in the nameservice -->
    <property>
        <name>dfs.ha.namenodes.ns</name>
        <value>nn1,nn2</value>
    </property>
    <!-- Client RPC addresses -->
    <property>
        <name>dfs.namenode.rpc-address.ns.nn1</name>
        <value>node01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns.nn2</name>
        <value>node02:8020</value>
    </property>
    <!-- Service RPC addresses -->
    <property>
        <name>dfs.namenode.servicerpc-address.ns.nn1</name>
        <value>node01:8022</value>
    </property>
    <property>
        <name>dfs.namenode.servicerpc-address.ns.nn2</name>
        <value>node02:8022</value>
    </property>
    <!-- NameNode web UI addresses -->
    <property>
        <name>dfs.namenode.http-address.ns.nn1</name>
        <value>node01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns.nn2</name>
        <value>node02:50070</value>
    </property>
    <!-- JournalNode quorum that holds the shared edit log -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node01:8485;node02:8485;node03:8485/ns1</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fence a failed NameNode over ssh -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/export/servers/hadoop-2.7.5/data/dfs/jn</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/name</value>
    </property>
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/edits</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/data/dfs/dn</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <!-- Block size: 134217728 bytes (128 MB) -->
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
</configuration>
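The qjournal URI above lists three JournalNodes because the quorum journal manager only needs a majority of them to acknowledge each write: with n JournalNodes the shared edit log survives floor((n-1)/2) failures, so three nodes tolerate the loss of one. The arithmetic:

```shell
# QJM quorum math: n JournalNodes tolerate floor((n-1)/2) failures
jn_count=3
tolerated=$(( (jn_count - 1) / 2 ))
echo "JournalNodes: ${jn_count}, failures tolerated: ${tolerated}"
# prints: JournalNodes: 3, failures tolerated: 1
```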



mapred-site.xml configuration

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server addresses -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node03:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node03:19888</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.system.dir</name>
        <value>/export/servers/hadoop-2.7.5/data/system/jobtracker</value>
    </property>
    <!-- Container sizes for map and reduce tasks -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>1024</value>
    </property>
    <!-- Sort buffer used while map output is written -->
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>100</value>
    </property>
    <!-- Number of streams merged at once during sorting -->
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>10</value>
    </property>
    <!-- Parallel fetches during the reduce-side shuffle -->
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>25</value>
    </property>
    <!-- ApplicationMaster heap and container size -->
    <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx1024m</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>mapreduce.cluster.local.dir</name>
        <value>/export/servers/hadoop-2.7.5/data/system/local</value>
    </property>
</configuration>
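The ApplicationMaster settings above pair a 1536 MB YARN container (yarn.app.mapreduce.am.resource.mb) with a 1024 MB JVM heap (-Xmx1024m): the heap must be smaller than the container so the JVM's non-heap overhead fits too. A quick check of this pair:

```shell
# the -Xmx heap must fit inside the YARN container with headroom left over
container_mb=1536
heap_mb=1024
headroom_mb=$((container_mb - heap_mb))
[ "$heap_mb" -lt "$container_mb" ] && echo "heap fits, ${headroom_mb} MB headroom"
# prints: heap fits, 512 MB headroom
```

If -Xmx were raised above the container size, YARN would kill the AM container for exceeding its memory limit.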



yarn-site.xml configuration

<configuration>
    <!-- Aggregate container logs onto HDFS -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>node03</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>node02</value>
    </property>
    <!-- rm1 (node03) service addresses -->
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>node03:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>node03:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>node03:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>node03:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>node03:8088</value>
    </property>
    <!-- rm2 (node02) service addresses -->
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>node02:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>node02:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>node02:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>node02:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>node02:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm1</value>
        <description>If we want to launch more than one RM in single node, we need this configuration</description>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node02:2181,node03:2181,node01:2181</value>
        <description>For multiple zk services, separate them with comma</description>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
        <description>Enable automatic failover; By default, it is enabled only when HA is enabled.</description>
    </property>
    <property>
        <name>yarn.client.failover-proxy-provider</name>
        <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
    </property>
    <!-- NodeManager resources -->
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>512</value>
    </property>
    <!-- Log retention -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>2592000</value>
    </property>
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-aggregation.compression-type</name>
        <value>gz</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/export/servers/hadoop-2.7.5/yarn/local</value>
    </property>
    <property>
        <name>yarn.resourcemanager.max-completed-applications</name>
        <value>1000</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
    </property>
</configuration>
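The two retention settings above are in seconds: yarn.log-aggregation.retain-seconds = 2592000 keeps aggregated logs on HDFS for 30 days, while yarn.nodemanager.log.retain-seconds = 604800 keeps local logs for 7 days (the latter only applies when aggregation is disabled). The conversion:

```shell
# both YARN log retention settings are in seconds; 86400 seconds per day
echo "aggregated logs: $((2592000 / 86400)) days"   # prints: aggregated logs: 30 days
echo "local logs: $((604800 / 86400)) days"         # prints: local logs: 7 days
```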



Configuring the slaves file

node01
node02
node03

Configuring hadoop-env.sh

export JAVA_HOME=/export/servers/jdk1.8.0_141

Cluster startup
Send the installation from the first machine to the other machines.
Run the following on the first machine:

cd /export/servers
scp -r hadoop-2.7.5/ node02:$PWD
scp -r hadoop-2.7.5/ node03:$PWD

Create the data directories on all three machines.
Run the following on all three machines:


mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/name
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/edits
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/dn
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/jn

Change node02's ResourceManager id to rm2 (important)
Run the following on the second machine:

cd /export/servers/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml

Change the yarn.resourcemanager.ha.id property to:

<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm2</value>
    <description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>

Starting HDFS:
Run the following on node01:


cd /export/servers/hadoop-2.7.5
bin/hdfs zkfc -formatZK
sbin/hadoop-daemons.sh start journalnode
bin/hdfs namenode -format
bin/hdfs namenode -initializeSharedEdits -force
sbin/start-dfs.sh

Run the following on node02:


cd /export/servers/hadoop-2.7.5
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode

Starting YARN:
Run the following on node03:


cd /export/servers/hadoop-2.7.5
sbin/start-yarn.sh

Run the following on node02:


cd /export/servers/hadoop-2.7.5
sbin/start-yarn.sh

Check the ResourceManager state:
Run the following on node03:


cd /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm1

Run the following on node02:


cd /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm2

Starting JobHistory on node03:
Run the following on node03 to start the JobHistory server:


cd /export/servers/hadoop-2.7.5
sbin/mr-jobhistory-daemon.sh start historyserver

Checking HDFS status
View node01's HDFS status in the browser:
http://192.168.52.100:50070/dfshealth.html#tab-overview
View node02's HDFS status in the browser:
http://192.168.52.110:50070/dfshealth.html#tab-overview


Viewing the YARN cluster
In the browser:
http://node03:8088/cluster
Job history browsing
In the browser:
http://192.168.52.120:19888/jobhistory
