Hue Big Data Visual Analytics

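Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop, which Cloudera later contributed to the open-source Hadoop community, and it is built on the Python web framework Django. Hue can integrate with HDFS, YARN, Hive, MySQL, HBase, and other components, so you can manage data on HDFS, run MapReduce programs, and more from a single interface.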

Note: the three machines have the hostnames bigdata.centos01, bigdata.centos02, and bigdata.centos03. Neither HDFS nor YARN is configured for HA.

I. Environment

Service	centos01	centos02	centos03
HDFS	NameNode DataNode	DataNode	DataNode
YARN	ResourceManager NodeManager	NodeManager	NodeManager
HBase	Master(active) RegionServer	Master(backup) RegionServer	RegionServer
Hive	-	-	Hive
MySQL	-	-	MySQL
Hue	-	-	Hue

II. Downloading and Installing Hue

1. Download

wget http://archive.cloudera.com/cdh5/cdh/5/hue-3.9.0-cdh5.9.3.tar.gz

2. Build and install

Install the dependencies

sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make  mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel

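Build

Run the following from the root of the extracted Hue source directory. When the build finishes, a build folder appears there.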

make apps

3. Basic configuration

Edit desktop/conf/hue.ini, which is Hue's global configuration file.

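[desktop]
    # Random string used for secure hashing in the session store; the longer the better
    secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
    # Host and port the Hue web server binds to
    http_host=bigdata.centos03
    http_port=8888
    # Time zone
    time_zone=Asia/Shanghai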
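
The secret_key above is simply a long random string. One way to produce such a value is sketched below with Python's standard secrets module. The helper name make_secret_key, the 60-character length, and the character set are illustrative assumptions, not something Hue requires.

```python
import secrets
import string

# Hypothetical helper: the name, default length, and alphabet are illustrative only.
# The alphabet avoids characters such as '#', ';', '%', and quotes, which could be
# awkward inside an ini file.
def make_secret_key(length: int = 60) -> str:
    """Return a random string of letters, digits, and a few punctuation marks."""
    alphabet = string.ascii_letters + string.digits + "!@^*-_+"
    return "".join(secrets.choice(alphabet) for _ in range(length))

# Print a line that can be pasted into the [desktop] section of hue.ini.
print("secret_key=" + make_secret_key())
```

The value is different on every run; generate it once and keep it in hue.ini.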

4. Start the service

nohup build/env/bin/supervisor &

III. Framework Integration

1. HDFS integration

hdfs-site.xml (Hadoop)

<!-- Add the following property. It already defaults to true, so this is optional. -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

core-site.xml (Hadoop)

<!-- Add the following properties so that the hue user may act as a proxy user -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>

hue.ini

[[hdfs_clusters]]

    [[[default]]]

      fs_defaultfs=hdfs://bigdata.centos01:9000
      
      webhdfs_url=http://bigdata.centos01:50070/webhdfs/v1

      hadoop_conf_dir=/opt/modules/hadoop-2.5.0/etc/hadoop

      hadoop_bin=/opt/modules/hadoop-2.5.0/bin/hadoop

      hadoop_hdfs_home=/opt/modules/hadoop-2.5.0
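
Hue reaches HDFS through the WebHDFS REST API at webhdfs_url, so it can be useful to check that endpoint directly. The following is a minimal sketch using only the Python standard library; the function names, the hue user name, and the /user/hue path are assumptions made for illustration.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Value of webhdfs_url from the hue.ini settings above.
WEBHDFS_URL = "http://bigdata.centos01:50070/webhdfs/v1"

def liststatus_url(path: str, user: str = "hue", base: str = WEBHDFS_URL) -> str:
    """Build the WebHDFS REST URL that lists an HDFS directory."""
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"{base}{quote(path)}?{query}"

def list_dir(path: str, user: str = "hue") -> list:
    """Return the entry names under `path`; needs a reachable NameNode."""
    with urlopen(liststatus_url(path, user), timeout=10) as resp:
        body = json.load(resp)
    return [status["pathSuffix"] for status in body["FileStatuses"]["FileStatus"]]

# Only builds and prints the URL; no network access happens here.
print(liststatus_url("/user/hue"))
```

Calling list_dir("/") from a machine that can resolve bigdata.centos01 should return the top-level HDFS entries if WebHDFS is enabled.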

2. YARN integration

hue.ini

[[yarn_clusters]]

    [[[default]]]
      # Host running the ResourceManager
      resourcemanager_host=bigdata.centos01

      # Port the ResourceManager IPC listens on
      resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # ResourceManager API
      resourcemanager_api_url=http://bigdata.centos01:8088

      # ProxyServer API
      proxy_api_url=http://bigdata.centos01:8088

      # HistoryServer API
      history_server_api_url=http://bigdata.centos01:19888
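
The resourcemanager_api_url above is the ResourceManager's REST endpoint. As a rough sketch, its /ws/v1/cluster/info resource can be queried to confirm that YARN is up. The function names are hypothetical, and the sample payload is a made-up, abbreviated response used only to show the expected shape.

```python
import json
from urllib.request import urlopen

# Value of resourcemanager_api_url from the hue.ini settings above.
RM_API_URL = "http://bigdata.centos01:8088"

def cluster_state(payload: str) -> str:
    """Extract the ResourceManager state from a /ws/v1/cluster/info response body."""
    return json.loads(payload)["clusterInfo"]["state"]

def fetch_cluster_state(base: str = RM_API_URL) -> str:
    """Query the ResourceManager; needs a reachable cluster."""
    with urlopen(f"{base}/ws/v1/cluster/info", timeout=10) as resp:
        return cluster_state(resp.read().decode("utf-8"))

# Abbreviated, made-up payload in the shape the endpoint returns.
sample = '{"clusterInfo": {"id": 1, "state": "STARTED"}}'
print(cluster_state(sample))
```

fetch_cluster_state() performs the real request and is not called here, so the snippet runs without a cluster.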

3. Hive integration

hue.ini

[beeswax]

  # Host where HiveServer2 runs
  hive_server_host=bigdata.centos03

  # Port of the HiveServer2 Thrift server
  hive_server_port=10000

  # Hive configuration directory
  hive_conf_dir=/opt/modules/hive-0.13.1-bin/conf

Start the service

# Start HiveServer2 (run from the Hive installation directory)
nohup bin/hiveserver2 &

4. MySQL integration

hue.ini

    # This block belongs under [librdbms] -> [[databases]]
    [[[mysql]]]
      # Display name shown in the Hue UI
      nice_name="my db"

      # Database name
      name=test

      engine=mysql

      host=bigdata.centos03

      port=3306

      user=root

      password=123456

5. HBase integration

hue.ini

[hbase]
  
  hbase_clusters=(Cluster|bigdata.centos01:9090)

  hbase_conf_dir=/opt/modules/hbase-0.98.6-cdh5.3.9/conf

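Start the service

Hue depends on HBase's Thrift service, so start it on the host named in hbase_clusters (run from the HBase installation directory):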

bin/hbase-daemon.sh start thrift
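
Before testing, it can help to confirm that every service Hue talks to is actually listening. The sketch below tries a plain TCP connection to each host and port configured in this post; the function names and the SERVICES mapping are illustrative, and a successful connection only shows that the port is open, not that the service is healthy.

```python
import socket

# Host/port pairs taken from the hue.ini settings in this post.
SERVICES = {
    "Hue": ("bigdata.centos03", 8888),
    "WebHDFS (NameNode)": ("bigdata.centos01", 50070),
    "ResourceManager API": ("bigdata.centos01", 8088),
    "JobHistoryServer API": ("bigdata.centos01", 19888),
    "HiveServer2": ("bigdata.centos03", 10000),
    "MySQL": ("bigdata.centos03", 3306),
    "HBase Thrift": ("bigdata.centos01", 9090),
}

def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or hostname not resolvable
        return False

def report(services=SERVICES) -> None:
    """Print one status line per service."""
    for name, (host, port) in services.items():
        status = "up" if is_listening(host, port) else "DOWN"
        print(f"{name:22} {host}:{port} {status}")
```

Call report() from a machine that can resolve the three hostnames.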

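IV. Testing

Run SQL through Hive in Hue to test the integration: check whether data stored in HBase can be queried, and whether a job can be submitted to YARN to run a MapReduce program and return its results.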

select * from logs limit 10;

select searchname,count(*) num from logs group by searchname order by num desc limit 10;
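
To sanity-check the result of the second query, it may help to see the same aggregation on a tiny in-memory sample. This sketch uses made-up searchname values (the real ones come from the logs table) and a hypothetical helper name.

```python
from collections import Counter

# Made-up sample of the searchname column; real values come from the logs table.
searchnames = ["hadoop", "hive", "hadoop", "hbase", "hive", "hadoop"]

def top_searchnames(names, limit=10):
    """Count rows per searchname and return the `limit` most frequent, highest first."""
    return Counter(names).most_common(limit)

for name, num in top_searchnames(searchnames):
    print(name, num)
```

This mirrors group by searchname, order by num desc, and limit 10; how ties are ordered may differ from Hive.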