Configuring Kettle to Connect to HDFS
I. Software Environment
1. Hadoop cluster, version: Hadoop 3.3.0
2. ETL tool Kettle, version: pdi-ce-7.0.0.0-25
3. MySQL server, version: MySQL 5.7.37
II. Steps
(1) Unpack Kettle
Command: unzip pdi-ce-7.0.0.0-25.zip
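If you want the paths in the later steps to line up, a minimal sketch, assuming the zip was downloaded to /home/hadoop/software (the base directory used throughout this guide):
unzip /home/hadoop/software/pdi-ce-7.0.0.0-25.zip -d /home/hadoop/software/
This creates the /home/hadoop/software/data-integration directory referenced below.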
(2) Copy mysql-connector-java-5.1.32.jar into /home/hadoop/software/data-integration/lib
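A sketch of the copy, assuming the MySQL Connector/J jar was downloaded to the current directory (it is not bundled with Kettle):
cp mysql-connector-java-5.1.32.jar /home/hadoop/software/data-integration/lib/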
(3) Edit the Kettle configuration file
Open the configuration file with:
vi data-integration/plugins/pentaho-big-data-plugin/plugin.properties
Set the parameter: active.hadoop.configuration=hdp24
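After the edit, the relevant line in plugin.properties should read:
active.hadoop.configuration=hdp24
Here hdp24 names a subdirectory of plugins/pentaho-big-data-plugin/hadoop-configurations, the shim directory that the next step populates with your cluster's configuration files.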
(4) Hadoop files to copy in for the Kettle HDFS connection
The target path is:
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp24/
Go to /home/hadoop/software/hadoop-3.3.0/etc/hadoop and copy the three files core-site.xml, mapred-site.xml, and yarn-site.xml into /home/hadoop/software/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp24, as follows:
- Copy core-site.xml
Command: cp core-site.xml /home/hadoop/software/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp24
- Copy mapred-site.xml
Command: cp mapred-site.xml /home/hadoop/software/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp24
- Copy yarn-site.xml
Command: cp yarn-site.xml /home/hadoop/software/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp24
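To confirm the copies landed, a quick sanity check (assuming the paths above):
ls /home/hadoop/software/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp24/
The listing should now include core-site.xml, mapred-site.xml, and yarn-site.xml alongside the shim's own files.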
(5) Start Kettle
- Go to /home/hadoop/software/data-integration and start Kettle with the command ./spoon.sh
- After Kettle starts, click File -> New -> Transformation
- In the Big Data category, drag a Hadoop File Output step onto the canvas
- Open the step's Hadoop cluster settings and click Test; when the connection-success dialog appears, Kettle is connected to HDFS
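If the test fails, it can help to first confirm that HDFS itself is reachable from the same machine, e.g. (assuming the Hadoop binaries are on the PATH):
hdfs dfs -ls /
If this lists the HDFS root but the Kettle test still fails, recheck the copied *-site.xml files and the active.hadoop.configuration setting from step (3).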