Pentaho Data Integration Development Guidelines
Development
Contents
Overview
Directory / Folder Structures
    Client/Workstation Folder Structure
    Server Folder Structure
    Development for Project
Configuration
    Kettle Properties
    Project Properties
Content Migration Overview
    Export Content
    Import Content
Related Information
Best Practice Check List
Overview
This document gives developers and administrators best practices for the setup and configuration
of the directories used for Pentaho Data Integration (PDI) development and execution.
Keep these Pentaho Architecture principles in mind while you are working through this document:
Topics covered include folder structures for workstations and servers, configuration, and content
migration.
This document discusses these topics in general terms; however, it specifically covers the
following versions:
Software                      Version
Pentaho Data Integration      4.x, 5.x, 6.x, 7.x
Directory / Folder Structures
Client/Workstation Folder Structure
Content: Transformations and jobs are stored in this folder. Always use the root folders home and
public.
Config: Project properties and other configuration files.
Input: Files that are input to the solution.
Output: Files that are produced by the solution.
Env: Environment folder that holds one subfolder per unique environment variant. Use the
following variants to decide which folders to create within one project:
o Pentaho Version: Each version should have its own folder.
o Content Storage: Whether content is stored in files or in a repository.
o Server Environment: Dev, Testing, and Production should each have their own env variant.
o Container: Where the variant is intended to execute (workstation, Carte, or DI server).
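The variant folder names used in this guide appear to combine container, content storage, and PDI version (ws_files_v7 in the tree below is a workstation using files on PDI 7; ws_files_v51 in the launch example is the same for 5.1). A minimal sketch of that naming scheme, assuming this inferred pattern; the abbreviation tables, the function name env_folder_name, and the optional environment suffix are all hypothetical:

```python
from typing import Optional

# Hypothetical abbreviations: only "ws" and "files" appear in the example tree.
CONTAINER_TAGS = {"workstation": "ws", "carte": "carte", "di_server": "di"}
STORAGE_TAGS = {"files": "files", "repository": "repo"}


def env_folder_name(container: str, storage: str, version: str,
                    environment: Optional[str] = None) -> str:
    """Build an env variant folder name such as ws_files_v7.

    Pattern <container>_<storage>_v<version>, inferred from the example tree;
    the optional environment suffix (dev/test/prod) is an assumption.
    """
    version_tag = "v" + version.replace(".", "")  # "5.1" -> "v51"
    parts = [CONTAINER_TAGS[container], STORAGE_TAGS[storage], version_tag]
    if environment:
        parts.append(environment.lower())
    return "_".join(parts)


if __name__ == "__main__":
    print(env_folder_name("workstation", "files", "7"))         # ws_files_v7
    print(env_folder_name("workstation", "files", "5.1"))       # ws_files_v51
    print(env_folder_name("carte", "repository", "6", "Prod"))  # carte_repo_v6_prod
```

Deriving names from one function keeps every project's env folder consistent, which matters once launch scripts and KETTLE_HOME values refer to them.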
project1/
    content/
        public/
            project1/
        home/
            admin/
            user1/
    config/
        project1.properties
    input/
    output/
    env/
        ws_files_v7/
            .kettle/
                kettle.properties
                repositories.xml
                shared.xml
            spoon.bat / spoon.sh
        ws_files_v6/
            .kettle/
                kettle.properties
                repositories.xml
                shared.xml
            spoon.bat / spoon.sh
Note that logs are collected centrally for ease of monitoring, archiving, and maintenance.
Parameters, inputs, and outputs are all kept per project.
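To create this skeleton the same way for every project, a short script can help. This is a sketch assuming Python 3.8+; scaffold_project is a hypothetical helper, and the folder names are taken from the tree above:

```python
import tempfile
from pathlib import Path
from typing import Iterable


def scaffold_project(root: Path, name: str, users: Iterable[str],
                     env_variants: Iterable[str]) -> Path:
    """Create the workstation folder skeleton for one project (hypothetical helper)."""
    project = root / name
    # content/ mirrors the root folders: public/<project> and home/<user>
    (project / "content" / "public" / name).mkdir(parents=True, exist_ok=True)
    for user in users:
        (project / "content" / "home" / user).mkdir(parents=True, exist_ok=True)
    # Project-level configuration, inputs, and outputs
    (project / "config").mkdir(exist_ok=True)
    (project / "config" / f"{name}.properties").touch()
    (project / "input").mkdir(exist_ok=True)
    (project / "output").mkdir(exist_ok=True)
    # One KETTLE_HOME folder per env variant, each with its own .kettle folder.
    # Only kettle.properties is stubbed: an empty repositories.xml or shared.xml
    # would not be a valid XML document.
    for variant in env_variants:
        kettle_dir = project / "env" / variant / ".kettle"
        kettle_dir.mkdir(parents=True, exist_ok=True)
        (kettle_dir / "kettle.properties").touch()
    return project


if __name__ == "__main__":
    # Demo: build the example layout in a throwaway directory and list it
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_project(Path(tmp), "project1", ["admin", "user1"],
                                   ["ws_files_v7", "ws_files_v6"])
        for path in sorted(created.rglob("*")):
            print(path.relative_to(tmp))
```

The per-variant spoon.bat / spoon.sh launchers are left out because their contents depend on where each PDI client is installed.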
Server Folder Structure
server/
    data-integration-server/   (named pentaho-server/ in Pentaho 7.0 and later)
        pentaho-solutions/
            system/
                slave-config.xml
    data-integration/
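Scripts that work with the server layout have to account for the server folder's name changing in 7.0. A small sketch, assuming the file location given in the server tree (verify it against your own install); slave_config_path is a hypothetical helper:

```python
from pathlib import PurePath, PurePosixPath


def slave_config_path(server_root: PurePath, pentaho_version: str) -> PurePath:
    """Expected location of slave-config.xml under a server install.

    Hypothetical helper: the server folder is data-integration-server before
    Pentaho 7.0 and pentaho-server from 7.0 on.
    """
    major = int(pentaho_version.split(".")[0])
    server_dir = "pentaho-server" if major >= 7 else "data-integration-server"
    return server_root / server_dir / "pentaho-solutions" / "system" / "slave-config.xml"


if __name__ == "__main__":
    # /opt/server/data-integration-server/pentaho-solutions/system/slave-config.xml
    print(slave_config_path(PurePosixPath("/opt/server"), "6.1"))
    # /opt/server/pentaho-server/pentaho-solutions/system/slave-config.xml
    print(slave_config_path(PurePosixPath("/opt/server"), "7.1"))
```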
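Development for Project
Each env variant's spoon.bat (or spoon.sh) sets KETTLE_HOME to that variant's folder, the one that
contains .kettle, and then calls the PDI client's Spoon launcher. For example, for a 5.1
workstation variant on Windows:
REM Use this project's ws_files_v51 variant as KETTLE_HOME, then start Spoon
set KETTLE_HOME=U:\projects\project1\env\ws_files_v51
call U:\myfiles\pdi-ee-client-5.1\data-integration\Spoon.bat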
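The same launch can also be scripted from a cross-platform tool. This is a sketch assuming the project layout shown earlier; spoon_env and launch_spoon are hypothetical helpers, while Spoon.bat and spoon.sh are the launchers in the PDI client's data-integration folder:

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, Mapping, Optional


def spoon_env(project_dir: Path, variant: str,
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with KETTLE_HOME set to one env variant.

    KETTLE_HOME is the folder that *contains* .kettle, not .kettle itself.
    """
    env = dict(os.environ if base is None else base)
    kettle_home = project_dir / "env" / variant
    if not (kettle_home / ".kettle").is_dir():
        raise FileNotFoundError(f"no .kettle folder under {kettle_home}")
    env["KETTLE_HOME"] = str(kettle_home)
    return env


def launch_spoon(pdi_dir: Path, env: Mapping[str, str]) -> subprocess.Popen:
    """Start Spoon from a PDI client's data-integration folder (hypothetical helper)."""
    script = pdi_dir / ("Spoon.bat" if os.name == "nt" else "spoon.sh")
    return subprocess.Popen([str(script)], cwd=str(pdi_dir), env=dict(env))
```

For example, launch_spoon(Path("U:/myfiles/pdi-ee-client-5.1/data-integration"), spoon_env(Path("U:/projects/project1"), "ws_files_v51")) mirrors the batch example. Failing early when .kettle is missing avoids silently starting Spoon against the default home directory.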
Configuration
Kettle Properties
Each JVM that runs Pentaho Data Integration should read the default kettle.properties file for
its environment. Each kettle.properties file should define at least the following global
settings. Project-specific settings belong in the project's properties file, not in the global
kettle.properties file.
KETTLE_CHANNEL_LOG_SCHEMA=
KETTLE_CHANNEL_LOG_DB=
KETTLE_CHANNEL_LOG_TABLE=
KETTLE_JOB_LOG_DB=
KETTLE_JOB_LOG_SCHEMA=
KETTLE_JOB_LOG_TABLE=
KETTLE_JOBENTRY_LOG_SCHEMA=
KETTLE_JOBENTRY_LOG_DB=
KETTLE_JOBENTRY_LOG_TABLE=
KETTLE_TRANS_LOG_SCHEMA=
KETTLE_TRANS_LOG_DB=
KETTLE_TRANS_LOG_TABLE=
KETTLE_STEP_LOG_SCHEMA=
KETTLE_STEP_LOG_DB=
KETTLE_STEP_LOG_TABLE=
KETTLE_TRANS_PERFORMANCE_LOG_DB=
KETTLE_TRANS_PERFORMANCE_LOG_SCHEMA=
KETTLE_TRANS_PERFORMANCE_LOG_TABLE=
KETTLE_REDIRECT_STDERR=Y
KETTLE_REDIRECT_STDOUT=Y
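These keys are easy to leave blank when kettle.properties is copied between environments, so a quick check can help. A sketch, assuming a simplified reader that handles only key=value lines (real .properties files also allow ':' separators, escapes, and continuation lines); parse_properties and missing_settings are hypothetical names:

```python
from typing import Dict, List

# The minimum global settings listed above: DB/SCHEMA/TABLE for each log type,
# plus the two stdout/stderr redirect flags.
LOG_TYPES = ("CHANNEL", "JOB", "JOBENTRY", "TRANS", "STEP", "TRANS_PERFORMANCE")
REQUIRED_KEYS: List[str] = [
    f"KETTLE_{log_type}_LOG_{part}"
    for log_type in LOG_TYPES
    for part in ("DB", "SCHEMA", "TABLE")
] + ["KETTLE_REDIRECT_STDERR", "KETTLE_REDIRECT_STDOUT"]


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified reader: key=value lines only; '#' and '!' lines are comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '=' rather than guessing
            props[key.strip()] = value.strip()
    return props


def missing_settings(props: Dict[str, str]) -> List[str]:
    """Required keys that are absent or left blank."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]
```

For example, missing_settings(parse_properties(open(path).read())) lists every required key that is absent or blank in the kettle.properties file at path.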
# Use forward slashes: a single backslash is an escape character in .properties files
PROJECT_DIR=/projects
ActiveProject.Home=${PROJECT_DIR}/${PROJECT_NAME}
project1_target_hostname=192.168.1.1
project1_target_db=dbname
project1_target_user=admin
project1_target_pass=password
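One way to picture how project settings sit on top of the global ones: the project's properties are loaded after kettle.properties, and ${NAME} references are expanded against both. This is a sketch of that idea, not PDI's own loading code; layer and resolve are hypothetical, only the ${NAME} syntax is handled, and PROJECT_NAME is assumed to be set elsewhere:

```python
import re
from typing import Dict, Mapping

VAR_PATTERN = re.compile(r"\$\{([^}]+)\}")


def resolve(value: str, variables: Mapping[str, str], max_depth: int = 10) -> str:
    """Expand ${NAME} references, repeating for nested ones; unknown names stay as-is."""
    for _ in range(max_depth):
        expanded = VAR_PATTERN.sub(lambda m: variables.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


def layer(global_props: Mapping[str, str],
          project_props: Mapping[str, str]) -> Dict[str, str]:
    """Project settings extend (and, on a name clash, override) the global ones."""
    merged = {**global_props, **project_props}
    return {key: resolve(value, merged) for key, value in merged.items()}
```

Prefixing project keys with the project name, as the project1_ entries above do, keeps them from silently overriding global keys when the two sets are merged.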
Content Migration Overview
How content is migrated to the server depends on where it is stored:
External repository (SVN, Git, etc.) - Use external tools and methods to migrate the content to
the server, typically a checkout of the project.
Pentaho Repository (Enterprise, Database, File) - Export the content from the workstation
repository and import it into the server repository.
Export Content
Graphical Export - The entire repository or individual folders can be exported.
Command Line - Exports can also be run from the command line, which offers many options and is
often used in automated scripts.
Job entry - A job entry can selectively export repository contents based on variables and
parameters.
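For scripted exports, Pan can write every object in a repository to one XML file through its -exprep option. A sketch that builds that command, assuming the -rep, -user, -pass, and -exprep options as documented for Pan (check the usage text your version of pan.sh prints); export_repository_cmd is a hypothetical helper:

```python
from pathlib import Path
from typing import List


def export_repository_cmd(pdi_dir: Path, repo: str, user: str, password: str,
                          out_file: Path, windows: bool = False) -> List[str]:
    """Argument list for a Pan run that exports all repository objects to one XML file.

    Hypothetical helper; the -rep/-user/-pass/-exprep options are Pan's.
    """
    pan = pdi_dir / ("Pan.bat" if windows else "pan.sh")
    return [
        str(pan),
        f"-rep={repo}",         # repository name as defined in repositories.xml
        f"-user={user}",
        f"-pass={password}",
        f"-exprep={out_file}",  # export target file
    ]
```

Run it with subprocess.run(cmd, check=True). Building an argument list rather than a shell string avoids quoting problems; note that a password passed with -pass is visible to other users of the machine through the process list.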
Import Content
Graphical Import - Import the contents of a previous repository export through Spoon's Repository
menu. To keep the directories correct, always select the “/” node as the import target.
Command Line - Imports can also be run from the command line, which offers many options and is
often used in automated scripts.
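A sketch of a scripted import, assuming the options shown in Pentaho's documentation for the import script (-rep, -user, -pass, -dir, -file, -replace, -coe, and the optional -rules and -comment); confirm them against the usage text your version of import.sh prints. Passing -dir=/ follows the advice to import at the “/” node. import_repository_cmd is a hypothetical helper:

```python
from pathlib import Path
from typing import List, Optional


def import_repository_cmd(pdi_dir: Path, repo: str, user: str, password: str,
                          export_file: Path, rules_file: Optional[Path] = None,
                          comment: Optional[str] = None,
                          windows: bool = False) -> List[str]:
    """Argument list for PDI's import script (hypothetical helper)."""
    script = pdi_dir / ("import.bat" if windows else "import.sh")
    cmd = [
        str(script),
        f"-rep={repo}",
        f"-user={user}",
        f"-pass={password}",
        "-dir=/",                # import at the root node to keep directories correct
        f"-file={export_file}",  # XML produced by a repository export
        "-replace=true",         # overwrite existing jobs/transformations
        "-coe=false",            # do not continue past errors
    ]
    if rules_file is not None:
        cmd.append(f"-rules={rules_file}")
    if comment is not None:
        cmd.append(f"-comment={comment}")
    return cmd
```

Replacing existing objects while stopping on the first error is one reasonable default for promoting content between environments; adjust -replace and -coe to suit your release process.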
1 Roland Bouman’s blog post on Managing kettle job configurations has more details on working with
project.properties files in Kettle.