How To Extend RapidMiner 5
Rapid-I www.rapid-i.com
© 2012 by Rapid-I GmbH. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of Rapid-I GmbH.
Contents

1 Introduction
2 Using the Scripting Operator
2.1 Writing the Script
2.2 Connecting with other Operators
3 The RapidMiner data storage strategy
3.1 The Example Table
3.2 The ExampleSet and its Attributes
3.3 More than one ExampleSet
3.4 Changing data on the fly
3.5 The ExampleSet layer stack
4 Creating your own Extension
5 Building Operators
5.1 Our first operator
5.2 Adding Ports
5.3 Declaring operators to RapidMiner
5.4 Adding preconditions to input ports
5.5 Adding generation rules to the output ports
5.6 Adding documentation to the operators
5.7 Creating super operators
5.8 Adding a PortExtender
5.9 Adding meta data transformation rules
. . . parameters
6 Building special data objects
6.1 Defining the object class
6.2 Processing your own IOObjects
6.3 Taking a look into your IOObject
6.4 Leaving the 80s
7 Publishing a RapidMiner Extension
7.1 The extension bundle
7.2 The ant build file
8 Using advanced Extension mechanism
8.1 The PluginInit class
8.2 Adding custom configurators
8.2.1 Usage
8.2.2 Customizing the configuration panel
8.3 Adding custom GUI elements
8.4 Adding custom actions to the GUI

1 Introduction
If you are reading this tutorial, you have probably already installed RapidMiner 5 and gained some experience by playing around with its enormous set of operators. Chances are that you have been part of the RapidMiner Community for some time and that it has been quite a while since you last developed your own extension. Back then you might have developed for RapidMiner 4.x, in which case you will immediately notice the great number of changes from version 4.6 to 5.0: The new flow layout gives a completely new quality of insight into your processes, even for untrained users. The typed ports give detailed information about what kind of input is expected and make process design much simpler. Where you had to remember the names of attributes in earlier versions, you can now select them from a drop-down menu, even if the process has never been run! These and several other improvements make the life of today's data analysts much easier, and they can spend much more time with their families instead of having to wait for a restarted process because of a typo in an attribute's name.

But even with the huge number of functions provided by RapidMiner, sometimes you have a problem at hand that is unsolvable, or only solvable with what seems to be an overly complex process. Then you have two choices:
On the one hand you could use the built-in scripting operator for writing a quick and dirty hack. If this solves your problem, very well, go ahead. Chapter Using the Scripting Operator will illustrate how to access the RapidMiner API without even starting an IDE.

The other solution is to build your own extension to RapidMiner, providing new operators and new data objects with all the functionality of RapidMiner 5. This option is more heavyweight, so whether it is worth going this way really depends on the task at hand and the need for reusability. If it's a more general problem, or if you are going to implement something like a new learning scheme, building an extension is definitely the best way to let the community participate in your work: You let all members profit from your achievements and they will give you valuable feedback. And always keep in mind that it's a good feeling to know that your piece of software is still used by someone and that you didn't waste all the time you spent hunting bugs.

As a more experienced user, you might already have written a plug-in for the old versions of RapidMiner. Then you will be confronted with the downside of all the advantages of version 5: We unfortunately had to break backward compatibility with 4.x. All these features simply didn't fit into the old plug-in framework, so we decided to publish a new extension mechanism rather than artificially limit its possibilities. That's why you will have to change some code in order to port your old plug-ins to RapidMiner 5. Where we thought it helpful, there will be short hints. To make these paragraphs easy to recognize, they are shaded in light gray, so that you can skip uninteresting parts without missing valuable information.
2 Using the Scripting Operator

Let's assume we have the following situation: We get data from a machine that counts the seconds since it was switched on. Each entry in its log file carries this time stamp. Unfortunately, the other data sources we are going to use have an absolute time stamp. So we have to transform the relative format into a regular date and time format. Since RapidMiner doesn't provide an operator solving this particular problem, we decide to write a small script. This problem doesn't seem to be worth the effort of building a complete extension: we don't believe there are many other machines around that lack an integrated clock, so we don't expect to be able to reuse an extension. Hence we prefer to build a simple process, which should do the trick:
Figure 2.1: A simple process for applying a script

As a first step we are going to load the data and then directly apply our script. As a last step we will do some date adjustment, but we will come back to this later. After loading we have an ExampleSet consisting of a number of attributes describing the machine's state. They are called att1, att2 to att500. The time stamp is contained in an attribute named relative time. During scripting we can ignore the state attributes. We just want to focus on the single attribute relative time.

Next we insert an Execute Script operator. It lets us implement a simple program written in the Groovy scripting language. This script can be entered in the script parameter of the operator. The language is quite similar to Java, but if you need further documentation, you may refer to the Groovy homepage at http://groovy.codehaus.org/.
The script starts by fetching the first object delivered to the operator's input ports:

    ExampleSet exampleSet = input[0];

We now have the ExampleSet stored in a local variable and can use the whole RapidMiner API for accessing data. Since we are going to transform the relative time attribute, we use the Attributes object of the example set to retrieve this Attribute:

    Attributes attributes = exampleSet.getAttributes();
    Attribute sourceAttribute = attributes.get("relative time");

We now have access to the attribute and its values stored inside the single examples. But we want to create a new date attribute, and we cannot change the type of an existing attribute. So we have to create a new one. We could give it any arbitrary name, but for now it seems reasonable to just wrap a date() around the old name. Therefore we extract the old name and create a new Attribute object:

    String newName = "date(" + sourceAttribute.getName() + ")";
    Attribute targetAttribute = AttributeFactory.createAttribute(newName, Ontology.DATE_TIME);

If we execute this script, it will crash, because it doesn't know the Ontology class, which defines the value types of RapidMiner's attributes. To solve this problem, we have to import it manually, as we would have to do with any class that's not part of the standard imports. So we will add the following line at the top of the script:

    import com.rapidminer.tools.Ontology;
Now we have created a new attribute, but it has not yet been attached to any of the underlying data columns. What we have to do now is to connect the new Attribute with the values of the old one. We could insert a new column into the data table, or just reuse the old one. Since reusing saves copying the data, we take this approach here. The mechanics of the data storage will be described in detail in the next chapter.

    targetAttribute.setTableIndex(sourceAttribute.getTableIndex());
Now the new date attribute will use the old integer values as if they were dates. The problem is that the formats are not compatible: The date attribute stores dates as milliseconds since the 1st of January 1970. The integer in our attribute contains the seconds since the first start-up of the machine. First we will tackle the problem of the wrong unit. We have to multiply each entry by 1000 to convert the seconds to milliseconds. The problem is that we cannot access the new attribute yet, because it isn't part of the example set. We will change that by adding it to the example set's attributes and removing the old attribute:

    attributes.addRegular(targetAttribute);
    attributes.remove(sourceAttribute);
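The only thing we have to do now is to iterate over all examples, get the value of the attribute, multiply it by 1000 and write it back. This is fairly easy:

    for (Example example : exampleSet) {
        double timeStampValue = example.getValue(targetAttribute);
        example.setValue(targetAttribute, timeStampValue * 1000);
    }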
All we have to do now is to return the example set. If we want to return more than one data object, we can wrap them in an array. The outgoing ports of the script operator will deliver the corresponding objects of the array: the first port the first element, the second port the second element, and so on. This time we can simply return the single object, because we only have one output. The complete code now looks like this:

    import com.rapidminer.tools.Ontology;

    ExampleSet exampleSet = input[0];
    Attributes attributes = exampleSet.getAttributes();
    Attribute sourceAttribute = attributes.get("relative time");
    String newName = "date(" + sourceAttribute.getName() + ")";
    Attribute targetAttribute = AttributeFactory.createAttribute(newName, Ontology.DATE_TIME);
    targetAttribute.setTableIndex(sourceAttribute.getTableIndex());
    attributes.addRegular(targetAttribute);
    attributes.remove(sourceAttribute);

    for (Example example : exampleSet) {
        double timeStampValue = example.getValue(targetAttribute);
        example.setValue(targetAttribute, timeStampValue * 1000);
    }

    return exampleSet;
use this operator to adjust the date: We have written a script to transform the seconds since start-up into a date format. But the result is now relative to the 1st of January 1970 and not to the start-up time. So we want to use the Adjust Date operator to correct this. With correct parameter settings, it will add the difference between the start-up time of the machine and the 1st of January 1970. But when trying to select the correct attribute, we notice one of the limitations of the scripting operator: It doesn't take care of the meta data of data objects. All information in the meta data is lost, so we cannot select the attribute from the drop-down list and have to type its name manually. The process then works, but if you have become used to the benefits of the meta data transformation, you probably won't like to lose them, especially not in a more complex process setup. The only way of not losing them when writing your own code is to build your own Extension to RapidMiner. The next chapters will show how this works and how meta data can be treated correctly.
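The arithmetic behind the two steps (multiplying by 1000 in the script, then shifting by the start-up time with Adjust Date) can be sketched in plain Java. This is only an illustration outside of RapidMiner: the class name, the method name and the start-up instant are assumptions made for the example and do not come from the RapidMiner API.

```java
import java.time.Instant;

class RelativeTimeSketch {

    /**
     * Converts seconds since start-up into milliseconds since
     * 1970-01-01T00:00:00Z, given the (assumed known) start-up instant.
     */
    static long toEpochMillis(Instant startup, double secondsSinceStartup) {
        // step 1: seconds -> milliseconds (what the script does)
        long relativeMillis = Math.round(secondsSinceStartup * 1000.0);
        // step 2: shift by the start-up time (what Adjust Date does)
        return startup.toEpochMilli() + relativeMillis;
    }

    public static void main(String[] args) {
        // hypothetical start-up time, chosen only for illustration
        Instant startup = Instant.parse("2012-01-01T08:00:00Z");
        long absolute = toEpochMillis(startup, 90.0);
        System.out.println(Instant.ofEpochMilli(absolute)); // prints 2012-01-01T08:01:30Z
    }
}
```

Doing both steps at once would require the start-up time inside the script; the process above keeps them separate so that the offset stays a parameter of the Adjust Date operator.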
Chances are that you made first contact with the RapidMiner API for accessing data in the script above. If you are already an experienced RapidMiner developer, have written plug-ins for RapidMiner 4.x and are familiar with the underlying data structures, you might skip this part. Although there have been several improvements in the details, the concepts haven't changed.

If you are still reading, you might ask why there's a complete section about such a simple thing as storing data. But storing data isn't as simple as it sounds if we have certain requirements, as they occur frequently in data mining tasks:

- High data volume, with both a high number of rows, which might grow into the millions, and at the same time a high number of columns. Especially in text mining tasks, working on over 100,000 columns is very common.
- Data might be sparse, which means that only a very small fraction of entries differs from a default value.
- Data is accessed in many different ways: sequentially or in random order, read or written or both.
- Data manipulation is crucial, but not only single values have to be altered. In many applications whole columns or rows must be added or removed. For cross-validation, complete folds have to be selected or deselected.
- Data might be of different types like numbers, dates, times, words or whole texts.
- Some columns might have a different meaning, in reality as well as for the analysis. One might be the classification, others might be input from sensors.
- The order of rows must be changeable; some algorithms need a random sequence, others a special ordering.

These requirements need special treatment, and this makes everything a little more complex. What you have seen in the script example above was the surface of a layer concept, which we will now describe in detail. In the next section we will begin our introduction with the foundation: The ExampleTable.
Figure 3.1: The inner structure of an ExampleTable. Columns exist only logically as indicated by the dotted lines.
We see this in the image above, where the single numerical values are shown as black boxes inside the grey boxes of the rows. The columns are logically present, which means each value can be addressed using the column index, but since the columns are not represented by objects, they are only indicated by the dotted lines. The ExampleTable combines an arbitrary number of these rows, which are represented by the DataRow interface. There are several implementations of the DataRow interface. They either use different Java number types like double, float or int for data storage, or they save the row in a sparse manner: Values different from zero are stored together with an index, so if one retrieves the value of column x, the array of indices is searched for x, and if it is found, the respective value is returned. The different data types can reduce memory consumption, since a float only consumes four bytes and thus saves four bytes compared to a double. But this is paid for with a loss of precision: Rounding errors might occur, or, if you switch to integer representation, the fractional part is lost.
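The sparse storage idea and the precision trade-off can be sketched in a few lines of plain Java. This is not RapidMiner's DataRow implementation; the class and method names are invented for illustration, and the use of a binary search over sorted indices is an assumption about how the index array might be searched.

```java
import java.util.Arrays;

class SparseRowSketch {
    private final int[] indices;   // sorted column indices of the non-zero entries
    private final double[] values; // values[i] is stored for column indices[i]

    SparseRowSketch(int[] sortedIndices, double[] values) {
        if (sortedIndices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have equal length");
        }
        this.indices = sortedIndices.clone();
        this.values = values.clone();
    }

    /** Returns the stored value, or the default 0.0 if the column has no entry. */
    double get(int column) {
        int pos = Arrays.binarySearch(indices, column);
        return pos >= 0 ? values[pos] : 0.0;
    }

    /** Simulates storing a value in a float-based row: four bytes, less precision. */
    static double storeAsFloat(double value) {
        return (float) value;
    }

    public static void main(String[] args) {
        SparseRowSketch row = new SparseRowSketch(new int[] {2, 7}, new double[] {1.0, 2.5});
        System.out.println(row.get(7));               // 2.5 (stored entry)
        System.out.println(row.get(3));               // 0.0 (default value)
        System.out.println(storeAsFloat(0.1) == 0.1); // false: rounding error
    }
}
```

Only two of the columns occupy memory here, regardless of how wide the logical row is, which is what makes this layout attractive for text mining data.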
Figure 3.2: A simple ExampleSet built on top of an ExampleTable. References are shown by the long dashed lines.
The Attributes are used to access the correct column in the table. As depicted, att3 references column four in the table, while att4 references the third column. There's no specific guarantee on the ordering; the attributes keep track of the columns they refer to. The mechanism to retrieve a value by calling getValue(Attribute) on an example is as follows:

1. The Example will retrieve the corresponding DataRow from the ExampleTable of its parent ExampleSet.
2. The Example will ask the DataRow to deliver the value of the Attribute by calling get(Attribute).
3. The DataRow will ask the Attribute to retrieve the value from the correct column of the row by invoking getValue(DataRow).

The same path is used when writing values into an Example. Although this mechanism seems to be more complex than it needs to be, we will see that it allows a flexible view concept that wouldn't be possible otherwise.

Anyway, we are now familiar with how to retrieve values, but as mentioned above, we have concentrated on numerical values. How are nominal values stored and accessed? The underlying ExampleTable only stores numbers, so how should this be possible? The key to this is the Attribute object. It does not only store a name, which is printed bold in the picture above, and a type like numerical, nominal or date, but it may also contain a NominalMapping. This object is a map translating the numerical values into Strings and vice versa. So if you want to set an Example's value of a nominal attribute, you might call:
    // write a nominal value
    example.setValue(attribute, "new value");

    // read the nominal value back as a String
    String value = example.getNominalValue(attribute);
If the value is unknown, a new entry in the mapping will be created. The index of this mapping entry will be stored as a numerical value in the ExampleTable. So be careful when directly manipulating the ExampleTable or when accessing the indices behind the nominal values! Changes might result in undesired behaviour. The methods for manipulating the numerical values look quite different, and we have already used them in the script example. Anyway, we will describe them again in more detail:

    // write a numerical value
    double value = 9d;
    example.setValue(attribute, value);

    // read the numerical value back
    double storedValue = example.getValue(attribute);
One special value is the missing value. There are several possibilities why a specic value might be missing and we have to cope with that. In RapidMiner several operators handle missing values, but what do we do during programming? Missing values are simply encoded as Double.NaN. So you will receive a NaN when getting the value and have to pass a NaN when you want to set a value unknown. On nominal attributes you simply could pass null as String for the nominal value.
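How a mapping between Strings and stored numbers might behave, including the handling of missing values, can be sketched as follows. This is a simplified stand-in and not RapidMiner's NominalMapping; the class and its two methods are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NominalMappingSketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> valueOf = new ArrayList<>();

    /** Returns the number stored in the table for a nominal value. */
    double mapString(String nominalValue) {
        if (nominalValue == null) {
            return Double.NaN; // missing value
        }
        Integer index = indexOf.get(nominalValue);
        if (index == null) {
            // unknown value: create a new mapping entry
            index = valueOf.size();
            indexOf.put(nominalValue, index);
            valueOf.add(nominalValue);
        }
        return index.doubleValue();
    }

    /** Translates a stored number back into the nominal value. */
    String mapIndex(double stored) {
        // NaN never equals anything, so it must be tested with Double.isNaN
        if (Double.isNaN(stored)) {
            return null;
        }
        return valueOf.get((int) stored);
    }

    public static void main(String[] args) {
        NominalMappingSketch mapping = new NominalMappingSketch();
        System.out.println(mapping.mapString("yes"));      // 0.0
        System.out.println(mapping.mapString("no"));       // 1.0
        System.out.println(mapping.mapString("yes"));      // 0.0 (existing entry reused)
        System.out.println(mapping.mapIndex(1.0));         // no
        System.out.println(mapping.mapString(null));       // NaN
    }
}
```

The sketch also shows why writing raw indices into the table is risky: the number 1.0 only means "no" as long as the mapping stays untouched.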
Besides being used for accessing the data, the Attribute object holds additional information about the column. We have already seen that an Attribute is of a certain type, which is depicted by the small n in the graphic: n for numerical attributes, nom for nominal ones. There are a few other types like date and time, and the subtypes of nominal: text, polynominal and binominal.

How the attribute is used during analysis is controlled by its role. There are several predefined roles like label, prediction, cluster, weight, batch and several more. You are free to set user-defined roles in RapidMiner using the Set Role operator, but these are not interpreted by RapidMiner operators. All attributes with a role have in common that they are not treated as regular attributes and hence are not used for analysis, unless they are required in their special role, like the label for learning from examples. The Attributes object of an ExampleSet manages the special roles. It offers several methods for manipulating these roles. Please keep in mind that iterating over the single Attributes of an Attributes object only iterates over the regular attributes! If you want all attributes, the allAttributes() method must be used.
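The difference between iterating over regular attributes and over all attributes can be illustrated with a small stand-alone model. The class below is hypothetical and only mimics the behaviour described above; it does not use or reproduce the RapidMiner Attributes class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AttributeRolesSketch {
    // attribute name -> role; a null role marks a regular attribute
    private final Map<String, String> roles = new LinkedHashMap<>();

    void addRegular(String name) {
        roles.put(name, null);
    }

    void addSpecial(String name, String role) {
        roles.put(name, role);
    }

    /** Only the attributes without a role, as a default iteration would see them. */
    List<String> regularAttributes() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : roles.entrySet()) {
            if (entry.getValue() == null) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Regular and special attributes together. */
    List<String> allAttributes() {
        return new ArrayList<>(roles.keySet());
    }

    public static void main(String[] args) {
        AttributeRolesSketch attributes = new AttributeRolesSketch();
        attributes.addRegular("att1");
        attributes.addRegular("att2");
        attributes.addSpecial("class", "label");
        System.out.println(attributes.regularAttributes()); // [att1, att2]
        System.out.println(attributes.allAttributes());     // [att1, att2, class]
    }
}
```

Code that needs to touch every column, for example to rescale data including the label, therefore has to ask for all attributes explicitly.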
[Figure: two ExampleSets built on top of the same ExampleTable]
The setting above is frequently used, for example, in an attribute selection process. We don't want to remove the column from memory each time we deselect an attribute to test the performance of the remaining set. Most of the time we have to re-add it later, and it would not be efficient to reload the complete ExampleSet. Instead, we can simply use a copy of the original ExampleSet or add the Attribute again. One potential danger, which one always has to keep in mind, is marked by the red cells. They are now shared by two ExampleSets. If we change the value in one of the ExampleSets, it will be changed in the other one, too, because the underlying data is changed. This can be very confusing, especially if the attributes have different names (here att1 and kunde). Please take care of this by either building a materialized copy in your RapidMiner process or using on-the-fly calculations for the changed values.
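The shared-cell pitfall is easy to reproduce with two views over one array. The following sketch is plain Java with invented names (View, materialize); it models the situation described above rather than the actual ExampleSet classes.

```java
class SharedTableSketch {

    /** A view that maps its own attribute positions onto columns of a shared table. */
    static class View {
        private final double[][] table; // shared with other views, not copied
        private final int[] columns;    // columns[attribute] = table column index

        View(double[][] table, int... columns) {
            this.table = table;
            this.columns = columns;
        }

        double get(int row, int attribute) {
            return table[row][columns[attribute]];
        }

        void set(int row, int attribute, double value) {
            table[row][columns[attribute]] = value;
        }
    }

    /** Builds an independent ("materialized") copy of the table. */
    static double[][] materialize(double[][] table) {
        double[][] copy = new double[table.length][];
        for (int i = 0; i < table.length; i++) {
            copy[i] = table[i].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        double[][] table = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
        View first = new View(table, 0, 1, 2);          // sees all three columns
        View second = new View(table, 0);               // sees column 0 under another name
        View safe = new View(materialize(table), 0);    // works on its own copy

        first.set(0, 0, 99.0);
        System.out.println(second.get(0, 0)); // 99.0: the change is visible here, too
        System.out.println(safe.get(0, 0));   // 1.0: the materialized copy is unaffected
    }
}
```

The copy costs memory proportional to the table size, which is why the on-the-fly calculation is often the better option.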
for an example, where each value is transformed in the same way, but you must use the same data elsewhere in the process. In this case you can perform the calculation each time a value is requested. This might even save computation time and memory if the values are requested only once, as is frequently the case when applying a model, or even during training for some models. The class that does this is the ViewAttribute. It wraps around another Attribute, which can even be another ViewAttribute, to retrieve the value, and then delegates the actual computation to a ViewModel. The computed value is then returned as the result. One Attribute can be shared by several ViewAttributes. The image below depicts this.
Figure 3.4: Two binominal ViewAttributes indicate if the numerical att3 was either 1 or 2
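The wrapping idea can be sketched with a small functional interface. Everything here is an assumption made for illustration: ValueSource stands in for an attribute that reads from a row, and a DoubleUnaryOperator plays the part of the ViewModel; neither is a RapidMiner type.

```java
import java.util.function.DoubleUnaryOperator;

class ViewAttributeSketch {

    /** Anything that can produce a value for a given data row. */
    interface ValueSource {
        double getValue(double[] row);
    }

    /** A plain attribute: reads directly from one column. */
    static ValueSource column(int index) {
        return row -> row[index];
    }

    /** A view: asks its parent for the value and transforms it on the fly. */
    static ValueSource view(ValueSource parent, DoubleUnaryOperator model) {
        return row -> model.applyAsDouble(parent.getValue(row));
    }

    public static void main(String[] args) {
        double[] row = {0.5, 7.0, 2.0};
        ValueSource att3 = column(2);
        // two views sharing the same underlying attribute, as in Figure 3.4
        ValueSource isOne = view(att3, v -> v == 1.0 ? 1.0 : 0.0);
        ValueSource isTwo = view(att3, v -> v == 2.0 ? 1.0 : 0.0);

        System.out.println(isOne.getValue(row)); // 0.0
        System.out.println(isTwo.getValue(row)); // 1.0
        System.out.println(row[2]);              // 2.0: the stored data is unchanged
    }
}
```

Because a view is itself a ValueSource, views can be nested, which mirrors the statement that a ViewAttribute may wrap another ViewAttribute.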
the attribute handling to its parent. The principle will be shown in the image below, where the attributes are shown in dotted lines to indicate that they are only logically present.
Figure 3.5: The stacking of two ExampleSets to realize a sampling. The attributes are taken from the parent.
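A stacked sample of this kind can be modelled as a thin layer that stores only the selected row numbers and forwards everything else to its parent. The interface and class names below are hypothetical and serve only to illustrate the delegation; they are not taken from RapidMiner.

```java
class SampledViewSketch {

    /** Minimal read-only row access shared by all layers. */
    interface Rows {
        int size();
        int columnCount();
        double get(int row, int column);
    }

    /** Bottom layer: holds the actual data. */
    static class Table implements Rows {
        private final double[][] data;

        Table(double[][] data) {
            this.data = data;
        }

        public int size() {
            return data.length;
        }

        public int columnCount() {
            return data.length == 0 ? 0 : data[0].length;
        }

        public double get(int row, int column) {
            return data[row][column];
        }
    }

    /** Upper layer: stores only row indices and delegates the rest to its parent. */
    static class Sample implements Rows {
        private final Rows parent;
        private final int[] selectedRows;

        Sample(Rows parent, int... selectedRows) {
            this.parent = parent;
            this.selectedRows = selectedRows;
        }

        public int size() {
            return selectedRows.length;
        }

        public int columnCount() {
            return parent.columnCount(); // column handling is left to the parent
        }

        public double get(int row, int column) {
            return parent.get(selectedRows[row], column);
        }
    }

    public static void main(String[] args) {
        Rows table = new Table(new double[][] {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}});
        Rows sample = new Sample(table, 2, 0); // rows 2 and 0, in that order
        System.out.println(sample.size());     // 2
        System.out.println(sample.get(0, 1));  // 6.0
        System.out.println(sample.get(1, 0));  // 1.0
    }
}
```

Since a Sample is itself a Rows implementation, several such layers can be stacked without copying any data.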
When you are going to build your own Extension, you will need Java version 1.6 or above as well as an IDE like Eclipse. The example projects that come with this tutorial are Eclipse projects, so we strongly recommend using Eclipse, which is freely available at Eclipse.org. On our website you will find a tutorial on how to check out the latest version of RapidMiner from the svn repository. Please test whether it starts by creating a debug configuration and starting the RapidMinerGUI class.
If started from Eclipse, RapidMiner will only allocate as much RAM as is the default for any Java program: 64 MB. Since this is really insufficient for most real data mining applications, you will have to increase this. Select Run / Debug Configurations... and select the one for RapidMiner. Go to the Arguments tab and enter -Xmx256m. You might enter any number after -Xmx, but ensure that this many megabytes of RAM are actually available. Especially on 32 bit systems the maximum is relatively low, around 1.5 GB.

After you have done this, we will add two additional projects: One is the tutorial extension, which already contains everything described in the next chapters. Whenever you are not sure, there is example code. The other one is an Extension template, where you only change a few file names and entries to adapt it for your own Extension. You might use it while reading to experiment with your own implementations of what is described here. Together with this tutorial you got two zip files. Each of them contains one of the projects, which we will now import into Eclipse.
4. Select Import... from the File menu.
5. When the selection menu for the project type opens, select Existing Projects into Workspace from the General folder and click Next.
6. The Import Projects page appears. Select the radio button in front of Select archive file: and select one of the two zip files with the Browse button.
7. The project will be listed in the Projects window. Select it by checking it and click Finish.
8. The project will show up in the Package Explorer. Repeat the steps for the second zip file.

After this, you should have three projects, and the Package Explorer should look like the picture below.
Figure 4.1: Our three projects

Now you can start implementing. If you are going to deploy your Extension to RapidMiner for testing purposes, you can execute the install target of the ant file build.xml. Please make sure that the RapidMiner Vega project is named exactly as above, because the ant file references the RapidMiner project by this name. Otherwise the deployment won't work without changing the file. We will go into detail later on how to adapt the build file.
5 Building Operators
There are two types of operators in RapidMiner: normal operators and those that contain one or more sub processes. We call the second type super operators, to differentiate them from the normal operators. To get some training, we will start by implementing a normal operator. Once finished, we will show how to transfer these techniques to super operators and which special concerns might arise there.
    package com.rapidminer.operator.preprocessing.transformation;

    import com.rapidminer.operator.Operator;
    import com.rapidminer.operator.OperatorDescription;

    /**
     * This is the Numerical2Date tutorial operator.
     *
     * @author Sebastian Land
     */
    public class Numerical2DateOperator extends Operator {

        /**
         * Constructor
         */
        public Numerical2DateOperator(OperatorDescription description) {
            super(description);
        }
    }
Please note that you have to set unique names for the ports of one operator. If you want to follow the naming convention, we recommend writing the names in lower case and using blanks to separate words. If you added this operator to your process, you would see that the two ports are already attached. Here's how it would look:
Figure 5.1: Your new operator

But in contrast to the usual ports of RapidMiner operators, they are simply white. Normally the ports are colored in the color of the object type that has to be fed into the port. If a port is not connected to a port generating an object of the desired type, half of the port will be drawn in a warning red. We will come to this. For now, we just want to see how we can add some function to the operator. For this we have to override the following function:

    @Override
    public void doWork() throws OperatorException {
        // the operator's work goes here
    }
The default implementation simply does nothing, but we can now add the functionality described in detail in the Scripting chapter above. To do so, we just have to change how the input is retrieved and how the result is delivered. Take a look at the first and the last line:
    @Override
    public void doWork() throws OperatorException {
        ExampleSet exampleSet = exampleSetInput.getData();
        Attributes attributes = exampleSet.getAttributes();
        Attribute sourceAttribute = attributes.get("relative time");
        String newName = "date(" + sourceAttribute.getName() + ")";
        Attribute targetAttribute = AttributeFactory.createAttribute(newName, Ontology.DATE_TIME);
        targetAttribute.setTableIndex(sourceAttribute.getTableIndex());
        attributes.addRegular(targetAttribute);
        attributes.remove(sourceAttribute);

        for (Example example : exampleSet) {
            double timeStampValue = example.getValue(targetAttribute);
            example.setValue(targetAttribute, timeStampValue * 1000);
        }

        exampleSetOutput.deliver(exampleSet);
    }

We see that one call suffices to retrieve the ExampleSet from the input port, and the single call to deliver in the last line of the method body hands the result to the output port. We could execute this operator and would receive the same output as with the scripting operator above.

If you have already written operators for previous RapidMiner versions, you will remember the two methods getInputClasses and getOutputClasses, which defined the input and output classes back then. The simplest way is to delete these needless methods and create one port per input object. If your operator doesn't use a fixed number of objects, you could insert a PortExtender, but we will come back to this when describing super operators. Besides this, you will have to exchange the main working method. Instead of the deprecated apply method you now have to implement the doWork method. Since it doesn't receive anything as input and is of type void, you are forced to use the ports for retrieving input and delivering output.
of the Extension's jar. We don't have to bother now with how this works; we will take care of it later on. So let's take a look at how to declare operators to RapidMiner:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <operators name="template" version="5.0" docbundle="com/rapidminer/resources/i18n/OperatorsDocTemplate">
        <group key="">
            <group key="data_transformation">
                ...
                <operator>
                    <key>numerical_to_date</key>
                    <class>com.rapidminer.operator.preprocessing.transformation.Numerical2DateOperator</class>
                    <replaces>Numerical2Date</replaces>
                </operator>
                ...
            </group>
        </group>
    </operators>
While the first line only contains information about the xml format used, the second line contains several important properties. The name attribute must be the namespace as specified in the manifest; version must currently be fixed at 5.0. The most important attribute, docbundle, must link to another xml file, which contains the documentation for the operators. There the behavior of each operator should be described in detail to guide other users when utilizing an extension.

The child tags of operators reflect the group structure in RapidMiner's New Operators tree. The group with the empty key corresponds to the invisible root of the operator tree. Custom operators and groups can only be inserted as children of this root. Each group and operator has a key that should consist only of lower case letters, digits and underscores. In RapidMiner these keys are translated to a language-dependent name using one of the documentation bundles.

As you can see from the above example, operators are simply inserted as child tags of groups. They must contain two child tags: Besides the key tag, there must be a class tag containing the qualified class name of the implementing class.
5. Building Operators
Optionally there might be a replaces tag. It specifies how this operator was called in 4.x versions of RapidMiner. If it is set, each operator with that name will automatically be replaced by this new operator during import of a 4.x process. That might be important when renaming operators to obey the new naming schema.

Once we have saved a file like this, which adds an operator to RapidMiner, we only need to execute the ant target install to deploy the Extension to RapidMiner. While the ant target is executed, its status messages are logged to the Console view. They should look like this:
createJar:
     [echo] Creating jar...
     [echo] Manifest Classpath:
    [mkdir] Created dir: C:\RapidMiner_Vega\release\libfiles
      [jar] Building jar: C:\RapidMiner_Vega\release\rapidminer-TemplateExtension-5.0.jar
   [delete] Deleting directory C:\RapidMiner_Vega\release\libfiles
install:
     [move] Moving 1 file to C:\RapidMiner_Vega\lib\plugins
BUILD SUCCESSFUL
Total time: 5 seconds
Now there should be a rapidminer-TemplateExtension-5.0.jar file in the lib/plugins directory of the RapidMiner project. RapidMiner will load all Extensions on the next start up. Again, to make this work, RapidMiner needs to be stored in the same workspace and under the same name as shown above. Otherwise the path entries in the build.xml of the Extension project must be adapted!
to ease the use of the operator. This can be done by adding preconditions to the ports. These preconditions are registered during construction time of the operator and will report errors if they are not fulfilled. So we will have to add a few code fragments to the constructor. For example, this precondition will check if a compatible IOObject is delivered:
Since this is one of the most common cases, there exists a shortcut to achieve this. We can specify the target IOObject class already when constructing the input port:
There are many more special preconditions, which for example test if an example set satisfies some conditions, if it contains a special attribute of a specific role, or if an attribute with a given name is present. In this case, we could add a precondition that tests if the attribute relative time is part of the input example set.
The ExampleSetPrecondition is more powerful than required here. In fact, it can check not only if fixed names are part of the example set, but also if the regular attributes are of a certain type, which special attributes have to be contained and of which type they must be. We don't need this here, so we chose a constructor that ignores most options and pass the most general value type in order not to impose any condition.

If we insert the operator into a process without connecting an example set output port to our input port, an error is shown. If we attach an example set without the relative time attribute, the following warning is shown:
Figure 5.2: A warning is shown if the precondition is not fulfilled.

Compared to the getInputClasses / getOutputClasses approach of 4.x, much more detailed conditions can now be formulated. You might even write your own precondition to check any information that is part of the meta data. You could even create your own errors with special error messages and Quick Fixes.
Figure 5.3: Half the way done

The problem is that our operator still doesn't do any transformation of the meta data. It already makes use of the meta data to check the preconditions, but doesn't deliver any meta data to the output port. We can change this by adding generation rules in the constructor:
    getTransformer().addPassThroughRule(exampleSetInput, exampleSetOutput);
}
This rule will simply pass the received meta data to the output port. This will cause the warning to vanish, but then the meta data doesn't reflect the data actually delivered: as you remember, we change not only the name of one attribute, but also its value type. This should be reflected in the meta data, and that's why we have to implement a much more specialized transformation rule. We can do this using an anonymous class, so it will look like this:
getTransformer().addRule(new ExampleSetPassThroughRule(exampleSetInput, exampleSetOutput, SetRelation.EQUAL) {
    @Override
    public ExampleSetMetaData modifyExampleSet(ExampleSetMetaData metaData) throws UndefinedParameterError {
        return metaData;
    }
});
Of course this won't do anything except pass the received meta data to the output port, as long as we don't change the meta data. But we now have a hook where we can grab the meta data and change it, so that it reflects the changes made to the data while executing this operator. After adding some meaningful code, the method will look like this:
@Override
public ExampleSetMetaData modifyExampleSet(ExampleSetMetaData metaData) throws UndefinedParameterError {
    AttributeMetaData timeAMD = metaData.getAttributeByName("relative time");
    if (timeAMD != null) {
        timeAMD.setType(Ontology.DATE_TIME);
        timeAMD.setName("date(" + timeAMD.getName() + ")");
        timeAMD.setValueSetRelation(SetRelation.UNKNOWN);
    }
    return metaData;
}
If we insert the operator into a process, we will see that the meta data is now correctly transformed and every alert vanishes. We are now even able to select the attribute for the Adjust Date operator in the drop down list.
Figure 5.4: The result of our work: The meta data correctly describes the resulting data.
mechanism of RapidMiner 5. As we have mentioned above, there's a link to an operator documentation bundle in the operator descriptor file. This file is called OperatorsDocTemplate.xml in the template project we created above. It not only offers the possibility to enter a full length description of the operator, but also assigns a more readable and explanatory name than the key, as well as a synopsis of the help. The structure this file must have is quite simple:
<?xml version="1.0" encoding="windows-1252" standalone="no"?>
<operatorHelp>
    <group>
        <key>data_transformation</key>
        <name>Data Transformation</name>
    </group>
    <operator>
        <name>ExperimentEmbedder</name>
        <synopsis> ... </synopsis>
        <help> ... </help>
    </operator>
</operatorHelp>
The second line contains the XML root node operatorHelp. A sequence consisting of two kinds of tags may be added as children of this element: the group tag and the operator tag. The group tag translates the key of a group into a language specific name. The operator tag offers three child tags. The name tag does the translation of the key, while synopsis and help may contain arbitrary escaped HTML text for documenting the operator's behaviour, as one would enter into the body tag of an HTML page. To escape the text, each < and > must be replaced by the corresponding XML entities &lt; and &gt;. Please keep in mind that the rendering capacity of the help window is limited. One should stick to rather simple HTML.
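Escaping the help text by hand is error prone, so a small utility can do it. The sketch below is plain Java and independent of RapidMiner; the class name HelpTextEscaper is a hypothetical helper. Besides the two angle brackets named above, it also escapes the ampersand, because standard XML requires a literal & in text content to be written as &amp;amp; as well.

```java
/** Hypothetical helper (not part of RapidMiner): escapes HTML for use inside the help XML. */
final class HelpTextEscaper {

    private HelpTextEscaper() {
    }

    /** Replaces the XML special characters &, < and > by their entities. */
    static String escape(String html) {
        StringBuilder out = new StringBuilder(html.length() + 16);
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escaped result can be pasted between the synopsis or help tags.
        System.out.println(escape("<p>Converts a <b>numerical</b> attribute to a date.</p>"));
    }
}
```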
The user may specify the learner and how performance is measured, and it then executes these subprocesses as needed. This section will describe how you can implement your own super operators.

Let's assume we have a process that should be executed once every minute, checking something inside a database. If you had the RapidMiner Enterprise Analytics Server, this would be only two clicks away. But the order is stuck somewhere inside another department and you need a solution really fast. So let's build a super operator that re-executes its inner operators every minute.

In order to do this, we again have to create a new class, but this time it has to extend the OperatorChain class. The name of the super class is somewhat misleading, because there is no chain anymore, but we stick to this name for historical reasons. As with a simple operator, we have to implement a constructor. The empty class looks like this:
/**
 * This super operator will execute its inner process infinitely
 * once every minute.
 * @author Sebastian Land
 */
public class LoopInfinitely extends OperatorChain {
    /**
     * Constructor
     */
    public LoopInfinitely(OperatorDescription description) {
        super(description, "Executed Process");
    }
}
In contrast to the simple operator, we must give the super constructor the names of the subprocesses we are going to create inside our super operator. The number of names we pass to the super constructor determines the number of created subprocesses. If you want to follow the naming convention, you should start each word in upper case and use blanks to separate words. Later we can access these subprocesses by index to execute them. But let's first define some ports to pass data to the super operator.
Besides the PortPairExtender there's also a PortExtender available, but we want an equal number of input and output ports. The PortPairExtender takes care of this, so we don't have to do anything else.

Let's take a closer look at the constructor. In addition to the name, we have to specify to which input ports the extender should attach. The getInputPorts method delivers the input ports of the current operator, so the port extender is attached on the left side of the operator box. The paired ports are added to the inner sources of the first subprocess. You see that you can access the subprocesses via the getSubprocess method.

If you are familiar with RapidMiner's integrated super operators like the Loop operator, you know that there are always input ports on the left and output ports on the right of the subprocess. To distinguish these ports from the input and output ports of the super operator, we call them inner sources and inner sinks. In fact, an inner source is technically an output port of the super operator, because the super operator has to deliver data to this port, while the inner sink is an input port of the super operator from which it can retrieve the output of the subprocesses.

If we wanted to deliver outputs from our loop, we could add the following second variant of the PortPairExtender to collect the outputs from all iterations and pass them as a collection to the output of our super operator:
Figure 5.5: Our port extenders which return a collection on the right

But since we want to run infinitely, we will never return anything. So we omit this change and get back to the first PortPairExtender. In order to make a PortExtender work, we have to initialize it during construction time of the operator. You simply have to add the following line in the constructor:
inputPortPairExtender.start();

getTransformer().addRule(inputPortPairExtender.makePassThroughRule());
If we take a look inside our operator, we see a strange behaviour. Although there is meta data information present at the sources, the inner operators don't seem to recognize it. They don't do anything with the information. The reason is that we have to add a rule defining when the subprocess meta data has to be transformed. The ordering of the rule definitions is crucial, because if the meta data isn't forwarded to the inner ports first, there's nothing the meta data transformation of the inner operators can do. This line will add the rule:
Figure 5.6: The meta data transformation of the inner operators seems to be dead.
After all, with the rules in correct order, our operator looks like this:
public class LoopInfinitely extends OperatorChain {
    private final PortPairExtender inputPortPairExtender = new PortPairExtender("input", getInputPorts(), getSubprocess(0).getInnerSources());

    /**
     * Constructor
     */
    public LoopInfinitely(OperatorDescription description) {
        super(description, "Executed Process");
        inputPortPairExtender.start();
        getTransformer().addRule(inputPortPairExtender.makePassThroughRule());
        getTransformer().addRule(new SubprocessTransformRule(getSubprocess(0)));
    }
}
@Override
public void doWork() throws OperatorException {
    inputPortPairExtender.passDataThrough();
    while (true) {
        inApplyLoop();
        getSubprocess(0).execute();
    }
}
You see that we have full control over which subprocess is executed when. In contrast to the old RapidMiner versions, where the subprocess was rather implicitly defined by the position of the child operators inside the chain, they are now clearly separated. This not only eases process design and makes a process easier to understand, but makes writing super operators easier, too. Moreover, the old and complex method for defining which operator has to deliver which class is now replaced by the same mechanism as for all operators. All you have to do is to reformulate the old getInnerOperatorCondition method as a new input port precondition.
interval might change or be different in other settings, we want to avoid hard coding it. It's now time to define our first parameter. Parameters are presented to the users in the parameter tab of RapidMiner, where they can alter the parameters' values. There are several types of parameters available for defining real or integer numbers, strings, and collections of strings in comboboxes, either editable or not. Special types for selecting an attribute or several attributes are available, too. The most complex parameter type may even define its own GUI component as a configuration wizard.

Parameters may be either normal or expert parameters. The latter aren't shown unless the user has switched to expert mode. So it's good practice to define as expert those parameters whose effect is only understandable by those who have deeper knowledge of the underlying algorithm. All of these parameters must have default values; otherwise the user is bothered with defining a parameter he cannot understand. That would be even worse than showing it with a reasonable default value.

Further guidance can be offered to the user by defining parameter dependencies. Some parameters are only used if other parameters are set to specific values. A simple and well known example is the use of a local random seed. Many of RapidMiner's operators offer the possibility to take random numbers from a local random generator instead of using the global random number sequence. This is useful for ensuring reproducible results in sub parts of your process. If you want to use such a local random generator, it must be initialized with a so called seed. So if you check the parameter use local random seed of the X-Validation operator, a field is shown to insert such a seed. Technically the field is shown because all its dependencies are satisfied. This time there is only one, namely that the use local random seed parameter has to be checked, but in general there might be arbitrary conditions.

Using these dependencies shows the user in each situation which parameters will have an effect, and he isn't bothered with irrelevant parameters. If you are familiar with the great number of parameters that kernel based methods like the SVM offer, you will probably understand immediately why this is important. Let's do something practical and add parameters to our operator. In fact, we
We see that we must return a list of ParameterTypes. If we are extending another operator or some abstract class providing basic functionality, we have to call the super method in order to retrieve the parameters defined there. Otherwise the functionality provided by the super class might fail, because we haven't defined the needed parameters. For now, we want to add a parameter defining the number of seconds between the starts of subprocess execution. Using an integer for that, it would look like this:
@Override
public List<ParameterType> getParameterTypes() {
    List<ParameterType> types = super.getParameterTypes();
    types.add(new ParameterTypeInt(PARAMETER_FREQUENCY, "This parameter defines the number of seconds between the start of two subsequent subprocess executions.", 1, Integer.MAX_VALUE, 5, false));
    return types;
}
First of all we retrieve the list of ParameterTypes of the super class and then add our own parameter. This is of type integer and is named by the public constant PARAMETER_FREQUENCY. The following string describes the functionality of this parameter type and is shown in the tool tip of this parameter. The three integer values define the minimal, the maximal and the default value. The last argument determines if the parameter is expert or not. In this case we decided that this parameter is quite understandable.

Before we can take a look at the result, we have to add the constant to the class. This is important to give API users access to the parameters if they want to utilize this operator internally. Otherwise they would have to retype the string, and if the parameter name is then changed for any reason, be it a typo or something similar, each utilizing class would have to be adapted, too. To avoid this, simply define a public constant:
int secondsBetweenStarts = getParameterAsInt(PARAMETER_FREQUENCY);
Now we are going to use the wait functionality of Java's threads to ensure that we pause. Since this isn't RapidMiner specific, it will not be explained in detail, but the code finally looks like this:
@Override
public void doWork() throws OperatorException {
    int secondsBetweenStarts = getParameterAsInt(PARAMETER_FREQUENCY);
    inputPortPairExtender.passDataThrough();
    while (true) {
        checkForStop();
        long start = System.currentTimeMillis();
        getSubprocess(0).execute();
        long end = System.currentTimeMillis();
        // 1000L forces long arithmetic, so large parameter values cannot overflow
        long wait = (secondsBetweenStarts * 1000L) - (end - start);
        if (wait > 0) { // if we have to wait anyway
            try {
                Thread.sleep(wait);
            } catch (InterruptedException e) {
                // Don't do anything: Only executing too early
            }
        }
    }
}
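The waiting time computed inside the loop is plain arithmetic and can be looked at in isolation. The following sketch is independent of RapidMiner; the class LoopTiming and its method are hypothetical names used only for illustration. It assumes the same rule as the loop above: the interval counts from one start to the next, so the time the subprocess already consumed is subtracted, and a negative remainder means there is no pause at all.

```java
/** Hypothetical helper (not part of RapidMiner): computes the pause between two executions. */
final class LoopTiming {

    private LoopTiming() {
    }

    /**
     * Returns how many milliseconds to sleep so that two subsequent starts are
     * secondsBetweenStarts apart, or 0 if the execution already took longer.
     */
    static long remainingWaitMillis(int secondsBetweenStarts, long startMillis, long endMillis) {
        // 1000L forces long arithmetic, so even Integer.MAX_VALUE seconds cannot overflow.
        long interval = secondsBetweenStarts * 1000L;
        long elapsed = endMillis - startMillis;
        return Math.max(0L, interval - elapsed);
    }

    public static void main(String[] args) {
        // Interval of 5 seconds, subprocess ran for 2 seconds
        System.out.println(remainingWaitMillis(5, 1000L, 3000L));
    }
}
```

With an interval of 5 seconds and an execution that took 2 seconds, 3000 milliseconds remain; if the execution took 7 seconds, the next run starts immediately.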
public static final String PARAMETER_RESTRICT_FREQUENCY = "restrict_frequency";
...
@Override
public List<ParameterType> getParameterTypes() {
    List<ParameterType> types = super.getParameterTypes();
    types.add(new ParameterTypeBoolean(PARAMETER_RESTRICT_FREQUENCY, "If checked, the frequency of subprocess execution might be restricted.", false, false));
    ParameterType type = new ParameterTypeInt(PARAMETER_FREQUENCY, "This parameter defines the number of seconds between the start of two subsequent subprocess executions.", 1, Integer.MAX_VALUE, 5, false);
    type.registerDependencyCondition(new BooleanParameterCondition(this, PARAMETER_RESTRICT_FREQUENCY, true, true));
    types.add(type);
    return types;
}
To register the condition, we had to keep the type in a local variable, which must be added to the list separately. But then it's fairly easy to add a condition. Here we add a BooleanParameterCondition, which needs a reference to a ParameterHandler. For operators, this is the operator itself. The second argument is the name of the referenced parameter. The two Boolean values indicate whether the parameter becomes mandatory if the condition is satisfied, and which value the referenced parameter must have in order to fulfill the condition. The resulting parameter tab now looks like this, depending on the parameter settings:
Figure 5.8: The parameter tab without restrict frequency checked

Now you already have all the basic knowledge you need to write your first own operator for RapidMiner. For further detailed information about the classes available in RapidMiner you can refer to the API documentation, which is available as a download on our website at rapid-i.com. The next chapter will show how you can extend the functionality of RapidMiner not only by adding operators, but also by adding new data objects to pass between the operators.
Figure 5.9: The parameter tab with restrict frequency checked: The conditioned parameter is shown
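The visibility rule behind the two parameter tabs can be illustrated without any RapidMiner classes. The sketch below is a simplified, hypothetical model: DependencySketch and its nested BooleanCondition are stand-ins written for this example and do not reproduce the real BooleanParameterCondition. It only assumes what the text states, namely that the dependent parameter is shown when the referenced boolean parameter has the required value, and that it may become mandatory in that case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model (not RapidMiner's classes) of a boolean parameter dependency. */
final class DependencySketch {

    /** Condition on one referenced boolean parameter. */
    static final class BooleanCondition {
        private final String referencedParameter;
        private final boolean becomesMandatory;
        private final boolean requiredValue;

        BooleanCondition(String referencedParameter, boolean becomesMandatory, boolean requiredValue) {
            this.referencedParameter = referencedParameter;
            this.becomesMandatory = becomesMandatory;
            this.requiredValue = requiredValue;
        }

        /** The dependent parameter is shown only if the referenced one has the required value. */
        boolean isSatisfied(Map<String, Boolean> settings) {
            Boolean current = settings.get(referencedParameter);
            return current != null && current.booleanValue() == requiredValue;
        }

        /** The dependent parameter is mandatory only while the condition holds. */
        boolean isMandatory(Map<String, Boolean> settings) {
            return becomesMandatory && isSatisfied(settings);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> settings = new HashMap<String, Boolean>();
        BooleanCondition condition = new BooleanCondition("restrict_frequency", true, true);

        settings.put("restrict_frequency", Boolean.FALSE);
        System.out.println("frequency shown: " + condition.isSatisfied(settings));

        settings.put("restrict_frequency", Boolean.TRUE);
        System.out.println("frequency shown: " + condition.isSatisfied(settings));
    }
}
```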
If you are from the scientific community or trying to integrate RapidMiner with another program, you will sooner or later face the problem that the standard data objects don't fulfill all your requirements. Let's assume, for example, that you are going to analyze data recorded from some sort of game engine. You are planning to use machine learning algorithms to make the characters played by the computer a little bit smarter. The format in which the original data comes can't directly be expressed as a table. So you have to write some preprocessing steps anyway, and you decide to do this in RapidMiner.

The plan is to make everything as modular as possible. Although you could simply write one operator that reads in the data from a file and does all the translation and feature extraction, you decide that it would be best to split it up. With this modularity, it will be much easier to extend the mechanism later on and to optimize the steps separately.

This can be achieved as follows. Users who know the time series or the text processing extension are already familiar with this approach. We have one super operator which loads the data and passes it to an inner sub process. Inside this sub process, a special data object representing the current data is passed from one operator to the next, each one changing the data or adding new information. This added data is finally written into a table which is returned as an ExampleSet to the subsequent RapidMiner operators, which now do the actual learning. We have already learned how to build operators, both normal and super operators, and how to pass data between them. Now we are going to define a new data object.
package com.rapidminer.game;

import com.rapidminer.operator.ResultObjectAdapter;

/**
 * This class contains the game data, recorded during
 * runtime of the game.
 *
 * @author Sebastian Land
 */
public class GameDataIOObject extends ResultObjectAdapter {
    private static final long serialVersionUID = 1725159059797569345L;
}
This is only an empty object that doesn't hold any information. We will add some content now:
package com.rapidminer.game;

import com.rapidminer.operator.ResultObjectAdapter;

/**
 * This class contains the game data, recorded during
 * runtime of the game.
 *
 * @author Sebastian Land
 */
public class GameDataIOObject extends ResultObjectAdapter {
    private static final long serialVersionUID = 1725159059797569345L;

    private GameData data;

    public GameDataIOObject(GameData data) {
        this.data = data;
    }

    public GameData getGameData() {
        return data;
    }
}
This class already gives access to an object of the class GameData, which shall be the representative for everything we want to access. This might be more complex in real-world applications, but you can see how things work in general.

Now we want to extract attribute values from the game data, which the super operator can store in a table. This data table might then be returned as an example set for learning. The extraction should be done by operators contained in the super operator's sub process. Each of them could retrieve the GameData from the GameDataIOObject and attach one or more attributes. Only one GameData is treated per execution of the sub process, and each becomes a single example of the resulting ExampleSet.

So we need a mechanism to add data to the IOObject. To make things less complicated, we assume that we only have numerical attributes. This way we save the effort of remembering the correct types of the data. Let's add a Map for storing the values by identifier as a local variable:
Then we extend the GameDataIOObject with two methods for accessing the map:
 * as an attribute in the resulting ExampleSet.
 */
public void setValue(String identifier, double value) {
    valueMap.put(identifier, value);
}

/**
 * For extracting all identifiers / values
 */
public Map<String, Double> getValueMap() {
    return valueMap;
}
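How the map and its two accessors fit together can be tried out in isolation. The sketch below is a simplified stand-in, not the GameDataIOObject itself: the class ValueStoreSketch is hypothetical, and the choice of a LinkedHashMap is an assumption made here so that identifiers keep the order in which they were stored. The original declaration of the map is not shown above and may well use a different Map implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the value storage inside GameDataIOObject. */
final class ValueStoreSketch {
    // Assumption: LinkedHashMap, so attributes would appear in extraction order.
    private final Map<String, Double> valueMap = new LinkedHashMap<String, Double>();

    /** Stores a value under the given identifier, replacing any earlier value. */
    void setValue(String identifier, double value) {
        valueMap.put(identifier, value);
    }

    /** For extracting all identifiers / values. */
    Map<String, Double> getValueMap() {
        return valueMap;
    }

    public static void main(String[] args) {
        ValueStoreSketch store = new ValueStoreSketch();
        store.setValue("Age", 23.0);
        store.setValue("Score", 7.5);
        for (Map.Entry<String, Double> entry : store.getValueMap().entrySet()) {
            System.out.println(entry.getKey() + ":\t" + entry.getValue());
        }
    }
}
```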
import java.util.LinkedList;
import java.util.List;

import com.rapidminer.example.ExampleSet;
import com.rapidminer.operator.OperatorChain;
import com.rapidminer.operator.OperatorDescription;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.operator.ports.InputPort;
import com.rapidminer.operator.ports.OutputPort;
import com.rapidminer.operator.ports.metadata.SubprocessTransformRule;

/**
 * This operator will feed all GameData objects to its inner sub process and
 * will execute it in order to build an example set from the extracted
 * key value pairs.
 *
 * @author Sebastian Land
 */
public class ProcessGameDataOperator extends OperatorChain {
    private OutputPort innerGameDataSource = getSubprocess(0).getInnerSources().createPort("game data");
    private InputPort innerGameDataSink = getSubprocess(0).getInnerSinks().createPort("game data");
    private OutputPort exampleSetOutput = getOutputPorts().createPort("example set");

    public ProcessGameDataOperator(OperatorDescription description) {
        super(description, "Property Extraction");
        /*
         * very short and insufficient meta data transformation: Should be much
         * more sophisticated.
         */
        getTransformer().addGenerationRule(innerGameDataSource, GameDataIOObject.class);
        getTransformer().addRule(new SubprocessTransformRule(getSubprocess(0)));
        getTransformer().addGenerationRule(exampleSetOutput, ExampleSet.class);
    }

    @Override
    public void doWork() throws OperatorException {
        List<GameData> loadedData = new LinkedList<GameData>();
        loadedData.add(new GameData());
        /*
         * Iterate over all GameData objects and feed them through the subprocess one by one.
         * Extending ExampleSet each time by one example
         */
        ExampleSet resultSet = null;
        for (GameData gameData : loadedData) {
            innerGameDataSource.deliver(new GameDataIOObject(gameData));
            getSubprocess(0).execute();
            GameDataIOObject result = innerGameDataSink.getData();
            if (resultSet == null)
                resultSet = createInitialExampleSet(result);
            else
                extendExampleSet(resultSet, result);
        }
        exampleSetOutput.deliver(resultSet);
    }

    /**
     * This method has to extend the given resultSet by the example extracted from
     * the result object.
     */
    private void extendExampleSet(ExampleSet resultSet, GameDataIOObject result) {
    }

    /**
     * This will create the first initial example set from the result object.
     * At first the MemoryExampleTable will be created for storing the data, then
     * for each entry in the map an attribute is created and put together into an
     * example set.
     */
    private ExampleSet createInitialExampleSet(GameDataIOObject result) {
        return null;
    }
}
Of course this operator still lacks all real functionality, which consists of reading the game data from a source of some kind, probably depending on some parameter settings specifying the location. But the previous sections should have made clear which steps one would have to take for such a task. Now we want to build one of the inner operators:
package com.rapidminer.operator.game.extractors;

import com.rapidminer.operator.Operator;
import com.rapidminer.operator.OperatorDescription;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.operator.game.GameDataIOObject;
import com.rapidminer.operator.ports.InputPort;
import com.rapidminer.operator.ports.OutputPort;

/**
 * A simple extractor of properties of a game data object.
 *
 * @author Sebastian Land
 */
public class ExtractAgeOperator extends Operator {
    /** defining the ports */
    private InputPort gameDataInput = getInputPorts().createPort("game data", GameDataIOObject.class);
    private OutputPort gameDataOutput = getOutputPorts().createPort("game data");

    /**
     * The default constructor needed in exactly this signature
     */
    public ExtractAgeOperator(OperatorDescription description) {
        super(description);
        /** Adding a rule for meta data transformation: GameData will be passed through */
        getTransformer().addPassThroughRule(gameDataInput, gameDataOutput);
    }

    @Override
    public void doWork() throws OperatorException {
        GameDataIOObject input = gameDataInput.getData();
        extractValues(input);
        gameDataOutput.deliver(input);
    }

    /**
     * This method could extract arbitrary properties from the GameData and put it as a key value pair into
     * the GameDataIOObject. Each pair will become a single attribute in the resulting ExampleSet and hence
     * each execution of the subprocess must result in exactly the same number of pairs.
     * Otherwise for some examples there are undefined attributes.
     */
    private void extractValues(GameDataIOObject input) {
        input.setValue("Age", input.getGameData().getAge());
    }
}
This is just a simple example of extracting one attribute, adding it and passing the object on. Of course it is a good idea to let this operator inherit from an AbstractExtractionOperator which already provides all functionality that is shared among all extraction operators. Then only the method extractValues has to be implemented, and one can concentrate on the real problem of extracting the values. The image below shows a sub process with four extraction operators.
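The division of labour suggested here, a shared base class plus one small method per extractor, is the classic template method pattern. The sketch below shows only that pattern in plain Java. All names in it (ExtractionStepSketch, AgeStepSketch) are hypothetical, and a simple Map stands in for the GameDataIOObject; a real AbstractExtractionOperator would extend Operator and handle the ports and the meta data rule as shown in the listing above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical base class: holds everything that all extraction steps share. */
abstract class ExtractionStepSketch {

    /** Shared part: run the concrete extraction, then pass the same object on. */
    final Map<String, Double> process(Map<String, Double> values) {
        extractValues(values);
        return values;
    }

    /** The only method a concrete extraction step has to implement. */
    protected abstract void extractValues(Map<String, Double> values);
}

/** Hypothetical concrete step: contributes a single value named "Age". */
final class AgeStepSketch extends ExtractionStepSketch {
    private final double age;

    AgeStepSketch(double age) {
        this.age = age;
    }

    @Override
    protected void extractValues(Map<String, Double> values) {
        values.put("Age", age);
    }

    public static void main(String[] args) {
        Map<String, Double> values = new LinkedHashMap<String, Double>();
        new AgeStepSketch(31.0).process(values);
        System.out.println(values);
    }
}
```

Because process is final, every step delivers its input object unchanged apart from the added values, which mirrors the pass through behaviour of the operator above.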
Figure 6.2: The sub process containing several extraction operators like the one described above
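The idea of a shared AbstractExtractionOperator can be sketched without any RapidMiner classes. The following is only an illustration of the pattern, assuming hypothetical stand-in types (GameDataStub, GameDataObjectStub, AbstractExtractionSketch); a real base class would extend Operator and handle the ports and the meta data rule as shown in the listing above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the tutorial's game data; only the age is modelled. */
class GameDataStub {
    private final double age;

    GameDataStub(double age) {
        this.age = age;
    }

    double getAge() {
        return age;
    }
}

/** Hypothetical stand-in for GameDataIOObject: the data plus the extracted key/value pairs. */
class GameDataObjectStub {
    private final GameDataStub data;
    private final Map<String, Double> values = new LinkedHashMap<String, Double>();

    GameDataObjectStub(GameDataStub data) {
        this.data = data;
    }

    GameDataStub getGameData() {
        return data;
    }

    void setValue(String key, double value) {
        values.put(key, value);
    }

    Map<String, Double> getValueMap() {
        return values;
    }
}

/** Shared workflow: take the object, let the subclass extract, pass the same object on. */
abstract class AbstractExtractionSketch {
    final GameDataObjectStub process(GameDataObjectStub input) {
        extractValues(input);
        return input; // pass-through, mirrors deliver(input) in the operator
    }

    /** The only method a concrete extraction step has to provide. */
    protected abstract void extractValues(GameDataObjectStub input);
}

/** Concrete step: extracts the age, like the ExtractAgeOperator above. */
class ExtractAgeSketch extends AbstractExtractionSketch {
    @Override
    protected void extractValues(GameDataObjectStub input) {
        input.setValue("Age", input.getGameData().getAge());
    }
}

class ExtractionDemo {
    public static void main(String[] args) {
        GameDataObjectStub object = new GameDataObjectStub(new GameDataStub(12.0));
        new ExtractAgeSketch().process(object);
        System.out.println(object.getValueMap());
    }
}
```

Because process is final, every extraction step follows the same receive, extract, deliver sequence, and a new step only needs its own extractValues.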
Of course it's possible to build more complex constructions. You might think of splitting and merging the GameDataIOObject, or building loops and conditions inside the sub process. The latter might be achieved by creating new super operators. Any way of treating your own IOObjects is possible by combining what we have learned.
Figure 6.3: If nothing else is defined, RapidMiner will return the default String representation as result.
toString method is too chatty: The IDE will hang for seconds until the huge string is built. This can be avoided by implementing the toResultString method in the following way:
@Override
public String toResultString() {
    StringBuilder builder = new StringBuilder();
    builder.append("The following values have been extracted:\n");
    for (String key : getValueMap().keySet()) {
        builder.append(key + ":\t" + getValueMap().get(key) + "\n");
    }
    builder.append("\n\nThe data:\n");
    builder.append(data.toString());
    return builder.toString();
}
have parameters as operators do. They are used during automatic reporting of objects and control the output. The handling of these parameters and their values is done by the abstract class; all we have to do is to take their values into account when rendering. Here are the methods we have to implement:
public class GameDataRenderer extends AbstractRenderer {

    @Override
    public Reportable createReportable(Object renderable, IOContainer ioContainer, int desiredWidth, int desiredHeight) {
        return null;
    }

    @Override
    public String getName() {
        return "GameData";
    }

    @Override
    public Component getVisualizationComponent(Object renderable, IOContainer ioContainer) {
        return null;
    }
}
The first method must return an object of a class implementing one of the sub-interfaces of Reportable, but this will not be covered here. One could take a look at the interfaces and some of the implementations in the core to get an example. In this tutorial we will focus on the visualization inside the RapidMiner graphical user interface.
Attention: Since RapidMiner 5 the IOContainer will be empty or null in any case. It cannot be used anymore and only remains for compatibility reasons. Please make sure your renderers do not depend on it!
The second method returns an arbitrary Java Component used for displaying content in Swing. Everything is possible, but since we want to see the values as a table, we are going to render it as such. We don't have to implement everything ourselves; we can use a subclass of the AbstractRenderer, the AbstractTableModelTableRenderer. As the name already indicates, it will show a table based upon a table model. All we have to do is to return this table model:
/**
 * A renderer for the extracted values of GameDataIOObjects
 *
 * @author Sebastian Land
 */
public class GameDataRenderer extends AbstractTableModelTableRenderer {

    @Override
    public String getName() {
        return "Extracted Values";
    }

    @Override
    public TableModel getTableModel(Object renderable, IOContainer ioContainer, boolean isReporting) {
        if (renderable instanceof GameDataIOObject) {
            GameDataIOObject object = (GameDataIOObject) renderable;
            final List<Pair<String, Double>> values = new ArrayList<Pair<String, Double>>();
            for (String key : object.getValueMap().keySet()) {
                values.add(new Pair<String, Double>(key, object.getValueMap().get(key)));
            }

            return new AbstractTableModel() {
                private static final long serialVersionUID = 1L;

                @Override
                public int getColumnCount() {
                    return 2;
                }

                @Override
                public int getRowCount() {
                    return values.size();
                }

                @Override
                public String getColumnName(int column) {
                    if (column == 0)
                        return "Attribute";
                    return "Value";
                }

                @Override
                public Object getValueAt(int rowIndex, int columnIndex) {
                    Pair<String, Double> pair = values.get(rowIndex);
                    if (columnIndex == 0)
                        return pair.getFirst();
                    return pair.getSecond();
                }
            };
        }
        return new DefaultTableModel();
    }
}
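To try the table model in isolation, the same two-column layout can be built with nothing but the JDK. The sketch below is an assumption-laden variant, not the renderer itself: the class name KeyValueTableModelSketch and the sample keys are made up, and a plain Map<String, Double> stands in for the value map of the GameDataIOObject.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.swing.table.AbstractTableModel;
import javax.swing.table.TableModel;

class KeyValueTableModelSketch {

    /** Builds a read-only two-column model (Attribute, Value) from the given map. */
    static TableModel of(Map<String, Double> valueMap) {
        // Snapshot the entries once so that rows can be addressed by index
        // and later changes to the map do not affect the model.
        final List<Map.Entry<String, Double>> rows = new ArrayList<Map.Entry<String, Double>>();
        for (Map.Entry<String, Double> entry : valueMap.entrySet()) {
            rows.add(new AbstractMap.SimpleImmutableEntry<String, Double>(entry));
        }

        return new AbstractTableModel() {
            private static final long serialVersionUID = 1L;

            @Override
            public int getColumnCount() {
                return 2;
            }

            @Override
            public int getRowCount() {
                return rows.size();
            }

            @Override
            public String getColumnName(int column) {
                return column == 0 ? "Attribute" : "Value";
            }

            @Override
            public Object getValueAt(int rowIndex, int columnIndex) {
                Map.Entry<String, Double> row = rows.get(rowIndex);
                return columnIndex == 0 ? row.getKey() : row.getValue();
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Double> values = new LinkedHashMap<String, Double>();
        values.put("Age", 12.0);
        TableModel model = of(values);
        for (int row = 0; row < model.getRowCount(); row++) {
            System.out.println(model.getValueAt(row, 0) + "\t" + model.getValueAt(row, 1));
        }
    }
}
```

Iterating over the entry set avoids the second map lookup per key that the keySet-based loop performs; for small value maps the difference is negligible.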
There are some other convenience methods in the AbstractTableModelTableRenderer for changing the appearance of the table. For example the following methods change the behaviour of the table by enabling or disabling some features:
@Override
public boolean isSortable() {
    return false;
}

@Override
public boolean isAutoresize() {
    return false;
}

@Override
public boolean isColumnMovable() {
    return true;
}
Figure 6.5: The result of our effort in building a table representation of the attached values
Now we should be able to create our own operators, even super operators, process meta data, build loops over our own IOObjects and render the results. The only problem is: how do we get this into RapidMiner? For most people it's not an appropriate option to check out the repository version of RapidMiner, extend it with their own functions and then update the code and merge conflicts each time the code base changes. Another problem is that the result can only be deployed by building a complete RapidMiner. But don't worry: RapidMiner 5 offers a flexible extension mechanism that solves all problems of that kind.
NFO, which describes the functionality of this Extension and may contain a short text. This gives users some orientation when the Extension shows up in the update and installation mechanism, where they can download new Extensions conveniently. Additionally, this text will show up in the about box of this Extension, available in the About installed extensions menu. The most important file for the Extension is the manifest. It contains all the information that RapidMiner needs to find the files for the operator configuration, their documentation and several other things. Let's take a look at this file:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.1
Created-By: 10.0-b23 (Sun Microsystems Inc.)
Implementation-Vendor: rapid-i
Implementation-Title: Tutorial Extension
Implementation-URL: www.rapid-i.com
Implementation-Version: 5.0.000
Specification-Title: Tutorial Extension
Specification-Version: 5.0.000
RapidMiner-Version: 5.0
RapidMiner-Type: RapidMiner_Extension
Plugin-Dependencies:
Extension-ID: rmx_tutorial
Namespace: tutorial
Initialization-Class: com.rapidminer.PluginInitTutorial
IOObject-Descriptor: /com/rapidminer/resources/ioobjectsTutorial.xml
Operator-Descriptor: /com/rapidminer/resources/OperatorsTutorial.xml
ParseRule-Descriptor: /com/rapidminer/resources/parserulesTutorial.xml
Group-Descriptor: /com/rapidminer/resources/groupsTutorial.properties
Error-Descriptor: /com/rapidminer/resources/i18n/ErrorsTutorial.properties
UserError-Descriptor: /com/rapidminer/resources/i18n/UserErrorMessagesTutorial.properties
GUI-Descriptor: /com/rapidminer/resources/i18n/GUITutorial.properties
The table below gives details about each entry that is interpreted by RapidMiner. The first three lines can be ignored, since they store Java-specific content.

Entry                   Description
Implementation-Vendor   The vendor of this extension, probably you or your company.
Implementation-Title    The name of this extension. By convention it should end with Extension and each word is capitalized.
Implementation-URL      The URL of the vendor.
Implementation-Version  The version of this Extension, must be in x.y.zzz notation.
Specification-Title     Should be the same as Implementation-Title.
Specification-Version   Should be the same as Implementation-Version.
RapidMiner-Version      The smallest version of RapidMiner this extension is compatible with. The notation is always x.y.
RapidMiner-Type         Currently only RapidMiner_Extension is supported.
Plugin-Dependencies     A semicolon-separated list of Extensions this Extension depends on. The dependent Extensions are specified by their ID (see Extension-ID) and the smallest compatible version in square brackets. For example, if the dependency is rmx_text[5.0], then the Text Processing Extension with at least version 5.0 must be available, too.
Extension-ID            The ID of this extension. By convention, IDs start with rmx_. To ensure that these IDs are unique, Rapid-I manages a list of all known Extensions and their IDs. Please contact Rapid-I to get a unique ID. If you are interested in publishing your Extension, this is needed anyway to store it on the public update server, which is accessed by all RapidMiner installations.
Namespace               Like the ID, this should be unique. It is used for distinguishing operators of this Extension from other operators. It also helps RapidMiner to search for extensions if unknown operator names are encountered in a process.
Initialization-Class    Specifies a class whose methods will be called during initialization of the Extension. This offers a hook to set some global properties or register other properties. We will come to this later.
IOObject-Descriptor     This resource maps Renderers to IOObjects. This is needed to tie the Renderer we implemented above to our IOObject.
Operator-Descriptor     This resource maps the Operator classes to keys as we have seen in the example above. It additionally manages the group structure and links to the documentation.
ParseRule-Descriptor    This resource contains rules for transforming old RapidMiner 4.x processes to the new process format. You only need to take care of this if you have changed operators between 4.x and 5.0. It might be used to reset parameters, replace operators and so on.
Group-Descriptor        This resource defines properties of operator groups like colors and group icons.
Error-Descriptor        If your Extension adds error messages, these should be addressed with a key and the message itself should be written to this file. This way it is possible to make the Extension available in different languages by translating this descriptor. RapidMiner will then select the appropriate language file.
UserError-Descriptor    If you want to throw UserErrors not present in the core descriptions, you might add them here.
GUI-Descriptor          This resource might contain properties for localizing GUI elements as we have seen before.
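To check a manifest by hand, the entries described in the table can be read with the JDK's java.util.jar.Manifest class. The following is a minimal sketch under stated assumptions: the class ManifestSketch and its helper methods are hypothetical and only illustrate the id[version] notation of Plugin-Dependencies; how RapidMiner itself parses the manifest is not shown in this document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.Manifest;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManifestSketch {

    // One dependency has the form id[version], for example rmx_text[5.0]
    private static final Pattern DEPENDENCY =
            Pattern.compile("\\s*([^\\[\\];\\s]+)\\s*\\[([^\\]]*)\\]\\s*");

    /** Reads the main attributes from manifest text (the text must end with a newline). */
    static Attributes read(String manifestText) {
        try {
            byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
            return new Manifest(new ByteArrayInputStream(bytes)).getMainAttributes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits "id[version];id[version]" into an ordered map from ID to minimum version. */
    static Map<String, String> parseDependencies(String value) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        if (value == null) {
            return result;
        }
        for (String part : value.split(";")) {
            Matcher matcher = DEPENDENCY.matcher(part);
            if (matcher.matches()) {
                result.put(matcher.group(1), matcher.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Manifest-Version: 1.0\n"
                + "Extension-ID: rmx_tutorial\n"
                + "Namespace: tutorial\n"
                + "Plugin-Dependencies: rmx_text[5.0]\n";
        Attributes attributes = read(text);
        System.out.println(attributes.getValue("Extension-ID"));
        System.out.println(parseDependencies(attributes.getValue("Plugin-Dependencies")));
    }
}
```

Entries that do not follow the id[version] form are silently skipped here; a stricter check might report them instead.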
This seems to be rather complex, but there's no need to put together the manifest yourself. Instead we will use the ant build file we used in the chapters above for creating everything that's needed. The only thing we have to keep in mind is not to delete any of these files. Wherever the properties point to, these files must exist!
<project name="RapidMiner_Plugin_Template_Vega">
    <description>Build file for the RapidMiner Template extension</description>

    <property name="rm.dir" location="../RapidMiner_Vega" />

    <property name="build.build" location="build" />
    <property name="build.resources" location="resources" />
    <property name="build.lib" location="lib" />

    <property name="check.sources" location="src" />

    <property name="javadoc.targetDir" location="javadoc" />

    <property name="extension.name" value="Template" />
    <property name="extension.name.long" value="RapidMiner Template Extension" />
    <property name="extension.namespace" value="template" />
    <property name="extension.vendor" value="rapid-i" />
    <property name="extension.admin" value="Sebastian Land" />
    <property name="extension.url" value="www.rapid-i.com" />

    <property name="extension.needsVersion" value="5.0" />
    <property name="extension.dependencies" value="" />

    <property name="extension.initClass" value="com.rapidminer.PluginInitTemplate" />
    <property name="extension.objectDefinition" value="/com/rapidminer/resources/ioobjectsTemplate.xml" />
    <property name="extension.operatorDefinition" value="/com/rapidminer/resources/OperatorsTemplate.xml" />
    <property name="extension.parseRuleDefinition" value="/com/rapidminer/resources/parserulesTemplate.xml" />
    <property name="extension.groupProperties" value="/com/rapidminer/resources/groupsTemplate.properties" />
    <property name="extension.errorDescription" value="/com/rapidminer/resources/i18n/ErrorsTemplate.properties" />
    <property name="extension.userErrors" value="/com/rapidminer/resources/i18n/UserErrorMessagesTemplate.properties" />
    <property name="extension.guiDescription" value="/com/rapidminer/resources/i18n/GUITemplate.properties" />

    <!-- Src files -->
    <path id="build.sources.path">
        <dirset dir="src">
            <include name="**" />
        </dirset>
    </path>
    <fileset dir="src" id="build.sources">
        <include name="**/*.java" />
    </fileset>
    <fileset id="build.dependentExtensions" dir=".." />

    <import file="${rm.dir}/build_extension.xml" />
</project>
None of these properties may be removed or set to a wrong value. Otherwise the build process will fail! We will now describe the properties in detail, to understand what correct values are:
Property                       Description
rm.dir                         Defines the path to the RapidMiner project relative to this file.
build.build                    The build directory of your project relative to this file. Should be build.
build.resources                The resource directory of your project. It is used to separate program files from other resources like icons and the mentioned configuration files. Please keep in mind that you should have a complete package structure below this directory, too. In Eclipse you should use it as a source folder. By default it should be resources.
build.lib                      The directory of the libraries used by your Extension. All .jar files stored in this directory will be extracted and copied into the resulting jar file, so that all classes are available.
check.sources                  This should point to your source directory, which must be src and must not be changed. It is used for performing some checks that list formal problems in your classes.
javadoc.targetDir              This property points to the subdirectory of the RapidMiner release directory where the Javadoc will be generated. It is used when deploying the release, but may also be used for generating the Java API documentation during development using the ant target javaDoc.generate.
extension.name                 The name of the extension.
extension.name.long            This must be the extension.name value with RapidMiner prepended and Extension appended: RapidMiner <extension.name> Extension
extension.namespace            Corresponds to the Namespace entry of the manifest described above.
extension.vendor               Corresponds to the Implementation-Vendor entry of the manifest described above.
extension.admin                In fact this entry isn't used anywhere. It just points to a person you might contact if you want to contribute to the Extension or have found a bug.
extension.url                  Corresponds to the Implementation-URL entry of the manifest described above.
extension.needsVersion         Corresponds to the RapidMiner-Version entry of the manifest described above.
extension.dependencies         Corresponds to the Plugin-Dependencies entry of the manifest described above.
extension.initClass            Corresponds to the Initialization-Class entry of the manifest described above.
extension.objectDefinition     Corresponds to the IOObject-Descriptor entry of the manifest described above.
extension.operatorDefinition   Corresponds to the Operator-Descriptor entry of the manifest described above.
extension.parseRuleDefinition  Corresponds to the ParseRule-Descriptor entry of the manifest described above.
extension.groupProperties      Corresponds to the Group-Descriptor entry of the manifest described above.
extension.errorDescription     Corresponds to the Error-Descriptor entry of the manifest described above.
extension.userErrors           Corresponds to the UserError-Descriptor entry of the manifest described above.
extension.guiDescription       Corresponds to the GUI-Descriptor entry of the manifest described above.
build.sources.path             Must specify a path containing all sources that must be used for the Extension. The sources of RapidMiner are automatically included.
build.sources                  A fileset on the sources used for publishing the source code.
build.dependentExtensions      A fileset containing all build.xml files of dependent Extensions. The files will be used for building those Extensions, so that this extension can link against their .jar files.
So far we have got a basic introduction, and you should now be able to implement your own operators. This chapter will show some more advanced options to modify RapidMiner. It covers the PluginInit class as well as creating custom dockable windows, which will be available as views in the perspectives.
public static void initPlugin()

The initPlugin method will be called directly after the extension is initialized. This is the first hook during start up. No initialization of the operators or renderers has taken place when this is called.

The initGui method is called during start up as the second hook. It is called before the GUI of the mainframe is created. The MainFrame is passed as an argument to register GUI elements. The operators and renderers have been registered in the meantime.

public static void initFinalChecks()

initFinalChecks is the last hook before the splash screen is closed, the third in the row.
Figure 8.1: A configuration dialog for CRM connections.
Imagine that you want to create a RapidMiner extension which offers an operator for reading data from a CRM system. Your operator will need the information about how to access the CRM, such as a URL, a username or a password. One approach would be to add text fields to the parameters of the operator and let the user type in the required information. Though this may seem convenient at first, it gets quite uncomfortable if you want to use the same information about the CRM in another RapidMiner process or operator, as you have to type in the information multiple times. A way of dealing with that problem is to define the CRM connection globally and let the user select the CRM they want to get data from. This is a scenario where the so-called Configurators come in handy. A configurator manages items of a certain type globally and makes it possible to create, edit and delete them through a custom configuration dialog. For this example, we will implement a configurator for CRM entries, which automatically allows us to configure those entries with a dialog, accessible through the Tools menu. Moreover, a configurator can be used along with a drop-down list which allows the user to easily select a CRM connection in the configuration of our operator.
8.2.1 Usage
In order to implement your own configurator, you need to know the following classes:
- Configurable is an item which can be modified through a Configurator
- Configurator instantiates and configures subclasses of Configurable
- ConfigurationManager is used to register Configurators in RapidMiner
- ParameterTypeConfigurable is a ParameterType which creates a drop-down list for configurators and can be used in the configuration settings of operators
The first thing we have to do is to create a new class describing a single CRM connection entry, which implements the Configurable interface. It is advisable to extend AbstractConfigurable instead, because then we don't have to deal with handling parameter values. In this case, you don't have to write any code that deals with the actual configuration:
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

import com.rapidminer.tools.config.AbstractConfigurable;

public class CRMConfigurable extends AbstractConfigurable {

    /** Actual business logic of this configurable. */
    public CRMConnection connect() throws IOException {
        // the keys must match the parameter keys declared by the configurator
        String username = getParameter("Username");
        String url = getParameter("URL");
        URLConnection con = new URL(url).openConnection();
        // do something with the connection
        ...
    }
}
Next, we must extend the abstract Configurator class. Each configurator has a unique typeID, a String that identifies the configurator in RapidMiner, and an I18NBaseKey, which is used as the base key for retrieving localized information from the resource file. We also want to add some ParameterTypes to our Configurator, because they specify how an entry can be edited through the configuration dialog. In our example, we need ParameterTypes describing the URL and the username which should be used for the CRM connection. To do so, you simply override the getParameterTypes method and add a new ParameterTypeString for each of them, as shown in the following implementation:
import java.util.ArrayList;
import java.util.List;

import com.rapidminer.parameter.ParameterType;
import com.rapidminer.parameter.ParameterTypeString;
import com.rapidminer.tools.config.Configurator;

/**
 * A simple implementation of {@link Configurator} with two parameter fields.
 */
public class CRMConfigurator extends Configurator<CRMConfigurable> {

    @Override
    public Class<CRMConfigurable> getConfigurableClass() {
        return CRMConfigurable.class;
    }

    @Override
    public String getI18NBaseKey() {
        return "crmconfig";
    }

    @Override
    public List<ParameterType> getParameterTypes() {
        List<ParameterType> values = new ArrayList<ParameterType>();
        values.add(new ParameterTypeString("URL", "The URL to connect to", false));
        values.add(new ParameterTypeString("Username", "The username for the CRM", false));
        return values;
    }

    @Override
    public String getTypeId() {
        return "CRMConfig";
    }
}
Apart from the methods getTypeId, getI18NBaseKey and getParameterTypes, you also have to implement the method getConfigurableClass, which simply returns the Configurable implementation class that is used, in this case the class CRMConfigurable. Now we have to add localized information to the resource file which is specified in the GUI-Descriptor entry of the manifest. Among other things, you can specify the text for each important GUI element of the configuration dialog in this file. For our example, the resource file could look like this:
gui.configurable.crmconfig.name = CRM Connection
gui.configurable.crmconfig.description = An entry describing a CRM connection.

gui.dialog.configuration.crmconfig.title = CRM Connection Manager
gui.dialog.configuration.crmconfig.connection_entry.icon = data_connection.png
gui.dialog.configuration.crmconfig.connection_readonly_entry.icon = data_lock.png
gui.dialog.configuration.crmconfig.message = Please choose a CRM connection or create a new one.
gui.dialog.configuration.crmconfig.icon = data_connection_edit.png

gui.action.configuration.crmconfig.label = Manage CRM connections...
gui.action.configuration.crmconfig.mne = C
gui.action.configuration.crmconfig.icon = data_connection_edit.png
gui.action.configuration.crmconfig.tip = Create, edit and delete CRM connections.
In order to get access to our new configurator, we have to register it with the ConfigurationManager. This step is important, because RapidMiner needs to know our new configurator so that the CRM operator and other parts of RapidMiner can access it. For this, we can simply call the register method during initialization. This should be done in the initPlugin method of the PluginInit class.
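Why registration matters can be illustrated with a small stand-alone registry. The sketch below uses only hypothetical stand-in types (ConfiguratorStub, ConfiguratorRegistrySketch); it is not the ConfigurationManager API, and it merely assumes that a configurator is looked up by its type ID, which is how the ParameterTypeConfigurable shown later refers to it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a configurator: only the type ID matters here. */
interface ConfiguratorStub {
    String getTypeId();
}

/** Hypothetical registry illustrating lookup by type ID; not the RapidMiner class. */
class ConfiguratorRegistrySketch {
    private final Map<String, ConfiguratorStub> byTypeId = new LinkedHashMap<String, ConfiguratorStub>();

    /** Makes the configurator known; each type ID may be registered only once. */
    void register(ConfiguratorStub configurator) {
        String typeId = configurator.getTypeId();
        if (byTypeId.containsKey(typeId)) {
            throw new IllegalStateException("Duplicate configurator type ID: " + typeId);
        }
        byTypeId.put(typeId, configurator);
    }

    /** Returns the configurator for the type ID or fails if none was registered. */
    ConfiguratorStub lookup(String typeId) {
        ConfiguratorStub configurator = byTypeId.get(typeId);
        if (configurator == null) {
            throw new IllegalArgumentException("Unknown configurator type ID: " + typeId);
        }
        return configurator;
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        ConfiguratorRegistrySketch registry = new ConfiguratorRegistrySketch();
        // This is the step that would happen once, during extension start-up.
        registry.register(new ConfiguratorStub() {
            @Override
            public String getTypeId() {
                return "CRMConfig";
            }
        });
        System.out.println(registry.lookup("CRMConfig").getTypeId());
    }
}
```

Registering once at start-up, before any operator asks for the configurator, is what makes the first PluginInit hook a natural place for this call.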
As our configurator is now ready to be used, we want to add new elements to the configuration settings of our CRM operator, with which the user can select a CRM from a drop-down list or open the configuration dialog directly by clicking on a button. To do so, we add the ParameterType ParameterTypeConfigurable to the imports:

import com.rapidminer.tools.config.ParameterTypeConfigurable;
After that, we just add a new ParameterTypeConfigurable to the getParameterTypes() method of the operator:

public List<ParameterType> getParameterTypes() {
    List<ParameterType> types = super.getParameterTypes();
    ParameterType type = new ParameterTypeConfigurable(PARAMETER_CONFIG, "Choose a CRM connection", "crmconfig");
    types.add(type);
    return types;
}
We have now successfully created our own configurator and are able to use it to configure CRM entries for our operator. In the next step, we will look at how to customize the standard configuration dialog.
Figure 8.2: The ParameterTypeConfigurable creates a drop-down list. The user can easily choose which CRM connection should be used.
import java.awt.GridBagConstraints;
import java.awt.GridBagLayout;
import java.awt.GridLayout;

import javax.swing.JComponent;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JTextField;

import com.rapidminer.tools.config.Configurable;
import com.rapidminer.tools.config.gui.ConfigurationPanel;

public class CRMConfigurationPanel extends ConfigurationPanel<CRMConfigurable> {

    private JTextField nameField = new JTextField();
    private JTextField urlField = new JTextField();
    private JTextField usernameField = new JTextField();

    @Override
    public boolean checkFields() {
        // validates the user input
        return urlField.getText().startsWith("http://");
    }

    @Override
    public JComponent getComponent() {
        // returns a custom GUI component
        GridBagConstraints c = new GridBagConstraints();
        c.anchor = GridBagConstraints.FIRST_LINE_START;
        c.weighty = 0;
        c.weightx = 1;
        c.fill = GridBagConstraints.BOTH;
        c.gridwidth = GridBagConstraints.REMAINDER;

        JPanel panel = new JPanel(new GridBagLayout());
        panel.add(new JLabel("Name:"), c);
        panel.add(nameField, c);
        panel.add(new JLabel("URL:"), c);
        panel.add(urlField, c);
        panel.add(new JLabel("Username:"), c);
        panel.add(usernameField, c);
        c.weighty = 1;
        panel.add(new JPanel(), c);
        return panel;
    }

    @Override
    public void updateComponents(CRMConfigurable configurable) {
        // used to update the Panel, according to the given configurable
        nameField.setText(configurable.getName());
        urlField.setText(configurable.getParameter("URL"));
        usernameField.setText(configurable.getParameter("Username"));
    }

    @Override
    public void updateConfigurable(CRMConfigurable configurable) {
        // reads field values from the panel and updates the parameter values of the configurable
        ...
    }
}
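The checkFields method above only tests for an http:// prefix, which would reject https URLs and accept strings such as "http://" without a host. A slightly stricter check could look like the sketch below. It is an assumption about what a useful validation might be, not part of the tutorial code; the class and method names are made up, and only java.net.URI from the JDK is used.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UrlFieldCheckSketch {

    /** Accepts syntactically valid http or https URLs that name a host. */
    static boolean isAcceptableUrl(String text) {
        if (text == null) {
            return false;
        }
        try {
            URI uri = new URI(text.trim());
            String scheme = uri.getScheme();
            boolean httpLike = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            // getHost() is null when the authority part is missing or not a valid host name
            return httpLike && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAcceptableUrl("https://crm.example.com/api"));
        System.out.println(isAcceptableUrl("crm.example.com"));
    }
}
```

Inside the panel, checkFields could then simply return isAcceptableUrl(urlField.getText()).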
What is still left to do is to specify the usage of the new CRMConfigurationPanel in our configurator. To do so, we have to override the method of the CRMConfigurator class that provides the ConfigurationPanel.
Figure 8.3: The CRMConfigurationPanel is now used for configuring CRM connection entries.
That way, our new CRMConfigurationPanel will be used instead of the default implementation. In this example, the text fields show the name, URL and username of the selected entry and make it possible to edit them as well. When it comes to saving the user input, a validation of the input is requested by calling the checkFields method, after which updateConfigurable is called in order to get the input from our panel. This way, you can easily create your own custom configuration panels and organize them the way you want.
package com.rapidminer;

import java.awt.BorderLayout;
import java.awt.Component;

import javax.swing.JLabel;
import javax.swing.JPanel;

import com.rapidminer.gui.tools.ResourceDockKey;
import com.vlsolutions.swing.docking.DockKey;
import com.vlsolutions.swing.docking.Dockable;

/**
 * A very simple example of a new dockable window.
 * @author Sebastian Land
 */
public class SimpleWindow extends JPanel implements Dockable {

    private static final long serialVersionUID = 1L;

    private final DockKey DOCK_KEY = new ResourceDockKey("tutorial.simple_window");
    private JLabel label = new JLabel("Hello user.");

    public SimpleWindow() {
        // adding content to this window
        setLayout(new BorderLayout());
        add(label, BorderLayout.CENTER);
    }

    public void setLabel(String labelText) {
        this.label.setText(labelText + "TEST");
        System.out.println(labelText);
        revalidate();
    }

    @Override
    public Component getComponent() {
        return this;
    }

    @Override
    public DockKey getDockKey() {
        return DOCK_KEY;
    }
}
While the content of the window is rather simple and only a variant of the well-known Hello World program, we see the new concept of the ResourceDockKey. A DockKey contains information about a Dockable; for example, it stores the name and the icon of this window. The ResourceDockKey will retrieve this information from the GUI resource bundle that is loaded in a language-dependent manner from a resource file. This file is specified in the GUIDescriptor entry of the manifest. So the window title and tooltip can be translated without changing the source code, and the correct language is chosen automatically. In the template project, the GUI properties file is called GUITemplate.properties. This is an example of what might describe the new window:
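The language-dependent lookup described above can be illustrated with a minimal sketch that uses plain java.util.Properties instead of RapidMiner's own resource loading. The key layout (gui.dockkey.<key>.name and gui.dockkey.<key>.icon), the window titles and the class name DockKeyBundleSketch are assumptions made for illustration only; they are not taken from the RapidMiner API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch only: mimics a language-dependent resource lookup with plain Properties.
// All key names below are hypothetical, not RapidMiner's documented key layout.
final class DockKeyBundleSketch {

    // Default entries; in an extension these would live in the GUI properties file.
    static final String DEFAULT_ENTRIES =
            "gui.dockkey.tutorial.simple_window.name = Simple Window\n"
          + "gui.dockkey.tutorial.simple_window.icon = window2.png\n";

    // Hypothetical German translation that overrides only the window title.
    static final String GERMAN_ENTRIES =
            "gui.dockkey.tutorial.simple_window.name = Einfaches Fenster\n";

    /** Loads the translation on top of the defaults; missing keys fall back to the defaults. */
    static Properties load(String defaults, String translation) {
        try {
            Properties base = new Properties();
            base.load(new StringReader(defaults));
            Properties localized = new Properties(base);
            localized.load(new StringReader(translation));
            return localized;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Resolves one property of a dockable, for example "name" or "icon". */
    static String resolve(Properties bundle, String key, String property) {
        return bundle.getProperty("gui.dockkey." + key + "." + property);
    }

    public static void main(String[] args) {
        Properties german = load(DEFAULT_ENTRIES, GERMAN_ENTRIES);
        System.out.println(resolve(german, "tutorial.simple_window", "name"));
        System.out.println(resolve(german, "tutorial.simple_window", "icon"));
    }
}
```

The point of the sketch is that the Java code only ever refers to the key tutorial.simple_window; the title comes from whichever translation is loaded, and untranslated entries such as the icon fall back to the default file.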
The window2.png has been added to com/rapidminer/resources/icons/16 in the resources directory, so that it is available when starting RapidMiner. One task remains before we can take a look at our brand new window: we have to register it with RapidMiner's MainFrame. Since we want to do this independently of operator execution, and in fact want to have the window before any process is executed, we have to use one of the PluginInit hooks. So we are going to fill the initGui method:
That's all we need. After we have repeated the deployment of our Extension, we can select the new view from the menu. The result might look like this:
Figure 8.4: The new window is shown as a dockable window on the right.
public static void initGui(MainFrame mainframe) {
    final SimpleWindow simpleWindow = new SimpleWindow();
    mainframe.getDockingDesktop().registerDockable(simpleWindow);
    JMenu menu = new ResourceMenu("tutorial.tutorial");
    mainframe.getMainMenuBar().add(menu);
}
The ResourceMenu behaves similarly to the ResourceDockKey and will retrieve its settings from the resource bundle. We might add three properties per menu:
The label will be used as the name, while mne is the mnemonic for this menu entry. The case of this letter defines where in the word the underscore will be placed. The text in the tip property will show up as the tooltip.
But this isn't very satisfactory: although we have an additional menu, it doesn't contain any entries yet, so we will add an action. Again, we will use a resource-based variant that gathers all required information from the GUI properties. The method will finally look like this:
public static void initGui(MainFrame mainframe) {
    final SimpleWindow simpleWindow = new SimpleWindow();
    mainframe.getDockingDesktop().registerDockable(simpleWindow);
    JMenu menu = new ResourceMenu("tutorial.tutorial");

    menu.add(new ResourceAction(true, "tutorial.greetings", "Earthling") {
        private static final long serialVersionUID = 1L;

        @Override
        public void actionPerformed(ActionEvent e) {
            simpleWindow.setLabel("Greetings!");
        }
    });
    mainframe.getMainMenuBar().add(menu);
}
We have added a menu entry by specifying a new ResourceAction. The action gives the menu entry a name, an icon if present, and a tooltip. The constant true in the constructor forces the usage of a 16 pixel icon instead of a larger size. Each action reads five properties, all of which begin with gui.action. followed by the key, a dot and then the property identifier. The five property identifiers are label, which describes the text visible in the menu; mne for choosing the mnemonic; tip for the tooltip; icon for the icon; and acc for specifying a shortcut to this action. This could be F3 or control pressed F3, for example. See the KeyStroke class of Java, and especially the documentation of its getKeyStroke method, for details. The property file might contain something like this:
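To make the mne and acc values more concrete, here is a minimal sketch of how such values could be interpreted with standard Java classes. It does not show how ResourceAction works internally; the class ActionPropertySketch, its two helper methods and the sample label Tutorial are assumptions for illustration. Only KeyStroke.getKeyStroke is the method named above.

```java
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import javax.swing.KeyStroke;

// Sketch only: interprets "mne" and "acc" style values with standard Java APIs.
final class ActionPropertySketch {

    /**
     * Returns the index of the character to underline: the first occurrence of the
     * mnemonic in exactly the given case, or -1 if the label does not contain it.
     */
    static int mnemonicIndex(String label, char mnemonic) {
        return label.indexOf(mnemonic);
    }

    /** Parses an accelerator description; returns null if it is not a valid key stroke. */
    static KeyStroke accelerator(String acc) {
        return KeyStroke.getKeyStroke(acc);
    }

    public static void main(String[] args) {
        // Hypothetical label: the case of the mnemonic selects which letter is underlined.
        System.out.println(mnemonicIndex("Tutorial", 'T'));
        System.out.println(mnemonicIndex("Tutorial", 't'));

        // The two accelerator examples mentioned in the text.
        KeyStroke plain = accelerator("F3");
        KeyStroke withControl = accelerator("control pressed F3");
        System.out.println(plain.getKeyCode() == KeyEvent.VK_F3);
        System.out.println((withControl.getModifiers() & InputEvent.CTRL_DOWN_MASK) != 0);
    }
}
```

With the sample label Tutorial, an upper-case T selects the first letter (index 0), while a lower-case t selects the third letter (index 2). Both accelerator strings resolve to the F3 key; the second one additionally requires the control modifier.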
Another feature is the {0}. This will be replaced with the string value of the first argument given to the constructor of any resource-based element after the resource identifier key. In the above example the first and only additional parameter is the String Earthling, and hence the menu entry will be named Greet Earthling! This mechanism works for all labels and tooltips in all resource-based GUI elements.
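The substitution can be tried out in isolation. The following sketch assumes that the placeholder follows java.text.MessageFormat conventions and that the label entry reads Greet {0}! (inferred from the resulting menu name above). The class GreetingLabelSketch and its method are hypothetical helpers, not part of RapidMiner.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Properties;

// Sketch only: resolves an action label and fills in its {0} placeholder.
final class GreetingLabelSketch {

    // Key built as gui.action.<key>.<identifier>; the value is inferred, not copied from the tutorial.
    static final String ENTRIES = "gui.action.tutorial.greetings.label = Greet {0}!\n";

    /** Looks up the label of the given action key and inserts the additional arguments. */
    static String label(String actionKey, Object... arguments) {
        Properties bundle = new Properties();
        try {
            bundle.load(new StringReader(ENTRIES));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String pattern = bundle.getProperty("gui.action." + actionKey + ".label");
        return MessageFormat.format(pattern, arguments);
    }

    public static void main(String[] args) {
        // Same key and argument as in the ResourceAction constructor above.
        System.out.println(label("tutorial.greetings", "Earthling"));
    }
}
```

Running the sketch prints Greet Earthling!, which matches the menu entry produced by the ResourceAction in the listing above.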