Scaffold Hunter
User Manual
Contents
1. Introduction
2. Requirements
2.1. System Requirements
2.2. Database
2.2.1. Setting Up a MySQL/MariaDB Database
2.3. Java Virtual Machine
3. Installation
3.1. Installing Scaffold Hunter
3.2. Installing Plugins
3.3. Starting Scaffold Hunter
4. Start Dialog
4.1. Setting up a Database Connection
4.2. User Management
5. Getting Started
5.1. Session Management
5.2. Dataset Management
5.2.1. Importing Data
5.2.2. Calculation of Properties
5.2.3. Scaffold Tree Generation
5.3. Filtering
5.4. Initial Views
6. Main Window
6.1. Structure of the Main Window
6.2. Subset Management
6.2.1. The Subset Tree
6.2.2. Creating Subsets from the Selection
6.2.3. Deleting Subsets
6.2.4. Set Operations
6.2.5. Reducing Subsets
6.3. Managing and Arranging Views
6.3.1. Opening and Closing Views
6.3.2. Split Windows
6.3.3. Multiple Main Windows
6.3.4. Moving Views
6.3.5. Renaming Views
6.4. Interaction between Views
6.4.1. Selection
6.4.2. Flags
6.5. Tooltip Window and Comments
6.5.1. Configuring the Tooltip
6.5.2. Configuring the Properties
6.5.3. Comments
6.6. Settings
7. Views
7.1. Scaffold Tree
7.1.1. Navigation
7.1.2. Context Menu
7.1.3. Tree Menu & Tool Bar
7.1.4. Layout
7.1.5. Sorting
7.1.6. Property Mappings
7.1.7. Exporting Images
7.2. Table
7.2.1. Sorting and Ordering the Table
7.2.2. Resizing Table Cells
7.2.3. The Minimap
7.2.4. Sticky Columns
7.2.5. Setting Flags
7.2.6. Selecting Molecules
7.2.7. Writing and Editing Comments
7.2.8. Placeholders in Table Cells
7.3. Dendrogram
7.3.1. The View in Detail
7.3.2. Side Bar
7.3.3. Tree Frame
7.3.4. Tool Bar
7.3.5. Table
7.3.6. Clustering Configuration Panel
7.4. Plot
7.4.1. Choosing the Data to be Displayed
7.4.2. Dot Color and Size
7.4.3. Hyperplanes
7.4.4. Zooming, Scrolling and Rotating the Diagram
7.4.5. Highlighting, Selecting and Picking
7.4.6. Hiding the Axis Ticks and Grids
7.5. Tree Map
7.5.1. TreeMap & Tool Bar
7.5.2. Property Mappings
7.5.3. Color Legend
7.5.4. Detail View
7.6. Heat Map
7.6.1. The View in Detail
7.6.2. Navigation and Interaction
7.6.3. Clustering of Molecules
7.6.4. Sorting of Properties
7.6.5. Color Mapping
7.7. Cloud
7.7.1. Navigation
7.7.2. Creating a new cloud
7.7.3. Property Mapping
7.7.4. Filtering
7.7.5. Detailed view
8. Export
Bibliography
A. Keyboard Shortcuts
1. Introduction
Searching chemical space for a particular compound is like looking for a needle in a haystack. To
assist you with this task, the software Scaffold Hunter was developed. Scaffold Hunter uses the concept
of scaffolds, i.e., underlying molecular frameworks that serve as simplified representatives for classes
of similar molecules. These scaffolds can be organized in a tree-like hierarchy based on the inclusion
relation, enabling the user to navigate in the associated chemical space in an intuitive way. Furthermore,
Scaffold Hunter supports a variety of views allowing the user to inspect chemical compound data from
different perspectives.
The first step in using Scaffold Hunter is to aggregate data in a database and create a so-called scaffold
tree for this data. These topics are covered in the first part of the manual. Once this is done, the data
can be navigated, filtered, viewed and annotated by multiple users. This is described in the second part
of the manual.
History. The Scaffold Hunter project was initiated by Stefan Wetzel and Karsten Klein and a first
version of Scaffold Hunter was implemented by the project group 504 [3] at the TU Dortmund in 2009.
For version 2.x, many parts of it were improved, extended and rewritten by the project group 552 [2],
resulting in a program with many additional features, better support of chemical workflows, and improved
usability. Currently, the project is managed by Nils Kriege and Karsten Klein at TU Dortmund, and
continuously improved in cooperation between TU Dortmund and the Max Planck Institute of Molecular
Physiology.
2. Requirements
2.2. Database
An SQL database is required to store the imported datasets and the user’s sessions. The currently
supported database systems are:
MySQL/MariaDB
Version 5.0 or later is required. A copy of MySQL/MariaDB can be obtained at https://www.mysql.com
or https://mariadb.org; see Sec. 2.2.1: Setting Up a MySQL/MariaDB Database for configuration
instructions.
HSQLDB
Version 2.2.4 of HSQLDB is included and can be used for small personal databases. Additional space
on the hard drive is required if this type of database is used. Please note that using HSQLDB leads
to an increased memory requirement compared to MySQL/MariaDB.
2.2.1. Setting Up a MySQL/MariaDB Database
In this section a basic setup of a MySQL/MariaDB database and user account is described that is sufficient
for use with Scaffold Hunter. We assume that the server software is already installed and running.
Please refer to https://www.mysql.com or https://mariadb.org for detailed installation instructions
for MySQL/MariaDB on your operating system.
MySQL/MariaDB uses the latin1 character encoding by default. It is important to configure the MySQL/MariaDB
server so that it uses the UTF-8 character encoding. Otherwise, importing properties from files that contain
non-Latin characters will lead to errors. For MySQL/MariaDB 5.0 or later you can use one of
the following two options. If you are using another version, please refer to the MySQL/MariaDB manual at
https://www.mysql.com or https://mariadb.org and check whether the information provided here is still
valid.
Option 2 Using the MySQL/MariaDB service: Edit the MySQL/MariaDB server configuration file
(https://dev.mysql.com/doc/refman/5.7/en/option-files.html) and add the following lines.
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
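To check that an option file actually contains these settings, something like the following Python sketch could be used. It is only an illustration: the function name missing_utf8_settings is made up here, the script is not part of Scaffold Hunter or MySQL, and Python's configparser is assumed to be sufficient, which holds only for plain key = value option files (directives such as !include are not handled).

```python
import configparser

# Settings recommended above, as (section, option) -> expected value.
EXPECTED = {
    ("client", "default-character-set"): "utf8",
    ("mysql", "default-character-set"): "utf8",
    ("mysqld", "character-set-server"): "utf8",
    ("mysqld", "collation-server"): "utf8_unicode_ci",
}


def missing_utf8_settings(option_file_text: str) -> list:
    """Return the (section, option) pairs that are absent or have another value."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(option_file_text)
    problems = []
    for (section, option), expected in EXPECTED.items():
        # fallback=None covers both a missing section and a missing option.
        actual = parser.get(section, option, fallback=None)
        if actual != expected:
            problems.append((section, option))
    return problems


SAMPLE = """
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
"""

print(missing_utf8_settings(SAMPLE))
```

An empty result means that all four settings listed in EXPECTED were found with the recommended values.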
During the installation of MySQL/MariaDB, a root user account is typically created. We recommend
not using this account with Scaffold Hunter and instead creating a dedicated user account with restricted
access rights. To create both a database and a new user account, connect to the MySQL/MariaDB
server with your root account using the command mysql. The following SQL statements
create a database scaffoldhunter and a user scaffoldhunter that is allowed to access only the newly
created database from localhost and has to authenticate with the specified password.
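-- Create the database with UTF-8 as default character set and collation.
CREATE DATABASE scaffoldhunter DEFAULT CHARACTER SET = utf8 DEFAULT COLLATE = utf8_general_ci;
-- Create a dedicated user that may only connect from localhost; replace 'password' with your own.
CREATE USER 'scaffoldhunter'@'localhost' IDENTIFIED BY 'password';
-- Grant this user all privileges on the scaffoldhunter database only.
GRANT ALL ON scaffoldhunter.* TO 'scaffoldhunter'@'localhost';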
In case you prefer to install the MySQL/MariaDB server on a separate machine, localhost should be
replaced by %. Please note that in this case additional system specific configuration may be required,
e.g., adjustment of firewall settings.
MySQL/MariaDB typically defaults to a maximum packet size of 1 MiB, which can lead to errors when
saving a session. We therefore recommend increasing the maximum packet size to 32 MiB. This can be
done by adding the line max_allowed_packet = 32M to the [mysqld] section of the MySQL/MariaDB
server configuration file (https://dev.mysql.com/doc/refman/5.7/en/option-files.html).
[mysqld]
max_allowed_packet = 32M
In MySQL 5.6.20 or newer and MariaDB 10.0.14 or newer, a restriction on writing large objects has been
introduced: the log file size has to be at least ten times the size of any write operation performed on
the database. If you receive error messages when saving a session and you have already increased the
packet size (see Sec. 2.2.1.3: Allow Large Packages), try increasing the log file size. This
can be done in the same way as for the packet size, by adding the line innodb_log_file_size = 320M
to the [mysqld] section.
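The value of 320M follows from simple arithmetic. The Python sketch below spells it out under the assumption that the largest single write is bounded by the configured maximum packet size; the helper name is illustrative and not part of Scaffold Hunter or MySQL.

```python
def min_innodb_log_file_size_mib(largest_write_mib: int) -> int:
    """Smallest log file size (in MiB) satisfying the 'ten times the
    largest write operation' rule described above."""
    return 10 * largest_write_mib


# Assumption: with max_allowed_packet = 32M no single write exceeds 32 MiB.
required = min_innodb_log_file_size_mib(32)
print(f"innodb_log_file_size = {required}M")
```

If you choose a larger packet size, the log file size should be scaled accordingly.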
Scaffold Hunter tries to detect whether your database version is affected by this restriction.
If the version cannot be detected, the restriction is assumed to apply by default in order to prevent session
losses. In this case you might need to increase the log file size even though your database is not one of
the affected MySQL/MariaDB versions.
3. Installation
4. Start Dialog
The Start Dialog shown in Fig. 4.1 is the first window that appears at program startup. Here you can
select the database to work with and enter your username and password. If the checkbox “Save password”
is selected, the password will be saved in encrypted form. With the checkbox “Open last used session”
you can choose to automatically open the session most recently used under the given username.
4.1. Setting up a Database Connection
connection, there are a few settings to be made. The “URL” field holds the server name or IP address.
For example, “localhost” or “127.0.0.1” both would point to the local machine. The protocol name
“jdbc:mysql://” is not necessary and will be added automatically. In the “Database” field the database
or schema name is entered. Please note that if you enter an existing database that does not contain tables
created by Scaffold Hunter, you will be asked whether you want to overwrite all existing data when connecting
to the database for the first time. The “DB Username” is used to log in to the database. This user must
be configured on the MySQL/MariaDB server and needs all rights to manipulate data in the database;
see Sec. 2.2.1: Setting Up a MySQL/MariaDB Database for further information on database and user
account creation. The connection to the database also requires a password, which can be saved in the
configuration file. Please note that this password is stored unencrypted. If the password is not saved, you
will be asked to enter it each time you log in to the database.
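As an illustration of how the “URL” and “Database” fields relate to a JDBC connection string, consider the following Python sketch. The helper build_jdbc_url is hypothetical; the manual does not document the exact string Scaffold Hunter builds, so the general MySQL Connector/J form jdbc:mysql://host/database is assumed here.

```python
JDBC_PREFIX = "jdbc:mysql://"


def build_jdbc_url(url_field: str, database: str) -> str:
    """Combine the server name and database name into a JDBC URL.

    The protocol prefix is optional in the input and is added when missing.
    """
    host = url_field.strip()
    if host.startswith(JDBC_PREFIX):
        host = host[len(JDBC_PREFIX):]
    host = host.rstrip("/")
    return f"{JDBC_PREFIX}{host}/{database}"


# Both ways of entering the server yield the same connection string.
print(build_jdbc_url("localhost", "scaffoldhunter"))
print(build_jdbc_url("jdbc:mysql://localhost", "scaffoldhunter"))
```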
4.2. User Management
Scaffold Hunter includes its own user management. A profile is used to save personal settings and open
sessions (see Sec. 5.1: Session Management). To create a new user, click on the button “Create User”.
This opens the User Creation Dialog shown in Fig. 4.3, where you can choose the database on which
to create the user, as well as a username and password. After clicking “Ok” the program will connect
to the selected database and create a new user with the specified username. If the specified username
already exists in the database, user creation will fail and an error message will be displayed.
5. Getting Started
5.1. Session Management
The Session Management Dialog shown in Fig. 5.1 appears after logging in, or by selecting “Session
→ Load / Create” from the menu. If “Open last used session” was selected in the Start Dialog, the
session management window does not appear and the last used session is opened instead. The session
management consists of two areas. The upper area is used for creating new sessions. The lower area
shows a list of available sessions and options for them. At the first start the session list will be empty,
and a new session has to be created in order to start Scaffold Hunter.
To create a new session, choose a title, the dataset to create the session on, and the scaffold tree. The
title can be changed later. After clicking on the “Create” button, a filter dialog will appear, allowing you to
filter the selected dataset. The created session will then be displayed in the list of sessions in the lower
area. To create a new dataset or to add a new tree to a dataset, click “Manage Datasets”.
In the lower area of the session management window, the available sessions are shown with their title,
dataset, tree and molecule count. Here the sessions can be renamed or removed, if they are no longer
needed. To open a session, select the desired session and click the “Open” button. Initially the last used
session is selected in the list.
Important: If you run into database exceptions while saving a session
of a larger dataset, the cause might be the maximum packet size
or the maximum log file size of MySQL/MariaDB. Please refer to Sec. 2.2.1.3:
Allow Large Packages and Sec. 2.2.1.4: Increase size of log file for
instructions on how to configure the MySQL/MariaDB server.
5.2. Dataset Management
The Dataset Management Dialog shown in Fig. 5.2 lets you import, modify or delete
the molecular data stored in Scaffold Hunter’s internal database. In Scaffold Hunter you manage
your projects using datasets. Each dataset consists of a set of molecules and a collection of
properties for these molecules. For each dataset you can generate an arbitrary number of scaffold trees
using different generation rules. You will later use the scaffold tree of your choice to navigate through
the molecules in the dataset. The actions you can execute on datasets and trees are
explained below.
Actions on datasets
New dataset To create a new dataset by importing chemical compound data, click on “New Dataset”
which opens the import wizard. Read Sec. 5.2.1: Importing Data to get more information about the
import process.
Add properties If you have an existing dataset and want to add some additional molecular properties
(e.g. bioactivity information from a bioassay), select the dataset from the list of datasets and click “Add
Properties”. A wizard similar to the import wizard described in Sec. 5.2.1: Importing Data will appear.
The mechanics behind adding properties are described in Sec. 5.2.1.7: Add Properties.
Calculate properties Scaffold Hunter gives you the opportunity to calculate additional properties
(e.g. properties derived from the structure of a compound) for the molecules of a dataset. Additionally,
to enable some features like clustering molecules based on structural similarity, you may need to calculate
fingerprint properties. To calculate properties for a dataset, select the dataset from the list of datasets
and click “Calculate Properties”. A dialog to create calculation tasks appears. Please read Sec. 5.2.2:
Calculation of Properties to learn more about the calculation process.
Rename dataset If you created a dataset and would like to change its name or description, select
the dataset from the list of datasets and click “Rename Dataset”. A rename dialog will appear.
Delete dataset If there is an existing dataset you don’t need anymore, select it from the list of
datasets and click “Delete Dataset”. A confirmation dialog will appear. If you confirm the deletion, the
dataset and all trees in it will be deleted.
Warning: Scaffold Hunter can be used by multiple users. If you are working
on a database shared with other users, you will also destroy any work done
by other users on the dataset being deleted and the corresponding trees. First
ensure nobody needs the dataset and the corresponding trees anymore.
Actions on trees
New tree To create a new tree, select the dataset the tree should be generated for from the list of
datasets and click “New Tree”. The tree generation dialog will appear. Read Sec. 5.2.3: Scaffold Tree
Generation to learn more about how the tree generation process works.
Calculate properties This option allows you to calculate additional properties for the scaffolds of a
tree. The calculation works in the same way as the calculation for molecules. The only difference is that
these properties refer only to the scaffolds of the selected tree and are not associated with the whole
dataset. Please read Sec. 5.2.2: Calculation of Properties to learn more about the calculation process.
Rename tree If you created a tree and would like to change its name or description, select the tree
from the list of trees and click “Rename Tree”. A rename dialog will appear.
Delete tree If there is an existing tree you don’t need anymore, select it from the list of trees on the
left and click “Delete Tree”. A confirmation dialog will appear. If you confirm the deletion, the tree will
be deleted.
Warning: Scaffold Hunter can be used by multiple users. If you are working
on a database shared with other users, you will also destroy any work done
by other users on the tree being deleted. First ensure nobody needs the tree
anymore.
5.2.1. Importing Data
Using different import plugins, Scaffold Hunter can import data from different sources. Currently,
plugins to import data from CSV files, SDF files and MySQL databases are bundled with the program.
For a more detailed description of the plugins see Sec. 5.2.1.3: Import Plugins. To perform an import
you have to create one or more import jobs, one for each data source you want to import. These jobs
will run sequentially, and the data imported by them will be merged according to rules that can
be selected during the import process. In general, Scaffold Hunter considers two structures equal if they
have the same canonical SMILES string. If a structure is present in more than one source and properties
from these sources are mapped to the same internal property, the property entries for the structure will
be merged. The following merge strategies are available to control which data ends up in the
finished dataset:
Do not overwrite: The first seen occurrence of the property is used and will not be overwritten by the
current job.
Concatenate (only for textual properties): The property values will be appended at the end of the
existing value.
Minimum / maximum (only for numeric properties): The minimum or maximum of the available
property values will be retained.
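The effect of the strategies listed above can be pictured with a small Python sketch. It is not Scaffold Hunter's implementation: the function names are invented, and it is an assumption that “Concatenate” appends the new text without any separator.

```python
from typing import Callable, Dict, Optional, Union

Value = Union[str, float]


def do_not_overwrite(existing: Value, incoming: Value) -> Value:
    # Keep the value that was seen first.
    return existing


def concatenate(existing: Value, incoming: Value) -> Value:
    # Textual properties only: append the new value to the existing one.
    return str(existing) + str(incoming)


STRATEGIES: Dict[str, Callable[[Value, Value], Value]] = {
    "do not overwrite": do_not_overwrite,
    "concatenate": concatenate,
    "minimum": min,  # numeric properties only
    "maximum": max,  # numeric properties only
}


def merge_property(existing: Optional[Value], incoming: Value, strategy: str) -> Value:
    """Merge a value from the current import job into the stored value."""
    if existing is None:  # first occurrence of the property
        return incoming
    return STRATEGIES[strategy](existing, incoming)


# Two jobs deliver an activity value for the same structure.
print(merge_property(7.2, 6.5, "do not overwrite"))
print(merge_property(7.2, 6.5, "minimum"))
print(merge_property("assay A", "; assay B", "concatenate"))
```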
For each dataset you can choose whether the chirality information of the molecules should be considered.
This is controlled by the checkbox in the bottom left corner of the import dialog. The setting affects
SMILES generation, since two different molecules may become structurally equal if their
chirality information is ignored. As mentioned, two molecules with equal SMILES will be merged into one
molecule during the import process. By default, Scaffold Hunter uses chirality information.
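The following Python sketch illustrates why ignoring chirality can merge molecules. It is a toy model only: removing the stereo markers from a SMILES string is a crude stand-in for the canonical SMILES generation that a cheminformatics toolkit performs, and the function names are invented for this example.

```python
from collections import defaultdict
from typing import Dict, List


def structure_key(smiles: str, use_chirality: bool) -> str:
    """Return the key under which a structure is grouped."""
    if use_chirality:
        return smiles
    # Toy simplification: drop tetrahedral (@) and double-bond (/, \) markers.
    for marker in ("@", "/", "\\"):
        smiles = smiles.replace(marker, "")
    return smiles


def group_by_structure(smiles_list: List[str], use_chirality: bool = True) -> Dict[str, List[str]]:
    """Group input structures that would be treated as the same molecule."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for smiles in smiles_list:
        groups[structure_key(smiles, use_chirality)].append(smiles)
    return dict(groups)


# The two enantiomers of alanine differ only in their chirality marker.
alanines = ["C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O"]
print(len(group_by_structure(alanines, use_chirality=True)))
print(len(group_by_structure(alanines, use_chirality=False)))
```

With chirality considered the two inputs stay separate; with chirality ignored they fall into a single group and would be imported as one molecule.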
When you click on “New Dataset” the Create Import Jobs dialog shown in Fig. 5.3 will open. In the
top section of the dialog a name and a more detailed description for the new dataset can be entered. On
the left-hand side there are two lists. The first is the list of “Import Plugins”, where you can select a plugin
appropriate for the data source you want to import. Underneath is the list of “Import Jobs”,
which contains all import jobs created so far. The buttons beside this list can be used to delete import
jobs and to change their order. During import the jobs will be executed one after the other from top
to bottom. On the right-hand side are the settings for the currently selected plugin. Beneath the
settings you can enter an optional name for the import job. A click on the button beside the name field
will create a new import job with the selected settings. Below this you can find a button to start the
import process.
CSV Import Plugin With the CSV plugin you can import data from comma-separated value (CSV)
files. Fig. 5.4 shows the configuration panel of the CSV plugin. You can select a CSV file,
the cell separator and the quotation character. In addition to the default values offered, you can enter any other
character. Check the “quotation required” box to configure the plugin
to read only quoted content. With the “First row contains names” switch you can configure the plugin
to use the first row in the CSV file for the names of the columns. If you deselect this switch, the columns
will be numbered. Finally there is a “Preview with current settings” button. When you click on this
button, the first 10 rows of the CSV file will be read according to your settings and displayed in the table
below. An example can be seen in Fig. 5.5.
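The interplay of separator, quotation character and the “First row contains names” switch can be tried out with Python's csv module. The sketch below only mimics the preview. The function preview_csv is invented, the “quotation required” option is not modelled, and it is an assumption that unnamed columns are numbered from 1 and that the header row counts towards the 10 preview rows.

```python
import csv
import io
from itertools import islice
from typing import List, Tuple


def preview_csv(text: str, separator: str = ",", quote: str = '"',
                first_row_has_names: bool = True,
                max_rows: int = 10) -> Tuple[List[str], List[List[str]]]:
    """Read at most max_rows rows and return (column names, data rows)."""
    reader = csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote)
    rows = list(islice(reader, max_rows))
    if not rows:
        return [], []
    if first_row_has_names:
        return rows[0], rows[1:]
    # Without a header row the columns are simply numbered.
    names = [str(i + 1) for i in range(len(rows[0]))]
    return names, rows


sample = 'SMILES;Name\nCCO;ethanol\n"CC(=O)O";"acetic acid"\n'
names, data = preview_csv(sample, separator=";")
print(names)
print(data)
```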
SQL Import Plugin The SQL import plugin allows you to integrate data from SQL-based databases. The
configuration panel shown in Fig. 5.6 provides various configuration options, which are explained in the following.
DB Type This field allows you to select the type of the SQL database you want to connect to. Currently
only MySQL/MariaDB has been tested, but every JDBC-compatible database should be usable.
DB Host In “DB Host” you can specify the database host, i.e. the server on which the source database
is running. If you need to connect to a specific port, use the form HOSTNAME:PORT.
DB schema In the “DB schema” field you can enter the database (schema) name you want to
use.
DB User / DB Password In the “DB User” and “DB Password” fields you can fill in your database
connection credentials. Only read access is needed.
Get tables from database Once you have entered the connection data you can click the “Get tables
from database” button. The plugin will connect to the database and put all table names found in the
schema in the “DB Table” list. A connection to the ChEBI database is shown in Fig. 5.7.
DB Table The “DB Table” list contains the tables found after clicking on “Get tables from database”.
When selecting a table a simple SELECT statement will be generated.
SQL Statement In the “SQL Statement” field you can type in an SQL SELECT statement that will
be run on the selected database. The resulting rows from this statement will be used as source for the
import.
Figure 5.7.: SQL Import Plugin after “Get tables from database”
Execute SQL Query By clicking on “Execute SQL Query” your SQL Statement will be executed
and the “SMILES” and “MOL” lists will be filled with the resulting column names. This is shown in
Fig. 5.8.
SMILES / MOL In the “SMILES” and “MOL” lists you can select the column names which contain
structure information in SMILES and/or MOL format.
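The sequence “Get tables from database”, table selection and “Execute SQL Query” can be pictured with the following Python sketch. It uses the sqlite3 module and an in-memory database purely as a stand-in for a MySQL/MariaDB source (the plugin itself connects via JDBC); the table compounds, its columns and all function names are invented for the example.

```python
import sqlite3
from typing import List


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Counterpart of 'Get tables from database' (SQLite-specific query)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def default_select(table: str) -> str:
    """A simple SELECT statement as generated when a table is picked."""
    return f'SELECT * FROM "{table}"'


def result_columns(conn: sqlite3.Connection, statement: str) -> List[str]:
    """Counterpart of 'Execute SQL Query': column names of the result."""
    cursor = conn.execute(statement)
    return [column[0] for column in cursor.description]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id INTEGER, smiles TEXT, name TEXT)")
conn.execute("INSERT INTO compounds VALUES (1, 'CCO', 'ethanol')")

tables = list_tables(conn)
statement = default_select(tables[0])
columns = result_columns(conn, statement)
print(tables, statement, columns)
```

In this example the column smiles would be the one to choose in the “SMILES” list.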
SDF Import Plugin The SDF Import plugin is very easy to use. It just has a field “SDF file name”
where you enter the path to the SDF file which you want to import.
Once you have created at least one import job, you can start the import process by clicking on “Start
Import”. A new dialog named “Map Imported Properties” will appear. Fig. 5.9 shows an
example of this dialog with two import jobs. The same GUI elements are provided for every import job. At
the top there is a line with the name of the import job that follows.
Structure name property In the “Structure name property” list you select one of the properties for
the name field of the structures. The name field is used in several places in the program to present
a name together with a structure. It is not used internally, however, so there are no special requirements
for this property. If the name property is missing for some structure, the SMILES string will be used
instead.
Name/Molecular structure merge strategy In the “Name merge strategy” and “Molecular structure
merge strategy” you can select the merge strategies for the molecule name or 3D/2D structure according
to Par. 5.2.1: Importing Data.
Map all Unmapped Properties With the “Map all Unmapped Properties” button all properties for
which you did not give mapping information will be mapped automatically. The property name will be
the name obtained from the import plugin, and the type (numeric or text) is likewise set according to the
plugin information. If multiple sources have properties with the same name, these properties will be
mapped to the same internal property.
The mapping table The table has four columns. The first column (“Imported Property”)
shows the property name obtained from the plugin. In the next column (“Mapped to”) you can select an
existing internal property for this value. Alternatively, you can choose to create a “new internal property”,
which will open the “Create New Internal Property” dialog (see Sec. 5.2.1.5: Create New Internal Property
dialog), or you can choose “do not map”, which will clear the table cell. In the “Transform Function”
column you can enter a transformation function for numerical properties, for example to convert
logarithmic to linear values. Symbols you can use are numbers, +, -, *, /, ^, (, ), log, log10, exp, ceil and
floor. For the input value use the variable x. So to convert the aforementioned logarithmic
values you would enter exp(x). In the last column (“Merge Strategy”) you can select the merge strategy
for this property.
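Which transformation undoes a logarithm depends on its base. The Python sketch below checks the two common cases with the math module. It assumes that log in the transform syntax denotes the natural logarithm, which the manual does not state explicitly; for base-10 data the corresponding entry in the “Transform Function” column would be 10^x (written 10 ** x in Python).

```python
import math


def undo_natural_log(x: float) -> float:
    """Equivalent of entering exp(x) as transform function."""
    return math.exp(x)


def undo_log10(x: float) -> float:
    """Equivalent of entering 10^x as transform function."""
    return 10 ** x


linear_value = 250.0
print(undo_natural_log(math.log(linear_value)))
print(undo_log10(math.log10(linear_value)))
```

Both calls recover the original linear value up to floating-point rounding.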
Preview Selected Property With “Preview Selected Property” you can open a list where you can see
the first 100 occurrences of the property, which is currently selected in the table.
In the “Create New Internal Property” dialog (see Fig. 5.10) you can create new internal properties.
First you give the property a name using the “Property name” field. In the “Property type” list you
can select one of the property types (Number, Text, or one of the fingerprint types that can be used for
clustering in the dendrogram view (see Sec. 7.3: Dendrogram)). In “Property description” you can enter a detailed
description for this property.
When you start the import in the “Map Imported Properties” dialog, the import runs and the “Import
into database” dialog will show up. At the top you see the current process; in the list underneath you
can see some information during the import process. When the import is finished, click on “Close”, which
concludes the creation of a new dataset.
It is also possible to add new properties to an existing dataset by using the “Add Properties” button
in the “Dataset and Tree Management” dialog. The process is very similar to the normal import process.
However, for technical reasons you cannot add new molecules to the dataset. Should your source
contain molecules which are not contained in the current dataset, these molecules will simply be ignored
during import.
Furthermore, the “Map Imported Properties” dialog contains a “Merge Properties” section with two fields
named “Source Property” and “Internal Property”, where you can choose to merge the datasets either by
molecule structure or by an arbitrary property.
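The merge behaviour described above can be pictured with a small Python sketch. It only illustrates the idea of matching records through a pair of key properties and ignoring unknown molecules; the function name add_properties, the dictionary layout and the choice to overwrite existing values are assumptions, not Scaffold Hunter's implementation (the real behaviour on conflicts depends on the selected merge strategy).

```python
def add_properties(dataset, source_records, source_key, internal_key):
    """Attach properties from source_records to matching molecules in dataset.

    dataset:        list of dicts, one per molecule already in the dataset
    source_records: list of dicts read from the new source
    Returns the number of source records that were ignored.
    """
    # Index the existing molecules by the internal merge property.
    index = {molecule[internal_key]: molecule for molecule in dataset}
    ignored = 0
    for record in source_records:
        molecule = index.get(record.get(source_key))
        if molecule is None:
            ignored += 1  # molecule is not part of the dataset: skip it
            continue
        for name, value in record.items():
            if name != source_key:
                molecule[name] = value  # assumption: new value overwrites
    return ignored
```

Merging by molecule structure corresponds to using a structure representation, for example a canonical SMILES string, as the key on both sides.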
Scaffold Hunter does not support adding molecules to an existing dataset. Nevertheless, it is possible to
work around this limitation by reimporting the existing dataset together with the new molecules. This
requires the following steps: First, create an unfiltered session for the dataset (see Sec. 5.1: Session
Management and Sec. 5.3: Filtering). Second, export the root subset of the session in SDF format (CSV
format may not cover all molecular information!) together with all attributes as described in Chap. 8:
Export. Third, use the exported SDF file as a dataset source: open the “New Dataset” dialog and create
two or more import jobs (see Sec. 5.2.1.2: Creating Import Jobs). The first job should contain the
existing dataset. The other job(s) contain the source(s) of the molecules to add. Fourth, regenerate the
trees (see Sec. 5.2.3: Scaffold Tree Generation) and recalculate the scaffold properties (see Sec. 5.2.2:
Calculation of Properties) if necessary.
A plugin system was integrated into Scaffold Hunter to support calculation of properties for the molecules
and scaffolds of a dataset. Scaffold Hunter provides a basic set of built-in plugins, and the plugin system
allows users to write calculation plugins suited to their particular needs. See Chap. B: How to Write
Plugins for a detailed description of how to write a plugin.
In the Create Calc Job Dialog shown in Fig. 5.12 you can create a batch of calc jobs that are
executed one after another. Each calc job is an instance of a calc plugin, which is configured based
on your needs. Each plugin instance obtains a list of all molecules with their structural data and their
existing properties, and can use this data to calculate new properties for all (or some) molecules. To
create a batch of calc jobs, perform the following steps:
1. Select a calc plugin In the top-left corner of the dialog you will find the list of calc plugins. Select
one by clicking on it and proceed with the next step.
2. Configure the calc plugin Once you have selected a calc plugin, the configuration panel on the right hand
side of the dialog shows the plugin description and, depending on the plugin, a set of options to
adjust the plugin. Choose the options you would like to use, and proceed with the next step.
3. Add the configured plugin to the list of calc jobs After you have finished configuring the plugin,
you can make the configured plugin instance a calc job. To do this, click on “Create New Calc Job” in
the bottom-right corner of the dialog. If you want to give the calc job a name, enter it in the text field
labeled “Calc Job Name” first. Otherwise leave the field blank. The calc job then appears under the chosen
name (or a default name) in the list of calc jobs in the bottom-left corner of the dialog. Return to
step 1 if you want to add another calc job.
4. (optional) Edit the list of calc jobs The list of calc jobs defines the order in which the jobs are exe-
cuted. To change the order, select a calc job and move it up or down using the arrow buttons. To delete
a calc job, select it and delete it by clicking on the “X” button. If you want to change the configuration
of a calc job, select it. The configuration panel on the right hand side of the dialog then lets you modify
the job.
If you are ready to start the calculation, click “Start Calc”. If you want to return to the previous dialog
without any changes on your dataset, click “Cancel”. After you have started the calculation, a progress
window as shown in Fig. 5.13 will appear and inform you about the calculation status and possible issues
during calculation.
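The batch model described above (an ordered list of configured plugin instances, each of which sees the properties produced by earlier jobs) can be sketched in a few lines of Python. All names here (run_calc_jobs, smiles_length, length_class, the dictionary layout) are hypothetical and only illustrate why the order of the job list matters; the actual plugin interface is described in Chap. B.

```python
def run_calc_jobs(molecules, jobs):
    """Run calc jobs one after another and return the names of the jobs run.

    molecules: list of dicts with a "smiles" string and a "properties" dict
    jobs:      list of (job_name, plugin_function, options) tuples
    """
    executed = []
    for job_name, plugin, options in jobs:
        for molecule in molecules:
            result = plugin(molecule, **options)
            if result is not None:  # a plugin may skip some molecules
                molecule["properties"].update(result)
        executed.append(job_name)
    return executed


def smiles_length(molecule, property_name="smiles_length"):
    # Toy plugin: stores the length of the SMILES string as a property.
    return {property_name: len(molecule["smiles"])}


def length_class(molecule, threshold=5):
    # Toy plugin that depends on a property calculated by an earlier job.
    length = molecule["properties"].get("smiles_length")
    if length is None:
        return None
    return {"length_class": "long" if length > threshold else "short"}
```

If the two toy jobs were listed in the opposite order, length_class would find no smiles_length property and would skip every molecule, which is the situation the arrow buttons in the job list let you avoid.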
In addition to the specific options provided by each plugin, several plugins provide transform
options. Transform options are used to transform the structure of all molecules before the plugin
starts the calculation. The following transform options are available:
Use largest fragment Sometimes multiple molecule fragments are stored (unconnected) in one single
structure. If, for example, the plugin calculates a fingerprint which takes the molecular structure into
account, you may want just the largest fragment of the compound to be used for the calculation.
Use deglycosylated molecules Deglycosylation (see Sec. 5.2.3.2: Deglycosylate Molecules First for an
explanation) may affect the accuracy of a calculated fingerprint. In this case you may want to
deglycosylate before the calculation.
Recalculate 2D-coordinates Some calc plugins (e.g. fingerprints) may depend on the 2D-coordinates
of the molecules. If you are not sure whether all imported molecules have appropriate 2D-coordinates,
you probably want to recalculate them for all molecules before calculating the fingerprint.
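As an illustration of the “Use largest fragment” option, the sketch below finds the largest connected component of a molecule given as an atom count and a bond list. The representation, the function name largest_fragment and the choice to measure size by atom count are assumptions for this example; they are not taken from Scaffold Hunter.

```python
from collections import deque


def largest_fragment(atom_count, bonds):
    """Return the sorted atom indices of the largest connected fragment.

    atom_count: number of atoms, indexed 0 .. atom_count - 1
    bonds:      iterable of (atom_a, atom_b) index pairs
    """
    neighbours = {atom: [] for atom in range(atom_count)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    seen = set()
    best = []
    for start in range(atom_count):
        if start in seen:
            continue
        # Breadth-first search collects one unconnected fragment.
        fragment = []
        queue = deque([start])
        seen.add(start)
        while queue:
            atom = queue.popleft()
            fragment.append(atom)
            for other in neighbours[atom]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        if len(fragment) > len(best):  # on ties the first fragment is kept
            best = fragment
    return sorted(best)
```

For a salt stored as one structure, for example an organic molecule together with a single counter-ion atom, only the atoms of the organic molecule would be returned.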
Currently there are just three calc plugins bundled with Scaffold Hunter: one plugin to calculate
additional SMILES strings, and two plugins to calculate fingerprints which can be used for clustering of
molecules. Each plugin supplies its own description, so just select a plugin from the list of calc plugins as
shown in Fig. 5.12 and consult the plugin description for an explanation of what the plugin does.
Scaffold Hunter reduces each molecule of a dataset to its scaffold and arranges the resulting scaffolds
hierarchically in a tree structure called a Scaffold Tree. Scaffold Trees were first described in [5] and allow
navigation through the molecules in a meaningful manner. In this section you will learn how to generate
a scaffold tree.
Figure 5.14: Dialog (a) allows you to configure tree generation; (b) shows the dialog indicating the
progress during tree generation
Fig. 5.14a shows the Scaffold Tree Generation Dialog, where you have to enter a tree title, can optionally
enter a description of the tree, and have to choose some generation options. See Sec. 5.2.3.1: Using
Custom Rules and Sec. 5.2.3.2: Deglycosylate Molecules First to learn more about the available options.
Clicking “Cancel” returns you to the dataset management dialog (see Sec. 5.2: Dataset Management). Clicking
“Generate Tree” makes Scaffold Hunter generate and store the tree with the given options, while showing
the progress in the window shown in Fig. 5.14b.
The ring pruning rules define the order in which rings are pruned from the scaffold during the
generation of the scaffold tree. In each step of the tree generation, starting from a child scaffold, all
possible parent scaffolds are generated that could be obtained by the removal of one peripheral ring. The
hierarchical set of rules is used to select one pruning among the candidates, i.e. one ring to be pruned.
Each rule may reduce the number of candidate prunings until only a single solution remains.
In the default mode the rules are applied exactly as described in the Scaffold Tree publication [5]. To
provide more flexibility with ring pruning, the custom rules option allows you to define a customized subset
of rules in a desired order, selected from a wider range of implemented rules. For this purpose a generic
framework for rules was defined. Each rule consists of two parts:
1. a numerical descriptor that describes either each of the potentially pruned rings or each of the
different remaining parent scaffolds that could be generated from the child scaffold, and
2. a selection criterion that defines whether the candidate prunings with the minimum or the maximum
descriptor value should be retained (i.e. every pruning candidate that does not have the minimum / maximum
value is removed from the set of remaining pruning operations) and passed on to the next rule until
only one pruning remains.
If multiple solutions have the minimum / maximum rule descriptor value, all of these are retained. If all
pruning candidates of a scaffold result in the same descriptor value, all pruning candidates are passed
on to the next rule. As in the original ruleset, if the custom rules fail to identify
a single pruning, a tie-break rule is invoked that selects the first scaffold after lexicographically sorting
the parent scaffolds according to their canonical SMILES strings.
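The rule framework described above amounts to a cascade of minimum / maximum filters followed by a tie-break. The Python sketch below shows that cascade on precomputed descriptor values. The function name select_pruning and the data layout are hypothetical, and Python's default string ordering merely stands in for the lexicographic SMILES comparison; descriptor calculation itself is not shown.

```python
def select_pruning(candidates, rules):
    """Pick one pruning candidate by applying the rules in order.

    candidates: list of dicts holding descriptor values plus a "smiles" key
                with the canonical SMILES of the resulting parent scaffold
    rules:      list of (descriptor_name, "min" or "max") pairs
    """
    remaining = list(candidates)
    for descriptor, criterion in rules:
        if len(remaining) == 1:
            break  # a single pruning is left, later rules are not needed
        pick = min if criterion == "min" else max
        target = pick(candidate[descriptor] for candidate in remaining)
        # Ties are retained; if all values are equal, nothing is removed.
        remaining = [c for c in remaining if c[descriptor] == target]
    # Tie-break: first parent scaffold in lexicographic SMILES order.
    return min(remaining, key=lambda candidate: candidate["smiles"])
```

Because each rule can only keep or shrink the candidate set, placing a rule earlier in the list gives it more influence on the final tree, which is why the order in the used rule list matters.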
To choose a set of rules for tree generation, select a ruleset from the drop-down box in the Scaffold Tree
Generation Dialog shown in Fig. 5.14a. If no appropriate ruleset exists, you first have to click on “Edit
Rulesets” to open the Ruleset Management Dialog and create a custom ruleset.
On the left side of Fig. 5.15 you see a list of currently available rulesets. If no ruleset is available you
can create a new one by clicking “New Ruleset”.
If you select a ruleset from the ruleset list, then you can edit the ruleset with the controls located on
the right side of the dialog. You have to enter a name for the ruleset and you can choose rules from the
existing ruleset list on the left side of the dialog to be used in your custom ruleset. To use a rule, select
it and move it to the used rule list by clicking on the right arrow button. You can even select and move
multiple rules by using the Shift or Ctrl modifier keys during selection.
Each row in the used rule list represents one rule, with the descriptor name in the first column, and the
selection criterion in the second column. To change the selection criterion of a rule, select ”maximum
The order of the rules in the used rules list determines the order of execution of the rules. You can
change the execution order of a rule by selecting it first and then clicking on the arrow buttons to move
it up or down.
When you have finished editing your custom ruleset, click the “Save” button to save the ruleset.
An adaptation of the default rules within this framework would look as follows (for a detailed
explanation of the rule names see below):
RRPsize11p "minimum value"
SCPnoLinkerBonds "minimum value"
SCPabsDelta "maximum value"
SCPdelta "maximum value"
RRPsize6 "maximum value"
RRPsize5 "maximum value"
RRPsize3 "maximum value"
RRPnoHetAt "minimum value"
RRPnoNAt "minimum value"
RRPnoOAt "minimum value"
RRPnoSAt "minimum value"
RRPringSize "minimum value"
SCPnoAroRings "minimum value"
RRPhetAtLinked "maximum value"
Detailed Description of the Properties The names on the left are the rules that can be used in a
ruleset.
Rule              Description
SCPnoLinkerBonds  number of acyclic linker bonds
SCPdelta          delta value indicating nonlinear ring fusions, spiro systems, bridged systems as defined in [5]
SCPabsDelta       absolute delta value indicating nonlinear ring fusions, spiro systems, bridged systems as defined in [5]
SCPnoAroRings     number of aromatic rings
SCPnoHetAt        number of heteroatoms
SCPnoNAt          number of nitrogen atoms
SCPnoOAt          number of oxygen atoms
SCPnoSAt          number of sulfur atoms
RAPdelta          delta value
RAPabsDelta       absolute delta value
RAPnoRings        number of rings
RAPnoAroRings     number of aromatic rings
RAPnoHetAt        number of heteroatoms
RAPnoNAt          number of nitrogen atoms
This check box activates the deglycosylation of molecules described in [4]. The procedure iteratively
removes terminal pentose and hexose sugars based on substructure matching before the scaffold tree is
generated. It is especially useful when processing natural product structures, e.g. from the Dictionary
of Natural Products (DNP), that contain molecules with multiple glycosylation patterns.
5.3. Filtering
The Filter Dialog shown in Fig. 5.16 is used to filter the molecules of a dataset, so that you can work with
molecules relevant to your problem. On the left hand side of the dialog a list shows the previously used
filtersets. A filterset is a set of filters. One filter defines a constraint for one property of the molecules
or scaffolds of the dataset. A combination of filters can be used to filter the dataset by more than one
property. In the lower left corner of the window there are buttons to add a new filterset, or to delete an
existing one. After selecting one of these filtersets, or after creating a new one, the right hand side of the
window shows the details of the selected filterset. Here the title can be changed and filters can be added
or modified. To add a new filter, simply select a property from the “Choose property” box. For this
property, the user can then select a constraint in the appearing drop-down box. For numerical properties,
this box holds comparison operators like “greater” (>) or “equal” (=). For text properties, the user can
select comparison operators like “contains” or “ends with”. Both property types can also be filtered by
whether or not they are defined. With the “minus” button at the end of each row, the corresponding
filter can be removed. With the “conjunctive / disjunctive” box you can define the operator used to
combine the filters. With the conjunctive method, all filters have to match to allow a molecule to be
added to the resulting dataset. With the disjunctive method, only one filter needs to match.
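The difference between the conjunctive and the disjunctive method can be sketched as follows. This is a minimal Python illustration, not the application's filter engine: the names apply_filterset and filter_matches, the tuple layout of a filter, and the rule that an undefined value never satisfies a comparison are assumptions.

```python
import operator

# A few comparison operators named in the text; the list is not complete.
COMPARISONS = {
    "greater": operator.gt,
    "equal": operator.eq,
    "contains": lambda value, text: text in value,
    "ends with": lambda value, text: value.endswith(text),
}


def filter_matches(molecule, property_name, comparison, reference=None):
    """Check one filter, i.e. one constraint on one property."""
    value = molecule.get(property_name)
    if comparison == "defined":
        return value is not None
    if comparison == "not defined":
        return value is None
    # Assumption: an undefined value never satisfies a comparison.
    return value is not None and COMPARISONS[comparison](value, reference)


def apply_filterset(molecules, filters, conjunctive=True):
    """Return the molecules that pass the filterset."""
    combine = all if conjunctive else any
    return [m for m in molecules
            if combine(filter_matches(m, *f) for f in filters)]
```

With the two filters ("mw", "greater", 190) and ("name", "ends with", "rin"), the conjunctive method keeps only molecules that satisfy both, while the disjunctive method keeps every molecule that satisfies at least one.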
The “Save” button simply saves the changes made to the selected filterset. To use a filterset, select it
and click the “Ok” button. If the filterset was not saved before clicking “Ok” and you continue without
saving, the current filterset is used for filtering, but the changes are discarded afterwards.
6. Main Window
Scaffold Hunter includes several views that allow the user to visualize and navigate through the imported
chemical data in various ways. Currently the following types of views are supported:
Scaffold tree view The scaffold tree view shows chemical structures arranged in a scaffold tree. This
is the only type of visualization that was supported by Scaffold Hunter version 1.x, and is still a pivotal
part of the application.
Dendrogram view The dendrogram view shows the result of a hierarchical clustering of chemical struc-
tures, supporting various linkage methods and distance measures.
Plot view The plot view shows molecules in a two- or three-dimensional scatter plot based on the
molecules’ properties.
Table view The table view shows molecules in tabular form, including their structure, description and
properties.
TreeMap view The treemap view shows chemical structures arranged in a tree map. Different proper-
ties can be visualized by mapping them to the size and color of structures within the tree map.
See Sec. 6.3: Managing and Arranging Views for an explanation of how to open and manage views
within Scaffold Hunter. The individual types of views are described in greater detail in Chap. 7: Views.
Another concept crucial to Scaffold Hunter is that of subsets, as explained in Sec. 6.2: Subset Manage-
ment.
Menu bar This is the main menu bar at the top of the Scaffold Hunter window, through which most
of Scaffold Hunter’s functionality is accessible. Most of the menu items are available regardless of the
current view. One notable exception is the view-specific sub-menu, which changes to reflect the actions
available in each given view type.
Tool bar The tool bar is located below the menu bar. Most of the tool bar items are view-specific, and
will change when switching to a different type of view.
View tabs This tab bar is the central part of the Scaffold Hunter window, and contains tabs for each
view. Managing views is described in Sec. 6.3: Managing and Arranging Views.
Side bar The side bar is shown on the left hand side of the main window, and can optionally be hidden
using the “Show Side Bar” button in the “Window” menu or in the tool bar. It consists of several panels
that can be folded or expanded independently by clicking the corresponding caption bar. Each panel
displays additional information about the data that is visible in the currently active view.
Subset bar The subset bar on the right hand side of the window shows a tree of all subsets in the
current session. Like the side bar, this part of the user interface can be hidden, using the “Show Subset
Bar” button in the “Window” menu or in the tool bar. In addition to the subset tree, the subset bar
also contains buttons for creating new subsets from the selection. Subset management is explained in
detail in Sec. 6.2: Subset Management.
A central part of the Scaffold Hunter user interface is dedicated to the concept of subsets. Each subset
identifies a set of molecules, all of which are part of the main dataset (hence a subset thereof). Subsets
are arranged in a tree structure, where each subset can have any number of child subsets. Every subset
is guaranteed to contain only molecules that are also in its parent subset.
At the top of the subset tree is the root subset, which is automatically created when starting a new
Scaffold Hunter session. The root subset may be identical to the dataset that has been imported, but it
can also be smaller in case a filter was used when creating the session.
The subset tree is the main part of the subset bar at the right hand side of the Scaffold Hunter main
window. In a newly created session, the tree contains only the root subset. All subsets, including the root
subset, can be assigned a custom name and comment. The current subset, i.e. the subset that is shown
in the currently active view, is always shown in bold. The subset tree itself only contains the name of
each subset. Additional information about subsets can be obtained by hovering the mouse cursor above
the subset name, which will pop up a tooltip window.
Most functionality regarding subsets is available using the context menu of the subset tree. For conve-
nience the main menu also contains the “Subset” menu, which always refers to the current subset.
The most straightforward way to create a new subset is by first selecting all molecules to be included
in the subset, and then making a subset from the selection. Since there is only one, global selection
in Scaffold Hunter, which may include molecules that are not part of the current view, there are two
slightly different ways to decide which molecules are to be included in the new subset.
Subset from the total selection In order to create a subset including all selected molecules, regardless
of the current view, choose “Selection → Make Subset” in the main menu, or use the first “Make Subset”
button at the bottom of the subset bar. Since the new subset does not necessarily have any relation to
any of the existing subsets, it will be added to the subset tree as a child of the root subset.
Subset from the selection in the current view To create a subset including only the selected molecules
that are also in the current view, choose “Selection → Make Subset from Current View”, or use the second
“Make Subset” button in the subset bar. The new subset is guaranteed to include no molecules that are
not part of the current view’s subset, so that subset will be used as its parent.
The subset of the current view can be deleted by selecting “Subset → Delete” from the main menu. In
the subset tree, subsets are deleted by selecting them and choosing “Delete” from the context menu.
Deleting a subset always closes any open views currently associated with that subset.
If the subset being deleted has any children, those will not automatically be deleted as well. Instead,
they will be re-attached to the parent of the subset being deleted. It is not possible to delete the root
subset.
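The deletion behaviour can be pictured with a minimal tree structure. The Subset class and the delete_subset function below are hypothetical stand-ins, not Scaffold Hunter classes, and the position at which the re-attached children appear in the parent's child list is an assumption.

```python
class Subset:
    """Minimal subset tree node (illustrative only)."""

    def __init__(self, name, molecules, parent=None):
        self.name = name
        self.molecules = set(molecules)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def delete_subset(subset):
    """Remove a subset from the tree and re-attach its children to its parent."""
    if subset.parent is None:
        raise ValueError("the root subset cannot be deleted")
    parent = subset.parent
    position = parent.children.index(subset)
    for child in subset.children:
        child.parent = parent
    # The children take the place of the deleted subset (assumption).
    parent.children[position:position + 1] = subset.children
    subset.children = []
    subset.parent = None
```

Re-attaching is always valid with respect to the subset invariant: each child contains only molecules of the deleted subset, which in turn contained only molecules of its own parent.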
The basic set operations of union, intersection and difference can be used to create new subsets from
existing subsets. Since these operations require a selection of two or more subsets, they are not available
in the “Subset” menu, but only in the context menu of the subset tree.
As in many other user interfaces, the selection of multiple subsets in the tree is possible using Ctrl
+ Left click to add or remove a single subset to/from the selection. To select multiple adjacent
subsets, Shift + Left click can be used. Once all the desired subsets have been selected, open the
context menu by right-clicking on one of them, and choose “Make Union”, “Make Intersection” or “Make
Difference”. You will then be asked to enter a name for the new subset.
Since the subset tree requires each subset to be an actual subset of its parent, the set operations differ
slightly in where the newly created subset will be added to the tree. For the difference operation, the
order of subsets is also relevant.
Union Generally speaking, none of the original subsets is a superset of the new subset. As a result the
new subset cannot be added below any of them in the tree. In order to ensure that the new subset’s
position is both deterministic and as meaningful as possible, the parent will always be the lowest common
ancestor of all the selected subsets.
Intersection All selected subsets would be a valid parent for the new subset. For deterministic results,
the subset that was clicked on when opening the context menu will be the new subset’s parent.
Difference Mathematically, the difference operation is only defined for two sets. The difference opera-
tion used here deviates from this definition by extending it to mean “one subset, minus the union of an
arbitrary number of other subsets”. The base subset, from which the others will be subtracted, is the
one that was clicked on when opening the context menu. This is also the subset that will then become
the new subset’s parent.
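The placement rules for the three set operations can be summarised in a short Python sketch. The Subset class and the function names are hypothetical, and the lowest common ancestor is computed here so that a selected subset counts as its own ancestor, which is an assumption about a corner case the text does not cover.

```python
class Subset:
    """Minimal subset tree node (illustrative only)."""

    def __init__(self, name, molecules, parent=None):
        self.name, self.molecules, self.parent = name, set(molecules), parent

    def ancestors(self):
        """Return this subset followed by its ancestors up to the root."""
        chain, node = [], self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain


def lowest_common_ancestor(subsets):
    common = subsets[0].ancestors()
    for subset in subsets[1:]:
        others = {id(node) for node in subset.ancestors()}
        common = [node for node in common if id(node) in others]
    return common[0]  # the chain is ordered from the subset up to the root


def make_union(name, selected):
    molecules = set().union(*(s.molecules for s in selected))
    return Subset(name, molecules, lowest_common_ancestor(selected))


def make_intersection(name, clicked, selected):
    molecules = set(clicked.molecules)
    for subset in selected:
        molecules &= subset.molecules
    return Subset(name, molecules, clicked)  # parent: the clicked subset


def make_difference(name, clicked, selected):
    others = [s for s in selected if s is not clicked]
    molecules = clicked.molecules - set().union(*(s.molecules for s in others))
    return Subset(name, molecules, clicked)  # base and parent: clicked subset
```

In all three cases the result contains only molecules of its new parent, so the invariant of the subset tree is preserved.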
There are multiple ways to automatically create smaller subsets that are more convenient than manual
selection when working with large datasets.
Filtering The same filtering that is available while creating a session can also be used on individual
subsets. Select “Subset → Filter” to start filtering the subset of the current view, or choose “Filter” in
the context menu of any subset in the subset tree. The Filter Dialog itself works just like the one that is
used during session creation, as described in Sec. 5.3: Filtering.
Once the filter settings have been applied, you will be asked to enter a name for the new subset, and the
subset will be added to the subset tree as a child of the original subset.
Creating Random Subsets You may create a random subset of a dataset by selecting “Random Subset”,
either in the “Subset” menu or the context menu. A dialog allows you to select the number of molecules
that should be contained in the created subset.
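Conceptually, a random subset is a sample without replacement. A minimal Python sketch, assuming molecules are identified by sortable ids and that a requested size larger than the subset is clamped (both assumptions, as is the function name):

```python
import random


def random_subset(molecules, size, seed=None):
    """Draw `size` distinct molecules at random from a subset."""
    rng = random.Random(seed)
    pool = sorted(molecules)  # fixed order so a seed gives repeatable results
    return set(rng.sample(pool, min(size, len(pool))))
```

The optional seed is only there to make the sketch reproducible; the dialog described above asks only for the number of molecules.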
Splitting by Scaffold Subtrees In order to create subsets based on the scaffold tree structure, you
can select “Split by Scaffold Subtrees”. A subset for each scaffold with a specified number of rings is
created. A subset includes all molecules which belong to the scaffold or one of its descendants. Since
multiple subsets are created by this function, you can specify a prefix for the newly created subsets.
Scaffold Hunter automatically numbers the subsets consecutively and appends the number to the subset
name.
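The splitting described above can be sketched as a traversal of the scaffold tree. The data layout, the function name split_by_scaffold_subtrees, the numbering from 1 and the exact naming scheme (prefix followed directly by the number) are assumptions for illustration.

```python
def split_by_scaffold_subtrees(scaffolds, ring_count, prefix):
    """Create one named subset per scaffold that has `ring_count` rings.

    scaffolds: dict mapping a scaffold id to a dict with the keys
               "rings" (int), "molecules" (set) and "children" (list of ids)
    Returns a dict mapping subset names to sets of molecules.
    """
    def collect(scaffold_id):
        # Molecules of the scaffold itself plus those of all descendants.
        node = scaffolds[scaffold_id]
        molecules = set(node["molecules"])
        for child in node["children"]:
            molecules |= collect(child)
        return molecules

    matching = [sid for sid in sorted(scaffolds)
                if scaffolds[sid]["rings"] == ring_count]
    return {f"{prefix}{number}": collect(sid)
            for number, sid in enumerate(matching, start=1)}
```

Because every molecule below a scaffold with the chosen ring count ends up in that scaffold's subset, the resulting subsets follow the branches of the scaffold tree at that level.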
There are several ways of creating a new view. The “Window → Add View” submenu can be used to
create a new view of a given type, showing the root subset. A more flexible way of opening new views
is using either the “Subset” menu or the subset tree’s context menu:
Show in Current View (only in subset tree context menu)
Replaces the subset shown in the currently active view, while otherwise retaining the view’s settings.
Show in New View
Creates a new tab in the current window, showing the selected subset in the chosen type of view.
Show in New Window
Creates a new window and adds a new tab to it, showing the selected subset in the chosen type of
view.
Note that the “Subset” menu always refers to the current view’s subset, while the context menu refers
to the subset being clicked on.
Usually the fastest way to close a view is by using the ’X’ button in the view’s tab, but there are also
menu items (“Window → Close View”) and a keyboard shortcut (“Ctrl+W”) available.
It is possible to simultaneously show two views in the same window, either side by side or one above
the other. To split the window, use “Window → Split Horizontally” or “Window → Split Vertically”.
This will create a new, initially empty tab bar that can then be filled by moving views to it (see Sec.
6.3.4: Moving Views). The same menu items can also be used to change the orientation of an existing
split.
To get back to a window with no split, use “Window → Single Tab Bar”. This will join all views from
both sides of the split in a single tab bar.
Sometimes it can be convenient to open views in separate windows, for example to place them on different
monitors, or to avoid a tab bar getting too crowded. Scaffold Hunter supports this by allowing multiple
main windows within the same instance of the application. Windows can be opened using “Window →
Open New Window”, and closed using “Window → Close Window”. Note that while each window can
have its own views and layout, all windows still work on the same dataset and subset tree. Thus, opening
multiple main windows is not the same as opening multiple instances of Scaffold Hunter.
Within a single tab bar, views can easily be moved by simply dragging their tab to a new position.
It is currently not possible to move views to a different tab bar (in case of split views) or to a different
window via drag & drop. To move views from one window or tab bar to another, use “Window →
Move View to Window” and “Window → Move View to Tab Bar”.
Views are initially named after their type. For example, the tab of a newly created scaffold tree view
will be called “Scaffold Tree”. Views can be renamed to give them a more descriptive label. To do so,
select “Window → Rename View” in the main menu, or “Rename” in the tab’s context menu.
6.4.1. Selection
In order to create subsets containing molecules of interest it is possible to select several molecules in the
views, which are then highlighted in red in all views.
Single molecules, or the molecules belonging to a scaffold, can be selected or deselected by left-clicking a
molecule or a scaffold, in some views by dragging a selection box around them, or by using the scaffold’s
context menu. Another possibility to change the selection is the subset menu. If the menu has been
opened via the main menu, it offers options to add the current subset of the active view to the selection,
to remove it, or to replace the selection. The same applies to the subset which has been used to open
the context menu in the subset bar.
The user can also select all molecules of the dataset, clear the selection, invert the current selection or
confine the current selection to the molecules that are visible in the active view by using the selection
menu in the main menu.
The total number of molecules selected and the number of molecules selected that are visible in the active
view are displayed at the bottom of the subset bar. Both have an associated “Make Subset” button that
can be used to create a new subset from the respective set.
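Since there is one global selection, the selection operations and the two counters reduce to plain set operations. The sketch below uses hypothetical function names and assumes that inverting the selection is done relative to the whole dataset.

```python
def invert_selection(selection, dataset):
    """Select exactly the molecules that were not selected before."""
    return set(dataset) - set(selection)


def confine_to_view(selection, view_molecules):
    """Keep only the selected molecules that are visible in the active view."""
    return set(selection) & set(view_molecules)


def selection_counts(selection, view_molecules):
    """Return (total selected, selected and visible in the active view)."""
    return len(set(selection)), len(confine_to_view(selection, view_molecules))
```

The two numbers returned by selection_counts correspond to the two “Make Subset” buttons: the first refers to the total selection, the second to the part of it that lies in the active view.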
Selection colors Table 6.1 shows the possible colors of molecules and scaffolds, depending on their
selection state:
6.4.2. Flags
Besides the selection, there is another kind of marking that is shared by all views: flags. Flags are
intended as a fast way of marking molecules and scaffolds, for example as “already seen” or “investigate
later”, and are available in a public and a private variant. Public flags are visible to everyone working
on the dataset, while private flags are only visible to the user who created them.
The user can mark molecules and scaffolds with flags by using the context menu or the main menu entry
“Selection”. If the context menu is used, the private and public flags of the molecule or scaffold under
the cursor can be toggled. The selection menu allows the user to add or remove a public or private flag
for all molecules of the current selection.
An interactive tooltip window (see Fig. 6.1) is available for the scaffold tree view and the dendrogram
view when you move your mouse over a molecule or scaffold. It combines the following components in
one window.
• a large image of the structure
• configurable information on the properties
You can find some general options at Session → Preferences → General Configuration in the
menu bar.
• enable/disable tooltip
• the size of the image shown in the tooltip
• whether it shows undefined properties or not
• the delay after which the tooltip is shown
You can find a dialog to configure the properties at Session → Tooltip Configuration in the menu
bar. As you can see in Fig. 6.2 you have a check box in front of every property to enable or disable the
property in the tooltip. As the tooltip can be shown for molecules and scaffolds there are two types of
properties. If the tooltip is displayed over a molecule only the molecule properties will be shown.
The properties for scaffolds are somewhat more complex. You can derive accumulated properties for
scaffolds from the properties of the molecules that belong to the displayed scaffold. To this end, you can
choose an accumulation function for each numerical molecule property. For example, you can display the
average, minimum or maximum of the molecule property when using the tooltip with scaffolds. If you enable
the checkbox Cumulative for scaffold and children, this accumulation function is applied to the
molecules of the scaffold and all molecules that belong to any child scaffold.
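The accumulation can be illustrated with a short Python sketch. The data layout and the function name scaffold_property are hypothetical, and the cumulative mode is interpreted here as including all descendant scaffolds rather than only direct children, which is an assumption.

```python
from statistics import mean

ACCUMULATORS = {"average": mean, "minimum": min, "maximum": max}


def scaffold_property(scaffolds, scaffold_id, prop, function, cumulative=False):
    """Accumulate a numeric molecule property for one scaffold.

    scaffolds: dict mapping a scaffold id to a dict with the keys
               "molecules" (list of property dicts) and "children" (list of ids)
    Returns None if no molecule defines the property.
    """
    def values(sid):
        node = scaffolds[sid]
        found = [m[prop] for m in node["molecules"] if m.get(prop) is not None]
        if cumulative:
            for child in node["children"]:
                found.extend(values(child))
        return found

    collected = values(scaffold_id)
    return ACCUMULATORS[function](collected) if collected else None
```

For a scaffold with molecules of weight 100 and 200 and a child scaffold with one molecule of weight 300, the average is 150 without the cumulative option and 200 with it.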
6.5.3. Comments
You can add, modify and delete comments for a molecule or scaffold directly in the tooltip. You can find
the controls in the bottom left corner of the tooltip (see Fig. 6.1).
Private, public, local and global comments Comments can be private or public and local or global.
This gives four different types of comments (private & local, private & global, public & local,
public & global). A private comment is only accessible by the user who created it, while a public
comment can be seen by any other user working on the same database. A local comment is limited
to the current tree. A global comment is visible for all datasets and all trees on the
current database for molecules or scaffolds with the same structure.
Add, modify or delete a comment To add a new comment, select the type in the drop-down box and
click the plus button. Note that only one comment of each type can exist for each molecule or scaffold.
Therefore some or all types may be unavailable in the drop-down box. After adding a new comment, a
text field with a save button, a delete button and some information about the modification will appear.
You can now modify the (empty or previously existing) comment by editing the text in the text field
and pressing the save button afterwards. The save button is only enabled once you have modified
the text; otherwise it is greyed out. To delete a comment, simply press the delete button. The text
field and its controls will disappear.
6.6. Settings
The Scaffold Hunter settings are divided into two types. There are global preferences and settings
for a specific view. All settings are stored in the database. Therefore your settings travel with your
profile.
Preferences The global preferences are accessible via Session → Preferences in the menu bar. The
dialog (Fig. 6.3) has a General Configuration tab with settings that will apply to all views and the
main window. The options are self-explanatory. If you select an option by clicking on it you get a more
detailed description in the label below.
There are also tabs for types of views. For example you can find general options for the scaffold tree view.
These settings will apply to all open views of one type (e.g. all scaffold tree views). Currently
only the scaffold tree view uses these global settings, but in future other views may also be available
here.
Configure View The preferences for one specific view are accessible via Session → Configure View.
The dialog (Fig. 6.4) has the same layout as the global preferences. It shows one tab for each open view.
If you have two open scaffold tree views you will find two tabs here. The name of the tabs matches
the name of the view tabs in the main window. Renaming the views therefore makes it much easier to
match each preferences tab to its view (see Sec. 6.3.5: Renaming Views).
Some options may also be available directly over the menu bar or tool bar (for further details see Sec.
7: Views).
7. Views
This chapter will give you a deeper understanding of each available view. If you want to know how
to handle, manage and arrange views in general terms you may want to read Sec. 6.3: Managing and
Arranging Views instead.
7.1. Scaffold Tree
The scaffold tree view visualizes a scaffold tree, which was generated as described in Sec. 5.2.3: Scaffold
Tree Generation.
7.1.1. Navigation
A new scaffold tree view will show an overview of the whole graph. You may zoom in to get more detailed
information on a part of the graph. Depending on the scale, scaffolds are drawn either simplified as a
rectangle or by their structural formula. There are several ways to zoom in/out: You may use the mouse
wheel to magnify/demagnify the view around the mouse pointer, or use the buttons in the
tool bar or “Tree” menu. You may also select the area of interest by a rectangle on the GraphMap, see
Sec. 7.1.1.1: GraphMap. The other navigation mechanism is panning: By keeping the left mouse button
pressed on an empty area in the graph pane and moving the mouse the viewpoint is panned. You may
also use the scrollbars for panning.
Usually not all scaffolds are shown when a new scaffold tree view is opened. The subtree rooted at a
scaffold can be shown or hidden by clicking on the +/– symbol at the lower left corner of the
scaffold. In addition, the context menu (Sec. 7.1.2: Context Menu) contains options to open complete
subtrees or to open all children on the next hierarchy level. Inverse operations exist to hide scaffolds you
are not interested in. The command to expand the whole tree or to reset it to the default hierarchy level
can be invoked via the tool bar or “Tree” menu (Sec. 7.1.3: Tree Menu & Tool Bar).
A scaffold will be shown in different colors based on the global selection as described in Sec. 6.4.1:
Selection. A left-click on an unselected or partially selected scaffold will add all molecules associated
with the scaffold to the selection, a left-click on a fully selected scaffold will deselect all associated
molecules.
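The click behavior above amounts to a simple rule on sets of molecules. The sketch below expresses it in Python; the function name and the use of plain sets are assumptions made for illustration only.

```python
def click_scaffold(selection, scaffold_molecules):
    """Return the selection after a left-click on a scaffold.

    Both arguments are sets of molecule identifiers.
    """
    if scaffold_molecules <= selection:
        # Fully selected scaffold: deselect all associated molecules.
        return selection - scaffold_molecules
    # Unselected or partially selected scaffold: select all of them.
    return selection | scaffold_molecules

selection = {"m1"}
scaffold = {"m1", "m2"}
selection = click_scaffold(selection, scaffold)   # partially selected -> select all
print(sorted(selection))                          # ['m1', 'm2']
selection = click_scaffold(selection, scaffold)   # fully selected -> deselect all
print(sorted(selection))                          # []
```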
You may also navigate using keyboard shortcuts, listed in App. A: Keyboard Shortcuts. There is a cursor
set on one scaffold marked by a blue rectangle. The cursor can be moved using the arrow keys. The
camera automatically retains focus on the cursor.
7.1.1.1. GraphMap
The GraphMap is found in the side bar on the left hand side and provides an overview on the whole
graph. The current viewport of the graph pane is marked by a transparent red rectangle. Dragging this
rectangle with the mouse will change the viewport of the graph pane accordingly. Click the left mouse
button at an area that is not currently visible to center the viewport on the selected point. You can
define a new viewport by marking a rectangle on the graph map: Click the left mouse button on an
empty area and keep it pressed while moving the mouse. When releasing the mouse button the view
will be panned and zoomed to completely contain the selected area. Using your mouse wheel on the
GraphMap will magnify or demagnify the graph pane at the specified point.
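Fitting the view to a rectangle marked on the GraphMap comes down to one scale factor and one center point. The Python sketch below shows the arithmetic under the assumption that the rectangle is given in graph coordinates as (x, y, width, height) and that the aspect ratio is preserved; the function name is hypothetical.

```python
def fit_viewport(rect, pane_width, pane_height):
    """Return (scale, center) so that `rect` is completely visible in the pane."""
    x, y, width, height = rect
    # The smaller of the two ratios guarantees that the whole rectangle fits.
    scale = min(pane_width / width, pane_height / height)
    center = (x + width / 2, y + height / 2)
    return scale, center

print(fit_viewport((10, 20, 200, 100), 800, 600))  # (4.0, (110.0, 70.0))
```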
The Magnifying Glass in the side bar below the GraphMap is opened by a mouse click on its title bar.
It shows the area below the mouse pointer magnified. You may move the mouse pointer over the graph
pane and navigate in the graph as usual while the Magnifying Glass is active.
7.1.2. Context Menu
Click the right mouse button on a scaffold to access the context menu. The menu mainly contains options
that are directly related to the scaffold: You can open/close the children of the scaffold, open the whole
subtree rooted at the scaffold or open all children on the next hierarchy level by the option “Expand
next level”. In addition you can choose to expand the whole tree, by clicking on “Expand all Nodes”.
You can hide the hierarchy levels under the scaffold by choosing “Reduce to this level”.
The commands “Select Subtree” and “Deselect Subtree” add or remove all molecules from the selection,
which are associated with scaffolds located in the subtree that is rooted at the current scaffold. The
command “Subset from Subtree” will create a new subset which contains these molecules.
There are two additional entries to set or remove public and private flags, see Sec. 6.4.2: Flags.
Figure 7.2.: Meaning of the icons (from left to right): Zoom In, Zoom Out, Fit Graph, Fit Selection,
Expand all Nodes, Number of Rings to Default Level, Increase/Decrease Radius, Lock Radii
while Zooming, Reset Radii, Scale cursor node up/down, Normalize cursor node, Scale
Selected Scaffolds up/down, Normalize Selected Scaffolds, Normalize all Scaffolds, Configure
Property Mappings, Show Scaffold Molecules, Export Image
7.1.3. Tree Menu & Tool Bar
The “Tree” menu provides access to many different functions related to the scaffold tree view. The tool
bar shown in Fig. 7.2 allows quick access to most of these functions. When you hover the cursor over
an item of the menu or tool bar a tooltip with a description will appear. The menu items are listed
below, each with a short description.
Layout Can be used to choose one of the three available layout algorithms. The graph in the current
tab will be laid out accordingly. See Sec. 7.1.4: Layout for a brief description of the layouts.
Show Scaffold Molecules Toggles the molecule view for each node in the scaffold tree, which displays
the molecules associated with a scaffold.
Configure Property Mappings Allows you to map different imported properties to several visual features of
the nodes in the scaffold tree. See Sec. 7.1.6: Property Mappings for details.
Zoom In/Out Changes the zoom level of the graph pane.
Fit Graph Changes the zoom level and pans the graph pane to show the whole graph.
Fit Selection Changes the zoom level to show all selected and partially selected scaffolds.
Expand all Nodes Expands the whole graph.
Number of Rings to Default Level Resets the graph, such that only the hierarchy levels which are
shown by default are displayed.
Increase/Decrease Radius Allows adjustment of the distance between two adjacent rings in the radial
layout.
Lock Radii While Zooming Disables/enables the automatic adjustment of the distance between two
adjacent rings in the radial layout.
Reset Radii Resets all ring radii to their default values.
Node Scale Resizes the scaffolds of the graph. There are buttons for scaling up/down all currently
selected scaffolds or just the scaffold the cursor is on. In addition “Normalize” buttons are provided
to reset scaffolds to their original size.
Grey out virtual scaffolds If selected, virtual scaffolds are rendered in grey; otherwise they are
rendered just like other scaffolds.
Export Image Allows exporting the current viewport or the whole graph as an image.
7.1.4. Layout
Figure 7.3.: Different layouts of the same graph: (a) Radial Layout, (b) Linear Layout, (c) Balloon
Layout
Scaffold Hunter offers three different layouts, which can be selected by using the “Layout” option in the
“Tree” menu (see Sec. 7.1.3: Tree Menu & Tool Bar). The different layouts are shown in Fig. 7.3. The Radial
Layout (Fig. 7.3a) is the default layout method. With this layout the scaffolds are located according to
their depth on circles with a common center. The Linear Layout (Fig. 7.3b) orders the scaffolds from
left to right. Both layouts emphasize the depth of scaffolds. With the Balloon Layout (Fig. 7.3c) each
scaffold is the center of a circle on which its children are distributed. This layout separates subtrees more
clearly from each other.
For the Radial Layout the distance between the circles may be customized. By default the distance
is automatically adjusted depending on the zoom. The “Lock Radii While Zooming” button disables
this mechanism and allows the user to set the distance using the buttons “Increase Radius”/“Decrease
Radius”.
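To make the idea of the Radial Layout concrete, here is a minimal Python sketch that places every scaffold on a circle whose radius grows with its depth. The way the angle is shared among siblings (proportional to the number of leaves below them), the node dictionaries and the `ring_distance` parameter are assumptions for illustration; Scaffold Hunter's own layout algorithm may distribute nodes differently.

```python
import math

def count_leaves(node):
    """Number of leaves below a node; a leaf counts as one."""
    return sum(count_leaves(c) for c in node["children"]) or 1

def radial_layout(node, ring_distance=100.0, depth=0,
                  start=0.0, end=2 * math.pi, positions=None):
    """Map node names to (x, y); nodes of depth d lie on a circle of radius d * ring_distance."""
    positions = {} if positions is None else positions
    angle = (start + end) / 2
    radius = depth * ring_distance
    positions[node["name"]] = (radius * math.cos(angle), radius * math.sin(angle))
    total = count_leaves(node)
    child_start = start
    for child in node["children"]:
        # Each child receives a share of the angular range of its parent.
        span = (end - start) * count_leaves(child) / total
        radial_layout(child, ring_distance, depth + 1,
                      child_start, child_start + span, positions)
        child_start += span
    return positions

tree = {"name": "root", "children": [
    {"name": "a", "children": [{"name": "a1", "children": []}]},
    {"name": "b", "children": []},
]}
for name, (x, y) in radial_layout(tree).items():
    print(name, round(math.hypot(x, y)))
```

Changing `ring_distance` corresponds to what the “Increase Radius”/“Decrease Radius” buttons adjust: the spacing between adjacent circles.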
7.1.5. Sorting
Initially the ordering of scaffolds in the tree is deterministic but carries no meaning.
Alternatively scaffolds can be sorted according to some property value using the “Sort” side bar panel
shown in Fig. 7.4.
The same panel can be used to define the ordering of the molecules which are shown when the option
“Show Scaffold Molecules” is toggled. To do so select the property by which these molecules should be
sorted in the uppermost box.
The property used for sorting can be chosen using the box below “Sort Scaffolds by”. An accumulation
method to determine how scaffold properties will be derived from the properties of associated molecules
can be selected in the box below. The checkboxes underneath can be used to add a colored background
to the graph, where each segment which has the same value of the selected property is colored in the
same shade of the selected color. A label showing this value can be displayed in each segment.
After clicking on “Sort” the scaffolds on the first layer will be reordered according to their property
values. Sorting by property values is applied only to the scaffolds on the first layer. You can however
create a subset out of a subtree and sort the scaffold tree displaying the subset.
7.1.6. Property Mappings
Properties associated with the scaffolds can be mapped to different visual properties of the Scaffold Tree.
Scaffold properties are either directly associated with a scaffold, such as Number of Oxygen Atoms, or
derived from the properties of the structures which are associated with a scaffold.
The following visual properties are available for mapping:
Node Background Color This changes the background color of the scaffold nodes in the scaffold tree.
Either two colors can be selected and the value range of the property is mapped to the gradient between
those two colors or value intervals can be specified and a color can be assigned to each interval.
Edge Thickness This maps the absolute difference between the values associated with two adjacent
scaffolds onto their connecting edge. The maximum and minimum displayed thickness are predefined
and cannot be adjusted. A gradient mapping maps the greatest difference to the maximum thickness
and the smallest difference to the minimum thickness, intermediate differences are linearly mapped to
intermediate thickness values. Alternatively intervals of differences can be defined, which will then be
mapped to different thickness values.
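The gradient mapping for Edge Thickness is a linear interpolation. The Python sketch below illustrates it; the thickness bounds of 1.0 and 8.0 are placeholders, since the manual only states that the real bounds are predefined, and the function name is hypothetical.

```python
def edge_thickness(difference, min_difference, max_difference,
                   min_thickness=1.0, max_thickness=8.0):
    """Linearly map an absolute value difference to an edge thickness."""
    if max_difference == min_difference:
        return min_thickness  # all edges share the same difference
    t = (difference - min_difference) / (max_difference - min_difference)
    return min_thickness + t * (max_thickness - min_thickness)

print(edge_thickness(0.0, 0.0, 10.0))   # 1.0
print(edge_thickness(5.0, 0.0, 10.0))   # 4.5
print(edge_thickness(10.0, 0.0, 10.0))  # 8.0
```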
Edge Color Similarly to the way values can be mapped to the background color of a node they can also
be mapped to the adjacent edges. On the edge a gradient between the colors will be displayed, which is
determined by the adjacent scaffolds.
Label The value associated with a scaffold can be displayed by a label below the node in the scaffold
tree.
Info Bar This visualizes distributions of a property for the molecules associated with a scaffold. Intervals
can be specified and a color is associated with each interval. The distribution of these intervals can then
be visualized as shown in Fig. 7.5. Textual properties with ten different values or less can also be
mapped to colors to display such a distribution.
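The distribution behind an Info Bar can be thought of as the share of molecules falling into each interval. A minimal Python sketch, assuming the intervals are defined by a sorted list of border values (the function name and this representation are hypothetical):

```python
from bisect import bisect_right
from collections import Counter

def interval_distribution(values, borders):
    """Fraction of values per interval; n borders define n + 1 intervals."""
    counts = Counter(bisect_right(borders, v) for v in values)
    return [counts[i] / len(values) for i in range(len(borders) + 1)]

# Property values of the molecules of one scaffold, split at 3 and 6.
print(interval_distribution([1, 2, 5, 9], [3, 6]))  # [0.5, 0.25, 0.25]
```

Each fraction would then be drawn as a segment of the bar in the color assigned to its interval.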
The dialog to configure these mappings is shown in Fig. 7.6. On the left the visual property can be
selected, on the right settings for the selected visual property can be configured. These settings may vary
depending on the visual property. In general the scaffold property to be mapped can be selected using
the box at the top, in case of derived properties a second box will be displayed to select the accumulation
function. If the box “Cumulative for scaffolds and children” below is selected the accumulation function
will not only be applied to values of the structures associated with the scaffold, but also include the values
of all structures, which belong to scaffolds located in the subtree below the scaffold. The minimum and
maximum values of the selected property are associated with the two selected colors while intermediate
values are represented by color gradation. With the default setting the minimum and maximum is
determined based on all values of the selected property occurring in the whole dataset. This allows you to
compare different subsets using multiple views, since a color is associated with the same value across all
views. The checkbox “Fit interval borders to current subset” allows you to adjust this mapping, such that
the minimum and maximum values associated with the two colors are determined based on the molecules
of the subset only. When this option is selected the whole range of colors will be used to represent the
property values of molecules in the subset. Please note that in views based on different subsets the same
color then may represent different property values. The option has an analogous effect in case of “Edge
Thickness” mapping.
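The effect of “Fit interval borders to current subset” is easiest to see with numbers. The Python sketch below maps a value to a color between two RGB colors, once with the range of the whole dataset and once with the range of the subset. The function name, the RGB tuples and the sample values are assumptions for illustration.

```python
def gradient_color(value, low, high, color_low, color_high):
    """Interpolate linearly between two RGB colors for a value in [low, high]."""
    t = 0.0 if high == low else (value - low) / (high - low)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    return tuple(round(a + t * (b - a)) for a, b in zip(color_low, color_high))

WHITE, RED = (255, 255, 255), (255, 0, 0)
dataset = [0.0, 2.0, 4.0, 10.0]
subset = [2.0, 4.0]

# Default: range taken from the whole dataset, comparable across views.
print(gradient_color(4.0, min(dataset), max(dataset), WHITE, RED))  # (255, 153, 153)
# Fitted: range taken from the subset only, full color range is used.
print(gradient_color(4.0, min(subset), max(subset), WHITE, RED))    # (255, 0, 0)
```

The same value 4.0 is drawn in two different colors, which is why views based on different subsets are no longer directly comparable once the option is enabled.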
Clicking the “Export Image” button in the tool bar or the “Tree” menu will open the export dialog
shown in Fig. 7.7, which allows you to save the current graph as an image. You can either export the
current view on the graph, which will save only the part of the graph which is currently visible in the
graph pane, or the whole graph. There are two file formats available (SVG and PNG) as well as options
to control the image size. The aspect ratio of the exported image is fixed.
Since SVG is a file format for vector graphics the exported image can be scaled without quality loss.
Consequently the specified image size will not have much effect on the quality or file size for SVGs but is
very important for raster formats such as PNG. Even if the current view shows rectangles for scaffolds
the exported image will show their structural formulas, if the respective option is enabled.
Figure 7.6.: The property mapping dialog showing an interval mapping to the Node Background Color.
7.2. Table
An overview of the aggregated molecule information can be seen in the table view. All the molecule
properties, as well as their titles, SVG-images and the flags set by users, are shown in form of a table.
When you open a table view the data is initially sorted by the molecule title, but the sort order can
be changed anytime by clicking on a column header. The table view supports three sort criteria, which
are applied by consecutive clicks on column headers. The column selected last gets the highest sort
priority.
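The three sort criteria can be modeled as a short list of column names in which the most recently clicked column comes first. The Python sketch below is an illustration only: the `SortState` class is hypothetical, it always sorts ascending, and it assumes that clicking an already used column simply moves it to the front.

```python
from collections import deque

class SortState:
    """Remembers up to `max_keys` clicked columns, most recent first."""

    def __init__(self, max_keys=3):
        self.keys = deque(maxlen=max_keys)

    def click(self, column):
        if column in self.keys:
            self.keys.remove(column)
        # When full, appendleft drops the oldest key from the right end.
        self.keys.appendleft(column)

    def sort(self, rows):
        # Sort by the lowest priority first; sorted() is stable, so the
        # last pass (highest priority) decides and earlier passes break ties.
        for column in reversed(self.keys):
            rows = sorted(rows, key=lambda row: row[column])
        return rows

rows = [{"title": "b", "rings": 2}, {"title": "a", "rings": 2}, {"title": "c", "rings": 1}]
state = SortState()
state.click("title")
state.click("rings")  # clicked last, so it has the highest priority
print([row["title"] for row in state.sort(rows)])  # ['c', 'a', 'b']
```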
Apart from sorting the rows of the table you can reorder the columns at any time by dragging them to
their new position.
A table cell is often too small to show the information that it holds. In this case three little dots appear
at the end of the cell, to indicate that the content is not completely shown. To address this problem
there are two ways to resize table cells: To change the width of a column you can simply click on the
right boundary line of the column header and move it to the left or right, to make the whole column
smaller or wider. The height of the rows can be changed via three buttons in the tool bar:
• the increase button allows you to increase the number of text lines that are shown in a table
cell. Each time it is clicked one more line is added.
• the decrease button decreases the number of text lines per cell.
• the normalize button sets the number of text lines back to the default value of one.
The content of a table cell is also shown in the Detail View panel in the side bar, as soon as you move
the mouse pointer to a cell.
Figure 7.9.: Example of a table that shows three text lines in a table cell
The Minimap shows an abstract view of the complete table, where the section that is currently shown
in the main view is highlighted in light red. Besides providing an overview, the minimap can also
be used to scroll the table. Just click on the highlighted box and drag it to another position.
When the table holds a lot of columns, which is a very common case, it is desirable that some
columns are not scrolled horizontally but stay visible the whole time. The molecule title column is an
example; it may be helpful if this column is always visible. This can be accomplished by putting table
columns into the sticky mode: A sticky column does not scroll horizontally.
This feature can be accessed by two buttons in the tool bar:
• A click on this button turns the leftmost floating column (a column which is not yet sticky)
into the sticky mode. It will not scroll horizontally anymore.
• This button turns the rightmost sticky column back into floating mode, which means that
the column will scroll as usual.
To indicate that a column is sticky it is shown a bit darker than a normal, floating column.
Flags (public and private) can be set by using the context menu that appears when you right-click
anywhere in the table. The menu item “Toggle public flag (for molecule title)” lets you toggle the public
flag for the specified molecule. The item “Toggle private flag (for molecule title)” does the same for
private flags. The flags are shown in the table in separate columns.
Figure 7.10.: A table with two sticky columns. Note that the scrollbar at the bottom does not include
the sticky columns.
The selection of molecules (rows in the table) works slightly differently than in common tables. The
only change we made is that the current selection will not be cleared when you select a new molecule
by clicking on the corresponding row in the table. So you can select/unselect multiple molecules by
clicking on them one after another.
Additionally you can select many molecules in a row when you hold down the Shift key while clicking
on the first and the last molecule you want to select (which is the common behavior that you may know
from other programs).
The comment editor for molecules can be opened by using the context menu.
The table can be used to access all molecule information which is stored in the database. To conserve
memory this information is loaded only when it is about to be shown. This may cause a little delay in
displaying the content of the table, which can be noticed when scrolling quickly over a long distance. During
the loading process the table cells are filled with three dots (“. . . ”) as placeholders. These placeholders
are replaced by the real content as soon as it is available.
7.3. Dendrogram
The dendrogram view shows the result of a hierarchical clustering. A hierarchical clustering works on
a (sub)set of molecules and a couple of properties. In the beginning, the clustering algorithm puts
each molecule into its own cluster. Then it searches for the two clusters with the smallest distance
between each other based on the chosen properties. When found, they are merged into a new cluster.
This is repeated until only one cluster remains. The result is a tree which represents the merge
history, as shown in Fig. 7.11.
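The merge loop described above can be written down in a few lines. The following Python sketch is a deliberately naive version for illustration, not the algorithm implemented in Scaffold Hunter: it uses single linkage by default, recomputes all cluster distances in every round, and all names are hypothetical.

```python
from itertools import combinations

def single_linkage(a, b, dist):
    """Distance between the nearest pair of members of two clusters."""
    return min(dist(x, y) for x in a for y in b)

def cluster(items, dist, linkage=single_linkage):
    """Return the merge history as (members_a, members_b, distance) tuples."""
    clusters = [[item] for item in items]  # every item starts in its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], dist))
        history.append((a, b, linkage(a, b, dist)))
        # Replace the two clusters by their union.
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return history

history = cluster([1.0, 2.0, 6.0, 7.5], dist=lambda x, y: abs(x - y))
for members_a, members_b, distance in history:
    print(members_a, members_b, distance)
```

Read from first to last entry, the history is exactly the tree drawn in a dendrogram: each merge becomes an inner node whose height is the merge distance.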
The dendrogram view shows this result, allows the user to navigate in the hierarchical tree structure and
to highlight different clusters based on their relation.
The dendrogram view is separated in four parts, shown in Fig. 7.12. Each part is marked by a surrounding
color as explained below:
• The “Zoom” element shows the molecule under the mouse in detail
• The “Info” element shows statistical data about the currently chosen clustering
• The “Method Used” element shows which configuration was used to generate the actual dendrogram
• The “Table Detail” element is equal to its correspondent in the real table view
In this part of the view you can see an actual dendrogram as shown in Fig. 7.11. The molecules at
the bottom are represented by black boxes because the zoom level is too small to show the real images.
The cluster selection bar shown in red in the middle is draggable and divides the subtrees below it into
different clusters. Each cluster separated this way is painted in its own color. The Info panel in
the side bar reports the number of clusters created this way and their sizes. With the mouse wheel
or the keys “+” and “-” the view can be zoomed in and out. At a closer zoom level the black boxes at
the bottom are replaced by the real molecule images as shown in Fig. 7.12 (marked yellow).
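The cluster selection bar can be understood as a horizontal cut through the tree: every subtree that hangs completely below the bar forms one cluster. A minimal Python sketch, assuming each node stores the height (merge distance) at which it was created and leaves have height 0; the node layout and function names are hypothetical.

```python
def leaves(node):
    """Names of all leaves below a node."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaves(child)]

def cut(node, bar_height):
    """Return one list of leaf names per cluster below the bar."""
    if node["height"] <= bar_height:
        return [leaves(node)]  # the whole subtree lies below the bar
    return [group for child in node["children"] for group in cut(child, bar_height)]

def leaf(name):
    return {"name": name, "height": 0.0, "children": []}

tree = {"name": "root", "height": 4.0, "children": [
    {"name": "n1", "height": 1.0, "children": [leaf("a"), leaf("b")]},
    {"name": "n2", "height": 1.5, "children": [leaf("c"), leaf("d")]},
]}

print(cut(tree, 2.0))  # [['a', 'b'], ['c', 'd']]
print(cut(tree, 1.2))  # [['a', 'b'], ['c'], ['d']]
```

Dragging the bar down increases the number of clusters and dragging it up reduces it, which matches the numbers shown in the Info panel.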
The zoom is separated into vertical and horizontal zoom. This is necessary, because the tree usually
has a very small height and a huge width. So when zooming with the mouse wheel, the view scales the
images and then adapts the tree above to fit in the window. Zooming while holding the Ctrl key will
scale only the tree, not the images. In addition to clicking on an image to add/remove it from the selection,
you can click on an inner node of the tree: this selects all leaves below it if at least one leaf is
unselected, and deselects them otherwise.
The two buttons on the left shown in Fig. 7.14 are the standard items to show/hide the side- and subset
bar. The third button from the left shows/hides a special table below the dendrogram. The next four
buttons change the vertical and horizontal zoom of the dendrogram. The second button from the right
fits the dendrogram into the available space. The button at the right zooms the view so that the whole
selection is visible.
7.3.5. Table
The table shown below the dendrogram is the standard table view with one additional function, see Sec.
7.2: Table. The first column shows in which subclusters the selection bar in the dendrogram has divided
the molecules. An example is shown in Fig. 7.12 (marked green).
After clicking the start clustering button in the side bar, a configuration panel as in Fig. 7.15 will show
up. Here, the user can choose from two clustering algorithms and select their parameters:
Warning: The exact clustering can work on at most 2^16 (65,536) molecules,
because of technical restrictions in Java and memory limitations. Some
parameter combinations (Euclide in combination with Ward, Median or Centroid) may
work on more molecules, but this is experimental and should only be used
with caution. Please note that these restrictions do not apply to the heuristic
algorithm.
Heuristic clustering: The heuristic SAHN clustering algorithm should be used if the runtime or the
memory consumption of the exact clustering is too high. Empirical tests showed that this algorithm can
be considered useful for subsets with a size larger than 10000 molecules. Two parameters are selectable:
quality and dimensionality. Both have a direct influence on quality and speed. The dimensionality
should be set according to the dimensionality of the data: Low for a dimensionality between 1 and 7,
Mid 7-13 and High > 13. If unsure, start with a low dimensionality and use a higher value if the quality
is insufficient regardless of which quality setting is used.
The linkage method defines the way in which the distance between clusters is calculated and controls
which pair should be merged.
• Complete Linkage: Merges the two clusters which have the minimum distance between the
farthest pair of both cluster members.
• Group Average Linkage: Merges the two clusters with the minimum distance of the unweighted
cluster average (UPGMA).
• McQuittys Linkage: Merges the two clusters with the minimum distance of the weighted cluster
average (WPGMA).
• Single Linkage: Merges the two clusters with the minimum distance between the nearest pair of
both cluster members.
• Ward Linkage: Merges the two clusters that lead to minimal variance increase.
• Centroid Linkage: Merges the two clusters with the minimum distance of the unweighted centers
(UPGMC).
• Median Linkage: Merges the two clusters with the minimum distance of the weighted cluster
centers (WPGMC).
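Some of the linkage methods listed above can be computed directly from the members of two clusters. The Python sketch below shows textbook definitions of single, complete, group average and centroid linkage for numerical vectors with the Euclidean distance; the function names are hypothetical and the code does not claim to match Scaffold Hunter's implementation. McQuitty, Median and Ward linkage depend on the merge history or on variances and are left out for brevity.

```python
import math
from itertools import product
from statistics import mean

def pair_distances(a, b):
    """Euclidean distances between all pairs of members of clusters a and b."""
    return [math.dist(x, y) for x, y in product(a, b)]

def single_linkage(a, b):
    return min(pair_distances(a, b))   # nearest pair

def complete_linkage(a, b):
    return max(pair_distances(a, b))   # farthest pair

def group_average_linkage(a, b):
    return mean(pair_distances(a, b))  # unweighted average (UPGMA)

def centroid_linkage(a, b):
    def centroid(points):
        return tuple(mean(axis) for axis in zip(*points))
    return math.dist(centroid(a), centroid(b))  # unweighted centers (UPGMC)

a = [(0, 0), (0, 2)]
b = [(3, 0), (5, 0)]
for linkage in (single_linkage, complete_linkage, group_average_linkage, centroid_linkage):
    print(linkage.__name__, round(linkage(a, b), 3))
```

In every round of the clustering the pair of clusters with the smallest value of the chosen linkage function is merged.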
The accepted types of properties depend on the distance computation.
• Euclide: Any number of numerical properties.
• Tanimoto: Exactly one property which has to be a binary fingerprint.
• Tanimoto Bit: Same as Tanimoto but works on bit strings not on character strings.
• Jaccard: Works on one fingerprint property with feature counts on each position.
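For orientation, the distance measures can be sketched as follows in Python. These are common textbook definitions and not taken from Scaffold Hunter's source: the Tanimoto distance is assumed to be one minus the Tanimoto coefficient of the set bits, and the count-based variant is assumed to be the generalized Jaccard form (sum of minima divided by sum of maxima). All function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance over any number of numerical properties."""
    return math.dist(a, b)

def tanimoto_distance(bits_a, bits_b):
    """1 - |A and B| / |A or B| for two equal-length strings of '0' and '1'."""
    both = sum(x == "1" and y == "1" for x, y in zip(bits_a, bits_b))
    either = sum(x == "1" or y == "1" for x, y in zip(bits_a, bits_b))
    return 0.0 if either == 0 else 1.0 - both / either

def count_jaccard_distance(counts_a, counts_b):
    """Assumed generalization to feature counts: 1 - sum(min) / sum(max)."""
    top = sum(min(x, y) for x, y in zip(counts_a, counts_b))
    bottom = sum(max(x, y) for x, y in zip(counts_a, counts_b))
    return 0.0 if bottom == 0 else 1.0 - top / bottom

print(euclidean((0.0, 0.0), (3.0, 4.0)))             # 5.0
print(round(tanimoto_distance("1100", "1010"), 3))   # 0.667
print(count_jaccard_distance([2, 0, 1], [1, 1, 1]))  # 0.5
```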
7.4. Plot
The plot view allows you to view numerical molecule data in the form of two- or three-dimensional scatterplots.
Any numerical data that is stored in the database can be displayed by linking it either to one
of the three spatial axes or to the color and size attributes of a dot in the diagram.
The first thing you want to do after opening a plot view is to select the data that should be displayed.
This can be done with the Property Mapping panel in the side bar. Here you find a combo box for each
of the three spatial axes, for the dot color and the dot size, which allows you to choose from numerical
molecule properties. Additionally there is a slider to apply a jitter to the data on the spatial axes.
Please note that the data has to be loaded from the database when the mapping is changed. This can
cause a slight delay before the diagram is displayed. The mapping of data to axes is also shown in the
Legend panel of the side bar. There you can also see the domain of each axis.
The default color and size for dots can be changed at the tool bar. These values are used when no data
is mapped to the color/size properties. As soon as you change the property mapping the elements in the
tool bar change to let you enter an interval for the color/size of the dots.
7.4.3. Hyperplanes
Sometimes you do not want to see the complete set of data, but a small part of it. This is where the
concept of hyperplanes becomes useful. Hyperplanes allow you to define thresholds for an axis, that
is, a minimum and a maximum value between which a data point must lie to be displayed as a dot in the diagram.
They can be applied to any of the 5 dimensions (x, y, z, color and size). When hyperplanes are set the
associated threshold values are shown in the diagram as black lines.
Hyperplanes can be set and adjusted using the Hyperplanes panel in the side bar. For each of the 5
possible dimensions there is a slider that lets you change the upper and the lower threshold value. For
a better control the values can also be entered as numbers in the text fields.
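In effect, hyperplanes act as a range filter on the mapped dimensions. The Python sketch below illustrates this; the dictionary keys, the function name and the choice of inclusive bounds are assumptions made for the example.

```python
def is_displayed(point, hyperplanes):
    """True if the point lies within the thresholds of every restricted dimension."""
    return all(low <= point[dim] <= high for dim, (low, high) in hyperplanes.items())

points = [
    {"x": 1.0, "y": 5.0, "size": 2.0},
    {"x": 2.0, "y": 9.0, "size": 1.0},
    {"x": 3.0, "y": 7.0, "size": 4.0},
]
# Lower and upper threshold for the y-axis only; other dimensions are unrestricted.
hyperplanes = {"y": (4.0, 8.0)}

shown = [p for p in points if is_displayed(p, hyperplanes)]
print([p["x"] for p in shown])  # [1.0, 3.0]
```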
Figure 7.18.: The button for the color and the combo box for the dot size. The upper figure shows the
default behavior, when no data is mapped. The lower figure shows the behavior when data
is mapped to color/size.
Figure 7.19.: The use of hyperplanes in a diagram: Without hyperplanes (left figure), with hyperplanes
at the y-axis (middle) and with the y-axis adjusting to the hyperplanes (right figure).
While the axes always try to adjust themselves to the domain that is mapped this behavior may be
inappropriate when using hyperplanes. By default a hyperplane does not influence the domain of an
axis, it just defines which values should be shown and which not. To let an axis adjust according to the
visible range that is defined by the hyperplanes, click the checkbox for the axis in the panel.
The diagram can be zoomed by using the buttons in the tool bar or by using the mouse wheel. When
using the mouse wheel the diagram is zoomed while keeping the focus at the current position of the
mouse pointer. Furthermore each axis can be scaled independently by using the buttons in the Scale
panel of the side bar. In case the diagram becomes too large to fit in the window you can scroll around
by clicking in the diagram with the left mouse button and dragging. Rotating the diagram works
similarly: just press the right mouse button while dragging. Rotation only works in 3D-mode.
The Fit Graph button in the tool bar always resets the zoom, scroll position and rotation angles. Zooming
and scrolling the diagram to show the current selection can be done by clicking the button “Zoom to
current selection” in the tool bar.
To avoid confusion with the dot colors the current selection and flags (public and private) are not shown
in the diagram by default. To make them visible you have to set the corresponding highlighting mode, which
can be done with the combo box in the tool bar. When the highlighting mode is set to “Selection” then
the dots that represent the molecules in the current selection are shown in red. Setting the highlighting
mode to “Public flag” shows the molecules with a public flag in green, while the highlighting mode
“Private flag” shows the molecules with a private flag in blue.
These features can also be set and removed from a molecule in the plot view, according to Table 7.1.
Effect                                              Action
add a molecule to or remove it from the selection   place the mouse pointer over the dot and click the left mouse button
add multiple molecules to the selection             hold the Shift key and the left mouse button down while drawing a box around the dots
remove multiple molecules from the selection        hold the Ctrl key and the left mouse button down while drawing a box around the dots
toggle the public flag of a molecule                place the mouse pointer over the dot and click the middle mouse button
toggle the private flag of a molecule               place the mouse pointer over the dot and click the middle mouse button while pressing the Shift key
Please note that the highlighting mode adjusts automatically to the action that is performed.
In any case, no matter if you perform one of these actions or not, the molecule that belongs to the dot
under the mouse pointer is shown in the Detail View panel in the side bar.
To toggle display of the axis ticks and grids there are two buttons in the tool bar:
7.5. Tree Map
The treemap view offers an alternative visualization of the scaffold tree, which is described in Sec. 5.2.3:
Scaffold Tree Generation. The hierarchy defined by the scaffold tree is represented by nested rectangles,
see Fig. 7.21 for an example showing a small data set.
The treemap itself allows you to zoom and pan around to get a detailed view for all parts of the tree
map. This navigation works similarly to the scaffold tree view. For zooming you can either use the mouse
wheel, the buttons in the tool bar or the “TreeMap” menu. To move around you can left-click into the
tree map and move your mouse while keeping the left mouse button pressed. When you right-click a
node in the tree map, the camera is centered around this node, so that the node fills out the whole
screen. The detail increases on small zoom levels showing the title and image of nodes.
Initially the tree map is plotted without colors and with a uniform size for every molecule. The size and color
of all nodes can be changed by assigning chemical properties to them. Fig. 7.22 shows the “Property
Mapping” panel, which is used for this purpose.
Node type When “Scaffolds” is selected only the scaffolds of the current subset are plotted. The option
“Molecules” additionally plots the molecules.
Node size Allows you to map a chemical property to the node size. Only molecule properties can be mapped
to the size, since the size of an inner node is always derived from its child nodes. Default is “Nr
of Molecules”.
Node color Allows you to map a chemical property to the node color. The mapping can be combined with
an accumulation function. This function works differently for molecule and scaffold properties:
• When using a molecule property, the accumulation function determines how to derive the
colors of scaffolds from associated molecules. If “Subtree cumulative” is enabled, the color of
a scaffold is determined by the properties of all molecules that are contained in its subtree. If
the option is disabled, only the molecules directly associated with the scaffold are considered.
• Normally the lowest and highest property values in the whole dataset are mapped to the
lowest and highest colors in the legend and all intermediate values are mapped linearly to an
intermediate color. If “Fit interval borders to current subset” is enabled, then only the lowest
and highest properties in the current subset are considered for the two outer colors of the
legend. This may be helpful for small subsets, where a global color mapping would lead to
very similar colors over the whole subset.
Minimum Allows you to choose the color associated with the minimum value.
Maximum Allows you to choose the color associated with the maximum value. If a property is not defined
for a structure, it will be plotted in gray.
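The linear color mapping described under “Node color” can be illustrated with a small sketch. It is not taken from the Scaffold Hunter sources: the class and method names are hypothetical, colors are assumed to be interpolated component-wise in RGB, and the shade of gray for undefined values is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of the Scaffold Hunter code base.
class ColorMappingSketch {

    /** Color for structures without a value for the property (assumed shade of gray). */
    static final int[] UNDEFINED_GRAY = {128, 128, 128};

    /**
     * Linearly maps a property value from [min, max] to a color between minColor and maxColor.
     * Colors are {red, green, blue} arrays with components in 0..255.
     * A null value stands for "property not defined".
     */
    static int[] mapColor(Double value, double min, double max, int[] minColor, int[] maxColor) {
        if (value == null) {
            return UNDEFINED_GRAY.clone();
        }
        // Relative position of the value in the interval; a degenerate interval maps to 0.
        double t = (max > min) ? (value - min) / (max - min) : 0.0;
        t = Math.max(0.0, Math.min(1.0, t)); // clamp values outside the interval
        int[] rgb = new int[3];
        for (int i = 0; i < 3; i++) {
            rgb[i] = (int) Math.round(minColor[i] + t * (maxColor[i] - minColor[i]));
        }
        return rgb;
    }

    public static void main(String[] args) {
        int[] blue = {0, 0, 255};
        int[] red = {255, 0, 0};
        // Borders taken from the whole data set (0..100) versus a small subset (10..14).
        System.out.println(Arrays.toString(mapColor(12.0, 0.0, 100.0, blue, red)));
        System.out.println(Arrays.toString(mapColor(12.0, 10.0, 14.0, blue, red)));
    }
}
```

With the global borders the value 12 lands close to the minimum color, while with borders fitted to the subset it lands in the middle of the gradient. This is the effect that “Fit interval borders to current subset” is meant to achieve for small subsets.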
The “Legend” visualizes which property values have been mapped to which colors in the tree map. It
is only visible if a chemical property has been mapped to the node colors as explained in Sec. 7.5.2:
Property Mappings. Fig. 7.23 shows the legend for the tree map in Fig. 7.21.
The “Detail View” shows additional information for a single scaffold or molecule. When you move the
mouse cursor over a chemical structure in the tree map, an image and information about its size and
color are displayed here.
7.6. HEAT MAP CHAPTER 7. VIEWS
The HeatMap view visualizes values of pre-selected properties of molecules in such a way that relations
between properties and molecules can be explored. Generally, a heat map is a matrix whose cells
are colored by a color mapping function. The color represents the cell’s value. The heat map of
Scaffold Hunter uses clustered molecules as columns and (clustered) properties as rows. Thus,
molecules and properties are sorted in an organized way.
As a rule, a cell’s color represents one property value of a single molecule. In other words, a column
represents all property values of one molecule and a row contains the value of a single property of all
molecules.
The HeatMap view consists of several parts as shown in Fig. 7.25. Each part is marked by a surrounding
color as explained below:
3. heat map row legends, when per property color mapping is enabled (marked yellow)
6. the property list (or property dendrogram when clustered sorting is enabled) (marked magenta)
The left side bar contains several panels (from top to bottom):
• Molecule property panel to sort properties manually or by clustering and configure color mapping
• Global color legend (not shown in Figure 7.25 because a per-property color mapping was used)
• Detail view panel to show detailed information about the currently hovered molecule
Figure 7.25.: The whole HeatMap view showing a heat map with manually sorted properties visualized
by double gradient per-property color mapping.
The information space can be explored via zooming and scrolling. For this purpose, a tool bar, global shortcuts
(see Appendix A) and individual inputs are available. In addition, all separator
lines between components within the HeatMap view can be moved as desired.
The two buttons on the left side depicted in Fig. 7.26 show or hide the side bar and the subset bar. The second
pair performs a horizontal zoom: the left button of this pair zooms in and the right one
zooms out. The third pair performs a vertical zoom
(zoom in and zoom out). Both horizontal and vertical zoom use the center of the heat map as the center
for zooming. The third to last button performs a zoom to overview, which zooms out until the
whole heat map can be seen (horizontally). The second to last button performs a “Fit to Selection”,
which zooms and pans the heat map to show all selected columns (molecules) at once. The last button
changes the height of the top dendrogram so that it occupies all vertical space that is not needed
by the heat map cells, or shrinks to the minimum size if vertical space is scarce.
Besides the tool bar, the heat map can be zoomed horizontally by using the mouse wheel or pressing the
keys “+” and “-” (global shortcuts). To zoom vertically, use “CTRL” + mouse wheel. An
advantage of zooming with the mouse is that the cursor position defines the center of the zoom, which
simplifies exploring the data significantly.
To scroll horizontally, the heat map can be either dragged or scrolled by the horizontal scroll bar below it.
Besides the vertical scroll bar, both the heat map legend on the left and the property list (or dendrogram)
on the right of the heat map accept vertical scrolling by using the mouse wheel.
Figure 7.27.: Dendrogram showing clustered molecules above the heat map and its vertical scroll bar on
the right side.
The dendrogram depicting clustered molecules is a small version of the dendrogram provided by the
dendrogram view (see Sec. 7.3). It accepts the same commands as the heat map except that using “CTRL”
+ mouse wheel performs a vertical zoom on the dendrogram itself (see Fig. 7.27).
A single molecule can be selected by clicking either on its corresponding column or on the leaf of the
dendrogram as shown in Figure 7.28a. Multiple molecules can be selected by clicking on the dendrogram.
Flags can be set by right-clicking the dendrogram leaf above the heat map (see Figure
7.28b).
Molecules can be clustered by using the clustering configuration panel as described in Sec. 7.3.6. After the
clustering has been finished the heat map will be generated. All settings (e.g., color mapping) can be
updated independently without reclustering. Updated settings can be applied by clicking on the “update
HeatMap” button which triggers a regeneration of the whole heat map.
(a) Manual sorting of properties in the left side bar.
(b) Dialog to configure the clustering of properties and how to visualize its result.
Color mapping maps each cell value to an appropriate color. The heat map of Scaffold
Hunter currently supports two color mapping modes (global and per-property) and three color mapping
functions:
• Interval
• Gradient
• Double Gradient
Figure 7.30.: Clustered properties are shown as a dendrogram instead of a common list.
Global color mapping means that all cell values of the heat map are mapped by the same function
(applied globally over all rows). A dedicated configuration dialog can be opened by clicking on the config
button (see Fig. 7.31). This dialog allows you to select a color mapping function and make further adjustments
if needed.
Figure 7.31.: Configuration of color mapping function “Gradient” for global color mapping.
Per-property color mapping applies an individually configured mapping function to each row (per
property). A configuration dialog can be opened to select the color mapping function to apply and to
make further adjustments (see Fig. 7.32). The dialog can also be accessed by clicking the color legend of the
respective property.
Figure 7.32.: Configuration of color mapping function “Interval” for per property color mapping.
7.7. Cloud
The Cloud view provides a visualization of scaffolds. The size of a scaffold’s representation is
determined by its frequency, and its color can be mapped to arbitrary
properties of the scaffolds or associated molecules (i.e. imported and calculated properties). See Fig.
7.33 for an example.
7.7.1. Navigation
Like any other view, the Cloud view offers basic zooming and panning functionality. Zooming may be
performed either by using the mouse wheel or the buttons located in the view’s tool bar. By clicking
on the canvas and holding the left mouse button, the cloud may be dragged. The cloud also allows the
selection of nodes by either clicking on single nodes or by dragging a rectangle on the canvas while holding
down the left mouse button and pressing the Shift or the Ctrl key, like in most other views.
The computation of a cloud can be influenced by different options from the layout panel, which can be
seen in Fig. 7.34.
The most significant parameter is the layout algorithm. The algorithm determines how the
scaffolds of the current subset are placed inside the cloud. The algorithms can be divided into two groups:
• The first group of algorithms does not respect any kind of similarity between the nodes (i.e. the
scaffolds). The layout is only computed by considering the actual size of each node. This group
contains the Ertl algorithm and the Wordle algorithm. The former is usually the fastest algorithm
and produces very compact layouts.
• The second group supports similarities between scaffolds. These algorithms do not only try to
produce compact layouts, but they also try to place similar nodes as close as possible to each
other. This way all of these algorithms can produce quite different results, where none of them
can really be declared as the best. The available algorithms are CycleCover, Inflate and Push,
MDS, Seam Carving and Star Forest. These algorithms together with the Wordle algorithm were
originally designed for word clouds but have been modified to also work for chemical structures [1].
If you select an algorithm with similarity support, you can also define the similarity metric. There
are four available types of metrics, each of which requires a specific set of properties to work with.
After you have selected a metric, the list of selectable properties is automatically reduced to the compatible
ones. If no property can be selected, the data set does not contain a property that is compatible
with the selected metric. The available metrics are:
• Euclide: Euclidean distance between one or more numerical properties.
• Tanimoto: Tanimoto distance on a single binary fingerprint property.
• Tanimoto Bit: Same as Tanimoto, but works on bit strings instead of character strings.
• Jaccard: A set similarity, which works on one fingerprint property with feature counts on each
position.
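The metrics listed above can be sketched in a few lines of Java. This is only an illustration under stated assumptions and not the code used by Scaffold Hunter: all class and method names are hypothetical, the Tanimoto distance is taken as one minus the usual Tanimoto coefficient, and the count-based Jaccard similarity is assumed to be the sum of position-wise minima divided by the sum of position-wise maxima.

```java
import java.util.BitSet;

// Hypothetical sketch of the distance measures; names are not taken from Scaffold Hunter.
class SimilaritySketch {

    /** Euclidean distance between two vectors of numerical property values. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Parses a fingerprint given as a string of '0' and '1' characters. */
    static BitSet parseBitString(String bits) {
        BitSet set = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') set.set(i);
        }
        return set;
    }

    /** Tanimoto distance: 1 - |A and B| / |A or B|; two empty fingerprints get distance 0. */
    static double tanimotoDistance(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        if (or.cardinality() == 0) return 0.0;
        return 1.0 - (double) and.cardinality() / or.cardinality();
    }

    /** Count-based Jaccard similarity (assumed definition): sum of minima / sum of maxima. */
    static double jaccardSimilarity(int[] a, int[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        long minSum = 0;
        long maxSum = 0;
        for (int i = 0; i < a.length; i++) {
            minSum += Math.min(a[i], b[i]);
            maxSum += Math.max(a[i], b[i]);
        }
        return maxSum == 0 ? 1.0 : (double) minSum / maxSum;
    }
}
```

The sketch also shows why each metric needs a specific kind of property: the Euclidean distance needs numerical values, the Tanimoto distance a binary fingerprint, and the Jaccard similarity a fingerprint with one count per position.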
Any changes to the layout options have to be applied by pressing the update button below the options.
The button text is drawn in red if there are any changes that have not yet been applied. If the button is
disabled, one or more layout options are invalid. This can happen if no algorithm is
selected or if no property is selected although the chosen algorithm requires one.
When the current subset is changed, a new cloud layout will be computed with the last selected settings.
However, if the new subset contains more than 500 scaffolds, you will be asked whether you really want to
continue. This request is meant to prevent unintentional occupation of system resources, as large clouds
may take much time and system memory to compute. After confirming, the computation can be
aborted at any time.
The “Property Mapping” in this view works very similarly to the one in the TreeMap view, so you
may want to read Sec. 7.5.2: Property Mappings. The only difference is that in the Cloud view chemical
properties can only be mapped to a color, since the size is already associated with the frequency of a
scaffold. The color legend also works in exactly the same way as in the TreeMap view; please refer
to Sec. 7.5.3: Color Legend for an explanation.
7.7.4. Filtering
The Cloud view supports filtering on the processed subsets. This feature can be used to prune out
structures with certain properties, e.g. structures which occur extremely often in the subset and are thus
not of interest. The filtering panel is shown in Fig. 7.35.
The top slider allows you to select “Molecules number” or any other numeric scaffold property, where the
allowed minimum and maximum can be selected independently. The bottom slider is always fixed to
“Number of rings” and works the same way. The sliders implicitly show the minimum and maximum of
the corresponding property within the current subset.
When the subset is changed, the sliders will keep their selected values. Since a new subset might have
different minima and maxima, you might also have to readjust the sliders. The leftmost and rightmost
ticks of the sliders can be used to pin them to the minimum or maximum. Even if the actual minimum
and maximum values change for a new subset, this option ensures that the sliders stay at the
leftmost or rightmost position.
The checkbox below manages how the filters are applied on the current cloud. If the layout should not
be adjusted, pruned nodes will just disappear from the cloud leaving a gap. New nodes will be added in
a heuristic manner, such that they may overlap with existing nodes. If the box is checked, every filter
operation will invoke a layout adjustment. Gaps will be closed by pulling nearby nodes into the gap,
while overlaps will be resolved by pushing overlapping nodes into different directions. These adjustments
may take some time depending on the size of the cloud and the number of nodes affected by the applied
filter operation.
It should also be mentioned that these layout adjustments do not match the quality of a complete
recomputation. The adjustments are kept very simple to be more responsive to user input. Therefore
the scaffold similarities do not receive as much attention as they do from the layout algorithms themselves.
If these similarities are important to your work, you should consider recomputing the cloud after the
filters are set.
The detail view gives some additional information about a cloud node. It shows the image of the
node under the mouse cursor, its rate and the property that is used to calculate the rate.
8. Export
Scaffold Hunter offers the possibility to export a subset as a CSV file or as SDF. The export option can
be accessed from the menu bar or by right-clicking a subset. This will open the dialog shown in Fig.
8.1.
On the left-hand side you can select the destination file and the file type. The two checkboxes below
allow you to export the SMILES of each molecule as well as the descriptions of the properties. For CSV
files two additional settings become visible below. They determine the separator between values and
the characters that are used to enclose each value.
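The effect of the two CSV settings can be sketched as follows. The snippet is a hypothetical illustration, not the exporter used by Scaffold Hunter; in particular, doubling an enclosing character that occurs inside a value is an assumed escaping rule that the manual does not specify.

```java
import java.util.List;

// Hypothetical sketch; class and method names are not taken from Scaffold Hunter.
class CsvRowSketch {

    /** Joins the values into one CSV row; each value is wrapped in the quote string. */
    static String toCsvRow(List<String> values, String separator, String quote) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                row.append(separator);
            }
            String value = values.get(i) == null ? "" : values.get(i);
            // Assumed escaping rule: double any quote string occurring inside the value.
            if (!quote.isEmpty()) {
                value = value.replace(quote, quote + quote);
            }
            row.append(quote).append(value).append(quote);
        }
        return row.toString();
    }
}
```

Enclosing every value keeps a row parseable even if a property value contains the separator itself, which is why both settings are usually chosen together.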
The properties that should be exported can be selected on the right-hand side. The two buttons
below select or unselect all properties at once. You can also use the Ctrl and Shift keys to select or
unselect multiple properties at once.
After clicking on the export button, a small progress dialog pops up (see Fig. 8.2). It shows the current
progress of the export and allows you to abort it at any time.
Bibliography
[1] Lukas Barth, Stephen G. Kobourov, and Sergey Pupyrev. Experimental comparison of semantic
word clouds. In Experimental Algorithms - 13th International Symposium, SEA 2014, Copenhagen,
Denmark, June 29 - July 1, 2014. Proceedings, pages 247–258, 2014.
[2] Bernhard Dick, Thorsten Flügel, Henning Garus, Michael Hesse, Philipp Kopp, Philipp Lewe, Do-
minic Sacré, Till Schäfer, and Thomas Schmitz. Endbericht PG 552 - Drug Hunting. Technical
report, 2011.
[3] Adalbert Gorecki, Anke Arndt, Arbia Ben Ahmed, Andre Wiesniewski, Cengizhan Yücel, Geb-
hard Schrader, Henning Wagner, Michael Rex, Nils Kriege, Philipp Büderbender, Sergej Rakov,
and Vanessa Bembenek. Endbericht PG 504 - ChemBioSpaceExplorer. Technical report, 2007.
[4] Marcus A. Koch, Ansgar Schuffenhauer, Michael Scheck, Stefan Wetzel, Marco Casaulta, Alex Oder-
matt, Peter Ertl, and Herbert Waldmann. Charting biologically relevant chemical space: a structural
classification of natural products (sconp). Proceedings of the National Academy of Sciences of the
United States of America, 102(48):17272–17277, November 2005.
[5] Ansgar Schuffenhauer, Peter Ertl, Silvio Roggo, Stefan Wetzel, Marcus A. Koch, and Herbert Wald-
mann. The scaffold tree - visualization of the scaffold universe by hierarchical scaffold classification.
Journal of Chemical Information and Modeling, 47(1):47–58, 2007.
A. Keyboard Shortcuts
           Key             Action
Zoom       +/-             Zoom in/out
           0               Fit graph / Table: Normalize rows
           s               Fit selection / Table: Scroll to first selected molecule
Selection  Ctrl+A          Select all molecules/scaffolds
           Ctrl+Shift+A    Deselect all molecules/scaffolds
           Ctrl+I          Invert selection
           Ctrl+D          Deselect all molecules in the current view’s subset
           Ctrl+N          Make subset from selected molecules
           Ctrl+Shift+N    Make subset from selected molecules in current view
View       Ctrl+W          Close current view
           Ctrl+PageUp     Show next view
           Ctrl+PageDown   Show previous view
           ALT+←           Toggle display of side bar
           ALT+→           Toggle display of subset
Subset     Ctrl+R          Create random subset
            Mouse                                       Action
Selection   Left-click on molecule/scaffold             Toggle selection of molecule/scaffold
            Shift + Left-click + Drag                   Select all molecules/scaffolds in selection rectangle
            Ctrl + Left-click + Drag                    Deselect all molecules/scaffolds in selection rectangle
            Left-click on tree node                     Toggle selection of subtree (dendrogram view only)
Flags       Middle-click on molecule/scaffold           Add/remove public flag for molecule/scaffold
            Shift + Middle-click on molecule/scaffold   Add/remove private flag for molecule/scaffold
Zoom        WheelUp / WheelDown                         Zoom in/out
            Ctrl + WheelUp / WheelDown                  Zoom in/out vertically (dendrogram view only)
Navigation  Left-click + Drag                           Move view area (table view only)
            Right-click + Drag                          Rotate 3D-graph (plot view only)
            Right-click on molecule/scaffold            Open context menu (all views; except plot view)
              Key            Action
Navigation    →              Move cursor clockwise
              ←              Move cursor counter-clockwise
              ↑              Move cursor to first child
              ↓              Move cursor to parent
              C              Focus cursor
Radii         Ctrl+↑         Increase radii
              Ctrl+↓         Decrease radii
              F              Toggle radii lock
              Ctrl+0         Reset radii
Expanding     Enter          Open/close children
              Ctrl+Enter     Open/close entire subtree
              E              Expand next level/Reduce to level
              Ctrl+E         Expand all nodes
              Ctrl+Shift+E   Reset to default expand level
Visual        Space          Select/deselect scaffold
              M (pressed)    Show molecules
              Ctrl+M         Toggle permanent display of molecules
              Ctrl+P         Configure property mappings...
Node scaling  PageUp         Scale up cursor scaffold
              PageDown       Scale down cursor scaffold
              Alt+PageUp     Scale up selected scaffolds
              Alt+PageDown   Scale down selected scaffolds
              N              Normalize cursor scaffold
              Alt+N          Normalize selected scaffolds
              A              Normalize all nodes
Layout        L              Switch to linear layout
              B              Switch to balloon layout
              R              Switch to radial layout
Key Action
Ctrl+G Toggle display of grid
Ctrl+T Toggle display of ticks
Key Action
Ctrl+T Toggle display of table
Ctrl + Plus Zoom in vertically
Ctrl + Minus Zoom out horizontally
Key Action
H Adjust height of top dendrogram
Ctrl + Plus Zoom in vertically
Ctrl + Minus Zoom out vertically
Key Action
Shift + A Toggle animated relocation of nodes
B. How to Write Plugins
In this chapter you will learn the basic parts needed to write new plugins. Feel free to write new
import or calc plugins or to extend the existing ones. We are always happy to receive new plugins or
plugin versions.
To write a new import plugin you need the Scaffold Hunter source code. The basis for import plugins is
located in the package edu.udo.scaffoldhunter.plugins.dataimport. You should put every new plugin inside
the package edu.udo.scaffoldhunter.plugins.dataimport.impl.PLUGINNAME.
PluginResults The PluginResults are built during a plugin run and contain all results of the
specific plugin with a given configuration. They provide some basic results, such as the number of
results, the names of the rows in the resulting data and which of those rows may be numeric, and finally
the Iterable containing the molecules.
ImportPlugin The import plugin itself is the interface to Scaffold Hunter. It returns instances of the
PluginSettingsPanel and the PluginResults, has a method to test whether a configuration will give
usable results or fail at the beginning, and provides some basic information such as a name and a UID.
Next to those classes there are the Arguments, a simple Object representing the current configuration,
and a class implementing the Serializable interface, which contains data that is saved into
the database. It could be used to save older settings.
The source tree already contains a very simple plugin named DummyImportPlugin. In this part
you will get a step-by-step guide whose result will be such a simple plugin, which can generate a simple
error message and returns two empty molecules.
B.1. IMPORT PLUGINS APPENDIX B. HOW TO WRITE PLUGINS
B.1.4.1. ImportPlugin.java
getTitle() The getTitle() method returns a short name of the plugin. This is also the name that
is listed in the import dialog.
getID() In getID() your plugin should return a unique name of the plugin. It is used, for example, by
Scaffold Hunter to match saved properties to the plugins. During the development of Scaffold
Hunter all plugins used the form CLASSNAME_VERSION, which should be unique enough.
getDescription() The getDescription() method returns a description of what the import
plugin can be used for, like “This is an import plugin which can be used to import data from our internal
web front end”.
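The three methods can be sketched as follows. To keep the snippet self-contained, it does not use the real ImportPlugin interface but a hypothetical stand-in, BasicPluginInfoSketch, that models only these three methods; the String return types, the class names and the returned texts are assumptions for illustration.

```java
// Hypothetical stand-in for the part of the plugin interface described above.
// The real interface in the Scaffold Hunter sources contains further methods.
interface BasicPluginInfoSketch {
    String getTitle();
    String getID();
    String getDescription();
}

// Hypothetical first version of an import plugin that only provides basic information.
class Example1ImportPluginSketch implements BasicPluginInfoSketch {

    @Override
    public String getTitle() {
        // Short name, shown in the list of plugins in the import dialog.
        return "Example 1";
    }

    @Override
    public String getID() {
        // Unique name following the CLASSNAME_VERSION convention.
        return "Example1ImportPluginSketch_0.1";
    }

    @Override
    public String getDescription() {
        return "This is an import plugin which demonstrates the basic plugin methods.";
    }
}
```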
In the second step we will add an ImportPluginResult object, which returns one molecule without
a structure but with two properties (one of them numerical).
B.1.5.1. Example2ImportPluginResults.java
As a new part in the second Example we have a PluginResults class. This class has to implement the
PluginResults interface. Let us go through the new methods.
getSourceProperties() In the getSourceProperties() method you have to return a Map of PropertyNames.
The Map can also include PropertyDefinitions, which could form the basis for a more detailed way of
defining the type of the property; this should be implemented in future versions of Scaffold Hunter.
In most cases a null value for the PropertyDefinition part is sufficient. So we generate a new Set with
the two property names “title” and “number”.
getProbablyNumeric() The getProbablyNumeric() method returns a Set of strings containing the
property names of those properties which contain numeric values. It is used in the mapping dialog to
automatically select which properties should be treated as numbers. In this example it is the property
“number”.
getMolecules() This method performs the main plugin task. It returns the Iterable which contains
the new molecules. Here we create a No Notifying Molecule (NNMolecule). Using NNMolecule in
place of Molecule has a very large positive impact on the import speed. We add our two properties to
the molecule and put it into a simple List, which we return. When you write your own plugin you will
probably write your own class implementing the Iterable interface.
getNumMolecules() The getNumMolecules() method returns the number of molecules which will be
imported. We built one Molecule, so we return 1.
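The four methods can be sketched together in one class. The snippet is self-contained and therefore replaces the Scaffold Hunter types with assumptions: a molecule is modeled as a plain Map from property name to value instead of an NNMolecule, a PropertyDefinition is modeled as Object, and the class name and property values are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a results class; the real one implements the PluginResults interface.
class Example2ResultsSketch {

    /** Property names mapped to an optional definition; null means "no PropertyDefinition". */
    Map<String, Object> getSourceProperties() {
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("title", null);
        properties.put("number", null);
        return properties;
    }

    /** Names of the properties that should be preselected as numeric in the mapping dialog. */
    Set<String> getProbablyNumeric() {
        return Collections.singleton("number");
    }

    /** One molecule without a structure but with two properties. */
    Iterable<Map<String, String>> getMolecules() {
        Map<String, String> molecule = new LinkedHashMap<>();
        molecule.put("title", "Example molecule");
        molecule.put("number", "42");
        return Collections.singletonList(molecule);
    }

    /** Must match the number of molecules delivered by getMolecules(). */
    int getNumMolecules() {
        return 1;
    }
}
```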
B.1.5.2. Example2ImportPlugin.java
The third example adds a very simple Settings Panel to the plugin where we can type in the molecule
title and an error message for the checkArguments(arguments) method.
B.1.6.1. Example3PluginArguments.java
At first there is a new Class, Example3PluginArguments, which holds the arguments for a single plugin
run. This is a very simple class with three fields:
• boolean error : Will be true, if the plugin should generate an error message in the checkArguments
method.
• String errorMessage : The message which will be given back if the plugin generates an error
message.
• String moleculeTitle : The content of the title property in the molecule.
B.1.6.2. Example3PluginSettingsPanel.java
The second new class is the Example3PluginConfigurationPanel. It allows us to fill the configuration
panel within the import dialog with content.
getSettings() We do not save any settings, so this method returns null. If you want to have saveable
settings, create a new class that implements Serializable and holds those settings. An instance
of this class with the content which should be saved has to be returned here.
getArguments() The getArguments() method returns the current settings in an Example3PluginArguments
object. So here we read the JCheckBox state and the content of the two JTextField instances and put
them into the corresponding fields of the returned object.
B.1.6.3. Example3ImportPluginResults.java
In the Results class we only had to add the arguments and build the molecule title property from them. So
first there is a new private field Example3ImportPluginArguments which holds the arguments for the
plugin run.
Example3ImportPluginResults(arguments) The new constructor just sets the internal arguments field
to the supplied arguments. When you write a plugin you should put some initialization here, like opening
database connections, counting of molecules, etc.
getMolecules() In the getMolecules() method we set the title property according to the moleculeTitle
field in the arguments.
B.1.6.4. Example3ImportPlugin.java
getSettingsPanel(settings,arguments) Here our new SettingsPanel is returned and we cast the
arguments to the right type.
getResults(arguments) The same holds for the results: they now expect arguments, so the class passes them on.
checkArguments(arguments) Now we are able to check whether a plugin run will “succeed”, so we either
return the error message from the arguments if the user wants it or return null.
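A minimal sketch of the arguments class and the check could look like this. The class names are hypothetical and the method is written as a static helper so that the snippet compiles on its own; in the real plugin, checkArguments is a method of the ImportPlugin implementation, and its exact signature is not shown in this manual.

```java
// Hypothetical arguments class with the three fields described for the third example.
class Example3ArgumentsSketch {
    final boolean error;
    final String errorMessage;
    final String moleculeTitle;

    Example3ArgumentsSketch(boolean error, String errorMessage, String moleculeTitle) {
        this.error = error;
        this.errorMessage = errorMessage;
        this.moleculeTitle = moleculeTitle;
    }
}

// Hypothetical holder for the check; assumes arguments arrive as a plain Object.
class Example3CheckSketch {

    /** Returns the error message if an error was requested, or null if the run can start. */
    static String checkArguments(Object arguments) {
        // The plugin system hands over generic arguments, so we cast to our own type.
        Example3ArgumentsSketch args = (Example3ArgumentsSketch) arguments;
        return args.error ? args.errorMessage : null;
    }
}
```

Returning null signals that no problem was found, so the import can proceed with the given arguments.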
B.1.7. Fourth and last version - Output a message during the import
At this point you are able to build configurations, check for an error at the beginning of the import
and return molecules. There is only one last part missing, the possibility to send Messages during the
import. This is realized in the Example4ImportPlugin.
B.1.7.1. Example4ImportPluginArguments.java
At first we added a new configuration option to switch the message on or off. It is named generateMessage.
B.1.7.2. Example4ImportPluginSettingsPanel.java
In the Settings panel the default value for the generateMessage is added. Furthermore there is a
JCheckBox added, to switch the Message on or off.
B.1.7.3. Example4ImportPluginResults.java
In the Example4ImportPluginResults some changes have been made. First there is a LinkedList which
holds the listeners which are registered with the results.
getMolecules() The getMolecules() method has been rewritten. Instead of a simple List it now
returns a self-written Iterable<Molecule>, which still returns our simple Molecule. In the
getNext() method of the included Iterator<Molecule>, however, the argument generateMessage is checked, and
if a Message should be sent, a new instance of the type edu.udo.scaffoldhunter.model.data.Message
is generated, saying that the Molecule structure could not be generated from a SMILES string.
You only need to pass the type of the message to the constructor of the message;
the name can be empty and the other two arguments null, since they are set in other parts of the import process.
Some MessageTypes are already defined in the edu.udo.scaffoldhunter.model.dataimport.MergeMessageTypes
class. Two of them should be useful for import plugins:
Name                        Description
MOLECULE_BY_SMILES_FAILED   Can’t build Molecule on base of SMILES
MOLECULE_BY_MOL_FAILED      Can’t build Molecule on base of MOL
If you need other MessageTypes just implement them using the MessageType interface.
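The idea of sending messages while the molecules are iterated can be sketched as follows. The snippet does not use the Scaffold Hunter classes: molecules are modeled as Strings, the message is reduced to the name of its type, and the listener interface and all class names are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical listener; the real plugin system uses its own listener and Message types.
interface MessageListenerSketch {
    void receiveMessage(String messageType);
}

// Hypothetical results class whose Iterable notifies listeners during the import.
class MessagingMoleculesSketch implements Iterable<String> {
    private final List<String> molecules;
    private final boolean generateMessage;
    private final List<MessageListenerSketch> listeners = new LinkedList<>();

    MessagingMoleculesSketch(List<String> molecules, boolean generateMessage) {
        this.molecules = molecules;
        this.generateMessage = generateMessage;
    }

    void addMessageListener(MessageListenerSketch listener) {
        listeners.add(listener);
    }

    @Override
    public Iterator<String> iterator() {
        final Iterator<String> inner = molecules.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return inner.hasNext();
            }

            @Override
            public String next() {
                String molecule = inner.next();
                if (generateMessage) {
                    // Only the message type is passed on; other details are set elsewhere.
                    for (MessageListenerSketch listener : listeners) {
                        listener.receiveMessage("MOLECULE_BY_SMILES_FAILED");
                    }
                }
                return molecule;
            }
        };
    }
}
```

Because the message is sent from within next(), it reaches the listeners exactly when the affected molecule is processed, not before the import starts.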
To write a new calc plugin you need the Scaffold Hunter source code. The calc plugin interfaces and
helper classes are stored in the following packages:
• edu.udo.scaffoldhunter.plugins
• edu.udo.scaffoldhunter.plugins.datacalculation
• edu.udo.scaffoldhunter.model.data
• edu.udo.scaffoldhunter.model.datacalculation
B.2. CALCULATION PLUGINS APPENDIX B. HOW TO WRITE PLUGINS
CalcPluginResults The CalcPluginResults class is constructed during a plugin run and contains all
results of the specific plugin with a given configuration. With the help of the CalcPluginResults class the
calc plugin tells the plugin system which properties were calculated and supplies an Iterable over all
molecules (the calculated properties are attached to those molecules).
CalcPlugin The calc plugin itself is the interface to Scaffold Hunter. It returns instances of the
PluginSettingsPanel and the CalcPluginResults, has a setter method to get notified about existing
properties and has getter methods which provide basic information like the title, description and unique
identifier of the plugin.
Next to those classes there is a PluginArguments class, a simple Object representing a plugin
configuration, and a CalcPluginSettings class implementing the Serializable interface, which contains
data that is saved into the database. It could be used by the plugin to save and retrieve settings such as
an arguments history.
In the next sections we will – in a step-by-step guide – develop a plugin that reads an existing numerical
property, adds or subtracts 1.0 from the property and saves the new value into a new property. The
plugin is configurable and has the ability to send messages to the GUI if something goes wrong.
B.2.4.1. Example1CalcPlugin.java
extends AbstractCalcPlugin Your calc plugin has to inherit from AbstractCalcPlugin. This abstract
class implements the CalcPlugin interface for you and also overrides the toString() method so that
your plugin title is shown correctly in the list of calc plugins in the calc dialog.
getTitle() The getTitle() method returns a short name of the plugin. This is also the name which
is listed in the calc dialog.
getID() In getID() your plugin should return a unique name of the plugin. It is used, for example, by
Scaffold Hunter to match saved properties to the plugins. During the development of Scaffold
Hunter all plugins used the form CLASSNAME_VERSION, which should be unique enough.
getDescription() The getDescription() method returns a description of what the calc plugin
can be used for, like “This is a calc plugin which can be used to calculate the xyz-fingerprint”.
B.2.5.1. Example2CalcPlugin.java
PropertyType          Description
NumProperty           An ordinary numerical property.
StringProperty        An ordinary string property.
BitStringFingerprint  A bit fingerprint represented by a string of 1 and 0 (chars).
BitFingerprint        A bit fingerprint that interprets every bit of a string as a bit. This
                      is logically identical to BitStringFingerprint but has less memory
                      consumption.
NumericalFingerprint  A fingerprint that consists of many numerical values. A
                      NumericalFingerprint is a simple string with integer values separated
                      by a comma: int,int,...
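To make the two string-based fingerprint formats concrete, the following sketch builds a BitStringFingerprint value and a NumericalFingerprint value from plain Java data. The class and method names are hypothetical, and the bit order (index 0 first) is an assumption that the manual does not specify.

```java
import java.util.BitSet;
import java.util.StringJoiner;

// Hypothetical helper class, not part of Scaffold Hunter.
class FingerprintStrings {

    // BitStringFingerprint: one '1' or '0' character per bit.
    // Assumption: the bit with index 0 is written first.
    static String toBitString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    // NumericalFingerprint: integer values separated by commas.
    static String toNumericalString(int[] values) {
        StringJoiner joiner = new StringJoiner(",");
        for (int value : values) {
            joiner.add(Integer.toString(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(3);
        System.out.println(toBitString(bits, 5));                    // prints 10010
        System.out.println(toNumericalString(new int[] {3, 0, 7}));  // prints 3,0,7
    }
}
```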
B.2.5.2. Example2CalcPluginTransformFunction.java
implements Function<Molecule, Molecule> Our transform function needs to implement the Function
interface and we want to transform from molecule to molecule.
apply(molecule) The apply() function gets one molecule as input parameter and returns the transformed
molecule. We just insert a mapping from the propDef (our PropertyDefinition) to the value
1.0 into the molecule's property map. Then we return the molecule. The plugin system will read this map
and save the property.
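A minimal sketch of such a transform function is shown below. SketchMolecule and SketchPropertyDefinition are hypothetical stand-ins for the Molecule and PropertyDefinition types, and java.util.function.Function is used only so that the example compiles on its own; the Function interface expected by the plugin system may be a different one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the molecule type; only the property map matters here.
class SketchMolecule {
    private final Map<Object, Object> properties = new HashMap<>();

    Map<Object, Object> getProperties() {
        return properties;
    }
}

// Hypothetical stand-in for PropertyDefinition.
class SketchPropertyDefinition {
    final String key;

    SketchPropertyDefinition(String key) {
        this.key = key;
    }
}

class ConstantTransformSketch implements Function<SketchMolecule, SketchMolecule> {
    private final SketchPropertyDefinition propDef;

    ConstantTransformSketch(SketchPropertyDefinition propDef) {
        this.propDef = propDef;
    }

    @Override
    public SketchMolecule apply(SketchMolecule molecule) {
        // Map the property definition to the constant value 1.0 ...
        molecule.getProperties().put(propDef, 1.0);
        // ... and hand the same molecule back to the caller.
        return molecule;
    }
}
```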
B.2.6.1. Example3CalcPluginArguments.java
The Example3CalcPluginArguments class just has a boolean member variable encoding the state of the
checkbox. Additionally it has a getter and a setter method for this variable.
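Such an arguments class could look like the sketch below. The class and field names are hypothetical; only the shape (one boolean with a getter and a setter) is taken from the description above.

```java
// Sketch of a plugin arguments class; the names are hypothetical.
class Example3ArgumentsSketch {
    // State of the checkbox in the settings panel.
    private boolean checkboxSelected;

    public boolean isCheckboxSelected() {
        return checkboxSelected;
    }

    public void setCheckboxSelected(boolean checkboxSelected) {
        this.checkboxSelected = checkboxSelected;
    }
}
```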
B.2.6.2. Example3CalcPluginSettingsPanel.java
B.2.6.3. Example3CalcPluginTransformFunction.java
apply(molecule) The apply() method now determines the property value based on the saved arguments.
B.2.6.4. Example3CalcPlugin.java
In the calc plugin itself there are just a few changes. The following methods changed:
B.2.7.1. Example4CalcPluginArguments.java
A new variable that stores the property chosen by the user, together with corresponding getter and
setter methods, is added to the PluginArguments class from the last version.
B.2.7.2. Example4CalcPluginSettingsPanel.java
The PluginSettingsPanel from the last version is extended to show a JList with all numerical property
definitions. A list selection listener is used to update the Example4CalcPluginArguments with the
property definition selected by the user.
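The wiring between the JList and the arguments object can be sketched with plain Swing as follows. Ex4ArgumentsSketch and Ex4SettingsPanelSketch are hypothetical names, and property definitions are represented by plain strings here; the real panel works with the plugin's PropertyDefinition objects.

```java
import java.util.List;
import javax.swing.DefaultListModel;
import javax.swing.JList;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.ListSelectionModel;

// Hypothetical arguments class; a String stands in for the property definition.
class Ex4ArgumentsSketch {
    private String selectedProperty;

    public String getSelectedProperty() {
        return selectedProperty;
    }

    public void setSelectedProperty(String selectedProperty) {
        this.selectedProperty = selectedProperty;
    }
}

// Hypothetical settings panel showing all numerical properties in a JList.
class Ex4SettingsPanelSketch extends JPanel {
    final JList<String> propertyList;

    Ex4SettingsPanelSketch(List<String> numericalProperties, Ex4ArgumentsSketch arguments) {
        DefaultListModel<String> model = new DefaultListModel<>();
        for (String property : numericalProperties) {
            model.addElement(property);
        }
        propertyList = new JList<>(model);
        propertyList.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);

        // Copy the selection into the arguments object whenever it changes.
        propertyList.addListSelectionListener(event -> {
            if (!event.getValueIsAdjusting()) {
                arguments.setSelectedProperty(propertyList.getSelectedValue());
            }
        });
        add(new JScrollPane(propertyList));
    }
}
```

Checking getValueIsAdjusting() avoids updating the arguments several times while the user is still dragging the selection.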
B.2.7.3. Example4CalcPluginTransformFunction.java
The apply() method was adjusted so that the value of the chosen property is read from the input
molecule, 1.0 or −1.0 is added and the resulting new property value is appended to the property map of
the output molecule.
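The adjusted apply() method can be sketched as follows. Ex4Molecule is a hypothetical stand-in for the molecule type, java.util.function.Function stands in for the plugin system's Function interface, and the rule that the checkbox state selects between 1.0 and -1.0 is an assumption made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the molecule type.
class Ex4Molecule {
    private final Map<Object, Object> properties = new HashMap<>();

    Map<Object, Object> getProperties() {
        return properties;
    }
}

class Ex4TransformSketch implements Function<Ex4Molecule, Ex4Molecule> {
    private final Object inputKey;   // property chosen by the user
    private final Object outputKey;  // new property written by the plugin
    private final boolean subtract;  // assumed meaning of the checkbox state

    Ex4TransformSketch(Object inputKey, Object outputKey, boolean subtract) {
        this.inputKey = inputKey;
        this.outputKey = outputKey;
        this.subtract = subtract;
    }

    @Override
    public Ex4Molecule apply(Ex4Molecule molecule) {
        Object value = molecule.getProperties().get(inputKey);
        if (value instanceof Number) {
            // Add 1.0 or -1.0 to the chosen property and store the result.
            double delta = subtract ? -1.0 : 1.0;
            double result = ((Number) value).doubleValue() + delta;
            molecule.getProperties().put(outputKey, result);
        }
        return molecule;
    }
}
```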
B.2.7.4. Example4CalcPlugin.java
In comparison to the last version there are several small changes in Example4CalcPlugin. The
constructor was deleted and the creation of the property definition was moved to the getResults() method.
The fifth and last version of the plugin can be found in
edu.udo.scaffoldhunter.plugins.datacalculation.impl.example5. Here we will enable the plugin to send
messages to the GUI in case the input property chosen by the user is not defined for a molecule.
B.2.8.1. Example5CalcPlugin.java
In comparison to the last plugin version, there is just one small change: The
getResults(arguments, molecules, msgListener) method now passes the msgListener parameter to the
constructor of the Example5CalcPluginTransformFunction.
B.2.8.2. Example5CalcPluginTransformFunction.java
apply(molecule) In the apply() method an else clause (which sends the message) was inserted after
the if-block (which does the calculation). So in case the chosen property is not defined for the given
molecule, a CalcMessage is created and sent to the GUI. Note that the molecule title needed to construct
the message is read from the molecule properties map, by asking for the key CDKConstants.TITLE.
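The if/else structure described above can be sketched as follows. All types here are hypothetical stand-ins: Ex5MessageListener replaces the plugin's message listener, a plain String replaces CalcMessage, and TITLE_KEY is a placeholder for the key CDKConstants.TITLE.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the message listener passed to the plugin.
interface Ex5MessageListener {
    void receiveMessage(String message);
}

// Hypothetical stand-in for the molecule type.
class Ex5Molecule {
    private final Map<Object, Object> properties = new HashMap<>();

    Map<Object, Object> getProperties() {
        return properties;
    }
}

class Ex5TransformSketch implements Function<Ex5Molecule, Ex5Molecule> {
    // Placeholder key; the real plugin asks for CDKConstants.TITLE.
    static final String TITLE_KEY = "title";

    private final Object inputKey;
    private final Object outputKey;
    private final Ex5MessageListener msgListener;

    Ex5TransformSketch(Object inputKey, Object outputKey, Ex5MessageListener msgListener) {
        this.inputKey = inputKey;
        this.outputKey = outputKey;
        this.msgListener = msgListener;
    }

    @Override
    public Ex5Molecule apply(Ex5Molecule molecule) {
        Object value = molecule.getProperties().get(inputKey);
        if (value instanceof Number) {
            // if-block: do the calculation.
            molecule.getProperties().put(outputKey, ((Number) value).doubleValue() + 1.0);
        } else {
            // else clause: report the molecule for which the property is missing.
            Object title = molecule.getProperties().get(TITLE_KEY);
            msgListener.receiveMessage("Property not defined for molecule: " + title);
        }
        return molecule;
    }
}
```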
B.2.8.3. Example5CalcPluginArguments.java
B.2.8.4. Example5CalcPluginSettingsPanel.java
If you write a calc plugin that takes the structural information (e.g. 2D graph structure) of a molecule
into account, then you may want to give the user the opportunity to use transform options. Transform
options are a kind of preprocessing on all molecules before the calc plugin operates on them. See Sec.
5.2.2.1: Transform Options for more information.
Using transform options is rather simple. First, change your PluginArguments class so that it inherits
from AbstractCalcPluginArguments, which can be found in edu.udo.scaffoldhunter.plugins.datacalculation.
Second, integrate an instance of CalcPluginTransformOptionPanel (also found in
edu.udo.scaffoldhunter.plugins.datacalculation) into your custom PluginSettingsPanel.
As the name suggests, you can add the CalcPluginTransformOptionPanel to your custom panel like any
other Swing container. When instantiating CalcPluginTransformOptionPanel, you need to pass your
PluginArguments instance to its constructor. Once these steps are done, your plugin shows a panel with
transform options and, depending on the user's choice, the molecules are transformed automatically
before they are processed by the plugin.
C. FAQ - Frequently Asked Questions
C.1. What is the easiest way to try out Scaffold Hunter?
You can use Scaffold Hunter's demo mode. Download the demo package provided along with the regular
Scaffold Hunter releases and execute the run-demo start script. Scaffold Hunter will then open an
included data set with a pre-configured session. You do not need to configure any database connection
or provide any data for analysis.
C.6. How can I prevent other users from getting access to my data?
If you conduct a study with confidential data and do not want any other user to access Scaffold Hunter's
data, you need to configure the MySQL/MariaDB server accordingly. Make sure that only authorized
people have read access to the schema in which Scaffold Hunter's data is stored. You can also install a
MySQL/MariaDB server on your local computer and limit access to localhost. If the performance is
sufficient, you can also select the connection type HSQLDB (see Fig. 4.2) and save the database file in a
secure location on your hard disk.
1 http://www.mysql.de/products/workbench/