ParaView Catalyst Users Guide

The Catalyst Users Guide v2.0
ParaView 4.3.1

Andrew C. Bauer, Berk Geveci, Will Schroeder
Kitware Inc.

February 2015
The ParaView Catalyst Users Guide is available under a Creative Commons Attribution license (CC BY 3.0).
© 2015, Kitware Inc.
www.kitware.com
Cover image generated from a Helios simulation of compressible flow past a sphere. Helios is
developed by the U.S. Army's Aeroflightdynamics Directorate.
We would like to acknowledge the support from:
Ken Moreland is the project lead for Sandia. Sandia has contributed significantly
to the project both in development and vision. Sandia developers included Nathan
Fabian, Jeffrey Mauldin and Ken Moreland.
Jim Ahrens is the project lead at LANL. The LANL team has been integrating Catalyst with various LANL simulation codes and has contributed to the development
of the library.
Mark Potsdam, from Aeroflightdynamics Directorate, was the main technical point
of contact for Army SBIRs and has contributed significantly to the vision of Catalyst.
Contents

1 Introduction to ParaView Catalyst
  1.1 Motivation
  1.2 Example Workflow
  1.3 Further Information

5 Examples
  5.1 Examples

6 References
  6.1 References

7 Appendix
  7.1 Appendix
    7.1.1 vtkWeakPointer, vtkSmartPointer and vtkNew
    7.1.2 ParaView Catalyst Python Script for Outputting the Full Dataset
    7.1.3 Reusing Simulation Memory for Non-VTK Compliant Memory Layouts
Chapter 1

Introduction to ParaView Catalyst

1.1 Motivation
Computing systems have been increasing in speed and capacity for many years, yet not all of the subsystems that make up a computing environment have been advancing equally fast. This has led to many changes in the way large-scale computing is performed. For example, simulations have long been scaling towards millions of parallel computing cores in recognition that serial processing is inherently limited by the bottleneck of a single processor. As a result, parallel computing methods and systems are now central to modern computer simulation. Similarly, with the number of computing cores increasing to address bigger problems, IO is now becoming a limiting factor, as Table 1.1 indicates. While the increase in FLOPS between 2010 and 2018 is expected to be on the order of 500 times, IO bandwidth is expected to increase only on the order of 100 times.
                 2010        2018       Factor Change
Peak FLOPS       2 PF/s      1 EF/s     500
IO bandwidth     0.2 TB/s    20 TB/s    100

Table 1.1: Potential exascale computer performance. Source: DOE Exascale Initiative Roadmap, Architecture and Technology Workshop, San Diego, December 2009.
This divergence between computational power and IO bandwidth has profound implications on the way future simulation is performed. In the past it was typical to break the simulation process into three pieces: pre-processing (preparing input); simulation (execution); and post-processing (analyzing and visualizing results). This workflow is fairly simple and treats these three tasks independently, simplifying the development of new computational tools by relying on a loose coupling via data file exchange between each of the pieces. For example, pre-processing systems are typically used to discretize the domain, specify material properties, boundary conditions, solver parameters, etc., and finally write this information out into one or more input files to the simulation code. Similarly, the simulation process typically writes output files which are read in by the post-processing system. However, limitations in IO bandwidth throw a monkey wrench into this process, as the time to read and write data on systems with relatively large computational power is becoming a severe bottleneck to the simulation workflow. Savings can be obtained even for desktop systems with a small number of parallel processes. This is shown in the figure below for a 6-process run on a desktop machine performing different analysis operations as well as IO. It is clear that the abundance of computational resources (cores) results in relatively rapid analysis, which, even when taken together, is faster than the time it takes for the simulation code to save a full dataset.
Figure 1.1: Comparison of compute time in seconds for certain analysis operations vs. saving the full data
set for a 6 process run on a desktop machine.
The root problem is that due to the ever increasing computational power available on the top HPC
machines, analysts are now able to run simulations with increasing fidelity. Unfortunately this increase in
fidelity corresponds to an increase in the data generated by the simulation. Due to the divergence between
IO and computational capabilities, the resulting data bottleneck is now negatively impacting the simulation
workflow. For large problems, gone are the days when data could be routinely written to disk and/or
copied to a local workstation or visualization cluster. The cost of IO is becoming prohibitive, and large-scale
simulation in the era of cheap FLOPS and expensive IO requires new approaches.
One popular, but crude, approach relies on configuring the simulation process to save results less frequently (for example, in a time-varying analysis, every tenth time step may be saved, meaning that 90% of the results are simply discarded). However, even this strategy is problematic: a full save of the simulation data for even a single time step may exceed the capacity of the IO system, or require too much time to be practical.
A better approach, and the approach that ParaView Catalyst takes, is to change the traditional three-step simulation workflow of pre-processing, simulation, and post-processing (shown in Figure 1.2a) to one that integrates post-processing directly into the simulation process, as shown in Figure 1.2b. This integration of simulation with post-processing provides several key advantages. First, it avoids the need to save out intermediate results for the purpose of post-processing; instead, post-processing can be performed in situ as the simulation is running. This saves considerable time, as illustrated below in Figure 1.3.
Figure 1.3: Comparison of the full CTH workflow with post-processing (a) vs. with in situ processing with ParaView Catalyst (b). Results courtesy of Sandia National Laboratories.
Further, instead of saving full datasets to disk, IO can be reduced by extracting only relevant information.
Data extracts like iso-contours, data slices, or streamlines are generally orders of magnitude smaller than
the full dataset (Figure 1.4). Thus writing out extracts significantly reduces the total IO cost.
Finally, unlike blind subsampling of results, an integrated approach makes it possible to analyze the current state of the simulation and save only the information pertinent to the scientific query at hand. For example, it is possible to identify the signs of a forming shock and then save only the information in the neighborhood of the shock.
There are other important applications that address the complexity of the simulation process. Using co-processing it is possible to monitor the progress of the simulation and ensure that it is progressing in a valid way. It is not uncommon for a long-running simulation (perhaps days or longer in duration) to be tossed out because initial conditions, boundary conditions or solution parameters were specified incorrectly. By checking intermediate results it is possible to catch mistakes like these and terminate such runs before they incur excessive costs. Similarly, co-processing enables improved debugging of simulation codes. Visualization can be used to great effect to identify regions of instability or numerical breakdown.
ParaView Catalyst was created as a library to achieve the integration of simulation and post-processing.
It has been designed to be easy to use and introduces minimal disruption into numerical codes. It leverages
standard systems such as VTK and ParaView (for post-processing) and utilizes modern scientific tools like
Python for control and analysis. Overall, Catalyst has been shown to dramatically increase the effectiveness of the simulation workflow by reducing the amount of IO, thereby reducing the time to gain insight into a given problem, and more efficiently utilizing modern HPC environments with abundant FLOPS and restricted IO bandwidth.

Figure 1.4: Comparison of file size in bytes for saving the full data set vs. saving specific analysis outputs.
1.2 Example Workflow
Figure 1.5 demonstrates a typical workflow using Catalyst for in situ processing. In this figure it is assumed
that Catalyst has already been integrated into the simulation code (see Section 3 for details on how to
integrate Catalyst). The workflow is initiated by creating a Python script using ParaViews GUI which
specifies the desired output from the simulation. Next, when the simulation starts it loads this script; then
during execution any analysis and visualization output is generated in synchronous fashion (i.e., while the
simulation is running). Catalyst can produce images/screenshots, compute statistical quantities, generate
plots, and extract derived information such as polygonal data or iso-surfaces to visualize geometry and/or
data.
Catalyst has been used by a variety of simulation codes. PHASTA from UC Boulder; Hydra-TH; MPAS-O, XRAGE, NPIC and VPIC from LANL; Helios from the Army's Aeroflightdynamics Directorate; CTH, Albany and the Sierra simulation framework from Sandia; H3D from UCSD; and Code Saturne from EDF have all been instrumented to use Catalyst. Some example outputs are shown in Figure 1.6.
Figure 1.6: Various results from simulation codes linked with ParaView Catalyst: (a) PHASTA, (b) Helios, (d) CTH. Note that post-processing with different packages was performed for (b) and (d).
Of course, Catalyst is not necessarily applicable in all situations. First, if significant reductions in IO are the goal, then the analysis and visualization pipelines invoked by Catalyst must actually produce outputs of reduced size. Another important consideration is whether these pipelines scale appropriately. If they do not, then a large-scale simulation may bog down during co-processing, detrimentally impacting total analysis cycle time. However, both the underlying ParaView and VTK systems have been developed with parallel scaling in mind, and generally perform well in most applications. Figure 1.7 shows scaling plots for two popular algorithms: slicing through a dataset, and decimating large meshes (e.g., reducing the size of an output iso-contour).
Such stellar performance is typical of VTK algorithms, but we recommend that you confirm this behavior
for your particular analysis pipeline(s).
Figure 1.7: (a) Slice, (b) Decimate

1.3 Further Information
www.paraview.org The main ParaView page with links to wikis, code, documentation, etc.
www.paraview.org/download The main ParaView download page. Useful for installing ParaView on
local machines for creating Catalyst scripts and viewing Catalyst output.
www.paraview.org/in-situ The main page for ParaView Catalyst.
[email protected] The mailing list for general ParaView and Catalyst support.
www.github.com/Kitware/ParaViewCatalystExampleCode Example code for integrating a simulation
code with Catalyst as well as creating a variety of VTK data structures.
The remainder of this guide is broken up into three main sections. Section 2 addresses users that wish
to use simulation codes that have already been instrumented with Catalyst. Section 3 is for developers who
wish to instrument their simulation code. Section 4 focuses on those users who wish to install and maintain
Catalyst on their computing systems.
Chapter 2
Figure 2.1: Traditional workflow (blue) and ParaView Catalyst enhanced workflow (green).
With the ParaView Catalyst enhanced workflow, the user specifies visualization and analysis output
during the pre-processing step. These output data are then generated during the simulation run and later
analyzed by the user. The Catalyst output can be produced in a variety of formats such as rendered images
with pseudo-coloring of variables; plots (e.g. bar graphs, line plots, etc.); data extracts (e.g. iso-surfaces,
slices, streamlines, etc.); and computed quantities (e.g. lift on a wing, maximum stress, flow rate, etc.).
The goal of the enhanced workflow is to reduce the time to gain insight into a given physical problem
by performing some of the traditional post-processing work in situ. While the enhanced workflow uses
ParaView Catalyst to produce in situ outputs, the user does not need to be familiar with ParaView to use
this functionality. Configuration of the pre-processing step can be based on generic information needed to produce the desired outputs (e.g. an iso-surface value and the variable to iso-surface), and the output can be written either as image files or in other formats with which the user has experience.
There are two major ways in which the user can utilize Catalyst for in situ analysis and visualization.
The first is to specify a set of parameters that are passed into a pre-configured Catalyst pipeline. The second
is to create a Catalyst pipeline script using ParaView's GUI.
2.1 Pre-Configured Pipelines
Creating pre-configured Catalyst pipelines places more responsibility on the simulation developer but can
simplify matters for the user. Using pre-configured pipelines can lower the barrier to using Catalyst with a
simulation code. The concept is that for most filters there is a limited set of parameters that need to be set.
For example, for a slice filter the user only needs to specify a point and a normal defining the slice plane.
Another example is the threshold filter, where only the variable and range need to be specified. For each pipeline, though, the parameters should also include a file name to output to and an output frequency. These parameters can be presented for the user to set in their normal workflow for creating their simulation inputs.
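As an illustration, the parameters for a pre-configured slice pipeline could be collected in a small structure like the one below. This is a hypothetical sketch, not part of the Catalyst API; in practice these values would be read from the simulation's normal input format:

```cpp
#include <array>
#include <string>

// Hypothetical parameter block for a pre-configured slice pipeline.
// The names are illustrative only; what matters is the small set of
// values the user must supply: plane definition, file name, frequency.
struct SlicePipelineParams
{
  std::array<double, 3> origin{{0.0, 0.0, 0.0}};  // point on the slice plane
  std::array<double, 3> normal{{0.0, 0.0, 1.0}};  // slice plane normal
  std::string fileName{"slice_%t.pvtp"};          // %t becomes the time step
  int outputFrequency{10};                        // write every 10 time steps
};
```

The simulation's input parser would fill one such block per pre-configured pipeline and hand the values to the adaptor.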
2.2 Creating Catalyst Python Scripts in the ParaView GUI
The downside to using pre-configured scripts is that they are only as useful as the simulation developer makes them. These scripts can cover a large number of use cases of interest to the user, but inevitably the user will want more functionality or better control. This is where it is useful for the simulation user to create their own Catalyst Python script pipelines using the ParaView GUI.
There are two main prerequisites for creating Catalyst Python scripts in the ParaView GUI. The first
is that ParaView is built with the Catalyst Script Generator plugin enabled. This plugin was previously
called the CoProcessing Script Generator plugin for versions of ParaView before 4.2. Note that this plugin is
enabled by default when building ParaView from source as well as for versions of ParaView installed from the
available installers. Additionally, the version of ParaView used to generate the script should also correspond
to the version of ParaView Catalyst that the simulation code runs with. The second prerequisite is that the user has a representative dataset to start from. What we mean by this is that when the dataset is read from disk into ParaView, it is the same dataset type (e.g. vtkUnstructuredGrid, vtkImageData, etc.) and has the same attributes defined over the grids as the simulation adaptor code will provide to Catalyst during simulation runs. Ideally, the geometry and the attribute ranges will be similar to what is provided by the simulation run's configuration. The steps to create a Catalyst Python pipeline in the ParaView GUI are:
1. First load the ParaView plugin for creating the scripts. Do this by selecting Manage Plugins... under the Tools menu (Tools → Manage Plugins...). In the window that pops up, select CatalystScriptGeneratorPlugin and press the Load Selected button. After this, press the Close button to close the window. This will create two new top-level menu items, Writers and CoProcessing. Note that you can have the plugin automatically loaded when ParaView starts up by expanding the CatalystScriptGeneratorPlugin information (click on the + sign in the box to the left of it) and then checking the box to the right of Auto Load.
2. Next, load in a representative dataset and create a pipeline. In this case though, instead of actually writing the desired output to a file, we need to specify when and where the files will be created when running the simulation. For data extracts we specify that information at this point by choosing an appropriate writer under the Writers menu. The user should specify a descriptive file name as well as a write frequency in the Properties panel, as shown in Figure 2.2. The file name must contain a %t in it, as this gets replaced by the time step when creating the file. Note that the step to specify screenshot outputs for Catalyst is done later.
3. Once the full Catalyst pipeline has been created, the Python script must be exported from ParaView. This is done by choosing the Export State wizard under the CoProcessing menu (CoProcessing → Export State). The user can click on the Next button in the initial window that pops up.
4. After that, the user must select the sources (i.e. pipeline objects without any input connections) that the adaptor will create and add them to the output. Note that typically this does not include sources from the Sources menu, since the generated Python script will instantiate those objects as needed (e.g. for seeds for a streamline). In the case shown in Figure 2.2 the source is the filename_10.pvti reader that is analogous to the input that the simulation code's adaptor will provide. The user can either double-click on the desired sources in the left box to add them to the right box, or select the desired sources in the left box and click Add. This is shown in Figure 2.3 below. After all of the proper sources have been selected, click on Next.
5. The next step is labeling the inputs. The most common case is a single input, in which case we use the convention that it should be named input, the default value. For situations where the adaptor can provide multiple sources (e.g. fluid-structure interaction codes where a separate input exists for the fluid domain and the solid domain), the user will need to indicate which source corresponds to which label. This is shown in Figure 2.4. After this is done, click Next.
Figure 2.2: Example of a pipeline with one writer included. It writes output from the Slice filter. Since the writer is selected, its file name and write frequency properties are also shown.
6. The next page in the wizard gives the user the option to allow Catalyst to check for a Live Visualization connection and to output screenshots from different views. Check the box next to Live Visualization to enable it. For screenshots, there are a variety of options. The first is a global option which will rescale the lookup table for pseudo-coloring to the current data range for all views. The other options are per view and are:

   - Image type: choice of image format to output the screenshot in.
   - File Name: the name of the file to create. It must contain a %t in it so that the actual simulation time step value will replace it.
   - Write Frequency: how often the screenshot should be created.
   - Magnification: the user can create an image with a higher resolution than the resolution shown in the current ParaView GUI view.
   - Fit to Screen: specify whether to fit the data in the screenshot. This gives the same results in Catalyst as clicking on the corresponding button in the ParaView GUI.

   If there are multiple views, the user should toggle through each one with the Next View and Previous View buttons in the window. After everything has been set, click on the Finish button to create the Python script.
7. The final step is specifying the name of the generated Python script. Specify a directory and a name to save the script at and click OK when finished.
Figure 2.3: Selecting filename_10.pvti as an input for the Catalyst pipeline.
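The %t substitution used in the file names above behaves roughly as in this sketch (illustrative only; the actual replacement is performed by Catalyst when each file is written):

```python
def expand_file_name(pattern, time_step):
    """Replace the %t placeholder with the current time step."""
    return pattern.replace("%t", str(time_step))

# e.g. expand_file_name("slice_%t.pvtp", 25) -> "slice_25.pvtp"
```

This is why every writer and screenshot file name must contain %t: without it, each time step would overwrite the previous output file.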
2.2.1 Creating a Representative Dataset
A question that often arises is how to create a representative dataset. There are two ways to do this. The first way is to run the simulation with a Catalyst script that outputs the full grid with all attribute information. Appendix 7.1.2 has a script that can be used for this purpose. The second way is by using the sources and filters in ParaView. The easiest grids to create within the GUI are image data grids (i.e. uniform rectilinear grids), polydata and unstructured grids. For those knowledgeable enough about VTK, the Programmable Source can also be used to create all grid types. If a multi-block grid is needed, the Group Datasets filter can be used to group together multiple datasets into a single output. The next step is to create the attribute information (i.e. point and/or cell data). This can easily be done with the Calculator filter, as it can create data with one or three components, name the array to match the name of the array provided by the adaptor, and set an appropriate range of values for the data. Once this is done, the user should save this out and then read the file back in so that the reader acts as the source for the pipeline.
2.2.2 Modifying Catalyst Python Scripts
For users that are comfortable programming in Python, we encourage them to modify the given scripts as desired. The following information can be helpful for doing this:

- The Sphinx-generated ParaView Python API documentation at www.paraview.org/ParaView3/Doc/Nightly/www/py-doc/index.html.
- Using the ParaView GUI trace functionality to determine how to create desired filters and set their parameters. This is done with Start Trace and Stop Trace under the Tools menu.
- Using the ParaView GUI Python shell with tab completion. This is done with Python Shell under the Tools menu.
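As orientation for such edits, exported Catalyst scripts of this era follow a common skeleton, sketched below. This outline is simplified pseudocode based on ParaView 4.x-generated scripts; the names in an actual exported file may differ, so always start from the script the wizard produced:

```
from paraview.simple import *
from paraview import coprocessing

def CreateCoProcessor():
    # Builds the pipeline: a producer for each labeled input (e.g. "input"),
    # the filters, and any writers with their %t file names and frequencies.
    ...

coprocessor = CreateCoProcessor()

def RequestDataDescription(datadescription):
    # Called first at each time step; tells Catalyst which grids and
    # fields the pipelines need right now.
    coprocessor.LoadRequestedData(datadescription)

def DoCoProcessing(datadescription):
    # Called when data was requested; updates the pipeline and writes
    # the data extracts and screenshots.
    coprocessor.UpdateProducers(datadescription)
    coprocessor.WriteData(datadescription)
    coprocessor.WriteImages(datadescription)
```

Most user modifications happen inside the pipeline-creation function, where filters can be added or their parameters changed.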
2.3 ParaView Live
In addition to being able to set up pipelines a priori, ParaView Live's capabilities let the analyst connect to the running simulation through the ParaView GUI in order to modify existing pipelines. This is useful for improving the quality of information coming out of a Catalyst-enabled simulation. The live connection is made through Catalyst → Connect... . This connects the simulation to the pvserver to which the data will be sent. After the connection is made, the GUI's pipeline will look like Figure 2.6. The live connection uses ParaView's concept of not performing anything computationally expensive without specific prompting by the user. Thus, by default none of the Catalyst extracts are sent to the server. This is indicated by an icon to the left of each pipeline source. To have the output from a source sent to the ParaView server, click on that icon; it will then change to indicate that the output is available on the ParaView server. This is shown for Contour0 in Figure 2.6. To stop the extract from being sent to the server, just delete the object in the ParaView server's pipeline (e.g. Extract: Contour0 in Figure 2.6).
Beginning in ParaView 4.2, the live functionality was improved to allow the user to also pause the Catalyst-enabled simulation run. This is useful for examining the simulation state at a specific point in time. The design is based on debugging tools: the simulation can be paused at the next available call to the Catalyst libraries, at a specific time step, or when the simulation time passes a specific value. Additionally, a breakpoint that has not been reached yet can be removed. These controls are available under the Catalyst menu. Note that the Pipeline Browser shows the simulation run state to signify the status of the simulation; for example, one icon indicates that a breakpoint has been set but not yet reached.
2.4 Pipeline Performance Considerations
A key point to keep in mind when creating Catalyst pipelines is that the choice and order of filters can make a dramatic difference in the performance of Catalyst (this is true of ParaView as well). Often, the source of performance degradation is dealing with very large amounts of data. For memory-limited machines like today's supercomputers, poor decisions when creating a pipeline can cause the executable to crash due to insufficient memory. The worst-case scenario is creating an unstructured grid from a topologically regular grid. This is because the filter will change from using a compact grid data structure to a more general grid data structure.
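A back-of-envelope calculation illustrates the cost. The sketch below assumes 8-byte point coordinates and cell ids and hexahedral cells, and ignores per-object overhead, so the numbers are indicative rather than exact VTK storage costs:

```cpp
#include <cstddef>

// For a topologically regular grid, geometry and topology are implicit:
// an extent, an origin, and a spacing describe the whole grid in a few
// dozen bytes. An unstructured grid must store every point coordinate
// and every cell's connectivity explicitly.
std::size_t unstructuredGeometryBytes(std::size_t nx, std::size_t ny,
                                      std::size_t nz)
{
  std::size_t numPoints = nx * ny * nz;
  std::size_t numCells = (nx - 1) * (ny - 1) * (nz - 1);
  std::size_t pointBytes = numPoints * 3 * 8;  // 3 doubles per point
  std::size_t connBytes = numCells * 8 * 8;    // 8 ids per hexahedron
  return pointBytes + connBytes;
}
// For a 100^3 grid this is roughly 24 MB of points plus 62 MB of
// connectivity -- per process -- that the compact image-data
// representation stores in effectively zero bytes.
```

This is why a single filter that converts a regular grid to an unstructured grid can exhaust memory even before any field data is considered.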
We classify the filters into several categories, ordered from most memory efficient to least memory efficient
and list some commonly used filters for each category:
1. Total shallow copy, or output independent of input: negligible memory used in creating a filter's output. The filters in this category are:
Annotate Time
Glyph
Outline
Append Attributes
Group Datasets
Outline Corners
Extract Block
Histogram
Extract Datasets
Integrate Variables
Extract Level
Normal Glyphs
Probe Location
2. Add field data: the same grid is used but an extra variable is stored. The filters in this category are:
Block Scalars
Curvature
Gradient
Calculator
Elevation
Level Scalars
Generate Ids
Median
Compute Derivatives
Mesh Quality
Random Vectors
Transform
Warp (scalar)
Surface Flow
Warp (vector)
Process Id Scalars
Surface Vectors
3. Topology changing, dimension reduction: the output is a polygonal dataset but the output cells are one or more dimensions less than the input cell dimensions. The filters in this category are:
Cell Centers
Contour
Extract CTH Fragments
Extract CTH Parts
Extract Surface
Feature Edges
Mask Points
Outline (curvilinear)
Slice
Stream Tracer
4. Topology changing, moderate reduction: reduces the total number of cells in the dataset but outputs in either a polygonal or unstructured grid format. The filters in this category are:
Clip
Decimate
Extract Selection
Quadric Clustering
Threshold
5. Topology changing, no reduction: does not reduce the number of cells in the dataset while changing the topology of the dataset, and outputs in either a polygonal or unstructured grid format. The filters in this category are:
Append Datasets
Append Geometry
Clean
Clean to Grid
Connectivity
D3
Delaunay 2D/3D
Extract Edges
Linear Extrusion
Loop Subdivision
Reflect
Rotational Extrusion
Shrink
Smooth
Subdivide
Tessellate
Tetrahedralize
Triangle Strips
Triangulate
When creating a pipeline, the filters should generally be ordered in this same fashion to limit data explosion. For example, pipelines should be organized to reduce dimensionality early. Additionally, reduction is preferred over extraction (e.g. the Slice filter is preferred over the Clip filter). Extracting should only be done when reducing by an order of magnitude or more. When outputting data extracts, subsampling (e.g. the Extract Subset filter or the Decimate filter) can be used to reduce file size, but caution should be used to make sure that the data reduction doesn't hide any fine features.
Chapter 3
A developer creating an adaptor needs to have knowledge of the simulation code's data structures, an understanding of the appropriate VTK data model, and the Catalyst API. Examples of adaptors are available
online at www.github.com/Kitware/ParaViewCatalystExampleCode.
3.1 High-Level View
While interfacing Catalyst with a simulation code may require significant effort, the impact on the code
base is minimal. In most situations, there are only three functions that need to be called from the existing
simulation code:
1. Initialize: Catalyst needs to be initialized in order to be put in the proper state. For codes that depend on MPI, this is normally done after MPI_Init() is called. The initialize method is often implemented in the adaptor.

2. CoProcess: This function calls the adaptor code to check on any computations that Catalyst may need to do. This call needs to provide the grid and field data structures to the adaptor as well as time and time step information. It may also provide additional control information, but that is not required. This is normally called at the end of every time step update in the simulation code (i.e. after the fields have been updated to the new time step and/or the grid has been modified).

3. Finalize: On completion, the simulation code must call Catalyst to finalize any state and properly clean up after itself. For codes that depend on MPI, this is normally done before MPI_Finalize() is called. The finalize method is often implemented in the adaptor.
This is demonstrated in the code below:

MPI_Init(argc, argv);
#ifdef CATALYST
CatalystInit(argc, argv);
#endif
for(int timeStep=0;timeStep<numberOfTimeSteps;timeStep++)
  {
  <update grids and fields to timeStep>
#ifdef CATALYST
  CatalystCoProcess(timeStep, time, <grid info>, <field info>);
#endif
  }
#ifdef CATALYST
CatalystFinalize();
#endif
MPI_Finalize();
The adaptor code should be implemented in a separate source file. The reason for this is that it simplifies
the simulation code build process. The fact that there are only three calls to the adaptor from the simulation
code also helps in this matter.
As shown in Figure 3.1, the adaptor code is responsible for the interface between the simulation code
and Catalyst. Besides being responsible for initializing and finalizing Catalyst, the other responsibilities of
the adaptor are:
- Querying Catalyst to see if any co-processing needs to be performed.
- Providing VTK data objects representing the grids and fields for co-processing.
The pseudo-code shown below gives an idea of what this would look like in the adaptor:

void CatalystCoProcess(<time info>, <grid info>, <field info>)
{
  // 1. Create a vtkCPDataDescription and add the expected input name.
  // 2. Set the current time and time step.
  // 3. Ask Catalyst whether any pipeline needs to execute now
  //    (vtkCPProcessor::RequestDataDescription()).
  // 4. If not, clean up and return.
  // 5. Otherwise, create or update the VTK grid from the simulation data.
  // 6. Attach the simulation's field arrays to the grid.
  // 7. Execute the pipelines via vtkCPProcessor::CoProcess().
}
A complete example of a simple adaptor is shown below. Following this section we'll discuss the details of the API to help solidify the understanding of the flow of information.
vtkCPProcessor* Processor = NULL; // static data
void CatalystInit(int numScripts, char* scripts[])
{
if(Processor == NULL)
{
Processor = vtkCPProcessor::New();
Processor->Initialize();
}
// scripts are passed in as command line arguments
for(int i=0;i<numScripts;i++)
{
vtkCPPythonScriptPipeline* pipeline =
vtkCPPythonScriptPipeline::New();
pipeline->Initialize(scripts[i]);
Processor->AddPipeline(pipeline);
pipeline->Delete();
}
}
void CatalystFinalize()
{
if(Processor)
{
Processor->Delete();
Processor = NULL;
}
}
// The grid is a uniform, rectilinear grid that can be specified
// with the number of points in each direction and the uniform
// spacing between points. There is only one field called
// temperature which is specified over the points/nodes of the grid.
void CatalystCoProcess(int timeStep, double time, unsigned int numPoints[3],
double spacing[3], double* field)
{
vtkCPDataDescription* dataDescription = vtkCPDataDescription::New();
dataDescription->AddInput("input");
dataDescription->SetTimeData(time, timeStep);
if(Processor->RequestDataDescription(dataDescription) != 0)
{
// Catalyst needs to output data
// Create an axis-aligned, uniform grid
vtkImageData* grid = vtkImageData::New();
grid->SetExtent(0, numPoints[0]-1, 0, numPoints[1]-1, 0, numPoints[2]-1);
dataDescription->GetInputDescriptionByName("input")->SetGrid(grid);
grid->Delete();
// Create a field associated with points
vtkDoubleArray* array = vtkDoubleArray::New();
array->SetName("temperature");
array->SetArray(field, grid->GetNumberOfPoints(), 1);
grid->GetPointData()->AddArray(array);
array->Delete();
Processor->CoProcess(dataDescription);
}
dataDescription->Delete();
}
3.2 Overview
Before we go into the details of the VTK and Catalyst API required for writing the adaptor code, we would
like to highlight some general details to keep in mind:
VTK indexing starts at 0.
A vtkIdType is an integer type whose size is set during Catalyst configuration. It can be either a 32-bit
or a 64-bit integer and by default is based on a native data type. A user may decide to manually configure
Catalyst to use either size. The advantage here could be reusing existing data array memory instead
of allocating extra memory to store essentially the same information in a different data type.
The most up-to-date Doxygen generated VTK documentation can be found at www.vtk.org/doc/
nightly/html/classes.html.
The most up-to-date Doxygen generated ParaView documentation can be found at www.paraview.
org/ParaView3/Doc/Nightly/html/classes.html.
The most in-depth knowledge of VTK that is required for writing the adaptor code is how to create the
VTK objects that are used to represent the grid and the field information. Since VTK is a general toolkit,
it has a variety of ways of representing grids. The reason for this is that it needs the generality of being able
to handle topologically unstructured grids while also having the constraint that it can handle simpler grids
(e.g. topologically uniform, axis-aligned grids) efficiently as well. Figure 3.2 shows the types of grids that
are supported in VTK.
In addition to these dataset types, VTK supports a wide variety of cell types. These include
all of the standard 2D and 3D linear cell types such as triangles, quadrilaterals, tetrahedra, pyramids,
prisms/wedges and hexahedra. VTK also supports associating field information with each point or cell in
the datasets. In VTK this is called attribute data in general and point data and cell data when it is with
respect to points or cells in the dataset, respectively. Figure 3.11 shows the difference between point data
and cell data. The overall structure of a VTK dataset is that it has grid information, arrays for information
associated with each point in the grid and arrays for information associated with each cell in the grid. This
is shown in Figure 3.3.
3.3
It is important to know that VTK uses a pipeline architecture to process data; see the Pipeline section of
the ParaView User's Guide (www.paraview.org/files/v4.0/ParaViewManual.v4.0.pdf). This pipeline
architecture has some consequences for writing the adaptor. The first is that the objects that process the
data, called filters in VTK parlance, are not allowed to modify their input data objects. The second is that,
since VTK is a visualization system, data objects are typically not incrementally modified (e.g. by removing
a cell from a grid) once they are created. Hence many of the data objects are stored in flat arrays in memory to preserve
computational efficiency. In the sections that follow, we do not discuss the full public API of each object
since just a few methods are used when creating data objects from scratch. In addition, most of the methods
described below are setter methods with corresponding getter methods that are not described here. For a
full description of these classes we refer the reader to the online Doxygen documentation and the VTK and
ParaView User's Guides.
3.3.1 vtkObject
Nearly all VTK classes derive from vtkObject. This class provides many basic capabilities, including reference counting to handle the creation, sharing and deletion of objects. Reference counting enables the
VTK user to track how many places an instantiated object is used as well as when it can be deleted (i.e.
when its reference count goes to zero). VTK doesn't allow vtkObjects to be created directly through their
constructors. Instead, all objects that derive from vtkObject use the static New() method to create a new
instance (this method is referred to as an object factory). Because of reference counting, users are also not
allowed to directly delete an object. Instead, the reference count is reduced on an instance by invoking the
Delete() method on it when it is no longer needed within a particular scope. The vtkObject (and its
subclasses) will then automatically be deleted when the reference count goes to zero. The following code snippet
shows an example of how VTK objects are created, referenced and deleted.
vtkDoubleArray* a = vtkDoubleArray::New(); // a's ref count = 1
vtkPointData* pd = vtkPointData::New();    // pd's ref count = 1
pd->AddArray(a);                           // a's ref count = 2
a->Delete();                               // a's ref count = 1
a->SetName("an array");                    // valid as a hasn't been deleted
pd->Delete();                              // deletes both pd and a
Some key points here: dereferencing a or pd after pd has been deleted is a bug. It is valid, though, to
dereference a pointer to a VTK object after Delete() has been called on it as long as its reference count
is one or greater. To simplify the management of objects that derive from vtkObject, vtkWeakPointer,
vtkSmartPointer and vtkNew can be used. These are covered in Appendix 7.1.1.
3.3.2 vtkDataArray
The first major VTK data object we will discuss is vtkDataArray and its concrete implementations (e.g.
vtkDoubleArray, vtkIntArray, vtkFloatArray, vtkIdTypeArray, etc.). Concrete classes that derive from
vtkDataArray typically store numerical data and are always homogeneous. They also store their data in a
contiguous block of memory and assume that the data is not sparse. Since there can be many data arrays
associated with a grid, we identify them with a string name and use the const char* GetName() and void
SetName(const char* name) methods to get and set the name of the array, respectively. vtkDataArray uses
the concept of tuples and components. A tuple is a set of data values representing a single concept
and a component is a single data value within a tuple. For example, for representing pressure there would
be a single component in each tuple. For velocity in 3D space there would be 3 components in a tuple.
The number of tuples in a vtkDataArray corresponds to the number of these objects to be represented. For
example, if the array was being used to store values at nodes, or points, of the grid, the number of tuples for
the array would be equal to the number of nodes in the grid it is defined over. This is shown in Figure 3.4
below where we have a tuple of size 3 and 6 nodes in the grid, resulting in an array of size 18.
Figure 3.4: Grid representation of a vtkDataArray specified at nodes of the grid. The node index is in red
and the array index of each tuple component is shown in blue.
vtkDataArray can either use existing memory or allocate its own space to store the data. The preferred
way is to use existing memory. If the memory layout matches what VTK is expecting, this is straightforward.
Appendix 7.1.3 details how to reuse existing simulation memory if the layout does not match what VTK
is expecting. If the memory layout matches, reusing that memory is the recommended way since no extra
memory is needed to store the information in VTK format and no memory copy operation needs to be
performed. The methods to do this for a vtkFloatArray are:
void SetArray(float* array, vtkIdType size, int save)
void SetArray(float* array, vtkIdType size, int save, int deleteMethod)
The parameters are:
array the pointer to the existing chunk of memory to be used.
size the length of the array which needs to be at least the number of tuples multiplied by the number
of components in the vtkDataArray.
save set to 1 to keep the object from deleting the memory when it is deleted or set to 0 to have the
memory deleted when the object is deleted. By default the memory will be freed using free().
deleteMethod set to VTK_DATA_ARRAY_FREE to use free() or to VTK_DATA_ARRAY_DELETE
to use delete[] to free the memory.
As VTK filters don't modify their input, it is guaranteed that Catalyst will not modify any of the values in
the passed-in array. An example of creating a vtkFloatArray from existing memory is shown below:
vtkFloatArray* arr = vtkFloatArray::New();
arr->SetName("an array");
float* values = new float[300];
arr->SetArray(values, 300, 0, VTK_DATA_ARRAY_DELETE);
arr->SetNumberOfComponents(3);
In this example, values will be deleted with delete[] when the array arr is deleted. The array will have 100
tuples and 3 components. Note that the component values still need to be assigned in this example.
If the memory layout doesn't match what VTK expects, the adaptor can allocate additional memory in
order to pass the data to Catalyst. There are multiple ways to set the size of the array. For adaptors, the
length of the array is usually known before it is constructed. In this case the user should call SetNumberOfComponents(int) first and then SetNumberOfTuples(vtkIdType) to set the proper length. The values of the
array should be set using one of the following, assuming we're using a vtkFloatArray object:
void SetValue(vtkIdType id, float value) set a single value at location id in the array.
void SetTupleValue(vtkIdType i, float* tuple) set all components of the ith tuple in the array.
It is important to note that the above methods do not perform range checking. This enables faster execution
time but at the expense of potential memory corruption. Some sample code is shown below.
vtkIntArray* arr = vtkIntArray::New();
arr->SetNumberOfComponents(3);
arr->SetNumberOfTuples(100);
arr->SetName("an array");
int tuple[3];
for(vtkIdType i=0;i<100;i++)
{
tuple[0] = <value>;
tuple[1] = <value>;
tuple[2] = <value>;
arr->SetTupleValue(i, tuple);
}
If the array length isn't known ahead of time, then the following methods, which perform range checking and
allocate memory as necessary, should be used, again assuming that the object is a vtkFloatArray:
vtkIdType InsertNextValue(float value) set a single value in the next location in the array and return
the newly created array index.
vtkIdType InsertNextTupleValue(const float* tuple) set the next tuple of values in the array and
return the newly created tuple index.
void InsertValue(vtkIdType id, float value) set the value at location id in the array to value.
void InsertTupleValue(vtkIdType i, const float* tuple) set the tuple values for the array at tuple
location i.
Note that all of these methods will allocate memory as needed. Similar to C++'s std::vector, memory is
not reallocated on every call; instead the capacity is doubled whenever an insert goes beyond it. The
Squeeze() method can be used to reclaim any unused capacity. For the last two functions, the user needs
to be careful since using them can result in uninitialized values being contained in the array.
3.3.3 Grid Types
VTK has a variety of grid types to choose from. They all derive from vtkDataSet and are inherently spatial structures. vtkDataSet also allows the subcomponents, i.e. the points and cells in VTK parlance, to
have attributes stored in vtkDataArrays set on them. The types of grids available in VTK are polygonal
mesh/polydata, unstructured grid, structured (curvilinear) grid, rectilinear grid and image data/uniform
rectilinear grid. In VTK these grid types correspond to the following classes, respectively: vtkPolyData,
vtkUnstructuredGrid, vtkStructuredGrid, vtkRectilinearGrid and vtkImageData/vtkUniformGrid. Examples of each are shown in Figure 3.2. The class hierarchy is shown in Figure 3.5.
The most efficient grids for storage take advantage of a predefined topology and geometry. They are also
the least general. Note that vtkUniformGrid and vtkStructuredGrid both support blanking.
Topologically Structured Grids
vtkImageData, vtkUniformGrid, vtkRectilinearGrid and vtkStructuredGrid all assume a regular grid topology. When iterating over points or cells, the order is fastest in the logical i direction, next in the logical
j direction and finally in the logical k direction. For axis-aligned grids these correspond to the x-, y-, and
z-directions, respectively. For these regular grids, we use what are called extents for describing their topology
as well as how they are partitioned over multiple processes. Extents are arrays of 6 integers which specify
the start and end indices of the points in each of the three logical directions. The whole extent is the extent
for the entire grid and sub-extent, often referred to just as the extent, is one portion of the whole extent
that is accessed at a time. For adaptors the sub-extent will usually correspond to an individual process's
part of the grid. This is shown in Figure 3.6. Note that due to using extents, the partitioning is forced to
be logically blocked into contiguous pieces. While extents exist for each logical direction, these grids are not
required to have more than a single point in any logical direction. This allows the creation of 1D, 2D or 3D
structured grids.
Figure 3.6: Three process partition of a grid. The whole extent for the points is (0, 8, 0, 6, 0, 0). Process
0 (blue) has an extent of (0, 2, 0, 6, 0, 0) which results in 21 points and 12 cells, process 1 (grey) has an
extent of (2, 8, 0, 3, 0, 0) which results in 28 points and 18 cells, and process 2 (red) has an extent of (2, 8,
3, 6, 0, 0) which results in 28 points and 18 cells.
For each of VTK's topologically structured grid types, the user must set the extent on each process. This
can be done with either of the two following methods:
void SetExtent (int x1, int x2, int y1, int y2, int z1, int z2)
void SetExtent (int extent[6])
Negative values for extents can be used as long as the second extent in a direction is greater than or equal
to the first. The user should not use the SetDimensions() methods of any of these classes as this will cause
problems with partitioning the structured grids in parallel. Additionally, in the adaptor the user must call
either of the following two methods on the vtkCPInputDataDescription object to set the whole extent
of the grid:
void SetWholeExtent (int x1, int x2, int y1, int y2, int z1, int z2)
void SetWholeExtent (int extent[6])
We will go into the details of this later but it is worth mentioning here as this step is often forgotten.
As we mentioned earlier, iterating over points and cells is fastest in the logical i direction, then the logical
j direction, and slowest in the logical k direction. The flat indexing of points and cells is local to each
process's extent, while the logical coordinates are with respect to the whole extent. For example, the flat
indexing and logical indexing of the points and cells are shown in Figure 3.7 below for process 2's partition in Figure 3.6.
Figure 3.7: The cell numbering (in white) and the point numbering for the corners for process 2's
extent in the above figure. The first number is the flat index and the set of numbers in parentheses are the
global logical coordinates.
grid->SetXCoordinates(xCoords);
xCoords->Delete();
grid->SetYCoordinates(yCoords);
yCoords->Delete();
In this example, the grid has 231 points and 200 2D cells. The points are irregularly spaced in both the X
and Y directions. Since the grid only has one layer of points in the z direction the z coordinates array does
not need to be set explicitly. This results in having the grid lie in the z=0 plane.
vtkPointSet
The remaining grids, vtkStructuredGrid, vtkPolyData and vtkUnstructuredGrid, are all geometrically irregular grids. Consequently, they all derive from vtkPointSet, which explicitly stores the point locations in a
vtkPoints object that has a vtkDataArray as a data member. The first way to set the point coordinates of
the grid is to create a vtkDataArray and use vtkPoints' void SetData(vtkDataArray* coords) method. The
vtkDataArray object must have three components (i.e. a tuple size of 3) in order to be used as the coordinates
of a vtkPointSet. The other option is to build up the points array directly in vtkPoints. The first method
to call sets the proper data precision for the coordinate representation, using
void SetDataTypeToFloat() or void SetDataTypeToDouble(). If the number of points is known a priori, the
next call should set the number of points with void SetNumberOfPoints(vtkIdType numberOfPoints).
After that, the coordinates can be set with the following methods:
void SetPoint(vtkIdType id, double x, double y, double z)
void SetPoint(vtkIdType id, float x[3])
void SetPoint(vtkIdType id, double x[3])
It is important to remember that the Set methods are the fastest, but that is because they don't
do range checking (i.e. they can overwrite memory not allocated by the array).
If the number of points is not known a priori, then the user should allocate an estimated size with the
int Allocate(const vtkIdType size, const vtkIdType ext=1000) method, where size is the estimated number
of points. When a value is inserted beyond the vtkPoints object's capacity, the capacity is doubled. The
ext parameter once controlled this growth but is no longer used. As with vtkDataArray, there
are Insert methods to add in coordinate values to vtkPoints and allocate memory as needed. They are:
void InsertPoint (vtkIdType id, double x, double y, double z)
void InsertPoint (vtkIdType id, const float x[3])
void InsertPoint (vtkIdType id, const double x[3])
vtkIdType InsertNextPoint (double x, double y, double z)
vtkIdType InsertNextPoint (const float x[3])
vtkIdType InsertNextPoint (const double x[3])
We reiterate the warning that using InsertPoint() improperly may lead to having uninitialized data in the
array. Use void Squeeze() to reclaim unused memory.
The final step is to define the vtkPoints in the vtkPointSet via the void SetPoints(vtkPoints* points)
method.
vtkStructuredGrid
vtkStructuredGrid is still a topologically regular grid but is geometrically irregular. All of the major functions
for creating a vtkStructuredGrid have already been discussed. The only thing left to mention is that the
ordering of the coordinates in vtkPoints must match the order in which the points are iterated through, as
shown in the figure above. An example of creating a structured grid is:
vtkUnstructuredGrid
vtkUnstructuredGrid supports all VTK cell types. It also derives from vtkPointSet for storing point information and uses a single vtkCellArray to store all of the cells. Similar to vtkPolyData, we recommend using
void Allocate(vtkIdType size) to pre-allocate memory for storing cells. In this case though, we recommend
a value of numCells*(numPointsPerCell+1) for the size. For inserting cells, one of the following
methods should be used:
vtkIdType InsertNextCell(int type, vtkIdType numPoints, vtkIdType* pts)
vtkIdType InsertNextCell(int type, vtkIdList* pts)
These are the same as for vtkPolyData. Similarly, the example for creating points and cells for a vtkUnstructuredGrid is the same as for vtkPolyData.
vtkCellArray
The functions listed above for adding cells to either vtkUnstructuredGrid or vtkPolyData are the simplest to
use. The drawback of this approach is that it won't reuse existing memory for storing the cell connectivity
arrays. Internally, vtkUnstructuredGrid and vtkPolyData store this information in vtkCellArray objects. vtkCellArray is a supporting object that explicitly represents cell connectivity using a vtkIdTypeArray.
The data in the array is stored in the form (n, id1, id2, ..., idn, n, id1, id2, ..., idn, ...) where n is the number
of points in the cell and each id is a zero-offset index into the vtkDataArray in vtkPoints. This is shown in
Figure 3.10. Advantages of this data structure are its compactness, simplicity and easy interface to external
data. However, it is poorly suited to random access. We include this information for completeness, but
unless a user's native simulation data structures match the vtkCellArray form, we suggest that users add
cells to vtkPolyData and vtkUnstructuredGrid through the interfaces of those classes rather than by creating
a vtkCellArray and directly populating it with data. Note that if vtkCellArray is directly used with existing allocated memory, the user can configure Catalyst to have vtkIdType match the native type that the
simulation code uses to store ids.
To use an existing array with vtkCellArray, first create a vtkIdTypeArray and
use the SetArray() method to reuse the existing memory. Next, use void SetCells(vtkIdType numberOfCells,
vtkIdTypeArray* cells) to use that array as the cells in the vtkCellArray. The next steps depend on which grid type is
being used.
As mentioned above, vtkPolyData actually has four vtkCellArrays to store its cells: one for
vertex and polyvertex cells, one for line and polyline cells, one for triangle, quadrilateral, polygonal and pixel
cells, and one for triangle strips. To set the cell arrays, the following methods should be used in the given
order:
void SetVerts(vtkCellArray* v)
void SetLines(vtkCellArray* l)
void SetPolys(vtkCellArray* p)
void SetStrips(vtkCellArray* s)
void BuildCells()
Any of the above Set methods can be skipped if there are no corresponding cells of that type. This
order should be followed to ensure that the ordering of the cells matches any cell data attributes that exist.
The last method builds up the full cell information that enables random access to a vtkPolyData's cells.
For vtkUnstructuredGrid, for simplicity we assume that there aren't any polyhedral cells. In this case
the following methods can be used:
void SetCells(int type, vtkCellArray* cells)
void SetCells(int* types, vtkCellArray* cells)
the point data will have continuous values as long as the points are properly associated with the cells. In
general, cell data will be discontinuous unless there is a constant value for the field.
The main class for field data is vtkFieldData which is a container to store vtkDataArrays. The arrays are
stored in a flat array of memory and accessed either by their index location or the name of the vtkDataArray.
The main method here is int AddArray(vtkAbstractArray* array). This method appends an array to the
vtkFieldData objects list of arrays unless an array with that name already exists. If an array with that
name already exists then it replaces that with the passed in vtkDataArray. The return value is the index
in the list where the array was inserted. Note that every vtkDataObject has a vtkFieldData member object
which can be accessed through the vtkFieldData* GetFieldData() method. This can be used for storing
meta-data about the vtkDataObject and the arrays stored in the vtkFieldData do not have to have the same
number of tuples.
If we want to store arrays where the tuples are associated with either points or cells, we use vtkPointData
and vtkCellData, respectively. Both of these derive from vtkFieldData. Every vtkDataSet has both a
vtkPointData and a vtkCellData object and they are accessed with vtkPointData* GetPointData() and
vtkCellData* GetCellData(). Note that the number of tuples of each array in these objects should match
the number of grid entities of the corresponding type. There is no explicit check when
inserting arrays into either of these, but many filters will give warnings and/or fail if this condition isn't met.
The following snippet of code demonstrates how arrays are added to point data and cell data.
vtkDoubleArray* pressure = vtkDoubleArray::New();
pressure->SetNumberOfTuples(grid->GetNumberOfPoints());
pressure->SetName("pressure");
grid->GetPointData()->AddArray(pressure);
pressure->Delete();
vtkFloatArray* temperature = vtkFloatArray::New();
temperature->SetName("temperature");
temperature->SetNumberOfTuples(grid->GetNumberOfCells());
grid->GetCellData()->AddArray(temperature);
temperature->Delete();
3.3.4 Multi-Block Datasets
So far we've covered all of the main datasets and how to define attributes over them (i.e. the point and cell
data). For many situations though, we will want to use multiple datasets to represent our simulation data
structures. Examples include overlapping grids (e.g. AMR) or cases where a single dataset type isn't appropriate
for storing the cell topology (e.g. using both a vtkUniformGrid and a vtkUnstructuredGrid). The main class
for this is the vtkCompositeDataSet class. This is an abstract class intended to simplify iterating
through the vtkDataSets stored in its concrete derived classes. There are two main
types of composite datasets. The first type is for AMR type grids where only vtkUniformGrid datasets
are used to discretize the domain. These types of composite data sets have support for automatically
stitching the grids together through blanking. The two classes for AMR grids are vtkOverlappingAMR and
vtkNonOverlappingAMR which both derive from vtkUniformGridAMR. The second composite dataset type
supports all grids that derive from vtkDataSet but require any blanking needed for overlapping grids to be
taken care of explicitly. The two classes for these are vtkMultiBlockDataSet and vtkMultiPieceDataSet and
both derive from vtkDataObjectTree. The multi-block dataset hierarchy is shown in Figure 3.12. Because
vtkCompositeDataSet derives from vtkDataObject it has a vtkFieldData object that can be accessed by the
vtkFieldData* GetFieldData() method. This can be useful for storing meta-data.
vtkMultiBlockDataSet
vtkMultiBlockDataSet is the most general of the concrete implementations of vtkCompositeDataSet. Each
block can contain either any vtkDataSet type or any vtkCompositeDataSet type. This leads to a hierarchy
of datasets that can be stored in a vtkMultiBlockDataSet. An example of this is shown in Figure 3.13.
The vtkMultiBlockDataSet can be used recursively to store blocks at different levels. For each level at which
a vtkMultiBlockDataSet is used, the adaptor should set the number of blocks at that level using the void
SetNumberOfBlocks(unsigned int numBlocks) method. In parallel, the tree hierarchy must match on each
process but leaves of the tree are only required to be non-empty on at least one process. If the leaf is a
vtkDataSet then it should be non-empty on exactly one process. To set a sub-block of a vtkMultiBlockDataSet, use the void SetBlock(unsigned int blockNumber, vtkDataObject* dataObject) method. This
method assigns dataObject into the blockNumber location of its direct children.
vtkMultiPieceDataSet
The vtkMultiPieceDataSet class groups datasets that span multiple processes together into a single logical
sub-block of a vtkCompositeDataSet. The purpose is to help avoid some of the rigidity of concrete vtkDataSet
instances while maintaining their logical grouping. One example is the rigidity of partitioning topologically
regular grids into logically rectangular blocks of cells: a process may own multiple sub-blocks of a topologically
regular grid that cannot be combined into a single logically rectangular sub-block. Another use for the vtkMultiPieceDataSet is for a sub-block that is a partitioned vtkDataSet. In this case the pieces are logically grouped together
but can't be stored in the same sub-block of a vtkMultiBlockDataSet, since each process would think that it
contains the entire vtkDataSet. The vtkMultiPieceDataSet is a flat structure with all of its children being
the same grid type. The methods that are used to set the pieces of the vtkMultiPieceDataSet are:
void SetNumberOfPieces(unsigned int numPieces) sets the number of pieces to be contained in the
vtkMultiPieceDataSet. This should be the same value on each process. When there is a single piece
per process the value of numPieces will be the number of processes.
void SetPiece(unsigned int pieceNumber, vtkDataObject* piece) sets piece for the pieceNumber location. Note that piece must be a vtkDataSet even though the method signature allows a vtkDataObject
to be passed in.
Note that vtkMultiPieceDataSet is intended to be included in other composite datasets, e.g. vtkMultiBlockDataSet or vtkOverlappingAMR. There is no writer in ParaView that can handle a vtkMultiPieceDataSet as
the main input so adaptors should nest any multi-piece datasets in a separate composite dataset. An example
of creating a multi-piece dataset where we only partition the grid in the x-direction is shown below:
vtkImageData* imageData = vtkImageData::New();
imageData->SetSpacing(1, 1, 1);
imageData->SetExtent(0, 50, 0, 100, 0, 100);
int mpiSize = 1;
int mpiRank = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &mpiRank);
MPI_Comm_size(MPI_COMM_WORLD, &mpiSize);
vtkMultiPieceDataSet* multiPiece = vtkMultiPieceDataSet::New();
multiPiece->SetNumberOfPieces(mpiSize);
imageData->SetOrigin(50*mpiRank, 0, 0);
multiPiece->SetPiece(mpiRank, imageData);
imageData->Delete();
vtkMultiBlockDataSet* multiBlock = vtkMultiBlockDataSet::New();
multiBlock->SetNumberOfBlocks(1);
multiBlock->SetBlock(0, multiPiece);
multiPiece->Delete();
vtkUniformGridAMR
The vtkUniformGridAMR class is used to deal with AMR grids and to automate the process of nesting
the grids and blanking the appropriate points and cells, if blanking is needed. The first call to use in
constructing a vtkUniformGridAMR or any of its derived classes is the void Initialize(int numLevels, const
int* blocksPerLevel) method. This specifies how many levels there will be in the AMR data object and how
many blocks are in each level. Note that blocksPerLevel needs to have at least numLevels values. The values
passed into Initialize() need to match on every process. Other class methods which should be used in the
order listed are:
void SetDataSet(unsigned int level, unsigned int idx, vtkUniformGrid* grid) Once the uniform grid
has been created, it can be added to the vtkUniformGridAMR with this method. Note that the coarsest
level is 0 and that idx is the index that grid is to be inserted at for the specified level (i.e. 0 <= idx <
blocksPerLevel[level], where blocksPerLevel was passed into the Initialize() method).
void SetGridDescription(int gridDescription) The values of gridDescription specify what geometric
coordinates the uniform grids are meant to discretize. For example, VTK_XYZ_GRID indicates that
the vtkUniformGridAMR discretizes a volume (the default value) and VTK_XZ_PLANE indicates that
the vtkUniformGridAMR discretizes an area in the XZ plane. The definitions of appropriate values
for gridDescription are in vtkStructuredData.h.
vtkOverlappingAMR
The vtkOverlappingAMR grid is for cases where the set of uniform grids overlaps in space, requiring blanking
to determine which grid is used to discretize the domain and to specify attributes over. This
is the appropriate composite dataset for Berger-Colella type AMR grids. Because of this hierarchy there is
the notion of a global origin of the composite dataset. This is set with the void SetOrigin(double* origin)
method. A key point for vtkOverlappingAMR composite datasets is that the spacing is unique to each level
and must be maintained by the vtkUniformGrids that are used to discretize the domain at that level of the
composite dataset. This is done with the following method:
void SetSpacing(unsigned int level, const double spacing[3])
This needs to be called for each level with level 0 being the coarsest level. spacing is the distance between
consecutive points in each Cartesian direction. In addition to the spacing of each level, the nested hierarchy
must also be built up. vtkAMRBox is a helper class used to determine the hierarchy of the uniform grids
and their respective blanking. The main thing to keep in mind is that there is both a global origin value as
well as an origin value for each block. Additionally, vtkAMRBox needs the dimensions of each block and
the spacing for each level. The dimensions are the number of points in each direction. The main function of
interest for vtkAMRBox is the constructor which passes in all of the necessary values:
vtkAMRBox (const double *origin, const int *dimensions, const double *spacing, const double
*globalOrigin, int gridDescription=VTK_XYZ_GRID) Here, origin is the minimum bounds of the
vtkUniformGrid that the box represents, dimensions is the number of points in each of the grid's logical
directions, spacing is the distance between points in each logical direction, globalOrigin is the minimum
bounds of the entire composite dataset and gridDescription specifies the logical coordinates that the
grid discretizes.
Once the vtkAMRBox is created for a vtkUniformGrid, the following methods should be used:
void SetAMRBox (unsigned int level, unsigned int id, const vtkAMRBox &box) level is the
hierarchical level that box belongs to and id is the index at that level for the box. Note that, similar to
the SetDataSet() method, valid values of id range from 0 up to, but not including, the global number
of vtkUniformGrids at that level.
void SetAMRBlockSourceIndex (unsigned int level, unsigned int id, int sourceId) This method is
very similar to the SetDataSet() method but instead of specifying the dataset for a given level and
index at that level, it specifies the sourceId in the global composite dataset hierarchy. This is the
overall composite index of the dataset and can be set as the total number of datasets that exist at
coarser levels plus the number of datasets that have a lower index but are at the same level as the
given dataset.
After this has been done for each dataset, the void GenerateParentChildInformation() method needs to
be called. This method generates the proper relations between the blocks and the blanking inside of each
block. After this has been done, the uniform grids should be added with SetDataSet(). An example of
creating a vtkOverlappingAMR composite dataset is included below to help elucidate how all of this comes
together.
int numberOfLevels = 3;
int blocksPerLevel[] = {1, 1, 1};
vtkOverlappingAMR* amrGrid = vtkOverlappingAMR::New();
amrGrid->Initialize(numberOfLevels, blocksPerLevel);
amrGrid->SetGridDescription(VTK_XYZ_GRID);
double origin[] = {0,0,0};
double level0Spacing[] = {4, 4, 4};
double level1Spacing[] = {2, 2, 2};
double level2Spacing[] = {1, 1, 1};
amrGrid->SetOrigin(origin);
int level0Dims[] = {25, 25, 25};
vtkAMRBox level0Box(origin, level0Dims, level0Spacing, origin, VTK_XYZ_GRID);
int level1Dims[] = {20, 20, 20};
vtkAMRBox level1Box(origin, level1Dims, level1Spacing, origin, VTK_XYZ_GRID);
int level2Dims[] = {10, 10, 10};
vtkAMRBox level2Box(origin, level2Dims, level2Spacing, origin, VTK_XYZ_GRID);
amrGrid->SetSpacing(0, level0Spacing);
amrGrid->SetAMRBox(0, 0, level0Box);
amrGrid->SetSpacing(1, level1Spacing);
amrGrid->SetAMRBox(1, 0, level1Box);
amrGrid->SetSpacing(2, level2Spacing);
amrGrid->SetAMRBox(2, 0, level2Box);
amrGrid->GenerateParentChildInformation();
// the coarsest grid (level 0): 25 points in each direction
vtkUniformGrid* level0Grid = vtkUniformGrid::New();
level0Grid->SetSpacing(level0Spacing);
level0Grid->SetOrigin(0, 0, 0);
level0Grid->SetExtent(0, 24, 0, 24, 0, 24);
amrGrid->SetDataSet(0, 0, level0Grid);
level0Grid->Delete();
// the middle grid (level 1): 20 points in each direction
vtkUniformGrid* level1Grid = vtkUniformGrid::New();
level1Grid->SetSpacing(level1Spacing);
level1Grid->SetExtent(0, 19, 0, 19, 0, 19);
amrGrid->SetDataSet(1, 0, level1Grid);
level1Grid->Delete();
// the finest grid (level 2): 10 points in each direction
vtkUniformGrid* level2Grid = vtkUniformGrid::New();
level2Grid->SetSpacing(level2Spacing);
level2Grid->SetExtent(0, 9, 0, 9, 0, 9);
amrGrid->SetDataSet(2, 0, level2Grid);
level2Grid->Delete();
vtkNonOverlappingAMR
The vtkNonOverlappingAMR grid is for the case of groups of vtkUniformGrids that do not overlap but can
have grids that are associated with different levels of the hierarchy. Note that the adaptor could arbitrarily
assign all vtkUniformGrids to the coarsest level, but this would discard the hierarchical information
that storing the grids at different levels preserves. The methods of interest for constructing non-overlapping
composite datasets are all in its vtkUniformGridAMR superclass. An example is included below
which demonstrates the construction of a vtkNonOverlappingAMR grid.
int numberOfLevels = 3;
int blocksPerLevel[] = {1, 2, 1};
vtkNonOverlappingAMR* amrGrid = vtkNonOverlappingAMR::New();
amrGrid->Initialize(numberOfLevels, blocksPerLevel);
// the coarsest grid (level 0)
vtkUniformGrid* level0Grid = vtkUniformGrid::New();
level0Grid->SetSpacing(4, 4, 4);
level0Grid->SetOrigin(0, 0, 0);
level0Grid->SetExtent(0, 10, 0, 20, 0, 20);
amrGrid->SetDataSet(0, 0, level0Grid);
level0Grid->Delete();
// the first mid-level grid
vtkUniformGrid* level1Grid0 = vtkUniformGrid::New();
level1Grid0->SetSpacing(2, 2, 2);
level1Grid0->SetOrigin(40, 0, 0);
level1Grid0->SetExtent(0, 8, 0, 20, 0, 40);
amrGrid->SetDataSet(1, 0, level1Grid0);
level1Grid0->Delete();
// the second mid-level grid
vtkUniformGrid* level1Grid1 = vtkUniformGrid::New();
level1Grid1->SetSpacing(2, 2, 2);
level1Grid1->SetOrigin(40, 40, 0);
level1Grid1->SetExtent(0, 40, 0, 20, 0, 40);
amrGrid->SetDataSet(1, 1, level1Grid1);
level1Grid1->Delete();
// the finest grid (level 2)
vtkUniformGrid* level2Grid = vtkUniformGrid::New();
level2Grid->SetSpacing(1, 1, 2);
level2Grid->SetOrigin(0, 0, 0);
level2Grid->SetExtent(56, 120, 0, 40, 0, 40);
amrGrid->SetDataSet(2, 0, level2Grid);
level2Grid->Delete();
3.4 Grid Partitioning
We have already briefly covered partitioning the grid for parallel computing, but it is an important enough
topic that it deserves a more complete discussion. The driving motivation here is to use the
existing partitioning of the simulation grid. We assume that most filters will scale well with the existing grid
partitioning supplied by the simulation. VTK's datasets and composite datasets cover a wide enough range
of use cases that it should be rare that interprocess communication will be necessary to migrate simulation
grid data in order to properly create partitioned VTK grid data. VTK does assume a cell-based partitioning
of the grid where a cell is uniquely represented on a single process.
For topologically structured grids, partitioning is done via extents as discussed above. For topologically
regular grids, the developer has two choices for partitioning the grid. The first is using the grid that derives
from vtkDataSet and the second is using a vtkMultiBlockDataSet for each partition of the grid. Due to
the rigidity of the local extents and how they interact with the VTK pipeline, we recommend the
vtkMultiBlockDataSet approach, where each process's partition of the dataset is inserted as a block in the
composite dataset. This allows the extents to be independent for each block. For situations where a process's
partitioning of a topologically regular grid is not convex in the logical coordinates, a process can contribute
multiple blocks to a vtkMultiBlockDataSet such that each block is a convex logical subset of the total grid.
This is also useful for situations where the simulation data is chunked into smaller blocks to decrease cache
misses during runs. An example of a multi-block dataset with one block per process is shown below where
numberOfProcesses is the number of MPI processes and rank is the MPI rank of a process:
vtkMultiBlockDataSet* multiBlock = vtkMultiBlockDataSet::New();
multiBlock->SetNumberOfBlocks(numberOfProcesses);
multiBlock->SetBlock(rank, dataSet);
For vtkPolyData and vtkUnstructuredGrid, partitioning is straightforward when using cell-based
partitionings of the simulation's grid. Ghost cells should not be added to the VTK grids.
For point-based partitionings of the grid, there is typically an overlap of a single layer of cells that
exists on multiple processes. These cells need to be assigned uniquely to a single process for VTK grids and
partitionings. Similar to topologically regular grids, vtkPolyData and vtkUnstructuredGrid datasets can
also be inserted into a vtkMultiBlockDataSet to allow for multiple blocks per process.
3.5
For a full description of VTK's pipeline architecture, we refer the reader to the VTK User's Guide (www.
kitware.com/products/books/vtkguide.html). We include a summary description of this architecture
since it is key to understanding how Catalyst outputs are generated. From a high level, Catalyst simply
defines and configures VTK pipelines that are executed at defined points in a simulation run.
VTK uses a data flow approach to transform information into desired forms. The desired form may be
derived quantities, subsetted quantities and/or graphical information. The transformations are performed
by filters in VTK. These filters take in data and perform operations based on a set of input parameters to the
filter. Most VTK filters do a very specific operation but by chaining multiple filters together a wide variety
of operations can be performed to transform the data. A filter that doesn't have any input from a separate filter
is called a source and a filter that doesn't send its output to any other filters is called a sink. An example
of a source filter would be a file reader and an example of a sink filter would be a file writer. We call this
set of connected filters the pipeline. For Catalyst, the adaptor acts as the source filter for all pipelines. An
example of this is shown in Figure 3.14.
The pipeline's task is to configure, execute, and pass vtkDataObjects between the filters. The pipeline
can be viewed as a directed, acyclic graph. Some key features of VTK's pipeline are:
Filters are not allowed to modify their input data objects.
It is demand-driven, meaning that filters only execute when something downstream requests that they
execute.
A filter will only re-execute if a request changed or something upstream changed.
Filters can have multiple inputs or outputs.
Filters can send their output to multiple separate filters.
This affects Catalyst in several key ways. The first is that the adaptor can use existing memory when
building the VTK data structures. The reason for this is that any VTK filter which operates on that data
will either create a new copy if the data needs to change or reuse the existing data through reference counting
if the data won't be modified. The second key way is that the pipeline will only be re-executed when it is
specifically requested.
3.6
In this section we discuss how the adaptor passes information back and forth between the simulation code
and Catalyst. This information can be broken up into three areas:
3.6.1 High-Level View
Before diving into the details of the API, we want to describe the flow of information and its purpose to help
give a higher level of understanding of how the pieces work together. The first step is initialization which
sets up the Catalyst environment and creates the pipelines that will be executed later on. This is typically
called near the beginning of the simulation shortly after MPI is initialized. The next step is to execute the
pipelines if needed. This is usually done at the end of each time step update. The final step is finalizing
Catalyst and is usually done right before MPI is finalized.
The first and last steps are fairly simple, but the middle step has a lot happening under the covers.
Essentially, the middle step queries the pipelines to see if any of them need to be executed. If they don't, then
it immediately returns control back to the simulation code. In our experience, this is nearly instantaneous.
This must be fast since we expect many calls here and don't want to waste valuable compute cycles. If one
or more pipelines need to re-execute, then the adaptor needs to update the VTK data objects representing
the grid and attribute information and then execute the desired pipelines. Depending on the amount of
work that needs to be done by the filters in the pipeline, this can take a wide range of time. Once all of the
pipelines that need to be re-executed finish, control is returned back to the simulation code.
3.6.2 Class API
The main classes of interest for the Catalyst API are vtkCPProcessor, vtkCPDataDescription, vtkCPInputDataDescription, vtkCPPipeline and the derived classes that are specialized for Python. When Catalyst is
built with Python support, all of these classes are Python wrapped as well.
vtkCPProcessor
vtkCPProcessor is responsible for managing the pipelines. This includes storing them, querying them to see
if they need to be executed and executing them. The methods of interest for vtkCPProcessor are:
int Initialize() initializes the object and sets up Catalyst. This should be done after MPI_Init() is
called. This initialization method assumes that MPI_COMM_WORLD is used.
int Initialize(vtkMPICommunicatorOpaqueComm& comm) initializes the object and sets up
Catalyst. This initialization method uses an MPI communicator contained in comm, which can be
something besides MPI_COMM_WORLD. vtkMPICommunicatorOpaqueComm is defined in vtkMPI.h and
is used to avoid having to directly include the mpi.h header file. Note that this method was added in
ParaView 4.2.
int Finalize() releases all resources used by Catalyst. This should be done before MPI_Finalize() is
called.
field attributes. This is demonstrated in Figure 3.15. This needs to be called before vtkCPProcessor::RequestDataDescription() is called but only needs to be called once per simulation time step for
each input.
void SetForceOutput(bool on) this allows the adaptor to force all of the pipelines to execute by
calling this method with on set to true. By default it is false and it is reset after each call to
vtkCPProcessor::CoProcess(). In general, the adaptor won't know when a pipeline will want to execute,
but in certain situations the adaptor may realize that some noteworthy event has occurred, such as a
key simulation feature appearing or the last time step of the simulation being reached. In this situation
the adaptor can use this method to make sure that all pipelines execute. Note that user implemented
classes that derive from vtkCPPipeline should include logic for this. If this is used it should be called
before calling vtkCPProcessor::RequestDataDescription().
After vtkCPProcessor::RequestDataDescription() has been called, if the method returned 1 then the
adaptor needs to get the information set in the vtkCPPipeline objects. This information is used to determine
what data to provide to the pipelines for the following vtkCPProcessor::CoProcess() call. The following
methods can be used to get the vtkCPInputDataDescription object which is used to pass the grid to the
pipelines:
vtkCPInputDataDescription* GetInputDescription(unsigned int)
vtkCPInputDataDescription* GetInputDescriptionByName(const char* name)
For adaptors that provide a single pipeline input (i.e. AddInput() has only been called once), the conventional
arguments for the above two methods are 0 and "input", respectively. If multiple grid inputs are provided by
the adaptor, it's possible that not all of them are needed. To determine which ones are needed to update
the required pipelines the following method can be used:
bool GetIfGridIsNecessary(const char* name)
While vtkCPDataDescription is intended to pass the above information back and forth between the adaptor
and the pipelines, for user-developed pipelines there may be more information necessary to pass back and
forth. In this case, there is a user data object that can be used for this purpose. Currently we use a
vtkFieldData object for this functionality. The reasons for this are that it is Python wrapped and it can
hold a variety of data types through its intended use of aggregating classes that derive from vtkAbstractArray.
The classes that derive from vtkAbstractArray are Python wrapped as well. The methods for this are:
void SetUserData(vtkFieldData* data)
vtkFieldData* GetUserData()
vtkCPInputDataDescription
The vtkCPInputDataDescription class is similar to vtkCPDataDescription in that it passes information between the adaptor and the pipelines. The difference though is that vtkCPInputDataDescription is meant
to pass information about the grids and fields. As mentioned above, there should be a vtkCPInputDataDescription object in vtkCPDataDescription for each separate input VTK data object provided by the adaptor.
The main methods of interest are:
void SetGrid (vtkDataObject *grid) set the input data object representing the grids and their attributes for the pipelines.
void SetWholeExtent (int, int, int, int, int, int) or void SetWholeExtent (int[6]) for topologically
regular grids, set the whole extent for the entire grid.
There are a variety of other methods that are intended to increase the efficiency of the adaptor. Their purpose
is to inform the adaptor code which attributes are needed by the pipelines. This remains potential future
work for Catalyst, so for now the methods above are the proper ones to use for this class.
3.6.3
Here we provide details through a summary example. Note that there are more examples available at
www.github.com/Kitware/ParaViewCatalystExampleCode.
Initialization steps
Finalization steps
Note that items with an asterisk only need to be done the first time the co-processing routines are
executed as long as they remain persistent data structures. In the code below we walk the reader through a
full example of a simplified adaptor.
// declare some static variables
vtkCPProcessor* Processor = NULL;
vtkUnstructuredGrid* VTKGrid;
// Initialize Catalyst and pass in some file names
// for Python scripts.
void Initialize(int numScripts, char* scripts[])
{
if(Processor == NULL)
{
Processor = vtkCPProcessor::New();
Processor->Initialize();
}
else
{
Processor->RemoveAllPipelines();
}
// Add in the Python script
for(int i=0;i<numScripts;i++)
{
vtkCPPythonScriptPipeline* pipeline = vtkCPPythonScriptPipeline::New();
pipeline->Initialize(scripts[i]);
Processor->AddPipeline(pipeline);
pipeline->Delete();
}
}
// clean up at the end
void Finalize()
{
if(Processor)
{
Processor->Delete();
Processor = NULL;
}
if(VTKGrid)
{
VTKGrid->Delete();
VTKGrid = NULL;
}
}
// The simulation calls this method at the end of every time
// step. grid and attributes are the simulation data structures.
// lastTimeStep is a flag indicating whether this will be
// the last time CoProcess is called. It is used to force all of
// the pipelines to execute.
void CoProcess(Grid& grid, Attributes& attributes, double time,
unsigned int timeStep, bool lastTimeStep)
{
vtkCPDataDescription* dataDescription = vtkCPDataDescription::New();
// specify the simulation time and time step for Catalyst
dataDescription->AddInput("input");
dataDescription->SetTimeData(time, timeStep);
if(lastTimeStep == true)
{
// assume that we want all of the pipelines to execute if it
// is the last time step.
dataDescription->ForceOutputOn();
}
if(Processor->RequestDataDescription(dataDescription) != 0)
{
// Catalyst wants to perform co-processing. We need to build
// the VTK grid and set the attribute information on it now.
BuildVTKDataStructures(grid, attributes);
// Make a map from input to our VTK grid so that
// Catalyst gets the proper input dataset for the pipeline.
dataDescription->GetInputDescriptionByName("input")->SetGrid(VTKGrid);
// Call Catalyst to execute the desired pipelines.
Processor->CoProcess(dataDescription);
}
dataDescription->Delete();
}
3.7
Catalyst is implemented as a C++ library with the addition of Python wrapping for many methods. This
makes it simple to natively link Catalyst with simulation codes developed in either C++ or Python. However,
many simulation codes are written in C or Fortran and require the addition of C++ code to create VTK
data objects. This is a common enough situation that we have added methods to Catalyst to simplify this.
Removing name mangling of C++ functions is necessary so that they may be called by Fortran or C code.
This is done by adding extern "C" to the beginning of the C++ function declaration. For header files that
are to be used with both C and C++ code, the following can be done:
#ifdef __cplusplus
extern "C"
{
#endif
void CatalystInitialize(int numScripts, char* scripts[]);
void CatalystFinalize();
#ifdef __cplusplus
}
#endif
The __cplusplus macro is only defined for C++ compilers, which then support extern "C" to remove C++
mangling of the function names without affecting the C compiler's use of the header file. Another key to
inter-language calls is that generally only built-in types and pointers to arrays of built-in types should be
used. For Fortran, all data objects are passed as pointers. For simulation codes written in C, the proper
header file to include is CAdaptorAPI.h if Python isn't used or needed. Note that these methods assume a
single grid input which is referenced by the "input" key. The main functions of interest defined here are:
void coprocessorinitialize() initialize Catalyst.
void coprocessorfinalize() finalize Catalyst.
void requestdatadescription(int* timeStep, double* time, int* coprocessThisTimeStep) check the
current pipelines to see if any of them need to execute for the given time and time step. The return
value is in coprocessThisTimeStep and is 1 if co-processing needs to be performed and 0 otherwise.
void coprocess() execute the Catalyst pipelines for the timeStep and time specified in requestdatadescription(). Note that the adaptor must update the grid and attribute information and set them in the
proper vtkCPInputDataDescription object obtained with vtkCPAdaptorAPI::GetCoProcessorData().
If Python is used in Catalyst, then the proper header file to include in C code is CPythonAdaptorAPI.h.
The two functions defined in this header file are:
void coprocessorinitializewithpython(char* pythonFileName, int* pythonFileNameLength) initialize
Catalyst with the ability to use Python. If pythonFileName is not null and pythonFileNameLength is
greater than zero it also creates a vtkCPPythonScriptPipeline object and adds it to the vtkCPProcessor
object. Note that this method should be used instead of coprocessorinitialize().
void coprocessoraddpythonscript(char* pythonFileName, int* pythonFileNameLength) creates a
vtkCPPythonScriptPipeline object and adds it to the group of pipelines to be executed by the vtkCPProcessor object.
Note that these are just convenience methods and are not required to be used. A C example is included
in the git examples repository (www.github.com/Kitware/ParaViewCatalystExampleCode) which does not
use these methods.
3.7.1
The final step to integrating Catalyst with the simulation code is compiling all code and linking the resulting
objects together. The simplest way to do this is to use CMake (www.cmake.org) as that will take care of all
of the dependencies (i.e. header files as well as libraries). An example CMake file, CMakeLists.txt, is shown
below.
cmake_minimum_required(VERSION 2.8.8)
project(CatalystCxxFullExample)
set(USE_CATALYST ON CACHE BOOL
"Link the simulator with Catalyst")
if(USE_CATALYST)
find_package(ParaView 4.3 REQUIRED COMPONENTS
vtkPVPythonCatalyst)
include("${PARAVIEW_USE_FILE}")
set(Adaptor_SRCS FEAdaptor.cxx)
add_library(Adaptor ${Adaptor_SRCS})
target_link_libraries(Adaptor vtkPVPythonCatalyst vtkParallelMPI)
add_definitions("-DUSE_CATALYST")
else()
find_package(MPI REQUIRED)
include_directories(${MPI_CXX_INCLUDE_PATH})
endif()
add_executable(FEDriver FEDriver.cxx FEDataStructures.cxx)
if(USE_CATALYST)
target_link_libraries(FEDriver LINK_PRIVATE Adaptor)
include(vtkModuleMacros)
include(vtkMPI)
vtk_mpi_link(FEDriver)
else()
target_link_libraries(FEDriver LINK_PRIVATE ${MPI_LIBRARIES})
endif()
This gives the option of building the simulation code with or without linking to Catalyst by allowing the user
at configure time to enable or disable Catalyst with the USE_CATALYST CMake option. If Catalyst
is enabled then the USE_CATALYST macro is defined and can be used in the driver code to include header
files and function calls to the adaptor code.
Additionally, this example CMake file adds a dependency on the Adaptor for the FEDriver simulation
code example. If the simulation code doesn't require the Python interface to Catalyst, the user can avoid
the Python dependency by changing the required ParaView components from vtkPVPythonCatalyst to
vtkPVCatalyst. Either of these components also brings in the rest of the Catalyst components and header
files for compiling and linking.
For simulation codes that do not require CMake to build, we suggest using an example to determine
the required header file locations for compiling and required libraries for linking. Due to system specific
configurations, any attempt to list all the dependencies and locations here would be incomplete.
3.7.2
Python code doesn't need to compile and link with Catalyst, as the needed parts of Catalyst are Python
wrapped and available by importing the proper modules. The typical Catalyst modules that need to be
imported are paraview, vtkPVCatalystPython, vtkPVPythonCatalystPython, paraview.simple, paraview.vtk,
paraview.numpy_support and vtkParallelMPIPython. However, it is necessary to set up the proper system
paths so that the ParaView Catalyst modules can be properly loaded. When working with ParaView Catalyst
in a build tree, the following system variables need to be set for a Linux machine:
LD_LIBRARY_PATH needs to include the lib subdirectory of the build directory.
PYTHONPATH needs to include the lib and lib/site-packages subdirectories of the build directory.
When working with ParaView Catalyst in an install tree that includes development files, the following system
variables need to be set for a Linux machine:
LD_LIBRARY_PATH needs to include the lib/paraview-4.3 subdirectory of the install directory.
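For a Linux build tree, the exports might look like the following (the build location is a placeholder for your own):

```shell
# Hypothetical location of the ParaView build tree; substitute your own.
PV_BUILD=$HOME/paraview-build
export LD_LIBRARY_PATH=$PV_BUILD/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$PV_BUILD/lib:$PV_BUILD/lib/site-packages:$PYTHONPATH
```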
3.8
If Catalyst is built without Python support, all pipelines will need to be hard-coded in C++. Even in cases
when Catalyst is built with Python, simulation code developers may wish to create hard-coded C++ pipelines
for their users. The main reason for this approach is to create a simplified interface for the simulation user.
The user does not have to use ParaView to create any Catalyst pipelines and may not even need to use
ParaView for post-processing the in situ extracts. In this section, we go through three different ways to do this.
The first is directly creating VTK pipelines. The second is creating VTK pipelines through ParaView's C++
server-manager interface. The third is creating VTK pipelines through ParaView-wrapped Python scripts.
In Table 3.1 below we list the main advantages and disadvantages of each.

Table 3.1: Advantages and disadvantages of each pipeline approach.

VTK C++
  Advantages: good documentation; not dependent on Python; many examples.
  Disadvantages: complex to create output images in parallel; changes require recompilation.

ParaView C++
  Advantages: automatically sets up compositing and render passes easily; not dependent on Python.
  Disadvantages: sparse documentation; few examples; changes require recompilation.

ParaView Python
  Advantages: can be modified without requiring recompilation; can use existing scripts created in the GUI and/or using ParaView's trace functionality.
  Disadvantages: requires linking with Python.
We recommend reviewing the Avoiding Data Explosion section (Section 2.4) before creating specialized
pipelines. The reason for this is that while many filters are very memory efficient, others can dramatically
increase the amount of memory needed. This is a major factor to consider when running on memory-limited
HPC machines where no virtual memory is available.
3.8.1
Creating a custom VTK C++ pipeline is fairly straightforward for those who are familiar with VTK. This
is done in a class that derives from vtkCPPipeline. The two methods that need to be implemented are RequestDataDescription() and CoProcess(). Optionally, Finalize() can be implemented if there are operations
that the class needs to do before being deleted. RequestDataDescription() will contain code to determine if
the VTK C++ pipeline needs to be executed and return 1 if it does and 0 otherwise. It should also check
that the proper information is set (e.g. output file name information) for the pipeline to output the desired
data. CoProcess() is the method in which the actual VTK C++ pipeline is executed. In the example in the
git repository we create the pipeline every time it is needed, but that is not necessary. Note that pipelines
are not limited to using only filters specified in the VTK code base. They can also use filters specified in
the ParaView code base. For example, we recommend using ParaView's vtkCompleteArrays filter
prior to using any of the parallel XML writers available in VTK. The reason for this is that the parallel
XML writers can give bad output if process 0 has no points or cells, since it then lacks the attribute
information needed for the meta-file of the format.
It is beyond the scope of this Catalyst User's Guide to give a complete description of all of the filters in
VTK and ParaView. In addition to the VTK User's Guide, wiki and doxygen documentation web pages,
we also recommend looking at the examples at www.vtk.org/Wiki/VTK/Examples/Cxx for help in creating
VTK pipelines. A fully functioning example with a hard-coded VTK C++ pipeline is available from the
Catalyst examples git repository.
3.8.2
As background, ParaView's server-manager is the code that controls the flow of information and maintains
state in ParaView's client-server architecture. It is used by the client to set up the pipeline on the server, to
set the parameters for the filters and to execute the pipeline, among other duties. Besides these duties, it will
automatically do things like add in the vtkCompleteArrays filter prior to any parallel writers that are added
in the pipeline. The reason for this is mentioned in the previous section. Additionally, it properly sets up
the parallel image compositing that can be difficult in pure VTK code.
Similar to creating a VTK C++ pipeline, we won't go into the full details of creating a ParaView
server-manager pipeline due to the extent of the information. Most classes that will be used derive from
vtkSMProxy, vtkSMProperty or vtkSMDomain. vtkSMProxy is used for creating VTK objects such as filters
and maintaining references and state of the VTK objects. vtkSMProperty is used for calling methods on the
VTK objects with given passed-in parameters (e.g. setting the file name of a writer or setting the iso-surface
values of the contour filter). vtkSMDomain represents the possible values properties can have (e.g. the
radius of a sphere must be positive). The XML files under the ParaViewCore/ServerManager/SMApplication/Resources subdirectory of the ParaView source directory list all of the proxy information. The key
XML files are:
filters.xml contains the descriptions of all of the filters that may be available in Catalyst.
sources.xml contains pipeline sources such as spheres, planes, etc. They may be useful for setting
inputs such as seed points for streamlines.
writers.xml contains descriptions of the writers that may be available in Catalyst.
utilities.xml contains utility proxies such as functions and point locators that may be needed by
certain filters.
rendering.xml contains proxies for setting rendering options such as cameras, mappers, textures, etc.
views_and_representations.xml contains proxies for setting view information such as 3D render views, charts, etc. and representations such as surface, wireframe, etc.
Note that, depending on the configuration of Catalyst, some proxies listed in the XML files may not be available. An example of a Catalyst pipeline created through the ParaView server-manager is included in the git examples repository.
3.8.3 Custom Catalyst Python Pipelines
The simplest way to create a custom Catalyst Python pipeline is to start with something that is already fairly similar. The easiest way to do that is to create a similar pipeline in ParaView and export it using the Catalyst script generator plugin (discussed in Section 2.2). For the most part, these generated Python
scripts are very readable, especially if no screenshots are being output. Other useful ParaView tools for
creating and/or modifying Catalyst Python scripts include:
the ParaView GUI's Python interpreter (available by going to Tools → Python Shell in the main menu) supports tab completion to help see available methods for each object.
using ParaView's trace functionality, which can record the Python commands that mimic a user's interaction with the GUI. This is available by using Tools → Start Trace and Tools → Stop Trace to start and stop the trace, respectively.
Additionally, most of the ParaView-wrapped objects have at least minimal built-in documentation. There is also Sphinx-generated documentation available at www.paraview.org/ParaView3/Doc/Nightly/www/py-doc/. We assume that through the tools above users will be able to create the proper objects and
set the proper parameters for them. The other information that is useful for creating custom Catalyst
scripts is being able to query for information about the output from filters. This includes information like
bounds of the output dataset, the ranges of attributes, etc. These are the typical pieces of information that
will be used to add logic into a custom Python pipeline. For example, when creating iso-surfaces through the contour filter it is necessary to know the range of the data array being contoured. From a ParaView Python-wrapped filter proxy, the following methods are the most useful for querying filter output:
UpdatePipeline() executes the pipeline such that the filter output is current. This should be called
before any information is requested from the following methods.
GetDataInformation() returns information about the filter's output data object. Members of interest are:
GetDataSetTypeAsString() returns the VTK class name of the dataset (e.g. vtkPolyData)
GetPointData()/GetCellData() return an object that provides information about the point data or cell data arrays, respectively. The main members of interest for this are:
GetNumberOfArrays() gives the number of point data or cell data arrays available
GetArray() returns an array information object. The single argument to this method can either be an integer index or a string name. The main members of this object are:
Name the name of the array
GetRange() the range of the values
For example, datarange = pd.GetArray("Elevation").GetRange() retrieves the range of a point data array named Elevation, where pd is the object returned by GetPointData().
Note that these scripts can be added to vtkCPProcessor as long as they implement the RequestDataDescription() and DoCoProcessing() methods.
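To make this concrete, the helper below sketches the kind of logic a custom script might add: picking evenly spaced iso-surface values from a data array's range. This is a plain-Python sketch, not ParaView API; in a real Catalyst script the range would come from the filter proxy's data information (e.g. the GetRange() call described above, after UpdatePipeline()) rather than being passed in directly.

```python
# Minimal sketch: choose evenly spaced iso-surface values from a data
# array's range. In a real Catalyst Python script the two-element range
# would be obtained from the filter's data information after calling
# UpdatePipeline(); here it is passed in directly for illustration.
def contour_values(datarange, count):
    """Return 'count' iso-values spread over the interior of datarange."""
    low, high = datarange
    step = (high - low) / (count + 1)
    return [low + step * (i + 1) for i in range(count)]

print(contour_values((0.0, 10.0), 4))  # -> [2.0, 4.0, 6.0, 8.0]
```

The returned values could then be assigned to a contour filter's iso-surface property in the generated script.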
Figure 3.10: Internal vtkUnstructuredGrid data structures for storing cell connectivities.
Figure 3.13: Multi-block dataset where the outlines are for blocks with vtkUniformGrids and the interior
surface is a vtkUnstructuredGrid.
Figure 3.15: An example of a fluid-structure interaction simulation with separate grids and fields used for
the solid and the fluid domains.
Chapter 4
Building Catalyst
This section is targeted towards those users or developers responsible for building ParaView Catalyst. As
far as installation is concerned, Catalyst is a subset of the ParaView code base. Thus, all of the functionality
available in Catalyst is also available in ParaView. The difference is that ParaView will by default have
many more dependencies and thus a larger executable size. Catalyst is a flexible, specialized configuration of ParaView that reduces the executable size by stripping those dependencies. For example, if
no output images are to be produced from a Catalyst-instrumented simulation run then all of the ParaView
and VTK code related to rendering and the OpenGL libraries need not be linked in. This can result in
significant memory savings, especially when considering the number of processes utilized when running a
simulation at scale. In one simple example, the executable size was reduced from 75 MB when linking with
ParaView to less than 20 MB when linking with Catalyst. The main steps for configuring Catalyst are:
1. Set up an edition
2. Extract the desired code from ParaView source tree into a separate Catalyst source tree
3. Build Catalyst
Most of the work is in the first step which is described below. A Catalyst edition is a customization of
ParaView to support a desired subset of functionality from ParaView and VTK. There can be many editions
of Catalyst and these editions can be combined to create several customized Catalyst builds. Assuming that
the desired editions have already been created, the second step is automated and is done by invoking the
following command from the <ParaView source dir>/Catalyst directory:
python catalyze.py -i <edition_dir> -o <Catalyst_source_dir>
Note that more editions can be added with additional -i <edition_dir> arguments, and that these are processed in the order they are given, first to last. For the minimal base edition included with ParaView, this would be -i Editions/Base.
The generated Catalyst source tree will be put in <Catalyst source dir>. For configuring Catalyst from the
desired build directory, do the following:
<Catalyst_source_dir>/cmake.sh <Catalyst_source_dir>
The next step is to build Catalyst (e.g. using make on Linux systems).
4.1.1 Creating a Catalyst Edition
Creating an edition amounts to three things:
1. Specify the default CMake build parameters for the Catalyst build.
2. Specify files from the ParaView source tree to be copied into the created Catalyst source tree.
3. Specify files from the edition to be copied into the Catalyst source tree.
The information describing which files are in the generated Catalyst source tree is all stored in a JSON
file called manifest.json in the main directory of the edition. The user processes this information with a
Python script called catalyze.py that is located in the <ParaView source dir>/Catalyst directory.
4.1.2 Setting the CMake Build Parameters
By default, Catalyst will be built with the default ParaView build parameters (e.g. build with shared
libraries) unless one of the Catalyst editions changes that in its manifest.json file. An example of this is
shown below:
{
"edition": "Custom",
"cmake":{
"cache":[
{
"name":"BUILD_SHARED_LIBS",
"type":"BOOL",
"value":"OFF"
}
]
}
}
Here, ParaView's CMake option for building shared libraries will be set to OFF for this edition, named Custom. It should be noted that users can still change the build configuration from these settings, but this should be done after Catalyst is configured with the cmake.sh script.
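As a sketch of how such a manifest can be consumed, the snippet below parses the JSON above with Python's standard json module and collects the CMake cache overrides. This is illustrative only; the actual parsing logic inside catalyze.py may differ.

```python
import json

# An edition manifest matching the example above.
manifest_text = """
{
  "edition": "Custom",
  "cmake": {
    "cache": [
      {"name": "BUILD_SHARED_LIBS", "type": "BOOL", "value": "OFF"}
    ]
  }
}
"""

manifest = json.loads(manifest_text)
# Collect the CMake cache overrides declared by this edition.
overrides = {entry["name"]: entry["value"]
             for entry in manifest.get("cmake", {}).get("cache", [])}
print(overrides)  # -> {'BUILD_SHARED_LIBS': 'OFF'}
```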
4.1.3 Copying Files from the ParaView Source Tree into the Created Catalyst Source Tree
By default, very little source code from the ParaView source tree will be copied to the generated Catalyst
source tree. Each edition will likely want to add in several source code files to the Catalyst source tree. Most
of these files will be filters but there may also be several helper classes that are needed to be copied over as
well. In the following JSON snippet we demonstrate how to copy the vtkPVArrayCalculator class into the
generated Catalyst source tree.
{
"edition": "Custom",
"modules":[
{
"name":"vtkPVVTKExtensionsDefault",
"path":"ParaViewCore/VTKExtensions/Default",
"include":[
{
"path":"vtkPVArrayCalculator.cxx"
},
{
"path":"vtkPVArrayCalculator.h"
}
],
"cswrap":true
}
]
}
A description of the pertinent information follows:
"name":"vtkPVVTKExtensionsDefault" the name of the VTK or ParaView module. In this case it is vtkPVVTKExtensionsDefault. The name of the module can be found in the module.cmake file in the corresponding directory; it is the first argument to the vtk_module() function.
"path":"ParaViewCore/VTKExtensions/Default" the subdirectory location of the module relative to
the main source tree directory (e.g. <ParaView source dir>/ParaViewCore/VTKExtensions/Default
in this case)
"path":"vtkPVArrayCalculator.cxx" the name of the file to copy from the ParaView source tree to
the generated Catalyst source tree.
"cswrap":true whether the source code needs to be client-server wrapped so that it is available through ParaView's server-manager. For filters that are used through ParaView's Python interface or through a server-manager hard-coded C++ pipeline this should be true. For helper classes it should be false.
The difficult part here is determining which files need to be included in Catalyst. In the example above,
the actual name of the ParaView proxy for the vtkPVArrayCalculator is Calculator. Thus, to construct a
ParaView client proxy for vtkPVArrayCalculator on the server, the user would need to call Calculator() in
the Python script. The best way to determine this connection between the name of the ParaView proxy and
the actual source code is in the XML files in the ParaViewCore/ServerManager/SMApplication/Resources
directory. In this case the proxy definition is in the filters.xml file. The proxy's label XML attribute is converted into the Python constructor name for the proxy, and the class name is stored in the proxy's class XML attribute. The conversion of the proxy label is done by removing the spaces in it. This is sufficient for many situations, but in some cases additional classes need to be included in order to properly compile Catalyst. This can occur when the included source code derives from a class not already included in Catalyst or uses helper classes not already included in Catalyst. For the vtkPVArrayCalculator class we will also need to include the vtkArrayCalculator class that it derives from.
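The label-to-constructor mapping described above can be sketched in a few lines of Python. The XML fragment below is a hypothetical, simplified stand-in for the proxy definitions in filters.xml (real entries carry many more attributes and nested property elements); the Process Id Scalars entry is included only to show a label containing spaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the general shape of ParaView's filters.xml.
filters_xml = """
<ServerManagerConfiguration>
  <ProxyGroup name="filters">
    <SourceProxy name="Calculator" class="vtkPVArrayCalculator" label="Calculator"/>
    <SourceProxy name="ProcessIdScalars" class="vtkProcessIdScalars" label="Process Id Scalars"/>
  </ProxyGroup>
</ServerManagerConfiguration>
"""

root = ET.fromstring(filters_xml)
# The Python constructor name is the proxy label with spaces removed;
# the value is the VTK/ParaView class that implements it.
mapping = {proxy.get("label").replace(" ", ""): proxy.get("class")
           for proxy in root.iter("SourceProxy")}
print(mapping)
# -> {'Calculator': 'vtkPVArrayCalculator', 'ProcessIdScalars': 'vtkProcessIdScalars'}
```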
4.1.4 Copying Files from the Edition into the Catalyst Source Tree
Some of the files that need to be in the generated Catalyst source tree cannot be copied directly from the ParaView source tree. For example, when multiple editions add files to the same directory, the CMakeLists.txt file in that directory of the Catalyst source tree must be a specialized version that accounts for all of them. This is done with the "replace" keyword. An example of this is shown below for the vtkFiltersCore module. Here, the vtkArrayCalculator source code is added to the Catalyst source tree, so the CMakeLists.txt file in that directory needs to be modified in order for that class to be added to the build.
"modules":[
{
"name":"vtkFiltersCore",
"path":"VTK/Filters/Core",
"include":[
{
"path":"vtkArrayCalculator.cxx"
},
{
"path":"vtkArrayCalculator.h"
}
],
"replace":[
{
"path":"VTK/Filters/Core/CMakeLists.txt"
}
],
"cswrap":true
}
]
In this case, the CMakeLists.txt file that needs to be copied to the Catalyst source tree exists in the <edition_dir>/VTK/Filters/Core directory, where <edition_dir> is the location of this custom edition of Catalyst. Since the Base edition already includes some files from this directory, we want to make sure that the CMakeLists.txt file from this edition also includes those from the Base edition. This CMakeLists.txt file is shown below:
set(Module_SRCS
vtkArrayCalculator.cxx
vtkCellDataToPointData.cxx
vtkContourFilter.cxx
vtkContourGrid.cxx
vtkContourHelper.cxx
vtkCutter.cxx
vtkExecutionTimer.cxx
vtkFeatureEdges.cxx
vtkGridSynchronizedTemplates3D.cxx
vtkMarchingCubes.cxx
vtkMarchingSquares.cxx
vtkPointDataToCellData.cxx
vtkPolyDataNormals.cxx
vtkProbeFilter.cxx
vtkQuadricClustering.cxx
vtkRectilinearSynchronizedTemplates.cxx
vtkSynchronizedTemplates2D.cxx
vtkSynchronizedTemplates3D.cxx
vtkSynchronizedTemplatesCutter3D.cxx
vtkThreshold.cxx
vtkAppendCompositeDataLeaves.cxx
vtkAppendFilter.cxx
vtkAppendPolyData.cxx
vtkImageAppend.cxx
)
set_source_files_properties(
vtkContourHelper
WRAP_EXCLUDE
)
vtk_module_library(vtkFiltersCore ${Module_SRCS})
Note that this CMakeLists.txt file does two things. First, it specifies which files are to be compiled in the source directory. Second, it specifies properties of the source files. In the above example, vtkContourHelper is given a property specifying that it should not be wrapped. Another commonly set property indicates that a class is abstract (i.e. it has pure virtual functions). An example of how to do this is shown below.
set_source_files_properties(
vtkXMLPStructuredDataWriter
vtkXMLStructuredDataWriter
ABSTRACT)
Chapter 5
Examples
5.1
Examples
There are a wide variety of VTK examples at www.vtk.org/Wiki/VTK/Examples. This site includes C, C++, Fortran and Python examples but is targeted at general VTK development. Examples specific to ParaView Catalyst can be found at www.github.com/Kitware/ParaViewCatalystExampleCode. Descriptions of the examples are listed below.
FortranPoissonSolver
An example of a parallel, finite difference discretization of the Poisson equation implemented in Fortran
using a Conjugate Gradient solver. Instead of co-processing at the end of each time step it co-processes
at the end of each iteration.
Fortran90FullExample
An example of a simulation code written in Fortran that is linked with Catalyst.
CFullExample
An example of a simulation code written in C. This uses some methods from Catalyst for storing VTK
data structures. This assumes a vtkUnstructuredGrid.
CFullExample2
An example of a simulation code written in C. This improves upon the CFullExample by explicitly
storing VTK data structures. This assumes a vtkUnstructuredGrid.
CxxFullExample
A C++ example of a simulation code interfacing with Catalyst. This assumes a vtkUnstructuredGrid.
PythonFullExample
An example of a simulation code written in Python that uses Catalyst.
PythonDolfinExample
An example that uses the Dolfin simulation code.
CxxImageDataExample
A C++ example of a simulation code interfacing with Catalyst. The grid is a vtkImageData.
CxxMultiPieceExample
A C++ example of a simulation code interfacing with Catalyst. The grid is a vtkMultiPieceDataSet with a single vtkImageData for each process.
CxxNonOverlappingAMRExample
A C++ example of a simulation code interfacing with Catalyst. The grid is a vtkNonOverlappingAMR
data set.
CxxOverlappingAMRExample
A C++ example of a simulation code interfacing with Catalyst. The grid is a vtkOverlappingAMR
data set.
CxxPVSMPipelineExample
An example where we manually create a Catalyst pipeline in C++ code using ParaView's server-manager. This example can be run without ParaView being built with Python.
CxxVTKPipelineExample
An example where we manually create a Catalyst pipeline in C++ code using VTK filters. This example
can be run without ParaView being built with Python.
CxxMappedDataArrayExample
An example of an adaptor where we use VTK mapped arrays to map simulation data structures to
VTK data arrays to save on memory use by Catalyst.
MPISubCommunicatorExample
An example where only a subset of the MPI processes are used for the simulation and Catalyst.
Chapter 6
References
6.1
References
Data Co-Processing for Extreme Scale Analysis Level II ASC Milestone. David Rogers, Kenneth
Moreland, Ron Oldfield, and Nathan Fabian. Tech Report SAND 2013-1122, Sandia National Laboratories, March 2013.
The ParaView Coprocessing Library: A Scalable, General Purpose In Situ Visualization Library.
Nathan Fabian, Kenneth Moreland, David Thompson, Andrew C. Bauer, Pat Marion, Berk Geveci,
Michel Rasquin, and Kenneth E. Jansen. In IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV), October 2011, pp. 89-96. DOI 10.1109/LDAV.2011.6092322.
The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics. Will Schroeder, Ken Martin, and Bill Lorensen. Kitware Inc., fourth edition, 2004. ISBN 1-930934-19-X.
The ParaView Guide: A Parallel Visualization Application. Utkarsh Ayachit et al. Kitware Inc.,
4th edition, 2012. ISBN 978-1-930934-24-5.
Chapter 7
Appendix
7.1
Appendix
7.1.1 Reference Counting and Smart Pointers
To simplify reference counting, vtkWeakPointer, vtkSmartPointer and vtkNew can be used. vtkWeakPointer stores a pointer to an object but doesn't change the reference count. When the object gets deleted, the vtkWeakPointer is reset to NULL, avoiding any dangling references. The latter two classes keep track of other vtkObjects by managing the object's reference count. When these objects are created, they increment the reference count of the object they refer to, and when they go out of scope, they decrement it. The following example demonstrates this.
{
  vtkNew<vtkDoubleArray> a;
  a->SetName("an array");
  vtkSmartPointer<vtkPointData> pd =
    vtkSmartPointer<vtkPointData>::New();
  pd->AddArray(a.GetPointer());
  vtkSmartPointer<vtkDoubleArray> a2 =
    vtkSmartPointer<vtkDoubleArray>::New();
  pd->AddArray(a2);
  vtkWeakPointer<vtkPointData> pd2;
  pd2 = pd;
  vtkPointData* pd3 = vtkPointData::New();
  pd2 = pd3;
  pd3->Delete();       // pd3 is deleted as ref count = 1
  pd2->GetClassName(); // bug!
} // don't need to call Delete on any object
Note that when passing a pointer created by vtkNew as a parameter to a method, the GetPointer() method must be used. Other than this caveat, vtkSmartPointer and vtkNew objects can be treated as pointers.
7.1.2 A Full Grid Writer Python Script
The script below writes out the full dataset at every time step for the input grid provided by the adaptor to Catalyst. For adaptors that provide multiple grids, change input on lines 7 and 40 to the appropriate identifier. Note that this file is available at https://github.com/Kitware/ParaViewCatalystExampleCode/blob/master/SampleScripts/gridwriter.py.
        return
    coprocessor.LoadRequestedData(datadescription)

def DoCoProcessing(datadescription):
    "Callback to do co-processing for current timestep"
    global coprocessor
    coprocessor.UpdateProducers(datadescription)
    coprocessor.WriteData(datadescription)
7.1.3 Reusing Simulation Memory with vtkMappedDataArray
Recent work in VTK has added the ability to reuse the simulation's memory and data structures in the co-processing pipeline. We start with information on creating a class that derives from vtkDataArray and uses pre-allocated memory whose layout does not match VTK's expected layout. The abstract class to derive from for this purpose is vtkMappedDataArray. We first go through an example of this with the vtkCPExodusIIResultsArrayTemplate class, which is part of VTK. vtkCPExodusIIResultsArrayTemplate is a templated, concrete implementation of vtkMappedDataArray. This class should only be used if the data array has more than one component. It can be used as-is if the simulation memory layout satisfies the following constraints:
The components of the data are each stored in contiguous arrays.
The component array data is stored in the same order as the points or cells in the VTK dataset for
point data or cell data, respectively.
If these two conditions are met then the main function of interest in this class is:
void SetExodusScalarArrays(std::vector<Scalar*> arrays, vtkIdType numTuples, bool save)
Here, arrays passes pointers to the beginning of each component array. The size of arrays sets the number of components in the vtkCPExodusIIResultsArrayTemplate object. The number of tuples is set by numTuples. Finally, if save is set to false, the object will delete each component array with delete [] when it is done with the memory; otherwise it assumes that the memory will be de-allocated elsewhere. The following code snippet demonstrates its use.
vtkCPExodusIIResultsArrayTemplate<double>* vtkarray =
  vtkCPExodusIIResultsArrayTemplate<double>::New();
vtkarray->SetName("velocity");
std::vector<double*> simulationarrays;
simulationarrays.push_back(xvelocity);
simulationarrays.push_back(yvelocity);
simulationarrays.push_back(zvelocity);
vtkarray->SetExodusScalarArrays(simulationarrays, grid->GetNumberOfPoints(), true);
grid->GetPointData()->AddArray(vtkarray);
vtkarray->Delete();
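To illustrate the idea in plain Python rather than C++: the class effectively presents separate per-component simulation buffers through VTK's tuple-oriented data array interface without copying them. The sketch below mimics that behavior; the class and names are invented for illustration and are not part of VTK.

```python
# Rough analogy (plain Python, not VTK): expose separate per-component
# arrays through a tuple interface, the way vtkCPExodusIIResultsArrayTemplate
# presents structure-of-arrays simulation memory as a vtkDataArray.
class MappedVectorArray:
    def __init__(self, component_arrays):
        # e.g. [xvelocity, yvelocity, zvelocity]; nothing is copied here.
        self.components = component_arrays

    def number_of_components(self):
        return len(self.components)

    def get_tuple(self, i):
        # Gather element i from each component array on demand.
        return tuple(c[i] for c in self.components)

xvel, yvel, zvel = [0.0, 1.0], [2.0, 3.0], [4.0, 5.0]
varray = MappedVectorArray([xvel, yvel, zvel])
print(varray.get_tuple(1))  # -> (1.0, 3.0, 5.0)
```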
If the vtkCPExodusIIResultsArrayTemplate class is not appropriate for mapping simulation memory to
VTK memory, a class that derives from vtkMappedDataArray will need to be written. The virtual methods
that need to be reimplemented are (note that Scalar is the templated data type):
void Initialize()
void GetTuples(vtkIdList *ptIds, vtkAbstractArray *output)
void GetTuples(vtkIdType p1, vtkIdType p2, vtkAbstractArray *output)
void Squeeze()
vtkArrayIterator *NewIterator()
vtkIdType LookupValue(vtkVariant value)
void LookupValue(vtkVariant value, vtkIdList *ids)
vtkVariant GetVariantValue(vtkIdType idx)
void ClearLookup()
double* GetTuple(vtkIdType i)
void GetTuple(vtkIdType i, double *tuple)
vtkIdType LookupTypedValue(Scalar value)
void LookupTypedValue(Scalar value, vtkIdList *ids)
Scalar GetValue(vtkIdType idx)
Scalar& GetValueReference(vtkIdType idx)
void GetTupleValue(vtkIdType idx, Scalar *t)
Once the object is properly set up, it should be considered read-only (i.e. nothing in VTK should be modifying any of its contents), so the following methods should be implemented to only report errors, ensuring they aren't being used:
int Allocate(vtkIdType sz, vtkIdType ext)
int Resize(vtkIdType numTuples)
void SetNumberOfTuples(vtkIdType number)
void SetTuple(vtkIdType i, vtkIdType j, vtkAbstractArray *source)
void SetTuple(vtkIdType i, const float *source)
void SetTuple(vtkIdType i, const double *source)
void InsertTuple(vtkIdType i, vtkIdType j, vtkAbstractArray *source)
void InsertTuple(vtkIdType i, const float *source)
void InsertTuple(vtkIdType i, const double *source)
void InsertTuples(vtkIdList *dstIds, vtkIdList *srcIds, vtkAbstractArray *source)
vtkIdType InsertNextTuple(vtkIdType j, vtkAbstractArray *source)
vtkIdType InsertNextTuple(const float *source)
vtkIdType InsertNextTuple(const double *source)
void DeepCopy(vtkAbstractArray *aa)
void DeepCopy(vtkDataArray *da)
void InterpolateTuple(vtkIdType i, vtkIdList *ptIndices, vtkAbstractArray* source, double* weights)
void InterpolateTuple(vtkIdType i, vtkIdType id1, vtkAbstractArray *source1, vtkIdType id2, vtkAbstractArray *source2, double t)
void SetVariantValue(vtkIdType idx, vtkVariant value)