Manual Distance 6 PDF
Welcome
This symbol occurs before text that describes a feature of Distance that was not
present in previous versions. We hope it helps users of the old software
familiarize themselves with the new version.
When describing keyboard actions, CTRL is short for the control key. For
example, CTRL-X means hold down the CTRL and X keys at the same time.
When describing menu selections, the symbol | means “and then select” – for
example Help | About Distance… means select the Help menu and then
select About Distance….
Where to Go Next
We recommend that everyone start by following the guided tour in Chapter 3.
New users should then read the Users Guide Chapters 4-7 – if you do this you
will find using the program much less confusing! People familiar with Distance
3.5 or later should check out the section New in Distance (in Chapter 2) and
should at least glance at Chapters 4-7, as required.
Advanced users will want to check out Chapters 6, 9, 10 and the appendices.
If you have yet to install Distance, you should read the release notes in the file
ReadMe.rtf that accompanies the Distance setup program.
Staying in Touch
What is Distance?
Distance is a Windows-based computer package that allows you to design and
analyze distance sampling surveys of wildlife populations (for more about
distance sampling, see the distance sampling reference books).
Automated Survey Design
Using Distance, you can enter or import information about your study area into
the built-in GIS. You can then try out different types of survey design to see
which might be most feasible. Distance can look at overall properties of the
design such as probability of coverage. It can also generate survey plans from
the design. For more details, see Chapter 6 - Survey Design in Distance.
Data Analysis
Once you have collected your survey data, Distance can be used to analyze it.
Analysis is done in Distance using analysis engines, of which there are currently
three: the conventional distance sampling (CDS) engine, the multiple covariate
distance sampling (MCDS) engine and the mark-recapture distance sampling
(MRDS) engine. For more details, see Chapters 7-10.
Use Agreement
When you install Distance, you agree to abide by the Use Agreement. A copy of
this agreement is in the Distance program directory, in the file
UseAgreement.txt. This can be accessed from within Distance by selecting
Help | About Distance…, and clicking on the Use Agreement tab.
Sponsors
Distance is currently free to all users. Nevertheless, it is not free to develop and
maintain! If you use Distance on a regular basis, please consider sponsoring the
software. You could either make a donation towards program development and
maintenance or you could finance a specific new feature that you would find of
use. More details can be obtained by contacting the program authors (see
Sending suggestions and reporting problems).
A list of sponsors of this release of Distance can be seen by selecting Help |
About Distance…, and clicking on the Sponsors tab. See also the
Acknowledgements section, below.
Acknowledgements
We are grateful to our respective agencies and institutions for their support
during the production of this and the previous versions of Distance. In particular,
the Biotechnology & Biological Sciences Research Council (BBSRC) and
Engineering & Physical Sciences Research Council (EPSRC) provided full
History of Distance
Distance evolved from program TRANSECT (Burnham et al. 1980). However,
Distance is quite different from its predecessor as a result of changes in analysis
methods and expanded capabilities. The name Distance was chosen because it
can be used to analyze several forms of distance sampling data: line transect,
point transect (variable circular plot) and cue-counts. By contrast, TRANSECT
was designed only to analyze line transect data.
Distance versions 1.0 - 2.2 were DOS-based applications that were programmed
using a relatively simple command language. Version 3.0 was a Windows
console application, but retained the command language structure. All of these
versions were principally programmed by Jeff Laake of the National Marine
Mammal Laboratory, US National Marine Fisheries Service.
In 1997, Steve Buckland and David Borchers, from the University of St
Andrews, obtained funding from two British research councils to proceed with
an ambitious three-year project to develop new distance sampling software. The
new software, which became known as Distance 4, was designed to be fully
Windows-based, and be capable of incorporating new features such as
geographic survey design, multiple covariate distance sampling models, spatial
estimation of abundance, and dual observer mark-recapture line transect
methods. A Distance 4 project development team was assembled, coordinated by
Len Thomas. In autumn 1997, it was decided to produce an intermediate version
of Distance: fully Windows-based, but with the same analysis capabilities as the
current version 3.0. This new program, Distance 3.5, took one full year to
develop, and was released in November 1998, with various updates through to
February 1999. Distance 3.5 was downloaded by over 4000 users, from around
120 countries.
Extension of Distance 3.5 to become Distance 4 began in 1999, and the software
was first previewed at training workshops in summer 2000. After various public
beta versions, Distance 4.0 was released in 2002, followed by Distance 4.1 in
2003 and Distance 5.0 in 2006. This last version has a major new feature in the
form of a link to the free statistical software R, thereby facilitating a major
expansion in the analytical capabilities potentially available to Distance users.
We have also released a beta version of Distance 6.0, which contains a new
density surface modeling engine.
We are still actively developing the software, incorporating new features and
extending current ones. If you have any comments or suggestions about the
program, we’d love to hear from you!
Objective
The aim of this chapter is to provide a gentle introduction to Distance, and an
overview of its capabilities. We don’t go into much detail, but focus instead on
giving you an impression of how the software works and where to find things.
Please note that this is not a substitute for reading the rest of the
manual. You will need to know about Distance projects (Chapter 4) and how
data is stored in Distance (Chapter 5) before you can use the program effectively
for either survey design (Chapter 6) or analysis (Chapter 7).
This chapter gives step-by-step instructions to walk you through four examples.
In the first, the goal is to perform a preliminary analysis of some straightforward
line transect data. We create a new Distance project, import the data, and do
some preliminary analyses. In the second, we deal with import of slightly more
complex survey data. In the third, we look at creating a geographic project and
using it for survey design, and in the fourth we look at more complex geographic
data using one of the sample projects.
Note that this Users Guide is available in both on-line and print-ready formats.
If you’re currently reading the on-line version, you may find it easier to follow
this chapter in the other format – see the Welcome topic for more about the
different formats available.
To start with, we’ll assume that you’ve downloaded and installed Distance. If
not, go to the Program Distance Web Site for instructions.
• Now click on Next, and then click Finish to import the data.
Example 2 - Analysis
Before analyzing any data, we need to enter the number of visits in the multiplier
field that has been created in the project.
• Click on the Data tab of the Project Browser.
• Under Visits, enter the number 4.
We also need to set up the multiplier correctly in the Multipliers tab of the first
Model Definition.
• Choose View | Analysis Components and then Analysis
Components | Model Definitions.
• Open the default model definition’s properties by choosing
Analysis Components | Item Properties… or by double-
clicking on the ID 1 in the Analysis Components Window.
• Under Multipliers, there should be one multiplier defined, with
field name Visits. However, the Operator is wrong – it is set to
If you want to skip the steps involved in creating the project and
importing the data, you can open the sample project StAndrewsBay.dst (choose
File | Open Project…, select StAndrewsBay and click Open). Then, go
to the section entitled Example 3 - Creating a Survey Design.
• Click the button in the top right corner of the Map window to
close the map. You will be asked whether you want to save the
changes you have made to the map (you added a new layer to it) –
choose Yes.
• You can now view the coverage grid on your map. Click on the
Maps tab of the Project Browser.
• Choose Maps | View Map to open the map of St Andrews Bay.
• Choose Map | Add Layer, choose layer name “Grid” and click
OK. You should now see the coverage probability grid points on
the map.
Sample Projects
Distance comes with a number of sample projects, listed in the table below. The
projects are located in the “Sample Projects” folder, below the Distance program
folder (usually “C:\Program Files\Distance 6”, or the equivalent in your
language). They demonstrate various aspects of the program – for more
information about a project, open it in Distance, and choose the menu item File |
Project Properties.
Feel free to use the sample projects as a test bed for learning about the program –
try adding and deleting data, creating and running designs and analyses. Have
fun!
Project name Description
Line transect example Simulated line transect data from Chapter 4 of Introduction
to Distance Sampling. Exact data, individuals as clusters
and no stratification. Step-by-step instructions for setting
up a project identical to this and importing the data are
given here in Example 1: Using Distance to Analyze Simple
Data.
Point transect example Simulated point transect data from Chapter 5 of Introduction
to Distance Sampling. Exact data, no clusters or
stratification.
Stratify example An example of stratified data. Two strata, one with high
sample coverage and one with low. Distances are exact and
objects are clusters. The example is based on cetacean data,
and there is also a multiplier defined to account for g(0)<1,
based on a separate experiment to estimate g(0).
Ducknest An example of how to enter and analyze interval data.
There are also 8 example analyses set up, in two sets. Data
are a subset of the Monte Vista duck nest data used as in
Some of the projects contain data used in the distance sampling text
books, and you can use these to recreate analyses in the books as a learning
exercise. Note, however, that you may find minor differences in results between
the book and the distance projects. In some cases the data in the projects are
slightly different; in others differences will be due to changes in the Distance
analysis engine since the books were published.
Double clicking on files with this icon in Windows opens the associated project
in a Distance session. Windows also stores a list of your recently used projects
in the Windows taskbar Start menu, under Documents.
You can use Windows to rename, move, copy and delete project files just like
any other file – but if you do this you should do the same to the data folder. For
example, if you want to copy a project from one computer to another, you should
copy both the project file and the data folder.
You can change the default folder that Distance uses to create and
open projects by selecting this folder in a New Project or Open Project dialog
and then checking the box Save this folder as default for Distance projects.
After you have chosen a filename, click the Create button. The new project file
is created and the Setup Project Wizard opens. This wizard guides you through
the process of setting up the new project ready for use. Using the wizard, you
can:
• set the project up ready for doing data analysis
• set the project up ready for survey design
• use an existing Distance project as a template
• import data and options from a previous version of Distance
• bypass the wizard and set up the project manually
See the Setup Project Wizard section in the Appendix - Program Reference for
more about these options. Use of an existing project as a template is also
described in the next section. For more about project import, see Importing from
Previous Versions of Distance.
If you choose to set the project up ready for survey design or data analysis, or
use an existing project as a template, then once you complete the wizard, you are
ready to start entering or importing data. See Chapter 5 - Data in Distance for
more information about how this is done.
You can use any Distance project as a template, but you may find it
easier to save the projects used as templates to a special folder, to make it easy to
distinguish them from your other projects. You can save a project to another
folder using File | Export Project - see Exporting, Transporting and Archiving
Projects. When exporting projects to the templates folder, you can save space by
choosing the options to exclude the data and results from the exported project.
You can change the default folder that Distance uses to create and open
projects by selecting this folder in a New Project or Open Project dialog and
then checking the box Save this folder as default for Distance projects.
When you open a Distance project, the Project Browser is automatically restored
to the same size, position and tab as when you last closed the project.
You can also open projects that have been archived in a zip file (see Exporting,
Transporting and Archiving Projects for more about archiving projects). In the
Open Project dialog, select Zip archive files (*.zip) from Files of type:.
Distance will then prompt you for the folder to extract the project files into, and
will then open the extracted files.
While you’re working with a project, Distance reads and writes to the
project and data files constantly. It is therefore much better to keep your project
on your local hard drive, where access times are much faster. We don’t
recommend accessing projects over a network if you can help it, and we
certainly don’t recommend working with projects stored on floppy disks, zip
disks, etc. In all cases, you’re best to copy the project to your local hard drive
before opening it.
Saving Projects
In Distance, almost all of the changes you make are “live” – that is, they are saved
in the Distance Project the instant you make them. For example, if you add
some new data or create a new analysis, this data or analysis is instantly recorded
in the project file. Because of this, there is no need to “save” Distance projects in
the same way that you might save word processor files. Everything is
automatically saved for you as you go along.
Even though you don't need to save your work, it is important to make backup
copies. This is discussed in the next section.
Backing up Projects
Why Back Up?
Having backup copies of your Distance projects can be useful for two reasons:
You can turn off this auto-backup feature in the Preferences dialog.
Only the Distance project file and files in the data folder are
exported. If your project links to GIS files or other database files outside the
data folder, these will not be included.
Compacting a Project
In a Distance project, the Project File and Data File are actually database files.
Like all database files, they tend to grow as you use them. This is because
records and queries that Distance no longer needs are marked as deleted, but are
not actually removed from the database. Permanently removing deleted records
is called “compacting” a database. To do this, the database must be closed.
Distance projects are automatically compacted when you close the project.
However, if you work with an open project for a long time, you might want to
compact it occasionally. To do this, choose Tools | Compact Project.
Distance will close the project, compact it, and open it again.
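To give a feel for what compaction does, here is a minimal sketch using SQLite rather than the Microsoft Access (.mdb) format Distance actually uses; the filename and table are invented for illustration, but the principle is the same – deleted rows linger in the file until the database is compacted (VACUUM in SQLite):

```python
import os
import sqlite3

# Analogy only: Distance's project and data files are Access databases,
# not SQLite, but "compacting" corresponds to SQLite's VACUUM: records
# marked as deleted are purged and the file is rewritten at minimum size.
if os.path.exists("example.db"):
    os.remove("example.db")
conn = sqlite3.connect("example.db")
conn.execute("CREATE TABLE obs (id INTEGER PRIMARY KEY, dist REAL)")
conn.executemany("INSERT INTO obs (dist) VALUES (?)",
                 [(float(d),) for d in range(1000)])
conn.commit()
conn.execute("DELETE FROM obs")   # rows are only marked as deleted...
conn.commit()
size_before = os.path.getsize("example.db")
conn.execute("VACUUM")            # ...until the database is compacted
conn.close()
size_after = os.path.getsize("example.db")
print(size_before, size_after)
```

The file does not shrink after the DELETE; it shrinks only once VACUUM rewrites it, mirroring why Distance compacts projects on close.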
Data Structure
This section describes how your survey and associated data are represented in
Distance. It is essential reading for anyone using the program. There is quite a
lot of material and jargon to absorb, but it is important to understand the
concepts here in order to make efficient use of the software.
Data Layers
Introduction
Data in Distance are divided into a set of nested data layers. Each data layer can
be thought of as a database table, with records (rows) and fields (columns). Data
layers have three attributes associated with them:
• Layer Name (e.g., Study area, Point transect, New Layer 1) – this
is a description of the layer. You can change this from the default
to make it more relevant to your study.
• Layer Type (e.g., Global, Sample, Coverage) – this is a
description of the function of the layer, and its place in the
hierarchy of layers. The layer type is used internally by Distance
and once it is set you cannot change it.
• Geographic (Yes/No). If the project is geographic, then each
data layer can contain geographic information, although not all
layers have to. You can tell if a layer is geographic because it will
contain a Shape field (see Data Fields, below).
Hierarchy of Data Layers
Data layers are linked together in a hierarchy, with a layer of type Global at the
top, and other layers below it. Here’s a simple example:
In this picture, the icon (symbol) tells you the layer type, while the text tells you
the data layer name. The top layer, Study Area is of type Global. It has one
child layer, Region of type Stratum. This in turn has one child layer, Line
transect, of type Sample. Lastly, line transect has one child layer, Observation
of type Observation.
Data Fields
Introduction
As we said earlier, each data layer can be thought of as a database table, with a
number of fields (columns) and records (rows):
There are two field types that you won’t see from the Distance data sheet:
LinkID and ParentID. LinkID fields are used to join geographic and external
data to that stored in the Data File DistData.mdb. ParentID fields are used,
together with the ID field, to link together records from different layers. You
don’t normally have to worry about either of these fields, unless you are editing
the Distance database by hand from outside of Distance – for more on this see the
Appendix - Inside Distance.
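The ID/ParentID linkage can be pictured with a small sketch. This is not Distance's own code – the labels and layers below are hypothetical – but it shows how a child record's ParentID is resolved against the ID field of its parent layer:

```python
# Hypothetical, minimal model of how ParentID + ID link records across
# nested layers (the real tables live in the Access file DistData.mdb).
stratum_layer = [
    {"ID": 1, "Label": "Stratum A"},
    {"ID": 2, "Label": "Stratum B"},
]
sample_layer = [
    {"ID": 1, "ParentID": 1, "Label": "Line 1A"},
    {"ID": 2, "ParentID": 1, "Label": "Line 2A"},
    {"ID": 3, "ParentID": 2, "Label": "Line 1B"},
]

def parent_of(record, parent_layer):
    """Resolve a child record's ParentID against the parent layer's ID field."""
    return next(p for p in parent_layer if p["ID"] == record["ParentID"])

print(parent_of(sample_layer[2], stratum_layer)["Label"])  # -> Stratum B
```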
Data Import
Notice that the record “Line 1B” has no distance in the final column - this is a
transect where no objects were seen.
Notice also that all transects from the same stratum are grouped together, and all
observations from the same transect are grouped together. If you accidentally
forgot to sort the data before importing it, so that for example the first four lines
looked like this:
then Distance would interpret this as three transects with labels “Line 1A”, “Line
2A” and “Line 1A” again.
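The importer's behavior here is grouping by *consecutive* runs of identical labels, much like Python's `itertools.groupby`. The sketch below (not Distance code; labels are from the example above) shows how unsorted input silently yields an extra transect:

```python
from itertools import groupby

# Each run of consecutive identical labels becomes one transect.
sorted_labels   = ["Line 1A", "Line 1A", "Line 2A", "Line 2A"]
unsorted_labels = ["Line 1A", "Line 2A", "Line 1A", "Line 1A"]

def transects(labels):
    """Return one transect per consecutive run of identical labels."""
    return [label for label, _ in groupby(labels)]

print(transects(sorted_labels))    # -> ['Line 1A', 'Line 2A']
print(transects(unsorted_labels))  # -> ['Line 1A', 'Line 2A', 'Line 1A']
```

Sorting by stratum and sample label before import guarantees each label forms a single run.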
If you wanted to import this data, you would have to find some way to delete the
multiple spaces before importing it as space delimited:
In Distance, you are guided through the import process by the Import Data
Wizard. This wizard can be started in one of two ways:
• from the last page of the Setup Project Wizard, by choosing the
option Proceed to Import Data Wizard. This is the ideal way
to import data into a new project.
• by selecting the menu item Tools | Import Data Wizard. This is
the best way to add extra data from file into an existing project.
You can also replace your existing data with the imported data -
this is an option at the end of the Import Data Wizard.
Additional information about the Import Data Wizard is given in the Program
Reference page Import Data Wizard. If you are having problems, check the page
Troubleshooting the Import Data Wizard.
Once you have imported your data, you should always double-check
that the correct number of strata, samples (transects) and observations are present
in the Distance project. For example, if you forget to sort the data correctly (by
stratum and sample label)
If you find yourself importing lots of data that are basically similar, then there
are several steps you can take to make the import process quicker. This topic
describes these steps, using an example of a single text file that contains all the
survey data.
Before reading further, you should be somewhat familiar with the Import Data
Wizard, and should read the Program Reference pages on the Import Data
Wizard.
Let’s consider the following data, stored in a text file:
Stratum A;100;Line 1A;10;14
Stratum A;100;Line 1A;10;8
Stratum A;100;Line 1A;10;22
Stratum A;100;Line 2A;10.3;7
Stratum A;100;Line 2A;10.3;37
Stratum A;100;Line 2A;10.3;13
Stratum B;123;Line 1B;5.7;
Stratum B;123;Line 2B;8.4;27
Stratum B;123;Line 2B;8.4;76
Stratum B;123;Line 2B;8.4;44
Stratum B;123;Line 2B;8.4;7
The data are semicolon delimited, and the columns are: stratum label, stratum
area, transect label, transect length and distance. Normally, to get such data into
a Distance project you would:
4. Create a new project, going through the Setup Project Wizard,
choosing the option to Analyze a survey that has been
completed, and filling in the options in the successive screens.
5. Proceed to the Import Data Wizard, and specify the appropriate
source file.
6. In the Data File Structure screen of the Import Data Wizard,
manually match up the field names with the columns in the text
file.
7. Finish the Import Data Wizard and import the data.
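To make the column-to-layer mapping concrete, here is a sketch (in Python, outside of Distance) of how the semicolon-delimited columns above correspond to the stratum > transect > observation hierarchy; the nested-dictionary structure is an illustration, not Distance's internal format:

```python
import csv
import io

# An abbreviated version of the flat file above; columns are stratum label,
# stratum area, transect label, transect length, distance.
flat = """Stratum A;100;Line 1A;10;14
Stratum A;100;Line 1A;10;8
Stratum B;123;Line 1B;5.7;
Stratum B;123;Line 2B;8.4;27"""

strata = {}
for stratum, area, line, length, dist in csv.reader(io.StringIO(flat), delimiter=";"):
    layer = strata.setdefault(stratum, {"area": float(area), "transects": {}})
    obs = layer["transects"].setdefault(line, {"length": float(length),
                                               "distances": []})["distances"]
    if dist:   # an empty distance column marks a transect with no sightings
        obs.append(float(dist))

print(strata["Stratum A"]["transects"]["Line 1A"]["distances"])  # -> [14.0, 8.0]
```

Note how “Line 1B”, with its empty distance field, still creates a transect record with zero observations – exactly the behavior described for the flat file.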
The “*” delimiting layer names and field names in the code
can be replaced by other delimiters. Alternatives are * | _ and .
(i.e., a full stop or period). During data import, you can choose
from these alternatives the appropriate one for the text file you
are importing.
Flat files, such as the one used in the example of the previous topic, are useful
ways to store small datasets. However, for large datasets they are inefficient.
For example, in the previous topic, the stratum label and area for stratum 1 was
repeated six times. Imagine if there were 10,000 observations in stratum 1! A
more efficient way to store and import large datasets is to have each data layer in
a separate file, and to import one layer at a time.
Continuing the example from the previous topic, you would have 3 files:
File 1: stratum.txt
Columns: stratum label, area
Stratum A;100
Stratum B;123
File 2: transect.txt
Columns: stratum label, transect label, transect length
Stratum A;Line 1A;10
Stratum A;Line 2A;10.3
Stratum B;Line 1B;5.7
Stratum B;Line 2B;8.4
File 3: observation.txt
Columns: transect label, distance
Line 1A;8
Line 1A;22
Line 2A;7
Line 2A;37
Line 2A;13
Line 2B;27
Line 2B;76
Line 2B;44
Line 2B;7
Notice that the transect file contains a column giving the stratum of each
transect, and that the observation file contains a column giving the transect of
each observation. In general, each file has to have a column giving a unique
identifier to the record in the parent layer. You don’t need one for the stratum
file because its parent, the global layer, has only one record.
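The label-based join the Import Data Wizard performs can be sketched as follows (illustrative Python, not Distance code; the rows are from the example files above):

```python
# Each child file carries a column identifying its parent record by label.
transect_rows = [
    ("Stratum A", "Line 1A", 10.0),
    ("Stratum A", "Line 2A", 10.3),
    ("Stratum B", "Line 1B", 5.7),
    ("Stratum B", "Line 2B", 8.4),
]
observation_rows = [("Line 1A", 8.0), ("Line 2A", 7.0), ("Line 2B", 27.0)]

# Build a lookup from transect label to its parent stratum, then attach
# each observation to the right stratum via its transect label.
parent = {label: stratum for stratum, label, _ in transect_rows}
joined = [(parent[label], label, dist) for label, dist in observation_rows]
print(joined[2])  # -> ('Stratum B', 'Line 2B', 27.0)
```

This only works if the label column is unique within its layer – the case of non-unique labels is covered below.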
To import these data:
• Create a new Distance project, with 4 data layers and appropriate fields
– probably using the Setup Project Wizard.
• Begin by importing the stratum layer. Fire up the Import Data Wizard,
and enter “stratum.txt” in the Data Source page.
• Under Data Destination, both the highest and lowest Destination
data layers are the stratum layer (called Region by default). Under
Location of new records, choose the first option Add all new
records under the first record in the parent data layer.
• Under Data File Format, choose semicolon delimited, and in Data File
Structure, match the columns in “stratum.txt” to the fields in the stratum
layer. Click Next and Finish.
• Use the Data Explorer to check the stratum data were imported
correctly.
• Now import the transect file. This time in Data Destination, the highest
and lowest Destination data layers are the transect layer. Under
Location of new records, choose the second option, Input file
contains a column corresponding to the following field in
the parent data layer, and make sure the label field is selected from
the drop down box.
• In the Data File Structure page, match the columns in “transect.txt” to
those in the Distance database, including the stratum label field.
• Import the data, and check it in the Data Explorer.
• Repeat this process for the observation data file.
Non-unique label fields
If the label field is not unique, then you will have to add an extra column
containing the ID of each record. For example, imagine that the transect labels
are not Line 1A, Line 2A, Line 1B, Line 2B, but instead are Line 1, Line 2, Line
1, Line 2. In this case the observation data file will need to be as follows:
File 3: observation.txt
Columns: transect ID, distance
1;8
1;22
2;7
2;37
2;13
4;27
4;76
4;44
4;7
When the transects are created, they are assigned IDs sequentially, so transect
“Line 1” in stratum A will have ID 1, transect “Line 2” in stratum A will have
ID 2, “Line 1” in stratum B will be ID 3, and “Line 2” in stratum B will be ID 4.
In the above file, because the transect labels are not unique, the transect IDs have
been used instead. The only difference in the Import Data Wizard will come in
the Data Destination step, where under Location of new records, the Field
name will be “ID”, rather than “Label”.
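The sequential ID assignment described above can be sketched like this (illustrative Python; the transect list mirrors the example):

```python
# Transects in creation order; IDs are assigned sequentially starting at 1,
# so non-unique labels are disambiguated by (stratum, label).
transects = [("Stratum A", "Line 1"), ("Stratum A", "Line 2"),
             ("Stratum B", "Line 1"), ("Stratum B", "Line 2")]

ids = {key: i + 1 for i, key in enumerate(transects)}
print(ids[("Stratum B", "Line 2")])  # -> 4
```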
You set the coordinate system of a data layer when it is created – by default it is
the same as the default coordinate system, but this is not required. (The only
requirement is that all data layers use the same datum.)
Do you need to worry about the coordinate system of the data?
Most geographic data is stored as latitude and longitude according to some
geographic coordinate system. Latitude and longitude are expressed in angular
units (usually decimal degrees). If you want to work with survey design, you
will likely want to work in linear units (e.g., meters), so you will need to
transform your data. To do this you will need to know the geocoordinate
system.
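To illustrate why the angular-to-linear transformation matters, here is a back-of-envelope conversion (not Distance's projection machinery – a real survey design should use a proper projected coordinate system) using an equirectangular approximation about a reference point:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation

def equirectangular_m(lat, lon, lat0, lon0):
    """Rough decimal-degrees -> metres conversion about (lat0, lon0).
    Adequate only for a small study area; for survey design, use a
    properly projected coordinate system instead."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y

# One degree of latitude is roughly 111 km everywhere on the globe:
x, y = equirectangular_m(57.0, -2.8, 56.0, -2.8)
print(round(y / 1000))  # -> 111
```

The cos(lat0) factor shows why a degree of longitude shrinks toward the poles – one reason the choice of projection depends on where the study area is.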
If your data are already expressed in linear units, then likely they are already
projected. This can happen, for example, because you digitized the study area
using a map. So long as you are happy with the projection, then you can set the
geographic coordinate system to “[None]” and forget about coordinate systems.
Similarly, if your study area is small, and you have measured its boundaries
directly, then no coordinate system is required.
If the data are stored projected, then this projection is used for all maps and
calculations, while if the data have no coordinate system then they cannot be
projected.
In survey design, a different projection can be defined for each design, so you
can compare the effect of different projections on the results. (See Chapter 6 -
Survey Design in Distance for more on survey design.)
Which projection?
Over a small study area the projection used will make relatively little difference.
Over larger areas the projection can make a significant difference.
If you need to project your data but are not sure which projection to use,
probably the best option is to cheat and refer to maps of your study area to see
what projection they use (the map will usually give the projected coordinate
system – a literature search will reveal the composite projection, projection
parameters and geocoordinate system).
Alternatively, many cartography books describe the properties of the different
projections, which may help you decide which is appropriate. You will need to
consider the following questions:
• Which spatial properties do you want to preserve? Is it just for
displaying the study area, or for performing calculations such as for
survey design, transect distance, etc.
• Where is the study area? Is your data in a polar region? An
equatorial region?
• What shape is the study area? Is it square? Is it wider in the
east–west direction?
• How big is the study area?
Map projection classifications
Map projections can be generally classified according to what spatial attribute
they preserve.
• Equal Area projections preserve area. Many thematic maps use an
equal area projection. Maps of the United States commonly use the
Albers Equal Area Conic projection.
• Conformal projections preserve shape and are useful for
navigational charts and weather maps. Shape is preserved for small
areas, but the shape of a large area such as a continent will be
significantly distorted. The Lambert Conformal Conic and
Mercator projections are common conformal projections.
• Equidistant projections preserve distances, but no projection can
preserve distances from all points to all other points. Instead,
distance can be held true from one point (or a few points) to all
other points or along all meridians or parallels. If you will be using
your map to find features that are within a certain distance of other
features, you should use an equidistant map projection.
• Azimuthal projections preserve direction from one point to all other
points. This quality can be combined with equal area, conformal,
and equidistant projections, as in the Lambert Equal Area
Azimuthal and the Azimuthal Equidistant projections.
Other projections minimize overall distortion but don’t preserve any of the four
spatial properties of area, shape, distance, and direction. The Robinson
projection, for example, is neither equal area nor conformal but is aesthetically
pleasing and useful for general mapping.
Consider projecting your data before importing it, and then importing
it into Distance without the projection – see the tip in Coordinate Systems, Maps
and Calculations in Distance on page 14 for details.
The Shape Properties Dialog has a facility to copy and paste the vertices
(corners) of an individual shape to and from the Windows clipboard. You can
use this to transfer GIS data between Distance and other formats such as text
files and spreadsheets. For a step-by-step example of importing GIS data from a
text file into Distance, see the Getting Started chapter Example 3: Using
Distance to Design a Survey.
To copy data from a spreadsheet or text file into Distance:
• Highlight the data in the text file or spreadsheet and copy it to the
Windows clipboard.
• In the Distance project you want to copy to, select the shape you
wish to replace in the Data Explorer and double-click on it to open
the Shape Properties Dialog.
• Choose Paste from Clipboard.
To copy data from Distance to a spreadsheet or text file:
• In Distance, double-click on the shape in the Data Explorer. This
opens the Shape Properties Dialog.
• Choose Copy to Clipboard
To separate the parts of a multi-part polygon, leave a blank line (or have a line
that contains anything other than the above number-tab-number format). For
example the following indicates two triangles:
0 0
0 100
100 0

100 0
100 100
200 0
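A sketch (not Distance's own parser) of how such clipboard text splits into the parts of a multi-part polygon – blank lines, or any line that is not number-tab-number, end the current part:

```python
# Two triangles separated by a blank line, in number-tab-number format.
text = "0\t0\n0\t100\n100\t0\n\n100\t0\n100\t100\n200\t0"

parts, current = [], []
for line in text.splitlines():
    fields = line.split("\t")
    if len(fields) == 2:            # a number-tab-number vertex row
        current.append((float(fields[0]), float(fields[1])))
    elif current:                   # anything else closes the current part
        parts.append(current)
        current = []
if current:
    parts.append(current)

print(len(parts))    # -> 2
print(parts[1][0])   # -> (100.0, 0.0)
```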
Spreadsheet format
Each vertex should be in a separate row, with two columns: the first for the
x-coordinate and the second for the y-coordinate. To separate the parts of a
multi-part polygon, leave a row blank.
Example from an Excel spreadsheet, showing data for two triangles. Both columns are
highlighted, ready to copy to the windows clipboard.
This method is similar to the previous one, but does not require you to copy the
shapefile into the Data Folder, or to rename it. The advantage, therefore, is that
you only have one copy of your shapefile to manage. The disadvantages are:
• it is more complicated, requiring you to edit the Data File using a
database package
• because the shapefile remains outside the Data Folder, it is harder
to move the project onto other machines (for example, Export
Project only copies files in the Data Folder).
The following instructions assume that you have software to (1) create and edit
shapefiles (e.g., ESRI ArcView), (2) edit the Distance Data File DistData.mdb
(e.g., Microsoft Access 97). If you have Access 2000 or later, then you should
read Accessing DistData.mdb using newer versions of Access in the Appendix –
Inside Distance.
Prepare the Distance Data Layer
• Follow the instructions under this section in Method 1: Prepare the
Distance Data Layer.
Prepare the Shapefile
• Open the shapefile you wish to import in your GIS package, and
add a field to the table. The field should be able to contain long
integer numbers – in ArcView this means the field type should be
“Number”, the width 16 and decimal places 0. Name this field
“LinkID”.
• The LinkID field will be used to link records in the shapefile to
records in Distance’s internal data table for this layer. Each
LinkID value must correspond with a value in the ID field of
Distance’s internal table.
• If you haven’t created the Data Layer in Distance yet, then it
doesn’t matter what order you number the records in the
Stratifying the survey region into 3 strata can eliminate non-convexity or at least reduce
the discontinuity in the sampler
Even if stratification doesn’t let you take care of non-convexity entirely, it may
at least reduce the discontinuity in your sampler. An example of this, for a real
survey in British Columbia, is shown below.
Although this method does not produce exactly equal coverage in the area of the
first and last transects, it usually comes close (see, e.g., Thomas et al. 2007).
Alternative algorithms may be implemented in future, such as the use of an
adjusted angle zigzag sampler for the first and last segments.
In this example, there are four analyses. The first analysis, which is highlighted
in blue, has not been run – its status light (left hand column) is grey, and the
results columns (right columns) are blank. The next two analyses have been run,
but the run generated some warnings – the status light is amber. If an analysis
encounters an error during a run, the status light will be red. The last analysis
ran with no errors or warnings – its status light is green.
You may also notice in the example that the toolbar along the top of the
Analysis Browser has a box labelled “Set: All data”. In Distance you can
group your analyses into different Sets. If you were to click on the down arrow
beside “All data”, you would see that there is another set in this project, called
“Truncation at 6 feet”. If you chose that set, another table of analyses would be displayed.
The Analysis Details windows for two analyses, open on different Results pages.
In the above example, the results tab of the top analysis (“Analysis 2”) is green
because it ran without generating any errors or warnings. For an analysis that
has not been run, all three tabs are grey, while if an analysis encounters problems
during the run, the Log tab is colored amber (warnings) or red (errors).
Analysis Components
You can get a list of Data Filters and Model Definitions by clicking on the View
Analysis Components button on the main toolbar. This opens the
Analysis Components Window:
Analysis Components window, showing a list of the two Data Filters (left) and four Model
Definitions (right), in the Ducknest project.
Using the Analysis Browser, we can see that the four analyses in the Analysis
Set “All data” all use Survey number 1 (called “New Survey”) and Data Filter
number 1 (called “Default data filter”). However, each one uses a different
Model Definition.
Example of the Analysis Details Inputs tab for an analysis, from the Ducknest sample
project
The analysis is also called “Half normal / hermite” (top of the picture, in the title
bar and beside Name:). This is the name that appears in the Analysis
Browser, so it is always a good idea to give the analysis a name that lets you
distinguish it from other analyses in the Analysis Set.
If you wanted to change, say, the Data Filter for this analysis to “Truncation at 6
feet”, you would click on that Data Filter in the list. In the case of the analysis
shown above, it would not be a good idea to change the Data Filter as the
analysis has already been run and has results associated with it (you can tell this
because the Results tab is green). If you change the selected Data Filter or
Model Definition in an analysis that has already been run, then Distance will
warn you that the results have become out of date and ask whether you want
them deleted.
The best way to do an analysis with a new combination of Data Filter or Model
Definition is to create a new analysis in Distance for this combination. You do
this in the Analysis Browser, by clicking on the New Analysis button .
This automatically creates a new analysis, based on the one that you currently
have selected in the Analysis Browser. You can then open up the Analysis
Details for the new analysis. Because the new analysis has not yet been run,
you are free to choose the combination of Data Filter and Model Definition you
want. If this seems a little confusing, take a few moments to try creating a new
analysis in the Analysis Browser for the Ducknest project.
Model Definition section of the Inputs tab on the Analysis Details window
Distance creates a new Model Definition, based on the one you currently have
selected, and opens the Model Definition Properties dialog for this Model
Definition:
You can then edit the properties to reflect the changes you want. In our
example, to change the variance option to use bootstrapping, you click on the
Variance tab and tick the Select non-parametric bootstrap checkbox.
(More details about the options in this dialog are given in the Model Definition
Properties Dialog section of the Program Reference.) You may also want to
change the name to reflect the change in properties, for example calling the new
Model Definition “Half-normal / hermite – bootstrap”. You do this by editing the Name:
text box.
You can now press the OK button to save the new options and close the Model
Definition Properties dialog. The new Model Definition is automatically
selected in the Analysis Details:
Example of the Analysis Details Inputs tab, with a new Model Definition selected
You can then run the analysis by pressing the Run button, or you can close the
Analysis Details window and run the analysis from the Analysis Browser by
pressing the button on the Analysis Browser's toolbar.
Creating new Data Filters is exactly analogous to the process just described for
Model Definitions. In this case, when you press the Data filter New... button, a
new Data Filter is created based on the one currently selected, and the Data Filter
Properties dialog opens. To find out more about the Data Filter options, see the
Data Filter Properties Dialog section of the Program Reference.
To find out more about the options available in the Data Filter Properties and
Model Definition Properties dialogs, see the Program Reference sections
Data Filter Properties Dialog and Model Definition Properties Dialog.
Analysis Components window, showing a list of the Model Definitions in the Ducknest
sample project
In the Analysis Components toolbar, you click on the first button to get a list
of all the Data Filters, and the second button to get a list of all the Model
Definitions in your project. The other buttons allow you to copy (i.e., create),
delete, view and arrange (move up and down the list) the component that you
have selected.
You can also work with Data Filters and Model Definitions in the Analysis
Details window. Using the Analysis Components window is most useful
when you have a large number of components in your project, as you can
arrange them into a logical order, delete the ones you are not using, and easily
rename them.
The last column in the table of analysis components tells you whether that
component is currently being used in any analyses: “Y” means it is being used
and “N” means that it is not. This is useful because when there are many
components (e.g., many Model Definitions if you have been doing a lot of
analyses), it is easy to lose track of which are being used and which are no
longer required. Also, if you double-click on a “Y”, you get a list of the analyses
that use that component.
Example
In the section Creating New Data Filters and Model Definitions we showed how
to create a new Model Definition and associate it with a new Analysis using the
The new Model Definition has been given the default name “Half-normal /
hermite 1”. You may want to change the name to reflect the options you’re about
to set – double click on the name and type “Half-normal / hermite – bootstrap”.
You can now edit the new Model Definition properties, by double-clicking on
the ID of the new Model Definition, or by clicking the View Item Properties
button . The Model Definition Properties dialog opens. To change the
variance option to use bootstrapping, you click on the Variance tab and select
the “non-parametric bootstrap” option. (More details about the options in this
dialog are given in the Model Definition Properties Dialog section of the
Program Reference.) You can now press the OK button to save the new options
and close the dialog.
You have now set up the new Model Definition ready for use. If you want to
perform other types of analysis, you could set up more Model Definitions at this
point.
You now need to create a new Analysis, and attach the new Model Definition to
this new Analysis. In the Analysis Browser (i.e., the Analysis tab of the
Project Browser), click on the New Analysis button . Double-click on the
status button of the new analysis – this opens the Analysis Details window. In
the Model definition section, select your new model definition, and in the
Name: section, type in a suitable name for your new analysis, e.g., “Half-
normal / hermite – bootstrap”. You can now run the analysis.
This approach to setting up new Model Definitions (or Data Filters) is most
useful when you have several to set up at once. You can use the Analysis
Components window to set up your new components, then use the Analysis
Browser to create new Analyses, and associate the new analyses with the new
components. Then, in the Analysis Browser, you can highlight all the new
Analyses, click the run button , and go and have a cup of tea while they all
run! (For more about running analyses, see Running Analyses on page 12).
Normally, for analysis in Distance, you only need one Survey. This Survey tells
Distance what type of survey you performed and where the data from the survey
are stored. However, there are some situations when it is useful to have more
than one Survey in a project. Examples include:
• you have a complicated data structure, for example with two or
more layers of type Sample (e.g., for two or more survey regions or
years). In this case, you will set up one Survey to point to each
sample layer. You could then create one Analysis set for analyses
that use the first layer, and another for analyses that point to the
second.
Analysis Engines
In Distance, you have a choice of analysis engines to perform an analysis.
Each analysis engine has different capabilities, and has different inputs and
outputs. Distance has three analysis engines built in: a conventional distance
sampling (CDS) engine, a multiple covariate distance sampling (MCDS) engine,
and a mark recapture distance sampling (MRDS) engine. More engines are
planned for future versions of Distance.
You choose the analysis engine when you are setting up a model definition –
select the appropriate engine from the drop down list at the top of the Model
Definition Properties dialog:
A description of the options available for each engine is given in the Model
Definition Properties Dialog section of the Program Reference.
This engine contains (almost) all the features of the CDS engine, but also allows
additional covariates to be included in the detection function model, in addition
to observed distance. These covariates enter through the scale parameter of the
key function (via a log link function). This means that the covariates are
assumed to influence the scale of the detection function, but not its shape (see
picture, below).
Example estimated detection functions, where cluster size (Sbar) is the covariate. The
basic shape of the function is the same (half-normal), but the effective strip width is wider
at cluster size 750.
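The covariate mechanism can be sketched numerically. Assuming a half-normal key whose scale is modelled through a log link, σ(z) = exp(β₀ + β₁z) for covariate z; the coefficient values below are made up purely for illustration and are not Distance output:

```python
import math

# Half-normal detection function whose scale depends on a covariate z
# through a log link: sigma(z) = exp(b0 + b1 * z). The shape (half-normal)
# is the same for every z; only the scale - and hence the effective strip
# width - changes. Coefficients are hypothetical.
b0, b1 = 3.0, 0.001

def sigma(z):
    return math.exp(b0 + b1 * z)

def g(y, z):
    s = sigma(z)
    return math.exp(-y**2 / (2 * s**2))

# Detection falls off more slowly for larger clusters (z = cluster size):
for z in (25, 750):
    print(f"z={z}: g(30)={g(30, z):.3f}, sigma={sigma(z):.1f}")
```

At any given distance, detection probability is higher for the larger cluster size, while the half-normal shape is unchanged, which is exactly what the figure above shows.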
For more information about this engine, and when it can be useful, see Chapter 9
- Multiple Covariates Distance Sampling Analysis. This engine was first
introduced in Distance 4.0.
This engine permits analysis of data collected from two survey platforms, where
the assumption of certain detection of objects on the trackline can be
relaxed. CDS and MCDS analyses are also possible, although adjustment terms
are not currently available to modify the shape of the detection functions in this
engine.
For more information about this engine, see Chapter 10 - Mark Recapture
Distance Sampling. This engine was first introduced in Distance 5.0.
Running Analyses
Once you have created an analysis, and set it up by choosing an appropriate
Survey, Data Filter and Model Definition, you are then ready to run it.
There are two ways to run an analysis in Distance:
You can run more than one analysis at once. Simply highlight all of
the analyses you want to run in the Analysis Browser and then press the Run
Analysis button. This is useful if you are planning on doing a number of long
analyses - simply set them all up, select and run them all at once, and go and
have a cup of tea!
Windows NT, 2000 and XP are generally better built than Windows
95, 98 and ME. However, one unfortunate consequence of this for us is that
background processes tend to get more CPU time. This means that the Distance
interface may slow down quite noticeably while an analysis is running under
Windows NT/2000/XP, even on well-configured machines. You can give the
interface a boost by changing the Foreground performance setting in the
Performance tab of the Windows System Properties dialog (System icon in
Control Panel) to Maximum.
You can stop an analysis that is running by pressing the Stop button
in the Analysis Details window (this button replaces the Run button while the
analysis is running), or by highlighting the analysis in the Analysis Browser
and pressing the Reset Analysis button. However, on some systems, the
analysis will appear to stop but will carry on running in the background, using up
system resources. For more about this, see Stopping an Analysis, in Chapter 10.
R Statistical Software
Distance has a link to the free statistical software R. The Mark Recapture
Distance Sampling (MRDS) analysis engine is implemented as an R library, and
so you must have a working copy of R installed on your computer before you
can use that engine.
You can download and install R after you have installed Distance, and you can
run Distance without having R installed - but you will get an error message if
you try to run the MRDS engine without R.
R is under very active development, and new versions are released quite
frequently. Unfortunately, new versions are sometimes not compatible with
libraries compiled under old versions. We will endeavour to test our libraries with
each new version as it appears, and update them as required. For more information
about the version we are currently supporting, please browse to the Program
Distance Web Site, Support, Updates and Extras page.
To use the MRDS engine, you don’t have to know anything about R
beyond how to install it. However, R is a fully featured, widely-used statistics
package which you may consider using for your other analyses. You can find
out more about R from the R project home page, http://www.r-project.org/.
If you cannot see the plots that R produces, see Images produced by R, below.
Images produced by R
The images produced by R are stored as files in the R folder (see Contents of the
R folder). They are of the general form
[prefix].[analysis ID].[plot number].[suffix]
for example qq plot 1 for analysis 8 in windows metafile (.wmf) format would be
qq.8.01.wmf.
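A hypothetical helper that reconstructs such a filename illustrates the pattern (the zero-padding of the plot number to two digits is inferred from the qq.8.01.wmf example):

```python
# Reconstruct an R image filename of the form
# [prefix].[analysis ID].[plot number].[suffix]
# e.g. qq plot 1 for analysis 8, Windows metafile format -> qq.8.01.wmf
def r_image_filename(prefix, analysis_id, plot_number, suffix="wmf"):
    return f"{prefix}.{analysis_id}.{plot_number:02d}.{suffix}"

print(r_image_filename("qq", 8, 1))  # qq.8.01.wmf
```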
User's Guide Distance 6.0 Beta 5 Chapter 8 - Conventional Distance Sampling Analysis • 87
It is also possible to perform a CDS (or MCDS) analysis using the
MRDS engine – see Single Observer Configuration in the MRDS Engine in
Chapter 10 of the Users Guide.
The issue of suitable truncation can be examined by creating Data Filters with
different truncation distances. Similarly, it may prove beneficial to group exact
distance data into intervals - this is also done in the Data Filter. This exploratory
phase is open ended, but you should strive to fully understand the data and
possible violations of the assumptions of distance sampling analyses (Buckland
et al. 2001, Section 2.1). In Distance, it may be worth grouping these
exploratory analyses into a suitably named Analysis Set.
Once the data have been properly prepared, and a decision has been made about
truncation and other Data Filter issues, the model selection phase can begin. We
recommend selecting a small number of sensible candidate models from those
available in Distance, and defining a separate Model Definition for each one.
This way, a separate analysis can be created for each model, and the AIC and
Delta AIC (or AICc and Delta AICc) columns in the Analysis Browser can be
used to sort and compare the analyses. Of course, other criteria should also be
used in selecting among the candidate models, such as goodness of fit (especially
near zero distance). A great deal of useful information about each model is
stored in the Analysis Details, Results tab.
In many cases these analyses will suggest additional explanatory work, so the
process of model selection and exploration is often iterative. Other issues, such
as the appropriate levels for estimating parameters (sample, stratum, global)
must also be considered (see Stratification and Post-stratification in this Chapter
for a discussion of some of these issues).
As the number of analyses defined and run starts to build up, it becomes worth
considering grouping the analyses into different Analysis Sets in the Analysis
Browser. The Analysis Components window can also be used to move
related Data Filters and Model Definitions so that they are positioned adjacent to
one another. The Comments section of the Analysis Details window for
each Analysis can be used to record pertinent information, such as what you
learnt by running the analysis.
At some point, you select a model you believe to be the best for the data set
under consideration. This is the time to consider making bootstrap estimates of
variance (see Model Definition, Variance Tab - CDS and MCDS in the
Program Reference), and beginning to make inferences from the abundance
estimates produced. In many cases there will be perhaps two or three models
that appear to fit the data equally well. Distance allows you to define multiple
models in the Model Definition, Detection Function Models tab - if you
create a Model Definition that includes all of the final models and specify
bootstrap variance estimates, then the estimated variance will account for this
uncertainty in model selection as well as the other sources of variation. This
approach has much to recommend it. (For more on this, see Model Averaging in
CDS Analysis in this Chapter).
The above guidelines give a broad overview of how the analyst might proceed.
These ideas are developed much more fully in Buckland et al. (1993, 2001), and
extensive examples are given to illustrate the approach.
• a detailed listing of results in the Results tab of the Analysis
Details window. These are described in the following section,
CDS Results Details Listing.
• a log of the analysis, highlighting any possible problems, in the
Log tab of the Analysis Details window. For information about
troubleshooting problems, see Chapter 12 - Troubleshooting.
• (optionally) text files, containing the results listing, analysis log,
summary statistics, bootstrap statistics and plot data. For more
about these, see the section on Exporting CDS Results.
The previous three pages are designed to help you diagnose model
fit. By default, you get three sets of these pages, with the data divided into
equally spaced intervals, the number of intervals being √n, (2/3)√n and
(3/2)√n (where n is the number of objects). Instead of using the defaults, we
recommend you always define your own cutpoints.
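The three default interval counts are simple functions of the sample size. The sketch below just reports the raw values; how Distance rounds them to whole numbers of intervals is not specified here:

```python
import math

# Default numbers of intervals for the diagnostic pages:
# sqrt(n), (2/3)*sqrt(n) and (3/2)*sqrt(n), where n is the number of objects.
# (The rounding Distance applies to get whole intervals is not shown.)
def default_interval_counts(n):
    root = math.sqrt(n)
    return root, (2 / 3) * root, (3 / 2) * root

for count in default_interval_counts(204):
    print(round(count, 1))
```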
These pages are probably the most important of the whole results
output, and you should check them carefully. Check the Model Fitting output to
see if any of the models that were fit did not converge, or hit any of the
constraints. Even if the model affected is not the one that was eventually
selected (if you're using automatic model selection), the selection process can be
affected if there was a problem in the fitting. Check the plot(s) and GOF tables
to look for evidence of lack-of-fit, and possible problems with the data such as
rounding and evasive movement. These issues are mentioned in this Chapter on
the page entitled CDS Analysis Guidelines, and are covered in more detail in the
Distance Book.
– see Model Averaging in CDS Analysis). Two types of
confidence limits are given. The first use the bootstrap standard
error to generate parametric, log-normal confidence limits. The
second use the percentile method – i.e., for x% confidence intervals
the (x/2)th and (100-x/2)th quantiles of the bootstrap estimates are
given. In general, the latter confidence intervals are considered
more reliable.
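The percentile method is straightforward to sketch: for an x% interval, take the (x/2)th and (100−x/2)th quantiles of the bootstrap estimates. The quantile interpolation rule below is a simple nearest-rank choice for illustration; Distance's exact rule may differ:

```python
# Percentile-method confidence limits from a set of bootstrap estimates.
# For an x% interval, take the (100-x)/2 th and (100+x)/2 th percentiles.
# Nearest-rank quantiles are used here for simplicity; Distance's own
# interpolation rule may differ.
def percentile_ci(boot_estimates, level=95.0):
    xs = sorted(boot_estimates)
    n = len(xs)

    def quantile(p):  # p in [0, 100], nearest rank, clamped to the data
        k = max(0, min(n - 1, round(p / 100 * (n - 1))))
        return xs[k]

    alpha = (100.0 - level) / 2.0
    return quantile(alpha), quantile(100.0 - alpha)

boots = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.95, 1.05, 1.15, 0.85]
low, high = percentile_ci(boots)
print(low, high)
```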
For information about how to export the results text or plots into
another program, see Exporting CDS Results from Analysis Details Results.
CDS Qq-plots
Distance gives quantile-quantile (qq)-plots for all analyses that use exact data
(i.e., those where the data is not transformed into intervals in the Data Filter).
Qq plots are useful for diagnosing problems in the data such as rounding to
preferred values and other systematic departures from the fitted model. A major
advantage of these plots over the histograms of the detection function and
probability density function (pdf) is that they do not require the data to be
grouped into intervals. A disadvantage is that they require a little effort to
understand the output.
In statistics, qq-plots are used to compare the distribution of two variables – if
they follow the same distribution, then a plot of the quantiles of the first variable
against the quantiles of the second should follow a straight line.
To compare the fit of a detection function model to the data, a standard method
is to plot the fitted cumulative distribution function (cdf) against the empirical
distribution function (edf). The cdf, F(x), gives the probability of getting a
distance less than or equal to x for a given model. The edf, S(x), gives the
proportion of the data with distances less than x. (Note – this explanation
ignores tied values.) If the data fit the model, then the fitted cdf and edf should
be the same.
To make the qq-plot, the fitted cdf is evaluated for each observation. The data
are then sorted into increasing order i = 1, …, n and the edf is calculated as
(i − 0.5)/n. The following plot shows an example where 58 of the 204 data points are
at 0 distance (yes, this is a real dataset!). The red dots show the data, and the
blue line is where they should lie if the fit of the model was perfect.
The cdf at 0 distance is 0, so the 58 points appear along the bottom left side of
the plot. Clearly in this case, the data do not fit the model. This can be
confirmed in Distance using the Kolmogorov-Smirnov and Cramér-von Mises
goodness of fit tests on the next page of results output.
The following plot shows an example where the fit appears quite good, with
most points close to the line and little systematic departure. The data have
clearly been rounded (e.g., to the nearest meter) as there are several data points
at each level of the cdf. Such rounding should not affect the reliability of the
parameter estimates at all.
For more on qq-plots, see Chapter 11 of Buckland et al. (2004). For options
associated with qq-plots in the CDS engine, see the Program Reference page on
Model Definition Properties, Diagnostics - Detection Function Tab - CDS and
MCDS. For information about how to export qq-plots (and other output) from
Distance into Word processors, spreadsheet and other graphing programs, see
Exporting CDS Results from Analysis Details Results.
The Kolmogorov-Smirnov (K-S) statistic can be used to test whether there is a
significant departure between the edf and cdf – in other words, whether the data
fit the model.
where z = D̂√n and n is the number of observations (Gibbons 1971, page 81).
This should be suitably accurate for practical application for sample sizes of
about 35 or greater.
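The edf and K-S calculations just described can be sketched directly. The half-normal cdf below is only a stand-in for whatever fitted detection-function cdf an analysis produces:

```python
import math

# Kolmogorov-Smirnov-style comparison of an empirical distribution function
# with a fitted cdf. The edf at the i-th ordered observation is (i - 0.5)/n;
# D-hat is the largest absolute gap between edf and fitted cdf, and
# z = D-hat * sqrt(n) is the statistic referred to in the text.
def ks_statistic(distances, fitted_cdf):
    xs = sorted(distances)
    n = len(xs)
    d_hat = max(abs((i + 0.5) / n - fitted_cdf(x)) for i, x in enumerate(xs))
    return d_hat, d_hat * math.sqrt(n)

# Stand-in fitted model: cdf of a half-normal with scale sigma, truncated at w.
def half_normal_cdf(x, sigma=5.0, w=20.0):
    return math.erf(x / (sigma * math.sqrt(2))) / math.erf(w / (sigma * math.sqrt(2)))

data = [0.4, 1.1, 2.3, 3.0, 4.8, 6.1, 7.5, 9.2, 11.0, 14.5]
d_hat, z = ks_statistic(data, half_normal_cdf)
print(f"D-hat = {d_hat:.3f}, z = {z:.3f}")
```

A perfect fit gives D̂ = 0; large gaps between the red dots and the blue line in the qq-plot translate directly into a large D̂.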
About CDS Detection Function Formulae
Understanding the Detection Function Model Formulae
On various pages of the results details listing, Distance presents the detection
function model used and the parameters estimated. For example, under
Detection Function, Model Fitting, you may see:
Model
Hazard Rate key, k(y) = 1 - Exp(-(y/A(1))**-A(2))
Simple polynomial adjustments of order(s) : 4
A( 1) bounds = ( .00000 , 1.0000 )
A( 2) bounds = ( 1.0000 , 2.0000 )
Hazard rate: 1 − exp(−(y/σ)^(−b))
where σ is the scale parameter and b is the shape parameter. This formula is
given in the output above, and it can be seen that parameter A(1) corresponds to
σ and A(2) to b.
The formula for the polynomial series adjustment is given in Buckland et al.
(2001) as
Simple polynomial: Σ_{j=2}^{m} a_j (y/s)^(2j)
where m is the number of adjustment terms, and aj is the parameter for the
adjustment term of order 2j. In the above output, parameter A(3) corresponds to
the order 4 adjustment term (i.e., where j=2).
To calculate probability of detection at a given distance, y, you need to substitute
the parameter estimates into the formula
g(y) = key(y)[1 + series(y/s)] / (key(0)[1 + series(0)])
For example, taking the results given above and assuming a truncation distance
w = 10, the probability of detection at y = 3 is
g(3) = (1 − exp(−(3/0.4582)^(−1.195))) [1 − 1.954(3/10)^4] / ((1 − 0)[1])
     = (1 − 0.8995)(0.9842) / 1
     = 0.0989
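The arithmetic can be checked directly; the parameter values below are the ones from the example output (σ = 0.4582, b = 1.195, adjustment parameter a = −1.954, truncation w = 10):

```python
import math

# Hazard-rate key with one simple-polynomial adjustment of order 4,
# using the example estimates: sigma = 0.4582, b = 1.195, a = -1.954, w = 10.
sigma, b, a, w = 0.4582, 1.195, -1.954, 10.0

def key(y):
    if y == 0.0:
        return 1.0  # (y/sigma)^-b -> infinity, so exp(-inf) = 0 and key = 1
    return 1.0 - math.exp(-((y / sigma) ** (-b)))

def series(u):  # u = y/w; single order-4 simple polynomial term
    return a * u**4

def g(y):
    return key(y) * (1 + series(y / w)) / (key(0.0) * (1 + series(0.0)))

print(round(g(3.0), 4))  # 0.0989, matching the worked example
```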
To calculate average probability of detection over the surveyed strip, an easy
approach is to divide the interval (0, w) into a large number of evenly spaced
intervals, evaluate g(y) at each cutpoint and take the mean. (This is almost
equivalent to numerical integration of g(y)/w using the trapezoidal rule.)
Alternatively, use a numerical integration routine (e.g., the function integrate
in R) to integrate g(y) and divide the result by w.
The estimated effective strip width (line transects) or effective area (point
transects) in CDS analysis is given in the Results Details listing. However, there
are some circumstances when you may wish to calculate it outside of Distance
using the parameter estimates (one example is for MCDS analyses to calculate it
at a given covariate level).
For line transects, effective strip width is given by

μ = ∫₀ʷ g(y) dy
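Putting the two calculations together, a quick numerical sketch (again using the hazard-rate example estimates) averages g(y) over a fine grid of cutpoints; that average approximates the mean detection probability, and multiplying it by w approximates the effective strip width:

```python
import math

# Numerical effective strip width mu = integral of g(y) from 0 to w,
# approximated by averaging g over a fine grid (as suggested in the text)
# and multiplying by w. Parameters are the hazard-rate example estimates:
# sigma = 0.4582, b = 1.195, a = -1.954, w = 10.
sigma, b, a, w = 0.4582, 1.195, -1.954, 10.0

def g(y):
    key = 1.0 if y == 0.0 else 1.0 - math.exp(-((y / sigma) ** (-b)))
    adj = 1.0 + a * (y / w) ** 4
    return key * adj  # key(0)[1 + series(0)] = 1 for this model

def mean_detection_prob(n_grid=10000):
    ys = [w * (i + 0.5) / n_grid for i in range(n_grid)]  # interval midpoints
    return sum(g(y) for y in ys) / n_grid

p_bar = mean_detection_prob()
mu = p_bar * w  # effective strip width
print(f"average p = {p_bar:.4f}, effective strip width = {mu:.3f}")
```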
See the previous topic for a tip on how to get the best accuracy when
calculating statistics such as effective strip width (and average probability of
detection).
The best way to be sure which parameter is which is to run the analysis
without starting values and check the Detection Fct part of the output.
• When there are multiple models and multiple strata, the parameters
are not indexed separately when printing results in the Density
Estimates section. Only the model selected in each stratum
contributes to the indexing. For example, imagine Distance is
choosing among hazard rate and half-normal key functions, with no
adjustments, and that there are two strata. In stratum 1 it chooses
half-normal and in stratum 2 it chooses hazard rate. Then in the
Density Estimates output, parameter A(1) corresponds to the
half-normal parameter in stratum 1 and parameters A(2) and A(3)
correspond to the hazard rate parameter in stratum 2.
• This means that in some cases the parameter indexes in the
Density Estimates part of the output can be different from those
in the Detection Fct part. Hopefully the output in each section is
self-explanatory. The important thing to remember is that it is the
parameter indexes in the Detection Fct part that are used for
setting starting values – the output in the Density Estimates part
is for display purposes only.
CDS Analysis Browser Results
When an analysis is run, a summary of the results is given in the right-hand pane
of the Analysis Browser. You can select which statistics are displayed,
separately for each analysis set, by using the Column Manager (click the
button).
Most of the columns that are available for selection have obvious interpretations.
However a few require some additional explanation or amplification:
• Many columns will appear blank in the Analysis Browser when the
analysis is stratified. For example, the number of parameters,
probability of detection and chi-square p columns will be blank if
detection function is estimated by stratum.
• One exception to the above is the model selection statistics AIC,
AICc, BIC, and LogL (and respective Delta AIC, Delta AICc, …).
When detection function is estimated by stratum, these statistics are
summed across the estimated detection functions, making it easy to
compare models where detection function is estimated separately
by stratum vs those where it is pooled.
• The goodness of fit Chi-square p value is for the last test performed
(if more than one diagnostic test is performed), after automatic
pooling has taken place - see the last “Chi-sq GOF” page of the
Analysis Details Results Tab.
• Both bootstrap and analytic estimates of coefficient of variation
and confidence limits for the abundance and density estimates can
be displayed. The bootstrap estimates use the percentile method
(cf. bootstrap in the Distance Book). Bootstrap estimates obtained
from the bootstrap variance estimate, assuming a lognormal
distribution for the density estimate, are available in the Analysis
Details Results Tab Bootstrap Summary page.
• Bootstrap point estimates of abundance and density are also
available – these are the mean of the estimates from the bootstrap
replicates. They are especially useful if you have run the bootstrap
with multiple key functions as they are then model-averaged point
estimates – see Model Averaging in CDS Analysis for details.
Non-integer results are only shown in the Analysis Browser to a few
(usually 2) decimal places. However, if you copy and paste them into another
application (e.g., Excel) you can see them to 7 significant figures.
If you wish to use Excel to recreate a plot from the plot data, there is a
simple macro available on the Support page of the Program Distance Web Site to
do this.
#this adds in the (0,0) (1,1) line
lines(c(0,1), c(0,1))
In the Model Definition Properties dialog, there are options to save files
containing summaries of the results of an analysis. Four files can be saved, all of
which are standard ASCII text files:
• Results Details File - see Misc. Tab - CDS and MCDS.
• Results Stats File - see Misc. Tab - CDS and MCDS.
• Bootstrap Stats File - see Variance Tab - CDS and MCDS.
• Plot File - see Detection Function Tab - CDS and MCDS.
These files may be useful in providing an interface between Distance and other
applications - for example you could write a spreadsheet macro to paste the
results stats file and extract information into spreadsheet cells. In addition, the
Bootstrap file is often useful for diagnosing problems encountered during
bootstrap resampling. The formats of these four files are given in
the MCDS Engine Command Language Appendix section Output from the
MCDS Engine.
Note that the results details file is the same as the text displayed in the Results
tab of the Analysis Details window for an analysis that has been run. You can
easily obtain this text by choosing the menu item Analysis | Results | Copy
Results to Clipboard or pressing the Copy to Clipboard button on the
main toolbar. Similarly, you can obtain a copy of the plot data by displaying the
plot in Distance and pressing the Copy to Clipboard button. For more on this,
see Exporting CDS Results from Analysis Details Results.
If you observed, say, 50 objects in the first bin of a transect, you won't
want to create 50 records by hand and type the distance for each one.
Luckily there is a shortcut - simply create a record for the first object and enter
the distance (in this case 5m). Then double-click on the ID field to bring up the
multi-record add dialog. Select 49 and press append - this will automatically add
the other 49 records. See the Program Reference page Editing, Adding and
Deleting Records for more details.
When you have entered your data, the first thing you should do is tell Distance to
turn the data into intervals for analysis. Create a new analysis in the Analysis
Browser, and in the Analysis Details window, click on Properties… for the
default Data Filter. In the Intervals tab of the Data Filter click on “Transform
distance data into intervals for analysis” and enter your intervals.
Look at the Data Filter Truncation tab help page to find out about choosing the
level of truncation for your distance data.
If you run an analysis with data that includes missing distances, the
CDS engine will issue a warning and exclude the observations with missing
distances from the analysis.
Clusters of Objects
In many studies, the objects of interest (usually animals) occur in clusters
(schools, flocks, etc.). In this case, each observation represents a cluster, and in
addition to the distance from the transect to the cluster the observer also records
the cluster size.
Distance sampling theory is readily extended to include clustered populations, as
outlined in Buckland et al. 2001. Distance allows you to specify that objects are
in clusters during the New Project Setup Wizard. It then automatically creates a
field for cluster size in the Observation data layer, and specifies that objects are
clusters in the default Survey object.
The key decision in the analysis of clustered data is how to estimate the expected
cluster size at zero distance. Distance offers a number of options for this, as
explained in Section 3.5 of Buckland et al. 2001. In the Data Filter Properties,
under Truncation, there is an option to right-truncate the data for cluster size
estimation independently of the truncation for estimating the detection function.
In addition, in the Model Definition Properties, there is a Cluster Size tab which
gives a number of options for estimating expected cluster size (see the Program
Reference page on the Cluster Size Tab - CDS and MCDS).
instead would use the mean cluster size (see Cluster Size Tab - CDS and MCDS
in the Program Reference).
An alternative analysis method would be to include in the data only the trees
where one or more parasitic plants were seen – cluster size will then always be at
least 1. A disadvantage of this approach is that there will be less data available
to fit the detection function.
Another example of zero cluster sizes arises in multi-species analyses where
species are encountered as mixed groups. You may have one field giving cluster
size for one species and a second field giving cluster size for the second species.
To do the analysis, you would then create a separate Survey object for each
species and use one survey object for each analysis (see Analysis with Multiple
Surveys in Chapter 7 for more on the use of multiple survey objects).
To analyze data that has been entered this way, you should click on the option
Use layer type: Stratum in the Model Definition Properties dialog
Estimate tab:
In the lower part of the Estimate tab, you can then select the level of estimation
for density, encounter rate, detection function and cluster size (if the
observations are clusters of individuals). In the following picture, density is
estimated separately for each stratum as well as overall (globally), encounter rate
and cluster size are estimated by stratum, but detection function is estimated
globally (i.e., pooled across strata). You can see this from the location of the
tick marks in the boxes.
Assuming you wish to estimate density by stratum and globally, you must tell
Distance how to combine the stratum estimates to produce a global estimate. If
your strata are geographic, the following options should be used:
The other options for the global density estimate are discussed in the following
sections.
Data sheet part of the data explorer, showing the Vessel field in the sample layer “Line
transect”
(For more information about how to create additional fields in Distance, see the
Program Reference page about the Data Explorer. Additional fields such as this
can also be imported into Distance as with the other survey data - see Chapter 5
of the Users Guide on Data Import.) To analyze data where the stratum is entered
as an additional field, click on the Post-stratify, using option in the Model
Definition Properties dialog Estimate tab, and select the appropriate data
layer and data field:
Example from Estimate tab, showing post-stratification by the Vessel field in the sample
layer
Finally, we compute the global abundance estimate as the mean of the stratum
estimates, but in this case we weight by survey effort.
When you choose to weight by survey effort, the tick box Strata are
replicates is enabled. In this case we do not want to tick that option. This, and
the other options, are discussed more in the next section, and in the Program
Reference page on the Model Definition Estimate Tab - CDS and MCDS.
If you wish to estimate overall density for multi-strata surveys, you can use the
Data Selection option in the Data Filter to set up a separate filter for each
level of your highest stratum. Then, do a separate analysis for each level and
combine the resulting density estimates by hand, remembering to include the
appropriate weightings. Variances can be calculated using the delta method
(Buckland et al. 2001). We hope to include multi-level stratification in a future
release of Distance.
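As an illustration of combining stratum estimates by hand, here is a Python sketch with hypothetical numbers, using area weighting (one common choice of weights). Assuming the stratum estimates are independent, the variance of the weighted mean is the weighted sum of the stratum variances:

```python
import numpy as np

# Hypothetical per-stratum density estimates, their SEs, and stratum areas
d = np.array([1.2, 0.8, 2.1])            # animals per km^2
se = np.array([0.30, 0.25, 0.60])
area = np.array([100.0, 250.0, 150.0])   # km^2

w = area / area.sum()                    # area weights
d_global = np.sum(w * d)                 # weighted global density

# For independent stratum estimates, the variance of the weighted
# mean is the weighted (squared-weight) sum of the stratum variances
var_global = np.sum(w**2 * se**2)
se_global = np.sqrt(var_global)
```

With effort weighting instead, replace the areas by the per-stratum effort.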
In these situations you create an extra field that indexes the type of effort. You
post-stratify on this field and estimate density as the mean of the post-stratum
estimates, weighted by survey effort:
One example is the scenario described in the previous section, where there is
observer heterogeneity. In this case, you would add an extra field in the sample
data layer for observer (/vessel/survey party/etc.), and then post-stratify by this
field. The global density estimate is given as the mean of the stratum estimates,
weighted by survey effort.
Another example is where the study area is surveyed in multiple time periods,
using a different set of samples (transects) in each time period. Even without
wanting to post-stratify, it would be a good idea to add an extra field that
indicates the time period of each transect in the sample data layer, as this would
enable you to use the Data Selection feature of the Data Filter to pick out only
certain time periods for analysis. Post-stratification becomes useful if you want
a combined estimate of the average density over all periods. This is done by
post-stratifying on the time period field, and asking for a combined estimate of
density that is the mean of the post-stratum estimates, weighted by survey effort.
If observers, methods and conditions were the same at all time points it would be
reasonable to investigate the possibility of pooling the detection function over
the time periods.
A third example is similar to the previous one, in that the study area is surveyed
at multiple time points, but in this case the same set of samples (transects) were
used at each time point. You could set up the project in the same way as for the
previous case, but this would necessitate entering each transect into the sample
data layer once for each time period. Instead, you may consider adding the time
period field to the Observation data layer - this way each transect only has to
appear in the sample data layer once, while you indicate the time period that
each object was observed. To estimate mean density over the whole study, you
post-stratify on the time period column in the observation data layer and estimate
overall density as the mean in the post-strata. One small problem occurs in this
scenario when you come to estimate variance - each stratum in each year will be
treated as independent when in fact they are not. We hope to address this in a
future release of Distance.
Strata as replicates when there are different types of survey effort
In all of the above cases, we chose to weight by survey effort. When you weight
by survey effort, there is an option to treat the strata as replicates. This affects
how the variance of the global density estimate is calculated. Ticking the Strata
are replicates option means that you consider the strata you surveyed to be a
random sample from some larger population of possible strata that could have
been surveyed.
For example, consider the case when we survey at multiple time points – say
multiple days during a year. We may consider the days we surveyed to be a
sample from all those in the year, and we want to make inferences about the
average density of animals during the year. In this case we tick the Strata are
replicates option. Our estimate is then the effort-weighted mean density, and
our variance is calculated from the variation in density between days – i.e.,
treating strata as replicate samples.
On the other hand, let’s imagine that our multiple time points are actually years,
and we only surveyed in two years (we pooled the data within year). We could
consider the two years of data to be a sample from some larger set of possible
years – but this does not seem very useful and in any case a sample of two is not
very large for making inferences about average density over this larger set of
years. Instead, we will make inferences only about the average density over the
two years we sampled. We do not tick the Strata are replicates option. Our
estimate is still the effort-weighted mean density, but now the variance is
calculated from a weighted average of the stratum variances.
The difference between the two scenarios is known in the statistical literature as
treating the strata as random effects (the first scenario) or fixed effects (the
second scenario). The right one for your study depends on the inferences you
are making – is it to the average density over a larger set of strata from which
you have a random sample (random effect), or is it just to the average density
over the strata in which you sampled (fixed effect).
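The two variance calculations can be sketched in Python as follows, with hypothetical numbers and simplified to equal effort per stratum (the exact effort-weighted formulae are in the Program Reference):

```python
import numpy as np

# Hypothetical density estimate and analytic variance for each of
# k = 5 post-strata (e.g., survey days), assuming equal effort
d = np.array([1.1, 0.9, 1.4, 1.0, 1.2])
var = np.array([0.04, 0.03, 0.06, 0.03, 0.05])
k = len(d)

d_mean = d.mean()  # effort-weighted mean density (equal effort here)

# Strata are replicates (random effect): variance from the
# between-stratum variation in the density estimates
var_random = d.var(ddof=1) / k

# Strata fixed: variance from the within-stratum (analytic)
# variances of the stratum estimates
var_fixed = var.sum() / k**2
```

The point estimate is the same either way; only the variance calculation differs.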
The variance calculation is given in more detail in the Program Reference page
on the Model Definition Estimate Tab - CDS and MCDS.
2. Where there are different types of object (animal) in the
population.
If the population can be divided into different “sub-populations”, each with
different encounter rates or detection functions, then it may be possible to
increase precision through post-stratification. This is done by creating an extra
field in the Observation data layer that indexes the sub-population type. You
then post-stratify on this field, with the global density estimate as the sum of the
post-stratum estimates:
One example of this would be where male and female animals have very
different detectabilities. For each animal, its sex would be entered as an
additional field (of type "Other") in the Observation data layer. In the Model
definition, you would choose Post-stratification by the sex field.
3. Where there is not enough data to estimate a detection function
for some subsets of the study.
For example, in a multi-species study it is often not possible to estimate f(0)
reliably for the rarer species. In this case, it may be acceptable to estimate the
detection function by pooling over similar species. To do this, you would add a
column to the Observation data layer for species name or ID. Then, define a
Data Filter that uses the Data Selection to include only the species for which you
wish to pool the detection function. In the Model Definition, post-stratify by
species and choose the following Levels of estimation:
methods used is given here; this can be safely skipped on first reading through
this manual.
Variance options are chosen in the Model Definition, Variance tab – for more
on these options, see Variance Tab - CDS and MCDS in the Program Reference
appendix.
Analytic variance estimation
Density can be estimated globally, by stratum and/or by sample. At the lowest
of the levels requested (e.g., at the stratum level if both global and stratum
estimates are selected), variance estimates for encounter rate, detection
probability and expected cluster size are combined using the delta method
(formula 3.68 of Buckland et al. 2001) to give the variance of the density
estimate at that level. Lognormal confidence intervals are calculated using
formulae 3.71-3.74, except that the t-based limits use degrees of freedom from
the Satterthwaite method given in formula 3.75.
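For concreteness, here is a Python sketch of these calculations with made-up component CVs and degrees of freedom. It uses the normal quantile 1.96 for brevity where Distance would use a t quantile with the Satterthwaite degrees of freedom:

```python
import math

# Hypothetical density estimate and overall coefficient of variation
d_hat, cv = 2.5, 0.20

# Satterthwaite degrees of freedom (in the spirit of formula 3.75):
# combine component CVs (e.g., encounter rate, detection, cluster size)
cvs = [0.12, 0.10, 0.12]
dfs = [19, 40, 55]
df_satt = sum(c**2 for c in cvs) ** 2 / sum(c**4 / f for c, f in zip(cvs, dfs))

# Lognormal limits (formulae 3.71-3.74); note the interval is
# multiplicative, so it is asymmetric about the point estimate
c_factor = math.exp(1.96 * math.sqrt(math.log(1.0 + cv**2)))
lcl, ucl = d_hat / c_factor, d_hat * c_factor
```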
If there are any multipliers (see Multipliers in CDS Analysis) with non-zero
variance, these are included in formula 3.68 as extra terms. If they have non-zero
degrees of freedom they are also included in the degrees of freedom
calculation of equation 3.75. If degrees of freedom for the multiplier is not
specified, it is assumed zero, and the multiplier is omitted from both the top and
bottom line of equation 3.75.
By default, encounter rate variance is calculated using the empirical between-
sample variation in encounter rate, as detailed in section 3.6.2 (formulae 3.77-
3.82). Alternatively, the user may specify that encounter rate variance follows a
Poisson or overdispersed Poisson distribution (see Variance Tab - CDS and
MCDS). In this case, the encounter rate variance is assigned zero degrees of
freedom and the encounter rate term is omitted from both the top and bottom line
of equation 3.75.
When there is only one line, the default option is to set encounter
rate variance to zero – see Analysis of Data from a Single Transect in CDS.
In distance sampling, there are often situations where the standard methods
produce a density estimate that is only proportional to the true density. For
example, when detection probability on the trackline (g0) is less than one, the
true density is the Distance density estimate divided by g0. Another example
comes from cue counts, where the density of animals is the density of cues
divided by the cue rate.
In Distance, such factors are dealt with using Multipliers. You enter the
multiplier value in the global data layer and in the Multipliers tab page of the
Model Definition Properties, you tell Distance which multipliers you want to use
in your analyses and whether they divide or multiply the density estimate.
Distance then scales your density estimate appropriately.
Some multipliers are known with certainty. One such multiplier is the “sampling
fraction” – the proportion of each line or point surveyed. Normally this is 1, but
in some cases you may only survey one side of a transect line, so the sampling
fraction is 0.5. Another example is a cue count survey, where the sampling
fraction is the proportion of a full circle that is covered by the observation sector.
A third example is when all points or lines are visited multiple times – then the
sampling fraction is the number of visits. In these cases, you tell Distance to
create a field for the multiplier, enter the appropriate value, and tell Distance to
multiply the Density estimate by the value in this field.
If the sampling fraction is not the same for all lines or points, you
account for this by adjusting the survey effort at the data entry stage. For
example, if all your transects were 10km long, but you visited some 3 times, then
you set the survey effort for these to 30km, and leave the others at 10km. In this
situation you don’t need a sampling fraction multiplier.
Other multipliers are based on estimates from other experiments. For example,
in a cetacean survey, g(0) may be less than one because some animals are below
the surface and so not available for detection. You may have estimated g(0)
based on a separate experiment where you follow a sample of animals and record
the proportion of time they are on the surface. In these cases, your estimate of
the multiplier will have uncertainty associated with it. If you want this
uncertainty to be reflected in the variance of the final density estimate, you do
this in Distance by having additional fields for the multiplier standard error (SE)
and degrees of freedom (df). Note that you can have fields for both SE and df,
or just the SE. In this latter case, Distance assumes the DF for the multiplier is
infinity. (Another way to specify infinite DF is to have a field for DF containing
the value 0.0.) Degrees of freedom of the multiplier affect the DF of the density
estimate as the density estimate DF is calculated using the Satterthwaite
approximation (formula 3.75 of Buckland et al. 2001). This in turn affects the
log-normal confidence limits on the density estimate.
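A minimal sketch of how a multiplier's uncertainty propagates (Python, hypothetical numbers): by the delta method, squared CVs add, so an estimated g(0) with its own SE inflates the CV of the final density estimate:

```python
import math

# Hypothetical CVs: density estimate before the multiplier, and an
# estimated g(0) multiplier with its own standard error
cv_density = 0.18
g0, se_g0 = 0.7, 0.07
cv_g0 = se_g0 / g0   # CV of the multiplier, here 0.1

# Dividing by g(0) rescales the point estimate; by the delta method
# the squared CVs add, so the multiplier inflates the overall CV
cv_total = math.sqrt(cv_density**2 + cv_g0**2)
```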
In the Setup Project Wizard you are given the chance to create fields in the
Global data layer for a number of common multipliers. You can also add more
fields manually after the project is created using the Append Field button in the
Data Explorer (see Data Explorer in the Program Reference).
If you use the Setup Project Wizard to define your multiplier fields,
then they will appear automatically in the Multiplier tab in Model Definition
Properties for the default CDS analysis. For these fields, Distance also knows
whether they should multiply or divide the density estimate. The default value
for a multiplier created by the wizard is 1.0, with SE 0 and DF 0 (i.e., infinity) –
in other words a multiplier that doesn't affect the density estimate at all! It's up
to you to enter appropriate values into the multiplier fields.
probability of detection as 1.0. The multiplier you created will
allow you to specify the probability of detection yourself. The
variance of the density estimate is then calculated correctly and
automatically.
Another example of the use of multipliers is given in the Getting Started chapter,
Example 2: More Complex Data Import.
A third example would be when some distances are missing from the dataset,
and you are confident that these are missing at random (i.e., it isn’t just the
farthest away distances that are missing). Then you could fit a detection
function to the data for which you have distances, enter the estimated detection
probability (and associated SE and df) as a multiplier and estimate density using
the whole dataset. For more on this, and a perhaps better approach, see Missing
Data in CDS Analysis.
When using AIC to select among alternative candidate models of the detection
function, it is not unusual to find that more than one model has a similar AIC
score (perhaps differing by 2 or less). When this happens, more
reliable inferences can be obtained by basing the final results on an AIC-
weighted average of these plausible alternative models (Buckland et al. 1997;
Burnham and Anderson 2002). To do this:
• Create a new Model Definition.
• In the Detection Function, Models tab, click on the + button to
create one line for each candidate model. For each one, set the
appropriate key function and adjustment terms. Select the option to
Select among multiple models using AIC.
• In the Variance tab, tick on Select non-parametric
bootstrap, and set appropriate options for the level of
bootstrapping and number of bootstraps.
When you run an analysis using this model definition, each bootstrap replicate
will use AIC to choose among the candidate models. The bootstrap point and
interval estimates you get will then be an average over all the replicates, and so
will include uncertainty as to which model is best.
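Within each bootstrap replicate, Distance simply picks the lowest-AIC candidate. A related deterministic summary is AIC weighting (Buckland et al. 1997; Burnham and Anderson 2002), sketched here in Python with made-up AIC values:

```python
import math

# Hypothetical AIC values for three candidate detection function models
aic = {"half-normal": 410.2, "hazard-rate": 411.1, "uniform+cosine": 413.0}

best = min(aic.values())
# Akaike weights: w_i proportional to exp(-delta_i / 2), where
# delta_i is each model's AIC difference from the best model
raw = {m: math.exp(-(a - best) / 2.0) for m, a in aic.items()}
total = sum(raw.values())
weights = {m: r / total for m, r in raw.items()}
```

Models within about 2 AIC units of the best receive substantial weight, which is why averaging over them is preferable to picking a single winner.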
You can only use this approach to include different candidate key
function + adjustment term combinations. There is currently no way within
Distance to do model averaging over, say, global and by-stratum analyses, or
CDS and MCDS models, or different covariates within an MCDS model. This
would have to be done by writing an external bootstrap routine and running the
analysis engine as a stand-alone program (see Running CDS Analyses From
Outside Distance).
For more on the bootstrap options, see Variance Tab - CDS and MCDS in the
Program Reference.
Sample Definition in CDS Analysis
Sample definition allows you to specify the data layer to be used in the
estimation of the encounter rate variance.
For example, imagine you have line transect data from a survey where 3 lines are
divided into 4 short segments. The spacing between segments is approximately
the same as the spacing between transects, so a map of the design is as follows:
- - - -
- - - -
- - - -
The data are stored in Distance in a Global layer called “Study area”, a Sample
layer, “Transect”, a Sub-sample1 layer, “Segment”, and an Observation layer,
“Sightings”:
When you come to analyze these data, you must choose whether to treat the 3
transects or the 12 segments as the samples, for estimating variance in encounter
rate. You make this selection in the Sample definition section of the
Estimate tab of the Model Definition Properties:
In this case, because the distance between segments is the same as the distance
between transects, it is valid to treat each segment as a separate sample. So you
would choose the layer type “SubSample1” as the sample definition.
If the between-segment spacing was much less than the between-transect
spacing, then you would choose the layer type “Sample” as the sample
definition, and Distance would pool the data from the segments on each transect
for estimating encounter rate variance.
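To see what is being computed at whichever level the Sample definition selects, here is a Python sketch of one standard length-weighted empirical estimator of encounter rate variance, with hypothetical counts and efforts (the exact formula Distance uses is documented in the Program Reference):

```python
import numpy as np

# Hypothetical counts and efforts for the units chosen as "samples"
# (segments or transects, depending on the Sample definition)
n = np.array([3.0, 1.0, 2.0])   # sightings per sample
l = np.array([5.0, 5.0, 5.0])   # effort (km) per sample
k = len(n)

L = l.sum()
er = n.sum() / L                 # overall encounter rate n/L
rates = n / l                    # per-sample encounter rates

# One standard length-weighted empirical estimator of var(n/L):
# between-sample variation in encounter rate, weighted by effort
var_er = k / (L**2 * (k - 1)) * np.sum(l**2 * (rates - er) ** 2)
```

Pooling segments into transects (the “Sample” choice) changes n, l and k, and hence the estimate of this variance.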
make inferences about the density or abundance of animals in the study area.
However, in some (rare) cases, we may just want to make inferences about the
density or abundance of animals in the region covered by the samplers. One
example is where we actually survey the whole study area – for instance, in a
simulation experiment, where we lay out objects such as golf tees or wooden
stakes in a strip and then have observers do a line transect experiment within that
strip. Another example is when we do not lay out the transect lines using a
survey design with an element of randomization in it, but lay the lines
purposively. In this case, we cannot guarantee that our survey lines will be
representative of the study area and so one approach is to restrict inferences only
to density in the region actually covered by the lines. (Another is to use a
model-based approach such as that of Hedley et al. 2004)
Restricting inferences in this way does not affect the density estimate at all.
Abundance is simply density multiplied by the area covered (the sum of the
areas of the samplers). The big difference comes in the variance. Normally in
distance sampling, there are three components that make up the variance of the
density or abundance estimate: variance from estimating the detection
probability, variance from estimating population mean cluster size and variance
from spatial variability in encounter rate between samplers. The third
component is often the largest, and typically it is estimated from the empirical
variation in encounter rate between samplers. When we only wish to make
inferences about density or abundance in the covered region, the only relevant
sources of uncertainty are the first two – the third is no longer relevant.
If we wish to restrict inferences in this way, how can we ensure that Distance
sets the encounter rate variance to zero? The trick is as follows. For the CDS
(and MCDS) engine, in the Model Definition, Variance tab, under Analytic
variance estimate, choose the option Assume distribution is Poisson,
with overdispersion factor and set the factor to 0. Once you’ve run the
analysis, you can check in the results that the encounter rate variance is zero. If
you’re only interested in density, this is all you have to do. If you also want the
correct abundance estimate, you need to set study area (either in the global layer
or stratum layer, if you have strata) to be the sum of the area of the samplers
(either globally or by stratum) – i.e., the total line length times twice the
truncation width for line transects, or the number of points times π times the
truncation radius squared for point transects.
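The covered-area calculation can be sketched as follows (Python, hypothetical effort figures):

```python
import math

w = 0.5  # truncation distance, km (hypothetical)

# Line transects: covered area = total line length x 2w
total_length = 60.0                   # km of transect line
area_lines = total_length * 2 * w     # km^2

# Point transects: covered area = number of points x pi * w^2
n_points = 40
area_points = n_points * math.pi * w**2
```

Entering the resulting area as the study (or stratum) area makes the abundance estimate refer to the covered region only.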
Running CDS Analyses From Outside Distance
The CDS engine is implemented as a stand-alone FORTRAN program,
MCDS.exe. This program is called behind the scenes by Distance when you
press the Run button on the Analysis Details Inputs tab.
Some users may wish to run the engine from outside the Distance interface –
either from the Windows command line or from another program. For example,
you may want to automate the running of analyses for simulations, or you may
want to perform a more complicated bootstrap than Distance allows.
Full documentation for running MCDS.exe is provided in the Appendix - MCDS
Engine Reference.
Chapter 9 - Multiple Covariates
Distance Sampling Analysis
Example estimated detection functions, where cluster size (Sbar) is the covariate. The
basic shape of the function is the same (half-normal), but the effective strip width is wider
at cluster size 750.
Of course, it is possible for covariates to affect both the shape and
scale of the detection function. Such models could be fit in Distance (for factor
covariates) using stratification. There is, however, some evidence that at least
some covariates may only affect the scale: Otto and Pollock (1990) examined the
effect of cluster size and distance on the detection function of graduate students
searching for beer cans. They found that a model where cluster size influenced
only the scale of the detection function fit the data best.
The covariates are assumed to affect the scale parameter of the key function, σ .
The scale parameter controls the “width” of the detection function. Of the four
key functions available in the CDS engine, the half-normal and hazard-rate are
both available in the MCDS engine; the other two either do not have a scale
parameter (uniform), or provide an implausible shape close to zero distance
(negative exponential).
Half-normal key function: exp{−x² / (2σ(z)²)}
Hazard-rate key function: 1 − exp{−[x/σ(z)]^(−b)}
The scale parameter is modeled as an exponential function of the covariates:
σ(z) = exp(β0 + β1·z1 + β2·z2 + … + βq·zq)
where q is the number of covariate parameters. The term inside the brackets is
akin to a linear model – the β’s are parameters to be estimated, with β0
corresponding to the intercept. The exponential term prevents the scale
parameter from being negative.
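A minimal Python sketch of this model, assuming hypothetical fitted parameters and a single covariate (Beaufort sea state), shows how the exponential link keeps σ positive and how the covariate changes the half-normal detection function:

```python
import math

# Hypothetical fitted parameters: intercept plus one covariate slope
beta0, beta1 = 2.0, -0.3

def sigma(beaufort):
    # exp() keeps the scale parameter positive for any covariate value
    return math.exp(beta0 + beta1 * beaufort)

def g(x, beaufort):
    # Half-normal detection probability at distance x, given the covariate
    s = sigma(beaufort)
    return math.exp(-x**2 / (2 * s**2))
```

Higher Beaufort shrinks σ here, so detection probability falls off faster with distance in rough seas.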
More complex models are constructed along similar lines. For example, a model
containing habitat as a factor covariate and Beaufort as non-factor covariate will
be:
σ(z) = exp(β0 + β1·z1 + β2·z2 + β3·z3)
where the β0 parameter corresponds to the effect of wood and the intercept of
Beaufort, β1 corresponds to the additional effect of grassland, β2 corresponds to
the additional effect of scrub, and β3 is the slope of the Beaufort covariate.
Interactions between covariates can be modeled by creating a new field in the
Distance data sheet that contains one covariate multiplied by the other. We hope
to extend the MCDS engine to allow easier modeling of interactions in a future
version.
dealing with this (Buckland et al. 2001), including predicting expected cluster
size at zero distance using a regression of cluster size against distance (or against
probability of detection). The expected cluster size is then used when converting
the estimated density of clusters into density of individuals. In Distance, these
options are accessible in the Cluster Size tab of the Model Definition
Properties window.
In MCDS, there is another alternative: cluster size can be included in the
detection function model as a covariate. In this case, the size bias will be
allowed for in the detection function model. Density of individuals can be
obtained directly from the Horvitz-Thompson-like estimator used in the MCDS
engine (Marques and Buckland 2004), so the options in the Cluster Size tab
become obsolete (indeed, this tab is greyed out when you select cluster size as a
covariate).
When cluster size is a covariate, several options also change in the Model
Definition Properties window Estimate tab. The changes are outlined
below; see also the Estimate Tab - CDS and MCDS page of the Program
Reference.
Stratification and post-stratification
A restriction when you select cluster size as a covariate is that stratification and
post-stratification are no longer possible (see Stratification and Post-stratification
in MCDS for more on this).
Variance estimation
A second restriction when you select cluster size as a covariate is that the
variance of the estimate of density of individuals is not estimated analytically,
but is instead obtained using the bootstrap. We also use the bootstrap to obtain a
variance for the estimated expected cluster size. Analytic formulae for these
variances are given by Marques and Buckland (2004) but are not currently
implemented in the software.
Simple polynomial: sum from j=2 to m of a_j · y_s^(2j)
Hermite polynomial: sum from j=2 to m of a_j · H_2j(y_s)
(1 Note that when a uniform key function is used in CDS, the summation is from j=1 to m.)
when search effort was conducted in such a way that the probability of detection
is consistently higher at a given distance than the model without adjustments
would predict. Nevertheless, to enable models with a truly constant shape to be
fit in the presence of adjustment terms, Distance also offers the ability to scale
the distance by σ, the scale parameter of the half-normal or hazard rate key
functions (see Defining MCDS Models on page 3 of this chapter for more on the
scale parameter). Since σ is a function of the covariates, this means that the
scaling will be different at each covariate level, and the shape of the function
will be preserved.
This option is set in the Model Definition, under the Detection function |
Adjustment terms tab. For more information, see the Program Reference
pages on the Model Definition Properties Dialog.
Specifying the Model
Fitting the detection function with multiple covariates is computationally much
harder than when distance is the only covariate. This has
several consequences:
• the analysis engine takes longer to run
• the algorithm will fail to converge more often
Because of this, it is important to be careful and thoughtful when setting up and
running MCDS analyses. Here are some recommendations:
• Rather than including numerous covariates at once, start by
including one covariate at a time, and select the covariate that
gives the best model fit (lowest AIC, AICc or BIC, depending on
the criterion you are using). You can then carry out forward
stepwise selection by adding one additional covariate at a time,
while retaining the one(s) already selected, until there is no
further decrease in the criterion value.
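The forward stepwise procedure in the bullet above can be sketched generically. Here `fit_aic` is a hypothetical stand-in for fitting an MCDS model with a given covariate set and returning its AIC; the AIC values are made up:

```python
# Illustrative sketch of forward stepwise covariate selection by AIC.
# fit_aic is any function mapping a list of covariates to the fitted
# model's AIC; here it is faked with a lookup table of invented AICs.

def forward_stepwise(candidates, fit_aic):
    selected = []
    best_aic = fit_aic(selected)          # model with no covariates
    while True:
        trials = [(fit_aic(selected + [c]), c)
                  for c in candidates if c not in selected]
        if not trials:
            break
        aic, cov = min(trials)            # best single addition this round
        if aic >= best_aic:               # no further decrease in AIC: stop
            break
        best_aic, selected = aic, selected + [cov]
    return selected, best_aic

# Made-up AIC values for demonstration only.
aics = {frozenset(): 120.0, frozenset({'sex'}): 115.0,
        frozenset({'habitat'}): 118.0, frozenset({'sex', 'habitat'}): 116.0}
fit_aic = lambda covs: aics[frozenset(covs)]
print(forward_stepwise(['sex', 'habitat'], fit_aic))
```

In this toy example, `sex` is added first (AIC drops from 120 to 115), and adding `habitat` would raise the AIC, so selection stops.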
• Bear in mind that factor covariates are usually harder to fit than
non-factor covariates, especially as the number of factor levels
increases (see Factor and Non-factor Covariates in MCDS earlier in
this Chapter for information on factor covariates). If you encounter
problems while trying to fit a factor covariate, try reducing
(condensing) its number of levels, if possible.
• Avoid using the feature that allows automatic selection of
adjustment terms – at least to start with. Instead, start by using a
model with no adjustments, and if this converges try one with one
adjustment. Gradually work up to more adjustments if required.
• Convergence is often very sensitive to the starting values used.
You can set starting values manually using Model Definition
Properties, Detection Function, Adjustment Terms,
Manually select starting values.
• K-S GOF Test. Similar to CDS output. For more about these
CDS Goodness of Fit Tests, see Chapter 8.
• Plot: Detection probability. These plots can be used in the
same way as for the CDS engine – for comparing the estimated
function with the histograms of counts (or scaled counts for point
transects). However, because covariates affect the detection
function, there is no one single detection function to display.
Instead, the plotted function is the average detection function,
conditional on the observed covariates. The histograms show
observed frequencies at given distances, pooled over all covariates.
For more information, see Chi-square GOF tests and related plots
in this Chapter.
• Plot: Pdf. (Point transects only) See above.
• Chi-sq GOF test. See above.
• FCx. When there are factor covariates, you get a set of diagnostic
plots for each factor combination, using only the data for that
combination. For example, if there are two factor covariates, one
with 3 levels and the other with 4, there will be 3x4=12 possible
factor combinations (although some may not occur in the dataset,
so will not be shown).
• Plot: Det Prb – the detection probability plot for the given
factor combination. If there are any non-factor covariates,
then the plotted function will be the average detection
probability, conditional on the observed non-factor covariates,
and the histograms will show observed frequencies pooled
over the non-factor covariates.
• Plot: Pdf. (Point transects only) See above.
• Plot: Examp Det Funcs. If there are any non-factor
covariates, then the above plots show the detection function (or
pdf) averaged over the observed covariate levels. Depending
on the observed covariates, the shape of these functions can be
quite different from the detection function given fixed
covariate values. So, for each non-factor covariate, the MCDS
engine outputs a plot showing 3 example detection functions.
These are evaluated at the 25th, 50th and 75th percentiles of the
covariate. If there is more than one non-factor covariate, the
values of the other covariates are fixed at the 50th percentile.
For information about how to export the results text or plots into
another program, see Exporting MCDS Results.
Chi-square GOF Tests and Related Plots
In the CDS engine, the detection function depends on distance alone, and this
function is displayed in the detection probability plots. By contrast, in the
MCDS engine, probability of detection depends on other covariates, so there are
many possible detection functions, depending on the covariate levels. Hence
detection probability plots show the average detection function, conditional on
the observed covariates. Similarly, for point transects, the probability density
function (pdf) shown is the average pdf conditional on the observed covariates.
For example, the average conditional pdf at distance j is calculated as:
f(j | z) = (1/n) Σ_{i=1}^{n} [ f(j, z_i) / ∫_{0}^{w} f(x, z_i) dx ]

where f(x, z) is the joint density function, n is the number of observations, and
w is the truncation distance. (This assumes no left truncation; otherwise the
lower limit x = 0 in the integral is replaced by x = the left truncation distance.)
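As a concrete sketch of this averaging (using a hypothetical half-normal model whose scale depends on a single made-up covariate; this is not the engine's code, and all parameter values are invented):

```python
import math

# Sketch: average pdf of observed distances, conditional on the observed
# covariates, for a half-normal detection function whose scale depends on
# one made-up covariate z via sigma = exp(b0 + b1 * z).
b0, b1, w = 0.0, 0.5, 3.0
z_obs = [0.2, 1.0, 1.5]                     # hypothetical covariate values

def g(x, z):                                # detection function given z
    sigma = math.exp(b0 + b1 * z)
    return math.exp(-x * x / (2.0 * sigma * sigma))

def integral(fn, lo, hi, steps=2000):       # simple trapezoid rule
    h = (hi - lo) / steps
    total = (fn(lo) + fn(hi)) / 2.0 + sum(fn(lo + i * h) for i in range(1, steps))
    return total * h

# Normalise each conditional function over [0, w], then average over the n
# observed covariate values.
norms = {z: integral(lambda t: g(t, z), 0.0, w) for z in z_obs}

def f_avg(x):                               # average conditional pdf at x
    return sum(g(x, z) / norms[z] for z in z_obs) / len(z_obs)

print(integral(f_avg, 0.0, w))              # the averaged pdf integrates to 1
```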
Expected frequencies for the chi-square GOF test are calculated similarly. For
the jth bin, with cutpoints c_{j1} to c_{j2}, the expected frequency is:

Ê(n_j) = Σ_{i=1}^{n} [ ∫_{c_{j1}}^{c_{j2}} f(x, z_i) dx / ∫_{0}^{w} f(x, z_i) dx ]
As with the CDS engine, the observed frequencies correspond to the total
number of observations which fall within each bin.
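The expected-frequency calculation described above can be sketched numerically with the same kind of hypothetical half-normal-with-covariate model (all names and values below are illustrative, not the engine's code):

```python
import math

# Sketch: expected bin frequencies for the MCDS chi-square GOF test. Each
# observation i contributes the probability that its distance falls in bin j,
# given its own covariate value z_i; summing over i gives E(n_j).
b0, b1, w = 0.0, 0.5, 3.0
z_obs = [0.2, 1.0, 1.5, 0.7]                # made-up covariate values
cuts = [0.0, 1.0, 2.0, 3.0]                 # bin cutpoints

def integral(z, lo, hi, steps=1000):        # trapezoid rule for g(x, z)
    h = (hi - lo) / steps
    g = lambda x: math.exp(-x * x / (2.0 * math.exp(b0 + b1 * z) ** 2))
    return h * ((g(lo) + g(hi)) / 2.0 + sum(g(lo + i * h) for i in range(1, steps)))

expected = [sum(integral(z, cuts[j], cuts[j + 1]) / integral(z, 0.0, w)
                for z in z_obs)
            for j in range(len(cuts) - 1)]
print(expected)       # the E(n_j); since the bins partition [0, w],
                      # they sum (approximately) to n = 4
```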
If you have missing covariate data for some observations, then this can be treated
in a similar way to missing distances (see above).
If you run an analysis with data that includes missing covariates, the
MCDS engine will issue a warning and exclude each observation with a missing
covariate value.
Double observer data comes from surveys where two (semi-) independent
observer teams perform a distance sampling survey and duplicate detections are
identified. The method for setting up double observer data in Distance is
outlined in the topic Setting up a Project for MRDS Analysis in Chapter 10 of
the Users Guide.
There are two ways to achieve an MCDS (or CDS) analysis of double observer
data in Distance:
• Analyze the data using the MRDS engine, with Detection
Function | Method in the model definition set equal to ds –
single observer. Using this approach, duplicate observations are
automatically removed. For more information, see Single Observer
Configuration in the MRDS Engine in Chapter 10 of the Users
Guide.
• Analyze the data using the MCDS engine. Using this approach, it
is necessary to use the data filter to specify whether observer 1
data, observer 2 data, or both should be used.
Here, we focus on the second option.
To analyze double-observer data using the MCDS engine, you set up an analysis
where the analysis engine option in the associated model definition is MCDS.
However, it is also necessary to set up a data filter specifically to achieve the
desired analysis. This is because double-observer data is entered into Distance
with two records for each detected object in the observation layer, so an
analysis that ignores this will include each object twice and therefore contain
more data than it should. Double observer data has two fields that can be used in
conjunction with the data selection options in the Data Filter to achieve the
desired analysis. The two fields are observer (which is 1 for the first observer
and 2 for the second) and detected (which is 1 if the animal was detected by that
observer and 0 if not; note that the actual field names may be different from
this). For more details on these, see Setting up a Project for MRDS Analysis in
Chapter 10 of the Users Guide.
Three types of analysis can be envisaged:
1. An analysis of only objects detected by observer 1. To achieve
this, set up a Data Filter with data selection at the Observation
layer, and the data selection criterion
(observer=1) and (detected=1)
Note that “observer” and “detected” are the default field names for
these fields, but the actual field names in your project may be
different if you set up the double-observer project manually (rather
than via the Setup Project Wizard).
2. An analysis of only objects detected by observer 2. To achieve
this, you want the data selection criterion
(observer=2) and (detected=1)
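In spirit, these data selection criteria do the following (a plain-Python sketch on made-up records, not Distance's internals):

```python
# Sketch: selecting double-observer records the way the Data Filter criteria
# (observer=k) and (detected=1) would. Records are (object, observer,
# detected, distance) tuples with invented values.
records = [
    (1, 1, 1, 2.3), (1, 2, 0, 2.3),   # object 1: seen by observer 1 only
    (2, 1, 1, 0.8), (2, 2, 1, 0.8),   # object 2: seen by both observers
    (3, 1, 0, 1.5), (3, 2, 1, 1.5),   # object 3: seen by observer 2 only
]

def select(records, observer):
    """Objects detected by the given observer (observer=k and detected=1)."""
    return [r for r in records if r[1] == observer and r[2] == 1]

print([r[0] for r in select(records, 1)])   # objects detected by observer 1
print([r[0] for r in select(records, 2)])   # objects detected by observer 2
```

Note how each object contributes at most one record to either selection, avoiding the double counting described above.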
Chapter 10 - Mark Recapture
Distance Sampling
follow, and even more complex non-continuous intervals are
planned). Note that you can’t do left or right truncation when the
data are in intervals.
• Up to one level of geographic stratification (see Stratification and
Post-stratification in MRDS; extension to multiple levels of
stratification and other types of stratification may follow)
• Three analytic methods of variance estimation (see Variance
Estimation in MRDS; bootstrap may follow)
• Estimation of density/abundance using a detection function fitted in
a previous analysis – allows different subsets of the data to be used
for encounter rate and detection function fitting (see Using a
Previously Fitted Detection Function to Estimate Density in
MRDS).
The MRDS engine is implemented as a library in the free statistical software R.
When you run an MRDS analysis from Distance, Distance creates a sequence of
R commands, calls the R software, waits for the results and then reads them back
in. Therefore, before you can use the MRDS engine, you must first ensure that
you have R correctly installed and configured. For more on this, see R
Statistical Software in Chapter 7 of the Users Guide.
To analyze double-observer data in Distance, you then need to set up the project
appropriately and include data in the correct format – see Setting up a Project for
MRDS Analysis. You must next create one or more model definitions that
specify use of the MRDS analysis engine, and associate these model definitions
with analyses, which you can then run. For more about the basics of setting up
analyses, see Chapter 7 - Analysis in Distance. More details of the various
models available in the MRDS engine are given in Defining MRDS Models, and
a detailed description of the options available in the Model Definition Properties
pages for this engine is given in the Program Reference pages Model Definition
Properties - MRDS.
In this chapter we also provide some analysis guidelines, give a list of the output
the engine can produce and cover various miscellaneous topics such as how to
deal with interval data, stratification, etc.
If you are familiar with the R software, you can run the MRDS
engine directly from within R, bypassing the Distance interface altogether. For
more information, see Running the MRDS Analysis Engine from Outside
Distance.
Setting up a Project for MRDS Analysis
The easiest way to set up a new project for an MRDS analysis is using the Setup
Project Wizard.
• In Step 1, under I want to:, select Analyze a survey that has
been completed.
• In Step 3, under Observer configuration, select Double
observer.
• Follow through the rest of the wizard as usual.
Distance then creates the appropriate data fields for double observer data, and
you can then import your data using the Import Data Wizard. For more about
the data requirements, see Setting up your Data for MRDS Analysis.
Alternatively, you can create the appropriate fields by hand, and manually create
a new survey object with the appropriate observer configuration and data files.
For more about survey objects, see Working with Surveys During Analysis in
Chapter 7.
Part of the Observation layer from the Golftees double observer example project
If you open the Golftees sample project you will notice that:
• The object number goes up sequentially in this example (1, 2, 3,
…) – in general the object number should be unique but it doesn’t
need to be sequential.
• In this example the two records for each object come one after the
other (e.g., lines 1 and 2 are Observer 1 and Observer 2 for object
1) – in general they don't have to be, so long as there are two lines for
each object: one with Observer 1 and one with Observer 2. For
example, you might like to structure your data with all records for
observer 1 first and then all records for observer 2.
• The detected field indicates whether an observer saw the object or
not. For example, object 1 was seen by observer 1 but not by
observer 2.
• In this example, the distance field (Perp distance) contains the same
distance for both observers, regardless of whether the observer saw
the object or not. In general you should always put the same
distance for both observers – a version that can deal with
measurement error and so allow different distances is planned.
• There are some additional covariates in the observation layer in this
example: sex, exposure and Cluster size. In general, covariates can
be placed in any of the data layers – although the rules for referring
to the covariates differ between the layers – for more on this, see
Defining MRDS Models.
• In this example, there is only one transect, and so there are no
transects on which no objects were seen. In general there may well
be transects with no objects. On these transects you should not
enter any observations (just as with the CDS engine – see the
Example Data Sheet picture on the Data Fields page in Chapter 5
for an example, and see Introduction to Data Import for how to
import such data).
¹ For more on this, see the topic Single Observer Configuration in the MRDS Engine later
in this chapter.
² These methods are not implemented in the current version of Distance.
The fitting method is chosen on the Detection Function | Method page of the
Model Definition Properties.
DS and MR Models
The forms of the DS and MR models are different. The implementation here
corresponds with models 1 and 3 of Table 6.1 from Laake and Borchers (2004) –
see also that reference for more information.
DS Model
The DS model is of the same form as the models used in the MCDS engine,
except that currently only the key function part of the model is implemented,
with no adjustment terms. The two key functions implemented are half-normal
and hazard rate:
Half-normal key function:   exp{ −x² / (2σ(z)²) }

Hazard-rate key function:   1 − exp{ −[x/σ(z)]^(−b) }

The scale parameter, σ(z), is modeled as an exponential function of the
covariates:

σ(z) = exp( β₀ + β₁z₁ + β₂z₂ + ⋯ + β_q z_q )
The key function and covariates to use are specified on the Detection
Function | DS Model page of the Model Definition Properties. One can also
choose no additional covariates, which corresponds with a CDS model. Note
that distance (x in the above formulae) is automatically a part of this model.
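A sketch of these two key functions with the exponential scale model (the coefficient values, covariate values, and function names below are made up for illustration; this is not the engine's code):

```python
import math

# Sketch of the DS model key functions with covariate-dependent scale
# sigma(z) = exp(b0 + b1*z1 + ... + bq*zq). All coefficients are invented.

def sigma(betas, z):                        # exponential scale model
    return math.exp(betas[0] + sum(b * zi for b, zi in zip(betas[1:], z)))

def half_normal(x, betas, z):
    s = sigma(betas, z)
    return math.exp(-x * x / (2.0 * s * s))

def hazard_rate(x, betas, z, b):            # b is the shape parameter
    s = sigma(betas, z)
    return 1.0 - math.exp(-((x / s) ** (-b)))

betas = [0.1, 0.4]                          # intercept plus one covariate
print(half_normal(0.0, betas, [1.0]))       # the half-normal key is 1 at x = 0
print(hazard_rate(1.0, betas, [1.0], 2.5))  # between 0 and 1 at x > 0
```

Because σ(z) multiplies the distance scale, changing a covariate stretches or shrinks the detection function without changing its shape.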
MR Model
The MR model is currently implemented as a logistic model – i.e. it is of the
form:
p_{j|3−j}(y, z) = exp( β₀ + β₁z₁ + β₂z₂ + ⋯ + β_q z_q ) / [ 1 + exp( β₀ + β₁z₁ + β₂z₂ + ⋯ + β_q z_q ) ]
The covariates to use are specified on the Detection Function | MR Model
page of the Model Definition Properties. Note that these covariates need not be
the same as those chosen for the DS model, if there is a DS model with the
chosen fitting method. Note also that unlike for DS models, distance is not
automatically a part of the model – if you want to include distance as a covariate
in an MR model you must explicitly include it in the model formula.
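The logistic form above can be sketched as follows (the coefficients and covariate values are invented; note that, as stated in the text, distance must be supplied explicitly as one of the covariates if you want it in the model):

```python
import math

# Sketch of the MR (logistic) model: the conditional probability that
# observer j detects an object given that observer 3-j detected it.
# Coefficients and covariate values are made up for illustration.

def p_conditional(betas, z):
    eta = betas[0] + sum(b * zi for b, zi in zip(betas[1:], z))
    return math.exp(eta) / (1.0 + math.exp(eta))   # inverse logit

# e.g. covariates (distance, sex); distance is NOT automatic in MR models.
betas = [1.0, -0.8, 0.3]
print(p_conditional(betas, [2.0, 1.0]))            # a probability in (0, 1)
```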
of terms joined by operators such as “+”. The terms represent covariates and the
operators tell Distance how the covariates relate to one another.
For example, the MR formula
distance + sex + exposure
means include the data from the fields distance, sex and exposure as covariates.
To understand how to specify formulae, we need to understand (1) how to
translate a field in the Distance database into a covariate to specify and (2) what
are the possible operators and how do they work. We also need to understand
the difference between factor and non-factor covariates and how to specify
which is which. These are covered in the following sections.
For example, if you then wanted to specify a formula with the label field from
the region layer (i.e., [Region].[Label]) and observer from the Observation layer
(i.e., [Observation].[observer]) as covariates, you would write the formula as:
stratum.label + observer
it renames it by adding the layer type and a dot in front of the field
name. For example, by default, there is a field called “Label” in the
sample, stratum and global layers. The sample layer is the lowest, so
the formula name for this is label (note it is lower case as all formula
names are changed to lower case – see below). The stratum and global
layers are higher, so the formula names for the Label fields in those
layers are stratum.label and global.label.
• To comply with R object naming rules, and make life simpler:
• spaces and anything else that isn’t a letter or number are
replaced by dots – e.g., “type of habitat” becomes
type.of.habitat.
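The renaming rules above (a layer prefix for fields in higher layers, lower-casing, and dots in place of non-alphanumeric characters) can be sketched like this; the function name is hypothetical and this is not Distance's implementation:

```python
import re

# Sketch of how a Distance field name might be translated into an R-legal
# formula name: replace each character that isn't a letter or digit with a
# dot and lower-case the result; higher layers get a "layer." prefix.

def formula_name(field, layer_prefix=""):
    name = re.sub(r"[^A-Za-z0-9]", ".", field).lower()
    return f"{layer_prefix}.{name}" if layer_prefix else name

print(formula_name("type of habitat"))      # -> type.of.habitat
print(formula_name("Label", "stratum"))     # -> stratum.label
```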
More About DS and MR Model Formulae
• An intercept term is included in formulae by default. To remove it
you can use “- 1” - for example sex - 1, while to specify an
intercept-only formula, you use 1 alone.
You can specify covariates as factors even if they are not included in a
model in the current model definition. It saves time to list all the factor
covariates in your first model definition, as this list will then be copied to all
subsequent model definitions that you define, saving you the bother of having to
type the factor list for each model definition. In the above example, we could
have specified observer, sex, exposure as the factor list – observer is not in
the current model, but if we subsequently define a model definition based on this
one and include observer as a covariate we won’t have to remember to include it
in the list of factors as it will already be there.
These methods are relatively new, so we are only beginning to gain experience
on effective analysis strategies. Some preliminary guidelines follow – please let
us know if you can suggest further guidelines or amendments.
• Start with some CDS and MCDS analyses (i.e., using the CDS/MCDS
engines) to get a ballpark idea of the detection function shapes, etc. – see
Analysis of Double Observer Data with the MCDS Engine in Chapter 9
for some tips on doing this. You can also perform CDS and MCDS
analyses using the MRDS engine – see Single Observer Configuration
in the MRDS Engine later in this chapter.
• Your first MRDS analysis could be a Petersen or other simple model –
this helps you work out which covariate names to use (see Translating Distance
fields into DS and MR covariates), and should also fit without
problems.
• Build up covariates slowly. You may need to specify starting values
(although this option isn't currently available in the interface), look at
the iteration history (Detection Function | Control page of the Model
Definition), etc., to work around any convergence problems.
• If you experience problems, check Problems with the MRDS Engine in
the Troubleshooting chapter, and also check the Program Distance Web
Site for the latest list of known problems.
• This is a new analysis engine – you can expect some teething problems.
Contact the program authors if you can’t resolve them (see Sending
Suggestions and Reporting Problems).
You can get more information about the fitting process using the
showit control setting – for details, see the section on Fine-tuning an MRDS
Analysis.
• Goodness-of-fit. Chi-squared goodness of fit tests for the
DS and/or MR models (depending on the fitting method), and
Kolmogorov-Smirnov and Cramér-von Mises tests (for exact
data). For more about these latter tests, see CDS Goodness of
Fit Tests in Chapter 8.
• Plot: Detection Probability. Plots of the fitted detection
functions, superimposed on histograms showing the frequency
of counts. The estimated probability of detection of each
observation (given its covariate values and distance) is also
shown. The number of plots depends on the fitting method.
The plots are stored as image files in the R directory, and so can easily
be imported into other programs - see Exporting MRDS Results.
Miscellaneous MRDS Analysis Topics
The cluster size field is one of the fields with a fixed name in
detection function formulae in MRDS (see Translating Distance Fields into DS
and MR Covariates) – in formulae you should use the name size regardless of
the actual field name.
length of the kth line, θ̂ is a vector of length r containing the parameter
estimates of the detection function model, and H⁻¹_{jm}(θ̂) is the (j,m)th
element of the inverse of the Hessian matrix for θ̂. (See also formula
3.35 for the variance of the number of individuals.) More details are
given in Innes et al. (2002) and Marques and Buckland (2004).
• Buckland et al. (2001) - based on the delta method, using the
empirical variance in encounter rate between samples. This
method is based on the conventional distance sampling variance
estimator of Buckland et al. (2001, Sections 3.6.1 and 3.6.2), extended
to allow probability of detection to vary among individuals. The
method assumes independence between the estimates of detection
function parameters, encounter rate (and mean cluster size for variance
of the estimate of abundance of individuals) – an assumption not made
by the Innes et al. estimator. For that reason, the Innes et al. estimator
is preferred. The formula can be written:
vâr(N̂_s) = (A/(2wL))² { (N̂_cs²/n_s²) (L/(K−1)) Σ_{k=1}^{K} l_k (n_{sk}/l_k − n_s/L)²
                        + Σ_{j=1}^{r} Σ_{m=1}^{r} (∂N̂_cs/∂θ̂_j)(∂N̂_cs/∂θ̂_m) H⁻¹_{jm}(θ̂) }

where n_s is the number of clusters seen, and n_{sk} is the number of clusters
seen on line k.
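The empirical between-line encounter rate part of this estimator (the term of the familiar delta-method form N̂_cs² vâr(n_s)/n_s²) can be sketched numerically; the function name and all data values below are invented:

```python
# Sketch: the encounter-rate variance component,
# (N_cs^2 / n_s^2) * (L / (K - 1)) * sum_k l_k * (n_sk/l_k - n_s/L)^2,
# where n_sk clusters are seen on line k of length l_k.

def encounter_rate_component(n_sk, l_k, N_cs):
    K = len(l_k)                    # number of lines
    L = sum(l_k)                    # total line length
    n_s = sum(n_sk)                 # total clusters seen
    er = n_s / L                    # overall encounter rate
    s = sum(l * (n / l - er) ** 2 for n, l in zip(n_sk, l_k))
    return (N_cs ** 2 / n_s ** 2) * (L / (K - 1)) * s

n_sk = [4, 7, 2, 5]                 # made-up clusters seen per line
l_k = [10.0, 12.0, 8.0, 10.0]       # made-up line lengths
print(encounter_rate_component(n_sk, l_k, N_cs=60.0))
```

When every line has the same encounter rate, this component is zero; all the remaining variance then comes from the detection function parameters.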
• Binomial variance of detection process. This variance estimator
should only be used when the entire study area is sampled (as happens
sometimes, for example in simulation experiments). It only contains
the term for uncertainty due to estimating the detection function
parameters – i.e., it assumes no variance comes from scaling up the
estimated density on the surveyed area to the whole study area. The
formula is:
vâr(N̂_s) = (A/(2wL))² { Σ_{i=1}^{n_s} f̂(0 | z_i)² − N̂_cs
                        + Σ_{j=1}^{r} Σ_{m=1}^{r} (∂N̂_cs/∂θ̂_j)(∂N̂_cs/∂θ̂_m) H⁻¹_{jm}(θ̂) }
Sample Definition in MRDS Analysis
This is implemented in exactly the same way as for CDS and MCDS analyses -
see Sample Definition in CDS Analysis in Chapter 8 of the Users Guide for
details.
provide the detection function fit again, and the R objects will be saved (in the R
object file .RData in the R Folder – see Contents of the R Folder in Chapter 7 for
more on this file).
Now we’ll apply this detection function to the new subset of data. Define a new
data filter that selects the new subset of data. Note that it must be a subset of the
data used to fit the detection function. Define a new Model Definition, and in
the Estimate tab, check the option to Estimate density / abundance and
also the option to Use the fitted detection function from previous
MRDS analysis. Select the ID of the analysis you want to use. For example,
if you want to use analysis 2 (called “FI - MR dist” in this case), the lower half
of the Estimate tab will look like this:
Applying the fitted detection function from one analysis to another analysis
When you run the new analysis, the probability of detection for each object in
the new analysis is estimated using the fitted detection function from analysis 2
(in this case).
function). You can use these files as templates for creating your own command
and data files.
To run the analysis from within the R GUI (Graphical User Interface), you can
cut and paste the commands from the file in.r. To run the analysis from another
program, you can call R in batch mode – this is achieved by calling the program
RCmd.exe, which is located within the /bin subdirectory of your R installation.
For more details, see the R for Windows FAQ (in R, type help.start() and
when a browser window opens, click on the FAQ for Windows port). For an
example of its use, see the Log tab of any MRDS analysis you have run that was
not in debug mode – you should see a line of the form:
Starting engine with the following command:
C:\PROGRA~1\R\rw1091\bin\Rcmd.exe BATCH C:\temp\dst90474\in.r
C:\temp\dst90474\log.r
Users familiar with R may wish to work inside the R GUI. The MRDS engine is
contained in the library mrds. To load the library from within R GUI, type
library(mrds)
All the functions in the mrds library are documented – the main functions are
ddf() (fits the detection function) and dht() (estimates abundance using the
Horvitz-Thompson-like estimator). You can open a copy of the help files from
within Distance by choosing Help | Online Manuals | MRDS R Engine
Help (html).
The next topic describes how to check which version of the library is being used.
new version or reporting a problem you may want to check which version of the
library is currently in use. To do this, re-run an analysis that uses the MRDS
Engine (such as one from the Golftees sample project) and look in the Log tab
for the line
> library(mrds)
After it, you should see a line which looks something like the following:
This is mrds 1.2.7
Built: R 2.3.1; i386-pc-mingw32; 2006-08-09 17:33:03; windows
If you are reporting a problem you should quote both the build number (in the
above case 1.2.7) and the build date and time (2006-08-09 17:33:03).
The previous topic describes how to update to a newer version of the MRDS
Engine, if one is available.
When reporting results, you may want to cite the exact version (i.e.,
build number) of the library that was used in the analysis. This is stored in the Log
tab, as outlined above.
In most cases, the default options for estimation of detection function parameters
in an MRDS analysis work well. However, there are times when you need to
tweak the analysis, for example by setting starting values or bounds on
parameters. You may also want to get more information on the fitting process,
such as parameter estimates at each iteration of the optimization. This kind of
fine-tuning is specified in the Detection function | Control page of the
model definition.
To enter options on this page, type them in as a comma-delimited list. Any
non-integer numbers should have a decimal point separating the integer and
fractional parts – e.g., 38.98. You can also use scientific (E) notation – e.g.,
3.898E1. For some options (e.g., starting values), you need to specify a vector
of numbers. To do this, write them out as a comma-delimited list prefixed by c(
and suffixed by ) – e.g., c(4.7, 0.1, 0.2). An example of a control list is:
showit=T, doeachint=T, lowerbounds=c(0,0,0)
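Just to illustrate the syntax of such a control list (this sketch is not Distance's own parser; the function name is hypothetical), the example line above can be read into a name/value mapping like this:

```python
import re

# Sketch: parsing a comma-delimited control list such as
#   showit=T, doeachint=T, lowerbounds=c(0,0,0)
# T/F become booleans, plain numbers become floats, and c(...) becomes a
# list of floats. Illustrative only; edge cases are ignored.

def parse_controls(text):
    out = {}
    # match name=c(...) vectors first, then simple name=value pairs
    for name, vec, val in re.findall(r"(\w+)\s*=\s*(?:c\(([^)]*)\)|([^,]+))", text):
        if vec != "":
            out[name] = [float(v) for v in vec.split(",")]
        else:
            v = val.strip()
            try:
                out[name] = float(v)
            except ValueError:
                out[name] = {"T": True, "F": False}.get(v, v)
    return out

print(parse_controls("showit=T, doeachint=T, lowerbounds=c(0,0,0)"))
```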
The options vary slightly depending on which detection function method is being
fit (ds, io, etc). More details are in the mrds R library online help, under the
appropriate ddf function (ddf.ds, ddf.io, etc). However, in general, the options
are as follows.
• showit – F (false, the default) or T (true); if true gives output at each
iteration of the fit
• doeachint – F (false, the default) or T (true); if true uses numerical
integration rather than an interpolation method during fitting
• estimate – T (true, the default) or F (false); if false fits the detection
function model but doesn’t estimate predicted probabilities.
• refit – T (true, the default) or F (false); if true, the algorithm tries
multiple optimizations at different starting values if it doesn’t converge
• nrefits – integer number – controls the number of refitting attempts
• initial – a vector of initial values for the parameters
• lowerbounds – a vector of lower bounds for the parameters
• upperbounds – a vector of upper bounds for the parameters
• limit – T (true, the default) or F (false); if true then restricts analysis
to observations with detected=1 (this option is ignored if fitting method
= ds and there is no detected field)
You can also fit CDS and MCDS models (i.e., models that assume all
animals on the trackline are detected) to double observer data by choosing the ds
detection function method. When you run such an analysis, Distance will pool
the data from the two observers, so that the data are the total number of unique
detections.
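This pooling to unique detections can be sketched like this (made-up records; not Distance's internals):

```python
# Sketch: pooling double-observer data to unique detections, as is done when
# fitting a ds (single observer) model to double-observer data. Records are
# (object, observer, detected) tuples with invented values.
records = [
    (1, 1, 1), (1, 2, 0),   # object 1: seen by observer 1 only
    (2, 1, 1), (2, 2, 1),   # object 2: seen by both observers
    (3, 1, 0), (3, 2, 1),   # object 3: seen by observer 2 only
]

def unique_detections(records):
    """Objects detected by at least one observer, each counted once."""
    return sorted({obj for (obj, _, det) in records if det == 1})

print(unique_detections(records))   # three unique detections
```

Object 2 appears in two records but contributes only one detection to the pooled data.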
Chapter 11 – Density Surface
Modelling
As with the MRDS engine, the DSM engine is implemented as a library in the
free statistical software R. When you run a DSM analysis from Distance,
Distance creates a sequence of R commands, calls the R software, waits for the
results and then reads them back in. Therefore, before you can use the DSM
engine, you must first ensure that you have R correctly installed and configured.
For more on this, see R Statistical Software in Chapter 7 of the Users Guide.
To produce a density surface model in Distance, you then need to set up the
project appropriately and include data in the correct format – see Setting up a
Project for DSM Analysis. You must next create one or more model definitions
using the MRDS analysis engine, and associate these model definitions with
analyses to derive detection probabilities for each object detected. For more
about the basics of setting up analyses, see Chapter 7 - Analysis in Distance.
More details of the various models available in the MRDS engine are given in
Defining MRDS Models, and a detailed description of the options available in
the Model Definition Properties pages for this engine is given in the Program
Reference pages Model Definition Properties - MRDS. After deriving detection
probabilities, a density surface model can be fitted. In addition, you must
also create a prediction grid that contains values of the covariates used for spatial
prediction at a grid of locations throughout the study region, not only along the
surveyed line transects. This prediction grid must be geo-referenced, and read
by Distance in a particular fashion to take advantage of the spatial data contained
therein. See Producing a prediction grid in GIS for further details.
In this chapter we also provide some analysis guidelines, give a list of the output
the engine can produce and cover various miscellaneous topics.
If you are familiar with the R software, you can run the DSM engine
directly from within R, bypassing the Distance interface altogether. For more
information, see Running the DSM Analysis Engine from Outside Distance.
• In Step 3, under Observer configuration, select Double observer.
But see also Single Observer Configuration in the MRDS Engine.
• Follow through the rest of the wizard as usual.
Distance then creates the appropriate data fields for double observer data, and
you can then import your data using the Import Data Wizard. For more about
the data requirements, see Setting up your Data for MRDS Analysis.
Alternatively, you can create the appropriate fields by hand, and manually create
a new survey object with the appropriate observer configuration and data files.
For more about survey objects, see Working with Surveys During Analysis in
Chapter 7.
Line transect after segmentation with individual segments of length l_i and truncation
distance w.
We will not be concerned with strata that may arise in some distance sampling
applications. The sampling of our study region (otherwise known as the 'global
layer' in Distance) consists of transects. Each transect is composed of segments
(thanks to the segmentation effort we performed previously). Within each
segment we may have detections of the objects we are studying.
As you know from your previous work with Distance projects, each layer of a
Distance project contains multiple fields. You will want to populate the
Observation layer with data that you think may be influential in modelling the
detectability of objects (e.g., observer, cluster size, etc.). This is exactly the type
of modelling you have done with previous versions of the Distance software.
You will also want to populate the segment layer with data that you wish to
use in your spatial modelling (e.g., latitude, longitude, soil depth, prey biomass,
etc.). Note that these data are specific to the segment, so you should think
carefully about how to integrate data that is defined at a point so that it will be
relevant at the spatial scale of a segment.
This represents 5 transects within the study area (labeled A through E), with
Transect A comprising 3 segments, Transect B 4 segments, C 2 segments, D 3
segments, and E also 3 segments.
By coincidence there were also 15 observations (2 in segment 1, none in
segment 2, 1 each in segments 3, 4 and 5, none in segment 6, 2 in segment 7,
none in 8, 2 in 9, 1 in 10, 2 in 11, none in 12 or 13, 1 in 14, and finally 2
in 15). Note that this depiction omits the covariates at either the segment or
observation layer. If there are a large number of covariates included for
consideration in modelling either the detection function or the response surface,
the amount of data to bring into program Distance could be considerable.
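The hierarchy just described can be sketched as a small data structure. The transect labels and counts are taken from the figure; the representation itself is purely illustrative:

```python
# Illustrative sketch of the transect/segment/observation hierarchy
# described above (counts match the figure in the text).
transects = {
    "A": 3, "B": 4, "C": 2, "D": 3, "E": 3,   # segments per transect
}
# Observations per segment, in segment order 1..15 (from the figure).
obs_per_segment = [2, 0, 1, 1, 1, 0, 2, 0, 2, 1, 2, 0, 0, 1, 2]

n_segments = sum(transects.values())
n_observations = sum(obs_per_segment)
print(n_segments, n_observations)  # prints 15 15, as noted in the text
```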
We advocate importing data by layers, which in this case might consist of 3
files, one for each layer: "transect.txt", "segment.txt", and
"observation.txt". "transect.txt" would contain the transect label and its length;
"segment.txt" would contain the label of the transect to which each segment
belonged, along with segment-specific data such as segment length, latitude, and
longitude mentioned earlier. Finally, "observation.txt" would contain the
identifier of the segment in which the detection took place, plus items such as the
perpendicular distance of the detection, cluster size, and other data associated
with detectability.
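As a rough illustration of these three flat files, here is a minimal sketch. The field names and values are assumptions for the example, not a prescription; use whatever fields your project requires:

```python
import csv
import io

# Illustrative contents of the three layer files described above.
transect_txt = (
    "Label\tLength\n"
    "A\t10.0\n"
    "B\t12.5\n"
)
segment_txt = (
    "Transect\tLabel\tLength\tLatitude\tLongitude\n"
    "A\t1\t2.0\t56.1\t-3.2\n"
    "A\t2\t2.0\t56.2\t-3.3\n"
)
observation_txt = (
    "Segment\tPerpDistance\tClusterSize\n"
    "1\t14.2\t3\n"
    "1\t8.7\t1\n"
)

def read_layer(text):
    """Parse one tab-delimited layer file into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

transects = read_layer(transect_txt)
segments = read_layer(segment_txt)
observations = read_layer(observation_txt)
# Each observation records the segment in which the detection took place.
print(observations[0]["Segment"])  # prints 1
```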
The observation layer file will need to contain three other fields beyond those
already mentioned. If you are working with double platform (MRDS) designs,
you will already include them in your data. If you are unfamiliar with double
observer designs, and the letters MRDS mean nothing to you, then consult the
section of this users guide on the MRDS engine, Setting up your Data for
MRDS Analysis. There are two fields that will take the value '1' for each
detection. They will be called 'Observer' and 'Detected' when they are imported.
Finally the last field in this layer will be called 'Object' and it will contain a
unique number for each object detected. However see also Single Observer
Configuration in the MRDS Engine regarding elimination of these fields in
favour of a simpler approach.
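A minimal sketch of adding the three extra fields to single-platform observation data follows; the row contents are hypothetical:

```python
# Hypothetical observation rows lacking the three extra fields.
observations = [
    {"Segment": 1, "PerpDistance": 14.2},
    {"Segment": 1, "PerpDistance": 8.7},
    {"Segment": 3, "PerpDistance": 21.0},
]

# For single-platform data, Observer and Detected take the value 1 for
# every detection, and Object is a unique number per detected object.
for i, obs in enumerate(observations, start=1):
    obs["Observer"] = 1
    obs["Detected"] = 1
    obs["Object"] = i

print(observations[-1]["Object"])  # prints 3
```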
For each of these cells you will need to populate an attribute table with all of the
covariates you specified at the segment layer of your Distance project and that
are included in the density surface model you wish to use for prediction. So,
assuming you are using ArcGIS as your GIS software, perform the following:
• In ArcGIS, create a shapefile of type point, containing the covariates
of interest, clipped to the extent of the study region.
• Compute the area of the cells (boundary cell sizes can be ignored
for sufficiently dense prediction grid spacings relative to the size of
the study region).
• Open the attribute table of this object, and create a new field called
“LinkID” of type "Number" and width 16.
• Use the “Calculate Values” tool to fill that attribute field with a
formula equivalent to the value of the FID field plus 1. This is
accessed by highlighting the newly-created field and pressing the
right mouse button
o Having made this modification, the .dbf file you have worked
with will also be modified (without explicit saving by you)
• Export the attribute table to ASCII. This file will be imported into
the Distance project via the data import wizard in due course.
• Copy the shapefile (and companion files) produced by your GIS
work with the prediction grid into the Distance project data
folder, renaming them to match the name of the coverage
grid data layer you have just created.
• In this fashion, you have swapped the GIS-generated shapefile for
the artificially created coverage grid layer.
• Check to see that in the prediction grid layer, you have values in
the shape (type point) fields: these are the coordinates of your
prediction grid cells. Also make sure you have values in all rows
for the attribute fields you have imported. As an extra check of the
integrity of the prediction grid layer, you can ask Distance to map
the prediction grid cells, by using the mapping function of
Distance.
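Outside of ArcGIS, the LinkID calculation and ASCII export steps above can be sketched as follows. The attribute names other than FID and LinkID are hypothetical:

```python
import csv
import io

# Hypothetical attribute table rows as exported from the GIS (FID is the
# GIS feature ID; the other attribute fields are illustrative).
attribute_table = [
    {"FID": 0, "depth": 12.4, "area": 4.0},
    {"FID": 1, "depth": 9.8,  "area": 4.0},
    {"FID": 2, "depth": 15.1, "area": 4.0},
]

# LinkID is the value of the FID field plus 1, as in the
# "Calculate Values" step above.
for row in attribute_table:
    row["LinkID"] = row["FID"] + 1

# Export the table to ASCII (comma-delimited) for the Import Data Wizard.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["LinkID", "depth", "area"],
                        extrasaction="ignore")
writer.writeheader()
writer.writerows(attribute_table)
ascii_export = buf.getvalue()
print(ascii_export.splitlines()[1])  # prints 1,12.4,4.0
```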
that predict response at the segment level because there may be multiple
observations within a given segment.
Formulae consist of a series of terms joined by operators such as “+”. The terms
represent covariates and the operators tell Distance how the covariates relate to
one another.
For example, the formula
latitude + longitude + depth
means include the data from the fields latitude, longitude and depth as
covariates.
The names of predictor covariates are possibly transformed versions of the field
names in the Distance project database. See the section Translating Distance
Fields into DS and MR Covariates. The concepts of factor covariates (Factor
and Non-factor Covariates in MRDS) also apply to the use of covariates for
density surface modelling. Operators for formulae are the same as for MRDS
modelling; however, there is the additional smoother operator s, which can
operate either in a univariate (s(depth)) or bivariate (s(latitude, longitude))
manner. The smoother operator will fit a nonlinear spline. See Wood (2006) for
details on fitting of GAMs to data.
These methods are relatively new, so we are only beginning to gain experience
on effective analysis strategies. Some preliminary guidelines follow – please let
us know if you can suggest some further guidelines or amendments.
Producing point and interval estimates of abundance in a study region using the
DSM engine requires four steps:
• Modelling the detection function,
• Modelling the estimated segment-specific abundance as a function
of covariates,
• Extrapolation, courtesy of the model, from the covered region of
the study region to the unsurveyed portion of the study region
(given measures of the predictive covariates throughout the study
region). This is termed the prediction step, and
• Producing variance estimates (and confidence limits) using
parametric bootstrapping techniques.
We will not discuss the modelling of the detection function, as that is
discussed elsewhere (Modelling the Detection Function). The remaining
steps however, are unique to the density surface modelling, and receive
attention herein.
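The prediction step listed above can be sketched as follows. The fitted model, covariates and coefficients below are entirely hypothetical stand-ins for the output of the DSM engine; the point is only that abundance is the predicted density times cell area, summed over the prediction grid:

```python
import math

def predicted_density(depth, latitude):
    """Hypothetical stand-in for a fitted log-link density surface model."""
    return math.exp(-1.0 + 0.05 * depth - 0.01 * (latitude - 56.0))

# Prediction grid: covariate values and areas for cells covering the
# whole study region, not just the surveyed transects (values invented).
grid = [
    {"depth": 10.0, "latitude": 56.0, "area": 4.0},
    {"depth": 20.0, "latitude": 56.5, "area": 4.0},
    {"depth": 15.0, "latitude": 57.0, "area": 4.0},
]

# Point estimate of abundance: density x cell area, summed over the grid.
abundance = sum(predicted_density(c["depth"], c["latitude"]) * c["area"]
                for c in grid)
print(round(abundance, 2))  # prints 9.49
```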
for each object (individual or cluster). The level at which the segments are
represented in the Distance project is also required.
Specification of the form of the relationship between the response and the
covariates includes the following:
• Generalized linear model or generalized additive model
framework,
• Link function,
• Offset term (if necessary),
• Form of the error distribution, and
• Weighting factor for observations (if necessary);
along with the model formula that uses the operators described in Specifying
DSM Model Formulae.
Modelling involves fitting a series of models to the data, and employing model
selection to compare them. For GAMs, the model selection diagnostic metric is
the generalized cross-validation (GCV) score. Models with small GCV scores
tend to do best at predicting response when presented with a new set of
covariates, and this is our objective when modelling density surfaces.
The number of knots in a GAM smooth (governing the wiggliness of the
predicted relationship) is estimated from the data, with the default for a
univariate smooth being 10 knots, and 30 knots for a bivariate smooth. The
number of knots can be fixed by the user by stating s(latitude, k=5,
fx=TRUE) as part of the model formula, where k is the number of knots.
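To illustrate GCV-based model selection, the following sketch compares hypothetical candidate models using the usual GCV formula, GCV = n x RSS / (n - edf)^2, where edf is the effective degrees of freedom. The RSS and edf values are invented for the example; in practice the DSM engine reports the GCV score for each fitted model:

```python
def gcv(n, rss, edf):
    """Generalized cross-validation score: smaller is better."""
    return n * rss / (n - edf) ** 2

n = 100  # number of segments (hypothetical)
candidates = {
    "s(depth)":               {"rss": 250.0, "edf": 4.2},
    "s(depth) + s(latitude)": {"rss": 180.0, "edf": 8.9},
    "s(latitude, longitude)": {"rss": 210.0, "edf": 15.3},
}

scores = {name: gcv(n, m["rss"], m["edf"]) for name, m in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # prints s(depth) + s(latitude)
```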
inside/outside of the region). The polygons may be either part of
the survey in the Distance project that gave rise to the sightings
(such as strata), or some other polygon contained within the
Distance project (an area of special interest such as a biological
reserve, within the study region). The default is aggregation over
only the study region.
• a log of the analysis, highlighting any possible problems, in the
Log tab of the Analysis Details window. For information about
troubleshooting problems, see Chapter 12 - Troubleshooting.
• (optionally) plots from the results details can be imported into other
programs.
Bootstrap variance computations: outlier removal
• Describes result of outlier removal (percent of replicates meeting
the outlier criterion and a list of the replicate values removed)
Response surface/Variance plot: bootstrap distribution
• Histogram showing distribution of bootstrap replicates after outlier
removal
Bootstrap confidence interval for abundance within study area
• Percentile confidence limits incorporating only uncertainty
associated with the density surface modelling,
Unlike CDS analysis, the Model Definition does not offer any regression
methods for dealing with size bias. If you suspect size bias is a potential
problem, the appropriate way to deal with it in an MRDS analysis is to include
cluster size (or some transformation of cluster size) as a covariate in the
detection function model(s).
The cluster size field is one of the fields with a fixed name in
detection function formulae in DSM (see Translating Distance Fields into DS
and MR Covariates) – in formulae you should use the name size regardless of
the actual field name.
Users familiar with R may wish to work inside the R GUI. The DSM engine is
contained in the library DSM. To load the library from within R GUI, type
library(dsm)
All the functions in the dsm library are documented – the main functions are
dsm.fit() (fits the density surface model) and dsm.predict() (produces the
estimated response across the prediction grid). You can open a copy of the help
files from within Distance by choosing Help | Online Manuals | DSM
Engine R Help (html).
The next topic describes how to check which version of the library is being used.
After loading the library, you should see a line which looks something like the following:
This is dsm 1.0
Built: R 2.5.1; ; 2007-07-20 14:41:38; windows
If you are reporting a problem you should quote both the build number (in the
above case 1.0) and the build date and time (2007-07-20 14:41:38).
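If you need to extract the version number and build date programmatically, for example when assembling a problem report, something like the following sketch would work, assuming the startup lines follow the format shown above:

```python
import re

# The two startup lines, as shown above (format assumed from that example).
startup = "This is dsm 1.0\nBuilt: R 2.5.1; ; 2007-07-20 14:41:38; windows"

version_match = re.search(r"This is dsm (\S+)", startup)
built_match = re.search(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", startup)

version = version_match.group(1)
built = built_match.group(1)
print(version, built)  # prints 1.0 2007-07-20 14:41:38
```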
The previous topic describes how to update to a newer version of the DSM
Engine, if one is available.
When reporting results, you may want to cite the exact version (i.e.,
build number) of the library that was used in the analysis. This is stored in the Log
tab, as outlined above.
Fine-tuning a DSM Analysis
In most cases, the default options for estimation of detection function parameters
in a DSM analysis work adequately. If you are a seasoned veteran in the use of
the mgcv package for fitting density surface models, you may wish to access some of the
inner levers and knobs. This kind of fine-tuning is specified in the Detection
function | Control page of the model definition.
To enter options on this page, type them in as a comma-delimited list. Any
noninteger numbers should have a decimal point separating the integer and
fractional parts – e.g., 38.98. You can also use scientific (E) notation – e.g.,
3.898E1. For some options (e.g., starting values), you need to specify a vector
of numbers. To do this, write them out as a comma-delimited list prefixed by c(
and suffixed by ) – e.g., c(4.7, 0.1, 0.2). Consult the R documentation for
detailed instructions regarding the use of gam.control() features.
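As an illustration of this option syntax, the following sketch parses such a comma-delimited list into a dictionary. This is only an illustration of the format described above, not how Distance itself reads the page:

```python
import re

def parse_options(text):
    """Parse a comma-delimited option list of the form described above.
    Vectors written c(a, b, c) become Python lists; T/F become booleans.
    (Illustrative only -- not Distance's own parser.)"""
    options = {}
    # Match name=value pairs, where value is either c(...) or a single token.
    for name, value in re.findall(r"(\w+)\s*=\s*(c\([^)]*\)|[^,]+)", text):
        value = value.strip()
        if value.startswith("c("):
            options[name] = [float(v) for v in value[2:-1].split(",")]
        elif value in ("T", "F"):
            options[name] = (value == "T")
        else:
            options[name] = float(value)
    return options

opts = parse_options("refit=T, nrefits=4, initial=c(4.7, 0.1, 0.2)")
print(opts)
```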
Chapter 12 - Troubleshooting
Known Problems
A list of known problems at time of release is in the file ReadMe.rtf, in the
Distance program directory.
For a more up-to-date list, see the Support page of the Distance web site (you
can access the web site from Distance by choosing Help | Distance on the
web | Distance home page…).
Run-time errors are rather more common in the MCDS engine. If you
are using Windows NT/2000/XP, then Distance saves a copy of the command-
line output to the Log file. This can be useful in diagnosing the source of the
problem, so you should make a note of what is recorded there when discussing
the problem with the program authors.
If you are using Windows 98/ME, you can still access the command-line output,
by running the analysis in debug mode within the interface, and then running the
analysis using the analysis engine from the Windows command line. For more
information, see Running the MCDS Engine in the Appendix on the MCDS
Engine Command Reference.
Problem: Plots cannot be viewed or are poor quality in the Results tab of
Analysis Details.
Solution: Change the image format or image properties - see Images Produced by R.
Stopping an Analysis
On occasion, you will want to stop an analysis that is running. For example, you
may have set off a long bootstrap analysis by mistake. Or perhaps the analysis
engine has locked up.
• To stop an analysis from the Analysis Browser, click on the Reset
Analysis button.
• To stop the analysis from the Input tab of the Analysis Details
window, click on the Stop button (which replaces the Run button
when an analysis is running).
GIS Problems
If you are experiencing strange behaviour with a project that contains geographic
data, it could be because the GIS data is invalid in some way. Symptoms include
maps that are blank or for which the Full Extent button in the Map Window
doesn’t seem to work properly, and error messages when generating a grid layer
or a design.
After you have finished the wizard, and the project has been created,
you can turn a non-geographic project into a geographic one by choosing File |
Project Properties … and ticking the check box “Project can contain
geographic information” on the Geographic tab.
You can change the measurement units later in the Data Explorer –
double-click on the 4th header row of the field you want to change and select
from the list of units.
You can measure data in one set of units, and output analysis results in
another set. The units of measurement are specified here and in the data sheet.
The units of analysis are specified in the Units tab of the Data Filter. See Data
Filter Units in the Program Reference for more information.
You can also define multipliers after the project has been created,
using the Append Field button in the Data Explorer. This gives you the
ability to define as many multipliers as you like. However, if you add the
multipliers in the Setup Project Wizard, Distance will automatically include
them in the default Model Definition. These issues are discussed in more detail
in the Users Guide page on Multipliers in CDS Analysis.
• Distance 3.5.
Under Files of type, specify “Distance 3.5 project files”, and select
the project file you wish to import.
Distance imports the project settings, and uses them to create a Survey
object. It will also create the appropriate data structure and import the
survey data. Distance will not import the Data Filters, Model
Definitions or Analyses – you will have to recreate these again
manually.
• Distance 4
Under Files of type, specify “Distance 4 project files”, and select the
project file you wish to import. Distance will import all of the
information from the old project, including project settings, data, maps,
designs, surveys and analyses.
You can only access the Data Entry Wizard if you have a simple
data structure, with a single global, stratum, sample and observation layer. In
other cases, you must manipulate your data using the Data Explorer.
The introductory screen in the wizard provides you with an overview of the 4
data layers (Global, Stratum, Sample and Observation) in your project. If you
want to find out more about the way that Distance stores survey data, you should
read the Users Guide Chapter 5 - Data in Distance.
As with the other wizards in Distance, you navigate through the wizard by
pressing the Next and Back buttons on the navigation bar (or pressing Alt-N
and Alt-B).
If you do not want to see the Data Entry Wizard introductory screen again
then tick the box Don’t show this introductory screen again.
If you commonly use this delimiter then on the Finished page, choose
the option to Save current settings as default, and your delimiter will
become the default.
Before opening the Import Data Wizard, you can move fields in the
Data Explorer by dragging and dropping them. You will also need to create any
new fields before starting the Wizard.
First row contains layer names and field names of each column
In many database and spreadsheet packages, you can specify the contents of the
first row when exporting data. To use this shortcut, you will need the first row
to contain both the layer name and field name for each column, separated by
some delimiter. For example, the first row of the column corresponding to the field
“Area” in the data layer “Region” would be “Region*Area”, assuming that “*” is
the delimiter used. Possible delimiters are: * | _ - and . (i.e., a full stop or
period).
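Splitting such a first row into layer and field names can be sketched as follows; the column names used here are hypothetical:

```python
import re

def split_header(columns):
    """Split first-row headers like 'Region*Area' into (layer, field)
    pairs, accepting any of the delimiters * | _ - or '.'."""
    return [tuple(re.split(r"[*|_.\-]", col, maxsplit=1)) for col in columns]

header = ["Region*Area", "Line transect*Label", "Observation*Perp distance"]
print(split_header(header))
```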
You can easily cut and paste from the comments box - right click on
the box to see a pop-up list of options.
Preferences Dialog
The Preferences dialog lets you set options that relate to the behaviour of
Distance across all projects on this computer.
Data Explorer
The Data Explorer is the main interface for viewing and manipulating data in
Distance. You access it via the Data tab of the Project Browser.
An alternative data interface is the Data Entry Wizard, which can be accessed
from the Setup Project Wizard or by choosing Tools | Data Entry Wizard.
The Data Entry Wizard has a more restricted interface, and can only be used
with a simple data structure. For more information, see Data Entry Wizard in
the Program Reference.
You cannot use the Data Explorer effectively until you understand the way that
survey data is stored in Distance – make sure you’ve read the Chapter 5 - Data in
Distance in the Users Guide before you continue!
The Data Explorer is split into three sections, the Toolbar, the Data Layers
Viewer and the Data Sheet.
The Toolbar functions are summarized below. The other sections of the
Explorer are discussed on the next pages.
If you delete a data layer, all data in the layer and in all child
layers are lost. There is no undo button!
When you delete a row from a higher data layer then the
corresponding rows from lower data layers also get deleted!
If you want to know more about the Data Explorer, proceed to next page on the
Data Layers Viewer.
Example of the Data Layer Viewer. In this case the project has 4 data layers.
The Data Layer Viewer appears on the left in the Data Explorer. It presents a
hierarchical view of the data layers in your project (see Chapter 5 - Data in
Distance in the Users Guide for a discussion of the data layers). The icons by
the data layer names indicate the Data Layer Type (see List of Data Layer Types
in Chapter 5 of the Users Guide for a complete list).
Clicking on a data layer in the viewer shows the data for that layer in the Data
Sheet, as well as data for all higher layers. When the Data Explorer first opens,
only the Global Layer is displayed:
Part of the Data Explorer from the Stratify example project when first opened
If you click on the stratum data layer icon (in this case the stratum layer is called
“Region”), the stratum data appears beside the global data:
You can compress the fields so that only the Label and ID column from the
Global layer appears by clicking on the button on the toolbar (see Toolbar,
above):
Similarly, if you click on the sample data layer icon (in this case the sample data
layer is called “Line transect”), the sample data appears beside the stratum data.
Because the Compact View button is enabled, all fields in the Stratum data layer
(Region) except the label and ID fields disappear from view:
Part of the Data Explorer from the Stratify example project after the Sample icon in the
Data Layer Viewer has been clicked
Why go to all the trouble of selecting different data layers - why not just click on
the observation layer and open up the whole data sheet at once? When all the
data layers are open, the Data Sheet can become cluttered. It is often simpler, for
example, if you want to see how many strata there are, to only open the strata
data layer, and leave the sample and observation layer hidden.
If you want to learn more about the Data Explorer, you should read the next
page, which is about the Data Sheet.
Data Sheet
(For an overview of the data explorer, see the Program Reference page Data
Explorer.)
The Data Sheet is intended to be as intuitive to use as any spreadsheet grid.
However, the hierarchical nature of the data (observations within transects
within strata within global) imposes some restrictions. In the following three
pages, we describe how to perform the common tasks associated with
manipulating survey data in the Data Sheet:
• Navigating Around the Data Sheet
• Editing, Adding, and Deleting Records
• Editing, Adding and Deleting Fields
You can easily copy your data from the data sheet to a spreadsheet or
database package. Click on the Data Explorer to give it focus and then click on
the Copy to Clipboard button on the main toolbar, or choose the Data
Explorer menu item Copy Data to Clipboard. In your spreadsheet or
database, choose Paste. You can use this Copy to Clipboard facility, together
with the Data Import Wizard to provide a crude Import/Export facility for your
data.
Sometimes you may run into problems pasting the data into your target package.
This is usually caused by the symbol used by Distance to signify an end-of-row
– you can change this in the General tab of the Preferences window.
You can also move around by clicking on the grid and then moving
the mouse while holding the left mouse button down. In this mode, if you move
past the end of the visible grid, it scrolls for you.
You can tell whether you can edit a cell because the Focus rectangle
(dashed box that shows you which cell is currently selected) turns from light
dashes to heavy dashes when you can edit the cell.
When you enter data into a field some data validation takes place to
check that the data is of the correct type. This prevents you, for example, from
entering text into a decimal or integer field, or entering decimal points into an
integer field. Most cells must contain some value and cannot be left blank.
When you are in edit mode, the cell behaves just like any other text
box. For example, you can right-click to bring up a pop-up menu containing
useful commands such as Cut and Paste. You can also use the usual keyboard
shortcuts (Ctrl-Insert, Shift-Insert and Shift-Del for Copy, Paste and Cut).
Shift+Enter takes you out of edit mode and on to the first field of the
next record for that layer. It’s designed to help with data entry - try it!
Adding Records
The Data Sheet is a representation of the underlying database, not a simple
spreadsheet. Therefore, you cannot edit a cell in the Data Sheet unless there is a
corresponding record in the database. For example, consider the following data
sheet:
A record has been inserted with ID of 4. Notice that the focus rectangle is now
heavy too, indicating that the value in the cell is editable. We enter the value
0.53 and hit Enter:
Notice that the value of the record in the cell that previously had focus has been
copied into the new record. We can now enter the next value for distance, 1.98
and hit Enter:
We could continue this process until all the observations have been entered.
In the text field, we type in 10, and then click on either of the two
buttons to the right of the up and down arrows (again, because there are no
records associated with the cell, it doesn't matter which we press). Voilà!
This example used the Observation data layer, but of course it could have been
done with any other layer.
Deleting Records
To delete a record, highlight the record in the Data Sheet and press the Delete
Current Record button. You can only delete one record at a time.
If you delete a record from the sample data layer, all of the
observations associated with that sample will be deleted. Similarly, if you delete
a record from the transect data layer, all of the samples and observations will be
deleted. Remember, there is no undo button, so you may want to consider
Backing up first!
To find out more about the Data Explorer, consult the last page in this chapter,
Editing, Adding and Deleting Fields.
Adding Fields
There are many reasons for adding fields to the Distance database, beyond those
provided by default. You may want to add extra multiplier fields in the global
layer (see Multipliers in CDS Analysis in Chapter 8 of the Users Guide). You
may want to add a field that will be used for post-stratification (see Chapter 8 of
the Users Guide: Stratification and Post-stratification). You may want to add a
column that defines a subset of the data that you will use to select data in a Data
Filter, such as different species or years of data (see the Data Selection Tab of
the Data Filter entry in the Program Reference). You may want to add fields for
covariates in MCDS analyses, or you may be setting up a complicated data
structure by hand.
To add a field to the data sheet, click on any cell in the appropriate data layer to
give that layer focus. Then click on either the Insert Field or Append Field
button. Insert Field puts the new field where the current field is, and moves the
current field to the right. Append Field places the new field after the current
one.
After clicking the button, a small window will appear prompting you for the new
field's name, Field Type and Units.
New fields are automatically filled with default values in the Data Sheet.
Deleting Fields
Map Browser
The Map Browser allows you to create, sort, rename, delete and preview maps of
the geographic data in your Distance project. It is accessed via the Maps tab of
the Project Browser. The Map Browser is only accessible if the project is
geographic. For more information about geographic data in Distance, see
Geographic (GIS) Data in Chapter 5 of the Users Guide.
The Map Browser comprises a table showing a list of the maps that have been
created, and optionally a preview pane, which gives a preview of the map that is
currently selected. You can change the size of the preview pane by dragging the
bar between the map table and preview pane.
For maps containing many shapes, the preview pane can take a while
to draw. In these cases, it’s better to hide the preview pane (click the button).
To create a new map, click the button, or choose Maps | New Map. To
rename the map, double click on the name in the map table, and type the new
name. To view the map, so you can add layers, etc., select the map in the map
table and click , or choose Maps | View Map.
Toolbar
You can resize the panes by dragging the bar that divides them.
If you hold your mouse over a column header for a few moments, a
small window pops up giving you an explanation of that column. This also
works if you hold your mouse over a survey, data filter or model definition
number: a window pops up giving you the name that corresponds with the
number.
Designs can be grouped into Design Sets. A Design Set is a group of related
designs – you are free to create, delete and rename sets, and choose which
designs to group together. The current set name is listed after the word “Set:” on
the design browser toolbar, and you can access a list of sets by clicking on the
down arrow beside the current set name. You can create, delete and move sets
using the buttons to the right of the current set name.
For the New Design, Delete Design and Design Details buttons, you
can work with more than one design at once by highlighting multiple designs in
the browser. To highlight more than one design, either:
(i) Hold the Ctrl key down and click on each design to highlight them.
(ii) Hold the Shift key and click on two non-adjacent designs to select all designs
in between them.
(iii) Hold your mouse button down and move it over the designs you want to
highlight, if they are adjacent.
(iv) Hold the Shift key and use the up or down keys to extend the current
highlighting.
Sorting designs
To sort your designs by any column, just click on the column header. One click
sorts the column in ascending order; another click sorts it in descending
order. A little red arrow tells you which column is currently being used as the
sort column, and whether the sort is ascending or descending.
You can resize the panes by dragging the bar that divides them.
If you hold your mouse over a column header for a few moments, a
small window pops up giving you an explanation of that column. This also
works if you hold your mouse over a survey, data filter or model definition
number: a window pops up giving you the name that corresponds with the
number.
Surveys can be grouped into Survey Sets. A Survey Set is a group of related
surveys – you are free to create, delete and rename sets, and choose which
surveys to group together. The current set name is listed after the word “Set:” on
the survey browser toolbar, and you can access a list of sets by clicking on the
down arrow beside the current set name. You can create, delete and move sets
using the buttons to the right of the current set name.
The new survey is based on the one that is currently selected in the
Survey Browser.
For the New Survey, Delete Survey and Survey Details buttons, you
can work with more than one survey at once by highlighting multiple surveys in
the browser. To highlight more than one survey, either:
(i) Hold the Ctrl key down and click on each survey to highlight them.
(ii) Hold the Shift key and click on two non-adjacent surveys to select all
surveys in between them.
(iii) Hold your mouse button down and move it over the surveys you want to
highlight, if they are adjacent.
(iv) Hold the Shift key and use the up or down keys to extend the current
highlighting.
Sorting surveys
To sort your surveys by any column, just click on the column header. One click
sorts the column in ascending order; another click sorts it in descending
order. A little red arrow tells you which column is currently being used as the
sort column, and whether the sort is ascending or descending.
You can resize the panes by dragging the bar that divides them.
If you hold your mouse over a column header for a few moments, a
small window pops up giving you an explanation of that column. This also
works if you hold your mouse over a survey, data filter or model definition
number: a window pops up giving you the name that corresponds with the
number.
The new analysis is based on the one that is currently selected in the
Analysis Browser (Half-normal / hermite in the picture above - when more than
one is selected, look for the dashed focus rectangle around the one that has
focus).
For the New Analysis, Delete Analysis and Analysis Details buttons,
you can work with more than one analysis at once by highlighting multiple
analyses in the browser. To highlight more than one analysis, either:
(i) Hold the Ctrl key down and click on each analysis to highlight them.
(ii) Hold the Shift key and click on two non-adjacent analyses to select all
analyses in between them.
(iii) Hold your mouse button down and move it over the analyses you want to
highlight, if they are adjacent.
(iv) Hold the Shift key and use the up or down keys to extend the current
highlighting.
Some columns, such as AIC and Delta AIC, are special. When you
click on these columns, your analyses are sorted by Data Filter first, and then
by the AIC or Delta AIC column. Why? AIC stands for Akaike's Information
Criterion - an index of the relative fit of competing statistical models. The lower
the AIC, the more parsimonious the model (other things being equal – look for
AIC in the Distance Book). Delta AIC is the difference in AIC between the
current model and the model with the lowest AIC. However, it only makes sense
to compare AIC values, calculate Delta AIC and rank the analyses based on
models fit to the same data. Sorting by the Data Filter column first ensures that
this happens.
AIC is not the only model selection criterion that can be used. Distance also
provides the following columns: AICc (“corrected” AIC), BIC (Bayes
Information Criterion) and LogL (the log-likelihood). For more information
about these criteria, and model selection in general, see Burnham and Anderson
(2002).
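The bookkeeping behind this sorting can be sketched as follows. This is an illustrative calculation only – the model names, log-likelihoods and parameter counts are made up, not output from Distance:

```python
# Hypothetical log-likelihoods (logL) and parameter counts (k) for three
# detection function models fitted to the same data (same Data Filter).
models = {
    "uniform + cosine":      {"logL": -150.2, "k": 2},
    "half-normal + hermite": {"logL": -149.8, "k": 1},
    "hazard-rate":           {"logL": -149.5, "k": 2},
}

# AIC = 2k - 2*logL; smaller is better.
for m in models.values():
    m["AIC"] = 2 * m["k"] - 2 * m["logL"]

# Delta AIC = each model's AIC minus the smallest AIC in the set.
best = min(m["AIC"] for m in models.values())
for m in models.values():
    m["dAIC"] = m["AIC"] - best

# Rank models: the model with Delta AIC = 0 comes first.
ranking = sorted(models, key=lambda name: models[name]["AIC"])
```

Note that the comparison is only meaningful because all three models were fit to the same data – which is exactly why Distance sorts by Data Filter first.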
Map Window
Map windows provide a view of the geographic data layers in a project. You can
customize the map by choosing which data layers to include, and by panning and
zooming around the map area. Any changes you make to a map are saved when
you close the map.
You create maps in the Map Browser (see Program Reference, Map Browser for
details). From there, click on the View Map button to open the Map window.
You can have more than one map open at a time.
The map window is split into two panes. On the left is a pane containing map
tools (currently just the layer control), and on the right is the map itself.
Several features of the map window are not yet implemented. These
include the Info, Find and Spatial Select map tools, the ability to customize the
properties of each data layer (such as its colour), and the ability to add
legends. We expect to implement these features in future releases.
Layer control
The layer control displays a list of the data layers currently shown on the map,
with a legend showing the symbol used to display shapes on that layer. There is
a tick box where you can turn off display of that layer. You can change the
ordering of layers by clicking on a layer and dragging it above or below another
layer.
Map tips
A map tip is a popup window that appears when you hover over a map feature,
giving information about the feature. To enable map tips, click on the Map
Tips button on the toolbar. A second row of tools opens on the toolbar,
prompting you for the Map Tip Layer and Field. Select from the list of layers
and fields, and then position the cursor over a feature on the map. The value of
the selected field in that position will appear. For example, if you select a
The use of map tips can significantly slow down the map display, so
turn them off when you’re finished!
The Design Details window is divided into three tabs: Inputs, Log and Results.
The tab that Distance first displays when you open a Design Details window
depends on the status of the design. For designs that have not been run (grey
status light in the Design Browser), it opens the Inputs tab. For designs that ran
with warnings or errors (amber or red), it opens the Log tab. For designs that
ran OK (green), it opens the Results tab.
You can give yourself more room by resizing the comments section.
Put your mouse just above the Comments section header and drag the
section up and down. You may want to increase the height of the whole Design
Details window (by dragging on its border) before you do this.
You can give yourself more room by resizing the comments section.
Put your mouse just above the Comments section header and drag the
section up and down. You may want to increase the height of the whole Survey
Details window (by dragging on its border) before you do this.
The central window lists the Data Filters that are available for you to choose
from. The one selected for the current analysis is highlighted on the list.
Choosing a different Data Filter for your analysis
If you want to choose another data filter for this analysis, click on the data filter
you want. If you have results already for your analysis, Distance issues a
warning that they will be deleted. This is because your results were generated
with the old Data Filter and so will not correspond to your new choice. If you
want to do a new analysis but keep your old results, then you need to go back to
the Analysis Browser, click on the new analysis button and select the new data
filter in Analysis Details window for the new analysis.
Making a new Data Filter
To make a new Data Filter press the New… button. A new filter will be created
and appended to the current list. The new data filter is based on the data filter
you have highlighted in the central window when you press the new button. The
Data Filter Properties window is then opened up, so you can edit this new filter.
Scenario: Imagine you have run an analysis, and now want to try
another analysis, but with just one part of the Data Filter changed (say a different
truncation distance). Highlight the analysis you just ran in the Analysis Browser
and click the New Analysis button. A new analysis is created, based on your
current one. Now click the Show Details button to open an Analysis Details
window. The old Data Filter will already be highlighted, so click the New…
button to make a new Data Filter based on the old one. Make the changes in the
Data Filter Properties and press OK to return to the Analysis Details window.
Then click Run to run the analysis. Easy, eh!
Click Properties… if you just want to view the properties for this
Data Filter, rather than edit them, and then press Cancel in the Data Filter
Properties window to return without saving any changes.
The central window lists the Model Definitions that are available for you to
choose from. The one selected for the current analysis is highlighted on the list.
Choosing a different Model Definition for your analysis
If you want to choose another Model Definition for this analysis, click on the
Model Definition you want. If you have results already for your analysis,
Distance issues a warning that they will be deleted. This is because your results
were generated with the old Model Definition and so will not correspond to your
new choice. If you want to do a new analysis but keep your old results, then you
need to go back to the Analysis Browser, click on the new analysis button and
select the new Model Definition in Analysis Details window for the new
analysis.
Making a new Model Definition
To make a new Model Definition press the New… button. A new Model
Definition will be created and appended to the current list. The new Model
Definition is based on the Model Definition you have highlighted in the central
window when you press the New… button. The Model Definition Properties
window is then opened up, so you can edit this new Model Definition.
Editing the Model Definition
Click the Properties… button. The Model Definition Properties window
appears. Make any changes you want in the Model Definition Properties and
then press OK to return. Distance will warn you if the Model Definition is
associated with any analyses that have already been run.
Click Properties… if you just want to view the properties for this
Model Definition, rather than edit them, and then press Cancel in the Model
Definition Properties window to return without saving any changes.
You can give yourself more room by resizing the comments section.
Put your mouse just above the Comments section header and drag the
section up and down. You may want to increase the height of the whole Analysis
Details window (by dragging on its border) before you do this.
You can change the default tab for analyses that ran with warnings to
the Results tab. The option is under the Analysis tab of the Preferences
window. This option is useful if you are running analyses that regularly generate
warnings, but where you want to disregard the warning message.
The Log tab is split into two sections. The top contains the analysis log – a list of
the commands that Distance used to do your analysis, together with the messages
that Distance sent back while executing the commands. The bottom section gives
a summary of any warning and error messages. You can resize the bottom
section by clicking just above it and dragging.
One common reason for analyses returning an error status is that the
data selection criteria in the Data Filter have been entered incorrectly. To check
what data are being sent to an analysis, tick the “Echo data to log” option in the
Analysis tab of the Preferences dialog.
To copy the log file to the clipboard, click the Copy to Clipboard
button on the main toolbar, or choose the Analysis - Log menu option Copy
Log to Clipboard.
You can change the font size of the Log text in the General tab of the
Preferences dialog.
You can increase or decrease the font size of an individual results page
by right clicking and choosing the appropriate button. This is particularly useful
for the text-based cluster size regression plots, which don’t fit easily on a page.
You can choose Set current font as default to make the font size the default
for all results and log pages.
The statistics from any design class run include the minimum, maximum, mean,
and standard deviation of the coverage probability. For a survey plan the
statistics from the run include the number of points or lines, the maximum
possible area coverage, the realized area coverage, and the mean realized
proportion of survey area covered. Each statistic is the sum over all strata. If the
design is based on a line sampler then the statistic for the mean realized sampler
line length (mean of all strata) is also generated.
For survey layers containing multiple strata, you can allocate zero
effort to some of the strata. For those strata with zero effort, the design
properties will not be calculated during a design run, and no design will be
generated during a survey run. The sum of the effort over all strata should,
however, be greater than zero.
If you used Distance 3.5, then this tab plays a similar role to the
Modelling Types in Distance 3.5. The current setup is more flexible, however,
because you can define multiple surveys within a project.
You choose from data layer types, rather than layer names at this
stage because it is only when running the analysis that Distance combines your
data filter with a survey object to find out which data layers are to be used in that
run.
You then type the selection criteria in the space to the right of the layer type.
If you need more room while editing or viewing a long selection criterion,
click on the line you want to see more of and press SHIFT-F1 (i.e., the shift and
F1 keys) to open the Data Selection Zoom Dialog.
• FieldName is the name of the field that the criterion applies to. If
the field contains any spaces or punctuation then put it inside
square brackets – e.g., Cluster size becomes [Cluster size]
You can also manipulate the data using simple functions, such as:
• string functions LEFT, RIGHT, MID
• numerical functions INT, ROUND
For example:
LEFT (Observer,1) = ‘L’
INT(Distance)=0
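To see what these criteria do, here is a sketch of the equivalent row filtering in Python. The observation records and field values are hypothetical, invented purely to illustrate the two example criteria above:

```python
# Hypothetical observation records. The combined criteria
#   LEFT(Observer,1) = 'L'  AND  INT(Distance) = 0
# keep rows whose Observer field starts with "L" and whose
# Distance truncates to 0 (i.e., distances in [0, 1)).
observations = [
    {"Observer": "Len",  "Distance": 0.4},
    {"Observer": "Mary", "Distance": 0.7},
    {"Observer": "Liz",  "Distance": 1.2},
]

kept = [o for o in observations
        if o["Observer"][:1] == "L"      # LEFT(Observer,1) = 'L'
        and int(o["Distance"]) == 0]     # INT(Distance) = 0
```

Only the first record survives both criteria: "Mary" fails the LEFT test, and "Liz" is at a distance of 1.2, so INT(Distance) is 1, not 0.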
Intervals Tab
See Data Filter Properties Dialog of the Program reference for an overview of
the Data Filter Properties dialog.
On this page, you specify whether you want your distance data analyzed as
interval data (as opposed to exact measurements).
You would use this option under two circumstances:
• Your data were collected in intervals. In this case you would set the
intervals here in the default Data Filter and leave them the same for
all your analyses.
Do not use this option if you want to analyze the data as exact,
but want to specify intervals for the goodness-of-fit tests. You do this in the
Diagnostics page, under the Detection Function tab the Model Definition
Properties.
When you select interval data on this tab page, your truncation
options change on the Truncation tab page. By default, the data are truncated at
the upper and lower cutpoints you have selected. See the Truncation Tab page of
the Program Reference for more details.
Selecting the Automatic equal intervals option will give you less
flexibility in choosing truncation distances in the Truncation tab. So, even if you
have evenly spaced cutpoints, it is often better to use the Automatic equal
intervals option to speed entering the cutpoints (this way you only have to type
in the lowest and highest cutpoints), but to select the Manual option before going
on to the Truncation tab page.
Left truncation (i.e., discarding observations at less than a given distance) is less
common. In Distance you can choose a fixed distance for the left truncation.
The cluster size options are only relevant for CDS analyses, and
MCDS analyses where cluster size is not a covariate.
Data Filter Truncation Tab when data have been transformed into intervals in the
Intervals tab
By default, Distance will right truncate all observations that fall beyond your
upper interval cutpoint and left truncate all observations that fall below your
lower interval cutpoint. If you want to truncate further, you can right or left
truncate at any of the interval cutpoints by selecting from the drop-down lists.
For cluster sizes, Distance allows you to choose either the same truncation as
above, or to choose from one of your cutpoints.
Interval data - Automatic intervals
If you have specified automatic intervals in the Intervals tab page, then the right
and left truncation are set to the upper and lower cutpoints that you specified and
cannot be changed.
Units Tab
See Data Filter Properties Dialog of the Program reference for an overview of
the Data Filter Properties dialog.
Using the Units tab, you can convert between the units that the data are stored
in (as specified in the Data Explorer) and the units for reporting analysis results.
You can, if you want, report results in one unit of area (say) in one analysis and
a different set of units in another.
You specify layer types rather than layer names at this stage because
it is only when the analysis is run that the survey object is used to select the data
layers for the analysis.
Sample definition
Here, you specify which sample or sub-sample layer to use as the sample, for
determining encounter rate variation and for bootstrapping. For more
information see Chapter 8 of the Users Guide, the section on Sample Definition
in CDS Analysis.
Quantities to estimate and level of resolution
These options define which quantities you wish to estimate, and at what level. If
you have selected No stratification in the Stratum definition section then the
Stratum column will be greyed out. Also, if your observations are individual
objects, not clusters, then the Cluster Size row will be greyed out. Lastly, if
you are doing an MCDS analysis, and have cluster size as a covariate, the
options here will look different (see below).
If you wish to estimate density, you should first check the boxes at the levels for
which you want density estimates. In most cases you will not have enough
observations in each sample to estimate density by sample, but this is not always
true. The lowest level of density dictates the level of estimation for encounter
rate, and the lowest level for estimation of the detection function parameters and
cluster size. After you have selected the level to estimate density, you can then
select one level to estimate the detection function and cluster size (if applicable).
Notice that the restrictions on the level of resolution of estimates are removed
when you are not estimating density.
If you are estimating density by stratum and also globally, you need to tell
Distance how to combine the stratum estimates together to make the global
estimate. For geographic strata, use the default settings:
If your strata are not spatial strata, for example time periods or different sections
of the population, then you should consider using the other options here;
however we recommend entering non-spatial data as columns in the appropriate
data layer and then using the Post-stratification feature to do the analysis
(Chapter 8 of the Users Guide in the section on Stratification and Post-
stratification has more information and further details of the scenarios when each
of the following options are appropriate).
For reference, the four possible options are:
• Global density estimate is Mean of stratum estimates, weighted by
Stratum area.
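For that default option, the global estimate is an area-weighted mean of the stratum estimates. The following sketch uses invented stratum densities and areas, purely to illustrate the weighting:

```python
# Hypothetical stratum density estimates (animals per km^2) and
# corresponding stratum areas (km^2).
densities = [1.2, 0.4, 0.9]
areas = [100.0, 300.0, 100.0]

# Global density = mean of stratum densities, weighted by stratum area.
global_density = sum(d * a for d, a in zip(densities, areas)) / sum(areas)

# Total abundance is then global density times the total survey area.
abundance = global_density * sum(areas)
```

With these numbers the large, low-density middle stratum pulls the global density well below the simple average of the three stratum densities.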
For MCDS analyses, it is possible to fit the detection function at one level and
estimate at a lower level. For example you can fit a global detection function
model, but estimate average f(0) and probability of detection separately for each
stratum. For details of why this may be useful, see Estimating the detection
function at multiple levels in Chapter 9 of the Users Guide.
Estimating detection function at multiple levels. Note that the detection function boxes
are checked both at the global and stratum levels.
The only situation where we recommend you select more than one detection
function model is where you are using bootstrapping to incorporate model
selection uncertainty in your variance estimate (see Model Averaging in CDS
Analysis, in Chapter 8 of the Users Guide). In this case, use the + button to add
more detection function models, and select from the options under Selection
among multiple models using. You can choose either AIC, AICc or BIC.
You can also select starting values for the key function and adjustment terms. To
do this, check the box Manually select starting values, and enter the
number of parameters for each model.
Calculate the number of parameters by summing the number of key function
parameters, the adjustment terms and any covariate parameters. If the detection
function is fit by stratum or sample, sum the number of parameters in each
stratum to get the total. For more information about model parameterization, see
About CDS Detection Function Formulae in Chapter 8 of the Users Guide.
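As a worked example of this counting rule (the model choice and numbers here are hypothetical, not taken from the manual):

```python
# Hypothetical model: hazard-rate key (2 parameters) with 1 cosine
# adjustment term and no covariates (CDS), fitted separately by stratum.
key_params = 2
adjustment_params = 1
covariate_params = 0

per_stratum = key_params + adjustment_params + covariate_params

# Detection function fit by stratum: sum the parameters over all strata.
n_strata = 2
total_params = per_stratum * n_strata
```

Here each stratum contributes 3 parameters, so a two-stratum fit has 6 in total; a single global fit of the same model would have only 3.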
Scaling of distances
As explained in Chapter 9, this option is mainly of interest when using the
MCDS engine (although it can be used for the CDS engine too). For all of the
series expansion terms, the scaled distance is used in place of actual distance in
calculating the expansion term values mainly for numerical reasons. There are
two options: scale by w, the truncation distance, or by σ, the scale parameter of
the key function (this does not apply to the uniform key function, which has no
scale parameter). For the CDS engine one will generally want to scale by w, but
for the MCDS engine one may scale by either – see Chapter 9 - Multiple
Covariates Distance Sampling Analysis in the Users Guide.
This column gives layer types rather than layer names because at
this stage Distance doesn’t know which Survey you’re going to use with this
Model Definition, so it doesn’t know which layer names it can use. This way
you can pair the same Model Definition with many different surveys.
In the second column you select the field name of the covariate.
Tick the box in the third column if the covariate is a factor. Factor, or class
covariates have a finite number of distinct levels. The value of each level is not
significant – for example the factor levels could be text fields “Porpoise”,
“Whale”, “Seal”, or they could be numeric fields 1, 2, 3. If the box in this
column is not ticked, the covariate is assumed to be a non-factor covariate. In
this case, the field must be numeric, otherwise an error will occur when you try
to run the analysis engine. For more about this, see Factor and Non-factor
covariates in MCDS in the Users Guide.
Tick the box in the last column if the covariate is the cluster size field.
Distance needs to know whether any of the fields in your analysis are the cluster
size field because it treats this field a special way in the analysis. For more
information see Cluster size as a covariate in Chapter 9 of the Users Guide.
Some general advice about selecting covariates to include is given in the CDS
Analysis Guidelines section of Chapter 8 - Conventional Distance Sampling
Analysis in the Users Guide.
If you are not sure how many key function parameters there are in an
analysis, run it first without specifying starting values, and then look in the
Parameter Estimates page of the Analysis Details Results to see how many
parameters were used. This is particularly useful for MCDS analyses, as the
number of parameters contributed by factor covariates may vary among strata,
depending on which covariate levels occur in each stratum.
This option is only relevant if you are analyzing the data as exact
distances. If you choose to analyze your data as intervals (by selecting
Intervals in the Data Filter) then a single goodness-of-fit test is performed
using those intervals. Any options you set under Intervals on this tab page are
ignored.
A good way to get to know your data when you begin analyzing it is
to define a large number of intervals (say 15-20), fit any arbitrary model, and then
examine the output histogram for evidence of evasive movement, heaping,
outliers, etc. (you can ignore the model fit for now). See CDS Analysis
Guidelines in Chapter 8 of the Users Guide for more on this.
The Interval cutpoints options can make it easier for you to enter
manual intervals. For example, you can enter the left and right truncation points
in the first and last cutpoints rows, and then click on Automatic equal
intervals to have the intermediate cutpoints set. Then go back to Manual and
customize the cutpoints to your requirements.
This option is only relevant if you are analyzing the data as exact
distances.
Qq plots are a graphical technique for assessing the adequacy of the fit, and the
associated test statistics (Kolmogorov-Smirnov and Cramér-von Mises) test
goodness-of-fit for exact data. For more about these outputs, see CDS Qq-plots
and CDS Goodness of fit tests in Chapter 8 of the Users Guide.
Since these outputs can take a while to generate for large datasets, there is an
option here to turn them off. Qq plots have one plotted point per observation, so
for large datasets it is better to plot only a subset of points. By default, the
maximum number of points to plot is 500, but that can be changed here.
Entering 0 under Maximum num points in qq plots means that all points
are plotted, regardless of how many there are.
Plot file
If higher quality graphical output is required, Distance can save the histogram
data to a file that can then be imported into any graphics or statistics package.
Check the Create file of histogram data option and choose the file name using
the Browse button.
The output format of the file is described in the Users Guide page Saving CDS
results to file in Chapter 8.
You can copy and paste the high quality plots produced by Distance
straight into most word processor and spreadsheet packages. In addition, you
can easily paste the plot data into a spreadsheet and re-create the plot that way.
See the Analysis Details Results Tab help page.
If the multiplier represents cue rate in a cue count analysis, tick the
“Cue rate” box.
You can only add multipliers if you have already created the
appropriate fields in the Data Explorer.
If you used the Setup Project Wizard to define your multiplier fields,
then they will appear automatically in the Multiplier tab in Model Definition
Properties. For these fields, Distance also knows whether the operator is * or /
(i.e. whether to multiply or divide the density estimate).
The encounter rate variance can be calculated in three ways (see Buckland et al.
2001, section 3.6.2, and look in the book index under Poisson variance of n):
• Estimate variance empirically. This is usually the best option: the
variance is calculated from the variation in observations between
samples. However, this is unreliable when there are few samples.
• Assume the distribution of observations is Poisson.
• Assume the distribution is Poisson, with overdispersion factor b.
Setting b to 1 is equivalent to the previous option.
Bootstrap Variance Estimate
Check the box Select non-parametric bootstrap to tell Distance to
estimate the variance from bootstrap resamples of the data. When you run an
analysis with this option checked in the Model Definition, the bootstrap results
are given at the end of the Results tab on the Analysis Details. Distance reports
bootstrap confidence limits using two methods: firstly using the bootstrap
estimate of variance but assuming that the distribution of the density estimate is
log-normal; secondly using the bootstrap percentile method (i.e., taking the
appropriate quantiles of the actual bootstrap estimates). Distance also reports the
mean of the bootstrap point estimates.
Each bootstrap resample is made up by sampling with replacement an equal
number of units at the level you specify. For example, if you specify to resample
samples (see below), then each bootstrap resample is made up of the same
number of samples (line or point transects) as your original sample, chosen
randomly with replacement from the original sample. Note that for line
transects, this means that the survey effort (total line length) will differ in each
resample. Note also that each of your original samples has an equal probability
of appearing in the resample (an alternative, which we do not implement, would
be to have probability proportional to line length).
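This resampling scheme can be sketched as follows. The transect data are hypothetical, and this is only the resampling step, not the full bootstrap (Distance refits the model to each resample):

```python
import random

# Hypothetical samples: (line length in km, number of detections)
# for each of 4 line transects.
samples = [(3.0, 4), (2.5, 7), (4.0, 2), (3.5, 9)]

def bootstrap_resample(samples, rng):
    # Draw the same number of samples as the original, with replacement
    # and equal probability per sample. Total effort (summed line
    # length) therefore varies from resample to resample.
    return [rng.choice(samples) for _ in samples]

rng = random.Random(42)   # seeded for repeatability
resample = bootstrap_resample(samples, rng)
```

Each resample has the same number of transects as the original survey, but because lines are drawn with replacement, some transects appear more than once and others not at all.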
You can add a column for the bootstrap CV and confidence limits in
the Analysis Browser using the Analysis Browser Column Manager.
You specify layer types rather than layer names at this stage because
it is only when the analysis is run that the survey object is used to select the data
layers for the analysis. If you select a field for post-stratification that is not in
the layers used in the analysis, an error will result.
Sample definition
Here, you specify which sample or sub-sample layer to use as the sample, for
determining encounter rate variation - see Sample Definition in MRDS Analysis
in Chapter 10 of the Users Guide for details.
Quantities to estimate
These options determine which quantities to estimate, and with what data.
The first option determines whether the engine should Estimate density /
abundance or not. By default, this option is checked, but there are two
circumstances under which you might want to uncheck it:
• In exploratory analysis, you might want to focus on fitting
detection functions and leave the estimation of density until you’ve
selected the detection function to use. This also saves computer
time, since estimating density can be time consuming for larger
datasets.
• You may wish to use different subsets of the data for estimating
detection function and for estimating density given a detection –
see below.
The second set of options determines how to obtain the Detection function
parameters:
For the second option to work, you must have run the analysis
containing the target detection function after un-checking the option Remove
the new objects that are created with each run in Tools |
Preferences | Analysis | R Software.
When you choose the option to use the fitted detection function from
a previous analysis, you cannot choose any options under the Detection
function tab in the current Model Definition – since you are not fitting a
detection function in this analysis.
Variance - MRDS
See Model Definition Properties Dialog in the Program Reference for an
overview of the Model Definition Properties dialog.
In the Variance page you specify the methods of calculating the variance of the
density and abundance estimates. The options are:
• Innes et al. (2002) – Based on the empirical variance of estimated
density between samples (the default and preferred option).
• Buckland et al. (2001) – Based on the delta method, using the
empirical variance in encounter rate between samples
• Binomial variance of detection process – Only realistic if the entire
study area was sampled.
Analysis Components window, showing a list of the Model Definitions in the Ducknest
sample project
The last column in the table of analysis components tells you
whether that component is currently being used in any analyses: “Y” means it is
being used and “N” means that it is not. This is useful because when there are
many components (e.g., many Model Definitions if you have been doing a lot of
analyses), it is easy to lose track of which are being used and which are no
longer required. Also, if you double-click on a “Y”, you get a list of the analyses
that use that component.
Toolbar
• List Data Filters. When this button is selected, the Analysis
Components window shows a list of the all Data Filters in the
project.
• List Model Definitions. When this button is selected, the
Analysis Components window shows a list of all the Model
Definitions in the project.
• New Item. Create a new Data Filter or Model Definition,
based on the one currently selected
• Delete Item. Delete the selected Data Filter or Model
Definition
Other Windows
You can select more than one analysis at once in either table, by
holding the CTRL or SHIFT keys while you click, or by pressing CTRL-A or
CTRL-/ to select all the analyses.
To rearrange the ordering of the columns in the selected table, use the ↑ and ↓
buttons.
To reset the columns to their state when the Column Manager was opened, press
the Reset button.
To reset the columns to their default arrangement, press the Default button. You
can edit the default arrangement in the Preferences dialog (choose File |
Preferences… on the main menu).
To leave the Column Manager without saving the changes, press the Cancel
button.
To save the changes and exit the Column manager, press OK.
This chapter contains information about how Distance works from the inside. It
is intended for advanced users who want to push Distance to its limits.
The information here is preliminary, and will be expanded in future releases.
Distance components
This section will contain a description of the various components that go to make
up Distance. You can get a list of the component files that make up Distance by
selecting the Help | About Distance… from the main menu, and clicking on the
Program Files tab.
Note 2: For single-table databases, this is the folder name; for multiple-table databases, it
is the file name.
Note 3: For external files: for single-table databases, this is the file name without
extension; for multiple-table databases, it is the table name in the file.
The first time you open a DistData.mdb file in versions of Access after Access
97 (e.g., Access 2000, 2002, etc.), it asks you if you want to convert the file to
the new format, or open as is.
If you choose to open as is, you will get a message saying that you cannot
change the database structure. This means that you cannot add new fields or
new tables to the database, but you can edit records or add new records. In many
cases (for example, see Importing Existing GIS Data in Chapter 5 of the Users
Guide) this is fine. In general we recommend this as the easier option.
You can, in theory, use Distance to link to data in tabular text files, databases
and spreadsheets – although you cannot do this directly from the Distance GUI.
Instead, you do it by directly editing the Data File, DistData.mdb, using Access.
Briefly: you add an entry for the table you want to link in the DataTables table in
DistData.mdb, and then add entries for the fields you want to link to in
DataFields. An example is provided in the LinkingExample sample project.
The technology used by the Distance database engine (Jet 3.51) has
been replaced by Microsoft with newer technology, so it is unlikely they will issue
IISAM drivers to link to newer versions of the above software. Given the
overhead that would be required, it is also unlikely that we will be updating the
Distance database engine to use newer technology any time soon. Many newer
programs can, however, work with files in the older formats – for example,
newer versions of Excel can easily save files as Excel 97-2002 (Excel 8.0) and
work with them in that format.
If you run into problems linking files of a specific format, and have
tried everything you can think of, try looking at the settings in
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines or
\ISAM_Formats to see if they might be the cause of the problem.
The following table lists the few limitations to the size of text tables and objects.
Item Maximum size per text file
Fields per record 255
Field name 64 characters
Field width 32,766 characters
Record size 65,000 bytes
You can specify settings for more than one file in the same Schema.ini file.
Specifying the file format
The Format option in Schema.ini specifies the format of the text file. The Text
IISAM can read the format automatically from most character-delimited files.
You can use any single character as a delimiter in the file except the double
quotation mark ("). The Format setting in Schema.ini overrides the setting in the
Windows Registry on a file-by-file basis. The following table lists the valid
values for the Format option.
Format specifier Table format
TabDelimited Fields in the file are delimited by tabs
CSVDelimited Fields in the file are delimited by commas
Delimited(c) Fields in the file are delimited by the custom character c
FixedLength Fields in the file have fixed widths
For example, to specify a comma-delimited format, you would add the following
line to Schema.ini:
Format=CSVDelimited
You can also instruct Microsoft Jet to determine the data types of the fields. Use
the MaxScanRows option to indicate how many rows Microsoft Jet should scan
when determining the column types. If you set MaxScanRows to 0, Microsoft Jet
scans the entire file. The MaxScanRows setting in Schema.ini overrides the
setting in the Windows Registry on a file-by-file basis.
The following entry indicates that Microsoft Jet should use the data in the first
row of the table to determine field names and should examine the entire file to
determine the data types used:
ColNameHeader=True
MaxScanRows=0
The next entry designates fields in a table by using the column number (Coln)
option, which is optional for character-delimited files and required for fixed-
length files. The example shows the Schema.ini entries for two fields, a 10-
character CustomerNumber text field and a 30-character CustomerName text
field:
Col1=CustomerNumber Text Width 10
Col2=CustomerName Text Width 30
Note: If you omit an entry, the default value in the Windows Control Panel is
used.
Note that both of these format sections can be in the same .ini file.
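As a sketch of how sections combine (the file and field names here are hypothetical), a single Schema.ini covering both a comma-delimited file with a header row and a fixed-length file might look like this:

```
[Customers.txt]
Format=CSVDelimited
ColNameHeader=True
MaxScanRows=0

[Orders.txt]
Format=FixedLength
ColNameHeader=False
Col1=CustomerNumber Text Width 10
Col2=CustomerName Text Width 30
```

Each bracketed section name is the name of the text file the settings apply to.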
Another example of a Schema.ini file is in the Data Folder of the
LinkingExample project.
Valid Names
Valid Field Names
Field names must meet the following criteria to be valid:
• For internal fields, the name must be 64 characters or fewer
• For shapefile fields, the name must be 10 characters or fewer, with no
spaces
• The only permitted characters are letters (A-Z or a-z), numbers (0-9),
spaces or underscores.
• Field names must be unique within a data layer (i.e., the same name is not
allowed in 2 tables, except for the ID and LinkID fields)
• The name must not appear on the list of reserved field names below (not
case sensitive).
Since the CDS and MCDS analysis engines are both implemented in
MCDS.exe, we refer to both as “the MCDS engine” in what follows.
Some history
In historic versions of Distance (1.0 - 3.0), the program was driven by a simple
command language, which defined the survey design, data, and analysis
methods. Distance could be run in batch mode by passing in the filenames of
input and output files via the DOS command line. It could also be run
interactively, entering the commands at a prompt.
Distance 3.5 and later added a graphical user interface for defining the inputs. The
program that does the actual work of analysis was called an “analysis engine”,
and was called D35Engine.exe in Distance 3.5, D4.exe in Distance 4, and now
MCDS.exe. This program is run from the Distance graphical interface in batch
mode. The exact way that Distance communicates with the MCDS analysis
User's Guide Distance 6.0 Beta 5 Appendix - MCDS Engine Reference • 279
engine is outlined in another Appendix – see Introduction to Inside Distance
Appendix.
The data format and command language used to run MCDS.exe are therefore
very similar to those used to run the old versions of Distance (the major
differences are outlined in a subsection, below). The last complete
documentation for the command language is the Distance 2.2 users manual,
which is available for download from the support page of the Program Distance
web site. Many new features have been added since Distance 2.2 (for example
multiple covariates and flat data file input), but some features are also no longer
supported. These include: interactive mode (batch mode only is now supported)
and hierarchical data input (flat files only). For a full list, see the section
Changes in MCDS Engine Since Distance 2.2.
where
Parameter1 is either 0 or 1: 0 is for run mode (i.e., run the analysis); 1 is for
import mode, which is used to implement part of the Project Import feature in
Distance and is not described further here.
Parameter2 is the filename of the input command file – see MCDS Command
Language, below for details of the contents of this file.
The program returns a number to the command line, indicating the status of the
run, as well as up to 6 files of output – see Output From the MCDS Engine.
Example 1:
Assume that we have a command window open in the Distance program
directory (usually C:\Program Files\Distance5), and that we have a file
TestInput.txt in that directory. Then we type:
MCDS 0, TestInput.txt
Example 2:
Assume we have a command window open in some arbitrary directory (e.g.,
C:\). Assume that we have an input command file C:\Temp\Input File.txt
that we want to run. Assume that the MCDS.exe program is in the Distance
program directory C:\Program Files\Distance5. Because both the input file
and the Distance program directory have spaces in them, we need to enclose the
program and file names in quotes:
"C:\Program Files\Distance5\MCDS" 0, "C:\Temp\Input File.txt"
The space between MCDS and parameter 1, and the space and
comma between parameter 1 and parameter 2 are critical. For example, in
Example 1, above,
MCDS 0,TestInput.txt
will not work (it will return the value 4 - file error) because there is no space
between parameter 1 and parameter 2 (see MCDS engine command line output
for more about the numbers returned to the command line).
You can copy the file MCDS.exe to another folder and run it from
there if you want to (e.g., C:\temp). You could also add the Distance program
folder to your Windows path (in Windows XP it’s under Control Panel | System |
Advanced | Environment variables) so you don’t then need to give the full path
when calling it from the command line.
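Since the engine reports its status via the value returned to the command line, a small Windows batch sketch (the paths and file name here follow Example 2 and are hypothetical) can check whether a run succeeded before processing the output files:

```
rem Run the MCDS engine on a prepared command file
"C:\Program Files\Distance5\MCDS" 0, "C:\Temp\Input File.txt"
rem ERRORLEVEL holds the status code (e.g., 4 indicates a file error)
echo MCDS returned %ERRORLEVEL%
```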
C:\Temp\dst111.tmp
C:\Temp\dst110.tmp
C:\Temp\dst112.tmp
C:\Temp\dst113.tmp
None
None
Options;
Type=Line;
Length /Measure='Mile';
Distance=Perp /Measure='Foot';
Area /Units='Square mile';
Object=Single;
SF=1.0;
Selection=Sequential;
Lookahead=1;
Maxterms=5;
Confidence=95;
Print=Selection;
End;
Data /Structure=Flat;
Fields=STR_LABEL, STR_AREA, SMP_LABEL, SMP_EFFORT, DISTANCE;
Infile=C:\Temp\dst10D.tmp /NoEcho;
End;
Estimate;
Distance /Intervals=0,1,2,3,4,5,6,7,8 /Width=8 /Left=0;
Density=All;
Encounter=All;
Detection=All;
Size=All;
Estimator /Key=HN /Adjust=CO /Criterion=AIC;
Monotone=Strict;
Pick=AIC;
GOF;
Cluster /Bias=GXLOG;
VarN=Empirical;
End;
Example command file
Header Section
This section is required in all command files. Here, you specify the names of the
output files that Distance will generate. If the files do not exist, they will be
created. If they exist, they will be overwritten. The section is 6 lines long, and
each line corresponds with one of the following files:
• Output file
• Log file
• Stats file
• Plot file
• Bootstrap file
• Bootstrap progress file
If you do not include a path for the files (e.g., just dst6FA1.tmp in the
above, for the first file), it is created and written into the current working
directory (the directory you called the program from).
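Putting this together, a header section might look like the following sketch (the file names and extensions are hypothetical; “None” suppresses the bootstrap and bootstrap progress files, as in the example command file elsewhere in this appendix):

```
C:\Temp\MyRun.out
C:\Temp\MyRun.log
C:\Temp\MyRun.sta
C:\Temp\MyRun.plt
None
None
```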
In previous versions of Distance, the CDS and MCDS engine
required 5 header lines, and not six (because there was no bootstrap progress
file). Also, the bootstrap file came before the plot file. So, if you have any code
for calling previous versions, you’ll need to update it to call the new version.
Options Section
Various options can be set to control program operation. Once an option value
has been set, it retains its value until you change it or exit the program. The data
options define the characteristics of the data collected and how they are to be
entered. The model fitting options define values to be used in fitting a
probability density function to the distance data, some of which can be
overridden in the estimation procedure. Print options control the amount and
format of program output and bootstrap options control the number of bootstrap
samples and the random number seed used to generate a bootstrap sequence.
This section should always begin with the command OPTIONS and end with the
END command.
Below are the valid commands in the options section by category. Each option
and its possible values are individually described in the following sections in
alphabetical order.
Miscellaneous
DEFAULT command Options reset to default
END command Ends options section
LIST command List option values
Output
PRINT command Controls amount of output
QQPOINTS command Max number of points in qq plot
TITLE command Value of output title
Data Options
AREA command Set area quantities
CUERATE command Set cue rate
DISTANCE command Set distance quantities
LENGTH command Set length quantities
OBJECT command SINGLE or CLUSTER
SF command Sampling fraction
TYPE command POINT, LINE or CUE
Model Fitting
LOOKAHEAD command Max for sequential fit
MAXTERMS command Max # model parameters
PVALUE command Significance level (α-level)
SELECTION command Term selection mode
Bootstrap Options
BOOTSTRAPS command # of bootstrap samples
SEED command Random number seed
AREA Command
Syntax:
AREA /CONVERT=value /UNITS='label' ;
Description:
This command defines the area unit for expressing density (D). The switches
are:
/UNITS='label' - a label for the unit of area of the density estimate. The single
quotes are only required to retain lowercase. Only the first 15 characters are
used.
/CONVERT=value - value specifies a conversion factor which is used to
convert the estimated density to new units for area. It is needed for atypical
units.
If the MCDS engine recognizes the measurement unit for DISTANCE (and
LENGTH for line transects) and if it recognizes the Area UNITS label, it will
calculate the appropriate conversion factor. However, if one or more of the
UNITS is not recognized, you will need to specify the conversion value with the
CONVERT switch. The Area units recognized by the program are those listed
under the DISTANCE command, plus HECTARES (HEC) and ACRES (ACR).
For example, the unit can be entered as Squared Meters or Metres Squared
because the MCDS engine recognizes the unit based on the character string
MET. See the DISTANCE command below for a definition of recognized
units.
Default: AREA /UNITS=HECTARES;
Examples:
Distances are measured in feet but analyzed in meters, length is measured in
miles and density is estimated as numbers per square kilometer. The MCDS
engine will do necessary unit conversions because all unit labels are recognized.
DISTANCE /MEASURE=FEET /UNITS='Meters';
LENGTH/UNITS='Miles';
AREA /UNITS='Sq. kilometers';
BOOTSTRAPS Command
Syntax:
BOOTSTRAPS=value ;
Description:
“Value” is the number of bootstrap samples which should be generated. For a
reasonable variance estimate, this number should be at least 100. We
recommend setting BOOTSTRAPS=999 or 1000 to construct a bootstrap
confidence interval.
Default: BOOTSTRAPS=1000;
CUERATE Command
Syntax:
CUERATE = value1 /SE=value2 /DF=value3;
Description:
For cue counting, “value1” is the average rate at which animals issue visual or
auditory detection cues. The rate should be given in the same units of time as
the values given for sampling effort in the data. For example, if effort is
measured in hours then the cue rate should be number of cues per hour. The cue
rate must be a positive number (>0). Optionally a standard error for the cue rate
can be given with “value2”, and the degrees of freedom can be given with
“value3” (a DF of 0.0 is interpreted as infinite degrees of freedom). The
standard error and df are accounted for in the estimated standard error of the
density and abundance estimates. This option is only used if TYPE=CUE is
specified.
Default: CUERATE=1 /SE=0 /DF=0;
Example:
An estimate of the cue rate is 12 per hour with a standard error of 2 per hour and
93 degrees of freedom. The sample effort for this cue counting example would be
specified in hours sampled.
CUERATE=12 /SE=2 /DF=93;
DEFAULT Command
Syntax:
DEFAULT ;
Description:
This command resets all of the options to their default values. Remember that an
option remains in effect until it is changed or the MCDS engine is terminated.
The default values for each option are given with the individual command
descriptions below.
DISTANCE Command
Syntax:
DISTANCE=PERP|RADIAL /MEASURE='label' /UNITS='label'
/CONVERT=value /NCLASS=nclass /WIDTH=width /LEFT=left
/RTRUNCATE=t /INTERVALS=c0,c1,...,cu /EXACT ;
Synonyms: RIGHT=WIDTH
Description:
This command describes numerous features of the distance data and defines
the default values for estimation. The format of the data entry within the Data
section is determined by the values set with this command, whereas the
DISTANCE command in the Estimate section only determines how the distance
data are analyzed.
For line transect data (TYPE=LINE), this command defines whether the data
will be entered as either perpendicular distances or as radial distance and angle
measurements.
• PERP - perpendicular distance was measured for a line transect
• RADIAL - radial distance and angle were measured in line
transects
For TYPE=POINT (which includes trapping webs) or CUE,
DISTANCE=RADIAL is assumed and only radial distances are expected.
Distances can be entered as ungrouped or grouped. Ungrouped implies an exact
distance is entered for each observation in the data. Grouped means a set of
distance intervals is given and the frequency of observations in each interval is
entered. Ungrouped distances are indicated by the switch /EXACT and grouped
data is indicated by the /INTERVALS switch which also specifies the distance
intervals (c0-c1,c1-c2,c2-c3,...). The value c0 specifies the left-most distance
and cu the right-most distance for grouped data. Typically, c0=0 and cu=w.
Intervals can also be specified by using the /NCLASS and /WIDTH and
optionally the /LEFT switch. These switches will create 'nclass' equal width
distance intervals between the values of 'left' and 'width' (i.e., each interval is of
length (width-left)/nclass). For ungrouped data, it is also possible to specify left
and right truncation with the /LEFT and /WIDTH switches. Any values outside
of these bounds are excluded from the analysis. Right truncation as a percentage
of the observations can also be specified for both grouped and ungrouped data
with /RTRUNCATE switch. The value of t must be between 0 and 1. In the
analysis, no more than t*100% of the data is truncated from the right. For
ungrouped data, the width is set at the distance which represents the (1-t)*100%
quantile. For grouped data, intervals are truncated from the right as long as no
more than t*100% of the data is truncated. If t=0 and the data are ungrouped
data, the width is set to the largest distance measurement and if the data are
grouped, the width is set to the endpoint for the right-most interval with a
non-zero frequency. For ungrouped data, if both the /WIDTH and
/RTRUNCATE switches are specified, the RTRUNCATE value specifies the value
of width.
The DISTANCE command is also used to define the measurement unit for
distances:
/MEASURE = 'label' - a label for the units in which distance was measured.
Single quotes are only required to retain lowercase. Only the first 15 characters
are used.
/UNITS='label' - a label for the units for distance after conversion, if any.
Single quotes are only required to retain lowercase. Only the first 15 characters
are used.
/CONVERT=value - value specifies a conversion factor which is used to
convert the input distances for atypical units.
MEASURE and UNITS switches are used to convert from the unit in which the
data are recorded and entered (MEASURE) to the unit for analysis (UNITS). It
is not necessary to convert distances to different units for analysis as long as it is
a unit that is recognized by the MCDS engine (see list below). It is only
provided as a convenience and it is probably easier to leave measurements in
their original units. If you do convert units, take note that values such as f(0),
h(0), effective strip width (ESW) and effective detection radius (EDR) are
expressed in the converted units. Thus, the point estimate and standard errors
will change by the conversion factor from the measured to analysis units. If you
are not converting distance units, you can specify the units with either switch
(/MEASURE or /UNITS). The most common measurement units are recognized
by the MCDS engine and there is no need to enter a conversion value
(/CONVERT= value).
The following are the recognized measurement unit labels:
• CENTIMETERS
• METERS
• KILOMETERS
• MILES
• INCHES
• FEET
• YARDS
• NAUTICAL MILES
Each label is recognized by its first 3 characters which allows variations in
spelling. For example, if you enter METRES, it will use METRES as the label
and will recognize it based on MET. Values are given in uppercase but can be
entered in upper or lowercase. If the MCDS engine recognizes the /UNITS and
/MEASURE labels and you specify the /CONVERT= switch it will display a
warning message that you are overriding the conversion value. Values for
/WIDTH, /LEFT, and /INTERVALS should be given in original measurement
units and not in converted units.
Default:
DIST=PERP /UNITS='Meters' /MEASURE='Meters' /EXACT /LEFT=0
/RTRUNCATE=0;
Examples:
Perpendicular distance measured in intervals of 2 feet to a distance of 10 feet and
converted to metres (meters) for analysis. The grouped data are entered as the
frequency of observations in each of the 5 distance intervals (see the Data
section). Notice that WIDTH is specified in the original measurement units of
feet and not in meters.
DIST=PERP /MEASURE='Feet' /UNITS='Metres' /WIDTH=10
/NCLASS=5 ;
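By contrast, a sketch for ungrouped (exact) perpendicular distances in meters, right-truncating the largest 5% of observations (the truncation proportion here is illustrative):

```
DIST=PERP /MEASURE='Meters' /EXACT /RTRUNCATE=0.05;
```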
LENGTH Command
Syntax:
LENGTH /CONVERT=value /UNITS='label' /MEASURE='label' ;
Description:
This command sets the measurement unit for line length and any desired
conversion to different units for analysis. It is not necessary to convert line
length, but may be desirable depending on the original units.
/MEASURE='label' - a label for the units in which line length was measured.
Single quotes are only required to retain lowercase. Only the first 15 characters
are used.
/UNITS='label' - a label for the units for length after conversion, if any. Single
quotes are only required to retain lowercase. Only the first 15 characters are
used.
/CONVERT=value - value specifies a conversion factor which is used to
convert length measured in atypical units.
See further explanation under the DISTANCE command for the /MEASURE,
/UNITS and /CONVERT switches. The LENGTH command is used for line
transects only.
Default:
LENGTH /UNITS=KILOMETERS /MEASURE=KILOMETERS;
Example:
Length is entered in miles but converted to kilometers for display and analysis.
LENGTH /UNITS='Kilometers' /MEASURE='Miles' ;
LIST Command
Syntax:
LIST;
Description:
Lists current values of the program options and the program limits to the screen.
LOOKAHEAD Command
Syntax:
LOOKAHEAD=value ;
Description:
For term selection modes SEQUENTIAL and FORWARD (see SELECTION
command), “value” specifies the number of adjustment terms which should be
added to improve the fit, before the added terms are considered to be non-
significant. For example, if LOOKAHEAD=2 and a model with 2 adjustment
terms does not significantly improve the fit over a model with 1 term, a model
with 3 adjustment terms is fitted. If the 3-term model is an improvement over a
1-term model, the algorithm will continue with the 3-term model as the new base
model. If it is not an improvement, the 1-term model would be chosen. If
LOOKAHEAD=1 (the default), in the above example, the 3-term model would
not have been examined because upon finding the 2-term model was not an
improvement, the 1-term model would have been used.
Default: LOOKAHEAD=1;
MAXTERMS Command
Syntax:
MAXTERMS=value ;
Description:
“Value” is the maximum number of model parameters. The maximum number
of adjustment terms (defined as m) that may be added is MAXTERMS minus the
number of parameters in the chosen key function (defined as k). MAXTERMS
must be less than or equal to 5. This option is only useful for limiting the number
of model combinations with the term selection mode that considers all possible
combinations of adjustment terms (SELECTION=ALL). Use the NAP switch
on the Estimator command to specify an exact number of adjustment terms to be
used. The maximum number of adjustment terms is also limited by the number
of observations for ungrouped data or number of distance intervals for grouped
data.
Default: MAXTERMS=5;
OBJECT Command
Syntax:
OBJECT=SINGLE|CLUSTER ;
Description:
This option defines whether objects are detected individually (SINGLE) or as
clusters (CLUSTER).
SINGLE - Object always detected as a single animal or other entity (e.g., duck
nest)
CLUSTER - Object detected as a cluster (e.g., herd, flock, pod of whales)
Default: OBJECT=SINGLE;
PRINT Command
Syntax:
PRINT=SELECTION|RESULTS|ALL|SUMMARY ;
Description:
This option sets the default level of printing in the output. The various settings
are hierarchical and more control over the amount of results can be obtained with
the PRINT command in the Estimate section.
ALL - print fitting iterations, model selection results and estimation results
SELECTION - print model selection results and estimation results
RESULTS - print estimation results only
SUMMARY (NONE) - only summary tables are printed
Note: if you choose RESULTS or SUMMARY, warnings are not given about the
algorithm having difficulties fitting a particular model or constraining the fit to
achieve monotonicity.
Default: PRINT=SELECTION;
PVALUE Command
Syntax:
PVALUE=α;
Description:
α is the significance level of likelihood ratio tests to determine the significance of
adding adjustment terms, and is the default value for the significance test for size-
bias regression of cluster sizes.
Default: PVALUE=0.15;
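For example, to use a stricter 5% level for the likelihood ratio tests (the particular level here is illustrative):

```
PVALUE=0.05;
```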
QQPOINTS Command
Syntax:
QQPOINTS=value;
Description:
Maximum number of points to print in qq-plots. When there are a large number
of data points, plotting all the points can take quite a while and result in a very
large plot file. The default, 0, means no maximum – i.e., plot every point.
Default: QQPOINTS=0;
SEED Command
Syntax:
SEED=value ;
Description:
SEED specifies the random number seed for generating a sequence of random
numbers for bootstrap samples. “Value” should be a large odd number
preferably greater than 2,000,000. If you use the same seed, the same sequence
of random numbers will be generated. You can use SEED=0; (the default)
which will use a value from the computer's clock to generate a seed.
Default: SEED=0;
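For example, to make repeated bootstrap runs reproducible, you could fix the seed at a large odd value (the particular number here is arbitrary):

```
SEED=2000001;
```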
SELECTION Command
Syntax:
SELECTION=SEQUENTIAL|FORWARD|ALL|SPECIFY ;
Description:
This command specifies the default mode for adjustment term selection in the
ESTIMATE procedure for fitting the detection function. The /SELECT switch
of the ESTIMATOR command overrides the default value. See the Estimate
section for a description of adjustment term and model selection.
SEQUENTIAL - add adjustments sequentially (e.g., for simple polynomial, in
increasing order of the exponent).
FORWARD - equivalent to forward selection in regression; select the
adjustment term which produces the largest increase in the maximum of the
likelihood function
ALL - fit all combinations of adjustment terms with the key function and use
the model with smallest Akaike Information Criterion (AIC) value.
SPECIFY - user-specified number of adjustment terms and possibly order of the
adjustments.
Default: SELECTION=SEQUENTIAL;
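For example, to fit all combinations of adjustment terms and select by AIC, while capping the total number of model parameters via the MAXTERMS command described above (the cap of 4 here is illustrative):

```
SELECTION=ALL;
MAXTERMS=4;
```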
SF Command
Syntax:
SF=c ;
Description:
SF defines the value of the sampling fraction, which is typically 1. However, if
only one side of a transect line is observed, c=0.5; or if some fraction of the circle
surrounding a point transect is searched, c is the fraction searched (e.g., c=0.5 if a
semi-circle is observed). For cue counting, c is the proportion of a full circle that
is covered by the observation sector. For a sector of 90° (45° either side of the
line) with cue counting, c = 0.25.
Note that SF can now be specified using the MULTIPLIER command, with
SE=0, and this is the way that Distance does it.
Default: SF=1;
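For example, if observers search only one side of each transect line:

```
SF=0.5;
```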
TITLE Command
Syntax:
TITLE='yourtitle' ;
Description:
This command sets a value for the title which is printed at the top of each page.
Yourtitle should contain no more than 50 characters. Excess characters are not
used. There is only 1 title line. Re-specifying the title will replace the previous
value.
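For example (the title text here is hypothetical):

```
TITLE='Ducknest survey 1995';
```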
TYPE Command
Syntax:
TYPE=POINT|LINE|CUE ;
Description:
This option defines the type of sampling, which determines what types of data
can be entered and how data are analyzed.
POINT - point transect data
LINE - line transect data
CUE - cue counting data
Trapping webs should be treated as point transects.
Default: TYPE=LINE;
Data section
In this section, you specify the file containing data, and which column in this file
corresponds with which field. For more about the format of the data file, see
the section describing the MCDS Engine Required Data Format.
The data section should always begin with the statement DATA
/STRUCTURE=FLAT and end with the statement END. Note that historical
versions of this engine used a hierarchical data format, but that is no longer
supported, so the /STRUCTURE=FLAT switch is now mandatory.
The commands that are valid in the Data section are listed in alphabetical order
below, and described in the following sections.
Data section commands
END command Ends data section
FACTOR command Specifies that a field is a factor covariate
FIELDS command List of fields in the data file
INFILE command Gives filename of data file
SIZEC command Specifies that a field is the cluster size
covariate
FACTOR command
Syntax:
FACTOR /NAME='fieldname' /LEVELS=value /LABELS= 'label1',
'label2', … ;
Description:
This command defines a field in the data file as a factor covariate in MCDS
analyses. For more about factor covariates, see the section on factor and non-
factor covariates in MCDS in Chapter 9 of the Users Guide. Covariates for the
detection function are specified in the Estimator command.
There should be one FACTOR command for each factor field in the data file. If
there are no factor fields, this command will not be present.
/NAME='fieldname' – the name of the field (must be one of the names in the
FIELDS command)
/LEVELS=value – the number of levels in the factor covariate
/LABELS= 'label1', 'label2', … - a comma-delimited list giving the value of
each level of the factor
Default: no FACTOR command
Examples:
The data file contains a column for observer, which is specified in the FIELDS
command as “Observer”. Observer is to be used as a factor covariate in the
detection function, and each observation can take one of three possible values
“Peter”, “Paul” and “Mary”.
FACTOR /NAME=Observer /LEVELS=3 /LABELS=Peter, Paul, Mary;
FIELDS command
Syntax:
FIELDS= fieldname1, fieldname2, fieldname3, …
Description
This command gives a list of the fields occurring in the data file, reading the
columns of the data file from left to right. The following fieldnames are required:
• SMP_LABEL – sample label
• SMP_EFFORT – sample effort (line length/number of points)
• DISTANCE – perpendicular or radial distances (depending on the
TYPE and DISTANCE commands)
If TYPE=LINE and DISTANCE=RADIAL then an additional required
field is
• ANGLE – angle of radial distances, in degrees
If OBJECT = CLUSTER then another required field is
• SIZE – the cluster size
Two additional fields with fixed names may be specified:
• STR_LABEL – stratum label
• STR_AREA – stratum area – if areas are omitted then density but
not abundance is calculated
If covariates are specified in the ESTIMATOR command then these should be
included in the data file, and their names listed in the FIELDS command. In
addition to being listed in the FIELDS command, factor covariates should be
declared as such using a FACTOR command.
Default: No default
Examples:
Standard line transect data, with a column for stratum label, area, transect label,
line length and perpendicular distance:
Fields=STR_LABEL, STR_AREA, SMP_LABEL, SMP_EFFORT,
DISTANCE
Line transect data, with radial distance and angle, objects as clusters, and an
additional field for an Observer covariate:
Fields=STR_LABEL, STR_AREA, SMP_LABEL, SMP_EFFORT,
DISTANCE, ANGLE, SIZE, Observer
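To make the column-order convention concrete, here is a minimal sketch (in Python, not the engine's own parser) of reading a flat data file whose columns follow the FIELDS list; the tab delimiter and the sample values are assumptions for illustration only:

```python
import csv
from io import StringIO

# Columns of the flat file, in FIELDS order (first example above)
FIELDS = ["STR_LABEL", "STR_AREA", "SMP_LABEL", "SMP_EFFORT", "DISTANCE"]

# A tiny, made-up data file; tab-delimited here purely for illustration
data = StringIO("A\t40\t1\t5.0\t12.3\nA\t40\t2\t4.5\t3.1\n")

# Each row becomes a dict keyed by field name, reading columns left to right
rows = [dict(zip(FIELDS, line)) for line in csv.reader(data, delimiter="\t")]
print(rows[0]["DISTANCE"])  # -> 12.3 (still a string; convert as needed)
```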
INFILE Command
Syntax:
INFILE = filename [/ECHO | /NOECHO];
Description:
This command specifies the data file name. Filename should either give the full
absolute path to the file, or just the filename if the file is in the current directory.
The ECHO and NOECHO switches control whether the data are ECHOed to the
LOG file. Once you are certain that the data are free of errors, using /NOECHO
will reduce the amount of output to the LOG file.
Example:
Infile=C:\temp\dst7035.tmp /NoEcho;
SIZEC command
Syntax:
SIZEC=fieldname;
Description
Specifies that the field “fieldname” is the cluster size field, when cluster size is a
covariate in the detection function.
Estimate section
The following are valid commands in the estimate section:
Estimate section commands
BOOTSTRAP command – bootstrap variance/confidence intervals
CLUSTER command – estimation of expected cluster size
DENSITY command – resolution of density estimation
DETECTION command – resolution of detection probability estimation
DISTANCE command (Estimate section) – analysis treatment of distances
ENCOUNTER command – resolution of encounter rate estimation
END command – initiates estimation
ESTIMATOR command – model for g(x)
G0 command – estimate of g(0) and its standard error
GOF command – intervals for goodness of fit test/display
MONOTONE command – monotonicity constraints on g(x)
MULTIPLIER command – multipliers in the detection function
PICK command – method of model choice
PRINT command (Estimate section) – detailed control of output
SIZE command – resolution of expected cluster size estimation
VARN command – variance estimation of n
The commands are described below in alphabetical order. You will use these
commands to define:
• which quantities you want to estimate and at what level of
resolution (DENSITY, DETECTION, ENCOUNTER, SIZE),
• how distance and cluster size are treated in the analysis and which
models are used for estimation (DISTANCE, CLUSTER,
ESTIMATOR, MONOTONE, PICK, GOF, G0),
• how variances are estimated (VARN, VARF, BOOTSTRAP), and
• how much output should be generated (PRINT).
Density and abundance estimates are composed of the following components:
• detection probability,
• encounter rate,
• expected cluster size (if the detected objects are clusters).
It is possible to restrict estimation to one or more of these components without
estimating density; however, all components must be estimated to obtain an
estimate of density. You will use the commands DENSITY, DETECTION,
ENCOUNTER, and SIZE to define which components will be estimated. If you
do not use any of these commands, each component and density is estimated, by
default. Likewise, if you use the DENSITY command, density and all of its
components are estimated. If you use any or all of the DETECTION,
ENCOUNTER, and SIZE commands and not the DENSITY command, only the
specified components are estimated. For example,
ESTIMATE;
ENCOUNTER ALL;
END;
will only estimate encounter rate.
Estimates of density and its components can be made at different levels of the
sampling hierarchy (Sample < Stratum < All). The DENSITY, DETECTION,
ENCOUNTER, and SIZE commands are used to specify the level at which each
quantity is estimated. Different levels can be used for the various quantities,
although some combinations are incompatible; an error message is given if the
levels are incompatible. The lowest level of resolution specified for DENSITY
is the default level for each of its components, if they are unspecified. For
example,
ESTIMATE;
ESTIMATOR /KEY=UNIFORM;
DENSITY BY STRATUM;
END;
will estimate density and each of its components for each stratum defined in the
data. The lowest level for density must coincide with a level assigned to
encounter rate. The level of any component cannot be lower than the lowest
level specified for density. For example, the following is not valid:
ESTIMATE;
ESTIMATOR /KEY=UNIFORM;
DENSITY BY STRATUM;
DETECTION BY SAMPLE;
END;
If a size-bias regression estimate of expected cluster size is computed, the level
for SIZE must be no greater than the level for DETECTION. This feature is
most useful for estimating density by stratum when too few observations exist in
each stratum to estimate f(0) (or h(0)). A solution is to assume f(0) is the same
for all strata, which is illustrated in the following example:
ESTIMATE;
ESTIMATOR /KEY=UNIFORM;
DENSITY BY STRATUM;
DETECTION ALL;
END;
All of the observations are pooled to estimate a common value for f(0), which is
used in each stratum density estimate.
When there are covariates in the detection function, it is possible to fit the
detection function at one level, and then estimate probability of detection at a
lower level. For more on this, see the section Estimating the detection function
at multiple levels in Chapter 9 of the Users Guide. An example, with a global detection
function fit with a habitat covariate which is specific to each stratum, and then
the detection function estimated by stratum is:
ESTIMATE;
ESTIMATOR /KEY=UNIFORM /COVARIATES=Habitat;
DENSITY BY STRATUM;
DETECTION ALL;
DETECTION BY STRATUM;
END;
Possibly the most confusing aspect of estimation with the MCDS engine will be
the specification of models for detection probability and model selection. A
model is specified with the ESTIMATOR command which defines a type of key
function and adjustment function. The adjustment function is actually a series of
terms which are added to the key function to adjust the fitted function to the
data. Model selection includes 1) selecting how many and which adjustment
terms are included in the model (term selection) and 2) selecting a “best” model
(estimator) from the specified set of competing models.
The default method of selecting terms (term selection mode) is defined by the
SELECTION command in the Options section. Its value can be overridden with
the /SELECT switch of the ESTIMATOR command. Related options include
LOOKAHEAD and MAXTERMS. There are 4 types of term selection modes
described below: 1) SEQUENTIAL, 2) FORWARD, 3) ALL, and 4) SPECIFY.
The maximum number of adjustment terms that can be included in the model is
limited by the value of (MAXTERMS - number of parameters in the key
function) and less frequently by either the number of observations, for
ungrouped data, or the number of distance intervals for grouped distance data.
The MCDS engine will issue a warning message if the number of parameters is
being limited by the amount of data.
Term selection mode SPECIFY implies the user will specify which adjustment
terms are included in the model. Typically, this is used to specify that a key
function without adjustment terms is to be fitted to the data, as in the following
example:
ESTIMATE;
ESTIMATOR /KEY=HNOR /NAP=0 /SELECT=SPECIFY;
END;
It is not necessary to include the /SELECT switch, but it will prevent the MCDS
engine from issuing a warning message that you are specifying the model. It is
also possible to specify any combination of terms and give starting values for
their coefficients. The MCDS engine does not select the terms to include in the
model but does estimate the parameters to fit the model to the data. For
example,
ESTIMATE;
ESTIMATOR /KEY=UNIFORM /NAP=2
/SELECT=SPECIFY /ORDER=1,3;
END;
specifies the model as the following 2-term cosine series, for which the
parameters a1 and a2 are estimated:
f(x) = (1/w) * (1 + a1*cos(πx/w) + a2*cos(3πx/w))
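As a quick numerical check of the series above (a sketch, not engine code): each cosine adjustment term integrates to zero over [0, w], so the function integrates to 1 whatever values a1 and a2 take:

```python
import math

def f_cosine(x, w, a1, a2):
    """Uniform key with cosine adjustments of order 1 and 3
    (the SPECIFY example): (1/w)(1 + a1 cos(pi x/w) + a2 cos(3 pi x/w))."""
    return (1 + a1 * math.cos(math.pi * x / w)
              + a2 * math.cos(3 * math.pi * x / w)) / w

# Midpoint-rule check that the density integrates to 1 on [0, w]
w, a1, a2 = 100.0, 0.3, 0.05
n = 10000
area = sum(f_cosine((i + 0.5) * w / n, w, a1, a2) * w / n for i in range(n))
print(round(area, 6))  # -> 1.0
```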
ALL, as its name implies, examines all possible combinations of a limited
number of adjustment terms. If z is the maximum number of parameters
(MAXTERMS=z) and k is the number of parameters in the key function, then
there are 2^(z-k) combinations of the adjustment terms. Each model is fitted to the
data and the model with the smallest value of Akaike's Information Criterion
(AIC) is selected.
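The ALL mode amounts to a brute-force search, which can be sketched as follows; `fit_aic` below stands in for fitting the key-plus-adjustments model and returning its AIC, and the toy AIC values are invented for illustration:

```python
from itertools import combinations

def select_all_mode(candidate_orders, fit_aic):
    """Enumerate every subset of adjustment terms (ALL mode) and
    return the subset with the smallest AIC."""
    best_orders, best_aic = (), float("inf")
    for r in range(len(candidate_orders) + 1):
        for subset in combinations(candidate_orders, r):
            aic = fit_aic(subset)
            if aic < best_aic:
                best_orders, best_aic = subset, aic
    return best_orders, best_aic

# Toy AIC surface over 2^3 = 8 models: pretend terms (2, 4) fit best
mock_aic = lambda s: {(): 110.0, (2,): 104.0, (4,): 106.0, (6,): 109.0,
                      (2, 4): 101.5, (2, 6): 103.0, (4, 6): 105.0,
                      (2, 4, 6): 102.8}[tuple(s)]
print(select_all_mode([2, 4, 6], mock_aic))  # -> ((2, 4), 101.5)
```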
SEQUENTIAL and FORWARD both consider a subset of models with different
combinations of adjustment terms. For each of these term selection modes, a
sequence of models is considered. An adjustment term is added at each step of
the sequence. The sequence of models can be represented as:
M1- key function with no adjustment terms
M2- key function with 1 adjustment term
M3- key function with 2 adjustment terms
:
:
Mv- key function with v-1 adjustment terms
A stopping rule (CRITERION) is either based on a likelihood ratio test or
minimizing AIC. Model Mt is chosen if there is no model in the sequence
Mt+1,...,Mt+l which provides a significantly better fit as determined by the
specified CRITERION. The LOOKAHEAD option determines the length (l) of
the sequence of models that is examined before choosing model Mt.
SEQUENTIAL and FORWARD only differ in their choice of which adjustment
term is included at each step in the sequence. SEQUENTIAL term selection
adds the terms sequentially based on the order of the term. For polynomial
adjustment functions, the order of the adjustment term is the exponent of the
polynomial. Terms are added in the following sequence: x^t, x^(t+2), ... . For
cosine adjustments, cosine terms are added in the following sequence:
cos(tπx/w), cos((t+1)πx/w), ... . The beginning value, t, is determined
by the shape of the key function. FORWARD selection adds 1 term at a time but
not necessarily in sequential order. For each model in the sequence, each term
not already in the model is added and the adjustment term which increases the
likelihood the most is chosen as the term to add. For example, to find model
M2, z-k models are fitted to the data, each with a single adjustment term of a
different order (e.g., x^2, x^4, x^6, x^8, or x^10). The term which maximizes the
likelihood is selected for model M2. Model M3 would then consider adding
another term not included in M2. With FORWARD selection it is possible to
select models that cannot be selected with the SEQUENTIAL mode. For
example, the following model might be chosen with FORWARD selection:
f(x) = (1/w) * (1 + a1*cos(πx/w) + a2*cos(3πx/w))
However, with SEQUENTIAL selection, the adjustment term cos(3πx/w) could
not be added without first adding the adjustment term cos(2πx/w).
The second level of model selection is choosing between the competing models
(ESTIMATORs). This step is controlled by the PICK command, which has 2
values: NONE and AIC. If you assign the value NONE, the MCDS engine will
not choose between the different models and will report the estimates for each
model. However, if you accept the default value, AIC, the MCDS engine will
only compute estimates based on the model which has the smallest AIC value.
BOOTSTRAP Command
Syntax:
BOOTSTRAP [/STRATUM] [/SAMPLES] [/INSTRATUM] [/OBS];
Description:
The BOOTSTRAP command initiates a non-parametric bootstrap of the density
estimation procedure. The number of bootstraps performed is determined by the
BOOTSTRAP command in OPTIONS. The basic re-sampling unit of the
bootstrap is a SAMPLE; however, if strata are replicates they can also be
re-sampled with the /STRATUM switch. If both are specified, re-sampling occurs
at both levels (see example below). The switch /INSTRATUM can be set to
restrict the re-sampling of samples or observations to within each stratum. It
would be used if density is estimated by stratum or sampling was stratified a priori.
The switch /OBS can be set to re-sample distances. Using BOOTSTRAP/OBS;
will provide a non-parametric bootstrap of f(0) or h(0) and, if the population is
clustered, E(s). However, the variances and confidence intervals are conditional
on the sample size and do not include the variance of the encounter rate. The
/OBS switch has been included for completeness but its routine use is not
recommended. Reasonable confidence intervals for density could only be
obtained by adding a variance component for the encounter rate. It is also
possible to include the /OBS switch with /SAMPLES; however, this is not
recommended unless the number of observations per sample is reasonable
(> 15).
By default, issuing the BOOTSTRAP command without switches is equivalent
to BOOTSTRAP/SAMPLES/INSTRATUM;. We recommend the default or
dropping the /INSTRATUM if sampling across strata is appropriate. The use of
/STRATUM is only appropriate if the strata represent an additional level of
sampling (e.g., independent observers (stratum) traversing an independent set of
line transects (sample)).
Each bootstrap resample is made up by sampling with replacement an equal
number of units at the level you specify. For example, if you specify to resample
samples within strata (the default), then each bootstrap resample is made up of
the same number of samples (line or point transects) as your original sample,
chosen randomly with replacement from the original sample (within each
stratum). Note that for line transects, this means that the survey effort (total line
length) will differ in each resample. Note also that each of your original samples
has an equal probability of appearing in the resample (an alternative, which we
do not implement, would be to have probability proportional to line length).
The bootstrap summary is given at the end of the output. The point estimate is
the mean of the bootstrap point estimates. Two sets of confidence intervals are
given: 1) log based confidence intervals based on a bootstrap standard error
estimate, and 2) 2.5% and 97.5% quantiles of the bootstrap estimates (i.e.,
percentile confidence intervals).
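Given the bootstrap estimates, the two summaries can be sketched as below; the log-based interval uses the standard log-normal construction from the bootstrap SE, and the exact percentile and rounding conventions are assumptions, not the engine's documented behaviour:

```python
import math
import statistics

def bootstrap_summary(estimates, z=1.96):
    """Point estimate = mean of the bootstrap estimates; interval (1)
    is log-based from the bootstrap SE, interval (2) is the
    2.5% / 97.5% percentiles of the bootstrap estimates."""
    mean = statistics.mean(estimates)
    se = statistics.stdev(estimates)
    # Log-normal interval: (D/C, D*C) with C = exp(z*sqrt(log(1+cv^2)))
    cv = se / mean
    c = math.exp(z * math.sqrt(math.log(1 + cv ** 2)))
    log_ci = (mean / c, mean * c)
    ranked = sorted(estimates)
    pct_ci = (ranked[int(0.025 * len(ranked))],
              ranked[int(0.975 * len(ranked)) - 1])
    return mean, log_ci, pct_ci

mean, log_ci, pct_ci = bootstrap_summary([0.8 + 0.01 * i for i in range(41)])
print(mean)  # -> 1.0 (midpoint of the toy bootstrap estimates)
```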
Summary results from each iteration are stored in the Bootstrap file – see MCDS
Engine Bootstrap File for details.
Example:
Re-sample strata and samples within each stratum.
BOOTSTRAP /STRATUM /SAMPLES;
CLUSTER Command
Syntax:
CLUSTER [/WIDTH=value] [/TEST=α] [/MEAN] [/BIAS={X | XLOG | GX | GXLOG}];
Description:
The CLUSTER command, like the DISTANCE command, is used to modify
the way the cluster sizes are used in the estimate of density. By default, the
WIDTH is chosen to match the truncation value set by the DISTANCE
command, and the MCDS engine computes a size bias regression estimate
(/BIAS=GXLOG) by regressing loge(s) (natural logarithm, written as log() in
the output) against g(x), where x is the distance at which the cluster was
observed.
The WIDTH switch specifies that only cluster sizes for observations within a
distance less than WIDTH are used in the calculation of the expected cluster
size. This treatment of the data can only be accomplished if the distances and
cluster sizes are both entered as ungrouped.
The MEAN switch specifies that the expected cluster size is to be estimated as
the average (mean) cluster size. Likewise, the BIAS switch specifies that
expected cluster size is to be estimated by a size bias regression defined by the
value of the switch.
Value Meaning
X Regress cluster size s against distance x
XLOG Regress loge(s) against distance x
GX Regress cluster size s against g(x)
GXLOG Regress loge(s) against g(x)
The TEST switch specifies the value of the significance level used to test whether
the regression was significant. If it is non-significant, the average cluster size is
used in the estimate of density. The default value for the significance level is set
by PVALUE in OPTIONS. If the TEST switch is not specified, the size bias
regression estimate will be used regardless of the test value.
Examples:
Estimate the expected cluster size from the loge(s) vs. g(x) regression, but use the
average cluster size if the correlation is non-significant as determined by the
α-level set with PVALUE (default=0.15).
CLUSTER /BIAS=GXLOG /TEST;
DENSITY Command
Syntax:
DENSITY by SAMPLE [/DESIGN={REPLICATE | NONE}];
or
DENSITY by STRATUM [/DESIGN={STRATA | REPLICATE | NONE}]
[/WEIGHT={AREA | EFFORT | NONE}];
or
DENSITY by ALL;
Description:
These commands define the levels at which density estimates are made and how
these estimates are weighted. If the DENSITY by ALL; command is used or if
none of the commands (DENSITY, ENCOUNTER, DETECTION, SIZE ) are
given, all of the data are used to make one overall estimate of density.
If the DENSITY BY SAMPLE command is given, density is estimated for each
sample. The DESIGN value defines how the estimates should be treated to
create a pooled estimate. If DESIGN=REPLICATE (default), each sample is
treated as an independent replicate from the stratum or the entire area. In this
case, the estimates are weighted by effort (e.g., line length) to get a stratum
density estimate (if DENSITY by STRATUM is also specified) or a pooled
overall density estimate (see eqns. 3.84-3.87 in Buckland et al. (2001)). If
DESIGN=NONE, the sample estimates are not pooled.
If DENSITY BY STRATUM is specified, an estimate is made for each stratum.
A stratum estimate is a pooled estimate of the sample estimates within the
stratum, if DENSITY by SAMPLE; is specified, or it is an estimate based on the
data within the stratum. An overall (pooled) estimate of density is made unless
DESIGN=NONE is specified. If DESIGN=REPLICATE, the stratum estimates
are treated as replicates to create a pooled estimate and variance weighted by
effort (eqns. 3.84-3.87 in Buckland et al. (2001), treating stratum as a sample).
If DESIGN=STRATA, the pooled estimate is a weighted sum of the estimates
and the variance is a weighted sum of the stratum variances (Section 3.7.1 in
Buckland et al. (2001)). Weighting is defined by the WEIGHT switch. If
WEIGHT=NONE, the densities are summed, which is only useful if the
population is stratified by, for example, sex or age. If WEIGHT=AREA, the densities are
weighted by area (which is the same as adding abundance estimates) and if
WEIGHT=EFFORT, the densities are weighted by effort.
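The DESIGN=STRATA pooling can be sketched as a weighted sum; with WEIGHT=AREA this is equivalent to adding the stratum abundances (a minimal sketch, with invented stratum values):

```python
def pool_density(densities, weights):
    """Pooled density as a weighted sum of stratum estimates
    (DESIGN=STRATA): D = sum(w_i * D_i) / sum(w_i). With WEIGHT=AREA
    the w_i are stratum areas."""
    return sum(w * d for w, d in zip(weights, densities)) / sum(weights)

areas = [10.0, 30.0]   # stratum areas (made up)
dens = [2.0, 4.0]      # stratum density estimates (made up)
d_pooled = pool_density(dens, areas)
print(d_pooled)                                  # -> 3.5
# Area weighting is the same as summing stratum abundances:
print(d_pooled * sum(areas))                     # -> 140.0
print(sum(a * d for a, d in zip(areas, dens)))   # -> 140.0
```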
Prior to version 2.1, a combined estimate of abundance (N) was created by
multiplying the combined density estimate by the sum of the areas specified on
each of the STRATUM commands. This produces obviously erroneous results
when DENSITY by STRATUM/DESIGN=REPLICATE; is used and the area
size is repeated on each STRATUM. To avoid this problem, in the following two
situations the combined abundance estimate uses the area from the first stratum:
1) DENSITY by STRATUM/DESIGN=REPLICATE;
2) DENSITY by STRATUM/DESIGN=STRATA/WEIGHT=NONE;
In all other cases, the area is totalled from all of the strata.
Default:
For DENSITY by SAMPLE: /DESIGN=REPLICATE
For DENSITY by STRATUM: /DESIGN=STRATA /WEIGHT=AREA
Examples:
An estimate is needed for each stratum and it will be weighted by stratum area:
DENSITY BY STRATUM;
An estimate is needed for each stratum and there are enough observations in
each sample to get an estimate from each. The strata represent different
platforms surveying the same area so the strata are treated as replicates.
DENSITY BY SAMPLE;
DENSITY BY STRATUM /DESIGN=REPLICATE;
DETECTION Command
Syntax:
DETECTION {by SAMPLE | by STRATUM | ALL};
Description:
This command explicitly specifies that detection probability (and its functionals
f(0), h(0)) should be estimated and the resolution at which the estimate(s) should
be made (by SAMPLE, by STRATUM, or ALL data).
When there are covariates in the detection function, it is possible to fit the
detection function at one level, and then estimate probability of detection at a
lower level. For more on this, see the section Estimating the detection function
at multiple levels in Chapter 9 of the Users Guide.
Examples:
Density is estimated by stratum but the estimates are based on an estimate of f(0)
for all the data.
DENSITY by STRATUM;
DETECTION ALL;
Estimate detection by stratum choosing between 2 models but do not estimate
any other parameters. Different models may be selected for each stratum.
ESTIMATE;
DETECTION BY STRATUM;
ESTIMATOR/KEY=HAZ;
ESTIMATOR/KEY=UNIF;
END;
Fit detection function globally using a habitat covariate, but estimate by stratum
(the level at which habitat is defined):
ESTIMATE;
ESTIMATOR /KEY=UNIFORM /COVARIATES=Habitat;
DENSITY BY STRATUM;
DETECTION ALL;
DETECTION BY STRATUM;
END;
DISTANCE Command (Estimate section)
Description:
Right truncation is specified with either the WIDTH or RTRUNCATE switch. If
WIDTH=w, only distances less than or equal to w are used in the analysis. If
RTRUNCATE=t, the right truncation distance is set to use (1-t)*100% of the
data. If the data are ungrouped the (1-t)*100 percentile is used as the truncation
distance. If the distances are grouped or being analyzed as such, the truncation
distance is set to the u-th interval end point, where u is the smallest value such that
no more than t*100% of the distances are truncated. The value t=0 trims the
intervals to the right-most interval with a non-zero frequency. If both the
WIDTH and RTRUNCATE are specified, the value of RTRUNCATE defines
the truncation unless the WIDTH is used with NCLASS (see below).
Left truncation is accomplished with the LEFT switch which works in an
analogous fashion to WIDTH. If LEFT=l, only distances greater than or equal to
l are used in the analysis. If LEFT is not specified, it is assumed to be 0.
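The truncation rules above can be sketched as follows; the exact percentile rounding convention is an assumption, not the engine's documented behaviour:

```python
import math

def truncate_distances(distances, left=0.0, width=None, rtruncate=None):
    """Keep distances x with left <= x <= w, where w is WIDTH, or the
    (1 - t)*100 percentile of the data when RTRUNCATE=t is given
    (RTRUNCATE takes precedence if both are supplied)."""
    xs = sorted(x for x in distances if x >= left)
    if rtruncate is not None:
        # (1 - t)*100 percentile of the (left-truncated) distances
        k = max(0, math.ceil((1 - rtruncate) * len(xs)) - 1)
        w = xs[k]
    elif width is not None:
        w = width
    else:
        return xs
    return [x for x in xs if x <= w]

d = list(range(1, 101))
print(len(truncate_distances(d, rtruncate=0.05)))    # -> 95
print(len(truncate_distances(d, width=50)))          # -> 50
print(len(truncate_distances(d, left=10, width=50))) # -> 41
```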
The INTERVALS command is used to specify u distance intervals for analyzing
data in a grouped manner when the data were entered ungrouped. The value c0
is the left most value and so it can be used for left truncation. If there is no left
truncation, specify c0=0. The values c1 ,c2 ,..., cu are the right end points for the
u intervals. The value cu is the right-most point and is used as the WIDTH
which defines the right truncation point. If all of the distances are less than or
equal to cu, the MCDS engine will not truncate data on the right unless
RTRUNCATE is set. Perpendicular distance intervals can also be created for
analysis with the NCLASS and WIDTH switches: NCLASS intervals of equal
length are created between “Left” and “Width”, if both NCLASS and WIDTH
are given.
The SMEAR switch is used only if TYPE=LINE and radial distance/angle
measurements were entered (DISTANCE=RADIAL). “Angle” defines the angle
sector around the angle measurement and “Pdist” defines the proportional sector
of distance to use as the basis for the smearing (see pp. 269-271 of Buckland et
al. 2001).
If an observation is measured at angle “a” and radial distance “r”, it is smeared
uniformly in the sector defined by the angle range (a-angle,a+angle) and distance
range (r*(1-pdist),r*(1+pdist)).
The NCLASS and WIDTH switches must also be given to define a set of equal
perpendicular distance intervals. The proportion of the sector contained in each
perpendicular distance interval is summed as an observation frequency and these
non-integer frequencies (“grouped data”) are analyzed to estimate detection
probability.
Note: Distances specified by WIDTH, LEFT, and INTERVALS should be in the
same units used for the entered data, even if the distance units are converted in
the analysis.
Examples:
Truncate the distances at 100 feet, hence only use those less than or equal to 100
feet in the analysis. This value would be used even if the distances were
converted to meters for analysis. The conversion is applied to the input width of
100 feet.
DIST /WIDTH=100;
The distance data were entered ungrouped but they were actually collected in
these intervals; alternatively, to mediate the effects of heaping, these intervals
were chosen to analyze the data.
DIST /INT=0,10,20,30,40,50,60,70,80,90,100;
The above example could also be entered as:
DIST/NCLASS=10/WIDTH=100;
ENCOUNTER Command
Syntax:
ENCOUNTER {by SAMPLE | by STRATUM | ALL};
Description:
This command explicitly specifies that encounter rate should be estimated and
the resolution at which the estimate(s) should be made (by SAMPLE, by
STRATUM, or ALL data). This command is only necessary if density is not
being estimated.
Examples:
A user wishes to explore the variability in encounter rate by listing the encounter
rate for each sample. The variance of the encounter rate for each sample is
assumed to be Poisson because the sample is a single entity.
ESTIMATE;
ENCOUNTER by SAMPLE;
END;
A user wishes to explore the variability in encounter rate by listing the encounter
rate for each stratum. The variance of the encounter rate for each stratum is
computed empirically for each stratum with more than one sample; otherwise, it
is assumed to be Poisson.
ESTIMATE;
ENCOUNTER by STRATUM;
END;
A user wishes to only see the average encounter rate and an estimate of its
variance. The variance of the encounter rate is computed empirically if there is
more than one sample; otherwise, it is assumed to be Poisson.
ESTIMATE;
ENCOUNTER ALL;
END;
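The empirical-versus-Poisson logic in the examples above can be sketched as follows; the empirical case follows the replicate-lines estimator of Buckland et al. (2001), and this is a sketch rather than the engine's exact code:

```python
def encounter_rate_var(n_per_sample, effort_per_sample):
    """Variance of the encounter rate n/L: empirical across samples
    when there is more than one sample (weighting each sample's
    squared deviation by its effort), otherwise a Poisson assumption
    (var(n) = n)."""
    k = len(n_per_sample)
    n, L = sum(n_per_sample), sum(effort_per_sample)
    if k < 2:
        return n / L ** 2                 # Poisson: var(n/L) = n / L^2
    s = sum(l * (ni / l - n / L) ** 2
            for ni, l in zip(n_per_sample, effort_per_sample))
    var_n = L * s / (k - 1)               # empirical var(n)
    return var_n / L ** 2                 # var(n/L) = var(n) / L^2

print(encounter_rate_var([2, 2], [1.0, 1.0]))  # -> 0.0 (identical rates)
print(encounter_rate_var([4], [2.0]))          # -> 1.0 (Poisson fallback)
```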
ESTIMATOR Command
Syntax:
ESTIMATOR [/KEY={UNIFORM | HNORMAL | NEXPON | HAZARD}]
[/ADJUST={COSINE | POLY | HERMITE}]
[/SELECT={SPECIFY | SEQUENTIAL | FORWARD | ALL}]
[/ORDER=O(1), O(2), ..., O(nap)]
[/NAP=nap] [/START=A(1), A(2), ..., A(nkp+nap)]
[/CRITERION={AIC | AICC | BIC | LR}]
[/COVARIATES=cov1, cov2, ...]
[/LOWER=val1, val2, ...] [/UPPER=val1, val2, ...]
[/ADJSTD={W | SIGMA}];
Description:
The ESTIMATOR command specifies the type of model for detection
probability (g(x)) to estimate f(0) or h(0). The KEY switch specifies the key
function to be used and the ADJUST switch specifies the type of adjustment
function. The SELECT switch specifies the type of adjustment term selection
which overrides the default value specified by the SELECTION command in the
OPTIONS procedure (see the discussion of adjustment term selection in the
introduction to the Estimate section).
If SELECT=SPECIFY is chosen, you can specify the number of adjustment
parameters, the order of the adjustment term and starting values for the
parameters. The number of adjustment parameters is set with the NAP switch
(Number of Adjustment Parameters). NAP must be less than or equal to MAXTERMS -
'nkp' (number of key parameters). The orders of the adjustment term(s) are
specified with the /ORDER switch. Starting values (/START) for the key and
adjustment parameters can be given if the optimization algorithm suggests there
are problems in finding the maximum of the likelihood function. The first 'nkp'
starting values in the list should be the values for the key parameters and the
remaining are for the 'nap' adjustment parameters. One reason for using the
SELECT and NAP switches is to specify that only the key function should be
fitted to the data. An example is given below.
CRITERION specifies the manner in which the number of adjustment terms is
chosen for SELECT=FORWARD and SEQUENTIAL. LR specifies that a
likelihood ratio test be performed using the PVALUE specified in OPTIONS.
AIC specifies using Akaike's Information Criterion for adjustment term
selection.
COVARIATES gives a list of covariates that enter the scale parameter of the
detection function – see the Users Guide Chapter 9 section Introduction to
MCDS Analysis for more on this. Note that the covariates must be declared in
the list of FIELDS in the Data section, and that factor covariates need to be
declared as such using the FACTOR command.
The distances passed into the adjustment term formulae are scaled. ADJSTD
determines how they are scaled – either by W (the truncation width) or SIGMA
(the evaluated value of the scale parameter for this covariate). For a discussion
of the difference, see Scaling of Distances for Adjustment Terms in Chapter 9 of
the Users Guide.
LOWER and UPPER enable you to set bounds on the key function parameters
(note that you cannot currently set constraints on adjustment term parameters).
If either is missing, default bounds are used (these are reported in the results).
A value of -99 indicates that the default bound should be used for that parameter.
See below for
an example.
Multiple ESTIMATORs can be specified within an ESTIMATE procedure and
the “best” model is selected or estimates are given for each model (see PICK
command). Note that if covariates are used, the same covariates must be
declared in all ESTIMATOR commands – automatic selection among covariates
is not currently supported.
The only portion of the command required is the command name ESTIMATOR
because all of the switches have default values.
Default Values:
KEY = HNORMAL
ADJUST = COSINE
SELECT = SEQUENTIAL (or value set in Options section)
CRITERION = LR (except if SELECT=ALL)
Examples:
Use the following to fit a model with a half-normal key function (by default) and
Hermite polynomials for adjustment functions. DISTANCE fits all possible
combinations of adjustment terms and uses AIC to choose the best set of
adjustment terms:
ESTIMATOR /ADJ=HERM /SEL=ALL;
Use the following to fit a model that uses the uniform key function with simple
polynomial adjustment functions:
ESTIMATOR /KEY=UNIFORM /ADJ=POLY;
Use the following to fit a model that uses the hazard key without adjustments:
ESTIMATOR /KEY=HAZARD /SELECT=SPECIFY /NAP=0;
Use the following to fit a 2-term cosine series with terms of order 1 and 3 and
specify the parameter starting values (note: nkp=0 for a uniform key):
ESTIMATOR /KEY=UNIF /SELECT=SPECIFY /NAP=2 /ORDER=1,3
/START=0.3,0.05;
Use the following to fit a hazard key with one polynomial adjustment, and a
specified lower bound on the second parameter of the key function but the
default lower bound of the first parameter. Imagine in this case that there are
two strata and we want a lower bound of 2 on the 2nd parameter in the first
stratum, and a lower bound of 2.5 on the 2nd parameter in the second stratum.
Not a realistic example perhaps, but it illustrates that you have to specify
constraints separately for each stratum, as these are treated as separate key
function parameters to be estimated. It also illustrates that you ignore the
adjustment term parameters when setting bounds.
ESTIMATOR /KEY=HAZARD /ADJ=POLY /SELECT=SPECIFY /NAP=1
/LOWER=-99,2.0,-99,2.5;
G0 Command
Syntax:
G0=value /SE=value /DF=value;
Description:
This command assigns a value to g(0) which is assumed to be 1 unless a value is
assigned with this command. The SE and DF switches are used to specify a
standard error for the estimate so that estimation uncertainty of g(0) can be
incorporated into the analytical variance of density. G0 is just a special case of a
multiplier, so see the MULTIPLIER command for details of its use and the
options.
Default:
G0=1.0/SE=0.0;
Example:
G0=0.85/SE=0.12;
GOF Command
Syntax:
GOF /INTERVALS = c0, c1, c2, … , cu;
or
GOF /NCLASS=nclass;
or
GOF;
Description:
This command is used to specify the distance intervals for plotting a scaled
version of the histogram of distances against the function g(x) (and f(x)) and
for the chi-square goodness of fit test.
If the data are entered and analyzed ungrouped (/EXACT), the first 2 forms can
be used to define the intervals which are used for plotting the data and for the
chi-square goodness of fit test. The first form specifies the intervals exactly,
and the second form provides a shortcut approach of specifying “nclass” equal
intervals (Note: the syntax from previous versions, GOF=c0,c1...;, will also
work). You can enter up to 3 of these commands to specify different sets of
intervals. If you do not specify this command and the data are analyzed as
ungrouped, 3 sets of intervals are constructed with equally spaced cutpoints,
the numbers of intervals being n^0.5, (2/3)n^0.5 and (3/2)n^0.5, where n is the
number of observations.
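As an illustration, the three default interval counts can be computed from the sample size n like this (a sketch only; the exact rounding used by the engine is our assumption, not documented here):

```python
import math

def default_gof_nclasses(n):
    """Default numbers of equally spaced goodness-of-fit intervals for n
    ungrouped distances: n^0.5, (2/3)n^0.5 and (3/2)n^0.5.  Rounding to
    the nearest integer (with a floor of 2 classes) is our assumption;
    the engine may round differently."""
    root = math.sqrt(n)
    return [max(2, round(k * root)) for k in (1.0, 2.0 / 3.0, 3.0 / 2.0)]

print(default_gof_nclasses(100))  # [10, 7, 15]
```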
If the data are entered grouped or entered ungrouped and analyzed as grouped
(DISTANCE/INTERVALS= used in ESTIMATE) then only the third form can
be used to specify that the GOF statistics should be generated. It is not possible
to specify goodness of fit intervals other than those used to analyze the data.
Examples:
Data are ungrouped and 2 different sets of intervals are specified.
GOF /NCLASS=5;
GOF=0,5,10,20,30,40,50;
Data are grouped but GOF statistics are desired.
GOF;
MONOTONE Command
Syntax:
MONOTONE = WEAK or STRICT or NONE
Description:
The estimators are constrained by default to be strictly monotonically non-
increasing (i.e., MONOTONE=STRICT; the detection curve is either flat or
decreasing as distance increases from 0 to w). In some instances, depending on
the tail of the distribution, this can cause a poor fit at the origin (x=0).
Two options exist: 1) truncate the observations in the tail, or 2) use the
command MONOTONE=WEAK; or MONOTONE=NONE;. MONOTONE=WEAK; will
only enforce a weak monotonicity constraint (i.e., f(0) >= f(x) for all distances
x). This will allow the curve to go up and down as it fits the data, but it will
not let the curve dip down at the origin. In some instances this will allow the
estimator to achieve a better fit at the origin, which is the point of interest.
Setting MONOTONE=NONE; will allow the curve to take any possible form,
except that it must remain non-negative.
Monotonicity is achieved by constraining the function at a fixed set of points. In
some circumstances it is possible that the curve can be non-monotone between
the fixed points. Typically, this results from trying to over-fit the data with
too many adjustment terms when the distribution has a long tail. Truncate the
data rather than attempting to over-fit.
Note that MONOTONE = NONE is the only allowed option when there are
covariates in the detection function.
Default:
MONOTONE = STRICT; (no covariates)
MONOTONE = NONE; (covariates)
MULTIPLIER Command
Syntax:
MULTIPLIER = value1 /LABEL='name' /SE=value2 /DF=value3;
Description:
This command specifies a multiplier for the density and/or abundance estimate.
Some uses of multipliers are discussed in the Users Guide section Multipliers in
CDS Analysis.
Density/abundance is multiplied by the value of value1. The analytic variance
estimate takes into account the additional variance due to the multiplier, as
specified by value2, by adding an additional term to the delta method formula
(equation 3.70 in Buckland et al. 2001). The degrees of freedom for confidence
limits are affected if a non-zero value is specified for value3, because an extra
term is added to the Satterthwaite formula (equation 3.75 in Buckland et al.
2001).
/LABEL='name' – “name” is the name given to the multiplier in the output file
/SE=value2 – value2 is the standard error of the multiplier – use 0 if the
multiplier value is known with certainty
/DF=value3 – value3 is the degrees of freedom associated with the multiplier –
use 0 for infinite degrees of freedom.
Note that if you want a multiplier to divide the density estimate, simply specify
value1 as the inverse of the multiplier value. Value2 (the SE) is then the
multiplier SE divided by the square of the multiplier value.
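The inverse-multiplier arithmetic can be sketched as follows (the function name is ours, purely for illustration):

```python
def as_divisor(multiplier, se):
    """Convert a multiplier m (with standard error se) into the
    value1/value2 pair needed to divide density by m, per the
    delta-method approximation: value1 = 1/m, SE(1/m) ~= se / m^2."""
    value1 = 1.0 / multiplier
    value2 = se / multiplier ** 2
    return value1, value2

# e.g., dividing density by a multiplier of 2.0 with SE 0.4:
v1, v2 = as_divisor(2.0, 0.4)
print(v1, v2)  # 0.5 0.1
```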
There is no maximum to the number of multiplier commands within the Estimate
section.
PICK Command
Syntax:
PICK = AIC or AICC or BIC
Description:
If more than one ESTIMATOR command is given, a choice must be made as to
which model will be used for the final estimate. The command PICK=AIC;
instructs the program to choose the model that minimizes Akaike's Information
Criterion, AICC minimizes the small-sample corrected version of AIC, and BIC
minimizes the Bayesian Information Criterion. If no command is given,
PICK=AIC; is assumed. (Note: the option PICK=NONE, which told the program
not to choose a model and to present the results of each, is no longer supported.)
If the BOOTSTRAP; command is given, the bootstrap is performed and the
estimator is chosen separately for each replicate. Thus, even though a single
estimator is chosen for the point estimate, different estimators can be chosen
for each bootstrap, and the standard errors and interval estimates incorporate
the uncertainty of the model selection process.
Default:
PICK=AIC;
Below are listed the default values of the print options, as defined by the value
set by PRINT= in OPTIONS (Y = Yes, N = No):

OPTIONS PRINT=  ESTIMATE  FXEST  FXPLOT  FXTEST  SBARES  SBARPLOT  FXFIT  FXITPLOT  QQPLOT
SUMMARY         N         N      N       N       N       N         N      N         N
RESULTS         Y         Y      Y       Y       Y       Y         N      N         Y
SELECT          Y         Y      Y       Y       Y       Y         Y      N         Y
ALL             Y         Y      Y       Y       Y       Y         Y      Y         Y
SIZE Command
Syntax:
SIZE by SAMPLE;
or
SIZE by STRATUM;
or
SIZE ALL;
Details:
This command explicitly specifies that expected cluster size should be estimated
and the resolution at which the estimate(s) should be made (by SAMPLE, by
STRATUM, or ALL data). This command is only necessary if density is not
being estimated or to specify a level of resolution different from density. The
level of resolution for estimating cluster size must be less than or equal to the
level for estimating detection probability, if a size bias regression estimate is
computed.
Example:
A user wishes to examine detection probability and expected cluster size but not
density at this point:
ESTIMATE;
ESTIMATOR/KEY=UNIF;
DETECTION ALL;
SIZE ALL;
END;
VARN Command
Syntax:
VARN = POISSON or b or EMPIRICAL
Description:
This command specifies the type of variance estimation technique for encounter
rate. The value POISSON specifies that the distribution of n (number of
observations) is Poisson. EMPIRICAL specifies that the variance should be
calculated empirically from the replicate SAMPLEs (section 3.6.2 of Buckland
et al. 2001). If only one SAMPLE is defined in the data, the POISSON
assumption is used unless a value b is specified. If a value b is specified, it is
used as a multiplier such that var(n) = bn (e.g., Buckland et al. 2001, section
8.4.1). The Poisson assumption is equivalent to specifying b=1. The default for
VARN is EMPIRICAL unless there is only one SAMPLE, in which case, the
default is POISSON.
Default:
VARN=EMPIRICAL
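The two variance conventions can be illustrated with a short sketch (our own code, following the textbook formulas rather than the engine's internals; variable names are ours):

```python
def var_n_poisson(n, b=1.0):
    """Poisson-type variance of the number of observations: var(n) = b*n.
    b = 1 gives the pure Poisson assumption."""
    return b * n

def var_n_empirical(counts, efforts):
    """Empirical variance of n from replicate samples, in the spirit of
    Buckland et al. (2001, section 3.6.2): effort-weighted variance of
    the per-sample encounter rates n_i/l_i, rescaled to the scale of n."""
    k = len(counts)
    L = sum(efforts)
    rate = sum(counts) / L
    s = sum(l * (c / l - rate) ** 2 for c, l in zip(counts, efforts))
    return L * s * k / (k - 1)

print(var_n_poisson(40))  # 40.0
print(round(var_n_empirical([10, 14, 16], [5.0, 5.0, 5.0]), 6))  # 84.0
```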
An example data file is given below:
Stratum A 100 Line 1A 10 14 F
Stratum A 100 Line 1A 10 8 M
Stratum A 100 Line 1A 10 22 M
Stratum A 100 Line 2A 10.3 7 F
Stratum A 100 Line 2A 10.3 37 F
Stratum A 100 Line 2A 10.3 13 F
Stratum B 123 Line 1B 5.7
Stratum B 123 Line 2B 8.4 27 M
Stratum B 123 Line 2B 8.4 76 F
Stratum B 123 Line 2B 8.4 44 M
Stratum B 123 Line 2B 8.4 7 M
Example data file
This tells Distance that the first column is the stratum label, the second is the
stratum area, the third is the sample label, the fourth is the sample effort, the fifth is
the distances and the last is a column called “Sex”. This last column will be
used as a factor covariate, so the DATA section also needs the command
FACTOR=Sex /LEVELS=2 /LABELS='F','M';
Notice that for Line 1B there is nothing in the distance column – this is because
no animals were seen on that line.
An easy way to generate an example data file, to get a feel for the
required format, is to set up an analysis using the Distance graphical interface,
and then run the analysis in Debug mode. In this mode, the Distance interface
generates a command file and data file, and stores them in the Windows
temporary folder, but does not run the analysis. For more about Debug mode,
see the Program Reference page on the Analysis Preferences Tab. This strategy
will also enable you to see what commands are required in the Data section for a
particular data file.
MCDS Engine Command Line Output
Run status
When the MCDS engine is run from the command line, it returns a number when
the run finishes. This number gives the status of the run, as follows:
• 1 means the analysis ran OK
• 2 means it ran with warnings (see log file for details)
• 3 means it ran with errors (see log file for details)
• 4 means it ran with file errors (e.g., could not find the specified
command file)
• any other number means a major error occurred (see below).
These numbers are also returned if the engine is run from another program as an
independent process, and so can be used by the program to diagnose whether the
run was OK.
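If you are driving the engine from a script, the return code can be checked along these lines (a sketch; the command-line argument convention for mcds.exe is an assumption, so check your installation):

```python
import subprocess

STATUS = {1: "ran OK",
          2: "ran with warnings (see log file)",
          3: "ran with errors (see log file)",
          4: "file errors (e.g., command file not found)"}

def describe_status(code):
    """Map an MCDS engine return code onto the documented run statuses."""
    return STATUS.get(code, "major error (see FORTRAN debugging output)")

def run_mcds(engine_path, command_file):
    """Run the engine as an independent process and report its status.
    The argument convention here is an assumption, not documented above."""
    result = subprocess.run([engine_path, command_file])
    return result.returncode, describe_status(result.returncode)
```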
FORTRAN Debugging output
Occasionally, some other text is written to the standard output (usually the
command line) by the FORTRAN run-time library used to run mcds.exe. A
mild example is that the number of floating underflow errors is written out, e.g.:
forrtl: error (74): floating underflow
forrtl: error (74): floating underflow
forrtl: info (300): 30 floating underflow traps
Floating point underflow occurs when a number is calculated that is smaller than
the smallest number the computer can store and the number is instead stored as
zero. This rarely causes problems in practice – although it is worth double-
checking your results.
A more extreme example is a program crash, in which case debugging
information is written out. If this happens, a copy of the Distance project or
command file
should be sent to the program authors. In the example below, the program
crashed with a “floating invalid” error on line 398 of the routine SBREG, which
was called from line 205 of CMOD, etc.
forrtl: error (65): floating invalid
Image PC Routine Line Source
MCDS.exe 0040DA68 SBREG 398 Cmod.for
MCDS.exe 0040C294 CMOD 205 Cmod.for
MCDS.exe 0040B2F3 ESMOD 29 Cmod.for
MCDS.exe 0044763C ESTPARM 487 Estmte.for
MCDS.exe 00446471 ESTMTE 291 Estmte.for
MCDS.exe 00445201 ESTPROC 88 Estmte.for
MCDS.exe 004136EA CNTRL 113 Control.for
MCDS.exe 004381CF DISTANCE 263 Distance.for
MCDS.exe 004F2459 Unknown Unknown Unknown
MCDS.exe 004C8BE3 Unknown Unknown Unknown
kernel32.dll 7C816FD7 Unknown Unknown Unknown
The MCDS Engine Stats File should be the first port of call for extracting
results by machine. The modules and statistics within each module are listed
below in the order in which they are summarized in the output. The FORTRAN
format for each record is:
FORMAT(2(1X,I5),2(1X,I1),1X,I3,5(1X,G14.7))
Each field is separated by a space, so the records can be read into a spreadsheet
or other program as space delimited or as fixed-width format. The record for a
module/statistic type is only output if it is relevant and it was computed in the
analysis.
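Because the fields are space separated, a record can be parsed without decoding the FORTRAN format directly. In this sketch, the names given to the leading integer fields are our interpretation (only the module and statistic codes are defined by the table below), so check them against your own output:

```python
def parse_stats_record(line):
    """Split one stats-file record, written with
    FORMAT(2(1X,I5),2(1X,I1),1X,I3,5(1X,G14.7)), into five integer codes
    followed by five floating-point values (typically the point estimate
    and, where relevant, its CV, LCL, UCL and DF)."""
    parts = line.split()
    # Field names before 'module'/'statistic' are assumptions.
    keys = ["stratum", "sample", "estimator", "module", "statistic"]
    record = dict(zip(keys, (int(p) for p in parts[:5])))
    record["values"] = [float(p) for p in parts[5:]]
    return record

rec = parse_stats_record(
    "     1     1 1 2   4  0.2500000  0.1500000  0.1900000  0.3300000  10.00000")
print(rec["module"], rec["statistic"], rec["values"][0])  # 2 4 0.25
```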
The following table defines the module and statistic codes used:
Module Statistic/Parameter Estimate
1 – encounter rate 1 – number of observations (n)
2 – number of samples (k)
3 – effort (L or K or T)
4 – encounter rate (n/L or n/K or n/T)
5 – left truncation distance
6 – right truncation distance (w)
2 – detection probability 1 – total number of parameters (m)
2 – AIC value
3 – chi-square test probability
4 – f(0) or h(0) 1
5 – probability of detection (Pw) 1
6 – effective strip width (ESW) or effective
detection radius (EDR) 1
7 – AICc
8 – BIC
9 – Log likelihood
10 – Kolmogorov-Smirnov test probability
11 – Cramér-von Mises (uniform weighting)
test probability
12 – Cramér-von Mises (cosine weighting)
test probability
13 – key function type 2
14 – adjustment series type 3
15 – number of key function parameters
(NKP)
16 – number of adjustment term parameters
(NAP)
17 – number of covariate parameters (NCP)
101 … (100+m) – estimated value of each
parameter 5
3 – cluster size 1 – average cluster size 1
2 – size-bias regression correlation (r)
3 – p-value for correlation significance (r-p)
4 – estimate of expected cluster size
corrected for size bias 1
4 – density/abundance 1 – density of clusters (or animal density if
non-clustered) 1
2 – density of animals 1
3 – number of animals, if survey area is
specified 1
4 – bootstrap density of clusters 1, 4
5 – bootstrap density of animals 1, 4
6 – bootstrap number of animals 1, 4
1 Values for CV, LCL, UCL and DF are included for these statistics.
2 Key function types are: 1 = uniform, 2 = half-normal, 3 = negative exponential, 4 =
hazard rate.
3 Adjustment series types are: 1 = simple polynomial, 2 = Hermite polynomial, 3 = cosine.
4 Bootstrap CV calculated as bootstrap SE / bootstrap point estimate; the df field here is
the number of bootstraps.
5 Statistic 101 corresponds with the parameter identified as A(1) in the results, 102 with
A(2), etc.
Title line - plot 1 (up to 80 char)
Sub-title line (up to 60 char)
x-label (up to 30 char)
y-label (up to 30 char)
# of data rows (r)
x1,y11,y12,y13,y14
x2,y21,y22,y23,y24
.
.
.
xr,yr1,yr2,yr3,yr4
Title line - plot 2 (up to 80 char)
Sub-title line (up to 60 char)
x-label (up to 30 char)
y-label (up to 30 char)
..
etc
The number of columns of data (yr1, yr2, etc) depends on the plot:
• For qq-plots there are 4 columns: 1 and 2 are the x and y
coordinates of the data points (i.e. the edf and the fitted cdf); 3 and
4 are the x and y coordinates of the line that runs from (0,0) to
(1,1). If you are using this file to recreate the plot in another
package, you could easily ignore columns 3 and 4 and replace them
with a (0,0) (1,1) line.
• For plots containing the data histograms and accompanying pdf or
detection function plots there are 4 columns: 1 and 2 give the x and
y coordinates which, when joined up, give the fitted detection
function or pdf and 3 and 4 give a set of x and y coordinates which,
when joined up, produce the data histograms.
• For the MCDS example detection function plots, which contain 3
detection functions, there are 6 columns: 1 and 2 give the x and y
coordinates for the first detection function, 3 and 4 give this for the
second detection function and 5 and 6 give the coordinates of the
third detection function.
You can see an example of this kind of data being used to produce a plot in a
tip under Exporting CDS Results from Analysis Details Results in Chapter 8
(although the data there come from copying the plot to the clipboard rather than
directly from the plot file).
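A minimal reader for this layout might look like the following (our own sketch, assuming each block is exactly title, sub-title, x-label, y-label, a row count, then that many comma-separated rows):

```python
def read_plot_file(path):
    """Read the plot file as a list of blocks: four header lines, a row
    count r, then r comma-separated data rows, repeated for each plot."""
    with open(path) as f:
        lines = [ln.rstrip("\n") for ln in f]
    plots, i = [], 0
    while i + 5 <= len(lines) and lines[i].strip():
        r = int(lines[i + 4])
        rows = [[float(v) for v in row.split(",")]
                for row in lines[i + 5:i + 5 + r]]
        plots.append({"title": lines[i], "sub-title": lines[i + 1],
                      "x-label": lines[i + 2], "y-label": lines[i + 3],
                      "data": rows})
        i += 5 + r
    return plots
```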
MCDS Engine Limitations
The following limitations apply to the MCDS engine:
Limitation Maximum number
Observations 100 000
Samples (transects) 50 000
Strata 1 000
Cutpoints in GOF and interval data 25
Detection function models 5
Adjustment terms per model 5
Covariates 10
Levels, for factor covariates 200
This means that if you select the MCDS engine in the Distance
interface, but do not specify any covariates, you will get identical output to
selecting the CDS engine with no monotonicity constraints.
MCDS Engine Error and Warning Messages
This section gives a comprehensive list of the warning and error messages that
can be generated by the MCDS engine, and can occur in the output or log file.
Explanations for some of the messages are given – please contact us if you need
an explanation for a message we don’t explain here, or get a message that is not
documented (it’s possible we missed some!).
The standard format for an error or warning message is
** [Bootstrap] level: message **
where level can be “Warning”, “Error” or “Internal Error”, message is the
text of the message, and the word Bootstrap appears if the problem occurred
while running a bootstrap replicate dataset.
parameters > +/- 1.0.
10 Estimation routine failed to converge due to
negative area estimates on iteration [iteration
number]. Using results from previous iteration.
11 Estimation routine failed to converge due to
singular information matrix on iteration [iteration
number]. Using results from previous iteration.
12 Estimation routine failed to converge.
13 FIELD will be ignored in the data
14 INTERVALS switch ignored because NCLASS Mutually incompatible
specified. commands
15 Missing item in list two adjacent commas [value
here]. Skipping to next item.
16 Negative variance estimate for f0. Invalid
variance. Results may not be reliable.
17 One or more cluster sizes are 0. These See Zero Cluster Sizes in
observations will be used in cluster size CDS Analysis in the Users
estimation. If you intended to code these values Guide.
as missing, please enter them as -1.0
18 One or more cluster sizes is coded as -1. Distance See Missing Cluster Size
assumes -1 to mean a cluster of undetermined Data in CDS Analysis in
size. These observations are used for estimating the Users Guide.
detection probability and encounter rate, but not
cluster size.
19 Parameter [parameter number] is at a lower
bound.
20 Parameter [parameter number] is at an upper
bound
21 Parameters are being constrained to maintain a
positive pdf
22 Parameters are being constrained to obtain
monotonicity.
23 Previously read samples were not assigned a
stratum, so all strata will be ignored.
24 SIZEC is an invalid command when Mutually incompatible
OBJECT=SINGLE, and so was ignored. commands
25 Some of the estimates of f0 are negative. Results
are not reliable.
26 Some parameters are very highly correlated.
27 The /BIAS switch is not allowed when cluster Mutually incompatible
size is a covariate, and so it has been ignored. commands
28 The cluster size covariate is a factor and so it is
assumed that factor levels correspond to cluster
sizes.
29 The estimated analytic variance for f0 gives a CV
greater than 10000%, and hence results may not
be reliable. To avoid numerical problems, the CV
for f0 was assigned a value of 9999.99%.
30 The estimated area is negative and only a single
iteration of the estimation routine has been
carried out.
31 The number of lower bounds for the parameters
does not match the number of parameters in the
model. The default bound for parameter A
[parameter number] is being used instead.
32 The number of starting values for the parameters
does not match the number of parameters in the
model. The default starting value for parameter A
[parameter number] is being used instead.
33 The number of upper bounds for the parameters
does not match the number of parameters in the
model. The default bound for parameter A
[parameter number] is being used instead.
34 The starting value for parameter A [parameter
number] must be > 0. Using the default starting
value.
35 There are less than 10 data points per estimated
parameter. Results may not be reliable.
36 There is only one level for factor covariate
[covariate]. A minimum of two levels is required
for estimation; hence this covariate will be
omitted.
37 There is only one level for factor covariate
[covariate]. A minimum of two levels is required
for estimation; hence this covariate will be
omitted from estimates for stratum [stratum]
sample [sample].
38 There is only one level for factor covariate
[covaraite]. A minimum of two levels is required
for estimation; hence this covariate will be
omitted from estimates for stratum [stratum]
39 When cluster size is a covariate, variance of the See Cluster Size as a
cluster size, density of individuals, and Covariate in the Users
abundance estimates can only be obtained via the Guide.
bootstrap. You have not specified the bootstrap
variance option, so these variance estimates will
not be produced.
40 When cluster size is a covariate encounter rates See Cluster Size as a
are not computed. Covariate in the Users
Guide.
41 When cluster size is a covariate no stratification
is allowed.
42 When covariates are being used, a number of
intervals > 20 for GOF tests may cause the
program to terminate with an error.
43 The number of intervals - 1 = [number] which is
less than the number of key parameters [number].
No fit possible.
44 The number of intervals - 1 equals the number of
key parameters [number] so no adjustments can
be made.
45 The number of observations - 1 equals the
number of key parameters [number], so no
adjustments can be made.
46 There are no cluster size observations selected.
Cannot estimate expected cluster size.
47 Negative variance for expected cluster size. No
size bias adjustment. Average cluster size used
instead.
48 Convergence failure.
49 Estimated cluster size greater than exp(14).
Average cluster size used instead.
50 SEED cannot be a negative number. It has been
set to 0
51 Negative variance estimate for parameter. Invalid
variance.
52 Number of cluster size measurements =
[number]. This is not sufficient for size-bias
regression. Average cluster size used instead.
53 Number of cluster size measurements = [number]
This is not sufficient to estimate a mean and
variance
54 Number of observations is small. Do not expect
reasonable results.
55 Size bias adjustment has increased expected
cluster size.
56 The number of adjustment parameters allowed
has been reduced to [number] because of limited
number of observations.
57 Too few observations to calculate AICc. AICc set
to 0.
58 Too few observations. An estimate of f0 cannot
be computed. f0 set to 1/width.
59 Two models have the same [model selection
statistic]. Choosing one of them at random.
60 Zero observations. An estimate of f0 cannot be
computed. f0 set to 0.
61 Angle not valid for DIST=PERP Mutually incompatible
commands
62 Area = 0 for stratum
63 BOOTSTRAPS may not exceed 5000. Set to
5000
64 BOOTSTRAPS should be at least 100
65 Cannot specify CONVERT without MEASURE Mutually incompatible
and UNITS. CONVERT value will not be used. commands
66 CONVERT value will override the previous value
specified.
67 INTERVALS ignored because NCLASS was set. Mutually incompatible
commands
68 Warning: Invalid or missing covariate.
69 NCLASS and INTERVALS both set. Mutually incompatible
INTERVALS ignored. commands
70 No observations in stratum [stratum] so
estimating f0 using global average f0|z. Results
are therefore not reliable.
71 SEED should be an odd number greater than
2000000
72 Warning: Seed will be set with value from clock Refers to the random
number seed. This
warning does not occur
when calling MCDS from
the interface as SEED=0 is
specified, which means set
from clock. This warning
only occurs if SEED is not
specified, and is intended
to remind the user the seed
has come from the clock.
73 SMEAR switch only valid for ungrouped Mutually incompatible
dist/angle measurements. commands
74 SMP_EFFORT not in data. Assumed to be 1 for
each point
75 Specified width [width] does not match an When truncating data in
interval value. It has been set to [new width] intervals.
76 There is only one level for factor covariate
[covariate]. A minimum of two levels is required
for estimation; hence this covariate will be
omitted from estimates for sample [sample]
77 Warning: TITLE value not found.
78 Too many sets of GOF, only [number] allowed. Currently, [number]=3
79 User is overriding a conversion factor available in Obsolete – conversion
the program. factors between units now
always specified by
Distance interface, so this
warning is suppressed.
80 For goodness-of-fit interval set [number].
Number of goodness-of-fit intervals reduced.
81 For goodness-of-fit interval set [number]. Interval
end-point modified to match width.
82 For goodness-of-fit interval set [number].
Goodness-of-fit intervals specify testing subset of
the data.
83 For goodness-of-fit interval set [number].
Specified intervals are inconsistent with width.
84 For goodness-of-fit interval set [number]. Interval
begin-point modified to match left truncation.
85 For goodness-of-fit interval set [number].
Specified intervals are inconsistent with left
truncation value.
86 Exact distance values, rather than distance
intervals have been used in size bias regression
calculations.
87 No monotonicity constraints are allowed when Mutually incompatible
covariates are present. commands
False.
6 Invalid value for command [command]
7 [Command] is an invalid command for Obsolete.
HIERARCHY structure.
8 A maximum of [number] levels is allowed for See MCDS Engine
each factor covariate. Limitations.
9 A maximum of 10 covariates may be specified.
10 ANGLE needed in data
11 Area under fx or gx is zero.
12 At most one group contains observations.
13 Bootstrap will not be done because observations
are not being re-sampled and density is estimated by
sample.
14 Cannot scale distances by sigma when using the
uniform key function: change /ADJSTD option to
W.
15 Cannot use multiple DETECTION commands for
CDS analysis or when cluster size is a covariate.
16 Cluster size frequency < 0 [value of cluster freq]
17 CONFIDENCE must be between 1 and 99
18 Covariate specified in the ESTIMATE command
but not in the DATA command: [covariate],
19 CUERATE must be a positive number.
20 Dataset has been cleared. No data has been stored.
21 Density for each sample is unnecessary when
detection and expected cluster size are estimated
at higher levels
22 Detection probability must be estimated for size
bias calculations
23 Distance frequency < 0 [value of distance freq]
24 DISTANCE needed in data
25 Due to errors, this ESTIMATOR command will
be ignored.
26 Due to errors, this GOF command will be
ignored.
27 Error reading in values.
28 Exceeded array size [size] - for entering data.
29 Exceeded maximum number of cluster size Same as number of
observations = [number] distance observations,
below.
30 Exceeded maximum number of distance See MCDS Engine
observations = [number] Limitations.
31 FIELDS have not been set. Data cannot be read.
32 Filename on INFILE command was not found.
33 FLAT data structure invalid for grouped data
34 Incompatible resolution levels for estimation of
[one component of estimation] and [another]
35 Incorrect number of data values.
36 Interval values for cluster sizes are out of order. Obsolete
37 Interval values for distances are out of order. Obsolete
38 Intervals for clusters cannot be specified because Obsolete
the data were entered in intervals.
39 Intervals for distance cannot be specified because Obsolete
the data were entered in intervals.
40 Invalid cluster size < 0 [value]
41 Invalid command encountered – [command]
42 Invalid distance <0 [value]
43 Invalid filename or file could not be found
[filename]
44 Invalid initial values for parameters
45 Invalid option for CUERATE command.
46 Invalid or missing angle.
47 Invalid or missing cluster size.
48 Invalid or missing distance.
49 Invalid or missing value for [variable]
50 Invalid or missing value for sample effort =
[sample name] Sample will be ignored.
51 Invalid radial distance <0 [value]
52 Invalid smearing angle = [value] It must be > 0 &
<90
53 Invalid smearing distance. It must be > 0.
54 Invalid value for adjustment ORDER = [value]
55 Invalid value for NCLASS. It must be between 2 See MCDS Engine
and [max number of classes] Limitations.
56 Invalid value for sighting angle <0 OR >360 =
[value]
57 ITERATIONS must be > or = to 25.
58 Left truncation value cannot be negative.
59 LENGTH command is invalid for point transect Mutually incompatible
data. commands
60 Maximum number of samples [number] See MCDS Engine
exceeded. Limitations.
61 Maximum number of strata [number] exceeded. See MCDS Engine
Limitations.
62 Mismatched number of observations for multiple
measurements.
63 Missing sample label. Further data will be
ignored.
64 More cluster size frequencies were given than
intervals
65 More distance frequencies were given than
intervals.
66 More than 10 multipliers were specified. Excess
will be ignored.
67 Multiplier value must be > 0
68 NCLASS and WIDTH setting needed for
SMEAR switch.
69 NCLASS must be > 1 and <= [max value] See MCDS Engine
Limitations.
70 NCLASS set without a WIDTH value, both must Mutually incompatible
be set. commands
71 Negative variance estimate for f0. Invalid
variance.
72 No data available to be analyzed.
73 Not a valid option with grouped data – [option] Mutually incompatible
commands
74 Number of adjustment parameters NAP is greater
than maximum possible for this model = [max.
number possible for this model]
75 Number of starting values exceeds [number of
parameters]
76 One or more estimated CDF is less than zero.
Value has been set to zero.
77 Only a single covariate may be specified as the
cluster size covariate.
78 Re-sampling unit Strata, Sample, Obs not set.
Bootstrap will not be attempted.
79 SE for multiplier must be non-negative number.
80 SF must be between 0.0 and 1.0
81 SMP_LABEL needed in data
82 Specified variance for N must be >= 0. Value
entered was [value]
83 Standard Error for CUERATE must be a positive
number.
84 Strata can not be re-sampled if Density by Strata
85 The [command] command is only valid when
there are covariates.
86 The FIELDS command must be specified before
the [command] command.
87 The number of adjustments specified by ORDER Mutually incompatible
[value] does not match the number specified by commands
NAP [value]
88 The number of adjustments specified by ORDER
exceeds the maximum of [maximum]
89 The number of covariates which are factors
cannot exceed the total number of covariates to
be included.
90 The significance level for the test must be in the
interval 0-1.
91 The specified factor covariate must match one of
the covariates given in the FIELDS command.
92 The total number of cluster sizes does not match
the total number of distances.
93 The truncation proportion must be >=0 and <1.
94 The width must be a positive value.
95 There must be 2 parameters for smearing.
'SMEAR=angle,dist'
96 There were [value] covariates specified in the
ESTIMATE command, but only [value] specified
in the DATA command. The latter must be >= the
former.
97 Unexpected end-of-file encountered.
98 Unknown units for distance measurement. Need
to set conversion factor or correct input.
99 Unknown units for distance/area conversion.
Need to set area conversion factor or correct
input.
100 Unknown units for distance/length/area
conversion. Need to set conversion factor or
correct input.
101 Valid values for LOOKAHEAD are 1 – [max
lookahead]
102 Values for lower parameter bounds must be
smaller than values for upper parameter bounds!
103 When cluster size is a covariate no stratification
is allowed.
104 When covariates are present, only the half-normal
or hazard-rate keys may be specified.
105 When using the FACTOR command, the switches
/LEVELS, and /LABELS must be specified.
106 WIDTH must be given with NCLASS value.
107 You have requested more estimators than the
maximum of [maximum estimators]
108 You must reset FIELDS or OPTIONS
109 PVALUE must be between 0.0 and 1.0
110 EPS must be BETWEEN 0.1 and 1.0E-8.
111 Exceeded maximum array storage
112 Maximum number of observations – [max obs]
exceeded. Procedure terminated.
113 Negative variance estimate for f0. Invalid
variance.
114 Negative variance estimate for parameter. Invalid
variance.
115 Negative variance estimate for part of f0|Z.
Invalid variance.
116 Bad bootstrap sample.
30a Problems with incomplete gamma fct (GSER)
30b Problems with incomplete gamma fct (GSF)
30c Problems with incomplete gamma fct (GAMMP)
x or a <=0
31 Invalid degrees of freedom or chi-square value.
32 variance of cluster sizes is <= 0
33 Normalizing factor MU1=0.
34 Area under PDF=0 for at least one observation.
35 Confidence limit ZVAL is NaN. Confidence
limits set to 0. DF= [value]
36 F0NOBS= [value]
37 N= [value]
38 WIDTH has been erroneously set to zero
39a Standard error is 0 for one of the parameters.
39b Standard error is 0 for parameter [parameter]
39c Cannot scale distances by sigma for uniform key
function
40 DERIVS - PJ = 0 value
41 DERIVS - MU = 0
42 PROB - MU = 0
43 Invalid N in inverse routine = [value]
44 WIDTH is zero.
45 Attempt to divide by zero in inversion.
46 compute pdist error [values]
47 compute pdist [values]
50 Mismatched strata – [values]
51 Mismatched samples – [values]
53 Mismatched modules – [values]
54 Mismatched stats – [values]
61 F0 = 0.0
62 N= [value]
70 CLEVMULT=0
71 COVLEVELSI=0
91 Problems with settings for estimation routine.
102 Could not evaluate area under CDF.
• New switch /COVARIATES in ESTIMATOR command.
Note that covariates must be the same in all ESTIMATORS in
the same run.
• ASSIGN commands not supported – assign output files
through the first 6 lines of the command file – see Header
Section.
• HELP commands no longer supported
• SQUEEZE command no longer supported
• New MULTIPLIER command
• DF added to the CUERATE command (which is, after all, just
another multiplier).
• Not recommended to use the SF command – use
MULTIPLIER instead
• Model fitting commands EPSILON and ITERATIONS
removed
• LIST command in Data section no longer supported
• In GOF /SAS and /SPLUS switches no longer supported
• VARF command no longer supported
• PICK=NONE no longer supported.
• When bootstrapping, point estimate is mean of the bootstrap
replicate point estimates.
• Changes in output format:
• Output file pages are no longer separated by page break
characters
• Page titles in the output file are surrounded with tab characters
to enable them to be easily recognized by a regular expression
parser
• Format of the stats and bootstrap stats file changed (each line
is longer) – see MCDS Engine Stats File.
• Extra output in the stats file – e.g., parameter estimates
• Bootstrap progress file added to allow the user to find out how
far the bootstrap has progressed.
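Because page titles in the output file are surrounded by tab characters, a simple regular expression is enough to pull them out. The sketch below is illustrative only — the title strings and file fragment are invented, not actual MCDS output:

```python
import re

# Page titles in the MCDS output file are delimited by tab characters,
# so a tab-anchored pattern picks them out of the plain-text stream.
TITLE_RE = re.compile(r"\t(.+?)\t")

def page_titles(output_text):
    """Return all tab-delimited page titles found in the output text."""
    return TITLE_RE.findall(output_text)

# Hypothetical fragment of an output file (titles are illustrative only):
sample = ("\tEstimation Options Listing\t\n"
          "some results...\n"
          "\tDetection Fct/Global/Model Fitting\t\n")
print(page_titles(sample))
# ['Estimation Options Listing', 'Detection Fct/Global/Model Fitting']
```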
Appendix – HT estimation of
density when probability of
coverage is unequal
The estimator is

$$\hat{N} = \sum_{i=1}^{n} \frac{s_i}{\hat{p}_i \hat{q}_i}$$

where s_i is the size of the ith cluster, p̂_i is the estimated probability of detection
of the ith cluster, and q̂_i is the estimated coverage probability of the ith cluster's
location. This formulation estimates the abundance of individuals; if interest is
instead in the abundance of clusters in the population, a "1" is substituted
for s_i in the numerator.
The job of the DHT engine is to bring together these three pieces of information
to create an estimate of abundance. In its simplest incarnation, the cluster size is
observed by the researcher, coverage probability is estimated from the survey
design (by the survey design engine in Distance prior to the conduct of the
survey), and finally the detection probability is estimated by fitting a detection
function to the observed distances.
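As a rough numeric sketch of how these three pieces combine (the function name and all numbers below are invented for illustration; this is not the DHT engine's code):

```python
def dht_abundance(sizes, p_detect, p_cover):
    """Horvitz-Thompson-like abundance estimate:
    N-hat = sum_i s_i / (p_i * q_i), where s_i is the observed cluster size,
    p_i the estimated detection probability, and q_i the estimated coverage
    probability of the i-th detected cluster. To estimate the abundance of
    clusters rather than individuals, replace s_i with 1.
    """
    return sum(s / (p * q) for s, p, q in zip(sizes, p_detect, p_cover))

# Hypothetical example: three detected clusters
sizes = [4, 1, 2]           # observed cluster sizes
p_detect = [0.8, 0.5, 1.0]  # from the fitted detection function
p_cover = [0.5, 0.5, 0.25]  # from the survey design's coverage grid
print(dht_abundance(sizes, p_detect, p_cover))
# 4/0.4 + 1/0.25 + 2/0.25, i.e. approximately 22.0
```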
As with the MRDS and DSM engines, the DHT engine is implemented as a
library in the free statistical software R. When you run a DHT analysis from
Distance, Distance creates a sequence of R commands, calls the R software,
waits for the results and then reads them back in. Therefore, before you can use
the DHT engine, you must first ensure that you have R correctly installed and
configured. For more on this, see R Statistical Software in Chapter 7 of the
Users Guide.
To produce an abundance estimate based upon unequal coverage probability in
Distance, you then need to set up the project appropriately and include data in
the correct format – see Setting up a Project for DSM Analysis. You must next
create one or more model definitions using the MRDS analysis engine, and
associate these model definitions with analyses to derive detection probabilities
for each object detected. For more about the basics of setting up analyses, see
Chapter 7 - Analysis in Distance. More details of the various models available
in the MRDS engine are given in Defining MRDS Models, and a detailed
description of the options available in the Model Definition Properties pages for
this engine is given in the Program Reference pages Model Definition Properties
- MRDS. After deriving detection probabilities, the coverage probability
associated with each detected object is derived from the coverage grid
constructed by the survey design engine.
In this chapter we also provide some analysis guidelines, give a list of the output
the engine can produce and cover various miscellaneous topics.
If you are familiar with the R software, you can run the DHT engine
directly from within R, bypassing the Distance interface altogether. For more
information, see Running the DSM Analysis Engine from Outside Distance.
Data are entered into the Observation layer of the Distance project in the usual
fashion, with the additional requirement that the location of each detection (x
and y coordinates) is also recorded in the Observation layer.
The easiest way to set up a new project for a DHT analysis is using the Setup
Project Wizard.
• In Step 1, under I want to: select Analyze a survey that has been
completed.
• Be sure to tick the box indicating the Project will contain
geographic information at the bottom of the Step 1 screen
• In Step 3, under Observer configuration, select Double observer.
But see also Single Observer Configuration in the MRDS Engine.
• Follow through the rest of the wizard as usual.
Distance then creates the appropriate data fields for double observer data, and
you can then import your data using the Import Data Wizard.
Alternatively, you can create the appropriate fields by hand, and manually create
a new survey object with the appropriate observer configuration and data files.
For more about survey objects, see Working with Surveys During Analysis in
Chapter 7.
Miscellaneous DHT Analysis Topics
The cluster size field is one of the fields with a fixed name in
detection function formulae in DHT (see Translating Distance Fields into DS
and MR Covariates) – in formulae you should use the name size regardless of
the actual field name.
To run the analysis from within the R GUI (Graphical User Interface), you can
cut and paste the commands from the file in.r. To run the analysis from another
program, you can call R in batch mode – this is achieved by calling the program
Rcmd.exe, which is located within the /bin subdirectory of your R installation.
For more details, see the R for Windows FAQ (in R, type help.start() and
when a browser window opens, click on the FAQ for Windows port). For an
example of its use, see the Log tab of any DSM analysis you have run that was
not in debug mode – you should see a line of the form:
Starting engine with the following command:
C:\PROGRA~1\R\rw1091\bin\Rcmd.exe BATCH C:\temp\dst90474\in.r
C:\temp\dst90474\log.r
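A calling program might assemble that batch command line as follows. This is a hedged sketch: the helper function and the paths are hypothetical (not what Distance itself does), and actually running the command requires R for Windows to be installed:

```python
from pathlib import Path

def build_r_batch_command(r_home, in_script, log_file):
    """Assemble the Rcmd.exe BATCH command line for running an R
    script non-interactively (hypothetical helper; paths illustrative)."""
    rcmd = Path(r_home) / "bin" / "Rcmd.exe"
    return [str(rcmd), "BATCH", str(in_script), str(log_file)]

cmd = build_r_batch_command(r"C:\Program Files\R\rw1091",
                            r"C:\temp\dst90474\in.r",
                            r"C:\temp\dst90474\log.r")
print(cmd)
# A calling program would then hand cmd to, e.g., subprocess.run(cmd, check=True)
```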
Users familiar with R may wish to work inside the R GUI. The DHT engine will
be contained in the library DHT. To load the library from within R GUI, type
library(dht)
All the functions in the dht library are documented. You will be able to
open a copy of the help files from within Distance by choosing Help | Online
Manuals | DHT Engine R Help (html).
The previous topic describes how to update to a newer version of the DHT
Engine, if one is available.
When reporting results, you may want to cite the exact version (i.e.,
build number) of the library that was used in the analysis. This is stored in the
Log tab, as outlined above.
Bibliography
This section contains a list of references cited in the Users Guide. Much more
complete lists of works related to distance sampling are in Buckland et al. (2001,
2004).
• Borchers, D.L., S.T. Buckland and W. Zucchini. 2002. Estimating
Animal Abundance: Closed Populations. Springer Verlag.
• Borchers, D.L., S.T. Buckland, P.W. Goedhart, E.D. Clark and S.L.
Hedley. 1998a. Horvitz-Thompson estimators for double-platform
line transect surveys. Biometrics 54: 1221-1237.
• Borchers, D.L., W. Zucchini and R.M. Fewster. 1998b.
Mark-recapture models for line transect surveys. Biometrics 54:
1207-1220.
• Buckland, S.T., D.R. Anderson, K.P. Burnham and J.L. Laake.
1993. Distance Sampling: Estimating Abundance of Biological
Populations. Chapman and Hall, London, reprinted 1999 by
RUWPA, University of St. Andrews, Scotland.
• Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L.
Borchers and L. Thomas. 2001. Introduction to Distance Sampling.
Oxford University Press, London.
• Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L.
Borchers and L. Thomas. (eds.) 2004. Advanced Distance
Sampling. Oxford University Press, London.
• Buckland, S.T., K.P. Burnham and N.H. Augustin. 1997. Model
selection: an integral part of inference. Biometrics 53: 603-618.
• Burnham, K. P., and D. R. Anderson. 2002. Model Selection and
Multimodel Inference: A Practical Information-Theoretic
Approach. 2nd edition Springer-Verlag, New York.
• Gibbons, J.D. 1971. Nonparametric Statistical Inference. McGraw-
Hill, New York.
• Hedley, S.L., S.T. Buckland and D.L. Borchers. 2004. Spatial
distance sampling models. Pages 48-70 in Buckland, S.T., D.R.
Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers and L.
Thomas. (eds.) 2004. Advanced Distance Sampling. Oxford
University Press, London.
• Horvitz, D.G., and D.J. Thompson. 1952. A generalization of
sampling without replacement from a finite universe. Journal of
the American Statistical Association 47: 663-685.
• Innes, S., Heide-Jørgensen, M.P., Laake, J.L., Laidre, K.L.,
Cleator, H.J., Richard, P. and Stewart, R.E.A. (2002) Surveys of
AIC
Akaike's Information Criterion (AIC) is used in model selection and puts this
process into a function minimization framework. It is based on the
Kullback-Leibler "distance" between two distributions.
For more about AIC and model selection, see Burnham and Anderson (2002).
AICc
Version of AIC corrected for small sample size.
For more information, see Burnham and Anderson (2002).
analysis engine
A component within Distance that runs analyses and produces results. Different
analysis engines have different capabilities. Currently, there are three analysis
engines, one for conventional distance sampling (CDS), one for multiple
covariate distance sampling (MCDS), and one for mark-recapture distance
sampling (MRDS).
API
Abbreviation for Application Programming Interface – an interface that allows a
piece of software to be instructed to perform tasks from within a separate
software package.
CDS
See conventional distance sampling
checkbox
A box in the graphical user interface of Distance that you click on to select. A
selected checkbox displays a tick. Example:
covariate
A variable that you can use to model the detection function. Perpendicular or
radial distance is always used as a covariate, but in the Multiple Covariate
Distance Sampling (MCDS) engine, you can include other covariates, such as
cluster size, sex, platform of observation, habitat, etc.
coverage probability
The coverage (or inclusion) probability of an arbitrary location within the
survey region is the probability that it falls within the sampled portion of the
survey region.
data file
This file, always called DistData.mdb, contains information about how the data
is stored, and may contain some or all of the data itself. It is stored in the data
folder.
data folder
A folder (directory) containing survey data and related information about the
survey effort, study area boundaries, etc. Data folder names always have the
same beginning as the associated project file, but end in .dat - e.g., Ducknest.dat.
The data folder contains the data file, and one or more other files.
detection function
A function, denoted g(x), that describes the probability of detecting an object
(individual or cluster) given that it is at distance x from the transect line or point.
In Distance, the detection function is modeled using the key + series adjustment
framework described in Buckland et al. (1993, 2001).
densification
A line that is straight in one coordinate system will not necessarily be straight
when viewed in a different system. For example, the equator is not a straight
line on many maps. So, when projecting from one coordinate system to another,
a straight line must be broken into a series of smaller straight lines so that it stays
in approximately the same place in the projected coordinate system. This
process of adding vertices to a line when projecting it is called densification.
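A minimal sketch of the idea — plain linear interpolation in the source coordinate system before projecting (the helper is hypothetical, not part of Distance):

```python
def densify(p0, p1, n):
    """Insert n-1 evenly spaced vertices along the straight segment p0-p1,
    so the line can track its original path after reprojection.
    Returns n+1 points, including both endpoints."""
    (x0, y0), (x1, y1) = p0, p1
    return [(x0 + (x1 - x0) * i / n, y0 + (y1 - y0) * i / n)
            for i in range(n + 1)]

# Hypothetical segment split into 4 pieces before projecting:
print(densify((0.0, 0.0), (8.0, 4.0), 4))
# [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0), (6.0, 3.0), (8.0, 4.0)]
```

Each of the small segments is then projected separately, so the projected line stays close to the true (curved) image of the original straight line.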
design axis
User-defined line, superimposed on the survey region, that is used to orient the
samplers in zigzag designs.
dialog
A type of window in the graphical user interface of Distance. Dialog windows
are modal – that is, you cannot access any other windows in Distance until you
close the dialog.
distance project
Where all of the information about one study area is stored. A project is made
up of a project file (which ends in .dst) and a data folder (ends in .dat).
distance sampling
A group of related survey methods for estimating the density and/or abundance
of wildlife populations.
double observer
A survey protocol where two (semi-) independent observer teams perform a
distance sampling survey, and duplicate detections are identified. Under this
protocol, more advanced analysis methods (Mark Recapture Distance Sampling)
can be used where it is possible to relax the assumption of standard methods that
all animals at zero distance are seen.
For more information, see Laake and Borchers (2004). For more about how to
set up a double observer dataset in Distance, see the Users Guide chapter on
Mark Recapture Distance Sampling.
factor
Name given to a covariate that is divided into distinct classes. Examples include
sex (male / female), observer, etc.
f(0)
The value of the probability density function of observed distances, evaluated at
0 distance.
geographic coordinates
A measurement of a location on the earth’s surface expressed in degrees of
latitude and longitude.
GIS
Geographic Information System - a piece of software that can work with
geographic data.
Horvitz-Thompson
Unbiased estimator of abundance, given by
$$\hat{N}_{\mathrm{surv}} = \sum_{i=1}^{n} \frac{1}{p_i}$$

where N_surv is the number of animals in the surveyed area (i.e., the strip or
circle actually surveyed), n is the number of animals seen, and p_i is the
probability of observing the ith animal, given that it is in the surveyed region.
Given this estimate, assuming equal probability of coverage, an estimate of the
population abundance N is given by
$$\hat{N} = \frac{A}{a}\,\hat{N}_{\mathrm{surv}}$$
where A is the area of the survey region, and a is the surveyed area.
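As a toy illustration of these two formulas (all numbers invented):

```python
def ht_surveyed(p):
    """N-hat_surv = sum over detections of 1/p_i,
    with the inclusion probabilities p_i known."""
    return sum(1.0 / pi for pi in p)

def ht_total(p, A, a):
    """Scale the surveyed-area estimate up to the whole survey region,
    assuming equal coverage probability: N-hat = (A / a) * N-hat_surv."""
    return (A / a) * ht_surveyed(p)

p = [0.5, 0.5, 1.0]  # observation probabilities of the 3 animals seen
print(ht_surveyed(p))                 # 1/0.5 + 1/0.5 + 1/1.0 = 5.0
print(ht_total(p, A=100.0, a=20.0))   # (100/20) * 5 = 25.0
```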
If coverage probability is not equal, population abundance can be estimated by
$$\hat{N} = \sum_{i=1}^{n} \frac{1}{p_i q_i}$$
where q_i is the probability that the surveyed area covers the ith animal, given
its location; q_i is dictated by the survey design.
For more information, see Horvitz and Thompson (1952), Borchers et al.
(2002), Thompson (2002) and Buckland et al. (in prep.).
Horvitz-Thompson-like
Term used to describe a Horvitz-Thompson estimator, where the probability of
observing the animal is estimated, rather than known:
$$\hat{N}_{\mathrm{surv}} = \sum_{i=1}^{n} \frac{1}{\hat{p}_i}$$
This estimator is biased, although the bias is usually not large if the p_i are not
small.
See the entry for Horvitz-Thompson estimator for notation, and generalization to
the case where coverage probability is not equal.
For more information, see Horvitz and Thompson (1952), Borchers et al.
(2002), Thompson (2002) and Buckland et al. (in prep.).
inclusion probability
see coverage probability
MCDS
see multiple covariate distance sampling
MRDS
See mark recapture distance sampling
multiplier
A quantity you can use when you know your estimates are proportional to the
true abundance or density. If you know the constant of proportionality, you can
use a multiplier to get unbiased estimates. An example would be if you know
that g(0) is less than 1, but you have an independent estimate of g(0). You can
then use the multiplier (and, if you have it, the multiplier SE and DF) to correct
your estimates.
probability of detection
The probability of recording an object (individual or cluster) in the surveyed
area.
project file
A file containing the project settings, survey designs, analysis settings and
results. Project files always end in .dst – e.g., Ducknest.dst. Double-clicking a
project file opens it in Distance.
projected coordinates
A measurement of locations on the earth’s surface expressed in a
two-dimensional system that locates features based on their distance from an
origin (0,0) along two axes: a horizontal x-axis representing east–west and a
vertical y-axis representing north–south. A map projection transforms latitude
and longitude to x,y coordinates in a projected coordinate system.
radio button
A round button that you click on to select. Radio buttons usually occur in a
group, of which only one can be selected at once. An example is the constraints
group in the Conventional Distance Sampling Model Definition properties
window:
R folder
A folder (directory) containing the R object file (.RData) and image files
generated by the R statistical software package. It is located within a project’s
data folder, and is created automatically the first time an analysis is run that uses
R.
R software
According to the R web site (http://www.r-project.org), R is a language and
environment for statistical computing and graphics. In the context of Distance,
the mark-recapture distance sampling (MRDS) analysis engine is implemented
as an R library. A working copy of R is therefore required before this engine can
be run.
set
A collection of Analyses, Designs or Surveys that are displayed on the same
browser page. You usually group items together that share some properties – for
example, you could have two different Analyses Sets, one where you use
truncation and one where you do not.
shapefile
A shapefile is a standard format for storing geographic information, invented by
the GIS company ESRI. Each shapefile is actually 3 separate files: an .shp file, a
.shx file and a .dbf file. (In addition, there may be other files such as .prj files.)
Shapefiles are used to store geographic information in Distance.
single observer
The standard survey protocol where a single team of observers performs a
distance sampling survey. Under this protocol, it is necessary to assume that all
animals at zero distance are detected. cf. double observer methods.
toolbar
The collection of buttons and menus at the top of a window in Distance. An
example is the Analysis Components toolbar:
trackline
The transect line.
Index

A
About Distance 5
About Distance dialog 254
About the Users Guide 1
Acknowledgements 6
Adjustment terms
    Setting options in CDS and MCDS analysis 247
    Specifying in CDS and MCDS analysis 241
Algorithms
    MCDS engine 314
Analysis
    Introduction 71
    Preferences 181
    Stopping 163
Analysis Browser 71, 197
    CDS Results 98
    DSM Results 156
    Exporting CDS Results 98
    MRDS Results 136
Analysis Components 72, 253
Analysis Components Window 77, 253
Analysis Details 72
    CDS Results 90
    DSM Results 155, 329
    Exporting Results 99
    MCDS results 122
    MRDS Results 135
Analysis Details window 210
Analysis Details Windows 72
Analysis Engines
    About 80
    CDS Output 89
    DSM Output 154, 329
    MCDS output 122
    MRDS Output 135
    Running DSM from outside Distance 157, 330
    Running MCDS from the command line 279
    Running MRDS from outside Distance 140
Analysis Guidelines
    CDS 88
    DSM 152
    MCDS 120
    MRDS 134
Analysis Results
    Output from CDS analyses 89
    Output from DSM analyses 154, 329
    Output from MCDS analyses 122
    Output from MRDS analyses 135
Authors 6

B
Backing up projects 36
Bibliography 333
Binned Data
    CDS 100
Books
    Bibliography 333
    Distance sampling reference books 3
Bootstrap
    Overview in CDS 107
Bootstrap file
    MCDS engine file format 313
Bootstrap progress file
    MCDS engine file format 313
Bounds on parameters
    Setting in CDS and MCDS 244
    Specifying in DSM 159
    Specifying in MRDS 142

C
Calculating Probability of Detection 95
CDS See Conventional Distance Sampling
Citation for Distance 3
Cleaning the Windows Temp folder 83
Cluster Size tab
    CDS and MCDS 246
Clustered populations
    In CDS 102
Clusters
    About 102
    CDS 102
    Data Filter options 233
    Missing values 102
    Model Definition options 246
    Zero cluster size 102
Column Manager dialog 258
Compacting a project 39
Components that make up the Distance software 261
Constraints
    Setting in CDS and MCDS 244
Control
    Specifying control parameters in MRDS Model Definition 252
Conventional Distance Sampling 87
    About 87
    Analysis guidelines 88
T
Temp folder
Cleaning 83
Template, using a project as 35
Troubleshooting 161
Truncation 235
U
Units 236
Unknown Study Area Size 112
Use Agreement 5
Using a previously fitted detection function to
estimate density in MRDS 139
V
Valid field names 276
Variance Estimation
CDS 107
MRDS 137
Variance tab
CDS and MCDS 247
MRDS 252, 253
Version
Distance 254
MRDS Engine 141, 158, 331
W
Warnings
In CDS and MCDS engine 162
Web site 4
Welcome to the Users Guide 1
What is Distance? 5
Which geographic projection? 55
Wizards
Data Entry Wizard 170
Setup Project Wizard 165