6/3/24, 2:31 PM occupancyTuts: Single Season Occupancy in RPresence

occupancyTuts: Single Season Occupancy in RPresence

Prerequisites

tutPrePost(tut = "single_season", type = "pre")

The tutorial prerequisites are listed below. To run a tutorial, use learnr::run_tutorial("name", package =
"occupancyTuts").

name description

software An introduction to primary software used in this package, RPresence.

Suggested/Potential Readings
A friendly overview of the occupancy paradigm is:

Bailey, L., and M. Adams. 2005. Occupancy models to study wildlife. USGS Fact Sheet 2005-3096.
(https://pubs.usgs.gov/fs/2005/3096/fs20053096.pdf)

The original paper by Darryl MacKenzie et al. is:

MacKenzie, D. I., Nichols, J. D., Lachman, G. B., Droege, S., Royle, J. A., & Langtimm, C. A. 2002.
Estimating site occupancy rates when detection probabilities are less than one. Ecology
83(8):2248–2255.

Background
Finally! This tutorial will help you understand how to run a basic single season occupancy model in
RPresence.

Let’s recap the main concepts we’ve learned so far. For occupancy models, the goal is to determine the
probability that a site is occupied by a target of interest, usually a species, given that detection is
imperfect. Thus, we are interested in two things. First, is the target present at a site? Second, for targets
that are present, is the target detected by an observer? In this sense, occupancy models are
“hierarchical” models in that there is a process that first establishes the distribution of targets (the true
presence or absence of the target), and then there is an observation process of the resulting distribution
(the detection or non-detection of the target).

As researchers, the data we have in hand are the latter – the detection or non-detection of target species
across surveys. To get at detection, sites must be surveyed at least twice under the assumption that true
occupancy patterns are fixed between the surveys. Survey results are summarized as “encounter
histories” as shown below.

For example, in surveying trolls at 75 sites in Middle Earth, 45 sites had a history of 000, 1 site had a
history of 010, and so on. From the detection and non-detection information in the table, we try to
understand the former – the true presence or absence of the target.

You may see the exact same data presented in unsummarized fashion, where each row is a “site” and
the frequency is 1. For example, the table below shows the encounter histories 1-1-1 (n = 15), 0-1-1 (n =
1), and 1-0-1 (n = 7) on a row-by-row basis:

Frequency of troll encounter histories for 3 surveys.

Survey 1 Survey 2 Survey 3 History Frequency


1 0 0 0 000 1
1.1 0 0 0 000 1
1.2 0 0 0 000 1
1.3 0 0 0 000 1
1.4 0 0 0 000 1
1.5 0 0 0 000 1
1.6 0 0 0 000 1
1.7 0 0 0 000 1
1.8 0 0 0 000 1
1.9 0 0 0 000 1
1.10 0 0 0 000 1
1.11 0 0 0 000 1
1.12 0 0 0 000 1
1.13 0 0 0 000 1
1.14 0 0 0 000 1
1.15 0 0 0 000 1
1.16 0 0 0 000 1
1.17 0 0 0 000 1
1.18 0 0 0 000 1
1.19 0 0 0 000 1
1.20 0 0 0 000 1
1.21 0 0 0 000 1
1.22 0 0 0 000 1
1.23 0 0 0 000 1
1.24 0 0 0 000 1
1.25 0 0 0 000 1
1.26 0 0 0 000 1
1.27 0 0 0 000 1
1.28 0 0 0 000 1
1.29 0 0 0 000 1
1.30 0 0 0 000 1
1.31 0 0 0 000 1
1.32 0 0 0 000 1
1.33 0 0 0 000 1
1.34 0 0 0 000 1
1.35 0 0 0 000 1
1.36 0 0 0 000 1
1.37 0 0 0 000 1
1.38 0 0 0 000 1
1.39 0 0 0 000 1
1.40 0 0 0 000 1
1.41 0 0 0 000 1
1.42 0 0 0 000 1
1.43 0 0 0 000 1
1.44 0 0 0 000 1
2 0 0 1 001 1
3 0 1 0 010 1
4 0 1 1 011 1
5 1 0 0 100 1
5.1 1 0 0 100 1
6 1 0 1 101 1
6.1 1 0 1 101 1
6.2 1 0 1 101 1
6.3 1 0 1 101 1
6.4 1 0 1 101 1
6.5 1 0 1 101 1
6.6 1 0 1 101 1
7 1 1 0 110 1
7.1 1 1 0 110 1
7.2 1 1 0 110 1
8 1 1 1 111 1
8.1 1 1 1 111 1
8.2 1 1 1 111 1
8.3 1 1 1 111 1
8.4 1 1 1 111 1
8.5 1 1 1 111 1
8.6 1 1 1 111 1
8.7 1 1 1 111 1
8.8 1 1 1 111 1
8.9 1 1 1 111 1
8.10 1 1 1 111 1
8.11 1 1 1 111 1
8.12 1 1 1 111 1
8.13 1 1 1 111 1
8.14 1 1 1 111 1

👉🏾 It’s important to recognize that either depiction of the encounter histories and their frequencies is OK.
Why wouldn’t we want to use the shorter, summarized format? Most likely, each site will have a unique
combination of covariates, so each site must appear on its own row. We will use the second (unsummarized)
format for this tutorial.
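
The two layouts carry the same information, which a few lines of base R can confirm. The sketch below rebuilds the unsummarized histories from the frequencies in the troll table above and then collapses them again with table(). The object names (freq, histories, summarized) are illustrative only and are not part of occupancyTuts.

```r
# Frequencies of each encounter history, taken from the troll table above
freq <- c("000" = 45, "001" = 1, "010" = 1, "011" = 1,
          "100" = 2,  "101" = 7, "110" = 3, "111" = 15)

# Unsummarized form: one element per site, each with an implied frequency of 1
histories <- rep(names(freq), times = freq)
length(histories)   # number of sites

# Summarized form: count how many sites share each history
summarized <- table(histories)
summarized
```

Going from the long form to the short form loses nothing as long as sites differ only in their encounter history; once site-level covariates enter the picture, the row-per-site form is the one to keep.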

The approach for untangling detection probability from occupancy probability was described in a seminal
paper by Darryl MacKenzie and colleagues: MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege,
J. A. Royle, and C. A. Langtimm. 2002. Estimating site occupancy rates when detection probabilities are less
than one. Ecology 83:2248-2255. The approach treats the different encounter histories as the faces of a
die.

Each die face has a probability of being observed, which in turn is a composite of the parameters ψ (the
probability that a site is occupied) and p_i (the probability that the target is detected on survey i, given
that the site is occupied). For trolls that were surveyed 3 times in a single season, the multinomial
log-likelihood function

$$
\log L(p_i \mid n, y_i) \propto y_{111}\log(p_{111}) + y_{011}\log(p_{011}) + y_{101}\log(p_{101}) + \dots + y_{000}\log(p_{000})
$$

becomes:

$$
\begin{aligned}
\log L(p_i \mid n, y_i) \propto{} & y_{111}\log(\psi \, p_1 \, p_2 \, p_3) \\
 & + y_{011}\log(\psi \, (1 - p_1) \, p_2 \, p_3) \\
 & + y_{101}\log(\psi \, p_1 \, (1 - p_2) \, p_3) + \dots \\
 & + y_{000}\log(\psi \, (1 - p_1)(1 - p_2)(1 - p_3) + (1 - \psi))
\end{aligned}
$$

The main task now is to find the most likely values of the parameters ψ and p_i, given the encounter
histories and their frequencies. We base our scientific inferences on these results. In this tutorial, we aim
to show you how to run a single season occupancy model with the R package RPresence.
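
To make the likelihood concrete before handing the job to RPresence, here is a minimal base-R sketch that codes the 3-survey log-likelihood by hand and minimizes -2 times it with optim(). It uses the history frequencies from the troll table shown earlier. This is a hand-rolled illustration, not how RPresence is implemented; the function name neg2ll and the choice of the BFGS optimizer are assumptions made for this example.

```r
# Encounter-history frequencies from the troll table (3 surveys, 75 sites)
freq <- c("000" = 45, "001" = 1, "010" = 1, "011" = 1,
          "100" = 2,  "101" = 7, "110" = 3, "111" = 15)

# Detection matrix: one row per distinct history, one column per survey
h <- do.call(rbind, lapply(strsplit(names(freq), ""), as.integer))

# -2 * log-likelihood. Parameters live on the logit scale so the optimizer
# can search without bounds: theta = (logit psi, logit p1, logit p2, logit p3)
neg2ll <- function(theta) {
  psi <- plogis(theta[1])
  p   <- plogis(theta[-1])
  # Pr(history | site occupied) = product over surveys of p^h * (1 - p)^(1 - h)
  pr_occ <- apply(h, 1, function(x) prod(p^x * (1 - p)^(1 - x)))
  # An all-zero history can also come from an unoccupied site
  never <- rowSums(h) == 0
  pr    <- psi * pr_occ + (1 - psi) * never
  -2 * sum(freq * log(pr))
}

# Start every parameter at logit 0 (probability 0.5) and minimize
fit <- optim(rep(0, 4), neg2ll, method = "BFGS")

# Back-transform the estimates to probabilities
est <- plogis(fit$par)
names(est) <- c("psi", "p1", "p2", "p3")
round(est, 3)
fit$value   # the minimized -2 log-likelihood
```

The estimated ψ should land a little above the share of sites with at least one detection, because some occupied sites are expected to produce an all-zero history.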

Objectives
To become familiar with the RPresence package.

To run the single-season occupancy model in RPresence.


To understand the RPresence output.

The trolls_ss dataset


To use RPresence, use the library() function to load it into R. This makes all of the package’s
functions and datasets available for use. We will also make use of some functions in the tidyr package for
some light data wrangling, so we’ll load that now too. [Note: these packages were pre-loaded when you
launched this tutorial.]

library(RPresence)
library(tidyr)

For this tutorial, we’ll work with a new dataset that is included in occupancyTuts. The dataset is called
trolls_ss, and was loaded when you launched the tutorial.

Enter code below to view the first 6 records of the trolls_ss dataset.

The head() function shows the first 6 rows of the troll dataset. Each site in Middle Earth has a unique
ID, such as “1085441”. The dataset shows the detection status of trolls for each site on a given date by a
given observer. For instance, the site in the first record was surveyed on 1200/01/10 by Legolas, who did
not detect any trolls (“detection” = 0). Scroll to the right if this does not appear on your screen. Legolas
surveyed this same site the next day, 1200/01/11, when he detected a troll, as did Gollum on 1200/01/12.
Perhaps Gollum spotted the same troll as Legolas, but perhaps not. The data only convey the detection or
non-detection of a target.

The dataset also contains conditions associated with each site. For example, the column “city” indicates
whether a site is within 30 km of any Middle Earth city. Other columns describe additional site
characteristics. We’ll work with these columns in future tutorials.

Add code to look at the structure of this dataset:

This dataset has 375 observations (rows) and 15 columns. Look for this information in the str() output.
We will be running a single season occupancy analysis of trolls, so we need to ensure that all surveys
were conducted in a short enough time frame that the presence or absence of trolls at a site
does not change between surveys. Enter code below to inspect the dates on which trolls were
surveyed. The unique() function will come in handy here.

Great! Trolls were surveyed in the year 1200 on 5 days, and we will assume that a site that was occupied
on day 1 remained occupied through day 5, and a site that was unoccupied on day 1 remained
unoccupied through day 5. This is a key assumption of the single-season occupancy model.

For this tutorial, we can drop several columns from the dataset. Enter code below to trim the dataset
using dplyr functions, retaining the columns “siteID”, “ValianDate”, and “detection”. Change the site
column to a character, and name the resulting object trolls_tut. Then show the first five records. You
can use base R functions if you prefer.

To analyze these data in RPresence, we need to convert the data from “long” format (one row for each
observation at a site and survey) to “wide” format (where each site’s survey results are provided in a
single row). The spread() function in the package tidyr can help with this. Let’s look at this function’s
arguments:

args("spread")

## function (data, key, value, fill = NA, convert = FALSE, drop = TRUE,
## sep = NULL)
## NULL

The first argument is the dataset. Easy: we will pass the dataset trolls_tut to the function. The second
argument is the “key”, which identifies the column whose values will become the new column names.
We would like our dates to be spread across columns, so our key will be ValianDate. The third argument
is “value”, which identifies the column whose values fill in each new date column. We would like to fill in
the detection data, which consist of 0’s and 1’s.

Try it! Convert the trolls dataframe from long format (1 record per site and detection survey) to detection-
history format (1 record per site, 1’s and 0’s in survey columns). Name your output “eh”, short for
“encounter history.” Then print the 1st 10 rows of the “eh” dataframe.
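
If the long-to-wide idea is new, it may help to see it on a tiny example first. The sketch below uses a made-up three-site dataset (toy_long is hypothetical and is not trolls_ss) and base R's tapply() so that it runs without extra packages; the analogous tidyr call is shown in a comment.

```r
# Toy long-format data (hypothetical sites and detections, not trolls_ss)
toy_long <- data.frame(
  siteID     = rep(c("A", "B", "C"), each = 3),
  ValianDate = rep(c("1200/01/10", "1200/01/11", "1200/01/12"), times = 3),
  detection  = c(0, 1, 1,   0, 0, 0,   1, 0, 1)
)

# Long to wide: rows become sites, columns become survey dates.
# With tidyr loaded, the analogous call would be
#   spread(toy_long, key = ValianDate, value = detection)
toy_wide <- tapply(toy_long$detection,
                   list(site = toy_long$siteID, date = toy_long$ValianDate),
                   FUN = sum)
toy_wide

# Collapse each row into an encounter-history string such as "011"
toy_hist <- apply(toy_wide, 1, paste, collapse = "")
toy_hist
```

Each site now occupies a single row, and a site-date combination that was never surveyed would show up as NA rather than 0.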

This dataset is now in “unsummarized” form, where each row gives the encounter history for each site.
The frequency of each site is not actually part of the dataset, but is assumed to be 1 (each row in the
dataset provides the encounter history for a single site). This dataframe is a typical occupancy dataset.

At this point, we have finished wrangling and are ready to move to the next step.

RPresence createPao()
The basic input for nearly every analysis in RPresence is an R object of class “pao”, which stands for
“presence-absence object”. A “pao” object is created with the createPao() function. The output of this
function is a large R object that contains all of the information needed to run an analysis in RPresence.

👉🏻 The createPao() function creates an input object for RPresence which contains the encounter history
matrix as well as other variables needed for analysis. Running an occupancy analysis is really a two-step
process. First, you must create an object of class “pao”, and this object in turn is passed to the occMod()
function, which does the actual analysis.

It is well worth your time to look at this function’s help page, for example with help("createPao").

The function’s description tells us that it “Creates a pao data object from R variables”. The function has
several arguments, and many are populated with default values as indicated by an equal sign.

Use the args() function to show the arguments of the createPao() function. If an argument has a default
value, you can spot it by looking for an equals sign.

Let’s explore the first argument, which is the only argument that does not have a default value:

data = data frame containing the detection (=1) and non-detection (=0) data with observations from
a sampling unit along a row, and surveys in the columns. Missing values are allowed (=NA).

👉🏿 In using R functions, any R function in fact, it is very important to know if the function is using default
arguments, and if so, what they are.

Have a look at the other default values in createPao() , in particular the argument named “unitnames”.
These are the names of the sites, and as you can see the default will paste the word “unit” with the row
number to generate default site names. Our Middle Earth site names, however, are stored in column 1 of
the dataframe, eh. The argument named “title” also has a default that could be updated to something
more meaningful.

Create the pao object using the “eh” dataframe, recalling that the site names are stored in column 1
and the survey results are stored in columns 2-6. Name the pao object, “trolls_pao”, and give it the
title, “Trolls of Middle Earth Single-Season Occupancy Data”. Hint: The data should include only
columns with 0’s and 1’s (not the ID field).

Let’s have a look at the class of the returned object:

class(trolls_pao)

## [1] "pao"

This object is of class “pao”, and such an object is the primary input for running an occupancy analysis in
RPresence. The RPresence function, is.pao() will test if an object has this class:

is.pao(trolls_pao)

## [1] TRUE

Now, let’s look at the structure of this object:

# look at the structure of the returned object
str(trolls_pao)

## List of 18
## $ nunits : int 75
## $ nsurveys : int 5
## $ nseasons : int 1
## $ methods : num 1
## $ det.data :'data.frame': 75 obs. of 5 variables:
## ..$ 1200/01/10: int [1:75] 0 1 0 0 0 0 0 0 0 0 ...
## ..$ 1200/01/11: int [1:75] 0 0 0 0 1 0 0 0 0 0 ...
## ..$ 1200/01/12: int [1:75] 0 0 0 0 1 0 0 0 0 0 ...
## ..$ 1200/01/13: int [1:75] 0 1 0 0 0 0 0 0 0 0 ...
## ..$ 1200/01/14: int [1:75] 0 0 0 0 0 0 0 0 0 0 ...
## $ nunitcov : int 0
## $ unitcov :'data.frame': 0 obs. of 0 variables
## $ nsurvcov : int 1
## $ survcov :'data.frame': 375 obs. of 1 variable:
## ..$ SURVEY: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ nseasncov : int 1
## $ seasncov :'data.frame': 75 obs. of 1 variable:
## ..$ SEASON: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
## $ nsurveyseason: int 5
## $ intervals : num [1:5] 1 1 1 1 1
## $ title : chr "Trolls of Middle Earth Single-Season Occupancy Data"
## $ unitnames : chr [1:75] "1020724" "103396" "1034007" "108217" ...
## $ surveynames : chr [1:5] "1-1" "1-2" "1-3" "1-4" ...
## $ paoname : chr "pres.pao"
## $ frq : num [1:75] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "class")= chr "pao"

This object is a list of 18 elements. Each list element stores specific information. For example:

The list element named “nunits” indicates that we are analyzing data from 75 sites. Each site was
surveyed 5 times (“nsurveys”) in a single season (“nseasons”) with a single method (“methods”).
This information conveys that the data are sufficient for running a single season occupancy model.

The detection-nondetection data are stored as a dataframe in the list element named “det.data”.

The list elements named “nunitcov” and “unitcov” store the number and values of site level
covariates. We’ll learn about these in other tutorials, but these are variables that may affect site
occupancy and/or detection, such as the proportion of forest, whether the site is near a hill, and if
stonefields are present in the site.

The list elements “nsurvcov” and “survcov” store survey level covariates. Survey level
covariates are variables that may affect detection, such as who conducted the survey and the
weather conditions in which the survey was conducted. Notice that createPao() automatically
created a survey covariate named SURVEY.

👉🏽 A pao object is stuffed full of the information needed to run an occupancy analysis. Take time to inspect
this object now.

The authors of RPresence wrote a function that will show a summary of a pao object. Let’s use it now:

summary(trolls_pao)

## paoname=pres.pao
## title=Trolls of Middle Earth Single-Season Occupancy Data
## Naive occ=0.4
## nunits nsurveys nseasons nsurveyseason methods
## "75" "5" "1" "5" "1"
## nunitcov nsurvcov
## "0" "1"
## unit covariates :
## survey covariates: SURVEY

The summary() function for an object of class “pao” will show critical pieces from the list we explored.
One juicy bit of information shown in the summary() output is the naive occupancy rate.
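
The naive occupancy rate is easy to reproduce by hand. The sketch below assumes the usual definition, the proportion of sites with at least one detection; the source does not spell out how RPresence computes the value, so treat this as an illustration on a small made-up detection matrix (det is hypothetical).

```r
# Hypothetical detection matrix: 5 sites (rows) by 3 surveys (columns)
det <- matrix(c(0, 0, 0,
                0, 1, 0,
                0, 0, 0,
                1, 1, 1,
                0, 0, NA),
              ncol = 3, byrow = TRUE)

# A site counts as "naively occupied" if the target was detected at least once
detected_once <- rowSums(det, na.rm = TRUE) > 0

# Naive occupancy: share of sites with at least one detection
naive_occ <- mean(detected_once)
naive_occ
```

Because it ignores imperfect detection, the naive rate treats every all-zero history as an unoccupied site, which is exactly the problem the occupancy model is meant to address.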

Now, let’s look at a couple of elements in the pao object. We can extract elements from the list by name.

Enter code below to return the first 10 records of the detection data, which are stored in trolls_pao list
element named “det.data”:

So the “det.data” element stores the encounter histories.

We didn’t include any site or survey covariates for this tutorial, such as presence of stonefields or
observer. However, RPresence added a default survey covariate named “SURVEY”. Survey covariates
are needed for each site (n = 75) and survey (n = 5). RPresence transforms the survey covariates into a
single column vector. Our built-in “SURVEY” covariate was transformed from a 75 X 5 dataframe (75
sites by 5 surveys) to a 375 X 1 dataframe. We can peek at this structure with the code below, noting that
you’ll need to scroll to see how the survey numbers are stored.

The first 75 rows of this covariate correspond to the 75 sites and survey number 1. The next 75
correspond to the 75 sites for survey 2, etc. If we printed every 75th row, we would get the values 1, 2, 3,
4, 5. It’s not a user-friendly format, but was created by authors of the function to indicate which survey
each detection-history cell belongs to.
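
The stacking described above can be mimicked in base R without touching the pao object. This sketch builds a stand-in 75 x 5 matrix of survey numbers and stacks its columns; the object names (survey_mat, survey_stacked) are illustrative, not RPresence internals.

```r
n_sites   <- 75
n_surveys <- 5

# Survey number for every cell of a 75 x 5 detection matrix
survey_mat <- matrix(rep(1:n_surveys, each = n_sites), nrow = n_sites)

# Stack the columns on top of one another (column-major order)
survey_stacked <- factor(as.vector(survey_mat))
length(survey_stacked)

# Every 75th element marks the end of one survey's block
every_75th <- survey_stacked[seq(n_sites, n_sites * n_surveys, by = n_sites)]
as.integer(as.character(every_75th))
```

Any survey covariate you supply yourself, such as observer, needs to line up with the detection data in the same site-within-survey order.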

We have now created our “pao” object and studied its components in depth. The next step is to pass that
object to the function, occMod() .

RPresence occMod()
At last! Now that we have created our input pao object, we can run the single season occupancy analysis
and obtain multinomial maximum likelihood estimates for occupancy and detection probabilities (ψ, p1,
p2, p3, p4, and p5).

👉🏼 In RPresence, the main function for running occupancy analyses is the occMod() function. This is a
workhorse function and we will re-visit it in many future tutorials. If you haven’t guessed, our first stop is
the function’s help page, help("occMod").

This function fits a wide variety of occupancy models using program Presence. Let’s look at the
arguments for occMod():

args(occMod)

## function (model, cov.list = NULL, data, type = "so", conf = 0.95,
##     modname = NULL, outfile = NULL, fixed = NULL, initvals = NULL,
##     modfitboot = NULL, VCoutopt = "realbetavc", noDerived = F,
##     randinit = 0, maxfn = "32000", quiet = 1, limit.real = 0,
##     threads = 1, param = NULL, seed = 0, miscopts = NULL, ...)
## NULL

There are several arguments to the occMod() function, and many have default values. Can you spot
them? We can view the definitions and defaults for the arguments by using the help function
(help("occMod")), but here’s an overview of the ones needed for this tutorial:

model = list of formulas, one for each parameter. In general, we need a formula for ψ and a
formula for p.

data = pao data object (object of class pao).

type = a string indicating the type of occupancy model to fit. In this tutorial, we are fitting a single
season occupancy model, which has the type of “so” (static occupancy).

There are other arguments too, and we’ll introduce them as needed.

All of occMod()’s arguments have default values except two: model and data. The data argument is the pao
object we created with createPao() in the previous section.

Let’s discuss the model argument next, which is a list that provides a formula for each parameter. For
single season analyses, we need a formula for ψ and a formula for p. To run models with RPresence,
there is no need to write or code the likelihood functions as in the multinomial examples. Since you’re not
writing a function to do it, there needs to be a way to specify a model. This is done in a similar fashion to
the way generalized linear models (GLMs) are specified in R: by formula. In R, model formulas are
typically specified as

dependent variable ∼ independent variable 1 + independent variable 2+. . .

This formula describes the relationship between the independent variables (e.g., site or survey
covariates) and the dependent variable (e.g., occupancy or detection parameters). The simplest
formula for a parameter would be something like “psi ~ 1”. Here, we don’t have any independent
variables, only a constant which we call an “intercept”. Since the right side of the formula is a constant,
the parameter psi will be estimated as a constant (i.e., ψ will be the same for all sites).

For detection, we can create a model where detection probability (p) is the same across all 5 surveys,
forcing p1 = p2 = p3 = p4 = p5. This formula would look something like “p ~ 1”. Again, the meaning is that
there are no independent variables, only a constant which is called an “intercept”.

Alternatively, a detection model might specify that each pi differs according to the survey number. This
number, if you recall, is stored in a dataframe in the pao object’s list element named “survcov”. This
dataframe has one column called SURVEY:

As a friendly reminder:

str(trolls_pao$survcov)

## 'data.frame': 375 obs. of 1 variable:
## $ SURVEY: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...

We can use the formula “p ~ SURVEY” to denote a model in which we estimate a unique detection
probability p for each survey. Since we have 5 troll surveys, we will be estimating unique values for p1,
p2, p3, p4, and p5.
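
One way to see what the two detection formulas imply is to look at the design matrices that base R builds from them. The sketch below uses model.matrix() with R's default treatment contrasts on a stand-in SURVEY factor; that RPresence's own design matrices follow the same logic is an assumption here, not something the source confirms.

```r
# Stand-in for the SURVEY covariate: 75 sites x 5 surveys, stacked by survey
SURVEY <- factor(rep(1:5, each = 75))

# Intercept-only formula: one column, so a single shared parameter
X_const <- model.matrix(~ 1, data = data.frame(SURVEY))
ncol(X_const)

# Survey-specific formula: an intercept plus four offsets, five columns in all
X_survey <- model.matrix(~ SURVEY, data = data.frame(SURVEY))
colnames(X_survey)
unique(X_survey)   # one distinct design row per survey
```

Under this coding, the intercept describes survey 1 and each remaining column shifts the intercept for one later survey, which still yields five distinct detection probabilities.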

Let’s prepare to run the function occMod() . The required inputs will be:

“data” = the pao object, which is “trolls_pao”.

“model” = a list of models. We will need to create a list that specifies the formula for ψ and a
formula for p. We can create this list as “list(psi ~ 1, p ~ SURVEY)”.

“type” = the type argument to signal which type of occupancy model we are fitting. Here, we will be
fitting a single-season occupancy model, also known as a “static occupancy” model, or “so” for
short.

Let’s try it. Enter code to run a single season occupancy model where occupancy is the same for all
sites (intercept model), and detection is a function of survey. Save the result as a new R object called
“psi_pSURVEY”. You’ll need to provide the arguments, model, data, and type to the occMod()
function. Then display the summary of the returned object.

Congratulations! You have run your very first occupancy analysis in RPresence! We have come a long
way since our first tutorial.

The summary() function returns information that the function authors (Darryl MacKenzie and Jim Hines)
thought was most relevant to return to the function user (you!). Let’s step through the summary output.

Here we can see the model name “psi()p(SURVEY)”, which occMod() created by default because
we did not intentionally supply a name for this model. (A convention for naming models is to list the
parameter, then list the independent variables which affect the parameter in parentheses.)

We also see AIC and the -2 log-likelihood value. We will introduce AIC in a future tutorial, but the
model’s log-likelihood should ring a bell. For this particular single season model with 5 surveys, the
multinomial log-likelihood is:

$$
\begin{aligned}
\log L(p_i \mid n, y_i) \propto{} & y_{11111}\log(\psi \, p_1 \, p_2 \, p_3 \, p_4 \, p_5) \\
 & + y_{01111}\log(\psi \, (1 - p_1) \, p_2 \, p_3 \, p_4 \, p_5) \\
 & + y_{10111}\log(\psi \, p_1 \, (1 - p_2) \, p_3 \, p_4 \, p_5) + \dots \\
 & + y_{00000}\log(\psi \, (1 - p_1)(1 - p_2)(1 - p_3)(1 - p_4)(1 - p_5) + (1 - \psi))
\end{aligned}
$$

If we plug in any values for the parameters, the log-likelihood is calculated. -2 times this result is
the -2 log-likelihood value.


The “num.par” in the summary() output shows that 6 total parameters have been estimated (count
them: ψ and p1 through p5).

The warning is important: “Numerical convergence may not have been reached. Parameter
estimates converged to approximately 7 significant digits.” Remember, we are trying to find
combinations of parameters (betas) that minimize the -2 LogLike. In the process of optimization,
many calculations are performed and quantities can be very small. At some point, computer
“round-off” error occurs. For example, the computer cannot represent the number 1/3 exactly, but
stores it as something like 0.3333333333333. After thousands of calculations, these round-off
errors can build up and the result will be inexact. In this case, the warning states that the result of
the optimization has 7 significant digits, so no worries here.

A few things are worth highlighting.

👉🏽RPresence does not maximize the log-likelihood function to find the MLEs; it minimizes the -2*log-
likelihood value, and that is the value that is returned by the function. See the “optimization” tutorial if you
would like to learn more about how optimization works in general.

👉🏻 If the number of significant digits is less than 3, it indicates an unstable likelihood function and
estimates might be suspect. If so, there are some steps you can take to resolve it. The usual cause of the
problem is fitting a model which is too complicated for the data. The solution in that case is to try a less-
complicated model, or include more data. With complicated models, the optimization routine may benefit
by having better starting values for the parameters. Starting values can be specified as an argument to
occMod() , or you can try a sequence of random starting values using the “randinit” argument.

👉🏾 Keep in mind the big picture! This model provides our best estimate of troll occupancy and detection
probabilities, along with the uncertainty in these estimates.

Of course, the summary() function is returning only a summary of the object, psi_pSURVEY. This object,
in fact, is crammed with information, and we will need to know how to retrieve key pieces of the output
that are of interest (such as how does the estimated psi compare to the naive estimate of psi). For more
details of the occMod() function output, let’s examine the structure of the returned model.

# look at the structure of the occMod() function's output
str(psi_pSURVEY, max.level = 1)

## List of 17
## $ modname : chr "psi(1)p(SURVEY)"
## $ model :List of 2
## $ dmat :List of 2
## $ data :List of 18
## ..- attr(*, "class")= chr "pao"
## $ outfile : chr "del_8164_99428.out"
## $ neg2loglike: num 255
## $ npar : int 6
## $ aic : num 267
## $ beta :List of 5
## $ real :List of 2
## $ derived :List of 1
## $ gof :List of 4
## $ warnings :List of 2
## $ version :List of 2
## $ Version : chr "2.15.2"
## $ dateTime : POSIXct[1:2], format: "2024-05-28 10:26:16" "2024-05-28 10:26:16"
## ..- attr(*, "names")="start" "end"
## $ type : chr "so"
## - attr(*, "class")= chr [1:2] "occMod" "so"

You can see that occMod() returns an object of class “occMod” and “so”. This object is packed with
information. It is a list that contains several elements, each described in the Value section of the
function’s help file.

You can always use simple list extraction to extract portions of this list, such as
psi_pSURVEY$neg2loglike or psi_pSURVEY$npar.
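
As a quick check on how these pieces relate, the sketch below recomputes the log-likelihood and AIC from the rounded values printed in the str() output above. It assumes the standard definition AIC = -2 log-likelihood + 2 × (number of parameters). Because str() rounds its display, the log-likelihood here is approximate.

```r
# Rounded values as printed in the str() output above
neg2loglike <- 255
npar        <- 6

# The log-likelihood is the -2 log-likelihood divided by -2
loglike <- neg2loglike / -2
loglike

# AIC = -2 log-likelihood + 2 * (number of parameters)
aic <- neg2loglike + 2 * npar
aic
```

With the fitted model in hand, the same arithmetic could be applied to psi_pSURVEY$neg2loglike and psi_pSURVEY$npar to work with the unrounded values.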

Enter R code to display the model’s log-likelihood:

Enter R code below to view any portion of this list – your choice.

Although you can use list indexing to pull certain elements out of the resulting object, Darryl and Jim
included some “methods” in RPresence that will aid in the most common tasks. A “method” in R is a
function that works on an object of a specific class. Let’s use the methods() function to return a list of
methods that work on objects of class “occMod”:

methods(class = "occMod")

## [1] coef fitted predict summary
## see '?methods' for accessing help and source code

Here, we see four methods are listed: coef() , fitted() , predict() , and summary() . If we pass our
psi_pSURVEY object to any of these functions, we will be able to retrieve key components of the output.

We’ve already used the summary() method. Let’s try the coef() method to extract the model
coefficients. Here, for the single season model, we pass in our psi_pSURVEY object, and specify either
‘psi’ or ‘p’ for the “param” argument. We will also pass in 0.95 for the “prob” argument, which will return
the 95% confidence intervals for the parameter estimates.

👉🏾 Clearly, with estimates such as -0.404792 and 2.193197, these coefficients are not probabilities. Don’t
forget that RPresence converts the occupancy parameters, ψ and p_i, which are bounded between 0 and 1, to
unconstrained betas, which have no bounds, in finding the optimal solution. Thus the coefficients are on the
logit scale.
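
To get a feel for what a 95% interval on the logit scale means, the sketch below builds one by hand for the ψ beta, using the estimate (-0.404792) and standard error (0.235766) reported in this tutorial's psi_pSURVEY$beta output. It assumes a Wald-type interval (estimate ± z × SE); the source does not state how coef() constructs its intervals, so the numbers may not match the function's output exactly.

```r
# Beta estimate and standard error for psi, as reported in this tutorial
est <- -0.404792
se  <-  0.235766

# Two-sided 95% interval on the logit scale (Wald-type; an assumption)
z <- qnorm(1 - (1 - 0.95) / 2)
ci_logit <- c(lower = est - z * se, upper = est + z * se)
ci_logit

# Back-transform the end points to the probability scale
plogis(ci_logit)
```

Back-transforming the end points, rather than building the interval directly on the probability scale, keeps the limits between 0 and 1.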

Savvy readers will notice that these results can be located directly in the psi_pSURVEY list element
called “beta”, which provides not only the beta estimates and standard errors but also the variance-
covariance matrices:

psi_pSURVEY$beta

## $psi
## est se
## psi_A1.Int -0.404792 0.235766
##
## $psi.VC
## [,1]
## [1,] 0.055586
##
## $p
## est se
## p1_B1.Int 2.193197 0.608323
## p2_B2 -1.501259 0.720538
## p3_B3 -0.808918 0.759632
## p4_B4 -1.347243 0.726521
## p5_B5 -0.808918 0.759632
##
## $p.VC
## B1 B2 B3 B4 B5
## B1 0.370056 -0.369615 -0.369381 -0.369579 -0.369381
## B2 -0.369615 0.519175 0.369143 0.369273 0.369143
## B3 -0.369381 0.369143 0.577041 0.369128 0.369043
## B4 -0.369579 0.369273 0.369128 0.527832 0.369128
## B5 -0.369381 0.369143 0.369043 0.369128 0.577041
##
## $VC
## A1 B1 B2 B3 B4 B5
## A1 0.055586 -0.000227 0.000159 0.000114 0.000152 0.000114
## B1 -0.000227 0.370056 -0.369615 -0.369381 -0.369579 -0.369381
## B2 0.000159 -0.369615 0.519175 0.369143 0.369273 0.369143
## B3 0.000114 -0.369381 0.369143 0.577041 0.369128 0.369043
## B4 0.000152 -0.369579 0.369273 0.369128 0.527832 0.369128
## B5 0.000114 -0.369381 0.369143 0.369043 0.369128 0.577041

Again, these are beta coefficients – the unconstrained parameters that the occMod() function uses to
find the model’s maximum likelihood (or to find the model’s minimum -2log-likelihood) value. Notice there
is one parameter for ψ and 5 parameters for detection. In future tutorials, we will run models where ψ
and p depend on covariates. For such models, the beta estimates will be important as they represent the
effect of the covariates on occupancy or detection.
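
To make the link between betas and probabilities concrete, here is a minimal sketch in base R that rebuilds the five detection probabilities from the beta estimates printed above. It assumes that p1_B1.Int is the intercept (survey 1) and that the remaining betas are offsets from that intercept; the names b, eta, vc and p_real are ours, not RPresence objects. It also hints at why the variance-covariance matrix is worth having: the standard error of a sum of betas needs the covariance term.

```r
# detection betas copied from psi_pSURVEY$beta$p (logit scale)
b <- c(B1 = 2.193197, B2 = -1.501259, B3 = -0.808918,
       B4 = -1.347243, B5 = -0.808918)

# ASSUMPTION: B1 is the intercept (survey 1); B2..B5 are offsets from it
eta <- unname(b["B1"]) + c(0, b[-1])
names(eta) <- paste0("p", 1:5)

# back-transform the linear predictors to probabilities
p_real <- plogis(eta)
round(p_real, 3)

# B1/B2 block of the variance-covariance matrix (from psi_pSURVEY$beta$p.VC)
vc <- matrix(c( 0.370056, -0.369615,
               -0.369615,  0.519175),
             nrow = 2, dimnames = list(c("B1", "B2"), c("B1", "B2")))

# SE of the survey 2 linear predictor: Var(B1 + B2) = V11 + V22 + 2 * V12
se_eta2 <- sqrt(vc["B1", "B1"] + vc["B2", "B2"] + 2 * vc["B1", "B2"])

# delta-method approximation of the SE of p2 on the probability scale
se_p2 <- unname(p_real["p2"] * (1 - p_real["p2"]) * se_eta2)
c(se_eta2 = se_eta2, se_p2 = se_p2)
```

Under these assumptions the detection estimates come out near 0.90, 0.67, 0.80, 0.70 and 0.80 for surveys 1 to 5. Whether RPresence computes its reported standard errors exactly this way is not something this sketch establishes.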

The fitted() method will display the “real” (back-transformed using the inverse-logit function)
occupancy (psi) and detection (p) estimates for each and every site. Because these are probabilities,
their values are bounded between 0 and 1. This is the information that you, the troll biologist, will want to
report back to Gandalf.

Use the fitted() method to obtain the estimates of psi. Don’t forget to specify ‘psi’ for the param
argument!


Notice that the parameter name and unit (site) name are combined to make the row names of this table.
With an “intercept” model for psi (psi ~ 1), each and every site has the same probability of site
occupancy, which you can confirm by scrolling!

We can also use the fitted() function to extract parameter estimates for detection. If you look at the
entire fitted results, however, you will see that it is quite a long dataframe.

Since all sites have the same detection probability, we can select any one site and inspect its results. The
print_one_site_estimates() function comes in handy here, especially for models in which there are no
covariates. This function has an argument called “mod”, where you pass in the model, and a second
argument called “site”, which has a default value of 1. This is the first index of the fitted results.


Notice that the dataframe contains an estimate for psi as well as an estimate of p for each of the 5 surveys,
including the standard error and 95% confidence intervals. You can also pass in a site name (character)
for the “site” argument. Let’s try it.

Enter code below to examine the parameter estimates for site “108217”


Displaying results
Remember, Gandalf hired you as the troll biologist to estimate troll occupancy and detection rates. He
probably won’t care about the nuances of RPresence, but he will want the results of the analysis
presented in an easily understood format. Tables and figures can convey our single season maximum
likelihood estimates for our troll data.

First, let’s create a dataframe that can be presented as a table. We can use the
print_one_site_estimates() function to retrieve our estimates, and then do some light data wrangling
to shape the results into a presentable table.


library(dplyr)

# extract the psi and p estimates for one site (default: site = 1)
ests <- as.data.frame(print_one_site_estimates(mod = psi_pSURVEY))

# wrangle the data
ests <- ests %>%
  mutate(parm = c("psi", "p1", "p2", "p3", "p4", "p5")) %>%
  select(parm, est, se, lower, upper) %>%
  rename(
    Parameter = parm,
    Estimate = est,
    SE = se,
    L95 = lower,
    U95 = upper
  ) %>%
  `rownames<-`(seq_len(nrow(ests)))

# print the data frame
ests

The dplyr wrangling steps include: 1) adding the column “parm” with mutate() function; 2) reordering the
columns with select() ; 3) renaming the columns, and 4) re-setting the row names. You could add this
table to your report if you wish.

However, while tables are excellent for presenting raw values, they leave interpretation to the reader. You
can help your reader by rendering the table information in a more suitable format: a figure. Next, we’ll use
ggplot2 to plot the estimates as bars, and add in their 95% confidence limits:


library(ggplot2)

ggplot(ests, aes(x = Parameter, y = Estimate)) +
  labs(x = "Parameter", y = "Estimate") +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = L95, ymax = U95), width = 0.2) +
  theme_bw()

Maximum likelihood estimates of Troll occupancy data

This graph, or the data table above it, would be a fine way to display your results (assuming the model
“officially” fits the data). Although there is room for improvement, both provide the maximum likelihood
estimates as well as uncertainty associated with these estimates.

Potential challenges
Let’s now look at a new dataset targeting occupancy of a different species (such as Orcs, which move at
speed in daylight and are easily observed). We’ve taken the liberty of creating a pao object containing 4
orc surveys at 100 sites in Middle Earth. Here’s the summary:

# look at the summary of the orc pao object


summary(orcs_pao)


## paoname=pres.pao
## title=Orcs of Middle Earth Single-Season Occupancy Data
## Naive occ=0.11
## nunits nsurveys nseasons nsurveyseason methods
## "100" "4" "1" "4" "1"
## nunitcov nsurvcov
## "0" "1"
## unit covariates :
## survey covariates: SURVEY


Fitting a single-season model to these data appears to run just fine. That is, the model converged to a solution with no apparent
errors. However, let’s look at the real estimates that were returned:


What happened here? The maximum likelihood estimate for orc occupancy is 0.11, with lower and upper
confidence limits of roughly 0.06 and 0.19. The detection rates overall are quite high, ranging
from 0.64 in survey 4 to 1.0 in survey 2. Survey 2 looks to be problematic, however, given that its
confidence interval spans 0 to 1. Should we throw survey 2 away? That would be throwing
valuable data away!

Before doing anything drastic, it’s useful to look at the raw data that make up the encounter histories (if you
recall, these are tucked into the “det.data” list element of the pao object). Let’s look at the first 20 records:

head(orcs_pao$det.data, n = 20)


## det
## unit1 0 0 0 0
## unit2 0 0 0 0
## unit3 0 0 0 0
## unit4 0 0 0 0
## unit5 0 0 0 0
## unit6 0 0 0 0
## unit7 0 0 0 0
## unit8 0 0 0 0
## unit9 1 1 1 1
## unit10 0 0 0 0
## unit11 0 1 1 1
## unit12 0 0 0 0
## unit13 0 0 0 0
## unit14 0 0 0 0
## unit15 0 0 0 0
## unit16 1 1 1 1
## unit17 0 0 0 0
## unit18 0 0 0 0
## unit19 0 0 0 0
## unit20 0 0 0 0

We can see that histories generally consist of all 0’s (such as the first 8 sites), all 1’s (such as sites 9 and
16), or mostly 1’s with a 0 tossed in (such as site 11).

👉🏽 This pattern of 0 and 1’s forms the basis of estimating both occupancy and detection. It may be the case
that orcs were always detected given presence during survey 2.

The overall detection rate of orcs is high, near the boundary of 1.

# calculate the total detections per site and append to the encounter histories
totals <- cbind(
  orcs_pao$det.data,
  rowSums(orcs_pao$det.data)
)

# reduce the dataset to only sites with at least one observation
totals <- totals[rowSums(orcs_pao$det.data) > 0, ]

# add column names for clarity
colnames(totals) <- c("Survey 1", "Survey 2", "Survey 3", "Survey 4", "Total Detections")

# show the sites
totals


##        Survey 1 Survey 2 Survey 3 Survey 4 Total Detections
## unit9         1        1        1        1                4
## unit11        0        1        1        1                3
## unit16        1        1        1        1                4
## unit21        1        1        1        0                3
## unit30        1        1        1        0                3
## unit34        1        1        1        1                4
## unit44        1        1        0        1                3
## unit49        1        1        1        0                3
## unit71        1        1        1        1                4
## unit80        1        1        1        1                4
## unit84        1        1        1        0                3

The first four columns in this matrix are the encounter histories, and the last column is the row sum. Here,
we can confirm that survey 2 has no instances of a non-detection of orcs. This is very useful information
and should not be discarded!
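
One way to see where the boundary estimate comes from is to tabulate, for each survey, the share of sites with at least one detection at which orcs were also detected in that survey. The sketch below is not RPresence output: it retypes the 11 encounter histories from the table above into a matrix (the object names detected and naive_p are ours) and uses only base R. These naive proportions are not the model's maximum likelihood estimates, although with detection this high they should land close to them.

```r
# the 11 sites with at least one orc detection, copied from the table above
detected <- rbind(
  unit9  = c(1, 1, 1, 1),
  unit11 = c(0, 1, 1, 1),
  unit16 = c(1, 1, 1, 1),
  unit21 = c(1, 1, 1, 0),
  unit30 = c(1, 1, 1, 0),
  unit34 = c(1, 1, 1, 1),
  unit44 = c(1, 1, 0, 1),
  unit49 = c(1, 1, 1, 0),
  unit71 = c(1, 1, 1, 1),
  unit80 = c(1, 1, 1, 1),
  unit84 = c(1, 1, 1, 0)
)
colnames(detected) <- paste("Survey", 1:4)

# proportion of these sites with a detection in each survey
naive_p <- colMeans(detected)
round(naive_p, 2)

# a proportion of exactly 1 has no finite value on the logit scale
qlogis(naive_p["Survey 2"])
```

Survey 2 sits exactly at 1, and the logit of 1 is infinite. An estimate parked on the boundary like this is a plausible reason for the optimizer to return an uninformative confidence interval for that survey, even though the data themselves are perfectly good.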

👉🏿 Always, always take care to inspect the estimated parameters from any occupancy model. If there are
potential issues, take time to dig into the data to uncover what may be driving your results.

Alternative models
Let’s return to our troll surveys. A powerful idea in occupancy modeling is that you can constrain some
parameters to be equal to others. For example, we can run a new model where p1 = p2 = p3 = p4 = p5 .
This is a simpler model to fit in that we only have to estimate one detection parameter, p, instead of 5
uniquely estimated detection parameters. This would be called an intercept model for detection, and it is
coded like this:


Look for the number of parameters that are estimated. We estimate ψ and p for a total of two
parameters. In this model, p = p1 = p2 = p3 = p4 = p5 . It is a simpler model than our first model, which
estimated 6 unique parameters to find the most likely values that produced our observed 0 and 1 troll
detection data.

This is an important concept: two alternative models run on the same data will generate different
maximum likelihood estimates. Let’s confirm that now.

Select one of the two options below to choose a model for p, which will run occMod() and return results.
For both models, occupancy is modeled as a constant, psi ~ 1. Notice the following: (1) the
occupancy rate is constant across sites; estimates from a handful of sites are shown for brevity (look
in the first column for psi_ to see the psi estimates for a given site); (2) the detection parameter estimates
depend on the model you run (look in the first column for p_ to see the detection parameter estimates
for a given site); (3) the -2LogLike (and LogLike) depend on the model run.


Detection Model:
p ~ SURVEY
p~1

Notice how the standard errors (se) of p’s are lower for the model with constant p (p ~ 1). There is a
trade-off when adding parameters to a model. More parameters yield a more realistic model with more
flexibility for the parameters, but that comes at the cost of parameter precision. The AIC statistic is a
useful tool to address this trade-off, and we’ll learn about it in future tutorials.

Tutorial summary
Whew! We have covered a lot of information, setting the foundation for single-season occupancy analysis
in RPresence. We spent quite a bit of time discussing createPao() and occMod() functions, which are
the two workhorse functions of RPresence. The general framework

betas → parameters → history probabilities → log probabilities → product → multinomial log-likelihood → -2LogLike

applies not just to single-season occupancy models, but many other occupancy analytical frameworks as
well. Thus, it is essential that these concepts really sink in. You may wish to revisit this tutorial as it lays
the foundation for many other tutorials in occupancyTuts.
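
The chain above can be sketched in a few lines of base R. This is a simplified illustration, not the code that occMod() runs: it fits the constant-detection model (psi ~ 1, p ~ 1) to the orc histories shown earlier, with the 89 all-zero sites filled in from the summary (100 sites, naive occupancy 0.11). The function name neg2_loglike and the object names are ours.

```r
# orc encounter histories: 11 sites with detections plus 89 all-zero sites
detected <- rbind(
  c(1, 1, 1, 1), c(0, 1, 1, 1), c(1, 1, 1, 1), c(1, 1, 1, 0),
  c(1, 1, 1, 0), c(1, 1, 1, 1), c(1, 1, 0, 1), c(1, 1, 1, 0),
  c(1, 1, 1, 1), c(1, 1, 1, 1), c(1, 1, 1, 0)
)
histories <- rbind(detected, matrix(0, nrow = 89, ncol = 4))

neg2_loglike <- function(betas, y) {
  # betas -> parameters (inverse-logit)
  psi <- plogis(betas[1])
  p   <- plogis(betas[2])

  # parameters -> history probabilities
  d <- rowSums(y)   # detections per site
  J <- ncol(y)      # number of surveys
  # occupied and observed d times, or (all-zero histories only) unoccupied
  pr_hist <- psi * p^d * (1 - p)^(J - d) + (d == 0) * (1 - psi)

  # history probabilities -> log probabilities -> summed -> -2LogLike
  -2 * sum(log(pr_hist))
}

# search the unbounded beta scale for the minimum -2LogLike
fit <- optim(par = c(0, 0), fn = neg2_loglike, y = histories, method = "BFGS")

# back-transform the winning betas to the probability scale
real <- plogis(fit$par)
names(real) <- c("psi", "p")
round(real, 3)
fit$value   # the minimized -2LogLike
```

Summing the log probabilities is the same as taking the log of their product, but it avoids multiplying many small numbers together. The optimizer only ever sees the unbounded betas, which is the reason for the logit link.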

👉🏽 If you’d like a pdf of this document, use the browser “print” function (right-click, print) to print to pdf.
If you want to include quiz questions and R exercises, make sure to provide answers to them before
printing.

What’s next?
tutPrePost(tut = "single_season", type = "post")

Smashing! Suggested follow-ups are listed below. To run a tutorial, use learnr::run_tutorial("name",
package = "occupancyTuts")

name priority description

gof Suggested An introduction to goodness of fit methods and how the MacKenzie and Bailey (2004) test is implemented in RPresence.

optimization Suggested An introduction to computer optimization procedures, including an overview of the optim() function and the need for the logit link for occupancy analysis.

sitecovs Suggested Analyzing occupancy data with covariates that affect site occupancy, including the analysis of null, continuous, categorical, additive, polynomial, and interactive models in RPresence.

spatials Suggested Working with spatial data in R, including a brief introduction to the packages raster and sf, and how to wrangle data for incorporation into occupancy models.

ss_corr_det Optional Simulating and analyzing the single-season correlated detections occupancy model, in which detections are spatially correlated with each other.

ss_false_pos Optional Simulating and analyzing the single-season occupancy model with identification errors, including both false negatives and false positives.

ss_mixture Optional Simulating and analyzing single-season occupancy data with heterogeneous detections (mixtures of detection probabilities that are not captured by covariates).

ss_multi_method Optional Simulating and analyzing single-season occupancy models where multiple methods are used in detection.

ss_multistate Optional Simulating and analyzing the single-season occupancy model in which occupancy includes more than just 1 state (presence), e.g., age, sex, or breeding status.

ss_species_richness Optional Single-season species richness occupancy tutorial, in which a pool of species is monitored in a single season and analyzed with occupancy approaches.

ss_two_species Optional Single-season two-species interaction occupancy model, in which the goal is to determine whether a site is occupied by two different species, and to assess if the species affect each other’s detection and occupancy probabilities.

study_design Optional An introduction to how to design an occupancy study for a target species.

surveycovs Optional Analyzing occupancy data with covariates that affect detection rates, including the analysis of null, continuous, categorical, additive, polynomial, and interactive models in RPresence.

wrangling Suggested An introduction to common data wrangling techniques for occupancy analysis, including brief overviews of the packages lubridate, dplyr, tidyr, and ggplot2.
