Reading Data in Zoo: Gabor Grothendieck Achim Zeileis
Abstract
This vignette gives examples of how to read data in various formats into the zoo package
using the read.zoo() function. read.zoo() expects either a text
file (or text connection) or a data frame as input. In the former case, read.table() is used
first to produce the data frame. (Instead of a text file, the text argument
can be used to read a text string that is already stored in R; this is what the examples
of this vignette do.) Subsequently, read.zoo() provides a wide collection of convenience
functionality to turn that data frame into a ‘zoo’ series with a specific structure and a
specific time index. In this vignette, an overview is provided of the wide variety of cases
that can be handled with read.zoo(). All examples assume that zoo is already loaded
and (if necessary) that the chron package has been loaded as well.
Note that functions read.csv.zoo(), read.csv2.zoo(), read.delim.zoo(), and
read.delim2.zoo() are available that call the respective read.*() function instead of
read.table() and subsequently read.zoo(). However, these convenience interfaces are
not employed in this vignette in order to demonstrate setting all arguments ‘by hand’.
Keywords: irregular time series, daily data, weekly data, data frame, text file.
Example 1
Input class: Text file/connection (space-separated with header).
Input index: ‘integer’.
Output class: Multivariate ‘zoo’ series.
Output index: ‘integer’.
Strategy: No transformation of time index needed, hence only a simple call to read.zoo().
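A minimal sketch of this case. The input text and its column names are made up for illustration and are not the data of the original example:

```r
library(zoo)

# Illustrative input (assumed): space-separated, with header,
# and an integer time index in the first column.
Lines <- "time latitude longitude altitude
1 0.30 -0.79 260
2 0.31 -0.80 262
3 0.32 -0.81 265"

# header = TRUE is passed on to read.table(); column 1 is used as the index
# and, being numeric already, is not transformed any further.
z <- read.zoo(text = Lines, header = TRUE)
z
```

The remaining three columns form the multivariate series; index(z) returns the values of the time column.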
Example 2
Input class: ‘data.frame’.
Input index: ‘factor’ with labels indicating AM/PM times but no date.
Output class: Multivariate ‘zoo’ series.
Output index: ‘times’ (from chron).
Strategy: The idea is to add some dummy date (here 1970-01-01) to the ‘character’ labels,
then transform to ‘chron’ and extract the ‘times’.
Bid Offer
07:10:03 6118.5 6119.5
07:10:36 6118.5 6119.5
07:11:07 6119.5 6119.5
07:11:48 6119.0 6120.0
07:12:25 6119.0 6119.5
Example 3
Input class: Text file/connection (semicolon-separated with header).
Input index: ‘factor’s with labels indicating dates (column 1) and times (column 2).
Output class: Multivariate ‘zoo’ series, with separate columns for each date.
Output index: ‘times’ (from chron).
Strategy: Split the data based on date (column 1) and process times (column 2) to ‘times’.
Enhance column names at the end.
01/09/2009 02/09/2009
10:00:00 56567 55626
10:05:00 56463 55723
10:10:00 56370 55659
16:45:00 55771 55742
16:50:00 55823 55717
16:55:00 55814 55385
Example 4
Input class: Text file/connection (space-separated with header).
Input index: ‘factor’s with labels indicating dates (column 1) and times (column 2).
Output class: Multivariate ‘zoo’ series.
Output index: ‘chron’ (from chron).
Strategy: Indicate vector of two columns in index, which is subsequently processed by a
FUN taking two arguments and returning a ‘chron’ time/date.
O H L C
(01/02/05 17:05:00) 1.3546 1.35530 1.35460 1.35495
(01/02/05 17:10:00) 1.3553 1.35560 1.35490 1.35525
(01/02/05 17:15:00) 1.3556 1.35565 1.35515 1.35530
(01/02/05 17:25:00) 1.3550 1.35560 1.35500 1.35550
(01/02/05 17:30:00) 1.3556 1.35640 1.35535 1.35630
Example 5
Input class: Text file/connection (space-separated with non-matching header).
Input index: ‘factor’s with labels indicating dates (column 6) and unneeded weekdays
(column 5) and times (column 7).
Output class: Multivariate ‘zoo’ series.
Output index: ‘Date’.
Strategy: First, skip the header line, remove unneeded columns by setting colClasses to
"NULL", and set suitable col.names. Second, convert the date column to a ‘Date’ index using
format. Finally, aggregate over duplicate dates, keeping only the last observation.
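A minimal sketch of these three steps. The input layout is assumed from the description above: the first three data rows use values that appear in this example's output, while the fourth row (a duplicate date) and the column names in cn are made up for illustration:

```r
library(zoo)

# Assumed input: a header line that does not match the 7 data fields.
Lines <- "  views  number  timestamp day            time
1  views  910401 1246192687 Sun 6/28/2009 12:38
2  views  921537 1246278917 Mon 6/29/2009 12:35
3  views  934280 1246365403 Tue 6/30/2009 12:36
4  views  934500 1246370000 Tue 6/30/2009 13:53"

# Fields 1, 2, 5 and 7 are dropped via "NULL"; field 6 holds the date.
cl <- c("NULL", "NULL", "numeric", "numeric", "NULL", "character", "NULL")
cn <- c("row", "label", "views", "number", "weekday", "date", "time")

# After dropping, the date is the 3rd remaining column, hence index = 3.
# aggregate keeps only the last observation for each duplicated date.
z <- read.zoo(text = Lines, skip = 1, col.names = cn, colClasses = cl,
  index = 3, format = "%m/%d/%Y",
  aggregate = function(x) tail(x, 1))
z
```

skip, col.names and colClasses are arguments of read.table() that read.zoo() passes through.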
views number
2009-06-28 910401 1246192687
2009-06-29 921537 1246278917
2009-06-30 934280 1246365403
2009-07-06 986463 1246888699
2009-07-07 995002 1246970243
2009-07-08 1005211 1247079398
2009-07-09 1011144 1247135553
2009-07-11 1026765 1247308591
views number
2009-07-09 1011144 1247135553
2009-07-16 1077726 1247778752
2009-07-17 1083059 1247845824
views number
2009-07-09 1011144 1247135553
2009-07-17 1083059 1247845824
Alternative approach: The approach above labels each point as it was originally labeled, i.e., if
Thursday is used, it gets the date of that Thursday. Another approach is to always label the
resulting point as Friday and to use the last available value even if it is not from Thursday.
To do so, create a daily grid and fill it in so that Friday receives the prior value if Friday is NA.
views number
2009-07-03 934280 1246365403
2009-07-10 1011144 1247135553
2009-07-17 1083059 1247845824
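The Friday-labeling approach can be sketched as follows. This is one common way to build the daily grid (merging with an empty zoo series and then applying na.locf()), and not necessarily the exact code behind the output above. For brevity only the views column is used, with the dates and values printed earlier in this example:

```r
library(zoo)

# Dates and 'views' values as printed in the outputs of this example.
dates <- as.Date(c("2009-06-28", "2009-06-29", "2009-06-30", "2009-07-06",
  "2009-07-07", "2009-07-08", "2009-07-09", "2009-07-11",
  "2009-07-16", "2009-07-17"))
views <- c(910401, 921537, 934280, 986463, 995002, 1005211, 1011144,
  1026765, 1077726, 1083059)
z <- zoo(views, dates)

# Daily grid from first to last date; days without data become NA
# and are then filled with the last available observation.
g <- seq(start(z), end(z), by = "day")
z.filled <- na.locf(merge(z, zoo(, g)))

# Keep Fridays only ("%w" gives the weekday as a number, 5 = Friday).
fri <- z.filled[format(time(z.filled), "%w") == "5"]
fri
```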
Example 6
Input class: Text file/connection (comma-separated with header).
Input index: ‘factor’s with labels indicating dates (column 1) and times (column 2).
Output class: Multivariate ‘zoo’ series.
Output index: ‘chron’ (from chron) or ‘POSIXct’.
Strategy: Three versions, all using vector index = 1:2.
Without FUN, hence the index columns are pasted together and then passed to as.POSIXct()
because tz and format are specified.
R> z3 <- read.zoo(text = Lines, sep = ",", header = TRUE,
+ index = 1:2, tz = "", format = "%d.%m.%Y %H:%M")
R> z3
Example 7
Input class: Text file/connection (space-separated with header).
Input index: ‘factor’s with labels indicating dates (column 1) and times (column 2).
Output class: Multivariate ‘zoo’ series.
Output index: ‘POSIXct’.
Strategy: Due to standard date/time formats, only index = 1:2 and tz = "" need to be
specified to produce ‘POSIXct’ index.
V2 V3 V4 V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8
2010-10-15 13:46:21 73.8 73.8 73.8 73.8
2010-10-15 13:47:27 73.8 73.8 73.8 73.8
2010-10-15 13:47:54 73.8 73.8 73.8 73.8
2010-10-15 13:49:51 73.7 73.7 73.7 73.7
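A sketch of the call described in the strategy. The header line is an assumption (only the data column names V2 to V5 are visible in the output); the data rows are taken from the first lines of the output above:

```r
library(zoo)

# Assumed input layout: date and time in columns 1 and 2.
Lines <- "Date Time V2 V3 V4 V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8"

# index = 1:2 pastes the two columns together; tz = "" requests a
# 'POSIXct' index in the local time zone.
z <- read.zoo(text = Lines, header = TRUE, index = 1:2, tz = "")
z
```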
Example 8
Input class: Text file/connection (space-separated without header).
Input index: ‘factor’ with labels indicating dates.
Output class: Multivariate ‘zoo’ series, with separate columns depending on column 2.
Output index: ‘Date’.
Strategy: Non-standard na.strings format needs to be specified, series is split based on
second column, and date format (in column 1, default) needs to be specified.
A B C
2010-10-13 23 12 124
2010-10-14 43 54 65
2010-10-15 43 NA 65
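A sketch that should yield a series like the one printed above. The long-format input, the day/month/year date format and the "N.A." missing-value string are assumptions chosen to match the strategy; the values are taken from the output:

```r
library(zoo)

# Assumed input: date, series label, value (no header).
Lines <- "13/10/2010 A 23
13/10/2010 B 12
13/10/2010 C 124
14/10/2010 A 43
14/10/2010 B 54
14/10/2010 C 65
15/10/2010 A 43
15/10/2010 B N.A.
15/10/2010 C 65"

# na.strings goes to read.table(); split = 2 creates one column per
# label in column 2; format converts column 1 to a 'Date' index.
z <- read.zoo(text = Lines, na.strings = "N.A.",
  format = "%d/%m/%Y", split = 2)
z
```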
Example 9
Input class: Text file/connection (comma-separated with header).
Input index: ‘factor’ with labels indicating date/time.
Output class: Univariate ‘zoo’ series.
Output index: ‘chron’ (from chron) or ‘POSIXct’.
Strategy: Ignore first two columns by setting colClasses to "NULL". Either produce ‘chron’
index via as.chron() or use all defaults to produce ‘POSIXct’ by setting tz.
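A sketch of the ‘POSIXct’ variant. The input text and its column names are made up for illustration; replacing tz = "" by FUN = as.chron would give the ‘chron’ variant (with chron loaded):

```r
library(zoo)

# Illustrative input (assumed): two unneeded columns, then date/time and value.
Lines <- "id,group,datetime,value
1,1646,2006-08-18 08:48:59,0
2,1646,2006-08-18 09:53:20,100"

# "NULL" drops the first two columns, so the date/time column becomes
# column 1 (the default index) and a single data column remains.
z <- read.zoo(text = Lines, header = TRUE, sep = ",",
  colClasses = c("NULL", "NULL", "character", "numeric"),
  tz = "")
z
```

Because only one data column is left, the result is a univariate series.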
Example 10
Input class: Text file/connection (space-separated with non-matching header).
Input index: ‘factor’ with labels indicating date (column 3) and time (column 4).
Output class: Multivariate ‘zoo’ series.
Output index: ‘chron’ (from chron) or ‘POSIXct’.
Strategy: Skip the non-matching header and extract date/time from two columns via index = 3:4.
Either use a sequence of two functions FUN and FUN2 or employ the defaults, yielding ‘POSIXct’.
V1 V2 V5 V6 V7 V8
(01/01/11 00:30:00) 1 1 5482.09 7670.81 2316.22 5465.13
(01/01/11 01:00:00) 2 1 5178.33 7474.04 2130.30 5218.61
(01/01/11 01:30:00) 3 1 4975.51 7163.73 2042.39 5058.19
(01/01/11 02:00:00) 4 1 5295.36 6850.14 1940.19 4897.96
(01/01/11 02:30:00) 5 1 5042.64 6587.94 1836.19 4749.05
(01/01/11 03:00:00) 6 1 4799.89 6388.51 1786.32 4672.92
V1 V2 V5 V6 V7 V8
2011-01-01 00:30:00 1 1 5482.09 7670.81 2316.22 5465.13
2011-01-01 01:00:00 2 1 5178.33 7474.04 2130.30 5218.61
2011-01-01 01:30:00 3 1 4975.51 7163.73 2042.39 5058.19
2011-01-01 02:00:00 4 1 5295.36 6850.14 1940.19 4897.96
2011-01-01 02:30:00 5 1 5042.64 6587.94 1836.19 4749.05
2011-01-01 03:00:00 6 1 4799.89 6388.51 1786.32 4672.92
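Both variants can be sketched as follows. The header line and the exact time format of the input are assumptions; the values are taken from the outputs above. To avoid depending on chron, the first variant uses as.POSIXct() as FUN2 and a hypothetical two-argument helper as FUN; using as.chron as FUN2 instead should give the ‘chron’ index described in the strategy:

```r
library(zoo)

# Assumed input: 6 header fields but 8 data fields, hence skip = 1.
Lines <- "iteration Datetime VIC1 NSW1 SA1 QLD1
1 1 2011-01-01 00:30 5482.09 7670.81 2316.22 5465.13
2 1 2011-01-01 01:00 5178.33 7474.04 2130.30 5218.61
3 1 2011-01-01 01:30 4975.51 7163.73 2042.39 5058.19"

# Variant 1: FUN combines the two index columns into one string,
# FUN2 converts that string to the final index class.
z1 <- read.zoo(text = Lines, skip = 1, index = 3:4,
  FUN = function(date, time) paste(date, time), FUN2 = as.POSIXct)

# Variant 2: rely on the defaults; setting tz yields 'POSIXct'.
z2 <- read.zoo(text = Lines, skip = 1, index = 3:4, tz = "")
z2
```

Without a header, read.table() names the columns V1 to V8; the two index columns V3 and V4 are removed from the data.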
Example 11
Input class: ‘data.frame’.
Input index: ‘Date’.
Output class: Multivariate ‘zoo’ series.
Output index: ‘Date’.
Strategy: Given a ‘data.frame’, keep only the last row in each month. Use read.zoo() to
convert to ‘zoo’ and then na.locf() and duplicated().
Date A B C D
1 2009-12-31 4.9 18.4 32.6 77.0
2 2010-01-29 5.1 17.7 NA NA
3 2010-01-31 5.0 NA 32.8 78.7
4 2010-02-26 4.8 NA NA NA
5 2010-02-28 4.7 18.3 33.7 79.0
6 2010-03-31 5.3 19.4 32.4 77.8
7 2010-04-30 5.2 19.7 33.6 79.0
8 2010-05-28 5.4 NA NA 81.7
9 2010-05-30 NA NA 34.5 NA
10 2010-05-31 4.6 18.1 NA NA
A B C D
2009-12-31 4.9 18.4 32.6 77.0
2010-01-31 5.0 17.7 32.8 78.7
2010-02-28 4.7 18.3 33.7 79.0
2010-03-31 5.3 19.4 32.4 77.8
2010-04-30 5.2 19.7 33.6 79.0
2010-05-31 4.6 18.1 34.5 81.7
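The steps named in the strategy can be sketched as follows, with the data frame rebuilt from the listing above. Grouping by month via as.yearmon() is one way to detect the last row of each month:

```r
library(zoo)

# Data frame as printed above.
DF <- data.frame(
  Date = as.Date(c("2009-12-31", "2010-01-29", "2010-01-31", "2010-02-26",
    "2010-02-28", "2010-03-31", "2010-04-30", "2010-05-28",
    "2010-05-30", "2010-05-31")),
  A = c(4.9, 5.1, 5.0, 4.8, 4.7, 5.3, 5.2, 5.4, NA, 4.6),
  B = c(18.4, 17.7, NA, NA, 18.3, 19.4, 19.7, NA, NA, 18.1),
  C = c(32.6, NA, 32.8, NA, 33.7, 32.4, 33.6, NA, 34.5, NA),
  D = c(77.0, NA, 78.7, NA, 79.0, 77.8, 79.0, 81.7, NA, NA))

# The first column (class 'Date') is used as the index.
z <- read.zoo(DF)

# Carry the last observation forward, then keep only the last row of
# each month (duplicated() from the end marks all earlier rows).
last.in.month <- !duplicated(as.yearmon(time(z)), fromLast = TRUE)
zm <- na.locf(z)[last.in.month, ]
zm
```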
Example 12
Input class: Text file/connection (space-separated without header).
Input index: ‘factor’ with labels indicating dates.
Output class: Univariate ‘zoo’ series.
Output index: ‘Date’.
Strategy: Only keep last point in case of duplicate dates.
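A minimal sketch with made-up data in which the last date occurs twice. The format is given explicitly here for clarity, although dates in this standard form may also be recognized by default:

```r
library(zoo)

# Illustrative input (assumed): date and value, no header.
Lines <- "2009-10-07 0.009378
2009-10-19 0.014790
2009-10-23 -0.005946
2009-10-23 0.009096"

# aggregate collapses duplicate dates; tail(x, 1) keeps the last value.
z <- read.zoo(text = Lines, format = "%Y-%m-%d",
  aggregate = function(x) tail(x, 1))
z
```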
Example 13
Input class: Text file/connection (comma-separated with header).
Input index: ‘factor’ with labels indicating date/time.
Output class: Multivariate ‘zoo’ series.
Output index: ‘POSIXct’ or ‘chron’ (from chron).
Strategy: Dates and times are in standard format, hence the default ‘POSIXct’ can be
produced by setting tz or, alternatively, ‘chron’ can be produced by setting as.chron() as
FUN.
time.step.index value
2009-11-23 15:58:21 23301 800
2009-11-23 15:58:29 23309 950
R> z2 <- read.zoo(text = Lines, header = TRUE, sep = ",", FUN = as.chron)
R> z2
time.step.index value
(11/23/09 15:58:21) 23301 800
(11/23/09 15:58:29) 23309 950
Example 14
Input class: Text file/connection (space-separated with header).
Input index: ‘factor’s with labels indicating dates (column 1) and times (column 2).
Output class: Univariate ‘zoo’ series.
Output index: ‘chron’ (from chron).
Strategy: Indicate vector index = 1:2 and use chron() (which takes two separate arguments
for dates and times) to produce ‘chron’ index.
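A sketch of this call. The input text is made up for illustration and assumes dates in chron's default month/day/year format and times as hours:minutes:seconds:

```r
library(zoo)
library(chron)

# Illustrative input (assumed): date, time and a single value column.
Lines <- "Date Time Value
01/23/2000 10:12:15 12.12
01/24/2000 11:10:00 15.00"

# The two index columns are handed to chron() as its dates and times
# arguments, which returns the 'chron' index.
z <- read.zoo(text = Lines, header = TRUE, index = 1:2, FUN = chron)
z
```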
Example 15
Input class: Text file/connection (space-separated with header).
Input index: ‘numeric’ year with quarters represented by separate columns.
Output class: Univariate ‘zoo’ series.
Output index: ‘yearqtr’.
Strategy: First, create a multivariate annual time series using the year index. Then, create
a regular univariate quarterly series by collapsing the annual series to a vector and adding a
new ‘yearqtr’ index from scratch.
1992 Q1 1992 Q2 1992 Q3 1992 Q4 1993 Q1 1993 Q2 1993 Q3 1993 Q4 1994 Q1 1994 Q2
566 443 329 341 344 212 133 112 252 252
1994 Q3 1994 Q4
199 207
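The two steps of the strategy can be sketched as follows. The input layout (a Year column followed by one column per quarter) is assumed from the description; the values are those shown in the output above:

```r
library(zoo)

# Assumed input: one row per year, one column per quarter.
Lines <- "Year Qtr1 Qtr2 Qtr3 Qtr4
1992 566 443 329 341
1993 344 212 133 112
1994 252 252 199 207"

# Step 1: multivariate annual series indexed by year.
za <- read.zoo(text = Lines, header = TRUE)

# Step 2: transpose so that as.vector() runs through the quarters of
# each year in turn, then attach a new quarterly 'yearqtr' index.
zq <- zooreg(as.vector(t(coredata(za))),
  start = yearqtr(start(za)), frequency = 4)
zq
```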
Further comments
Multiple files can be read and subsequently merged.
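A minimal sketch of reading two files and merging them. The files and their contents are made up and written to temporary locations just for this illustration:

```r
library(zoo)

# Two small CSV files with partly overlapping dates (made-up data).
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("date,value", "2010-01-01,1", "2010-01-02,2"), f1)
writeLines(c("date,value", "2010-01-02,20", "2010-01-03,30"), f2)

# Read each file separately ...
a <- read.zoo(f1, header = TRUE, sep = ",", format = "%Y-%m-%d")
b <- read.zoo(f2, header = TRUE, sep = ",", format = "%Y-%m-%d")

# ... and merge on the union of the dates; missing values become NA.
z <- merge(a, b)
z
```

With many files, the same idea extends to reading them in a loop and merging the resulting list of series.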
Affiliation:
Gabor Grothendieck
GKX Associates Inc.
E-mail: [email protected]
Achim Zeileis
Universität Innsbruck
E-mail: [email protected]