Characteristics of Collaboration in the Emerging Practice of Open Data Analysis
This new abundance of government data provides fodder for civic hacking. Civic hackers make use of open data to build software applications with the aim of providing transparency and better understanding of government functions. Moreover, there is an emerging community of non-profit organizations, startups, independent information technologists and volunteers engaged in this analysis [17]. One example is a local community group in Chicago that created an interactive visualization of all lobbyist activity in the city, including lobbyists, lobbying firms, clients, and actions sought by lobbyists from the city (ChicagoLobbyists.org). Other examples include hackathons, such as the Green Hackathon, that have brought together 20-60 individuals with broad expertise to work on societal issues and have resulted in software products that use open data [39]. One such application combined supply chain information with child labor data from the UN to provide an estimate of the likelihood that child labor was used in the manufacturing of specific products.

Several papers from the HCI and CSCW communities have described some isolated uses of open government data within specific domains. For example, researchers have studied the use of Geographic Information System (GIS) data to empower regional communities [35, 38] and tax data to engage citizens with the tradeoffs in government spending [19]. Others have studied certain practices in the area of open data. Boehner and DiSalvo [2] interviewed the leaders of civic tech in Atlanta, finding that openness in government data is more a spectrum than a binary. Erete and colleagues [9] showed that non-profit organizations use data-driven stories as arguments to potential funders and stakeholders. As yet, there have been no attempts to give a broad overview of the analysis of open data from a CSCW perspective.

Data Science and Data Analysis

Advances in hardware and software technologies have led to a rapid increase in the amount of data collected. Companies and organizations are recognizing the advantages of using this data in decision making and are hiring people with the skills to exploit it. This has led to the burgeoning field of "data science". Despite the recognition of the importance of data science and the need to train data scientists, the field and its skills are fuzzily defined [31]. Data scientists are expected to make meaning from data using a broad collection of skills. There is little to no academic research about data scientists and their work practices; instead, a majority of the discussion has come from position articles in popular media. Harris and colleagues [13] have argued that data scientists come from many different backgrounds that draw analytic skills from five different areas: business, machine learning, math, programming, and statistics. A perfect data scientist is often described as a "unicorn" because it is impossible for an individual to have all the skills needed. Renowned data scientists have urged their field to make use of more teams because it is so difficult for any individual to gain a complete skillset [29, 30].

Collaboration is common in the practice of statistics, one of the parent disciplines of data science. One frequent type of collaboration is between a set of domain scientists and one or more statisticians [16]. Data analysis has multiple stages, from problem formulation to data collection through analysis to conclusion [24]. Collaboration and communication between domain scientists and statisticians is important throughout all stages, but particularly during the problem formulation period. Because the domain scientist may not clearly formulate the problem, the goal of the statistician is to listen and draw out the nature of the problem, then reformulate it in a way that can be tested statistically [18]. In this way the statistician establishes "a mapping from the client's domain to a statistical question" [12]. Chatfield [5] argues that statistical tasks are tricky because the context of the data matters: there is often messiness in the data, and the objectives of the analysis are not necessarily clear. The statistician is encouraged to ask many questions of the domain scientist to gain background information and context to understand the data. Because communication during this period is both difficult and critical, Chatfield suggests the following: "from bitter experience, I particularly advise against consulting by telephone or electronic mail, where one cannot see the data". In this type of collaboration domain scientists provide understanding of the problem, the goals, and the data; statisticians provide the technical skills to construct the appropriate analysis and extract meaningful results.

Open Science, Open Collaboration, and Open Innovation

Open data analysis shares commonalities with several forms of collaboration in which sharing and openness are important tenets. There is a movement toward more open sharing in science, particularly of data. Data sharing holds scientists accountable by allowing others to confirm findings. Data sharing also accelerates scientific progress through the reuse of a valuable resource [23]. In spite of these advantages, data sharing in science is difficult [1]. One obstacle is the willingness of scientists to share their data. There is a tradeoff between cooperation and openness on one hand and competition and secrecy on the other [36], and different scientific disciplines adopt different norms of openness. Another difficulty is in the use of shared data. Scientists must assess whether a given dataset is relevant, whether they can understand the data, and whether they trust the data before deciding whether or not to reuse it [10]. Data often lacks adequate documentation to understand the context in which it was created, its format, and the meaning of its fields [1]. Understanding the data often requires interaction with one of its creators [32]. Open data analysis, like the movement toward open science, involves the sharing and reuse of open data.

Open data analysis also involves the joint production of a shared artifact. Forte and Lampe [11] define open collaboration as online collaboration that satisfies four conditions: it must produce a shared artifact; collaboration must be supported by a technological platform; this platform must allow contributors to enter and exit the collaboration easily; and the platform must allow for flexible social structures. The two most studied, prototypical examples of open collaboration are encyclopedia editing on Wikipedia and open source software development. Easy entry into a collaboration on technologically-mediated collaboration platforms allows large-scale participation [11]. Successful open source projects can attract tens of thousands of participants [26].
However, easy exit means turnover is high in open collaboration [8]. On Wikipedia, a large majority of editors only make a few edits on one occasion [4]. While technologically-mediated communication helps to facilitate large-scale collaboration by reducing the costs of communication, it may not be well suited for collaboration that requires high levels of iterative feedback between participants [11]. Technologically mediated communication often lacks the richness needed to establish common ground and support tightly coupled work [27].

Open data analysis also shares similarities with the Do-It-Yourself (DIY) and maker movements. The maker movement is the practice of working with materials (e.g. electronics, fabrics) and fabrication tools [22]. Some have argued that it represents the "democratization of technological practice" [33]. Like work with open data, many participants embrace a hacker ethos in which creating, playfulness, and tinkering are encouraged [37]. Offline collaborations in hackerspaces are as important as online spaces [33, 22]. The movement has a mix of lay experts and professionals and has been described both as a hobby activity and as a form of open innovation that leads to the creation of professional manufacturing products [22]. Wang and Kaye [37] argue that it has looser community boundaries than traditional communities of practice and describe it as a collective of practice.

RESEARCH QUESTIONS

Given the large quantities of data and the complexity and multidisciplinary nature of data analysis, collaboration is likely to play an important role in the analysis of open government data. Governments make hundreds of datasets available and the most interesting and valuable analyses often come from combining several of these in a novel way [14]. Thus, no one individual can single-handedly analyze all of the available data. Even the work involved for a single project can be substantial and may require a variety of skills and knowledge. Open data is often provided in bad formats and must be extracted, cleaned, and processed; this requires coding skills. To test hypotheses and claims requires statistical knowledge. Context is often critical to understand data, to understand how statistical models fit into research questions, and to interpret the results of these models [5]. Thus, we expect that individuals working with open data would often work in collaboration with others.

One of the goals of this project was to understand how collaboration unfolds in open data analysis projects. Open data analysis shares commonalities and differences with multiple forms of collaboration in which sharing and openness are important tenets, such as open science, open collaboration (e.g. open source software), and the maker movement. To address this question, we employ the Lee and Paine [20] Model of Coordinated Action (MoCA). MoCA is a descriptive model used to understand collaborative work; it expands Johansen's 1988 time-space matrix, with its two dimensions of synchronicity and physical distribution, to seven dimensions: synchronicity, physical distribution, scale, planned permanence, turnover, number of communities of practice, and nascence. Collaborations can be characterized along each dimension. Participants either communicate at the same time or at different times (synchronicity); communication is remote or in person (physical distance); many people participate or few people participate (scale); collaborations are short-term or long-term (planned permanence); and turnover is high or low. Some collaborations draw participants from many different backgrounds (number of communities of practice), and these participants have different norms, practices, expertise and tools. Work practices can be established, routine, and well understood, or they may be unestablished and in development (nascence).
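To make these dimensions concrete, here is a minimal sketch, in Python, of how a single project's coding along them might be recorded; the structure and all values are our own illustration, not part of MoCA itself or of any project we studied.

from dataclasses import dataclass, field

@dataclass
class MoCACoding:
    """One project's position on the seven MoCA dimensions (illustrative)."""
    synchronicity: str          # "synchronous", "asynchronous", or "mixed"
    physical_distribution: str  # "co-located", "remote", or "mixed"
    scale: int                  # number of participants
    planned_permanence: str     # e.g. "open-ended" or "fixed end date"
    turnover: str               # "low" or "high"
    communities_of_practice: list = field(default_factory=list)  # backgrounds represented
    nascence: str = "developing"  # "established" vs. "developing" work practices

# Hypothetical coding of a small civic hacking project:
lobbying_viz = MoCACoding(
    synchronicity="mixed",
    physical_distribution="co-located",
    scale=4,
    planned_permanence="open-ended",
    turnover="low",
    communities_of_practice=["software development", "journalism"],
    nascence="developing",
)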
MoCA can be used to describe meaningful differences in work practices. For example, a collaboration in which individuals come from many different communities of practice entails a culturally diverse group with different norms, practices, tools, and languages. Members can make use of their different backgrounds by engaging in work in complementary ways, but diverse backgrounds may also cause more difficulty in working together. We use MoCA to address the specific question:

Research Question 1: How do open data analysis groups coordinate their analytic activities?

Open data analysis projects use open data to produce a tangible artifact, such as a tool or a report. A second goal of this project was to understand what types of artifacts were being created. The intent of analyzing data is to produce insights, such as identifying trends, observing anomalies, and drawing meaningful inferences. These insights can be used to reflect on government practices and to suggest changes in these practices.

There are multiple approaches for extracting insights from data. By building data processing, summarization, and visualization tools, projects make data more accessible to others. Tools provide the means for an audience to find their own insights in the data and to create their own meaning from these insights. Alternatively, authors analyze data and summarize their analyses in reports. In reports, authors present the insights they discovered, their interpretation, and their conclusions to an audience. There are different types of analyses, some of which are more complex than others [21]. Exploratory analyses identify trends, correlations, or relationships in the data. These analyses can be used to generate ideas, but have not been formally evaluated. Inferential analyses evaluate whether a pattern will continue to hold for new samples. Finally, predictive analyses use a set of features to predict an outcome of interest for a single person or unit. The latter two, inferential and predictive analyses, require substantially more skill to apply, but can provide more reliable conclusions.
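To illustrate the distinction among the three types of analysis, consider the following Python sketch on synthetic data; the dataset, column names, and numbers are invented for illustration and are not drawn from any project we studied.

import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for an open dataset: share of workers earning
# minimum wage, sampled over ten years.
rng = np.random.default_rng(0)
years = np.repeat(np.arange(2000, 2010), 50)
share = 0.05 + 0.002 * (years - 2000) + rng.normal(0, 0.01, years.size)
df = pd.DataFrame({"year": years, "min_wage_share": share})

# Exploratory: look for a trend; this generates ideas but proves nothing.
print(df.groupby("year")["min_wage_share"].mean())

# Inferential: test whether the apparent trend would hold in new samples.
result = stats.linregress(df["year"], df["min_wage_share"])
print(f"slope={result.slope:.4f}, p={result.pvalue:.3g}")

# Predictive: use features to predict the outcome for a new unit.
model = LinearRegression().fit(df[["year"]], df["min_wage_share"])
print(model.predict(pd.DataFrame({"year": [2010]})))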
We categorized projects based on the type of artifact (tool, or exploratory, inferential, or predictive analysis) they created to address the question:

Research Question 2: What types of artifacts are being produced by open data analysis projects?
semi-structured questions (e.g. "How did you communicate with each other during the project?", "Did you work remotely or in the same place?") asked during the interviews and from responses to specific questions asked of survey participants (e.g. "How frequently did other people join your project after it was started?"). While coding interview transcripts, an additional dimension, beneficiary of the project, emerged as another important aspect of collaboration. We included it as an eighth dimension.

Scale

Scale refers to the size of the collaborating group. We asked interviewees and survey participants how many people worked on their project in some capacity, such as by providing feedback, providing guidance, or conducting analyses. Both interviewees and survey participants reported working on data analysis projects in small teams (Table 1). So many projects may have been small in part because participants found collaborators primarily among people they already knew. Several interviewees described seeking out colleagues and friends as collaborators because they knew these people had the expertise they needed to analyze the data. For example, Participant 13, a data journalist who worked with super PAC donation data, contacted a colleague at an organization which he knew had experience reporting on political donations. Another interviewee, who worked as a data analyst at a governmental institution, contacted a colleague who had expertise with human resources data to help provide context to understand the data (Participant 4). Participant 6, a civic hacker, said that they often looked for people with relevant domain expertise (e.g. labor law, health care) from within their organization when starting a project.

Survey participants were evenly split, finding collaborators through personal connections, such as work or friends (38%), online through organizations like Code for America (42%), or both (21%). While most groups were small, there were a few exceptions; one of our survey participants indicated that he worked on the project in a group of 40 people.

Another factor that affected scale was the degree to which project groups were open. Some projects made their materials open, allowed anyone to join, and made their end products open; others were only partially open. Participant 7 used GitHub to make all code publicly available both during and after the project. In contrast, Participant 14, a data journalist, worked with two other journalists; they compiled data from multiple sources including open and (previously) closed data, only making their results available once they were finished. On average, projects made their end products "Mostly open". Groups were bimodal in terms of making their materials open and allowing anyone to join. Data journalists on average
made their project data, code and materials open, but did not
allow anyone to contribute to the project. By contrast, civic
hackers were more likely to make their project data, code, and
materials open and to allow anyone to join. Larger groups
were formed when project materials were made available and
anyone could join.
Turnover

Turnover refers to the frequency with which old members leave the group or how often new members join. In general, member turnover was rarely identified in the interviews, which is consistent with our survey findings. We asked survey participants to rate how frequently other people joined or left their projects. On average, survey participants reported that other people "Rarely" joined or left a project after it began. Many civic hacking projects (e.g. Participants 6 and 7) were almost entirely open throughout the whole life cycle of the project. However, limited resources often prevented groups from actively responding to and incorporating feedback from others during development.

Planned Permanence

It was difficult to address planned permanence as described by the MoCA framework because of the decentralized, informal nature of open data analysis. The majority of groups did not start with a fixed end date for their projects. Hence, rather than ask how long projects were intended to last, we asked participants how long their projects had actually lasted. Projects lasted anywhere from two days to nearly four years. Most projects had a specific end goal, and most of the projects reached that goal, creating a preliminary or finalized tool or report.

Number of Communities of Practice

A community of practice is a collection of people who share norms, practices, expertise, and tools. Participants and their collaborators came from multiple communities of practice, from software development to data science to journalism to city government. In describing the members of their groups, participants described collaborators with heterogeneous backgrounds. Furthermore, people with different backgrounds played different roles within the group.

Interviewees reported that within almost all of the projects there was at least one person who acted as a domain expert and at least one person who acted as a technical expert. Domain experts provided information about the larger context of the data, including explaining what was and was not captured by the data, identifying other sources of data, and identifying interesting and meaningful questions to ask with the data. Technical experts completed most of the work and provided guidance on which analytic methods to use. They also helped shape the questions by using "quantitative thinking". Participant 8, a front-end web developer, described how his team worked with a city employee who understood regulatory frameworks in order to parse the data and focus in on the most important parts. He said "she was the domain expert. I am just a software engineer ... There were like 7 different forms to fill in to enter campaign finance data and there were tons of different ways to fill out the seven forms ... she let me know the three things that she thought were most important out of all seven forms". This helped the team focus their attention on the most interesting parts of the data. She also helped them understand what data was missing: only contributions of at least $100 were recorded in the data.

The role of domain and technical experts was similar to the roles of "thinker and doer" [25], where the domain experts did more of the thinking and framing of the work and the technical expert did more of the implementation and conducted the analyses.

Figure 1. Mean level of experience (standard error) estimated for each area of expertise. Participants rated their own expertise (Self) and the most expert other member of the group for each area (Others).

Group members came from many different backgrounds. This resulted in groups that had a wide variety of skills and experience. Survey participants rated their own and other team members' levels of expertise (Figure 1, Figure 2). More than half of groups had members with at least one year of training per subject in two or more specialized areas of inferential statistics, software development or machine learning. These are areas of skill that come from very different schools of training. Individuals skilled at software development are unlikely to be skilled at inferential statistics. Most groups also had members with at least one year of experience in two or more domain-specific areas, such as government, journalism, activism, or non-profits. As we described in the section on scale, collaborators were often chosen specifically because they had the complementary expertise needed to understand the data.

Not only are many different communities of practice actively engaged with open government data, even within groups there are people from many different backgrounds. These people bring together a diversity of skills and practices that make such groups highly interdisciplinary. This interdisciplinarity is in part intentional, with people from different backgrounds playing different roles within the group.

Synchronicity and Physical Distribution

Many interviewees made use of regular synchronous communication. Participant 15 and her collaborator spoke on the phone regularly while they were trying to design and scope the project. Participant 15, who had more experience with data science projects, was in charge of coordinating the project. She worked with her collaborator to identify a project goal and an appropriate data set. These conversations helped them to shape the project into one that would provide tangible benefits and that could be carried out in the few months they had to work on the project. Similarly, Participant 18 met
appeal to a larger group, they had to make educated guesses about how to produce something that would effectively appeal to their intended audience.

Survey responses were somewhat consistent with interviewee responses, with the largest number of survey participants indicating that their intended audience was the citizens of a region. This was followed by projects that built tools or conducted analyses for one of the members of their group. A smaller percentage indicated that their audience was a specific client or a specific group, such as a government official or agency.

Research Question 2: What types of artifacts are being produced by open data analysis projects?

To understand what types of artifacts were produced by these open data analysis projects we coded the interview transcripts and project materials when provided. Two types of projects emerged: those that conducted statistical analyses to address a specific research question and those that built a tool for end users to explore the data on their own. For projects that conducted statistical analyses, transcripts and project materials were further coded to identify the type of question: exploratory, inferential, or predictive.

Slightly under half of the projects built tools for end users (Table 2). These projects developed software programs or websites that made the data easier to use for others. Some projects built tools for readers to explore the data. In New York City, one group built an interactive map using 311 citizen complaints so that readers could explore which neighborhoods had the most rat-infested restaurants. Participant 3 helped create a visualization tool for port officials to monitor real-time international shipping price data. Using this tool, port officials could observe unexpected changes in prices that could help them detect fraud. This tool allowed end users to monitor changes in the data in near real-time. Other projects included tools to support data analysis by speeding up the processing of data (e.g. conversion between data file formats). These tools empower end users to use data to come to their own conclusions. Using interactive visualizations, end users can focus in on specific data points, monitor trends over time, and make their own comparisons. The purpose of these types of projects is not to make an observation, to make an argument, or to support a decision. Instead the purpose is to make it easier for others to use the data. While some of these projects included visualization, none made use of statistical analyses.
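As a rough sketch of the kind of aggregation such a tool performs behind its interface, consider the following Python fragment; the file name and column names are invented, not taken from the New York project.

import pandas as pd

# Hypothetical export of open 311 complaint data (columns invented).
complaints = pd.read_csv("311_complaints.csv")

# Count rodent-related complaints per neighborhood so readers can
# explore which areas have the most reports.
rodent = complaints[complaints["complaint_type"] == "Rodent"]
by_neighborhood = rodent.groupby("neighborhood").size().sort_values(ascending=False)
print(by_neighborhood.head(10))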
The other half of projects aimed to extract insights from data, and these insights were often summarized in a report. The vast majority of these projects used descriptive statistics or exploratory analyses to draw insights from the data, while only a few projects used inferential statistics or predictive statistics (Table 2). Exploratory analyses focused on finding patterns in the data, such as trends over time, anomalies, or extreme values. Almost all of the exploratory projects made heavy use of visualizations. For example, Participant 12 investigated whether it takes longer to get out of minimum wage jobs now than it did in the past. For this they created a visualization of changes in the percentage of workers who held minimum wage jobs now and in the past. They then used these visualizations to make an argument that escaping minimum wage jobs does take longer than in the past.

Predictive analyses were used to inform decisions. Participant 4 constructed projections based on census data to plan for various future scenarios. A local government intended to place a limited number of language institutes around their county, and this data project aimed to find an optimal distribution for these institutes to maximize both the number of people who would benefit and the diversity of immigrant communities they served.
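The interview did not specify how the placement was computed; one plausible way to frame the objective described here is a greedy coverage heuristic, sketched below in Python with invented data and an arbitrary weighting between population served and community diversity.

# Hypothetical sketch only: choose k sites to maximize residents served,
# with a bonus for covering not-yet-served immigrant communities.
# Candidate site -> (residents served, communities served); values invented.
candidates = {
    "site_a": (1200, {"korean", "spanish"}),
    "site_b": (900, {"spanish"}),
    "site_c": (700, {"mandarin", "arabic"}),
    "site_d": (1100, {"korean"}),
}
DIVERSITY_BONUS = 500  # arbitrary weight per newly covered community

def place_institutes(candidates, k):
    chosen, covered = [], set()
    for _ in range(k):
        def gain(site):
            residents, communities = candidates[site]
            return residents + DIVERSITY_BONUS * len(communities - covered)
        best = max((s for s in candidates if s not in chosen), key=gain)
        chosen.append(best)
        covered |= candidates[best][1]
    return chosen

print(place_institutes(candidates, k=2))  # -> ['site_a', 'site_c']

A real deployment would negotiate the weighting between the two objectives with stakeholders rather than fix it as a constant.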
Through interviews and surveys we found that nearly half of projects left it up to the end users to draw their own conclusions, while the other half drew conclusions that relied almost exclusively on exploratory and descriptive statistics. Very few projects used sophisticated statistical analyses.

DISCUSSION

Through interviews and survey responses we gathered information on 40 projects that involved the analysis of open government data. We characterized the way in which work was coordinated and we categorized the type of artifacts produced by these projects. Three major themes emerged. One, groups were typically small, with low turnover, and relied heavily on synchronous communication. Two, interdisciplinarity played an important part in the formation of groups and the roles individuals played within these groups. Three, very few projects produced artifacts that used sophisticated statistical methods such as inferential or predictive analyses.

In these respects, open data analysis shares some similarities and differences with other forms of open collaboration. Like prototypical open collaboration (e.g. Wikipedia, open source software), the production of a shared artifact was central to open data analysis; unlike prototypical open collaboration, this shared artifact varied in the degree to which it was open, and work on this artifact was not universally supported by a technologically mediated collaboration platform. Open data analysis projects had different levels of openness. All projects made use of data that was at least partially open
and most made their end products open. However, they varied in whether the project was open to new collaborators and whether materials were open while work was taking place. Only some projects used GitHub, an online version control system with social transparency designed for software development [8]. Technologically mediated platforms with low barriers to entry and exit and flexible social structures enable the large-scale, asynchronous, high-turnover collaborations typical in most open collaboration [11]. Inconsistent norms about openness and the lack of a universal, technologically mediated platform may partially explain why we observed open data analysis collaborations that were small, with low turnover and synchronous communication.

Open data analysis shares more similarities with less prototypical forms of open collaboration such as the maker movement and open science. Similar to the maker movement, there is no universal technologically mediated collaboration platform. In the maker movement, as in open data analysis, collaboration takes place through a variety of different means. Sharing of designs and ideas takes place in person or on a variety of online websites (e.g. the Ikea Hacks website, Instructables) [33, 37]. Collaboration frequently takes place offline in hackerspaces and Fab Labs [22]. In hackerspaces individuals exchange knowledge of fabrication techniques; these spaces are used to collaborate, to learn and to teach [33]. The lack of central technologically-mediated collaboration likely shapes practice both at a community and artifact level for both the maker movement and open data analysis. Decentralization likely creates looser community boundaries; both activities are better explained as collectives of practice rather than communities of practice [37]. Decentralization also may explain why collaborations are smaller in scale.

The nature of data analysis tasks may create demands that constrain collaboration practices as well. Many projects organized work interdependently to support interdisciplinary roles within groups. Domain experts and technical experts took on the roles of thinker and doer, respectively, which required iterative feedback between these two types of experts. Using data that was collected by someone else is difficult. This is one of the challenges that scientists face in the reuse of other scientists' data. Data often lacks adequate documentation to understand the context in which it was created, its format, and its meaning [1]. Scientists often need to interact with the original creators of the data in order to fully understand it [32]. In open data analysis projects, domain experts who have more familiarity with the data play an invaluable role explaining to technical experts the meaning of data entries and fields and assessing issues of data quality. Domain experts also acted as advisors, guiding research questions and interpretation. Through back-and-forth discussions technical experts provided new results while domain experts gave feedback on these results. This pattern of feedback resembles the back-and-forth communication between scientists and statisticians that helps statisticians turn scientific questions into statistical questions [12].

Analysis of open data requires interdisciplinary skills that a single individual rarely possesses. The task demands of interdisciplinarity engender a high level of interdependence, which in turn may explain why collaborations are typically small in scale and use synchronous communication. Many forms of technologically-mediated communication that help collaborations scale may be insufficient to support the iterative feedback required by complex, interdependent work [27].

Open data analysis is an emerging practice, in which the contributors, norms, methods, and artifacts are still developing. Currently we find that collaboration is interdisciplinary, interdependent, and small in scale, with low turnover and synchronous communication. We argue that these characteristics stem from the lack of a centralized, technologically mediated collaboration platform as well as the task demands inherent in reusing data and performing statistical analysis. We expect open data analysis as a practice to evolve rapidly. Collective norms develop over time and, while norms of openness and sharing are currently heterogeneous, they may converge towards greater openness. More openness, together with the development of a technologically-mediated collaboration platform to support data analysis, might facilitate the larger-scale collaborations typical of other forms of open collaboration. A greater total quantity of work can be completed with larger collaborations.

Similarly, techniques, methods, and objectives also develop over time. On average, participants were highly educated, and project groups had contributors with years of experience in relevant technical areas and subject domains. Despite these skills, research questions remained exploratory. The availability of data science technologies, which have lowered barriers to entry in data science, may not be enough to make sophisticated analyses accessible even for well-educated people [6]. For the few cases in which sophisticated analyses were used, these projects were often modeled after other existing projects. It may take time to build up a collective repository of ideas to support more complex methods and questions.

In this paper we characterized collaboration in open data analysis using the Model of Coordinated Action [20]. This paper is one of the first to apply MoCA to describe collaboration for an emerging coordinated action. The model provided a systematic framework to compare and contrast collaborative practices in open data analysis against other forms of collaboration. This paper demonstrates that MoCA is an effective framework for making task- and platform-independent comparisons. The largest challenge we faced in using MoCA was the operationalization of its seven dimensions. Nascence, in particular, was difficult to measure. We chose to operationalize nascence as the degree of uncertainty individuals felt in their work. However, it was difficult to untangle whether individuals felt uncertainty because of the inherent uncertainty in discovering meaning from data or because individuals were trying to figure out which questions, methods, and tools to use in their analyses. There are also important aspects of collaboration that fall outside the scope of MoCA. For example, we found that the intended audience of the project shaped collaboration in open data projects. Future work will be required to determine whether seven dimensions are sufficient to characterize coordinated actions.
Limitations and Future Work

The greatest limitation of this study is the low survey sample size. We gathered survey data to provide quantitative data to complement results from our interviews and to increase our sample size. Even so, we were only able to recruit a small number of survey participants, despite multiple strategies for recruiting a larger survey sample, including posting recruitment messages in multiple online locales, sending personalized email messages, and providing a monetary incentive (albeit a low one). One of the challenges in studying open data analysis is that it has not yet developed a unified community of practice. This creates two complications. First, the lack of a unified community led to difficulties in recruiting a representative cross section of participants. Second, the lack of a centralized community made it hard to identify a sizable sample of the community. The participants that we were able to recruit are likely to be more actively involved in the projects than typical individuals and more likely to identify with open data as a community of practice. As a result of the low sample size, and the heterogeneity within this community, we do not believe these participants are necessarily representative of all individuals who work with open government data. Instead, what the data does provide is a collection of over 40 example projects. Using this set of projects we have identified a number of patterns and themes in the way that these groups collaborate. Though these themes may not hold true for all projects, they are at least important considerations for many such projects. As an exploratory study, this work lays the groundwork for future research, which will hopefully complement these findings using a broader and more representative sample.

Future work should look at open data analysis using a global sample. We specifically focused on the practices of open data analysis in a single country because different countries have very different political climates. Collaboration and the use of open data to fight government corruption in countries with substantial political repression or retribution may be very different from the forms of collaboration in the U.S.

In this paper we observed that large-scale collaborations are less typical of open data analysis than of other, more prototypical forms of open collaboration. In part, this can be explained by the lack of a centralized, technologically mediated collaboration platform. Future work should evaluate this claim as well as investigate what sorts of platforms could best support data analysis. We found that some projects used GitHub, but this platform may not be well suited for data analysis. In particular, it lacks some technical capabilities such as the ability to store large quantities of data, to develop documentation and metadata for data sets, and version control that supports data cleaning and processing. We also argue that this work requires interdisciplinarity and interdependent work which may not be supported by the limited communication channels built into platforms like GitHub.
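To illustrate the kind of dataset documentation such a platform could keep under version control alongside the data itself, here is a minimal metadata record, written as a Python dictionary and loosely inspired by the Frictionless Data "data package" idea; every value is invented.

# Illustrative only: minimal metadata a platform could version with a dataset.
dataset_metadata = {
    "name": "city-lobbyist-actions",
    "description": "Actions sought by registered lobbyists, one row per filing.",
    "source": "https://data.example.gov/lobbying",  # hypothetical open data portal
    "retrieved": "2016-06-01",
    "fields": [
        {"name": "lobbyist_id", "type": "string", "notes": "stable across filings"},
        {"name": "client", "type": "string"},
        {"name": "amount_usd", "type": "number", "notes": "blank if under $100"},
    ],
    "provenance": "Exported monthly by the city clerk; known gaps before 2012.",
}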
CONCLUSION

The analysis of open government data is expected to encourage citizens to participate in government as well as to improve transparency and efficiency in government processes. We found that interdisciplinarity was important and that groups were typically small, had low turnover, and relied heavily on synchronous communication. We found that most of the projects analyzing government data asked exploratory questions and made use of descriptive statistics and visualizations rather than more sophisticated questions and approaches. The emerging practice of open data analysis faces many challenges going forward, including how to tackle more complex questions, how to collaborate effectively with so many different communities of practice, and how to collaborate in ways that scale when interdependent teamwork is so important.

REFERENCES

1. Jeremy P. Birnholtz and Matthew J. Bietz. 2003. Data at Work: Supporting Sharing in Science and Engineering. In Proceedings of the SIGGROUP Conference on Supporting Group Work (GROUP'03). ACM, New York, NY, USA, 339-348. DOI: http://dx.doi.org/10.1145/958160.958215

2. Kirsten Boehner and Carl DiSalvo. 2016. Data, Design and Civics: An Exploratory Study of Civic Tech. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'16). ACM, New York, NY, USA, 2970-2981. DOI: http://dx.doi.org/10.1145/2858036.2858326

3. Morgan Bazilian, Andrew Rice, Juliana Rotich, Mark Howells, Joseph DeCarolis, Cameron Brooks, Florian Bauer, and Michael Liebreich. 2012. Open Source Software and Crowdsourcing for Energy Analysis. Energy Policy 49 (2012), 149-153. DOI: http://dx.doi.org/10.1016/j.enpol.2012.06.032

4. Brian Butler, Elisabeth Joyce, and Jacqueline Pike. 2008. Don't Look Now, But We've Created a Bureaucracy: The Nature and Roles of Policies and Rules in Wikipedia. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'08). ACM, New York, NY, USA, 1101-1110. DOI: http://dx.doi.org/10.1145/1357054.1357227

5. Chris Chatfield. 2002. Confessions of a pragmatic statistician. Journal of the Royal Statistical Society Series D: The Statistician 51, 1 (2002), 1-20. DOI: http://dx.doi.org/10.1111/1467-9884.00294

6. Sophie Chou, William Li, and Ramesh Sridharan. 2014. Democratizing Data Science: Effecting positive social change with data science. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) at Bloomberg (2014). DOI: http://dx.doi.org/10.1.1.478.3295
8. Laura Dabbish, Rosta Farzan, Robert Kraut, and Tom Postmes. 2012. Fresh Faces in the Crowd: Turnover, Identity, and Commitment in Online Groups. In Proceedings of the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW'12). ACM, New York, NY, USA, 245-248. DOI: http://dx.doi.org/10.1145/2145204.2145243

9. Sheena Erete, Emily Ryou, Geoff Smith, Khristina Fassett, and Sarah Duda. 2016. Storytelling with Data: Examining the Use of Data by Non-Profit Organizations. In Proceedings of the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW'16). ACM, New York, NY, USA, 1273-1283.

10. Ixchel M. Faniel and Trond E. Jacobsen. 2010. Reusing scientific data: How earthquake engineering researchers assess the reusability of colleagues' data. Computer Supported Cooperative Work 19, 3-4 (2010), 355-375. DOI: http://dx.doi.org/10.1007/s10606-010-9117-8

11. Andrea Forte and Cliff Lampe. 2013. Defining, Understanding and Supporting Open Collaboration: Lessons from the Literature. American Behavioral Scientist 57, 5 (2013), 535-547. DOI: http://dx.doi.org/10.1177/0002764212469362

12. David J. Hand. 1994. Deconstructing Statistical Questions. Journal of the Royal Statistical Society. Series A (Statistics in Society) 157, 3 (1994), 317-356. http://www.jstor.org/stable/2983526

13. Harlan Harris, Sean Murphy, and Marck Vaisman. 2013. Analyzing the Analyzers. O'Reilly Media.

14. Marijn Janssen, Yannis Charalabidis, and Anneke Zuiderwijk. 2012. Benefits, adoption barriers and myths of open data and open government. Information Systems Management 29 (2012), 258-268.

15. Thorhildur Jetzek, Michel Avital, and Niels Bjorn-Andersen. 2014. Data-Driven Innovation through Open Government Data. Journal of Theoretical and Applied Electronic Commerce Research 9, 2 (2014), 15-16. DOI: http://dx.doi.org/10.4067/S0718-18762014000200008

16. Brian L. Joiner. 2010. Statistical consulting. In Encyclopedia of Statistical Sciences. John Wiley & Sons, Inc., 1-9. DOI: http://dx.doi.org/10.1002/0471667196.ess0409.pub3

17. Maxat Kassen. 2013. A promising phenomenon of open data: A case study of the Chicago open data project. Government Information Quarterly 30, 4 (2013), 508-513. DOI: http://dx.doi.org/10.1016/j.giq.2013.05.012

18. Ron S. Kenett. 2015. Statistics: A Life Cycle View. Quality Engineering 27, 1 (2015), 111-121. DOI: http://dx.doi.org/10.1080/08982112.2015.968054

19. Namwook Kim and Juho Kim. 2015. BudgetMap: Issue-Driven Navigation for a Government Budget. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'15). ACM, New York, NY, USA, 1097-1102.

20. Charlotte P. Lee and Drew Paine. 2015. From The Matrix to a Model of Coordinated Action (MoCA): A Conceptual Framework of and for CSCW. In Proceedings of the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW'15). ACM, New York, NY, USA, 179-194.

21. Jeffrey Leek and Roger D. Peng. 2015. What is the Question? Science 347, 6228 (2015), 1314-1315.

22. Silvia Lindtner, Garnet D. Hertz, and Paul Dourish. 2014. Emerging sites of HCI innovation: Hackerspaces, Hardware Startups & Incubators. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'14). ACM, New York, NY, USA, 439-448. DOI: http://dx.doi.org/10.1145/2556288.2557132

23. Karen Seashore Louis, Lisa M. Jones, and Eric G. Campbell. 2002. Sharing in Science. American Scientist 90, 4 (2002), 304-307.

24. Jock R. MacKay and Wayne R. Oldford. 2000. Scientific Method, Statistical Method and the Speed of Light. Statistical Science 15, 3 (2000), 254-278. http://www.jstor.org/stable/2676665

25. Henry Mintzberg. 1994. The Fall and Rise of Strategic Planning. Harvard Business Review 72, 1 (1994), 107-114. https://hbr.org/1994/01/the-fall-and-rise-of-strategic-planning

26. Jae Yun Moon and Lee Sproull. 2000. Essence of distributed work: The case of the Linux kernel. First Monday 5, 11 (2000). DOI: http://dx.doi.org/10.5210/fm.v0i0.1479

27. Gary Olson and Judith Olson. 2000. Distance Matters. Human-Computer Interaction 15, 2 (2000), 139-178. DOI: http://dx.doi.org/10.1207/S15327051HCI1523_4

28. Sylvain Parasie and Eric Dagiral. 2013. Data-driven Journalism and the Public Good: "Computer-assisted-reporters" and "Programmer-journalists" in Chicago. New Media & Society 15, 6 (2013), 853-871. DOI: http://dx.doi.org/10.1177/1461444812463345

29. DJ Patil. 2011. Building Data Science Teams. (2011). http://radar.oreilly.com/2011/09/building-data-science-teams.html

30. Gregory Piatetsky. 2013. Unicorn Data Scientists vs Data Science Teams. (2013). http://www.kdnuggets.com/2013/12/unicorn-data-scientists-vs-data-science
31. Foster Provost and Tom Fawcett. 2013. Data Science and its Relationship to Big Data and Data-Driven Decision Making. Big Data 1, 1 (2013), 51-59. DOI: http://dx.doi.org/10.1089/big.2013.1508

32. Betsy Rolland and Charlotte P. Lee. 2013. Beyond trust and reliability: reusing data in collaborative cancer epidemiology research. In Proceedings of the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW'13). ACM, New York, NY, USA, 435-444. DOI: http://dx.doi.org/10.1145/2441776.2441826

33. Joshua G. Tanenbaum, Amanda M. Williams, Audrey Desjardins, and Karen Tanenbaum. 2013. Democratizing technology: pleasure, utility and expressiveness in DIY and maker practice. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'13). ACM, New York, NY, USA, 2603-2612. DOI: http://dx.doi.org/10.1145/2470654.2481360

34. Joshua Tauberer. 2014. Open Government Data: The Book (2nd ed.). Self-published. https://opengovdata.io/

35. Alex S. Taylor, Siân Lindley, Tim Regan, and David Sweeney. 2015. Data-in-Place: Thinking through the Relations Between Data and Community. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'15). ACM, New York, NY, USA, 2863-2872.

36. Theresa Velden. 2013. Explaining Field Differences in Openness and Sharing in Scientific Communities. In Proceedings of the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW'13). ACM, New York, NY, USA, 445-457. DOI: http://dx.doi.org/10.1145/2441776.2441827

37. Tricia Wang and Joseph Jofish Kaye. 2011. Inventive Leisure Practices: Understanding Hacking Communities as Sites of Sharing and Innovation. In CHI '11 Extended Abstracts on Human Factors in Computing Systems (CHI EA '11). ACM, New York, NY, USA, 263-272. DOI: http://dx.doi.org/10.1145/1979742.1979615

38. Gemma Webster, David E. Beel, Chris Mellish, Claire D. Wallace, and Jeff Pan. 2015. CURIOS: Connecting Community Heritage through Linked Data. In Proceedings of the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW'15). ACM, New York, NY, USA, 639-648.

39. Jorge L. Zapico, Daniel Pargman, Hannes Ebner, and Elina Eriksson. 2013. Hacking sustainability: Broadening participation through Green Hackathons. In Fourth International Symposium on End-User Development. IT University of Copenhagen, Denmark.