Combining Process Guidance and Industrial Feedback for Successfully Deploying Big Data Projects
C. Ponsard, M. Touzani, A. Majchrowski
Open Access
Open Journal of Big Data (OJBD)
Volume 3, Issue 1, 2017
http://www.ronpub.com/ojbd
ISSN 2365-029X
ABSTRACT
Companies are faced with the challenge of handling increasing amounts of digital data to run or improve their business. Although a large set of technical solutions is available to manage such Big Data, many companies lack the maturity to manage that kind of project, which results in a high failure rate. This paper aims at providing better process guidance for the successful deployment of Big Data projects. Our approach is based on the combination of a set of methodological bricks documented in the literature, from early data mining projects up to the present day. It is complemented by lessons learned from pilots conducted in different areas (IT, health, space, food industry), with a focus on two pilots giving a concrete vision of how to drive the implementation, with emphasis on the identification of value, the definition of a relevant strategy, the use of an Agile follow-up and a progressive rise in maturity.
Regular research paper: big data, process model, agile, method adoption, pilot case studies
• they fail to provide a good management view on communication, knowledge and project aspects,

• they lack some form of maturity model that makes it possible to highlight the more important steps and milestones and to raise them progressively,

• despite the standardisation, they are not always known by the wider business community, and hence are difficult to adopt for managing the data value aspect,

• the proposed iterative model is limited: the planned iterations are little used in practice because they do not loop back to the business level but rather stay in the internal IT context. In addition to the lack of control on the added value, those iterations are very often postponed. This is the reason why more agile models were introduced.

2.2 Methods Adopting Agile Principles

Agile methods, initially developed for software development, can also be applied to data analysis. For example, IBM has developed ASUM-DM, an extension and refinement of CRISP-DM combining traditional project management with agility principles [26]. Figure 4 illustrates its main blocks and its iterative principle, driven by specific activities in the last columns, which include governance and community alignment. However, ASUM-DM does not cover the infrastructure/operations side of implementing a data mining/predictive analytics project: it is more focused on activities and tasks in the deployment phase, and it has no templates or guidelines.

Although this looks quite adequate, deploying an Agile approach for Big Data may still face resistance, just as is the case for software development, typically in more rigid kinds of organisations. A survey was conducted to validate this acceptance [23]. It revealed that, quite similarly to software development, companies tend to accept Agile methods for projects with smaller scope, lesser complexity and fewer security issues, and inside organisations that grant more freedom. Otherwise, a more traditional plan-managed approach is preferred.
2.3 Methods Developed for Big Data Projects

Architecture-centric Agile Big data Analytics (AABA) addresses the technical and organizational challenges of Big Data [9]. Figure 5 shows that it supports Agile delivery. It also integrates the Big Data system Design (BDD) method and Architecture-centric Agile Analytics with an architecture-supported DevOps (AAA) model for effective value discovery and continuous delivery of value.

The method was validated on 11 case studies across various domains (marketing, telecommunications, healthcare) with the following recommendations:

1. Data Analysts/Scientists should be involved early in the process, i.e. already at the business analysis phase.

2. Continuous architecture support is required for big data analytics.

3. Agile bursts of effort help to cope with rapid technology changes and new requirements.

4. The availability of a reference architecture and a technology catalog eases the definition and evolution of the data processing.

5. Feedback loops need to be open, e.g. about non-functional requirements such as performance, availability and security, but also for business feedback about emerging requirements.

Stampede is another method, proposed by IBM to its customers. Expert resources are provided at cost to help companies get started with Big Data in the scope of a well-defined pilot project [28]. Its main goal is to educate companies and help them get started more quickly, in order to drive value from Big Data. A key tool of the method is a half-day workshop to share definitions, identify scope/big data/infrastructure, establish a plan and, most importantly, establish the business value. The pilot execution is typically spread over 12 weeks and carried out in an Agile way, with a major milestone at about 9 weeks, as depicted in Figure 6.
2.4 Some Complementary Approaches

2.4.1 Introducing Maturity Models

The maturity model from Nott and Betteridge, shown in Table 1, considers the kind of data analysis used, the alignment of the IT infrastructure, as well as aspects of culture and governance.

Table 1: Maturity Model from Nott and Betteridge (IBM) (source: [39])
2.4.3 Critical Success Factors

In complement to processes, many key success factors, best practices and risk checklists have been published, mostly in blogs for Chief Information Officers, e.g. [4]. A systematic classification of Critical Success Factors has been proposed by [24] using three key dimensions: people, process and technology. It has been further extended by [46] with tooling and governance dimensions. A few key factors are the following:

• Data: quality, security, level of structure in data.

• Governance: management support, well-defined organisation, data-driven culture.

• Objectives: business value identified (KPI), business case-driven, realistic project size.

• Process: agility, change management, maturity, coping with data growth.

• Team: data science skills, multidisciplinarity.

• Tools: IT infrastructure, storage, data visualization capabilities, performance monitoring.

3 METHOD DEVELOPMENT AND VALIDATION PROCESS

The global aim of our project is to come up with a systematic method to help companies facing big data challenges validate the potential benefits of a big data solution. The global process is depicted in Figure 7. The process is driven by eight successive pilots, which are used to tune the method and to make more technical bricks available through the proposed common infrastructure. The final expected result is to provide a commercial service to companies having such needs.

The selected method is strongly inspired by what we learned from the available methods and processes described in Section 2:

• the starting point was Stampede, because of some initial training and the underlying IBM platform. Key aspects kept from the method are the initial workshop with all stakeholders, the realistic focus and a constant business value driver,

• however, to cope with the lack of reference material, we defined a process model based on CRISP-DM, which is extensively documented,

• the pilots are executed in an Agile way; given the expert availabilities (university researchers), the pilots are planned over longer periods than in Stampede: 3-6 months instead of 12-16 weeks.

The popular SCRUM approach was used as it emphasizes collaboration, functioning software, team self-management and flexibility to adapt to business realities [49].

The global methodology is composed of three successive phases detailed hereafter:

1. Big Data Context and Awareness. In this introductory phase, one or more meetings take place with the target organisation. A general introduction is given on Big Data concepts, the available platform, a few representative applications in different domains (possibly already with a focus on the organisation's domain), the main challenges and the main steps. The maturity of the client and a few risk factors can be checked (e.g. management support, internal expertise, business motivation).

2. Business and Use Case Understanding. This is also the first phase of CRISP-DM. Its goals are to collect the business needs/problems that must be addressed using Big Data and to identify one or a few business use cases generating the most value out of the collected data. A checklist supporting this phase is shown in Table 2.

Table 2: Workshop checklist (source: [29])

Business Understanding:
• Strategy & Positioning: Global Strategy; Product/Services Positioning; Scorecards - KPI's; Digital Strategy; Touchpoints for Customers/Prospects (Search, e-Commerce, Social Media, Websites,...); Direct Competitors; Disruptive Elements (Disruptive Models, Disruptive Technologies, Disruptive Behaviour).
• Select Potential Use Cases: Objectives; Priorities; Costs, ROI, Constraints; Value to the Client; Scope; New Data Source; High Level Feasibility; Data Mining Goals & KPIs; Resources Availability; Time to Deliver.

Use Case Understanding:
• Assess Business Maturity.
• Determine Use Case Objectives: Value to the Client; Business Success Criteria.
• Assess Situation: Resource Requirements; Assumptions/Constraints; Risks and Contingencies; Terminology; Costs and Benefits.
• Refine Data Mining Goals: Data Mining Goals; Data Mining KPIs.
• Produce Project Plan: Approach; Deliverables; Schedule; Risk Mitigation (Privacy,...); Stakeholders to involve; Initial Assessment of Tools and Techniques.
This phase is organised around one or a few workshops, involving the Business Manager, Data Analyst, IT architect and optionally selected specialists, such as the IT security manager if there are specific security/privacy issues that need to be checked at this early stage. Both the as-is and to-be situations are considered. Specific tools to support the efficient organisation of those workshops are described in Section 4. At the end of this step, a project planning is also defined.

3. Pilot Implementation of Service or Product. In this phase, the following implementation activities are carried out in an Agile way (a small illustrative sketch follows the list):

• Data Understanding: analyse data sets to detect interesting subset(s) for reaching the business objective(s) and to check data quality.

• Data Preparation: select the right data and clean/extend/format them as required.

• Modelling: select specific modelling techniques (e.g. decision trees or neural networks). The model is then built and tested for its accuracy and generality. Possible modelling assumptions are also checked. The model parameters can be reviewed or other/complementary techniques can be selected.

• Evaluation: assess the degree to which the model meets the business objectives, using realistic or even real data.

• Deployment: transfer the validated solution to the production environment, make sure users can use it (e.g. right visualization, dashboard) and start monitoring (performance, accuracy).
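To make the Modelling and Evaluation activities more concrete, the following minimal sketch (illustrative only, not part of the method itself) builds and assesses a decision-tree classifier with scikit-learn; the file name prepared_pilot_data.csv and the target column incident are hypothetical placeholders for the output of the Data Preparation activity.

    # Minimal sketch of the Modelling and Evaluation activities (illustrative).
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Hypothetical output of the Data Preparation activity; "incident" is an
    # assumed binary target (e.g. server failure within the next week or not).
    data = pd.read_csv("prepared_pilot_data.csv")
    X, y = data.drop(columns=["incident"]), data["incident"]

    # Hold out data for the Evaluation activity, which requires testing the
    # model against realistic or even real data.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        random_state=42)

    # Modelling: build and test a first technique; parameters such as max_depth
    # can be reviewed, or complementary techniques (e.g. neural networks) tried.
    model = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)

    # Evaluation: check the degree to which the model meets the business
    # objective, e.g. a minimum accuracy threshold agreed with stakeholders.
    print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

In a real pilot, the acceptance threshold for such a metric would be derived from the KPIs defined during the Business and Use Case Understanding phase.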
Our pilots are kept confidential. However, Table 3 presents the main features of the first four pilots based on the first three "V"s of Big Data [37].

4 LESSONS AND RECOMMENDATIONS LEARNED FROM OUR PILOT CASES

In this section, we present some lessons learned and related guidelines that are useful to support the whole process and increase the chances of success. We also illustrate our feedback based on some highlights from two pilot cases used as running examples: the IT maintenance pilot case and the clinical pathway pilot case.

4.1 Defining Progressive and Measurable Objectives.

Through the deployment of a Big Data solution, a company expects to gain value out of its data. The business goals should therefore be clearly expressed. There exist different methods to capture goals. In our pilots, we integrated goal-oriented requirements engineering techniques to elicit and structure business goals and to connect them with data processing processes and components [55, 56]. Such methods also include specific techniques to verify that goals are not too idealised, by helping in the discovery of obstacles and their resolution in order to define achievable goals.

Another way to connect the goals with the (business) reality is to define how to measure the resulting value, which should be defined right from the business understanding phase, typically by relying on KPIs (Key Performance Indicators).
Companies should already have defined their KPIs and be able to measure them. If this is not the case, they should start improving on this: in other words, Business Intelligence should already be present in the company.

Based on this, different improvement strategies can be identified and discussed to select a good business case. In this process, the gap with the current situation should also be considered: it is safer to keep a first project with quite modest objectives than to risk failing with a too complex project that could bring more value. Once a pilot is successful, further improvements can be planned in order to add more value.
Computer Maintenance Area Case Study. The large IT provider considered here manages more than 3000 servers that host many web sites, run applications and store large amounts of related customer data. No matter what efforts are taken, servers are still likely to go off-line, networks to become unavailable or disks to crash, generally at unexpected times that are less convenient and more costly to manage, like nights or weekends. The considered company currently applies standard incident management and preventive maintenance procedures based on a complete monitoring infrastructure covering both the hardware (network appliances, servers, disks) and the application level (service monitoring).

In order to reduce the number of costly reactive events and optimise preventive maintenance, the company is willing to develop more predictive maintenance by trying to anticipate the unavailability of the servers, in such a way that it can react preventively and, ultimately, prevent such unavailability. In the process, the client wants to diagnose the root causes of incidents and resolve them in order to avoid possible further incidents, which can turn into a nightmare when occurring in a reactive mode. The ultimate goal is to increase service availability and customer satisfaction, and also to reduce operating costs. The resulting KPI is called Total Cost of Ownership (TCO), and the typical breakdown costs to be considered are (a small aggregation sketch follows the list):

• maintenance on hardware and software, which could be reduced through better prediction,

• personnel working on these incidents,

• any penalties related to customer Service Level Agreements (SLAs),

• indirect effects on the client's business and its brand image.
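As a minimal sketch (with invented figures, only to illustrate how such a KPI can be tracked), the TCO simply aggregates the cost categories listed above, and comparing it across periods shows whether predictive maintenance pays off:

    # Illustrative aggregation of the TCO breakdown (all figures invented).
    def total_cost_of_ownership(hw_sw_maintenance: float, personnel: float,
                                sla_penalties: float, indirect_costs: float) -> float:
        return hw_sw_maintenance + personnel + sla_penalties + indirect_costs

    # Comparing the KPI before and after introducing predictive maintenance.
    tco_before = total_cost_of_ownership(120_000, 80_000, 30_000, 15_000)
    tco_after = total_cost_of_ownership(100_000, 60_000, 5_000, 5_000)
    print(f"TCO: {tco_before} -> {tco_after}")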
Clinical Pathway Case Study. Hospitals are increasingly deploying clinical pathways, defined as a multidisciplinary vision of the treatment process required by a group of patients with the same pathology and a predictable clinical follow-up [6]. The reason is not only to reduce the variability of clinical processes but also to improve care quality and achieve better cost control [54]. It also enables richer analysis of the data produced and thus the profiling of patients with higher risks (for example due to multi-pathology or intolerances).

A typical workflow (e.g. for chemotherapy) is shown in Figure 8. It is a sequence of drug deliveries, or cures, generally administered in day hospital. Each cure is followed by a resting period at home that lasts from a few days to a few weeks. A minimal interval between cures is required because chemotherapy drugs are toxic
and the body needs some time to recover between two drug deliveries. When the ideal treatment protocol is followed, the number of cancerous cells is progressively reduced, hopefully reaching full healing or cancer remission. If, for some reason, chemotherapy cures do not closely follow the intended periodicity, or if doses are significantly reduced, the treatment efficiency may be suboptimal. In such conditions, cancerous cells may multiply again, which can result in a cancer relapse.
Figure 9 shows the high-level goals for the optimal organisation of care pathways. Goals and obstacles are respectively depicted using blue and red parallelograms. Agents (either people or processing components) are pictured using yellow hexagons. Some expectations on human agents are also captured using yellow parallelograms. The adequate workflow should be enforced for all patients within the recommended deadlines, given the possible impact on patient relapse. Ethical principles also require a fair allocation of resources, i.e. every patient deserves optimal care regardless of his medical condition or prognosis. The workload should also be balanced to avoid the staff having to manage unnecessary peak periods.

Figure 9: Goal analysis for clinical pathways: strategic goals and main obstacles
Reaching those goals together, of course, requires enough resources to be available; a number of related obstacles (in red), like monitoring the flow of patients joining and leaving the pathway, must also be managed. The available workforce can be influenced by staff availability and by public holidays reducing the available slots for delivering care. A number of mitigation actions are then identified to better ensure that the workforce is adequate. An agent with a key responsibility in the system is the scheduler, which must manage every appointment. Human agents are not very good at this task because the problem is very large and it is difficult to find a solution that simultaneously meets all patient and service constraints. Moreover, the planning must constantly be reconsidered to deal with unexpected events and the flow of incoming/outgoing patients. In contrast, a combined predictive and prescriptive solution is very interesting because it has the capability to ensure optimal care and service.

Medical literature has shown, for a number of cancers, that relapse-free survival is strongly correlated with the RDI (Relative Dose Intensity). For instance, for breast cancer, a key threshold value is 85% [41]. Hence this indicator can be seen as a gauge that should be carefully managed across the whole clinical pathway.
4.2 From Descriptive to Predictive and then Prescriptive Data Analysis.

Analytics is a multidisciplinary concept that can be defined as the means to acquire data from diverse sources, process them to elicit meaningful patterns and insights, and distribute the results to the proper stakeholders [10, 44]. Business Analytics is the application of such techniques by companies and organisations in order to get a better understanding of the performance level of their business and to drive improvements. Three complementary categories of analytics can be distinguished and combined in order to reach the goal of creating insights and helping to make better decisions. Those analytics consider different time focuses, questions and techniques, as illustrated in Table 4 [38, 51].

In a number of domains, it is interesting to consider an evolution scheme starting from immediate reaction raised by analysing data, to more intelligence in anticipating undesirable situations, or even considering how to prevent them as much as possible.

Computer Maintenance Area Case Study. In terms of maintenance, the starting point is the identified KPI of Total Cost of Ownership (TCO), which includes the cost of purchase, maintenance and repair in the event of a breakdown. Different strategies can be envisaged:

• react to problems only after the occurrence of a breakdown. This translates into a generally high cost because quick reaction is required to minimize downtime. Moreover, any unavailability has a negative impact in terms of image, or even penalties if a Service Level Agreement (SLA) has been violated. This should of course be minimised
through the use of next-level strategies that can benefit from data analytics,

• anticipating the occurrence of breakdowns based on system monitoring. Simple strategies can be implemented: for example, an alert can be triggered when a storage unit approaches a threshold close to its maximum capacity. However, this does not enable the prediction of failures resulting from complex sequences of events. Mostly descriptive techniques are used at this level,

• trying to predict problems based on known history and observation of the system. At this level, predictive data analysis techniques can discover cause-effect relationships between parts of the system which, in cascade, can cause unavailability. For example, applying a badly validated patch may affect a service that can itself paralyse a business process,

• improving the system is the ultimate step. It is necessary to ensure that the system operates under optimum conditions by eliminating the root causes that could trigger some failure. Prescriptive techniques are used at this level.

The predictive solution was the best option, but it should only be considered once the preventive step is carried out. Similarly, the most common time patterns should be identified and processed first. For example, a storage unit is more likely to be saturated on days when backups are performed, which is usually predictable (weekend or month end). Anticipation would avoid expensive interventions, especially during weekends.
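The difference between the descriptive and predictive levels can be illustrated with a small sketch; the 90% threshold and the check_storage helper are invented for the example:

    # Descriptive level (illustrative): a threshold alert reports the current
    # state but cannot anticipate failures caused by complex event sequences.
    ALERT_THRESHOLD = 0.90  # assumed fraction of maximum capacity

    def check_storage(used_gb: float, capacity_gb: float) -> None:
        usage = used_gb / capacity_gb
        if usage >= ALERT_THRESHOLD:
            print(f"ALERT: storage at {usage:.0%} of capacity")

    check_storage(used_gb=920, capacity_gb=1000)

    # The predictive level would instead learn from the monitoring history,
    # e.g. that saturation is likely on backup days (weekends, month end),
    # so that interventions can be planned ahead of time.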
Clinical Pathway Case Study. The operation of clinical pathways is characterised by the occurrence of many events, which may be expected or not, and which thus impact the scheduled behaviour. An important concern is to detect such events and to decide how to manage possible deviations so as to minimise their impact, especially on the quality-of-care KPI. Different possible strategies can be explored for this purpose:

• reactive strategies should be avoided as much as possible because the impact on the patient is irreversible. Some reactive cases can be related to a patient no-show or a last-minute medical no-go; the action is then to reschedule a new appointment as soon as possible.

• preventive strategies can be used to reduce the risk of no-show, for example by sending a reminder (e.g. text message, phone call) one or two days before the appointment (see the sketch after this list). Descriptive data analytics are enough at this level.

• predictive strategies relying on predictive analytics can be used to learn risk factors for specific patients, which could result in more careful planning.
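As a small sketch of the preventive strategy (the appointment records and the reminder channel are hypothetical), the reminder rule simply selects appointments one or two days ahead:

    # Illustrative no-show prevention: remind patients one or two days ahead.
    from datetime import date, timedelta

    appointments = [  # hypothetical data; a real system would query the scheduler
        {"patient": "P-001", "date": date.today() + timedelta(days=2)},
        {"patient": "P-002", "date": date.today() + timedelta(days=10)},
    ]

    for a in appointments:
        days_ahead = (a["date"] - date.today()).days
        if 1 <= days_ahead <= 2:
            # Placeholder for an SMS or phone reminder channel.
            print(f"Reminder sent to {a['patient']} for {a['date']}")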
Table 4: Overview of analytics in terms of questions, techniques and outputs (source: [51]). The table contrasts the three categories of Business Analytics (Descriptive, Predictive and Prescriptive Analytics) according to their questions, techniques and outputs.
Such question lists support both the preparation before the workshop and their use as a checklist during the workshop itself. Table 5 shows a few questions about the data to process.

Table 5: Some workshop questions about data
Q.UD.1 What are the data sources and data types used in your current business processes?
Q.UD.2 What tools/applications are used to deal with your current business processes?
Q.UD.3 Are your present business processes performing complex processing on data?
Q.UD.4 How available is your data? What happens if data is not available?
Q.UD.5 Do different users have different access rights on your data?
Q.UD.6 Does your data contain sensitive information (e.g. personal or company confidential data)?
Q.UD.7 What are the consequences of data alteration? Do you know the quality level of your data?

Table 6: Evaluation readiness checklist (partial)
R.EV.1 Are you able to understand/use the results of the models?
R.EV.2 Do the model results make sense to you from a purely logical perspective?
R.EV.3 Are there apparent inconsistencies that need further exploration?
R.EV.4 From your initial glance, do the results seem to address your organization's business questions?
[...] location-based services [34, 52].

At a more general level, in order to better control the huge quantities of data processed every day and to ensure that every single person is respected, the European Commission issued the General Data Protection Regulation in 2016, which will come into force in May 2018 [19]. An EU portal with extensive resources is available to give companies some starting points [20].

Our recommendation, based on our pilots, is to investigate this issue early in the process. This can already be envisioned at the business and data understanding phases and involve relevant people such as the Chief Information Security Officer, or even a more specific Data Protection Officer if this role is defined. Actually, this happened quite naturally in most of our pilot cases because the data had to be processed outside of the owning organisation. However, the focus was more on confidentiality than on the purpose of the data processing itself.

5.3 Cyber Security Issues

Among the challenges of Big Data, data security is paramount against piracy and requires the development of systems that secure exchanges by ensuring strict control of access to the Big Data platform, thus guaranteeing the confidentiality of data. Securing a Big Data platform is nevertheless a domain in its own right, because the very principle of such a system is that it can be based on a heterogeneous architecture spread over several nodes. ENISA has produced a landscape of Big Data threats and a guide of good practices [14]. This document lists typical Big Data assets and identifies related threats, vulnerabilities and risks. Based on these points, it suggests emerging good practices and active areas for research.

Storing sensitive data in the Cloud, for example, is not without consequences, because the regulations are not the same in all countries. A sensitive aspect is the management of the data storage and processing locations, e.g. the need to process data in a given country. However, as this situation is also hindering European competitiveness in a global market, the EU is currently working on a framework for the free flow of non-personal data in the EU [21].

6 CONCLUSIONS

In this paper, we described how we addressed the challenges and risks of deploying a Big Data solution within companies willing to adopt this technology in order to support their business development. We first looked at the different methods reported over time in the literature. Rather than building yet another method, we realised that the key, when considering the adoption of Big Data in an organisation, is the process followed to come up with a method that fits the context and needs and will maximize the chance of success. Based on this idea, we defined a generic guidance process relying on available methods as building bricks. To be meaningful, our approach also relies strongly on lessons learned from industrial cases, which on the one hand helped in validating our process guidance, and on the other hand can also be used as concrete supporting illustrations.

Moving forward, we plan to consolidate our work based on what we will learn in the next series of pilot projects. This includes investigating challenges from other domains: we plan to address life sciences, which requires sustained processing of high volumes of data, and the space domain, with its highly distributed infrastructures. Considering the global development process, until now we have mainly focused on the discovery and data understanding phases, so our plan is to provide more guidance on the project execution phase using our most advanced pilots, which are now reaching full deployment. In our guidance process, we also had to face a number of problems which sometimes blocked all further progress. In some cases the reason was a lack of business value or maturity, for which the recommended action was to postpone the process. In other cases, some blocking issues could not be overcome or delayed the project a lot longer than expected, e.g. setting up a non-disclosure agreement about data access, getting actual data access, or configuring proprietary equipment. Guidance about how to detect and avoid such cases is also part of our work, as it helps to increase the chance of successful deployment.

ACKNOWLEDGEMENTS

This research was partly funded by the Walloon Region through the "PIT Big Data" project (grant nr. 7481). We thank our industrial partners for sharing their cases and providing rich feedback.

REFERENCES

[1] R. Balduino, "Introduction to OpenUP," https://www.eclipse.org/epf/general/OpenUP.pdf, 2007.

[2] L. Bargiotti, I. Gielis, B. Verdegem, P. Breyne, F. Pignatelli, P. Smits, and R. Boguslawski, "European Union Location Framework Guidelines for public administrations on location privacy," JRC Technical Reports, 2016.

[3] M. Batty, "Predictive Modeling for Life Insurance: Ways Life Insurers Can Participate in the Business ...
[32] L. Lau, F. Yang-Turner, and N. Karacapilidis, "Requirements for big data analytics supporting decision making: A sensemaking perspective," in Mastering Data-Intensive Collaboration and Decision Making, N. Karacapilidis, Ed. Springer Science & Business Media, April 2014, vol. 5, pp. 49–70.

[33] D. Lehmann, D. Fekete, and G. Vossen, "Technology selection for big data and analytical applications," Open Journal of Big Data (OJBD), vol. 3, no. 1, pp. 1–25, 2017. [Online]. Available: http://nbn-resolving.de/urn:nbn:de:101:1-201711266876

[34] L. Liu, "From data privacy to location privacy: Models and algorithms," in Proceedings of the 33rd International Conference on Very Large Data Bases, ser. VLDB '07. VLDB Endowment, 2007, pp. 1429–1430.

[35] G. Lyman, "Impact of chemotherapy dose intensity on cancer patient outcomes," J Natl Compr Canc Netw, pp. 99–108, Jul 2009.

[36] G. Mariscal, Ó. Marbán, and C. Fernández, "A survey of data mining and knowledge discovery process models and methodologies," Knowledge Eng. Review, vol. 25, no. 2, pp. 137–166, 2010.

[37] A. D. Mauro, M. Greco, and M. Grimaldi, "A formal definition of big data based on its essential features," Library Review, vol. 65, no. 3, pp. 122–135, 04 2016.

[38] M. Minelli, M. Chambers, and A. Dhiraj, Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses, 1st ed. Wiley Publishing, 2013.

[39] C. Nott, "Big Data & Analytics Maturity Model," http://www.ibmbigdatahub.com/blog/big-data-analytics-maturity-model, 2014.

[40] OMG, "Unified Modeling Language (UML) - Version 2.X," http://www.omg.org/spec/UML, 2005.

[41] M. Piccart, L. Biganzoli, and A. Di Leo, "The impact of chemotherapy dose density and dose intensity on breast cancer outcome: what have we learned?" Eur J Cancer, vol. 36, no. Suppl 1, April 2000.

[42] C. Ponsard, R. D. Landtsheer, Y. Guyot, F. Roucoux, and B. Lambeau, "Decision making support in the scheduling of chemotherapy coping with quality of care, resources and ethical constraints," in ICEIS 2017 - Proc. of the 19th Int. Conf. on Enterprise Information Systems, Porto, Portugal, April 26-29, 2017.

[43] C. Ponsard, A. Majchrowski, S. Mouton, and M. Touzani, "Process guidance for the successful deployment of a big data project: Lessons learned from industrial cases," in Proc. of the 2nd Int. Conf. on Internet of Things, Big Data and Security, IoTBDS 2017, Porto, Portugal, April 24-26, 2017.

[44] D. J. Power, "Using 'Big Data' for analytics and decision support," Journal of Decision Systems, vol. 23, no. 2, Mar. 2014.

[45] E. Rot, "How Much Data Will You Have in 3 Years?" http://www.sisense.com/blog/much-data-will-3-years, 2015.

[46] J. Saltz and I. Shamshurin, "Big Data Team Process Methodologies: A Literature Review and the Identification of Key Factors for a Project's Success," in Proc. IEEE Int. Conf. on Big Data, 2016.

[47] J. S. Saltz, "The need for new processes, methodologies and tools to support big data teams and improve big data project effectiveness," in IEEE International Conference on Big Data, Big Data 2015, Santa Clara, CA, USA, October 29 - November 1, 2015, pp. 2066–2071.

[48] SAS Institute, "SEMMA Data Mining Methodology," http://www.sas.com/technologies/analytics/datamining/miner/semma.html, 2005.

[49] Scrum Alliance, "What is Scrum? An agile framework for completing complex projects," https://www.scrumalliance.org/why-scrum, 2016.

[50] C. Shearer, "The CRISP-DM Model: The New Blueprint for Data Mining," Journal of Data Warehousing, vol. 5, no. 4, 2000.

[51] R. Soltanpoor and T. Sellis, Prescriptive Analytics for Big Data. Cham: Springer International Publishing, 2016, pp. 245–256.

[52] G. Sun, D. Liao, H. Li, H. Yu, and V. Chang, "L2P2: A location-label based approach for privacy preserving in LBS," Future Generation Computer Systems, vol. 74, no. Supplement C, pp. 375–384, 2017.

[53] Two Crows Corporation, "Introduction to Data Mining and Knowledge Discovery," http://www.twocrows.com/intro-dm.pdf, 2005.

[54] P. A. van Dam, "A dynamic clinical pathway for the treatment of patients with early breast cancer is a tool for better cancer care: implementation and prospective analysis between 2002–2010," World Journal of Surgical Oncology, vol. 11, no. 1, 2013.

[55] A. van Lamsweerde, "Goal-oriented requirements engineering: a guided tour," in Proc. 5th IEEE International Symposium on Requirements Engineering (RE'01), 2001.