Combining Process Guidance and Industrial Feedback for Successfully Deploying Big Data Projects
C. Ponsard, M. Touzani, A. Majchrowski
Open Access
Open Journal of Big Data (OJBD)
Volume 3, Issue 1, 2017
http://www.ronpub.com/ojbd
ISSN 2365-029X
ABSTRACT
Companies are faced with the challenge of handling increasing amounts of digital data to run or improve their business. Although a large set of technical solutions is available to manage such Big Data, many companies lack the maturity to manage that kind of project, which results in a high failure rate. This paper aims at providing better process guidance for the successful deployment of Big Data projects. Our approach is based on the combination of a set of methodological bricks documented in the literature, from early data mining projects up to the present day. It is complemented by lessons learned from pilots conducted in different areas (IT, health, space, food industry), with a focus on two pilots giving a concrete vision of how to drive the implementation, with emphasis on the identification of value, the definition of a relevant strategy, the use of an Agile follow-up and a progressive rise in maturity.
Regular research paper: big data, process model, agile, method adoption, pilot case studies
• they fail to provide a good management view on communication, knowledge and project aspects,

• they lack some form of maturity model that makes it possible to highlight the more important steps and milestones and to raise them progressively,

• despite the standardisation, they are not always known by the wider business community, and hence are difficult to adopt for managing the data value aspect,

• the proposed iterative model is limited: the planned iterations are little used in practice because they do not loop back to the business level but rather stay in the internal IT context. In addition to the lack of control on the added value, those iterations are very often postponed. This is the reason why more agile models were introduced.

2.2 Methods Adopting Agile Principles

Agile methods, initially developed for software development, can also be applied to data analysis. For example, IBM has developed ASUM-DM, an extension and refinement of CRISP-DM combining traditional project management with agility principles [26]. Figure 4 illustrates its main blocks and its iterative principle, driven by specific activities in the last columns, which include governance and community alignment. However, ASUM-DM does not cover the infrastructure/operations side of implementing a data mining/predictive analytics project: it is more focused on activities and tasks in the deployment phase, and it has no templates or guidelines.

Although this looks quite adequate, deploying an Agile approach for Big Data may still face resistance, just as is the case for software development, typically in more rigid kinds of organisations. A survey was conducted to validate this acceptance [23]. It revealed that, quite similarly to software development, companies tend to accept Agile methods for projects with smaller scope, lesser complexity and fewer security issues, and inside organisations that grant more freedom. Otherwise, a more traditional plan-managed approach is preferred.
2.3 Methods Developed for Big Data Projects

Architecture-centric Agile Big data Analytics (AABA) addresses the technical and organizational challenges of Big Data [9]. Figure 5 shows that it supports Agile delivery. It also integrates the Big Data system Design (BDD) method and Architecture-centric Agile Analytics with an architecture-supported DevOps (AAA) model for effective value discovery and continuous delivery of value.

The method was validated on 11 case studies across various domains (marketing, telecommunications, healthcare) with the following recommendations:

1. Data Analysts/Scientists should be involved early in the process, i.e. already at the business analysis phase.

2. Continuous architecture support is required for big data analytics.

3. Agile bursts of effort help to cope with rapid technology changes and new requirements.

4. The availability of a reference architecture and a technology catalog eases the definition and evolution of the data processing.

5. Feedback loops need to be open, e.g. about non-functional requirements such as performance, availability and security, but also for business feedback about emerging requirements.

Stampede is another method, proposed by IBM to its customers. Expert resources are provided at cost to help companies get started with Big Data in the scope of a well-defined pilot project [28]. Its main goal is to educate companies and help them get started more quickly, in order to drive value from Big Data. A key tool of the method is a half-day workshop to share definitions, identify scope/big data/infrastructure, establish a plan and, most importantly, establish the business value. The pilot execution is typically spread over 12 weeks and carried out in an Agile way, with a major milestone at about 9 weeks, as depicted in Figure 6.
2.4 Some Complementary Approaches

2.4.1 Introducing Maturity Models

The maturity model from Nott and Betteridge, shown in Table 1, considers the kind of data analysis used, the alignment of the IT infrastructure, as well as aspects of culture and governance.

Table 1: Maturity Model from Nott and Betteridge (IBM) (source: [39])
2.4.3 Critical Success Factors

In complement to processes, many key success factors, best practices and risk checklists have been published, mostly in blogs for Chief Information Officers, e.g. [4]. A systematic classification of Critical Success Factors has been proposed by [24] using three key dimensions: people, process and technology. It has been further extended by [46] with tooling and governance dimensions. A few key factors are the following:

• Data: quality, security, level of structure in data.

• Governance: management support, well-defined organisation, data-driven culture.

• Objectives: business value identified (KPI), business case-driven, realistic project size.

• Process: agility, change management, maturity, coping with data growth.

• Team: data science skills, multidisciplinarity.

• Tools: IT infrastructure, storage, data visualization capabilities, performance monitoring.

3 METHOD DEVELOPMENT AND VALIDATION PROCESS

The global aim of our project is to come up with a systematic method to help companies facing big data challenges validate the potential benefits of a big data solution. The global process is depicted in Figure 7. The process is driven by eight successive pilots, which are used to tune the method and to make more technical bricks available through the proposed common infrastructure. The final expected result is to provide a commercial service to companies having such needs.

The selected method is strongly inspired by what we learned from the available methods and processes described in Section 2:

• the starting point was Stampede, because of some initial training and the underlying IBM platform. Key aspects kept from the method are the initial workshop with all stakeholders, the realistic focus and a constant business value driver,

• however, to cope with the lack of reference material, we defined a process model based on CRISP-DM, which is extensively documented,

• the pilots are executed in an Agile way; given the expert availabilities (university researchers), the pilots are planned over longer periods than in Stampede: 3-6 months instead of 12-16 weeks.

The popular SCRUM approach was used as it emphasizes collaboration, functioning software, team self-management and flexibility to adapt to business realities [49].

The global methodology is composed of three successive phases detailed hereafter:

1. Big Data Context and Awareness. In this introductory phase, one or more meetings take place with the target organisation. A general introduction is given on Big Data concepts, the available platform, a few representative applications in different domains (possibly already with a focus on the organisation's domain), the main challenges and the main steps. The maturity of the client and a few risk factors can be checked (e.g. management support, internal expertise, business motivation).

2. Business and Use Case Understanding. This is also the first phase of CRISP-DM. Its goals are to collect the business needs/problems that must be addressed using Big Data and to identify one or a few business use cases generating the most value out of the collected data. A checklist supporting this phase is shown in Table 2.

Table 2: Workshop checklist (source: [29])

Business Understanding:
• Strategy & Positioning: Global Strategy; Product/Services Positioning; Scorecards - KPI's; Digital Strategy; Touchpoints for Customers/Prospects (Search, e-Commerce, Social Media, Websites,...); Direct Competitors; Disruptive Elements (Disruptive Models, Disruptive Technologies, Disruptive Behaviour).
• Select Potential Use Cases: Objectives; Priorities; Costs, ROI, Constraints; Value to the Client; Scope; New Data Source; High Level Feasibility; Data Mining Goals & KPIs; Resources Availability; Time to Deliver.

Use Case Understanding:
• Assess Business Maturity.
• Determine Use Case Objectives: Value to the Client; Business Success Criteria.
• Assess Situation: Resource Requirements; Assumptions/Constraints; Risks and Contingencies; Terminology; Costs and Benefits.
• Refine Data Mining Goals: Data Mining Goals; Data Mining KPIs.
• Produce Project Plan: Approach; Deliverables; Schedule; Risk Mitigation (Privacy,...); Stakeholders to involve; Initial Assessment of Tools and Techniques.
This phase is organised around one or a few workshops, involving the Business Manager, Data Analyst, IT architect and optionally selected specialists, such as the IT security manager if there are specific security/privacy issues that need to be checked at this early stage. Both the as-is and to-be situations are considered. Specific tools to support the efficient organisation of those workshops are described in Section 4. At the end of this step, a project planning is also defined.

3. Pilot Implementation of Service or Product. In this phase, the following implementation activities are carried out in an Agile way (a small illustrative sketch follows the list):

• Data Understanding: analyse data sets to detect interesting subset(s) for reaching the business objective(s) and to check data quality.

• Data Preparation: select the right data and clean/extend/format them as required.

• Modelling: select specific modelling techniques (e.g. decision trees or neural networks). The model is then built and tested for its accuracy and generality. Possible modelling assumptions are also checked. The model parameters can be reviewed or other/complementary techniques can be selected.

• Evaluation: assess the degree to which the model meets the business objectives, using realistic or even real data.

• Deployment: transfer the validated solution to the production environment, make sure users can use it (e.g. right visualization, dashboard) and start monitoring (performance, accuracy).
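To make the Modelling and Evaluation activities more concrete, the following minimal sketch (illustrative only, not part of the method itself) builds and assesses a decision-tree classifier with scikit-learn; the file name prepared_pilot_data.csv and the target column incident are hypothetical placeholders for the output of the Data Preparation activity.

    # Minimal sketch of the Modelling and Evaluation activities (illustrative).
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Hypothetical output of the Data Preparation activity; "incident" is an
    # assumed binary target (e.g. server failure within the next week or not).
    data = pd.read_csv("prepared_pilot_data.csv")
    X, y = data.drop(columns=["incident"]), data["incident"]

    # Hold out data for the Evaluation activity, which requires testing the
    # model against realistic or even real data.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        random_state=42)

    # Modelling: build and test a first technique; parameters such as max_depth
    # can be reviewed, or complementary techniques (e.g. neural networks) tried.
    model = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)

    # Evaluation: check the degree to which the model meets the business
    # objective, e.g. a minimum accuracy threshold agreed with stakeholders.
    print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

In a real pilot, the acceptance threshold for such a metric would be derived from the KPIs defined during the Business and Use Case Understanding phase.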
Our pilots are kept confidential. However, Table 3 presents the main features of the first four pilots based on the first three "V"s of Big Data [37].

4 LESSONS AND RECOMMENDATIONS LEARNED FROM OUR PILOT CASES

In this section, we present some lessons learned and related guidelines that are useful to support the whole process and increase the chances of success. We also illustrate our feedback based on some highlights from two pilot cases used as running examples: the IT maintenance pilot case and the clinical pathway pilot case.

4.1 Defining Progressive and Measurable Objectives.

Through the deployment of a Big Data solution, a company expects to gain value out of its data. The business goals should therefore be clearly expressed. There exist different methods to capture goals. In our pilots, we integrated goal-oriented requirements engineering techniques to elicit and structure business goals and to connect them with data processing processes and components [55, 56]. Such methods also include specific techniques to verify that goals are not too idealised, by helping in the discovery of obstacles and their resolution in order to define achievable goals.

Another way to connect the goals with the (business) reality is to define how to measure the resulting value, which should be defined right from the business understanding phase, typically by relying on KPIs (Key Performance Indicators).
Companies should already have defined their KPIs and be able to measure them. If this is not the case, they should start improving on this: in other words, Business Intelligence should already be present in the company.

Based on this, different improvement strategies can be identified and discussed to select a good business case. In this process, the gap with the current situation should also be considered: it is safer to keep a first project with quite modest objectives than to risk failing with a too complex project that could bring more value. Once a pilot is successful, further improvements can be planned in order to add more value.
Computer Maintenance Area Case Study. The large IT provider considered here manages more than 3000 servers that host many web sites, run applications and store large amounts of related customer data. No matter what efforts are taken, servers are still likely to go off-line, networks to become unavailable or disks to crash, generally at unexpected times that are less convenient and more costly to manage, like nights or weekends. The considered company currently applies standard incident management and preventive maintenance procedures based on a complete monitoring infrastructure covering both the hardware (network appliances, servers, disks) and the application level (service monitoring).

In order to reduce the number of costly reactive events and optimise preventive maintenance, the company is willing to develop more predictive maintenance by trying to anticipate the unavailability of the servers, in such a way that it can react preventively and, ultimately, prevent such unavailability. In the process, the client wants to diagnose the root causes of incidents and resolve them in order to avoid possible further incidents, which can turn into a nightmare when occurring in a reactive mode. The ultimate goal is to increase service availability and customer satisfaction, and also to reduce operating costs. The resulting KPI is called Total Cost of Ownership (TCO), and the typical breakdown costs to be considered are (a small aggregation sketch follows the list):

• maintenance on hardware and software, which could be reduced through better prediction,

• personnel working on these incidents,

• any penalties related to customer Service Level Agreements (SLAs),

• indirect effects on the client's business and its brand image.
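As a minimal sketch (with invented figures, only to illustrate how such a KPI can be tracked), the TCO simply aggregates the cost categories listed above, and comparing it across periods shows whether predictive maintenance pays off:

    # Illustrative aggregation of the TCO breakdown (all figures invented).
    def total_cost_of_ownership(hw_sw_maintenance: float, personnel: float,
                                sla_penalties: float, indirect_costs: float) -> float:
        return hw_sw_maintenance + personnel + sla_penalties + indirect_costs

    # Comparing the KPI before and after introducing predictive maintenance.
    tco_before = total_cost_of_ownership(120_000, 80_000, 30_000, 15_000)
    tco_after = total_cost_of_ownership(100_000, 60_000, 5_000, 5_000)
    print(f"TCO: {tco_before} -> {tco_after}")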
Clinical Pathway Case Study. Hospitals are increasingly deploying clinical pathways, defined as a multidisciplinary vision of the treatment process required by a group of patients with the same pathology and a predictable clinical follow-up [6]. The reason is not only to reduce the variability of clinical processes but also to improve care quality and achieve better cost control [54]. It also enables richer analysis of the data produced and thus the profiling of patients with higher risks (for example due to multi-pathology or intolerances).

A typical workflow (e.g. for chemotherapy) is shown in Figure 8. It is a sequence of drug deliveries, or cures, generally administered in day hospital. Each cure is followed by a resting period at home that lasts from a few days to a few weeks. A minimal interval between cures is required because chemotherapy drugs are toxic
and the body needs some time to recover between two drug deliveries. When the ideal treatment protocol is followed, the number of cancerous cells is progressively reduced, hopefully reaching full healing or cancer remission. If, for some reason, chemotherapy cures do not closely follow the intended periodicity, or if doses are significantly reduced, the treatment efficiency may be suboptimal. In such conditions, cancerous cells may multiply again, which can result in a cancer relapse.
Figure 9 shows the high-level goals for the optimal organisation of care pathways. Goals and obstacles are respectively depicted using blue and red parallelograms. Agents (either people or processing components) are pictured using yellow hexagons. Some expectations on human agents are also captured using yellow parallelograms. The adequate workflow should be enforced for all patients within the recommended deadlines, given the possible impact on patient relapse. Ethical principles also require a fair allocation of resources, i.e. every patient deserves optimal care regardless of his medical condition or prognosis. The workload should also be balanced to avoid the staff having to manage unnecessary peak periods.

Figure 9: Goal analysis for clinical pathways: strategic goals and main obstacles
Reaching those goals together, of course, requires enough resources to be available; a number of related obstacles (in red), like monitoring the flow of patients joining and leaving the pathway, must also be managed. The available workforce can be influenced by staff availability and by public holidays reducing the available slots for delivering care. A number of mitigation actions are then identified to better ensure that the workforce is adequate. An agent with a key responsibility in the system is the scheduler, which must manage every appointment. Human agents are not very good at this task because the problem is very large and it is difficult to find a solution that simultaneously meets all patient and service constraints. Moreover, the planning must constantly be reconsidered to deal with unexpected events and the flow of incoming/outgoing patients. In contrast, a combined predictive and prescriptive solution is very interesting because it has the capability to ensure optimal care and service.

Medical literature has shown, for a number of cancers, that relapse-free survival is strongly correlated with the RDI (Relative Dose Intensity). For instance, for breast cancer, a key threshold value is 85% [41]. Hence this indicator can be seen as a gauge that should be carefully managed across the whole clinical pathway.
4.2 From Descriptive to Predictive and then Prescriptive Data Analysis.

Analytics is a multidisciplinary concept that can be defined as the means to acquire data from diverse sources, process them to elicit meaningful patterns and insights, and distribute the results to the proper stakeholders [10, 44]. Business Analytics is the application of such techniques by companies and organisations in order to get a better understanding of the performance level of their business and to drive improvements. Three complementary categories of analytics can be distinguished and combined in order to reach the goal of creating insights and helping to make better decisions. Those analytics consider different time focuses, questions and techniques, as illustrated in Table 4 [38, 51].

In a number of domains, it is interesting to consider an evolution scheme starting from immediate reaction raised by analysing data, to more intelligence in anticipating undesirable situations, or even considering how to prevent them as much as possible.

Computer Maintenance Area Case Study. In terms of maintenance, the starting point is the identified KPI of Total Cost of Ownership (TCO), which includes the cost of purchase, maintenance and repair in the event of a breakdown. Different strategies can be envisaged:

• react to problems only after the occurrence of a breakdown. This translates into a generally high cost because quick reaction is required to minimize downtime. Moreover, any unavailability has a negative impact in terms of image, or even penalties if a Service Level Agreement (SLA) has been violated. This should of course be minimised
through the use of next-level strategies that can benefit from data analytics,

• anticipating the occurrence of breakdowns based on system monitoring. Simple strategies can be implemented: for example, an alert can be triggered when a storage unit approaches a threshold close to its maximum capacity. However, this does not enable the prediction of failures resulting from complex sequences of events. Mostly descriptive techniques are used at this level,

• trying to predict problems based on known history and observation of the system. At this level, predictive data analysis techniques can discover cause-effect relationships between parts of the system which, in cascade, can cause unavailability. For example, applying a badly validated patch may affect a service that can itself paralyse a business process,

• improving the system is the ultimate step. It is necessary to ensure that the system operates under optimum conditions by eliminating the root causes that could trigger some failure. Prescriptive techniques are used at this level.

The predictive solution was the best option, but it should only be considered once the preventive step is carried out. Similarly, the most common time patterns should be identified and processed first. For example, a storage unit is more likely to be saturated on days when backups are performed, which is usually predictable (weekend or month end). Anticipation would avoid expensive interventions, especially during weekends.
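The difference between the descriptive and predictive levels can be illustrated with a small sketch; the 90% threshold and the check_storage helper are invented for the example:

    # Descriptive level (illustrative): a threshold alert reports the current
    # state but cannot anticipate failures caused by complex event sequences.
    ALERT_THRESHOLD = 0.90  # assumed fraction of maximum capacity

    def check_storage(used_gb: float, capacity_gb: float) -> None:
        usage = used_gb / capacity_gb
        if usage >= ALERT_THRESHOLD:
            print(f"ALERT: storage at {usage:.0%} of capacity")

    check_storage(used_gb=920, capacity_gb=1000)

    # The predictive level would instead learn from the monitoring history,
    # e.g. that saturation is likely on backup days (weekends, month end),
    # so that interventions can be planned ahead of time.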
Clinical Pathway Case Study. The operation of clinical pathways is characterised by the occurrence of many events, which may be expected or not, and which thus impact the scheduled behaviour. An important concern is to detect such events and to decide how to manage possible deviations so as to minimise their impact, especially on the quality-of-care KPI. Different possible strategies can be explored for this purpose:

• reactive strategies should be avoided as much as possible because the impact on the patient is irreversible. Some reactive cases can be related to a patient no-show or a last-minute medical no-go; the action is then to reschedule a new appointment as soon as possible.

• preventive strategies can be used to reduce the risk of no-show, for example by sending a reminder (e.g. text message, phone call) one or two days before the appointment (see the sketch after this list). Descriptive data analytics are enough at this level.

• predictive strategies relying on predictive analytics can be used to learn risk factors for specific patients, which could result in more careful planning.
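As a small sketch of the preventive strategy (the appointment records and the reminder channel are hypothetical), the reminder rule simply selects appointments one or two days ahead:

    # Illustrative no-show prevention: remind patients one or two days ahead.
    from datetime import date, timedelta

    appointments = [  # hypothetical data; a real system would query the scheduler
        {"patient": "P-001", "date": date.today() + timedelta(days=2)},
        {"patient": "P-002", "date": date.today() + timedelta(days=10)},
    ]

    for a in appointments:
        days_ahead = (a["date"] - date.today()).days
        if 1 <= days_ahead <= 2:
            # Placeholder for an SMS or phone reminder channel.
            print(f"Reminder sent to {a['patient']} for {a['date']}")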
Table 4: Overview of analytics in terms of questions, techniques and outputs (source: [51]). The table contrasts the three categories of Business Analytics (Descriptive, Predictive and Prescriptive Analytics) according to their questions, techniques and outputs.
Such question lists support both the preparation before the workshop and their use as a checklist during the workshop itself. Table 5 shows a few questions about the data to process.

Table 5: Some workshop questions about data
Q.UD.1 What are the data sources and data types used in your current business processes?
Q.UD.2 What tools/applications are used to deal with your current business processes?
Q.UD.3 Are your present business processes performing complex processing on data?
Q.UD.4 How available is your data? What happens if data is not available?
Q.UD.5 Do different users have different access rights on your data?
Q.UD.6 Does your data contain sensitive information (e.g. personal or company confidential data)?
Q.UD.7 What are the consequences of data alteration? Do you know the quality level of your data?

Table 6: Evaluation readiness checklist (partial)
R.EV.1 Are you able to understand/use the results of the models?
R.EV.2 Do the model results make sense to you from a purely logical perspective?
R.EV.3 Are there apparent inconsistencies that need further exploration?
R.EV.4 From your initial glance, do the results seem to address your organization's business questions?
[...] location-based services [34, 52].

At a more general level, in order to better control the huge quantities of data processed every day and to ensure that every single person is respected, the European Commission issued the General Data Protection Regulation in 2016, which will come into force in May 2018 [19]. An EU portal with extensive resources is available to give companies some starting points [20].

Our recommendation, based on our pilots, is to investigate this issue early in the process. This can already be envisioned at the business and data understanding phases and involve relevant people such as the Chief Information Security Officer, or even a more specific Data Protection Officer if this role is defined. Actually, this happened quite naturally in most of our pilot cases because the data had to be processed outside of the owning organisation. However, the focus was more on confidentiality than on the purpose of the data processing itself.

5.3 Cyber Security Issues

Among the challenges of Big Data, data security is paramount against piracy and requires the development of systems that secure exchanges by ensuring strict control of access to the Big Data platform, thus guaranteeing the confidentiality of data. Securing a Big Data platform is nevertheless a domain in its own right, because the very principle of such a system is that it can be based on a heterogeneous architecture spread over several nodes. ENISA has produced a landscape of Big Data threats and a guide of good practices [14]. This document lists typical Big Data assets and identifies related threats, vulnerabilities and risks. Based on these points, it suggests emerging good practices and active areas for research.

Storing sensitive data in the Cloud, for example, is not without consequences, because the regulations are not the same in all countries. A sensitive aspect is the management of the data storage and processing locations, e.g. the need to process data in a given country. However, as this situation is also hindering European competitiveness in a global market, the EU is currently working on a framework for the free flow of non-personal data in the EU [21].

6 CONCLUSIONS

In this paper, we described how we addressed the challenges and risks of deploying a Big Data solution within companies willing to adopt this technology in order to support their business development. We first looked at the different methods reported over time in the literature. Rather than building yet another method, we realised that the key, when considering the adoption of Big Data in an organisation, is the process followed to come up with a method that fits the context and needs and will maximize the chance of success. Based on this idea, we defined a generic guidance process relying on available methods as building bricks. To be meaningful, our approach also relies strongly on lessons learned from industrial cases, which on the one hand helped in validating our process guidance, and on the other hand can also be used as concrete supporting illustrations.

Moving forward, we plan to consolidate our work based on what we will learn in the next series of pilot projects. This includes investigating challenges from other domains: we plan to address life sciences, which requires sustained processing of high volumes of data, and the space domain, with its highly distributed infrastructures. Considering the global development process, until now we have mainly focused on the discovery and data understanding phases, so our plan is to provide more guidance on the project execution phase using our most advanced pilots, which are now reaching full deployment. In our guidance process, we also had to face a number of problems which sometimes blocked all further progress. In some cases the reason was a lack of business value or maturity, for which the recommended action was to postpone the process. In other cases, some blocking issues could not be overcome or delayed the project a lot longer than expected, e.g. setting up a non-disclosure agreement about data access, getting actual data access, or configuring proprietary equipment. Guidance about how to detect and avoid such cases is also part of our work, as it helps to increase the chance of successful deployment.

ACKNOWLEDGEMENTS

This research was partly funded by the Walloon Region through the "PIT Big Data" project (grant nr. 7481). We thank our industrial partners for sharing their cases and providing rich feedback.

REFERENCES

[1] R. Balduino, "Introduction to OpenUP," https://www.eclipse.org/epf/general/OpenUP.pdf, 2007.

[2] L. Bargiotti, I. Gielis, B. Verdegem, P. Breyne, F. Pignatelli, P. Smits, and R. Boguslawski, "European Union Location Framework Guidelines for public administrations on location privacy," JRC Technical Reports, 2016.

[3] M. Batty, "Predictive Modeling for Life Insurance: Ways Life Insurers Can Participate in the Business ...
[32] L. Lau, F. Yang-Turner, and N. Karacapilidis, "Requirements for big data analytics supporting decision making: A sensemaking perspective," in Mastering Data-Intensive Collaboration and Decision Making, N. Karacapilidis, Ed. Springer Science & Business Media, April 2014, vol. 5, pp. 49–70.

[33] D. Lehmann, D. Fekete, and G. Vossen, "Technology selection for big data and analytical applications," Open Journal of Big Data (OJBD), vol. 3, no. 1, pp. 1–25, 2017. [Online]. Available: http://nbn-resolving.de/urn:nbn:de:101:1-201711266876

[34] L. Liu, "From data privacy to location privacy: Models and algorithms," in Proceedings of the 33rd International Conference on Very Large Data Bases, ser. VLDB '07. VLDB Endowment, 2007, pp. 1429–1430.

[35] G. Lyman, "Impact of chemotherapy dose intensity on cancer patient outcomes," J Natl Compr Canc Netw, pp. 99–108, Jul 2009.

[36] G. Mariscal, Ó. Marbán, and C. Fernández, "A survey of data mining and knowledge discovery process models and methodologies," Knowledge Eng. Review, vol. 25, no. 2, pp. 137–166, 2010.

[37] A. D. Mauro, M. Greco, and M. Grimaldi, "A formal definition of big data based on its essential features," Library Review, vol. 65, no. 3, pp. 122–135, 04 2016.

[38] M. Minelli, M. Chambers, and A. Dhiraj, Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses, 1st ed. Wiley Publishing, 2013.

[39] C. Nott, "Big Data & Analytics Maturity Model," http://www.ibmbigdatahub.com/blog/big-data-analytics-maturity-model, 2014.

[40] OMG, "Unified Modeling Language (UML) - Version 2.X," http://www.omg.org/spec/UML, 2005.

[41] M. Piccart, L. Biganzoli, and A. Di Leo, "The impact of chemotherapy dose density and dose intensity on breast cancer outcome: what have we learned?" Eur J Cancer, vol. 36, no. Suppl 1, April 2000.

[42] C. Ponsard, R. D. Landtsheer, Y. Guyot, F. Roucoux, and B. Lambeau, "Decision making support in the scheduling of chemotherapy coping with quality of care, resources and ethical constraints," in ICEIS 2017 - Proc. of the 19th Int. Conf. on Enterprise Information Systems, Porto, Portugal, April 26-29, 2017.

[43] C. Ponsard, A. Majchrowski, S. Mouton, and M. Touzani, "Process guidance for the successful deployment of a big data project: Lessons learned from industrial cases," in Proc. of the 2nd Int. Conf. on Internet of Things, Big Data and Security, IoTBDS 2017, Porto, Portugal, April 24-26, 2017.

[44] D. J. Power, "Using 'Big Data' for analytics and decision support," Journal of Decision Systems, vol. 23, no. 2, Mar. 2014.

[45] E. Rot, "How Much Data Will You Have in 3 Years?" http://www.sisense.com/blog/much-data-will-3-years, 2015.

[46] J. Saltz and I. Shamshurin, "Big Data Team Process Methodologies: A Literature Review and the Identification of Key Factors for a Project's Success," in Proc. IEEE Int. Conf. on Big Data, 2016.

[47] J. S. Saltz, "The need for new processes, methodologies and tools to support big data teams and improve big data project effectiveness," in IEEE International Conference on Big Data, Big Data 2015, Santa Clara, CA, USA, October 29 - November 1, 2015, pp. 2066–2071.

[48] SAS Institute, "SEMMA Data Mining Methodology," http://www.sas.com/technologies/analytics/datamining/miner/semma.html, 2005.

[49] Scrum Alliance, "What is Scrum? An agile framework for completing complex projects," https://www.scrumalliance.org/why-scrum, 2016.

[50] C. Shearer, "The CRISP-DM Model: The New Blueprint for Data Mining," Journal of Data Warehousing, vol. 5, no. 4, 2000.

[51] R. Soltanpoor and T. Sellis, Prescriptive Analytics for Big Data. Cham: Springer International Publishing, 2016, pp. 245–256.

[52] G. Sun, D. Liao, H. Li, H. Yu, and V. Chang, "L2P2: A location-label based approach for privacy preserving in LBS," Future Generation Computer Systems, vol. 74, no. Supplement C, pp. 375–384, 2017.

[53] Two Crows Corporation, "Introduction to Data Mining and Knowledge Discovery," http://www.twocrows.com/intro-dm.pdf, 2005.

[54] P. A. van Dam, "A dynamic clinical pathway for the treatment of patients with early breast cancer is a tool for better cancer care: implementation and prospective analysis between 2002–2010," World Journal of Surgical Oncology, vol. 11, no. 1, 2013.

[55] A. van Lamsweerde, "Goal-oriented requirements engineering: a guided tour," in Proc. 5th IEEE International Symposium on Requirements Engineering (RE'01), 2001.