Building Data Science Ecosystems for Smart Cities and Smart Commerce

© 2017 IBM Corporation
Building Data Science Ecosystems
for Smart City and Smart Commerce
Dr. Alex Liu
Chief Data Scientist
Analytics Services

Alex Liu Introduction
▪ Chief Data Scientist – Analytics Services at IBM
▪ A Data Scientist Thought Leader
▪ Chief Data Scientist for a few corporations
before joining IBM
▪ Taught advanced data analytics at the
University of Southern California and the University
of California at Irvine
▪ Consulted for the United Nations, Ingram Micro
…
▪ M.S. and Ph.D. from Stanford University

Data Science: Turning Data into VALUE With Models
Data Science produces insights and value via
a complicated process
a big set of tools
[Figure: BigInsights (HDFS), Cloudant (DBaaS), dashDB (Analytics), Swift (Object Storage), SQDB (Managed DB2)]

Data Science projects return very valuable results,
but many fail
▪ Netflix, for example, integrates data science
into each part of its business; it
estimates a billion dollars in incremental
value from its personalization and
recommendation alone.
▪ Knight Capital Group, for instance, lost
$440 million in 45 minutes after a mistake in
updating a model (New York Times).
▪ Gartner estimated that 60% of big data
projects would fail in 2016 and in 2017.

Reasons why data science projects fail
▪ Wrong data
▪ Wrong question
▪ Weak stakeholder commitment
▪ Lack of diverse expertise, per CIO Dive
▪ Many studies conclude that a mismatch among question, data,
model, researchers, tools … is the main reason
▪ Due to bad coordination of the parts involved

WHY - Data Science is very complicated
The model building stage alone is complicated enough
• More than 50 different models: SVM, Neural Net, Decision Trees/Forests, Naïve Bayes,
Regression, SMO, k-nearest Neighbor, Clustering, Rules, …
• Combinatorially explosive number of parameter choices per algorithm: kernel type, pruning
strategy, number of trees in a forest, learning rate, …
• Wide variation in performance across different algorithm implementations (e.g., SPSS vs Python
vs WEKA vs Spark …)
• User-Defined algorithms
• Substantial cost in user and compute time
• User spends time on trying new combinations and parameters
• Computational cost for training a single SVM can exceed 24h
• Selection commonly based on data scientist bias
• Each additional pipeline stage increases complexity dramatically!
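To make the combinatorial growth concrete, here is a minimal Python sketch that counts the candidates in a small search space. The algorithm names come from the slide, but every parameter grid and the per-stage option counts are hypothetical values chosen for illustration, not figures from the talk.

```python
from itertools import product
from math import prod

# Hypothetical parameter grids (illustrative values, not from the talk).
SEARCH_SPACE = {
    "SVM": {
        "kernel": ["linear", "rbf", "poly"],
        "C": [0.1, 1, 10, 100],
    },
    "Random Forest": {
        "n_trees": [50, 100, 200, 500],
        "max_depth": [4, 8, 16],
        "pruning": ["none", "cost_complexity"],
    },
    "Neural Net": {
        "learning_rate": [0.001, 0.01, 0.1],
        "hidden_layers": [1, 2, 3],
        "units": [32, 64, 128],
    },
}


def configurations(grid):
    """Return every parameter combination in one algorithm's grid."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


def count_model_configs(space):
    """Total number of (algorithm, parameters) candidates to train."""
    return sum(len(configurations(grid)) for grid in space.values())


def count_pipelines(model_configs, options_per_stage):
    """Each extra pipeline stage multiplies the number of candidates."""
    return model_configs * prod(options_per_stage)


model_configs = count_model_configs(SEARCH_SPACE)
# Hypothetical: 3 data-prep options and 4 feature-selection options.
pipelines = count_pipelines(model_configs, [3, 4])
print(model_configs, pipelines)
```

With these made-up grids, three algorithms already yield 63 model configurations, and two extra pipeline stages raise that to 756 candidates. If a single training run can take hours, exhaustive search quickly becomes impractical.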

WHY - challenges for managing data scientists
▪ High turnover rate among data scientists (average tenure 18 months)
▪ Ever-developing new technologies to master
▪ Lack of training for data scientists
[Figure: pipeline stages: Data Prep, Modeling, Deployment, Evaluation, Exploration]

A data science ecosystem approach
A data science ECOSYSTEM has three basic elements
1) Data portal, 2) Data Science platform, 3) Data Science community

The defining characteristics of an ecosystem - mutuality & orchestration
Markets comprise entities that operate
out of individual self-interest
▪ A set of individuals or organizations who
exchange products or services within an
environment governed by the laws of supply
and demand
Ecosystems comprise entities that operate
out of orchestrated, mutual shared-interest
▪ A set of individuals or organizations who
formally or informally operate together to
produce something of greater value for the
mutual benefit of the ecosystem as a whole
Ecosystems exist because, operating in an orchestrated environment, participants
can deliver more value within the ecosystem acting together than acting alone

Value creation in traditional data science tends to be linear;
value creation in ecosystems tends to be networked …
Value creation in a value chain
▪ Value creation is incremental as organizations
cover costs plus some return on assets
▪ Value capture reflects an additive, sequential
process of exchange
[Figure: value chain in which each stage adds cost plus return]
Value creation in an ecosystem
▪ Value capture reflects a networked, dynamic,
everyone-to-everyone process of exchange
▪ Ecosystems produce more value as a whole
than the sum of the individual participants acting
independently
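One rough way to see the difference between a sequential and an everyone-to-everyone process of exchange is to count the possible exchange links among n participants. The Python sketch below does only that; the function names are hypothetical, and counting links is a simplification introduced here, not a measure of value taken from the talk.

```python
def chain_links(n):
    """Exchange links in a linear value chain of n participants."""
    return max(n - 1, 0)


def network_links(n):
    """Exchange links when every participant can exchange with every other."""
    return n * (n - 1) // 2


for n in (2, 5, 10, 50):
    print(n, chain_links(n), network_links(n))
```

A chain grows linearly with the number of participants, while the number of possible pairwise exchanges in a network grows quadratically.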

Ecosystems can yield substantial benefits
Increase Success Ratio of Data Science Projects
Embrace ecosystems’ strategic potential
▪ New capabilities: Ecosystems enable organizations
to access critical capabilities that
they would otherwise have
difficulty obtaining
▪ Improved access: Ecosystems support greater
access to new or different
resources such as new talents,
new tools, new data sets
▪ Improved agility: Ecosystems support quick
creation of new types of products,
with different combinations of
organizations and assets

1) An ecosystem for city data science
[Figure: Events Data, Social Media, IoT Data; Platform ~ IBM DSX; Connecting all the data scientists from a DS community; Analytical Insights for Smart Cities; Applications, Optimizing Operations, Solutions]

RMDS Communities at IBM Glendale
▪ Pasadena/Glendale Meetup Community
▪ Local face-to-face community with more than 1,100 members
▪ https://www.meetup.com/RMDS_LA/
▪ https://www.linkedin.com/groups/1895501 has 29K participants
Two former CDOs and the current CDO are members and have also presented their work here.
Aims to create an environment for using big data analytics to build smart cities

Data.gov
citizen data science ecosystem with
open data
▪ 105,000+ collections
▪ 349 citizen apps
▪ 500,000 data resources
▪ 175 agencies
▪ 450 APIs
Source: City of LA Mayor’s Tech Advisor Presentation
at RMDS Meetup.

A data science platform – the IBM Data Science Experience
- now IBM Watson Studio Local
https://datascience.ibm.com/
IBM Data Science Experience
Powered by IBM Watson Data Platform
Community
• Find tutorials and datasets
• Connect with Data Scientists
• Ask questions
• Read articles and papers
• Fork and share projects
Open Source
• Code in Scala/Python/R/SQL
• Jupyter Notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
IBM Added Value
• Watson Machine Learning
• SPSS Modeler Canvas
• Advanced Visualizations
• Projects and Version Control
• Managed Spark Service
* Closed beta

Overview
IBM Data Science Experience (DSX – now Watson Studio Local) is a social environment
where you can collaborate to solve data challenges with the best tools and latest expertise.
▪ Community
▪ Projects
▪ Notebooks
▪ Data Asset
▪ Model Management
▪ Admin console

Forrester Research Ranks IBM as a Leader in
Multimodal Predictive Analytics and Machine Learning
IBM puts AI to work. IBM Watson is a vast umbrella of
technologies and solutions, one of which is Watson Studio, a
PAML solution. Watson Studio was designed from the ground
up to aesthetically blend SPSS-inspired workflow capabilities
with open source machine learning libraries and notebook-
based interfaces.
It is designed for all collaborators — business stakeholders, data
engineers, data scientists, and app developers — who are key
to making machine learning models surface into production
applications. Watson Studio offers easy integrated access to
IBM Cloud pretrained machine learning models such as Visual
Recognition, Watson Natural Language Classifier, and many
others.
It is a perfectly balanced PAML solution for enterprise data
science teams that want the productivity of visual tools and
access to the latest open source via a notebook-based coding
interface.
Source: “The Forrester Wave™: Multimodal Predictive Analytics And Machine
Learning Solutions, Q3 2018”, Forrester Research, September 2018

Los Angeles: the Best City at Using Data in the United States

EX2 – IBM Weather Data
1km Visible (GOES-R will be even better)
http://www.ibm.com/weather

EX: Weather Data Serving Retail
Weather Data + Watson Studio + RMDS Community
A data science ecosystem with weather data
[Figure: Weather Data, Transaction, IoT Data; Platform ~ IBM DSX; Connecting all the data scientists from a DS community; Analytical Insights for Smart Commerce; Applications, Optimizing Operations, Solutions]
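As a rough illustration of how weather data and transaction data might be combined on such a platform, the Python sketch below joins two daily series by date and computes their Pearson correlation. All dates, temperatures and sales figures are made up for illustration, and the function names are hypothetical; nothing here reflects an actual IBM weather API or a result from the talk.

```python
from math import sqrt

# Hypothetical daily records; every value below is made up for illustration.
DAILY_TEMPERATURE_C = {
    "2018-07-01": 22.0,
    "2018-07-02": 25.5,
    "2018-07-03": 30.0,
    "2018-07-04": 33.5,
    "2018-07-05": 28.0,
}
DAILY_UNITS_SOLD = {
    "2018-07-02": 140,
    "2018-07-03": 190,
    "2018-07-04": 230,
    "2018-07-05": 170,
    "2018-07-06": 120,
}


def join_by_date(weather, transactions):
    """Pair the two values for every date present in both sources."""
    shared = sorted(weather.keys() & transactions.keys())
    return [(weather[day], transactions[day]) for day in shared]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


pairs = join_by_date(DAILY_TEMPERATURE_C, DAILY_UNITS_SOLD)
print(len(pairs), round(pearson(pairs), 3))
```

Only dates present in both sources are kept, so gaps in either feed do not distort the comparison. A real analysis would need far more data and controls for seasonality, promotions and location before drawing any conclusion.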

Five steps for building successful data science ecosystems
1. Know your ecosystem and identify new opportunities
▪ Understand your value chain
▪ Understand your ecosystem
2. Recognize your capabilities and identify your gaps
▪ Understand your capabilities relative to your business model
▪ Determine what to invest in and what to partner for
3. Identify ecosystem value and how you might capture it
▪ Identify and prioritize ecosystem value pools
4. Make connections in pursuit of your objectives
▪ Choose the right partner
▪ Engage in the right way
5. Measure your success and decide next steps
▪ Define measurement model and collect data
▪ Refine business model and ecosystem partnerships

IBM Analytics University 2018
Notices and disclaimers
Copyright © 2018 by International Business Machines Corporation (IBM).
No part of this document may be reproduced or transmitted in any form
without written permission from IBM.
U.S. Government Users Restricted Rights — use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products
that have not yet been announced by IBM) has been reviewed for accuracy as
of the date of initial publication and could include unintentional technical or
typographical errors. IBM shall have no responsibility to update this
information. This document is distributed “as is” without any warranty,
either express or implied. In no event shall IBM be liable for any damage
arising from the use of this information, including but not limited to, loss of
data, business interruption, loss of profit or loss of opportunity.
IBM products and services are warranted according to the terms and
conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts.
In some cases, a product may not be new and may have been previously
installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans
are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in controlled,
isolated environments. Customer examples are presented
as illustrations of how those customers have used IBM products and
the results they may have achieved. Actual performance, cost, savings or other
results in other operating environments may vary.
References in this document to IBM products, programs, or services do not
imply that IBM intends to make such products, programs or services available
in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by
independent session speakers, and do not necessarily reflect the
views of IBM. All materials and discussions are provided for informational
purposes only, and are neither intended to, nor shall constitute legal or other
guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to ensure its own compliance with legal
requirements and to obtain advice of competent legal counsel as to
the identification and interpretation of any relevant laws and regulatory
requirements that may affect the customer’s business and any actions
the customer may need to take to comply with such laws. IBM does not
provide legal advice or represent or warrant that its services or products will
ensure that the customer is in compliance with any law.

Information concerning non-IBM products was obtained from the suppliers
of those products, their published announcements or other publicly available
sources. IBM has not tested those products in connection with this
publication and cannot confirm the accuracy of performance, compatibility
or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of
those products. IBM does not warrant the quality of any third-party
products, or the ability of any such third-party products to interoperate with
IBM’s products. IBM expressly disclaims all warranties, expressed or
implied, including but not limited to, the implied warranties of
merchantability and fitness for a particular purpose.
The provision of the information contained herein is not intended to, and
does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS,
Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management
System™, FASP®, FileNet®, Global Business Services®,
Global Technology Services®, IBM ExperienceOne™, IBM SmartCloud®, IBM
Social Business®, Information on Demand, ILOG, Maximo®,
MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower,
PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,
PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®,
Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®,
StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson,
WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of
International Business Machines Corporation, registered in many
jurisdictions worldwide. Other product and service names might
be trademarks of IBM or other companies. A current list of IBM trademarks is
available on the Web at "Copyright and trademark information" at:
www.ibm.com/legal/copytrade.shtml.
