A Data Lake and a Data Lab to Optimize
Operations and Safety Within a Nuclear Fleet
Hadoop Summit 2016, San José, June 30th
Marie-Luce PICARD, EDF R&D – marie-luce.picard@edf.fr
Jean-Marc RANGOD, EDF-DPNT
Christophe SALPERWYCK, EDF R&D
Special thanks to Raphaël QUERCIA EDF-DTG, Carole MAI and Amandine PIERROT EDF R&D
Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
EDF: A GLOBAL LEADER IN ELECTRICITY
Electricity generation: 623.5 TWh
2014 investments: €4.5 billion
All electricity-related activities:
• Generation
• Transmission & distribution
• Trading and sales & marketing
• Energy services
Key figures (as of 2015):
• €72.9 billion in sales
• 38.5 million customers
• 158,161 employees worldwide
• 84.7% of generation does not emit CO2
EDF: an efficient, responsible electricity company and the champion of low-carbon growth
NUCLEAR: WORLD'S LEADING OPERATOR, EXCELLENT PERFORMANCE IN FRANCE
• 72.9 GW installed capacity, 54% of the Group's net generation capacity
• 477.7 TWh generated, 77% of the Group's output
• 58 reactors operated in France, 15 in the UK
• 3 EPRs under construction:
  • 1 in Flamanville (France)
  • 2 in Taishan (China)
• 2 EPRs in project phase
• OSART safety audit: 17 best practices identified by the IAEA
• France: best generation performance for six years
• UK: world record for safety in the workplace
• China: strengthened cooperation agreement with CNNC
R&D KEY FIGURES
Scientific partnerships with actors of Paris-Saclay
Research departments
8
Exceptional buildings
4 outstanding test halls
1 unique equipment, innovative communication tools
Diverse areas of expertise
1500 workstations
Plenty of collaborative spaces
EDF LAB PARIS-SACLAY
Main Big Data related challenges for EDF
Power generation
• Process monitoring and condition-based maintenance from sensors
• Power generation forecasting for renewables
Energy management
• Load forecasting
• Balancing and optimizing generation and consumption (using smart metering information, including renewables)
Electrical networks
• Smart Grid operations (local)
• Condition-based maintenance
Customers and sales
• New services to customers using smart-metering data
• Smart Homes, Smart Building, Smart Cities management related to energy
Operations and maintenance of the nuclear fleet
• The maintenance policy of the EDF generation fleet is optimized to ensure the reliability and safety of equipment and systems while strengthening competitiveness:
  • Better diagnosis, improved performance and availability
  • Better use of data and documents, which until now have been stored in data silos
• More broadly, the IT teams and projects aim to:
  • Strengthen the performance of operations and maintenance through a fleet-wide approach
  • Simplify the architecture of the industrial information system
  • Improve and develop the way data is used
  • Accumulate and archive data over time
… while reducing costs
Voluminous and heterogeneous data … stored in data silos
Source: Wikipedia
One database per nuclear site, gathering sensor data and managed with data historians.
• Focus on data:
  • High volume:
    • Data is kept for 40 to 60 years (the lifetime of the plant)
    • SCADA data can be sampled every 20 to 40 ms (but mostly every few seconds)
    • Around 10,000 sensors per plant
  • Variety:
    • Data is heterogeneous: time series, images, documents
    • Various data sources
• The current systems (historians) support only a limited number of concurrent accesses, and their service levels (SLAs) are poor
A Data Lake for the nuclear fleet
ESPADON: the Data Lake for the nuclear fleet
One database per nuclear site, gathering sensor data and managed with data historians.
Source: Wikipedia
© M. Caraveo, Hadoop cluster NOE data center
A data lake for the nuclear fleet: big picture
• Data sources: historian (SCADA), files (chemical information), files (dosimetry), …
• Hadoop cluster: ESPADON data lake
• Uses: e-monitoring application, visualization, interactive queries and reporting, web service, reports
© M. Caraveo, Hadoop cluster NOE data center
Zoom on data
• 4 generations of plants, but data and sensors are highly standardized (for example, trigrams identify elementary systems)
• Two main types of sensors: ANA (analog values) and TOR (state events; from the French "tout ou rien", i.e. on/off)
• Time series
• Volume
  • For the POC, 10 plants over 2 years: about 20 billion points
  • Target (59 plants): 15 TB of data (all plants, whole lifecycle)
Columns: Metric (global), Date, Value, Quality
BU2ABP177MT- 2015-04-30T22:05:00.000Z 156.6 Good/M
BU2ABP177MT- 2015-04-30T22:06:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:07:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:08:00.000Z 156.0 Good
BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M
BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good
BU2ABP177MT- 2015-04-30T22:15:00.000Z 157.3 Good
BU2ABP177MT- 2015-04-30T22:16:00.000Z 157.5 Good
BU2ABP177MT- 2015-04-30T22:19:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:20:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:21:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:22:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:24:00.000Z 156.9 Good/M
BU2ABP177MT- 2015-04-30T22:27:00.000Z 157.1 Good/M
BU2ABP177MT- 2015-04-30T22:28:00.000Z 157.3 Good/M
BU2ABP177MT- 2015-04-30T22:29:00.000Z 157.5 Good/M
BU2ABP177MT- 2015-04-30T22:30:00.000Z 157.7 Good/M
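The sample rows above are not perfectly regular: some minutes are absent. As a minimal sketch, assuming whitespace-separated rows in the layout shown and a nominal one-minute step (an assumption read off the sample, not stated in the slides), the following Python parses a few of the rows and lists the missing timestamps. The function names `parse_row` and `missing_minutes` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# A few rows copied from the sample above: metric, timestamp, value, quality.
SAMPLE = """\
BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M
BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good
BU2ABP177MT- 2015-04-30T22:15:00.000Z 157.3 Good
"""

def parse_row(line):
    """Split one row into (metric, UTC datetime, float value, quality)."""
    metric, stamp, value, quality = line.split()
    ts = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return metric, ts, float(value), quality

def missing_minutes(rows, step=timedelta(minutes=1)):
    """Return the timestamps absent between consecutive rows (assumed 1-minute step)."""
    gaps = []
    for (_, prev, _, _), (_, cur, _, _) in zip(rows, rows[1:]):
        t = prev + step
        while t < cur:
            gaps.append(t)
            t += step
    return gaps

rows = [parse_row(line) for line in SAMPLE.splitlines()]
gaps = missing_minutes(rows)
print([g.strftime("%H:%M") for g in gaps])  # ['22:11', '22:13']
```

Whether a gap means a missing measurement or simply an unchanged value depends on how the historian records data, which the slides do not specify.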
Data model
• HBase and Phoenix
  • Distributed key/value store
  • Allows the model to evolve (changes in normalization requirements, new indicators, new plants)
  • Phoenix for SQL compliance and BI tools
• Tables
  • 3 tables: DDT, ANA, TOR
  • Rowkey: <sensorid, timestamp> (queries mainly target one or several sensors over a period of time)
  • Sequential storage, split into HFiles and regions by plant unit
ANA table (key: m = concat(metriqueid, timestamp))
Column family   Column   Value          Phoenix type
0               v        H_ValeurANA    Float
0               q        H_QualitéANA   Char(10)
0               n        H_NiveauxANA   Varchar(10)

TOR table (key: m = concat(metriqueid, timestamp))
Column family   Column   Value          Phoenix type
0               v        H_ValeurTOR    Varchar(10)
0               q        H_QualiteTOR   Char(10)
0               n        H_NiveauxTOR   Varchar(10)
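The rowkey design described above places all points of one sensor next to each other, ordered by time, so a query for one sensor over a period becomes a contiguous range scan. The Python sketch below illustrates the idea with a sorted list standing in for the store. The byte encoding (fixed-width sensor id plus big-endian epoch milliseconds) is an assumption for illustration and is not claimed to be the encoding used in ESPADON or by Phoenix; `row_key`, `scan` and the second sensor id are hypothetical.

```python
import bisect
import struct

SENSOR_WIDTH = 12  # assumed fixed width for the sensor id, for illustration only

def row_key(sensor_id, epoch_ms):
    """Concatenate a padded sensor id and a big-endian timestamp.

    Big-endian unsigned integers sort bytewise in numeric order, so the keys
    of one sensor are ordered by time.
    """
    return sensor_id.encode("ascii").ljust(SENSOR_WIDTH, b"\x00") + struct.pack(">Q", epoch_ms)

def scan(sorted_keys, sensor_id, start_ms, end_ms):
    """Return the keys of one sensor with start_ms <= timestamp < end_ms."""
    lo = bisect.bisect_left(sorted_keys, row_key(sensor_id, start_ms))
    hi = bisect.bisect_left(sorted_keys, row_key(sensor_id, end_ms))
    return sorted_keys[lo:hi]

# Two sensors (the second id is made up), one point per minute for five minutes.
keys = sorted(
    row_key(sensor, 60_000 * minute)
    for sensor in ("BU2ABP177MT-", "BU2ABP178MT-")
    for minute in range(5)
)

hits = scan(keys, "BU2ABP177MT-", 60_000, 180_000)
print(len(hits))  # 2
```

Because the sensor id leads the key, a scan never touches rows of other sensors; the trade-off is that a query across many sensors for one instant needs one range scan per sensor.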
Validation and performance evaluation
• POC validation
  • Upload of historical data; queries and analyses
  • Existing functions: visualization, reports, services
  • Data injection: SCADA for the whole fleet, integration of other data sources
• Results
  • An estimated 6 weeks needed to upload historical data from 59 plants
  • Queries for validating the model:
    • JMeter used to simulate load
    • With or without a concurrent insertion workload
    • Under about 1 second to draw a curve for a selected month
  • Integration of an existing visualization GUI (done within a few days)
  • Validation of specific calculations within reports
  • ODBC link for a specific e-monitoring application
  • Integration of various sources of (structured) data into the data lake
  • 'Real-time' insertion of data (micro-batch):
    • Up to 2M points/s
    • Very low latency between insertion and availability (< 10 s)
Phoenix query (ANA):
SELECT
    MIN(v), MAX(v),
    FIRST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
    LAST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
    TO_CHAR(ts, 'dd') AS day,
    TO_CHAR(ts, 'HH') AS hour,
    TO_CHAR(ts, 'mm') AS minute,
    COUNT(*) AS cnt
FROM
    ORLI_ANA
WHERE
    m = ? AND
    ts > current_time() - 1 AND  -- last 24 h
    ts < current_time()
GROUP BY
    day, hour, minute
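To check the logic of this query outside the cluster, the same per-minute summary (minimum, maximum, first, last, count) can be reproduced on a handful of points in plain Python. This is only a sketch of the aggregation, assuming points arrive as (timestamp, value) pairs for a single sensor; `per_minute_summary` is a hypothetical name and the sample values are made up.

```python
from datetime import datetime

def per_minute_summary(points):
    """Group (timestamp, value) pairs by (day, hour, minute).

    Returns {bucket: (min, max, first, last, count)}, where first and last
    follow timestamp order, as in the query's ORDER BY ts ASC.
    """
    buckets = {}
    for ts, v in sorted(points):
        key = (ts.day, ts.hour, ts.minute)
        buckets.setdefault(key, []).append(v)
    return {
        key: (min(vs), max(vs), vs[0], vs[-1], len(vs))
        for key, vs in buckets.items()
    }

# Made-up sub-minute samples for one sensor, deliberately out of order.
points = [
    (datetime(2015, 4, 30, 22, 5, 40), 156.4),
    (datetime(2015, 4, 30, 22, 5, 0), 156.6),
    (datetime(2015, 4, 30, 22, 5, 20), 156.9),
    (datetime(2015, 4, 30, 22, 6, 0), 156.2),
]

summary = per_minute_summary(points)
print(summary[(30, 22, 5)])  # (156.4, 156.9, 156.6, 156.4, 3)
```

Reducing each minute to these values keeps the envelope of the signal, which is usually enough to draw a curve without transferring every raw point.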
Added value of data science algorithms on heterogeneous data:
Operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
• Active and reactive power indicate the constraints placed on alternators, which affect their wear
• ~50 plants
• 20 years of data
• 10-minute interval data
• Phoenix queries select the plants and time periods
• Reactive power is computed and displayed per day or per hour of the day
• More detailed analysis
• Fleet-level analysis
• Interactive queries
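The "per hour of the day" view mentioned above amounts to averaging all readings that fall in the same clock hour. A minimal Python sketch follows, using synthetic 10-minute data; the values, the Mvar unit and the name `mean_by_hour` are assumptions for illustration, not figures from the fleet.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def mean_by_hour(series):
    """Average a (timestamp, value) series for each hour of the day (0-23)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in series:
        sums[ts.hour] += value
        counts[ts.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}

# Synthetic 10-minute reactive power readings (Mvar) over two days:
# 100 at night, 200 between 08:00 and 20:00. The values are made up.
start = datetime(2015, 4, 1)
series = []
for step in range(2 * 24 * 6):
    ts = start + timedelta(minutes=10 * step)
    series.append((ts, 200.0 if 8 <= ts.hour < 20 else 100.0))

profile = mean_by_hour(series)
print(profile[3], profile[12])  # 100.0 200.0
```

Replacing `ts.hour` with `ts.date()` as the grouping key gives the per-day view instead.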
Added value of data science algorithms on heterogeneous data:
Operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
Monitoring and control of contractual agreements when network frequency varies (plants have to contribute to the global balance)
• Pattern matching
• Response time for different plants
• Different levels of analysis: by plant, by generation, global
• Generic approach implemented for any kind of pattern
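One way to picture the response-time measurement is sketched below in Python. A simple threshold rule stands in for the pattern matching mentioned on the slide, whose actual method is not described; the thresholds, the signals and the name `response_time` are all hypothetical and do not reflect contractual values.

```python
def response_time(freq, power, freq_threshold, power_delta, step_s):
    """Seconds between a frequency drop and the plant's power response.

    freq and power are equally sampled lists (one value every step_s seconds).
    The event starts at the first sample where freq < freq_threshold; the
    response is the first later sample where power exceeds its pre-event
    level by at least power_delta. Returns None if either never happens.
    """
    start = next((i for i, f in enumerate(freq) if f < freq_threshold), None)
    if start is None:
        return None
    baseline = power[start - 1] if start > 0 else power[0]
    for i in range(start, len(power)):
        if power[i] - baseline >= power_delta:
            return (i - start) * step_s
    return None

# Made-up signals sampled every 2 s: frequency dips at index 3,
# power ramps up and crosses baseline + 20 MW at index 6.
freq = [50.00, 50.00, 49.98, 49.85, 49.86, 49.90, 49.95, 50.00]
power = [900.0, 900.0, 900.0, 900.0, 908.0, 915.0, 921.0, 925.0]

print(response_time(freq, power, freq_threshold=49.9, power_delta=20.0, step_s=2))  # 6
```

Running the same function over every plant, then grouping the results by plant or by plant generation, would give the different levels of analysis listed above.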
Added value of data science algorithms on heterogeneous data
Predicting plant cooling from the quality of the water entering the plants
• Correlations?
• By plant
• Use of GAMs (generalized additive models)
• Integration of two internal sources plus external data
• Better understanding
• Work in progress
Integration of data science and visualization: architecture
Hadoop cluster, REST web service (VM), browser
Integration of data science: a global approach
• Pre-processing: data quality, sampling, synchronization, …
• Selection and queries: threshold, pattern matching, period of time, …
• Analysis and data science: reporting, exploratory analysis (distribution, …), modelling, …
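The synchronization and threshold steps named in this approach can be sketched as follows: two sensors sampled at different instants are aligned on a common grid by carrying the last observation forward, then grid instants are selected with a threshold. Last-observation-carried-forward is one possible choice, not necessarily the one used by the authors; the data, the threshold and the name `resample_locf` are made up.

```python
import bisect

def resample_locf(times, values, grid):
    """Last observation carried forward onto grid; None before the first sample.

    times must be sorted ascending; times and grid share one unit (here seconds).
    """
    out = []
    for t in grid:
        i = bisect.bisect_right(times, t) - 1
        out.append(values[i] if i >= 0 else None)
    return out

# Two made-up sensors sampled at different, irregular instants (seconds).
t_a, v_a = [0, 60, 180], [156.6, 156.4, 156.0]
t_b, v_b = [30, 120], [10.0, 12.0]

grid = list(range(0, 240, 60))  # common one-minute grid: 0, 60, 120, 180
a = resample_locf(t_a, v_a, grid)
b = resample_locf(t_b, v_b, grid)

# Selection step: keep grid instants where both sensors have a value
# and sensor B exceeds a hypothetical threshold.
selected = [t for t, x, y in zip(grid, a, b)
            if x is not None and y is not None and y > 11.0]
print(selected)  # [120, 180]
```

Once the series share a grid, the later stages (reporting, exploratory analysis, modelling) can treat them as columns of one table.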
A Data Lab in progress: a team, an approach … and some questions
Objectives:
• Bring value from data analytics
Issues:
• Skills and organization (between entities)
• Architecture:
  • Operational Hadoop cluster and its workloads (use of a multitenant enterprise cluster)
  • Other workloads (data science)
  • Data preparation within Hadoop, plus an edge machine for data science (Spark, R, Python)
• How to quantify value
• Development and maintenance costs
• How to industrialize
Source: Xebia
Takeaways
• A Data Lake for our nuclear fleet
  • In progress: industrialization and decommissioning of Historian applications
  • Large reduction in licensing costs
• A Data Lab under construction
  • POCs showing the added value of data science algorithms
  • Predictive maintenance
  • In the context of fleet renovation for plant life extension (major overhaul program): operations and maintenance, optimization of generation costs
  • Remaining issues: skills, organization, technical architecture, quantifying value
• Perspectives and technical issues:
  • Data lakes and labs for other fleets (thermal plants, hydro, renewables)
  • Scalable time-series analytics (synchronization, missing data, …)
  • Handling heterogeneous data (text, images, graphs, …)
  • IoT platform
References
A proof of concept with Hadoop: storage and analytics of electrical time-series.
Marie-Luce Picard, Bruno Jacquin, Hadoop Summit 2012, California, USA, June 2012: http://www.slideshare.net/Hadoop_Summit/proof-of-concent-with-hadoop
Massive Smart Meter Data Storage and Processing on top of Hadoop.
Leeley D. P. dos Santos, Alzennyr G. da Silva, Bruno Jacquin, Marie-Luce Picard, David Worms, Charles Bernard, Workshop Big Data 2012, VLDB (Very Large Data Bases) Conference, Istanbul, Turkey, 2012: http://www.cse.buffalo.edu/faculty/tkosar/bigdata2012/program.php
Searching time-series with Hadoop in an electric power company.
Alice Bérard, Georges Hébrail, BigMine Workshop, KDD2013, Chicago, August 2013: http://bigdata-mining.org/
Real-time energy data-analytics with Storm.
Rémy Saissy, Marie-Luce Picard, Charles Bernard, Bruno Jacquin, Simon Maby, Benoît Grossin, Hadoop Summit 2014, California, USA, June 2014: http://fr.slideshare.net/Hadoop_Summit/t-525p212picard
Computing Data Quality Indicators on Big Data Stream Using a CEP.
Wenlu Yang, Alzennyr Gomes Da Silva, Marie-Luce Picard, IEEE Xplore - IWCIM 2015, Prague, November 2015.
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Network.
Guillaume Germaine, Thomas Vial, Hadoop Summit Europe 2016, Dublin: http://www.slideshare.net/HadoopSummit/exploring-titan-and-spark-graphx-for-analyzing-timevarying-electrical-networks
Ad

Recommended

Wheatgrass details 2
Wheatgrass details 2
Atin Gupta
 
3. pns injury_._
3. pns injury_._
suknisa
 
Health benefits of ginger, Sharda jain ,Nausea of pregnancy,Menstrual pain
Health benefits of ginger, Sharda jain ,Nausea of pregnancy,Menstrual pain
Lifecare Centre
 
Antioxidants
Antioxidants
rx_sonali
 
How To Deal With Work Place Stress By Mr. Nilesh Mandlecha
How To Deal With Work Place Stress By Mr. Nilesh Mandlecha
Health Education Library for People
 
cupping & dry needling
cupping & dry needling
Sarah Guarino
 
Cathechism of Nature Cure.pptx
Cathechism of Nature Cure.pptx
AdityaAnand38650
 
Stress & stress management
Stress & stress management
rehan012
 
BASIC PRINCIPLES OF NATUROPATHY
BASIC PRINCIPLES OF NATUROPATHY
Doctor Devanshi Goel
 
Lever system in human body.pptx
Lever system in human body.pptx
Kolkata,west bengal, India
 
10 health benefits of apple
10 health benefits of apple
abdullah hil kafi
 
Application for Yoga for Stress Management
Application for Yoga for Stress Management
Satwa Yoga
 
Trigger Point Therapy Slides
Trigger Point Therapy Slides
Katie Emmett 🌐 Myofascial Decompression Therapy
 
Function of cervical region
Function of cervical region
Dr Vicky Kasundra
 
"Nigerian Medicinal Plants with Potentials for the Prevention and Management ...
"Nigerian Medicinal Plants with Potentials for the Prevention and Management ...
ESD UNU-IAS
 
Resisted exercises for lower limb
Resisted exercises for lower limb
Shamadeep Kaur (PT)
 
nutracuticals
nutracuticals
gaurav gautam
 
Importance of Yoga in The Corporate Sector
Importance of Yoga in The Corporate Sector
Yogita Mate
 
3 Explanation of Pratikramana Sutras 1-10
3 Explanation of Pratikramana Sutras 1-10
mehtavikas99
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
mark madsen
 
Building a Data Analytics PaaS for Smart Cities
Building a Data Analytics PaaS for Smart Cities
DataWorks Summit/Hadoop Summit
 
Processing and retrieval of geotagged unmanned aerial system telemetry
Processing and retrieval of geotagged unmanned aerial system telemetry
DataWorks Summit/Hadoop Summit
 
Data Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awareness
DataWorks Summit/Hadoop Summit
 
Why is my Hadoop* job slow?
Why is my Hadoop* job slow?
DataWorks Summit/Hadoop Summit
 
Enterprise Data Classification and Provenance
Enterprise Data Classification and Provenance
DataWorks Summit/Hadoop Summit
 
Modernise your EDW - Data Lake
Modernise your EDW - Data Lake
DataWorks Summit/Hadoop Summit
 
10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake
VMware Tanzu
 
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
DataWorks Summit/Hadoop Summit
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
DataWorks Summit/Hadoop Summit
 

More Related Content

What's hot (11)

BASIC PRINCIPLES OF NATUROPATHY
BASIC PRINCIPLES OF NATUROPATHY
Doctor Devanshi Goel
 
Lever system in human body.pptx
Lever system in human body.pptx
Kolkata,west bengal, India
 
10 health benefits of apple
10 health benefits of apple
abdullah hil kafi
 
Application for Yoga for Stress Management
Application for Yoga for Stress Management
Satwa Yoga
 
Trigger Point Therapy Slides
Trigger Point Therapy Slides
Katie Emmett 🌐 Myofascial Decompression Therapy
 
Function of cervical region
Function of cervical region
Dr Vicky Kasundra
 
"Nigerian Medicinal Plants with Potentials for the Prevention and Management ...
"Nigerian Medicinal Plants with Potentials for the Prevention and Management ...
ESD UNU-IAS
 
Resisted exercises for lower limb
Resisted exercises for lower limb
Shamadeep Kaur (PT)
 
nutracuticals
nutracuticals
gaurav gautam
 
Importance of Yoga in The Corporate Sector
Importance of Yoga in The Corporate Sector
Yogita Mate
 
3 Explanation of Pratikramana Sutras 1-10
3 Explanation of Pratikramana Sutras 1-10
mehtavikas99
 
Application for Yoga for Stress Management
Application for Yoga for Stress Management
Satwa Yoga
 
"Nigerian Medicinal Plants with Potentials for the Prevention and Management ...
"Nigerian Medicinal Plants with Potentials for the Prevention and Management ...
ESD UNU-IAS
 
Resisted exercises for lower limb
Resisted exercises for lower limb
Shamadeep Kaur (PT)
 
Importance of Yoga in The Corporate Sector
Importance of Yoga in The Corporate Sector
Yogita Mate
 
3 Explanation of Pratikramana Sutras 1-10
3 Explanation of Pratikramana Sutras 1-10
mehtavikas99
 

Viewers also liked (20)

Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
mark madsen
 
Building a Data Analytics PaaS for Smart Cities
Building a Data Analytics PaaS for Smart Cities
DataWorks Summit/Hadoop Summit
 
Processing and retrieval of geotagged unmanned aerial system telemetry
Processing and retrieval of geotagged unmanned aerial system telemetry
DataWorks Summit/Hadoop Summit
 
Data Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awareness
DataWorks Summit/Hadoop Summit
 
Why is my Hadoop* job slow?
Why is my Hadoop* job slow?
DataWorks Summit/Hadoop Summit
 
Enterprise Data Classification and Provenance
Enterprise Data Classification and Provenance
DataWorks Summit/Hadoop Summit
 
Modernise your EDW - Data Lake
Modernise your EDW - Data Lake
DataWorks Summit/Hadoop Summit
 
10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake
VMware Tanzu
 
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
DataWorks Summit/Hadoop Summit
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
DataWorks Summit/Hadoop Summit
 
Filling the Data Lake
Filling the Data Lake
DataWorks Summit/Hadoop Summit
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
DataWorks Summit/Hadoop Summit
 
Logistique : Le transport dans le commerce
Logistique : Le transport dans le commerce
Thomas Malice
 
The real world use of Big Data to change business
The real world use of Big Data to change business
DataWorks Summit/Hadoop Summit
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
DataWorks Summit/Hadoop Summit
 
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
DataWorks Summit/Hadoop Summit
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry
Capgemini
 
Creative Capital, Information & Communication Technologies, & Economic Growth...
Creative Capital, Information & Communication Technologies, & Economic Growth...
Regional Science Academy
 
Apache Hive ACID Project
Apache Hive ACID Project
DataWorks Summit/Hadoop Summit
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
mark madsen
 
Processing and retrieval of geotagged unmanned aerial system telemetry
Processing and retrieval of geotagged unmanned aerial system telemetry
DataWorks Summit/Hadoop Summit
 
Data Aggregation, Curation and analytics for security and situational awareness
Data Aggregation, Curation and analytics for security and situational awareness
DataWorks Summit/Hadoop Summit
 
10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake
VMware Tanzu
 
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
DataWorks Summit/Hadoop Summit
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
DataWorks Summit/Hadoop Summit
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
DataWorks Summit/Hadoop Summit
 
Logistique : Le transport dans le commerce
Logistique : Le transport dans le commerce
Thomas Malice
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry
Capgemini
 
Creative Capital, Information & Communication Technologies, & Economic Growth...
Creative Capital, Information & Communication Technologies, & Economic Growth...
Regional Science Academy
 
Ad

Similar to A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet (20)

How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
inside-BigData.com
 
The Pacific Research Platform
The Pacific Research Platform
Larry Smarr
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
confluent
 
PosterPresentation
PosterPresentation
Raj Shekhar
 
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
Larry Smarr
 
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
inside-BigData.com
 
Louise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx Systems
Dataconomy Media
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
Anubhav Jain
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
Flink Forward
 
big_data_casestudies_2.ppt
big_data_casestudies_2.ppt
vishal choudhary
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Databricks
 
Weather Station Data Publication at Irstea: an implementation Report.
Weather Station Data Publication at Irstea: an implementation Report.
catherine roussey
 
Linked Sensor Data cube
Linked Sensor Data cube
Laurent Lefort
 
TeraGrid Communication and Computation
TeraGrid Communication and Computation
Tal Lavian Ph.D.
 
SplunkLive! Customer Presentation – Harris
SplunkLive! Customer Presentation – Harris
Splunk
 
Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
inside-BigData.com
 
Private Cloud Delivers Big Data in Oil & Gas v4
Private Cloud Delivers Big Data in Oil & Gas v4
Andy Moore
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
CloudLightning
 
Nfcis2009
Nfcis2009
Harikrishnan Tulsidas
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
inside-BigData.com
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
inside-BigData.com
 
The Pacific Research Platform
The Pacific Research Platform
Larry Smarr
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
confluent
 
PosterPresentation
PosterPresentation
Raj Shekhar
 
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
Larry Smarr
 
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
inside-BigData.com
 
Louise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx Systems
Dataconomy Media
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
Anubhav Jain
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
Flink Forward
 
big_data_casestudies_2.ppt
big_data_casestudies_2.ppt
vishal choudhary
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Databricks
 
Weather Station Data Publication at Irstea: an implementation Report.
Weather Station Data Publication at Irstea: an implementation Report.
catherine roussey
 
Linked Sensor Data cube
Linked Sensor Data cube
Laurent Lefort
 
TeraGrid Communication and Computation
TeraGrid Communication and Computation
Tal Lavian Ph.D.
 
SplunkLive! Customer Presentation – Harris
SplunkLive! Customer Presentation – Harris
Splunk
 
Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
inside-BigData.com
 
Private Cloud Delivers Big Data in Oil & Gas v4
Private Cloud Delivers Big Data in Oil & Gas v4
Andy Moore
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
CloudLightning
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
inside-BigData.com
 
Ad

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Hadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Data Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
Dataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
HBase in Practice


A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

  • 1. A Data Lake and a Data Lab to Optimize Operations and Safety Within a Nuclear Fleet. Hadoop Summit 2016, San José, June 30th. Marie-Luce PICARD, EDF R&D (marie-luce.picard@edf.fr); Jean-Marc RANGOD, EDF-DPNT; Christophe SALPERWYCK, EDF R&D. Special thanks to Raphaël QUERCIA (EDF-DTG), Carole MAI and Amandine PIERROT (EDF R&D).
  • 2. Outline
      - 1. A few words about EDF
      - 2. Context and objectives
      - 3. A data lake for a nuclear fleet
      - 4. Data science algorithms for optimizing operations
      - 5. A data lab in progress
      - 6. As a conclusion
      - Photo credits: Brice Richard (Flickr), KC Tan Phoyography (Flickr)
  • 4. EDF: a global leader in electricity. An efficient, responsible electricity company and the champion of low-carbon growth.
      - All electricity-related activities: generation; transmission and distribution; trading, sales and marketing; energy services
      - Key figures (as of 2015): €72.9 billion in sales; 38.5 million customers; 158,161 employees worldwide; 84.7% of generation does not emit CO2
      - Electricity generation: 623.5 TWh
      - 2014 investments: €4.5 billion
  • 5. Nuclear: world's leading operator, excellent performance in France (source: EDF 2015, p. 5)
      - 72.9 GW installed capacity, 54% of the Group's net generation capacity
      - 477.7 TWh generated, 77% of the Group's output
      - 58 reactors operated in France, 15 in the UK
      - 3 EPRs under construction: 1 in Flamanville (France) and 2 in Taishan (China); 2 EPRs in project phase
      - OSART safety audit: 17 best practices identified by the IAEA
      - France: best generation performance for six years
      - UK: world record for safety in the workplace
      - China: strengthened cooperation agreement with CNNC
  • 8. EDF LAB PARIS-SACLAY: Scientific partnerships with actors of Paris-Saclay research departments 8 exceptional buildings 4 outstanding hall test 1 Unique equipment, innovative communication tools Diverse areas of expertise 1500 work stations Plenty of collaborative spaces
  • 9. Main Big Data related challenges for EDF
      - Power generation: process monitoring and condition-based maintenance from sensors; power generation forecasting for renewables
      - Energy management: load forecasting; balancing and optimizing generation and consumption (using smart-metering information, including renewables)
      - Electrical networks: smart grid operations (local); condition-based maintenance
      - Customers and sales: new services to customers using smart-metering data; energy-related management of smart homes, smart buildings and smart cities
  • 11. Operations and maintenance of the nuclear fleet
      - The maintenance policy of the EDF generation fleet is optimized to ensure the reliability and safety of equipment and systems while strengthening our competitiveness:
          - Obtain better diagnoses, improved performance and availability
          - Make better use of data and documents, so far stored in data silos
      - More broadly, the IT teams and projects aim to:
          - Strengthen the performance of operations and maintenance through a global fleet approach
          - Simplify the Industrial Information System architecture
          - Improve and develop the way we use our data
          - Accumulate and archive data over time
      - … while reducing costs
  • 12. Voluminous and heterogeneous data, stored in data silos
      - One database per nuclear site, gathering data from sensors; use of data historians (image source: Wikipedia)
      - Focus on data:
          - High volume: data is stored for 40 to 60 years (the lifetime of the plant); SCADA data can be sampled every 20 to 40 ms (but mostly every few seconds); around 10,000 sensors per plant
          - Variety: data is heterogeneous (time series, images, documents) and comes from various sources
      - The current systems (historians) support few concurrent accesses, and their service levels (SLAs) are poor
  • 13. A Data Lake for the nuclear fleet
      - Today: one database per nuclear site, gathering data from sensors; use of data historians
      - ESPADON: the Data Lake for the nuclear fleet
      - Image credits: Wikipedia; © M. Caraveo, Hadoop cluster, NOE data center
  • 15. A data lake for the nuclear fleet: big picture
      - Data sources: historian (SCADA); files (chemical information); files (dosimetry); …
      - Storage and processing: Hadoop cluster (ESPADON Data Lake)
      - Uses: e-monitoring application; visualization; interactive queries and reporting; web service; reports
      - Image credit: © M. Caraveo, Hadoop cluster, NOE data center
  • 16. Zoom on data
      - 4 generations of plants, but a high level of normalization of data and sensors (for example, use of trigrams to identify elementary systems)
      - Two main types of sensors: ANA (analog values) and TOR (state events)
      - Time series
      - Volume:
          - For the POC, 10 plants over 2 years: about 20 billion points
          - Target (59 plants): 15 TB of data (all plants, whole lifecycle)
      - Sample of the data:

        Metric, global  Date                      Value  Quality
        BU2ABP177MT-    2015-04-30T22:05:00.000Z  156.6  Good/M
        BU2ABP177MT-    2015-04-30T22:06:00.000Z  156.4  Good/M
        BU2ABP177MT-    2015-04-30T22:07:00.000Z  156.2  Good/M
        BU2ABP177MT-    2015-04-30T22:08:00.000Z  156.0  Good
        BU2ABP177MT-    2015-04-30T22:09:00.000Z  156.2  Good/M
        BU2ABP177MT-    2015-04-30T22:10:00.000Z  156.4  Good/M
        BU2ABP177MT-    2015-04-30T22:12:00.000Z  156.7  Good/M
        BU2ABP177MT-    2015-04-30T22:14:00.000Z  157.1  Good
        BU2ABP177MT-    2015-04-30T22:15:00.000Z  157.3  Good
        BU2ABP177MT-    2015-04-30T22:16:00.000Z  157.5  Good
        BU2ABP177MT-    2015-04-30T22:19:00.000Z  157.3  Good/M
        BU2ABP177MT-    2015-04-30T22:20:00.000Z  157.1  Good/M
        BU2ABP177MT-    2015-04-30T22:21:00.000Z  157.3  Good/M
        BU2ABP177MT-    2015-04-30T22:22:00.000Z  157.1  Good/M
        BU2ABP177MT-    2015-04-30T22:24:00.000Z  156.9  Good/M
        BU2ABP177MT-    2015-04-30T22:27:00.000Z  157.1  Good/M
        BU2ABP177MT-    2015-04-30T22:28:00.000Z  157.3  Good/M
        BU2ABP177MT-    2015-04-30T22:29:00.000Z  157.5  Good/M
        BU2ABP177MT-    2015-04-30T22:30:00.000Z  157.7  Good/M
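As a rough consistency check on the POC volume, the figure of about 20 billion points can be converted into an average spacing between stored points per sensor. The sketch below assumes that the figure of around 10,000 sensors per plant quoted on slide 12 applies to each of the 10 POC plants; the function name is illustrative and not taken from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_seconds_between_points(total_points, plants, sensors_per_plant, years):
    """Average spacing, in seconds, between stored points of a single sensor."""
    sensor_years = plants * sensors_per_plant * years
    points_per_sensor_year = total_points / sensor_years
    return SECONDS_PER_YEAR / points_per_sensor_year


# POC figures from the slide; 10,000 sensors per plant is an assumption (slide 12).
interval = average_seconds_between_points(20e9, 10, 10_000, 2)
print(round(interval))  # 315
```

Under these assumptions a sensor stores one point roughly every five minutes on average, far fewer than raw sampling every few seconds would produce. That would be consistent with historians keeping only part of the raw signal, although the slides do not say so explicitly.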
  • 17. Data model
      - Use of HBase and Phoenix:
          - Distributed key/value store
          - Allows the model to be updated (evolving normalization requirements, new indicators, new plants …)
          - Phoenix for SQL compliance and BI tools
      - Tables:
          - 3 tables: DDT, ANA, TOR
          - Row key: <sensorid, timestamp> (queries mainly consider one or several sensors over a period of time)
          - Sequential storage, split into HFiles and regions according to the plant unit
      - ANA table:

        Key                                ColumnFamily  Column  Value         Phoenix type
        m (concat(metriqueid, timestamp))  0             v       H_ValeurANA   Float
                                                         q       H_QualitéANA  Char(10)
                                                         n       H_NiveauxANA  varchar(10)

      - TOR table:

        Key                                ColumnFamily  Column  Value         Phoenix type
        m (concat(metriqueid, timestamp))  0             v       H_ValeurTOR   Varchar(10)
                                                         q       H_QualiteTOR  Char(10)
                                                         n       H_NiveauxTOR  Varchar(10)
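The reason a <sensorid, timestamp> row key suits "one sensor over a period of time" queries can be illustrated with a small sketch. It is not Phoenix's actual key serialization: the fixed 16-byte sensor field, the millisecond timestamps and the helper names `make_rowkey` and `scan_bounds` are assumptions made for the example. The point is only that, when keys are compared byte by byte as in HBase, all points of one sensor are contiguous and ordered by time, so a time range becomes a single range scan.

```python
import struct

SENSOR_ID_WIDTH = 16  # assumed fixed width, not taken from the slides


def make_rowkey(sensor_id: str, ts_millis: int) -> bytes:
    """Concatenate a padded sensor id and a big-endian timestamp."""
    sid = sensor_id.encode("ascii")
    if len(sid) > SENSOR_ID_WIDTH:
        raise ValueError("sensor id too long")
    # Big-endian encoding keeps byte order identical to numeric order.
    return sid.ljust(SENSOR_ID_WIDTH, b"\x00") + struct.pack(">Q", ts_millis)


def scan_bounds(sensor_id: str, start_millis: int, end_millis: int):
    """Start (inclusive) and stop (exclusive) keys for one sensor and time range."""
    return make_rowkey(sensor_id, start_millis), make_rowkey(sensor_id, end_millis)


keys = [
    make_rowkey("BU2ABP177MT-", 3000),
    make_rowkey("BU1ABP177MT-", 9000),  # hypothetical second sensor id
    make_rowkey("BU2ABP177MT-", 1000),
]
keys.sort()  # byte-wise order: grouped by sensor, then by time

start, stop = scan_bounds("BU2ABP177MT-", 0, 2000)
in_range = [k for k in keys if start <= k < stop]
print(len(in_range))  # 1
```

A design consequence of this layout is that consecutive writes for one sensor land next to each other, which fits the sequential storage mentioned on the slide.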
  • 18. Validation and performance evaluation
      - POC validation:
          - Upload of historical data; queries and analyses
          - Existing functions: visualization, reports, services
          - Data injection: SCADA for the whole fleet, integration of other sources of data
      - Results:
          - About 6 weeks (estimated) needed to upload historical data from 59 plants
          - Queries for validating the model: JMeter used to simulate load, with or without a concurrent insertion workload; about 1 second or less to draw a curve for a selected month
          - Integration of an existing GUI for visualization (done within a few days)
          - Validation of specific calculations within reports
          - ODBC link for a specific e-monitoring application
          - Integration of various sources of (structured) data into the data lake
          - "Real-time" insertion of data (micro-batch): up to 2M points/s; very low latency between insertion and availability (< 10 s)
      - Phoenix query (ANA):

        SELECT MIN(v), MAX(v),
               FIRST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
               LAST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
               TO_CHAR(ts, 'dd') as day,
               TO_CHAR(ts, 'HH') as hour,
               TO_CHAR(ts, 'mm') as minute,
               count(*) as cnt
        FROM ORLI_ANA
        WHERE m = ?
          AND ts > current_time()-1 AND //last 24h
              ts < current_time()
        GROUP BY day, hour, minute
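To make the intent of the Phoenix query on this slide concrete, here is a minimal in-memory sketch of the same per-minute aggregation (minimum, maximum, first value, last value and count per minute). It only emulates the grouping logic on a handful of hypothetical points; it does not connect to Phoenix, and the function name `aggregate_per_minute` is an assumption for illustration.

```python
from datetime import datetime


def aggregate_per_minute(points):
    """Group (timestamp, value) pairs by minute and compute summary statistics."""
    buckets = {}
    # Sorting by time makes "first" and "last" well defined within each minute.
    for ts, value in sorted(points, key=lambda p: p[0]):
        minute = ts.replace(second=0, microsecond=0)
        stats = buckets.get(minute)
        if stats is None:
            buckets[minute] = {"min": value, "max": value,
                               "first": value, "last": value, "cnt": 1}
        else:
            stats["min"] = min(stats["min"], value)
            stats["max"] = max(stats["max"], value)
            stats["last"] = value
            stats["cnt"] += 1
    return buckets


# Hypothetical sub-minute readings, not taken from the slides.
points = [
    (datetime(2015, 4, 30, 22, 5, 40), 156.4),
    (datetime(2015, 4, 30, 22, 5, 0), 156.6),
    (datetime(2015, 4, 30, 22, 5, 20), 156.9),
    (datetime(2015, 4, 30, 22, 6, 10), 156.2),
]
per_minute = aggregate_per_minute(points)
print(per_minute[datetime(2015, 4, 30, 22, 5)])
```

Keeping the minimum, maximum, first and last value of each bucket is a common way to downsample a curve for display without hiding peaks, which may be why the query returns those four values.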
  • 20. Added value of data science algorithms on heterogeneous data: operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
      - Active and reactive power are indicators of the constraints on alternators, which affect their wear
      - Data: about 50 plants; 20 years of data; 10-minute interval data
      - Phoenix queries make it possible to select plants and periods of time
      - Compute and show reactive power per day or per hour of the day
      - Benefits: more detailed analysis; fleet-level analysis; interactive queries
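The "reactive power per hour of the day" computation mentioned on this slide can be sketched as a simple group-by on 10-minute readings. The data below is synthetic, and the function name `mean_by_hour_of_day` is an assumption; the talk does not show how the computation was actually implemented.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mean_by_hour_of_day(readings):
    """Average the values of (timestamp, value) pairs for each hour of the day."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        sums[ts.hour] += value
        counts[ts.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


# Synthetic 10-minute readings covering two hours (values are made up).
start = datetime(2015, 4, 30, 0, 0)
readings = [
    (start + timedelta(minutes=10 * i), 50.0 + 10.0 * ((i // 6) % 2))
    for i in range(12)
]
profile = mean_by_hour_of_day(readings)
print(profile)  # {0: 50.0, 1: 60.0}
```

The same grouping applied to `ts.date()` instead of `ts.hour` would give the per-day view.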
  • 21. Added value of data science algorithms on heterogeneous data: operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet
      - Monitoring and control of contractual agreements when the network frequency varies (plants have to contribute to the global balance)
      - Pattern matching
      - Response time for different plants
      - Different levels of analysis: by plant, by generation, global
      - Generic approach implemented for any kind of pattern
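The slide does not say which pattern-matching technique was used. One common generic approach, shown here purely as an illustration, is to slide a window over the series and compare each z-normalized window with a z-normalized template using the Euclidean distance. The frequency values, the V-shaped template and the threshold are hypothetical.

```python
import math


def znorm(xs):
    """Rescale a sequence to zero mean and unit standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    if std == 0:
        return [0.0] * len(xs)
    return [(x - mean) / std for x in xs]


def find_pattern(series, pattern, max_distance):
    """Return (start_index, distance) for each window that resembles the pattern."""
    template = znorm(pattern)
    width = len(pattern)
    hits = []
    for i in range(len(series) - width + 1):
        window = znorm(series[i:i + width])
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, template)))
        if distance <= max_distance:
            hits.append((i, distance))
    return hits


# Hypothetical network frequency (Hz) with a short dip, and a V-shaped template.
frequency = [50.0, 50.0, 50.0, 49.9, 49.8, 49.9, 50.0, 50.0]
dip = [0.0, -1.0, -2.0, -1.0, 0.0]
hits = find_pattern(frequency, dip, max_distance=0.5)
print([i for i, _ in hits])  # [2]
```

Because both sides are normalized, the template matches dips of any depth or offset, which is what makes the approach reusable for other shapes. Measuring a plant's response time would then start from the matched index, a step this sketch does not cover.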
  • 22. Added value of data science algorithms on heterogeneous data (work in progress)
      - Prediction of plant cooling according to the quality of the water entering the plants
      - Correlations? Depending on the plant
      - Use of GAM models
      - Integration of two internal sources plus external data
      - Better understanding
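For readers unfamiliar with GAMs (generalized additive models), the general form of such a model is recalled below. The slide does not give the response variable, the predictors or the link function, so reading y as a cooling indicator and the x_j as water-quality and external covariates is only an assumption.

```latex
g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
```

Here g is the link function, \beta_0 an intercept, and each f_j a smooth function estimated from the data. Because each covariate contributes through its own curve, the fitted f_j can be plotted one by one, which fits the "better understanding" goal on the slide.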
  • 23. Integration of data science and visualization: architecture
      - Hadoop cluster
      - REST web service (VM)
      - Browser
  • 24. Integration of data science: a global approach
      - Pre-processing: data quality, sampling, synchronization …
      - Selection and queries: threshold, pattern matching, period of time …
      - Analysis and data science: reporting, exploratory analysis (distribution …), modelling …
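The sampling and synchronization steps of pre-processing can be illustrated with the sample table on slide 16, where some minutes are missing (for instance 22:11 and 22:13). The sketch below puts an irregular series on a regular one-minute grid by carrying the last known value forward and flags which grid points were actually observed. Forward filling is an assumption chosen for simplicity; the talk does not state which method was used.

```python
from datetime import datetime, timedelta


def resample_forward_fill(points, start, end, step):
    """Return (timestamp, value, observed) on a regular grid from start to end."""
    points = sorted(points)
    result = []
    index = 0
    last_value = None  # stays None until the first point is reached
    t = start
    while t <= end:
        observed = False
        # Consume every point up to and including the current grid time.
        while index < len(points) and points[index][0] <= t:
            last_value = points[index][1]
            observed = points[index][0] == t
            index += 1
        result.append((t, last_value, observed))
        t += step
    return result


# Four rows of sensor BU2ABP177MT- taken from the sample table on slide 16.
points = [
    (datetime(2015, 4, 30, 22, 10), 156.4),
    (datetime(2015, 4, 30, 22, 12), 156.7),
    (datetime(2015, 4, 30, 22, 14), 157.1),
    (datetime(2015, 4, 30, 22, 15), 157.3),
]
grid = resample_forward_fill(
    points,
    start=datetime(2015, 4, 30, 22, 10),
    end=datetime(2015, 4, 30, 22, 15),
    step=timedelta(minutes=1),
)
print([value for _, value, _ in grid])
```

Once several sensors share the same grid, they can be compared point by point, and the `observed` flag lets later analyses distinguish measured values from filled ones.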
  • 26. A Data Lab in progress: a team, an approach … and some questions (image source: Xebia)
      - Objective: bring value from data analytics
      - Issues:
          - Skills and organization (between entities)
          - Architecture: operational Hadoop cluster and loads (use of a multitenant enterprise cluster); other loads (data science); data preparation within Hadoop plus an edge machine for data science (Spark, R, Python)
          - How to quantify value
          - Development and maintenance costs
          - How to industrialize
  • 28. Takeaways
      - A Data Lake for our nuclear fleet:
          - In progress: industrialization and decommissioning of the Historian applications
          - Large reduction in licensing costs
      - A Data Lab under construction:
          - POCs showing the added value of data science algorithms, leading towards predictive maintenance
          - In the context of fleet renovation for plant life extension (major overhaul program): optimization of operations and maintenance and of generation costs
          - Remaining issues: skills, organization, technical architecture, quantifying value
      - Perspectives and technical issues:
          - Data lakes and labs for other fleets (thermal plants, hydro, renewables)
          - Scalable time-series analytics (synchronization, missing data …)
          - Handling heterogeneous data (text, images, graphs …)
          - IoT platform
  • 29. References
      - A proof of concept with Hadoop: storage and analytics of electrical time-series. Marie-Luce Picard, Bruno Jacquin. Hadoop Summit 2012, California, USA, June 2012: http://www.slideshare.net/Hadoop_Summit/proof-of-concent-with-hadoop
      - Massive Smart Meter Data Storage and Processing on top of Hadoop. Leeley D. P. dos Santos, Alzennyr G. da Silva, Bruno Jacquin, Marie-Luce Picard, David Worms, Charles Bernard. Workshop Big Data 2012, VLDB (Very Large Data Bases) Conference, Istanbul, Turkey, 2012: http://www.cse.buffalo.edu/faculty/tkosar/bigdata2012/program.php
      - Searching time-series with Hadoop in an electric power company. Alice Bérard, Georges Hébrail. BigMine Workshop, KDD2013, Chicago, August 2013: http://bigdata-mining.org/
      - Real-time energy data-analytics with Storm. Rémy Saissy, Marie-Luce Picard, Charles Bernard, Bruno Jacquin, Simon Maby, Benoît Grossin. Hadoop Summit 2014, California, USA, June 2014: http://fr.slideshare.net/Hadoop_Summit/t-525p212picard
      - Computing Data Quality Indicators on Big Data Stream Using a CEP. Wenlu Yang, Alzennyr Gomes Da Silva, Marie-Luce Picard. IEEE Xplore, IWCIM 2015, Prague, November 2015.
      - Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Network. Guillaume Germaine, Thomas Vial. Hadoop Summit Europe 2016, Dublin: http://www.slideshare.net/HadoopSummit/exploring-titan-and-spark-graphx-for-analyzing-timevarying-electrical-networks

Editor's Notes

  • #6: Nuclear energy supplies competitive, carbon-free electricity that we generate in the best possible safety conditions. In 2014, the International Atomic Energy Agency conducted an audit on how nuclear safety is integrated into the organisation and processes of our central departments: the IAEA found no departure from its standards and identified 17 best practices.
      - In France, we achieved our best performance in six years thanks to our management of scheduled shutdowns: the average length of extensions was halved. Wintertime fleet availability topped 90%. Our annual output was up 3% (415.9 TWh).
          - The principle of the "Grand Carénage" maintenance programme was approved. The programme involves renovating the French nuclear fleet over a 10-year period in order to extend its operating life beyond 40 years if all conditions are met. The investment is put at €55 billion for the entire fleet.
          - Work continues at the Flamanville EPR worksite, the first nuclear plant to be built in France for 15 years.
      - In the UK, output was good (56.3 TWh) despite the unscheduled shutdown of two plants. EDF Energy established a world record for safety in the workplace (0.98 accidents requiring more than one day of lost time per million hours worked by employees and subcontractors).
          - The Hinkley Point C project to build two EPRs in Somerset took a major step forward: in October, the European Commission approved the main terms of the agreements concluded with the British government.
      - In China, through partnerships, we are taking good advantage of the expertise we have acquired in the design, construction, operation and maintenance of our nuclear fleet.
          - Construction of two 1,750 MW EPRs in Taishan (EDF 30%, in partnership with CGN) is ongoing.
          - We signed an agreement to strengthen cooperation in engineering, operation and maintenance with CNNC, China's largest state-owned nuclear company.