BAE SYSTEMS PROPRIETARY1 Unpublished Work Copyright 2015 BAE Systems. All Rights Reserved.
(See final slide for restrictions on use.)
|
BAE SYSTEMS PROPRIETARY
BAE Systems Apache Spark GraphX and
GraphFrames
April 11th 2016
​ Eddie Baggott
• Functional and Data Architect
• BAE Systems, Norkom
• Anti Fraud, AML, Compliance, Watch lists, Cyber Security
• Disclaimer
• All my own opinion
Introduction
• Graph databases are databases that use graph structures for semantic
queries with nodes, edges and properties to represent and store data.
• Storing and showing Networks
What are graph databases
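As a rough illustration of the nodes, edges and properties model described on this slide, here is a minimal sketch in plain Python. It is not a real graph database; the names `Node`, `Edge`, `neighbours` and `co_owners`, and the sample customers and account, are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex with a label and arbitrary key/value properties."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship that can carry its own properties."""
    src: str
    dst: str
    rel_type: str
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: two customers and one shared account.
nodes = {
    "c1": Node("c1", "Customer", {"name": "Alice"}),
    "c2": Node("c2", "Customer", {"name": "Brian"}),
    "a1": Node("a1", "Account", {"currency": "EUR"}),
}
edges = [
    Edge("c1", "a1", "OWNS", {"since": 2012}),
    Edge("c2", "a1", "OWNS", {"since": 2014}),
]


def neighbours(node_id):
    """Return ids of nodes one hop away, ignoring edge direction."""
    found = set()
    for e in edges:
        if e.src == node_id:
            found.add(e.dst)
        elif e.dst == node_id:
            found.add(e.src)
    return found


def co_owners(customer_id):
    """Customers that share at least one account with the given customer."""
    result = set()
    for account in neighbours(customer_id):
        result |= neighbours(account) - {customer_id}
    return result


print(sorted(co_owners("c1")))  # ['c2']
```

The two-hop `co_owners` lookup is the kind of relationship question that a graph model makes easy to express.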
• Finding networks
• Analyse Relationships
• Want to see how customers and accounts are connected
• See the transactions between them
• Credit Card
• Compromised Devices
• AML Rings
• Insurance
• Unauthorized Trading
• Social Networks
• Uber – Lyft Cancel Wars
• Panama Papers
What are they used for
Customer behaviour
Relationships
Showing direction of payments, co-ownerships
Use different types of lines and shapes to give extra meaning
Width of lines can show bigger amounts
offshoreleaks.icij.org/nodes/262484
Start search with “mossack fonseca”
Panama Papers
Spider out one level
Panama Papers
Show more connections
Panama Papers
• Graph Databases
• Neo4j, Titan, OrientDB
• Can store and manage data
• Traversal queries
• Processing Engine
• Spark, Giraph
• GraphX
• GraphFrames
• Can be complementary and used together, e.g. MazeRunner
• Elastic Search Graph
• New, uses search and term relevancy
Graph Databases: different approaches
Apache Spark
DataFrames
GraphFrames
• GraphX is a graph computation engine built on top of Spark that enables
users to interactively build, transform and reason about graph structured
data at scale. It comes complete with a library of common algorithms.
• Spark, based on RDDs
• Num Vertices, Num Edges, Degrees
• Algorithms
• PageRank
• Connected Components
• Triangle Counting
GraphX
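To make the quantities on this slide concrete, the following is a small sketch in plain Python of vertex and edge counts, degrees, connected components and triangle counting. GraphX computes these over RDDs on a cluster; this single-machine version only shows what the numbers mean. The edge list and function names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical undirected edge list: one triangle (1-2-3), a tail (3-4)
# and a separate pair (5-6).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]

adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

num_vertices = len(adjacency)
num_edges = len(edges)
degrees = {v: len(ns) for v, ns in adjacency.items()}


def connected_components(adj):
    """Label every vertex with the smallest vertex id in its component."""
    labels = {}
    for start in sorted(adj):
        if start in labels:
            continue
        stack = [start]
        while stack:
            v = stack.pop()
            if v in labels:
                continue
            labels[v] = start
            stack.extend(n for n in adj[v] if n not in labels)
    return labels


def triangle_count(adj):
    """Count each triangle once by only accepting ordered triples a < b < c."""
    total = 0
    for a in adj:
        for b in adj[a]:
            if b <= a:
                continue
            for c in adj[a] & adj[b]:
                if c > b:
                    total += 1
    return total


components = connected_components(adjacency)
triangles = triangle_count(adjacency)
print(num_vertices, num_edges, degrees[3], triangles)
```

Labelling each component by its lowest vertex id mirrors the convention GraphX uses for its connected components result.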
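• In Big Data, the “Hello World” is usually a “Word Count” of Wikipedia
• So let's graph Wikipedia
• Clean the Data
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
• Making a Vertex RDD
// articles and pageHash come from the data-cleaning step, which the slide does not show
val vertices = articles.map(a => (pageHash(a.title), a.title))
• Making the Edge RDD
// the slide omits how srcVid and dstVid are derived for each link in an article
val edges: RDD[Edge[Double]] = articles.flatMap { a =>
Edge(srcVid, dstVid, 1.0) }
• Making the Graph
val graph = Graph(vertices, edges, "")
• Run PageRank on Wikipedia
val dublinGraph = graph.subgraph(vpred = (v, t) =>
t.toLowerCase contains "dublin")
val prDublin = dublinGraph.staticPageRank(5)
// join the ranks back onto the titles, then print the ten highest-ranked pages
val titleAndPrGraph = dublinGraph.outerJoinVertices(prDublin.vertices) {
(v, title, rank) => (rank.getOrElse(0.0), title) }
titleAndPrGraph.vertices.top(10)(
Ordering.by((e: (VertexId, (Double, String))) => e._2._1)).foreach(println)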
GraphX Example
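The vertex and edge construction above can be sketched on a single machine in plain Python. This is only an illustration under stated assumptions: the three toy articles are invented, edges are assumed to come from `[[wiki link]]` markup in each article body, and `page_hash` is a hypothetical stand-in for the slide's `pageHash`, whose real definition is not shown.

```python
import hashlib
import re

# Hypothetical miniature "Wikipedia": title -> body containing [[wiki links]].
articles = {
    "Dublin": "Capital of [[Ireland]], on the [[River Liffey]].",
    "Ireland": "An island whose capital is [[Dublin]].",
    "River Liffey": "A river flowing through [[Dublin]].",
}

LINK_PATTERN = re.compile(r"\[\[(.+?)\]\]")


def page_hash(title):
    """Stable 64-bit id for a title (assumed stand-in for pageHash)."""
    digest = hashlib.md5(title.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# Vertex list: (id, title) pairs, like the vertex RDD on the slide.
vertices = [(page_hash(title), title) for title in articles]

# Edge list: one (src, dst, weight) triple per link found in an article body.
edges = [
    (page_hash(title), page_hash(link), 1.0)
    for title, body in articles.items()
    for link in LINK_PATTERN.findall(body)
]

titles = dict(vertices)
for src, dst, _ in edges:
    print(titles[src], "->", titles[dst])
```

Hashing the title gives every page a numeric id without a lookup table, which is why the same function can be applied independently to the source and the destination of each link.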
• GraphFrames support general graph processing, similar to Apache Spark’s
GraphX library. However, GraphFrames are built on top of Spark
DataFrames, resulting in some key advantages:
• Python, Java & Scala APIs: GraphFrames provide uniform APIs for all 3
languages. For the first time, all algorithms in GraphX are available from
Python & Java.
• Powerful queries: GraphFrames allow users to phrase queries in the
familiar, powerful APIs of Spark SQL and DataFrames.
• Saving & loading graphs: GraphFrames fully support DataFrame data
sources, allowing writing and reading graphs using many formats like
Parquet, JSON, and CSV.
• In GraphFrames, vertices and edges are represented as DataFrames,
allowing us to store arbitrary data with each vertex and edge.
• http://spark-packages.org/package/graphframes/graphframes
Spark Graph Frames
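The idea that a graph is just two tables, one of vertices and one of edges, each free to carry extra columns, can be shown without Spark. The sketch below uses plain Python lists of rows and round-trips them through CSV and JSON. The rows and the helper names `to_csv` and `from_csv` are assumptions for illustration; GraphFrames itself would hold these tables as Spark DataFrames and use DataFrame data sources.

```python
import csv
import io
import json

# Hypothetical rows: any extra column is simply more data on the vertex or edge.
vertices = [
    {"id": "a", "name": "Alice", "segment": "retail"},
    {"id": "b", "name": "Brian", "segment": "corporate"},
]
edges = [
    {"src": "a", "dst": "b", "amount": "250", "country": "Irl"},
]


def to_csv(rows):
    """Serialise a list of dict rows to CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dict rows (all values are strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Round-trip the vertex table through CSV and the edge table through JSON.
restored_vertices = from_csv(to_csv(vertices))
restored_edges = json.loads(json.dumps(edges))
print(restored_vertices == vertices, restored_edges == edges)
```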
Spark Graph Frames Example
Customer  ID
Eddie     1
Alan      2
Matt      3
Deirdre   4
Bob       5
Sue       6
John      7
Sender  Receiver  Amount   Country
Eddie   Matt      10,000   UK
Eddie   Deirdre   15,000   Irl
Eddie   Bob       25,000   USA
Alan    Sue       32,000   USA
Alan    John      43,000   USA
Matt    Alan      50,000   Irl
Matt    Deirdre   60,000   Irl
Matt    Bob       120,000  USA
# Create vertices (customers) and edges (payments).
# GraphFrames expects an "id" column on vertices and "src"/"dst" columns on edges;
# the customer name is used as the vertex id so that it matches Sender/Receiver.
from graphframes import GraphFrame
vertices = customers.selectExpr("Customer as id", "ID as customer_id").distinct()
edges = payments.selectExpr("Sender as src", "Receiver as dst", "*")
graph = GraphFrame(vertices, edges)
• Who sent more than 100k?
graph.edges.filter("Amount > 100000").select("Sender").distinct().show()
Matt
• Who sent to more than 2 people?
graph.outDegrees.filter("outDegree > 2").show()
Eddie, Matt
• Who sent the most to Ireland?
graph.edges.filter("Country = 'Irl'").groupBy("Sender").sum("Amount").show()
• Who are most connected?
results = graph.pageRank(resetProbability=0.15, maxIter=10)
display(results.vertices)
Spark Graph Frames Example
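The same four questions can be checked by hand on the example payments with plain Python, which is a useful sanity check on what each query should return. This is a sketch, not GraphFrames code: the variable names are assumptions, and the `pagerank` function is a basic textbook power iteration (dangling rank spread evenly, ranks summing to 1), so its numbers will not match the normalisation GraphFrames uses.

```python
from collections import defaultdict

# Payments from the example slide: (sender, receiver, amount, country).
payments = [
    ("Eddie", "Matt", 10_000, "UK"),
    ("Eddie", "Deirdre", 15_000, "Irl"),
    ("Eddie", "Bob", 25_000, "USA"),
    ("Alan", "Sue", 32_000, "USA"),
    ("Alan", "John", 43_000, "USA"),
    ("Matt", "Alan", 50_000, "Irl"),
    ("Matt", "Deirdre", 60_000, "Irl"),
    ("Matt", "Bob", 120_000, "USA"),
]

# Who sent more than 100k in a single payment? (a filter on edges)
big_senders = {s for s, _, amount, _ in payments if amount > 100_000}

# Who sent to more than 2 people? (out-degree, counting distinct receivers)
receivers = defaultdict(set)
for s, r, _, _ in payments:
    receivers[s].add(r)
busy_senders = {s for s, rs in receivers.items() if len(rs) > 2}

# Who sent the most to Ireland? (filter edges, group by sender, sum amounts)
to_ireland = defaultdict(int)
for s, _, amount, country in payments:
    if country == "Irl":
        to_ireland[s] += amount
top_ireland_sender = max(to_ireland, key=to_ireland.get)


def pagerank(edge_list, reset_probability=0.15, max_iter=10):
    """Textbook power-iteration PageRank over (src, dst, ...) tuples."""
    nodes = sorted({n for s, r, *_ in edge_list for n in (s, r)})
    out_links = defaultdict(list)
    for s, r, *_ in edge_list:
        out_links[s].append(r)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Rank held by vertices with no outgoing edges is shared by everyone.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (reset_probability + (1 - reset_probability) * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for s in nodes:
            for r in out_links[s]:
                new_rank[r] += (1 - reset_probability) * rank[s] / len(out_links[s])
        rank = new_rank
    return rank


ranks = pagerank(payments)
print(big_senders, busy_senders, top_ireland_sender)
```

Note that "who sent" questions are about outgoing edges, so they use out-degree and edge attributes rather than in-degree or vertex attributes.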
• Another way to see who is sending money to whom
Chord Diagram
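A chord diagram is usually drawn from a square flow matrix, where entry (i, j) is the total sent from party i to party j; D3's chord layout, for example, takes its input in that shape. The sketch below builds such a matrix from the example payments in plain Python. The variable names are assumptions, and the drawing step itself is left to whichever visualisation tool is used.

```python
# Payments from the example slide: (sender, receiver, amount).
payments = [
    ("Eddie", "Matt", 10_000), ("Eddie", "Deirdre", 15_000), ("Eddie", "Bob", 25_000),
    ("Alan", "Sue", 32_000), ("Alan", "John", 43_000),
    ("Matt", "Alan", 50_000), ("Matt", "Deirdre", 60_000), ("Matt", "Bob", 120_000),
]

# One row and one column per customer, in a fixed alphabetical order.
names = sorted({n for s, r, _ in payments for n in (s, r)})
index = {name: i for i, name in enumerate(names)}

# matrix[i][j] = total amount sent from names[i] to names[j]
matrix = [[0] * len(names) for _ in names]
for sender, receiver, amount in payments:
    matrix[index[sender]][index[receiver]] += amount

for name, row in zip(names, matrix):
    print(f"{name:>8}", row)
```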
www.elastic.co/products/graph
• Find connections based on relevance
Elastic Search: Graph
• Graphs are good for finding networks and analysing relationships
• Different approaches
• Lots of visualization options
• Get the benefits of using Spark
• We’re hiring!
• http://www.baesystems.com/en/cybersecurity/careers
• Any Questions?
Recap
FREEDOM OF INFORMATION ACT
This document (<projectreference><documentnumber>) contains confidential and commercially sensitive material
which is provided for the Authority’s internal use only and is not intended for general dissemination.
The information contained herein pertains to bodies dealing with security, national security and/or defence matters
that would be exempt under Sections 23, 24 and 26 of the Freedom of Information Act 2000 (FOIA). It also consists of
information which describes our methodologies, processes and commercial arrangements all of which would be exempt
from disclosure under Sections 41 and 43 of the Act.
Should the Authority receive any request for disclosure of the information provided in this document, the Authority is
requested to notify BAE Systems Applied Intelligence. BAE Systems Applied Intelligence shall provide every assistance
to the Authority in complying with its obligations under the Act.
BAE Systems Applied Intelligence’s point of contact for FOIA requests is:
Chief Counsel
Legal Department
BAE Systems Applied Intelligence
Surrey Research Park
Guildford GU2 7YP
Telephone 01483 816082
BAE SYSTEMS
Surrey Research Park
Guildford
Surrey
GU2 7YP
United Kingdom
T: +44 (0)1483 816000
F: +44 (0)1483 816144
Copyright © 2015 BAE Systems. All Rights Reserved.
BAE SYSTEMS, the BAE SYSTEMS Logo and the product names referenced herein are trademarks of BAE Systems plc.
No part of this document may be copied, reproduced, adapted or redistributed in any form or by any means without
the express prior written consent of BAE Systems Applied Intelligence.
BAE Systems Applied Intelligence Limited registered in England and Wales Company No. 1337451 with its registered
office at Surrey Research Park, Guildford, England, GU2 7YP.
Hadoop User Group Ireland (HUG) Ireland - Eddie Baggot Presentation April 2016
