Using Servers for Fast Data Transfers
Mary Hester
Relationship Manager, Research
Netwerkdag 2017
14 December 2017
http://www.spiegel.de/wissenschaft/technik/niederlaender-wollen-radwege-mit-geothermie-beheizen-a-862937.html
What is the problem?
• For researchers, getting access to and/or transferring data is hard.
• For example:
  • To a supercomputing center
  • To a local cluster
  • To collaborators, etc.
Why this can happen…
Fasterdata: http://fasterdata.es.net/network-tuning/tcp-issues-explained/packet-loss/
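One common way to reason about why packet loss hurts long-distance transfers is the Mathis et al. approximation, which bounds steady-state TCP throughput by MSS / (RTT × √loss). The sketch below is illustrative only: the MSS, round-trip times and loss rate are assumed values, not measurements from this talk, and the constant factor in the approximation is taken as 1.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on TCP throughput in bits per second.

    Uses the Mathis et al. approximation: rate <= MSS / (RTT * sqrt(p)),
    with the constant factor taken as 1 (an assumption for simplicity).
    """
    if rtt_s <= 0 or loss_rate <= 0:
        raise ValueError("rtt_s and loss_rate must be positive")
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))


if __name__ == "__main__":
    mss = 1460        # assumed MSS for a 1500-byte MTU
    loss = 1e-4       # assumed loss rate: 1 packet in 10,000
    for rtt_ms in (1, 10, 100):  # assumed LAN, national and intercontinental RTTs
        bps = mathis_throughput_bps(mss, rtt_ms / 1000, loss)
        print(f"RTT {rtt_ms:>3} ms -> at most {bps / 1e6:,.1f} Mbit/s")
```

The same loss rate that is barely noticeable on a local network caps throughput roughly a hundred times lower on a path with a hundred times the round-trip time, which is why lossless paths matter most for wide-area transfers.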
What needs to happen?
• What can we do?
  • Provide infrastructure to make this possible
    • Lossless networks
    • No firewalls, but still secure environments
    • Servers that act as gateways for the data transfers
  • Provide education to support the use of this infrastructure
    • To ICT departments
    • To research groups/departments as needed
One possible solution: DTNs and Science DMZs
• Dedicated servers for transferring data
  • a.k.a. “data transfer nodes” (DTNs)
  • Decouple LAN issues from the WAN
  • Enable faster transfers
• Part of a higher-level concept called a Science DMZ
• End users do not log into the infrastructure directly
• Should be a seamless part of the infrastructure that improves performance for end users
[Figure: Science DMZ architecture. Components: WAN (10G routed, 10G virtual circuit), border router, Science DMZ switch/router (10GE and Nx10GE links), enterprise border router/firewall, site/campus LAN, high-performance Data Transfer Node with high-speed storage, perfSONAR hosts. Callouts: clean, high-bandwidth path to/from WAN; dedicated path for virtual circuit traffic; site/campus virtual circuits; per-service security policy control points; site/campus access to Science DMZ resources.]
http://fasterdata.es.net/science-dmz/
One possible solution…continued
• “High performing” servers
  • Host tuning
  • Fast storage
  • High-performance speeds are relative: 100G, 40G, 10G or multiple 1G
• Lossless networks/connections are really important
• Security policies that do not deter data transfers
  • ACLs
  • Host-based firewalls
  • Limited ports used for applications (i.e., no web/email)
http://fasterdata.es.net/home/requirements-and-expectations/
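Host tuning on a transfer server usually starts with TCP buffer sizing, and the usual rule of thumb is the bandwidth-delay product (BDP): a single stream needs roughly bandwidth × round-trip time of buffer to keep the path full. The sketch below is a minimal calculation; the link speeds echo the ones named on the slide, while the 50 ms round-trip time is an assumed example value.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return round(bandwidth_bps * rtt_s / 8)


if __name__ == "__main__":
    rtt_s = 0.050  # assumed 50 ms round-trip time (hypothetical long-distance path)
    for label, bps in (("1G", 1e9), ("10G", 10e9), ("40G", 40e9), ("100G", 100e9)):
        mib = bdp_bytes(bps, rtt_s) / 2**20
        print(f"{label:>4}: about {mib:,.0f} MiB of TCP buffer per stream")
```

Default buffer limits on general-purpose hosts are often far below these values, which is one reason a dedicated, tuned server can outperform an ordinary workstation on the same link.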
Relative comparison for data transfers

Connection                         1 minute   1 hour    1 day
Home internet (100 Mbps)           700 MB     40 GB     1 TB
Campus internet (+1000 Mbps)       7 GB       400 GB    10 TB
High Performance (+10.000 Gbps)    700 GB     4 TB      100 TB
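The order of magnitude of these figures can be checked with simple arithmetic: volume = rate × time / 8. The sketch below assumes the three connection classes correspond to sustained rates of 100 Mbit/s, 1 Gbit/s and 10 Gbit/s (the last is an interpretation of the slide's "+10.000" label) and ignores protocol overhead. The slide's figures are rounded; note that 10 Gbit/s sustained for one minute works out to roughly 75 GB.

```python
def volume_bytes(rate_bps: float, seconds: float) -> float:
    """Bytes moved at a sustained line rate, ignoring protocol overhead."""
    return rate_bps * seconds / 8


def human(n_bytes: float) -> str:
    """Format a byte count using decimal (SI) units."""
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if n_bytes < 1000:
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:.1f} PB"


if __name__ == "__main__":
    # Assumed sustained rates for the three connection classes.
    rates = {"Home": 100e6, "Campus": 1e9, "High performance": 10e9}
    periods = {"1 minute": 60, "1 hour": 3600, "1 day": 86400}
    for name, bps in rates.items():
        row = ", ".join(f"{p}: {human(volume_bytes(bps, s))}" for p, s in periods.items())
        print(f"{name:<17} {row}")
```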
Other work in Europe
• People have been building networks like this for years
  • HPC centers
  • HEP facilities…
• JISC
  • Jasmin Compute has a Science DMZ architecture
• Protocols
  • The Spanish R&E community is investigating the performance of data transfer protocols (e.g., ASPERA)
  • NII is working with MMCFTP
  • HEP/CERN is looking into solutions outside of the Globus toolkit (GridFTP-based service)
mary.hester@surfnet.nl
Many thanks!
UMC Research LAN Pilot
A common, virtual and trusted research infrastructure for University Medical Centers
Paul van Dijk, SURF
The challenge
• 8 UMCs in NL
• Researchers dealing with huge data sets
  • Omics data, e.g. a full genome at 75 GB per person
  • Imaging data
• Collaboration is key!
• How to deal with growing demands for data transfers and compute scale-out
• How can Science DMZ concepts help?
The challenge
Can we create one virtual pool of resources?
• How to share data and resources in a safe and high-performance way?
• Requirements and perspectives?
  • Researchers
  • Resource owners
  • (Research) IT staff
  • Security officer
What is needed?
1. Facilities and approaches that help to establish sufficient trust so UMCs are willing to open up internal resources to each other
2. High-performance configurations and solutions
From 1 to 2 network zones
• General purpose zone: many small files
• Research Data Zone: very large files
Borrowing concepts from “Science DMZ”
Performance obstacles! Friction alert!!
UMC Research Data Zones – Interconnected
• Multipoint VPN with L3VPN
• One single MSP port needed
• BGP routing via SURFnet core routers
• Facilitate:
  • data transfers
  • compute scale-out
  • both in all directions
Next Steps
1. Add more partners
2. A common policy
3. Using federated identities for access control
Conclusions
• So far... happy researchers
• Minimal impact on (Research) ICT staff after initial setup
• General purpose network “off-loaded”
• From 8 UMCs to 1 UMC with 8 locations: a local national private UMC network
It feels like remote clusters are available locally
Fast data transfer speeds achieved
Towards fast and easy data transfer
Discussion
Ad hoc support not scalable
Climatology (UU)
Population Imaging (LUMC)
Bacterial drug resistance discovery (TUDelft)
And many more...
• Ad hoc support is not efficient
• Compatibility is problematic
• Setup takes too much time for a research project
Optimization
• Larger packet size (jumbo frames)
• Other network protocols (UDP)
• Specialized data transfer software (GridFTP)
• Access control instead of a firewall
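The jumbo-frame item in the optimization list can be illustrated with a rough overhead calculation. The sketch below assumes standard Ethernet framing (38 bytes per frame on the wire: preamble and start delimiter, header, frame check sequence and inter-frame gap) and IPv4 plus TCP headers without options (40 bytes); real stacks often add TCP options, so the numbers are indicative only.

```python
ETH_OVERHEAD = 38    # preamble+SFD 8, Ethernet header 14, FCS 4, inter-frame gap 12
IP_TCP_HEADERS = 40  # IPv4 20 + TCP 20, assuming no options


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire capacity that carries application payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(line_rate_bps: float, mtu: int) -> float:
    """Full-size frames per second needed to fill the link."""
    return line_rate_bps / (8 * (mtu + ETH_OVERHEAD))


if __name__ == "__main__":
    for mtu in (1500, 9000):  # standard and jumbo MTU
        eff = payload_efficiency(mtu)
        fps = frames_per_second(10e9, mtu)  # assumed 10 Gbit/s link
        print(f"MTU {mtu}: {eff:.1%} payload, {fps:,.0f} frames/s at 10 Gbit/s")
```

The gain in payload efficiency is modest; the larger benefit is that hosts and network devices handle far fewer frames per second for the same throughput.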
Science DMZ concept
• Developed in the US
• Dedicated network zone for research data and services
• Optimized for research data
• Data Transfer Nodes with high throughput
• Standardized solution
• Compatibility
[Figure: Science DMZ architecture (simplified). Components: WAN, border router, Science DMZ switch/router (10GE links), enterprise border router/firewall, site/campus LAN, high-performance Data Transfer Node with high-speed storage, perfSONAR hosts. Callouts: clean, high-bandwidth WAN path; per-service security policy control points; site/campus access to Science DMZ resources.]
Fasterdata knowledgebase:
http://fasterdata.es.net/science-dmz/
Pilots with UMCs and UvA
Current situation
[Figure: SURFinternet, MSP, University campus network A, Storage & RDM]
Science DMZ concept
[Figure: SURFinternet, MSP, University campus network A, Storage & RDM, Science DMZ with two Data Transfer Nodes, performance monitoring]
Connected Research Data Zones
[Figure: SURFinternet, MSP, University campus network A with a Research Data Zone (Storage & RDM, Data Transfer Node, performance monitoring), campus network B, campus network C]
mary.hester@surfnet.nl
paul.vandijk@surfnet.nl
peter.hinrich@surfnet.nl
Many thanks!
Suggested reading: blog.surf.nl/researchlan
Poster: surfdrive.surf.nl/files/index.php/s/xWHTr7rf4LaJLvF
Discussion
• How can we roll this out in a scalable way?
  • Minimizing costs and staff effort
• What could SURF's role be?
  • Knowledge and expertise?
  • Coordinating standardization?
  • Managing DTNs on site?
• What is the role of the institutions?
  • Campus network?
  • Support for researchers?

Research data zone: secure and optimized network environment for researchers
