Applications of HPCC Systems at Clemson University Amy Apon, PhD ● Linh Ngo, PhD ● Michael Payne Big Data Systems Laboratory Clemson University
Clemson Strengths and Opportunities
• People: PhD-level faculty and research staff, talented students, significant industry collaborators
• Facilities: Palmetto, among the top 5 US academic supercomputers (~2000 nodes, 20K cores, 600 GPUs), and 100Gb Internet connectivity
Big Data Systems Lab Overview
• Vision: perform world-class research on the systems and enabling information technology for advanced data analytics
• Research areas: systems and architectures; tools and operations; data analytics and applications
Effect of High Performance Computing on Academic Research Productivity
• Motivation: federal funding is under considerable pressure
• We propose efficiency as a measure from which to gain insight into return on investment
• We show that locally available HPC has a positive effect on a university's ability to do research
Text mining of news reports and social media for business intelligence
• Motivation: government and business need information about public sentiment
• Research: we develop and apply methods to analyze large amounts of textual data to enable inquiry into social and business problems
Shared Computing Resources among Researchers
• Shared execution environment
• Temporary local storage
• User privileges only
HPCC Systems in a Shared Research Computing Environment
Linh Ngo, PhD
Shared Computing Resources among Researchers (continued)
• Shared execution environment
• Temporary local storage
• User privileges only
How to provision and configure an HPCC cluster dynamically for research purposes?
• Step 1: Configure, install, and deploy HPCC as a non-root user
• Step 2: Dynamically provision an HPCC cluster in a shared research environment
Installation and Configuration of Dependencies
[Figure: dependencies (binutils, ICU, XALAN, APR, …) shown against two directory trees. First tree: /, /usr, /lib64, /lib, …, /opt. Second tree: /, /home, $USER, …, /parallel_scratch, /local_scratch.]
Resolving Non-default Installation Path Conflicts
[Figure: two directory trees; node labels are listed in slide order because the nesting did not survive extraction.]
• Administrative privileges: …, /, etc, init.d, HPCCSystems, opt, HPCCSystems, var, lib, HPCCSystems, log, HPCCSystems, configmgr, mydafilesrv, mydafilesrv
• Non-administrative privileges: …, /, home, $USER, hpcc, local_scratch, hpcc, parallel_scratch, lib, HPCCSystems, log, $USER, lock, pid, $USER
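One way to picture the path relocation on this slide is a helper that re-roots each default, root-owned directory under a user-writable prefix. This is a minimal sketch, not the presenters' tooling: the function name `rebase` and the prefix `/home/alice/hpcc` are hypothetical, and the exact target layout used on Palmetto is not recoverable from the slide.

```python
from pathlib import PurePosixPath

# Default HPCC Systems directories that normally require root to create.
DEFAULT_DIRS = [
    "/etc/HPCCSystems",
    "/opt/HPCCSystems",
    "/var/lib/HPCCSystems",
    "/var/log/HPCCSystems",
]


def rebase(path, prefix):
    """Re-root an absolute path under a user-writable prefix (hypothetical helper)."""
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    # parts[0] is the root "/", so the remaining parts are appended to the prefix.
    return str(PurePosixPath(prefix).joinpath(*p.parts[1:]))


if __name__ == "__main__":
    # "/home/alice/hpcc" is an illustrative prefix, not a path from the slides.
    for d in DEFAULT_DIRS:
        print(d, "->", rebase(d, "/home/alice/hpcc"))
```

Keeping the original sub-path under the new prefix means every configured location changes by the same rule, which makes the relocated paths easy to audit.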
Non-root Deployment
• Remove or relax root-level settings, e.g. is_root
• Reduce the default configuration settings for resource requirements, depending on the resource allocation request
Dynamic Provisioning
[Figure: five numbered steps (1 to 5) on user.palmetto.clemson.edu, showing PBS_NODEFILE, environment.xml, and the HPCC components mydafilesrv, mydafile, myeclc, mythor, myroxie, …]
• Deploy to /local_scratch or /parallel_scratch?
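The first part of dynamic provisioning, turning the scheduler's node list into a cluster layout, can be sketched as follows. This is an illustration under stated assumptions: PBS node files typically repeat a hostname once per allocated slot, so the sketch deduplicates them; the role split (first host for support components, the rest as Thor workers) and the names `unique_hosts` and `assign_roles` are hypothetical, and the sketch stops short of writing environment.xml.

```python
def unique_hosts(nodefile_text):
    """Return hostnames in first-seen order, dropping per-slot repeats."""
    hosts = {}
    for line in nodefile_text.splitlines():
        host = line.strip()
        if host:
            hosts.setdefault(host, None)  # dicts preserve insertion order
    return list(hosts)


def assign_roles(hosts):
    """Hypothetical layout: first host runs support components, the rest run Thor."""
    if len(hosts) < 2:
        raise ValueError("need at least two hosts for this layout")
    return {"support": hosts[0], "thor_workers": hosts[1:]}


if __name__ == "__main__":
    # In a real job the text would come from the file named by $PBS_NODEFILE.
    sample = "node01\nnode01\nnode02\nnode03\nnode02\n"
    print(assign_roles(unique_hosts(sample)))
```

The resulting mapping is the kind of input a configuration generator would need in order to produce environment.xml for the allocated nodes.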
Using HPCC Systems to Manage Academic Data
Michael Payne, LexisNexis Summer 2014 Internship
Using HPCC Systems to Manage Academic Data
• Research on scholarly data requires academic data from many different sources, which store data in various formats
• Aggregating these sources into a useful and cohesive structure requires a data-intensive approach to preprocessing, integration, and analysis
• HPCC Systems is a platform to streamline this process
Categories of Scholarly Data
• Higher education institutions
• Research
• High performance computing capability
• Funding support
Scholarly Data Description
Categories: higher education institutions, research, high performance computing capability, funding support
Data sets and formats:
• Detailed Award (XML)
• Federal Funding (tab-delimited)
• Expenditures (tab-delimited)
• Institution Patent (Excel)
• Degrees Conferred (Excel)
• NIH Award Data (CSV/Excel)
• Top500 supercomputers affiliated with academic institutions (XML)
• Institutions from states with EPSCoR status (tab-delimited)
• Institutional information with Carnegie's research classifications (multi-sheet Excel)
• Enrollment (multi-sheet Excel)
• Financial (multi-sheet Excel)
• Faculty (multi-sheet Excel)
• Detailed list of articles by discipline, with abstracts, references, and disambiguated authors (XML)
Examples of Scholarly Data Links
[Figure: links between data sets, labelled with the matching criteria below]
• Institution name/email
• Name similarity, address
• PI/author name, institution name
• Name similarity
• Match name with WoS's Organization-Enhanced name
• Acknowledgment attributes (2008 onward, automatic)
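Linking records by name similarity can be sketched as below. The slide does not say which similarity measure was used, so this example swaps in the ratio from Python's standard `difflib.SequenceMatcher`; the helper names, the normalization rules, and the 0.75 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates, threshold=0.75):
    """Return the most similar candidate, or None if nothing reaches the threshold."""
    scored = [(similarity(name, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored)
    return candidate if score >= threshold else None


if __name__ == "__main__":
    institutions = ["Clemson University", "Emory University"]
    print(best_match("Clemson Univ.", institutions))
```

A threshold keeps weak matches from being linked automatically; records that fall below it would need a second criterion such as address or email, as the slide's other link types suggest.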
Ongoing Work
• Porting data analytic processes to ECL
• Applying machine learning techniques for article abstract classification
Summer 2014 Internship - Logistic Regression for Dense Matrices
• LexisNexis internship, Machine Learning
• Manager: Timothy Humphrey
• Mentor: Arjuna Chala
Logistic Regression
• Prediction using continuous and discrete values
• No distributional assumptions on the predictors
• Predictors may not be normally distributed or linearly related
• Relationship between the discrete variable and the predictor is non-linear
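For readers unfamiliar with the model, logistic regression estimates P(y = 1 | x) = 1 / (1 + exp(-(b0 + b·x))), which is the non-linear relationship the slide refers to. The sketch below fits that model in plain Python. It is an illustration only: batch gradient descent is swapped in for brevity, the slides do not say which solver the ECL-ML implementation uses, and the function names and toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """P(y = 1 | x); weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit the weights by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(xs, ys):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Toy data: one continuous predictor, binary outcome.
    xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    ys = [0, 0, 0, 1, 1, 1]
    w = fit_logistic(xs, ys)
    print(predict_proba(w, [0.0]), predict_proba(w, [5.0]))
```

The inner sums over observations and features are the dense matrix-vector products that a block linear algebra library would distribute across nodes.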
Parallel Block Basic Linear Algebra Subprograms (PB-BLAS)
• Matrices can be partitioned
• Partitioning schemes must be compatible
• There are multiple choices!
[Figure: two block-partitioned matrix products; block dimension labels in slide order: 4 x 4, 4 x 1, 4 x 1, 2 x 3 and 2 x 1, 1 x 1, 2 x 1, 3 x 1]
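The compatibility requirement can be illustrated with a small block matrix product. The sketch below reads "compatible" as the usual conformal condition: the column partition of the left matrix must equal the row partition of the right matrix. It is plain Python on nested lists, not the PB-BLAS API, and every function name is hypothetical.

```python
def split(matrix, row_sizes, col_sizes):
    """Cut a dense matrix (list of rows) into a grid of blocks."""
    if sum(row_sizes) != len(matrix) or sum(col_sizes) != len(matrix[0]):
        raise ValueError("partition sizes must cover the matrix exactly")
    blocks, r0 = [], 0
    for rs in row_sizes:
        row_blocks, c0 = [], 0
        for cs in col_sizes:
            row_blocks.append([row[c0:c0 + cs] for row in matrix[r0:r0 + rs]])
            c0 += cs
        blocks.append(row_blocks)
        r0 += rs
    return blocks


def matmul(a, b):
    """Ordinary dense product of two small matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_matmul(a, a_rows, a_cols, b, b_rows, b_cols):
    """Multiply block by block; each block product could run on a different node."""
    if a_cols != b_rows:
        raise ValueError("incompatible schemes: A's column partition must equal B's row partition")
    A, B = split(a, a_rows, a_cols), split(b, b_rows, b_cols)
    result = []
    for i in range(len(a_rows)):
        block_row = []
        for j in range(len(b_cols)):
            acc = matmul(A[i][0], B[0][j])
            for k in range(1, len(a_cols)):
                acc = add(acc, matmul(A[i][k], B[k][j]))
            block_row.append(acc)
        result.append(block_row)
    return result


def assemble(blocks):
    """Stitch a grid of blocks back into one dense matrix."""
    rows = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            rows.append([v for block in block_row for v in block[r]])
    return rows


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    b = [[1], [2], [3], [4]]
    print(assemble(block_matmul(a, [2, 2], [3, 1], b, [3, 1], [1])))
```

Different partitions such as [2, 2] or [3, 1] give the same product, which is the sense in which there are multiple choices; the choice affects how work and data are spread, not the result.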
Machine Learning in ECL
• Logistic runtimes
• Hard-coded mapping
• Full Higgs dataset: 11,000,000 x 28
[Chart: time in minutes (axis 0 to 35) for Higgs 1,000, Higgs 10,000, and Higgs 100,000; series: PB-BLAS and Non PB-BLAS]
Machine Learning in ECL
• Logistic runtimes
• Auto mapping
• Full Elsevier dataset: 100,000 x 3,291
[Chart: time in minutes (axis 0 to 25) for Elsevier 100; series: PB-BLAS and Non PB-BLAS]
Machine Learning in ECL
• Logistic runtimes
• Auto mapping
• Full Elsevier dataset: 100,000 x 3,291
[Chart: time in minutes (axis 0 to 450) for Elsevier 1,000; series: PB-BLAS and Non PB-BLAS]
Project Summary
• Logistic regression code and supporting functions have been documented and merged into the ECL-ML GitHub repository
• Automatic block vector mapping function for any user who wants to use PB-BLAS
• Ready-to-use element-wise multiplication in PB-BLAS
• Updated debugging statements that give a clear understanding of errors
• Test functions for both block vector mapping functions
• Sample code for using logistic regression
• Currently working on a K-means implementation that utilizes PB-BLAS
Linh Ngo, PhD ● Alex Herzog, PhD ● Michael Payne ● Amy Apon, PhD {lngo, aherzog, mpayne3, aapon}@clemson.edu Big Data Systems Laboratory Clemson University


HPCC Systems Engineering Summit - Applications of HPCC Systems at Clemson University
