The Shape of Data: Geometry-Based Machine Learning and Data Analysis in R

Ebook, 488 pages, 4 hours

Author: Colleen M. Farrelly
ISBN: 9781718503090

By Colleen M. Farrelly and Yaé Ulrich Gaba

About this ebook

This advanced machine learning book highlights many algorithms from a geometric perspective and introduces tools in network science, metric geometry, and topological data analysis through practical application.

Whether you’re a mathematician, seasoned data scientist, or marketing professional, you’ll find The Shape of Data to be the perfect introduction to the critical interplay between the geometry of data structures and machine learning.

This book’s extensive collection of case studies (drawn from medicine, education, sociology, linguistics, and more) and gentle explanations of the math behind dozens of algorithms provide a comprehensive yet accessible look at how geometry shapes the algorithms that drive data analysis.

In addition to gaining a deeper understanding of how to implement geometry-based algorithms with code, you’ll explore:

Supervised and unsupervised learning algorithms and their application to network data analysis
The way distance metrics and dimensionality reduction impact machine learning
How to visualize, embed, and analyze survey and text data with topology-based algorithms
New approaches to computational solutions, including distributed computing and quantum algorithms

Language: English

Publisher: No Starch Press

Release date: Sep 12, 2023

ISBN: 9781718503090

PRAISE FOR

The Shape of Data

"The title says it all. Data is bound by many complex relationships not easily shown in our two-dimensional, spreadsheet-filled world. The Shape of Data walks you through this richer view and illustrates how to put it into practice."

—Stephanie Thompson, data scientist and speaker

"The Shape of Data is a novel perspective and phenomenal achievement in the application of geometry to the field of machine learning. It is expansive in scope and contains loads of concrete examples and coding tips for practical implementations, as well as extremely lucid, concise writing to unpack the concepts. Even as a more veteran data scientist who has been in the industry for years now, having read this book I’ve come away with a deeper connection to and new understanding of my field."

—Kurt Schuepfer, PhD, McDonald’s Corporation

"The Shape of Data is a great source for the application of topology and geometry in data science. Topology and geometry advance the field of machine learning on unstructured data, and The Shape of Data does a great job introducing new readers to the subject."

—Uchenna Ike Chukwu, senior quantum developer

The Shape of Data

Geometry-Based Machine Learning and Data Analysis in R

by Colleen M. Farrelly and Yaé Ulrich Gaba

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

First printing

27 26 25 24 23 1 2 3 4 5

ISBN-13: 978-1-7185-0308-3 (print)

ISBN-13: 978-1-7185-0309-0 (ebook)

Publisher: William Pollock

Managing Editor: Jill Franklin

Production Manager: Sabrina Plomitallo-González

Production Editor: Sydney Cromwell

Developmental Editor: Alex Freed

Cover Illustrator: Gina Redman

Interior Design: Octopod Studios

Technical Reviewer: Franck Kalala Mutombo

Copyeditor: Kim Wimpsett

Compositor: Jeff Lytle, Happenstance Type-O-Rama

Proofreader: Scout Festa

Indexer: BIM Creatives, LLC

For information on distribution, bulk sales, corporate sales, or translations, please contact No Starch Press® directly at [email protected] or:

No Starch Press, Inc.

245 8th Street, San Francisco, CA 94103

phone: 1.415.863.9900

www.nostarch.com

Library of Congress Cataloging-in-Publication Data

Names: Farrelly, Colleen, author. | Gaba, Yaé Ulrich, author.

Title: The shape of data : network science, geometry-based machine learning, and topological data analysis in R / by Colleen M. Farrelly and Yaé Ulrich Gaba.

Description: San Francisco, CA : No Starch Press, [2023] | Includes bibliographical references.

Identifiers: LCCN 2022059967 (print) | LCCN 2022059968 (ebook) | ISBN 9781718503083 (paperback) | ISBN 9781718503090 (ebook)

Subjects: LCSH: Geometric programming. | Topology. | Machine learning. | System analysis--Data processing. | R (Computer program language)

Classification: LCC T57.825 .F37 2023 (print) | LCC T57.825 (ebook) | DDC 006.3/1--dc23/eng/20230301

LC record available at https://lccn.loc.gov/2022059967

LC ebook record available at https://lccn.loc.gov/2022059968

No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The information in this book is distributed on an As Is basis, without warranty. While every precaution has been taken in the preparation of this work, neither the authors nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.

To my grandmother Irene Borree, who enjoyed our discussions about new technologies into her late nineties.

—Colleen M. Farrelly

To God Almighty, Yeshua Hamashiach, the Key to all treasures of wisdom and knowledge and the Waymaker.

To my beloved wife, Owolabi. Thank you for believing.

To my parents, Prudence and Gilberte, and my siblings, Olayèmi, Boladé, and Olabissi.

To Jeff Sanders, the man I would call my academic father.

—Yaé Ulrich Gaba

About the Authors

Colleen M. Farrelly is a senior data scientist whose academic and industry research has focused on topological data analysis, quantum machine learning, geometry-based machine learning, network science, hierarchical modeling, and natural language processing. Since graduating from the University of Miami with an MS in biostatistics, Colleen has worked as a data scientist in a variety of industries, including healthcare, consumer packaged goods, biotech, nuclear engineering, marketing, and education. Colleen often speaks at tech conferences, including PyData, SAS Global, WiDS, Data Science Africa, and DataScience SALON. When not working, Colleen can be found writing haibun/haiga or swimming.

Yaé Ulrich Gaba completed his doctoral studies at the University of Cape Town (UCT, South Africa) with a specialization in topology and is currently a research associate at Quantum Leap Africa (QLA, Rwanda). His research interests are computational geometry, applied algebraic topology (topological data analysis), and geometric machine learning (graph and point-cloud representation learning). His current focus lies in geometric methods in data analysis, and his work seeks to develop effective and theoretically justified algorithms for data and shape analysis using geometric and topological ideas and methods.

About the Technical Reviewer

Franck Kalala Mutombo is a professor of mathematics at Lubumbashi University and the former academic director of AIMS Senegal. He previously worked in a research position at University of Strathclyde and at AIMS South Africa in a joint appointment with the University of Cape Town. He holds a PhD in mathematical sciences from the University of Strathclyde, Glasgow, Scotland. He is an expert in the study and analysis of complex network structure and applications. His most recent research considers the impact of network structure on long-range interactions applied to epidemics, diffusion, and object clustering. His other research interests include differential geometry of manifolds, finite element methods for partial differential equations, and data science.

Foreword

The title of Colleen M. Farrelly and Yaé Ulrich Gaba’s book, The Shape of Data, is as fitting and beautiful as the journey that the authors invite us to experience, as we discover the geometric shapes that paint the deeper meaning of our analytical data insights.

Enabling and combining common machine learning, data science, and statistical solutions, including the combinations of supervised/unsupervised or deep learning methods, by leveraging topological and geometric data analysis provides new insights into the underlying data problem. It reminds us of our responsibilities as data scientists, that with any algorithmic approach a certain data bias can greatly skew our expected results. As an example, the data scientist needs to understand the underlying data context well to avoid performing a two-dimensional Euclidean-based distance analysis when the underlying data needs to account for three-dimensional nuances, such as what a routing analysis would require when traveling the globe.

Throughout the book’s mathematical data analytics tour, we encounter the origin of data analysis on structured data and the many seemingly unstructured data scenarios that can be turned into structured data, which enables standard machine learning algorithms to perform predictive and prescriptive analytical insights. As we ride through the valleys and peaks of our data, we learn to collect features along the way that become key inputs into other data layers, forming geometrical interpretations of varying unstructured data sources including network data, images, and text-based data. In addition, Farrelly and Gaba are masterful in detailing the foundational and advanced concepts supported by the well-defined examples in both R and Python, available for download from their book’s web page.

Throughout my opportunities to collaborate with Farrelly and Gaba on several exciting projects over the past years, I always hoped for a book to emerge that would explain as clearly and eloquently as The Shape of Data does the evolution of the topological data analysis space all the way to leveraging distributed and quantum computing solutions.

During my days as a CTO at Cypher Genomics, Farrelly was leading our initiatives in genomic data analytics. She immediately inspired me with her keen understanding of how best to establish correlations between disease ontologies versus symptom ontologies, while also using simulations to understand the implications of missing links in the map. Farrelly’s pragmatic approach helped us successfully resolve critical issues by creating an algorithm that mapped across gene, symptom, and disease ontologies in order to predict disease from gene or symptom data. Her focus on topology-based network mining for diagnostics helped us define the underlying data network shape, properties, and link distributions using graph summaries and statistical testing. Our combined efforts around ontology mapping, graph-based prediction, and network mining and decomposition resulted in critical data network discoveries related to metabolomics, proteomics, gene regulatory networks, patient similarity networks, and variable correlation networks.

From our joint genomics and related life sciences analytics days to our most recent quantum computing initiatives, Farrelly and Gaba have consistently demonstrated a strong passion and unique understanding of all the related complexities and how to apply their insights to several everyday problems. Joining them on their shape of data journey will be valuable time spent as you embark on a well-scripted adventure of R and Python algorithms that solve general or niche problems in machine learning and data analysis using geometric patterns to help shape the desired results.

This book will be relevant and captivating to beginners and devoted experts alike. First-time travelers will find it easy to dive into algorithm examples designed for analyzing network data, including social and geographic networks, as well as local and global metrics, to understand network structure and the role of individuals in the network. The discussion covers clustering methods developed for use on network data, link prediction algorithms to suggest new edges in a network, and tools for understanding how, for example, processes or epidemics spread through networks.

Advanced readers will find it intriguing to dive into recently developing topics such as replacing linear algebra with nonlinear algebra in machine learning algorithms and using exterior calculus to quantify needs in disaster planning. The Shape of Data has made me want to roll up my sleeves and dive into many new challenges, because I feel as well equipped as Lara Croft in Tomb Raider thanks to Farrelly’s tremendous treasure map and deeply insightful exploration work. Could there be a hidden bond or hidden layer between them?

Michael Giske

Technology executive, global CIO of B-ON, and chairman of Inomo Technologies

Acknowledgments

I, Colleen, would like to thank my parents, John and Nancy, and my grandmother Irene for their support while I was writing this book and for encouraging me to play with mathematics when I was young.

I would also like to thank Justin Moeller for the sports and art conversations that led me into data science as a career, as well as his and Christy Moeller’s support over the long course of writing this book, and Ross Eggebeen, Mark Mayor, Matt Mayor, and Malori Mayor for their ongoing support with this project and other writing endeavors over the years.

My career in this field and this book would not have been possible without the support of Cynthia DeJong, John Pustejovsky, Kathleen Karrer, Dan Feaster, Willie Prado, Richard Schoen, and Ken Baker during my educational years, particularly my transition from the medical/social sciences to mathematics during medical school. I’m grateful for the support of Jay Wigdale and Michael Giske over the course of my career, as well as the support from many friends and colleagues, including Peter Wittek, Diana Kachan, Recinda Sherman, Natashia Lewis, Louis Fendji, Luke Robinson, Joseph Fustero, Uchenna Chukwu, Jay and Jenny Rooney, and Christine and Junwen Lin.

This book would not have been possible without our editor, Alex Freed; our managing editor, Jill Franklin; and our technical reviewer, Franck Kalala Mutombo. We both would also like to acknowledge the contributions of Bastian Rieck and Noah Giansiracusa. We are grateful for the support of No Starch Press’s marketing team, particularly Briana Blackwell in publicizing our speaking engagements.

We are also grateful to R, which provided open source packages and graphics generated with code, as well as Microsoft PowerPoint, which was used with permission to generate the additional images in this book. We would also like to thank NightCafe for providing a platform to generate images and granting full rights to creators.

No achievement in life is without the help of many known and unknown individuals. I, Yaé, would like to thank just a few who made this work possible.

To my wife, Owolabi, for your unwavering support. To Colleen Farrelly, for initiating this venture and taking me along. To Franck Kalala, my senior colleague and friend, for his excellent reviewer skills.

To my friends and colleagues: Collins A. Agyingi, David S. Attipoe, Rock S. Koffi, Evans D. Ocansey, Michael Kateregga, Mamana Mbiyavanga, Jordan F. Masakuna and Gershom Buri for the care. To Jan Groenewald and the entire AIMS-NEI family.

To my spiritual fathers and mentors, Pst Dieudonné Kantu and the entire SONRISE family, Pst Daniel Mukanya, and Pst Magloire N. Kunantu, whose leadership inspired my own.

Introduction

The first time I, Colleen, confronted my own hesitancy with math was when geometry provided a solution to an art class problem I faced: translating a flat painting onto a curved vase. Straight lines from my friend’s canvas didn’t behave the same way on the curved vase. Distances between points on the painting grew or shrank with the curvature. We’d stumbled upon the differences between the geometry we’d learned in class (where geometry behaved like the canvas painting) and the geometry of real-world objects like the vase. Real-world data often behaves more like the vase than the canvas painting. As an industry data scientist, I’ve worked with many non-data-science professionals who want to learn new data science methods but either haven’t encountered a lot of math or coding in their career path or have a lingering fear of math from prior educational experiences. Math-heavy papers without coding examples often limit the toolsets other professionals can use to solve important problems in their own fields.
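The canvas-versus-vase contrast can be made concrete with a small sketch. This is not taken from the book (whose examples are in R); it is a minimal Python illustration under stated assumptions: a sphere with the Earth's mean radius stands in for the curved vase, a flat latitude/longitude grid stands in for the canvas, and the function names flat_distance and great_circle_km are hypothetical. It compares two pairs of points that are equally far apart on the flat grid but not on the curved surface.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of the sphere used in this sketch


def flat_distance(p, q):
    """Euclidean distance treating (lat, lon) in degrees as flat 'canvas' coordinates."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def great_circle_km(p, q, radius=EARTH_RADIUS_KM):
    """Haversine distance along the surface of a sphere, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Two pairs of points, each 10 degrees of longitude apart on the same latitude.
equator_pair = ((0.0, 0.0), (0.0, 10.0))
northern_pair = ((60.0, 0.0), (60.0, 10.0))

for name, (p, q) in (("equator", equator_pair), ("60 degrees north", northern_pair)):
    print(name, flat_distance(p, q), round(great_circle_km(p, q), 1))
```

On the flat grid both pairs are the same distance apart, while on the sphere the northern pair is roughly half as far apart as the equatorial pair. A distance-based algorithm fed the flat coordinates would treat the two pairs as equally similar, which is the kind of mismatch the vase example describes.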

Math is simply another language with which to understand the world around us; like any language, it’s possible to learn. This book is focused on geometry, but it is not a math textbook. We avoid proofs, rarely use equations, and try to simplify the math behind the algorithms as much as possible to make these tools accessible to a wider audience. If you are more mathematically advanced and want the full mathematical theory, we provide references at the end of the book.

Geometry underlies every single machine learning algorithm and problem setup, and thousands of geometry-based algorithms exist today. This book focuses on a few dozen algorithms in use now, with preference given to those with packages to implement them in R. If you want to understand how geometry relates to algorithms, how to implement geometry-based algorithms with code, or how to think about problems you encounter through the lens of geometry, keep reading.

Who Is This Book For?

Though this book is for anyone anywhere who wants a hands-on guide to network science, geometry-based aspects of machine learning, and topology-based algorithms, some background in statistics, machine learning, and a programming language (R or Python, ideally) will be helpful. This book was designed for the following:

Healthcare professionals working with small sets of patient data

Math students looking for an applied side of what they’re learning

Small-business owners who want to use their data to drive sales

Physicists or chemists interested in using topological data analysis for a research project

Curious sociologists who are wary of proof-based texts

Statisticians or data scientists looking to beef up their toolsets

Educators looking for practical examples to show their students

Engineers branching out into machine learning

We’ll be surveying many areas of science and business in our examples and will cover dozens of algorithms shaping data science today. Each chapter will focus on the intuition behind the algorithms discussed and will provide examples of how to use those algorithms to solve a problem using the R programming language. While the book is written with examples presented in R, our downloadable repository (https://nostarch.com/download/ShapeofData_PythonCode.zip) includes R and Python code for examples where Python has an analogous function to support users of both languages. Feel free to skip around to sections most relevant to your interests.

About This Book

This book starts with an introduction to geometry in machine learning. Topics relevant to geometry-based algorithms are built through a series of network science chapters that transition into metric geometry, geometry- and topology-based algorithms, and some newer implementations of these algorithms in natural language processing, distributed computing, and quantum computing. Here’s a quick overview of the chapters in this book:

Chapter 1: The Geometric Structure of Data Details how machine learning algorithms can be examined from a geometric perspective with examples from medical and image data

Chapter 2: The Geometric Structure of Networks Introduces network data metrics, structure, and types through examples of social networks

Chapter 3: Network Analysis Introduces supervised and unsupervised learning on network data, network-based clustering algorithms, comparisons of different networks, and disease spread across networks

Chapter 4: Network Filtration Moves from network data to simplicial complex data, extends network metrics to higher-dimensional interactions, and introduces hole-counting in objects like networks

Chapter 5: Geometry in Data Science Provides an overview of the curse of dimensionality, the role of distance metrics in machine learning, dimensionality reduction and data visualization, and applications to time series and probability distributions

Chapter 6: Newer Applications of Geometry in Machine Learning Details several geometry-based algorithms, including supervised learning in educational data, geometry-based disaster planning, and activity preference ranking

Chapter 7: Tools for Topological Data Analysis Focuses on topology-based unsupervised learning algorithms and their application to student data

Chapter 8: Homotopy Algorithms Introduces an algorithm related to path planning and small data analysis

Chapter 9: Final Project: Analyzing Text Data Focuses on a text dataset, a deep learning algorithm used in text embedding, and analytics of processed text data through algorithms from previous chapters

Chapter 10: Multicore and Quantum Computing Dives into distributed computing solutions and quantum algorithms, including a quantum network science example and a quantum image analytics algorithm

Downloading and Installing R

We’ll be using the R programming language in this book. R is easy to install and compatible with macOS, Linux, and Windows operating systems. You can choose the download for your system at https://cloud.r-project.org. You might be prompted to click a link for your geographic location (or a general cloud connection option). If you haven’t installed R before, you can choose the first-time installation of the base, which is the first download option on the R for Windows page.

Once you click the first-time option, you should see a screen that will give you an option to download R for Windows.

After R downloads, you’ll follow the installation instructions that your system provides as a prompt. This will vary slightly depending on the operating system. However, the installation guide will take you through the steps needed to set up R.

You may want to publish your projects or connect R with other open source projects, such as Python. RStudio provides a comfortable interface with options to connect R more easily with other platforms. You can find RStudio’s download at https://www.rstudio.com. Once you download RStudio, simply follow your operating system’s command prompts to install with the configurations that work best for your use case.

Installing R Packages

R has several options for installing new packages on your system. The command line option is probably the easiest. You’ll use the install.packages("package_name") function, where package_name is the name of the package you want to install, such as install.packages("mboost") to install the mboost package. Note that the package name must be quoted. From there, you may be asked to choose a download mirror by geographic location. The package will then download, along with any package dependencies that are not already on your machine.
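As a minimal sketch of the command line route, the following checks whether a package is already installed before calling install.packages(). The helper name missing_packages and the choice of the CRAN cloud mirror in the repos argument are our assumptions for illustration. Setting repos simply avoids the prompt to pick a location.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed
# in the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "mboost" is the example package used in the text.
to_install <- missing_packages(c("mboost"))

if (length(to_install) > 0) {
  # Assumption: use the CRAN cloud mirror so R does not ask for a location.
  # Dependencies that are missing are downloaded along with the package.
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Once the package is installed, library(mboost) makes its functions available in the current session.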

You can also use your graphical user interface (GUI) to install a package. This might be preferable if you want to browse available packages rather than install just one specific package to meet your needs. You can select Install package(s) from the Packages menu option after you launch R on your machine.

You’ll be prompted to select your location, and the installation will happen as it would with the command line option for package installation.

Getting Help with R

R has many useful features if you need help with a function or a package in your code. The help() function allows you to get information about a function or package that you have installed in R. Adding the package name after the function (such as help(glmboost, mboost) for help with the generalized linear modeling boosted regression function through the mboost package) will pull up information about a function in a package that is installed but not yet loaded in your session, so that you can understand what the function does before deciding to load the package. This is helpful if you’re looking for something specific but not sure that what you’re finding online is exactly what you need. In lieu of using the help() function, you can add a question mark before the function name (such as ?glmboost) once the package is loaded.
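The help lookups described above can be sketched as follows. We use median from the stats package as a stand-in because it ships with every R installation. The requireNamespace() guard around the mboost lookup is our assumption, added so the snippet does not stop with an error on machines where mboost is not installed.

```r
# Look up a function by naming both the topic and its package.
# "median" lives in the stats package, which ships with R.
h <- help("median", package = "stats")
print(h)

# Look up glmboost in mboost without attaching the package via library().
# Assumption: guard the call so it is skipped when mboost is not installed.
if (requireNamespace("mboost", quietly = TRUE)) {
  print(help("glmboost", package = "mboost"))
}
```

In an interactive session, typing ?median at the prompt is equivalent to the first call.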

You can also browse for vignettes demonstrating how to use functions in a package using the command browseVignettes(), which will pull up vignettes for each package you have installed in R. If you want a vignette for a specific package, you can name that package in quotes like so: browseVignettes(package = "mboost"). Many packages come with a good overview of how to apply the package’s functions to an example dataset.
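As a rough sketch, vignettes can also be listed in the console before opening anything in a browser. We use the grid package here only because it ships with R. Substituting another installed package name, such as "mboost", works the same way.

```r
# List the vignettes of one installed package without opening a browser.
# Assumption: "grid" is used because it is bundled with R.
v <- vignette(package = "grid")

# Each row of v$results describes one vignette: its name (Item) and title.
print(v$results[, c("Item", "Title"), drop = FALSE])

# browseVignettes() opens an index page in the web browser, so we only
# call it when R is running interactively.
if (interactive()) {
  browseVignettes(package = "grid")
}
```

A single vignette can then be opened by name with vignette("name", package = "grid"), using one of the names listed in the Item column.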

R has a broad user base, and internet searches or coding forums can provide additional resources for specific issues related to a package. There are also many good tutorials that overview the basic programming concepts and common functions in R. If you are less familiar with programming, you may want to go through a free tutorial on R programming or work with data in R before attempting the code in this book.

Because R is an evolving language with new packages added and removed regularly, we encourage you to keep up with developments via package websites and web searches. Packages that are discontinued can still be installed and used as legacy packages but require some caution, as they aren’t updated by the package author. We’ll see one of these in this book with an example of how to install a legacy package. Similarly, new packages are developed regularly, and you should find and use new packages in the field of geometry as they become available.

Support for Python Users

While this book presents examples in R code, our downloadable repository (https://nostarch.com/download/ShapeofData_PythonCode.zip) includes translations to Python packages and functions where possible. Most examples have a Python

