The Shape of Data: Geometry-Based Machine Learning and Data Analysis in R
About this ebook
Whether you’re a mathematician, seasoned data scientist, or marketing professional, you’ll find The Shape of Data to be the perfect introduction to the critical interplay between the geometry of data structures and machine learning.
This book’s extensive collection of case studies (drawn from medicine, education, sociology, linguistics, and more) and gentle explanations of the math behind dozens of algorithms provide a comprehensive yet accessible look at how geometry shapes the algorithms that drive data analysis.
In addition to gaining a deeper understanding of how to implement geometry-based algorithms with code, you’ll explore:
- Supervised and unsupervised learning algorithms and their application to network data analysis
- The way distance metrics and dimensionality reduction impact machine learning
- How to visualize, embed, and analyze survey and text data with topology-based algorithms
- New approaches to computational solutions, including distributed computing and quantum algorithms
Book preview
The Shape of Data - Colleen M. Farrelly
PRAISE FOR
The Shape of Data
"The title says it all. Data is bound by many complex relationships not easily shown in our two-dimensional, spreadsheet-filled world. The Shape of Data walks you through this richer view and illustrates how to put it into practice."
—Stephanie Thompson, data scientist and speaker
"The Shape of Data is a novel perspective and phenomenal achievement in the application of geometry to the field of machine learning. It is expansive in scope and contains loads of concrete examples and coding tips for practical implementations, as well as extremely lucid, concise writing to unpack the concepts. Even as a more veteran data scientist who has been in the industry for years now, having read this book I’ve come away with a deeper connection to and new understanding of my field."
—Kurt Schuepfer, PhD, McDonald’s Corporation
"The Shape of Data is a great source for the application of topology and geometry in data science. Topology and geometry advance the field of machine learning on unstructured data, and The Shape of Data does a great job introducing new readers to the subject."
—Uchenna Ike Chukwu, senior quantum developer
The Shape of Data
Geometry-Based Machine Learning and Data Analysis in R
by Colleen M. Farrelly and Yaé Ulrich Gaba
THE SHAPE OF DATA. Copyright © 2023 by Colleen M. Farrelly and Yaé Ulrich Gaba.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.
First printing
27 26 25 24 23 1 2 3 4 5
ISBN-13: 978-1-7185-0308-3 (print)
ISBN-13: 978-1-7185-0309-0 (ebook)
Publisher: William Pollock
Managing Editor: Jill Franklin
Production Manager: Sabrina Plomitallo-González
Production Editor: Sydney Cromwell
Developmental Editor: Alex Freed
Cover Illustrator: Gina Redman
Interior Design: Octopod Studios
Technical Reviewer: Franck Kalala Mutombo
Copyeditor: Kim Wimpsett
Compositor: Jeff Lytle, Happenstance Type-O-Rama
Proofreader: Scout Festa
Indexer: BIM Creatives, LLC
For information on distribution, bulk sales, corporate sales, or translations, please contact No Starch Press ® directly at [email protected] or:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900
www.nostarch.com
Library of Congress Cataloging-in-Publication Data
Names: Farrelly, Colleen, author. | Gaba, Yaé Ulrich, author.
Title: The shape of data : network science, geometry-based machine learning, and topological data
analysis in R / by Colleen M. Farrelly and Yaé Ulrich Gaba.
Description: San Francisco, CA : No Starch Press, [2023] | Includes bibliographical references.
Identifiers: LCCN 2022059967 (print) | LCCN 2022059968 (ebook) | ISBN 9781718503083 (paperback) |
ISBN 9781718503090 (ebook)
Subjects: LCSH: Geometric programming. | Topology. | Machine learning. | System analysis--Data
processing. | R (Computer program language)
Classification: LCC T57.825 .F37 2023 (print) | LCC T57.825 (ebook) | DDC 006.3/1--dc23/
eng/20230301
LC record available at https://lccn.loc.gov/2022059967
LC ebook record available at https://lccn.loc.gov/2022059968
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an "As Is" basis, without warranty. While every precaution has been taken in the preparation of this work, neither the authors nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
To my grandmother Irene Borree, who enjoyed our discussions about new technologies into her late nineties.
—Colleen M. Farrelly
To God Almighty, Yeshua Hamashiach, the Key to all treasures of wisdom and knowledge and the Waymaker.
To my beloved wife, Owolabi. Thank you for believing.
To my parents, Prudence and Gilberte, and my siblings, Olayèmi, Boladé, and Olabissi.
To Jeff Sanders, the man I would call my academic father.
—Yaé Ulrich Gaba
About the Authors
Colleen M. Farrelly is a senior data scientist whose academic and industry research has focused on topological data analysis, quantum machine learning, geometry-based machine learning, network science, hierarchical modeling, and natural language processing. Since graduating from the University of Miami with an MS in biostatistics, Colleen has worked as a data scientist in a variety of industries, including healthcare, consumer packaged goods, biotech, nuclear engineering, marketing, and education. Colleen often speaks at tech conferences, including PyData, SAS Global, WiDS, Data Science Africa, and DataScience SALON. When not working, Colleen can be found writing haibun/haiga or swimming.
Yaé Ulrich Gaba completed his doctoral studies at the University of Cape Town (UCT, South Africa) with a specialization in topology and is currently a research associate at Quantum Leap Africa (QLA, Rwanda). His research interests are computational geometry, applied algebraic topology (topological data analysis), and geometric machine learning (graph and point-cloud representation learning). His current focus lies in geometric methods in data analysis, and his work seeks to develop effective and theoretically justified algorithms for data and shape analysis using geometric and topological ideas and methods.
About the Technical Reviewer
Franck Kalala Mutombo is a professor of mathematics at Lubumbashi University and the former academic director of AIMS Senegal. He previously worked in a research position at University of Strathclyde and at AIMS South Africa in a joint appointment with the University of Cape Town. He holds a PhD in mathematical sciences from the University of Strathclyde, Glasgow, Scotland. He is an expert in the study and analysis of complex network structure and applications. His most recent research considers the impact of network structure on long-range interactions applied to epidemics, diffusion, and object clustering. His other research interests include differential geometry of manifolds, finite element methods for partial differential equations, and data science.
Foreword
The title of Colleen M. Farrelly and Yaé Ulrich Gaba’s book, The Shape of Data, is as fitting and beautiful as the journey that the authors invite us to experience, as we discover the geometric shapes that paint the deeper meaning of our analytical data insights.
Enabling and combining common machine learning, data science, and statistical solutions, including the combinations of supervised/unsupervised or deep learning methods, by leveraging topological and geometric data analysis provides new insights into the underlying data problem. It reminds us of our responsibilities as data scientists, that with any algorithmic approach a certain data bias can greatly skew our expected results. As an example, the data scientist needs to understand the underlying data context well to avoid performing a two-dimensional Euclidean-based distance analysis when the underlying data needs to account for three-dimensional nuances, such as what a routing analysis would require when traveling the globe.
Throughout the book’s mathematical data analytics tour, we encounter the origin of data analysis on structured data and the many seemingly unstructured data scenarios that can be turned into structured data, which enables standard machine learning algorithms to perform predictive and prescriptive analytical insights. As we ride through the valleys and peaks of our data, we learn to collect features along the way that become key inputs into other data layers, forming geometrical interpretations of varying unstructured data sources including network data, images, and text-based data. In addition, Farrelly and Gaba are masterful in detailing the foundational and advanced concepts supported by the well-defined examples in both R and Python, available for download from their book’s web page.
Throughout my opportunities to collaborate with Farrelly and Gaba on several exciting projects over the past years, I always hoped for a book to emerge that would explain as clearly and eloquently as The Shape of Data does the evolution of the topological data analysis space all the way to leveraging distributed and quantum computing solutions.
During my days as a CTO at Cypher Genomics, Farrelly was leading our initiatives in genomic data analytics. She immediately inspired me with her keen understanding of how best to establish correlations between disease ontologies versus symptom ontologies, while also using simulations to understand the implications of missing links in the map. Farrelly’s pragmatic approach helped us successfully resolve critical issues by creating an algorithm that mapped across gene, symptom, and disease ontologies in order to predict disease from gene or symptom data. Her focus on topology-based network mining for diagnostics helped us define the underlying data network shape, properties, and link distributions using graph summaries and statistical testing. Our combined efforts around ontology mapping, graph-based prediction, and network mining and decomposition resulted in critical data network discoveries related to metabolomics, proteomics, gene regulatory networks, patient similarity networks, and variable correlation networks.
From our joint genomics and related life sciences analytics days to our most recent quantum computing initiatives, Farrelly and Gaba have consistently demonstrated a strong passion and unique understanding of all the related complexities and how to apply their insights to several everyday problems. Joining them on their shape of data journey will be valuable time spent as you embark on a well-scripted adventure of R and Python algorithms that solve general or niche problems in machine learning and data analysis using geometric patterns to help shape the desired results.
This book will be relevant and captivating to beginners and devoted experts alike. First-time travelers will find it easy to dive into algorithm examples designed for analyzing network data, including social and geographic networks, as well as local and global metrics, to understand network structure and the role of individuals in the network. The discussion covers clustering methods developed for use on network data, link prediction algorithms to suggest new edges in a network, and tools for understanding how, for example, processes or epidemics spread through networks.
Advanced readers will find it intriguing to dive into recently developing topics such as replacing linear algebra with nonlinear algebra in machine learning algorithms and exterior calculus to quantify needs in disaster planning. The Shape of Data has made me want to roll up my sleeves and dive into many new challenges, because I feel as well equipped as Lara Croft in Tomb Raider thanks to Farrelly’s tremendous treasure map and deeply insightful exploration work. Could there be a hidden bond or hidden layer between them?
Michael Giske
Technology executive, global CIO of B-ON, and chairman of Inomo Technologies
Acknowledgments
I, Colleen, would like to thank my parents, John and Nancy, and my grandmother Irene for their support while I was writing this book and for encouraging me to play with mathematics when I was young.
I would also like to thank Justin Moeller for the sports and art conversations that led me into data science as a career, as well as his and Christy Moeller’s support over the long course of writing this book, and Ross Eggebeen, Mark Mayor, Matt Mayor, and Malori Mayor for their ongoing support with this project and other writing endeavors over the years.
My career in this field and this book would not have been possible without the support of Cynthia DeJong, John Pustejovsky, Kathleen Karrer, Dan Feaster, Willie Prado, Richard Schoen, and Ken Baker during my educational years, particularly my transition from the medical/social sciences to mathematics during medical school. I’m grateful for the support of Jay Wigdale and Michael Giske over the course of my career, as well as the support from many friends and colleagues, including Peter Wittek, Diana Kachan, Recinda Sherman, Natashia Lewis, Louis Fendji, Luke Robinson, Joseph Fustero, Uchenna Chukwu, Jay and Jenny Rooney, and Christine and Junwen Lin.
This book would not have been possible without our editor, Alex Freed; our managing editor, Jill Franklin; and our technical reviewer, Franck Kalala Mutombo. We both would also like to acknowledge the contributions of Bastian Rieck and Noah Giansiracusa. We are grateful for the support of No Starch Press’s marketing team, particularly Briana Blackwell in publicizing our speaking engagements.
We are also grateful to R, which provided open source packages and graphics generated with code, as well as Microsoft PowerPoint, which was used with permission to generate the additional images in this book. We would also like to thank NightCafe for providing a platform to generate images and granting full rights to creators.
No achievement in life is without the help of many known and unknown individuals. I, Yaé, would like to thank just a few who made this work possible.
To my wife, Owolabi, for your unwavering support. To Colleen Farrelly, for initiating this venture and taking me along. To Franck Kalala, my senior colleague and friend, for his excellent reviewer skills.
To my friends and colleagues: Collins A. Agyingi, David S. Attipoe, Rock S. Koffi, Evans D. Ocansey, Michael Kateregga, Mamana Mbiyavanga, Jordan F. Masakuna and Gershom Buri for the care. To Jan Groenewald and the entire AIMS-NEI family.
To my spiritual fathers and mentors, Pst Dieudonné Kantu and the entire SONRISE family, Pst Daniel Mukanya, and Pst Magloire N. Kunantu, whose leadership inspired my own.
Introduction
The first time I, Colleen, confronted my own hesitancy with math was when geometry provided a solution to an art class problem I faced: translating a flat painting onto a curved vase. Straight lines from my friend’s canvas didn’t behave the same way on the curved vase. Distances between points on the painting grew or shrank with the curvature. We’d stumbled upon the differences between the geometry we’d learned in class (where geometry behaved like the canvas painting) and the geometry of real-world objects like the vase. Real-world data often behaves more like the vase than the canvas painting. As an industry data scientist, I’ve worked with many non-data-science professionals who want to learn new data science methods but either haven’t encountered a lot of math or coding in their career path or have a lingering fear of math from prior educational experiences. Math-heavy papers without coding examples often limit the toolsets other professionals can use to solve important problems in their own fields.
Math is simply another language with which to understand the world around us; like any language, it’s possible to learn. This book is focused on geometry, but it is not a math textbook. We avoid proofs, rarely use equations, and try to simplify the math behind the algorithms as much as possible to make these tools accessible to a wider audience. If you are more mathematically advanced and want the full mathematical theory, we provide references at the end of the book.
Geometry underlies every single machine learning algorithm and problem setup, and thousands of geometry-based algorithms exist today. This book focuses on a few dozen algorithms in use now, with preference given to those with packages to implement them in R. If you want to understand how geometry relates to algorithms, how to implement geometry-based algorithms with code, or how to think about problems you encounter through the lens of geometry, keep reading.
Who Is This Book For?
Though this book is for anyone anywhere who wants a hands-on guide to network science, geometry-based aspects of machine learning, and topology-based algorithms, some background in statistics, machine learning, and a programming language (R or Python, ideally) will be helpful. This book was designed for the following:
- Healthcare professionals working with small sets of patient data
- Math students looking for an applied side of what they’re learning
- Small-business owners who want to use their data to drive sales
- Physicists or chemists interested in using topological data analysis for a research project
- Curious sociologists who are wary of proof-based texts
- Statisticians or data scientists looking to beef up their toolsets
- Educators looking for practical examples to show their students
- Engineers branching out into machine learning
We’ll be surveying many areas of science and business in our examples and will cover dozens of algorithms shaping data science today. Each chapter will focus on the intuition behind the algorithms discussed and will provide examples of how to use those algorithms to solve a problem using the R programming language. While the book is written with examples presented in R, our downloadable repository (https://nostarch.com/download/ShapeofData_PythonCode.zip) includes R and Python code for examples where Python has an analogous function to support users of both languages. Feel free to skip around to sections most relevant to your interests.
About This Book
This book starts with an introduction to geometry in machine learning. Topics relevant to geometry-based algorithms are built through a series of network science chapters that transition into metric geometry, geometry- and topology-based algorithms, and some newer implementations of these algorithms in natural language processing, distributed computing, and quantum computing. Here’s a quick overview of the chapters in this book:
Chapter 1: The Geometric Structure of Data. Details how machine learning algorithms can be examined from a geometric perspective with examples from medical and image data.
Chapter 2: The Geometric Structure of Networks. Introduces network data metrics, structure, and types through examples of social networks.
Chapter 3: Network Analysis. Introduces supervised and unsupervised learning on network data, network-based clustering algorithms, comparisons of different networks, and disease spread across networks.
Chapter 4: Network Filtration. Moves from network data to simplicial complex data, extends network metrics to higher-dimensional interactions, and introduces hole-counting in objects like networks.
Chapter 5: Geometry in Data Science. Provides an overview of the curse of dimensionality, the role of distance metrics in machine learning, dimensionality reduction and data visualization, and applications to time series and probability distributions.
Chapter 6: Newer Applications of Geometry in Machine Learning. Details several geometry-based algorithms, including supervised learning in educational data, geometry-based disaster planning, and activity preference ranking.
Chapter 7: Tools for Topological Data Analysis. Focuses on topology-based unsupervised learning algorithms and their application to student data.
Chapter 8: Homotopy Algorithms. Introduces an algorithm related to path planning and small data analysis.
Chapter 9: Final Project: Analyzing Text Data. Focuses on a text dataset, a deep learning algorithm used in text embedding, and analytics of processed text data through algorithms from previous chapters.
Chapter 10: Multicore and Quantum Computing. Dives into distributed computing solutions and quantum algorithms, including a quantum network science example and a quantum image analytics algorithm.
Downloading and Installing R
We’ll be using the R programming language in this book. R is easy to install and compatible with macOS, Linux, and Windows operating systems. You can choose the download for your system at https://cloud.r-project.org. You might be prompted to click a link for your geographic location (or a general cloud connection option). If you haven’t installed R before, you can choose the first-time installation of the base, which is the first download option on the R for Windows page.
Once you click the first-time option, you should see a screen that will give you an option to download R for Windows.
After R downloads, you’ll follow the installation instructions that your system provides as a prompt. This will vary slightly depending on the operating system. However, the installation guide will take you through the steps needed to set up R.
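Once the installer finishes, a quick check from the R console can confirm that the installation works. The lines below are a minimal sketch rather than part of the book's instructions; they rely only on objects and functions that ship with base R.

```r
# Print the version string of the R installation that is currently running
print(R.version.string)

# Show the folder(s) where R will look for and install packages
print(.libPaths())

# A simple sanity check: the version string exists and base math works
installed_ok <- is.character(R.version.string) && sqrt(16) == 4
print(installed_ok)
```

If the version string matches the release you downloaded, R is ready for the package installation steps that follow.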
You may want to publish your projects or connect R with other open source projects, such as Python. RStudio provides a comfortable interface with options to connect R more easily with other platforms. You can find RStudio’s download at https://www.rstudio.com. Once you download RStudio, simply follow your operating system’s command prompts to install with the configurations that work best for your use case.
Installing R Packages
R has several options for installing new packages on your system. The command line option is probably the easiest. You’ll use the install.packages("package_name") function, where package_name is the name of the package you want to install, such as install.packages("mboost") to install the mboost package. From there, you may be asked to choose your geographic location for the download. The package will then download, along with any package dependencies that are not already on your machine.
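The command line route can be wrapped so that a package is downloaded only when it is missing. The helper below, install_if_missing, is a hypothetical name introduced for illustration and is not from the book; it is a minimal sketch that assumes the CRAN cloud mirror is an acceptable download location.

```r
# Hypothetical helper (not from the book): install a package only when it is missing.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns TRUE when the package can be loaded,
  # which means it is already installed
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # FALSE: nothing had to be installed
  }
  # Naming a repository up front avoids the interactive location prompt
  install.packages(pkg, repos = repos)
  invisible(TRUE)             # TRUE: an installation was attempted
}

# "stats" ships with R, so this call returns FALSE without downloading anything
already_there <- install_if_missing("stats")
print(already_there)
```

Calling install_if_missing("mboost") would download mboost and its dependencies the first time and skip the download on later runs.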
You can also use your graphical user interface (GUI) to install a package. This might be preferable if you want to browse available packages rather than install just one specific package to meet your needs. You can select Install package(s) from the Packages menu option after you launch R on your machine.
You’ll be prompted to select your location, and the installation will happen as it would with the command line option for package installation.
Getting Help with R
R has many useful features if you need help with a function or a package in your code. The help() function allows you to get information about a function or package that you have installed in R. Adding the package name after the function name (such as help(glmboost, package="mboost") for help with the generalized linear modeling boosted regression function in the mboost package) will pull up information about a function in a package that is installed but not yet loaded in your session, so that you can understand what the function does before deciding to load the package. This is helpful if you’re looking for something specific but not sure that what you’re finding online is exactly what you need. In lieu of using the help() function, you can add a question mark before the function name (such as ?glmboost) once the package is loaded.
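The help lookups described above can be tried without installing anything by pointing them at the stats package, which ships with R. This is a minimal sketch; median and stats are stand-ins chosen for illustration. Note that help() searches installed packages, so the package argument is most useful for a package that is installed but not attached.

```r
# Look up a function in a specific installed package without attaching it.
# "stats" ships with R; substitute glmboost and "mboost" once mboost is installed.
h <- help("median", package = "stats")

# The returned object holds the location of the matching help page
found <- length(h) > 0
print(found)

# In an interactive session, either of these opens the help page itself:
# ?median
# help("median", package = "stats")
```

Assigning the result to a variable, as done here, keeps the page from opening, which is convenient in scripts; at the console you would normally call help() directly.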
You can also browse for vignettes demonstrating how to use functions in a package using the command browseVignettes(), which will pull up vignettes for each package you have installed in R. If you want a vignette for a specific package, you can name that package like so: browseVignettes(package="mboost"). Many packages come with a good overview of how to apply the package’s functions to an example dataset.
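browseVignettes() opens its listing in a web browser. For a text-only listing in the console, the related vignette() function from base R can be used instead. The sketch below assumes the grid package, which is part of a standard R installation, as a stand-in for a package such as mboost.

```r
# List the vignettes shipped with one installed package without opening a browser.
# "grid" is part of a standard R installation; replace it with "mboost" once installed.
v <- vignette(package = "grid")

# v$results is a matrix with one row per vignette
print(v$results[, c("Item", "Title"), drop = FALSE])
```

Passing one of the listed item names back to vignette(), together with the package name, opens that vignette in an interactive session.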
R has a broad user base, and internet searches or coding forums can provide additional resources for specific issues related to a package. There are also many good tutorials that overview the basic programming concepts and common functions in R. If you are less familiar with programming, you may want to go through a free tutorial on R programming or work with data in R before attempting the code in this book.
Because R is an evolving language with new packages added and removed regularly, we encourage you to keep up with developments via package websites and web searches. Packages that are discontinued can still be installed and used as legacy packages but require some caution, as they aren’t updated by the package author. We’ll see one of these in this book with an example of how to install a legacy package. Similarly, new packages are developed regularly, and you should find and use new packages in the field of geometry as they become available.
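One common way to install a discontinued package is to point install.packages() at a source tarball in the CRAN archive. The sketch below is not the book's own example, which appears later; archive_url is a hypothetical helper, and the package name and version are placeholders rather than a real package.

```r
# Hypothetical helper (not from the book): build the CRAN archive URL for an
# older release of a package. The name and version used below are placeholders.
archive_url <- function(pkg, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

url <- archive_url("somepkg", "1.0.0")
print(url)

# To install from that archive (needs internet access and build tools):
# install.packages(url, repos = NULL, type = "source")
```

Because repos = NULL bypasses the usual repository lookup, dependencies of an archived package are not fetched automatically and may need to be installed first.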
Support for Python Users
While this book presents examples in R code, our downloadable repository (https://nostarch.com/download/ShapeofData_PythonCode.zip) includes translations to Python packages and functions where possible. Most examples have a Python