Advanced Analytics with Pyspark 1st Edition Akash Tandon download

The document discusses the book 'Advanced Analytics with PySpark' by Akash Tandon and others, which focuses on using PySpark for large-scale data analytics. It emphasizes practical applications and examples to teach data science concepts, covering topics like machine learning, data cleansing, and complex analytics workflows. The book is designed for readers to learn through hands-on examples rather than as a comprehensive reference guide.


Advanced Analytics with PySpark
Patterns for Learning from Data at Scale Using Python and Spark

Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills
Advanced Analytics with PySpark
by Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh
Wills
Copyright © 2022 Akash Tandon. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(http://oreilly.com). For more information, contact our
corporate/institutional sales department: 800-998-9938 or
[email protected].

Acquisitions Editor: Jessica Haberman

Development Editor: Jeff Bleiel

Production Editor: Christopher Faucher

Copyeditor: Penelope Perkins

Proofreader: Kim Wimpsett

Indexer: Sue Klefstad

Interior Designer: David Futato

Cover Designer: Karen Montgomery

Illustrator: Kate Dullea

June 2022: First Edition


Revision History for the First Edition
2022-06-14: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098103651 for
release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc.
Advanced Analytics with PySpark, the cover image, and related trade
dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors, and do
not represent the publisher’s views. While the publisher and the
authors have used good faith efforts to ensure that the information
and instructions contained in this work are accurate, the publisher
and the authors disclaim all responsibility for errors or omissions,
including without limitation responsibility for damages resulting from
the use of or reliance on this work. Use of the information and
instructions contained in this work is at your own risk. If any code
samples or other technology this work contains or describes is
subject to open source licenses or the intellectual property rights of
others, it is your responsibility to ensure that your use thereof
complies with such licenses and/or rights.
978-1-098-10365-1
[LSI]
Preface

Apache Spark’s long lineage of predecessors, from MPI (message
passing interface) to MapReduce, made it possible to write programs
that take advantage of massive resources while abstracting away the
nitty-gritty details of distributed systems. As much as data
processing needs have motivated the development of these
frameworks, in a way the field of big data has become so related to
them that its scope is defined by what these frameworks can handle.
Spark’s original promise was to take this a little further—to make
writing distributed programs feel like writing regular programs.
The rise in Spark’s popularity coincided with that of the Python data
(PyData) ecosystem. So it makes sense that Spark’s Python API—
PySpark—has significantly grown in popularity over the last few
years. Although the PyData ecosystem has recently gained some
distributed programming options, Apache Spark remains one of the
most popular choices for working with large datasets across
industries and domains. Thanks to recent efforts to integrate
PySpark with the other PyData tools, learning the framework can
help you boost your productivity significantly as a data science
practitioner.
We think that the best way to teach data science is by example. To
that end, we have put together a book of applications, trying to
touch on the interactions between the most common algorithms,
datasets, and design patterns in large-scale analytics. This book isn’t
meant to be read cover to cover: page to a chapter that looks like
something you’re trying to accomplish, or that simply ignites your
interest, and start there.
Why Did We Write This Book Now?
Apache Spark experienced a major version upgrade in 2020—version
3.0. One of the biggest improvements was the introduction of Spark
Adaptive Execution (officially named Adaptive Query Execution). This
feature takes away a big portion of the
complexity around tuning and optimization. We do not refer to it in
the book because it’s turned on by default in Spark 3.2 and later
versions, and so you automatically get the benefits.
The ecosystem changes, combined with Spark’s latest major release,
make this edition a timely one. Unlike previous editions of Advanced
Analytics with Spark, which chose Scala, we will use Python. We’ll
cover best practices and integrate with the wider Python data
science ecosystem when appropriate. All chapters have been
updated to use the latest PySpark API. Two new chapters have been
added and multiple chapters have undergone major rewrites. We will
not cover Spark’s streaming and graph libraries. With Spark in a new
era of maturity and stability, we hope that these changes will
preserve the book as a useful resource on analytics for years to
come.
How This Book Is Organized
Chapter 1 places Spark and PySpark within the wider context of data
science and big data analytics. After that, each chapter comprises a
self-contained analysis using PySpark. Chapter 2 introduces the
basics of data processing in PySpark and Python through a use case
in data cleansing. The next few chapters delve into the meat and
potatoes of machine learning with Spark, applying some of the most
common algorithms in canonical applications. The remaining
chapters are a bit more of a grab bag and apply Spark in slightly
more exotic applications—for example, querying Wikipedia through
latent semantic relationships in the text, analyzing genomics data,
and identifying similar images.
This book is not about PySpark’s merits and disadvantages. There
are a few other things that it is not about either. It introduces the
Spark programming model and basics of Spark’s Python API,
PySpark. However, it does not attempt to be a Spark reference or
provide a comprehensive guide to all Spark’s nooks and crannies. It
does not try to be a machine learning, statistics, or linear algebra
reference, although many of the chapters provide some background
on these before using them.
Instead, this book will help the reader get a feel for what it’s like to
use PySpark for complex analytics on large datasets by covering the
entire pipeline: not just building and evaluating models, but also
cleansing, preprocessing, and exploring data, with attention paid to
turning results into production applications. We believe that the best
way to teach this is by example.
Here are examples of some tasks that will be tackled in this book:
Predicting forest cover
We predict type of forest cover using relevant features like
location and soil type by using decision trees (see Chapter 4).
Querying Wikipedia for similar entries
We identify relationships between entries and query the
Wikipedia corpus by using NLP (natural language processing)
techniques (see Chapter 6).

Understanding utilization of New York cabs
We compute average taxi waiting time as a function of location
by performing temporal and geospatial analysis (see Chapter 7).
Reducing risk for an investment portfolio
We estimate financial risk for an investment portfolio using
Monte Carlo simulation (see Chapter 9).

When possible, we attempt not to just provide a “solution,” but to
demonstrate the full data science workflow, with all of its iterations,
dead ends, and restarts. This book will be useful for getting more
comfortable with Python, Spark, and machine learning and data
analysis. However, these are in service of a larger goal, and we hope
that most of all this book will teach you how to approach tasks like
those described earlier. Each chapter, in about 20 measly pages, will
try to get as close as possible to demonstrating how to build one
piece of these data applications.

Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file
extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to
program elements such as variable or function names, databases,
data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by
the user.
Constant width italic
Shows text that should be replaced with user-supplied values or
by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples
Supplemental material (code examples, exercises, etc.) is available
for download at https://github.com/sryza/aas.
If you have a technical question or a problem using the code
examples, please send email to [email protected].
This book is here to help you get your job done. In general, if
example code is offered with this book, you may use it in your
programs and documentation. You do not need to contact us for
permission unless you’re reproducing a significant portion of the
code. For example, writing a program that uses several chunks of
code from this book does not require permission. Selling or
distributing examples from O’Reilly books does require permission.
Answering a question by citing this book and quoting example code
does not require permission. Incorporating a significant amount of
example code from this book into your product’s documentation
does require permission.
We appreciate, but do not require, attribution. An attribution usually
includes the title, author, publisher, and ISBN. For example:
“Advanced Analytics with PySpark by Akash Tandon, Sandy Ryza, Uri
Laserson, Sean Owen, and Josh Wills (O’Reilly). Copyright 2022
Akash Tandon, 978-1-098-10365-1.”
If you feel your use of code examples falls outside fair use or the
permission given above, feel free to contact us at
[email protected].

O’Reilly Online Learning
For more than 40 years, O’Reilly Media has provided technology and
business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge
and expertise through books, articles, and our online learning
platform. O’Reilly’s online learning platform gives you on-demand
access to live training courses, in-depth learning paths, interactive
coding environments, and a vast collection of text and video from
O’Reilly and 200+ other publishers. For more information, visit
https://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the
publisher:

O’Reilly Media, Inc.

1005 Gravenstein Highway North

Sebastopol, CA 95472

800-998-9938 (in the United States or Canada)

707-829-0515 (international or local)

707-829-0104 (fax)

We have a web page for this book, where we list errata, examples,
and any additional information. You can access this page at
https://oreil.ly/adv-analytics-pyspark.
Email [email protected] to comment or ask technical
questions about this book.
For news and information about our books and courses, visit
https://oreilly.com.
Find us on LinkedIn: https://linkedin.com/company/oreilly-media
Follow us on Twitter: https://twitter.com/oreillymedia
Watch us on YouTube: https://youtube.com/oreillymedia

Acknowledgments
It goes without saying that you wouldn’t be reading this book if it
were not for the existence of Apache Spark and MLlib. We all owe
thanks to the team that has built and open sourced it and the
hundreds of contributors who have added to it.
We would like to thank everyone who spent a great deal of time
reviewing the content of the previous editions of the book with
expert eyes: Michael Bernico, Adam Breindel, Ian Buss, Parviz
Deyhim, Jeremy Freeman, Chris Fregly, Debashish Ghosh, Juliet
Hougland, Jonathan Keebler, Nisha Muktewar, Frank Nothaft, Nick
Pentreath, Kostas Sakellis, Tom White, Marcelo Vanzin, and Juliet
Hougland again. Thanks all! We owe you one. This has greatly
improved the structure and quality of the result.
Sandy also would like to thank Jordan Pinkus and Richard Wang for
helping with some of the theory behind the risk chapter.
Thanks to Jeff Bleiel and O’Reilly for the experience and great
support in getting this book published and into your hands.
Chapter 1. Analyzing Big Data

When people say that we live in an age of big data, they mean that
we have tools for collecting, storing, and processing information at a
scale previously unheard of. The following tasks simply could not
have been accomplished 10 or 15 years ago:

Build a model to detect credit card fraud using thousands of features and billions of transactions
Intelligently recommend millions of products to millions of users
Estimate financial risk through simulations of portfolios that include millions of instruments
Easily manipulate genomic data from thousands of people to detect genetic associations with disease
Assess agricultural land use and crop yield for improved policymaking by periodically processing millions of satellite images
Sitting behind these capabilities is an ecosystem of open source
software that can leverage clusters of servers to process massive
amounts of data. The release of Apache Hadoop in
2006 led to widespread adoption of distributed computing. The
big data ecosystem and tooling have evolved at a rapid pace since
then. The past five years have also seen the introduction and
adoption of many open source machine learning (ML) and deep
learning libraries. These tools aim to leverage vast amounts of data
that we now collect and store.
But just as a chisel and a block of stone do not make a statue, there
is a gap between having access to these tools and all this data and
doing something useful with it. Often, “doing something useful”
means placing a schema over tabular data and using SQL to answer
questions like “Of the gazillion users who made it to the third page
in our registration process, how many are over 25?” The field of how
to architect data storage and organize information (data
warehouses, data lakes, etc.) to make answering such questions
easy is a rich one, but we will mostly avoid its intricacies in this
book.
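To make the registration question above concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than a data warehouse. The table name, column names, and rows are all hypothetical, invented only for illustration; the point is simply that once a schema sits over tabular data, the question becomes a single SQL statement.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registration_progress ("
    "user_id INTEGER PRIMARY KEY, age INTEGER, furthest_page INTEGER)"
)
conn.executemany(
    "INSERT INTO registration_progress VALUES (?, ?, ?)",
    [
        (1, 31, 3),
        (2, 22, 3),
        (3, 45, 2),
        (4, 27, 4),
        (5, 19, 1),
    ],
)

# "Of the users who made it to the third page, how many are over 25?"
(over_25_count,) = conn.execute(
    "SELECT COUNT(*) FROM registration_progress "
    "WHERE furthest_page >= 3 AND age > 25"
).fetchone()

print(over_25_count)
```

The same query shape carries over to larger engines; what changes at scale is where the data lives and how the work is distributed, not the question itself.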
Sometimes, “doing something useful” takes a little extra work. SQL
still may be core to the approach, but to work around idiosyncrasies
in the data or perform complex analysis, we need a programming
paradigm that’s more flexible and with richer functionality in areas
like machine learning and statistics. This is where data science
comes in and that’s what we are going to talk about in this book.
In this chapter, we’ll start by introducing big data as a concept and
discuss some of the challenges that arise when working with large
datasets. We will then introduce Apache Spark, an open source
framework for distributed computing, and its key components. Our
focus will be on PySpark, Spark’s Python API, and how it fits within a
wider ecosystem. This will be followed by a discussion of the
changes brought by Spark 3.0, the framework’s first major release in
four years. We will finish with a brief note about how PySpark
addresses challenges of data science and why it is a great addition
to your skillset.
Previous editions of this book used Spark’s Scala API for code
examples. We decided to use PySpark instead because of Python’s
popularity in the data science community and an increased focus by
the core Spark team to better support the language. By the end of
this chapter, you will ideally appreciate this decision.

Working with Big Data


Many of our favorite small data tools hit a wall when working with
big data. Libraries like pandas are not equipped to deal with data
that can’t fit in our RAM. Then, what should an equivalent process
look like that can leverage clusters of computers to achieve the same
outcomes on large datasets? Challenges of distributed computing
require us to rethink many of the basic assumptions that we rely on
in single-node systems. For example, because data must be
partitioned across many nodes on a cluster, algorithms that have
wide data dependencies will suffer from the fact that network
transfer rates are orders of magnitude slower than memory
accesses. As the number of machines working on a problem
increases, the probability of a failure increases. These facts require a
programming paradigm that is sensitive to the characteristics of the
underlying system: one that discourages poor choices and makes it
easy to write code that will execute in a highly parallel manner.
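The claim about failures can be made concrete with a small back-of-the-envelope sketch. It assumes, purely for illustration, that each machine fails independently with the same probability during a job; real clusters have correlated failures, and the per-machine probability used here is a made-up figure, not a measurement.

```python
def prob_any_failure(num_machines: int, p_single: float) -> float:
    """Probability that at least one of num_machines fails during a job,
    assuming independent failures with probability p_single each."""
    return 1.0 - (1.0 - p_single) ** num_machines


# Hypothetical per-machine failure probability for a single job.
P_SINGLE = 0.001

for n in (1, 10, 100, 1000):
    print(n, round(prob_any_failure(n, P_SINGLE), 4))
```

Under these assumptions, a failure that is negligible on one machine becomes likely on a large cluster, which is why a distributed framework has to treat fault tolerance as a normal part of execution rather than an exceptional case.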
The Project Gutenberg eBook of New York
Journal of Pharmacy, Volume 1 (of 3), 1852
This ebook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this ebook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.
Title: New York Journal of Pharmacy, Volume 1 (of 3), 1852

Creator: College of Pharmacy of the City of New York

Editor: B. W. McCready

Release date: December 29, 2016 [eBook #53828]


Most recently updated: October 23, 2024

Language: English

Credits: Produced by Charlene Taylor, Bryan Ness, RichardW, and the
Online Distributed Proofreading Team at http://www.pgdp.net
(This book was produced from scanned images of public
domain material from the Google Books project, and from
The Internet Archive.)

*** START OF THE PROJECT GUTENBERG EBOOK NEW YORK
JOURNAL OF PHARMACY, VOLUME 1 (OF 3), 1852 ***
NEW YORK
JOURNAL OF
PHARMACY,
Volume 1
(1852)
NEW YORK
JOURNAL OF PHARMACY,
PUBLISHED BY AUTHORITY OF
THE COLLEGE OF PHARMACY OF THE CITY OF NEW
YORK,

EDITED BY
BENJAMIN W. McCREADY, M. D.
PROFESSOR OF MATERIA MEDICA AND PHARMACY IN THE
COLLEGE OF PHARMACY,

ASSISTED BY A PUBLISHING COMMITTEE, CONSISTING OF
JOHN H. CURRIE, THOMAS B. MERRICK, EUGENE DUPUY,
WM. HEGEMAN, GEORGE D. COGGESHALL.
VOLUME I.
NEW YORK:
JOSEPH W. HARRISON, PRINTER,
NO. 197 CENTRE, NEAR CANAL STREET.
1852.

NEW YORK
JOURNAL OF PHARMACY.
JANUARY, 1852.

TO OUR READERS.

The College of Pharmacy was founded with a view to the elevation
of the professional standing and scientific attainments of
Apothecaries, as well as to guard their material interests by raising a
barrier against ignorance and imposture. What they have
accomplished and how far they have been successful it does not
become the Board of Trustees to state; if the results have not, in all
respects, been what might be desired, it has not arisen from want of
earnest effort and honest intention on their part. As a further means
of benefiting their profession, of keeping its members acquainted
with the progress it is making at home and abroad, and of inspiring
among them a spirit of scientific inquiry, they believe that the
establishment of a Journal, devoted to the pursuits and the interests
of Apothecaries, would be of the highest utility.
By far the wealthiest and most populous city in the Union, New
York, with its environs, contains several hundred Apothecaries,
among whom are many of great experience and eminent ability; it
contains numerous Laboratories where chemicals are manufactured
on a large scale, and where the appliances and refinements of
modern science are compelled into the service of commerce; it
contains within itself all the means of scientific progress, and yet
these means lie, for the most part, waste and idle; the observations
that are made and the processes that are invented profit only the
observer and the inventor. Both they and their consequences—for
even apparently trivial observations may contain in themselves the
germ of important discoveries, and no man can tell what fruit they
may produce in the minds of others—are lost to the world.
New York is the commercial centre of the Union, the point to
which our products are brought for exportation, and from which
various goods, obtained from abroad, are distributed to the
remainder of the United States. It is the chief drug mart of the
Union; the source from which the largest part of our country draws
its supplies of all medicines that are not the products of their own
immediate vicinities. It is thus connected more intimately with the
Druggists of a large portion of our country than any other city; many
visit it annually or oftener; most have business relations with it. Is
the spirit of trade incompatible with that of science? Is money-
getting to absorb all our faculties to the exclusion of anything nobler
or higher? Are we ever to remain merely the commercial metropolis
of our Union, but to permit science and art to centre in more
congenial and less busy abodes? Shall we not rather attempt to
profit by our many advantages, to use the facilities thrown in our
way by the channels of trade for the diffusion of scientific
knowledge, and in return avail ourselves of the information which
may flow into us from the interior?
But it is not alone, we hope, by the information it would impart
that a Journal such as is contemplated would be useful. A higher and
no less useful object would be that it would excite a spirit of inquiry
and emulation among the profession itself; it would encourage
observation and experiment; it would train our young men to more
exact habits of scientific inquiry. In diffusing information it would
create it, and would be doubly happy in being the means of making
discoveries it was intended to promulgate.
Such are the views which have determined the Trustees of the
College to publish a Journal of Pharmacy. It will appear on the first
day of every month, each number containing thirty-two octavo
pages. It will be devoted exclusively to the interests and pursuits of
the Druggist and Apothecary. While it is hoped that its pages will
present everything that is important relating to the scientific
progress of Pharmacy, it is intended to be mainly practical in its
character, subserving the daily wants of the Apothecary, and
presenting, as far as possible, that kind of information which can be
turned to immediate account, whether it relates to new drugs and
formulæ, or improved processes, manipulations, and apparatus.
Such are the aims and ends of the New York Journal of Pharmacy;
and the Druggists of New York are more particularly appealed to to
sustain it, not only by their subscriptions, but by contributions from
their pens. This last, indeed, is urgently pressed upon them; for,
unless it receives such aid, however successful otherwise, it will fail
in one great object for which it was originated. When special
information is wanted on any particular subject, the conductors of
the Journal, if in their power, will always be happy to afford it.
It is no part of the intention of the College to derive an income
from the Journal. As soon as the state of the subscription list
warrants it, it is intended to increase its size so that each number
shall contain forty-eight instead of thirty-two pages.
REPORT OF COMMITTEE OF COLLEGE OF PHARMACY AS AMENDED.

The Committee to whom was referred the subject of the
establishment of a Journal of Pharmacy in the city of New York, have
given their attention to the subject, and beg leave to report as
follows:
1. That in their opinion it is all important that a Journal of
Pharmacy should be established in this city as soon as practicable,
for reasons well known, and therefore unnecessary here to
enumerate.
2. They recommend that the first number of a Journal of thirty-
two octavo pages be issued on the 1st day of January next, and one
number each month thereafter, to be called the New York Journal of
Pharmacy.
3. The general control of the Journal shall be vested in a
committee of five, which shall review every article intended for
publication, four of whom shall be elected annually by the Board of
Trustees at the first stated meeting succeeding the annual election
of officers; and a committee of the same number shall be now
elected, who shall act until the next annual election, to be
denominated the Publishing Committee. The President of the College
of Pharmacy shall be “ex officio” a member of this Committee, and
the whole number of this Committee shall be five, two of whom may
act.
4. That an Editor be appointed by the Publishing Committee who
shall attend to all the duties of its publication, and cause to be
prepared all articles for the Journal, and to have the entire
management of it under the control and direction of the Publishing
Committee.
5. The compensation for the services of the Editor, together with
all financial matters connected with the Journal, shall be subject to
the control of the Publishing Committee.
6. The matter to be published in the Journal shall be original
communications, extracts from foreign and domestic journals, and
editorials. No matter shall be published except what may relate
directly or indirectly to the subject of Pharmacy, and the legitimate
business of Druggists and Apothecaries. No advertisements of
nostrums shall be admitted.
7. The subscription list shall be kept in the hands of the
Publishers, subject to the disposal of the Publishing Committee.
(Signed) T. B. MERRICK,
Chairman.
The Board then balloted for members of the Publishing
Committee, when the following were found to be elected.
MESSRS. JNO. H. CURRIE,
THOS. B. MERRICK,
C. B. GUTHRIE,
EUGENE DUPUY,
with Ex Officio, GEO. D. COGGESHALL,
President of the College.
ON TWO VARIETIES OF FALSE JALAP.
BY JOHN H. CURRIE.

Two different roots have for some time back been brought to the
New York market, for the purpose of adulterating or counterfeiting
the various preparations of Jalap. They differ materially from the
Mechoacan and other varieties of false Jalap which formerly existed
in our markets, as described by Wood and Bache in the United
States Dispensatory, while some of the pieces bear no slight
resemblance to the true root. The specimens I have been able to
procure are so imperfect, and so altered by the process of drying,
that the botanists I have consulted are unable to give any
information even as to the order to which they belong. I have not
been able either to trace their commercial history, nor do I know
how, under the present able administration of the law for the
inspection of drugs, they have obtained admission to our port. The
article or articles, since there are at least two of them, come done
up in bales like those of the true Jalap, and are probably brought
from the same port, Vera Cruz.
No. 1 appears to be the rhizome or underground stem of an
exogenous perennial herb, throwing up at one end each year one or
more shoots, which after flowering die down to the ground. It comes
in pieces varying in length from two to five inches, and in thickness
from the third of an inch to three inches. In some of the pieces the
root has apparently been split or cut lengthwise; in others,
particularly in the large pieces, it has been sliced transversely like
Colombo root. The pieces are somewhat twisted or contorted,
corrugated longitudinally and externally, varying in color from a
yellowish to a dark brown. The transverse sections appear as if the
rhizome may have been broken in pieces at nodes from two to four
inches distant from each other, and at which the stem was enlarged.
Or the same appearance may have been caused by the rhizome
having been cut into sections of various length; and the resinous
juice exuding on the cut surfaces, has hindered them from
contracting to the same extent as the intervening part of the root.
On the cut or broken surfaces are seen concentric circles of woody
fibres, the intervening parenchyma being contracted and depressed.
The fresh broken surfaces of these pieces exhibit in a marked
manner the concentric layers of woody fibres. The pieces that are
cut longitudinally, on the other hand, are heavier than those just
described, though their specific gravity is still not near so great as
that of genuine Jalap. Their fracture is more uniform, of a greyish
brown color, and highly resinous.
This variety of false Jalap, when exhausted with alcohol, the
tincture thus obtained evaporated, and the residuum washed with
water, yielded from 9½ to 15½ per cent. of resin, the average of
ten experiments being 13 per cent. Its appearance was strikingly like
that of Jalap resin. It had a slightly sweetish mucilaginous taste,
leaving a little acridity, and the odor was faintly jalapine. It
resembled Jalap resin in being slowly soluble in concentrated
sulphuric acid, but unlike Jalap resin it was wholly soluble in ether. In
a dose of ten grains it proved feebly purgative, causing two or three
moderate liquid stools. Its operation was unattended with griping or
other unpleasant effect, except a slight feeling of nausea felt about
half an hour after the extract had been swallowed, and continuing
for some time.
This variety of false Jalap is probably used, when ground, for the
purpose of mixing with and adulterating the powder of true Jalap, or
is sold for it, or for the purpose of obtaining from it its resin or
extract, which is sold as genuine resin or extract of Jalap. The
powder strikingly resembles that of true Jalap, has a faint odor of
Jalap, but is destitute, to a great extent, of its flavor. The dust, too,
arising from it, is much less irritating to the air passages.
The second variety is a tuber possibly of an orchidate plant, a
good deal resembling in shape, color and size, a butternut, (Juglans
cinerea.) Externally it is black or nearly so, in some places shining as
if varnished by some resinous exudation, but generally dull, marked
by deep longitudinal cuts extending almost to the centre of the
tubers; internally it is yellow or yellowish white, having a somewhat
horny fracture, and marked in its transverse sections with dots as if
from sparse, delicate fibres. When first imported the root is
comparatively soft, but becomes dry and brittle by keeping. Its odor
resembles that of Jalap, and its taste is nauseous, sweetish, and
mucilaginous.
This root contains no resin whatever. Treated with boiling water it
yields a large amount (75 per cent.) of extract. This is soluble, to a
great extent, likewise in alcohol. With iodine no blue color is
produced.
The extract obtained from this drug appears, in ordinary doses,
perfectly inert, five or ten grains producing, when swallowed, no
effect whatever. Is this root employed for the purpose of obtaining
its extract, and is this latter sold as genuine extract of Jalap?
Of the effect which frauds of this kind cannot fail to have on the
practice of medicine it does not fall within my province to speak, but
commercially its working is sufficiently obvious. One hundred pounds
of Jalap at the market price, 60 cents per pound, will cost $60. In
extracting this there will be employed about $5 worth of alcohol,
making in all $65. There will be obtained forty pounds of extract,
costing thus $1.62½ per pound.
One hundred pounds of false Jalap, No. 1, may be obtained for
$20; admitting the alcohol to cost $5, it will make in all $25. This will
produce thirty-six pounds of extract, costing rather less than 70
cents per pound.
One hundred pounds of variety No. 2 may be had for $20, and no
alcohol is necessary in obtaining the extract. The yield being
seventy-five pounds, the extract will cost rather less than twenty-
seven cents per pound.
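The three cost estimates above can be checked with a short sketch. The prices, alcohol outlay, and yields are those stated in the text; the function and variable names are illustrative assumptions only.

```python
from fractions import Fraction

def cost_per_pound(root_cost, alcohol_cost, extract_lb):
    """Dollars per pound of extract from 100 lb of root (exact fraction)."""
    return Fraction(root_cost + alcohol_cost, extract_lb)

# (cost of 100 lb of root in $, alcohol in $, pounds of extract obtained)
cases = {
    "true Jalap": (60, 5, 40),
    "false Jalap No. 1": (20, 5, 36),
    "false Jalap No. 2": (20, 0, 75),
}

for name, (root, alcohol, pounds) in cases.items():
    price = cost_per_pound(root, alcohol, pounds)
    print(f"{name}: ${float(price):.3f} per pound")
```

On these figures the extract of either false root costs well under half as much as genuine extract, which is the commercial inducement the author describes.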

VIRGIN SCAMMONY,
WITH SOME REMARKS UPON THE CHARACTERISTICS OF SCAMMONY RESIN.
BY B. W. BULL.

The more extended use in medicine which this substance has
acquired within a few years, and its consequent greater
consumption, render the knowledge of its peculiarities and the
modes of ascertaining its purity doubly important to the druggist and
apothecary.
An instance occurred a few weeks since, showing the necessity of
careful and thorough examination of every parcel of this drug, and
possessing some interest, from the fact that no description of any
similar attempt at falsification has, I believe, been before published.
The commercial house with which I am connected, purchased a
parcel of what purported to be virgin scammony from the importer,
who obtained it direct from Smyrna. A sample of it was examined
and found to contain seventy per cent. of resinous matter, but when
the whole lot was received, it was found to consist evidently of two
different grades of the article.
The whole of it was composed of amorphous pieces, possessing
externally a similar appearance. Upon breaking them, however, a
manifest difference was observable. Some of the pieces possessed
the resinous fracture, and the other characteristics of virgin
scammony, while the remainder, which constituted about five eighths
of the whole, exposed a dull, non-resinous surface when freshly
broken.
I selected two samples, each possessing in the highest degree the
characteristics of the two varieties, and subjected them to the action
of sulphuric ether with the following results, designating the resinous
or best No. 1, and the other specimen No. 2:―
                                          No. 1.      No. 2.
Specific gravity                          1,143       1,3935
                                          Per cent.   Per cent.
Resinous matter and water                  94.35       49.86
Vegetable substance insoluble in ether      3.20       45.16
Inorganic matter                            2.45        4.98
                                          100.00      100.00

The vegetable substance in No. 2 was principally, if not entirely,
farinaceous or starchy matter, of which the other contained not a
trace. The result shows that this parcel of scammony was composed
partly of true virgin scammony mixed with that of an inferior quality;
and also indicates the necessity of examining the whole of every
parcel, and of not trusting to the favorable result of the examination
of a mere sample.
The powder in the two specimens was very similar in shade, and
they possessed in about the same degree the odor peculiar to the
substance, showing the fallacy of relying upon this as a means of
judging of the comparative goodness of different samples. This fact
may appear anomalous, but on different occasions the powder of
No. 2 was selected as having the most decided scammony odor.
Since examining the above, I have had an opportunity of
experimenting upon a portion of scammony imported from Trieste as
the true Aleppo scammony, of which there are exported from Aleppo
not more than from two hundred and fifty to three hundred pounds
annually.
The parcel consisted of a sample of one pound only, which was
obtained from a druggist of respectability in that place by one of my
partners, who was assured that the sample in question was from the
above source, and the kind above alluded to. This scammony was in
somewhat flattish pieces, covered externally with a thin coating of
chalk in which it had been rolled, the structure was uniformly
compact, the color of the fracture greenish, and it possessed in a
high degree the caseous odor.
The fracture was unusually sensitive to the action of moisture. By
merely breathing upon a freshly exposed surface, a film resembling
the bloom upon fruit was at once perceived. Its specific gravity was
1,209, which, it will be observed, approximates with unusual
accuracy to that given by Pereira as the specific gravity of true
scammony, viz. 1,210. It contained―
Resinous matter and water 89.53 per cent.
Vegetable substance insoluble in ether 7.55 per cent.
Inorganic matter 2.92 per cent.

There was no starchy matter present in the portion examined.
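As a check on the three analyses reported above, the following minimal sketch confirms that each set of percentages sums to 100 and lists the ether-soluble fraction of each sample. The figures are those given in the text; the sample labels and names in the code are illustrative assumptions.

```python
from decimal import Decimal

# Per cent.: (resinous matter and water, vegetable substance
# insoluble in ether, inorganic matter), as reported in the text.
samples = {
    "No. 1": ("94.35", "3.20", "2.45"),
    "No. 2": ("49.86", "45.16", "4.98"),
    "Trieste": ("89.53", "7.55", "2.92"),
}

def total(parts):
    """Sum the reported percentages exactly, avoiding float rounding."""
    return sum(Decimal(p) for p in parts)

for name, parts in samples.items():
    print(f"{name}: total {total(parts)}, ether-soluble {parts[0]} per cent.")
```

Decimal arithmetic is used so that the sums compare exactly with 100.00.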


The mode of deciding upon the value or goodness of different
samples of scammony, by ascertaining the amount of matter soluble
in sulphuric ether, has seemed to me productive of a negative result
in showing how much non-resinous matter was present, rather
than a certain method of ascertaining the actual amount of
scammony resin present; but some experiments upon the resinous
residuum lead to a more favorable conclusion.
The results of the analyses made by Johnston, who seems to be
the only chemist who has paid any attention to its ultimate
composition, show that it varies in composition materially from many
other resins.
According to his analyses, as contained in Löwig, it
has the formula                 C40 H33 O8
While that of Guaiac resin is   C40 H23 O10
Of Colophony                    C40 H30 O4
Or expressed in per cents:―
             Scammony.   Guaiac.   Colophony.
Carbon         56.08      70.37      79.81
Hydrogen        7.93       6.60       9.77
Oxygen         35.99      23.03      10.42
              100.00     100.00     100.00
The resin analysed by Johnston was obtained by evaporating the
alcoholic solution, and he describes it as opaque, pale yellow, hard,
and brittle; when obtained, however, by evaporating the ethereal
solution I have found it transparent.
It might be inferred that, with a composition so different from that
of the substances above adduced, its behavior with re-agents would
be different from theirs; and its action with strong acids confirms the
supposition, as may be seen by reference to the appended papers
from a late number of the Paris Journal of Pharmacy.
The Edinburgh Pharmacopœia has an extract of scammony among
its officinal preparations, prepared by treating scammony with proof
spirit, and evaporating the solution. It is described as of a dirty
greenish brown color. This color, however, is not a necessary
accompaniment, but is owing either to some coloring matter being
dissolved in the menstruum or to the partial oxydation of the
dissolved substance under the influence of the air and the heat of
the operation.
The ethereal solution of scammony resin, when gradually
evaporated, and without exposure to heat, leaves a colorless or
amber-colored resin, perfectly transparent and soluble in alcohol;
when heated, however, during the operation, more or less
insoluble matter of a dark color is found. Sometimes the ethereal
solution, when spontaneously evaporated, leaves a dark residuum,
but a second solution and evaporation leave it as above described.
This product, obtained from several different parcels of virgin
scammony, I have considered free from admixture with any of the
substances with which scammony is said to be adulterated, and from
the similarity of their behavior, and, as the circumstances under
which the sample from Trieste above alluded to was obtained are
such as to make its genuineness very certain, feel warranted in so
doing.
Sulphuric acid does not immediately decompose it, but produces
the effect described by M. Thorel.
Nitric acid produces no discoloration, nor does hydrochloric acid
immediately.
If scammony should be adulterated with colophony, sulphuric acid
would be a very ready method of detection, though it would seem
that this substance would hardly be resorted to, unless an entirely
new mode of sophisticating the article should be adopted abroad.
The introduction of farinaceous substances and chalk is effected
while the scammony is in a soft condition, in which state it would be
difficult to incorporate colophony completely with the mass.
An admixture of resin of guaiac is also detected by the same
agent, a fact which seems to have escaped observation.
When brought in contact with sulphuric acid, resin of guaiac
immediately assumes a deep crimson hue, and this reaction is so
distinct that a proportion of not more than four or five per cent. is
readily detected.
The deep red mixture of sulphuric acid with resin of guaiac
becomes green when diluted with water, a remarkable change,
which adds to the efficacy of the test. Scammony resin, on the
contrary, suffers no alteration by dilution.
In addition, nitric acid affords a ready mode of ascertaining the
presence of resin of guaiac. It is well known that nitric acid, when
mixed with an alcoholic solution of guaiac, causes a deep green
color, which soon passes into brown, or if the solution is dilute, into
yellow.
This reaction is manifest when scammony resin is mixed with
guaiac resin in the proportion above mentioned, though the greenish
blue tinge is then very transient, and sometimes not readily
perceived.
Chloride of soda is a delicate test for the presence of guaiac resin.
Added to an alcoholic solution, a beautiful green color appears,
while it produces no effect on scammony resin. This reaction is very
evident, though transient, when a very small proportion of guaiac is
present. Nitrate of silver causes a blue color in a solution of guaiac
resin, as does also sesqui-chloride of iron, neither of which agents
affects the color of a solution of scammony resin. In fact, the
evidences of the presence of guaiac are so numerous and distinct
that there can be no possibility of an undetected adulteration with
this substance.
The high price of resin of jalap would seem to be sufficient to
prevent its being resorted to as a means of sophisticating
scammony; but in case this substance should be made use of, the
method proposed for detecting it by means of ether is defective,
since, according to authorities, resin of jalap is partially soluble in
that substance.
It becomes of interest to know whether in the preparation of
scammony the juice of the plant from which it is obtained is ever
mixed with that of other plants of similar properties, or with that of
plants destitute of efficacy. This information can, of course, only be
furnished by those familiar with the localities and with the mode of
its preparation.
1
“In advancing the opinion that scammony should only be employed for
therapeutic purposes in the state of resin, I mean that this resin should only be
prepared by the apothecary himself. When, however, it is impossible for the
apothecary to do so, and the commercial article is in consequence resorted to,
there arises a liability to deception. We must then be enabled to recognise its
purity.
To avoid detection of the fraud, the admixture must either be in small quantity,
or it must possess nearly the same action. In this latter case, resin of jalap would
be employed as being less in price and nearly as active.
The method I propose for detecting an adulteration of this nature, in case it
should be attempted, is based on the one side upon the entire insolubility of resin
of jalap in rectified sulphuric ether, and on the other, upon the solubility of
scammony resin in this liquid. Nothing is easier than the detection of a mixture of
