
Linux Journal March 2017


INTEGRATE WEB APPS WITH APACHE


MARCH 2017 | ISSUE 275
http://www.linuxjournal.com
Since 1994: The Original Magazine of the Linux Community

BIG DATA, Hadoop and R

AUTOMATION TIPS for SysAdmins

A LOOK AT Cutting the Cable Cord

MANIPULATING IMAGES with ImageMagick

LJ275-March2017.indd 1 2/18/17 10:17 AM


Practical books for the most technical people on the planet.

GEEK GUIDES

Download books for free with a simple one-time registration.

http://geekguide.linuxjournal.com


NEW! Cloud-Scale Automation with Puppet
Author: John S. Tonello
Sponsor: Puppet

NEW! Why Innovative App Developers Love High-Speed OSDBMS
Author: Ted Schmidt
Sponsor: IBM

Tame the Docker Life Cycle with SUSE
Author: John S. Tonello
Sponsor: SUSE

SUSE Enterprise Storage 4
Author: Ted Schmidt
Sponsor: SUSE

BotFactory: Automating the End of Cloud Sprawl
Author: John S. Tonello
Sponsor: BotFactory.io

Containers 101
Author: Sol Lederman
Sponsor: Puppet

An API Marketplace Primer for Mobile, Web and IoT
Author: Ted Schmidt
Sponsor: IBM

Public Cloud Scalability for Enterprise Applications
Author: Petros Koutoupis
Sponsor: SUSE


CONTENTS MARCH 2017
ISSUE 275
FEATURES
80 Big Data Demonstrator: Using Hadoop to Build a Linux Cluster for Log Data Analysis Using R
Big Data analysis in less than an hour: create a Linux cluster, analyze data and destroy the cluster.
Rune Torbensen and Søren Top

96 Integrating Web Applications with Apache
Make your third-party web applications accessible through an existing Apache installation.
Andy Carlson

Cover Image © Can Stock Photo / kentoh

4 | March 2017 | http://www.linuxjournal.com



CONTENTS

COLUMNS
38 Reuven M. Lerner’s At the Forge
Unsupervised Learning

48 Dave Taylor’s Work the Shell
Image Manipulation with ImageMagick

54 Kyle Rankin’s Hack and /
Sysadmin 101: Automation

60 Shawn Powers’ The Open-Source Classroom
The Post-TV Age?

110 Doc Searls’ EOF
The Problem with "Content"

IN EVERY ISSUE
8 Current_Issue.tar.gz
10 Letters
18 UpFront
36 Editors’ Choice
70 New Products
113 Advertisers Index

ON THE COVER
• Integrate Web Apps with Apache, p. 96
• Big Data, Hadoop and R, p. 80
• A Look at Cutting the Cable Cord, p. 60
• Automation Tips for SysAdmins, p. 54
• Manipulating Images with ImageMagick, p. 48

LINUX JOURNAL (ISSN 1075-3583) is published monthly by Belltown Media, Inc., PO Box 980985, Houston, TX 77098 USA.
Subscription rate is $29.50/year. Subscriptions start with the next issue.



Executive Editor Jill Franklin
[email protected]
Senior Editor Doc Searls
[email protected]
Associate Editor Shawn Powers
[email protected]
Art Director Garrick Antikajian
[email protected]
Products Editor James Gray
[email protected]
Editor Emeritus Don Marti
[email protected]
Technical Editor Michael Baxter
[email protected]
Senior Columnist Reuven Lerner
[email protected]
Security Editor Mick Bauer
[email protected]
Hack Editor Kyle Rankin
[email protected]
Virtual Editor Bill Childers
[email protected]

Contributing Editors
Ibrahim Haddad • Robert Love • Zack Brown • Dave Phillips • Marco Fioretti • Ludovic Marcotte
Paul Barry • Paul McKenney • Dave Taylor • Dirk Elmendorf • Justin Ryan • Adam Monsen

President Carlie Fairchild
[email protected]

Publisher Mark Irgang
[email protected]

Associate Publisher John Grogan
[email protected]

Director of Digital Experience Katherine Druckman
[email protected]

Accountant Candy Beauchamp
[email protected]

Linux Journal is published by, and is a registered trade name of,
Belltown Media, Inc.
PO Box 980985, Houston, TX 77098 USA

Editorial Advisory Panel


Nick Baronian
Kalyana Krishna Chadalavada
Brian Conner • Keir Davis
Michael Eager • Victor Gregorio
David A. Lane • Steve Marquez
Dave McAllister • Thomas Quinlan
Chris D. Stark • Patrick Swartz

Advertising
E-MAIL: [email protected]
URL: www.linuxjournal.com/advertising
PHONE: +1 713-344-1956 ext. 2

Subscriptions
E-MAIL: [email protected]
URL: www.linuxjournal.com/subscribe
MAIL: PO Box 980985, Houston, TX 77098 USA

LINUX is a registered trademark of Linus Torvalds.



You cannot keep up with data explosion.

Manage data expansion with SUSE Enterprise Storage.

SUSE Enterprise Storage, the leading open source storage solution, is highly scalable and resilient, enabling high-end functionality at a fraction of the cost.

suse.com/storage


Current_Issue.tar.gz

It’s a Bird! It’s a Plane! Nope, It’s My Server!
SHAWN POWERS

Shawn Powers is the Associate Editor for Linux Journal. He’s also the Gadget Guy for LinuxJournal.com, and he has an interesting collection of vintage Garfield coffee mugs. Don’t let his silly hairdo fool you, he’s a pretty ordinary guy and can be reached via email at [email protected]. Or, swing by the #linuxjournal IRC channel on Freenode.net.

Like most fancy tech terms, “Cloud Computing” has lost its newness, and it’s now just a commodity we purchase. It’s often so much easier to provision virtual machines than it is to buy and host your own servers. Yes, there are concerns over privacy and security when your data is in the cloud. When you host in your own data center, however, there’s still the possibility of a rogue cleaning crew getting to your servers. (We’ve all seen the movies; it just takes a mop and a blue jumper to get you into the most secure data center.) Regardless of your stance on cloud computing, it’s here to stay. This month, we talk a bit about how to live in this bold new world.

Reuven M. Lerner starts things off with more information about machine learning. What if the software itself could suss out the important patterns and information from your data, instead of you feeding it presorted information for simple data mining? Reuven describes “unsupervised machine learning” this month, and it’s either awesome or terrifying, depending on how you feel about such things.

Dave Taylor follows with a look at one of my favorite command-line application suites: ImageMagick. I use a few of the tools with my ongoing BirdCam project. Dave explores all sorts of ways to manipulate images from the command line, and his article is only the tip of the iceberg.

Kyle Rankin delves into automation for sysadmins this month. Thanks to DevOps and the like, task automation is far more robust than it used to be. Kyle explains why we should automate, when we should automate and how we should go about doing it. Although it’s tempting to replace your system administrator with a few DevOps tools, that human factor is invaluable.

I take a look at entertainment this month, specifically as it pertains to watching television. Many of you are “cord cutters”, but for the few still holding on to their cable subscription (such as myself), it’s important to understand the various options out there. Thankfully, we’re closer to a complete television experience over the internet than ever before. This month, I look at some of the available options.

Rune Torbensen and Søren Top describe building a Linux cluster in the cloud for the purpose of data analysis with R. Building a huge cluster of machines in your own data center is impractical for most people, especially when that cluster isn’t needed as a permanent fixture. Rune and Søren show how to take advantage of the cloud, along with open-source tools like Hadoop and R, to analyze logs.

Andy Carlson follows them with an article on writing custom Apache configurations in order to run specific applications on the web. Yes, it’s the sort of thing you can do on your own hardware, but again, the cloud is a convenient way to treat computing like a commodity, so having flexible methods is vital in our modern IT world.

Love it or hate it, the cloud isn’t going away any time soon. As someone who historically has had server racks full of last-generation servers in his basement (that’s all I could afford), I can’t express how happy the cloud makes me. Before you can safely take advantage of using someone else’s servers for your data, it’s important to understand not only how the cloud itself works, but also how your software integrates. And understanding open-source software is what we love most here at Linux Journal.


LETTERS

Stop
Regarding Doc Searls’ “Debugging Democracy” in the January issue: please stop printing childish personal insults in Linux Journal. When you refer to the President-Elect as “Internet troll Donald Trump”, you are being personally insulting, childish by playing funny games with someone’s name (“heh heh, he’s got rump in his name”) and promoting politics of hate and fear. This has no place in a technical journal and no relevance to Linux or computing in general.

You would not refer to the losing candidate as a “shrew”, nor would you play childish games with her name as a way of insulting her. In fact, you did refer to her by her correct name without personal insult.

I’m sorry, but the election process does not need “debugging” because your favorite lost. This has happened every four years since the ratification of the US Constitution: someone wins, someone loses. It’s a direct and inevitable side effect of having one President instead of two. Every time, nearly half the voters are disappointed.

By picking sides and using personal insults to make political commentary after the election is over and done, you disappoint upwards of half of your subscribers. Doc Searls owes an apology to the readers, if not to Mr Trump himself.
—Mark Kramer

Doc Searls replies: Mark is right. I do owe readers an apology. By calling Donald Trump a troll (take a look at the Wikipedia definition of internet troll at https://en.wikipedia.org/wiki/Internet_troll and draw your own conclusions), I was being a troll as well. Even though trolling wasn’t my intent, that has been the effect so far: every response to my January column, both here and on our website, has been as negative as Mark’s, and for the same reasons.

Opening with that remark also failed to support the main purpose of that column, which was to call for help in rescuing journalism (and real journals such as this one) from drowning in a sea of “content”, way too much of which is crap routed by algorithms aimed by surveillance-gathered data into echo chambers of the like-minded. This has the effect of increasing enmity and blame toward those in echo chambers with opposing sympathies, which is worse than dangerous in democratic societies, because it tears apart the center spaces of basic agreement those societies require. You can see how this looks in The Wall Street Journal’s Blue Feed, Red Feed site (http://graphics.wsj.com/blue-feed-red-feed), subtitled “See Liberal Facebook and Conservative Facebook, Side by Side”.

I am sure most of the systems driving us into hostile camps are built on Linux. (Isn’t everything now?) So I don’t think I’m off base calling for help here.

At Your Service

SUBSCRIPTIONS: Linux Journal is available in a variety of digital formats, including PDF, .epub, .mobi and an online digital edition, as well as apps for iOS and Android devices. Renewing your subscription, changing your email address for issue delivery, paying your invoice, viewing your account details or other subscription inquiries can be done instantly online: http://www.linuxjournal.com/subs. Email us at [email protected] or reach us via postal mail at Linux Journal, PO Box 980985, Houston, TX 77098 USA. Please remember to include your complete name and address when contacting us.

ACCESSING THE DIGITAL ARCHIVE: Your monthly download notifications will have links to the various formats and to the digital archive. To access the digital archive at any time, log in at http://www.linuxjournal.com/digital.

LETTERS TO THE EDITOR: We welcome your letters and encourage you to submit them at http://www.linuxjournal.com/contact or mail them to Linux Journal, PO Box 980985, Houston, TX 77098 USA. Letters may be edited for space and clarity.

WRITING FOR US: We always are looking for contributed articles, tutorials and real-world stories for the magazine. An author’s guide, a list of topics and due dates can be found online: http://www.linuxjournal.com/author.

FREE e-NEWSLETTERS: Linux Journal editors publish newsletters on both a weekly and monthly basis. Receive late-breaking news, technical tips and tricks, an inside look at upcoming issues and links to in-depth stories featured on http://www.linuxjournal.com. Subscribe for free today: http://www.linuxjournal.com/enewsletters.

ADVERTISING: Linux Journal is a great resource for readers and advertisers alike. Request a media kit, view our current editorial calendar and advertising due dates, or learn more about other advertising and marketing opportunities by visiting us online: http://www.linuxjournal.com/advertising. Contact us directly for further information: [email protected] or +1 713-344-1956 ext. 2.

Shotcut Video Editor

I just read Shawn Powers’ August Linux Journal article about his traveling gear (yes, I am that far behind), and besides being shocked that he uses a MacBook Air (just kidding, Apple hardware is good enough), I was also shocked that he uses Final Cut Pro for his video editing.

For the editing needs he described, there is no excuse for not using the excellent Shotcut video editor (https://shotcut.org). Besides the fact that it has more than enough features, it’s open source and available for Linux, Windows and Mac OS.

I am just wondering what kind of excuse he will give for not adopting Shotcut for his video editing needs. I am not pressuring him with this, just playing and using this opportunity to let him know about Shotcut.

I’m not connected with the project; I just use, advocate and provide some help with my small knowledge in the Shotcut forum. See also the videos on YouTube for some help getting started.
—Luis Sismeiro

Shawn Powers replies: It’s easy to feel a bit defensive with questions
like, “what’s your excuse?”, but tone is often easy to misinterpret in
email, so I’m going to assume this was a friendly message. I don’t think
I need an excuse, because I don’t think I’ve done anything wrong. But
I’ll answer the question of “why”, because that’s a fair one.

It’s possible that some of this was answered in my article, or in past
issues, but really quickly: I have several jobs, and those jobs require me to
have various computer systems. On a daily basis, I truly use Windows,
OS X and Linux. I love open-source software, but I’m not a zealot. I use
whatever tool I can use to get the job done. For Linux Journal, it makes
sense to use Linux software for video editing. However, all the times I’ve
attempted to do that through the years, it’s been inefficient at best and
impossible at worst.

I’ve never tried Shotcut, but now that you’ve brought it to my attention,
I’ll be giving it a try. Heck, perhaps I’ll write about it. What’s important
to know, though, is that if it crashes or doesn’t work well for me, I likely
won’t use it.


So after all that, thank you for bringing the project to my attention. I’ll
definitely check it out!

Fake News
I read Mr Searls’ tantrum in the January issue with great amusement. It seems he is not a fan of Donald Trump, but rather than calling for the death of the Electoral College (the other fad du jour), he says “we need to hack the news back in a logical direction and away from the fact-free, misleading and emotion-stirring ways that news is made today”. In other words, “Doc” is calling for global “fact checking” on the internet, aka a globalized Wikipedia. And who, pray tell, will be trusted with that curation process? We don’t need to look far for an answer, because others have suggested similar things before with regard to broadcast news: https://en.wikipedia.org/wiki/Fairness_Doctrine.

So the creative chaos of the Bazaar is good for Linux, but bad in the arena of politics and news? Interesting how when liberals lose elections, their immediate instincts are to change the Constitutional process and call for clamping down on radio, TV and now the internet. The First Amendment applies only to whoever has the correct views? Wouldn’t it be better if we had an educated electorate who could smell truth from fiction on their own? But how often is critical thinking taught today? Let us take this as a teachable moment.

Mr Searls goes on to show some pretty graphs and points out that “This kind of study does not show a mandate”. True, the election of Trump alone does not show a mandate, but what does this table show?

                   2008 (Dem/Rep)   2016 (Dem/Rep)   Source
State Governors    28/22            18/31            Wikipedia
State Senates      28/20            13/37            https://ballotpedia.org/Partisan_composition_of_state_senates
State House        32/16            18/31            https://ballotpedia.org/Partisan_composition_of_state_houses
US Senate          49/49            44/54            Wikipedia
US House           233/198          194/241          https://ballotpedia.org/United_States_Congress_elections,_…


It’s almost as if a vast right-wing conspiracy infiltrated every state/local/national news outlet, then hired an army of covert East Bloc operatives to create “fake news” on social media, all designed to sway the election up and down the ballot, completely inverting the majority control at state and federal levels since Barack Obama was elected.

Hillary would no doubt agree.


—Steve Langer

Doc Searls replies: I meant “hack” in the broad sense it has been used
here since Linux Journal began in 1994. If you want a specific definition
(or a set of them), consult the Jargon File: http://www.catb.org/jargon/html.

The Trump vs. Hillary contest was maximally interesting at the time I wrote
the column, but it was also beside the main point I tried to make: there
are dangerously dysfunctional ways our democracy now informs itself in
the networked world. “Fake news” is currently the most obvious example,
although I believe the real problems are deeper and more systematic than
that. However one looks at it, some fixing is required.

It’s about the how of democracy, not about the who.

Mars Lander Program


Regarding Dave Taylor’s Mars Lander program in the September, October and November issues: it’s a curious game in a few lines of shell script, and I appreciated the final version. But bc is not so friendly a tool, and it introduces some unfair errors into the game. Here’s a fix for the places where bc is used:

# bc drops the leading zero (".5", "-.5"); sed puts it back ("0.5", "-0.5")
speed=$($bc <<< "scale=3; $speed + $gravity + $thrust" | sed -e 's/^\([-]*\)\./\10./')
...
altitude=$($bc <<< "scale=3; $altitude + $speed" | sed -e 's/^\([-]*\)\./\10./')

Thank you. Good work!


—José Nicolau


Dave Taylor replies: Interesting tweak to the script. I’ll have to send it
over to my NASA friends to QA!

kpcli—KeePass Command-Line Interface


Related to der.hans’ January article on password managers, “Online Privacy and Security Using a Password Manager”, I thought that letting your audience know about kpcli might be useful: http://kpcli.sourceforge.net.
—Lester Hightower

Shawn Powers’ Synology Review


Regarding Shawn Powers’ “My Love Affair with Synology” in the January issue: many applications are difficult to install directly on a Synology, because Synology does not use a standard repository format like RPM or APT, but it does support Docker. My tests with Docker have been great so far. I can build and test Docker images on my Debian server and then copy the finished images to the Synology for deployment.

In my case, I am trying to get PostGIS/Geoserver/Geonode going. I upgraded the Synology’s RAM to support it. I need to support a very small team of users sharing geospatial data, so we don’t need a lot of compute horsepower, just a shared data store. End users probably will be using QGIS as a client, so most of the computation will take place on their laptops.

But the use case that I applied to justify buying the Synology was “Synology Cloud Station” (it might have come out after you wrote your review). It is great: files on the server are not kept locked up in a special container like ownCloud, so you can seamlessly drop files into an ordinary folder on the Synology and have them replicate out to your Cloud Station clients.

This means LAN users can see files directly via Samba, AppleTalk or NFS and not have to copy them to their own hard drives, but I (working at home) also get access via synchronization. Working remotely, I can export a large-format map to a PDF file, and it will be uploaded automatically via Cloud Station sync. Then my team can view the map from the Synology via LAN file share or web server (File Station) without syncing a copy to their own laptops.

I love ownCloud, by the way, and it will run on a Synology (I tried it out), and I use it on my own Debian server, but Cloud Station fits our use cases better.
—Brian Wilson

Shawn Powers replies: I’m fairly certain Cloud Station has been there
for a while, but I have to admit I haven’t tried it. (That will change!) My
concerns with running things on the Synology directly are all horsepower-
related. I love it for things like reverse proxying, web hosting and torrent
management. My Plex Media Server, however, I put on a separate box
because I fear the Synology wouldn’t be able to manage the transcoding.
I also share your frustration with the packages provided by Synology, but
thankfully, there are some community-maintained programs that can be
downloaded and installed.

The Geo stuff you’re doing sounds cool, by the way, and it sounds like
a perfect use case since the CPU demands aren’t too high. And thanks
again for the tip about Cloud Station; I’ll have to give it a try!
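Brian’s workflow of building images on a Debian box and then copying them to the Synology does not need a registry. docker save writes an image and its layers to a tar file, and docker load imports that file on another host. The sketch below strings those steps together. It assumes SSH is enabled on the NAS and that the docker command is on the Synology user’s PATH (some setups need sudo there). The image name, host and /volume1/docker path are hypothetical placeholders, and the DRY_RUN switch is an illustrative addition so that you can preview the commands first.

```shell
#!/bin/sh
# Build on Debian, deploy on Synology: export an image, copy it, load it.
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  printf '+ %s\n' "$*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

ship_image() {
  image=$1   # e.g. myteam/geoserver:test (hypothetical)
  host=$2    # e.g. admin@synology.local (hypothetical)
  dest=$3    # directory on the NAS that will hold the tarball
  # Turn "repo/name:tag" into a safe file name like repo_name_tag.tar
  tarball="$(printf '%s' "$image" | tr '/:' '__').tar"

  run docker save -o "$tarball" "$image"             # export image + layers
  run scp "$tarball" "$host:$dest/"                  # copy it to the NAS
  run ssh "$host" docker load -i "$dest/$tarball"    # import it there
}
```

For example, running export DRY_RUN=1 and then ship_image myteam/geoserver:test admin@synology.local /volume1/docker prints the three commands without touching either machine. Unset DRY_RUN to run them for real.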

Curl Example in “Automated Slack Notifications” Doesn’t Work
In the “Automatic Slack Notifications” piece in the December issue’s UpFront section, the curl command to send a message to Slack doesn’t work (at least on my MacBook). It gives the following error:

curl: (3) [globbing] unmatched close brace/bracket in column 8

In fact, the data block should be enclosed in single quotes, not double quotes.
—Khoa Le

Shawn Powers replies: Double quotes and single quotes are often
the bane of my programming. Half the time I get errors like you
mention here, and the other half of the time I end up with output
that looks like, “Thank you $NAME, your contribution to $THING
was greatly appreciated!”


In this case, the script worked for me on Linux, and once it worked,
I didn’t try it elsewhere. I could have, as I use OS X as well, but
sadly, I didn’t. Thanks for pointing out the issue. Hopefully everyone
struggling will see this letter!
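Khoa’s error is consistent with a quoting problem. If the double quotes inside the JSON end the shell’s double-quoted string early, a fragment ending in } can become a separate argument, which curl then treats as a URL and tries to expand with its {} and [] globbing. The minimal sketch below shows a safer pattern, assuming a Slack incoming-webhook URL. It keeps the JSON inside a single-quoted printf format and lets printf insert the message, which also avoids the literal “$NAME” output Shawn mentions. SLACK_WEBHOOK_URL and both function names are placeholders, and nothing here escapes double quotes or backslashes inside the message.

```shell
#!/bin/sh
# Build the JSON body. The single-quoted format protects the braces and
# inner double quotes from the shell; %s receives the message text.
slack_payload() {
  printf '{"text": "%s"}' "$1"
}

# Post the message. SLACK_WEBHOOK_URL must hold your own webhook URL.
slack_notify() {
  curl -sS -X POST -H 'Content-type: application/json' \
       --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL"
}

# Example call (needs a real webhook URL):
# slack_notify "Backup finished on $(hostname)"
```

Because the message is passed as an argument rather than built into the format string, shell variables such as $(hostname) expand normally, and a % sign in the message is printed literally.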

WRITE LJ A LETTER
We love hearing from our readers. Please send us your comments and feedback via http://www.linuxjournal.com/contact.

PHOTOS
Send your Linux-related photos to [email protected], and we'll publish the best ones here.

LINUX JOURNAL on your Android device
Download the app now from the Google Play Store.
www.linuxjournal.com/android

For more information about advertising opportunities within Linux Journal iPhone, iPad and
Android apps, contact John Grogan at +1-713-344-1956 x2 or [email protected].



UPFRONT

UPFRONT NEWS + FUN


diff -u
What’s New in Kernel Development
Filesystem capabilities are supposed to be an improvement over simply
running something as the root user. The idea is that you identify the
specific special powers a program needs and then give it the ability to do
only those special powers. Unfortunately, capabilities have become very
complicated, with some individual capabilities being used to grant so
many special powers that they might as well just be the root user after all.
In particular, kernel developers who create new powers don’t always
know which capability a given power should belong to, so any given
capability can end up providing either too much or too little power to
the program.
Michael Kerrisk recently began an effort to document some basic
guidelines to help developers figure out which capability would best house
any particular new power. For example, “Don’t choose CAP_SYS_ADMIN
if you can possibly avoid it!” Apparently CAP_SYS_ADMIN has become
a huge dumping ground for powers of all sorts, falling prey to the
might-as-well-be-root syndrome.
Unfortunately, as Casey Schaufler pointed out, some POSIX history
led to poor decisions early on about how to organize
filesystem capabilities. For example:


Everyone involved was looking to use capabilities to meet B… least
privilege requirements in NSA security evaluations. Because those
evaluations were of security policy, by far the easiest thing to do was
to create a single capability for all the things that didn’t show up in the
security policy and declare that the people doing the evaluation didn’t
have to look over there.

Ultimately, my guess is that filesystem capabilities will have to be
replaced by some kind of non-POSIX solution that’s better thought out,
but what that might look like remains an open question.
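To make “specific special powers” a little more concrete: each capability is a single bit in a per-process mask, and on Linux the effective mask appears as a hex number on the CapEff line of /proc/PID/status. The sketch below tests one bit of such a mask, using the capability numbers defined in linux/capability.h (CAP_NET_BIND_SERVICE is 10 and CAP_SYS_ADMIN is 21). It shows why granularity matters: a power lumped into bit 21 can only be granted along with everything else that CAP_SYS_ADMIN covers. The helper name has_cap is an illustrative choice, not an existing tool.

```shell
#!/bin/sh
# Capability numbers from <linux/capability.h>
CAP_NET_BIND_SERVICE=10
CAP_SYS_ADMIN=21

# has_cap MASK NUM: succeed if bit NUM is set in hex MASK (no 0x prefix)
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Inspect this shell's own effective set (Linux only).
status_file="/proc/$$/status"
if [ -r "$status_file" ]; then
  mask=$(awk '/^CapEff:/ { print $2 }' "$status_file")
  if [ -z "$mask" ]; then
    echo "no CapEff line found"
  elif has_cap "$mask" "$CAP_SYS_ADMIN"; then
    echo "this shell has CAP_SYS_ADMIN in its effective set"
  else
    echo "this shell does not have CAP_SYS_ADMIN in its effective set"
  fi
fi
```

Run as an ordinary user, the script typically reports that CAP_SYS_ADMIN is absent. Run as root, it typically reports that the capability is present, because root’s effective set normally has every defined bit set.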
The fbdev drivers have been on the chopping block for quite a while
now, as the DRM framework has been trying to replace them for years.
But whenever anyone tries to get rid of those last straggling fbdev drivers,
Geert Uytterhoeven or someone else always asks the same question:
“Can DRM be used to create extremely simple display drivers?” And the
answer is always “Yes, absolutely. Not right now, but soon.” At which
point Geert or someone else vetoes the expurgation, and the process
begins again some months later.
This time, though, as Daniel Vetter noted, it turned out that DRM had indeed
advanced to the point of being able to produce those simple drivers right
now. The community had heard the objections, and the community had
answered. It took a while for this to become clear, partly
because Geert had gone through the rejection ritual so many
times in the past that it had become an ingrained habit.
But once the truth finally became clear, Geert withdrew his objections,
and now it looks as though the last few remaining fbdev drivers soon will
be history, or at least the remaining obstacles are no longer absolute
deal-breakers.
This has been a long time coming, and the DRM folks have had to
endure a lot of frustration in the process, so there was much ringing of
bells when the path was cleared at last.
One thing that’s no fun is when the CPU itself contains security holes.
It’s a real pain to discover that certain opcodes leak crucial information,
because it essentially means that those opcodes never should be used.
They’re just wasted transistors taking up space on the chip.


Paolo Bonzini recently wanted to disable several CPU instructions
for KVM, such as the SGDT and SIDT opcodes, which he said could
leak kernel addresses into userspace. Once leaked, those addresses
could be used to defeat certain security measures, such as address
space layout randomization.
So that’s happening. Meanwhile, folks like Liang Z Li have offered to
help lock down those issues.
Everyone wants to support USB type C, the new USB connector that
works right side up and upside down. And, Heikki Krogerus of Intel
recently posted some patches to support this. There was quite a bit of
discussion and review of the patch, and enough problems arose that Greg
Kroah-Hartman took a deeper look at the code. The patch turned out to
have so many deep structural problems that Greg insisted Heikki take the
code back to Intel and have the engineers there give it a thorough going-over
before Greg even would agree to look at it again.
So, that was harsh. Nobody likes to hear that their code is so bad
that the upstream maintainer won’t even look at future versions unless
the downstream maintainers stage some sort of intervention. But
that’s what happened.
Ultimately, USB type C support will be coming sooner rather than
later. There’s a lot of motivation to support it, given its popularity in
the real world. I would imagine that the Intel engineers already are
much closer to a proper patch. —Zack Brown



14th Annual
2017 HPC FOR WALL STREET – CLOUD & DATA CENTERS
Show & Conference
April 3, 2017 (Monday), Roosevelt Hotel, NYC
Madison Ave & 45th, next to Grand Central Station
Low-Cost Conference $295. Register Today. Save $100. Free Show.

The Next Big Game Changer.
New FinTech Technology Featured on April 3.
All-Star Conference program for 2017.
Plan to attend the largest FinTech meeting of Cloud, Data Centers, HPC, Big Data, Networks, AI and Machine Learning, Trading, Low Latency for the Capital Markets.

• The Next-Generation FinTech Economy
• HPC Innovation for the FSI Market
• Addressing the Scalability Challenges of Deep Learning Model Training
• Optimizing New Data Technology and Developing HPC as a Service
• The AI Revolution Comes to Wall Street
• New Applications for High Performance Networking
• Enabling Financial Institutions to Build and Deploy Artificial Intelligence Applications using an Open Source Container Platform
• A New Approach to Data Storage and new Data Storage Architecture
• Hybrid Cloud, Private Cloud, Public Cloud
• Innovative Servers, Networks and Storage Applications

Plan to attend low-cost conference at $295. Save $100.
Qualified End Users may register as our guests, but must confirm their title and responsibility.
Register for the free show online at: www.flaggmgmt.com/linux
Show Hours: Mon, Apr 3 8:00 - 4:00
Conference Hours: 8:30 - 4:50

Partial List of 2017 Speakers
Terry Roche, Principal, Head of FinTech Research, TABB Group
Donal Byrne, CEO, Corvil
Lacee McGee, Sr Prod Mgr, FSI Vertical Sols, HPE
Harvey Stein, Head of Credit Risk Modeling, Bloomberg
Dino Vitale, TD Securities
Moiz Kohari (Invited), SVP Ch Tech Arch, State Street London
Anthony Golia, FSI Chief Architect, Red Hat
David Rukshin, CTO, WorldQuant (invited)
Asaf Wachtel, Sr Dir, Business Dev, Mellanox Technologies
Roman Chwyl, Head of Fin Svcs, Google
Joseph George, VP Solutions Strategy, SUSE
Andy Steinbach, Sr Dir Global FSI Bus Dev, NVIDIA
Phil Filleul, Segment Dir, Fin Svcs, Cray Inc
Felix Candelario, Global Fin Svcs Sol Arch, Amazon Web Svcs (Invited)
Asif Alam, Global Bus Dir, Thomson Reuters
Natalia Vassilieva, Sr Research Manager, Hewlett Packard Labs
Ed Turkel, HPC Strategist, Dell EMC
Don Clegg, VP Mktg Bus Dev, BBA/IB CITP, Supermicro (Invited)
Pat McGinn, VP Prod Mktg, CoolIT Systems

2017 Sponsors
advanced hardware accelerators
PENGUIN COMPUTING

Visit: www.flaggmgmt.com/linux
Show & Conference: Flagg Management Inc, 353 Lexington Avenue, New York 10016
(212) 286 0333 fax: (212) 286 0086
[email protected]


UPFRONT

Android Candy: My World, in a Lock Screen

It feels weird to mention a Microsoft product in Linux Journal. But to be
honest, there are some cool things coming out of the Microsoft Garage
(http://www.microsoft.com/garage). One of those things is “Next Lock Screen”,
which is an Android app that brings interactive tools to the lock screen.
This concept isn’t revolutionary, but with Next Lock Screen, it’s done
very well. It’s possible to launch apps, interact with messages and get
customized notifications, all without unlocking your phone. Do you
prefer to have your calendar events on your lock screen? Done. Want
to control your music? Done. Again, nothing here is really new; it’s just
integrated and customizable in a way that takes a bunch of good ideas
and repackages them into a slick lock screen. You also can get the Bing
wallpaper on your lock screen, which is pretty cool, because honestly,
the Bing photo of the day is almost always incredible.
If you’re not afraid to try an app developed by Microsoft, I urge you
to check out Next Lock Screen. It makes a locked phone far more useful.
(I should add that enabling interaction on your lock screen does make it
far less secure, so be careful as to which features you enable.) Check it
out at the Google Play Store. —Shawn Powers


Non-Linux FOSS: File Spelunking with WinDirStat
With Linux, it’s fairly easy to find the large files on your system by doing
something like this:

du -ahx / | sort -rh | head -20

Unfortunately, Windows users don’t usually have equivalent tools.

(Image via https://windirstat.net)


That’s where something like WinDirStat comes into play. It’s a file
browser that uses incredible GUI elements to show you the files on
your system, with file size shown as rectangles. Big files are shown as
big rectangles, and their file types are specified by color. It’s a great
visual way to sort your filesystem and get rid of (or at least find)
extremely large files.
If you use Windows on a regular basis but seem to have a shrinking
hard drive, I urge you to download WinDirStat to get real-time
statistics on your filesystem. It’s open source and, of course, free to
download at https://windirstat.net. —Shawn Powers
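For comparison with the du pipeline at the top of this piece, here is a minimal Python sketch of the same idea that should also run on Windows. It is an illustration, not a WinDirStat replacement. The helper names largest_files() and human_size() are made up for this example. The sketch reports individual files only (du also totals up directories), and it imitates du's -x flag by not descending into other filesystems.

```python
import os
import stat


def human_size(n):
    """Format a byte count roughly the way `du -h` does (powers of 1024)."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024:
            return f"{n}B" if unit == "B" else f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}P"


def _on_device(path, dev):
    """True if `path` lives on the filesystem identified by `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def largest_files(root, count=20):
    """Return the `count` largest regular files under `root` as (size, path)."""
    root_dev = os.stat(root).st_dev
    found = []
    # os.walk silently skips directories it can't read (onerror is None).
    for dirpath, dirnames, filenames in os.walk(root):
        # Like `du -x`: don't descend into directories on other filesystems.
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: look at symlinks, don't follow them
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(st.st_mode):
                found.append((st.st_size, path))
    # Biggest first, like `sort -rh | head -20`.
    found.sort(reverse=True)
    return found[:count]
```

To use it, loop over largest_files("/") (or a drive root such as "C:\\" on Windows) and print human_size(size) next to each path.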


Is the Best Tea Ever Really Worth It?
I recently wrote about my “perfect tea” steeping device. It’s nothing more
than a plastic mug-shaped device that has a sieve built in for straining tea
leaves after the steeping process is complete. I use it every day. Really.
And compared to the tech pieces I normally write in Linux Journal, that
little UpFront blurb garnered me quite a few emails asking for details.
It also got me a few messages explaining that brewing tea in a plastic
cup was an unforgivable sin. One email, however, did make me think. I was
asked about my Breville One-Touch Tea Maker.
For Christmas (or a birthday) a few years back, my incredible wife bought
me a tea maker. The One-Touch treats making tea a bit like making coffee.
You put the tea leaves into a basket, and the brewing process is automatic.
The cool part is that it makes the water the perfect temperature and
steeps the leaves for the exact right time, depending on the type of tea
you’re making. And you know what? It makes the best tea I’ve ever had.
Seriously. It makes tea noticeably better than any other method I’ve used.
And yet, I rarely use it. Why?
It turns out that although the One-Touch isn’t exactly difficult to clean,
it does take some effort. That’s frustrating. The real gotcha, however, is
that I don’t drink the entire pot of tea quickly enough, and even though
the One-Touch keeps tea hot for an hour, I find myself heating cold tea in
the microwave. It is possible to make a smaller batch of tea, but if you’re
going to go through the hassle of brewing a pot of tea, why wouldn’t you
make a full pot? So most of the time, I use my One-Touch pot to heat water
and then brew tea in the plastic steeper. It’s crazy.
My Breville One-Touch has been instrumental in my thinking when it comes
to tech purchases, and I wish I’d learned its lesson sooner. My PlayStation
4 Pro? I actually find an older PS model is just as fun and a fraction of
the price. My rackmount Xeon ESXi server? The few Raspberry Pi servers I
have are actually more useful and flexible. Heck, even my fancy new pickup
isn’t half as fun as my old Volkswagen Beetle.
So what did my tea maker teach me? Marketing and popularity aren’t what
make things great. It’s a lesson I should have learned years ago, because
Linux is free, and yet it’s the operating system that brings me the most
joy. —Shawn Powers

THEY SAID IT

There is only one success—to be able to spend your life in your own way.
—Christopher Morley

When a thought is too weak to be expressed simply, it should be rejected.
—Marquis de Vauvenargues

Do something. If it doesn’t work, do something else. No idea is too crazy.
—Jim Hightower

The most profound statements are often said in silence.
—Lynn Johnston

The most potent muse of all is our own inner child.
—Stephen Nachmanovitch


Jmol: Viewing Molecules with Java
Let’s dig back into some chemistry software to see what kind of work
you can do on your Linux machine. Specifically, let’s look at Jmol, a
Java application that is available as both a desktop application and
a web-based applet (http://jmol.sourceforge.net).
You can use Jmol to help analyze the results you get from other
software packages that actually calculate the chemical effects you
are researching. It can read in dozens of different file formats, and
you can use it to visualize everything from small molecules to huge
macromolecules, like proteins. You also can visualize crystals and
orbitals. You even can visualize animated events, such as chemical
reactions and molecular vibrations.
Most Linux distributions should have Jmol available within their
package management repositories. For example, you can install it on
Debian-based distributions with this command:

sudo apt-get install jmol

Figure 1. When you first start Jmol, you get a blank workspace ready for your work.

If you want to use the latest and greatest version, download it from the
main project website. The download comes as a simple zip file containing
everything you need to run Jmol. You also will need to install a Java
virtual machine in order to run Jmol.
If you installed Jmol from the package manager, you probably will have
a script available that will make running Jmol easier. If you install it from
the binary zip file, you will need to run it manually by calling Java and
using the JAR file as a command-line option.
When you first start Jmol, you’ll see a blank screen ready for input.
Across the top is a series of icons allowing for easy access to the key
functions available within Jmol. If you already have data files to analyze,
you can use them. Otherwise, you may need some sample files in order to
play with the functionality available.
The binary distribution doesn’t include any sample files in order to save
on download bandwidth; however, several sample data files are available
from the main website. You can get the entire set by downloading a
snapshot of the source files. In the examples for the rest of this article,
I’m using several of the sample data files available from the source
snapshot download.
The simplest example is just to load a data file and see what it
looks like. Figure 2 shows what you get when you load the sample
Gaussian output file for phenylnitrine, found in the gaussian directory
of the Jmol data files.

Figure 2. The basic display you get when you load a molecule is a ball and stick display.

The data display is an interactive one. Using your mouse, you can click
and drag the molecule to rotate it around to see all of the details.
The Display menu item provides a number of options to play with the
molecule. The Atom menu item allows you to change how much of the
van der Waals force field to show. The Bond menu item lets you set how thick
to make the bonds between atoms. With these two options, you can tailor
the display so that the appropriate amount of detail is shown. The Label
menu item allows you to add either symbols, names or atomic numbers to
the atoms within the molecule.
Near the bottom of the Display menu, there is a check box that controls
whether hydrogen atoms are shown in the display of the molecule.
While I’m talking about how to affect the display of the molecule, I
should mention that the View menu item provides a number of presets
on how to line up the molecule. So, with a single click, you can view the
molecule along any of its axes.
Jmol also can display animations of events, along with static images.
The animations subdirectory contains several examples that you can
play with. When you load one, you start with a static image of the
molecule, as before.

Figure 3. When you load an animation, it starts with a static image of your molecule.

Within the icon bar at the top of the window, there is a series of
buttons at the far right-hand side that allow you to step through the
animation frame by frame. If you want to see the full
animation, there’s a set of options under the Tools→Animate menu item.
Here, you can go through the animation once, or you can put it on a loop.
You even can use a mode called Palindrome that goes forward
through the animation and then backward. That way, you need to
calculate only one half of the motion, yet you still are able to visualize
the entire range of the motion.
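To make the Palindrome idea concrete, here is a small Python sketch of the playback order it implies: each frame is computed once, then shown forward and backward. It illustrates the concept only. palindrome_frames() is a hypothetical helper, and the sketch doesn’t claim to match exactly how Jmol handles the end frames or timing.

```python
def palindrome_frames(n_frames, cycles=1):
    """Yield frame indices forward, then backward, `cycles` times.

    Only n_frames distinct frames need to exist (for example, half of a
    vibration), but playback covers the motion out and back again.
    The first and last frames are not repeated at the turnaround points.
    """
    if n_frames < 1:
        return
    if n_frames == 1:
        for _ in range(cycles):
            yield 0
        return
    forward = list(range(n_frames))
    backward = list(range(n_frames - 2, 0, -1))  # skip both end frames
    for _ in range(cycles):
        yield from forward
        yield from backward
```

If you loop this sequence, the molecule moves smoothly back and forth without jumping from the last frame to the first.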
Several more analysis tools are available. Clicking the
Tools→Spectra→JSpecView menu item pops up a new window. Under
the File menu item, you’ll find options to add extra files or do 1H or 13C
simulations.

Figure 4. JSpecView is an extra tool available for looking at the spectra of molecules.

You can select Tools→Measurements to measure the distance
between atoms within your molecule, and you can set the units used
for those measurements with the Tools→Distance Units menu item. You
actually can edit the molecule after it is loaded, too.
If you click the icon button with the hover-over description “Open the
model kit”, you’ll get a small set of drop-down items on the top left side
of the display window. They allow you to delete atoms, move bonds around
or even change the atom species at specific locations.
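The distance that Tools→Measurements reports is ordinary 3D geometry. The following Python sketch computes it from two atoms’ Cartesian coordinates and converts the result between a few common units. It assumes the coordinates are in angstroms, as in most structure files. The unit table is illustrative and is not necessarily the list that Jmol’s Distance Units menu offers.

```python
import math

# Multiply a length in angstroms by these factors to convert it.
ANGSTROM_TO = {
    "angstrom": 1.0,
    "nm": 0.1,                    # 1 nm = 10 angstroms
    "pm": 100.0,                  # 1 angstrom = 100 pm
    "bohr": 1 / 0.529177210903,   # Bohr radius (CODATA 2018) in angstroms
}


def atom_distance(a, b, unit="angstrom"):
    """Euclidean distance between two (x, y, z) positions given in angstroms."""
    # math.dist computes sqrt(sum((p - q) ** 2)) over the coordinates.
    return math.dist(a, b) * ANGSTROM_TO[unit]


# Hypothetical coordinates for two bonded atoms, in angstroms.
carbon = (0.0, 0.0, 0.0)
hydrogen = (0.63, 0.63, 0.63)
for unit in ANGSTROM_TO:
    print(f"{atom_distance(carbon, hydrogen, unit):.3f} {unit}")
```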
If you have some type of analysis that you need to repeat several
times, Jmol supports the addition of macros. Macros are just simple
text files that contain a set of Jmol instructions. If you save them in
the ~/.jmol/macros directory, Jmol will pick them up and provide them
within the Macros menu item.
The language for the macros is the same one used for Jmol’s scripting
capabilities. This scripting language is based on RasMol, with
some minor changes. There is a full language reference available
at http://chemapps.stolaf.edu/jmol/docs.

Figure 5. Jmol also lets you edit molecules.

You also can use scripts interactively by clicking the File→Script
Editor menu item. This pops up a new window where you can write
your script, check its syntax and then run it within Jmol. This provides
a huge amount of power, allowing you to get the exact type of
analysis you need.

Figure 6. Jmol provides a full scripting language to help automate your analysis steps.

Once you’ve done your analysis, several output options are
available. The File→Export menu item gives you four options. You
can select Export Image to save a static image in one of several image
file formats.
Because Jmol also operates as a Java applet, you can select Export
to Web Page to generate a web page that you then can use within
your own website to share your research results.
If you want a higher-resolution image of a molecule, you can select
Render in POV-Ray to use the POV-Ray external program to render a
high-quality 3D image.
The last export option is Write State, which saves the current
workspace so that you can reload it later and continue your analysis.
There also is an extra output option under Tools→Gaussian that
pops up another window. Here you can set several options for a
Gaussian input file that you then can use to run further simulations
of your molecule.

Figure 7. You can use Jmol to generate Gaussian input files based on your analysis.

With these tools, you easily can share your research results with
others and build on the work you are doing. —Joey Bernard

EDITORS’ CHOICE

Read a Book in the Blink of an Eye!
I love reading. Sadly, the 24 hours I get per day seem to be inadequate
for the tasks I need to accomplish. That might change as my teenagers
turn into college kids and then start families of their own. For
now, however, between drama class and basketball practice, it seems
like it takes about 30 hours to accomplish a 24-hour day. Needless to say,
I don’t read as many books as I’d like.


Normally, I take advantage of commute time to listen to audiobooks.
That actually works quite well, and I’m able to read a good number of books a year.
Most of those books are fiction, but still, I’m grateful for audiobooks.
Not long ago, I discovered a different sort of audiobook. I honestly
have mixed feelings about the concept, but imagine if Cliff Notes and
audiobooks had a baby. That baby might be called “Blinkist”.
Blinkist is a company that condenses books into very short
summaries. They are available via text for Kindle and the like, but
for me, the professionally narrated audio versions are really what
work best. Rather than reading an audiobook over the course of
a week, I can “read” a book on the way to the grocery store. I’m
shocked to admit the summaries of entire books are surprisingly
useful. For many books, the summary from Blinkist is enough. For
some, the “blinks” make me want to read the entire book. That
means although it’s not a full replacement for reading, it adds
value and knowledge to my life.
There is a three-day free trial that allows you to read as many books
as you like. I urge you to give it a try. After the three days, you can
either default to the free account, which allows you to listen to or read
one pre-chosen free book a day, or opt for a paid subscription. There are
yearly plans both for text-only “blinks” and for unlimited text and audio
“blinks”. Thankfully, three days is enough time to figure out if it’s
something you find worth buying.
Thanks to its cool way of fitting more information into our over-busy
lives, and its handy mobile app for “blinking” on the go, Blinkist gets
the Editors’ Choice award this month. If nothing else, check out the
free trial at http://www.blinkist.com. You can read a surprising number
of books in three free days. —Shawn Powers



AT THE FORGE

Unsupervised Learning

In this article, Reuven moves from supervised learning to unsupervised learning, where
you ask the computer to tell you something interesting about the data.

REUVEN M. LERNER

Reuven M. Lerner, a longtime Web developer, offers training and consulting services in
Python, Git, PostgreSQL and data science. He has written two programming ebooks
(Practice Makes Python and Practice Makes Regexp) and publishes a free weekly newsletter
for programmers, at http://lerner.co.il/newsletter. Reuven tweets at @reuvenmlerner and
lives in Modi’in, Israel, with his wife and three children.

IN MY LAST FEW ARTICLES, I’ve looked into machine learning and how you can build a model that
describes the world in some way. All of the examples I
looked at were of “supervised learning”, meaning that
you loaded data that already had been categorized or
classified in some way and then created a model that
“learned” the ways the inputs mapped to the outputs.
With a good model, you then were able to predict the
output for a new set of inputs.
Supervised learning is a very useful technique
and is quite widespread. But there is another set of
techniques in machine learning known as unsupervised
learning. These techniques, broadly speaking, ask
the computer to find the hidden structure in the
data; in other words, to “learn” what the meaning of the data is, what
relationships it contains, which features are of importance and which
data records should be considered to be outliers or anomalies.
Unsupervised learning also can be used for what’s known as
“dimensionality reduction”, in which the model functions as a
preprocessing step, reducing the number of features in order to simplify
the inputs that you’ll hand to another model.
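As a quick illustration of dimensionality reduction, here is a sketch that uses principal component analysis (PCA). PCA is one common unsupervised technique for this job, and scikit-learn provides it as sklearn.decomposition.PCA. The sketch squeezes the four iris measurements (the dataset used later in this article) down to two new features. Both the choice of PCA and the choice of two components are mine, for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data          # 150 flowers x 4 measurements

# Find the two directions along which the measurements vary the most,
# and re-express every flower in terms of those two directions.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (150, 2)
print(pca.explained_variance_ratio_)    # share of the variance each new feature keeps
```

You could then hand X_reduced to another model in place of the original four columns.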
In other words, in supervised learning, you teach the computer about
your data and hope that it understands the relationships and categorization
well enough to successfully categorize data it hasn’t seen before.
In unsupervised learning, by contrast, you’re asking the computer to tell
you something interesting about the data.
This month, I take an initial look at the world of unsupervised learning.
Can a computer categorize data as well as a human? How can you use
Python’s scikit-learn to create such models?

Unsupervised Learning
There’s a children’s card game called Set that is a useful way to think
about machine learning. Each card in the game contains a picture. The
picture contains one, two or three shapes. There are several different
shapes, and each shape has a color and a fill pattern. In the game,
players are supposed to identify groups of three cards using any
one of those properties. Thus, you could create a group based on the
color green, in which all cards are green in color (but contain different
numbers of shapes, shapes and fill patterns). You could create a group
based on the number of shapes, in which every card has two shapes, but
those shapes can be of any color, any shape and any fill pattern.
The idea behind the game is that players can create a variety of different
groups and should take advantage of this in order to win the game.
I often think of unsupervised learning as asking the computer to play
a game of Set. You give the computer a data set and ask it to divide that
large bunch of data into separate categories. The model may choose any
feature or set of features, and that might or might not be a feature
that humans would consider to be important. But it will find those
connections, or at least try to do so.
One of the most common datasets for machine learning beginners
is the “iris” dataset, containing 150 flowers (50 from each of three
types of irises). Several months ago, I showed how you could create a
supervised model to identify irises. In other words, you could create
and train a model that would categorize irises accurately based on
their petal and sepal sizes.
Can unsupervised learning achieve the same goal? That is, can you
create a model that will divide the flowers into three different groups,
doing the same job (or close to it) that humans have done?
Another way of asking this question is whether the way in which
biologists distinguish between varieties of flowers is supported by the
underlying measurement data.
Let’s load the iris data and then start to create an unsupervised
model. Assuming that I’m working within the Jupyter notebook, I can
execute the following:

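%pylab inline
# (%pylab is IPython/Jupyter magic, not plain Python: it loads NumPy and
# Matplotlib into the notebook and turns on inline plotting.)
import pandas as pd
from pandas import DataFrame, Series

from sklearn.datasets import load_iris
iris = load_iris()

# One row per flower, one column per measurement...
df = DataFrame(iris.data, columns=iris.feature_names)
# ...plus the species that people assigned (0, 1 or 2).
df['response'] = iris.target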

In other words, I’ve created a Pandas data frame containing five
columns: the four features and also the response (that is, the
classification). You won’t be passing the classification to the model,
although that might improve the model’s ability to classify the flowers,
but it’s convenient to keep everything together in this way.

Creating a Model
Once you’ve loaded the data, it’s time to create a model. You’re looking
to do what’s known as “clustering”, which means that the computer will
divide the data set into categories or clusters.
So now what? In supervised learning, you would create a new model
from a classifier and then train it using scikit-learn’s “fit” method. You
then could give the trained model one or more data points and ask it to
categorize those based on the model.
In unsupervised learning, it’s a bit trickier; after all, you’re asking the
computer to do the categorization. If you don’t have any pre-labeled
categories, it’s going to be hard to know whether your model is useful,
accurate or both.
But before getting into the evaluation, let’s build a model. Sklearn
comes with a number of algorithms that handle clustering. One popular
algorithm is known as “K-means”. In K-means clustering, the idea is that
the model puts each data point inside the cluster whose mean (its center)
is the closest. Thus, if there are three clusters, each cluster will contain
the points that are closer to its center than to either of the other two
centers. The “inertia” is a measurement of how coherent the groups are,
that is, how closely the elements that have been grouped together fit: it’s
the sum of the squared distances from each point to its cluster’s center.
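To see what “closest mean” and “inertia” look like in practice, this Python sketch fits scikit-learn’s KMeans on the iris measurements (reloaded here as a plain NumPy array) and then recomputes both by hand with NumPy. The random_state and n_init values are arbitrary choices that make the run repeatable; they aren’t part of the article’s example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data  # 150 flowers x 4 measurements

k = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Distance from every flower to every cluster center: shape (150, 3).
dists = np.linalg.norm(X[:, np.newaxis, :] - k.cluster_centers_[np.newaxis, :, :], axis=2)

# Each flower goes to the cluster whose center (mean) is closest...
nearest = dists.argmin(axis=1)

# ...and inertia is the sum of squared distances to those centers.
inertia = np.sum(dists[np.arange(len(X)), nearest] ** 2)

print((nearest == k.labels_).all())   # True if the labels follow the nearest-center rule
print(inertia, k.inertia_)            # hand-computed vs. fitted value
```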
I should note that because K-means uses distances to calculate how
to compose a group, you likely will want all of your features to be
on the same scale. In the case of the flowers, all are within the same
order of magnitude. But you can imagine that if three measurements
are on a small scale and a fourth is on a scale of millions, the distances
would be dominated by that fourth feature, and the
calculations might not work out as well. For this reason, it can be a
good idea to use a scaler (several of which come with sklearn) to put
all of your data onto the same scale. Such scaling is often important
when creating models; it helps the calculations to identify two or more
items as being close by.
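Here is one way to combine a scaler with K-means in scikit-learn, offered as a sketch. StandardScaler rescales each feature to mean 0 and standard deviation 1. make_pipeline() chains it with the clustering step, so the same scaling is applied whenever the pipeline is fit or used to predict. StandardScaler is just one of the available scalers. For the iris measurements, which are already on similar scales, whether scaling helps is something you’d have to check.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Step 1 rescales every column; step 2 clusters the rescaled data.
scaled_kmeans = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = scaled_kmeans.fit_predict(X)

scaler = scaled_kmeans[0]            # the fitted StandardScaler
print(scaler.mean_, scaler.scale_)   # per-feature mean and std used for scaling
print(labels[:10])                   # cluster assigned to the first ten flowers
```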
So, using Python’s scikit-learn, you can say:

from sklearn.cluster import KMeans

k = KMeans(n_clusters=3)

The above code indicates that you’re going to use the K-means
algorithm. You create a new model, indicating when you do so that you
want three groups.
Now, right away you might be asking yourself how to know that there
will be three categories, and the cop-out answer is that you guess. You
can try different values for n_clusters and evaluate the model to see
how well it does. But in many cases, you’ll have to experiment a bit.
Let’s now run K-means on the data. The X (that is, input matrix) is going
to be the data frame minus the “response” column. You can create that
as follows:

X = df.drop('response', axis=1)

With supervised learning, the “fit” method is the process in which you
teach the model to make associations between the input matrix X and the
output vector y. In unsupervised learning, you’re asking the model itself to
make such divisions and to create an output vector. You do this with “fit”:

k.fit(X)

Evaluating the Model


The first question you’ll ask the model is “How did it divide up the
flowers?” You know that the irises should be divided into three different
groups, each with 50 flowers. How did K-means do?
You can ask the model itself, using a variety of attributes. The attributes
that the model learns end with an underscore (_), indicating that they are
set by training and may change if the model is trained again.
And indeed, this is an important point to make. When you invoke the
“fit” method, you are teaching the model from scratch. However, there
are times when you have so much data, you cannot reasonably teach the
model all at once. For such cases, you might want to try an algorithm that
supports the “partial_fit” method, which allows you to grab inputs a little
bit at a time, teaching the model iteratively. However, not all algorithms
support partial_fit; a large number of data points might force your hand
and reduce the number of algorithms from which you can choose.
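scikit-learn’s KMeans class, used in this article, has no partial_fit method. The library does include a related estimator, MiniBatchKMeans, that offers one. The sketch below feeds it the iris data in five chunks to imitate data arriving piece by piece. The chunk count and random seeds are arbitrary, and on a dataset this small there’s no practical need to work this way.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Shuffle first: the iris rows are sorted by species, so unshuffled chunks
# would each show the model only one kind of flower.
rng = np.random.default_rng(0)
X_shuffled = X[rng.permutation(len(X))]

model = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)

# Teach the model one chunk (30 rows) at a time.
for chunk in np.array_split(X_shuffled, 5):
    model.partial_fit(chunk)

print(model.cluster_centers_)   # three centers, one value per feature
print(model.predict(X[:5]))     # clusters for the first five flowers
```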
For this example, which uses scikit-learn’s KMeans class, you cannot teach the
model incrementally. Let’s ask the model for its measure of inertia:

k.inertia_

Again, notice the trailing underscore. The inertia value isn’t on a fixed
scale; the general sense is that the lower the inertia score, the better,
with zero being the best.
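Keep one caveat in mind when comparing runs. K-means inertia tends to drop as you add clusters, and in the extreme, giving every flower its own cluster would score zero. So the lowest inertia alone won’t tell you the best number of groups. A commonly used complement is the silhouette score, which scikit-learn provides as sklearn.metrics.silhouette_score. It ranges from -1 to 1, and higher values mean points sit closer to their own cluster than to neighboring ones. The following sketch prints both scores for several cluster counts. The range of 2 to 10 and the random_state are arbitrary choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X = load_iris().data

results = []
for n in range(2, 11):
    model = KMeans(n_clusters=n, n_init=10, random_state=0).fit(X)
    # Silhouette compares each point's distance to its own cluster
    # with its distance to the nearest other cluster.
    results.append((n, model.inertia_, silhouette_score(X, model.labels_)))

for n, inertia, silhouette in results:
    print(f"n_clusters={n:2d}  inertia={inertia:8.2f}  silhouette={silhouette:.3f}")
```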


What if I were to divide the flowers into only two groups, or four
groups? Using scikit-learn, I can do that pretty quickly and determine
whether the computer thinks the manual classification into three groups
was a good choice:

output = []
for i in range(2, 20):
    model = KMeans(n_clusters=i)
    model.fit(X)
    output.append((i, model.inertia_))
kmeans = DataFrame(output, columns=['i', 'inertia'])

Now, it might seem ridiculous to group 150 flowers into up to 19 different
groups. And indeed, the lowest inertia value that I get is when I set
n_clusters=19, with the inertia rising as the number of groups goes down.
Perhaps this means that every flower is unique and cannot be
categorized. Perhaps. But it seems more likely that our data isn’t
appropriate for K-means. Maybe it’s the wrong shape. Maybe its values
aren’t varied enough. And indeed, when you look at the way in which
the flowers were clustered for n_clusters=3, you see that the clustering
was quite different from what people came up with. I can turn the
automatically labeled flowers into a Pandas Series and then count how
many of each flower was found:

Series(k.labels_).value_counts()

I get:

2    62
1    50
0    38

Well, it could be worse, but it also could be much better. Perhaps you
can (and should) try another algorithm and see if it’s better able to group
the flowers together.
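Counting cluster sizes hides one detail: the cluster numbers that K-means assigns are arbitrary, so cluster 0 needn’t correspond to species 0. A cross-tabulation of the human labels against the cluster labels shows where each species actually ended up. The sketch below builds one with pandas.crosstab. The model is refit here with a fixed random_state (my choice) so the table is repeatable, which means your cluster numbering may differ from the article’s run.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
k = KMeans(n_clusters=3, n_init=10, random_state=0).fit(iris.data)

# Rows: species assigned by people. Columns: cluster chosen by K-means.
table = pd.crosstab(
    pd.Series(iris.target_names[iris.target], name="species"),
    pd.Series(k.labels_, name="cluster"),
)
print(table)
```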


I should note that this now falls under the category of “semi-supervised
learning”, that is, trying to see whether an unsupervised technique can
achieve the same results as (or at least similar results to) a previously used
supervised technique.
In such a case, you can evaluate your model using not just statistical
tests, but also one of the techniques I described in my previous articles on
supervised learning, namely train-test split. You use unsupervised learning
on a portion of the input data and then predict on the remaining part.
Comparing the model’s outputs with the expected outputs for that subset
can help you evaluate and tune your model.
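Here’s a minimal sketch of that train-test idea with scikit-learn. It fits a clustering model on 70% of the flowers, predicts clusters for the held-out 30% and compares those with the human labels. It uses GaussianMixture, the algorithm introduced in the next section. The comparison is scored with the adjusted Rand index (sklearn.metrics.adjusted_rand_score), which doesn’t care how the clusters are numbered. The split size, the seeds and the choice of metric are my assumptions, not the article’s.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Fit on the training portion only; the labels are never shown to the model.
model = GaussianMixture(n_components=3, random_state=0).fit(X_train)

# Predict clusters for flowers the model hasn't seen, then compare them with
# the human labels. ARI is 1.0 for a perfect match and near 0.0 for random labels.
score = adjusted_rand_score(y_test, model.predict(X_test))
print(f"Adjusted Rand index on held-out flowers: {score:.3f}")
```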

A Different Algorithm
But in this case, let’s try using a different model to achieve a different
result, simply to see how easily sklearn allows you to try different models.
One common choice in unsupervised learning is Gaussian Mixture, known
in previous versions of scikit-learn as GMM. Let’s use it:

from sklearn.mixture import GaussianMixture

model = GaussianMixture(n_components=3)
model.fit(X)

Now, let’s have the model predict with the data used to train it, which
will return a NumPy array with the categories:

model.predict(X)

How did that do? Let’s pop this data into a Pandas Series object and
then count the values:

Series(model.predict(X)).value_counts()

And sure enough, the results:

2    55
1    50
0    45


This is still imperfect (assuming that the human classification counts
as “perfect”), but it’s clearly better than the attempts with K-means. And
because this is semi-supervised learning here, in which you have some
of the original scores, you can use some of sklearn’s metrics to find how
good (or bad) the model is:

from sklearn import metrics

labels_true = iris.target
labels_pred = model.predict(X)

Let’s find out how well it did:

metrics.homogeneity_score(labels_true, labels_pred)
0.89832636726027748

metrics.completeness_score(labels_true, labels_pred)
0.90106489086402064

Hey, pretty good! Not perfect (that is, 1.0), but not bad at all. And if
you compare this against the K-means model:

labels_pred = k.labels_
metrics.homogeneity_score(labels_true, labels_pred)
0.75148540219883375

metrics.completeness_score(labels_true, labels_pred)
0.76498615144898152

In other words, my intuition was right. The GaussianMixture model was
better at clustering the flowers than the K-means model.
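If you’d like a single number per model, scikit-learn also offers v_measure_score (the harmonic mean of homogeneity and completeness) and adjusted_rand_score. The sketch below fits both algorithms and scores them side by side. The seeds are fixed only for repeatability. Both algorithms depend on their random initialization, so it’s worth trying a few seeds before drawing firm conclusions.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

iris = load_iris()
X, labels_true = iris.data, iris.target

models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "GaussianMixture": GaussianMixture(n_components=3, random_state=0),
}

scores = {}
for name, model in models.items():
    labels_pred = model.fit_predict(X)
    scores[name] = {
        # V-measure: harmonic mean of homogeneity and completeness.
        "v_measure": metrics.v_measure_score(labels_true, labels_pred),
        # Adjusted Rand index: 1.0 is a perfect match, ~0.0 is chance level.
        "adjusted_rand": metrics.adjusted_rand_score(labels_true, labels_pred),
    }

for name, s in scores.items():
    print(f"{name:16s} v_measure={s['v_measure']:.3f}  ARI={s['adjusted_rand']:.3f}")
```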

Conclusion
In many ways, unsupervised learning is the true magic and potential in
the machine learning world. By using computers to identify patterns
and groups in your data more quickly and accurately than you could do
yourself, you can start to identify and predict all sorts of things. As with
supervised learning, though, unsupervised learning requires that you try
a variety of models, compare them against one another and understand
that each model has its own advantages, disadvantages and biases.
The world of data science in general, and machine learning in particular,
continues to grow at an extremely rapid rate, with new ideas, techniques
and tutorials available all of the time. The Resources section here
describes several places where you can learn more and start your journey
in this set of concepts and technologies.

RESOURCES
I used Python (http://python.org) and the many parts of the SciPy stack (NumPy, SciPy, Pandas,
Matplotlib and scikit-learn) in this article. All are available from PyPI (http://PyPI.python.org) or
from https://www.scipy.org.

I recommend a number of resources for people interested in data science and machine learning.

One long-standing weekly email list is “KDNuggets” at http://www.kdnuggets.com. You also
should consider the “Data Science Weekly” newsletter (https://www.datascienceweekly.org)
and “This Week in Data” (https://datarepublicblog.com/category/this-week-in-data), describing
the latest data sets available to the public.

I am a big fan of podcasts, and I particularly love “Partially Derivative”. Other good ones
are “Data Stories” and “Linear Digressions”. I listen to all three on a regular basis and
learn from them all.

If you’re looking to get into data science and machine learning, I recommend Kevin
Markham’s Data School (http://dataschool.org) and Jason Brownlee’s “Machine Learning
Mastery” (http://machinelearningmastery.com), where he sells a number of short and dense,
but high-quality ebooks on these subjects.

Send comments or feedback via http://www.linuxjournal.com/contact
or to [email protected].

WORK THE SHELL

Image Manipulation with ImageMagick

Dave switches gears this month and begins delving into the more
functional topic of image manipulation.

DAVE TAYLOR

Dave Taylor has been hacking shell scripts on UNIX and Linux systems for
a really long time. He’s the author of Learning Unix for Mac OS X and
Wicked Cool Shell Scripts. You can find him on Twitter as @DaveTaylor,
or reach him through his tech Q&A site: http://www.AskDaveTaylor.com.

In my last article, I had some fun looking at the children’s game of
Rock, Paper, Scissors, writing a simple simulator and finding out that
some strategies are better than others. Yes, I used “strategy” and
“rock paper scissors” in the same sentence!
So for this article, I thought it would be interesting to delve into
something more functional and pragmatic: image manipulation. Ordinary
shell scripts don’t tend to do much with images, because you can’t
display anything from the command line.

But let’s be honest here. The chance that you’re running Linux (or a
similar command-line interface) raw on a computer terminal is pretty
darn low. More likely, you’ve got a terminal window open on your X Window
System desktop, or, as I often do, you’re running a command-line interface
app within a modern OS like Mac OS X. And this means, yes, you do have
the ability to display graphics, just not within the terminal app itself.

Get Yourself a Copy of ImageMagick


The first step is to download and install a copy of the ImageMagick suite
of graphics-related commands. You already might have it installed if
you’re lucky: just type convert -version, and if you have it installed,
you’ll see something similar to this:

$ convert -version
Version: ImageMagick 6.9.6-6 Q16 x86_64 2016-12-31 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2016 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC Modules
Delegates (built-in): bzlib djvu fftw fontconfig freetype gslib jbig jng jp2 jpeg lcms ltdl lzma openexr png ps tiff webp x xml zlib

If you don’t have it installed, it can be quite a task to get it all up and
running. Everything lives at http://www.imagemagick.org, which is where
you want to get started.
On a Linux system, you can use the package manager of choice for your
distro. You can grab a compressed tar image from the site, or you can use
rpm, like this:

rpm -Uvh ImageMagick-7.0.4-1.x86_64.rpm

Of course, there’s a bit more to it, but that’ll get you started.


On a Mac, you’ll want to start by installing MacPorts (http://www.macports.org),
which you can’t do until you install Xcode (free from Apple; get it through
the App Store). Once you’ve installed Xcode and MacPorts, you can install
ImageMagick, and you’re good to go.
You know you’re good to go when the test command convert -version
returns something meaningful. As always, when you install new software,
you’ll want to log out and log in again so that your PATH changes and
the shell’s command hash pick up all the newest programs.
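
If a script should make that same check before calling ImageMagick, here is a minimal Python sketch. It assumes only that the output of convert -version starts with a line like the one shown above; imagemagick_version() and installed_version() are hypothetical helper names, and shutil.which() searches the PATH the script inherited, so a session started before the install may still not find convert:

```python
import re
import shutil
import subprocess


def imagemagick_version(version_text):
    """Extract the version (e.g. '6.9.6-6') from `convert -version` output."""
    match = re.search(r"Version:\s+ImageMagick\s+(\S+)", version_text)
    return match.group(1) if match else None


def installed_version():
    """Return the ImageMagick version behind `convert`, or None if it's missing."""
    if shutil.which("convert") is None:  # not on this process's PATH
        return None
    result = subprocess.run(["convert", "-version"],
                            capture_output=True, text=True)
    return imagemagick_version(result.stdout)


sample = "Version: ImageMagick 6.9.6-6 Q16 x86_64 2016-12-31"
print(imagemagick_version(sample))  # 6.9.6-6
```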

Converting Image Formats


One of the most useful tasks ImageMagick can help you with is
converting image file formats. It’s a remarkably well built suite of
programs and can read or write more than 200 different formats. Don’t
believe me? Try this command:

convert -list format | more

Among the most common formats that you’ll actually encounter in your
day-to-day computer usage are the following:

• BMP: MS Windows bitmapped image.

• GIF: Graphics Interchange Format.

• JPG: JPEG image format.

• PNG: Portable Network Graphics format.

• TIFF: Tagged Image File Format.

ImageMagick knows oodles of other formats too, including all the major
video formats (MKV, MP4, AVI, MOV). It also can convert things like EPSF
(Encapsulated PostScript) and even PDF (Portable Document Format),
which can be useful in specific instances.
Armed with that knowledge, conversion between image file formats
is really ridiculously simple. Let’s say you want to convert an image from
JPEG to PNG. It’s as simple as:

convert image.jpeg image.png

Since the ImageMagick utilities are glob-aware (that is, you can use wildcards
and specify multiple filenames), you also can convert a group of GIF images
to JPG with the convert command or, more easily, with its cousin mogrify:

mogrify -format jpg *.gif

Let’s give it a whirl with a folder that contains a handful of GIF images,
using ls to show the folder contents before and after the mogrification
(is that a word?):

$ ls -s
total 272
  8 add-to-google-reader.gif   24 blogger-2.gif
  8 add-to-newsgator.gif        8 blogger-3.gif
 16 aw-logo.gif                 8 dave.gif
 24 blogger-1.gif             176 manga.gif
$ mogrify -format jpg *gif
$ ls -s
total 752
  8 add-to-google-reader.gif   24 blogger-2.gif
  8 add-to-google-reader.jpg  128 blogger-2.jpg
  8 add-to-newsgator.gif        8 blogger-3.gif
  8 add-to-newsgator.jpg       24 blogger-3.jpg
 16 aw-logo.gif                 8 dave.gif
 24 aw-logo.jpg                 8 dave.jpg
 24 blogger-1.gif             176 manga.gif
112 blogger-1.jpg             168 manga.jpg

Simple enough: use convert for individual images and mogrify for
bulk conversions. It’d be an easy script to differentiate between these
two cases and invoke the correct command with the correct arguments
too. I’ll leave that up to you.
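
If you’d like a head start, here is one way to sketch that decision in Python. conversion_command() is a hypothetical helper that only builds the argument list (convert for a single file, mogrify -format for several), so you can inspect it before handing it to subprocess.run(); passing a list rather than a shell string sidesteps quoting problems with odd filenames:

```python
import os


def conversion_command(files, new_format):
    """Build an ImageMagick argv: convert for one file, mogrify for many.

    Only builds the command; pass the result to subprocess.run() to execute it.
    """
    if not files:
        raise ValueError("no input files given")
    if len(files) == 1:
        # convert needs an explicit output name: same basename, new extension.
        root, _ext = os.path.splitext(files[0])
        return ["convert", files[0], f"{root}.{new_format}"]
    # mogrify -format writes new files alongside the originals.
    return ["mogrify", "-format", new_format, *files]


print(conversion_command(["image.jpeg"], "png"))
# ['convert', 'image.jpeg', 'image.png']
print(conversion_command(["dave.gif", "manga.gif"], "jpg"))
# ['mogrify', '-format', 'jpg', 'dave.gif', 'manga.gif']
```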


Checking Image Sizes


Another useful feature of the ImageMagick suite is the ability to
identify the dimensions of a graphic image. The latest version of
the file command can offer this information on some systems:

$ file manga*
manga.gif: GIF image data, version 89a, 358 x 313
manga.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 358x313, frames 3
manga.png: PNG image data, 358 x 313, 8-bit/color RGB, non-interlaced

But on most Linux systems, one or more of these would omit
the actual dimensions. Further, look closely at the above output, and
you’ll see it’s quite inconsistent, making it difficult to parse out the
dimensions if you don’t encode specific rules for each format, which
is, uh, lame.
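
To see why, try the obvious shortcut of grabbing the first WIDTHxHEIGHT-looking pair from each line. This Python sketch runs one regular expression (my own, not anything file provides) over the three lines above, rejoined onto single lines:

```python
import re

# The three `file` lines shown above, each rejoined onto a single line.
FILE_OUTPUT = [
    "manga.gif: GIF image data, version 89a, 358 x 313",
    "manga.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, "
    "density 1x1, segment length 16, baseline, precision 8, 358x313, frames 3",
    "manga.png: PNG image data, 358 x 313, 8-bit/color RGB, non-interlaced",
]

# A naive pattern for "WIDTH x HEIGHT", with or without spaces around the x.
DIMENSIONS = re.compile(r"(\d+)\s*x\s*(\d+)")


def naive_dimensions(line):
    """Return the first (width, height) pair the pattern finds, or None."""
    match = DIMENSIONS.search(line)
    return (int(match.group(1)), int(match.group(2))) if match else None


for line in FILE_OUTPUT:
    print(line.split(":")[0], naive_dimensions(line))
# manga.gif (358, 313)
# manga.jpg (1, 1)
# manga.png (358, 313)
```

It gets the GIF and PNG right, but for the JPEG it grabs the "density 1x1" field first: exactly the kind of per-format special case you’d have to keep adding.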
Instead, you can glean image size with the identify command, as
shown here:

$ identify manga*
manga.gif GIF 358x313 358x313+0+0 8-bit sRGB 256c 88.5KB 0.000u 0:00.000
manga.jpg JPEG 358x313 358x313+0+0 8-bit sRGB 85.4KB 0.000u 0:00.000
manga.png PNG 358x313 358x313+0+0 8-bit sRGB 266KB 0.000u 0:00.000

That’s better. The dimensions are consistently the third field, which means
that a simple script can strip out everything but the image dimensions:

$ for image in manga*; do identify "$image" | cut -f1,3 -d' '; done
manga.gif  358x313  
manga.jpg  358x313  
manga.png  358x313


Easy enough, and notice that the cut command is invoked both with
a space as the field delimiter (cut’s default is a tab) and specifying
that you want fields 1 and 3 but none of the others.
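
The same field extraction in Python is a short sketch; name_and_size() and parse_geometry() are hypothetical helpers that work on one line of identify output like those above. One difference to keep in mind: str.split() collapses runs of spaces, whereas cut -d' ' treats every single space as a delimiter, and neither copes with filenames that contain spaces. ImageMagick’s own -format option (for example, identify -format "%f %wx%h\n" file) is another way to print just the pieces you want.

```python
def name_and_size(identify_line):
    """Mimic `cut -f1,3 -d' '`: keep the filename and the WIDTHxHEIGHT field."""
    fields = identify_line.split()
    return fields[0], fields[2]


def parse_geometry(size):
    """Turn '358x313' into the integers (358, 313)."""
    width, height = size.split("x")
    return int(width), int(height)


line = "manga.gif GIF 358x313 358x313+0+0 8-bit sRGB 256c 88.5KB 0.000u 0:00.000"
name, size = name_and_size(line)
print(name, parse_geometry(size))  # manga.gif (358, 313)
```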

And Next Month...


Okay, ImageMagick is complicated. In fact, I didn’t really get much into
scripting this month. But, come back next month; I’ll explain how to
turn all this knowledge of convert, mogrify and identify into some
pretty sick scripts. See you then!

Send comments or feedback via http://www.linuxjournal.com/contact
or to [email protected].

HACK AND /

Sysadmin 101:
Automation KYLE RANKIN
Approach automation in the right way, and you
might find you’ve automated yourself out of a job. Kyle Rankin is a Sr.
Systems Administrator
in the San Francisco
NEXT Bay Area and the author
PREVIOUS
Shawn Powers’ of a number of books,
Dave Taylor’s

V
V

The Open-Source including The Official


Work the Shell
Classroom Ubuntu Server Book,
Knoppix Hacks and
Ubuntu Hacks. He is
currently the president
THIS IS THE SECOND IN A SERIES OF ARTICLES
of the North Bay Linux
ON SYSTEMS ADMINISTRATOR FUNDAMENTALS.
Users’ Group.
4HESE DAYS $EV/PS HAS MADE EVEN THE JOB TITLE
hSYSTEMS ADMINISTRATORv SEEM A BIT ARCHAIC MUCH LIKE
THE hSYSTEMS ANALYSTv TITLE IT REPLACED 4HESE $EV/PS
POSITIONS ARE RATHER DIFFERENT FROM SYSADMIN JOBS IN THE
PAST 4HEY HAVE A MUCH LARGER EMPHASIS ON SOFTWARE
DEVELOPMENT FAR BEYOND BASIC SHELL SCRIPTING AND AS
A RESULT THEY OFTEN ARE FILLED BY PEOPLE WITH SOFTWARE
development backgrounds without much prior sysadmin
experience. In the past, a sysadmin would enter the role
at a junior level and be mentored by a senior sysadmin
on the team, but in many cases currently, companies go
QUITE A WHILE WITH CLOUD OUTSOURCING BEFORE THEIR FIRST
DevOps hire. As a result, the DevOps engineer might
be thrust into the role at a junior level with no mentor

54 | March 2017 | http://www.linuxjournal.com

LJ275-March2017.indd 54 2/18/17 10:18 AM


HACK AND /

AROUND APART FROM SEARCH ENGINES AND 3TACK /VERFLOW POSTS


In this series, I’m going to expound on some of the lessons I’ve learned
through the years that might be obvious to longtime sysadmins but may
be news to someone just coming into this position.
In the first article in this series, I talked about how to approach alerting
and on-call rotations as a sysadmin. In this article, I discuss how to
automate yourself out of your job. There is a quote that you see from
time to time in sysadmin circles that goes something along the lines of
“Be careful, or I will replace you with a tiny shell script.” Good system
administrators hate performing mundane tasks and constantly seek to
apply that saying to themselves. That said, there are many different
approaches to automation, and not all of them actually save time.
Here, I discuss my experience with automation and describe what, when,
why and how you should (and shouldn’t) automate.

Why You Should Automate


There are a number of different reasons why you should take steps to
automate your work as a sysadmin:
1) It frees up time spent doing mundane tasks to focus on
more important work. With all of the automation that’s already built
in to servers these days, it’s easy to take for granted just how many
mundane tasks sysadmins have had to perform in the past. Logs weren’t
always rotated automatically; backups usually were home-grown affairs
that often were triggered manually. Even now, there still are system
administrators who install every single server by hand, log in to a
machine manually to install or update software, and edit server
configuration files on the host by hand.
Let’s take server OS installation as an example. A modern interactive
server OS installation may take anywhere from several minutes to an hour
of sysadmin time to walk through and answer questions. These are the kinds
of actions that don’t really require a sysadmin’s expertise once you’ve
made the initial decisions about how you want a server to be set up. By
automating these mundane tasks, you can get back to the more difficult
work that does require your expertise.
2) Automation reduces mistakes in routine tasks. The thing about
performing the same task over and over by hand is that it is easy to make
mistakes, and if it’s something you do every day, eventually you even may
stop paying attention to whether your task succeeded. Also, the way
you perform a certain task might be a little bit different from how
a different administrator on the team does it. By automating a task, the
team can agree on the ideal way to perform it and know that when you
run your automation script, it is performed the same way every single
time, with no skipped steps or commands run in the wrong order.
3) Automation allows everyone on the team to be productive.
With automation, you can take even a complex process and reduce it down
to a command. That command then becomes something that anyone on
the team can run, whereas the complex process may have required more
senior members of the team. Take production software deployment:
often there can be a complex arrangement of load-balancer and
monitoring maintenance modes to trigger, software versions to check,
mirrors to sync up and services to restart and test. Even
though these individual steps may be mundane, combined, they become pretty
complicated and could overwhelm a junior member of the team, especially
when production uptime hangs in the balance. By automating that process,
senior administrators can put all of their expertise into creating the right
process that performs the right checks, and they can go on vacation knowing
that anyone else on the team now can perform the task the right way.
4) Automation reduces documentation workload. Often, instead
of automating a task, a sysadmin team will spend time documenting a
process. There is still an important place for documentation, and in the
next section, I discuss when that makes sense and when it doesn’t. The
fact is, though, if you take an entire process and put it into a single
automated task, you no longer need a full wiki page of documentation
that inevitably will become out of date, because you’ve reduced it down
to “run this command”. Because the process is now automated, you also
know the process is kept up to date; otherwise, the script wouldn’t work.

What You Should Automate


Not everything is appropriate for automation, and even things that may
be good candidates for automation may not be good candidates today
(the next section covers when you should automate). Following are a few
different types of tasks that make good candidates for automation:
1) Routine tasks. In general, tasks that you perform frequently (at least
monthly) are good candidates for automation. The more frequent the
task, in theory, the more time you would save by automating it.
Tasks that you perform only once a year may not be worth the effort to
build automation around; instead, those are the kinds of tasks that
benefit from good documentation.
2) Repeatable tasks. If you could document a process as a series of
commands, and then copy and paste them one by one in a terminal and
the task would be complete, that’s a repeatable task that may be a good
candidate for automation. On the other hand, one-off tasks that have
custom inputs or are something you may never have to do again aren’t
worth the time and effort to automate.
3) Complex tasks. The more complex a task, the more opportunities
you have for mistakes if you do it manually. Tasks with multiple steps, in
particular steps that require you to take the output from one step and use
it as input for another, or steps that use commands with a complex string
of arguments, are all great candidates for automation.
4) Time-consuming tasks. The longer a task takes to complete
(especially if there are periods of running a command, waiting for it to
complete and then doing something with that command’s output), the
better a candidate it is for automation. OS installation and configuration
is a great example of this, as when you install an OS, there are periods
when you enter installation settings and periods when you wait for the
installation to complete. All of that waiting is wasted time. By automating
long-running tasks, you can go do some other work and come back to the
automation, or better, have it alert you when it is complete.
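
Items 2 and 3 describe exactly what a script can take over: a fixed list of commands that must run in order, where a failed step should stop everything after it. Here is a minimal Python sketch of that idea; run_steps() is a hypothetical helper, and the placeholder steps merely echo messages where your real commands would go:

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first failure.

    Returns the index of the failed step, or None if every step succeeded.
    """
    for index, cmd in enumerate(steps):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"step {index + 1} failed ({result.returncode}): {' '.join(cmd)}",
                  file=sys.stderr)
            return index
    return None


if __name__ == "__main__":
    # Placeholders: each "step" just echoes; substitute the real commands
    # you would otherwise paste into a terminal one by one.
    steps = [
        ["echo", "1. enable maintenance mode"],
        ["echo", "2. install the new package version"],
        ["echo", "3. restart the service and test it"],
    ]
    sys.exit(0 if run_steps(steps) is None else 1)
```

Because the order lives in the list, every run performs the same steps in the same sequence, and a failure halts the run instead of letting later steps act on a half-finished change.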

When You Should Automate


My coworkers know that I enjoy automating myself out of my job,
and sometimes in the past they have been surprised to learn that I
haven’t automated a task that by all measures is a prime candidate for
automation. My answer is usually “Oh, I plan to; I’m just not ready yet.”
The fact is that even if you have a task that is a great candidate for
automation, it may not necessarily be the right time to automate it.
When I need to perform a new task that’s a series of mundane manual
steps, I like to force myself to perform it step by step at least a few times
“in the wild” before I start automating it. I find I usually need to perform
a task a few times to understand where automation makes the most
sense, what areas of the task may require extra attention and what sorts
of variables I might encounter for the task. Otherwise, if I just charge
ahead and write a script, I may find myself rewriting it from scratch a
few weeks later because I discover the process needs to be adapted to a
new variation of the task. If I’m not quite sure about parts of a process, I
may automate only the parts I am sure of first and get those right. Later
on, when the rest of the process starts to gel in my mind, I then go back
and incorporate it into the automation I’ve already completed.
I also avoid automating tasks if I’m not sure I can do so securely.
For instance, a number of organizations are big fans of ChatOps
(automating tasks using bots inside a chatroom). Although
I know that many bots can authenticate tasks before they perform
them, I still worry about the potential for abuse with a service that’s
usually shared across the whole company, not to mention the fact that
production changes are being triggered by a host outside the production
environment. With my current threat model, I have to maintain strict
separation between development and production environments, so having
a bot accessible to anyone in the company, or having a Jenkins continuous
integration server in the development environment performing my
production tasks, just doesn’t work. In many cases, I have fully automated
tasks up to the point that they still require an administrator with the proper
access to go to the production environment (thereby proving that they are
authorized to be there) before they push “the button”.

How You Should Automate


Since the whole goal of automation is to save time, I don’t like to waste
time refactoring my automation. If I don’t feel like I understand a process
and its variables well enough to automate it, I wait until I do or automate
only the parts I feel good about. In general, I’m a big fan of building a
foundation of finished work that I then build upon. I like to start with
automating tasks that will give me the biggest time savings or encourage
the most consistency, and then build off them.
I like doing the hard work up front so that it’s easier down the road, and
that is why I am a big fan of configuration management to automate server
configuration. Once something like that is in place, rolling out changes to
configuration becomes trivial, and creating new servers that match existing
ones should be easy. These big tasks may take time up front, but they
provide huge cost savings from then on, so I try to automate them first.
I also favor automation tasks that can be used in multiple ways down the
road. For instance, I think all administrators these days should have a simple,
automated way to query their environment for whether a package is installed
and on what hosts, and then be able to update that package easily on the
hosts that have it. Some administrators refer to this as part of orchestration,
a subject I covered a few months back in a series on MCollective.
Package updates are something that sysadmins do constantly, both
for in-house software that changes frequently and for system software that
needs security updates. If a security update is a burden, many sysadmins
won’t bother. Having automation in place to make package updates easy
means administrators save time on a task they have to perform frequently.
Sysadmins then can use that automated package-update process for
security patches, in-house software deployments and other tasks where
package updates are just one component of many.
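
The “which hosts have it, and which need the update?” query can be sketched independently of any particular tool. Assume, hypothetically, that you have already collected an inventory mapping each host to its installed packages and versions (however you gather it: MCollective, SSH or your configuration-management facts). The two helpers below, and the inventory itself, are made up for illustration, and the version check is plain inequality rather than a real version comparison:

```python
def hosts_with(inventory, package):
    """Hosts where `package` is installed at any version."""
    return sorted(host for host, pkgs in inventory.items() if package in pkgs)


def hosts_needing_update(inventory, package, wanted_version):
    """Hosts that have `package` installed at some version other than wanted_version."""
    return sorted(
        host
        for host, pkgs in inventory.items()
        if package in pkgs and pkgs[package] != wanted_version
    )


# Hypothetical inventory: hostname -> {package name: installed version}.
inventory = {
    "web1": {"openssl": "1.0.2j", "nginx": "1.10.2"},
    "web2": {"openssl": "1.0.2k", "nginx": "1.10.2"},
    "db1": {"postgresql": "9.6.1"},
}

print(hosts_with(inventory, "openssl"))                      # ['web1', 'web2']
print(hosts_needing_update(inventory, "openssl", "1.0.2k"))  # ['web1']
```

The output of the second function is exactly the target list you would hand to whatever actually performs the update, which is what makes one query useful for security patches and deployments alike.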
As you write your automation, be careful to check that your tasks
succeeded and, if not, alert the sysadmin to the problem. That means
shell scripts should check for exit codes, and error logs should be
forwarded somewhere that gets the administrator’s attention. It’s all too
easy to automate something and forget about it, only to check back
weeks later and discover it stopped working.
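
In Python, the equivalent of checking the exit code and forwarding the error might look like this minimal sketch. run_and_alert() is a hypothetical helper, and its alert argument defaults to the logging module only as a stand-in; in practice you would point it at whatever actually reaches a person, such as email, a pager or a chat channel:

```python
import logging
import subprocess
import sys

log = logging.getLogger("automation")


def run_and_alert(cmd, alert=log.error):
    """Run cmd; if it exits nonzero, send its stderr to alert() and return the code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        alert(f"{' '.join(cmd)} exited with {result.returncode}: "
              f"{result.stderr.strip()}")
    return result.returncode


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # A deliberately failing command: sys.exit() with a message writes it
    # to stderr and exits with status 1.
    run_and_alert([sys.executable, "-c", "import sys; sys.exit('disk full')"])
```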
In general, approach automation as a way to free up your brain, time and
expertise for tasks that actually need them. For me, I find that means
time spent improving automation and otherwise dealing with exceptions:
things that fall outside the normal day. If you keep it up, you eventually
will find that when there are no crises or new projects, the day-to-day work
is automated to the point that your task is just to keep an eye on
your well-oiled machine to make sure everything’s running. That is when
you know you have replaced yourself with a shell script.

Send comments or feedback via http://www.linuxjournal.com/contact
or to [email protected].


THE OPEN-SOURCE CLASSROOM

The Post-TV Age?

I have lots of streaming packages, but I just can’t seem to cut the cord!

SHAWN POWERS

Shawn Powers is the Associate Editor for Linux Journal. He’s also the
Gadget Guy for LinuxJournal.com, and he has an interesting collection of
vintage Garfield coffee mugs. Don’t let his silly hairdo fool you, he’s a
pretty ordinary guy and can be reached via email at [email protected].
Or, swing by the #linuxjournal IRC channel on Freenode.net.

The most basic cable package from Charter (Spectrum) costs me more than
$70 per month, and that’s without any equipment other than a single cable
card. It’s very clear why people have been cutting the cord with cable TV
companies. But what options exist? Do the alternatives actually cost less?
Are the alternatives as good? I’ve been trying to figure that out for a few
months now, and the results? It depends.
The idea of cord cutting isn’t new. For years, people have been severing
their ties with cable companies in order to save money. The ever-persistent
question is this: how do the options compare?

Real Time or On Demand?


When replacing cable TV, there are two main types of media in question:
live and on demand. Services like Netflix, Amazon Prime and Hulu are
great, but they don’t provide live television. In fact, depending on the
show and service, you might need to wait until the next day, or even the
end of a season, before your desired shows are available. You usually get
the advantage of no commercials, but the waiting often is unbearable if
you’re into television shows that end with cliffhangers.
It is interesting, though, that now that Netflix and Amazon have been so
successful with their streaming services, they’re beginning to get their
own exclusive shows. This means that not only are the shows not delayed,
but they’re also actually not available at all via cable TV. Admittedly, that
phenomenon is fairly new (only the last few years), but it makes the case
for streaming far stronger. Why pay for cable every month and still not get
to watch Jessica Jones?
Also, many individual stations are starting to offer their own streaming
options, so the days of paying for cable so you can see a particular HBO
show are over. Broadcast networks are starting to offer streaming options
too, so if you’re just looking for the ability to watch particular television
shows, even paying for multiple online accounts is cheaper than paying
for cable (usually).

All Those Cable Channels...


Some of the biggest hurdles for cord cutters are cable-only channels. I
have a relative who watches only shows on the History Channel. And my
mother-in-law couldn’t live without watching movies on the Hallmark
Channel. And everyone I know in real life is addicted to HGTV and its
tiny-house programs. Those channels aren’t big enough to support a full
streaming platform (or are owned by actual cable companies, so they
won’t offer a non-cable alternative). So what’s a cord cutter to do?
Until recently, not much. Now, however, there are three really good
options for streaming cable television stations, and one is almost
really good. Those options aren’t exactly cheap, and they’re mostly
US-only, but they’re far less expensive than cable TV. The three
options each have their quirks, but any of them is worth looking
into if you have reliable internet speeds that aren’t dependent on
cable TV bundling. Currently, the three main options are Sling TV,
PlayStation Vue and DirecTV Now.


Streaming Cable: Sling TV


Sling TV has been around the longest and is owned by Blockbuster (yes,
that Blockbuster), which in turn is a subsidiary of Dish Network. It has
a large lineup of cable stations and several tiers of options that include
packages like premium cable channels. Depending on promotions and
where you live, the monthly price varies by package. If you live
in a big city, you also might get local broadcast stations (NBC, ABC, CBS,
PBS, FOX), but for most of the country, you get those channels only “on
demand”, which means recordings of popular shows the next day.

Figure 1. Sling TV has been around a long time, but the lack of DVR and video glitches
make it less than stellar in my experience.

The technology details of Sling TV are a little confusing. If you subscribe
to the lowest tier, you can stream only one channel per account at a time.
That means if you are watching TV in your living room, you can’t watch
something else on your phone. If you subscribe to a higher-priced tier,
you can have up to three streams at once. Also, although the streams
usually are good quality, my anecdotal experience shows that there are a
few more artifacts and glitches with Sling TV than with the other options,
but nothing that makes it a showstopper. (I get glitches with my cable
television too, so nothing is perfect.)
There’s a free trial with Sling TV, so it’s worth checking out. Just be sure
to cancel it before your credit card auto-renews at the end of the trial
(unless you decide to keep it). Also, because it’s been around for a long
time, Sling TV has apps on multiple platforms. Xbox users can install Sling
TV, along with Android TV and Roku users. When it comes to streaming
services, Roku does a great job of staying vendor-neutral, which means it
usually can offer services regardless of who is providing them.

PlayStation Vue
PlayStation Vue is a bit more of a surprise, since Sony PlayStation
is synonymous with gaming rather than television. Its offerings are
impressive, however. The lineups are similar to Sling TV, but the
breakdowns are a little different. The lowest-priced service costs a bit
more per month than Sling TV’s, with other tiers available that add more channels.
Sony gives you a price break if you’re not in one of the cities that has
local channels available, so for me in rural Michigan, it’s cheaper than
if I lived in Chicago. That means I don’t get local channels, though,
which is frustrating.
Although the slightly higher price seems frustrating, the technology
included might make up for it. Not only can you stream to five devices
simultaneously, but it also provides “Cloud DVR”, which automatically
stores recorded content for you. All you need to do is mark a program
as a favorite, and all episodes are saved for 28 days. It’s not possible to
schedule a timed event, but the DVR feature is extremely nice, and it
provides a far better experience than the live-only Sling TV.


Figure 2. PlayStation Vue is remarkable, until it’s not. The video quality is amazing, and
the DVR is superb. The geolocation frustrations along with PS4 console problems make it
difficult to love.

The video quality with PlayStation Vue is shockingly good. Whether
watching on a mobile device, a Roku or a PlayStation system, the video is
far more reliable in my anecdotal trials. The five streams mean people can
watch TV in multiple rooms, and since Vue allows for individual profiles,
different family members can have their own DVR’d shows. The only really
big issue I’ve had with PlayStation Vue is that it’s not possible to watch
streams from the same account on two different PlayStation 4 consoles. I
have a console in my office and a PlayStation 4 Pro in the living room, and it’s
not possible to watch Vue on both devices. That is particularly frustrating,
because watching on multiple Roku units works fine, but not on the actual
Sony hardware. There’s also some frustration with geolocation. Sony often
thinks I’m not home, so it limits what I can watch. I would understand if my
IP address changed, but I have a static IP address, and I’m always connecting
from home (see the notice in Figure 2).

DirecTV Now
DirecTV Now is the new kid on the block when it comes to cable TV streaming. The packages are similar to the other services I mentioned, with some initial low-priced options available to entice users away. (Note: with all these services being contract-free, switching between them to save a few bucks is a very legitimate option.) DirecTV Now has similar limitations regarding live broadcast stations (that is, at the time of this writing, there aren’t any available), but DirecTV Now has the additional limitation that even on-demand content from CBS isn’t available. The kerfuffle that DirecTV and CBS have been having extends to the streaming service as well.

Figure 3. DirecTV Now is the new kid on the block. The $35/month is a trial cost and likely will increase before this article is published.

I haven’t personally used the DirecTV Now service, because none of my devices currently are supported. Apple TV is its main device, and you can get one free if you pre-pay for three months of service. I have friends who’ve used it, though, and they say the quality is very good. Like Sling TV, however, it doesn’t currently have any DVR capability.
Since DirecTV Now is new, it’s not fair to criticize its lack of hardware support yet. Roku streaming is already slated for an upcoming release, and it’s possible other non-competitors will get apps as well. As is usually the case, Roku likely will be one of the premier ways to watch streaming cable TV service, because its compatibility will allow for service hopping without hardware reinvestment.

USTVnow, the Sort of Option


USTVnow is a service designed for US citizens living outside the US and therefore unable to get US television. It’s a streaming service that provides live network channels (ABC, CBS, NBC, PBS and FOX) for free, and for a monthly fee, it adds a few cable channels and HD streaming as well. There is also some DVR service included with the premium packages. Any time I’ve tried to use USTVnow inside the US, it’s worked perfectly, so there aren’t any apparent geographical restrictions. Honestly, on paper, it’s the best thing going.
Unfortunately, the times I’ve used USTVnow, I’ve had lots of glitches. Usually it’s during busy times (a Super Bowl party, for instance) that the service glitches, but since those are the times I want it to work the most, it’s been a frustrating service. The pricing is competitive, however, especially since the SD free tier is really free and provides live broadcast stations. As with most services, Roku seems to be the best way, apart from a browser, to consume USTVnow.


Figure 4. I want to love USTVnow, and perhaps now that there is a paid service, the reliability
will improve. I just hope it’s able to keep providing live broadcast channels in the US.

I want to love USTVnow. I have no idea how it’s able to provide service in the US when the other options struggle to provide broadcast stations. Hopefully it’s not a loophole that will be closed, because for some folks, it’s the only way to get broadcast channels at all, even if they do live in the US.

Rabbit Ears
Yes, obviously using an antenna is a great way to get local television. In fact, you can head over to http://antennaweb.org and see what channels are available in your area and what sort of antenna you’ll need. The site even will tell you what direction to point your antenna for the best signal. If you’re just looking for some old-fashioned television, an antenna is often a good option. Plus, apart from the hardware, it’s totally free.

Figure 5. “Up to 0 channels” is a sad thing to see; I hope your location is better.

The problem is, even though I live in a small city, I get exactly zero channels from my location. That is due to geography, because I live on the side of a hill, but nonetheless, I can’t get any channels using even a rooftop antenna. Even if you can, however, it’s worth considering whether that sort of system is acceptable for you. I don’t want to switch my input source on the television every time I want to watch TV. And TiVo has spoiled me; I want to pause live TV. It’s possible to get something like an HDHomeRun device from SiliconDust and convert your antenna signal into a digital stream, but integrating that into your entertainment system is often challenging. Plus, I had so much frustration with my HDHomeRun setup in our last house that I opted to just buy a cable TV subscription.

So OTA (over-the-air) channels are worth checking out, and for some people, they are more than enough. For me, however, even if I could get a good signal, I want more.

What about the Parade?!


If you live in a big city and can get local channels via services like Sling TV or PlayStation Vue, things like watching the Thanksgiving Day parade are possible. For me, the only way I can watch live events is with the not-always-reliable USTVnow. And even those channels aren’t local, so I can’t ever watch the local news. My big issue used to be how to watch the Olympics, but thankfully, many streaming options are available now. Still, watching the Thanksgiving Day parade is something I’ve done my whole life, and without some way to see local channels, all the cable channels in the world don’t help.
Because I have thousands of dollars invested in my TiVo infrastructure (lifetime Roamio subscription and four TiVo Minis), I’m still paying for the lowest tier of cable TV. I find that we almost never switch over to the TiVo, however, so in the next few months, I might bite the bullet and cancel cable TV altogether. For most folks, services like Sling TV, PlayStation Vue and DirecTV Now provide more than enough service at a fraction of the cost. So if you live in a big city or can live without live broadcast channels, I urge you to give them a try. Each is available with a free trial, and if you sort through the various pros and cons, coming up with a satisfactory service is pretty easy.
If you’ve cut the cord (many of you I’ve spoken with already have done so), I’d love to hear about your specific solution. Do you just switch sources on your TV and use rabbit ears? Do you strictly Netflix and chill? Have you sold your television and reverted to books? (I’m often tempted.) Please let me know. I’d love to follow up with some alternatives for folks like myself who are still struggling to cut the cord.

Send comments or feedback via http://www.linuxjournal.com/contact or to [email protected].

NEW PRODUCTS

HOSTING Monitoring Insights


An important need for today’s CIOs is gaining greater, granular visibility into hybrid cloud and on-premises environments to maximize the business value of their IT assets. To make progress in this arena, two natural allies (managed cloud services provider HOSTING and hybrid IT monitoring specialist ScienceLogic) teamed up to develop HOSTING Monitoring Insights, an innovative hybrid cloud monitoring solution that delivers “the industry’s first holistic and comprehensive view of hybrid cloud environments”. A view into today’s situation for many organizations reveals the use of multiple platforms to monitor devices across disparate cloud and on-premises environments. The core customer benefit from Monitoring Insights is the capacity to manage the health of critical business processes proactively across the whole environment and significantly reduce the quantity of monitoring tools required. Monitoring Insights is available either directly from HOSTING or through a HOSTING partner, and at various service levels based on customer need.
http://hosting.com


NETGEAR, Inc.’s GSS108EPP and GS408EPP Switches
Two new NETGEAR, Inc. Power over Ethernet (PoE) switches feature a novel “Virtually Anywhere” mounting system that delivers modern, high-power PoE to devices that others cannot. The mounting system on the two models, the ProSAFE Gigabit Ethernet Web Managed PoE Click Switch (GSS108EPP) and the ProSAFE Easy-Mount Gigabit Ethernet PoE Web Managed Switch (GS408EPP), offers ultimate flexibility in placement, so that PoE ports are available exactly where needed to power WAPs, VoIP phones, IP surveillance cameras and IoT devices. In any orientation, they can be mounted on a wall, strapped to a pole or tucked under a desk or tabletop. In addition, the GS408EPP’s unique design allows for two switches to be mounted in a single 1U rack slot, saving valuable rack space while allowing for future expansion. Both switches also provide configurable, advanced network features.
http://netgear.com


Minifree Ltd.’s GNU+Linux Computers


Minifree Ltd., doing business as “Ministry of Freedom”, exists mainly for two reasons Linuxers will like: to make it easier for people to get computers that respect their freedom and privacy, and to provide funding for a meaningful project called Libreboot. Minifree describes Libreboot as a free/libre and open-source BIOS/UEFI replacement that offers faster boot speeds, better security and many advanced features compared to most proprietary boot firmware. Minifree recently announced availability of three computers: the Libreboot C… laptop, the Libreboot D… Desktop and the Libreboot D… Server. All come with the Libreboot firmware and the Debian GNU/Linux operating system preinstalled, and are free of unwanted bloatware, DRM, spyware or restrictions on computer usage rights. The Libreboot C… laptop is a configurable, lightweight and portable laptop, ideal for anyone needing a small, lightweight computer for travel, work or general entertainment purposes. The Libreboot D… Desktop is a configurable, high-end, business-grade, secure, owner-controlled workstation, free of backdoors implanted by the NSA and other agencies. Finally, the Libreboot D… Server is a configurable, high-end, business-grade, secure, owner-controlled server, also free of the aforementioned backdoors. Minifree ships its machines worldwide from the United Kingdom.
http://minifree.org


William Gurstelle’s ReMaking History, Volume 3 (Maker Media, Inc.)
In William Gurstelle’s ReMaking History series from Maker Media, Inc., readers get exponentially closer to the inventors who shaped our modern world compared to other histories of technology. This is because Gurstelle doesn’t merely tell the stories of remarkable inventors from the past; he gets into their fascinating minds by illustrating how to make one’s own version of the inventors’ handiwork. The new Volume 3 of ReMaking History, bearing the subtitle Makers of the Modern World, explores the early modern era and builds on the earlier two volumes, covering pre-modern history to the Industrial Age. The inventors and technologies featured in this volume (destined to fill basements and garages everywhere) include Alessandro Volta and electroplating; Humphrey Davy and the first electric light; George Cayley and the aeronautical glider; the Lumiere Brothers and the movie projector; Rudolf Diesel and the automobile engine; Hans Goldschmidt and the thermite reaction; August Möbius and the Möbius Strip; and Louis Poinsot and loads, moments and torques.
http://oreilly.com


William Rothwell and Nick Garner’s Certified Ethical Hacker (CEH) Complete Video Course (Pearson IT Certification)
Watch William Rothwell and Nick Garner’s new Certified Ethical Hacker (CEH) Complete Video Course, and learn everything you need to know to ace the CEH exam. Divided into five modules and containing a complete overview of the topics in the EC-Council Blueprint, Rothwell and Garner’s intermediate-level video training course helps viewers master the essentials needed to pass the exam. The course commences with a general overview of security essentials, followed by an exploration of system, network and web services security, and a dive into wireless and internet security. To test one’s chops, the course offers quizzes, exercises and two full practice exams. By providing the breadth of coverage necessary to learn the full security concepts behind the CEH exam, this video course helps prepare viewers for a career as a security professional.
http://informit.com


TRENDnet’s WiFi Everywhere Powerline 1200 AV2 Access Point, Model TPL-430AP
TRENDnet recently released an innovative new solution that creates dead-spot-free home Wi-Fi by leveraging a home’s existing electrical system. TRENDnet’s WiFi Everywhere Powerline 1200 AV2 Access Point adapter (model TPL-430AP) uses existing electrical lines to create a wired home network and doubles as an access point to deliver a wireless signal from nearly any power outlet. The adapter supports Powerline 1200 networking with a built-in dual-band wireless AC access point, and features MIMO with Beamforming technology to enhance performance and range. WiFi Clone support duplicates one’s existing wireless network settings, reducing Wi-Fi setup time from minutes to just seconds. Powerline networking works over existing electrical lines up to a rated maximum distance.
http://trendnet.com


SUSE Linux Enterprise High Availability Extension
Historically, data replication has been available only piecemeal through proprietary vendors. In a quest to remediate history, SUSE and partner LINBIT announced a solution that promises to change the economics of data replication. The two companies’ collaborative effort is the headliner in the updated SUSE Linux Enterprise High Availability Extension, which now includes LINBIT’s integrated geo-clustering technology. Providing a new capability to replicate data across unlimited distances, LINBIT has enhanced SUSE’s high-availability solution that is built on open-source software and runs on commodity hardware. The LINBIT solution guards against failures or disasters by providing policy-driven mechanisms for customer applications and data to continue operations in another geographically dispersed data center. SUSE Linux Enterprise High Availability Extension is an integrated suite of open-source clustering technologies (from LINBIT and others) that enables customers to eliminate single points of failure, thus helping to maintain business continuity, enable compliance, protect data integrity, maintain isolation for multiple tenants and reduce unplanned downtime for mission-critical workloads.
http://suse.com and http://linbit.com



The Fifteenth Annual
Southern California Linux Expo

March 2-5, 2017


Pasadena Convention Center
Pasadena, CA

http://www.socallinuxexpo.org
Use Promo Code LJ15X for a 30%
discount on admission to SCALE


Briggs & Stratton 8,000 Watt Elite Series Portable Generator with StatStation Wireless
Although Linux Journal readers might not equate Milwaukee with tech, a new Briggs & Stratton product portends the bright future of smartened “legacy” devices from the industrial heartland. The Wisconsin-based maker of engines and industrial products recently announced a smarter (and “techier”) way to produce on-demand power in the form of the new Briggs & Stratton 8,000 Watt Elite Series Portable Generator with StatStation Wireless, featuring Bluetooth technology. The accompanying StatStation app for Android and iOS, with support from Bluetooth connectivity, provides valuable remote visibility into key metrics, such as fuel level and remaining runtime, runtime meter, percent of available Watt consumption, maintenance reminders, dealer locator, reference guides and how-to videos.
http://briggsandstratton.com
Please send information about
releases of Linux-related products
to [email protected]
or New Products c/o Linux Journal,
PO Box 980985, Houston, TX 77098.
Submissions are edited for length
and content.

FEATURE

BIG DATA
DEMONSTRATOR
USING HADOOP TO BUILD
A LINUX CLUSTER FOR LOG
DATA ANALYSIS USING R
THIS ARTICLE WALKS THROUGH THE STEPS TO CREATE A HADOOP LINUX CLUSTER IN THE CLOUD AND OUTLINES HOW TO ANALYZE DEVICE LOG DATA VIA AN EXAMPLE IN THE R PROGRAMMING LANGUAGE.

RUNE TORBENSEN and SØREN TOP



FEATURE: Big Data Demonstrator

This article describes why device log data analysis is useful and briefly introduces the involved technologies and how they fit together. Linux is the basis for the “Infrastructure as a Service” standard that makes the proposed solution portable between cloud providers. Furthermore, we describe the steps you need to go through to create a Hadoop cluster based on Linux in an Amazon cloud. The steps involve bash/install scripts, placed in a GitHub repository, that allow the automatic installation of all the necessary components and configuration.
Big Data technology and the Internet of Things (IoT) are a strong combination. The IoT is a great source of information, and Big Data technology allows for analysis of vast amounts of data. Possible applications are prediction, anomaly detection and device improvement/development. The latter is the case we have been working on in order to investigate why devices break. We need Big Data technology, because classical single-server approaches were unable to process the large amounts of data fast enough for an efficient analysis work cycle.
In this article, we use an example of data analysis of device log data to illustrate how to use a demonstrator setup to find unknown correlations between parameters in log data. However, we will not go into the device details, but use an abstracted device model approach.

DEMONSTRATOR OVERVIEW
The demonstrator setup consists of selected technologies (Figure 1), developed scripts and installation instructions. This allows for the reproduction of the setup.

Figure 1. Selected Technologies Layer Model

The so-called cloud is a dynamic market for computer resources. Prices are decreasing over time, but it is far from free, and a credit card is required to get a cloud account. The cloud is important in Big Data analysis, because we require large computer resources only on a case-by-case basis, and a standing in-house computer cluster is in many situations too expensive, especially for smaller organizations that are beginning to learn Big Data analysis.
When we want to analyze data, we select a cloud provider, create a cluster, do the analysis and then destroy the cluster afterward. This way, we pay only for the computer resources used during the data analysis.
We have selected Infrastructure as a Service (IaaS) as the cloud technology, because it allows for portability of the demonstrator between different cloud providers.
All the cloud providers that we know of offer virtual Ubuntu Linux machines. Ubuntu is a well-known Linux distribution, which is why we developed the demonstrator installation script for Ubuntu.
We chose to work on Amazon Web Services (AWS), since it’s a well-established, stable business with well-defined and documented interfaces, and it offers a free tier that is very convenient for development work.
Hadoop is a kind of overlay operating system for a cloud cluster of Linux computers. It handles all the resources in the cluster and allows programs to be executed in a distributed manner. Hadoop is written in Java and is rather memory-hungry, which is especially noticeable for smaller jobs and testing. Hadoop has been the de facto standard for Big Data processing for the past ten years or so. Hadoop consists of a number of components (see Figure 2).
At the bottom is the Hadoop Distributed File System (HDFS), which allows for high data throughput. It handles the I/O bottleneck problem when analyzing vast amounts of data. The data is spread out over the cluster, and the idea is that data is processed where it is stored.
YARN (Yet Another Resource Negotiator) is the central component that allocates resources to Hadoop jobs. Keep in mind that this is no trivial task, because a Hadoop cluster may contain a very large number of nodes. YARN has built-in logic to handle node and job failure in a graceful way. Nodes may disappear and reappear on the cluster network, but their jobs must be taken over by other nodes in the meantime.
Map-Reduce is the component that handles the parallelization of analysis tasks. Hard disk and network issues are abstracted away from the developer in order to allow the developer to concentrate on developing the analysis program. The Hadoop system handles these issues automatically. Be aware that the Map-Reduce framework enforces parallel programming by constraining the programming model, and that can be difficult to get used to. There is a Map function that is responsible for importing data and converting it to the internal data format (key, value); for example, the key is the device ID, and the value is a list of temperature data points. There is also a Reduce function (one per slave node) that processes data with a certain key. This means, in our case, that all data for one device will be processed by the same slave node.

Figure 2. The Main Components of Hadoop

The Hadoop system will feed the Map function with data records. It may be line by line or file by file. The system will distribute the load by dividing the files among the slave nodes for “map” processing. There is no direct filesystem access. The only way to output results is to emit a key-value pair or a list of key-value pairs. There is no shared memory between map instances; each map node is doing the work on its own, hence allowing for decoupled, parallel processing of data. The developer has no control over which node will process what data. The key-values emitted by map instances are sorted by the system according to the key. Other than this, you cannot assume any ordering of key-value pairs.
In order to speed up execution, the map function may be used to filter out records that are not relevant for the analysis. The Reducer function will receive a list of key-value pairs with a certain key for processing. When done, the function emits key-value pairs. The Hadoop system combines all the key-value pairs from the reducers into the output.
On top of the Hadoop Linux cluster, we have chosen R as the data analysis software. R is a generic math tool that provides a fast, interactive process, which is fundamental for data analysis. R is a high-level programming language with many extension packages. This stems from the fact that R is open source and has a large community. Its packages include data-mining tools. R has a command line that allows an interactive process and fits well with the UNIX environment (scripting).
However, R is classic single-computer software; therefore, the R package Rmr is needed to allow R programs to run on a Hadoop Linux cluster. Rmr is mainly a wrapper of Hadoop into the R environment, and the map-reduce programming model applies to it, as shown in Figure 3.

Figure 3. Map-Reduce Functions in R

You now can use the entire R language and extensions to write the map/reduce functions. To use Rmr, you must follow these steps:
1) Store input data in HDFS.
2) Write the functions Map and Reduce.
3) From R, call the map-reduce framework and point to the map and reduce functions.
4) Get output data from HDFS.
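Steps 1 and 4 happen outside the map and reduce functions. As a minimal sketch, assuming the `hdfs dfs` file-system shell of a Hadoop 2.x installation is on the PATH, staging logs into HDFS and fetching results back could look like this. The HDFS paths and the function names `stage_input` and `fetch_output` are hypothetical; setting `DRY_RUN=1` prints each command instead of running it, which is handy for checking paths first.

```bash
#!/usr/bin/env bash
# Steps 1 and 4 of the Rmr workflow from the shell.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: copy local log files into an HDFS input directory (overwriting).
stage_input() {
  local hdfs_dir=$1
  shift
  run hdfs dfs -mkdir -p "$hdfs_dir"
  run hdfs dfs -put -f "$@" "$hdfs_dir"
}

# Step 4: copy a job's output directory from HDFS back to local disk.
fetch_output() {
  local hdfs_dir=$1 local_dir=$2
  run mkdir -p "$local_dir"
  run hdfs dfs -get "$hdfs_dir" "$local_dir"
}
```

For example, run `stage_input /user/ubuntu/logs logs/*.csv` before starting the R job and `fetch_output /user/ubuntu/out ./results` afterward. Rmr also has R-side helpers for moving data in and out of HDFS, which this shell sketch does not cover.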

USING THE DEMONSTRATOR

Before diving into the steps of creating a Hadoop Linux cluster, we want to describe the work flow of the data analysis that we propose (Figure 4). First, you have to collect and store data in order to have anything to analyze. It has to be stored on a server somewhere on the internet. Next, you should determine what and how you want to analyze. In other words, you define your hypothesis and write an analysis program in R. Then create your own R Hadoop cluster. We’ll describe the details in the next section. When your R Hadoop Linux cluster is ready, you load your data from the external server into the HDFS and run the analysis program using the R command prompt.

Figure 4. Work Flow with the Demonstrator (*Requires Cloud Account)

When the results are ready, you should review them and copy the result data to a storage server. If the results are not satisfactory, you should change the analysis program and run it again on the cluster. Finally, when done, you should destroy the cluster, since keeping disk and CPU allocation will cost too much if you are not using it. However, it may be sensible to keep the master template image if you want to do more analysis in the future.
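The authors destroy the cluster through the AWS web interface. If you prefer the AWS command line interface, a minimal sketch of the teardown follows, assuming the `aws` CLI is installed and configured with your credentials and region. Instance IDs are placeholders and the function names are hypothetical. Terminating instances does not deregister an AMI created from the master, so the template image survives for later analysis (its snapshot storage is still billed).

```bash
#!/usr/bin/env bash
# Tear down the cluster with the AWS CLI while keeping the master AMI.
# Set DRY_RUN=1 to print the AWS commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Terminate the given master and slave instances (placeholder IDs).
destroy_cluster() {
  if [ "$#" -eq 0 ]; then
    echo "usage: destroy_cluster INSTANCE_ID..." >&2
    return 1
  fi
  run aws ec2 terminate-instances --instance-ids "$@"
}

# List the AMIs you own, to confirm the master template is still there.
list_templates() {
  run aws ec2 describe-images --owners self \
    --query 'Images[].[ImageId,Name]' --output text
}
```

A typical sequence is `list_templates` to note the template's image ID, then `destroy_cluster` with the master and slave instance IDs.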

CREATE YOUR OWN HADOOP LINUX CLUSTER


In this section, we provide one-line commands that can be copy/pasted into a Linux SSH console. We assume familiarity with AWS virtual machines; there are many tutorials and videos online.
First, install an Ubuntu server that will serve as master and template for all the slaves:
1) Launch an Ubuntu Server LTS instance via the AWS web interface (Figure 5).
2) Log in to your server using SSH, and enter the following commands. These will download the installation script and run it:

> wget https://raw.githubusercontent.com/Rustor/EE-DIGI/master/install-big-tools-demo.sh
> bash install-big-tools-demo.sh

Figure 5. On the left is a screen dump of the Launch button, and on the right is the Ubuntu
Server used.


The script will install SSH, Hadoop, R and Rmr on the Ubuntu server. Hadoop is installed in the “ubuntu” user’s home directory, and the Hadoop data files (HDFS) will be placed in /tmp. In addition, it will copy the Hadoop configuration data (basic configuration) from the EE-DIGI GitHub repo to the Hadoop installation on the server. You can find further code comments in the script. A variation point is, for instance, that you can comment out the R part of the script before running it and install other analysis software, such as Python.
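Before snapshotting the master, it can be worth confirming that the installation actually put the tools on the PATH. The following is a hypothetical helper, not part of the EE-DIGI repository; it simply reports which of the listed commands can be found and returns non-zero if any are missing.

```bash
#!/usr/bin/env bash
# Post-install sanity check: report which of the given commands are on PATH.
# Returns success only if every command was found.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'ok      %s\n' "$tool"
    else
      printf 'MISSING %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `check_tools ssh java hadoop hdfs R`. If Hadoop's `bin` directory is not on your PATH after installation, `PATH="$HOME/hadoop/bin:$PATH" check_tools hadoop hdfs` checks the copy in the ubuntu user's home directory; that location is an assumption based on the Hadoop paths used in the later steps.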
Next, here are the steps to create the template.
1) Shut down the server with:

> sudo init 0

2) Make a snapshot of the master in the AWS web interface (Figure 6).
3) Start the master again in the web interface, and log in via SSH.
Then start forming the cluster based on the master template. All the slaves will know the master’s internet address (the hostname is defined as “fe”) and public key, because you have generated the slaves from the master image. However, the slaves are unknown to the master, and therefore, you now run the following script on the master. The script opens a server that accepts slaves into a list of slaves. The slaves list is the single point that defines the Hadoop Linux cluster.
4) Run auto-config.sh on the master to accept new slaves:

> bash auto-config.sh

Figure 6. Screen Dump of AWS Create Image Menu


Figure 7. Screen Dump of the AWS Field for User Data

5) Based on the master template, launch two (variation point) instances in the AWS web interface (Launch, Step 1: My AMIs), and insert these two lines in the AWS interface user data, as shown in Figure 7:

#!/bin/bash
bash /home/ubuntu/auto-slave.sh

6) Wait some ten minutes for all the slaves to come online. If you want, check the online slaves on the master in another console window during the process:

> cat cluster-config.file

7) Break auto-config.sh (press Ctrl-C) to stop accepting more slaves.
8) Append the cluster-config.file to /etc/hosts, because Hadoop requires DNS names for slaves:

> sudo cp -n /etc/hosts /etc/hosts.org
> sudo -- sh -c -e "cat cluster-config.file >> /etc/hosts"

9) Update the Hadoop slaves list with the DNS names from the cluster-config.file, but remove the internet address information:

> cat cluster-config.file | cut -f 4 -d " " > /home/ubuntu/hadoop/etc/hadoop/slaves


10) Add the slaves to ~/.ssh/known_hosts to avoid SSH warnings:

> ssh-keyscan -H -f /home/ubuntu/hadoop/etc/hadoop/slaves >> ~/.ssh/known_hosts

11) Copy /etc/hosts to all slaves to allow them to communicate using DNS names:

> wget https://raw.githubusercontent.com/Rustor/EE-DIGI/master/update-all-slaves.sh
> bash update-all-slaves.sh
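Steps 2, 5 and 6 are done in the web console above. A minimal sketch of the same steps with the AWS CLI follows, assuming the `aws` command is installed and configured. The instance and image IDs, the template name and the default `t2.micro` instance type are placeholders (Hadoop's memory use may call for a larger type), and `run-instances` will likely also need the key pair and security group you used for the master. The polling helper assumes `cluster-config.file` gains one non-empty line per registered slave, which is an assumption about how `auto-config.sh` writes it.

```bash
#!/usr/bin/env bash
# AWS CLI sketch of console steps 2, 5 and 6. IDs, names and the instance
# type are hypothetical. Set DRY_RUN=1 to print AWS commands instead of running.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 2: register an AMI (the master template) from the stopped master.
create_template() {
  local master_id=$1 name=$2
  run aws ec2 create-image --instance-id "$master_id" --name "$name"
}

# Step 5: launch COUNT slaves whose user data runs auto-slave.sh at first boot.
launch_slaves() {
  local ami_id=$1 count=$2 itype=${3:-t2.micro} userdata
  userdata=$(mktemp)
  printf '#!/bin/bash\nbash /home/ubuntu/auto-slave.sh\n' > "$userdata"
  run aws ec2 run-instances --image-id "$ami_id" --count "$count" \
    --instance-type "$itype" --user-data "file://$userdata"
  rm -f "$userdata"
}

# Step 6: poll FILE until it lists WANT slaves or TIMEOUT seconds pass.
wait_for_slaves() {
  local file=$1 want=$2 timeout=${3:-900} waited=0 have
  while :; do
    if [ -f "$file" ]; then have=$(grep -c . "$file" || true); else have=0; fi
    if [ "$have" -ge "$want" ]; then
      echo "$have slaves registered"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out: $have of $want slaves registered" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

For example: `create_template i-0123456789abcdef0 hadoop-master-template`, then `launch_slaves ami-0123456789abcdef0 2` once the image is available, then `wait_for_slaves cluster-config.file 2 900` on the master before breaking `auto-config.sh`.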

In this section, you have created a Linux Hadoop cluster on AWS, based on a single master installation. This installation has all the necessary software tools and configuration, so you can dive in to using and testing the cluster (see the next section).

DATA ANALYSIS METHOD


We wrote an R program to find correlations between data logs. To develop this program, we needed test data, so we generated log files from random data. Each log file contains a number of variables, or signals. The R program demonstrator requires a format where each data record has a device ID and is stored on one line with a time-date:

Time-date, device_id, var1, var2, var3, var4

However, we multiplied one of the signals in one of the log files with a sinus curve. Without knowing which log it was, we designed the R program to use a correlation function to find it. Correlation (cor) is a mathematical function that, given two signals, will output a number between 1 and -1. Zero (0) means that the two signals are unrelated, or that any relation is not detectable by the algorithm. A value above 0.75 or below -0.75 is considered a significant correlation.
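For reference, R’s cor() computes the Pearson correlation coefficient unless another method is requested. For two signals x and y with n samples each and means \bar{x} and \bar{y}, it is:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

By the Cauchy-Schwarz inequality, this value always lies between -1 and 1.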
An R program using the cor function was able to find the data log that was multiplied with a sinus curve. It is difficult to tell which one of the signals matches just by inspection.

Figure 8. Illustrations of a couple of device log files. On the left is the sinus signal. To the right of it are two log file signals: V (middle) and VV (right).

Correlation of the signals in Figure 8 results in the following output:

cor(v,y)  
[1]  0.5779192  
 
cor(vv,y)  
[1]  0.7557141

As you can see, the signal VV has a correlation of about 0.76 with y. We have found the log signal that was multiplied with the sinus curve.
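The detection idea can be reproduced without R or Hadoop. The following is a minimal Python sketch, not the authors’ EE-DIGI code: it builds a few random signals, multiplies one of them by a sine curve and reports which one correlates most strongly with that curve. The signal count, length and random seed are arbitrary assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient (the default method of R's cor())."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_signals(n=1000, count=4, hidden=2, seed=1):
    """Return a sine curve y and `count` random signals with values in [0, 1).

    Signal number `hidden` is multiplied by y, mimicking the test data.
    """
    rng = random.Random(seed)
    y = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]  # five full periods
    signals = [[rng.random() for _ in range(n)] for _ in range(count)]
    signals[hidden] = [v * s for v, s in zip(signals[hidden], y)]
    return y, signals


y, signals = make_signals()
scores = [pearson(s, y) for s in signals]
best = max(range(len(scores)), key=lambda k: scores[k])
for k, r in enumerate(scores):
    print(f"signal {k}: cor = {r:+.3f}")
print("signal multiplied by the sine curve:", best)
```

The effect depends on the random values being positive: zero-mean noise multiplied by a sine curve would have essentially no correlation with that curve.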

TESTING THE CLUSTER BY ANALYZING DATA


Based on what we have presented in the previous section, we made a small example (… MB) with ten “spreadsheet files”, each with … lines and … columns of data. We scaled down the test that we present in this article so that it can run in the Amazon free tier on three computers: one master and two slaves. You can scale up the example if you are willing to pay for machines with more RAM. The R analysis code does not require changes.


Enter the Hadoop directory, format the HDFS and start the whole system (the Hadoop daemons on the different machines). The following will initialize all nodes:

> cd hadoop
> bin/hdfs namenode -format
> sbin/start-all.sh

Then you have to generate the test data. Download and run the script found in the EE-DIGI repo on GitHub:

> wget https://raw.githubusercontent.com/Rustor/EE-DIGI/master/genEEdigi-test-data.R
> Rscript genEEdigi-test-data.R

Next, put the test data into HDFS. For that, you make the directory structure required by the analysis program:

> bin/hadoop fs -mkdir -p /rhadoop/eedigi/xdata
> bin/hadoop fs -put -f TESTdata*.csv /rhadoop/eedigi/xdata

Now the cluster should be running with data loaded. You need to download the EE-DIGI analysis program and run it after you first have set up the Rmr environment in an R session. The important parts of the R code are shown in Listing 1.

Get the EE-DIGI data analysis program:

> wget https://raw.githubusercontent.com/Rustor/EE-DIGI/master/EEdigitest.R

Run the program in the R command prompt:

>  R    
>  source("etc/hadoop/hset.R")  
>  source("EEdigitest.R")  
>  q()


Listing 1. The “mapper” takes the input files and selects key/value before emitting. “Reducer”
analyses data with the “cor” function (correlation) and emits the result as value. Before printing,
the R analysis program selects the data with high (more than 0.75) significance.

mapper = function(key, val.df) {
    # drop header rows that ended up among the data rows
    val.df = subset(val.df, id != 'id')
    output.key = data.frame(id=val.df$id, stringsAsFactors=F)
    output.val = val.df[,c('t1', 't2', 't3', 't4')]
    return(keyval(output.key, output.val))}

...

reducer = function(key, val.df) {
    output.key = key
    # two correlations per device: cor(t2, t4) and cor(t1, t2)
    output.val = c(cor(as.numeric(val.df$t2), as.numeric(val.df$t4)),
                   cor(as.numeric(val.df$t1), as.numeric(val.df$t2)))
    return(keyval(output.key, output.val))}

...

print(results.df[which(results.df$val > 0.75),])
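To see what the job in Listing 1 computes without a cluster, here is a rough single-process Python analogue: a map step that drops header rows and keys each record by its device id, a shuffle that groups values by key, and a reduce step that computes the same two correlations per device. The column names id and t1 to t4 follow Listing 1. The sample rows are made up, and this is not how rmr2 distributes work across Hadoop nodes.

```python
import csv
import io
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation, as computed by R's cor() by default."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mapper(rows):
    """Drop repeated header rows and emit (device id, (t1, t2, t3, t4)) pairs."""
    for row in rows:
        if row["id"] == "id":  # same idea as subset(val.df, id != 'id')
            continue
        yield row["id"], tuple(float(row[c]) for c in ("t1", "t2", "t3", "t4"))


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups


def reducer(key, vals):
    """Emit cor(t2, t4) and cor(t1, t2) for one device, like Listing 1."""
    t1, t2, _, t4 = zip(*vals)
    return [(key, pearson(t2, t4)), (key, pearson(t1, t2))]


# Hypothetical input: two devices, with the header row repeated mid-file.
data = """id,t1,t2,t3,t4
dev1,1,2,0,4
dev1,2,4,1,8
dev1,3,6,0,12
id,t1,t2,t3,t4
dev2,1,3,5,2
dev2,2,1,5,3
dev2,3,2,5,1
"""
rows = csv.DictReader(io.StringIO(data))
results = [kv for key, vals in shuffle(mapper(rows)).items() for kv in reducer(key, vals)]
# Keep only strong correlations, like results.df$val > 0.75 in Listing 1.
strong = [(k, v) for k, v in results if v > 0.75]
print(strong)
```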

CLOSE DOWN THE CLUSTER


Use the following instructions if you want to close down the cluster:

> sbin/stop-all.sh

Shut down all machines in the AWS web interface.

Terminate the slaves, but keep the master instance and image in the interface.


CONCLUSION
After reading this article, you should be able to set up a Hadoop Linux cluster in a short amount of time. We have also provided a way for you to test the cluster using R. The open-source tool chain fits nicely together, and you’ll be able to learn about Big Data analysis at no cost.
With the current cloud price structures, our recommendation is to use a cloud cluster when you have a small budget for computing and your need for it is transient. If at some point you are using many machines constantly for an entire month, you should consider building a local computer cluster. While doing the project, we learned that buying 50 strong computers as hardware would have cost about as much as renting … strong computers for one month.
When using a free tier, remember that the cloud will cost you money over time. It is the cloud provider’s business to charge for using their computing resources. Cloud providers have extensive price information, and they will charge you for every use of a virtual server (for example, storage of hard-disk images and machine templates, or data transferred out of the data center region). Even when you use the free tier, read your bill carefully for unexpected costs caused by a tiny mistake on your side, such as using more storage than is included in the free tier. The best you can do is to make a small budget for cloud usage in order to prepare you and your organization’s mindset. Then you should run trials based on your typical analysis tasks to determine which type of machine and which provider is most suited.
Do not use the intuitively impressive-sounding Software as a Service (SaaS), but stick to generic virtual machines, as we used in this article. Otherwise, you most likely will be investing in learning a proprietary SaaS interface and will lose the ability to switch cloud providers without high switching costs. Managing Linux machines (servers) for short periods should not be a problem in terms of operational costs. In our experience, in-depth knowledge about Linux and computer hardware performance is highly relevant when using cloud computer resources. After all, you are given complete control of a real CPU and RAM, if only for a short period.


ACKNOWLEDGEMENTS
This work was supported by The Southern Denmark Growth Forum and the regional EU project “IoTstyring”, which is a project about energy efficiency in embedded control systems using Big Data and IoT technologies.

Rune Torbensen is a postdoc at the University of Southern Denmark, SDU Mechatronics, Sønderborg, Denmark,
and has an IoT PhD with a focus on wireless embedded communication from Aalborg University. He
has used embedded Linux in most of his experiments during the past ten years. Recently, he became
interested in Big Data technology due to its strong relation with IoT.

Søren Top, associate Professor PhD, is a lecturer at University of Southern Denmark, SDU Mechatronics,
Sønderborg, Denmark. He has taught operating systems and embedded systems for decades, and he uses
Linux for both topics on a daily basis.

RESOURCES
Hadoop: http://hadoop.apache.org

Hadoop: The Definitive Guide by Tom White.

Rmr2: https://github.com/RevolutionAnalytics/RHadoop/wiki

Guide to creating virtual machines with the AWS interface:


HTTPSWWWYOUTUBECOMWATCHV)X)$UYAMU9

Send comments or feedback via


http://www.linuxjournal.com/contact
or to [email protected].



FEATURE

Integrating Web Applications with Apache
Learn how to write your own custom Apache configurations
to make your applications work the way you want.

ANDY CARLSON


When you deploy a web application, how do end users access it? Often, web applications are set behind a gateway device through which end users can access them. One of the popular products to act as an application gateway on Linux is the Apache Web Server. Although it can function as a normal web server, it also can relay connections through itself to other web servers.
In this article, I discuss what it takes to integrate a web application into Apache. This includes integrating the HTTP protocol functionality, customizing content to render properly and reusing pieces of configuration. Once you understand those basic bits of functionality, you’ll have the tools you need to maximize your web application’s usability. So let’s get started.

Crash Course in RegEx


A mechanism that I use throughout this article that might need a brief introduction is Regular Expressions (or regex). Regex is used to define a text pattern to search for within a URL, or to find and replace text within content such as HTML or JavaScript. The text-processing command sed uses regex to do searches and substitutions.
For each example below, there will be three parts: input, regex pattern and output. The pattern will be applied to the input text and determine the value of the output text.
Example 1:

Input:
   Name: Frank Sinatra
   Genre: Jazz
   Name: 2Pac
   Genre: Rap
   Name: Reel Big Fish
   Genre: Ska

Regex pattern: "^Name: "

Output:
   Name: Frank Sinatra
   Name: 2Pac
   Name: Reel Big Fish


This example searches the input text for text that matches the pattern "^Name: ". This pattern says, “Look for the text ‘Name: ’ at the beginning of each line.” Since there are three lines that begin with that text, only those three lines are returned. While “^” represents the beginning of a line, “$” represents the end of a line. So if you were to apply the pattern “a$”, two lines would be returned (“Frank Sinatra” and “Ska”). Let’s expand on that example and use the input from Example 1 with a new pattern.
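If you want to experiment with Example 1 outside of sed or Apache, Python’s re module understands the same ^ and $ anchors. The following sketch applies "^Name: " and "a$" line by line to the Example 1 input. The grep helper is just an illustration, not part of any tool used in this article.

```python
import re

# The input text from Example 1
INPUT = """Name: Frank Sinatra
Genre: Jazz
Name: 2Pac
Genre: Rap
Name: Reel Big Fish
Genre: Ska"""


def grep(pattern, text):
    """Return the lines of text that the regex pattern matches, like grep."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


names = grep(r"^Name: ", INPUT)   # "^" anchors the match to the start of a line
ending_in_a = grep(r"a$", INPUT)  # "$" anchors the match to the end of a line
print(names)        # ['Name: Frank Sinatra', 'Name: 2Pac', 'Name: Reel Big Fish']
print(ending_in_a)  # ['Name: Frank Sinatra', 'Genre: Ska']
```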
Example 2:

Regex pattern: "^Name: [0-9]"

Output:
   Name: 2Pac

As you can see, I’ve taken the original regex pattern and added [0-9] to the end. This will search for a single character that can be any number from 0 to 9, which is why “2Pac” was the only line returned. You also can specify a range with alphabetic characters ([a-z] or [A-Z]).
Along with pattern selection, you also can do substitution with regex. There are two formats for regex substitutions: s|pattern|replace|modifier or s/pattern/replace/modifier. In Apache, I find it easier to use the pipe-style substitution. Example 3 uses the same input with a new pattern.
Example 3:

Regex pattern: "s|^(.*)Frank(.*)$|\1Dwezil\2|g"

Output:
   Name: Dwezil Sinatra
   Genre: Jazz
   Name: 2Pac
   Genre: Rap
   Name: Reel Big Fish
   Genre: Ska


This pattern has a lot to dissect. One of the great features of regex is the ability to match any character. The dot (.) operator will match any one character. The asterisk (*) operator will match 0 or more of whatever character or operator precedes it. Putting these two operators together (.*) matches 0 or more of any character. Enclosing this in parentheses allows the matched text to be represented in the replace portion of the pattern with a variable. In this case, \1 represents the first block of text within parentheses, and \2 represents the second. The only characters that are explicitly being matched are “Frank”. As such, any line containing “Frank” will be replaced with everything up to “Frank” (represented by \1), then “Dwezil”, then everything following “Frank” (represented by \2). As you can see, the entirety of the text input was sent to the output, although modified by the pattern.
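Examples 2 and 3 translate directly to Python’s re module, which, like sed, writes back-references in the replacement as \1 and \2. This sketch reuses the Example 1 input and applies each pattern line by line, so lines without a match pass through unchanged, as they would with sed:

```python
import re

# The same input as in Example 1
INPUT = """Name: Frank Sinatra
Genre: Jazz
Name: 2Pac
Genre: Rap
Name: Reel Big Fish
Genre: Ska"""

lines = INPUT.splitlines()

# Example 2: [0-9] matches exactly one digit right after "Name: ".
example2 = [line for line in lines if re.search(r"^Name: [0-9]", line)]

# Example 3: the sed-style s|^(.*)Frank(.*)$|\1Dwezil\2|g, applied to every line.
example3 = [re.sub(r"^(.*)Frank(.*)$", r"\1Dwezil\2", line) for line in lines]

print(example2)  # ['Name: 2Pac']
print("\n".join(example3))
```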

Protocol Integration
When it is decided that an application would benefit from Apache integration, there is a high likelihood that it will reside on a separate server from Apache. To fully integrate applications being accessed via HTTP, any or all of these modules may be used: mod_rewrite, mod_proxy, mod_ssl and mod_headers. Each of these modules allows you to customize the way communication between the end user and web servers occurs, from modifying HTTP header data to managing proxy connections to other servers.
First, let’s look at mod_rewrite. There are a number of directives within the mod_rewrite module, but I cover only a handful here: RewriteEngine, RewriteCond and RewriteRule. The RewriteEngine directive simply enables URL rewriting and is invoked as follows:

RewriteEngine on

RewriteRule allows the server to respond to an HTTP request for a specific URL by, among other things, returning an HTTP redirect code (for example, 301 or 302), which will redirect the end user to a specified URL, or by sending a proxied request to a back-end server. Here’s an example of issuing an HTTP redirect:

RewriteRule "^/google$" "http://www.google.com" [R=301]


In this example, when the URL /google is accessed, the server will respond with an HTTP 301 that will redirect the user to http://www.google.com. This example will work only if the request URL is exactly equal to “/google”. If the need is to redirect on any URL starting with “/google”, you would define a conditional redirect using RewriteCond as follows:

RewriteCond "%{REQUEST_URI}" "^/google.*$"
RewriteRule "^.*$" "http://www.google.com" [R=301]

The RewriteCond directive has two parts: a string value to check and a regex pattern to search for in it. In this example, you are looking in the REQUEST_URI server variable for anything beginning with “/google”. If that condition is met, the RewriteRule on the following line is executed. Because the RewriteCond already determines which request URLs qualify, the pattern in the RewriteRule is simply "^.*$" (match anything).
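The order of evaluation can be surprising: mod_rewrite tests the RewriteRule pattern first and checks the RewriteCond lines above it only if that pattern matches. The following Python function is a simplified model of that decision (it is not mod_rewrite itself, and it ignores flags and substitutions). It shows how the anchored rule, the conditional form and a bare, unanchored pattern treat a few request paths:

```python
import re


def rewrite(uri, rule_pattern, target, cond_pattern=None):
    """Simplified model of a RewriteRule with an optional RewriteCond on %{REQUEST_URI}.

    The rule's pattern is tested first; the condition is evaluated only if it
    matches. Returns the redirect target, or None when the rule does not fire.
    """
    if not re.search(rule_pattern, uri):
        return None
    if cond_pattern is not None and not re.search(cond_pattern, uri):
        return None
    return target


GOOGLE = "http://www.google.com"

exact = [rewrite(u, r"^/google$", GOOGLE) for u in ("/google", "/googlemaps")]
prefix = [rewrite(u, r"^.*$", GOOGLE, r"^/google.*$") for u in ("/googlemaps", "/docs/google")]
unanchored = rewrite("/docs/google", r"/google", GOOGLE)  # a bare pattern matches anywhere

print(exact)       # ['http://www.google.com', None]
print(prefix)      # ['http://www.google.com', None]
print(unanchored)  # http://www.google.com
```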
The examples given here are all user-facing events, like a 301 redirect. The RewriteRule directive also can be used to proxy requests to a server. This is done behind the scenes (unlike an HTTP redirect), so the request is forwarded without the user’s knowledge. A proxied request may be configured like the example below:

RewriteRule "^/home/(.*)$" "http://back-end01.test:8080/$1" [P]

The above illustrates an example of a virtual root directory. When the user accesses anything underneath /home/ (note the “(.*)” expression), the request is sent to back-end01.test on port 8080, with the location set to the URL path beneath /home/. For example, if the user tries to access /home/test/image.jpg, the request is sent to back-end01.test with a location of /test/image.jpg. A proxied RewriteRule also may be used in conjunction with RewriteCond for further customization. Note that this statement handles only the HTTP request (and the [P] flag itself relies on mod_proxy being loaded). Adjusting the back end’s HTTP responses, such as redirects that point at the back-end server, requires mod_proxy’s ProxyPassReverse.
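The mapping that this proxied rule performs can be modeled in a few lines of Python. This is only a sketch of the URL arithmetic, using the article’s hypothetical back-end01.test host; it sends no requests:

```python
import re

BACKEND = "http://back-end01.test:8080/"
HOME = re.compile(r"^/home/(.*)$")


def proxy_target(path):
    """Model of RewriteRule "^/home/(.*)$" "http://back-end01.test:8080/$1" [P].

    Returns the back-end URL the request would be proxied to, or None if the
    rule does not match. mod_rewrite's $1 corresponds to group(1) here.
    """
    m = HOME.match(path)
    return BACKEND + m.group(1) if m else None


print(proxy_target("/home/test/image.jpg"))  # http://back-end01.test:8080/test/image.jpg
print(proxy_target("/about"))                # None
```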
Another option for proxying HTTP connections through Apache is mod_proxy, which provides ProxyPass, ProxyPassReverse and ProxyPassMatch, among many other directives that provide more robust proxying options. I focus primarily on these three directives here. As mentioned previously, RewriteRule provides proxying of HTTP requests. Let’s compare the example already given for proxying with RewriteRule and an example for ProxyPass:

ProxyPass /home/ http://back-end01.test:8080/

This ProxyPass statement provides roughly the same level of functionality as the RewriteRule statement, with a simpler command. When a request comes in for any URL beginning with “/home”, the request header will be rewritten so that the request will be received properly by http://back-end01.test:8080/. Consider the following first lines of an HTTP request:

From user to server:       GET /home/test/image.jpg HTTP/1.1
From server to back-end:   GET /test/image.jpg HTTP/1.1

The first line of the header contains the method (GET in this case) and the URL being requested. When the server receives the request from the client, it strips off “/home” as specified in the ProxyPass directive and forwards the request to the back-end server. The response travels back to the client automatically, but some of its headers may still refer to the back end; for example, the Location header of a redirect would point at back-end01.test. To rewrite those response headers, the following ProxyPassReverse statement can be paired with the previous ProxyPass statement:

ProxyPassReverse /home/ http://back-end01.test:8080/

The syntax is exactly the same as ProxyPass, adding to the simplicity of the mod_proxy configuration. This takes the Location, Content-Location and URI headers of any HTTP response from the back end that refer to http://back-end01.test:8080/ and rewrites them to point at /home on the user-facing server before the response goes back to the original client. If you need to add some programmatic proxying (similar to RewriteCond), you can use ProxyPassMatch. When implementing a forward/reverse proxy configuration, ProxyPassMatch can replace ProxyPass. Here’s an example:

ProxyPassMatch "^/home/([a-z0-9]*/docs.*)$" "http://docserver01.test:8080/$1"
ProxyPassReverse /home/ http://docserver01.test:8080/


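This example suggests that within the /home folder there are many subfolders (let’s say, user names), and within each of those exists a folder named “docs”. The /USERNAME/docs URL exists on docserver01.test in the root of the web server, as denoted by the $1 placed directly after the / in the server URL. Note that ProxyPassMatch forwards only what the regex substitutes into the target, so the capture group must include the rest of the path for deeper URLs to reach the back end. The ProxyPassReverse will function in the same manner as it did in the previous example.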
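The three mod_proxy directives differ mainly in how they compute URLs. The following Python functions are simplified models of that URL arithmetic only (no networking, no header handling beyond one Location value). The front-end name http://public.test passed to the ProxyPassReverse model is an assumption, standing in for whatever host the client originally used. The last two calls show why the ProxyPassMatch capture group needs to include the rest of the path:

```python
import re


def proxy_pass(path, prefix, backend):
    """ProxyPass PREFIX BACKEND: replace the matching prefix with the back-end URL."""
    if not path.startswith(prefix):
        return None
    return backend + path[len(prefix):]


def proxy_pass_reverse(location, prefix, backend, frontend):
    """ProxyPassReverse PREFIX BACKEND applied to a Location response header.

    The rewritten URL is rebuilt on the front-end server, passed in here as `frontend`.
    """
    if location.startswith(backend):
        return frontend + prefix + location[len(backend):]
    return location


def proxy_pass_match(path, pattern, template):
    """ProxyPassMatch PATTERN TEMPLATE: substitute the captured groups into the template."""
    m = re.match(pattern, path)
    if not m:
        return None
    return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))) or "", template)


BACK = "http://back-end01.test:8080/"
print(proxy_pass("/home/test/image.jpg", "/home/", BACK))
print(proxy_pass_reverse(BACK + "test/", "/home/", BACK, "http://public.test"))
# Without ".*" inside the group, everything after /docs is dropped.
print(proxy_pass_match("/home/bob/docs/a.html", r"^/home/([a-z0-9]*/docs)",
                       "http://docserver01.test:8080/$1"))
print(proxy_pass_match("/home/bob/docs/a.html", r"^/home/([a-z0-9]*/docs.*)$",
                       "http://docserver01.test:8080/$1"))
```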
Securing websites with SSL in Apache is accomplished with mod_ssl. Although I won’t discuss configuring SSL from the ground up, a few directives relate to proxied SSL connections: SSLProxyCheckPeerExpire, SSLProxyCheckPeerName and SSLProxyCheckPeerCN. It is a common practice to use self-signed certificates on back-end servers, provided a valid cert is in place on the user-facing server, and these directives address common issues that can arise when using self-signed certs. Each of these directives takes one of two arguments: “on” or “off”. If set to “off”, SSLProxyCheckPeerExpire will skip checking the expiration date on the SSL cert used on a back-end server. To avoid checking a certificate’s common name or alternate names against the server name used to access a back end, set SSLProxyCheckPeerName to “off”. In older versions of Apache, you might be able to use SSLProxyCheckPeerCN set to “off” instead of SSLProxyCheckPeerName.
Along with rewriting URLs, it may be necessary to rewrite HTTP request or response header fields. In Apache, this is done with mod_headers. There are only two directives within this module: Header and RequestHeader. These directives are used to modify response and request header fields, respectively. Many actions can be used with either of these directives, but here let’s look at the set and edit actions. For example:

Header set ReceiveTime "%t"


This example will add a header named ReceiveTime to an HTTP response (replacing any existing header of that name) and give it the time the request was received by the server, represented by "%t". In mod_headers, %t expands to t= followed by that time in microseconds since the UNIX epoch. If you need to replace the value of a header that comes from a back-end server, you would use the edit action. Consider the following example:

Header edit Location "^http://back-end01.test:8080/(.*)$" "http://public.test/$1"

This example will replace the Location attribute in an HTTP response, which will exist in a redirect (such as a 301). If it finds http://back-end01.test:8080/ at the beginning of the Location header, it replaces that part with “http://public.test/” (the user-facing URL).
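The effect of this Header edit line on a response can be sketched in Python as a regex substitution on one header value. This is a model, not mod_headers: the header dictionary and its sample values are made up. count=1 mirrors the fact that edit replaces only the first match (edit* would replace all of them).

```python
import re

# Model of: Header edit Location "^http://back-end01.test:8080/(.*)$" "http://public.test/$1"
LOCATION_RE = re.compile(r"^http://back-end01.test:8080/(.*)$")


def edit_location(headers):
    """Return a copy of the response headers with the Location value rewritten.

    Header names are matched case-sensitively here for brevity; HTTP itself
    treats them case-insensitively.
    """
    out = dict(headers)
    if "Location" in out:
        # Python spells the back-reference \1 where mod_headers uses $1.
        out["Location"] = LOCATION_RE.sub(r"http://public.test/\1", out["Location"], count=1)
    return out


response = {"Location": "http://back-end01.test:8080/login?next=/home", "Server": "demo"}
print(edit_location(response))
```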

Content Integration
Once a remote application is integrated with an Apache server from a protocol standpoint, it may be necessary to integrate content. This generally manifests itself as URLs coded into HTML or JavaScript that are specific to a back-end server and not to a user-facing server. The basic necessity is to be able to search and replace bits of HTML or JavaScript content so that it can render and perform correctly when accessed through an Apache proxy. The module that accomplishes this is mod_substitute, and specifically, the Substitute directive. Substitute allows a simple regex substitution to be performed on the payload data of an HTTP response.
Something to consider before attempting to replace text is whether the back-end web server compresses data before sending it over the network. If it does, your Substitute statements might not work, as they will be searching for ASCII text within binary compressed data. To account for this, you can instruct Apache to decompress the data, manipulate the response and then re-compress it. This is done using the SetOutputFilter directive, which is part of Apache core functionality, together with the INFLATE and DEFLATE filters provided by mod_deflate. Here’s how it works:

SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE


Reading the arguments from left to right, this tells Apache to INFLATE (decompress) the data from the back-end server, perform the substitution and DEFLATE (compress) the data before returning it to the end user.
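Why the filter order matters can be shown with Python’s gzip module. In this sketch, a plain byte search stands in for Substitute, and the page content is made up. Searching the compressed body finds nothing, while inflating, substituting and deflating again, in the same order as the filter chain, works:

```python
import gzip

NEEDLE = b"http://back-end01.test:8080"
page = b'<a href="http://back-end01.test:8080/app">App</a>\n' * 20
compressed = gzip.compress(page)  # what a compressing back end would send

# A plain search of the compressed bytes finds nothing to replace.
found_in_compressed = NEEDLE in compressed

# INFLATE -> SUBSTITUTE -> DEFLATE, in the same order as the filter chain.
inflated = gzip.decompress(compressed)
substituted = inflated.replace(NEEDLE, b"https://public.test")
recompressed = gzip.compress(substituted)

print(found_in_compressed)                                          # False
print(gzip.decompress(recompressed).count(b"https://public.test"))  # 20
```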
The Substitute statement uses a regex substitute expression. As I mentioned previously, I find it easier to use the pipe-style substitute expression in Apache. To recap, the syntax is s|search|replace|options. Two common options are “i”, which denotes a case-insensitive search, and “n”, which makes Apache treat the search value as a fixed string instead of a regex. Leave out “n” whenever the search uses regex features such as parenthesized groups. Here’s a common use example:

Substitute "s|(href=\"http)(://)back-end01.test:8080|$1s$2public.test|i"

For this example, let’s assume that the user-facing site (public.test) runs HTTPS and the back-end server (back-end01.test) runs HTTP on port 8080. This would be a solution if the back-end web server returned hyperlinks that were specific to itself, as opposed to the user-facing site. In the search portion of the regex substitute, this splits out two groups of text in parentheses: (href=\"http) and (://). These are blocks of text that you want preserved in the replace section of the regex. In the replace, you are inserting an “s” after http and replacing the hostname:port with the user-facing site name. After processing, the resulting string will be href="https://public.test. This will update hyperlinks that use “href” attributes (<a> and <link>). For <img> and <script> tags, you could use this same Substitute statement and replace “href” with “src”. Another consideration would be to account for double or single quotes delimiting attribute values (href=' vs. href=").
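The substitution can be tried out in Python before putting it into an Apache config. The sketch below models the Substitute rule with re.sub (Python writes the back-references as \1 and \2 where mod_substitute uses $1 and $2). It also shows one hypothetical variant that covers src attributes and single-quoted values. The sample page is made up.

```python
import re

# Model of: Substitute "s|(href=\"http)(://)back-end01.test:8080|$1s$2public.test|i"
HREF_RE = re.compile(r'(href="http)(://)back-end01.test:8080', re.IGNORECASE)

# Variant that also covers src attributes and single-quoted values
ANY_RE = re.compile(r"((?:href|src)=[\"']http)(://)back-end01.test:8080", re.IGNORECASE)

page = ('<a href="http://back-end01.test:8080/reports">Reports</a>'
        "<img src='http://back-end01.test:8080/logo.png'>")

first_pass = HREF_RE.sub(r"\1s\2public.test", page)
second_pass = ANY_RE.sub(r"\1s\2public.test", page)
print(first_pass)   # only the href link is rewritten
print(second_pass)  # both links are rewritten
```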
Another application of Substitute is to extend the functionality of a page without manipulating the original source code. Consider the following example:

Substitute "s|(<body[^>]*>)|$1<div style=\"font-size:14pt;font-weight:bold;background-color:#ff0000;color:#ffffff;display:block;text-align:center;\">This site will be down for 24 hours beginning at 8 pm tonight</div>|i"


If a website needs to be taken off-line for maintenance, this is an easy way to alert the user population of the outage without modifying the application itself. This example simply inserts a red bar along the top of the page, right after the <body> tag, which displays information about the outage. Depending on how your page is rendered, you might need to choose another tag to act as your starting point instead of <body>.
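Here is a Python model of the banner insertion, handy for checking the pattern against a sample page. It is a sketch, not mod_substitute, and the page is made up. It uses [^>]* rather than .* so that the match stops at the end of the <body> tag. A greedy .* would run to the last > on the line whenever the whole page sits on one line.

```python
import re

BANNER = ('<div style="font-size:14pt;font-weight:bold;background-color:#ff0000;'
          'color:#ffffff;display:block;text-align:center;">'
          'This site will be down for 24 hours beginning at 8 pm tonight</div>')

# "[^>]*" stops at the end of the opening <body> tag, keeping its attributes intact.
BODY_RE = re.compile(r"(<body[^>]*>)", re.IGNORECASE)


def add_banner(html):
    """Insert BANNER right after the first opening <body> tag."""
    # A function replacement avoids having to escape anything inside BANNER.
    return BODY_RE.sub(lambda m: m.group(1) + BANNER, html, count=1)


page = '<html><body class="main"><h1>Welcome</h1></body></html>'
result = add_banner(page)
print(result)
```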

Streamlining Future Integrations


All of the topics presented here can be configured and maintained relatively easily if you have only a few statements. In the real world, there typically will be many sites that use a similar configuration, and having to define the functionality for each site can be time-consuming and can lead to mistakes. Luckily, Apache provides a mechanism to repeat functionality throughout your configuration through the use of mod_macro. The <Macro> directive within an Apache config works very much like a function or subroutine. Once a macro is defined, it can be referenced as many times as is necessary, leaving you with one place within your config to maintain your detailed functionality. Here’s an example macro:

<Macro RedirectSecure $host $path>
    RewriteCond "%{REQUEST_URI}" "^$path"
    RewriteRule "^/(.*)$" "https://$host/$1"
</Macro>


When called, this macro will define a RewriteCond and RewriteRule that, if the user accesses a URL starting with the value of the $path argument, will redirect the user to https://$host/$1, where $host is the hostname specified as a macro argument and $1 is the URL path after its leading slash. The following syntax would be used to call this macro:

Use RedirectSecure public.test /users

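Something to consider is the location within the Apache config from which a macro is called, because the directives it expands to must be allowed in that context. A RewriteRule, for example, generally belongs inside the <VirtualHost> it should affect, because rules in the main server config are not inherited by virtual hosts by default, and a Substitute directive is valid only in a directory context such as a <Location> block. If a macro is called where its directives aren’t allowed, Apache will throw an error and not start. Here’s another example:

<Macro ReplaceContentURL $backendurl $publicurl>
    Substitute "s|(href=\")$backendurl|$1$publicurl|i"
    Substitute "s|(src=\")$backendurl|$1$publicurl|i"
</Macro>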

This macro expands on the replacing of URLs that I covered previously. It will search for tag attributes of “href” and “src” and replace the hyperlinks of the back-end server with those of the user-facing server. Here’s an example of how this might be called:

Use ReplaceContentURL http://back-end01.test:8080 https://public.test

This will search for http://back-end01.test:8080 immediately following either href=" or src=" and replace that URL with https://public.test.
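A macro call is essentially text substitution of the parameter names in the macro body. The following Python function is a rough model of what a Use line expands to. It is not mod_macro’s implementation: as a precaution, it replaces longer parameter names first so that one name being a prefix of another cannot cause a partial replacement. Note that $1 is not a macro parameter, so it survives expansion and reaches mod_substitute unchanged.

```python
import re


def expand_macro(params, body, args):
    """Rough model of a mod_macro Use line: replace each parameter name in the body."""
    if len(params) != len(args):
        raise ValueError("Use: wrong number of arguments")
    values = dict(zip(params, args))
    # Longer names first, so that one name being a prefix of another is harmless.
    names = sorted(values, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda m: values[m.group(0)], body)


# Body of the ReplaceContentURL macro; "$1" is not a parameter and stays as-is.
REPLACE_CONTENT_URL = (
    'Substitute "s|(href=\\")$backendurl|$1$publicurl|i"\n'
    'Substitute "s|(src=\\")$backendurl|$1$publicurl|i"'
)

expanded = expand_macro(["$backendurl", "$publicurl"], REPLACE_CONTENT_URL,
                        ["http://back-end01.test:8080", "https://public.test"])
print(expanded)
```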
Macros can be used for any piece of Apache configuration. They can be used to do small tasks, as shown here, as well as whole site configurations. Although macros are pretty simple, they make the difference between a large amount of difficult-to-maintain configuration files and a simplified, reusable configuration.
At this point, you have some basic knowledge of integrating HTTP, customizing content and reproducing configuration within Apache. Although many directives and modules weren’t covered here, this will be a great starting point and can help you get started with accessing your applications through Apache.

Andy Carlson has worked in IT for the past 13 years doing networking and server administration. He is
thankful to have chosen a career that he loves, grows in and learns from. He and his amazing wife have
three daughters and a son, and they currently reside in Cincinnati, Ohio. He enjoys playing the guitar and
spending time with family and friends.

RESOURCES
The following are some articles I’ve found useful along with some example Apache
configs I’ve written.

Apache Module Reference (2.2): httpd.apache.org/docs/2.2/mod

Apache Module Reference (2.4): httpd.apache.org/docs/2.4/mod

Git Instaweb Reverse Proxy: HTTPSGISTGITHUBCOMBNGCFFDBEF

Monit Reverse Proxy: HTTPSGISTGITHUBCOMBNGEABAEDABCBC

Adobe Experience Manager Apache Config: HTTPSGITHUBCOMBNGAEM DISPATCHER CONFIG

Send comments or feedback via


http://www.linuxjournal.com/contact
or to [email protected].

RETURN TO CONTENTS

107 | March 2017 | http://www.linuxjournal.com

LJ275-March2017.indd 107 2/18/17 10:19 AM


FREE DOWNLOADS

A Field Guide to the World


of Modern Data Stores
There are many types of databases and data analysis tools to choose from when building your application. Should you use a relational database? How about a key-value store? Maybe a document database? Is a graph database the right fit? What about polyglot persistence and the need for advanced analytics?

If you feel a bit overwhelmed, don’t worry. This guide lays out the various database options and analytic solutions available to meet your app’s unique needs.

You’ll see how data can move across databases and development languages, so you can work in your favorite environment without the friction and productivity loss of the past.

Sponsor: IBM

> https://geekguide.linuxjournal.com/content/field-guide-world-modern-data-stores

Why NoSQL? Your database options in the new non-relational world
The continual increase in web, mobile and IoT applications, alongside emerging trends shifting online consumer behavior and new classes of data, is causing developers to reevaluate how their data is stored and managed. Today’s applications require a database that is capable of providing a scalable, flexible solution to efficiently and safely manage the massive flow of data to and from a global user base.

Developers and IT alike are finding it difficult, and sometimes even impossible, to quickly incorporate all of this data into the relational model while dynamically scaling to maintain the performance levels users demand. This is causing many to look at NoSQL databases for the flexibility they offer, and is a big reason why the global NoSQL market is forecasted to nearly double and reach USD … billion in …

Sponsor: IBM

> https://geekguide.linuxjournal.com/content/why-nosql-your-database-options-new-non-relational-world

Estimating CPU Per Query With Weighted Linear Regression


Your database server is suddenly using a lot of CPU resources. Quick: what caused it? This is a familiar question for engineers of all persuasions. And it’s often impossible to answer.

There are good reasons why it’s hard to figure out what consumes resources like CPU, I/O and memory in a complex piece of software such as a database. The first problem is that most database server software doesn’t offer any way to measure or inspect that type of performance data. The database server isn’t observable. This problem arises in turn from the complexity of the database server software and the way it does its work, which actually precludes measuring resource consumption accurately.

Author: Baron Schwartz

Sponsor: VividCortex

> https://geekguide.linuxjournal.com/content/estimating-cpu-query-weighted-linear-regression
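The blurb names the technique without describing it, so what follows is only a minimal sketch under stated assumptions, not the guide’s method. It assumes that the total CPU time in each sampling interval is the sum, over query types, of how many times each query ran multiplied by a fixed per-query cost, with no background load. Weighted least squares then estimates those per-query costs while letting you trust some intervals more than others. The function name, the weighting and the sample numbers are all hypothetical.

```python
import numpy as np


def estimate_cpu_per_query(counts, cpu_seconds, weights):
    """Estimate the CPU cost of each query type by weighted least squares.

    counts      -- (n_intervals, n_query_types): executions of each query
                   type observed in each sampling interval
    cpu_seconds -- (n_intervals,): total CPU time measured in each interval
    weights     -- (n_intervals,): relative trust in each interval's sample

    Returns b minimizing sum_i weights[i] * (cpu_seconds[i] - counts[i] @ b)**2.
    """
    X = np.asarray(counts, dtype=float)
    y = np.asarray(cpu_seconds, dtype=float)
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    # Scaling each row and its target by sqrt(w_i) turns weighted least
    # squares into an ordinary least-squares problem.
    b, _residuals, _rank, _singular = np.linalg.lstsq(
        X * root_w[:, None], y * root_w, rcond=None
    )
    return b


# Hypothetical measurements: two query types over four one-second intervals,
# constructed from true costs of 0.002 and 0.010 CPU-seconds per query.
counts = [[100, 10], [200, 5], [50, 40], [300, 20]]
cpu = [0.30, 0.45, 0.50, 0.80]
weights = [1.0, 1.0, 2.0, 1.0]  # e.g., trust the third interval twice as much

print(estimate_cpu_per_query(counts, cpu, weights))
```

Because the sample data was built from costs of 0.002 and 0.010 CPU-seconds per query, the fit should recover values very close to those. Appending a column of ones to `counts` would let the model absorb a constant background load; choosing the weights is where most of the real judgment lies.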


Database Performance Monitoring Buyer’s Guide


More and more companies have begun to recognize database performance management as a vital need. Despite its widespread importance, good database performance management requires specialized expertise with custom approaches, yet all too often organizations rely on one-size-fits-all solutions that theoretically check the box but in practice do little or nothing to help them find or prevent database-related outages and performance problems.

This buyer’s guide is designed to help you understand what database management really requires, so your investments in a solution provide the greatest possible value.

Sponsor: VividCortex

> https://geekguide.linuxjournal.com/content/database-performance-monitoring-buyer%E2%80%99s-guide

The Essential Guide To Queueing Theory


Whether you’re an entrepreneur, engineer or manager, learning about queueing theory is a great way to be more effective. Queueing theory is fundamental to getting good return on your efforts. That’s because the results your systems and teams produce are heavily influenced by how much waiting takes place, and waiting is waste. Minimizing this waste is extremely important. It’s one of the biggest levers you will find for improving the cost and performance of your teams and systems.

Author: Baron Schwartz

Sponsor: VividCortex

> https://geekguide.linuxjournal.com/content/essential-guide-queueing-theory
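To make “waiting is waste” concrete, here is a sketch using the simplest textbook queueing model, M/M/1: one server, randomly (Poisson) arriving requests and exponentially distributed service times. The choice of model and the function name are assumptions for illustration; the guide may well cover other models. In M/M/1, the mean time a request spends waiting in line is ρ/(1 − ρ) service times, where ρ is the server’s utilization, so waiting grows without bound as utilization approaches 100%.

```python
def mm1_times(arrival_rate, service_rate):
    """Return (utilization, mean wait in queue, mean time in system)
    for an M/M/1 queue. Rates are in requests per second; times in seconds."""
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("M/M/1 is only stable when 0 <= arrival_rate < service_rate")
    rho = arrival_rate / service_rate          # fraction of time the server is busy
    service_time = 1.0 / service_rate          # mean time to serve one request
    wait = rho * service_time / (1.0 - rho)    # mean time spent queued (pure waste)
    return rho, wait, wait + service_time


# A server that can handle 10 requests/second (100 ms each), at rising load.
for arrivals in (5.0, 8.0, 9.0, 9.5, 9.9):
    rho, wait, total = mm1_times(arrivals, 10.0)
    print(f"utilization {rho:.0%}: waits {wait * 1000:.0f} ms, total {total * 1000:.0f} ms")
```

At 50% utilization a request waits, on average, one service time (100 ms here) before being served; at 90% it waits nine service times (900 ms), although the server is only 40 percentage points busier.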

Sampling a Stream of Events With a Probabilistic Sketch
Stream processing is a hot topic today. As modern Big Data processing systems have evolved, stream processing has become recognized as a first-class citizen in the toolbox. That’s because when you take away the how of Big Data and look at the underlying goals and end results, deriving real-time insights from huge, high-velocity, high-variety streams of data is a fundamental, core use case. This explains the explosive popularity of systems such as Apache Kafka, Apache Spark, Apache Samza, Apache Storm and Apache Apex, to name just a few.

Author: Baron Schwartz

Sponsor: VividCortex

> https://geekguide.linuxjournal.com/content/sampling-stream-events-probabilistic-sketch



EOF

The Problem with “Content”

DOC SEARLS

Real journalism is getting programmatically corrupted and harder to find. Fortunately, there’s a fix.

Doc Searls is Senior Editor of Linux Journal. He is also a fellow with the Berkman Center for Internet and Society at Harvard University and the Center for Information Technology and Society at UC Santa Barbara.

Back in the early ’00s, John Perry Barlow (https://en.wikipedia.org/wiki/John_Perry_Barlow) said, “I didn’t start hearing about content until the container business felt threatened.” Linux Journal was one of those containers; so was every other magazine, newspaper and broadcast station. Today those containers are bobbing around in an ocean of “content” on the Internet. Worse, the stuff inside the containers, which we used to call “editorial,” is now a breed of “content” too.
In the old days, editorial lived on one side of a “Chinese wall” between itself and the publishing side of a newspaper or magazine. The same went for the programming and advertising sides of a commercial broadcast station or network. The wall was transparent, meaning it was possible for a writer, a photographer, a newscaster or a performing artist to see what funded the operation, but the ethical thing was to ignore what happened on the other side of that wall. Which was easy to do, because everything on the other side of that wall was somebody else’s job.
Today that wall has been destroyed by the imperatives of “content production,” which is the new job of journalists and everybody else devoted to “generating content” in maximum volumes, all the better to attract “programmatic” advertising.
You can see the wreckage of one such wall in a January … The New York Times story titled “In New Jersey, Only a Few Media Watchdogs Are Left” (https://www.nytimes.com/…/nyregion/in-new-jersey-only-a-few-media-watchdogs-are-left.html), by David Chen. In it he writes, “The Star-Ledger, which almost halved its newsroom eight years ago, has mutated into a digital media company, requiring most reporters to reach an ever-increasing quota of page views as part of their compensation.”
As I explained in my January … article “What We Can Do with Ad Blocking’s Leverage” (http://www.linuxjournal.com/content/what-we-can-do-ad-blockings-leverage), the advertising we’re talking about here isn’t the old Madison Avenue kind that lived on the other side of journalism’s Chinese wall. It’s a new, all-digital kind called adtech. While adtech is called advertising and looks like advertising, it is actually a breed of direct marketing (https://en.wikipedia.org/wiki/Direct_marketing), a cousin of spam descended from junk mail.
Like junk mail, adtech is data-driven, wants to get personal, finds success in tiny percentage responses and excuses massive negative externalities. Those include wanton and unwelcome surveillance, annoying the crap out of people and filling the world with crap, including fake news and fraudulent advertising.
Here’s one way to tell the difference between real advertising and adtech, using the Star-Ledger as an example:

■ Real advertising wants to be in the Star-Ledger because it values the paper’s journalism and readership.

■ Adtech wants to push ads at readers anywhere it can find them, based on gathered intelligence, algorithms and whatever else shows up in live auction markets for eyeballs.


In the old advertising-supported publishing world, journalism was what mattered most. In the new adtech-supported publishing world, content is what matters most.
Real advertisers in the old publishing world were flattered to be in the Star-Ledger. Adtech-oriented advertisers in the new publishing world just want to “go digital,” whatever it takes. And there are thousands of intermediaries (http://cdn.chiefmartec.com/wp-content/uploads/…/marketing_technology_landscape_….jpg) to help with that.
As I wrote in “Separating advertising’s wheat and chaff” (https://medium.com/@dsearls/separating-advertisings-wheat-and-chaff-…), it is because of that orientation and those intermediaries that “Madison Avenue fell asleep, direct response marketing ate its brain, and it woke up as an alien replica of itself.”
That’s also why, to operate in publishing’s new body-snatched economy, journalists are incentivized to meet that “ever-increasing quota of page views.” When the incentives are volume-based, what happens to quality?
It is essential to note that adtech, by design, doesn’t care about journalism at all. That’s because adtech values a maximized sum of content in the world, regardless of how good that content is or where it comes from. The more content, the more places ads can be run.
It is also ridiculously easy to make adtech money with content, especially since there is nothing about content as a substance that requires facts to back it up. This is why, according to Buzzfeed (https://www.buzzfeed.com/craigsilverman/how-macedonia-became-a-global-hub-for-pro-trump-misinfo), teenagers in one town in Macedonia made as much as … a day by generating fake news, such as “Pope endorses Trump,” during the 2016 US presidential election.
In my January … EOF, “Debugging Democracy” (http://www.linuxjournal.com/content/debugging-democracy), I made the mistake of opening with negative remarks about the winner of that election, which distracted readers from my main point, which was that journalism is corrupted, marginalized and suffering in a world where a business based on surveillance stokes people’s prejudices and drives them into mutually hostile echo chambers (http://graphics.wsj.com/blue-feed-red-feed), damaging every democracy that depends on having at least some common ground on which agreement, or at least compromise, can be found. And I thought the topic was a good one for Linux Journal because readers such as ours are in a good position to help fix it.
I also think publishing needs to re-brain Madison Avenue and its employers who are drunk on digital (https://medium.com/@dsearls/tv-viewers-to-madison-avenue-please-quit-driving-drunk-on-adtech-…) and demand real advertising by real advertisers who want to sponsor real journalism.
To support that effort, I will now cross our own Chinese wall here at Linux Journal and thank the advertisers listed each month below my column. Those advertisers are brands in the best sense of the word. I hope we attract more like them to Linux Journal with this simple fact: even though countless $billions (or perhaps $trillions by now) have been spent on adtech, not one brand has been built by it.
Bonus link: Don Marti’s “Targeting failure: legit sites lose, intermediaries win” (http://zgp.org/targeted-advertising-considered-harmful/#targeting-failure-legit-sites-lose-intermediaries-win).


ADVERTISER INDEX
Thank you as always for supporting our advertisers by buying their products!

ADVERTISER               URL
DrupalCon Baltimore      https://events.drupal.org/baltimore…
Drupalize.me             http://drupalize.me
HPC Wallstreet           http://www.flaggmgmt.com/linux
LibrePlanet …            http://libreplanet.org/…conference
LinuxFest Northwest      http://linuxfestnorthwest.org
Peer 1 Hosting           http://go.peer1.com/linux
SCALE …x                 http://www.socallinuxexpo.org
Silicon Mechanics        http://www.siliconmechanics.com
SPTechCon                http://www.sptechcon.com
SUSE                     http://suse.com/storage

ATTENTION ADVERTISERS
The Linux Journal brand’s following has grown to a monthly readership nearly one million strong. Encompassing the magazine, Web site, newsletters and much more, Linux Journal offers the ideal content environment to help you reach your marketing objectives. For more information, please visit http://www.linuxjournal.com/advertising.



Instant Access to Premium Online Drupal Training
Instant access to hundreds of hours of Drupal training with new videos added every week!

Learn from industry experts with real-world experience building high-profile sites

Learn on the go wherever you are with apps for iOS, Android & Roku

We also offer group accounts. Give your whole team access at a discounted rate!

Learn about our latest video releases and offers first by following us on Facebook and Twitter @drupalizeme

Go to http://drupalize.me and get Drupalized today!



Where every interaction matters.

break down
your innovation barriers
power your business to its full potential
When you’re presented with new opportunities, you want to focus on turning
them into successes, not whether your IT solution can support them.

Peer 1 Hosting powers your business with our wholly owned FastFiber Network™, solutions that are secure, scalable, and customized for your business.

Unsurpassed performance and reliability help build your business foundation to be rock-solid, ready for high growth, and deliver the fast user experience your customers expect.

Want more on cloud?


Call: 844.855.6655 | go.peer1.com/linux | View Cloud Webinar:

Public and Private Cloud | Managed Hosting | Dedicated Hosting | Colocation
