Vickie Data Analytics
Data analytics is the pursuit of extracting meaning from raw data using specialized
computer systems. These systems transform, organize, and model the data to draw conclusions
and identify patterns.
Data analytics refers to the set of quantitative and qualitative approaches for deriving
valuable insights from data. It involves processes such as extracting data and categorizing it
in order to identify patterns, relations, connections, and other such insights. Today, many
organizations have become data-driven, which means that they collect more data about their
customers, markets, and business processes. This data is then categorized, stored, and analyzed
to make sense of it and derive valuable insights from it.
that, they employed data warehouses, but data warehouses generally cannot handle the scale of
big data cost-effectively.
While data warehouses are certainly a relevant form of data analytics, the term data
analytics is slowly acquiring a specific subtext related to the challenge of analyzing data of
massive volume, variety, and velocity.
Prescriptive Analytics: This type of analytics applies rules and recommendations to the
results of an analysis in order to prescribe a course of action for the organization.
Predictive Analytics: This type of analytics uses historical data to predict likely future
outcomes, so that a future course of action can be planned.
Diagnostic Analytics: This is about looking into the past and determining why a
certain thing happened. This type of analytics often revolves around working on
a dashboard.
Descriptive Analytics: This type of analytics mines the incoming data and summarizes it
to describe what has happened.
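The four types can be illustrated on a single toy data set. The following is only a minimal sketch: the monthly sales and advertising figures, the variable names, and the restocking rule are all hypothetical and chosen purely for illustration.

```python
from statistics import mean

# Hypothetical monthly figures (illustrative only, not real data).
sales = [100, 110, 90, 130, 140]
ad_spend = [10, 12, 8, 15, 16]

# Descriptive: summarize what happened.
average_sales = mean(sales)

# Diagnostic: look for a possible reason by comparing months with
# above-average and below-average advertising spend.
spend_cutoff = mean(ad_spend)
high = [s for s, a in zip(sales, ad_spend) if a > spend_cutoff]
low = [s for s, a in zip(sales, ad_spend) if a <= spend_cutoff]
sales_gap = mean(high) - mean(low)

# Predictive: fit a least-squares trend line and project the next month.
x = list(range(len(sales)))
x_bar, y_bar = mean(x), mean(sales)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, sales)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
forecast = intercept + slope * len(sales)

# Prescriptive: apply a made-up business rule to the forecast.
action = "increase stock" if forecast > 1.2 * average_sales else "hold stock"

print(average_sales, sales_gap, forecast, action)
```

Each step builds on the previous one: the description feeds the diagnosis, the trend feeds the forecast, and the forecast feeds the recommended action.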
What are the various tools used in Data Analytics?
In this section, you will be familiarized with the tools used in the Big Data Analytics domain.
Here is a list of tools that are commonly used in Big Data Analytics:
Apache Spark: Spark is a framework for large-scale and near-real-time data analytics that is
often used alongside the Hadoop ecosystem.
Python: This is one of the most versatile programming languages that is rapidly being deployed
for various applications including Machine Learning.
SAS: SAS is an advanced analytical tool that is being used for working with huge volumes of
data and deriving valuable insights from it.
Hadoop: It is a widely used open-source big data framework that organizations around the
world deploy to store and process their big data.
SQL: The structured query language (SQL) is used for working with relational database
management systems.
Tableau: This is a popular Business Intelligence tool that is deployed for the purpose of
data visualization and business analytics.
Splunk: Splunk is a widely used tool for parsing machine-generated data and deriving
valuable business insights out of it.
R Programming: R is a programming language that is widely used by data scientists for
statistical computing and graphics.
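As a small illustration of how two of the tools above can work together, the sketch below runs an SQL aggregation from Python using the standard-library sqlite3 module. The "orders" table, its columns, and its rows are hypothetical, and an in-memory SQLite database stands in for a full relational database management system.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 30.0), ("alice", 15.0), ("carol", 50.0)],
)

# Total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

The query does the categorizing and summarizing inside the database, and Python only receives the summarized rows.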
Who Uses Data Analytics
Data analytics technologies and techniques are widely used in commercial industries to enable
organizations to make more-informed business decisions and by scientists and researchers to
verify or disprove scientific models, theories and hypotheses.
Key Takeaways
Data analytics is the science of analyzing raw data in order to draw conclusions about
that information.
The techniques and processes of data analytics have been automated into mechanical
processes and algorithms that work over raw data for human consumption.
Data analytics helps a business optimize its performance.
Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge and insights from structured and unstructured data.
in its survey of the contemporary data processing methods that are used in a wide range of
applications.
In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe
for their biennial conference. Here, for the first time, the term data science was included in the title
of the conference ("Data Science, classification, and related methods"), after the term was
introduced in a roundtable discussion by Chikio Hayashi.
In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics = Data Science?"
for his appointment to the H. C. Carver Professorship at the University of Michigan. In this
lecture, he characterized statistical work as a trilogy of data collection, data modeling and
analysis, and decision making. In his conclusion, he initiated the modern, non-computer science,
usage of the term "data science" and advocated that statistics be renamed data science and
statisticians data scientists. Later, he presented his lecture entitled "Statistics = Data Science?" as
the first of his 1998 P.C. Mahalanobis Memorial Lectures. These lectures honor Prasanta Chandra
Mahalanobis, an Indian scientist and statistician and founder of the Indian Statistical Institute.
In April 2002, the International Council for Science (ICSU): Committee on Data for Science and
Technology (CODATA) started the Data Science Journal, a publication focused on issues such
as the description of data systems, their publication on the internet, applications and legal issues.
Shortly thereafter, in January 2003, Columbia University began publishing The Journal of Data
Science, which provided a platform for all data workers to present their views and exchange
ideas. The journal was largely devoted to the application of statistical methods and quantitative
research. In 2005, The National Science Board published "Long-lived Digital Data Collections:
Enabling Research and Education in the 21st Century" defining data scientists as "the
information and computer scientists, database and software engineers and programmers, disciplinary
experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the
successful management of a digital data collection" whose primary activity is to "conduct
creative inquiry and analysis."
Around 2007, Turing award winner Jim Gray envisioned "data-driven science"
as a "fourth paradigm" of science that uses the computational analysis of large data as primary
scientific method and "to have a world in which all of the science literature is online, all of the
science data is online, and they interoperate with each other."
In the 2012 Harvard Business Review article "Data Scientist: The Sexiest Job of the 21st
Century", DJ Patil claims to have coined this term in 2008 with Jeff Hammerbacher to define
their jobs at LinkedIn and Facebook, respectively. He asserts that a data scientist is "a new
breed", and that a "shortage of data scientists is becoming a serious constraint in some sectors",
but describes a much more business-oriented role.
In 2013, the IEEE Task Force on Data Science and Advanced Analytics was launched. In 2013,
the first "European Conference on Data Analysis (ECDA)" was organised in Luxembourg,
establishing the European Association for Data Science (EuADS). The first international
conference: IEEE International Conference on Data Science and Advanced Analytics was
launched in 2014. In 2014, General Assembly launched a student-paid bootcamp and The Data
Incubator launched a competitive free data science fellowship. In 2014, the American Statistical
Association section on Statistical Learning and Data Mining renamed its journal to "Statistical
Analysis and Data Mining: The ASA Data Science Journal" and in 2016 changed its section
name to "Statistical Learning and Data Science". In 2015, the International Journal on Data
Science and Analytics was launched by Springer to publish original work on data science and big
data analytics. In September 2015 the Gesellschaft für Klassifikation (GfKl) added to the name
of the Society "Data Science Society" at the third ECDA conference at the University of Essex,
Colchester, UK.
Relationship to Statistics
"Data science" has recently become a popular term among business executives. However, many
critical academics and journalists see no distinction between data science and statistics, whereas
others consider it largely a popular term for "data mining" and "big data". Writing in Forbes, Gil
Press argues that data science is a buzzword without a clear definition and has simply replaced
“business analytics” in contexts such as graduate degree programs. In the question-and-answer
section of his keynote address at the Joint Statistical Meetings of the American Statistical
Association, noted applied statistician Nate Silver said, “I think data-scientist is a sexed up term
for a statistician....Statistics is a branch of science. Data scientist is slightly redundant in some
way and people shouldn’t berate the term statistician.” Similarly, in the business sector, multiple
researchers and analysts state that data scientists alone are far from being sufficient in granting
companies a real competitive advantage and consider data scientists as only one of the four
greater job families companies require to leverage big data effectively, namely: data analysts,
data scientists, big data developers and big data engineers.
On the other hand, responses to criticism are just as numerous. In a 2014 Wall Street Journal article,
Irving Wladawsky-Berger compares the enthusiasm for data science with the dawn of computer
science. He argues that data science, like any other interdisciplinary field, employs methodologies
and practices from across academia and industry, but that it will then morph them into a new
discipline. He draws attention to the sharp criticism that computer science, now a well-respected
academic discipline, once had to face. Likewise, NYU Stern's Vasant Dhar, as do many other
academic proponents of data science, argues more specifically in December 2013 that data
science is different from the existing practice of data analysis across all disciplines, which
focuses only on explaining data sets. Data science seeks actionable and consistent patterns for
predictive uses. This practical engineering goal takes data science beyond traditional analytics.
Now the data in those disciplines and applied fields that lacked solid theories, like health science
and social science, could be sought and utilized to generate powerful predictive models.
In an effort similar to Dhar's, Stanford professor David Donoho, in September 2015, takes the
proposition further by rejecting three simplistic and misleading definitions of data science in
response to criticisms. First, for Donoho, data science does not equate to big data, in that the size of the
data set is not a criterion to distinguish data science and statistics. Second, data science is not
defined by the computing skills of sorting big data sets, in that these skills are already generally
used for analyses across all disciplines. Third, data science is a heavily applied field where
academic programs right now do not sufficiently prepare data scientists for the jobs, in that many
graduate programs misleadingly advertise their analytics and statistics training as the essence of
a data science program. As a statistician, Donoho, following many in his field, champions the
broadening of learning scope in the form of data science, like John Chambers, who urges
statisticians to adopt an inclusive concept of learning from data, or like William Cleveland, who
urges prioritizing the extraction of applicable predictive tools from data over explanatory theories.
Together, these statisticians envision an increasingly inclusive applied field that grows out of
traditional statistics and beyond.
For the future of data science, Donoho projects an ever-growing environment for open science
where data sets used for academic publications are accessible to all researchers. The US National
Institutes of Health has already announced plans to enhance reproducibility and transparency of
research data. Other big journals are likewise following suit. This way, the future of data science
not only exceeds the boundary of statistical theories in scale and methodology, but data science
will revolutionize current academia and research paradigms. As Donoho concludes, "the scope
and impact of data science will continue to expand enormously in coming decades as scientific
data and data about science itself become ubiquitously available."
The continually increasing access to data is possible due to advancements in technology and
collection techniques. Individuals' buying patterns and behavior can be monitored and predictions
made based on the information gathered.
However, the ever-increasing data is unstructured and requires parsing for effective decision
making. This process is complex and time-consuming for companies—hence, the emergence of
data science.
Data mining applies algorithms to the complex data set to reveal patterns that are then used to
extract useful and relevant data from the set. Statistical measures or predictive analytics use this
extracted data to gauge events that are likely to happen in the future based on what the data
shows happened in the past.
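One simple way to picture this step is to count which items appear together in past records and then turn those counts into an estimate of what is likely to happen next. The sketch below is only an illustration under assumed data: the shopping baskets and the helper name confidence are hypothetical, and real data mining systems use far more elaborate algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative only).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in one canonical order.
    pair_counts.update(combinations(sorted(basket), 2))


def confidence(antecedent, consequent):
    """Estimate how often `consequent` appears in baskets containing `antecedent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


# Pattern revealed by mining: the pair of items bought together most often.
most_common_pair, support = pair_counts.most_common(1)[0]

# Prediction from past data: share of bread baskets that also contained butter.
print(most_common_pair, support, confidence("bread", "butter"))
```

The counting stage corresponds to revealing patterns, and the ratio corresponds to gauging a future event from what happened in the past.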
Machine learning is an artificial intelligence technique that processes quantities of data that a
human would be unable to process in a lifetime. Machine learning refines the decision model
produced by predictive analytics by comparing the predicted likelihood of an event with what
actually happened at the predicted time.
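Comparing predicted likelihoods with actual outcomes can be made concrete with a scoring rule. The sketch below uses the Brier score (the mean squared difference between predicted probabilities and 0/1 outcomes) as one possible measure; this choice, the two candidate models, and all the numbers are assumptions made for illustration and are not taken from the text above.

```python
def brier_score(predicted, observed):
    """Mean squared gap between predicted probabilities and 0/1 outcomes (lower is better)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical outcomes (1 = the event happened) and forecasts from two models.
outcomes = [1, 0, 1, 1, 0]
model_a = [0.9, 0.2, 0.8, 0.7, 0.1]
model_b = [0.6, 0.5, 0.5, 0.4, 0.5]

score_a = brier_score(model_a, outcomes)
score_b = brier_score(model_b, outcomes)

# Keep whichever model's predictions sat closer to what actually happened.
better = "model_a" if score_a < score_b else "model_b"
print(score_a, score_b, better)
```

Repeating this comparison as new outcomes arrive is one way a decision model can be refined over time.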
Using analytics, the data analyst collects and processes the structured data from the machine
learning stage using algorithms. The analyst interprets, converts, and summarizes the data into a
cohesive language that the decision-making team can understand. Data science is applied to
practically all contexts and, as the data scientist's role evolves, the field will expand to
encompass data architecture, data engineering, and data administration.
Companies such as Netflix mine big data to determine what products to deliver to their users.
Netflix also uses algorithms to create personalized recommendations for users based on their
viewing history. Data science is evolving at a rapid rate, and its applications will continue to
change lives into the future.
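To give a rough sense of how recommendations can be derived from viewing history, the sketch below suggests titles watched by the most similar other user. It is a toy illustration only and does not describe how Netflix or any real service works: the user names, titles, and the use of Jaccard similarity are all assumptions.

```python
# Hypothetical viewing histories (illustrative only).
histories = {
    "ana": {"Drama A", "Comedy B", "Thriller C"},
    "ben": {"Drama A", "Comedy B", "Sci-Fi D"},
    "cai": {"Documentary E"},
}


def jaccard(a, b):
    """Share of titles two users have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, histories):
    """Suggest unseen titles taken from the most similar other user."""
    seen = histories[user]
    # Assumes at least one other user exists in `histories`.
    others = [name for name in histories if name != user]
    nearest = max(others, key=lambda name: jaccard(seen, histories[name]))
    return sorted(histories[nearest] - seen)


print(recommend("ana", histories))
```

Here "ana" and "ben" share two of four distinct titles, so "ben" is the nearest neighbor and his one unseen title becomes the suggestion.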
Key Takeaways
Advances in technology, the Internet, and social media have all increased access to
big data.
Data science uses techniques such as machine learning and artificial intelligence to
extract meaningful information and to predict future patterns and behaviors.
The field of data science is growing as technology advances and big data collection and
analysis techniques become more sophisticated.