1.1.1.1-Transcription
Hello and welcome to MSBA 300. This course sets the stage for the rest of the program. Topics that you will be learning
have to do with the principles, the technologies, the organisational structures, the data structures and the applications
of analytics in the business world. You're also going to be learning about some of the techniques and some of the
principles that are associated with the challenges of being a data analyst or a data scientist.
Whether you're new to this or whether you've been working for a very long time in this field, the principles that you're
going to learn in this class bring you all together at the same level so we can all start at the same place. So, remember
this course is an overview. It's not going to dive deeply into any of the areas that we touch on; that's for the rest of the
curriculum to do. So, let's get started.
So, most evolving disciplines that we deal with have new ideas and new approaches and analytics isn't any different.
There is a myriad of individuals who are contributing to this discipline and they all have their own ideas, and as far as
it goes, the discipline becomes very broad, but at the same time, it's very complex. So, it's important that we start
defining some of these terms that you're going to be using over the next several weeks, so that you get a common
understanding of where analytics is going, and as I said as the discipline grows, some of these things are going to get
more clarity than they have right now.
The other people who have written a book that I think is very important for us to understand write about data science and its
relationship to big data and data-driven decision making. So, these two books are going to set the context for
what you're going to be learning. So, please make sure you go back and take a look at these articles.
So, data is now incorporated into every professional aspect of the global market. You can't go online, you can't watch
TV without hearing something about data, whether it's an election or whether it's COVID or whether it's how many
people are buying a product, it's always out there. So, companies are constantly seeking methods to gain a competitive
advantage over their competition because it's a highly competitive business environment, and it's actually driven by
the access to more and more data.
So, how do companies do this? How do they make everything available to their customers and take a very good shot
at making sure they're going to build their competitive advantage? Well, let's see how Netflix does it. As you probably
already know, Netflix has over 200 million paying subscribers, that's a lot of people, that's a lot of people to collect
data about, and they start making recommendations and such and streaming videos and such because they've done
their homework, and they did this by utilising what we call big data, which we'll talk about, and data analytics to make
it more tailored for their customers.
Competing on analytics is an article that Thomas Davenport wrote for the Harvard Business Review in 2006, and he
kicked off the whole analytics business revolution by writing this article. What he was talking about was what multiple
companies, most of which you know as very high-tech digital companies, were using to compete in their business or
their marketplace, and it became very evident to those who were reading the article and were not part of
those companies that perhaps they were a little slow on the uptake in starting to use analytics.
So, Thomas Davenport tells us that every successful analytics company has some attributes and some strengths that
they work with, that set them apart from everyone else.
So, there are actually three attributes that he talks about. The first one is widespread use of modelling and
optimisation or adopting analytics. The second one is an enterprise approach, which means the whole company is
doing it, and then the third one is that senior executives are advocates. This one is incredibly important because what
it's doing is telling the leadership that if they're not behind what's going on, this is never going to work. And from there,
he identifies four key strengths that are necessary. One is the right focus: what are we trying to accomplish?
Keep in mind that before all this started the thing that differentiated the companies from other companies to obtain
a competitive advantage was their use of computers. That wasn't as long ago as you might think, but today that one
doesn't work anymore because computers are what we call ubiquitous, they're everywhere, try running a business
without one.
So, according to the World Economic Forum, 2022 is the year when the data analyst and data scientist job functions
become the highest-ranking jobs in the world, and there's nothing that we've seen today that says that that has not
happened. So, there are some differences though between the two and so we want to take a look at those. So, first
we're going to look at analytics, and if you go back to what Tom Davenport tells us you can see on the screen there is
a definition there of what data analytics is.
And if you take a look at that, you will see also that he comes up with three different kinds of analytics; there's
descriptive, there's predictive and there's prescriptive. And descriptive analytics is what happened and predictive
analytics is what we expect to happen and prescriptive analytics is what we want to do when we find out what's going
to happen. We also call that optimisation by the way because it gives you choices.
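To make the three kinds of analytics a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration and does not come from the lecture or from Davenport: the weekly sales figures, the unit price and cost, the stocking options, and the very naive forecasting rule. It only shows how the three questions (what happened, what do we expect to happen, what do we want to do about it) build on one another.

```python
# Hypothetical weekly unit sales for one product (made-up numbers).
weekly_sales = [100, 110, 120, 130, 140]

# Descriptive analytics: what happened?
# Here, simply the average of past weekly sales.
average_sales = sum(weekly_sales) / len(weekly_sales)

# Predictive analytics: what do we expect to happen?
# A deliberately naive forecast: last week's sales plus the
# average week-over-week change seen so far.
changes = [later - earlier for earlier, later in zip(weekly_sales, weekly_sales[1:])]
forecast = weekly_sales[-1] + sum(changes) / len(changes)

# Prescriptive analytics: what do we want to do about it?
# Pick, from a few hypothetical stocking options, the one with the
# highest profit if demand matches the forecast.
UNIT_PRICE = 10.0  # assumed selling price per unit
UNIT_COST = 6.0    # assumed purchase cost per unit
stock_options = [120, 150, 180]

def expected_profit(stock, demand):
    """Revenue from units actually sold minus the cost of everything stocked."""
    units_sold = min(stock, demand)
    return units_sold * UNIT_PRICE - stock * UNIT_COST

best_stock = max(stock_options, key=lambda stock: expected_profit(stock, forecast))

print("Average weekly sales:", average_sales)
print("Forecast for next week:", forecast)
print("Stock level to choose:", best_stock)
```

The prescriptive step is the one that gives you choices: it compares the options and recommends one, but a person could still decide differently.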
If you are familiar with some of the location applications like Google maps or Waze, you know that when you are
planning a trip and you have a start location and an end location, it will give you choices about what ways to go.
You get to make the choice, and when you make the choice, that's the one you're going to follow. That's a really basic,
basic description of what prescriptive analytics is about, but that's really what it is. It's all about making choices. So,
we've talked about the different ideas of what people think about analytics and how the company's going to use it and
who cares and who doesn't care, but the fact of the matter is everybody comes into this process with their own
agendas, and so one of the challenges for you as a data analyst or a data scientist is to get the buy-in from the
management, who most likely are going to be impacted either logistically or financially by what you're talking about.
So, if you take a look at this cartoon and you go back to look at what those strengths and those attributes were that
Thomas Davenport talked about, you can see that no matter how good your ideas are, there's somebody else that has
a different agenda and it may not be that easy for you to sell what you want. So, you need to be aware of that and we
are going to talk a little bit about how do you sell change because that's really what you're talking about.
So, now that we've talked about data analytics, let's talk about the other end of the spectrum, which is data science.
Provost and Fawcett, in their book, lean towards the science part of data science, and they talk about the data
scientist being someone who is using scientific principles to do their job, but in fact, data science has a broader
definition. If you look at the slide that's accompanying this, you will see that there are two different definitions
there. One of them is their definition, a set of fundamental principles; that's what data science is all about.
But there's another definition that I've added in there that talks more about what the components of data science are,
and so, it's an evolutionary process. Data science just doesn't pop up out of nowhere.
It is an evolutionary process that analysts and data scientists go through, and the challenge with that is how
much of that is math, how much of that is business, how much of that is technology, how much of that is statistics. All
those things mixed together make up data science. Where Provost and Fawcett lean towards the scientific approach
to that, the true data scientist needs all of those.
So, we talk about this as being a craft. We talk about data science as being a craft. If that's a term you're not familiar
with, a craft is something that you do very well, that you're very skilled at. In the good old days, if you will, a craft was
something that was assigned to someone who worked with their hands, and the longer they worked with their hands,
the better they got at doing their job and their skills increased. So, the data scientist really learns how to do data
science by working with the craft of data science.
So, we're going to learn more about how these things work when we get down to module 7, and we talk about the
organisation where both of these job titles appear. It's important to know that you can't go wrong when you call
somebody a data scientist or a data analyst. The people that it really is important to are the
people who are looking for the jobs. So, just to give you a little story about this, years ago, when we had decided we
would introduce an analytics degree, I attended a lot of conferences.
And coming from IT, I recognised what a business analyst was. A business analyst was somebody that was the
bridge between the IT organisation and the business organisation, something of a translator. Well, I go to my first
conference and everybody presenting has the title of data scientist, when in fact I knew that what they were
actually practicing was business analysis. But doesn't data science sound a lot more important than business analysis?
So, just when you thought you knew what a data scientist was, I have a picture here, a cartoon, to show you. This
cartoon pays homage to one of our faculty members who lives in the state of Florida. He builds algorithms for
companies for a living, but the first time that we had a faculty meeting, he came to the meeting and about halfway
through said, "Sorry, I need to leave, I'm giving a scuba diving lesson," and that's what he does in his spare time. He teaches
scuba diving and surfboarding. So, being a data scientist isn't all work and no play.
So, let's go back now to Provost and Fawcett. In their book, they talk about the definition of data science and we've
already gone through that, but in fact, they're really about extracting knowledge. How do you get the knowledge from
the data? And they have a whole set of principles that they use that talk about what you should and shouldn't do
when it comes to data science or any aspect of analytics, and we're going to choose two of those to take a look at and
take a deeper dive on, because it's important to understand the fundamentals of what you're going to have to
deal with, and these two kind of sum it up.
So, if you take a look at the slide, you'll see that there are two different fundamentals, as I mentioned, and they're
listed here. One of them is that when you take a look at the results, you're required to take a look at your
results within the context of the problem you're trying to solve, that's the first one, and the second one is that when
you look at a set of data long enough you can pretty much fit it to anything that you want to fit it to.
So, what are the lessons to be learned here? The first one is that if you have a mismatch between what you're trying to
accomplish or trying to discover when you're examining the data and what the customer or your business client wants,
you're going to end up with some bad data, and so it's important that you understand how to get that right.
We talked back at the beginning, in the first slides, about what the ideas are, or what the customer definition is, depending
on what kind of job that they have in the company, and it's really important that you get that right because if you don't
get it right, then you're going to have problems. So, as an example of a customer definition, it may be that I'm in sales, and for
me, the definition of a customer is how much money they're bringing in to me and how much revenue
they're generating, because at some point, I would like to get a bonus for the work I'm doing.
If I'm in customer support, I don't care about the revenue. What I care about is the customer who gets on the phone
and gives me problems because something isn't working right and it's my job to fix that. That's a customer support
issue. So, there are two different views of the customer, and there are others by the way, and so it
depends on where you are in the company and what you're trying to accomplish.
Powered by upGrad Education Private Limited
So, the second one that we have here has to do with overfitting, and there's a term that we use called generalisation.
If you're developing a model to be used, the best thing that you can do is to generate a model that is reusable and that
can work across the environment, across not just the business unit but the company, and if you can't get
the model to do that, it's probably not worth the effort to use it, but if you look long enough, you'll make it fit. You'll
figure out a way to do that. That's called overfitting.
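To see overfitting and generalisation side by side, here is a small illustrative sketch in Python. All of it is assumed for the sake of the example and none of it comes from Provost and Fawcett's book: the data points are made up (a rising trend with noise added), `fit_line` is an ordinary least-squares straight line, and `fit_memorizer` is a stand-in for a model that has been forced to fit the data it was built on exactly.

```python
# Made-up data for illustration: the underlying trend is y = 2x + 1,
# with noise added to every observation.
train = [(0, 2.5), (1, 1.5), (2, 6.5), (3, 5.5),
         (4, 10.5), (5, 9.5), (6, 14.5), (7, 13.5)]
# Held-out observations the models never saw while being built.
test = [(0.4, 2.3), (1.4, 3.3), (2.4, 6.3), (3.4, 7.3),
        (4.4, 10.3), (5.4, 11.3), (6.4, 14.3)]

def fit_line(points):
    """Simple model: a least-squares straight line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_memorizer(points):
    """Overfit model: return the y value of the closest training point."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, points):
    """Mean squared error of a model on a set of (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

line = fit_line(train)
memorizer = fit_memorizer(train)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name, "train error:", round(mse(model, train), 2),
          "test error:", round(mse(model, test), 2))
```

The memorizer looks perfect on the data it was built from, because it reproduces the noise as well as the trend, but it does worse than the plain line on the held-out points. The line is the one that generalises.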
There's no value to be had in overfitting. So, if you're a customer or you're a data scientist and you put together the
greatest algorithm in the whole world and it's just beautiful and you want everybody to use it, and then you send it out
to be used by a certain division of the company that should be able to use it and they don't have the technology that
can process it, it's totally useless to them. So, there's a saying that having a model or an
algorithm that meets 80% of what you're trying to do is better than having an algorithm that is perfect. There just isn't
any value in trying to use something that is going to break when you need it most.
So, if you think that you haven't personally encountered data science at some point in your life, think again. Any of you
who follow the sport of baseball will recognise Billy Beane as the analytics-oriented manager of the Oakland A's baseball
team, and he had the thought that if he did enough to analyse all the statistics about a player, he could select a group
of players that would give him the lowest payroll but, as we tend to say, get the bang for the buck. So, his players may
not be the highest paid, but what they did is they went out and looked for players that had the right kind of skill sets.
That ended up being a book and a movie called Moneyball, which you may be familiar with.
The second one is Google's page ranking system, and you know, Google doesn't let everybody in and advertise. They
actually rank the pages that are coming in and they come back and they give you the most popular pages when you
ask for content about something and that's done behind the scenes, but that's done by some very, very robust
algorithms that are actually counting how many accesses there are and the popularity of the topics that you're wondering
about.
The third thing is LinkedIn. LinkedIn used data science to come up with the world's first professional network. We think
of Facebook, or Meta as it's called now, and that's a social network. LinkedIn is a professional network, and that was
actually developed for recruiters so that they could come in and find ready-made resumes, if you will, and of course they
paid a fee for that.
The next one is Netflix. Netflix perfected streaming. Oh my goodness, there was another company, or other companies,
out there. The most prominent one was Blockbuster, and Blockbuster had stores and you would go into the store and
you would rent a DVD and you would take it home and you would watch it, and the biggest irritant of dealing with
Blockbuster was that there was a date that you needed to return the DVD or you got fined, and I can remember many a trip
back to the Blockbuster store to beat the time when that DVD was due.
So, then Netflix came along and Netflix did something else. They came up with a different way of accessing
entertainment, and by the way, Netflix when they started up, they did a similar thing, but they sent you your DVD in
an envelope, and you could keep it as long as you wanted, but then when you sent it back you could get another one.
They perfected streaming. Blockbuster had no clue, and in the end, Netflix drove everybody else out of the
marketplace.
And then finally, the last one that's really a big deal in data science is Amazon. I'm sure all of you who have tried to
purchase, or purchased, something on Amazon get the message that says the last time you purchased something like
this, you know, here are the other things that you could buy just in case you wanted to substitute, and
they send this to you because they've been tracking you and everybody else that's purchasing or even looking at
something on Amazon. So, I go to Amazon and I want to buy a new electric toothbrush.
I'm going to see dozens of recommendations for electric toothbrushes, or if I provide a certain product name
that I want, they're going to come back and they're going to tell me who else bought this product, not the names, but
the number of people who bought this product. That drives you towards buying something. So, I
don't think there are any of you out there who have used Amazon who have not gone out there
and bought something because they suggested it to you, not because you particularly wanted it.
So, there is one more important term that we should talk about and that's data-driven decision-making, and you can
see in this diagram here that data science sits right in the middle of everything that happens in processing information and
technology. So, it's very difficult to give a data scientist a job description, if you will, because when they're doing this
data-driven decision-making, they're touching on all these other aspects that are going on and it's impossible to separate
them out.
So, the most important thing to remember about it is that data-driven decision-making takes away the strong role that
intuition has always had in the way we make decisions, and we're going to talk about that a little bit later. You can
call intuition gut feel. Data-driven decision-making is not about gut feel. It's about collecting data, building models and
coming up with some insights that allow you to make the best decisions. So, the important thing about this whole
data-driven situation is there's two different ways that you get decisions from data.
So, there's one more thing we need to talk about, one more definition that we need to talk about. We call it the
elephant in the room because you can't ignore it in this world that we're living in today, and that's called big data, and
we're going to talk more about big data, but the thing to remember about big data is it is a game changer. It's probably
a more important change in the way we do business, or impact on the way we do business, than anything
else in the last 20 years.
And if you go all the way back to Tom Davenport, talking about competing on analytics and having that data and the
magnitude of the data that did change everything, it really is a game changer. So, the thing about big data is that it's
not just big and maybe big data is a misnomer or the wrong name. Big data is two different things.
It's huge data sets, lots and lots of data that you cannot process in a normal manner. The other thing about big data
that makes it unique is big data is unstructured, and we're going to talk more about that when we talk about file
systems, but essentially what that means is that you can't process data that comes from, say, a satellite, or data that
comes from some kind of an x-ray machine or whatever, in the same way as you process your
transactional data coming out of your company, and that's really what makes it the game changer, and it's a game
changer because 80% of the data is unstructured.
COVID-19 could have been a lot worse, and we all know it's bad. A lot of people died, a lot of people lost their family,
their friends. COVID-19 could have been a lot worse if we hadn't been able to process big data, and so, the
magnitude of big data in our world today, forget the world's economy, just in the way we live, has been just
huge.
So, you know, this goes back to what we were talking about before about using computers and we said, well, it used
to be if you used a computer, you got the competitive advantage over your competitors. And we already talked about
the fact that it doesn't matter anymore because everybody has computers. Just try to think about where we would
have been with COVID-19 if we hadn't had access to the technologies and the principles and the theories that we
associate with big data. Now, that's a game changer.
Disclaimer: All content and material on the upGrad website is copyrighted, either belonging to upGrad or its
bonafide contributors and is purely for the dissemination of education. You are permitted to access, print and
download extracts from this site purely for your own education only and on the following basis:
• You can download this document from the website for self-use only.
• Any copies of this document, in part or full, saved to disk or to any other storage medium, may only be used
for subsequent, self-viewing purposes or to print an individual extract or copy for non-commercial personal
use only.
• Any further dissemination, distribution, reproduction, copying of the content of the document herein or the
uploading thereof on other websites, or use of the content for any other commercial/unauthorised purposes
in any way which could infringe the intellectual property rights of upGrad or its contributors, is strictly
prohibited.
• No graphics, images or photographs from any accompanying text in this document will be used separately
for unauthorised purposes.
• No material in this document will be modified, adapted or altered in any way.
• No part of this document or upGrad content may be reproduced or stored in any other website or included
in any public or private electronic retrieval system or service without upGrad’s prior written permission.
• Any right not expressly granted in these terms is reserved.