Data Mining Tools and Tech 06.07.2024
...will not tell you what the best course of action is; predictive analytics will not tell you that. So if you compare the different types of analytics in terms of complexity and value, you will see that as you go from descriptive to diagnostic to predictive to prescriptive, both the value and the complexity increase. Descriptive analytics is the simplest one, and we have very good control over it, but its value is comparatively low. For the prescriptive part, you specifically require optimization tools.

Question from a student: can we say that prescriptive and inferential analytics are the same? No. [Partly inaudible.] Prescriptive analytics is not very mature yet. It exists, but not many people have the skills for it, and you would need more optimization subjects to work in it. For predictive analytics you can build a simple model, such as linear regression, whereas prescriptive analytics requires optimization. So awareness of prescriptive analytics is a little lower, but technically it is also more advanced.
So let us see what the application areas of data mining are in real life: banking, astronomy, bioinformatics, drug discovery, many business areas such as advertising, CRM, and investments, as well as manufacturing, sports, entertainment, telecom, e-commerce, targeted marketing, healthcare, and law enforcement. Name any domain and there will be a lot of applications of data mining in it.
Let us look at some examples. The first is customer modeling, for example attrition (churn) prediction. Attrition means a customer leaves the company and joins one of its competitors. In telecommunications, for example in mobile service, a customer who is not happy with the service leaves and joins Jio; that is attrition. The attrition rate of mobile phone customers is around 25 to 30% per year, which is very high; telecom attrition is perhaps the highest compared to other industries.

So the task is to predict who is likely to attrite next month, given customer information for the past ten months. Your company may give you a list of customers and ask you to do this. The second objective is to estimate the customer value and decide what cost-effective offer should be made to each customer. Once you have the list of customers who are likely to leave the company, the next question is to estimate the value of each one. If a customer brings in very little revenue, their leaving should not matter much to the company. That is why you need to estimate the customer value: you cannot retain all customers, and if some are not generating revenue, you are fine with letting them go.

Once you have estimated the customer value, the question is what cost-effective offer to make. Suppose a customer generates good revenue but is likely to leave the company. You have to give some offer to retain that customer, but how much should that offer cost? If the customer generates 500, you cannot spend more than that on the offer, or it becomes a net loss. All these things must be estimated.
Then there are other examples, such as targeted marketing, cross-selling, and customer acquisition. What is cross-selling? Selling other products to an existing customer. For example, along with opening an account, a bank offers credit cards and other facilities: a few days after you open an account, they call you and ask you to take a credit card or insurance. But they cannot arbitrarily target all types of customers; they have to see which customers are worth targeting and what each one needs. If somebody needs insurance and you are offering a credit card, that person will not take it.

Question from a student: for the attrition prediction task, how can I predict who is likely to leave next month? Based on past data. Suppose you have a database of past customers: those who have left the company and those who are continuing with you. You have to look at their features. Suppose they have prepaid connections: were they recharging over the past few months, and what amounts were they recharging? Maybe they were recharging very small amounts, or not recharging at all for 5 to 6 months, and then they left. This is the main task: find the pattern of the customers who left the company. Is it clear?
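The value-versus-offer reasoning above can be made concrete with a small calculation. This is a minimal sketch, not a method from the lecture: the function names, the 12-month horizon, the assumed 50% chance that an offer actually keeps a would-be leaver (`save_rate`), and the customer numbers are all hypothetical. The churn probabilities would come from a model trained on the past-customer patterns just described.

```python
def customer_value(monthly_revenue, months=12):
    """Crude customer value: revenue expected over an assumed fixed horizon."""
    return monthly_revenue * months


def worth_an_offer(churn_prob, monthly_revenue, offer_cost, save_rate=0.5, months=12):
    """True if the expected revenue saved by the offer exceeds its cost.

    churn_prob: predicted probability that the customer leaves next month.
    save_rate:  assumed probability that the offer keeps a would-be leaver.
    """
    expected_saved = churn_prob * save_rate * customer_value(monthly_revenue, months)
    return expected_saved > offer_cost


def customers_to_target(customers, offer_cost):
    """IDs of customers for whom a retention offer is cost-effective."""
    return [
        c["id"]
        for c in customers
        if worth_an_offer(c["churn_prob"], c["monthly_revenue"], offer_cost)
    ]


# Hypothetical customers: A is valuable and at risk, B is at risk but low value,
# C is valuable but unlikely to leave.
customers = [
    {"id": "A", "churn_prob": 0.7, "monthly_revenue": 800},
    {"id": "B", "churn_prob": 0.8, "monthly_revenue": 40},
    {"id": "C", "churn_prob": 0.1, "monthly_revenue": 900},
]
print(customers_to_target(customers, offer_cost=1000))  # ['A']
```

Customer B is likely to leave but is not worth the offer, and customer C is valuable but unlikely to leave, so only A is targeted.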
Next come fraud detection and credit risk; you can see the telecom references on the slide. Take the credit risk case: a person applies to a bank for a loan. The bank checks the documents to see that everything is proper, along with the person's previous credit payment history, and profiles the person based on this, for example whether they hold any credit cards. Question from a student: don't banks also consider other factors? Yes, not only those; you will also check whether the person is capable of returning the loan.

So the task is: should the bank approve the loan? People who have the best credit don't need loans, and people with the worst credit are not likely to repay, so the bank's best customers are in the middle. Very good and very bad applicants you can easily identify; the difficulty is the borderline cases in the middle, which have some good features and some bad ones, and for which it is unclear whether they will return the loan. Banks develop credit scoring models using a variety of machine learning methods. Mortgage and credit card proliferation are the result of being able to successfully predict whether a person is likely to default on a loan.

That is why loan applications are now very quick. You just fill in the form at the bank, and within one or two minutes they tell you whether you are approved. Why? Because the profiling is done by models, not by humans. They have your whole credit history; the customer's data is maintained. The main thing is that the decision is made very quickly. In India, the important thing they check is the CIBIL score, and they have developed models around it: with cutoffs such as 800 or 500, they can decide on the loan immediately. Because of machine learning algorithms, this has become very fast.
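The point that the extremes are easy and the middle is hard can be sketched as a three-way decision on a score. This is a minimal illustration only: the function name is hypothetical, and the cutoffs of 800 and 500 simply echo the numbers mentioned above; they are assumptions, not any bank's actual policy.

```python
def loan_decision(score, approve_at=800, reject_below=500):
    """Three-way decision on a credit score; both cutoffs are illustrative assumptions."""
    if score >= approve_at:
        return "approve"  # clearly good applicant
    if score < reject_below:
        return "reject"   # clearly bad applicant
    return "review"       # borderline middle: needs a closer look or a richer model


for s in (820, 650, 450):
    print(s, loan_decision(s))
# 820 approve
# 650 review
# 450 reject
```

In practice the "review" band is exactly where a learned model, rather than a single score cutoff, earns its keep.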
Now e-commerce: how is machine learning applied there? Suppose a person is buying a book at Amazon.com. When you click on any product on Amazon, what appears at the bottom? Reviews, and also related products. So the task for Amazon is to recommend other books or products this person is likely to buy. Amazon does clustering based on the books bought: if there is a group of similar books, it will show you those. Suppose you clicked on "Advances in Knowledge Discovery and Data Mining". It will show that customers who bought this book also bought "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations". So if you group books on similar topics, and books that customers have bought together over time, you can give recommendations.

Next: have you heard about Netflix? It is available in India, and it has been around for a long time. What it does is recommend movies: once you start watching a few movies, it understands your taste and recommends other movies. Netflix organized a competition to improve its recommendations, and a lot of research has been done on the Netflix movie recommender system; how it works is very interesting. These are recommendation programs, like those of Amazon and Netflix.
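A "customers who bought this also bought" list can be approximated by counting how often items appear in the same purchase. This is a minimal co-occurrence sketch, not Amazon's or Netflix's actual algorithm; the function names and the short placeholder book titles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count, for each ordered pair of distinct items, how many baskets contain both."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def also_bought(item, baskets, top_n=2):
    """Items most often bought together with `item` (ties broken alphabetically)."""
    counts = co_purchase_counts(baskets)
    scored = [(other, c) for (a, other), c in counts.items() if a == item]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:top_n]]


baskets = [  # hypothetical purchase histories, one set of titles per customer
    {"KDD Advances", "Practical ML Tools"},
    {"KDD Advances", "Practical ML Tools", "Statistics Primer"},
    {"KDD Advances", "Practical ML Tools", "Python Basics"},
    {"Practical ML Tools", "Statistics Primer"},
    {"Cookbook"},
]
print(also_bought("KDD Advances", baskets))  # ['Practical ML Tools', 'Python Basics']
```

Raw counts favor popular items; real recommenders usually normalize them, but the counting idea is the same.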
In the medical field, take the genomic microarrays case. Given microarray data for a number of patients, can we accurately diagnose the disease, predict the outcome of a given treatment, and recommend the best treatment? Even if two patients have the same disease, the treatment may not be the same, because there may be other factors. So predicting the outcome of a treatment and recommending the best treatment for a patient is again something a model can do.

One example is acute lymphoblastic leukemia (ALL) versus acute myeloid leukemia (AML). There are different variants of leukemia, and depending on the gene data, machine learning algorithms can predict whether it is ALL or AML. In one study there were 38 training cases, 34 test cases, and about 7,000 genes. A diagnostic model was built, and on the 34 test cases the confusion matrix showed only one error: 33 of 34 were classified correctly. If you search on Google for classifying ALL and AML using machine learning, you will find lots of research papers.
Next is fraud detection. Machine learning has been applied to detecting money laundering (at the U.S. Treasury), securities fraud, and phone fraud (AT&T, Bell Atlantic, and others), and to bio-terrorism detection at the Salt Lake City Olympics in 2002.

Then there is disaster management. [Partly inaudible.] When a disaster happens, relief and rescue work follows. Optimization analytics is used to direct the correct supplies, rescue efforts, or food items to the areas where they are needed most. Here relief is mainly about procurement: does a village need bottled water, rice, or wheat? Whenever a disaster happens anywhere, the first need is food and water, but disasters and locations differ, so the requirements differ too; somewhere you need bottled water, and somewhere shelter is more important.
One example: Hurricane Frances was on its way to hit Florida's Atlantic coast in 2004. Walmart wanted to predict which items would sell most in the stores in the path of the hurricane, so that it could stock them beforehand. Can you guess? (Students suggest toilet paper, drinking water, batteries, flashlights, dry food.) Those are the obvious ones: bottled water, flashlights, food, batteries. But what did the company do? It mined the shopper history from a hurricane that had struck a different location several weeks earlier, looking at the items sold. What they found was that sales of strawberry Pop-Tarts had increased about seven times, and beer sales had also gone up.

So you see, this is data mining: you are finding something which is nontrivial and not obvious. If you are working in a company and you say that more bottled water, flashlights, and food will be sold, that is common sense; everybody knows it. But what was found here is not obvious at all. Do you agree?
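The Walmart finding comes down to comparing each item's sales just before the storm with its normal sales. This is a minimal sketch with invented numbers (chosen so that strawberry Pop-Tarts show the roughly seven-fold jump mentioned above); the function names are hypothetical, and this is not Walmart's actual method.

```python
def sales_lift(normal, during_event):
    """Ratio of each item's sales during the event window to its normal sales."""
    return {
        item: during_event.get(item, 0) / base
        for item, base in normal.items()
        if base > 0  # skip items with no normal sales to avoid division by zero
    }


def most_lifted(normal, during_event, n=1):
    """The n items whose sales rose the most relative to normal."""
    lifts = sales_lift(normal, during_event)
    return sorted(lifts, key=lifts.get, reverse=True)[:n]


normal = {"bottled water": 100, "flashlights": 20, "strawberry Pop-Tarts": 10}
before_storm = {"bottled water": 300, "flashlights": 50, "strawberry Pop-Tarts": 70}
print(sales_lift(normal, before_storm))
# {'bottled water': 3.0, 'flashlights': 2.5, 'strawberry Pop-Tarts': 7.0}
print(most_lifted(normal, before_storm))  # ['strawberry Pop-Tarts']
```

Ranking by the ratio rather than by absolute units sold is what surfaces the non-obvious item: water sells more units, but Pop-Tarts change the most.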
So we have seen data mining tasks applied in marketing, fraud detection, and disaster management. How can we group them? Depending on the nature of the task, we can group these tasks into a number of functionalities, that is, the kinds of patterns that data mining can find. The simplest split is descriptive versus predictive.

The first functionality is class/concept description, which covers characterization and discrimination. Data characterization is the summarization of the general characteristics or features of a target class of data. Data discrimination is the comparison of the general features of target-class data objects against objects from one or multiple contrasting classes. The output of such descriptions can be presented as bar charts, multidimensional data cubes, and multidimensional tables.
Let us consider an example. There is a company, AllElectronics, that sells electronic goods. It is a highly successful international company with branches around the world, and each branch has its own set of databases. The database has the following tables: a customer table (name, address, age, occupation, and so on), an item table (item ID, brand, category, type, price), a branch table (name and address), a purchases table with the transactions that happened, an employee table, and an items-sold table recording the quantity of each item sold.

Data characterization works as follows. Suppose the company wants to know the profile of customers who spend more than 5,000 a year at AllElectronics. The result might be a general profile such as: they are 40 to 50 years old, employed, and have excellent credit ratings. So characterization is simply a description of the customers in the target class.
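Characterization can be sketched as "select the target class, then summarize its attributes". This minimal example assumes a hypothetical customer record layout and invented data; it reports the age range and the most common employment status and credit rating among customers spending more than 5,000 a year.

```python
from collections import Counter


def characterize(customers, min_spend=5000):
    """Summarize the target class: customers whose annual spend exceeds min_spend."""
    target = [c for c in customers if c["annual_spend"] > min_spend]
    if not target:
        return {"count": 0}
    ages = [c["age"] for c in target]
    return {
        "count": len(target),
        "age_range": (min(ages), max(ages)),
        # most frequent value of each categorical attribute within the target class
        "employment": Counter(c["employment"] for c in target).most_common(1)[0][0],
        "credit_rating": Counter(c["credit_rating"] for c in target).most_common(1)[0][0],
    }


customers = [  # hypothetical records
    {"age": 42, "employment": "employed", "credit_rating": "excellent", "annual_spend": 7200},
    {"age": 47, "employment": "employed", "credit_rating": "excellent", "annual_spend": 5600},
    {"age": 45, "employment": "self-employed", "credit_rating": "excellent", "annual_spend": 9100},
    {"age": 23, "employment": "student", "credit_rating": "fair", "annual_spend": 800},
]
print(characterize(customers))
# {'count': 3, 'age_range': (42, 47), 'employment': 'employed', 'credit_rating': 'excellent'}
```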
Discrimination is comparison: you compare two groups of customers. Say, those who shop for computer products regularly (more than twice a month) and those who rarely shop for such products (less than three times a year). Comparing these two groups is an example of discrimination. The result is something like this: customers who frequently purchase computer products are between 20 and 40 years old and have a university education, whereas customers who rarely buy them are either older or younger and have no university degree. That is discrimination: a comparison between two groups of customers. Characterization and discrimination both come under descriptive data mining.
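Discrimination can be sketched as computing the same feature shares in the target class and in the contrasting class, side by side. The records, feature names, and helper functions below are hypothetical, chosen only to mirror the age and education comparison above.

```python
def share(group, predicate):
    """Fraction of records in `group` that satisfy `predicate`."""
    return sum(1 for record in group if predicate(record)) / len(group)


def compare_groups(target, contrast, features):
    """For each named feature: (share in target class, share in contrasting class)."""
    return {name: (share(target, pred), share(contrast, pred)) for name, pred in features.items()}


frequent = [  # buy computer products more than twice a month (hypothetical)
    {"age": 25, "degree": True}, {"age": 31, "degree": True},
    {"age": 38, "degree": True}, {"age": 52, "degree": False},
]
rare = [  # buy such products less than three times a year (hypothetical)
    {"age": 17, "degree": False}, {"age": 65, "degree": False},
    {"age": 70, "degree": False}, {"age": 33, "degree": True},
]
features = {
    "aged 20-40": lambda c: 20 <= c["age"] <= 40,
    "university degree": lambda c: c["degree"],
}
print(compare_groups(frequent, rare, features))
# {'aged 20-40': (0.75, 0.25), 'university degree': (0.75, 0.25)}
```

Features whose shares differ sharply between the two groups are the ones worth reporting in the comparison.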
The next functionality is mining frequent patterns, associations, and correlations. Frequent patterns are patterns that occur frequently in data. Here there is a term called frequent itemset: a set of items that often appear together in a transaction. This kind of mining came from retail store data, and it is also called market basket analysis: you want to analyze which items the customer buys together in the market basket, or cart, that is, which items are sold together. For example, if a customer buys a computer, he or she will most likely buy software; this kind of association exists.

To say whether an association is strong or weak, there are two numbers. Support means: out of all transactions, in what percentage are these items present together, say 2%. Confidence means: what is the probability that a customer buys software given that the customer has bought a computer, say 60%. When the algorithm creates such a rule, it gives you these quality measures, which indicate whether the rule is likely to remain valid in the future. If the support is very low, like 0.001, and the confidence is also low, like 2%, then the pattern is probably occurring only in this data set and may not occur in the future. The higher the support and the confidence, the more likely it is that the pattern will occur in the future.
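Support and confidence follow directly from their definitions above: support is the fraction of all transactions containing the whole itemset, and confidence is count(antecedent and consequent) divided by count(antecedent). The transactions below are invented for round numbers, not the 2%/60% figures from the lecture, and the helper names are hypothetical.

```python
def count(transactions, itemset):
    """Number of transactions containing every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, itemset):
    """Fraction of all transactions that contain the whole itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = count(A and C) / count(A)."""
    n_antecedent = count(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0
    return count(transactions, set(antecedent) | set(consequent)) / n_antecedent


transactions = [  # hypothetical market baskets
    {"computer", "software"}, {"computer", "software", "printer"},
    {"computer", "software"}, {"computer", "mouse"}, {"computer"},
    {"printer"}, {"mouse", "printer"}, {"software"}, {"pen"}, {"pen", "paper"},
]
# Rule: computer => software
print(support(transactions, {"computer", "software"}))       # 0.3
print(confidence(transactions, {"computer"}, {"software"}))  # 0.6
```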
Then we have frequent sequential patterns. Sequential means something the customer buys after an earlier purchase: not in the same market basket, not together, but subsequently. For example, a person buys a camera, and a few days or a few months later buys a memory card. Frequent patterns are output in the form of association rules, like this one: if a customer buys a computer, he or she will most likely buy software. So this is the second type of functionality.
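The support of a sequential pattern can be sketched as the fraction of customers whose time-ordered purchase history contains the pattern's items in order, with other purchases allowed in between. The histories and function names are hypothetical; a full sequential-pattern miner would also search for which patterns are frequent, which this sketch does not.

```python
def contains_in_order(history, pattern):
    """True if the items of `pattern` appear in `history` in the same order, gaps allowed."""
    matched = 0
    for item in history:
        if matched < len(pattern) and item == pattern[matched]:
            matched += 1
    return matched == len(pattern)


def sequence_support(histories, pattern):
    """Fraction of customers whose history contains `pattern` as an ordered subsequence."""
    hits = sum(1 for h in histories.values() if contains_in_order(h, pattern))
    return hits / len(histories)


histories = {  # purchases per customer, oldest first (hypothetical)
    "c1": ["camera", "tripod", "memory card"],
    "c2": ["memory card", "camera"],
    "c3": ["camera", "memory card"],
    "c4": ["laptop"],
}
print(sequence_support(histories, ["camera", "memory card"]))  # 0.5
```

Customer c2 bought both items but in the wrong order, so it does not count toward the camera-then-memory-card pattern.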
The next one is prediction: finding models that describe and distinguish classes or concepts, for future prediction. Prediction means we want to know what the outcome will be in the future. These models are derived by supervised learning, that is, from labeled data: historical data for which we know the outcome. We have the X variables, and we also know Y, the class. For example, for people applying for a loan, X is the age, income, and other fields in the application form, and Y denotes whether the customer is good or bad, that is, whether the person returned the loan on time. That is labeled data: you know the class. Typical methods are decision trees, naive Bayesian classification, support vector machines, neural networks, and logistic regression. Applications include credit card fraud detection, direct marketing (will this person come to the shop to buy the product or not?), and classifying stars, diseases, and web pages.

The output can be represented as IF-THEN rules, for example: if income is less than 50,000 per month, the customer is not a good customer. You will see more of this in the classification chapter. Another rule: if age is youth and income is high, the customer will buy a computer. Such rules can be drawn as a decision tree, and for a new record you follow the tree to find its class label. From historical data you know who the good customers and the bad customers were, but when a new customer comes in the future, you do not know whether they will be good or bad. So you apply the model to that customer.
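Learning an IF-THEN rule from labeled data can be illustrated with 1R ("one rule"), a deliberately simple stand-in for the decision-tree rules described above, not the lecture's method: for each attribute it maps every value to its majority class and keeps the attribute whose rule makes the fewest training errors. The data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_one_rule(rows, labels, attributes):
    """1R: for each attribute, map every value to its majority class; keep the
    attribute whose rule makes the fewest errors on the training data."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)  # attribute value -> Counter of class labels
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best[0], best[1]


def predict(model, row, default="no"):
    """Apply the learned rule IF attr = value THEN class to a new record."""
    attr, rule = model
    return rule.get(row[attr], default)


# Hypothetical labeled history: did the customer buy a computer?
rows = [
    {"age": "youth", "income": "high"}, {"age": "youth", "income": "low"},
    {"age": "senior", "income": "high"}, {"age": "senior", "income": "low"},
    {"age": "middle", "income": "high"}, {"age": "middle", "income": "low"},
]
labels = ["yes", "no", "yes", "no", "yes", "yes"]

model = train_one_rule(rows, labels, ["age", "income"])
print(model)                                               # ('income', {'high': 'yes', 'low': 'no'})
print(predict(model, {"age": "youth", "income": "high"}))  # yes
```

Training uses the labeled history (X and Y); prediction applies the learned rule to a new customer whose Y is unknown.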
[Partly inaudible exchange.] The next functionality is cluster analysis, which is unsupervised learning. It means you don't know Y; you don't know the correct answer. In the loan example, remove the good/bad label from the customer data: now there is no Y. The objective here is different, for example market segmentation: we want to group similar consumers together.

What is the meaning of market segmentation? It is very important for any company manufacturing any product. All consumers are not the same, so a company cannot create a single product that satisfies all types of consumers. Let's say Maruti has so many cars. Why so many? There are very different variants of cars, starting from low-cost ones up to high-end ones, including family cars. That is segmentation: we want to identify the groups of consumers so that we can design an appropriate product for each. For that you apply clustering to find which groups exist and which group you want to target. Say you cluster customers by the amount they spend, and you design a product for one particular group. If you design the correct product for the right group of customers, you will realize profit; if you design it for the wrong group, you will not. This functionality is called clustering.

Any questions so far? How many functionalities have we covered? Data characterization, data discrimination, frequent patterns, associations and correlations, classification and prediction, and cluster analysis. Let us continue with a few more. In clustering, the principle for grouping objects is to maximize intra-class similarity and minimize inter-class similarity. It means that items within a group are similar to each other, but items from one group are not similar to items from other groups.
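The grouping principle above can be illustrated with k-means, one common clustering algorithm (the lecture does not name a specific one): each point joins its nearest center, which keeps groups tight, and each center then moves to the mean of its members. This is a minimal one-feature sketch on hypothetical annual-spend data with hand-picked starting centers.

```python
def assign(values, centers):
    """Put each value in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, centers, max_iter=100):
    """Basic k-means on one numeric feature, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = assign(values, centers)
        # move each center to the mean of its members (keep it if the cluster is empty)
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers, assign(values, centers)


annual_spend = [200, 250, 300, 5000, 5200, 5400, 20000, 21000]  # hypothetical customers
centers, segments = kmeans_1d(annual_spend, [0, 6000, 15000])
print(centers)   # [250.0, 5200.0, 20500.0]
print(segments)  # [[200, 250, 300], [5000, 5200, 5400], [20000, 21000]]
```

Each resulting segment (low, middle, and high spenders) is a candidate target group for a different product.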
Question from a student: what if some items are common to two groups? Yes, there may be overlap. Some classical clustering algorithms assume that membership is clear-cut. When there is overlap, there is a concept called fuzzy clustering: every item has a degree of membership in each cluster, so it is not a yes-or-no situation. If your application has such overlap, you have to apply this kind of algorithm. What I am showing here is basically crisp clustering, where there is a clear yes or no about whether an item belongs to a cluster, and there is no overlap.
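Degrees of membership can be computed with the membership formula used by fuzzy c-means, one standard fuzzy clustering method: for fixed centers, a point's membership in cluster i is 1 / Σ_k (d_i / d_k)^(2/(m-1)), where d is the distance to each center and m > 1 is the "fuzzifier". This sketch only evaluates memberships for given centers; the full algorithm would also update the centers iteratively. The function name and data are hypothetical.

```python
def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means style membership of point x in each cluster, for fixed centers.

    m > 1 is the fuzzifier; larger m gives softer (more even) memberships.
    Assumes the centers are distinct; memberships are non-negative and sum to 1.
    """
    dists = [abs(x - c) for c in centers]
    if 0 in dists:  # x sits exactly on a center: give that cluster full membership
        return [1.0 if d == 0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


centers = [0, 10]
print([round(u, 3) for u in fuzzy_memberships(2, centers)])  # [0.941, 0.059]
print(fuzzy_memberships(5, centers))                          # [0.5, 0.5]
```

A point halfway between the two centers belongs equally to both, which is exactly the overlap case that crisp clustering cannot express.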
The next functionality is outlier analysis. What is an outlier? Any data object that does not comply with the general behavior or model of the data: an exceptional value. A common definition is an observation that deviates so much from the others as to arouse suspicion that it was generated by a different mechanism. For example, you have some trend in the data, and one data point lies far away from it. Is it noise, or an exception? Outliers are usually found as a by-product of clustering or regression analysis. If you perform clustering, the points that do not fall into any cluster are outliers; if you do regression analysis and fit a line, the points that are very far from the line can be called outliers. Outliers are useful in applications such as fraud detection and rare-event analysis.

During data cleaning, you may decide a point is an outlier and throw it away. But you have to be careful, because it depends on the application. Suppose you are working on fraud detection: the fraudulent transactions are the outliers. Ninety-nine percent of the transactions are normal, but the data points of interest for this application are the outliers. So don't blindly throw outliers away during cleaning; in applications like fraud detection, the task is precisely to analyze them.
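Finding outliers "as a by-product of regression" can be sketched by fitting a least-squares line and flagging points whose residual is more than k standard deviations of the residuals. The threshold k = 2, the function names, and the data (which follow y = 2x except for one planted point) are illustrative assumptions.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_outliers(xs, ys, k=2.0):
    """Indices of points whose residual from the fitted line exceeds k standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * spread]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 10, 40, 14, 16]   # follows y = 2x except one planted point at x = 6
print(fit_line(xs, ys))             # (3.0, -1.0)
print(regression_outliers(xs, ys))  # [5]
```

Note that the planted point drags the fitted slope from 2 up to 3; with heavier contamination a robust fitting method would be needed. Whether the flagged point is then discarded or studied depends on the application, as discussed above.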
Then you have trend and evolution analysis: something is changing over time, and you want to analyze time-series trends and deviations, for example in the stock market. [Partly inaudible.] Sequential pattern mining also works on data ordered by time: for example, somebody opened a savings account, and later bought a digital camera, and so on.
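A basic way to see a time-series trend through day-to-day noise is a simple moving average. This is a minimal sketch on hypothetical prices; the window size of 3 and the "compare first and last smoothed value" rule for labeling the trend are illustrative choices, not a method from the lecture.

```python
def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


def trend_direction(series, window=3):
    """Compare the first and last smoothed values to label the overall trend."""
    smoothed = moving_average(series, window)
    if smoothed[-1] > smoothed[0]:
        return "up"
    if smoothed[-1] < smoothed[0]:
        return "down"
    return "flat"


prices = [10, 12, 11, 13, 15, 14, 16]  # hypothetical daily closing prices
print(moving_average(prices, 3))  # [11.0, 12.0, 13.0, 14.0, 15.0]
print(trend_direction(prices))    # up
```

The raw series dips twice, but the smoothed series rises steadily, which is the trend an analyst would report.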
This slide lists the data mining functionalities. [Partly inaudible.] One area on the slide is relatively less developed; it is one of the newer areas. Any questions?

Question from a student: are machine learning, AI, and data analytics all part of this? There are differences between AI and machine learning, and there are other buzzwords as well. [Partly inaudible.]

Now the question: data mining algorithms are capable of generating a lot of patterns, potentially thousands or even millions of them. Are all these patterns interesting? You should now be able to answer this: when will you say that a pattern is interesting? When it gives proper insight, like the Walmart hurricane finding. An interesting pattern should be easily understood, valid on new or test data with some degree of certainty, potentially useful, and novel.

Interestingness measures are of two types. The first type is objective: these can be quantified, for example support and confidence (support 2%, confidence 60%, as in the earlier rule), or the accuracy and coverage of a rule, that is, how many times the rule is actually correct. The second type is subjective: based on the user's perspective and beliefs, and it cannot be measured in numbers, for example whether there is something new in the pattern, or whether it is actionable.
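The objective measures for an IF-THEN rule can be computed directly. In the usual textbook sense, coverage is the share of all records the rule's IF-part applies to, and accuracy is the share of those covered records whose actual class matches the rule's THEN-part. The records, labels, and function names below are hypothetical.

```python
def rule_coverage_accuracy(rows, labels, condition, predicted_class):
    """Coverage = share of all records the rule's IF-part applies to;
    accuracy = share of those covered records whose label matches the THEN-part."""
    covered = [label for row, label in zip(rows, labels) if condition(row)]
    coverage = len(covered) / len(rows)
    accuracy = covered.count(predicted_class) / len(covered) if covered else 0.0
    return coverage, accuracy


def youth_with_high_income(row):
    """IF-part of the rule: age = youth AND income = high."""
    return row["age"] == "youth" and row["income"] == "high"


rows = [  # hypothetical customer records
    {"age": "youth", "income": "high"}, {"age": "youth", "income": "high"},
    {"age": "youth", "income": "high"}, {"age": "youth", "income": "low"},
    {"age": "senior", "income": "high"}, {"age": "middle", "income": "low"},
    {"age": "senior", "income": "low"}, {"age": "youth", "income": "high"},
]
labels = ["yes", "yes", "no", "no", "yes", "yes", "no", "yes"]  # bought a computer?

# Rule: IF age = youth AND income = high THEN buys_computer = yes
print(rule_coverage_accuracy(rows, labels, youth_with_high_income, "yes"))  # (0.5, 0.75)
```

These numbers say nothing about novelty or actionability; those subjective judgments remain with the user.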
We have. The subject. So that is the beauty, that is the string that. It it has a. From different machine learning Computer science. So the domain is the cost. What do you
want? The next topic is AI. Yeah, yeah. What is you? I will end here. Will there be any questions? So you will share the presentation, right? Open the. Assignment is
you have to package uh share uh share, uh share, uh share data one data. Open. Type open. Your language you are using. And see what? And see what. And and. The
first three videos here. Frame and all that. This. Hope. Yeah, we'll meet next week. Yeah. Good question. Yeah, there's a question that, you know, I don't have. I don't
have kind of, you know, kind of, you know, kind of, you know, kind of, you know, do you take some of this? I've shared that. Basic videos Video OK. Opening
opening maybe like to? Learn OK a lot of available so when I'm teaching other. I will share the photo. Ohh. OK. Are you guys idea is not in the, not in the? OK. OK.
Last few questions. Yeah. Enjoy the boat and enjoy the boat. Do do do. Next. Yeah, OK, OK. Yeah, you have to just, you have to just copy, paste, copy paste on your
on your own. And the post the post results again. OK. OK. How much did you? All Python. Experience. How to install 2nd? One second, third, one second, third.
Simple. This. OK. You fine, OK. Hmm. Any other any other questions? We can go. If we don't have any, we stop. We stop here. London is like and to the profit to the