
What is Multivariate Data Analysis (MVDA)?

Multivariate Data Analysis (MVDA) is a statistical technique used to analyze data generated from multiple variables, helping to estimate summary variables like stock indices or process performance metrics. By using data analysis tools, MVDA allows for the identification of relationships between variables and the establishment of normal behavior patterns, which can be useful for future data predictions. The article illustrates MVDA through examples, including the analysis of soccer players' height and weight, demonstrating how deviations in data can inform decision-making and model adjustments.

Uploaded by cbqucbqu
Copyright: © All Rights Reserved

3/29/22, 10:46 PM What is Multivariate Data Analysis (MVDA)?




Dec 02, 2020 | 8 min

What is Multivariate Data Analysis (MVDA)?


How Multivariate Data Analysis Can Separate the Players from the Gorillas (MVDA for beginners)

We have more data than ever coming at us from many sources, in our personal lives as well as in business. Data is everywhere:
from the production flow on a manufacturing floor to the sales results in a grocery store to the number of shares a page gets on
Facebook. How do you sort it all out in a way that makes sense? Which data should you worry about and which should you ignore?
This article is posted on our Science Snippets Blog (https://www.sartorius.com/en/company/newsroom/blog). (https://www.sartorius.com/en/knowledge/science-snippets)

https://www.sartorius.com/en/knowledge/science-snippets/data-analytics-for-beginners-how-multivariate-data-analysis-can-separate-the-players-from-the-gorillas-507202#:~:text=MVDA is a statistical t… 1/12



Using data analysis tools or software, such as the Umetrics Suite of Data Analytics Solutions, you can turn the jungle of numeric data into meaningful observations that
you can use to improve your product, process or situation. Top data analysis software such as SIMCA® combines reliable statistical methods, processes and tools that
help you perform multivariate data analysis, look for deviations and understand the relationship between different data points.

Whether you are using historical data that currently resides in your database, or generating new data for future use, understanding the basics of multivariate data
analysis will help you achieve better analytics.

What is Multivariate Data Analysis?


MVDA is a statistical technique used to analyze data that is generated from more than one variable. MVDA will help you estimate summary variables. What are those?
Well, one good example is a stock index.

For example, in the chart below, you can see how the Dow Jones stock index changes over a 48-hour period in August 2013. Investors monitor the Dow Jones,
and depending on how this summary index moves up and down over time, they may decide on different actions: they may sell, buy, or
hold.


The Dow Jones index is simply a summary of the stocks of the individual companies weighted together using a specific algorithmic function (as you can see in the
formula written below).
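
The formula itself did not survive in this copy of the article. As a rough illustration only, a summary index of this kind can be sketched as a weighted sum of the individual values. The prices, weights, and the function name below are hypothetical; this is not the actual Dow Jones calculation.

```python
def summary_index(values, weights):
    """Collapse several variables into one number as a weighted sum."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical prices for three stocks and hypothetical weights.
prices = [150.0, 80.0, 20.0]
weights = [0.5, 0.3, 0.2]

index = summary_index(prices, weights)
print(index)
```

Recomputing the same weighted sum at each point in time would give the kind of trend line an investor watches.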

Process modeling (analyzing the processes used for manufacturing, for example) is conceptually very similar. As with the stock index, we want to calculate
summary indexes for how the process is performing. This summary can apply to a continuous process, a batch process, a hybrid of the two, or any other kind of data
table.

What a process summary index has in common with the Dow Jones index is that it presents a trend over time. You monitor it to see how it goes up or down. If you are a
process engineer or manager, you may take various actions or refrain from actions based on how this summary index looks over time. You can see how it looks in the
chart below.


One difference in a chart like this, when you are dealing with process data instead of a stock index, is that you reference the trajectory of the summary index in
relation to lower and upper limits. Usually these are set at plus or minus two or three standard deviations, which may serve, for example, as warning or action limits (as indicated by
the dotted lines in red and yellow).
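
To make the idea of limits concrete, here is a minimal sketch that derives warning limits at two standard deviations and action limits at three from a period of normal operation. The history values and the function names are made up for illustration, and which limit triggers which response is an assumption.

```python
import statistics


def control_limits(history, k):
    """Return (lower, upper) limits at mean +/- k standard deviations."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)  # sample standard deviation
    return mean - k * sd, mean + k * sd


# Hypothetical summary-index values recorded during normal operation.
history = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0]
warning = control_limits(history, 2)
action = control_limits(history, 3)


def status(value):
    """Classify a new summary-index value against the two sets of limits."""
    if not action[0] <= value <= action[1]:
        return "action"
    if not warning[0] <= value <= warning[1]:
        return "warning"
    return "ok"


for value in (10.5, 12.0, 13.0):
    print(value, status(value))
```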

So what we are striving to achieve with multivariate data analysis is to calculate summary indexes covering the most essential information in our process
measurements.

An Example Using Soccer Players


For another example of how this works, we can take soccer players and analyze the players' height and weight to look for a pattern. In the chart
below, the green dots represent the body height and weight of 200 elite soccer players who played in the 2014 World Cup in Brazil. What we can do
with multivariate data analysis is create a summary index for how weight and height vary among these elite soccer players. We are looking at the relationship
between the two variables (height and weight) across all the players.


The summary index is shown by the red dashed arrow. The chart shows a clear relationship between body weight and body height among these top-level soccer
players. Generally, as height increases, body weight increases as well. Using a plot like this, we can see the typical range of variation, and from that we can
define a variation limit, or border, for this model. The border of the model is marked in the chart below with an ellipse.
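
The article does not say how the summary index is calculated. One common way to get such a direction is the first principal component, and the sketch below assumes that choice. The ten height and weight pairs are invented (they are not the World Cup data), the variables are left in raw units (cm and kg), and the function names are hypothetical.

```python
import math
import statistics

# Hypothetical heights (cm) and weights (kg); not the World Cup data.
heights = [170, 173, 175, 178, 180, 182, 184, 186, 188, 191]
weights = [65, 69, 68, 73, 75, 74, 79, 80, 83, 86]


def principal_direction(xs, ys):
    """Return the unit vector along the largest spread of 2-D data."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(xs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form major-axis angle of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(angle), math.sin(angle)


def summary_score(height, weight, direction, center):
    """Project one player onto the summary direction (the index value)."""
    return ((height - center[0]) * direction[0]
            + (weight - center[1]) * direction[1])


direction = principal_direction(heights, weights)
center = (statistics.mean(heights), statistics.mean(weights))
scores = [summary_score(h, w, direction, center)
          for h, w in zip(heights, weights)]
print(direction)
print(scores[0], scores[-1])
```

Each player is reduced from two numbers to one score: small and light players get negative scores, tall and heavy players get positive ones.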


Typically, when we are doing multivariate data analysis and calculating these summary variables, we will position the border at the 95 percent level. So if you have fairly
normally distributed data or uniformly distributed data, you would expect that 95 points out of 100 would be situated inside the model border, and five points would be
outside the border. So if you have around 200 soccer players, roughly 10 players should be outside the model border.
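
As a sketch of how such a border can be checked numerically, the example below assumes roughly normal data and uses the squared Mahalanobis distance with a chi-square limit. For two variables the 95 percent limit has the closed form -2 ln(0.05), about 5.99. The data and function names are hypothetical, and software packages may use a slightly different limit, for example one with a small-sample correction.

```python
import math
import statistics

# Hypothetical heights (cm) and weights (kg); not the World Cup data.
heights = [170, 173, 175, 178, 180, 182, 184, 186, 188, 191]
weights = [65, 69, 68, 73, 75, 74, 79, 80, 83, 86]


def fit_model(xs, ys):
    """Return the center and covariance terms of 2-D reference data."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(xs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, sxx, syy, sxy


def squared_distance(point, model):
    """Squared Mahalanobis distance of a point from the model center."""
    mx, my, sxx, syy, sxy = model
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# For two variables, the chi-square quantile is -2 * ln(1 - p).
LIMIT_95 = -2 * math.log(1 - 0.95)


def inside_border(point, model, limit=LIMIT_95):
    """True if the point lies on or inside the elliptical border."""
    return squared_distance(point, model) <= limit


model = fit_model(heights, weights)
print(inside_border((181, 76), model))   # close to the typical player
print(inside_border((160, 160), model))  # far from the reference data
```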

Using MVDA, you can not only define the summary indexes that capture the essential information in your data, you can also define an envelope around the data
points that marks the typical data distribution: the normal or “good” behavior.

Changing the Scope of Data


The interesting part comes when you apply this data analytics model to future samples. Will the new data conform to our model, i.e. the expected variability among the
samples so far analyzed? For example, what happens if we add the data or measurements of an elite basketball player to our soccer player data?


Several things happen when we add this player to the data.

First of all, the orientation of the summary index changes a bit. The original index is yellow and the updated one, taking the basketball player into account, is red. You
can see there is a slight rotation in the direction of the summary variable following the introduction of the data points for the basketball player.

Second, although it’s obvious that the basketball player is very different from the soccer players, the weight-height relationship for the basketball player conforms to
that of the soccer players. Looking at this data, we can see that the weight-height relationship among the variables is not changing when we add the basketball player
to the data set. So while there is no doubt that the basketball player is very different from the soccer players, he still conforms to the model.

This is a typical behavior in data analytics. When you apply your model to future samples, you may have a deviation, but it doesn’t necessarily modify what we call the
correlation structure.
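
One way to see "extreme but still conforming" in numbers is to split a new sample's offset from the model center into a part along the summary direction and a part across it. This is a sketch under the same assumptions as before: a principal-component direction, invented reference data, an invented basketball player, and hypothetical function names.

```python
import math
import statistics

# Hypothetical heights (cm) and weights (kg); not the World Cup data.
heights = [170, 173, 175, 178, 180, 182, 184, 186, 188, 191]
weights = [65, 69, 68, 73, 75, 74, 79, 80, 83, 86]
basketball_player = (210, 105)  # hypothetical: tall and proportionally heavy


def center_and_direction(xs, ys):
    """Return the data center and the unit vector of the main axis."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(angle), math.sin(angle))


def decompose(point, center, direction):
    """Split a point's offset into along-axis and off-axis parts."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    along = dx * direction[0] + dy * direction[1]
    off = -dx * direction[1] + dy * direction[0]
    return along, off


center, direction = center_and_direction(heights, weights)
soccer_along = [decompose(p, center, direction)[0]
                for p in zip(heights, weights)]
along, off = decompose(basketball_player, center, direction)
print(max(abs(a) for a in soccer_along), along, off)
```

With these made-up numbers the basketball player sits far beyond any soccer player along the summary direction, while his distance across it stays small, which is the pattern described above.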

Adding a Third Set of Data


We can take a look at the effect of another deviation on the model. For example, if we add in the body height and weight of a sumo wrestler, we will see a completely
different behavior for the summary index. In the graph below, the yellow line represents the original summary index and the red line shows the influence of the new
data on the summary index, which is rotating up. Clearly the sumo wrestler has a greater influence on the model, or a higher “leverage” as we say in statistics.
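
The rotation can be sketched by refitting the summary direction with one extra sample and comparing angles. Everything here is an assumption for illustration: the direction is taken as the first principal component, and the soccer, basketball, and sumo values are invented. With only ten reference players a single sample moves the direction far more than it would among 200.

```python
import math
import statistics

# Hypothetical (height cm, weight kg) pairs; not the World Cup data.
soccer = [(170, 65), (173, 69), (175, 68), (178, 73), (180, 75),
          (182, 74), (184, 79), (186, 80), (188, 83), (191, 86)]
basketball_player = (210, 105)  # hypothetical: tall and proportionally heavy
sumo_wrestler = (185, 150)      # hypothetical: slightly taller, far heavier


def direction_angle(points):
    """Angle in degrees of the main axis (height on x, weight on y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # The common 1/(n-1) factor cancels inside atan2, so raw sums suffice.
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


base = direction_angle(soccer)
rotation_basketball = direction_angle(soccer + [basketball_player]) - base
rotation_sumo = direction_angle(soccer + [sumo_wrestler]) - base
print(round(rotation_basketball, 1), round(rotation_sumo, 1))
```

A positive rotation means the direction tilts toward the weight axis, which matches the "rotating up" behavior described for the sumo wrestler.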

If we take a closer look at the deviation compared to the typical soccer player (for example, using a diagnostic tool found in SIMCA® called the contribution plot), we
can see the reason, or the pattern, in how the sumo wrestler is deviating from the model of the elite soccer players. The graph below shows that the sumo wrestler is
slightly taller than the typical soccer player (not by much) but on the other hand he is much heavier than the typical soccer player.
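
As a simplified stand-in for a contribution plot (this is not SIMCA's calculation), you can express each variable's deviation from the typical soccer player in standard deviations. The reference data, the sumo wrestler's values, and the function name are hypothetical.

```python
import statistics

# Hypothetical heights (cm) and weights (kg); not the World Cup data.
heights = [170, 173, 175, 178, 180, 182, 184, 186, 188, 191]
weights = [65, 69, 68, 73, 75, 74, 79, 80, 83, 86]


def contributions(sample, reference):
    """Deviation of each variable from the reference mean, in SD units."""
    result = {}
    for name, value in sample.items():
        mean = statistics.mean(reference[name])
        sd = statistics.stdev(reference[name])
        result[name] = (value - mean) / sd
    return result


reference = {"height": heights, "weight": weights}
sumo_wrestler = {"height": 185, "weight": 150}  # hypothetical values

for name, z in contributions(sumo_wrestler, reference).items():
    print(f"{name}: {z:+.1f} standard deviations")
```

A small bar for height and a very large bar for weight is the pattern the text describes.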


An Unexpected Deviation
Sometimes, your data will present a deviation that is completely different from what you expected. So how can you interpret that? In the example below, we have a point
that is completely off the model. In this case, we have added in the data for height and weight of a gorilla.


So according to the contribution plot, the gorilla is much shorter than the typical soccer player, but much heavier. There is no way on earth we can get the model for
the soccer players to include the data for the gorilla. If we want to have a good model for gorillas, we must add in the measurements for more individuals, and calculate
a local model for the gorillas.

This is also something you may come across when you’re modeling process data. In this case, you will need to have more than one model. You will need to have a local
model for one type of production condition and another local model for a second type. One of the critical issues going forward then will be which model is applicable
when we add in our next set of data.
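
One possible way to decide which local model applies is to measure how far a new sample lies from each model and accept the closest one only if the sample falls inside that model's border. The sketch below assumes a 95 percent chi-square border for two variables; both reference groups, including the gorilla measurements, are invented, and the function names are hypothetical.

```python
import math
import statistics

# Hypothetical reference data as (height cm, weight kg) pairs.
REFERENCE = {
    "soccer": [(170, 65), (173, 69), (175, 68), (178, 73), (180, 75),
               (182, 74), (184, 79), (186, 80), (188, 83), (191, 86)],
    "gorilla": [(150, 140), (155, 150), (158, 148), (160, 160),
                (163, 158), (165, 170), (168, 165), (170, 180)],
}
LIMIT_95 = -2 * math.log(1 - 0.95)  # chi-square 95% quantile, two variables


def squared_distance(point, data):
    """Squared Mahalanobis distance of a point from one group's center."""
    xs = [p[0] for p in data]
    ys = [p[1] for p in data]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(data)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def applicable_model(point):
    """Name of the closest model whose border contains the point, else None."""
    distances = {name: squared_distance(point, data)
                 for name, data in REFERENCE.items()}
    best = min(distances, key=distances.get)
    return best if distances[best] <= LIMIT_95 else None


for sample in [(180, 75), (162, 155), (200, 300)]:
    print(sample, applicable_model(sample))
```

A result of None signals a sample that fits neither model, which in a process setting would call for a closer look before any model is applied.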

Aligning Data to Tell a Story


As you can see, the objective of multivariate data analysis is to organize our data so that it can tell a useful story. How do we do that? In summary, we must:

▪ Calculate the relationship between the variables


▪ Define a ‘normal’ region within which most of the data points lie


▪ Use that information to diagnose future samples (data points)

Applying this to production or manufacturing, we would use the same principles. If the future samples end up inside the black ellipse, then we know that the second
day’s production condition is in line with what were ‘good’ conditions previously.

But if we have a deviation, we may have one of several types:

▪ A deviation that is extreme, but still conforms to the relationship among all our measured variables (Basketball Player)
▪ A deviation that is influencing the direction of our summary indexes, which we may or may not be able to add into our model (Sumo Wrestler)
▪ A data point that is fundamentally different and completely off the model (Gorilla)

So using this analogy, the key is to get the gorillas out of your data, decide whether the sumo wrestlers are worth keeping, and adjust your process settings to cover the
data gap if you want to account for the basketball players.


Want to Know More?

Download this presentation with an example of how the SIMCA® solution can help you conduct powerful data mining to improve your decision-making process.

Download Presentation (https://landing.umetrics.com/presentation-data-analytics-for-beginners-software-basics?hsCtaTracking=8d4d269e-d7ee-450c-b445-d1fe3e1fd93e%7Cb58b80e4-953a-4257-90d5-42210fab813a)

