Hello everyone, welcome to our compatibility module. Com 338 and today we will continue on machine learning for Vision an. In the past few lectures so we have sent this supervised learning income provision so we have seen these classification problem using different classifiers. So we have seen this KNN classifier. Now we have seen.
A naive Bayes classifier and also we have seen softmax regression classifiers and also we have seen the evaluation metrics for classification.
We have said before for supervised learning, especially for classification, so we have a label for these objects in the image, so that's why we call it supervised learning. We give it a name, it will give it a label. But for unsupervised learning so we don't have any labels for for these inputs.
Um?
Instead, we have we cluster them together. You have seen these came means clustering in our previous lecture. So we group together different data points and we calculate the distance between these data points. So we cluster them and that is unsupervised learning. And today we will introduce another method for unsupervised learning called principle components analysis. In short, we call it PCA.
So early we use it for dimensionality reduction and we'll see how to compute the principle components of PC and we use principle components to represent the data and we will see a PCA example, the eigenfaces. So that's four phase.
Reputation so they human faces.
So first of all we will have a quick recap of uncivil learning, an clustering we have seen before. So for unsupervised learning there is no labels for the data. So for the data we have these. So for example, here we have eight data points.
And then we group them and together. So this is also a training phase. OK, so there's four data points will be one group and this will be another group. And in the test when we have a new image or data points.
So somewhere here, OK we will calculate the distance from this point to the cluster center of Edge cluster. So the cluster center of this one will be here, right? The Cluster center of this cluster will be here and we calculate the distance from this point to this one and the distance from the data point to another cluster center, and we find.
OK, this is closer, so we will assign this new data points to this cluster. OK, so basically we see they are similar OK.
Another example on the right. That's what we have seen before we have. We have data of images of different fruits here, but we don't know the names for these fruits. We only know how they look similar or different. OK, so we have this input data and we have our model. So we train the model to represent this input data so.
Well now OK. Is this data ponds? These images will look similar to each other, so an this data points they look similar to each other based images and those images look similar to each other and we group them together and next time when we have a new image and you data OK we have this model and then we find the distance of the new data to these clusters in the.
In our data set OK and we will find this new data is closest to this cluster OK and then we assign this new data to that cluster. OK because they look similar, so that's how we do for answer by learning so there's nobody tells tells us this is ample. This an mongo. This is a pair, but instead we just know this will look similar.
To these ones we have seen before. OK, but we don't have labels for that, so that's unsupervised learning.
And we have seen a very basic exam called K means clustering in our previous lectures. So expect for the section on image segmentation. So that is a general class of measures for unsupervised classification of data. So these classes are not pretty fun, so we don't have the label. We don't have the classes for these data points OK?
And these measures can be used to group image elements based on similarity. So I have said again again. So for answers learning, we just evaluate the similarity between these points, but we don't have the classes for them or labels for them, right?
So here we have different data points and we group them together and we have these different clusters. So for the data points within one cluster. So we have intra cluster distance, so we want to minimize the distance between the data points in one cluster. That intro class class we want to minimize the distance and we want to separate different clusters. So that is Intercloud Inter cluster distances we want to.
Maximize this distance between different clusters.
And on the right, that's what we did for K means clustering, right? So you may still remember, for K means clustering, we initialized the centers of clusters randomly first. So for example here we have K equal to two. We have two clusters and we initialize so too.
Cluster centers and then we calculate the distance of between the data points with these two class centers and then we re center these clusters and then we find a new cluster centers and after a few iterations we will find OK. There's locations of these cluster centers will converge to some places. OK, so we have funds the cluster clusters for these data points OK, Ann.
These are the cluster centers. And then when we have new data points for example, we have a new data here. OK, we calculate is different. Is distance to these two cluster centers OK and we will find the distance to this cluster center is smaller than and to the other one OK. And then we will know. OK this data points.
Will be assigned to these cluster. That's how we do for unsupervised learning.
So that's how we had 4K means for image segmentation and you have since in our lab lab session an image segmentation that is unsupervised learning.
So the advantage of uncivil learning compared to supervised learning first, the training data is cheap. So for silver learning we have to label these images. So for example, for image classification, so we have to label this. This image is a car. This image is.
It is a dog, so that's what we have seen in our assignment one. So we have to have label for every image.
But for us we're learning. We don't have to give a label to this data, so we just group them. We don't have a label for them.
Um so.
So the expense of incense of answered by learning it is to understand the structure of their world.
OK.
Um?
So this hallway learns about the world. So when we grow up so if nobody tells us this is an Apple. This is a banana, but still we can tell Banana is different from my Apple even even though we don't have a label for for them. We don't know. It's an Apple. Nobody told us, but still we can learn about the world, right? So we can understand the structure of the real world.
So similar for this example, so we have a label with. We have a model for the for for like separating these fruits so we can tell these rules.
A different from these ones. These fruits are different from these ones. We can tell the difference.
OK, and when we have a new data so we can tell which ways fruits it is most similar to these clusters. OK and we can understand the structure of the video work.
So now we have seen these unsupervised learning how it works and we have seen K means clustering another K exam for answered by learning is called principle components analysis is a quite common method for data mining for most learning and the concepts behind the principle component analysis is very important for a lot of fields. So even for like controls areas or.
Mechanical engineering and many others so principle components is analysis is very important PC.
So first what is PC?
So PCA is pattern recognition in high dimensional spaces.
So as we have seen before, so for safety descriptors. So we have 128 and for other methods we may have very high dimensional space for the features. So for example if we take one image, the original image as as one feature, one kind of feature.
The dimension of this feature would be very large, so it can be thousands. It can be meeting so the feature space will very very high. But if we have very large feature space there will be problems. So problems will arise when performing recognition in a higher dimensional space. So this is called curse of dimensionality. So if we our if our feature space is too large.
If we want to calculate the distance between 2 features, it's very hard as the competition it will be very high OK.
So one potential solution would be OK, we can map this had dimensional feature to feature of our lower space. So we select the principle components of these high dimensional feature as as there is a lower dimensional feature to represent this high dimensional feature.
So second and second significant improvements can be achieved by first mapping the data into a lower dimensional subspace.
So as you can see below, so originally we have a very high dimensional feature and we have a very large feature space so it can be 120 is for safety.
And for some other measure we may have once sold and or 10 thousands.
And if we calculate, the distance is very costly.
What do we do with PCA? Is we reduce the dimensionality of the feature to another space that is as prime X prime. So we have B1B2 until BK and the dimension the dimension of this knew feature would be much lower than the original one. OK is much lower than N, so that's what we do for PC. We reduce the dimensionality.
Of the feature. So the goal of PCA is to reduce the dimensionality of the data while retaining as much information as possible in the original data set. So that's what we do. What we want to do is OK, we want we don't want to use a very high dimensional feature as it's costly. But at the same time we want to get a relatively good performance. For example for recognition or for other tasks to maintain.
The information
so first reduce the dimensionality and seconds.
Retaining the information? OK, that's what they want to achieve for PC.
So PCA allows us to compute a linear transformation that Maps data from a high dimensional space to a lower dimensional subspace.
So that's what we do. We what we get is as product that is a lower dimensional subspace and we have X. So that is from the data from a high dimensional space that is X and we have a linear transformation that is T.
A linear transformation. That's what we do. So T is a transformation matrix and it has a size of K. Multiply an so that's the size for the transformation, and for each component this principle components. So we have B1 equal to T1A1T1282. So we transform this higher higher dimensional space.
To a lower dimensional space and then we get this component. This principle components using this transformation. OK, that's what we do with PC.
As we have we are doing this.
It's not a reduction, so for sure we may lose some information in this transformation using T.
So approximate vectors by finding a basis in a approximates lower dimensional space. So we want to get this dimensionality bears. So here dimensional space repetition. So we have the original X equal to this one. They won a 2V2, the V1V2V N is the basis of the N dimensional space. So for example.
For this condition space. So for the 3D World we have we have XY size, so XY XYZ are more basis of these 3 dimensional space and so you can imagine if we have 100 dimension dimensional space. So also we will have 100 access is similar to exercise.
Although we have lower dimensional space, reputation and then we have this B1U1V to you to an here you want you to UK is the basis of the K dimensional space. 8 balls basis have the same size. If N = 2 K so then so we don't have any transformation. So we have X equal to X prime.
I imagine so for the hair dimensional space, we can have an equal to three. That is the 3D World.
And if we have one run in this.
In this 3D World we can we have this data points in this.
Baseline.
As you can see, it's just a lot, so we can use a street two reward to represent this line, right? So we can have.
Projects.
There is a 3D words to a 2D world.
And give a line in this space.
So in this way, so instead of having three accesses, we only have, we only use to access to represent this tool, and that would be enough, right? So that's why we need to reduce the dimensions of the of the data of the feature.
But not too early. We can say, for example, here if we transform one line to these, two dword is fine, right? But usually we may have these data points scattered in the 3D World. If we project these data points to a 2D world, we may lose some information.
Right, so as a from 3D World to the two D1, we may lose some information.
So peace A as we said before it, it will aim to preserve as much information as possible. That is to minimize the error. So there's a difference between these X and the transform X prime.
And how should we determine the best lower dimensional subspace that is the question to do PC. So the best lower low dimensional space can be determined by the best eigenvectors of the covariance matrix of X. That is, the eigenvectors corresponding to the largest eigenvalues. So it's called principle components.
So again, back to this exam.
So we have these three words and if we want to map this really words for these points on this line to actually work. So we need to find OK with plan. This line is learning on and then we you we get this axis.
As the principle.
There is a.
Principle vectors for the transformation OK.
So for example, for the 3D World we have this line and probably we can draw a plan that is a 2D plan and then we get these vectors and we can transform this 3 words to the 2D word.
And then we get these eigenvalues. And so as these principal components.
So these eigenvectors are these access I have showed you in this coordinate system and for this eigen values they are these principle components corresponding to the values in each axis in each eigenvectors.
So now the question is how to compute principal components. OK, so next we will see the measures to compute the principle components.
Before we do that, we will see some math to compute eigenvalues and eigenvectors. OK, that's a linear agrava we have sent in our previous lecture. So in the linear Acura before for vision. OK, so here we will revisit this at values and eigenvectors. As I said before, so our.
Module will be self content but.
Sometimes you may need some additional mathematical knowledge, so I strongly suggest you to tech more details of eigenvalues and eigenvectors in your previous modules in linear acrobatics and see how it works and also there a lot of materials on eigenvectors, an eigen values online or in YouTube. Check them out sometimes if you don't know them.
Check some videos in the YouTube. There are some very nice illustrations and they can help you to understand these concepts.
OK, back to our lecture. So here we have a few terms. So first we have a that is an in the end matrix and Lambda. This is a scalar, could be 0 and we have X that is non zero vector. In the end dimensional space.
And here we can say for PCA. So what we have is a multiply. The zagan vector will be equal to this add value. Multiply this vector.
So which means?
We represent there is a matrix in the.
System using this eigenvectors.
To put you in a context of computer vision.
So you take it a as a data point, OK? Or like a few images. So that is a you have in your training data set.
And we transform these.
Did points.
Using these, adding vectors so we have the transformation from the original space of a.
Two, there is a space defined by this action vectors.
Anne.
We have this direction this coordinate system OK and then we have this eigen values. So like this scale on each axis.
You can imagine is similar to. I have showed you from the two 3D space to the 2D space. So in the 3D in the 3D space you have this a you have these data points in the 3D space and then we have a eigenvector to transform this 3D points to a 2D coordinate system to a plan.
And we have this plan defined by this eigenvectors days X&Y axis and then we have these values is eigen values to define the skills on each axis.
So we have lamp.
So here is, here is a geometric interpretation of this principle component analysis on the right. So you can say here.
Are we transform this X but a using the eigenvectors?
Up to here.
So from this point to this point, that is a.
And for X and it will be equal to Lambda X.
We just have a skill for these.
X&Y and we get this vector.
Basically is to transform these data points.
By using this active vectors in two days to this space, and then you use these eigenvectors.
Multiply these eigenvectors, eigenvalues multiply the eigenvectors to be equivalent to axe.
That's how we get these reputations. OK, thanks for the example I gave you the transformation from 3D to 2D. OK, that's how we do here.
So here is a real example. So we have these.
Matrix A so we have.
Two points.
20
0 -- 1 and we have two data points.
OK, this is one data point. This is another data point.
And then we have.
X1 and X2.
So they are two AG and measures 01001, and that's the way we want to find. There is again values.
OK.
So you may remember our definition.
Of our.
This principle component analysis analyzes it X equal to Lambda X.
So we use this definition to gather zagan values. OK Zero X one.
So we will have two zero 0 -- 1.
Multiply X1.
That is 1 zero.
OK, and then we have two O.
So then we get to 100, that is 2X1.
OK.
So now we get lumped in one.
Equal to two.
So that's a X 1 = 2 X 1X1.
I love the one X1 so we get the eigenvalue for these eigenvector X one. So we have two.
And then we compute these Acts 2.
200 minus one multiply and 01 and then we get 0 -- 1.
Can we take this matter one horse?
So you can see we have Lambda 2.
ECHO 2 -- -- 1.
Bye.
So in fact so far is value the two or minus one. It has infinitely many eigenvectors, so for Lambda equal to two so 3 zero or 5 zero are both corresponding add vectors.
So As for vectors, we don't consider the skills as they are just giving us there's directions OK?
Anne.
And also for example 430.
Oh, and +50 is still on eigenvector. As I mentioned, we don't care about the skill for this eigenvector, they just give us a direction on that axis.
So you can see here we can get this eigen values.
It means we have two data points.
And we transform them in this space defined by X one X2. Basically that we have this.
2 dimensional space, right?
And we knew or we represent this A in this.
In this corner system using two and master one, so they already gave you these key information for this matrix.
So here basically we have these 2 -- 1.
Which means we project.
This matrix to one point.
In this coordinate system, so we project these.
To buy two metrics to one point so that is 2 -- 1. OK, so we achieve this dimensionality reduction problem.
So that's how we transform from a higher dimensional space to a lower dimensional space.
But here in this example I gave you some, I gave you this eigenvectors and we get this. I bet I can values, but usually we don't have these eigenvectors. We need to compute them. So how to compute this eigenvectors? I mean I again values.
Steve, we go to the definition of Eigen value problem. So we have X equal to Lambda X. So we have we this it will be equal to Lambda IX and then we put this path to the left. So we have like I -- a multiply X equal to 0.
Although we have to make this equation is left to zero at 4X, so we have.
An axe is a eigenvectors, so it's not there. So to make this equation R0 so we have this characteristic equation of a. So we have the determinates of lamp I manners a equal to 0. So that's what we need to achieve than I am SA equal to 0.
So that's how we get this egg and values, or say later how to solve this characteristic equation.
And this is for every Lambda annually. We have several lambdas and so for here we can have egg in the egg and decomposition of a. So here is a square in by N matrix with N linearly independent independence. I can vectors Qi. So here for is laptop, so we have a ax equal to Lambda.
Tax by here if we consider all of these lampas, so we have a Q equal to Q and multiply also this diagonal matrix.
OK, so we have. So when we say diagonal matrix. So this is Lambda one Lambda 2 until Lambda N.
So that's the diagonal matrix.
By having all of this.
Principle.
All of these Egan values here.
So we have.
Other will have a.
Will be as away. Take this Q.
To the right. So we for both sides both sides.
So we multiply Q inverse and then we get a equal to Q. Multiply these diagonal matrix and multiply the inverse of Q. So that's how we get a.
So we get this eigen decomposition of a.
So on the rise, so four Q. So we have all of these. All of these adding vectors and four.
For this diagonal matrix, so we will have all of these lambdas.
So they are eigen values.
So this is the eigen decomposition. So to do this again decomposition.
We need to solve this. This cracker Cracker Ristic equation.
So next we'll see some example.
Here we have there is a.
We want to again decompose these metrics. So basically like away said before, we need to do is to solve this characteristic equation.
So we guess Lambda I -- a you could chew 0. So basically we what we get is.
OK.
So that is lamta manners too.
Minus 11.
It is tough and this is 1 so this is.
Lamta part 5.
And so we guessed that determinates equal to 0.
So these determinat is equal to Lambda I.
Determination of Lambda I -- 8 so Lambda manners 2.
Men's 12 and one.
Anh Lam TA.
Manners manners. Five that is minus 5 and then will work as we will get these determinates.
So for linear aerobar we know. So if we have ABCD, is determinates will be a D manners BC.
Right, so we need what we need to get is Lamb to manage to multiply lamta.
+5.
Manners one multiply.
Minus 12 equal to two.
By solving this equation.
Regards.
The eigenvalues.
So we can guess Lambda one is equal to minus one and minus two is equal to mass two. So now we have we have goals the eigenvalues. So next we need to find the eigenvectors corresponding to these eigenvalues. OK so we have two cases. The first case is Lambda, one is equal to minus one.
And then we still use the definition. So we have llama I -- a multiplier X equal to 0 and then we guess OK. So now we get Lambda 1 equal to minus one and then we have this part as here.
And the four X.
We have X one X2 equals 200.
And then we will get basically what we get is minus 3 X 1 + 12 X 2 = 2.
So here what we get is minus one is equal to four X2, right?
So we get full here.
So that is the eigenvector. So for one.
Yeah, so we guess the very first.
Eigenvector for eigenvalue minus one for adding value X2. Still we solve this equation.
Are we get we give X2 as matters too and then we get this matrix multiply X 1X to make it equal to 0. So here we have minus 4 X 1 + 12 X 2 = 2, so we have X one. You could do 3X2 so we have the eigenvector as 31.
So now we have gas. Everything we have guests Sagan values X -- 1 and Masters 2.
And we get all the dragon vectors.
4 one and three one, and we are done for this again decomposition. But there's one problem.
So you can see here.
For AG and decomposition we always need to solve this equation.
So here we have a two by two matrix for a.
And this is easy to solve to get these eigenvalues. But for example, if we have these import space very, very large one. So for example four saves, so it can be 128 and we may have a lot of data points, so the matrix would be very very large.
So a will be very large, so it means it will be very hard to solve these characteristic equation.
So for this case.
They said it was very costly and sometimes it's intractable using this method and also.
In this approach.
As here we have gods. All of these eigen values.
Lambda one Lambda, two Lambda N and then by listing all of these eigenvalues so we can guess the diagonal matrix.
Here.
And now we have got all the.
All the activators so by listing all of these add vectors so we can get Q.
Zero for this case.
What do we guess is a equal to Q?
They stagno matrix criminals. One what we guess is.
Ann, for one.
31
and this diagonal matrix, so it will be minus one and.
Masdood
an multiply the inverse of Q.
That's what we do.
For adding decomposition. OK, so now we have these.
Again.
So as all the other cases would be there, so they stagno matrix. OK, alright again, so this.
Mass one mass, two or the other places will be there as it is a diagonal matrix.
That's how we decompose this into this matrix A.
OK, and.
But for this case, as you can see here for Eigen decomposition, so we need to have guess the inverse of Q.
Which means Q should be a square or matrix.
You may remember for these all the matrix, so for the definition of inverse, so we will have key multiply its inverse.
Should be equal to is inverse.
Multiply Q should be equal to I an identity matrix. So for example, if we don't have a square matrix, so we have multiply N and so its inverse will be an multiply an multiply an.
And for this case it would be an arm an N, so the result will be OK for this one it will be modified M. For this case it will be N multiplied N the size.
If I'm is not equal to N, they can't be equal, right? So that's why.
Q needs to be a square.
But that's what we don't have you already.
So this comes to the another solution that is SVD segler singular value decomposition.
OK.
Let's have a look.
Still, you can push your feet into the shoes of computer vision, so here we have a data matrix X.
So this it has a set of N multiply PON is the number of samples. So for example we have 10 images and P is the dimension of the original feature vector. So for example for safety so it will be 128.
So if we have 100 samples, so X will have a size of 100, multiply 120.
So that's what we have for the input that is X.
And let us assume.
Access centers so, which means the column means have been subtracted and are not equal to 0.
OK, and then the singular value decomposition SVD of X. So we have this. X will be equal to you. There's a diagonal matrix of singular singular values and multiply.
Transpose of V.
Suppose U&V are orthogonal matrix is. So when we see orthogonal matrixes.
So their trust their transpose will be equal to their inverse. So Uth multiply. You will be equal to you multiply you T equal to 1 identity matrix and same for V.
And this is a diagonal matrix of singular values, and V is a matrix.
Of eigenvectors.
OK, so this is the the matrix of.
Eigen vectors.
So this column is active vector.
So that's what we want to guess. So we want to get eigenvectors. So this is a singular value decomposition. So for the proof you can take out more details so online. So for example that Wikipedia, so here, so we won't go into further details, but here this is a form of singular value decomposition. OK, and in this case.
So X doesn't have to be a square matrix, so it can, so N can be different from P. That's really what we have, so we have 100 data points, but we have 128.
Visions for the future.
ASMR so.
We can construct.
A conference matrix that is safe. He could to transpose of X multiplier X that characterizes the scatter of the data. So how it spread in the space?
So by doing this it will be a symmetric matrix, so it can be diagonalized. So C will be equal to XTX, so we can use this singular value decomposition, and we can guess.
So X transport will be equal to V. Multiply these diagonal matrix and multiply transpose of tea and not apply. Xu is the transpose of the diagonal matrix and transpose of key away.
Hi guys, this Yuan Yuan V they are orthogonal matrix is by definition here. So this will be identity matrix. So we remove this part. So we guess we multiply this diagonal matrix and its transpose and we multiply is vis vis transforms.
At this part will be the square of this diagonal matrix.
So we can see the fall.
And by comparing X ANSI so we can see there's some relationship between these sample matrix and is convenience matrix.
So you can see.
Both of them will have the same eigenvectors as they will be the same.
Anne.
You can see.
So in between the diagonal matrix matrix.
04 C The diagonal matrix will be the square of its elements of of the diagonal matrix of X.
So that's their relationship.
So it means that these principle directions for.
There is a matrix and sampling matrix. They are the same and also the singular values are related to eigenvalues of covariance matrix.
So here, so you may still remember, so we can have.
Hey equal to Q.
Minus one. That's what we have.
So here.
Is the eigen values for the conference analysis for for the convenience matrix.
So for this one we have the singular values for X. Here we have these as we have here we have the.
Eigenvalues for the convenience matrix, so we can link them together.
This is a square relationship.
So for the X. So the main reason why we do single singular value decomposition is so X may not be a square matrix.
So that's what we have seen before, but is.
But is covariance matrix is symmetric? Is a square matrix?
So that's how we can guess. So V is a square matrix and you is a square matrix. But the dimension of you may not be equal to the dimensions of way.
Right, so because you is not equal to V, we can't do this eigen decomposition but instead.
We go for its convenience metrics.
OK, so we can read it in a form of these eigen decomposition.
Write an by instead. We can once we get this convergence metrics, we get is principle directions and then we can get the principle directions for the sampling matrix as well as they are the same. And also we can guess these. We can guess the eigenvalues for covariance matrix.
And then we can get the singular values for X as they are the square relationship.
So now we can. We can also guess the principle components this by using X multiply V, so by so you can see is transforming the sampling data points by using this vector V.
To the new space.
So basically it will be you.
Multiply they said technol matrix.
So putting them together together is principal components and will transform the original feature space into the new.
Major space.
First, we make eggs centered. Then we subtract the means of X and then we form the covariance matrix. So say will be equal to X. Transform, transpose, multiply X, and then we compute the eigenvalues of C.
So we get Lambda one Lambda 2 until end of the N and also we can compute the eigenvectors of CU1U2U N.
So we get these.
Eigenvectors another good thing for singular for singular value decomposition is we can many place the dimensions of U&V. So for the one we previous see the dimensions of a Q&Q.
They are already previously defined by a the input A.
By here on an unusually we have very large input space A we have to compute very large space by here.
We can many place the dimensions of U&V if we make it small.
So it means we can reduce the computations of these.
Singular value decomposition and also the zagan.
Also there is I can decomposition so that we can save a lot of computation.
So for example, we can select the first few principle components, like 5, so we choose the dimension for V as 5 by 5 and the competition will be very much lower compared to before.
So now we have this eigenvalues and we have a convectors and then we can do the transform like we have seen at the beginning.
So using these linear transformation be 1, multiply U1B2, multiply you 2 so we get the repetition in the new feature space.
So here you can see we have this new axis and we have these values on these new accesses, so we have an you reputation.
So in the in the previous slides we have seen this convenience metrics, so it's called covariance matrix and is it consists of a variance and covariance. So you can see here they are displayed together in a variance covariance matrix. The variances appear along the diagonal.
These ones and covariances appear in the off diagonal elements OK.
So that's the matrix CI covariance.
It is a measure of the extent to which corresponding elements from two sides of ordered data data move in the same direction. So this is definition of convenience, so convenience X&Y will be equal to.
So this one.
X I -- A.
The means of eggs multiply y -- X. So how X&Y correlate?
They say.
Correlate with each other, so that's the measure of the extent to which corresponding elements from two sides of other data move in the same direction.
OK, that's the that's the definition of conference.
How they are?
Relate to each other in their two directions.
So put them together. We have these dimensionality reduction, so from the original space to the new space.
And we can have some geometric interpretation or principle components analysis. So peace a projects data along the directions where the data varies the map most, and this directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues.
As the magnitude of the eigenvalues corresponds to the variance of the data along the eigenvectors, I can vector directions so you can see here.
We have difference.
Cluster we have different data points, so for example for example for these ones.
We can guess to eigenvectors.
And you can see the size of these vectors will represent these eigen values.
So by using these two access we can represent this data. All of these data.
And for this one we have these clustered ADESA scattered data points in the space. We use a different reputation so we can see.
Steve, we have the same directions at this one, but you can see this.
Vector this vector has a much bigger value, so towards this one as you can see, these data points distributes more on this vector. That's why is more important compared to this vector, right? So that's how we have these two principle factors and how we assign these values more extremely so you can see these data points will.
More appear on one line in this Eggen vector.
Right?
So it is more like you.
Get this data points on today's principle vector.
And the scale of this of this ad vector shows that it will contribute more to the confusion of these data points.
So there's a problem of how to choose key. So usually we can.
Use this stronghold.
To determine which key we can have.
So you only we have. We can have this thread holders thereupon none. Or is there a .95 to show. OK, so for example, we have one five 911 like 4 principle components and this is the largest.
Eigenvalue
Annan.
Everyone to achieve these thresholds. So we need to guess OK. For example, for one principle one. So we divided by 11 + 9 + 5 + 1 and guess how much persons it can account for this and we can add two once we arrive at the thresholds. OK, we stop adding more eigenvalues.
So that's how we choose K.
So using this cratering.
And for sure, if we take all of them together.
So it means.
So it will preserve 100% of the information in our data.
So suppose each data point is in dimensional. An same procedure will be applied. So we get this appearance and we get these eigenvalues eigenvectors.
And so you can see, like I said before, the eigenvectors of a define a new coordinate system and eigenvector with largest eigenvalue captures the most variation among the training vectors X and eigenvector with smallest eigenvalue has least variation, and we can compute. We can compress the data using the topview. I convectors this correspond to choosing a linear.
Subspace.
And represent points in a line like plan or hyperbole. And these architectures are known as principle components. OK, that's what we have seen for PC.
So now we have seen some abstract examples how to compute principal components and how to get these origonal feature into the new feature space. And as we will say example, that's the eigenfaces.
So you can see here we have this space or faces these images, so I image is a point in high dimensional space, so we have N by M image. So we have points in this large data point in this high dimensional space.
So you can see in minus M if we have 1080P image. So aim by M will be very large.
Right, so if we want to store these images.
So it will take out a lot of storage and also if we want to transmit these images.
It's a very costly as they are very large. It possible to compress this images into a lower space? It is possible and that's why we use principal component analysis. So for dimensionality compression.
And we can define the vectors in a space as we did in the two decades.
So we have these data points for these images and we can find the distance between images. That's why we use PC.
So by using some basic some base.
Me days and we can guess this manipulation of images. So the key idea here. So I assume that most face images like a low dimensional subspace determined by the 1st K directions of maximum variance and we use piece A to determine the vectors or eigenfaces the spine that subspace. And then we represent all face images in the data site.
Linear combinations of eigenfaces.
So here we have some training images and then we find these base.
Of faces.
So we have the mean, so that's what we did right before we get the mean of all of these.
Images and we subtract them. We make some centered.
And then we get the eigenvectors.
You want until you came so you can see here are the eigenvectors for the faces and then we can use different eigenvalues to get new faces.
So we have these principle components vector eigenvector UK, so this ones.
And we can get some new ones.
So the Phase X in phase space colonies. So we can combine this eigen vectors together to generate.
There is image every phase images is the competition combination of all of these eigen vectors.
So you can stay here.
There's a face image.
Is the mean of the face image plus there is linear combination of all of these basis?
So you can say you want this eigenvector. You to this act vector you three. This like vector and we have different skills for on this ad vector. This action values by combining all of them. We get one face image.
04 Agonizes, Here's a pipeline of.
Five days training and testing phases.
So for the training images, so we find the mean and we get the convergence of convenience matrix. So that's what we did before, right? So we we get this conference matrix.
And we used to say before.
And then we find key principle components.
Oh, are eigenvectors of these converse.
Of these covariance matrix you want until you came and then we project is trading image XI onto subspace spanned by principle components.
OK, so we can guess the linear combination of all of these eigenvectors.
So that's what we did before, right? So we get this.
The mean of the import and we get the convergence and then we get the K principle components and here so we can reconstruct every training image onto the subspace spanned by principle. We can transform the training images into these lower dimensional space.
Defined by these principle components.
By this transformation.
And once we have an you image X Joy project onto the subspace.
So same as the training image.
And we may tag the Reconstruction error XX to determine whether image is really a face.
And then we can classify as closed training phase and can be in K dimensional subspace.
But so that is how we can classify this new novel image, you may get it confused with classification, the supervised learning by here. This is on survivor as we don't give any label to this face is right. So what we do is we reduce the dimensions from the original images to a subspace.
And then we transform all of these input data into another space, and then once we have one aim or one new image, we get these.
It's nearest.
What is most similar to this normal image?
So you may go back to this definition of unsupervised learning.
So here.
For example, we have a lot of inputs data. They are these face images and then we have this piece, a model to cluster them to project some in two days.
Space and once we have a new data we have on you face and we find which one so which one in the database will be most similar to this new data. OK, and then we know finds the nearest the most similar one in the database. That's what we do.
For eigenfaces.
And you can see here we don't have any label.
For the for the data.
It has some limitations for activities, so like global appearance methods, so it's not robust to misalignments. So like background variation.
So for example, we always use these images as the inputs.
But if we.
Get different.
Byrons, the result may be different.
And also is a assumes that the data has Goshen distribution, so we have I mean.
New an so that's why we get it's centered and also it has a covariance matrix.
Plus, for some cases they are not Gaussian diffusion. So for example this one. This is a circular distribution, so this won't. We can't separate them.
So that's where we can apply PCA to.
So for this case, the shape of the data set is not well described by its principle components, so we can use principle components too.
To project, they said it was.
And also the direction of maximum maximum violence is not always good for classification, so for example. So for this to both of them they have these principal components pointing to that direction.
So they are the same.
And we come separate that separatism. So in this case it's not good for classifying them.
For separating these two cases.
OK, to wrap up our today's lecture. So first we recapped answer by learning and we have seen K means clustering and we have seen another method for unsupervised learning which is principal component analysis. PCA is mainly for dimensionality reduction and we have seen how to compute principle components. So we have seen Eigendecomposition 1st and to solve the.
Limitations of eigen decomposition. We have introduced as VD singular value decomposition and we have seen example eigen faces to show how we use PCA for image classification. So for this case we don't have labels for them so we just separate them. So we have the training. We projects this input space into a lower dimensional feature space and when we have a new image so we find.
The most similar one in the database, but we don't give labels to them. OK, so that's why it's unsupervised learning.
And for the next lecture so we will see deep learning for vision and will see official new networks.
So we move from these handcrafted, handcrafted features to deep features. Learn by deep neural networks, OK, and see you next one. Thank you.