Hello everyone, welcome to module 338. Today we have our very first lecture on deep learning and deep neural networks, and we will cover the basics of artificial neural networks.

First of all, a quick recap of what we have seen before. So far we have covered traditional machine learning for computer vision. We have seen supervised learning, using classifiers to predict the class label of an image. We have classifiers like the k-nearest neighbour classifier, which assigns a label to an image by computing the distance between the test image and its neighbours, either one neighbour or several neighbours. We have also seen the naive Bayes classifier, where we predict the label of a class from prior information combined with the feature information. And we have seen softmax regression models: we started with the basic linear regression model, then the one-class case, logistic regression, where the output is either zero or one, and then we extended it to softmax regression, where we have multiple labels and predict the label of each sample.

We have also seen unsupervised learning. We looked at clustering, for example how to use k-means clustering to group data points and obtain a cluster number for a test point. Note that in unsupervised learning we do not have a label for each data point; we only have the cluster memberships. And in the last lecture we saw principal component analysis, where we used two approaches to compute the principal components. The first uses the characteristic equation, but for that we need a square matrix to get the principal components. That is why we have the second approach, SVD, which also gives the principal components and does not require the matrix to be square (a small code sketch comparing the two routes follows at the end of this introduction).

So those are the traditional methods: we learn features from the data and then use machine learning models to make predictions. But we have to extract the features first using some handcrafted method, for example SIFT features or LBP, which we have seen before. These can be linear or nonlinear functions that compute the features, but they are hand-designed: we have to specify which functions or which filters to use for these descriptors, and they cannot be learned automatically. That brings us to deep learning, where we use a network to learn these features automatically.

So today we start the lectures on deep learning. We will see applications of deep learning and some very basic components of artificial neural networks. You may hear different terms: artificial neural networks, deep learning, convolutional neural networks. Both artificial neural networks and convolutional neural networks are deep learning approaches, in the sense that they are deep; we have networks that learn the features. But usually when we say artificial neural networks we mean networks of nodes that connect with each other, whereas convolutional neural networks work a little differently.
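Here is a minimal sketch of the PCA recap above (NumPy, with made-up toy data; the variable names are illustrative). Both routes, the eigendecomposition of the square covariance matrix and the SVD of the centred data matrix, recover the same principal directions up to sign:

```python
import numpy as np

# Toy data: 100 samples, 3 features (numbers are illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.1]])
Xc = X - X.mean(axis=0)                       # centre the data

# Route 1: eigenvectors of the (square) covariance matrix.
C = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(C)          # columns of eigvecs are the principal directions
order = np.argsort(eigvals)[::-1]             # sort by decreasing variance
pcs_eig = eigvecs[:, order]

# Route 2: SVD of the centred data matrix (no square matrix required).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs_svd = Vt.T                                # right singular vectors are the principal directions

# The two sets of directions agree up to sign.
print(np.allclose(np.abs(pcs_eig), np.abs(pcs_svd)))
```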
Convolutional neural networks use 2D filters to learn the features, and that is what we mostly use in computer vision. Before we go into convolutional neural networks, though, we will first see what artificial neural networks are and how they work, and then we will see convolutional neural networks in the next lecture. So today we will see threshold neurons, activation functions, linear separability, and also how to construct a multi-layered network.

First of all, what is deep learning? Here is a very quick overview of the difference between traditional machine learning and deep learning. Deep learning is a family of methods that use deep architectures, what we call deep neural networks, to learn high-level feature representations. In traditional machine learning, as we have seen before, and as you have seen in assignment one using bag of words to classify images, we have one input, for example an image, and then we do feature extraction. In assignment one we used the SIFT descriptor: we used DoG filters, gathered the image gradients, and put them together into descriptors. How these descriptors are formed and computed is specified by ourselves, and we have to tune their parameters. That is the feature extraction step. Then we use a classifier, for example the k-nearest neighbour classifier in our assignment, to assign a label to each data point. So we have a training phase and a testing phase: during training, for every data point we have an image and its label, the ground truth, and we train the classifier; when we have a test image, we use the trained model to assign a label to the new data point. That is classification: given the test image, we output whether it is a car or not a car.

For deep learning, the pipeline is basically the same: we have the input and we have the output. You may still remember that for all of these machine learning problems we have y = f(x), where x is the image and y is the output, a label such as "car" or "not car". The difference is the f: in deep learning we use a neural network to learn this function, and we combine feature extraction and classification inside the same network. Some layers of the network do feature extraction; they can be linear combinations of the inputs or nonlinear combinations of the inputs. Feature extraction just means learning something useful from the input: for example, for a cluttered image, the useful information might be lines or geometric shapes, but we embed them implicitly in the network. Then we have the classification part, also implemented by layers of the network, doing the task that was previously done by naive Bayes, k-nearest neighbour or other classifiers, and we output the label for the image, whether it is a car or not (a schematic code sketch contrasting the two pipelines follows below). That is the main difference between traditional machine learning and deep learning. Deep learning is part of machine learning, but it is "deep" because the neural networks may have many, many layers. That is why we call it deep learning.
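To make the contrast concrete, here is a purely schematic sketch; extract_sift_descriptors, knn_classify and network are hypothetical placeholders, not real library calls, standing in for the two pipelines just described:

```python
# Traditional pipeline: hand-designed features, then a separately trained classifier.
# extract_sift_descriptors and knn_classify are hypothetical placeholders for the
# assignment-one components (hand-crafted SIFT + k-nearest neighbour).
def predict_traditional(image):
    features = extract_sift_descriptors(image)  # parameters chosen and tuned by us
    return knn_classify(features)               # e.g. k-nearest neighbour on those features

# Deep learning pipeline: one learned function y = f(x).
# 'network' is a hypothetical trained model; its layers perform both feature
# extraction and classification, and its weights are learned from the data.
def predict_deep(image):
    return network(image)
```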
And here are some examples, just to have a glance at what this looks like. Here we have a simple neural network, just a three-layer network: we have the input layer, one hidden layer and one output layer. Again we have y = f(x): x is the input layer, these nodes, y is the output, and in between we have one hidden layer to realise the function. You can see there are several connections between them; those connections carry out the transformation of the input. We learn information from the inputs, and these connections produce the output label. If we expand this to a deeper network, the one on the right, we still have the input layer and the output layer, but now we have several hidden layers, here four hidden layers, and a lot of connections between them. These are again combinations of the previous layer, linear or nonlinear, but you can think of each layer as just a function of the one before it. That is a first glance at the architecture of neural networks (a minimal forward-pass code sketch follows below).

As I said before, neural networks are composed of nonlinear transformations of the data, and our goal is to learn useful representations, also known as features, directly from the data. We do not specify these functions, for example the weights of each filter; we let the network learn them by itself. And there are many varieties. Learning can be supervised, for example convolutional neural networks for a classification problem, or unsupervised, for example autoencoders or sparse coding, where we have an input without a label and we instead learn a representation as the output: we have one data sample as the input and we want to generate another form of this data, and that is the autoencoder. That is unsupervised learning; we do not give any labels for the input. In this course we will mainly cover convolutional neural networks, which are supervised, and artificial neural networks, which we also usually use for supervised learning. Just to recap what you have seen before: if we have a label for each data point, that is supervised learning (if we have an image of an apple, we have the label "apple"); in unsupervised learning we have clusters, but no labels for the data points.

There are different types of deep architectures. There are convolutional neural networks, mainly for image classification. There are autoencoders, for dimensionality reduction: images often have very high resolution and take a lot of space, so can we reduce the dimensionality of the images so that we can transmit them efficiently? That is what autoencoders do. There are deep belief networks, for image recognition and generation, which use probabilistic inference to build the belief network. And another important type is the recurrent neural network: so far we have mostly seen single images, but we have also seen videos, a sequence of images with a time dimension. With recurrent neural networks we take a sequence and learn how the images in one sequence correlate with each other. Recurrent neural networks are for learning patterns in sequential data.
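Here is a minimal sketch of the small three-layer network described at the start of this section, assuming NumPy and randomly initialised weights (the layer sizes are made up), just to show the forward pass y = f(x):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative sizes: 4 inputs, 5 hidden units, 2 outputs.
W1 = rng.normal(size=(5, 4)); b1 = np.zeros(5)   # input  -> hidden weights and biases
W2 = rng.normal(size=(2, 5)); b2 = np.zeros(2)   # hidden -> output weights and biases

def f(x):
    """Forward pass y = f(x): each layer is a (non)linear combination of the previous one."""
    h = sigmoid(W1 @ x + b1)      # hidden layer: nonlinear transformation of the input
    y = sigmoid(W2 @ h + b2)      # output layer
    return y

x = rng.normal(size=4)            # one input vector
print(f(x))                       # two output values, each between 0 and 1
```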
First, let's have a quick look at when we can use deep learning. Roughly, when you have a lot of data and a lot of challenges to address. As we mentioned in earlier lectures, there can be difficult lighting conditions or very challenging objects to recognise. If we use handcrafted descriptors like SIFT to recognise objects, we have to design the descriptors very carefully and tune their parameters very carefully so that they achieve relatively good performance on some tasks. But the real world is very challenging, and when we have a lot of data, handcrafted features like SIFT cannot cover all of these applications. We need a very strong, very powerful function to approximate all the transformations in the data. That is why we need deep learning: very deep networks that can learn these very challenging features from the data.

And there are a lot of applications of deep learning. For example, we have seen object recognition and detection: recognising that there is a car, or a dog, is object recognition; for detection, we not only predict that this is a dog but also predict the location of the dog in the image. We have also seen semantic segmentation: we segment the image into different regions, and for each pixel in the image we have a semantic label, for example this pixel belongs to a person and that pixel belongs to the road. There is also 3D face reconstruction: we have an image and we reconstruct the face of the person. In all of these cases we have y = f(x): x is an image or a sequence of images, and we predict the labels of the images or of the pixels in the images. That is how we apply deep learning.

Once we have these models, we can use them for further applications, for example autonomous driving, self-driving cars. We have cameras that capture images, and we use convolutional neural networks to predict which command to issue. We also have the ground-truth steering wheel angle to compare the predicted commands against; these reference commands come from a trained human driver. We compute the difference between the predicted steering wheel angle and the one provided by the person, and then we train the model using backpropagation (a tiny numerical sketch of this comparison follows below). We will see how backpropagation works in later lectures, but that is how we use deep neural networks to train a model for the autonomous driving task.

It can also be applied to robotics. Here we have different robot arms picking up objects from containers. This could be an example scenario for trash sorting: we need to recycle some trash, so we need to classify whether each object is recyclable or not, pick it up from the container, and sort it. Here there is a camera for the robot arm; the camera classifies each object, decides that it is recyclable, and the robot arm picks it up and puts it into another container. That is how we use deep learning for robotics tasks.
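The comparison in the driving example is just a loss value computed from the two angles; here is a minimal sketch with made-up numbers (the squared error shown is one common choice, not necessarily the loss used on the slides):

```python
# Hypothetical predicted and ground-truth steering wheel angles, in degrees.
predicted_angle = 12.5      # output of the convolutional network
human_angle = 10.0          # command recorded from a trained driver

# Squared-error difference: the quantity training (backpropagation, later lectures)
# would push towards zero.
loss = (predicted_angle - human_angle) ** 2
print(loss)                 # 6.25
```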
So again you can see the pattern: we have the data, the images from the camera, then we have the predictions, the labels for each object and whether it is recyclable, and then we have the actions for the robots.

The most basic deep neural network is the artificial neural network, so we will start from there and then move on to convolutional neural networks. Before we go into artificial neural networks, let's have a quick look at the real neural networks in our brain. In our nervous system we have neurons: a neuron receives signals on one side and sends a signal out on the other. Once the neuron receives a signal it gets activated and sends a signal on to the next neuron, and every neuron is connected to its neighbouring neurons. We can represent this real neuron with the artificial neuron shown below. This neuron is connected to several neurons before it: for example, here we have a signal x0, a signal x1 and a signal x2, and we have different weights for these signals, w0, w1 and w2, and we combine them together in the cell body. This is a linear combination of all the input signals, the sum over i of w_i times x_i, plus b, where b is the offset (bias) parameter. So far this is only a linear combination, but usually we also want a nonlinear transformation of the input signals, so we have an activation function f, which gives us nonlinear features in the transformation. We apply f to the linear combination of the input signals, get the output signal, and send it to the next neuron. That is how one very basic neuron works (see the small code sketch after this block).

Neural networks are able to learn by adapting their connectivity patterns so that the organism improves its behaviour in terms of reaching certain goals. Basically, this is a machine learning problem: we want the model to predict the outputs, exactly as we did before. The strength of a connection, and whether it is excitatory or inhibitory, depends on the state of the receiving neuron's synapses. The synapses are where the neuron receives its signals, and the neural network achieves learning by appropriately adapting the states of its synapses. By "adapting the states of its synapses" we mean adapting the weights w0, w1, w2; the function f is the nonlinear part. In the traditional machine learning approaches with handcrafted features, all of these weights are predefined or tuned by human users, but here the neural network tunes the weights automatically according to the input, and we will see how this is done using backpropagation later.

Just to revisit what we saw on the previous slides: we have one neuron here and several synapses, the connections between the previous neurons and the current neuron. We have the input x1, which connects to the neuron with weight w_i1, and further input signals x2 up to x_n with their own weights, and then we get the output signal y_i. The net signal is a linear combination of all the input signals.
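A minimal sketch of the single neuron just described, assuming NumPy; the input values, weights and bias are made-up numbers:

```python
import numpy as np

def neuron(x, w, b, f):
    """One artificial neuron: net = sum_i w_i * x_i + b, output = f(net)."""
    net = np.dot(w, x) + b                        # linear combination plus offset
    return f(net)

x = np.array([0.5, -1.0, 2.0])                    # input signals x0, x1, x2 (made-up values)
w = np.array([0.8,  0.2, 0.1])                    # synaptic weights w0, w1, w2 (made-up values)
b = 0.1                                           # offset (bias) term

step = lambda net: 1.0 if net > 0 else 0.0        # threshold-style activation
sigmoid = lambda net: 1.0 / (1.0 + np.exp(-net))  # sigmoid activation

print(neuron(x, w, b, step), neuron(x, w, b, sigmoid))
```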
For the output signal, we apply the nonlinear function to the net signal and get the final output y_i. The function f, as I said before, is a nonlinear function, usually called the activation function. One possible choice is a threshold function: if the net signal is greater than theta we get one, otherwise we get zero. You can see this is not linear; the threshold is what gives us the nonlinearity. Here is a graph of what it looks like: if the net signal is lower than the threshold theta the output is 0, and if it is greater than theta we get 1. A linear function looks like a straight line, so its gradient is constant everywhere; for this nonlinear function it is not. At the threshold point there is no gradient at all: the output is 0 on one side and jumps to the next value on the other, so it is not continuous. That is what makes it a nonlinear activation function, and it is the simplest one: we just use a threshold to output one or zero depending on the net signal.

Neurons with this threshold activation are what we call the perceptron: we use threshold neurons, also called threshold logic units, which take binary inputs. There are other options for activation functions as well. For example the linear neuron, whose output is just a linear combination of the inputs; here we use the net signal directly and there is no nonlinearity. Or we can have rectified linear units, which we call ReLU; we will see this in a moment. And we have sigmoid neurons. All of these, apart from the linear one, generate nonlinear features from the input.

The sigmoid neuron is a very common type of activation, especially in learning networks. It uses the logistic activation function: you have seen the logistic classifier before, with outputs between zero and one, and you probably remember this graph. Here we have the same function, which squashes the input to the interval between zero and one. It is nonlinear, and if we want a hard decision we can threshold the output, for example below 0.5 we assign 0, otherwise 1. The sigmoid is smooth, but it is still nonlinear. A network of sigmoid neurons with n inputs and m outputs computes a function from the n-dimensional input space to m outputs that each lie between zero and one. That is how we obtain these learned features.

We can also change the nonlinearity of the sigmoid neuron by tuning some hyperparameters; basically we have two of them, theta and tau. Here is a more general form of the sigmoid neuron: theta is an offset for the net input signal, and tau is a temperature parameter that changes the shape of the sigmoid function. If we keep the same tau, say tau equal to 1, for two curves but use different theta for the offset, we just shift the curve to the right: that is the offset. But if we keep the same offset, theta equal to 0, and compare the blue line and the red line, you can see the shape of the curve is very different. That is how we get different sigmoid neurons (a small code sketch follows below).
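Here is a minimal sketch of this general sigmoid, assuming the common parameterisation f(net) = 1 / (1 + exp(-(net - theta) / tau)), which matches the description above: theta shifts the curve horizontally and tau controls the slope (the exact formula on the slides may be written slightly differently):

```python
import numpy as np

def general_sigmoid(net, theta=0.0, tau=1.0):
    """Sigmoid with horizontal offset theta and temperature tau.
    Smaller tau gives a steeper curve (closer to a threshold); theta shifts it sideways."""
    return 1.0 / (1.0 + np.exp(-(net - theta) / tau))

net = np.linspace(-4.0, 4.0, 9)
print(general_sigmoid(net))                        # standard logistic: theta = 0, tau = 1
print(general_sigmoid(net, theta=2.0))             # same slope, curve shifted right by 2
print(general_sigmoid(net, theta=0.0, tau=0.2))    # same offset, much steeper slope
```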
The parameter tau is sometimes called the temperature parameter; it controls the slope of the sigmoid, while the parameter theta controls the horizontal offset, in a way similar to the threshold neuron. So that is another type of activation function. Here is a graph comparing different activation functions. We have seen the sigmoid activation function, but there are many others. We have mentioned ReLU, the rectified linear unit: if the input is below zero the output is zero, and above zero the output is linear. That is ReLU. Outputting exactly zero everywhere below zero is sometimes not a good thing, and the function is not differentiable at zero, so we also have the leaky ReLU, which gives a very small value when the net signal is below zero; that is the problem the leaky ReLU addresses. There are also others, like tanh, maxout and ELU, for different types of activation functions. You can check the details online, and you can practise with different activation functions in our lab sessions and find out how they affect the output.

Next, let's have a look at the threshold logic unit (TLU). Here we have the synapses: the inputs x1 and x2, with weights w1 and w2 for the input signals as before, and we have the offset, the threshold theta. If we set both weights w1 and w2 to one and theta to 1.5, we get the output by computing x1 + x2 and comparing this value against the threshold. For the threshold logic unit, x1 and x2 are both logic inputs, either zero or one, so there are four cases for (x1, x2): (0,0), (0,1), (1,0) and (1,1). For (0,0) the sum is 0, which is lower than 1.5, so the output is 0. If the sum is 1 the output is also 0, since 1 is lower than 1.5. Only when x1 and x2 are both 1 is the sum 2 greater than 1.5, and the output is 1. So this threshold logic unit realises one function, AND: we achieve x1 AND x2 using the combination w1 = 1, w2 = 1 and theta = 1.5.

We can also choose different values: here we have w1 = 1 and w2 = 1 again, but the threshold is 0.5. Again we use the truth table to get the output: we compare x1 + x2 against the threshold 0.5, and you can see that three of the four cases give output 1. In this way we get the function x1 OR x2; that is the function we obtain with this combination of weights and threshold. By doing this we can realise different logical functions (a small code sketch of the AND and OR units follows below).

But here is a problem: is it possible to achieve XOR? For XOR we want to output 1 only when the inputs are different. If x1 is 0 and x2 is 1 we get 1; if x1 is 1 and x2 is 0 they are different, so we get 1; if they are the same, (0,0) or (1,1), the output is 0. So the question is: is it possible to find a combination of w1, w2 and a threshold theta that achieves the XOR function? The answer is that it is impossible. We cannot choose any w1, w2 and theta to get this function. That is the limitation of the perceptron: with only one neuron in the network, it cannot achieve functions like XOR.
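As a minimal sketch of the threshold logic unit with the weight and threshold choices above (w1 = w2 = 1 with theta = 1.5 for AND, and theta = 0.5 for OR); I use a >= comparison here, which gives the same truth tables for these thresholds:

```python
def tlu(x1, x2, w1, w2, theta):
    """Threshold logic unit: output 1 if w1*x1 + w2*x2 reaches the threshold, else 0."""
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        and_out = tlu(x1, x2, w1=1, w2=1, theta=1.5)   # realises x1 AND x2
        or_out  = tlu(x1, x2, w1=1, w2=1, theta=0.5)   # realises x1 OR x2
        print(x1, x2, "AND:", and_out, "OR:", or_out)
```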
This means a TLU can only realise linearly separable functions, so we need to define what linearly separable means. Linear separability: consider a function from {0,1}^n to {0,1}, so we have several combinations of the inputs, and the outputs are either zero or one. The function is linearly separable if the space of input vectors yielding 1 can be separated from those yielding 0 by a linear surface, a hyperplane, in n dimensions.

Here are some examples in two dimensions. This is the case we have seen before: x1 and x2 are the input variables, taking values of either zero or one since they are logic inputs, and we have a table listing the different cases. For OR, we can separate the cases with a hyperplane, a line: on one side there is the single case with output 0, and on the other side the three cases with output 1. So OR is linearly separable. But for XOR, when we list all four cases, the ones and zeros are arranged so that they are not linearly separable: wherever you draw a line, there are both zeros and ones on the same side, so we cannot separate all of them. That is why the perceptron, as we said, cannot realise a function that is not linearly separable.

We can explain linear separability by considering the function as a mapping from real-valued inputs to {0,1}; that is exactly the function our threshold neurons use to compute their output from their inputs. For the two-dimensional case, n = 2, linear separability means we can always draw a line that separates the 0-region from the 1-region, using the weights and the threshold. Basically we compute w1·x1 + w2·x2 and compare this number against theta: for every point on one side of the line the value is greater than theta, and for every point on the other side it is lower than theta, and that is why those points are assigned 0. That is how we get linearly separable regions. By varying the weights and the threshold we can realise any linear separation of the input space into a region that yields output 1 and a region that yields output 0.

As we have seen, a two-dimensional input space can be divided by any straight line. A three-dimensional input space can be divided by any two-dimensional plane, and in general an n-dimensional input space can be divided by an (n-1)-dimensional hyperplane. Of course, for n greater than three this is hard to visualise, but you get the idea from the two-dimensional case. The same applies to our original TLU function with binary input values; the only difference is the restriction on the input values. And obviously we cannot find a straight line that realises the XOR function, so the XOR function is not linearly separable: there is no line that separates the zeros from the ones. So what is the solution for achieving the XOR function?
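Before giving the answer, here is a small brute-force sketch consistent with the geometric argument: it searches a grid of candidate weights and thresholds for a single TLU that reproduces the XOR truth table and finds none (a finite search, not a proof, but it illustrates the point):

```python
import itertools
import numpy as np

def tlu(x1, x2, w1, w2, theta):
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

candidates = np.linspace(-3.0, 3.0, 25)            # grid of weights/thresholds to try
found = any(
    all(tlu(x1, x2, w1, w2, theta) == y for (x1, x2), y in xor_table.items())
    for w1, w2, theta in itertools.product(candidates, repeat=3)
)
print("single TLU realising XOR found:", found)    # prints False
```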
The solution is to have a network, multiple perceptrons, multiple neurons, that achieve the XOR function together. We need to combine multiple perceptrons into a network to get this function. Here you can see the multi-layered XOR network. The logic is that XOR is equivalent to OR combined with NOT-AND: x1 XOR x2 is the same as (x1 OR x2) AND NOT (x1 AND x2). The left side is equivalent to the right side, which means we can have an OR unit and a NOT-AND (NAND) unit first, and then combine them, so we have a two-layer network: in the first layer we compute OR and NAND, and in the second layer we combine them with an AND unit and get the output. That is the XOR function. We set the weights for the OR node and the NAND node, then for the AND node, and we get the XOR network. That is how we achieve the XOR function using a network, with two layers of computing units in this case (see the small code sketch after the wrap-up below). And that is why we say the multi-layered XOR network can classify cases that are not linearly separable.

The first layer is called the input layer; it just contains the input vector and does not perform any computation. The second layer is called the hidden layer; it gets its input from the input layer and sends its output to the output layer. After applying their activation function, the neurons in the output layer contain the output vector. So we can have an activation function, as mentioned before, to get nonlinear features, and the neurons in the output layer contain the final output vector, for example the predicted labels of the input.

To wrap up today's lecture: this was the very first bit of deep learning. We have seen what deep learning is, namely learning very complex representations of the input, so that we do not have to design the feature extraction ourselves; the networks learn these very complex representations. We have seen some example applications of deep learning, and we have seen artificial neural networks: threshold neurons, which are basically the perceptron, activation functions, and linear separability. We have seen that we cannot achieve XOR with one single perceptron, and that is why the solution is a multi-layered network, which can handle cases that are not linearly separable. In the next lecture we will cover backpropagation. See you in the next lecture, thank you.
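As promised, here is a minimal sketch of the two-layer XOR network from this lecture, reusing the TLU idea: an OR unit and a NAND unit in the first computing layer, combined by an AND unit in the output layer. The NAND weights below (w = -1, -1 with theta = -1.5) are one standard choice and are not necessarily the values shown on the slides:

```python
def tlu(inputs, weights, theta):
    """Threshold logic unit: 1 if the weighted sum reaches the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

def xor_network(x1, x2):
    # First computing layer: OR and NAND of the two inputs.
    or_out   = tlu((x1, x2), (1, 1), theta=0.5)      # x1 OR x2
    nand_out = tlu((x1, x2), (-1, -1), theta=-1.5)   # NOT (x1 AND x2)
    # Second layer: AND of the two hidden outputs gives XOR.
    return tlu((or_out, nand_out), (1, 1), theta=1.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_network(x1, x2))     # outputs 0, 1, 1, 0
```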