Hello again, I'm in this video we introduce the notion of homophily.
Or assortative mixing in networks.
And describe a way to measure it.
This ties up nicely with the main topic of Unit 2.
As assortative networks are those that are formed by in a way well clustered groups of nones.
To keep the presentation simple and focused.
I will restrain from going through examples.
The material presented should enable you to compute the omorfi.
On any given network.
And you know, let's keep the examples. The exercises for the tutorial sessions and for your own home practice.
So.
Several people, sociologists, and various other types of scientists have noticed that people have a strong tendency.
To associate with others whom they perceive as being similar to themselves in some way.
This tendency is called homophily.
Or as I said before, assortative mixing.
Homophily can be measured in networks too.
And one could argue that important groups of nodes in a network.
Are those with respect to which the network shows high homophily?
This picture.
Take him from Newman's book.
Shows Network formed by.
The children in an American school.
You see colors on the vertices.
The children in this instance are classified. Each vertex in this network is given a color.
By ethnicity.
And the other thing that you see in this in this picture.
Is friendship links directed links in fact.
So by just looking at the picture.
One can already argue that there is a tendency there for links to be connecting pairs of vertices of the same color.
There is fewer links in this region here connecting.
Green dots over here to yellow one sound there and you know similar way across this way.
The main goal of this video is to describe away.
To make.
This.
A bit more precise.
So let's see.
A bit of setup first.
We are we are given a network G. Let's call it G. And you know something like this.
For simplicity, assume it's a non directed network and it's got no loops. This is not a strong constraint, but we'll keep it for now.
Um?
And suppose that the nodes in there are labeled identified.
We define it set of positive integers so you know the numbers 123 up to N.
This is nice.
'cause then we can use these numbers. Both refer to the vertices in this network and to index the elements of the networks adjacency matrix. You remember one way in which we can describe a network is by means of a table of zeros and ones one in there.
Referring to the fact that there is a link between.
And.
Two nodes and a network.
Check unit 1. We talked about that there.
Finally, let's assume that there is some classification in there, or the vertex of the of the networks nodes.
So there is a set of classes, a set of colors. In this particular case.
And each node in the network belongs to one of these classes. So each node in in there for instance.
Has a color green, pink, yellow and so.
And you know this classification is a partition. Every vertex in there must belong to one of these classes.
Right, So what do we do with all this?
Um?
Let's instead of moving on. Let's slow down and.
Let's do it.
Paper and pencil. I mean the closest we could be with two paper and pencil. OK, so.
Let's remind ourselves about our goal. Yeah, we want to.
Find the measure.
Of the tendency of a network to connect vertices of the same type.
Belonging to the same class winner in A.
A man in a network of this sort.
So let me do this one step at a time, alright?
Um, so let's start from.
This quantity here. OK so if I.
Oh dear bye.
If I draw.
If I write this.
Then, as I mentioned on the slides.
A is Jason C matrix.
Of.
The given network, so this is.
A quantity that is equal to 1.
If vertex IN vertex J are connected by link and 0 otherwise, OK.
Yeah.
So if I if I do this now, this is a little bit of a mathematical notation, but this is.
Meant to represent the sum.
Of all possible A elements.
Angel adjacency matrix really yeah.
That's what what I mean by this notation here.
So what's this doing there?
Well, damn.
If this is 1 when there is a link between I and Jade and this some here.
Is our go very slow? Here is just counting the number of links in my network, yeah?
What do we do with this? Well, not really. Very much. We know the in fact it's counting twice the number of links in an undirected network.
Um well.
This is not very useful, so we won't remember. We want a measure of the tendency of the network.
To connect.
Pairs of vertices of the same type, so we want to count really how many links in there.
Or rather, how many links are there?
Between 2 yellow notes between 2 green notes and so on.
So this is still no good.
But let me add a trick to this. Yeah, so write it up here.
So let me let me say that if I right.
Delta.
Of.
X.
Why?
I'm referring to some kind of function.
That takes X&Y, and if X&Y are equal.
And he gives me one.
If X.
Equal
why?
And vice versa.
I'm writing it all in awfully, but never mind vice versa. Delta X&Y gives me 0.
If X.
Is.
Not equal to why OK?
I've been really slow now, but right so give me. I'll give you some examples, effects is effects and wiser strings.
And exes.
Great and wise milk, then Delta of bread and Milk EO OK.
If X&Y's are numbers, then.
Delta of 0.5.
And three.
Is again 0.
Yeah.
On the other hand.
Delta of.
2.
In two
is 1.
OK.
Cool, and now let's use this, how well let me multiply.
This term here.
This AIJ here.
By Delta.
Of.
See.
I.
See.
Gee.
What CI CI is?
De class.
In the given classification here.
Two which.
Birth ICSID belongs.
Yeah, so in this example here.
See I simply the color of vertex I.
Yeah.
And so this.
Function here will give me a one if CI and CJ are the same color in this example or belongs to the same class in general.
OK.
So now we have something a lot more interesting actually becausw.
If we look at this sum and yeah so.
This expression here now is the sum of all possible elements.
In the adjacency matrix of the given network, but not really all of them.
Just the ones.
Um?
Ooze.
Indices whose vertices.
Belong to the same class.
OK.
So this expression here.
Is counting edges? Yes, but it's counting, but it's only counting the edges between two elements.
Of the same class.
Yeah, which is exactly what we want.
So really, you know it will get scary on the next slide, but really, the gist of what I'm doing is exactly this. Yeah so.
I will present you a measure.
Homophily measure.
That is based on this simple.
Counting expressions so.
The core of it will be.
A term that counts all the edges between pairs of green vertices. Here all the edges between pairs of yellow vertices or.
In general.
Between pairs of vertices in the same class.
Right and so here is the scary bit.
But it shouldn't be too bad now. Yeah, so the let me say that the modularity of the network.
Is defined as this quantity Q here.
OK, which forget this multiplying factor here.
Is a sum.
Or containing of two terms.
It's the sum of the difference of two terms. The first of these terms is exactly.
This.
AIG Delta yeah.
And So what this expression is really?
Doing
easy comparing the number. Remember what this is? Yeah, the number of links.
Between pairs of vertices in the same class.
In the given network.
Against this thing which is in a way.
The expected number of links.
Between pairs of vertices in the same class.
But not in in the given network, rather in a in a in a.
In a typical network, in a in a network with the same number of vertices as.
Our starting G.
And we the same degrees.
At those vertices.
But weed links spread out at random.
In, you know, in some kind of.
Arbitrary random fashion.
Yeah.
And so.
That's what this queue is doing. It's comparing really this first time.
Which is a measure in the given network.
We did a similar measure but on in in a in a random in a typical network of the type.
Uh, and.
That's the main point of this definition. The rest is just scaling. So this Q marks here.
Is the same Q that we have here.
But in the case when all.
The edge is in the original network.
Actually connect vertices in the same class.
So if you if you if you.
Think if it is actually true that AIG.
Is equal to 1 exactly and only when this Delta is equal to 1.
Then
this here will actually equal.
Twice the number of edges in there.
We shouldn't simplify with these two. I'm at the denominator here and give the one that you see.
Um?
So.
Therefore, the ratio between Q&Q marks.
M is just the value that measures in a way that is more independent of the specific network properties.
The networks of more fully this is called the assortativity coefficient.
OK, so few comments.
We've already met Delta, so I don't need to say anything about this part of this slide.
Um?
The second part is also.
Relatively simple by now. We mentioned that Q Max is in fact the maximum possible value that you can take, so it shouldn't be surprising that this ratio here can never be larger than one.
Go back to the definition becausw Q is defined as a difference of positive things really.
Q can be negative. In fact can be as small as.
Minus Q Max.
In very special cases, so this term here can be as small as minus one.
Generally, positive values indicate homophily.
A positive noticeable tendency.
But he had using the Internet work to connect elements of the same types vertices of the same times.
Did the third important issue with this definition is a computational one.
Am the this this scary formula that we seen before?
Anne.
Provide useful insights.
To physicists, statisticians or even key sociologists.
But it's not very useful for computing the assortativity coefficient, and particularly for doing that by hand. As you know, we may need to do.
Fortunately.
We can do better.
And it turns out that the ratio Q2, Q marks actually simplify.
To the expression that you see on this slide here.
Which may look horrible, but it's actually very simple to work with.
Again, the problem is this thing does look horrible just because you see it right there. Bang on the slide in that way.
But if we, if we look at it a little bit more carefully and we think of the way which we will be using this.
This expression, then, you know, all of a suddenly becomes quite manageable.
Um?
I'll put it here and I'll draw some. You know I'll sketch some some notes.
On the part here to explain things so this looks ugly, but this is a.
A formula.
That is formed by 4 parts. This first one.
That one.
That one and that one.
Now these two are the same. Yeah, so this is a formula that is something like.
Not like me.
And.
Yes.
Something like.
I'm not like that. I want the black one.
X.
Minus Y.
Divided by sum zed.
Mine was.
Why again?
Yeah, so first of all is not.
Four terms in there, but really three of them OK.
These two terms actions Ed are actually really simple zed.
Is just essentially the square of the number of edges of the number of links in your network, yeah?
Which you can typically in exercises and sometimes also in some real life application. You can just count.
So it won't be difficult to estimate these values here so that.
Anne.
**** is.
An X.
I put a tick there as well. X is not that much more difficult cause it's the product of essentially the number of edges to him. Twice the number of edges.
Times twice the number of edges inside the classes.
Which is again something that one can typically evaluate by inspection.
Yep.
And what about why they did the final term in this expression?
Again, nothing really very complicated.
It boils down to a sum.
Over each of the classes.
Of this term here.
Which only depends on the vertex degrees.
So a game.
This is something that usually can be.
Calculated why seem quite simply by just inspecting.
And the given network.
Yeah.
So.
That's all, really.
We have discussed homophily and homophily networks.
And.
They have on this slide and a simple way.
To measure it.
The assortativity coefficient gives a way to measure the quality of a given classification of the network nodes.
Into groups.
Um?
And this is important as this can be used as a basis.
To find interesting ways to find hidden communities in networks?
So I'll stop there.
I'm sure we'll talk again about.
This conserved.
At some other times, thank you for listening.