

Introducing Gephi video

The dataset analyzed contains the social networks that were extracted from the activities of the participants on blogs and Twitter in the MOOC: Connectivism and Connective Knowledge 2011 (CCK11).

The Twitter graph includes all tweet authors as nodes, and an edge was created between two authors whenever one mentioned the other in a tweet. For example, if a course participant @A mentioned @B and @C in a tweet, then the course Twitter network would contain the nodes @A, @B, and @C with the following edges: @A-@B and @A-@C.

The blog graph includes authors of the blog posts (i.e. blog owners) and the authors of the comments to individual blog posts. If a learner A created a blog post, and then learners B and C added comments to that post, then the corresponding network would contain nodes A, B, and C with the following edges: A-B, and A-C. Edges can be undirected, or directed if the direction of the relation matters. In this case the direction is not relevant for the analysis.
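As a rough illustration of the two edge rules described above, here is a minimal Python sketch. The input shapes (a list of `(author, text)` tweets and a list of `(owner, commenters)` blog posts), the function names, and the choice to skip self-mentions are all assumptions made for the example. The course dataset arrives already in graph form, so none of this is needed to follow the post.

```python
import re

# Hypothetical pattern for "@name" mentions inside a tweet's text.
MENTION = re.compile(r"@(\w+)")

def twitter_edges(tweets):
    """tweets: iterable of (author, text) pairs. Returns undirected edges."""
    edges = set()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            if mentioned != author:  # assumption: ignore self-mentions
                edges.add(frozenset((author, mentioned)))
    return edges

def blog_edges(posts):
    """posts: iterable of (owner, [commenter, ...]) pairs."""
    edges = set()
    for owner, commenters in posts:
        for commenter in commenters:
            if commenter != owner:  # assumption: ignore self-comments
                edges.add(frozenset((owner, commenter)))
    return edges

# The two examples from the text: A mentions B and C; B and C comment on A's post.
tweet_net = twitter_edges([("A", "thanks @B and @C for the links")])
blog_net = blog_edges([("A", ["B", "C"])])
```

Storing each edge as a `frozenset` makes A-B and B-A the same edge, which matches the undirected reading used in this analysis.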

Note: This dataset can only be used by course participants, and its use is restricted to completing the activities and assignments in DALMOOC, so I can’t share it.

Social network graphs are built from two constructs: nodes and edges. Note from my “import report” (Fig. 1) that there are 194 nodes (students) in the blog network, connected by 205 edges (relations). To keep this post simple, I’ll focus on the blog network.


Fig 1. Import report for blog


The software first produces an overview of the graph in which node positions are random, so the result is unreadable (Fig. 2). This basic layout is not yet good enough for interpretation.


Fig 2. The basic network layout










A common first step, and an important one, is to apply a layout algorithm that repositions the nodes to improve the graph’s readability. You can experiment with multiple layouts as you search for the one that best displays your graph. There are also many layout plugins for Gephi (check them here).

So how do you choose a layout? Here’s a link to a very helpful tutorial about layouts in Gephi. It will guide you through the basic and advanced layout settings.

At this point I played extensively with combinations of layouts and filters, and ran the main network analyses (density, centrality, and modularity). Two layouts caught my attention: Yifan Hu and Fruchterman Reingold. Here’s what I get when I run these algorithms on the blog network:


Fig 3. Applying the Yifan Hu algorithm


Fig 4. Applying the Fruchterman Reingold algorithm










The Fruchterman Reingold (FR) algorithm avoids dispersing the disconnected nodes, but it also produces less differentiation between sub-clusters. FR arranges the nodes within a circle, where they can be easily distinguished. It took longer to run, and I had to stop it manually. Its time complexity is O(N²).

The Yifan Hu (YH) algorithm runs much faster than the other methods available, and it stops automatically. It reduces the quadratic complexity to O(N*log(N)). YH seems to produce more differentiation between sub-clusters, and the main cluster looks tighter. Note in Figure 3 that it places disconnected nodes farther apart. In my case, I chose the Yifan Hu algorithm (Fig. 3).
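To get a feel for the gap between the two growth rates, the short Python sketch below plugs the blog network's 194 nodes into both expressions. It only compares orders of growth: big-O notation hides constant factors and the number of iterations, so the ratio is not a prediction of actual running time in Gephi. The base-2 logarithm is an arbitrary choice for the illustration.

```python
import math

n = 194  # nodes in the blog network, from the import report

quadratic = n * n                # growth term for an O(N^2) layout
log_linear = n * math.log2(n)    # growth term for an O(N*log(N)) layout

print(f"N^2       = {quadratic}")
print(f"N*log2(N) = {log_linear:.0f}")
print(f"ratio     = {quadratic / log_linear:.1f}")
```

For a graph this small both layouts finish quickly. The difference matters much more as networks grow to thousands of nodes.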

Being in the thick of things: Sizing the network nodes based on degree centrality.

Degree centrality counts the edges attached to a node, so the nodes with the highest degree are often read as the most influential. Individuals with high degree centrality have many social ties and are more active, recognized, important, and visible in their social networks (Brass, 1984; Gašević et al., 2013).

The graph in Figure 5 sizes the nodes based on degree centrality. To set this up, go to the “Ranking” panel, select “Nodes” and the “red diamond”, then choose “Degree” in the drop-down menu and enter the minimum and maximum size (I suggest 15-70). I also added the degree centrality score as a label: click on the “Configure…” link to choose the data you want displayed in your graph.
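What the ranking step does can be sketched in a few lines of Python: count the edges per node, then map the counts onto the 15-70 size range. The linear mapping, the function names, and the toy edge list are assumptions for illustration. Gephi offers its own interpolation settings, which may differ.

```python
from collections import defaultdict

def degrees(edges):
    """Count how many edges touch each node (undirected graph)."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)

def node_sizes(values, min_size=15.0, max_size=70.0):
    """Linearly rescale a centrality score to a display size (assumed mapping)."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    sizes = {}
    for node, value in values.items():
        t = (value - lo) / span if span else 0.0
        sizes[node] = min_size + t * (max_size - min_size)
    return sizes

# Toy network: A is connected to everyone, D only to A.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
toy_degree = degrees(toy_edges)
toy_sizes = node_sizes(toy_degree)
```

In this toy network A gets the largest size and D the smallest, with B and C halfway between.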


Fig 5. Applying nodes size = degree and nodes label = degree









Being in the bridge of things: Sizing the network nodes based on betweenness centrality.

Betweenness centrality measures how often a node lies on the shortest paths between other nodes, which indicates how easily it can connect different parts of the network. The graph in Figure 6 sizes the nodes based on betweenness centrality. To set this measure, go back to the drop-down menu and select “Betweenness Centrality”.
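For readers curious about what sits behind this score, here is a compact Python sketch of unnormalized betweenness for an undirected, unweighted graph, following the well-known Brandes approach (breadth-first search from every node, then accumulating path shares backwards). It is an illustration under those assumptions, not Gephi's actual implementation, and the adjacency-dictionary input format is invented for the example.

```python
from collections import deque

def betweenness(adj):
    """adj: dict mapping each node to the set of its neighbours."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing on path shares.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both ends, so halve the totals.
    return {v: total / 2 for v, total in score.items()}

# Toy star: every shortest path between two leaves passes through the hub.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
star_scores = betweenness(star)
```

In the star, the hub sits on the shortest path of all three leaf pairs, while each leaf sits on none. That is the bridge role described above.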


Fig 6. Applying nodes size = betweenness centrality and nodes label = degree











Figures 5 and 6 show significant similarities between degree and betweenness centrality. I recently found a good idea on the esinternet blog for comparing and highlighting both measures: I considered the top 10 nodes as central nodes (measured by degree, Fig. 7) and as main brokers (measured by betweenness, Fig. 8).

For central nodes: in the “Filters” panel, select “Topology” and then “Degree Range”, and change the range to 8-55. For main brokers: in the “Filters” panel, select “Range” and then “Betweenness Centrality”, and change the range to 500-4326. I also changed the colors: in the “Ranking” panel, choose “color” to replace these sad shades of gray.
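The two range filters boil down to keeping the nodes whose score falls inside an interval, and comparing the two resulting sets shows who is both central and a broker. A small Python sketch follows. The scores in it are invented toy values, not numbers from the CCK11 data, and the helper names are hypothetical.

```python
def in_range(values, low, high):
    """Keep the nodes whose score lies in [low, high], like a range filter."""
    return {node for node, value in values.items() if low <= value <= high}

def top_k(values, k=10):
    """Return the k highest-scoring nodes, best first."""
    return sorted(values, key=values.get, reverse=True)[:k]

# Invented toy scores for four nodes.
toy_degree = {"A": 12, "B": 9, "C": 3, "D": 8}
toy_betweenness = {"A": 900.0, "B": 40.0, "C": 0.0, "D": 650.0}

central = in_range(toy_degree, 8, 55)             # same bounds as the degree filter
brokers = in_range(toy_betweenness, 500, 4326)    # same bounds as the betweenness filter
both = central & brokers                          # central nodes that are also brokers
```

Here B is well connected but not a broker, while A and D appear in both sets.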


Fig 7. Top 10 central nodes measured by degree centrality











Fig 8. Top 10 main brokers measured by betweenness










It’s possible to clearly observe – from Figures 7 and 8 – that there are brokers and central nodes in common. Ok, now we start to see some patterns in the network.

Network brokers are those that build bridges between clusters (or sub-communities) in a network (Burt et al., 2013). A broker is typically a node connecting two or more different communities that may emerge in a social network. Professor Dragan, in DALMOOC, states:

Brokers are able to see the different types of information that are available across different communities, and as such, they are exposed to many different ideas therefore being able to integrate these different ideas, generate potentially novel solutions.

Let’s dive a little bit more and look for more patterns…

Network Modularity and Community Identification

Once you apply the “Modularity” statistic (a cluster detection algorithm), you can clearly see the different clusters, or communities, that emerge from the network. Each color represents one cluster (Fig. 9). The algorithm detected 66 clusters, which is a large number for a network of this size. Note the many disconnected nodes in Figure 9: it doesn’t make sense to treat a single student as a separate community.
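To make the idea of modularity a bit more concrete, the Python sketch below scores a given split of a network into communities using the standard modularity formula: for each community, the share of edges that fall inside it minus the share expected from its total degree. This only evaluates a partition that is handed to it. It does not search for the best partition, which is the harder part that Gephi's statistic performs. The toy graph and function name are assumptions for illustration.

```python
def modularity(edges, community):
    """edges: list of (a, b) pairs; community: dict node -> community id."""
    m = len(edges)
    if m == 0:
        return 0.0
    inside = {}       # number of edges with both ends in the community
    degree_sum = {}   # sum of node degrees per community
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            inside[ca] = inside.get(ca, 0) + 1
    return sum(
        inside.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Toy graph: two triangles joined by a single bridge edge (3-4).
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_whole = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than lumping everything into one community, which is the kind of difference a cluster detection algorithm tries to maximize.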


Fig 9. Applying modularity statistic










To study community formation, I applied the giant component filter. The giant component is the largest connected component of a network, so the filter removes all the nodes that are disconnected from it. Fortunato (2010) describes communities as:

Communities, also called clusters or modules, are groups of vertices which probably share common properties and/or play similar roles within the graph.
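The giant component filter mentioned above can be pictured as a simple graph search: explore the network from each unvisited node, collect everything reachable, and keep the largest group. The Python sketch below does that with a breadth-first search. The adjacency-dictionary format and the toy network are assumptions for the example, and Gephi's own filter may be implemented differently.

```python
from collections import deque

def giant_component(adj):
    """adj: dict node -> set of neighbours. Returns the largest connected set."""
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Toy network: a chain A-B-C, a separate pair D-E, and an isolated node F.
toy = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
    "F": set(),
}
kept = giant_component(toy)
```

Only A, B, and C survive the filter. The pair D-E and the isolated node F are dropped, just as the isolated students disappear from the blog network.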

I watched a very useful tutorial by Jen Golbeck, suggested by Professor Dragan, on how to use Gephi’s modularity feature to detect communities and color-code them in graphs. Based on this tutorial, my goal now is to decrease the number of clusters in my graph.

In the “Statistics” panel, click on “Modularity” to display the modularity settings window, then set resolution = 2. My clusters decreased from 66 to 4! I applied the same steps to the Twitter network, and the resulting maps are displayed below:


Fig 10. Blog network in course (CCK11), applying node size = Betweenness Centrality and node label = Degree.














Fig 11. Twitter network in course (CCK11), applying node size = Betweenness Centrality and node label = Degree.













The density measure of the networks

In the report shown in Figure 12, Density = 0.01 for the blog network and Density = 0.003 for the Twitter network. This means that the blog network has proportionally more students connected to one another. De Laat et al. (2007) describe density as:

Density provides a measure of the overall ‘connections’ between the participants.
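The blog figure can be checked by hand from the import report. Density is the number of existing edges divided by the number of possible edges, which for an undirected graph with N nodes is N*(N-1)/2. The Python sketch below applies that to the 194 nodes and 205 edges reported earlier, assuming the graph was treated as undirected. The Twitter node and edge counts are not given in this post, so only the blog value is reproduced.

```python
def density(n_nodes, n_edges, directed=False):
    """Share of possible edges that actually exist."""
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible /= 2
    return n_edges / possible if possible else 0.0

# Blog network, from the import report: 194 nodes and 205 edges.
blog_density = density(194, 205)
print(round(blog_density, 3))
```

The result, about 0.011, agrees with the rounded 0.01 in the report: only around 1% of all possible student pairs are actually connected.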


Figure 12 summarizes my report for the blog and Twitter networks:


Fig 12. Blog vs Twitter report









Obviously, this is just a quick overview of Gephi’s functionality and of the centrality measures commonly seen in social network studies. It’s difficult to draw a meaningful conclusion without knowing more about the data. There is limited understanding of the nodes: which nodes represent professors and which represent course participants?

I also found some limitations in my studies:

  • The “Id” for each student node is coded within Gephi’s “Data Laboratory”. Ok, I could set “Label” on nodes and compare centralities between week 6 and week 12. However, the “Label” column in both networks contains long alphanumeric strings, so it’s not easy to visualize. I had some ideas, but none of them was efficient. If you figure out a useful way to solve this issue, feel free to leave a comment. Feedback is always more than welcome!
  • Gephi does not support slide-by-slide display.


Brass, D. J. (1984). Being in the right place: A structural analysis of individual influence in an organization. Administrative Science Quarterly, 29(4), 518–539. doi:10.2307/2392937

Burt, R. S., Kilduff, M., & Tasselli, S. (2013). Social network analysis: Foundations and frontiers on advantage. Annual Review of Psychology, 64, 527-547. doi:10.1146/annurev-psych-113011-143828 (full text)

Dawson, S. (2008). A study of the relationship between student social networks and sense of community. Educational Technology & Society, 11(3), 224–238 (full text).

Dawson, S., Tan, J. P., & McWilliam, E. (2011). Measuring creative potential: Using social network analysis to monitor a learners’ creative capacity. Australasian Journal of Educational Technology, 27(6), 924-942.

De Laat, M., Lally, V., Lipponen, L., & Simons, R. J. (2007). Investigating patterns of interaction in networked learning and computer-supported collaborative learning: A role for Social Network Analysis. International Journal of Computer-Supported Collaborative Learning, 2(1), 87-103. doi: 10.1007/s11412-007-9006-4 (full text).

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174.

Gašević, D., Zouaq, A., & Janzen, R. (2013). ‘Choose your Classmates, your GPA is at Stake!’ The Association of Cross-Class Social Ties and Academic Performance. American Behavioral Scientist, 57(10), 1459-1478. doi:10.1177/0002764213479362 (full text).

Additional resources

Esinternet (2014, November). Assignment – Perform social network analysis and visualize analysis results in Gephi – Part of Basics of S.N.A (week 3)

Hirst, T. (2010, April 16). Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I, Retrieved October 18, 2014, from

Hirst, T. (2010, April 23). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part II: Basic Filters I, Retrieved October 18, 2014, from

Hirst, T. (2010, May 10). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part III: Ego Filters and Simple Network Stats, Retrieved October 18, 2014, from
