Archive

Social Network Analysis

In a previous post, I explored different layouts and walked through the SNA methods in detail on the blog and Twitter networks for week 12 of the Connectivism and Connective Knowledge 2011 course (CCK11), using Gephi.

It’s now time to explore different layouts for the representation of a small network (e.g., Fruchterman Reingold and Yifan Hu) and experiment with their configuration parameters.

A few weeks ago I posted about some measurements commonly seen in social network studies, and now let’s play with them!

Layout Algorithms

Figures 1 and 2 show a graph of what might be a small social network.

Fig 1. Applying the Fruchterman Reingold algorithm.


Fig 2. Applying the Yifan Hu algorithm


Calculating the social network properties

SNA draws on concepts from graph theory and structural theory to evaluate network properties such as density, diameter and centrality (Dawson, Tan, and McWilliam, 2011).

  • Diameter: the length of the longest shortest path between any pair of nodes in the social network.

Diameter = 5.

  • Density: the ratio of existing connections to all possible connections in the graph.

Density = 0.108.
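To make these two definitions concrete, here is a minimal Python sketch on a made-up five-node network. The node names and edges are hypothetical (this is not the network from Figures 1 and 2), and the sketch assumes an undirected, connected graph.

```python
from collections import deque

# Hypothetical toy network (undirected); not the one shown in the figures.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "D")]


def build_adjacency(edges):
    """Return {node: set(neighbours)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def diameter(adj):
    """Longest shortest path between any pair of nodes (connected graph assumed)."""
    return max(max(shortest_path_lengths(adj, n).values()) for n in adj)


def density(adj):
    """Existing edges divided by possible edges, n * (n - 1) / 2 when undirected."""
    n = len(adj)
    m = sum(len(nbs) for nbs in adj.values()) // 2
    return m / (n * (n - 1) / 2)


adj = build_adjacency(EDGES)
print(diameter(adj), density(adj))
```

For a directed network the number of possible edges would be n * (n - 1) instead, so the density of the same edge list would be halved.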

  • Degree Centrality: the total number of social ties a node has.

From Figures 3, 4 and 5 we can see that Emma, Jill and Shane are the students (nodes) with the highest number of connections in the network: six each. Based on this, they are quite central to most of the potential conversations.

Fig 3. Degree distribution


Fig 4. Degree Centrality. Applying nodes size = degree and nodes label = label


Fig 5. Degree Centrality. Applying nodes size = degree and nodes label = degree


  • In-degree Centrality: the number of edges coming in. In other words, it indicates the popularity or prestige that an individual has in the community. Figure 6 shows, for example, how many other students are seeking Jill’s help.

Fig 6. In-degree Centrality. Applying nodes size = In-degree, nodes label = label and nodes color = in-degree


  • Out-degree Centrality: the number of edges leading out. In other words, it indicates how gregarious an individual is. Clearly, Emma and Bob influence the greatest number of other students: each has direct influence over 4 students.

Fig 7. Out-degree Centrality. Applying nodes size = Out-degree, nodes label = label and nodes color = out-degree
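The three degree measures are just counts over the edge list. Here is a small Python sketch, assuming each tie is stored as a (source, target) pair. The names and ties are invented for illustration and are not the students in the figures.

```python
from collections import Counter

# Hypothetical directed ties: (source, target) means "source reaches out to target".
TIES = [("Ann", "Ben"), ("Ann", "Cy"), ("Ben", "Cy"), ("Dee", "Cy"), ("Cy", "Ann")]


def degree_counts(ties):
    """Return (in_degree, out_degree, degree) Counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in ties:
        out_deg[source] += 1  # edge leading out of source
        in_deg[target] += 1   # edge coming in to target
    degree = in_deg + out_deg  # total ties, regardless of direction
    return in_deg, out_deg, degree


in_deg, out_deg, degree = degree_counts(TIES)
print(in_deg.most_common(1), out_deg.most_common(1), degree.most_common(1))
```

In this toy example "Cy" receives the most ties (highest in-degree) while "Ann" sends the most (highest out-degree), which is the same popularity versus gregariousness distinction described above.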


  • Betweenness Centrality: how often a node lies on the shortest paths between other nodes, that is, how easily it connects the rest of the network. If Allen or Liz were removed from the network, the connection with the rest of the community would collapse, and you would see separated subgroups. A node in this position plays an important role called “Network Broker”.

Fig 8. Betweenness Centrality. Applying nodes size = betweenness, nodes label = label and nodes color = betweenness
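To see why a broker scores high, here is a sketch of betweenness for an undirected, unweighted graph, written in the style of Brandes' algorithm. It is an illustration under my own assumptions (hypothetical node names, unnormalized scores) and is not necessarily how Gephi computes or scales the value.

```python
from collections import deque

# Hypothetical network: two triangles joined through a single node "X".
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "X"), ("X", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]


def betweenness(edges):
    """Unnormalized betweenness for an undirected, unweighted edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and remembering predecessors.
        order = []
        preds = {n: [] for n in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share of paths.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {n: b / 2 for n, b in score.items()}


scores = betweenness(EDGES)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

Node "X" has only two ties, so its degree is low, yet every shortest path between the two triangles passes through it. Removing it splits the network into separate subgroups, which is exactly the broker situation.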


Network Modularity and Community Identification

Gephi’s Modularity statistic is a cluster detection algorithm. Students in the same cluster are given the same color.

Fig 9. Modularity statistic. Applying nodes color = modularity class
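Gephi searches for a good split into clusters on its own. The sketch below does something simpler: it only scores a split that you hand it, using the standard modularity formula for an undirected, unweighted graph (the share of edges inside each cluster minus the share expected by chance). The graph and cluster names are hypothetical.

```python
from collections import defaultdict

# Hypothetical network: two triangles joined by the single edge C-D.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]


def modularity(edges, community_of):
    """Modularity Q of a given partition (undirected, unweighted graph)."""
    m = len(edges)
    inside = defaultdict(int)      # edges with both ends in the same community
    degree_sum = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


two_clusters = {"A": "left", "B": "left", "C": "left",
                "D": "right", "E": "right", "F": "right"}
one_cluster = {node: "all" for node in two_clusters}

print(modularity(EDGES, two_clusters), modularity(EDGES, one_cluster))
```

Splitting the two triangles apart scores higher than lumping everyone together, which is why a modularity-based algorithm would color them differently.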


References

Dawson, S., Tan, J. P., McWilliam, E. (2011). Measuring creative potential: Using social network analysis to monitor a learners’ creative capacity. Australasian Journal of Educational Technology, 27(6), 924-942.

Additional resources

Hirst, T. (2010, April 16). Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/

Hirst, T. (2010, April 23). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part II: Basic Filters I, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/04/23/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-ii-basic-filters/

Hirst, T. (2010, May 10). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part III: Ego Filters and Simple Network Stats, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/05/10/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-iii-ego-filters-and-simple-network-stats/


Introducing Gephi video

The dataset analyzed contains the social networks that were extracted from the activities of the participants on blogs and Twitter in the MOOC: Connectivism and Connective Knowledge 2011 (CCK11).

The Twitter graph includes all authors as nodes of the network, and an edge was created between two authors whenever one of them was mentioned in the other’s tweet. For example, if a course participant @A mentioned @B and @C in a tweet, then the course Twitter network would contain the nodes @A, @B, and @C with the following edges: @A – @B, and @A – @C.

The blog graph includes authors of the blog posts (i.e. blog owners) and the authors of the comments to individual blog posts. If a learner A created a blog post, and then learners B and C added comments to that post, then the corresponding network would contain nodes A, B, and C with the following edges: A-B, and A-C. Edges can be undirected, or directed if the direction of the relation matters. In this case the direction is not relevant for the analysis.
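The two construction rules above can be sketched in a few lines of Python. The dataset I used came with the networks already built, so the input formats here ((author, text) pairs for tweets, (author, commenters) pairs for blog posts), the function names and the choice to skip self-references are all my own assumptions for illustration.

```python
import re

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """tweets: iterable of (author, text). One edge per author/mentioned pair."""
    edges = set()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            if mentioned != author:  # assumption: ignore self-mentions
                edges.add((author, mentioned))
    return edges


def comment_edges(posts):
    """posts: iterable of (post_author, [commenter, ...]). Undirected edges."""
    edges = set()
    for author, commenters in posts:
        for commenter in commenters:
            if commenter != author:  # assumption: ignore comments on one's own post
                # Direction is ignored, so store each pair in sorted order.
                edges.add(tuple(sorted((author, commenter))))
    return edges


print(mention_edges([("A", "Great reading list, @B and @C!")]))
print(comment_edges([("A", ["B", "C"]), ("B", ["A"])]))
```

Storing each blog pair in sorted order means that A commenting on B's post and B commenting on A's post collapse into a single A-B edge, which matches the choice to ignore direction in this analysis.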

Note: This dataset can only be used by course participants and is restricted to completing the activities and assignments in DALMOOC, so I can’t share it.

Social network graphs are essentially built from nodes and edges. Note from my “import report” (Fig. 1) that there are 194 nodes (students) in the blog network, connected by 205 edges (relations). To keep it simple in this post, I’ll focus on the blog network.


Fig 1. Import report for blog

Visualization

The software produces an overview of the graph in which the node positions are random, so it is unreadable (Fig 2). This basic layout is not yet good enough for interpretation.

Fig 2. The basic network layout

This is an important operation! A common first step is to apply a layout algorithm to re-position the nodes and improve the graph’s readability. You can experiment with multiple layouts as you search for the most appropriate one to display your graph. There are also many layout plugins for Gephi (check them here).

So how do you choose a layout? Here’s a link to a very helpful tutorial about layouts in Gephi. It will guide you through the basic and advanced layout settings.

At this point I played extensively with combinations of layouts and filters, and performed the main network analyses (density, centrality, and modularity). Some layouts caught my attention and I’d like to mention two of them: Yifan Hu and Fruchterman Reingold. Here’s what I get when I run these algorithms on the blog network:

Fig 3. Applying the Yifan Hu algorithm

Fig 4. Applying the Fruchterman Reingold algorithm

The Fruchterman Reingold (FR) algorithm avoids dispersion of disconnected nodes, and it also produces less differentiation between sub-clusters. FR displays the nodes in a circle where they can be easily distinguished. It took more time to run and I needed to stop it manually. Its time complexity is O(N²).

The Yifan Hu (YH) algorithm runs much more rapidly than the other methods available, and it stops automatically. It reduces the quadratic complexity to O(N*log(N)). YH seems to produce more differentiation between sub-clusters, and the main cluster looks tighter. Note from Figure 3 that it places disconnected nodes farther apart. In my case, I chose the Yifan Hu algorithm (Fig. 3).
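To show where the O(N²) cost comes from, here is a rough sketch of a single iteration of a Fruchterman Reingold style layout in Python. It uses the textbook forces (repulsion k²/d between every pair of nodes, attraction d²/k along edges) and is not Gephi's implementation; the function name and the parameters `k` (ideal distance) and `temperature` (maximum move per step) are my own naming.

```python
import math


def fr_step(pos, edges, k, temperature):
    """One force-directed iteration; returns new {node: (x, y)} positions."""
    disp = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)

    # Repulsion: every pair of nodes pushes apart. This double loop is the O(N^2) part.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            force = k * k / d
            disp[u][0] += dx / d * force
            disp[u][1] += dy / d * force
            disp[v][0] -= dx / d * force
            disp[v][1] -= dy / d * force

    # Attraction: only connected nodes pull towards each other.
    for u, v in edges:
        dx = pos[u][0] - pos[v][0]
        dy = pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        force = d * d / k
        disp[u][0] -= dx / d * force
        disp[u][1] -= dy / d * force
        disp[v][0] += dx / d * force
        disp[v][1] += dy / d * force

    # Move each node along its net force, but never farther than the temperature.
    new_pos = {}
    for n, (dx, dy) in disp.items():
        d = math.hypot(dx, dy) or 1e-9
        step = min(d, temperature)
        new_pos[n] = (pos[n][0] + dx / d * step, pos[n][1] + dy / d * step)
    return new_pos


start = {"a": (0.0, 0.0), "b": (10.0, 0.0)}
print(fr_step(start, [("a", "b")], k=1.0, temperature=1.0))
```

A full layout repeats this step many times while lowering the temperature. As far as I understand, the speed-up in Yifan Hu comes from approximating the repulsion of far-away groups of nodes instead of looping over every pair, which is what brings the cost down to roughly N*log(N).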

Being in the thick of things: Sizing the network nodes based on degree centrality.

Degree centrality highlights the most connected, and often the most influential, nodes. Individuals with high degree centrality have many social ties and are more active, recognized, important, and visible in their social networks (Brass, 1984; Gasevic et al., 2013).

The graph in Figure 5 sizes the nodes based on degree centrality. To set this, go to the “Ranking” panel, select “Nodes” and the “red diamond”, then choose “Degree” in the drop-down menu and enter the minimum and maximum size (I suggest 15-70). I also added the degree centrality score as a label: click on the “Configure…” link to set the data you want displayed in your graph.
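Sizing nodes by a ranking boils down to rescaling each score into the chosen size range. Here is a minimal sketch, assuming a simple linear mapping (I have not checked which interpolation Gephi applies internally); the function name and sample degrees are hypothetical.

```python
def node_sizes(degrees, min_size=15.0, max_size=70.0):
    """Linearly rescale each node's degree into [min_size, max_size]."""
    lo, hi = min(degrees.values()), max(degrees.values())
    if hi == lo:
        # All nodes have the same degree: give them all the minimum size.
        return {n: min_size for n in degrees}
    scale = (max_size - min_size) / (hi - lo)
    return {n: min_size + (d - lo) * scale for n, d in degrees.items()}


print(node_sizes({"a": 1, "b": 6, "c": 11}))
```

The least connected node gets size 15, the most connected gets 70, and everything else falls proportionally in between.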

Fig 5. Applying nodes size = degree and nodes label = degree


Being in the bridge of things: Sizing the network nodes based on betweenness centrality.

Betweenness centrality indicates how often a node sits on the shortest paths between other nodes, and so how easily it connects others in the network. The graph in Figure 6 sizes the nodes based on betweenness centrality. To set this measure, go back to the drop-down menu and select “Betweenness Centrality”.

Fig 6. Applying nodes size = betweenness centrality and nodes label = degree

From Figures 5 and 6 we can see significant similarities between degree and betweenness centrality. Recently I found a good idea on the esinternet blog for comparing and highlighting both measures: I looked at the top 10 nodes in terms of central nodes (measured by degree, Fig. 7) and main brokers (measured by betweenness, Fig. 8).

For central nodes: in the “Filters” panel, select “Topology” and then “Degree Range”, and change the range to 8-55. For main brokers: in the “Filters” panel, select “Range” and then “Betweenness Centrality”, and change the range to 500-4326. I also changed the color: in the “Ranking” panel, choose “color” to replace these sad shades of gray.
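A range filter simply keeps the nodes whose score falls inside the range. The sketch below shows that idea in Python, plus a hypothetical helper that works out which range keeps the top k nodes, so the bounds don't have to be found by trial and error. The scores are made up and the helper assumes a non-empty score table.

```python
def filter_by_range(scores, low, high):
    """Keep only the nodes whose score lies within [low, high]."""
    return {n: s for n, s in scores.items() if low <= s <= high}


def top_k_range(scores, k=10):
    """Return the (low, high) range that keeps the k highest-scoring nodes.

    Ties at the lower bound can let a few extra nodes through.
    """
    ranked = sorted(scores.values(), reverse=True)
    return ranked[min(k, len(ranked)) - 1], ranked[0]


# Hypothetical degree scores for 20 nodes: n1 has degree 1, ..., n20 has degree 20.
degrees = {f"n{i}": i for i in range(1, 21)}
low, high = top_k_range(degrees, k=10)
print(low, high, sorted(filter_by_range(degrees, low, high)))
```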

Fig 7. Top 10 central nodes measured by degree centrality


Fig 8. Top 10 main brokers measured by betweenness centrality


It’s possible to clearly observe – from Figures 7 and 8 – that there are brokers and central nodes in common. Ok, now we start to see some patterns in the network.

Network brokers are those that build bridges between clusters (or sub-communities) in a network (Burt et al., 2013). A broker is typically a node connecting two or more different communities that may emerge in a social network. Professor Dragan, in DALMOOC, states:

Brokers are able to see the different types of information that are available across different communities, and as such, they are exposed to many different ideas therefore being able to integrate these different ideas, generate potentially novel solutions.

Let’s dive a little bit more and look for more patterns…

Network Modularity and Community Identification

Once you apply the “Modularity” statistic (a cluster detection algorithm), you can clearly see the different clusters or communities that emerge from the network. Each color represents one cluster (Fig. 9). The algorithm detected 66 clusters, which is a large number. Note the many disconnected nodes in Figure 9. It doesn’t make sense to have one student as a separate community.

Fig 9. Applying modularity statistic

To study community formation, I applied the giant component filter. The giant component is the largest connected component in a network, so the filter removes all the nodes that are disconnected from it. Fortunato (2010) describes communities as:

Communities, also called clusters or modules, are groups of vertices which probably share common properties and/or play similar roles within the graph.
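The giant component filter mentioned above can be sketched as a search for the largest set of nodes that can all reach one another. This is an illustration with hypothetical node names, assuming an undirected graph, and not Gephi's own code.

```python
from collections import deque


def giant_component(nodes, edges):
    """Return the largest connected set of nodes in an undirected graph."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(sorted(giant_component(nodes, edges)))
```

Here the isolated node "f" and the small pair "d"-"e" are dropped, which is the same clean-up that removes the single-student "communities" from the modularity result.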

I watched a very useful tutorial by Jen Golbeck, suggested by Professor Dragan, on how to use Gephi’s modularity feature to detect communities and color-code them in graphs. Based on this tutorial, my goal now is to decrease the number of clusters in my graph.

In the “Statistics” panel, click on “Modularity” to display the modularity settings window, then choose resolution = 2. My clusters decreased from 66 to 4! I applied the same steps to the Twitter network, and the resulting maps are shown below:

Fig 10. Blog network in course (CCK11) applying node size = Betweenness Centrality and node label = Degree.

Fig 11. Twitter network in course (CCK11) applying node size = Betweenness Centrality and node label = Degree.

The density measure of the networks

In the report shown in Figure 12, Density = 0.01 for the blog network and Density = 0.003 for the Twitter network. This means that the blog network is denser: a larger share of the possible connections between students actually exists. De Laat et al. (2007) describe density as:

Density provides a measure of the overall ‘connections’ between the participants.
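As a quick sanity check, the blog density can be recomputed by hand from the import report (194 nodes, 205 edges). The sketch assumes the graph is treated as undirected and that the reported density refers to the same unfiltered network; both are my assumptions.

```python
def undirected_density(n_nodes, n_edges):
    """Existing edges divided by the n * (n - 1) / 2 possible undirected edges."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Node and edge counts taken from the blog import report.
print(round(undirected_density(194, 205), 3))
```

The result is about 0.011, in line with the 0.01 shown in the report. I didn't note the Twitter import counts in this post, so I can't repeat the check for that network here.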

 

The following table summarizes my report for the blog and Twitter networks:

Fig 12. Blog vs Twitter report

Conclusion

Obviously, this is just a quick overview of Gephi’s functionality and of the centrality measures commonly seen in social network studies. It’s difficult to draw a meaningful conclusion without knowing more about the data. There is limited information about the nodes: which nodes represent professors and which represent course participants?

I also found some limitations in my study:

  • Each student’s node is identified by a coded “Id” in Gephi’s “Data Laboratory”. I could show the “Label” on the nodes and compare centralities between week 6 and week 12. However, the “Label” column in both networks contains long alphanumeric strings, so it’s not easy to visualize. I had some ideas, but none of them worked efficiently. If you figure out a useful way to solve this issue, feel free to leave a comment. Feedback is always more than welcome!
  • Gephi does not support slide-by-slide display.

References

Brass, D. J. (1984). Being in the right place: A structural analysis of individual influence in an organization. Administrative Science Quarterly, 29(4), 518–539. doi:10.2307/2392937

Burt, R. S., Kilduff, M., & Tasselli, S. (2013). Social network analysis: foundations and frontiers on advantage. Annual review of psychology, 64, 527-547. doi: 10.1146/annurev-psych-113011-143828 (full text)

Dawson, S. (2008). A study of the relationship between student social networks and sense of community. Educational Technology & Society, 11(3), 224–238 (full text).

Dawson, S., Tan, J. P., McWilliam, E. (2011). Measuring creative potential: Using social network analysis to monitor a learners’ creative capacity. Australasian Journal of Educational Technology, 27(6), 924-942.

De Laat, M., Lally, V., Lipponen, L., & Simons, R. J. (2007). Investigating patterns of interaction in networked learning and computer-supported collaborative learning: A role for Social Network Analysis. International Journal of Computer-Supported Collaborative Learning, 2(1), 87-103. doi: 10.1007/s11412-007-9006-4 (full text).

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174.

Gašević, D., Zouaq, A., Janzen, R. (2013). ‘Choose your Classmates, your GPA is at Stake!’ The Association of Cross-Class Social Ties and Academic Performance. American Behavioral Scientist, 57(10), 1459-1478. doi: 10.1177/0002764213479362 (full text).

Additional resources

Esinternet (2014, November). Assignment – Perform social network analysis and visualize analysis results in Gephi – Part of Basics of S.N.A (week 3) http://esinternet.blogspot.com.ar/2014/11/assignation-perform-social-network.html

Hirst, T. (2010, April 16). Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/

Hirst, T. (2010, April 23). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part II: Basic Filters I, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/04/23/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-ii-basic-filters/

Hirst, T. (2010, May 10). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part III: Ego Filters and Simple Network Stats, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/05/10/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-iii-ego-filters-and-simple-network-stats/

Social network analysis (SNA) represents how people connect to each other and shows the flow of information in a network. SNA can help identify influencers, key players, disconnected individuals and groups within a community. SNA consists of models that define relationships and interactions between individuals, as well as the patterns that emerge from those relationships and interactions (Baker and Siemens, in press).

Fortunato (2010) describes social networks as:

“paradigmatic examples of graphs with communities. The word community itself refers to a social context. People naturally tend to form groups, within their work environment, family, friends.”

According to Dragan Gasevic in the Data, Analytics and Learning MOOC (DALMOOC) and some authors (Grunspan et al., 2014; Fortunato, 2010), SNA is important in many different fields. These include, broadly, sociology, business, engineering, economics, politics, computer science and mathematics; more recently, its importance has also been recognized in fields such as physics and biology. For example, in biology there is interesting work on social contagion (Christakis and Fowler, 2013) and biological networks (Jonsson et al., 2006).

Protein–protein interaction (PPI) networks are commonly studied in biology and bioinformatics. Fig. 1 illustrates community structure in a PPI network (Jonsson et al., 2006). The graph shows the interactions between proteins in cancerous cells of a rat. The communities group proteins into clusters that have a similar function within the cell. Note that the communities are represented by different colours.


Fig 1. An example of protein–protein interaction network (Jonsson et al., 2006).

In a learning context, the use of SNA has demonstrated several benefits. Dragan Gasevic, in DALMOOC, states:

different authors identify the importance of social networks, social activities and social interaction as critical predictors of academic performance.

I’d like to mention Dawson’s work, which has caught my attention recently. Dawson and colleagues (Dawson, Tan, and McWilliam, 2011) demonstrated the value of using social network analysis to measure and visualize student creative capacity. The study involved 76 first-year students at the graduate school of medicine at the University of Wollongong in Australia. They extracted discussion forum data from the institutional LMS (Blackboard). Fig. 2 illustrates the social network formed by student and staff participation in the discussion forum. Note that student names have been removed. The graph shows the position and interactions of each student within the network. The full article is here.


Fig. 2 Sociogram of the medical student network interactions (Dawson, Tan, and McWilliam, 2011)

Network elements of any Social Network

Networks consist of actors and relations. Actors can be students, organizations, blogs and so on. Those actors are modeled as nodes in a graph. The connections between actors make up relations (modeled as ties, or edges). Those relations can represent friendship, advice, hindrance or communication.

Network Measures

Professor Dragan Gasevic, in DALMOOC, introduced some measurements commonly seen in social network studies. I want to briefly describe how all this works:

1. The attributes of the entire network:

  • Diameter: the length of the longest shortest path between any pair of nodes in the social network.
  • Density: the ratio of existing connections to all possible connections in the graph.

2. The metrics that revolve around the concept of centrality:

  • Degree Centrality: the total number of social ties a node has.
  • In-degree Centrality: the number of edges coming in. In other words, it indicates the popularity or prestige that an individual has in the community.
  • Out-degree Centrality: the number of edges leading out. In other words, it indicates how gregarious an individual is.
  • Betweenness Centrality: how often a node lies on the shortest paths between other nodes, that is, how easily it connects the rest of the network. If a node with high betweenness is removed, the connection with the rest of the community can collapse, leaving separated subgroups. Such a node plays an important role called “Network Broker”.
  • Closeness Centrality: how short the distances are from a node to all other nodes. It is important to keep in mind that it doesn’t work well for disconnected networks (where actors have zero social ties or groups have no connection to other groups).
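Closeness is the least intuitive measure in the list above, so here is a small Python sketch of one common convention: the number of reachable nodes divided by the sum of the shortest distances to them. Tools differ in how they define and scale this measure, so treat the formula, the function name and the toy network as assumptions for illustration only.

```python
from collections import deque

# Hypothetical network: a simple chain A - B - C plus an isolated node Z.
ADJ = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "Z": set()}


def closeness(adj, node):
    """Reachable nodes divided by the total shortest-path distance to them."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nb in adj[current]:
            if nb not in dist:
                dist[nb] = dist[current] + 1
                queue.append(nb)
    reachable = len(dist) - 1
    if reachable == 0:
        return 0.0  # an isolated node has no distances to measure
    return reachable / sum(dist.values())


print({n: round(closeness(ADJ, n), 3) for n in ADJ})
```

"B" sits in the middle of the chain and gets the highest score, while the isolated "Z" shows the problem with disconnected networks: distances to unreachable nodes are undefined, so the sketch can only look at each node's own component.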

Social network analysis tool

All the above social network measures can be computed using the Gephi software (Fig. 3). Gephi is a free, open-source platform that supports visualization and exploration of all kinds of networks, including dynamic and hierarchical graphs.

I watched a very interesting hangout with Shane Dawson. Shane Dawson is a researcher in social network analysis and co-developer of Social Networks Adapting Pedagogical Practice (SNAPP). SNAPP is an open source tool that performs real-time social network analysis and visualization of discussion forum activity within popular commercial and open source Learning Management Systems (e.g. BlackBoard and Moodle).

Fig 3. Gephi Software


 

Reflect on the methods that could be used for data collection, steps to be taken for the analysis, potential conclusions, and possible issues

Network analysis for the study of learning can be applied at larger scales, such as in schools and higher education. Data can come from social media such as blogs, Twitter, e-mails, discussion forums and many others. The use of SNA is increasing rapidly: the Horizon Report: 2014 Edition (Johnson et al., 2014) identifies social media adoption in the education sector as a “fast trend” over the next one to two years. In my view SNA represents a powerful idea: we can measure how knowledge flows across networks of people and organizations.

Steps to be taken for the analysis:

Step 1: Data collection: identify the actors involved in the network. Data comes from social media (e.g. blogs and Twitter), survey questionnaires, learning management systems (LMS) (e.g. Desire2Learn, BlackBoard, Moodle, etc.) or many other sources.

Step 2: Software to support the search for patterns, such as Gephi and Social Networks Adapting Pedagogical Practice (SNAPP).

Step 3: Analysis of the network: clusters, statistics, the metrics that revolve around the concept of centrality, and so on.

It’s important to note that this kind of analysis requires close attention to privacy and ethical issues. So my concern is: how do you deal with ethical issues once you have total access to personal information? Who gets to see the SNA data, and in what format?

According to the U.S. Department of Education (2012):

Education institutions must consider privacy, policy and legal issues when collecting, storing, analyzing, and disclosing personally identifiable information from students’ education records to third parties for data mining and analytics.

All ideas and contributions are welcome!

References

Baker, R., Siemens, G. (in press) Educational data mining and learning analytics. To appear in Sawyer, K. (Ed.) Cambridge Handbook of the Learning Sciences: 2nd Edition. [preprint draft pdf]

Christakis, N.A., Fowler, J.H. (2013). Social contagion theory: examining dynamic social networks and human behavior. Stat Med 32, 556– 577.

Dawson, S., Tan, J. P., McWilliam, E. (2011). Measuring creative potential: Using social network analysis to monitor a learners’ creative capacity. Australasian Journal of Educational Technology, 27(6), 924-942.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174.

Grunspan, D. Z., Wiggins, B. L., Goodreau, S. M. (2014). Understanding Classrooms through Social Network Analysis: A Primer for Social Network Analysis in Education Research. Life Sciences Education, 13(2), 167-178.

Johnson, L., Adams Becker, S., Estrada, V., Freeman, A. (2014). NMC Horizon Report: 2014 Higher Education Edition. Austin, Texas: The New Media Consortium. (full text)

Jonsson, P. F., Cavanna, T., Zicha, D., Bates, P. A. (2006). Cluster analysis of networks generated through homology: Automatic identification of important protein communities involved in cancer metastasis, BMC Bioinformatics.

U.S. Department of Education, Office of Educational Technology, Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief, Washington, D.C., 2012. (full text)
