Every action we make is recorded – Witten and Frank (2011) state:

“We are overwhelmed with data. The amount of data in the world and in our lives seems ever-increasing—and there’s no end in sight. Omnipresent computers make it too easy to save things that previously we would have trashed.”

Educational Data Mining (EDM) has been defined by IEDMS (2009) as:

“an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.” (see educational data mining).

EDM develops and applies methods from statistics, machine learning and data mining to analyze data collected from online learning systems. The book by Witten and Frank (2011) presents a good introduction to the data mining landscape. Are you interested in digging deeper into the algorithms involved in machine learning? Witten and Frank (2011) explain a wide variety of them.

What is the difference between EDM and Learning analytics?

Siemens and Baker (2012) noted (here and here) that while the EDM and LA communities share similar goals and interests, they have distinct technological, ideological, and methodological orientations. One of the differences they describe is that EDM has a greater focus on automated adaptation: the educational software identifies a need and automatically adapts to personalize the learner’s experience. By contrast, learning analytics are more often designed to inform and empower instructors and learners, for example by informing instructors about the ways in which specific students are struggling so that suitable pedagogical strategies can be applied.

So How Can Data Mining and Analytics Enhance Education?

 This beautiful and very informative infographic summarizes how analytics and EDM can improve education.

Deep dives: What Educational Data Mining Can Bring to the Table?

“EDM applies data mining techniques such as prediction modeling (including classification), discovery of latent structure (such as clustering and q-matrix discovery), relationship mining (such as association rule mining and sequential pattern mining), and discovery with models to understand learning and learner individual differences and choices better” (Berland, Baker & Blikstein, 2014). Reviews of these methods have been covered by Professor Baker (here and here).

I believe that this post will be updated throughout the DALMOOC course. In the next post I’ll cover prediction modeling.

Cited Sources

Baker, R., Siemens, G. (in press) Educational data mining and learning analytics. To appear in Sawyer, K. (Ed.) Cambridge Handbook of the Learning Sciences: 2nd Edition.(full text)

Berland, M., Baker, R.S., Blikstein, P. (2014) Educational data mining and learning analytics: Applications to constructionist research. Technology, Knowledge, and Learning, 19, 205-220.(full text)

Collegestats (November, 2014) http://collegestats.org/

IEDMS. (2009). International Educational Data Mining Society. Retrieved April 22, 2013, from http://www.educationaldatamining.org/

Siemens, G., and R. S. J. d. Baker. 2012. “Learning Analytics and Educational Data Mining: Towards Communication and Collaboration.” In Proceedings of LAK12: 2nd International Conference on Learning Analytics & Knowledge, New York, NY: Association for Computing Machinery, 252–254. (full text)

Witten, I. H., Frank, E. (2011) Data Mining: Practical Machine Learning Tools and Techniques.

To date, the research literature has reported numerous benefits of using social network analysis (SNA) in education, in areas such as learning design (Lockyer et al., 2013), sense of community (Dawson, 2008), creative potential (Dawson, Tan, & McWilliam, 2011), academic performance (Gašević et al., 2013), social presence (Kovanović et al., 2014) and understanding of MOOCs (Skrypnyk et al., 2014).

I want to briefly describe these papers, which were introduced by Professor Dragan in DALMOOC.

1. SNA and Learning Design

Lockyer et al. (2013) investigated how learning design influences learners’ actions (the full article is here). Lockyer and colleagues describe learning design very well:

“Learning design describes the sequence of learning tasks, resources, and supports that a teacher constructs for students over part of, or the entire, academic semester. A learning design captures the pedagogical intent of a unit of study. As such, learning designs provide a model for intentions in a particular learning context that can be used as a framework for design of analytics to support faculty in their learning and teaching decisions.”

They used a case-based learning design to investigate how two types of analytics tools (checkpoint and process analytics) can help instructors see what is happening inside their classes and choose pedagogical strategies.

This is a very interesting article in which the authors provide some SNA scenarios, such as an instructor-centered network (quite common in many courses), a discussion dominated by one student, and a social network indicating strong student peer interaction.

According to Professor Dragan, one of the purposes of learning design is to foster strong student peer interaction, while the role of the instructor is to start building a comfortable atmosphere so that students can easily interact with each other.

Lockyer and colleagues (Lockyer et al., 2013) also conclude: “Establishing a conceptual framework for typical learning analytics patterns expected from particular learning designs can be considered an essential step in improving evaluation effectiveness and to build the foundation for pedagogical recommender systems in the future.”

2. SNA and Sense of Community

Is there an association between sense of community and network position?

This question was investigated by Shane Dawson in this paper. Dawson (2008) incorporates a mixed method approach utilising both quantitative and SNA centrality measures in order to evaluate an individual’s level of sense of community. Dawson (2008) states:

“The closeness and degree centrality measures also illustrate that students engaged with a greater number of learners report a higher level of sense of community than their less socially active peers.”

3. SNA and Creative Potential

Is there any association between the creative capacity of students in general education contexts and network brokers?

In a previous post, I mentioned Dawson’s article about creative potential. Dawson and colleagues (Dawson, Tan, and McWilliam, 2011) found a significant association between the degree centrality and betweenness centrality of the students, while the closeness centrality was not significantly associated. Based on this study, Dawson and colleagues highlight that social network analysis (SNA) centrality calculations have the potential to provide insights into the development of students’ creativity. Of course, Dawson’s work offers many more details. The full article is here.

4. SNA and Academic Performance

Gašević and colleagues (Gašević et al., 2013) studied the association between cross-class social ties and academic performance (the full article can be found here). The work builds on the ideas of Vygotsky (1978), who noted that higher levels of internalization are reached through social interaction, and on contemporary pedagogies that treat learners as active constructors of their knowledge (Adams, 2006) and emphasize collaborative learning (Johnson & Johnson, 2009).

The authors investigated two hypotheses:

“i) students’ social capital accumulated through their course progression is positively associated with their academic performance; and ii) students with more social capital have a significantly higher academic performance.”

To assess students’ social capital, they calculated the following centrality measures: degree centrality, closeness centrality, betweenness centrality and eccentricity. The study demonstrated a strong association of closeness, centrality, and eccentricity with academic performance, while betweenness centrality was not significantly associated.

5. SNA and Social Presence

Professor Dragan defines social presence as: “the way how learners project themselves socially and emotionally as real people in an online environment.”

Kovanović et al. (2014), in this work, explored the links between the Community of Inquiry (CoI) model and the SNA of student networks. The CoI model is rooted in social constructivist philosophy, based on the ideas of Dewey (1897). The model consists of three constructs, also known as presences: Cognitive Presence, Teaching Presence and Social Presence. The authors also highlight that this model is well researched and widely accepted within the distance learning research community. Social presence in the CoI model is defined by three different dimensions of communication: Affectivity, Interactivity and Cohesiveness.

The research questions investigated by Kovanović and colleagues (2014):

“What is the relationship between the students’ social capital, as captured by social network centrality measures, and students’ social presence, as defined by the three categories in the Community of Inquiry model?”

The authors extracted three network centrality measures: betweenness centrality, degree centrality (in-degree and out-degree) and closeness centrality. The results revealed that both in-degree and out-degree centrality were significantly predicted by all three categories of students’ social presence. Betweenness, on the other hand, was best explained only by the affective and interactive categories of online discussion messages, while cohesive messages did not show any association. For future work, the authors propose replicating their findings on a bigger sample and with more diverse courses from different subject matter domains.

6. SNA and Understanding of MOOCs

Skrypnyk et al. (2014) investigated patterns of interaction evolving from a socio-technical network in a connectivist Massive Open Online Course (cMOOC). The authors propose:

“facilitators have a significant role in shaping discussions and the course outline, learners may also have an important impact on how information flows and knowledge is constructed in such settings. At the same time, technological affordances such as hashtags, can have an important effect on how cMOOC participants find, share, aggregate, and connect information.”

To investigate these propositions, Skrypnyk and colleagues (2014) defined two research questions: “What is the influence of original course facilitators, course participants (ie., learners), and technological affordances on information flows in different stages of a cMOOC?” and “What are the major factors that influence the formation of communities of learners within the social network developed around a cMOOC?”.

The constructed network was analyzed using the following measures: Closeness centrality, Betweenness centrality, Authority weight, Hub weights, Weighted degree and Modularity components.

Once the networks were constructed, the authors measured centralities to determine the most influential nodes in the network, and they also performed a node-level analysis. The results showed that the most influential nodes, measured by in-degree centrality, were actually hashtags or students who emerged as course facilitators, rather than the original course facilitators. The full article is here.

Integration of Social Network Analysis in Gephi and Tableau Analysis (CCK11 dataset)

I exported the SNA results (centrality and modularity) for the week 12 blog network in the CCK11 dataset from Gephi, via its Data Laboratory tab, in a format (CSV) that can be imported into Tableau. I then created Dashboards 1 and 2. To keep it simple, relations with the Twitter network are not presented.
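As a rough illustration of what can be done with such an export outside Tableau, the sketch below parses a CSV and ranks nodes by a chosen centrality. The column names (`Id`, `Degree`, `In-Degree`, `Out-Degree`, `Betweenness`, `Modularity Class`) and all the sample values are assumptions made up for this example; the header row of a real Gephi export may differ, and the CCK11 data itself is not reproduced here.

```python
import csv
import io

# Made-up sample standing in for a Gephi Data Laboratory export.
# Column names are assumptions; check the header row of your own file.
SAMPLE_CSV = """Id,Degree,In-Degree,Out-Degree,Betweenness,Modularity Class
3,12,1,11,410.5,0
10,9,4,5,220.0,1
11,8,5,3,150.25,1
19,6,6,0,0.0,2
"""


def load_nodes(text):
    """Parse the export into a list of dicts with numeric centrality columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "id": row["Id"],
            "degree": int(row["Degree"]),
            "in_degree": int(row["In-Degree"]),
            "out_degree": int(row["Out-Degree"]),
            "betweenness": float(row["Betweenness"]),
            "community": int(row["Modularity Class"]),
        })
    return rows


def top_by(rows, key, n=3):
    """Return the ids of the n nodes with the largest value of `key`."""
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    return [r["id"] for r in ranked[:n]]


nodes = load_nodes(SAMPLE_CSV)
print(top_by(nodes, "degree"))
print(top_by(nodes, "in_degree"))
```

Ranking the same rows by different columns is a quick way to check whether the key players by degree are also the ones most referred to by others.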

My analysis of Dashboard 1 showed that nodes 3, 10, 11 and 9 had a high level of influence (measured by degree) in week 12. Betweenness centrality reveals the nodes that played a critical role in brokering information among sub-communities. I also observed an interesting pattern: the nodes detected as key players by degree (3, 10, 11 and 9) are also the main brokers (measured by betweenness) and the strongest contributors (measured by out-degree).

These observations suggest an association between degree, betweenness and out-degree centrality (Centralities Dashboard 1), while the in-degree and closeness centralities (Centralities Dashboard 2) did not appear to be associated with them.

On the other hand, Dashboard 2 reveals that the node with the highest level of influence and the strongest contribution (node 3) has a low in-degree centrality. Nodes 19 and 11 are the ones most often referred to by other members (measured by in-degree), followed by nodes 10, 18 and 17.

A modularity algorithm was applied, resulting in 5 communities (colored by Modularity Class) for week 12. Keep in mind that communities, also called clusters, are groups of vertices which probably share common properties and/or play similar roles within the graph (Fortunato, 2010). The SNA view (Dashboard 1, bottom left) shows the structure of the communities. From this, it’s possible to identify a pattern: the most populated communities (red, green and purple) had one or two central nodes (sized by degree and labeled by Id). These nodes play an important role because they can influence the information flow in the network (Kovanović et al., 2014).

Centralities Dashboard 1

Centralities Dashboard 2

I also found some limitations in my study: the “Id” assigned to each student node is generated within Gephi’s “Data Laboratory”. I could use the “Label” column on the nodes instead and compare centralities between week 6 and week 12. However, the “Label” column in both networks contains long alphanumeric strings, so it’s not easy to visualize. I had some ideas, but none of them worked efficiently. If you figure out a way to solve this issue in a useful manner, feel free to leave a comment. Feedback is always more than welcome!
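One possible workaround, sketched here under the assumption that the same long label identifies the same student in both weeks, is to map every distinct label to a short alias before visualizing. The function name `build_aliases`, the `S01`-style alias format, and the sample labels are all hypothetical and not part of Gephi or the dataset.

```python
def build_aliases(*label_lists, prefix="S"):
    """Map each distinct long label to a short alias such as S01, S02, ..."""
    aliases = {}
    for labels in label_lists:
        for label in labels:
            if label not in aliases:
                aliases[label] = f"{prefix}{len(aliases) + 1:02d}"
    return aliases


# Made-up labels standing in for the long alphanumeric strings.
week6 = ["a9f3c2e17b", "77de01ab45", "c0ffee1234"]
week12 = ["77de01ab45", "a9f3c2e17b", "0123456789"]

aliases = build_aliases(week6, week12)
for label, alias in aliases.items():
    print(alias, label)
```

Because the mapping is built once from both weeks, a student keeps the same short alias in each network, which makes week-to-week comparison of centralities easier to read.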

References

Dawson, S. (2008). A study of the relationship between student social networks and sense of community. Educational Technology & Society, 11(3), 224–238 (full text).

Dawson, S., Tan, J. P. L., & McWilliam, E. (2011). Measuring creative potential: Using social network analysis to monitor a learners’ creative capacity. Australasian Journal of Educational Technology, 27(6), 924-942 (full text).

Dewey, J. (1897). My pedagogical creed. School Journal, 54(3), 77–80.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174

Gašević, D., Zouaq, A., Janzen, R. (2013). ‘Choose your Classmates, your GPA is at Stake!’ The Association of Cross-Class Social Ties and Academic Performance. American Behavioral Scientist, 57(10), 1459-1478. doi: 10.1177/0002764213479362 (full text).

Kovanović, V., Joksimović, S., Gašević, D., Hatala, M., “What is the source of social capital? The association between social network position and social presence in communities of inquiry,” In Proceedings of 7th International Conference on Educational Data Mining – Workshops, London, UK, 2014 (full text).

Lockyer, L., Heathcote, E., & Dawson, S. (2013). Informing pedagogical action: Aligning learning analytics with learning design. American Behavioral Scientist, 57(10), 1439-1459, doi:10.1177/0002764213479367 (full text)

Skrypnyk, O., Joksimović, S. Kovanović, V., Gasevic, D., Dawson, S. (2014). Roles of course facilitators, learners, and technology in the flow of information of a cMOOC. British Journal of Educational Technology (submitted) (full text).

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. (M. Cole, V. John-Steiner, S. Scribner, & E. E. Souberman, Eds.) Cambridge, Massachusetts: Harvard University Press.

In a previous post, I explored different layouts and applied the SNA methods in detail to the blog and Twitter networks for week 12 of the Connectivism and Connective Knowledge 2011 course (CCK11) in Gephi.

It’s now time to explore different layouts for the representation of a small network (e.g., Fruchterman Reingold and Yifan Hu) and experiment with their configuration parameters.

A few weeks ago I posted some measurements commonly seen in social network studies, and now let’s play with them!

Layout Algorithms

Figures 1 and 2 show a graph of what might be a small social network.

Fig 1. Applying the Fruchterman Reingold algorithm.

Fig 2. Applying the Yifan Hu algorithm

Calculating the social network properties

SNA draws on concepts from graph theory and structural theory to evaluate network properties such as density, diameter and centralities calculations (Dawson, Tan, and McWilliam, 2011).

  • Diameter: the length of the longest shortest path between any pair of nodes in the social network.

Diameter = 5.

  • Density: the ratio of the number of existing connections to the number of possible connections in the graph.

Density = 0.108.

  • Degree Centrality: the total number of social ties a node has.

From Figures 3, 4 and 5 we can see that Emma, Jill and Shane are the students (nodes) with the highest number of connections in the network: six each. This makes them quite central in most of the potential conversations.

Fig 3. Degree distribution

Fig 4. Degree Centrality. Applying nodes size = degree and nodes label = label

Fig 5. Degree Centrality. Applying nodes size = degree and nodes label = degree

  • In-degree Centrality: the number of edges coming in. In other words, it indicates the popularity or prestige that an individual has in the community. Figure 6 shows, for example, the number of other students who are seeking Jill’s help.

Fig 6. In-degree Centrality. Applying nodes size = In-degree, nodes label = label and nodes color = in-degree

  • Out-degree Centrality: the number of edges leading out. In other words, it indicates how gregarious an individual is. Clearly, Emma and Bob influence the greatest number of other students: each has direct influence over 4 students.

Fig 7. Out-degree Centrality. Applying nodes size = Out-degree, nodes label = label and nodes color = out-degree

  • Betweenness Centrality: how often a node lies on the shortest paths between other nodes, that is, how much it acts as a bridge in the network. If Allen or Liz were removed from the network, its connection with the rest of the community would collapse and you would see separated subgroups. A node in this position plays an important role called “Network Broker”.

Fig 8. Betweenness Centrality. Applying nodes size = betweenness, nodes label = label and nodes color = betweenness
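To make the definitions in this list concrete, here is a minimal sketch that computes degree, in-degree, out-degree, density and diameter for a tiny directed graph. The edge list is hypothetical and is not the network shown in the figures; treating edges as undirected for the diameter and assuming the graph is connected are simplifications of this example, so Gephi's own numbers for a directed graph may differ.

```python
from collections import deque

# Hypothetical directed edges (source points to target); not the network in the figures.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def degrees(edges):
    """Return (in_degree, out_degree, degree) dicts for a directed edge list."""
    nodes = {n for e in edges for n in e}
    indeg = {n: 0 for n in nodes}
    outdeg = {n: 0 for n in nodes}
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    total = {n: indeg[n] + outdeg[n] for n in nodes}
    return indeg, outdeg, total


def density(edges, directed=True):
    """Existing connections divided by possible connections."""
    n = len({x for e in edges for x in e})
    possible = n * (n - 1) if directed else n * (n - 1) / 2
    return len(edges) / possible


def diameter(edges):
    """Longest shortest path, treating edges as undirected (assumes a connected graph)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    longest = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest


indeg, outdeg, total = degrees(EDGES)
print(indeg["C"], outdeg["A"], total["C"])
print(density(EDGES), diameter(EDGES))
```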

Network Modularity and Community Identification

Gephi’s Modularity statistic is a cluster detection algorithm. Students in the same cluster are shown in the same color.

Fig 9. Modularity statistic. Applying nodes color = modularity class

References

Dawson, S., Tan, J. P., McWilliam, E. (2011). Measuring creative potential: Using social network analysis to monitor a learners’ creative capacity. Australasian Journal of Educational Technology, 27(6), 924-942.

Additional resources

Hirst, T. (2010, April 16). Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/

Hirst, T. (2010, April 23). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part II: Basic Filters I, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/04/23/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-ii-basic-filters/

Hirst, T. (2010, May 10). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part III: Ego Filters and Simple Network Stats, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/05/10/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-iii-ego-filters-and-simple-network-stats/

Introducing Gephi video

The dataset analyzed contains the social networks that were extracted from the activities of the participants on blogs and Twitter in the MOOC: Connectivism and Connective Knowledge 2011 (CCK11).

The Twitter graph includes all authors as nodes of the network, and an edge was created between two of them whenever an author was tagged within a tweet. For example, if a course participant @A mentioned @B and @C in a tweet, then the course Twitter network would contain the authors @A, @B, and @C with the following edges: @A – @B, and @A – @C.

The blog graph includes authors of the blog posts (i.e. blog owners) and the authors of the comments to individual blog posts. If a learner A created a blog post, and then learners B and C added comments to that post, then the corresponding network would contain nodes A, B, and C with the following edges: A-B, and A-C. Edges can be undirected, or directed if the direction of the relation matters. In this case the direction is not relevant for the analysis.
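The two construction rules above can be sketched in a few lines. The input shapes (a list of `(author, mentions)` pairs for tweets and `(post_author, commenters)` pairs for blog posts) are assumptions of this example, since the actual dataset format is not shown here, and skipping authors who comment on their own post is an additional assumption.

```python
def twitter_edges(tweets):
    """tweets: list of (author, [mentioned handles]) -> list of (author, mentioned) edges."""
    edges = []
    for author, mentions in tweets:
        for handle in mentions:
            edges.append((author, handle))
    return edges


def blog_edges(posts):
    """posts: list of (post_author, [comment authors]) -> set of undirected edges."""
    edges = set()
    for owner, commenters in posts:
        for commenter in commenters:
            if commenter != owner:  # assumption: ignore self-comments
                # frozenset makes A-B and B-A the same undirected edge
                edges.add(frozenset((owner, commenter)))
    return edges


print(twitter_edges([("@A", ["@B", "@C"])]))
print(blog_edges([("A", ["B", "C"])]))
```

Using `frozenset` for the blog edges reflects the point that direction is not relevant for this analysis.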

Note: This dataset can only be used by course participants, and its use is restricted to completing the activities and assignments in DALMOOC, so I can’t share it.

Social network graphs are essentially built from nodes and edges. Note from my “import report” (Fig. 1) that there are 194 nodes (students) in the blog network, connected by 205 edges (relations). To keep it simple in this post, I’ll focus on the blog network.

Fig 1. Import report for blog

Visualization

The software produces an overview of the graph in which the node positions are random, so the result is unreadable and unclear (Fig 2). This basic layout is not yet good enough for interpretation.

Fig 2. The basic network layout

This is an important operation! A common first step is to apply a layout algorithm that repositions the nodes to improve the graph’s readability. You can experiment with multiple layouts as you search for the one best suited to your graph. There are also many layout plugins for Gephi (check them here).

So how to choose a layout? Here’s a link with a very helpful tutorial about layouts in Gephi. It will guide you to the basic and advanced layout settings in Gephi.

At this point I played extensively with combinations of layouts and filters and performed the main network analyses (density, centrality, and modularity). Some layouts caught my attention, and I’d like to mention two of them: Yifan Hu and Fruchterman Reingold. Here’s what I get when I run these algorithms on the blog network:

Fig 3. Applying the Yifan Hu algorithm

Fig 4. Applying the Fruchterman Reingold algorithm

The Fruchterman Reingold (FR) algorithm avoids dispersion of disconnected nodes, and it also produces less differentiation between sub-clusters. FR arranges the nodes within a circle where they can be easily distinguished. It took more time to run, and I needed to stop it manually. Its time complexity is O(N²).

The Yifan Hu (YH) algorithm runs much faster than the other methods available, and it stops automatically. It reduces the quadratic complexity to O(N*log(N)). YH seems to produce more differentiation between sub-clusters, and the main cluster looks tighter. Note from Figure 3 that it places disconnected nodes farther apart. In my case, I chose the Yifan Hu algorithm (Fig. 3).
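To show where the quadratic cost comes from, here is a minimal sketch of a basic Fruchterman Reingold loop. It is a simplified illustration, not Gephi's implementation: the geometric cooling factor, the starting temperature and the absence of a bounding frame are assumptions of this example. The repulsion step visits every pair of nodes in each iteration, which is the O(N²) part; Yifan Hu's method avoids this by approximating the repulsion from distant groups of nodes.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=50, size=1.0, seed=0):
    """Basic Fruchterman Reingold layout; returns (positions, pair_evaluations)."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = math.sqrt(size * size / len(nodes))  # ideal distance between nodes
    temperature = size / 10                  # assumed starting step limit
    pair_evaluations = 0
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every ordered pair of nodes: the O(N^2) part.
        for v in nodes:
            for u in nodes:
                if u == v:
                    continue
                pair_evaluations += 1
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[v][0] += dx / dist * force
                disp[v][1] += dy / dist * force
        # Attraction along edges pulls connected nodes together.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node, capped by the current temperature, then cool down.
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            step = min(length, temperature)
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temperature *= 0.95
    return pos, pair_evaluations


positions, evaluations = fruchterman_reingold(
    ["a", "b", "c", "d"], [("a", "b"), ("b", "c")], iterations=10
)
print(evaluations)  # iterations * N * (N - 1)
```

With the 194 nodes of the blog network, each iteration of this loop would evaluate 194 × 193 = 37,442 node pairs, which helps explain why FR felt slower.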

Being in the thick of things: Sizing the network nodes based on degree centrality.

Degree centrality highlights the most influential nodes. Individuals with high degree centrality have many social edges and are more active, recognized, important, and visible in their social networks (Brass, 1984; Gasevic et al., 2013).

The graph in Figure 5 sizes the nodes based on degree centrality. To adjust it, go to the “Ranking” panel, select “Nodes” and the “red diamond”, then select “Degree” in the drop-down menu and enter the minimum and maximum size (I suggest 15-70). I also added the degree centrality score as a label: click on the “Configure…” link to set the data you want displayed in your graph.

Fig 5. Applying nodes size = degree and nodes label = degree

Being in the bridge of things: Sizing the network nodes based on betweenness centrality.

Betweenness centrality indicates how often a node lies on the shortest paths between other nodes in the network, that is, how much it acts as a bridge. The graph in Figure 6 sizes the nodes based on betweenness centrality. To set this measure, just go back to the drop-down menu and select “Betweenness Centrality”.
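For readers who want to see how such a score can be computed, the following is a minimal sketch of Brandes' algorithm for an unweighted, undirected graph. The small "bow-tie" network is hypothetical and the function is an illustration only, not Gephi's code; Gephi may also treat directed graphs and normalization differently.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbours)}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical "bow-tie" network: two triangles joined through node "X".
ADJ = {
    "A": {"B", "X"}, "B": {"A", "X"},
    "C": {"D", "X"}, "D": {"C", "X"},
    "X": {"A", "B", "C", "D"},
}
scores = betweenness(ADJ)
print(scores["X"], scores["A"])
```

Node "X" is the only bridge between the two triangles, so every shortest path between them passes through it, while the other nodes lie on no shortest path between others.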

Fig 6. Applying nodes size = betweenness centrality and nodes label = degree

From Figures 5 and 6 we can see significant similarities between degree and betweenness centrality. Recently I found a good idea on the esinternet blog for comparing and highlighting both measures: I considered the top 10 nodes in terms of central nodes (measured by degree, Fig. 7) and main brokers (measured by betweenness, Fig. 8).

For central nodes: in the “Filters” panel, select “Topology” and then “Degree Range”, and change the range to 8-55. For main brokers: in the “Filters” panel, select “Range” and then “Betweenness Centrality”, and change the range to 500-4326. I also changed the color: in the “Ranking” panel, choose “color” to replace those sad shades of gray.
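The same comparison can be sketched outside Gephi by taking the top-ranked nodes for each measure and intersecting the two sets. The scores below are made up for illustration (only the upper bounds 55 and 4326 echo the filter ranges above), and `top_n` is a hypothetical helper name.

```python
def top_n(scores, n=10):
    """Return the ids of the n highest-scoring nodes."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Made-up scores for illustration only.
degree = {"n1": 55, "n2": 30, "n3": 12, "n4": 8, "n5": 3}
betweenness = {"n1": 4326.0, "n2": 150.0, "n3": 900.0, "n4": 0.0, "n5": 610.0}

central = top_n(degree, n=3)
brokers = top_n(betweenness, n=3)
both = sorted(set(central) & set(brokers))  # nodes that are central and brokers
print(central, brokers, both)
```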

Fig 7. Top 10 central nodes measured by degree centrality

Fig 8. Top 10 main brokers measured by betweenness centrality

It’s possible to clearly observe – from Figures 7 and 8 – that there are brokers and central nodes in common. Ok, now we start to see some patterns in the network.

Network brokers are those that build bridges between clusters (or sub-communities) in a network (Burt et al., 2013). A broker is typically a node connecting two or more different communities that may emerge in a social network. Professor Dragan, in DALMOOC, states:

Brokers are able to see the different types of information that are available across different communities, and as such, they are exposed to many different ideas therefore being able to integrate these different ideas, generate potentially novel solutions.

Let’s dive a little bit more and look for more patterns…

Network Modularity and Community Identification

Once you have applied the “Modularity” statistic (a cluster detection algorithm), you can clearly see the different clusters or communities that emerge from the network. Each color represents one cluster (Fig. 9). The algorithm detected 66 clusters, which is a large number. Note the various disconnected nodes in Figure 9: it doesn’t make sense to have one student as a separate community.

Fig 9. Applying modularity statistic

To study community formation, I applied the giant component filter. The giant component is the largest connected component of the network. In other words, the filter removes every node that is not part of that largest component, including all the disconnected nodes. Fortunato (2010) describes communities as:

Communities, also called clusters or modules, are groups of vertices which probably share common properties and/or play similar roles within the graph.
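The giant component filter mentioned above can be illustrated with a short breadth-first search that collects connected components and keeps the largest one. This is a sketch of the idea on a hypothetical six-node graph, not Gephi's implementation.

```python
from collections import deque


def giant_component(nodes, edges):
    """Return the largest connected component as a set of nodes."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    best = set()
    for start in nodes:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:  # explore everything reachable from `start`
            v = queue.popleft()
            for w in adj[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


NODES = ["A", "B", "C", "D", "E", "F"]        # hypothetical
EDGES = [("A", "B"), ("B", "C"), ("D", "E")]  # "F" is isolated
giant = giant_component(NODES, EDGES)
print(sorted(giant))
```

Note that the smaller pair D-E is dropped along with the isolated node F, since neither belongs to the largest component.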

I watched a very useful tutorial by Jen Golbeck, suggested by Professor Dragan, on how to use Gephi’s modularity feature to detect communities and color-code them in graphs. Based on this tutorial, my goal now is to decrease the number of clusters in my graph.
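As background, the modularity score that community detection tries to maximize can be computed for any given partition. The sketch below scores a partition of a hypothetical undirected, unweighted graph using the standard modularity definition; it does not search for the best partition and it ignores Gephi's resolution setting, so it is only meant to show what the number measures.

```python
def modularity(edges, community):
    """Modularity Q of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    internal = {}  # number of edges with both ends inside community c
    total = {}     # sum of node degrees in community c
    for a, b in edges:
        if community[a] == community[b]:
            internal[community[a]] = internal.get(community[a], 0) + 1
    for node, k in degree.items():
        total[community[node]] = total.get(community[node], 0) + k
    # Fraction of edges inside each community minus the fraction expected at random.
    return sum(internal.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)


# Hypothetical graph: two triangles joined by a single edge.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
merged = {n: 0 for n in split}

print(modularity(EDGES, split), modularity(EDGES, merged))
```

Splitting the graph into its two triangles scores higher than lumping everything into one community, which is the kind of difference the algorithm exploits when it groups nodes.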

In the “Statistics” panel, click on “Modularity” to display the modularity settings window, then choose resolution = 2. My clusters decreased from 66 to 4! I applied the same steps to the Twitter network, and the following SNA maps were produced:

Fig 10. Blog network in course (CCK11) applying node size = Betweenness Centrality and node label = Degree.

Fig 11. Twitter network in course (CCK11) applying node size = Betweenness Centrality and node label = Degree.

The density measure of the networks

In the report shown in Figure 12, Density = 0.01 for the blog network and Density = 0.003 for the Twitter network. This means that a larger share of the possible connections between students is present in the blog network. De Laat et al. (2007) describe density as:

Density provides a measure of the overall ‘connections’ between the participants.
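As a quick sanity check, the blog density can be recomputed from the import report (194 nodes, 205 edges). The sketch assumes the network is treated as undirected, so the number of possible ties is N(N-1)/2; under that assumption the result is roughly consistent with the reported value of 0.01.

```python
def undirected_density(node_count, edge_count):
    """Share of possible undirected ties that are actually present."""
    possible = node_count * (node_count - 1) / 2
    return edge_count / possible


blog_density = undirected_density(194, 205)  # figures from the import report
print(round(blog_density, 3))
```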

 

The following table summarizes my report for the blog and Twitter networks:

Fig 12. Blog vs Twitter report

Conclusion

Obviously, this is just a quick overview of Gephi’s functionality and of the centrality measures commonly seen in social network studies. It’s difficult to draw a meaningful conclusion without knowing more about the data. There is a limited understanding of the nodes: which nodes represent professors, and which represent course participants?

I also found some limitations in my study:

  • The “Id” assigned to each student node is generated within Gephi’s “Data Laboratory”. I could use the “Label” column on the nodes instead and compare centralities between week 6 and week 12. However, the “Label” column in both networks contains long alphanumeric strings, so it’s not easy to visualize. I had some ideas, but none of them worked efficiently. If you figure out a way to solve this issue in a useful manner, feel free to leave a comment. Feedback is always more than welcome!
  • Gephi does not support slide-by-slide display.

References

Brass, D. J. (1984). Being in the right place: A structural analysis of individual influence in an organization. Administrative Science Quarterly, 29(4), 518–539. doi:10.2307/2392937

Burt, R. S., Kilduff, M., & Tasselli, S. (2013). Social network analysis: foundations and frontiers on advantage. Annual review of psychology, 64, 527-547. doi: 10.1146/annurev-psych-113011-143828 (full text)

Dawson, S. (2008). A study of the relationship between student social networks and sense of community. Educational Technology & Society, 11(3), 224–238 (full text).

Dawson, S., Tan, J. P., McWilliam, E. (2011). Measuring creative potential: Using social network analysis to monitor a learners’ creative capacity. Australasian Journal of Educational Technology, 27(6), 924-942.

De Laat, M., Lally, V., Lipponen, L., & Simons, R. J. (2007). Investigating patterns of interaction in networked learning and computer-supported collaborative learning: A role for Social Network Analysis. International Journal of Computer-Supported Collaborative Learning, 2(1), 87-103. doi: 10.1007/s11412-007-9006-4 (full text).

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174.

Gašević, D., Zouaq, A., Janzen, R. (2013). ‘Choose your Classmates, your GPA is at Stake!’ The Association of Cross-Class Social Ties and Academic Performance. American Behavioral Scientist, 57(10), 1459-1478. doi: 10.1177/0002764213479362 (full text).

Additional resources

Esinternet (2014, November). Assignment – Perform social network analysis and visualize analysis results in Gephi – Part of Basics of S.N.A (week 3) http://esinternet.blogspot.com.ar/2014/11/assignation-perform-social-network.html

Hirst, T. (2010, April 16). Getting Started With The Gephi Network Visualisation App – My Facebook Network, Part I, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-visualisation-app-my-facebook-network-part-i/

Hirst, T. (2010, April 23). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part II: Basic Filters I, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/04/23/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-ii-basic-filters/

Hirst, T. (2010, May 10). Getting Started With Gephi Network Visualisation App – My Facebook Network, Part III: Ego Filters and Simple Network Stats, Retrieved October 18, 2014, from http://blog.ouseful.info/2010/05/10/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-iii-ego-filters-and-simple-network-stats/

Analytic tools support visual data analysis, which can be applied to better understand the interactions and relationships among the variables that influence student learning. The Horizon Report: 2010 Edition (Johnson et al., 2010) describes this technique very well:

Visual data analysis is a way of discovering and understanding patterns in large data sets via visual interpretation. It is a blend of statistics, data mining, and visualization that promises to make it possible for anyone to understand complex concepts and relationships

The way Matt Crosslin describes the dual-layer MOOC through lenses in his blog post is impressive. In the movie National Treasure, Nicolas Cage’s character figures out that a pair of spectacles with multiple colored lenses changes what he sees on a piece of paper as he switches lenses:

Multiple colored lenses from the National Treasure film.

Beyond that idea, why not say that analytic tools work as lenses as well? We change lenses or filters depending on the data we are working with, each one helping us to see patterns and understand the data. In some cases you may be interested in interpreting social network measures (e.g. centrality and modularity) of networks (e.g. Twitter and blogs). In other cases you may be interested in visualizing the distribution of a centrality measure across a social network, among many other interests. There is a variety of analytic tools for interpreting different sorts of data, such as:

Tableau

Tableau can connect to almost any database. You can drag and drop your variables to create visualizations and share them with a click. You can also combine multiple graphics into interactive dashboards. So far, I have played with Tableau and watched some how-to videos, and I found it really easy to learn and use. Tableau is not free, but you can download a free trial.

Gephi

Gephi is open-source and free. It is an interactive visualization and exploration platform for all kinds of networks and complex systems. You can interact with the visualization and manipulate the structures, shapes and colors to reveal patterns in large data collections. I have explored Gephi and found it very easy to navigate and manipulate data. Follow this link to download Gephi. I had problems installing Gephi, but now it works (if you have trouble installing Gephi, go to the bottom of this post to see my solution).

RapidMiner

RapidMiner is a code-free analytics platform for machine learning, data mining and predictive analytics. There are several versions of RapidMiner online; for DALMOOC, we will be using RapidMiner 5.3. You can find the download on this page.

LightSide

LightSide is an open-source text mining and machine learning tool. Its developers offer quick-start tutorials on machine learning for beginners. You can download LightSide here.

One of the challenges of getting started with learning analytics is getting a sense of the tools and resources available. Here you can find my research on learning analytics tools for #DALMOOC. This file will be updated throughout the course.

Stay tuned to this blog for what I am learning about these tools in #DALMOOC.

Solution: I have Windows 7 (64-bit). I uninstalled the latest Java version (1.8.0_05), then went here and downloaded Java 7 (version 1.7.0_71) and Gephi version 0.8.2 beta, and voilà!

Social network analysis (SNA) represents how people connect to each other and shows the flow of information in a network. SNA can help identify influencers, key players, disconnected individuals and groups within a community. SNA consists of models that define relationships and interactions between individuals, as well as the patterns that emerge from those relationships and interactions (Baker and Siemens, in press).

Fortunato (2010) describes social networks as:

“paradigmatic examples of graphs with communities. The word community itself refers to a social context. People naturally tend to form groups, within their work environment, family, friends.”

According to Dragan Gasevic in the Data, Analytics and Learning MOOC (DALMOOC) and several authors (Grunspan et al., 2014; Fortunato, 2010), SNA is important in many different fields. These include, broadly, sociology, business, engineering, economics, politics, computer science and mathematics; more recently, the importance of SNA has also been recognized in fields such as physics and biology. For example, in biology we can find interesting work on social contagion (Christakis and Fowler, 2013) and biological networks (Jonsson et al., 2006).

Protein–protein interaction (PPI) networks are commonly studied in biology and bioinformatics. Fig. 1 illustrates the community structure of a PPI network (Jonsson et al., 2006). The graph shows the interactions between proteins in cancerous cells of a rat. The communities group proteins into clusters that have similar functions within the cell. Note that the communities are represented by different colours.

Fig 1. An example of protein–protein interaction network (Jonsson et al., 2006).

In a learning context, the use of SNA has demonstrated several benefits. Dragan Gasevic, in DALMOOC, states:

different authors identify the importance of social networks, social activities and social interaction as critical predictors of academic performance.

I’d like to mention Dawson’s work, which has caught my attention recently. Dawson and colleagues (Dawson, Tan, and McWilliam, 2011) demonstrated the value of using social network analysis to measure and visualize student creative capacity. The study involved 76 first-year students in the graduate school of medicine at the University of Wollongong in Australia. They extracted discussion forum data from the institutional LMS (Blackboard). Fig. 2 illustrates the social network of student and staff participation in the discussion forum. Note that student names have been removed. The graph shows the position and interactions of each student within the network. The full article is here.

Fig. 2 Sociogram of the medical student network interactions (Dawson, Tan, and McWilliam, 2011)

Network elements of any Social Network

The elements of a network are actors and relations. Actors can be students, organizations, blogs and so on, and they are modeled as nodes in a graph. The connections between actors make up relations (modeled as ties or edges). Those relations can represent friendship, advice, hindrance or communication.
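As a small illustration of these elements, the sketch below stores a network in Python as a set of actors (nodes) and a list of directed relations (edges). The actor names and the “replied to” relation are invented for the example.

```python
from collections import defaultdict

# Hypothetical relations: (source, target) means "source replied to target".
relations = [
    ("ana", "ben"),
    ("ben", "ana"),
    ("carla", "ana"),
    ("dan", "carla"),
]

# Actors are every name that appears at either end of a relation.
actors = sorted({name for pair in relations for name in pair})

# Adjacency list: for each actor, the actors they point to.
adjacency = defaultdict(list)
for source, target in relations:
    adjacency[source].append(target)

print(actors)
print(dict(adjacency))
```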

Network Measures

Professor Dragan Gasevic, in DALMOOC, introduced some measurements commonly seen in social network studies. I want to briefly describe how all this works:

1. The attributes of the entire network:

  • Diameter: the length of the longest shortest path between any pair of nodes in the social network.
  • Density: the ratio of the number of existing connections to the number of possible connections in the graph.
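The diameter can be sketched with a breadth-first search from every node, as below. This Python example assumes an undirected, connected network with unweighted ties; the five-node network is hypothetical.

```python
from collections import deque


def distances_from(adjacency, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(adjacency):
    """Longest shortest path between any pair of nodes (connected graph assumed)."""
    return max(max(distances_from(adjacency, node).values()) for node in adjacency)


# Hypothetical undirected network: a chain A-B-C-D with E attached to B.
adjacency = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}

print(diameter(adjacency))  # 3 (for example, the path A-B-C-D)
```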

2. The metrics that revolve around the concept of centrality:

  • Degree Centrality: the total number of social ties a node has.
  • In-degree Centrality: the number of edges coming in. In other words, it indicates the popularity or prestige that an individual has in the community.
  • Out-degree Centrality: the number of edges leading out. In other words, it indicates how gregarious an individual is.
  • Betweenness Centrality: how often a node lies on the shortest paths between other nodes. A node with high betweenness connects parts of the network that would otherwise be weakly linked; if it is removed, the community may split into separate subgroups. Such a node plays an important role called “Network Broker”.
  • Closeness Centrality: how close a node is to all other nodes, based on its shortest-path distances to them. Keep in mind that it doesn’t work well for disconnected networks (where some actors have zero social ties or some groups have no connection to other groups).
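The centrality measures above can be sketched in plain Python as follows. The six-node network is hypothetical. In-degree and out-degree use the directed ties; for closeness and betweenness I assume the same ties are treated as undirected and that the network is connected, and the values are not normalized, so numbers reported by Gephi may differ depending on its settings.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical directed ties: (source, target) means "source replied to target".
directed_ties = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"),
                 ("D", "E"), ("E", "F"), ("F", "D")]

in_degree = defaultdict(int)
out_degree = defaultdict(int)
for source, target in directed_ties:
    out_degree[source] += 1
    in_degree[target] += 1

# For closeness and betweenness, treat the same ties as undirected.
adjacency = defaultdict(set)
for source, target in directed_ties:
    adjacency[source].add(target)
    adjacency[target].add(source)
nodes = sorted(adjacency)


def shortest_paths_from(source):
    """BFS giving hop distance and number of shortest paths to each node."""
    dist = {source: 0}
    paths = defaultdict(int)
    paths[source] = 1
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                paths[neighbor] += paths[node]
    return dist, paths


bfs = {node: shortest_paths_from(node) for node in nodes}

# Closeness: (n - 1) divided by the sum of distances to all other nodes.
closeness = {
    node: (len(nodes) - 1) / sum(bfs[node][0].values())
    for node in nodes
}

# Betweenness: for each pair (s, t), the share of shortest s-t paths through v.
betweenness = {node: 0.0 for node in nodes}
for s, t in combinations(nodes, 2):
    dist_s, paths_s = bfs[s]
    dist_t, paths_t = bfs[t]
    for v in nodes:
        if v not in (s, t) and dist_s[v] + dist_t[v] == dist_s[t]:
            betweenness[v] += paths_s[v] * paths_t[v] / paths_s[t]

for node in nodes:
    print(node, in_degree[node], out_degree[node],
          round(closeness[node], 3), betweenness[node])
```

In this toy network, C and D are the brokers: every shortest path between the group A, B and the group E, F passes through both of them, so removing either one splits the network in two.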

Social network analysis tool

All of the network measures above can be computed using the Gephi software (Fig. 3). Gephi is a free, open-source platform that supports visualization and exploration of all kinds of networks, including dynamic and hierarchical graphs.

I watched a very interesting hang-out with Shane Dawson. Shane Dawson is a researcher in social network analysis and co-developer of Social Networks Adapting Pedagogical Practice (SNAPP). SNAPP is an open-source tool that performs real-time social network analysis and visualization of discussion forum activity within popular commercial and open-source Learning Management Systems (e.g. BlackBoard and Moodle).

Fig 3. Gephi Software

Reflect on the methods that could be used for data collection, steps to be taken for the analysis, potential conclusions, and possible issues

Network analysis for the study of learning can be applied at larger scales, such as in schools and higher education. Data can come from social media such as blogs, Twitter, e-mails, discussion forums and many others. Interest in SNA is increasing rapidly: the Horizon Report: 2014 Edition (Johnson et al., 2014) lists social media adoption in the educational sector among the “fast trends” over the next one to two years. In my view, SNA represents a powerful idea: we can measure how knowledge flows across networks of people and organizations.

Steps to be taken for the analysis:

Step 1: Data collection: identify the actors involved in the network and gather their interactions. Data can come from social media (e.g. blogs and Twitter), survey questionnaires, learning management systems (LMS) (e.g. Desire2Learn, BlackBoard, Moodle, etc.) or many other sources.

Step 2: Software: choose tools that support the search for patterns, such as Gephi and Social Networks Adapting Pedagogical Practice (SNAPP).

Step 3: Analysis of the network: clusters, statistics, the metrics that revolve around the concept of centrality, and so on.
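The three steps can be sketched end to end in Python. The example below assumes the collected data is a list of (author, replied_to) records, which is a made-up format. It aggregates them into weighted edges and writes an edge table with `Source`, `Target` and `Weight` columns, a layout that Gephi’s spreadsheet import can read as far as I know. A simple count of replies received stands in for the analysis step.

```python
import csv
import io
from collections import Counter

# Step 1: collected interaction records (hypothetical): who replied to whom.
records = [
    ("student_1", "student_2"),
    ("student_1", "student_2"),
    ("student_3", "student_2"),
    ("student_2", "student_1"),
]

# Step 2: aggregate repeated interactions into weighted edges for the SNA tool.
edge_weights = Counter(records)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Source", "Target", "Weight"])
for (source, target), weight in sorted(edge_weights.items()):
    writer.writerow([source, target, weight])
edge_table = buffer.getvalue()
print(edge_table)

# Step 3: a first analysis, here weighted in-degree (replies received).
replies_received = Counter()
for (source, target), weight in edge_weights.items():
    replies_received[target] += weight
print(replies_received.most_common(1))
```

In practice the edge table would be saved to a .csv file rather than kept in memory, and the student identifiers would need to be handled with the privacy concerns discussed next in mind.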

It’s important to note that SNA requires careful attention to privacy and ethical issues. So my concern is: how do we deal with ethical issues once we have full access to personal information? Who gets to see the SNA data, and in what format?

According to the U.S. Department of Education (2012):

Education institutions must consider privacy, policy and legal issues when collecting, storing, analyzing, and disclosing personally identifiable information from students’ education records to third parties for data mining and analytics.

All ideas and contributions are welcome!

References

Baker, R., Siemens, G. (in press) Educational data mining and learning analytics. To appear in Sawyer, K. (Ed.) Cambridge Handbook of the Learning Sciences: 2nd Edition. [preprint draft pdf]

Christakis, N.A., Fowler, J.H. (2013). Social contagion theory: examining dynamic social networks and human behavior. Stat Med, 32, 556–577.

Dawson, S., Tan, J. P., McWilliam, E. (2011). Measuring creative potential: Using social network analysis to monitor a learners’ creative capacity. Australasian Journal of Educational Technology, 27(6), 924-942.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174.

Grunspan, D. Z., Wiggins, B. L., Goodreau, S. M. (2014). Understanding Classrooms through Social Network Analysis: A Primer for Social Network Analysis in Education Research. Life Sciences Education, 13(2), 167-178.

Johnson, L., Adams Becker, S., Estrada, V., Freeman, A. (2014). NMC Horizon Report: 2014 Higher Education Edition. Austin, Texas: The New Media Consortium. (full text)

Jonsson, P.F., Cavanna, T., Zicha, D., Bates, P.A. (2006). Cluster analysis of networks generated through homology: Automatic identification of important protein communities involved in cancer metastasis. BMC Bioinformatics.

U.S. Department of Education, Office of Educational Technology, Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief, Washington, D.C., 2012. (full text)

This is my 3rd course/MOOC, and this course is way ahead of anything else I’ve seen. Data, Analytics, and Learning provides “An introduction to the logic and methods of analysis of data to improve teaching and learning”. What surprises me most is its non-linear structure and the many social tools involved.

You will experience multiple learning pathways in a dual-layer MOOC. The instructors describe this experiment through a metaphor: you basically have two choices, the “red pill” and the “blue pill”. The blue pill is what you are most familiar with: it is the typical classroom environment, where you will be guided through a linear path of material, such as videos and assignments.

On the other hand, if you choose the red pill, you are choosing a totally different way of learning (yeah, give it a shot!). You will be self-guided and responsible for connecting with other learners to work on problems and assignments. Learners are encouraged to interact through various social tools (e.g. blogs, Facebook, G+, Twitter, edX Forum) and share their artifacts. These artifacts can be videos, blog posts, graphics, images, or any other resources that you can share.

There are several new tools in DALMOOC, including ProSolo, Bazaar, Quick Helper, Visual Syllabus, Assignment Bank, and so on. Sometimes I feel overwhelmed, but it has been a good experience anyway.

I created a ProSolo profile (it represents my online identity for DALMOOC). ProSolo is a social learning platform connected to edX. My profile includes my personal data, my social media accounts and my geographic location. Moreover, it points out my geographic neighbors so that I can buddy up with them. ProSolo gives learners the ability to share knowledge and expertise, form groups, develop their competencies, and work on assignments and peer assessment.

Another interesting aspect is that it tracks the content I publish on the web and integrates with social media. For example, this blog post will be tracked and thereafter made available for peer assessment. If you are a Twitter user, you can share content using the #DALMOOC tag.

There’s also another social learning tool named “Bazaar”. This is a collaborative activity that allows you to connect with other learners in real time to complete exercises related to course topics. Once you log in, you will enter a lobby, and a virtual instructor will guide you and your partner throughout the discussion. In my case, I had the opportunity to learn more about the topic with my partner, given our levels of expertise.

At first I found ProSolo difficult to manage, but now you can find some video tutorials that show users how to use it.

By the way, I’m taking the red pill of #DALMOOC. Stay tuned for the next posts. Meanwhile, let’s check out the red pill effect.

Ready for social learning? Go for it!
