Graph-Based Data Mining for Social Network Analysis
DOI:
https://doi.org/10.15662/IJEETR.2022.0405001
Keywords:
Graph Mining, Social Networks, Community Detection, Centrality, Link Prediction, Frequent Subgraph Mining, Graph Embedding, Network Analysis, Scalability, Social Network Analysis
Abstract
Graph-based data mining has become a fundamental paradigm for analyzing complex networks, particularly in social network analysis (SNA). Social networks naturally form graph structures: nodes represent individuals or entities, and edges represent interactions, relationships, or flows of information. Mining these graphs yields insights into community structure, influential individuals, information diffusion, and anomalies. This paper provides a structured overview of graph-based data mining techniques applied to SNA in work published before 2021.
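As a minimal sketch of the node-and-edge structure described above, the following Python snippet builds an undirected graph from pairwise interaction records. The records and the `build_graph` helper are hypothetical illustrations and are not taken from the paper; dropping self-interactions is an assumption of this sketch.

```python
from collections import defaultdict

def build_graph(interactions):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    graph = defaultdict(set)
    for u, v in interactions:
        if u == v:
            continue  # assumption: self-interactions carry no relational signal
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)

# Hypothetical interaction records (e.g., "u messaged v").
# Repeated or reversed pairs collapse into a single undirected edge.
interactions = [
    ("alice", "bob"), ("bob", "alice"), ("bob", "carol"),
    ("carol", "dave"), ("dave", "dave"),
]
graph = build_graph(interactions)

# Each edge is stored at both endpoints, so halve the total neighbor count.
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
```

Adjacency sets keep neighbor lookups and set intersections cheap, which is convenient for the neighborhood-based measures common in SNA. A weighted or directed variant would need a different structure.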
Key methods include community detection (e.g., modularity optimization, spectral clustering), centrality measures (e.g., degree, betweenness, eigenvector centrality), and link prediction models (e.g., common neighbors, preferential attachment, supervised learning). The paper also covers pattern mining techniques such as frequent subgraph mining and graph kernels, which support tasks like role detection and graph classification. It highlights advances in scalable mining through MapReduce and other distributed frameworks.
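To make a few of the measures named above concrete, here is a small pure-Python sketch of degree centrality, the common-neighbors and preferential-attachment link prediction scores, and the modularity score that modularity-optimization methods try to maximize. The toy graph (two triangles joined by one bridge edge) and all function names are assumptions chosen for illustration, not the paper's implementation.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected graph as a dict mapping node -> set of neighbors."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def common_neighbors(graph, u, v):
    """Link prediction score: number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])

def preferential_attachment(graph, u, v):
    """Link prediction score: product of the two node degrees."""
    return len(graph[u]) * len(graph[v])

def modularity(graph, communities):
    """Modularity Q = sum over communities of L_c / m - (d_c / 2m)^2,
    where L_c counts intra-community edges and d_c sums member degrees."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    q = 0.0
    for community in communities:
        members = set(community)
        # Each internal edge is seen from both endpoints, so halve the count.
        internal = sum(len(graph[u] & members) for u in members) / 2
        degree_sum = sum(len(graph[u]) for u in members)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

# Toy graph: triangles {a, b, c} and {d, e, f} joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
graph = build_graph(edges)

centrality = degree_centrality(graph)
q = modularity(graph, [{"a", "b", "c"}, {"d", "e", "f"}])
cn_score = common_neighbors(graph, "a", "d")
pa_score = preferential_attachment(graph, "a", "d")
```

On this toy graph the two bridge endpoints have the highest degree centrality, and splitting the graph into its two triangles gives a positive modularity. Betweenness and eigenvector centrality, spectral clustering, and supervised models need more machinery and are left out of this sketch.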
A mixed-method research methodology is employed: literature synthesis, performance comparisons on benchmark social network datasets (e.g., Facebook, Twitter, collaboration networks), and illustrative case studies demonstrating insights obtained via these graph mining approaches.
Key findings indicate that graph mining methods significantly enhance detection of communities, influencers, and emergent trends in networks. Supervised link prediction models outperform heuristics, especially when enriched with node and structural features. Large-scale graph mining remains challenging, requiring efficient algorithms and parallelization.
The paper articulates an end-to-end workflow: data collection, graph construction, feature extraction or embedding, algorithm selection, evaluation, and interpretation. Advantages include rich relational insight and interpretability; disadvantages include computational demands and sensitivity to network noise.
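The workflow stages listed above can be sketched end to end on a toy link prediction task. Everything here is an assumption made for illustration: the edge data is invented, Jaccard similarity stands in for whatever features or embeddings a real study would use, and precision at k is only one of several possible evaluation measures.

```python
from collections import defaultdict
from itertools import combinations

# 1. Data collection: hypothetical friendship records (toy data, not from the paper).
observed_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
# Edges hidden from the scorer and used only for evaluation.
held_out = {frozenset(("a", "c")), frozenset(("d", "f"))}

# 2. Graph construction: undirected adjacency sets.
def build_graph(edges):
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)

# 3. Feature extraction: Jaccard similarity of the two neighborhoods.
def jaccard(graph, u, v):
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0

# 4. Algorithm selection: rank all currently unlinked pairs by the chosen score.
def rank_candidates(graph, score):
    candidates = [(u, v) for u, v in combinations(sorted(graph), 2)
                  if v not in graph[u]]
    return sorted(candidates, key=lambda pair: score(graph, *pair), reverse=True)

# 5. Evaluation: fraction of the top-k predictions that are held-out edges.
def precision_at_k(ranked, positives, k):
    hits = sum(1 for pair in ranked[:k] if frozenset(pair) in positives)
    return hits / k

graph = build_graph(observed_edges)
ranked = rank_candidates(graph, jaccard)
p_at_2 = precision_at_k(ranked, held_out, k=2)

# 6. Interpretation: inspect the strongest predictions and their scores.
for u, v in ranked[:2]:
    print(u, v, round(jaccard(graph, u, v), 2))
```

Because the scoring function is passed in as an argument, a different heuristic or a trained model's scoring function could be swapped in at step 4 without touching the other stages. Results on a six-node toy graph say nothing about performance on real networks.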
This review concludes that graph-based data mining remains indispensable for SNA. Future work includes integrating graph neural networks, dynamic graph mining, and privacy-preserving graph analysis to address evolving social platforms and data concerns.