
Co-Citation Proximity Analysis - Recommendation and Clustering Algorithms for Academic Literature

Co-Citation Proximity Analysis (CPA) [1, 2, 3] is a method to compute both local and global instances of semantic similarity in academic documents by examining citation proximity in the full texts of documents. 

CPA was developed with two applications in mind: recommender systems and clustering. For recommender systems, a measure of semantic similarity that works at a finer resolution has the potential to significantly improve the relevance of academic literature recommendations. For clustering, a more granular measure of document similarity allows the development of more precise clustering algorithms for academic literature.

The CPA approach builds on the well-known and widely used co-citation analysis. Unlike plain co-citation analysis, which counts every co-citation equally, CPA weights each co-citation by how close the two citations are to each other within the citing article's full text, and it was the first approach to propose such proximity-based weights [4]. The underlying idea is that the closer two citations are to each other in the full text of a document, the more likely the cited documents are related.
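The proximity idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weight values, the (section, paragraph, sentence) position format, and the function names are all assumptions made for this example. Each pair of co-cited documents receives a weight that depends on the smallest text unit the two citations share, and the weights are summed over all citing documents.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights, chosen only for illustration: the smaller the
# shared text unit, the higher the weight.
WEIGHTS = {
    "same_sentence": 1.0,
    "same_paragraph": 0.5,
    "same_section": 0.25,
    "same_document": 0.125,
}


def proximity_weight(pos_a, pos_b):
    """Weight for two citation positions given as (section, paragraph, sentence).

    Assumes hierarchical indices: paragraphs are numbered within a section,
    sentences within a paragraph.
    """
    sec_a, par_a, sen_a = pos_a
    sec_b, par_b, sen_b = pos_b
    if sec_a != sec_b:
        return WEIGHTS["same_document"]
    if par_a != par_b:
        return WEIGHTS["same_section"]
    if sen_a != sen_b:
        return WEIGHTS["same_paragraph"]
    return WEIGHTS["same_sentence"]


def score_document(citations):
    """Score co-cited pairs in one citing document.

    citations: list of (cited_id, position) tuples. If a pair is co-cited
    several times, only its closest occurrence counts (a design assumption).
    """
    scores = defaultdict(float)
    for (id_a, pos_a), (id_b, pos_b) in combinations(citations, 2):
        if id_a == id_b:
            continue  # the same document cited twice is not a co-citation
        pair = tuple(sorted((id_a, id_b)))
        scores[pair] = max(scores[pair], proximity_weight(pos_a, pos_b))
    return dict(scores)


def score_corpus(documents):
    """Sum the per-document pair weights over all citing documents."""
    totals = defaultdict(float)
    for citations in documents:
        for pair, weight in score_document(citations).items():
            totals[pair] += weight
    return dict(totals)


# Two made-up citing documents with made-up citation positions.
doc_1 = [("A", (0, 0, 0)), ("B", (0, 0, 0)), ("C", (0, 1, 0)), ("D", (2, 0, 0))]
doc_2 = [("A", (0, 0, 0)), ("B", (0, 0, 1))]

for pair, score in sorted(score_corpus([doc_1, doc_2]).items()):
    print(pair, score)
```

In this toy data, A and B are cited in the same sentence of one document and in the same paragraph of another, so they end up with the highest score, while pairs that only share a document receive the lowest weight. Plain co-citation analysis would instead count every pair in doc_1 equally.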

Compared to existing approaches such as bibliographic coupling, co-citation analysis, or keyword-based similarity computations, CPA achieves higher precision and makes it possible to pinpoint related chapters, sections, or paragraphs within the texts of academic documents. Moreover, CPA allows more precise automatic document classification.

Related Publications

[1] B. Gipp and J. Beel, “Citation Proximity Analysis (CPA) – A New Approach for Identifying Related Work Based on Co-Citation Analysis,” in Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), Rio de Janeiro, Brazil, 2009.

[2] B. Gipp and J. Beel, “Identifying Related Documents For Research Paper Recommender By CPA And COA,” in Proceedings of The World Congress on Engineering and Computer Science 2009, Berkeley, USA, 2009. 

[3] B. Gipp, “Measuring Document Relatedness by Citation Proximity Analysis and Citation Order Analysis,” in Research and Advanced Technology for Digital Libraries: Proceedings of the 14th European Conference on Digital Libraries (ECDL’10), 2010.

[4] K. W. Boyack, H. Small, and R. Klavans, “Improving the Accuracy of Co-citation Clustering Using Full Text,” in Proceedings of the 17th International Conference on Science and Technology Indicators, 2012.