Measuring scholarly impact, beyond citation scores

How do you track scholarly impact beyond counting citations? Princeton computer scientists Sean Gerrish and David Blei developed a model based on the hypothesis that the most influential publications will shift the mix of terminology used in subsequent work in the field, and validated it on corpora from Nature, PNAS, etc.:

“Identifying the most influential documents in a corpus is an important problem in many fields, from information science and historiography to text summarization and news aggregation. Unfortunately, traditional bibliometrics such as citations are often not available. We propose using changes in the thematic content of documents over time to measure the importance of individual documents within the collection. We describe a dynamic topic model for both quantifying and qualifying the impact of these documents. We validate the model by analyzing three large corpora of scientific articles.” (via)

For example, they show how, after the publication of “Molecular cloning of a cDNA encoding human antihaemophilic factor” in 1984, terms used very frequently in that highly-cited paper (e.g. “expression” and “blot”) became much more commonplace in the field. This content-based approach makes for an interesting supplement to bibliometric approaches that rely primarily on author-generated citations.
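To make the intuition concrete, here is a rough sketch of the kind of before-and-after comparison described above. It is not Gerrish and Blei's dynamic topic model, just a naive term-frequency check; the function names (`term_rate`, `usage_shift`) and the four-document toy corpus are made up for illustration.

```python
from collections import Counter


def term_rate(docs, term):
    """Return the fraction of all whitespace-separated tokens in `docs` equal to `term`."""
    tokens = [tok for doc in docs for tok in doc.lower().split()]
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)


def usage_shift(corpus, term, pivot_year):
    """Compare how often `term` appears before vs. from `pivot_year` onward.

    `corpus` is a list of (year, text) pairs. Returns (rate_before, rate_after).
    """
    before = [text for year, text in corpus if year < pivot_year]
    after = [text for year, text in corpus if year >= pivot_year]
    return term_rate(before, term), term_rate(after, term)


# Hypothetical toy corpus, invented for this example (not real abstracts).
toy_corpus = [
    (1982, "protein purified from plasma"),
    (1983, "clotting factor assay in plasma"),
    (1985, "expression of cloned gene confirmed by blot"),
    (1986, "blot analysis shows expression in liver"),
]

before, after = usage_shift(toy_corpus, "blot", 1984)
print(f"'blot' rate before 1984: {before:.3f}, from 1984 on: {after:.3f}")
```

A jump in a term's rate after a paper appears says nothing on its own about which paper caused it; attributing the shift to individual documents is the part the actual model is designed to handle.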

  1. Hi Anirvan, this is very interesting. Speaking of citation-counting and influential publications…here is another paper that might interest you. “How journal rankings can suppress interdisciplinarity. The case of innovation studies in business and management” by Ismael Rafols, Loet Leydesdorff, Alice O’Hare, Paul Nightingale and Andy Stirling. See the working paper at

