How do you track scholarly impact beyond counting citations? Princeton computer scientists Sean Gerrish and David Blei developed a model based on the hypothesis that the most influential publications will shift the mix of terminology used in subsequent work in their field. They tested it on corpora from Nature, PNAS, and other sources:
“Identifying the most influential documents in a corpus is an important problem in many fields, from information science and historiography to text summarization and news aggregation. Unfortunately, traditional bibliometrics such as citations are often not available. We propose using changes in the thematic content of documents over time to measure the importance of individual documents within the collection. We describe a dynamic topic model for both quantifying and qualifying the impact of these documents. We validate the model by analyzing three large corpora of scientific articles.” (via)
For example, they show that after the 1984 publication of “Molecular cloning of a cDNA encoding human antihaemophilic factor”, terms used very frequently in that highly cited paper (e.g. “expression” and “blot”) became much more common in the field. This content-based approach is an interesting supplement to bibliometric methods that rely primarily on author-generated citations.
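To make the intuition concrete, here is a deliberately simplified sketch. It is not Gerrish and Blei's dynamic topic model. It swaps in a plain word-frequency proxy: take a document's most frequent terms and measure how much their share of the corpus grew after the document's publication year. The function names (`term_rates`, `adoption_scores`, `influence_score`), the `top_k` parameter, and the toy corpus are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def term_rates(docs):
    """Relative frequency of each term across a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


def adoption_scores(target, corpus, pub_year, top_k=3):
    """Hypothetical proxy (not the authors' model): for the target document's
    top_k most frequent terms, return the change in each term's corpus share
    between documents published before and after pub_year."""
    before = term_rates([toks for year, toks in corpus if year < pub_year])
    after = term_rates([toks for year, toks in corpus if year > pub_year])
    top_terms = [tok for tok, _ in Counter(target).most_common(top_k)]
    return {tok: after.get(tok, 0.0) - before.get(tok, 0.0) for tok in top_terms}


def influence_score(target, corpus, pub_year, top_k=3):
    """Sum of share changes: positive if the document's vocabulary spread."""
    return sum(adoption_scores(target, corpus, pub_year, top_k).values())


# Toy, made-up corpus of (year, tokens) pairs; not real data.
corpus = [
    (1982, ["protein", "assay", "binding"]),
    (1983, ["protein", "assay", "factor"]),
    (1985, ["expression", "blot", "protein"]),
    (1986, ["expression", "blot", "cdna"]),
]
cloning_paper = ["cdna", "expression", "blot", "expression", "blot", "factor"]
other_paper = ["assay", "binding"]

print(adoption_scores(cloning_paper, corpus, 1984))
print(influence_score(cloning_paper, corpus, 1984))
print(influence_score(other_paper, corpus, 1984))
```

In this toy corpus, “expression” and “blot” go from absent to a third of all tokens after 1984, so the hypothetical cloning paper scores well above a paper whose terms faded. A real topic model works with distributions over topics rather than raw counts, which lets it credit a document for shifting a whole theme rather than individual words.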
- “A Language-based Approach to Measuring Scholarly Impact” (on CiteSeer)
- Related story in The Economist, focusing on topic-grouping aspects