Implement citations() score as a function of cited paper frequencies #208

Open
aaccomazzi opened this issue Dec 4, 2023 · 0 comments

aaccomazzi commented Dec 4, 2023

This is a request that comes up on a regular basis. When we call the citations() and references() operators, the returned scores are not useful: they reflect neither the original scores from the inner query nor the number of times the documents in the inner query were cited by the returned documents. We would like to enable the latter.

For example, if I search for author:"accomazzi, a" in the astronomy collection I will find 200+ documents. If I ask for their citations via citations(author:"accomazzi, a"), the ranking of the generated list is essentially meaningless. Instead, we would like to see at the top the papers that cite the documents from the inner query most frequently, which in this case would be:

bibcode              | citations
-------------------- | ---------
2010ARIST..44....3K  | 13.000
2002Ap&SS.282..299E  | 10.000
2011ASSP...24...23K  | 8.000
2007BASI...35..717E  | 8.000
2003lisa.conf..145E  | 8.000
2018ApJS..236....3H  | 7.000
...
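
To make the desired ranking concrete, here is a minimal sketch of the counting step. It assumes we have an in-memory mapping from each citing paper to the list of papers it cites; the function name, the data layout, and the identifiers in the example are all hypothetical, and this is not how the operator is actually implemented in the search engine.

```python
from collections import Counter


def rank_by_citation_count(inner_ids, references):
    """Rank citing papers by how many inner-query documents they cite.

    inner_ids:  iterable of ids matched by the inner query (hypothetical layout)
    references: dict mapping a citing paper id to the ids it cites
    """
    inner = set(inner_ids)
    counts = Counter()
    for citing, cited in references.items():
        # Count each cited inner document once per citing paper.
        n = len(inner & set(cited))
        if n:
            counts[citing] = n
    # Highest count first; ties broken by id so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data with made-up identifiers.
toy_references = {
    "paperX": ["doc1", "doc2", "other"],
    "paperY": ["doc2"],
    "paperZ": ["other"],
}
toy_ranking = rank_by_citation_count(["doc1", "doc2"], toy_references)
```

With the toy data, paperX cites two inner documents and paperY one, while paperZ cites none and is dropped from the result.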

Ideally we should go one step further and consider implementing a hybrid score controlled by an optional parameter, as we have done for the reviews() operator.


The optional parameter (let's call it textWeightRatio) would control how much weight is given to the scores coming from the documents retrieved by the inner query, so that we can compute a final score for each citing paper j this way:

final_score(j) = SUM (1 + textWeightRatio * innerScore(i) / maxInnerScore)

where innerScore(i) is the relevance score computed for document i, which matches the inner query, and SUM is computed over all documents i in the inner set that are cited by j. maxInnerScore is the highest score from the inner query. When textWeightRatio is 0 (the default), the final score is simply the number of citations document j has to the documents selected by the inner query.
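
The formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dict-based inputs, and the guard for a non-positive maxInnerScore are my own choices, not part of the proposal or of the existing implementation.

```python
def hybrid_citation_scores(inner_scores, references, text_weight_ratio=0.0):
    """Compute final_score(j) for every citing paper j.

    inner_scores:      dict mapping inner-query document id -> relevance score
    references:        dict mapping citing paper id -> ids it cites
    text_weight_ratio: weight given to the normalized inner score (0 = pure count)
    """
    if not inner_scores:
        return {}
    max_inner = max(inner_scores.values())
    final = {}
    for citing, cited in references.items():
        total = 0.0
        for doc in set(cited):  # each cited document contributes once
            if doc not in inner_scores:
                continue
            # Assumption: if max_inner is not positive, skip the text boost.
            boost = (
                text_weight_ratio * inner_scores[doc] / max_inner
                if max_inner > 0
                else 0.0
            )
            total += 1.0 + boost
        if total > 0:
            final[citing] = total
    return final


# Toy data with made-up identifiers and scores.
inner = {"A": 4.0, "B": 2.0}
refs = {"X": ["A", "B", "Z"], "Y": ["B"]}
count_only = hybrid_citation_scores(inner, refs)     # textWeightRatio = 0
hybrid = hybrid_citation_scores(inner, refs, 1.0)    # textWeightRatio = 1
```

With textWeightRatio = 0 the toy scores reduce to the plain citation counts (X: 2, Y: 1). With textWeightRatio = 1, X scores (1 + 4/4) + (1 + 2/4) = 3.5 and Y scores 1 + 2/4 = 1.5.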
