Number of rows returned from query limited to 2000? #39

Open
jmangum opened this issue Jun 11, 2018 · 4 comments

Comments

jmangum commented Jun 11, 2018

Hello,

I am trying to extract citation statistics for various journals by running two queries looped over a range in years:

for yr in yearlist:
    articles = list(ads.SearchQuery(q="(year:"+yr+" bibstem:"+journal+" AND citation_count:[0 TO 999990]) +property:refereed -title:erratum",fl=fllist,rows=3000))
    zeroarticles = list(ads.SearchQuery(q="(year:"+yr+" bibstem:"+journal+" AND citation_count:[0 TO 0]) +property:refereed -title:erratum",fl=fllist,rows=3000))
    ...

I have found that
(1) If I do not set the rows parameter, I get a maximum of 50 results.
(2) If I set rows to 2000, I get at most 2000 results.
(3) If I set rows to a number larger than 2000, I get a maximum of 2000 results.
(4) It does not seem to matter whether I set rows to an int or a string in the SearchQuery call.

I need to be able to return more than 2000 results, or to work around this limit by running more queries over smaller time ranges (which might cause me to approach my query limit). Is there a reason for the rows=2000 upper limit? If not, can it be increased? Thanks.

-- Jeff
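
Editorial sketch: the two queries in the loop above differ only in the citation_count range, so the query string can be produced by one small helper. The name build_query and its parameters are hypothetical and not part of the ads package; the helper only reproduces the string concatenation shown in the post.

```python
def build_query(year, journal, min_cites, max_cites):
    """Return the query string used in the post for one year and journal.

    build_query is a hypothetical helper, not part of the ads package.
    """
    return (
        "(year:{yr} bibstem:{jn} AND citation_count:[{lo} TO {hi}]) "
        "+property:refereed -title:erratum"
    ).format(yr=year, jn=journal, lo=min_cites, hi=max_cites)


# All refereed articles, and the subset with zero citations.
all_query = build_query("2015", "ApJ", 0, 999990)
zero_query = build_query("2015", "ApJ", 0, 0)
```

Each string could then be passed as the q argument of ads.SearchQuery, as in the original loop. The year "2015" and journal "ApJ" are placeholder values.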

ghost commented Jun 11, 2018 via email

jmangum commented Jun 12, 2018

Thanks for the response. I tried setting start+=2000, only to get a syntax error:

articles1 = list(ads.SearchQuery(q="(year:"+yr+" bibstem:"+journal+" AND citation_count:[0 TO 999990]) +property:refereed -title:erratum",fl=fllist,start+=2000,rows=2000))
                                                                                                                                                          ^

SyntaxError: invalid syntax

In fact, setting start to 2000 for the second pass through the search results in an index out of range error:

---> 37 articles1 = list(ads.SearchQuery(q="(year:"+yr+" bibstem:"+journal+" AND citation_count:[0 TO 999990]) +property:refereed -title:erratum",fl=fllist,start=2000,rows=2000))

/Users/jmangum/anaconda/lib/python2.7/site-packages/ads/search.pyc in next(self)
    490
    491     def next(self):
--> 492         return self.__next__()
    493
    494     def __next__(self):

/Users/jmangum/anaconda/lib/python2.7/site-packages/ads/search.pyc in __next__(self)
    519             # extended .articles array.
    520             self.execute()
--> 521             cur = self._articles[self.__iter_counter]
    522
    523         self.__iter_counter += 1

IndexError: list index out of range

-- Jeff
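
Editorial sketch: the SyntaxError reported in this comment comes from Python itself rather than from the ads package. A keyword argument must have the form name=value, whereas start+=2000 is an augmented assignment, which is a statement and cannot appear inside a call. The offset therefore has to be advanced on its own line between calls. The function fake_search below is a hypothetical stand-in used only to show the pattern; it does not contact any service.

```python
def fake_search(start=0, rows=2000):
    """Hypothetical stand-in for a search call; returns the window requested."""
    return (start, rows)


rows = 2000
start = 0
windows = []
for _ in range(3):
    # Pass the current offset as an ordinary keyword argument.
    windows.append(fake_search(start=start, rows=rows))
    # Advance the offset as a separate statement.
    start += rows
```

After the loop, windows holds one (start, rows) pair per call, with start advanced by rows each time.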

@romanchyla

Hi Jeff, it should be start=2000, which you got right. That error comes from the ads package (@andycasey), where the iterator is probably not consulting numFound. I'm not familiar with the details of that code, but basically it either needs to fetch new results behind the scenes (start=current+rows&rows=2000) or exit (stop iteration).

I think you should update your ads package; the code at https://github.com/andycasey/ads/blob/master/ads/search.py#L498 seems right to me

If the problem persists, please create an issue with the ads package; possibly the problem is here: https://github.com/andycasey/ads/blob/master/ads/search.py#L547 (you specified the start parameter and the package may not be expecting it, but I only looked briefly).
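
Editorial sketch: the behaviour described in this comment (fetch the next window, or stop once numFound is reached) can be written as a plain paging loop. This is a minimal illustration under stated assumptions, not the ads package's implementation. fetch_page is a hypothetical callable that takes (start, rows) and returns (docs, num_found), and make_fake_backend is a hypothetical in-memory stand-in that caps each response at 2000 rows, as observed earlier in the thread.

```python
def fetch_all(fetch_page, rows=2000):
    """Collect every result by requesting successive windows.

    fetch_page(start, rows) is a hypothetical callable that returns
    (docs, num_found); it is not an ads or ADS API function.
    """
    results = []
    start = 0
    while True:
        docs, num_found = fetch_page(start, rows)
        results.extend(docs)
        # Advance by what was actually returned, in case the server
        # caps rows below the requested value.
        start += len(docs)
        # Stop when everything is fetched or the server returns nothing.
        if not docs or start >= num_found:
            break
    return results


def make_fake_backend(total, cap=2000):
    """Build a fake fetch_page over `total` integer 'documents'."""
    def fetch_page(start, rows):
        rows = min(rows, cap)  # mimic the 2000-row ceiling
        docs = list(range(start, min(start + rows, total)))
        return docs, total
    return fetch_page


everything = fetch_all(make_fake_backend(4500))
```

With a total of 4500 fake documents, the loop issues three requests (offsets 0, 2000, and 4000). Advancing start by len(docs) rather than by rows keeps the loop correct even when more than 2000 rows are requested.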

jmangum commented Jun 12, 2018

Thanks, Roman. I believe I have the latest update (it is dated March 27, 2017):

torgo:Stats jmangum$ python -c "import ads; print(ads.__version__)"
0.12.3

I will create an issue with the ads package. Thanks again!

-- Jeff
