Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it easier to change Gaia row limit #2184

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

eerovaher
Copy link
Member

@eerovaher eerovaher commented Oct 20, 2021

The first commit of this pull request adds the row_limit argument to the query_object and cone_search families of functions in the Gaia module. It also ensures that the row limit can be changed through the configuration system at runtime.

The second commit makes the previously mentioned functions emit a UserWarning MaxResultsWarning if the number of rows in the query result matches the row limit.

Closes #1760

@codecov
Copy link

codecov bot commented Oct 20, 2021

Codecov Report

Merging #2184 (a2f0ee9) into main (7de9084) will increase coverage by 0.05%.
The diff coverage is 79.16%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2184      +/-   ##
==========================================
+ Coverage   66.10%   66.15%   +0.05%     
==========================================
  Files         418      418              
  Lines       28167    28201      +34     
==========================================
+ Hits        18619    18657      +38     
+ Misses       9548     9544       -4     
Impacted Files Coverage Δ
astroquery/gaia/tests/test_gaia_remote.py 26.31% <16.66%> (+2.78%) ⬆️
astroquery/gaia/core.py 62.80% <100.00%> (+1.35%) ⬆️
astroquery/gaia/tests/test_gaiatap.py 99.67% <100.00%> (+0.02%) ⬆️
astroquery/mast/collections.py 92.10% <0.00%> (ø)
astroquery/utils/tap/model/job.py 58.79% <0.00%> (+0.42%) ⬆️
astroquery/utils/tap/core.py 42.00% <0.00%> (+0.48%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7de9084...a2f0ee9. Read the comment docs.

Copy link
Contributor

@keflavich keflavich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Sidenote for maintainers: that the fact you had to make the same changes (add the same warnings) in multiple places indicates that the code should be refactored to use common functions for data retrieval within __cone_search and __query_object.

@bsipocz
Copy link
Member

bsipocz commented Oct 22, 2021

@keflavich - I run into these same duplications while reviewing #2140, too. Do you recall any reasons why the async_to_sync decorator wasn't used?

@bsipocz bsipocz added the gaia label Oct 22, 2021
Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning needs to be removed, and I would also like to reach a conscious decisions for the preferred API before going ahead and merging this, while we have different approaches in other modules.

The rest of the comments nitpicks.

dump_to_file=dump_to_file,
background=background)
result = self.launch_job_async(query=query,
output_file=output_file,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix indents (no need to have them in new lines, I would welcome a sensible reformat)

@@ -41,7 +43,7 @@ class GaiaClass(TapPlus):
MAIN_GAIA_TABLE = None
MAIN_GAIA_TABLE_RA = conf.MAIN_GAIA_TABLE_RA
MAIN_GAIA_TABLE_DEC = conf.MAIN_GAIA_TABLE_DEC
ROW_LIMIT = conf.ROW_LIMIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keflavich - so we decided to not go with the vizier approach where the default is set in the config (but can be overridden with a class init kwarg?) I don't use this module, so don't necessarily see which approach is more practical, but firmly think that we should converge toward one API over the different modules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to comment on this, but if you keep reading - the API is the same, it's only the implementation that's different. This is used below as row_limit = row_limit or self.ROW_LIMIT or conf.ROW_LIMIT, which effectively does the same thing - the only difference is that class-level changes (as opposed to instance-level) will be different. i.e.:

Same behavior:

from astroquery.gaia import Gaia
Gaia.ROW_LIMIT = 1
g2 = Gaia()
g2.ROW_LIMIT # this is the default (None), not inherited from Gaia()

Different behavior:

from astroquery.gaia.core import GaiaClass
GaiaClass.ROW_LIMIT = 1
gg = GaiaClass()
gg.ROW_LIMIT = 2
gg2 = GaiaClass()
gg2.ROW_LIMIT # this is 1, inherited from the default set in GaiaClass

Perhaps for consistency only, we should enforce the old behavior? From a user perspective, this is very unlikely to come up, but it will certainly be confusing if it does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer it if these options were set in __init__(), but the purpose of this pull request is to resolve #1760 with minimal changes to the API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eerovaher, this is definitely an improvement either way. An API change is acceptable, though - if you want to modify this to put row_limit=conf.row_limit in __init__, like in the Vizier module, that would be fine. @bsipocz I'd like your input here, but I think the Vizier API is commonly used and has resulted in few problems, so if we change the API here to be more like Vizier, it will be best?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying the configuration options through __init__() deserves a separate pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying the configuration options through init() deserves a separate pull request.

Yes, indeed. But if it's handled though __init__, then there is no need for the kwarg. So converging to a preferred API is certainly in the realm of this PR, and thus I would like to think about this first than rush through a merge and then change the API later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the row_limit keyword arguments was discussed in #2153 (comment), #2153 (comment) and #2153 (comment).

return job.get_results()
table = job.get_results()
if len(table) == row_limit:
warn('The number of rows in the result matches the current row '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this approach is right, definitely not when verbose is set to False. But even then, I would assume the user set the limit to a certain number rather than unlimited then they expect that the return may be trunckrated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this should be disabled with verbose=False. We don't warn users when using other modules that their results are truncated - though I think it would not be a bad idea to do so!

Copy link
Member Author

@eerovaher eerovaher Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the user has not specified a row limit, a default row limit from the configuration still gets applied, and this seems to be causing problems for confusion among the users. #1760 isn't the only issue about this, there's also #638, and there might be more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eerovaher. @bsipocz , I do think the several issues reported with surprise at getting truncated results - and my own experience forgetting that ROW_LIMIT is the reason there are fewer sources on a given patch of sky - suggests that we can default to having a more verbose response when the results are truncated by a user parameter. It constitutes a minor UX change, but I think a positive one. So, I'm 👍 for keeping this change - but we do want to make sure to clarify how to silence the warning, since excess warnings are one of the main annoyances users (including me) complain about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option worth considering is to emit a warning only if the row limit value used was taken from conf, so if the user has specified either the row_limit keyword argument or the class attribute then there will be no warning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few pull requests seem to have addressed the complaints about the gaia module being too verbose, so I'm not sure that is a problem anymore.

Copy link
Member

@bsipocz bsipocz Oct 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a warning, that is independent of verbose increases that verbosity that is now kind of fixed, and what people were complaining about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the gaia module used to be too verbose in general, these warnings in particular seem to be requested by the users. But if you think it is better to only emit them if verbose=True then I'll update the pull request accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these warnings in particular seem to be requested by the users

I don't see any issues that ask for the warning. The issue mentioned above asks for better documentation. OTOH there are a lot of issues that ask for better message handling that makes logging better for automated pipelines and that remove the littering of stdout, and historically, too, we've been asked a lot of times to get rid of the diagnostic prints etc even though they are a preferred way of coding of a lot of people in their own scripts.
So, yes, I refer back again in my first comment above that this warning should not be raised when verbose=False.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the pull request so that a MaxResultsWarning is not emitted if verbose=False.

#638 does ask for such warnings, but at least in gaia module enabling them is now simple and also documented.

astroquery/gaia/core.py Show resolved Hide resolved
@eerovaher
Copy link
Member Author

Do you recall any reasons why the async_to_sync decorator wasn't used?

There already is an issue about this: #2100.

It is now possible to specify the row limit of the `query_object` and
`cone_search` families of functions with a keyword argument. It is now
also possible to change the row limit through the `conf.ROW_LIMIT`
configuration item at runtime.
The `query_object` and `cone_search` families of functions in the Gaia
module now emit a `MaxResultsWarning` if the number of rows in the query
result matches the row limit.
@eerovaher
Copy link
Member Author

I noticed that astroquery defines a MaxResultsWarning:

class MaxResultsWarning(AstropyWarning):
"""
Astroquery warning class to be issued when the maximum allowed
results are returned.
"""

This is more appropriate for letting the users know their row limit might be too small, so I've updated the pull request.

The `query_object` and `cone_search` families of functions in
`astroquery.gaia` no longer emit a `MaxResultsWarning` unless they were
called with `verbose=True`.
@keflavich
Copy link
Contributor

@bsipocz I suggest we merge this as an improvement but keep open issues about ROW_LIMIT API elsewhere (e.g., #2198, #638)

@bsipocz bsipocz added this to the v0.4.7 milestone Mar 19, 2022
@jespinosaar
Copy link
Contributor

Dear Astroquery team,

Could you please put in cc @esdc-esac-esa-int in the pull requests where ESA code is modified? Thanks!

@bsipocz
Copy link
Member

bsipocz commented Aug 7, 2023

@eerovaher - could you please rebase, so then this can be reviewed and eventually merged in?

@bsipocz bsipocz removed this from the v0.4.7 milestone Feb 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GaiaClass.ROW_LIMIT prevents effective searching
4 participants