Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add NEOCC python interface to astroquery library #2254

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

D-arioSpace
Copy link

This is a python interface to data that the Near Earth Objects Coordination Centre (NEOCC) makes available through its API service (https://neo.ssa.esa.int/computer-access).

@pep8speaks
Copy link

pep8speaks commented Dec 22, 2021

Hello @D-arioSpace! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-02-07 15:54:47 UTC

Copy link
Member

@ceb8 ceb8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! This looks like a great new addition to astroquery!

Before I do a more thorough review, there are a couple of important changes that need to be implemented. You are using two packages: pandas and parse, that are not part of the astroquery dependency list (https://astroquery.readthedocs.io/en/latest/#requirements). We ask that new modules not depend on packages not already required, except in limited circumstances. Looking through your code I don't think you are doing anything that would be impossible without pandas or parse. I would be happy to help you rework your code to remove those dependencies, particularly figuring how to replace the pandas functionality with astropy.tables.

@D-arioSpace
Copy link
Author

Dear @ceb8, it's a pleasure to meet you.
Sorry for the delay in replying. Please take a look at this latest pull where we removed the dependency form the parse library.
Unfortunately we cannot remove the pandas dependency because we do not have the necessary resources to implement the changes, it requires too much effort. Hopefully this may be one of these "limited circumstances" you speak of. In this case it's not a question of functionality (apparently there is nothing that astropy.tables can't do and panda does, but I can't be 100% sure either). It is just a matter of our internal resources that we do not have at the moment.

I perfectly understand that this can introduce dependencies and extra-feature that you would like to avoid. Please take also into account that, apart from the lack of resources, there are also other considerations that push us to this choice:

  • there are users of some institutions who already use the library. We would like to encourage these users to use the library through astroquery (instead of the stand-alone version we provided) but with as little effort as possible for them. By using the library as it is now, they should just change their current import to from astroquery.esa.neocc import neocc in their code, and everything else will be transparent to them.
  • all users who now use the library already have pandas pre-installed. For those who don't have it or start with a very basic version of python, the installation can easily be done separately.

Taking all this into consideration, please let us know if we can focus on the integration into astroquery without considering the pandas replacement.

@ceb8
Copy link
Member

ceb8 commented Mar 15, 2022

Hi @D-arioSpace,

Unfortunately we really cannot allow a pandas dependency into our codebase, returning Astropy Tables to users is a very basic requirement in our API specs. However, we can provide the labor to make that change. If this course of action is acceptable, I will implement those changes within this PR.

There is one other issue that needs to be resolved before we proceed, which is the matter of the copyright notice you have on your code:

© Copyright [European Space Agency][2022]
All rights reserved

Astroquery is released under a 3-clause BSD style license (see https://astroquery.readthedocs.io/en/latest/license.html), and individual modules cannot be released under different licenses. Would it be OK to remove this notice? We have other modules contributed by ESA developers, so I believe the 3-clause BSD license should be acceptable from that standpoint.

Let me know what you think. If I get your OK I will, implement the necessary changes and then ping you so you can review my work.

Thanks for your patience.

@D-arioSpace
Copy link
Author

Hello @ceb8!

First of all, sorry for the delay in replying, unfortunately we had to resolve some legal issues due to the change of copyright requested.

From now on, you can change the license and insert a 3-clause BSD style license anywhere in the code.

Also, I would like to point out that we carried out (within this fork) another push containing the implementation of the new API provided by NEOCC web portal (https://neo.ssa.esa.int/PSDB-portlet/download?file=past_impactors_list), this time using directly astropy.table (see parse_impacted function).

Thanks again for your help. We are looking forward to see this library integrated into astroquery!

@ceb8
Copy link
Member

ceb8 commented Sep 8, 2022

Hi @D-arioSpace!

Thanks for your response, and no worries about the delay. That's very good news about the copyright issue.

I am a bit confused about the new implementation you mention however, since I still see pandas imported all over. Do you mean this to be an example implementation for me? If so, thank you very much and I will get started on the work shortly. Just want to make sure not to step on your toes.

@D-arioSpace
Copy link
Author

Hello @ceb8

you're right, it is just an example, which we hope to have implemented well (we did this to become familiar with the astropy.table data structure, taking advantage of the fact that this specific function had to be implemented almost from scratch). Feel free to improve it.
And yes, pandas is still in all other functions, which need to be converted.
Thanks again for your help. :)

Dario

@ceb8
Copy link
Member

ceb8 commented Sep 9, 2022

@D-arioSpace Excellent, I am in this middle of one coding project, so I will start this one as soon as that one wraps up.

@ceb8
Copy link
Member

ceb8 commented Nov 8, 2022

@D-arioSpace I'm pushing the de-pandafied lists.py functions so that you can go ahead and take a look at them while I am working on the tables.

@D-arioSpace
Copy link
Author

D-arioSpace commented Nov 28, 2022

Hi @ceb8! I had a look to the code and it looks fine. I also tested the main functionality within the list.py module and all of them are working!
I would just remove the extra white spaces between the asteroid's number and its name (1) for all lists. I also saw that in the two 'priority' lists the time is in isot format, while in all the others it is iso (2). Both comments are not so relevant, but in case you think it's useful to implement them, I'll leave them to you.

I leave below the steps to reproduce the behavior I mentioned:
(1)

ast = neocc.query_list(list_name='nea_list')
ast[1]
<Row index=1>
      NEA       
     str27      
----------------
719       Albert

(2)

out1 = neocc.query_list(list_name='priority_list')
out1[0][7]
<Time object: scale='utc' format='isot' value=2022-11-30T00:00:00.000>
out2 = neocc.query_list(list_name='risk_list')
out2[0][3]
<Time object: scale='utc' format='iso' value=2056-12-12 21:39:00.000>

@D-arioSpace
Copy link
Author

Hi @ceb8 just to inform you I added a very small fix to tabs.py and to the 433.clolin file as the latter changed and the former has been adapted to that change

@ceb8
Copy link
Member

ceb8 commented Jan 27, 2023

@D-arioSpace I’ve fully removed pandas from this module. In doing so I also tried to regularise the output formats, so now instead of query-specific objects the outputs are all either a single astropy table or a list of them.

I haven’t updated the documentation or the tests, so this is still very much a work in progress. But I wanted to make sure we are all on the same page in terms of the functionality before working on that.

I have a couple of big picture questions that @bsipocz and @keflavich may want to also weigh in on:

  • query_object sometimes returns a single Astropy table, and sometimes a list of them, would it be better to separate out the list-returning queries into a separate function? (or alternately always return a list, but sometimes the list is only of length 1)
  • Two table types for query_object (orbit properties and ephemeradies) require additional arguments, might it make sense to split these queries out into seperate functions?
  • This is mainly for @bsipocz and @keflavich, this module does not use the async_to_sync paradigm. It would not be too hard to update it do that, but it seems like we are moving away from that structure, what do you think about changing it vs leaving as is?
  • Generally how do you feel about the functionality, and output formats? I put all the non-tabular information in the meta property of the table(s):
impacts_table = neocc.query_object(name='1979XB', tab='impacts')
pd_tble.meta
  • A lot of the datetimes produce dubious year warnings. I looked up why, and it seems basically unavoidable with dates too far in the past/future using UTC, what are your opinions on how to handle this?

@D-arioSpace
Copy link
Author

D-arioSpace commented Apr 14, 2023

Hi @ceb8 sorry for the big delay in answering and many thanks for the update of this interface!

Regarding your big picture questions, I personally don't have any issues with the current behavior of query_object returning either a single table or a list of tables, but it would be good to get feedback from @bsipocz and @keflavich as well, since they are the experts in this area.

Regarding the two table types that require additional arguments, I'm okay with leaving them as is for now, but if anyone else has strong opinions on this, we can revisit it later.

In terms of functionality and output formats, I don't have any major issues with them. However, I did notice an issue with error handling when calling query_object with an invalid tab name, for example:
temp = neocc.query_object(name='1979XB', tab='s')
it returns me a non-clean message like:
KeyError: 'impactsPlease introduce a valid tab name. valid tabs names are: , close_approachesPlease introduce a valid tab name. valid tabs names are: , observationsPlease introduce a valid tab name. valid tabs names are: , physical_propertiesPlease introduce a valid tab name. valid tabs names are: , orbit_propertiesPlease introduce a valid tab name. valid tabs names are: , ephemeridesPlease introduce a valid tab name. valid tabs names are: , summary'.

The error message that is returned is not very clean and includes multiple repeated messages.

Additionally, I encountered an error when trying to retrieve observations for object "433" (i.e. neocc.query_object(name='433', tab='observations') ). It looks like there may be a bug in the code related to radar observations and the "AsteroidObservation" class, in the “_get_rad_into” method. It would be helpful to investigate this further and see if we can resolve the issue (please let me know if there's anything I can do to help. I anticipate having more time available in the near future to dedicate to this project, so please don't hesitate to reach out to me if there's anything I can assist with.).

Overall, thanks for the updates! Let me know if there's anything I can do to help.

@D-arioSpace
Copy link
Author

Hi @ceb8 I was wondering if the work is progressing smoothly or if I could assist you in clarifying any doubts or concerns you might have.

@ceb8
Copy link
Member

ceb8 commented Jun 30, 2023

@D-arioSpace Thanks for checking in, I'm just getting back to this after a busy period. Seems pretty straightforward. I'll leave things as they are for now, fix the bugs, and move forward updating the testing to match the new functionality.

@D-arioSpace
Copy link
Author

Thank you very much @ceb8, I confirm my availability to clarify any doubts you may have about files and functions. I hope to hear from you soon then! Thanks again.

@D-arioSpace
Copy link
Author

Hi @ceb8 sorry for bothering you again. I'm working with NEOCC data for some projects and I was wondering if you believe it would be possible to have the code integrated into astroquery by September. This integration would greatly benefit these other projects, as they would be able to utilise the astroquery package directly instead of relying on the "standalone" version for NEOCC data.

Thank you again for your time and effort!

@ceb8
Copy link
Member

ceb8 commented Jul 7, 2023

@D-arioSpace That seems like a reasonable timeframe.

@ceb8
Copy link
Member

ceb8 commented Jul 21, 2023

@bsipocz @keflavich Could I get your opinions on the big picture questions I mentioned above?

@ceb8
Copy link
Member

ceb8 commented Jul 21, 2023

@D-arioSpace Looking into the neocc.query_object(name='433', tab='observations') error, and I can't reproduce it. I get a table that looks superficially fine to me as output, can you describe what you're seeing?

@keflavich
Copy link
Contributor

query_object sometimes returns a single Astropy table, and sometimes a list of them, would it be better to separate out the list-returning queries into a separate function? (or alternately always return a list, but sometimes the list is only of length 1)

Yes, that's a good idea if there are two distinct routes that both make sense. For some services, like Vizier, queries are supposed to return lists of catalogs, so it just makes sense that that's the return.

Two table types for query_object (orbit properties and ephemeradies) require additional arguments, might it make sense to split these queries out into seperate functions?

No strong opinion here.

This is mainly for @bsipocz and @keflavich, this module does not use the async_to_sync paradigm. It would not be too hard to update it do that, but it seems like we are moving away from that structure, what do you think about changing it vs leaving as is?

I'm fine with the approach that is less work now. async_to_sync has proved useful in some debug cases, but I think the asynchronous queries have not seen widespread adoption and do not have particularly broad use cases. Since we don't really document how to use those, no one is using them.

Generally how do you feel about the functionality, and output formats? I put all the non-tabular information in the meta property of the table(s):

At a glance, that seems fine to me, but I'm not a NEOCC user - it would be better to find people using the data and get their opinions. From a structural perspective, this seems fine.

A lot of the datetimes produce dubious year warnings. I looked up why, and it seems basically unavoidable with dates too far in the past/future using UTC, what are your opinions on how to handle this?

Maybe we should silence those warnings in this context, since this is a context where far past/future dates are expected.

@ceb8
Copy link
Member

ceb8 commented Jul 21, 2023

Thanks @keflavich! To your first point, I can't tell which paradigm you are advocating. Always returning a list?

@keflavich
Copy link
Contributor

I'm ambivalent, actually - always returning a list is right if we're sticking with one function, but I'm also OK with splitting into two functions if there is an understandable difference between two approaches. I would avoid having one function return lists sometimes and tables other times.

@D-arioSpace
Copy link
Author

Hello @ceb8 ,
So, I have performed other tests, and most of them are okay.
However, I noticed some errors in some calls (but not in the one I mentioned in the previous comment) specifically the call:

zzz = neocc.query_object(name='2023DW', tab='close_approaches')

returns me an error. I investigated it a bit, and it is due to an extra line NEOCC added at the beginning of the close approach file of each asteroid. To solve it, you can tell the program to skip the first line in the pd.read_csv, in class CloseApproaches in the static method clo_appr_parser(data_obj):

            df_close_appr = pd.read_csv(df_impacts_d,
                                        delim_whitespace=True, **skiprows=1**)

Additionally, we have a similar issue with the call:

zzz = neocc.query_object(name='433 Eros', tab='orbit_properties', orbital_elements="keplerian", orbit_epoch="middle");

which returns me the following error:

zzz = neocc.query_object(name='433 Eros', tab='orbit_properties', orbital_elements="keplerian", orbit_epoch="middle");
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dario/dev/astroqyery_intergation/astroquery/astroquery/esa/neocc/core.py", line 477, in query_object
    neocc_obj = tabs.parse_orbital_properties(resp_str)
  File "/home/dario/dev/astroqyery_intergation/astroquery/astroquery/esa/neocc/tabs.py", line 655, in parse_orbital_properties
    section = re.search("(\w+ parameters)", body_lst[3]).groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'

This time I'm not sure where the error come from, I'm still investigating it.

Furthermore, we have recently incorporated an additional column into the close approach lists (both the recent and upcoming lists). This new column, a float value, indicates the frequency of a specific event based on the miss distance from Earth, the size of the asteroid, and its relative velocity. It is important to note that both calls still function properly, so if you believe it is worthwhile to include this information, feel free to do so.

@bsipocz
Copy link
Member

bsipocz commented Aug 30, 2023

@ceb8 - There is a lot of commits here, I would even guess duplicates, certainly duplicated commit messages, so I wonder whether you could cleanup the PR, rebase and squash things out a bit?

Also, I wonder whether the total diff of ~150k is necessary, I assume it's due to bundled data files, but should they be this many/big?

@codecov
Copy link

codecov bot commented Sep 7, 2023

Codecov Report

Attention: 215 lines in your changes are missing coverage. Please review.

Comparison is base (ab7fdaa) 66.77% compared to head (30e854f) 67.77%.

Files Patch % Lines
astroquery/esa/neocc/test/test_neocc_remote.py 18.89% 176 Missing ⚠️
astroquery/esa/neocc/tabs.py 90.40% 31 Missing ⚠️
astroquery/esa/neocc/core.py 92.00% 4 Missing ⚠️
astroquery/esa/neocc/test/setup_package.py 50.00% 2 Missing ⚠️
astroquery/esa/neocc/test/test_neocc.py 99.62% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2254      +/-   ##
==========================================
+ Coverage   66.77%   67.77%   +0.99%     
==========================================
  Files         237      246       +9     
  Lines       18303    19537    +1234     
==========================================
+ Hits        12222    13241    +1019     
- Misses       6081     6296     +215     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bsipocz
Copy link
Member

bsipocz commented Oct 18, 2023

I don't see why you say it's an alfalfa issue, to me it looks more to be a generic rtd problem and therefore I restarted the build. If it keeps failing I would suggest trying to sort out the problem with local docs builds, it's just easier to debug than waiting on RTD in CI.

The windows failure is due to windows having int32 as default rather than int64. So either the code needs to be explicit to use int64 for int columns (if you have long int IDs it may be necessary, as I recall e.g. gaia or SDSS IDs doesn't fit into the 32bit), or the test should be more generic and check for int rather than int64.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall things look to be in a very good shape, thank you both of you for the hard work.

I've left some generic comments, and will do one more careful check when the docs builds and windows issues are sorted out.

setup.cfg Outdated Show resolved Hide resolved
docs/esa/neocc.rst Outdated Show resolved Hide resolved
docs/esa/neocc.rst Outdated Show resolved Hide resolved
docs/esa/neocc.rst Outdated Show resolved Hide resolved
astroquery/solarsystem/neocc/__init__.py Outdated Show resolved Hide resolved
astroquery/esa/neocc/core.py Outdated Show resolved Hide resolved
astroquery/esa/neocc/core.py Outdated Show resolved Hide resolved
astroquery/esa/neocc/core.py Outdated Show resolved Hide resolved
astroquery/esa/neocc/core.py Outdated Show resolved Hide resolved
@D-arioSpace
Copy link
Author

Thank you @bsipocz @ceb8 for your effort! I briefly tried all the features of the library, and they all seem well implemented. In these days, I will invite some of my colleagues to play with this fork and see if we can identify any errors before integration. Regarding the warnings astropy Time, I tend for keeping them because in my opinion it's better to make the user aware of the inherited warnings from the library rather than silencing them.
What are the next steps now?

@ceb8 ceb8 force-pushed the main branch 2 times, most recently from 0de16ec to 2faac54 Compare October 25, 2023 22:35
@ceb8
Copy link
Member

ceb8 commented Oct 25, 2023

@D-arioSpace I've got all the tests passing and addressed the ones of @bsipocz's comments that I can. All that needs to happen now is that you address the outstanding comments, and anything else that comes up when @bsipocz does a more careful check.

@Galaksio
Copy link

Hi everyone, my name is Alvaro and I was involved in the development of this functionality. I will try to answer the remaining questions.

@D-arioSpace
Copy link
Author

Hello, thank you @Galaksio for clarifying the points. From a functional point of view, the library seems okay. I wonder, @ceb8 and @bsipocz if any input is needed from our side and what is the plan (basically to warn people who now use the standalone version to switch to the one integrated into astroquery).

@ceb8
Copy link
Member

ceb8 commented Feb 6, 2024

Rebased and addressed @bsipocz's comments.

@ceb8 ceb8 force-pushed the main branch 2 times, most recently from 5fc2b0b to d3a087a Compare February 7, 2024 15:48
@D-arioSpace
Copy link
Author

Hi @ceb8 I just had a look at the changes you introduced. Thank you! I also see that in the last push, the tests all passed. At this point I just want to ask if there is anything else to do or if you know approximately when the package will be integrated, just to inform some users and collogues who are using the stand_alone version to switch to the astroquery integrated version. Thank you again for your effort :)

@THoffmann281
Copy link

THoffmann281 commented Mar 8, 2024

@D-arioSpace thank you for the work!
I just would like to express my interest for a release version of the NEOCC python interface in the astroquery library as I am currently using it for some research work on NEOs. I hope that you and the reviewers @bsipocz @Galaksio can find a final solution anytime soon. It would be really nice to have the changes approved and merge it into astroquery, so there is a release that I could cite as reference! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants