GSoC 2024 Harsh Shah

Nabil Freij edited this page Apr 10, 2024 · 1 revision

OPEN ASTRONOMY

Sub-org: Sunpy

Project: QUERYING SOAR METADATA

Mentors: Nabobalis, Hayesla, Ebuchlin
By Harsh Shah

About Me

Name: Harsh Shah
GitHub username: Ryuusama09
College: Dwarkadas J Sanghvi College of Engineering
Degree and Year: B-tech in Computer Science and Engineering, 4th year
Email: shahh8138@gmail.com
Country: India
Timezone: Indian Standard Time (UTC +05:30)
Primary Language: English
LinkedIn Profile: Profile

GSoC experience So Far

GSoC gives me a platform to contribute back to a community that has done so much for student developers like me, by building marvelous open-source projects that make software development easier and more accessible.

Are you also applying to other projects?

I am not applying to any other projects this year.

Are you eligible to receive payments from Google?

Yes, I am fully eligible to receive payments from Google, and I am over 18.

How much time do you plan to invest in the project before, during, and after the Summer of Code?

I plan to spend ~18-20 hours per week on this GSoC project, but I am prepared to put in more hours if it turns out to be more difficult than anticipated. The only time I might be a little busy is the last weeks of May (due to my semester exams), but I don't expect that to interfere much with the Community Bonding Period. Apart from my exams, I have no other commitments that would hinder my dedication to this project.

Programming Experience

I have over two years of coding experience and am proficient in three languages: C++, Python, and JavaScript. I have used C++ extensively in competitive programming, winning over three coding challenges and qualifying four times for the ICPC Asia West regionals (often called the olympics of coding) from a field of over 8000 teams. I have used JavaScript and Python for server-side programming in many hackathons, including winning UST's D3code (India's largest hackathon, with over 12000 participants) and TIAA Global's T3 hacks. With JavaScript I have built Node.js apps such as remote-code-execution-engine, a microservice-based code execution engine that provides an isolated, containerized execution environment for C++ and Python files and can be used in online judges, and imgress, a one-stop platform to create and manage one-click reverse image search engine instances, with a testing utility and role-based access for users.
I have used Python to build several machine learning projects, such as instrument audio separation and cyclone prediction from a real-time satellite feed. I have also written Python scripts that connect a client to Azure GPT-3.5 in https://github.com/ryuusama09/soulsupport.ai

Contributions made to Sunpy

My journey with SunPy has been really exciting. I have worked on and resolved issues around attributes, maps, the world coordinate system, and more. I have had a total of 7 pull requests merged, which has taught me a lot about SunPy. The most significant merged pull requests are:

  1. https://github.com/sunpy/sunpy/pull/6708
  2. https://github.com/sunpy/sunpy/pull/6720
  3. https://github.com/sunpy/sunpy/pull/6726
  4. https://github.com/sunpy/sunpy/pull/6744
  5. https://github.com/sunpy/sunpy/pull/6732

WHAT MAKES ME SUITABLE FOR THE PROJECT (ALSO THE REASON TO CHOOSE IT)

In addition to the issues mentioned above, I have delved into the world of SunPy's Fido and its diverse clients. I actively participated in discussions and testing of complex feature requests, including issues 7474, 5661, and 7279. This experience gave me a comprehensive understanding of sunpy.net, its design principles, and its overall structure; I gained deep insight into how nested functions are called and what kind of response propagates through these clients.

Furthermore, I attempted to address an issue with a pull request, but its complexity necessitated a collaborative decision with the mentors. I also identified potential design flaws in the SunPy scraper, such as missed edge cases and certain ignored error responses; I raised an issue for this and am currently working on a solution. My most recent focus has been PR 7541 (merged as of now), which makes the scraper more robust against various failures, improves error debugging, and adds unit-test mocks.

By tackling these challenges, I have become very familiar with sunpy.net. This project will further my understanding of real-world Python package development and deployment, and given my experience, I am confident that I have the skills and knowledge to excel in it.

THE PROJECT

Abstract

sunpy-soar is a plugin for sunpy's Fido, the standard interface for querying different types of available metadata. The Solar Orbiter Archive (SOAR) is a rich source of solar data: it provides access to data from ESA's Solar Orbiter mission, which observes with 10 instruments (6 remote sensing and 4 in situ). However, SOAR's full potential remains untapped because sunpy-soar's metadata support is incomplete: it cannot query observational data by important attributes of the available data, such as location on the Sun, wavelength, or observation mode. This project aims to enrich sunpy-soar by adding support for a subset of SOAR's metadata, enabling more comprehensive data search, retrieval, and analysis. As a stretch goal, it will explore using astroquery's TAP interface for query construction to enhance efficiency and usability.

Approach

Currently, SOAR provides clients with a variety of tables to work with. There are four main types:

  • Product tables
  • FITS tables
  • In Situ Instruments tables
  • Auxiliary tables

sunpy-soar currently only supports the tables with basic information about existing data files, which follow the format

  1. V_<sc/ll>_data_item

The main task of this project is to add support for FITS tables. SOAR provides four primary types of FITS tables, with the general form represented by the string

  2. V_<instrument>_<ll/sc>_<fits/extension_fits>

The first major step will be to discuss the scope of the attributes that need to be added or translated, beyond those mentioned by Eric in his post. Additionally, I propose creating a separate attribute named "Table" in the soar/attrs.py file as a child class of SimpleAttr. If the user does not select this attribute, we can unify the responses of all tables (this is different from a join operation: here I mean merging the results of individual queries against each table). After adding the new attributes, we will need to modify SOAR client methods such as _can_handle_query to support them. We will determine the exact flow of these modifications as we progress.

There are three possible ways in which users can query.

  1. Default table: if a user doesn't specify a particular table, the system will automatically use v_sc_data_item as the default. This is the current behavior inherited from sunpy-soar.
  2. Single table: the user specifies exactly one table. This is a simple case without data inconsistency, since different tuples are not being returned from different tables. The query can be broken down into atomic queries as shown in the example.
instrument = a.Instrument('EUI')
time = a.Time('2021-02-01', '2021-02-02')
level = a.Level(2)
product_1 = a.soar.Product('EUI-FSI174-IMAGE')
product_2 = a.soar.Product('xyz')
table = a.soar.Table('V_sc_data_item')

Here the query could look like (time & level & (product_1 | product_2) & table). It would simply be broken down into two subqueries: <time & level & product_1 & table> and <time & level & product_2 & table>. Finally, the resulting tables would be stacked into a single astropy table.

  3. Multiple tables: the user queries more than one table. The plan here is to use join operations on the specific subset of attributes that the user queries. We can introduce a join operator for the attributes (written * below; this notation is just for the purposes of this proposal). For a normal attribute such as Time, an AND operation like (time_1 & time_2) doesn't make sense, so it isn't allowed. For the Table attribute, however, & operations do make sense, so those terms aren't split into subqueries by the attr walker and we can apply a table join easily. A strict rule to keep in mind: if more than one table is queried, a join operator must appear in the query. A join query in Fido would therefore look like:
instrument = a.Instrument('EUI')
time = a.Time('2021-02-01', '2021-02-02')
level = a.Level(2)
product_1 = a.soar.Product('EUI-FSI174-IMAGE')
product_2 = a.soar.Product('xyz')
table_1 = a.soar.Table('V_sc_data_item')
table_2 = a.soar.Table('V_ll_data_item')
Fido.search(time & level & *(product_1 | product_2) & table_1 & table_2)

One major change to note is a shift from hardcoded columns in the resultant table to a more dynamic approach that includes the attributes the user specifically queried. The TAP API is robust and comprehensive: it protects against bad requests and returns an error (HTTP 400) for invalid queries, so we won't have to implement query safety checks in the client. The async query rework could be done in three possible ways using the TAP async endpoint:

  1. The ideal approach would be to make all the functions in the nested call chain asynchronous, resulting in a fully async Fido client on top of the TAP async endpoint. This is ideal but requires more development effort.
  2. Alternatively, we could use the asyncio library and call asyncio.run(func) from a sync function. This is easier to execute and can serve as a baseline for testing the client's performance in async mode.
  3. A third way would be to use the astroquery library. Using astroquery's TapPlus, we can make async queries to SOAR. An added benefit is that it provides server-side caching, which can improve performance for repetitive queries.
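Option 2 can be sketched as below: a sync entry point (as Fido expects) drives an async coroutine via asyncio.run. Both fetch_soar and search are hypothetical names for illustration, and the coroutine's sleep stands in for the real TAP network round-trip:

```python
import asyncio


async def fetch_soar(adql_query):
    """Hypothetical coroutine standing in for an async TAP request."""
    await asyncio.sleep(0)  # placeholder for the real network round-trip
    return {"query": adql_query, "rows": []}


def search(adql_query):
    """Sync entry point, as Fido expects, driving the async call."""
    return asyncio.run(fetch_soar(adql_query))


result = search("SELECT TOP 5 * FROM v_sc_data_item")
print(result["query"])
```

Keeping the public API synchronous while running the I/O asynchronously under the hood is what makes this option a low-effort baseline compared with option 1.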

DEVELOPMENT TIMELINE

Community Bonding Period [ May 1 - May 26 ]

  • Meet with mentors and bond with the community.
  • Investigate and research the part of the project which requires the most work.
  • Familiarize myself with ADQL queries, in case anything needs to be researched.

Week 1 - 2 [ May 27- June 10 ]

  • Start adding the new attributes to sunpy-soar that lie within our scope.
  • Collect data on the different types of possible queries, which will help us plan for the various query types and the join operator.
  • Write tests and gallery examples as we proceed.

Week 3-4 [ June 11 - June 25 ]

  • Check that the code is compatible and works correctly with the newly added attributes.
  • Discuss the behavior of the newly added attributes with the mentors, take feedback, and make changes if needed.

Week 5 - 6 [ June 26 - July 10 ]

  • This time will be used to implement a valid join operator and to ensure that the attr walker still works normally after it is implemented.

Midterm Evaluation (11th July)

Week 7 [ July 12 - July 19 ]

  • Keep an additional buffer week to properly test and validate the changes to the walker, the join operations, and query formation.

Week 8-9 [ July 20 - Aug 3 ]

  • Continue testing and validating the changes to the walker, the join operations, and query formation.
  • Add more tests and gallery examples.

Week 10-11-12 [ Aug 4 - Aug 20 ]

  • Make changes to the search function, replacing the hardcoded resultant table with a modular one.
  • Check that the table prints properly and look for possible anomalies.
  • Implement the query rework to add async support using astroquery, and benchmark the results.
  • Write the final set of tests and gallery examples.

Final Evaluation (21st August)
