GSoC 2024 Harsh Shah
Mentors: Nabobalis, Hayesla, Ebuchlin
Name: Harsh Shah
Github username: Ryuusama09
College: Dwarkadas J Sanghvi College of Engineering
Degree and Year: B-tech in Computer Science and Engineering, 4th year
Email: shahh8138@gmail.com
Country: India
Timezone: Indian Standard Time (UTC +05:30)
Primary Language: English
LinkedIn Profile: Profile
GSoC gives me a platform to contribute back to the community that has done so much for student developers like me, by building marvelous open source projects that make software development easier and more accessible.
I'm not applying to any other projects this year.
Yes, I am fully eligible to receive payments from Google, and I am over 18.
I plan to spend ~18-20 hours per week on this GSoC project, and I am prepared to put in more hours if the project turns out to be more demanding than anticipated. The only time I might be busy is during the last weeks of May (due to my semester exams), but I don't expect this to interfere much with the Community Bonding Period. Apart from my exams, I have no other commitments that would hinder my dedication to this project.
I have over two years of coding experience and am proficient in three languages: C++, Python, and JavaScript. I have used C++ extensively in competitive programming, where I have won over 3 coding challenges and qualified for the ICPC Asia West Regionals (often called the Olympics of coding) 4 times, out of over 8000 teams. I have used JavaScript and Python for server-side programming in many hackathons, including winning UST's D3code (India's largest hackathon, with over 12000 participants) and TIAA Global's T3 hacks.
I have used JavaScript to build Node.js apps such as remote-code-execution-engine, a microservice-based code execution engine that provides an isolated, containerized execution environment for C++ and Python files and can be used in online judges. I have also used it in imgress, a one-stop platform to create and manage one-click reverse image search engine instances; it provides a testing utility as well as role-based access for users.
I have used Python to build several machine learning projects, such as instrument audio separation and cyclone prediction using a real-time satellite feed. Apart from this, I have written Python scripts that connect the client to Azure GPT-3.5 in https://github.com/ryuusama09/soulsupport.ai
My journey with SunPy has been really exciting. I have worked on and resolved issues around attributes, maps, the World Coordinate System, and more. I have had a total of 7 pull requests merged, which has taught me a lot about SunPy. The following list shows the significant ones.
Merged pull requests:
1. https://github.com/sunpy/sunpy/pull/6708
2. https://github.com/sunpy/sunpy/pull/6720
3. https://github.com/sunpy/sunpy/pull/6726
4. https://github.com/sunpy/sunpy/pull/6744
5. https://github.com/sunpy/sunpy/pull/6732
In addition to the issues mentioned above, I have delved into SunPy's Fido and its diverse set of clients. I actively participated in discussions and testing of complex feature requests, including issues 7474, 5661, and 7279. This experience gave me a comprehensive understanding of sunpy.net, its design principles, and its overall structure. I gained deep insight into how nested functions are called and what kind of response is propagated through these clients. Furthermore, I attempted to address an issue with a pull request, but its complexity necessitated a collaborative decision with mentors. I also identified potential design flaws in the SunPy scraper, such as missing edge cases and ignored error responses; I raised an issue for this and am currently working on a solution. By tackling these challenges, I have become thoroughly familiar with sunpy.net. My current focus is PR 7541 (merged as of now), which aims to make the scraper more robust against various failures, improve error debugging, and add unit test mocks. This work will further my understanding of real-world Python package development and deployment. Given this experience, I am confident that I possess the skills and knowledge to excel in this project.
sunpy-soar is a plugin for sunpy's Fido, the standard interface for querying different types of available metadata. The Solar Orbiter Archive (SOAR) is a rich source of solar data: it provides access to data from ESA's Solar Orbiter mission, which carries 10 instruments (6 remote sensing and 4 in-situ). However, its full potential remains untapped because sunpy-soar's metadata support is incomplete, so users cannot query for observational data based on important attributes of the available data (such as location on the Sun, wavelength, observation mode, etc.). This project aims to enrich sunpy-soar by adding support for a subset of SOAR's metadata, facilitating more comprehensive data search, retrieval, and analysis. Additionally, exploring the use of astroquery's TAP interface for query construction is a stretch goal to enhance efficiency and usability.
Currently, SOAR provides clients with a variety of tables to work with. There are four main types available:
- Product table
- FITS table
- In Situ Instruments table
- Auxiliary tables
sunpy-soar currently only supports tables with basic information about existing data files, following the format
V_<sc/ll>_data_item
The main task of this project is to add support for FITS tables. SOAR provides four primary types of FITS tables, with the general form represented by the string
V_<instrument>_<ll/sc>_<fits/extension_fits>
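The two naming patterns above can be made concrete with a small sketch. The helpers below are hypothetical (they are not part of sunpy-soar); the component values ("sc", "ll", "EUI") follow the proposal's examples rather than a confirmed SOAR API.

```python
# Hypothetical helpers illustrating the two SOAR table-name patterns.
# "sc" = science data, "ll" = low-latency data (per the proposal's examples).

def data_item_table(level_prefix):
    """Build a data-item table name, e.g. V_sc_data_item or V_ll_data_item."""
    assert level_prefix in ("sc", "ll")
    return f"V_{level_prefix}_data_item"

def fits_table(instrument, level_prefix, extension=False):
    """Build a FITS table name following V_<instrument>_<ll/sc>_<fits/extension_fits>."""
    assert level_prefix in ("sc", "ll")
    suffix = "extension_fits" if extension else "fits"
    return f"V_{instrument.lower()}_{level_prefix}_{suffix}"

print(data_item_table("sc"))                    # V_sc_data_item
print(fits_table("EUI", "sc"))                  # V_eui_sc_fits
print(fits_table("EUI", "ll", extension=True))  # V_eui_ll_extension_fits
```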
The first major step will involve discussing the scope of attributes that need to be added or translated, beyond those mentioned by Eric in his post. Additionally, I propose creating a separate attribute named "Table" in the soar/attrs.py file as a child class of SimpleAttr. If the user doesn't select this attribute, we can unify the responses of all tables (this is different from a join operation; here I mean merging the results of individual queries from each table). After adding new attributes, we'll need to modify SOAR client methods such as _can_handle_query to support them. We'll determine the exact flow of these modifications as we progress.
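A minimal sketch of the proposed Table attribute follows. To keep the example standalone, SimpleAttr here is a tiny stand-in for sunpy.net.attr.SimpleAttr; in the real implementation Table would subclass that class inside soar/attrs.py, and the lowercase normalisation is an assumption of mine, not confirmed SOAR behavior.

```python
class SimpleAttr:
    """Stand-in for sunpy.net.attr.SimpleAttr (a single-valued query attribute)."""
    def __init__(self, value):
        self.value = value

class Table(SimpleAttr):
    """Hypothetical attribute selecting which SOAR table a query targets."""
    def __init__(self, value):
        # Assumed normalisation so 'V_sc_data_item' and 'v_sc_data_item' match.
        super().__init__(value.lower())

table = Table("V_sc_data_item")
print(table.value)  # v_sc_data_item
```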
- Default table: if the user doesn't specify a particular table, the system will automatically use v_sc_data_item as the default option. This is the current behavior inherited from sunpy-soar.
- Single table: the user specifies a single table. This is a very simple case which won't suffer data inconsistency from different tuples being returned by different tables. The query can be broken down into atomic queries as shown in the example.
instrument = a.Instrument('EUI')
time = a.Time('2021-02-01', '2021-02-02')
level = a.Level(2)
product_1 = a.soar.Product('EUI-FSI174-IMAGE')
product_2 = a.soar.Product('xyz')
table = a.soar.Table('V_sc_data_item')
Here the query could look like (time & level & (product_1 | product_2) & table). This query will simply be broken down into 2 subqueries: (time & level & product_1 & table) and (time & level & product_2 & table). Finally, the resultant tables would be stacked into an astropy table.
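The splitting step above can be sketched without sunpy by treating each attribute as a key/value pair and expanding the OR'd alternatives into atomic subqueries. The function and data layout below are illustrative only; sunpy's real attr walker works on attribute objects, not dicts.

```python
from itertools import product as cartesian

def split_query(fixed, or_groups):
    """Expand OR'd attribute alternatives into atomic subqueries.

    fixed     -- dict of attributes common to every subquery (time, level, table ...)
    or_groups -- list of OR groups; each group is a list of (name, value) alternatives
    """
    subqueries = []
    for combo in cartesian(*or_groups):
        q = dict(fixed)
        q.update(combo)  # combo is a tuple of (name, value) pairs
        subqueries.append(q)
    return subqueries

fixed = {"time": "2021-02-01/2021-02-02", "level": 2, "table": "V_sc_data_item"}
ors = [[("product", "EUI-FSI174-IMAGE"), ("product", "xyz")]]
for q in split_query(fixed, ors):
    print(q)  # two atomic subqueries, one per product
```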
- Multiple tables: the user queries two or more tables. The plan here would be to use join operations on the specific subset of attributes that the user queries. We can introduce a join (*) operator for the attributes (note that this notation is just for representational purposes in this proposal). For a normal attribute, for example Time, the operation (time_1 & time_2) doesn't make sense, hence it isn't allowed. However, specifically for the attribute Table, & operations do make sense, so such queries aren't split into subqueries by the attrwalker and we can use a table join easily. A strict rule to keep in mind: if the query involves more than one table, a join operator is required. Therefore, a join query in Fido would look like:
instrument = a.Instrument('EUI')
time = a.Time('2021-02-01', '2021-02-02')
level = a.Level(2)
product_1 = a.soar.Product('EUI-FSI174-IMAGE')
product_2 = a.soar.Product('xyz')
table_1 = a.soar.Table('V_sc_data_item')
table_2 = a.soar.Table('V_ll_data_item')
Fido.search(time & level & *(product_1 | product_2) & table_1 & table_2)
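The join itself could eventually use astropy.table.join; to stay dependency-free, the sketch below joins two per-table result sets represented as lists of dicts on a shared key. The column names ("filename", "level", "wavelength") are illustrative, not SOAR's actual schema.

```python
def inner_join(rows_a, rows_b, key):
    """Inner-join two result sets (lists of dicts) on a shared key column."""
    index = {row[key]: row for row in rows_b}
    joined = []
    for row in rows_a:
        match = index.get(row[key])
        if match is not None:
            merged = dict(row)
            merged.update(match)  # columns from both tables in one row
            joined.append(merged)
    return joined

sc_rows = [{"filename": "eui_001.fits", "level": 2}]
ll_rows = [{"filename": "eui_001.fits", "wavelength": 174}]
print(inner_join(sc_rows, ll_rows, "filename"))
```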
One major change to note would be a shift from hardcoded columns in the resultant table to a more dynamic approach, adding support for the attributes that the user specifically queries. The TAP API is robust and comprehensive: it protects against bad requests and returns an error (HTTP 400) for invalid queries, so we won't have to implement query safety checks in the client. The async query rework could be done in 3 possible ways using the TAP async endpoint:
- The ideal approach would be to make all the functions in the nested call chain asynchronous, resulting in a fully async Fido client on top of the TAP async endpoint. This is ideal but requires more development effort.
- Another way would be to use the asyncio library and call asyncio.run(func) from a sync function. This is easier to execute and can serve as a baseline to test the client's performance in async mode.
- A third way would be to use the astroquery library. Using astroquery's TapPlus, we can make async queries to SOAR. An added benefit is that it provides server-side caching, which can improve performance for repeated queries.
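The second option, hiding an async call behind the existing sync interface, might look like the sketch below. Here fetch_tap merely simulates the network request (the query string and return shape are invented for illustration); a real client would await an HTTP call to the TAP async endpoint at that point.

```python
import asyncio

async def fetch_tap(adql):
    """Simulated async TAP request; a real client would await an HTTP call here."""
    await asyncio.sleep(0)  # yield control, as a real network await would
    return [{"query": adql, "rows": 1}]

def search(adql):
    """Sync entry point kept for API compatibility; runs the async path inside."""
    return asyncio.run(fetch_tap(adql))

print(search("SELECT TOP 1 * FROM v_sc_data_item"))
```

This keeps the public Fido-facing API synchronous while letting the internals migrate to async incrementally, which is why it makes a good baseline for benchmarking.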
- Meet with mentors and bond with the community.
- Investigate and research the part of the project which requires the most work.
- Familiarize myself with ADQL queries, in case anything needs to be researched.
- Start adding the new attributes to sunpy-soar that lie within our scope.
- Collect data on the different types of possible queries, which will help us plan for the various query shapes and the join operator.
- Write tests and gallery examples as we proceed.
- Check whether the code is compatible and working fine with the newly added attributes.
- Discuss the behavior of the newly added attributes with mentors, take feedback, and make changes if needed.
- This time will also be utilized to implement a valid join operator and ensure that the attrwalker works normally after implementing it.
- Keep additional buffer week to properly test and validate the function of the changes in the walker , join operations and query formation.
- Add some tests and gallery examples
- Make changes to the search function, replacing the hardcoded resultant table with a modular one.
- Check if the table is printing properly and check for possible anomalies.
- Implement the query rework to add async support using astroquery, and benchmark the results.
- Write the final set of tests and gallery examples