GSoC 2024 Manit Singh

Personal Information

Name: Manit Singh
University: Netaji Subhas University of Technology (NSUT)
Email: manitsingh018@gmail.com
Element username: @nucleonx:matrix.org
GitHub username: NucleonGodX
Time Zone: IST (GMT + 5:30)

My Background

I am a computer science undergraduate at Netaji Subhas University of Technology, located in Dwarka, Delhi, India. I have been programming for the last 5 years and have good experience in Python. I have also been doing web development for about a year. Most of my work so far has been on small Python projects, such as building Discord bots or experimenting with libraries like Tkinter. At university I received formal training in both NumPy and Matplotlib, and I have learned more since I started contributing to SunPy in December.

Have you participated previously in GSoC? When? With which project?

No, I haven't.

Are you also applying to other projects?

No, I'm not.

Are you eligible to receive payments from Google?

Yes, I am.

How much time do you plan to invest in the project before, during, and after the Summer of Code?

I have already invested a good amount of time in understanding the project: the TAP protocol, the sunpy attribute system, and how to write ADQL queries. I plan to invest the full 175 hours in the project, which is the length of the program. As I have no other commitments, I can dedicate all the time required to complete the tasks assigned for each week. Even when my college reopens in August, I will still be able to invest at least 30 hours a week in the project. After the Summer of Code ends, I intend to stay involved with the community by helping sunpy-soar grow further. I will be available to contribute to any changes or improvements needed in sunpy-soar and will remain an active member of the community.

Open Source And Programming Journey

I started programming in high school with Python, and I have always enjoyed working on small personal projects. In terms of relevance to this project, it uses ADQL, which is similar to SQL, and I have both formal education and personal project experience with SQL. Additionally, I have worked with APIs while doing web development over the past year. My open-source journey started with SunPy; before this, I had no experience with open source. It was the SunPy community that helped me through the initial hurdles, as I opened my first pull request in sunpy/sunpy itself. With every pull request I opened I learned something new, such as using pytest for unit testing, doctests, the sphinx gallery build and the tox environments used for testing.

My Contributions to SunPy

Pull Requests

-I’ve also taken part in some issue discussions in the community, such as this discussion.

-I’ve also attended 3-4 sunpy weekly meetings at the start of 2024 and introduced myself to the community.

Project Overview

The Solar Orbiter Archive (SOAR) contains extensive solar data organised in tables and columns. The data provider allows the data to be retrieved via the TAP protocol, which exposes both data and metadata as tables. Queries are written in ADQL to fetch the data. Each table provides relevant columns such as begin_time, sensor and instrument. This project aims to increase the extent of metadata support in sunpy-soar and to enable searching and filtering on it.
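
To make the TAP and ADQL relationship concrete, here is a minimal sketch of how an ADQL query could be packaged into a synchronous TAP request. The endpoint URL, the helper name build_tap_request and the JSON response format are assumptions for illustration and should be checked against the SOAR documentation; the sketch only builds the URL and makes no network call.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint for the SOAR TAP service; verify before real use.
TAP_SYNC_URL = "http://soar.esac.esa.int/soar-sl-tap/tap/sync"


def build_tap_request(adql_query: str, fmt: str = "json") -> str:
    """Return a TAP synchronous-query URL for the given ADQL string (hypothetical helper)."""
    params = {
        "REQUEST": "doQuery",  # standard TAP request type
        "LANG": "ADQL",        # query language understood by the service
        "FORMAT": fmt,         # requested response format (assumed to be supported)
        "QUERY": adql_query,
    }
    # urlencode percent-encodes spaces, quotes and other special characters.
    return f"{TAP_SYNC_URL}?{urlencode(params)}"


example_query = (
    "SELECT TOP 5 filename, begin_time, instrument "
    "FROM soar.v_sc_data_item WHERE instrument = 'EUI'"
)
request_url = build_tap_request(example_query)
print(request_url)
```

The resulting URL could then be fetched with any HTTP client, and the rows in the response would correspond to the columns named in the SELECT clause.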

Problem Statement

sunpy-soar, a plugin to sunpy’s Fido, provides access to the Solar Orbiter Archive (SOAR). Currently, only basic metadata such as SOOP name, time, and level are returned. Data from the v_sc_data_item (science) and v_ll_data_item (low latency) tables are filtered and fetched. However, SOAR holds a wide range of metadata that neither sunpy-soar nor the SOAR web interface currently lets users query. For example, fields like detector and wavelength are not returned, nor can they be used to filter responses in the SOAR web query form here. It is not practical or useful to add the entire metadata, but implementing the useful metadata listed in the issue would be beneficial. This issue provided a lot of information about the overall problem and what exactly needs to be implemented.

Proposed solution

There are various tables that have useful data for implementation.

Description of some important tables and their extent of use:

v_<ll/sc>_data_item - Latest Versions
This table consists of level (processing level of the data), filename, filesize, sensor etc. Currently the SOAR web interface and sunpy-soar only support fetching/filtering on the metadata stored in the v_<ll/sc>_data_item table. This metadata includes the filename, filesize, data observation end time and SOOP name.

v_<ll/sc>_repository_file
Lists all versions of the files received into the SOAR. As mentioned in the GSoC issue, it might not be needed at the start, but it does contain fields worth covering. It consists of the active filename, filesize, repository id etc.

v_<instrument>_<ll/sc>_fits
Consists of data from a specific instrument. Each instrument has two such tables: Low Latency (LL) and Science (SC). There is a lot of scope for implementation here. I will start by adding metadata from these tables, as the issue also suggests that they contain a lot of generic metadata that needs to be implemented.

Additional functionality to work with more metadata, such as wavelength and detector, can be added by joining the v_<ll/sc>_data_item and the v_<instrument>_<ll/sc>_fits tables. This allows more metadata to be returned with a single query and allows filtering on the new metadata fields. The repository files table can also be used to query archived data (mentioned in the main GSoC issue).

Attributes like 'Physobs', 'Resolution', 'Detector', 'Sample', 'Wavelength' and 'Source' already exist in the sunpy.net attribute system; they can be implemented similarly to how 'Time', 'Instrument', 'Level' and 'Provider' have been implemented. This comment on this issue provides important insights on each piece of metadata we plan to implement and the tables and columns from which it can be fetched. We will first need to make changes in sunpy-soar/client. The new attributes will need to be added to the SOARClient’s _can_handle_query() method so that queries using them are accepted. We will also need to change the _construct_payload() function to support querying multiple tables, that is, update it so that the 'where' part of the ADQL query can reference metadata from multiple tables. A new function will also be created to build the final ADQL query: it will take the pieces coming from the different tables as separate variables, apply the join operation and return the final query.

For example, the final query could look like this:
adql_query = "SELECT h1.data_item_oid, h1.filename, h1.filesize, h2.filename, h2.level, h2.dimension_index, h2.wavelength, h2.detector FROM soar.v_sc_data_item AS h1 JOIN soar.v_eui_sc_fits AS h2 USING (data_item_oid) WHERE h1.instrument = 'EUI' AND h1.level = 'L1' AND h1.begin_time > '2021-02-01' AND h1.begin_time < '2021-02-02' ORDER BY begin_time"
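
A function that assembles such a join query could be sketched as follows. This is only an illustration under assumptions: the name build_join_query, its parameters and the h1/h2 aliases are hypothetical and are not the final sunpy-soar API.

```python
def build_join_query(data_table, fits_table, select_columns, where_clauses,
                     join_key="data_item_oid", order_by="h1.begin_time"):
    """Assemble an ADQL query joining a data-item table (h1) with an instrument FITS table (h2).

    Hypothetical helper: the table names and conditions are passed in as plain strings.
    """
    columns = ", ".join(select_columns)
    query = (
        f"SELECT {columns} "
        f"FROM {data_table} AS h1 "
        f"JOIN {fits_table} AS h2 USING ({join_key})"
    )
    # Each condition is already a complete ADQL expression; combine them with AND.
    if where_clauses:
        query += " WHERE " + " AND ".join(where_clauses)
    if order_by:
        query += f" ORDER BY {order_by}"
    return query


adql_query = build_join_query(
    data_table="soar.v_sc_data_item",
    fits_table="soar.v_eui_sc_fits",
    select_columns=["h1.filename", "h1.filesize", "h2.wavelength", "h2.detector"],
    where_clauses=[
        "h1.instrument = 'EUI'",
        "h1.level = 'L1'",
        "h1.begin_time > '2021-02-01'",
        "h1.begin_time < '2021-02-02'",
    ],
)
print(adql_query)
```

Because the conditions are interpolated as plain strings, user-supplied values would need to be validated before they reach a helper like this one.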

After this we also need to make changes to sunpy-soar/attrs and create applier functions using the @walker.add_applier() decorator, passing the attribute class as an argument to the decorator. Each function takes the parameters (wlk, attr, params); inside it we validate the attribute value and append it to the query parameters (similar to how validation is done for levels, so that only levels recognized by SOAR are passed in the query).
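
The applier pattern described above can be sketched with a small self-contained stand-in. MiniWalker and the Detector class below are simplified, hypothetical substitutes that only loosely mirror sunpy's attribute walker and attribute classes, and the set of accepted detector names and the h2.detector column reference are assumptions for illustration.

```python
class Detector:
    """Hypothetical stand-in for an attribute class such as a Detector attr."""

    def __init__(self, value):
        self.value = value


class MiniWalker:
    """Tiny stand-in for an attribute walker: maps attribute types to applier functions."""

    def __init__(self):
        self._appliers = {}

    def add_applier(self, *attr_types):
        # Used as a decorator: register the function for every given attribute type.
        def decorator(func):
            for attr_type in attr_types:
                self._appliers[attr_type] = func
            return func
        return decorator

    def apply(self, attr, params):
        # Look up the applier by the attribute's exact type and call it.
        applier = self._appliers[type(attr)]
        return applier(self, attr, params)


walker = MiniWalker()

# Assumed list of accepted values, for illustration only.
KNOWN_DETECTORS = {"FSI", "HRI_EUV", "HRI_LYA"}


@walker.add_applier(Detector)
def _(wlk, attr, params):
    """Validate the detector name and append an ADQL condition to params."""
    value = attr.value.upper()
    if value not in KNOWN_DETECTORS:
        raise ValueError(f"Unknown detector: {attr.value!r}")
    params.append(f"h2.detector = '{value}'")


params = []
walker.apply(Detector("fsi"), params)
print(params)
```

Validating inside the applier keeps unsupported values from ever reaching the query string, which is the same idea as the existing level validation.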

The new metadata classes can be added to sunpy-soar/attrs similar to how SOOP name and product have previously been implemented. The SOOP name implementation in this pull request is very helpful for understanding what changes need to be made to add any new metadata. The proposed solution looks fairly direct at first, but some metadata will be more complex to implement than others. There are some overlaps in use cases to consider. For instance, the detector field partially overlaps with the use case of soar.product.

It's crucial to update the documentation whenever new metadata is added. This ensures that users interested in querying SOAR metadata can easily access and utilize the new sunpy-soar metadata field querying capabilities.

Timeline

Community Bonding Period (May 1 - May 26)

I’ll learn more about ADQL queries and the TAP protocol and get an in-depth understanding of the sunpy attribute system. So far I’ve worked only on the sunpy/sunpy repo, so I'll also familiarise myself with the testing and documentation practices of sunpy/sunpy-soar.
I’ll join the sunpy weekly meetings to interact with the community, learn more about it and follow the current hot topics.
I’ll also try to work on any open issues in sunpy/soar or issues related to sunpy’s attribute system (if any of these issues gives me a better understanding of the project).

Coding Period

Week 1-2 (May 27 - June 9)

Begin working on the extended metadata implementation and cover half of the metadata suggested in the issue.
Take feedback from mentors about implementation.

Week 3-4 (June 10 - June 23)

Take feedback from mentors about the functions and operations used for metadata implementation and refine the code.
Add the rest of the metadata suggested in the issue and take guidance from mentors about the next set of metadata that can be implemented.

Week 5-6 (June 24 - July 6)

Implement any additional metadata suggested by mentors (if any).
Fix any bugs or errors encountered during the implementation of any metadata.

Midterm Evaluation (July 12)

Week 7-8 (July 7 - July 20)

Work on enabling filtering and searching and try to cover most of the attribute filtering.
Take feedback from mentors about the methods implemented for filtering.

Week 9-10 (July 21 - August 3)

Cover all the attributes in filtering/searching, along with any feedback mentors provided.
Update the documentation for all the new metadata fields.

Week 11-12 (August 4 - August 17)

Address any bugs or errors encountered throughout the implementation process. Enhance the overall code quality and refine its structure. Additionally, tackle any unforeseen issues that may arise and require attention.
If everything is done and there is time left, I would like to work on the stretch goal, i.e. constructing TAP queries using astroquery TAP.

Final Evaluation (August 26)

Availability

I'm excited to share that my end-term exams have been moved up and will now conclude in April. After these exams I’ll be on summer break, and I have no prior commitments during this break. This means I'll be fully available to work on the project throughout the entire Google Summer of Code (GSoC) timeline.
