
GSoC 2024 Sanvi Sharma

Nabil Freij edited this page Apr 4, 2024 · 1 revision

Project Proposal: Serialisation of NDCube classes to ASDF

Google Summer Of Code ’24 | Sunpy

Personal Details:

Name: Sanvi Sharma

Email: sharmasanvi125@gmail.com

Time-Zone: UTC+05:30

Matrix handle: @ciaokitty:matrix.org

GitHub username: ciaokitty

Personal Background:

I am a computer science undergraduate in the 2026 batch at Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India. I have been using Python for the last three years and have been exploring open source, Linux, and git for over a year. I am currently in my second year, and since joining college I have worked on four Python projects through hackathons. I also contributed to a local college project during a winter event called Kharagpur Winter of Code. A year later, a friend and I mentored a small personal project in the same event. Here’s the link to my mentor certificate. Until recently my experience was mostly limited to developing and maintaining personal and small group projects, and I only admired open source from a distance. This month I decided to become an active contributor to the open source community. A college senior, who is also a past contributor, introduced me to sunpy. I have always been fascinated by physics, so this project caught my attention.

Experience with programming and open-source:

I find computer science very empowering, and even more exciting when it is paired with science. This is my first time contributing to open source on a real-world application. I am an active participant in the open source developers’ community of my college, and I have attended various local conferences on open source over the last couple of months. The more I familiarize myself with open source, the more I feel drawn to the community. Looking into GSoC projects and orgs definitely acted as a catalyst for my learning: earlier I used to be overwhelmed by large codebases, but this time I felt that mental barrier slowly breaking as I managed to grasp the codebase and understand the essence of this project. Here are the links to some of my projects which I had fun working on:

Omilia- a retro platformer game written in Python

FinWiser- a financial literacy web application

Me and GSoC:

Have you participated previously in GSoC? When? With which project?

No

Are you also applying to other projects?

No, I am not.

Are you eligible to receive payments from Google?

Yes, I am eligible.

How much time do you plan to invest in the project before, during, and after the Summer of Code?

Before the official GSoC period starts, I intend to familiarize myself with sunpy’s codebase and work on smaller issues related to this project. I will also try to be more active in the sunpy community. During the Summer of Code, I plan to work on the project as a 175-hour GSoC project, and I am ready to devote more time if the project requires it. Finally, after the program, I wish to keep contributing to the project and helping newcomers as much as I can.

Details about the project:

What excites me about this project and what do I want to achieve? Why did I choose it?

I picked this project because I liked playing around with the ndcube package, trying out various operations on sample astronomical data. As a beginner to open source and developer-level python I also felt I could learn so much by only tinkering around with various functionalities of this package. Currently, ndcube lacks built-in support for reading or writing its various objects to files. The goal of this project is to add ASDF (Advanced Scientific Data Format) support to ndcube, allowing almost any object to be saved and loaded back through ASDF.

How do I plan to implement the project?

This issue was linked to the project’s description. Following the comment thread, I recognised how support for the ASDF file format could enhance the functionality and usage of ndcube. I haven’t looked into the project in great detail yet, and much of my understanding comes from the project’s description and from searching online. Serialization is the process of converting an object (such as an ndcube object) into a format that can be saved to a file. Serializing ndcube classes to ASDF therefore allows you to save your astronomical data (both the raw values and their associated world coordinates) in a standardized, metadata-rich format that can be easily shared, archived, and analyzed. In the context of this project’s problem statement, serialization to ASDF can be broken down into the following three basic steps:

  1. We have an NDCube object containing both data and WCS information.
  2. We have to convert this object to the ASDF file format.
  3. We save the result to a file with the .asdf extension.

Since the asdf Python package implements (de)serialization of custom types through a Converter class and a schema for each new type, the first part of the project requires writing Converters for the three classes of ndcube:

  • NDCube
  • GlobalCoords
  • ExtraCoords

Here we are serializing an ndcube.NDCube object whose .data is a numpy array and whose .wcs is a gwcs.WCS object.

In the second part, Converters will also have to be written for certain other classes, namely astropy.wcs.WCS, SlicedLowLevelWCS, ResampledLowLevelWCS, CompoundLowLevelWCS and ReorderedLowLevelWCS, as part of the asdf_astropy package, so that NDCube objects which have been manipulated can also be saved to ASDF.

The third part of the project involves extending the existing Converters to handle optional properties (e.g., mask, uncertainty, PSF). I plan to:

  1. Define how these properties are serialized and deserialized.
  2. Ensure that the round-trip through ASDF preserves all relevant information.

The final part of the project involves wrapping up the newly added feature through testing and documentation:

  • Creating test cases to verify serialization and deserialization behavior.
  • Adding documentation for the new ASDF support.
  • Adding more examples to the example gallery explaining basic operations of saving and loading ndcube objects using ASDF.

Timeline of the project:

Community Bonding Period: May 1 - May 26

Since I started looking into ndcube a bit later than I should have, I plan to spend this time learning more about asdf serialization by going over the official documentation and with the help of this project’s mentors. I will also use this period to set up a smoothly working development environment and to learn how asdf extensions, converters and schemas work.

Week 1: May 27 - June 2

  • Add the asdf extension infrastructure to ndcube, exploring the codebase in more detail to understand the structure of NDCube objects and their associated properties.
  • Outline a high-level plan for implementing serialization for NDCube, GlobalCoords, and ExtraCoords.

Week 2: June 3 - June 9

  • Write Converters and schemas for NDCube, GlobalCoords, and ExtraCoords.
  • Define schemas for these classes in asdf format.
  • Open a PR dealing with the same.

Week 3: June 10 - June 16

  • Write unit tests to verify the correctness of serialization.
  • Write test cases that cover different combinations of NDCube objects backed by different gwcs.WCS configurations. For each test case:
      • Create an NDCube object with appropriate data and WCS configuration.
      • Serialize the NDCube object to ASDF format.
      • Load the ASDF file back into Python and reconstruct the NDCube object.
      • Assert that the reconstructed object matches the original object in terms of data and WCS configuration.
  • Integrate serialization into the existing codebase of ndcube.
  • Test serialization with sample NDCube objects containing various data and WCS configurations.
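
The "assert that the reconstructed object matches the original" step of these tests could be factored into a reusable helper. The sketch below is one possible shape for it, under stated assumptions: `LinearWCS` and `FakeCube` are hypothetical stand-ins so the example runs without ndcube or gwcs, and the helper only relies on `pixel_n_dim` and `pixel_to_world_values`, two names from astropy's low-level WCS API. It compares the WCS by evaluating both objects at the same sample pixels rather than by comparing their internals.

```python
import numpy as np


class LinearWCS:
    """Hypothetical 1-D WCS stand-in: world = scale * pixel + offset."""

    pixel_n_dim = 1

    def __init__(self, scale, offset):
        self.scale = scale
        self.offset = offset

    def pixel_to_world_values(self, *pixel_arrays):
        return self.scale * np.asarray(pixel_arrays[0]) + self.offset


class FakeCube:
    """Hypothetical stand-in exposing only .data and .wcs."""

    def __init__(self, data, wcs):
        self.data = data
        self.wcs = wcs


def assert_cubes_equivalent(original, restored, n_samples=5):
    """Check that data match and that both WCSs map pixels to the same world values."""
    np.testing.assert_array_equal(restored.data, original.data)
    assert restored.wcs.pixel_n_dim == original.wcs.pixel_n_dim

    # Pixel axes run in the reverse order of the numpy array axes.
    pixels = [np.linspace(0, n - 1, n_samples) for n in original.data.shape[::-1]]
    expected = np.asarray(original.wcs.pixel_to_world_values(*pixels))
    actual = np.asarray(restored.wcs.pixel_to_world_values(*pixels))
    np.testing.assert_allclose(actual, expected)
```

A helper like this could then be called at the end of every parametrized round-trip test, so each test case only has to describe how its cube is built.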

Week 4: June 17 - June 23

  • Debugging and refinement.
  • Study edge cases and fix bugs.
  • Take feedback from mentors and work on the recommended changes.

Week 5: June 24 - June 30

  • Write documentation for the completed tasks.
  • Start planning the converters for the astropy.wcs.WCS class in the asdf_astropy package.

Week 6: July 1 - July 7

Finish up on remaining tasks. Work on feedback from mentors.

MidTerm Evaluation

By this time I plan to have finished extending asdf support to the three basic classes of ndcube, i.e. NDCube, GlobalCoords and ExtraCoords. Tests should be written, organised into categories within the test suite, and made reusable for future work. The related PRs should be open and, ideally, merged.

Week 7-8: July 8 - July 21

  • Start writing converters for astropy.wcs.WCS, SlicedLowLevelWCS, ResampledLowLevelWCS, CompoundLowLevelWCS and ReorderedLowLevelWCS as part of the asdf_astropy package.
  • Open PR for the same.
  • Integrate support for WCS wrapper classes into the serialization process for NDCube.
  • Test serialization with sample data containing various WCS wrapper classes.

Week 9: July 22 - July 28

  • Write tests to ensure smooth working of serialization/deserialization with different WCS wrapper classes.
  • Write documentation explaining the included support for WCS wrapper classes.
  • Update the example gallery by writing out various usage examples and guidelines for serialization with these classes.

Week 10: July 29 - August 4

  • Write Converters and schemas for optional properties such as mask, uncertainty, and PSF
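
One possible way to treat the optional properties is to write a key only when the property is actually set, and to fall back to `None` when the key is missing on read. The sketch below illustrates that idea with plain Python; `CubeLike`, `to_tree` and `from_tree` are hypothetical names for illustration, not part of ndcube or asdf.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Properties that may or may not be present on a cube.
OPTIONAL_FIELDS = ("mask", "uncertainty", "psf")


@dataclass
class CubeLike:
    """Hypothetical stand-in carrying the optional properties of a cube."""

    data: Any
    mask: Optional[Any] = None
    uncertainty: Optional[Any] = None
    psf: Optional[Any] = None


def to_tree(cube):
    """Build the serialized tree, skipping optional properties that are unset."""
    tree = {"data": cube.data}
    for name in OPTIONAL_FIELDS:
        value = getattr(cube, name)
        if value is not None:
            tree[name] = value
    return tree


def from_tree(tree):
    """Rebuild the object; missing optional keys come back as None."""
    optional = {name: tree.get(name) for name in OPTIONAL_FIELDS}
    return CubeLike(data=tree["data"], **optional)
```

Under this design, files written for cubes without a mask or uncertainty stay small, and a schema would list these keys as optional rather than required.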

Week 11 - 12: August 5 - August 18

  • Write documentation and tests covering the support for optional properties.
  • Write converters and schemas for NDCubeSequence and NDCollection.

Week 13: August 19 - August 26

  • A buffer period - working on unfinished tasks and leaving room for new unforeseen tasks that come up along the way. Take note of last minute issues and work on them.
  • Finish up with documentation. Add more examples to the example gallery demonstrating basic operations of saving and loading ndcube objects using ASDF.

Final Evaluation

Sunpy and me:

A tiny history of my very recent involvement with the project:

Looking beyond GSoC, my long-term goals with sunpy are to contribute more and, in the future, to mentor newcomers like me. At the very least I hope to introduce sunpy and astropy to more budding developers and to astrophysics and Python enthusiasts, so that the community and the software can grow and flourish in a healthy and supportive environment.
