
GSoC 2018 prateekiiest sunkit

Nabil Freij edited this page Feb 22, 2024 · 5 revisions

Organization: SunPy in OpenAstronomy

Project: Develop SunKit Image

Student Information

University Information

  • University: Indian Institute of Engineering Science and Technology, Shibpur

  • Major: Computer Science and Engineering

  • Current Academic Year: Third Year

  • Graduate Year: 2019

Personal Background

I am currently an undergraduate student from India, pursuing computer science as my major. I work on both Windows and Ubuntu LTS. My favourite editor is Atom, and I do most of my Python coding in Jupyter. I am quite comfortable with Git and GitHub, having used them for the past two years.

Experience In Programming

  • I did one project as part of my work with SunPy - Solar Data Analysis

  • I did two major projects on machine learning in Python, relying mainly on statistical analysis (pandas, numpy, scipy) and Jupyter Notebook. These were project works for the Udacity Machine Learning Nanodegree: Titanic Survival Exploration and Boston Housing.

  • I recently worked on one of my personal projects in Python, Code Sleep Python, which attracted active participation from different communities. It is a Python project for building desktop applications and games.

  • I also did a research internship during the previous summer, working on a data mining project in Python - Group Activity Recognition.

  • All my independent project works (mostly in Python) can be seen in my Repositories List.

Open Source experience

  • I contribute to organizations like SunPy, FOSSASIA and other smaller organisations. So far I have a total of 163 pull requests and 72 issues worked on, with a total of 1,165 contributions in the last year.

  • I have mentored other students in open source events like Hacktoberfest, Kharagpur Winter of Code and 24 Pull Requests, on some of my own personal projects primarily written in Python.

  • I got the opportunity to speak at different open source summits and Python conferences, and was invited to developer conferences such as RISE Hong Kong 2017, FOSSASIA Open Tech, and Google Developers Solve for India, among others.

  • Currently I am the GitHub Campus Expert of my university and along with direct support from GitHub I am helping to grow my community in campus, involving more people in Open Source.


Interest In OpenAstronomy

OpenAstronomy, consisting of 14 sub-organisations, is a collaboration between open source astronomy and astrophysics projects used by astrophysicists all around the world. The analysis of solar data obtained from observatories like SDO supports many kinds of research, from detecting sunspot locations to studying solar flares over a time frame. I watched a TED Talk by Miho Janvier in which she discusses how solar storms are created, their effects on Earth, and space weather, which deepened my interest. I had always loved the field of astronomy but never really got the opportunity to work on a project until I started working on the SunPy project.

I did not qualify for my first GSoC with OpenAstronomy back in 2017, but I did not give up. I had always loved the project and wanted to contribute as much as I could. Having worked with the SunPy project for the past year and been in close touch with the community, I have learnt many new things about open source, team management and adding new features, and experienced the joy of having a contribution accepted. It has been a great experience so far contributing to SunPy, and I feel privileged to be able to contribute to such an open-source project in the field of astronomy. OpenAstronomy, along with Google Summer of Code, will thus give me a unique opportunity to be part of this amazing project.

Contribution to SunPy

I have been involved with the SunPy project, contributing since December 2016. I am delighted to be part of the latest SunPy version releases and grateful for being acknowledged for my work.

| Pull Request | Corresponding Issue | Status |
| --- | --- | --- |
| More Mapcubes Examples - Gallery Examples #2455 | More examples in the gallery on simple map and mapcube manipulation #2413 | Open |
| Remove Gamma usage in Map #2424 | Gamma in map doesn't do anything #2333 | Merged |
| Finding Local Peaks in Solar Data - Gallery Update #2339 | Suggested this example | Merged |
| Masking Hot Pixels | Found bug in example | Merged |
| Brightest pixel location may occur at multiple positions | Found similar bug in example | Merged |
| Brightest pixel location redundancy removed | Removed redundancy in examples | Merged |
| Added documentation for suds-py3 incompatibility | VSOClient.query returns no result in Python 3.5 | Merged |
| Update README.md Matrix Org linked | Enhancement | Merged |
| Added documentation for database/tests | database tests depend on data/tests dir | Merged |
| Update vso.py | Enhancement - Documentation | Merged |
| Update time.py Removed extract_time | Remove extract_time function | Merged |
| Update rescale.py | reshape_image_to_4d_superpixel array seems broken | Merged |

Contribution to SunPy Website

| Pull Request | Corresponding Issue | Status |
| --- | --- | --- |
| help section Docs link updated | Sunpy Documentation Link Showing Privacy Error | Merged |
| Sunpy Presentations and Talks Upload on the Site | Enhancement | Merged |
| Registry of Sunpy Affiliated Packages | Enhancement - Introducing Affiliated Packages Registry | Merged |
| Update about.html Community Link updated to matrix | Enhancement | Merged |

All my contributions to SunPy are listed here

Abstract

Analysis of solar data collected from different solar observatories is one of the fundamental research areas in the heliophysics community. Good processing of such images is therefore needed in order to extract accurate features and hidden information for further data analysis.

This project aims at building the foundations of sunkit-image, a SunPy-affiliated package that will contain some well-known image analysis algorithms required for solar data analysis. Through this package SunPy users will be able to analyze solar images and get a clear picture of the image processing techniques used in solar physics. Implementing all these algorithms in a single package will give users the convenience and freedom to analyze solar images without having to worry about the underlying complexity of the algorithm implementations.

Project Goals

The project has four major milestones. If time permits as per my proposed timeline, I plan to work on the optional extras listed below as part of my post-GSoC goals.

  • Implement the normalizing radial graded filter (NRGF) algorithm.

  • Port the Multi-Scale Gaussian Normalisation (MGN) code into the sunkit-image repository.

  • Implement the OCCULT-2 algorithm for coronal loop tracing.

  • Implement an image resampling algorithm and contribute it to the Astropy project.

Detailed Description

A detailed analysis of each part of the project follows.

Implementation of the Normalizing Radial Graded Filter (NRGF)

Motivation

As we move outwards from the solar center, we observe a sharp fall in intensity with radial distance. Due to this sharp decrease in brightness, it becomes quite difficult to observe coronal density structures in the solar data. Hence we need an image processing algorithm that can reduce this radial gradient in brightness and produce an image in which the intricate details of such coronal density structures are easily observable to the end-user.

How NRGF solves the problem

The normalizing radial graded filter (NRGF) uses a simple approach to removing this radial gradient, as proposed in the paper "The Depiction of Coronal Structure in White-Light Images". At a particular height (radius r) we get a circular strip of the coronal region, so from r = 0 (the center) to r = R we obtain a number of such strips. Each strip can be divided into small slots, each corresponding to an angle measured with respect to the center. For each slot, we first calculate the mean intensity of the circular strip the slot lies in and subtract it from the slot's intensity; we then divide this difference by the standard deviation of the strip's intensity. In this way much of the steepness in the radial density gradient is removed. The main idea is presented in this issue image.
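The per-strip normalisation described above can be sketched as follows. This is a minimal illustration under my own assumptions, not the project code: the function name, the (y, x) pixel center convention, and the uniform radial binning are all mine.

```python
import numpy as np

def nrgf(image, center, r_max, n_bins=100):
    """Sketch of the normalizing radial graded filter (NRGF).

    For each radial bin (circular strip), subtract the mean intensity
    of that strip and divide by its standard deviation.
    `center` is (y, x) in pixel coordinates.
    """
    y, x = np.indices(image.shape)
    r = np.hypot(y - center[0], x - center[1])  # radial distance of every pixel

    out = np.zeros_like(image, dtype=float)
    edges = np.linspace(0, r_max, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (r >= lo) & (r < hi)             # one circular strip
        if not mask.any():
            continue
        mu = image[mask].mean()
        sigma = image[mask].std()
        # normalise the strip; skip flat strips to avoid division by zero
        out[mask] = (image[mask] - mu) / sigma if sigma > 0 else 0.0
    return out
```

The full implementation would additionally handle on/off-limb pixel detection and angular slots within each strip, as the IDL code does.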

What my work will include

The IDL version of the algorithm has already been implemented. It includes additional features like detecting pixels which are on or off the limb in a solar map. The Python version of the same code needs to be implemented, covering only the calculation of the processed image intensity as discussed above. If time permits, we can work on the additional features proposed in the IDL code. Another IDL code is also available which can be used for comparison.

Much of this has already been implemented by @wafels here. The current code calculates the radial intensity (a summary statistic of the intensity in a solar map as a function of radius), the intensity distribution in each of the radial bins, and finally the corresponding processed intensity using the method described above. The implementation is quite modular, but its output still needs to be validated against the corresponding IDL output (preferably on a LASCO image). The Python implementation also needs to be optimized as much as possible in terms of memory and CPU usage. Finally, before merging into the sunkit repository, the code needs to be rigorously tested and well documented in terms of what each function does and the parameters used.

Porting the Multi-Scale Gaussian Normalisation (MGN) Algorithm

Motivation

Coronal images contain relevant information over a wide range of spatial scales, including structures like the quiet Sun and active regions. Processing such images (EUV images or AIA/SDO data) to retrieve hidden patterns and information is crucial, and the accuracy of that information depends on how well the noise is filtered while preserving the finest contextual details of the coronal structures. With appropriate processing we can extract much better features from the data, which may be useful for detecting patterns in a later stage of analysis.

How MGN solves the problem

The proposed algorithm provides an efficient process that normalizes the data locally at different spatial scales. As opposed to common methods like gamma transformation and wavelet transformation, this method flattens the noisy regions and reveals hidden information while preserving contextual detail. The algorithm first replaces any spurious negative pixel with zero or the local mean/median. Considering a Gaussian kernel of width w, it obtains the local mean and local standard deviation by convolving the kernel with the pixel values of the original image. The image is then normalised using the local mean and standard deviation, and finally the arctan transformation of the normalised image is taken. After repeating this process for different spatial scales, all the normalised components are combined with corresponding weights to form the final image.
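The multi-scale procedure above can be sketched like this. This is a simplified illustration, not either of the existing implementations: the default widths, the arctan scaling constant `k`, the equal-weight combination, and the variance epsilon are illustrative choices of mine, not the paper's tuned values (the paper also adds a gamma-transformed global component, omitted here).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mgn(image, widths=(1.25, 2.5, 5, 10, 20), k=0.7):
    """Sketch of Multi-Scale Gaussian Normalisation (MGN).

    For each kernel width w: compute the local mean and local standard
    deviation by Gaussian convolution, normalise the image locally,
    apply the arctan transform, then combine the scales.
    """
    img = np.clip(image, 0, None).astype(float)  # replace negative pixels with 0
    components = []
    for w in widths:
        local_mean = gaussian_filter(img, w)
        # local variance via convolution of the squared deviation
        local_var = gaussian_filter((img - local_mean) ** 2, w)
        norm = (img - local_mean) / np.sqrt(local_var + 1e-12)
        components.append(np.arctan(k * norm))   # compress extreme values
    # equal-weight combination of the locally normalised scales
    return np.mean(components, axis=0)
```

In the real implementations the per-scale weights and the constants a0, a1 are tunable parameters rather than fixed equal weights.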

What my work will include

The algorithm is specified in the paper "Multi-Scale Gaussian Normalization for Solar Image Processing". The IDL code for MGN has already been implemented, but some areas of the code contradict the algorithm as stated in the paper. First, it needs to be checked where the IDL code is inconsistent with the paper. For example, a few inconsistencies that I found while comparing the IDL code with the paper's algorithm are listed in this thread.

Two Python implementations of the MGN algorithm have been mentioned in this issue: one by @Cadair here and the other by @ebuchlin here. The output based on an input AIA image with the necessary parameters needs to be compared between the two Python implementations and the IDL output. After this, we need to check which is more memory- and time-efficient. I have already compared the two implementations by measuring the memory utilization of each code as well as the time taken; the memory usage comparisons can be seen here.

The output of the two codes by @Cadair and @ebuchlin is shown in this issue. As can be seen, the outputs are quite different, so the implementation differences between the two codes need careful investigation and comparison against the paper. Some of the inconsistencies I found are mentioned in issue 1, and some flaws in ebuchlin's current implementation are listed here.

In addition, the main MGN function may be broken down into sub-functions to make it more modular. Depending on how the two Python codes end up being modified, we can either merge them or keep both in the repository. Finally, I plan to work on a few examples on AIA images demonstrating the implemented algorithm.

Occult-2 Algorithm Implementation

Motivation

Detection of coronal loops in solar images (TRACE, SDO/AIA) is an important aspect of analysing solar images. Coronal loops can be found in regions close to sunspots and active regions on the Sun. By detecting such loops we can derive specific information about the coronal loop structure and gain insight into the coronal heating problem. Hence we need a pattern recognition algorithm that can segment the solar image into different regions, thereby extracting the finer details of one-dimensional curvilinear features from the image.

How the algorithm solves the problem

The proposed algorithm in the paper implements a pattern recognition method to extract such loops from images. A modified version of this algorithm, OCCULT-2, provides much more automation because it needs only a few control parameters. The implementation of the OCCULT algorithm should take an input image (TRACE or AIA files) and return a high-pass-filtered image along with a list of coordinates of the detected coronal loops. Much of the noise in the original image gets filtered by the high-pass and low-pass filters, leaving behind the faint structures below some threshold.

The OCCULT algorithm works in four steps:

  • Background suppression - This step deals with noisy data where we want to suppress any structure detection in the background. The paper presents an approach of setting the lower intensity values of the image to the median intensity multiplied by a constant factor qmed. The parts of the image with intensity below this base value are rendered a constant value. In code, we check the map data for such low intensity values (zmin) and set them to a constant k, where k is the product of the median intensity of the map data and qmed. The IDL code does this under MINIMUM FLUX CORRECTION.

  • High-pass and low-pass filtering - The low-pass filter smooths the data and removes redundant noise, while the high-pass filter enhances the fine structure. The IDL version of the code implements the high-pass filter using the smooth function in tracing_auto.

  • Loop structure tracing - Here we wish to find the loop trajectory, or path. We start at the position with the maximum flux intensity and from that point track the loop bi-directionally, following the direction of the ridge with maximum flux. This has been implemented in the IDL code in this portion.

  • Loop subtraction in the residual image - Once a full loop structure has been traced, we don't want to rescan the area of the detected loop in the next iteration of the algorithm. From an implementation point of view, we can set the intensity values of that area to 0, so those regions are not considered when we next search for the maximum flux intensity. I made a list of which IDL routines perform which function in this issue.
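The four steps above could be laid out as separate functions, roughly as follows. This is only a structural sketch under my own assumptions: the function names are mine, and the difference-of-Gaussians stand-in for the band-pass step and the illustrative sigma values are not taken from the OCCULT-2 paper or the IDL code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def suppress_background(image, qmed=1.0):
    """Step 1: raise pixels below qmed * median to that base value."""
    base = qmed * np.median(image)
    return np.where(image < base, base, image)

def bandpass(image, low_sigma=1.0, high_sigma=3.0):
    """Step 2: difference of Gaussians as a combined low/high-pass filter."""
    return gaussian_filter(image, low_sigma) - gaussian_filter(image, high_sigma)

def next_seed(residual):
    """Step 3 entry point: pixel of maximum flux, the start of
    bidirectional loop tracing along the ridge."""
    return np.unravel_index(np.argmax(residual), residual.shape)

def subtract_loop(residual, loop_pixels):
    """Step 4: zero out the traced loop so it is not rescanned."""
    for yy, xx in loop_pixels:
        residual[yy, xx] = 0.0
    return residual
```

The actual ridge-following logic of step 3 (stepping along the direction of maximum flux) is the substantial part of the implementation and is deliberately omitted here.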

What my work will include

I have prepared a rough pseudo-code implementation of the proposed algorithm and uploaded it in a gist. I plan to implement the algorithm in a modular fashion, writing separate functions for background suppression, filtering, loop tracing and loop subtraction to provide more clarity to the user. Once done, the output needs to be cross-checked against the IDL implementation and verified (using TRACE data). After this, I will document the relevant portions of the algorithm and work on examples demonstrating its use. Finally, the code needs to be thoroughly tested before merging into the main package.

Implementation of Image Resampling

Motivation

Image resampling is one of the most crucial steps in solar image data analysis. It involves mapping the image from one coordinate system to another; applications include image co-alignment and perspective re-projection of the solar surface. As opposed to ordinary interpolation of the data, the proposed algorithm performs better under arbitrary coordinate transformations.

How the algorithm solves the problem

The paper "On Resampling of Solar Images" introduces a novel method to resample a single pixel under an arbitrary coordinate transformation. Considering an n×n 2D image, the algorithm first calculates the Jacobian matrix of the inverse-mapping subroutine Ci (which returns the inverse-mapped vector of the input pixel vector). It then calculates the Jacobian's singular values via singular value decomposition and finally filters out noise through a Hanning filter and a Gaussian filter. Several optimization techniques have also been proposed in this paper, which I would try to implement once the main algorithm is in place.
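The Jacobian/SVD step can be illustrated as below. This is a minimal numerical sketch of my own, not the Cython code from the pull request: the function name, the finite-difference estimation, and the `inverse_map` callable signature are all assumptions for illustration.

```python
import numpy as np

def jacobian_singular_values(inverse_map, x, y, eps=1e-4):
    """Numerically estimate the Jacobian of an inverse coordinate
    transform at output pixel (x, y) and return its singular values.
    In the resampling scheme, these singular values govern the local
    widths of the anti-aliasing filters.

    `inverse_map(x, y)` maps output coordinates to input coordinates (u, v).
    """
    u0, v0 = inverse_map(x, y)
    ux, vx = inverse_map(x + eps, y)   # perturb x to estimate d(u,v)/dx
    uy, vy = inverse_map(x, y + eps)   # perturb y to estimate d(u,v)/dy
    J = np.array([[(ux - u0) / eps, (uy - u0) / eps],
                  [(vx - v0) / eps, (vy - v0) / eps]])
    return np.linalg.svd(J, compute_uv=False)
```

For an analytically known transform the Jacobian would of course be computed exactly rather than by finite differences; the sketch only shows where `svd` enters the pipeline.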

What my work will include

Some initial functionality, including the coordinate transformation and bilinear interpolation of the data, has been done in this pull request. The current code provides a Cython implementation (with an aim for faster execution) and implements both the Hanning filter and the Gaussian filter. These functions need to be thoroughly tested and checked against the methodology proposed in the paper for any discrepancy.

I propose to implement the main image resampling algorithm using svd for the singular value decomposition and the map_coordinates function for the Jacobian calculation and interpolation. Once the algorithm is set up, it needs to be thoroughly tested by checking the output on known images (AIA 171 may be an example). Finally, the entire code will need documentation describing what each function does, along with updated docstrings.

Timeline

| Time Period | My Work Plan |
| --- | --- |
| Apr 24 - May 14 (Community Bonding Period) | **First Week:** Dedicate this time to learning more about the project and the algorithms to be implemented. Discuss with mentors the modules that need to be imported from the main SunPy repository. **Second and Third Weeks:** Re-read the papers for all four algorithms and discuss implementation details with mentors. Inform mentors of possible inconsistencies between the IDL codes and the corresponding papers (as in the case of MGN) and, if possible, get them resolved within this period. |
| May 14 - Jun 11 | **First Week:** After going through the paper, start working on the MGN algorithm in the sunkit repository. Having discussed the potential inconsistencies between the current IDL implementation and the paper with mentors, work on merging the two available Python codes and remove any remaining inconsistency, validating the output of the two codes against the IDL output. **Second Week:** Compare the memory and CPU utilization of the two updated Python implementations and, based on the results, port the optimized code (or possibly both) into the sunkit repository. Add tests for the updated MGN code within the first half of the week; the remaining half serves as a buffer for fixing any bugs. **Third Week:** Spend the first 2-3 days documenting the updated MGN code in terms of docstrings, all the parameters used (such as the data, constants like a0 and a1, and details on signal depths). Dedicate the rest of the week to the NRGF algorithm; since @wafels has already implemented part of it, completing the rest should not take much time. Validate the output against the corresponding IDL output, and update the algorithm's documentation in the last few days. **Fourth Week:** Add tests for NRGF within the first half of the week. The remaining period acts as a buffer for fixing any bugs or errors in the implementation. |
| Jun 11 - Jun 15 (Phase 1 Evaluation) | Complete any remaining NRGF documentation and work on examples demonstrating MGN and NRGF. |
| Jun 15 - Jul 9 | **First Week:** Remove any remaining bugs or issues in the MGN and NRGF implementations, fix PEP 8 issues, and ensure good code readability. Once the two implementations are reviewed by mentors, get them merged within the first half of the week. **Second Week:** Study the OCCULT-2 algorithm from the paper alongside the IDL code. After discussing further implementation details with mentors, dedicate the rest of the week to writing code for OCCULT-2, covering all four stages of the algorithm discussed above. **Third Week:** Continue the OCCULT-2 implementation and aim to finish within the week; if approved by mentors, work on a Cython implementation. By the end of the week, cross-check against the corresponding IDL output for any inconsistencies. **Fourth Week:** Write tests, check for any bugs or breaks in the code, and complete any remaining documentation. |
| Jul 9 - Jul 13 (Phase 2 Evaluation) | Write examples for the OCCULT-2 algorithm. |
| Jul 13 - Aug 6 | **First Week:** Start working on the image resampling algorithm, beginning with the portions left unfinished in the existing pull request. **Second Week:** Write functions that call the appropriate routines in the current Cython implementation (such as svd and bilinear interpolation) to perform the resampling, and test the called functions (svd, Hanning filter and others) for errors. **Third Week:** Write tests for the implemented algorithm along with detailed documentation; keep 3-4 days for tests and devote the remaining days to documentation and an example. |
| Aug 6 - 14 (Final Week) | Complete any remaining work, such as updating documentation or writing examples for any algorithm that could not be finished earlier. |

Post GSOC Goals

I will continue contributing to this project after GSoC ends. If I can finish the project 1-2 weeks before the deadline, I would like to implement two optional extras as part of the project: implementation of NAFE and the Soft Morphological Filter. If I don't get time to work on the additional features of NRGF during the proposed time period, I will do that during this time.

Software Packages to be used

  • SunPy - modules to be used mainly include sunpy.coordinates and sunpy.map

  • Astropy - modules like astropy.units and astropy.wcs

  • Scipy - simple image processing modules, e.g. for gamma transformation

  • Skimage - modules under skimage, e.g. for normalization purposes

  • Numpy - used heavily throughout the whole sunkit project

  • Matplotlib - used for plotting the maps, both the original and the processed ones

How I will successfully complete the project

During the last few months I have been contributing gallery examples based on SunPy maps, and I worked on a pull request to find local peaks in a solar map. There I used the image processing tools of scipy and learnt a lot about how such tools can be applied to solar data to reveal hidden information, which got me more interested in this project. I have also done university projects based on image processing, and took a computer graphics course as part of my coursework during the previous semester.

I will abide by the timetable proposed above and will push my work regularly. I will be in touch with mentors every week, update them on my progress, and reach out to them with any queries. I will write blog posts for the three stages of the project (Phase 1, Phase 2 and Final) and will also try to update the posts with relevant work from specific weeks. I will provide clear documentation for each of the algorithms so that it can be understood easily by anyone.

I will continue contributing to this project even after GSoC ends and will be available if anyone has queries about my proposed work.

Benefits to the Community

By the end of the project, the sunkit-image repository will contain the necessary image processing tools for processing solar images collected from different observatories. Since all the current image processing algorithms (MGN, NAFE, OCCULT-2 and image resampling) are implemented in IDL, conversion to Python will give end-users more convenience. As a separate affiliated package it will give them the freedom to use these modules without having to worry about the underlying algorithmic complexity or having to know IDL. As future work, we can add examples to the repository depicting the use of these image processing modules on various types of solar data. In the end we can release the package by registering it with PyPI.

GSoC

Have you participated previously in GSoC? When? With which project?

I have not participated in GSoC before. This is the first time that I would be participating in GSoC.

Are you also applying to other projects?

SunPy is the only organisation I am applying to, though within SunPy I am also applying to another project, Transition to Astropy Time. This project is my first priority of the two.

Commitment

  • I don't have any other internships or work for the summer. I don't have any other plans to go on vacation.

  • My classes for the new semester will begin around August 2nd, but I will still be able to give sufficient time to the project since the academic load is light during the initial few weeks of the semester. Hence the final week will not be much of a problem. I will easily be able to spare 30-35 hours per week for the project.

  • Also, because my summer vacation starts on May 3, I will start working on the project early so that I can try to complete it well before the deadline (around 1-2 weeks early).

  • I have my semester exams from around the 22nd of April to the 1st of May, so I will not be able to contribute much time to the project during that period. Still, I will try to devote 2-3 hours on weekdays. From May 1 until May 14 (the Community Bonding Period) I will be able to discuss the details of the project goals with mentors.

Eligibility

Yes, I am eligible to receive payments from Google. For any queries, clarifications or further explanations, feel free to contact me at prateekkol21@gmail.com.
