
GSoC 2023 Nischal Singh

Nabil Freij edited this page Feb 22, 2024 · 3 revisions

Rewriting the existing scraper code to improve its efficiency, robustness, and maintainability.

Student Information

Personal Details

Education

Personal Background

As a computer science junior, I have a strong interest in Python, image processing, astronomy, and machine learning. My journey with Python began during my freshman year, which coincided with the COVID-19 pandemic. With the sudden shift to online learning, I saw a need for a lecture management system and decided to build one using Python and Tkinter. This system used a JSON file as storage for lectures, allowing professors to easily organise their course material and keep track of their progress.

This encouraged me to keep developing my software development skills and to participate in hackathons. In one such event, I built a university management system named Scheduler, which aimed to help universities handle tasks such as lecture management, student attendance, class schedules, and examination schedules. This project demonstrated my ability to develop comprehensive solutions to complex problems.

Then I was introduced to the world of image processing, which further expanded my horizons. I became fascinated with the possibilities and built a licence plate detection system using OpenCV, which could mask out the region of the licence plate. Now, I am looking to expand my skill set and contribute to the open-source community further. I believe that the Google Summer of Code (GSoC) program is the perfect opportunity for me to do so. With my background in Python, image processing, astronomy, and machine learning, I am excited to see where this journey takes me and what contributions I can make to the open-source community.

Programming Experience

What is your experience with programming?

I have been programming since high school. The first programming language I was introduced to was Java, which I used in a group project to build a library management system connected to a MySQL database. In my freshman year I was introduced to C, C++ and Python, and built some projects for my university, such as a bank management system written in C++ and linker, which was built using Python and Tkinter. I have also written small scripts to automate routine tasks, and I have worked with Python libraries such as NumPy, pandas, scikit, OpenCV, re, and Tkinter.

What is your experience with open source software?

My contributions to SunPy were my first contributions to open source, and I learned a lot along the way. If I get the opportunity to work on this project, I will give it my all and in turn learn a lot from it. I have only just started with open source, but I am sure this lack of experience won't be a hindrance. Because of it, I will work all the harder on this project to grow as a programmer.

Project: Scraper Rewrite

Mentors: Nabil Freij, Shane Maloney, Laura Hayes

Abstract

The aim of this project is to improve the efficiency, robustness, and maintainability of the scraper code. A new scraper will be created with a predefined API and class structure, since the API and the scope of the scraper have mostly been determined already. The primary focus of this project will therefore be to write the class and its methods so that they follow the specification. Some of its goals are to remove full regex support and use only parse, and to extract common code to a higher level (directory pattern creation, etc.).
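As an illustration of what "directory pattern creation" could look like once it is pulled out into shared code, here is a minimal sketch using only the standard library. The function name `directories_for_range`, the daily step, and the `example.org` URL are assumptions made for this example; they are not part of the existing SunPy scraper API.

```python
from datetime import datetime, timedelta


def directories_for_range(pattern, start, end):
    """Return the unique directories produced by `pattern` between two dates.

    `pattern` is a strftime-style template (hypothetical layout), and the
    range is walked one day at a time, which is an assumption of this sketch.
    """
    directories = []
    # Truncate the start to midnight so that every day in the range is visited.
    day = datetime(start.year, start.month, start.day)
    while day <= end:
        candidate = day.strftime(pattern)
        # Several days usually map to the same directory, so keep each one once.
        if candidate not in directories:
            directories.append(candidate)
        day += timedelta(days=1)
    return directories


# A range that crosses a month boundary yields one directory per month.
example = directories_for_range(
    "https://example.org/data/%Y/%m/",
    datetime(2023, 1, 30),
    datetime(2023, 2, 2),
)
print(example)
```

Keeping this step separate from the filename matching means that the matching logic can be changed from regex to parse without touching how directories are enumerated.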

How do you plan to implement the project?

First, I will study the current API design, the current Scraper class, and all its methods. I will then experiment with the current Scraper class to find the bugs and issues we face with it. Some issues are already known. Issue #4493 states that Scraper has problems handling specific URLs where date_part is None. One way to solve this with regex is to wrap date_part in a group and add a quantifier to it, so that a missing date_part is ignored and no error is thrown. However, since the core aim of this project is to replace full regex with parse, we will implement this using parse.

Another known issue is #4336, which describes the limitations we currently face with the Scraper class. For example, we cannot use kwargs and regex at the same time. Scraper also generates all possible directories based on the time range and the pattern, but it cannot do so if the directory contains a variable. Most of these limitations can be addressed when I rewrite the current scraper to use parse instead of regex.

Next, I will experiment with parse to understand its full functionality, and then start incorporating it into the Scraper class in place of regex. After successful experimentation with parse, I will create unit tests for each method in the Scraper class to verify the updated scraper. I will look for any bugs introduced during the rewrite, fix them, and then add more tests. The result will be a rewritten Scraper class that keeps the current API intact.
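The regex workaround mentioned for #4493 (an optional group around the date part) can be sketched as follows. The filename layout `<instrument>[_<YYYYMMDD>].fits`, the pattern, and the helper name `split_name` are hypothetical and only illustrate the idea; they are not taken from the SunPy code base or from the issue itself.

```python
import re

# Hypothetical filename layout: "<instrument>[_<YYYYMMDD>].fits".
# The date part sits in a non-capturing group followed by "?", so it may be absent.
PATTERN = re.compile(r"(?P<instrument>[a-z]+)(?:_(?P<date>\d{8}))?\.fits")


def split_name(filename):
    """Return (instrument, date) for a filename, with date None when missing."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return None
    # A named group that did not participate in the match is reported as None.
    return match.group("instrument"), match.group("date")


print(split_name("eit_20230115.fits"))
print(split_name("eit.fits"))
```

With the quantifier in place, a filename without a date still matches and simply reports the date as `None` instead of raising an error. The parse-based rewrite would need an equivalent way to treat the date part as optional.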

What are your contributions to the SunPy Project so far?

The pull request "Made the example Masking HMI based on the intensity values of AIA" #6825 was my first contribution to SunPy and to open source. While completing this PR I learned a lot about how to use git, GitHub, and Sphinx documentation, and it was a great feeling when my first PR got merged. After this I created three more PRs:

- Updated docstring in TimeSeries #6829 (Merged)
- Adds suggestion in the example autoalign/reproject to use assume_spherical_screen() #6855 (Merged)
- Fits reading emits a warning when the data array is not actually memory mapped #6837 (Under Work)

What do you want to achieve? What excites you about this project? Why did you choose it?

This project is important for improving the efficiency, reliability, and maintainability of the Scraper class and all its methods, and for reducing its complexity by using parse instead of regex. This is critical for users who rely on URL scraping to return metadata and data files. Working on this open source project will be a privilege for me, because I will get the opportunity to work with experienced mentors and collaborate with other talented developers, leading to a valuable learning experience and professional growth. I chose this project because I have previously worked with the re library, have studied parse, and understand what we are trying to implement in this project and how we can achieve it. I will give my all to complete this project.

Timeline

| Time Period | My Work Plan |
| --- | --- |
| Community Bonding Period (May 4 - May 28) | Familiarising myself further with the project and its functionality by rereading the documentation. Getting to know the mentors and other members of OpenAstronomy. Discussing any issues that could arise while working on the project. Clearing up all my doubts. |
| Week 1 - 2 (May 29 - June 12) | Understanding and experimenting with the current scraper classes and all the methods they use. Developing pseudocode for the primary tasks. |
| Week 3 - 4 (June 13 - June 26) | Understanding the scope of the current Scraper class. Rewriting the current Scraper class to use parse instead of regex while keeping the current API intact. Rewriting all the methods in the Scraper class to use parse instead of regex. |
| Week 5 - 6 (June 27 - July 10) | Writing unit tests for the updated Scraper class and all its methods. |
| First Evaluation (July 11) | Submitting the rewritten Scraper class for the midterm evaluation. |
| Week 7 - 8 (July 14 - July 24) | Continuing work on the project, going through all the classes that use the Scraper class and updating them accordingly. Looking for bugs and debugging. Cleaning up the code. |
| Week 9 - 10 (August 21 - August 28) | Working on any optional goals that arise during the project. Finalising the changes and writing documentation for the changes made during the project. |
| Final Week (August 21 - August 28) | Wrapping everything up and making sure everything is working. Submitting the final code for evaluation. |

I and GSoC

Have you participated previously in GSoC? When? With which project?

No, I have not participated in GSoC before.

Are you also applying to other projects?

No, I am not applying for other projects.

Are you eligible to receive payments from Google?

Yes, I am eligible to receive payments from Google.

How much time do you plan to invest in the project before, during, and after the Summer of Code?

The only time I will be drawn away from the project is the community bonding period, as I have exams then. I will make up for this in the following weeks, and the project will have my full attention after the community bonding period. I will work 6 hours per day on this project and give it my full effort. After Summer of Code I plan to keep contributing to open source, as it is a great opportunity to meet technology enthusiasts and great developers with whom I can collaborate and from whom I can learn a lot.
