GSoC 2023 Suleiman Farah

Nabil Freij edited this page Feb 22, 2024 · 3 revisions

[SunPy] Refactoring the Current OpenAstronomy Web Scraper to Reduce Bugs and Optimize Data Collection

About Myself:

My name is Suleiman Farah. I am a second-year undergraduate student at the University of Toronto in the Computer Science Specialist Program. I have prior internship experience as a Mobile Applications Developer/Designer, with an outstanding recommendation on LinkedIn for my coding performance, ethics, and professionalism in the workplace.

In the Mobile Applications Developer/Designer internship, I led a notification project on an iOS campus app for a team of a dozen colleagues using the Agile Development Model. Across two sprints within 1.5 months, I contributed several hundred lines of code and the app UI design to successfully improve the livelihood of a community of 8000+ people by increasing accessibility to community services.

I believe I have the skills to work in a fast-paced, high-pressure setting, both in large department-wide teams and in smaller cohesive team environments. I was already preparing to create a web scraper project this summer, and finding this opportunity to contribute to the field of astronomy through an open source project in the Google Summer of Code program allows me to pursue my passion while building up my portfolio. Although I am new to open source, I can quickly adjust to the new conditions, as I am already familiar with Git pull requests and professional interactions on Git from being involved in multiple team coding projects.

I also believe that I am well suited for this role because I have more than half a decade of experience in Python, am proficient in Java, C, and Unix/Linux environments, and am confident in the usage of advanced software design principles and data structures. Moreover, through my pull request I have familiarized myself with the SunPy API and the documentation across several segments of the core docstrings.

Here are a few of my details as well as contact methods:

Google Summer of Code Details:

This is the first time I am applying for the Google Summer of Code program and I only intend to apply for this OpenAstronomy project as this is the only industry I want to make an impact in through an open source project. In addition, I feel the other ideas on the OpenAstronomy page require supplemental mathematical knowledge that I don't have at the moment.

For this medium-sized (~175 hr) project, I intend to dedicate 20 hours/week to ensure the project is finished on time with exemplary quality and documentation. I am also eligible to receive payments from Google.

Current SunPy Project Contributions

I have made one pull request, in which I did a comprehensive search through 1000+ lines of documentation to check for word choice errors and to replace broken URL references with valid DOI addresses, where applicable.

Project Issue and Solution Plan

The current Scraper utilized by SunPy is error-prone and hard to maintain. Specifically, the Scraper class works only for certain parameters and fails with others, for example when given a different timeRange parameter. Additionally, its use of regex has caused errors in some edge cases (as listed in the Scraper Crash URL issue). The codebase for the current Scraper API contains numerous bugs, which warrants refactoring and subsequent optimization.

To enhance the SunPy Scraper, I will conduct a thorough analysis of the current codebase related to the Scraper class, identify areas for improvement, develop a new scraper by refactoring useful components of the previous code, and add applicable Python libraries. The new scraper should be able to extract the necessary information from the given website using parse instead of regex, to avoid the complications that come with Python's regex. The new scraper will be well documented and easily accessible to SunPy community members, who can run additional tests and provide feedback. Lastly, I will also develop a series of tests with pytest (a Python testing library) to ensure the new scraper works accurately, provides proper feedback on errors, and fulfills all the abilities of its predecessor.
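As a rough illustration of why a parse-style template can be easier to maintain than a hand-written regex, the sketch below emulates a small subset of that idea using only the standard library. It is not the parse library itself and not SunPy's Scraper: the helper names `template_to_regex` and `extract_fields`, the example URL, and the convention that every named field is an integer with an optional fixed width are all assumptions made for this example.

```python
import re
from string import Formatter


def template_to_regex(template):
    """Compile a template with named integer fields into a regex.

    Literal text is escaped automatically, so characters such as '.'
    in a URL are matched literally instead of as regex metacharacters.
    Only named fields are supported, e.g. '{year:4}' or '{day}'.
    """
    parts = []
    for literal, field, spec, _conversion in Formatter().parse(template):
        parts.append(re.escape(literal))
        if field is not None:
            # Assumption for this sketch: the format spec, if present,
            # is the exact number of digits the field must have.
            digits = rf"\d{{{int(spec)}}}" if spec else r"\d+"
            parts.append(f"(?P<{field}>{digits})")
    return re.compile("".join(parts))


def extract_fields(template, text):
    """Return the named fields as integers, or None if text does not match."""
    match = template_to_regex(template).fullmatch(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# Hypothetical URL layout, used only for illustration.
TEMPLATE = "https://example.org/data/{year:4}/{month:2}/file_{day:2}.fits"

fields = extract_fields(TEMPLATE, "https://example.org/data/2023/06/file_09.fits")
print(fields)  # {'year': 2023, 'month': 6, 'day': 9}
```

The template stays readable, and the regex is generated in one place, so escaping mistakes cannot be repeated at every call site. A non-matching URL returns None rather than a partial result.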

Project Timeline

Community Bonding Period (May 4 - May 28)

  • Become familiar with the mentors for this OpenAstronomy project and read over the SunPy program code and documentation.
  • Learn about the SunPy predefined API instructions.
  • Learn about the SunPy predefined class structure.
  • Study the current implementation of the web scraper system, see what aspects will be refactored, and read the corresponding documentation to understand functionality.

Week 1 - Week 2 (May 29 - June 11)

  • Start Coding!
  • Construct the predefined portion of the scraper class as instructed by the mentors.
  • Structure the API to integrate into the scraper class in a cohesive manner, checking for bugs incrementally.

Week 3 - Week 6 (June 13 - July 9)

  • A partial web scraper should be written and provide some functionality.
  • Write brief documentation to describe each method and process
  • Ask the mentors and the open source group via GitHub for feedback on the project.

First Evaluation (July 11 - July 14)


Week 7 - Week 8 (July 14 - July 25)

  • Work on the feedback provided by the mentors to improve the function or documentation of the program.
  • Attempt to implement the parse functionality on the scraper in place of Python regex.

Week 9 - 11 (July 26 - August 21)

  • Draft final unit tests to test the functionality of the web scraping API.
  • Test edge cases that were failing for the previous API.
  • Improve performance where possible via selection of key data structures.
  • Finalize the coding documentation, docstrings, and GitHub information panels related to the web scraper.
  • Continue to work on the parse functionality if there is extra time.
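The edge-case tests planned above could take roughly the following shape. This is only a sketch: `date_from_filename` and the file-name pattern are hypothetical stand-ins, not part of the SunPy Scraper API, and the tests use plain `assert` statements so that pytest can collect them without any extra imports.

```python
from datetime import datetime


def date_from_filename(filename, pattern="file_%Y%m%d.fits"):
    """Hypothetical helper: read the date encoded in a file name."""
    try:
        return datetime.strptime(filename, pattern)
    except ValueError as err:
        # Re-raise with a message that names both the input and the pattern.
        raise ValueError(
            f"{filename!r} does not match pattern {pattern!r}"
        ) from err


def test_valid_filename_is_parsed():
    assert date_from_filename("file_20230611.fits") == datetime(2023, 6, 11)


def test_leap_day_is_accepted():
    assert date_from_filename("file_20240229.fits") == datetime(2024, 2, 29)


def test_invalid_date_gives_clear_error():
    try:
        date_from_filename("file_20230230.fits")  # 30 February does not exist
    except ValueError as err:
        assert "does not match pattern" in str(err)
    else:
        raise AssertionError("expected a ValueError")
```

Each test covers one behavior (a normal case, a calendar edge case, and error feedback), which keeps failures easy to trace back to a single cause.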

Final Evaluation (August 21 - August 28)


  • The SunPy web scraper should function with no bugs and be well documented in its docstrings, examples, and presentation to the team.
  • All specifications for the project should be met according to the mentors' instructions, resulting in a stable, well-documented web scraper with the above features implemented.

Other Commitments:

I will be taking part in a tech startup internship this summer, which tentatively ends on August 31st. I believe the internship will not take away any of the time set aside for the OpenAstronomy project. I have demonstrated time management skills and the passion to guarantee 20 hours a week dedicated to this open source project for the summer semester. This web scraper program for an astronomy-based project really emboldens my next steps as a developer, and I can't see myself finding another opportunity like this in the foreseeable future. I sincerely believe that, given my passion for astronomy and open source and my strong fundamentals in Python, strong results will follow from my work on this project.
