GSoC 2013 Simon Liedtke

Nabil Freij edited this page Mar 17, 2023 · 3 revisions

Application for GSOC 2013

Student Information

Name

Simon Liedtke

IRC handle

derdon@irc.freenode.net

GitHub account

derdon

Jabber account

derdon@jabber.ccc.de

Blog

http://derdon.github.io/blog/

Project Proposal Information

Title: SunPy: Database of local data

Abstract

The SunPy package offers the module vso (Virtual Solar Observatory), which provides an interface to query and download astronomical data from multiple data providers simultaneously. Currently, each time data from a data provider is fetched, it must be downloaded from the remote server. This wastes bandwidth if the same data is requested multiple times. It is also not possible to save multiple query results together with their metadata in the same place. A SQLite database on the local hard drive, or a server database accessible over a network, solves both problems in a convenient way.

With SQLite as the database, it will also be possible to keep the data in memory, which would be especially convenient in interactive sessions.
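A minimal sketch of the in-memory mode, using only the sqlite3 module from the Python standard library. The table name and its columns are hypothetical and only serve as an illustration; the real schema is designed in a later milestone.

```python
import sqlite3

# The special name ":memory:" asks SQLite for a database that lives only in
# RAM and disappears when the connection is closed.
conn = sqlite3.connect(":memory:")

# Hypothetical table layout, for illustration only.
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, instrument TEXT, path TEXT)"
)
conn.execute(
    "INSERT INTO entries (instrument, path) VALUES (?, ?)",
    ("EIT", "/tmp/eit_001.fits"),
)

rows = conn.execute("SELECT instrument, path FROM entries").fetchall()
print(rows)
```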

Users can choose whether they want to set a data limit or not. If a limit is set, the database is used as a cache whose eviction method can be chosen by the user. The default is an LRU (Least Recently Used) cache. The API will also support custom cache methods: the user subclasses a general caching class and overrides the methods that remove entries.
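The caching idea could look roughly like the following sketch. All class and method names (BaseCache, LRUCache, on_access, evict) are hypothetical, and the limit is counted in entries here for simplicity, whereas the proposal speaks of a data limit.

```python
from collections import OrderedDict


class BaseCache:
    """Hypothetical general caching class holding at most `maxsize` entries."""

    def __init__(self, maxsize):
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self.maxsize = maxsize
        self._entries = OrderedDict()

    def get(self, key):
        value = self._entries[key]
        self.on_access(key)
        return value

    def put(self, key, value):
        if key in self._entries:
            self._entries[key] = value
            self.on_access(key)
            return
        if len(self._entries) >= self.maxsize:
            self.evict()  # make room before adding the new entry
        self._entries[key] = value

    def keys(self):
        return list(self._entries)

    def on_access(self, key):
        """Hook called when an existing entry is read or updated."""

    def evict(self):
        """Remove one entry; subclasses decide which one."""
        raise NotImplementedError


class LRUCache(BaseCache):
    def on_access(self, key):
        # The most recently used entry moves to the end of the ordering.
        self._entries.move_to_end(key)

    def evict(self):
        # The least recently used entry is at the front.
        self._entries.popitem(last=False)


cache = LRUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.put("c", 3)  # the cache is full, so "b" is evicted
print(cache.keys())
```

A different strategy would only need to override on_access and evict, which is the kind of customization the paragraph above describes.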

Another benefit of having a personal local database is the possibility to allow grouping of results (e.g. "group all results that are from the instrument EIT and are currently opened"). As the (informal!) example demonstrates, results can be grouped by chaining logical operators.

The database interface itself is separated from the class VSOClient (which is responsible for querying and downloading data), so that one does not need to be connected to a data provider to be able to add entries.

If an operation on the database (adding, removing or editing entries) was done by accident, the user can undo this operation, and redo it again if the undo action should be reverted. If there is more time for implementing features, it will also be possible to undo and redo multiple actions in one run. The undo-redo feature is especially useful in an interactive session if the user has removed or edited a database entry by mistake.

Detailed Description

Milestones:

May 27 -- June 2 (1 week)

Read and understand the relevant parts of the code and documentation. This includes sunpy/net/download.py, sunpy/net/vso/vso.py and the documents http://sunpy.readthedocs.org/en/staging/guide/vso.html, http://sunpy.readthedocs.org/en/staging/guide/tutorial.html#querying-the-vso, and http://sunpy.readthedocs.org/en/staging/reference/vso.html.

June 3 -- June 9 (1 week)

Design the database: Define which columns will be included in the database and which types they will have. Use SQLAlchemy for a dialect-independent solution. If the database is stored on the local hard drive, SQLite is used (the default). But it is also possible to use a server database such as PostgreSQL which may be located on a remote server connected over some network. Because of SQLAlchemy's create_engine function, it is very easy for users to define the SQL driver, the user and password, the host, the database name and possibly more options (like the encoding). This information will be saved in the config file.
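As a sketch of how the connection settings from the config file could be turned into the URL string that SQLAlchemy's create_engine accepts: the section name "database" and the option names (driver, user, password, host, name) are assumptions made for this example, not SunPy's actual configuration keys. The code only builds the string and does not import SQLAlchemy.

```python
from configparser import ConfigParser
from urllib.parse import quote_plus


def database_url(config, section="database"):
    """Build a SQLAlchemy-style URL from a (hypothetical) config section."""
    opts = config[section]
    driver = opts.get("driver", "sqlite")
    name = opts.get("name", "")
    if driver == "sqlite":
        # "sqlite://" without a path means an in-memory database.
        return "sqlite:///" + name if name else "sqlite://"
    # Special characters in user names and passwords must be URL-encoded.
    credentials = quote_plus(opts.get("user", ""))
    password = opts.get("password", "")
    if password:
        credentials += ":" + quote_plus(password)
    if credentials:
        credentials += "@"
    host = opts.get("host", "localhost")
    return f"{driver}://{credentials}{host}/{name}"


config = ConfigParser()
config.read_string(
    "[database]\n"
    "driver = postgresql\n"
    "user = solar\n"
    "password = p@ss\n"
    "host = db.example.org\n"
    "name = sunpy\n"
)
url = database_url(config)
print(url)
```

The resulting string is what would be passed to create_engine.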

June 10 -- June 16 (1 week)

Design an API for downloading data and saving the relevant information of it in a database and discuss the proposal with the mentors. Also discuss the API of caching classes and how to write custom caching classes.

June 17 -- July 7 (3 weeks)

Implement the database interface. This is the main part of the project. The path of the database is the one specified in the configuration file; if the corresponding option is not set, an operating-system-dependent default is used. On Unix-like systems this is $XDG_DATA_HOME, or $HOME/.local/share if that environment variable is not set. On Windows it is the directory AppData in the user's home directory. Implement the APIs planned in the previous week, i.e. the API for downloading data and saving information about it in a database, and the API for caching. The caching method can be set in the configuration file.
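The path lookup described above might be sketched as follows. The function names and the use of the USERPROFILE variable to find the Windows home directory are assumptions made for this example.

```python
import os
import sys


def default_data_dir(environ=None, platform=None):
    """Return the OS-dependent default directory for the database."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        # Assumption: USERPROFILE points at the user's home directory.
        home = environ.get("USERPROFILE") or os.path.expanduser("~")
        return os.path.join(home, "AppData")
    xdg = environ.get("XDG_DATA_HOME")
    if xdg:
        return xdg
    home = environ.get("HOME") or os.path.expanduser("~")
    return os.path.join(home, ".local", "share")


def database_dir(configured=None, environ=None, platform=None):
    """Prefer the configured directory and fall back to the OS default."""
    return configured or default_data_dir(environ, platform)


print(database_dir())
```

Passing the environment and platform as arguments keeps the lookup testable without touching the real environment.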

July 8 -- July 24 (2.5 weeks)

Document all implemented code and check for missing tests. The Sphinx documentation generator will be used for all parts of the documentation. To document the API, I will use the reST directives and roles Sphinx provides and will follow the rules in the recommended document A Guide to NumPy/SciPy Documentation. A tutorial will take the user by the hand and show how to connect to the database, how to add, modify and delete entries, and how to close the connection. It should stay shallow, because topic-specific user guides will go deeper and cover every aspect and possible use of the classes and methods.

To test the implemented methods, the Python testing framework py.test will be used. To check how many parts of the implementation have been tested, the tool coverage.py is used.

July 25 -- August 4 (1.5 weeks)

Buffer zone: Fix bugs, refactor code, have code review done.

August 1

Mid-term evaluation: Database interface must be stable and tested thoroughly.

August 5 -- August 11 (1 week)

Implement grouping of results. Because SQLAlchemy is used, this can relatively easily be achieved by writing a handy wrapper for the method orm.query.Query.filter from the sqlalchemy package.

August 12 -- August 18 (1 week)

Document and test the grouping feature. Again, there will be three kinds of documentation: an API reference, a step-by-step tutorial, and a more detailed topic guide.

August 19 -- August 25 (1 week)

Implement the undo and redo functionality for the database interface.
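One possible shape for this feature is the command pattern with two stacks, sketched below. The class names (AddEntry, RemoveEntry, History) are hypothetical, and a plain list stands in for the database so that the example runs without any database or data provider.

```python
class AddEntry:
    def __init__(self, entries, entry):
        self.entries, self.entry = entries, entry

    def do(self):
        self.index = len(self.entries)
        self.entries.append(self.entry)

    def undo(self):
        del self.entries[self.index]


class RemoveEntry:
    def __init__(self, entries, entry):
        self.entries, self.entry = entries, entry

    def do(self):
        self.index = self.entries.index(self.entry)
        del self.entries[self.index]

    def undo(self):
        # Put the entry back at the position it was removed from.
        self.entries.insert(self.index, self.entry)


class History:
    def __init__(self):
        self._undo, self._redo = [], []

    def execute(self, command):
        command.do()
        self._undo.append(command)
        self._redo.clear()  # a new action invalidates the redo stack

    def undo(self, n=1):
        for _ in range(n):
            if not self._undo:
                raise IndexError("nothing to undo")
            command = self._undo.pop()
            command.undo()
            self._redo.append(command)

    def redo(self, n=1):
        for _ in range(n):
            if not self._redo:
                raise IndexError("nothing to redo")
            command = self._redo.pop()
            command.do()
            self._undo.append(command)


entries = []
history = History()
history.execute(AddEntry(entries, "a"))
history.execute(AddEntry(entries, "b"))
history.execute(RemoveEntry(entries, "a"))
history.undo()   # brings "a" back
print(entries)
```

The optional argument n covers undoing or redoing multiple actions in one run.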

August 26 -- September 1 (1 week)

Document and test the undo and redo features. Because this is part of the database interface which is separated from the query and download functionality, these features can be tested without any connections to a data provider. Undoing and redoing is especially useful in interactive sessions, therefore the documentation consists of many examples from a REPL session so that the reader can copy & paste the code and comprehend the behaviour.

September 2 -- September 8 (1 week)

Extend grouping to support boolean operations. That means that group filters can be chained together via and, or, and exclusive or (xor) operations.
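The chaining could be exposed through Python's &, | and ^ operators. The following sketch works on plain dictionaries instead of wrapping SQLAlchemy's Query.filter, so it shows only the intended semantics; the Filter class, the attr_equals helper and the sample entries are hypothetical.

```python
class Filter:
    """Wrap a predicate so that filters can be combined with &, | and ^."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __call__(self, entry):
        return bool(self.predicate(entry))

    def __and__(self, other):
        return Filter(lambda entry: self(entry) and other(entry))

    def __or__(self, other):
        return Filter(lambda entry: self(entry) or other(entry))

    def __xor__(self, other):
        # Exclusive or: exactly one of the two filters matches.
        return Filter(lambda entry: self(entry) != other(entry))

    def apply(self, entries):
        return [entry for entry in entries if self(entry)]


def attr_equals(name, value):
    return Filter(lambda entry: entry.get(name) == value)


entries = [
    {"id": 1, "instrument": "EIT", "opened": True},
    {"id": 2, "instrument": "EIT", "opened": False},
    {"id": 3, "instrument": "LASCO", "opened": True},
    {"id": 4, "instrument": "LASCO", "opened": False},
]

is_eit = attr_equals("instrument", "EIT")
is_open = attr_equals("opened", True)

eit_and_open = [e["id"] for e in (is_eit & is_open).apply(entries)]
eit_or_open = [e["id"] for e in (is_eit | is_open).apply(entries)]
eit_xor_open = [e["id"] for e in (is_eit ^ is_open).apply(entries)]
print(eit_and_open, eit_or_open, eit_xor_open)
```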

September 9 -- September 15 (1 week)

Test and document boolean operations on groups. The documentation of this will be part of the documentation of the grouping feature.

September 16

Suggested 'pencils down' date. Add more tests, improve the documentation, refactor and fix bugs. This is also a buffer zone if features require more time than planned.

Code Sample(s)

A pull request by me can be found at https://github.com/sunpy/sunpy/pull/445. This patch makes it possible to parse time strings with more than six zeros in the microsecond part. An example for input that can be parsed with this patch but could not be before is 2007-05-04T21:08:12.00000000 (note the eight zeros instead of six). The corresponding issue can be found at https://github.com/sunpy/sunpy/issues/289.
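To illustrate the problem (this is not the code from the pull request, only a sketch of one way to handle such input): the %f directive of datetime.strptime accepts at most six digits, so the fractional part has to be treated separately. The function name parse_iso_time is hypothetical.

```python
from datetime import datetime


def parse_iso_time(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS[.fraction]' with a fraction of any length."""
    base, sep, fraction = text.partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    if not sep:
        return parsed
    if not fraction.isdigit():
        raise ValueError("invalid fractional seconds: %r" % fraction)
    # Keep microsecond precision: cut extra digits, pad missing ones.
    microsecond = int(fraction[:6].ljust(6, "0"))
    return parsed.replace(microsecond=microsecond)


print(parse_iso_time("2007-05-04T21:08:12.00000000"))
```

Digits beyond the sixth are truncated rather than rounded in this sketch.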

I am interested in esoteric programming languages and therefore wrote an interpreter for the language Chef. To simplify developing and debugging programs written in Befunge, I wrote an interactive shell for it, see befungeshell.

To exercise my newly developed skills in the programming language Go, I wrote a (still experimental) library for parsing and making images in the Netpbm format, see netpbm to have a look at the code.

Writing code is not the only way to contribute to open source projects: To help the developers of a project with improving it, it is important to report bugs the users have encountered. An example of a bug I have reported is from the urwid project: there is an issue with putting an application into the background and fetching it back into the foreground. The complete bug report can be found here: https://github.com/wardi/urwid/issues/25.

Biography

I am a 21-year-old student studying computer science in Bremen, Germany. I dived into the world of coding when I was 12 and became interested in the Internet and the World Wide Web. I wanted to find out how web pages are made, so I learned HTML, CSS, and JavaScript. When I read documentation about JavaScript, it was often mentioned that it works on the client side, in the browser, and thus cannot be used for interfacing with a database on the web server (back then, there was no fancy thing like Node.js). So I became curious again and learned PHP (it seemed to me that there was no alternative). Five years ago, I read about a programming language called Python and found out that PHP has many flaws which I could not notice when I learned it because I was a beginner back then. The time I started learning Python was also the time I started supporting the German Python community: I am an active member both in the IRC channel #python.de at freenode and in the German Python forum. I have thorough experience with git, Sphinx, the testing framework py.test and working on open source projects in general. When I used Python for web programming, I used Werkzeug and Flask as web frameworks and Genshi as the template engine. I also have basic experience with (web-based) database applications using SQLAlchemy.

Other Schedule Information

Important remark: the times mentioned in this section are all GMT+1!

My summer vacation starts on July 6, so I cannot work 40 hours per week before that date. But I promise to spend as much time as possible on the project during this period to achieve my goals. Until July 6, I plan to work from 8:00 to 13:00 on Tuesdays, from 8:00 to 14:00 on Wednesdays and from 8:00 to 11:00 on Thursdays. Depending on how much homework has to be done, I will also be able to work on weekends during that time.

After July 6, I can work much more flexibly, i.e. 40 hours per week (or more if possible, to compensate for the previous weeks).
