GSoC 2015 Goran Cetusic
Goran Cetušić | Short bio |
---|---|
My academic and programming career started with my enrollment as a Bachelor student at the Faculty of Electrical Engineering and Computing in Zagreb. During that time I've been involved in various activities like lecturing student courses, organizing events and projects, leading several of them as the section leader of the Electrical Engineering student club. I got my first job as a network administrator at a small firm and where I started to learn some real programming in Python. Before finishing my Bachelor studies, I got an internship position as an Openlab summer student at CERN where I worked on process accounting software for their batch cluster. After a year working as a Linux administrator and contributing to several open source projects, I'm now a dedicated Python and Django developer, juggling my responsibilities as a Masters student in Telecommunications and Informatics, currently working on my thesis on Linux network simulators and various other interests like machine learning and bioinformatics, often "wasting" my time on Coursera. | |
Education |
Faculty of Electrical Engineering and Computing, University of Zagreb Telecommunication and Informatics, Master studies |
Contacts | |
Curiculum vitae | https://github.com/gcetusic/gcetusic.github.io/blob/master/docs/gcetusic_en.pdf |
A Linux enthusiast every since in 2006 someone installed Ubuntu on my laptop at a yearly Linux InstallFest. Since then I've taken over the organization of the same installfest for a couple of years and even organized and lectured two Linux courses at my college. After writing (mostly) bad C code in Gedit, I've switched to Python and Komodo Edit and it's been kind to me ever since. Except for that time when Komodo Edit deleted 300 lines of code...I'm using Sublime now, with all the benefits of its lint plugins.
After spending some time writing more bad code and making rookie mistakes, suddenly I realized I actually have something interesting to show to other developers. Python has served me well in this but I still have a lot to learn. For example, after doing web development for a couple of years, while trying to work around badly written Django code, I've implemented my own Python metaclass, probably the most abstract thing I've done in a programming language. Here are examples of some of the projects I've worked on (for a more complete list, please view my GitHub page, some are under NDA and closed sourced)
- https://github.com/stormpath/stormpath-sdk-python/ - a flexible user managagement API
- https://github.com/gcetusic/psacct-collector - process accounting software developed during CERN internship
- https://github.com/gcetusic/IMUNES - a powerful network simulator
The Lightcurve
class is one of the three core datatypes in SunPy, along with Map and Spectra.
Lightcurve
is designed to read in, process and store meta data related to solar physics time series data.
Currently, Lightcurve
uses the pandas library as its underlying data structure, however, this is subject to change in the future.
Much like the map
submodule, lightcurve
needs to be able to read in various supported data formats (such as FITS, ascii and others in the future), store their meta data and give users Beginner and unified access to this metadata independently of the original source of the data.
As currently implemented (as of 0.5) the lightcurve
module performs three core tasks:
- Download the raw data
- Read this data into a pandas dataframe
- store the meta data obtained with the data.
As of the SunPy 0.6 release the first stage will be moved out of lightcurve
and into the net
subpackage as part of the UnifiedDownloader
(name subject to change) Pull Request.
This leaves lightcurve
in a similar position to map
where the data acquisition is not part of the core data type and is managed separately.
Therefore, enabling the implementation of a factory class like Map
for the lightcurve module.
Expected Outcomes
- Become familiar with the
UnifiedDownloader
code, if it has not been accepted into the SunPy codebase, complete the remaining tasks for this to be achieved. - Re-write any new lightcurve sources that were not included in the
UnifiedDownloader
code as sources forUnifiedDownloader
. - Write a factory class for
lightcurve
similar to thesunpy.map.Map
class. This class will be a generic constructor forlightcurve
allowing the user to instantiate any one of the many subclasses ofGenericLightcurve
present insunpy.lightcurve.sources
. The API design for the factory class is here: https://github.com/sunpy/sunpy-SEP/pull/6 - Design and develop a robust method of dealing with lightcurve meta data, which can handle joining different parts of timeseries from different files, each with their own meta data. (See #1122)
- Getting more familiar with the codebase
- Defining specification for project
- Researching the physics behind sunpy and lightcurve
- Basic LightCurve class refactor and code cleanup
- Designing factory class for LightCurve
- Refactor of all sources related files
- Syncing sources refactor with UnifiedDownloader branch
- Resolving LightCurve related issues on Github
- Evaluating, fixing and closing
- Midterm evaluation
- Resolving any design problems
- Maintenance and codebase improvements
- Dealing with merging of LightCurve metadata
- Fixing any showstoppers not caused by Lightcurve refactor
- Removing pandas library from SunPy
- Implementing custom data storage structures
- Buffer Period for completing any incomplete work
- Fixing any showstoppers not caused by Lightcurve refactor
- Making sure there's good test coverage and documentation
- Final syncing of GSOC project (mostly merges)
During the first 4-5 weeks I would have to work on my masters thesis parallel to GSOC. It's my last year as a student and the deadline for turning it in is the end of June. Even though I would be able to devote at least 4 hrs a day during the period between the start of GSOC and the end of June I would start coding by the beginning of May (during the community bonding period) to get a head start.