Proposal for common data format #12

dfm · 2013-06-12T15:45:13Z

We need a way of comparing the output of all of the detrending algorithms. This will (probably) involve something including but not limited to:

running a standardized search algorithm on all the different outputs
visualizing the results of the different methods in the same way/simultaneously
other things?

Anything that we do will benefit from a common output format for the codes (for obvious reasons).

I see 2 main options:

ASCII tables (gasp!) with specified columns (kbjd, detrended_flux, detrended_flux_uncert, ...)
FITS tables with the same format as the original Kepler data products (including the relevant metadata) with added columns with the same information as above

The first option is far easier to implement in any programming language (lowering the barrier to entry), so I'm inclined to go with that, but the second one seems more useful (and self-contained) for the search phase, depending on what we decide to do.
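To make the first option concrete, here is a minimal sketch in Python of writing and reading such a table using only the standard library. The layout is an assumption rather than an agreed format: one `#`-prefixed header line naming the columns, then whitespace-delimited floats. The function names `write_table` and `read_table` are hypothetical, and the sample values are made up.

```python
import io

# Column names proposed in this thread; the file layout itself is an assumption.
COLUMNS = ("kbjd", "detrended_flux", "detrended_flux_uncert")


def write_table(fh, rows):
    """Write rows of (kbjd, flux, uncert) as a whitespace-delimited table."""
    fh.write("# " + " ".join(COLUMNS) + "\n")
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError("expected %d values per row" % len(COLUMNS))
        # repr() of a float round-trips exactly through float().
        fh.write(" ".join(repr(float(v)) for v in row) + "\n")


def read_table(fh):
    """Read a table written by write_table into a dict of column lists."""
    header = fh.readline()
    if not header.startswith("#"):
        raise ValueError("missing '#' header line")
    names = header[1:].split()
    data = {name: [] for name in names}
    for line in fh:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and extra comment lines
        values = line.split()
        if len(values) != len(names):
            raise ValueError("row has %d values, expected %d"
                             % (len(values), len(names)))
        for name, value in zip(names, values):
            data[name].append(float(value))
    return data


# Round trip through an in-memory file (made-up values).
buf = io.StringIO()
write_table(buf, [(131.5, 1.0002, 0.0003), (131.52, 0.9998, 0.0003)])
buf.seek(0)
table = read_table(buf)
```

Because the reader takes the column names from the header line, extra columns could be added later without changing it.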

Thoughts?

rwolpert · 2013-06-12T16:07:29Z

Is there a way to take the ASCII option, plus an auxiliary function or Perl script to convert to FITS?


pdbaines · 2013-06-12T16:09:49Z

ASCII! 👍

dfm · 2013-06-12T16:25:38Z

Yeah, that's a good idea! ASCII + an auxiliary script sounds like the way to go.

What columns do we need? I mentioned kbjd, detrended_flux, and detrended_flux_uncert above. Any others?

eford · 2013-06-13T05:18:32Z

What about an integer flag, where 0 = this point was used for calculating
detrending, and non-zero values provide information about why a point was
excluded?
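As a rough sketch of how such a flag column could be consumed (Python, standard library only): the only convention taken from the suggestion above is that 0 means "used for detrending". The specific non-zero reason codes, the names `FLAG_REASONS` and `split_by_flag`, and the sample values are all hypothetical placeholders.

```python
FLAG_USED = 0

# Hypothetical reason codes; nothing in this thread fixes these values.
FLAG_REASONS = {
    1: "flagged by upstream pipeline",
    2: "clipped as an outlier",
    3: "inside a data gap",
}


def split_by_flag(times, fluxes, flags):
    """Separate points used for detrending from excluded ones.

    Returns (used, excluded): `used` is a list of (time, flux) pairs and
    `excluded` is a list of (time, reason) pairs.
    """
    used = []
    excluded = []
    for t, f, flag in zip(times, fluxes, flags):
        if flag == FLAG_USED:
            used.append((t, f))
        else:
            excluded.append((t, FLAG_REASONS.get(flag, "unknown reason")))
    return used, excluded


# Made-up example: the second point was clipped, the fourth has an unknown code.
used, excluded = split_by_flag(
    [131.5, 131.52, 131.54, 131.56],
    [1.0002, 1.0500, 0.9998, 1.0001],
    [0, 2, 0, 9],
)
```

Keeping the flag as a plain integer column means it fits in the same ASCII table as the other columns.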


aprsa · 2013-06-13T22:29:53Z

We need a way of comparing the output of all of the detrending
algorithms. This will (probably) involve something including but not
limited to:
[snip]

Do we have a list of sandbox KICs somewhere on git? If so, I can run
our detrender on them tonight/tomorrow.

Cheers,
Andrej

jessielchristiansen · 2013-06-13T22:32:21Z

Yes, they are in the detrending/documents directory - a bunch of text files containing all the KICs in the skygroups, all the TCEs identified by the pipeline in Q1-Q12 in the skygroups, three quiet 12th magnitude G stars, three bright variable stars, and a couple of other poster children (e.g. Kepler-37).

benmontet · 2013-06-13T22:43:40Z

Have we decided what our test suite for comparing algorithms will be?
If not, I propose we use at least a subset of the variable stars, with
a selection from each sky group.


jessielchristiansen · 2013-06-13T22:46:05Z

Depends how your detrending algorithms work. If they need the ensemble of stars to identify common modes, then I would use the whole set of KICs in each skygroup. Otherwise hit up the variable stars!

dfm · 2013-06-13T22:56:22Z

The sandbox data are here and an example output is here.
