
Assign column types (instead of considering everything is a string) #76

Open
niconoe opened this issue Dec 13, 2018 · 2 comments


@niconoe
Member

niconoe commented Dec 13, 2018

No description provided.

@niconoe
Member Author

niconoe commented Dec 13, 2018

(idea coming from ropensci-archive/finch#25)

Treating everything as a string seemed the simplest sensible approach at the time, since the standard doesn't allow specifying data types in the metafile (which would be great). That means we have to:

  • either rely on a standardized, fixed list of "expected type per column", as @damianooldoni did
  • or rely on some sort of type-guessing algorithm

(or a combination of both)
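A minimal sketch of the fixed-list option, assuming rows are plain dicts mapping Darwin Core term URIs to string values. `EXPECTED_TYPES` and `apply_expected_types` are hypothetical (not part of python-dwca-reader), and the choice of terms and types is only illustrative. Values that don't fit the expected type are kept as strings, which is exactly where the rigidity of a fixed list shows up.

```python
from typing import Callable, Dict

# Hypothetical, hand-maintained mapping: Darwin Core term URI -> converter.
# The term URIs are real Darwin Core terms; which terms are listed and which
# Python type each maps to is an assumption made for this sketch.
DWC = "http://rs.tdwg.org/dwc/terms/"
EXPECTED_TYPES: Dict[str, Callable[[str], object]] = {
    DWC + "decimalLatitude": float,
    DWC + "decimalLongitude": float,
    DWC + "individualCount": int,
}


def apply_expected_types(row: Dict[str, str]) -> Dict[str, object]:
    """Convert values of known terms; leave every other value as a string."""
    typed: Dict[str, object] = {}
    for term, raw in row.items():
        converter = EXPECTED_TYPES.get(term)
        if converter is None:
            typed[term] = raw  # term not in the list: keep the original string
        elif raw == "":
            typed[term] = None  # empty cell: nothing to convert
        else:
            try:
                typed[term] = converter(raw)
            except ValueError:
                # The publisher used the term differently than the list
                # expects: fall back to the original string.
                typed[term] = raw
    return typed
```

For example, a row with `decimalLatitude` = "50.85", an empty `individualCount` and a `locality` of "Brussels" comes back with `50.85` as a float, `None` for the count, and the locality unchanged.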

It seemed to me that a fixed list might be a bit rigid (different people may use the standard in different ways), while a guessing algorithm can be difficult to implement properly.
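A naive guessing pass, sketched below under the assumption that a column is typed from its values alone, shows why this is harder than it looks. `guess_column_type` is a hypothetical helper: it tries `int`, then `float`, and falls back to `str`.

```python
from typing import Callable, Sequence


def _all_parse(converter: Callable[[str], object], values: Sequence[str]) -> bool:
    """True if `converter` accepts every value without raising ValueError."""
    try:
        for value in values:
            converter(value)
    except ValueError:
        return False
    return True


def guess_column_type(values: Sequence[str]) -> type:
    """Naive guess from the values alone: int if all parse, else float, else str.

    Empty cells are ignored; a column with no non-empty value defaults to str.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return str
    for candidate in (int, float):
        if _all_parse(candidate, non_empty):
            return candidate
    return str
```

It happily types a column of catalog numbers such as "007" and "042" as `int`, so converting would drop the leading zeros, and any guess made on a sample of rows can be contradicted by a later row.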

Anyway, implementing it as an optional interpretation layer on top of the current behavior seems helpful to end users, so I'll consider it!
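One possible shape for such an opt-in layer, sketched with the standard `csv` module on a tab-separated data file: without converters the values stay strings as they do now, and callers who want types pass a mapping of column name to converter. `iter_rows` and its `converters` parameter are hypothetical, not python-dwca-reader's API; quoting, encoding and header handling, which an archive's metafile normally dictates, are ignored here.

```python
import csv
import io
from typing import Callable, Dict, Iterator, Optional

# Hypothetical type: column name -> function turning the raw string into a value.
Converters = Dict[str, Callable[[str], object]]


def iter_rows(
    tsv_text: str, converters: Optional[Converters] = None
) -> Iterator[Dict[str, object]]:
    """Yield rows from tab-separated text (first line = column names).

    With no converters, every value stays a string (the current behavior).
    With converters, listed columns are converted; other columns, and empty
    cells, are left as strings. Conversion errors are not caught: the caller
    opted in, so it decides how to handle them.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if converters is None:
            yield dict(row)
            continue
        yield {
            name: converters[name](value) if name in converters and value != "" else value
            for name, value in row.items()
        }
```

Keeping strings as the default means existing callers see no change, while typed access stays a deliberate choice.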

(Could the Pandas integration help here, e.g. be used for the guessing algorithm?)

@stijnvanhoey
Contributor

@niconoe I do understand the benefits of this, but I would indeed keep it as an optional setting: for some use cases (e.g. input validation in pywhip), knowing that the data will always come in as strings makes things easier to build on. The algorithm-based approach in particular will be tricky to build on further (and python-dwca-reader is a typical building block in other implementations).
