Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format(s) of the ckan_package S3 object #113

Open
VLucet opened this issue May 15, 2019 · 5 comments
Open

Format(s) of the ckan_package S3 object #113

VLucet opened this issue May 15, 2019 · 5 comments

Comments

@VLucet
Copy link

VLucet commented May 15, 2019

Not an urgent problem, just a suggestion on OOP in ckanr

As I am involved in the development of the package rgovcan which is wrapping around ckanr I have been thinking about the most user-friendly way to return query results. I believe the current implementation S3 object ckan_package could be improved to facilitate delivery to users.

Currently, the output of package_search is a list of ckan_packages, all with the resources returned as a list. Adding as = table returns the results in a data.frame (coming out of fromJSON I assume) with resources being a list of data.frames within the main results data.frame.

A few suggestions:

  • package_search could return a list of ckan_packages, all with the resources returned as ckan_ressource within the ckan_package
  • To push it further, each ckan_ressource could be organised in a new class ckan_ressource_stack, and each ckan_package returned by package_search could be organised in a ckan_package_stack ***

So maybe what we are looking at here is switching to a S4 definition of ckan_package so that it contains multiple ckan_ressource. Although maybe S3 is enough for this? I am not so familaier (yet) with S4.

*** For rgovcan, I am experimenting with attempts to write new classes ckan_package_stack and ckan_ressource_stack that stacks the output from package_search with associated methods to access resources of the object, to make the querry results more user-friendly. I am implementing that in the first version of rgovcan because it is my first time doing OOP, but it might make more sense (and better practice) to migrate these to ckanr later on, if it is preferred!

@VLucet
Copy link
Author

VLucet commented May 17, 2019

EDIT: I have edited my original comment as, after a little more thinking, I found a better way to formulate my suggestions. My apologies for the messiness of the original comment!

@sckott
Copy link
Contributor

sckott commented May 21, 2019

thanks for this @VLucet

  • first, IF we were to use something more than S3, as the maintainer, i would insist on R6 instead of S4
  • returning data as a list, and only doing data.frame conversion with jsonlite::fromJSON is because every CKAN instance is different, and I wanted to make sure we weren't failing on parsing results because we hadn't considered an edge case in how results look like in some CKAN instances.
  • i'm on the fence about making an object on which you can call methods. that's probably not familiar to the average R user, as they are more used passing things through functions, rather than calling methods on objects. But, if we are aiming for a more technical set of users I think calling methods on objects makes more sense
  • i'm open to your stack idea in general. when you're ready could do a PR and we can further discuss there

@VLucet
Copy link
Author

VLucet commented May 21, 2019

Thank you for your reply @sckott. Replying to each of your points:

  • I would agree that R6 makes more sense given we are processing from an API. I do not know R6 well however. But maybe we should stick to S3 (see below).
  • I do think it makes the most sense to keep it as a list for parsing reasons, but then the outputs are not so user-friendly in my opinion and that is why I was thinking of working with objects and methods to facilitate the access to all those lists elements.
  • I did not think about this, this is a good point. However I think it would not make the package too technical, as long as we do not overwhelm users. The way I think about this is as long as we stay in S3, users can still access the elements in the objects with the usuals $. Providing methods would offer a cleaner way to do so for users that understand it, and an easier way to handle multiple resources and packages. Besides, the users would still be passing the results of their searches to a function, and I do not think it is too overwhelming for les technical users (but i might know the community less well than you do)
  • I have sketched the stack objects for both packages and resources in the current development stage of rgovcan. If that is not too much to ask, could you please give a quick look at the READme, it shows what it would look like if it were to be added to ckanr.

@sckott
Copy link
Contributor

sckott commented Jul 3, 2019

sorry for delay in response. I'll have a look at your README and get back to you

@sckott
Copy link
Contributor

sckott commented Jul 18, 2019

@VLucet sorry again for delay, back from vacation now with time.

I took a run through your README and everything looks nice. A few thoughts:

  • Are you developing your pkg with the CRAN version of ckanr, or the development version here? because they are quite different now since it's been so long. e.g., fetch fxn is now ckan_fetch
  • At the moment it seems your fxns, e.g., govcan_search, govcan_get_resources, etc. are light wrappers around the ckanr fxns. not changing the output much. I can see how if we used classes (ideally R6) then we could make a nice print method to summarise results for any number of results, and then make methods for accessing parts of each result, e.g. pseudo code
x <- govcan_search(keywords = c("dfo"), records = 2)

# methods to get metadata across every result
x$ids()
#> [1] "5cfd93bd-b3ee-4b0b-8816-33d388f6811d" "4dc95665-3d44-428c-bb26-12f981c57060"
x$
#> [1] 

# methods to perform actions
x$resources() # to get resources across each package

thoguhts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants