
Consider making convert = FALSE the default in fetch_survey() #281

Open
juliasilge opened this issue Aug 31, 2022 · 8 comments

@juliasilge
Collaborator

@juliasilge what would you think about making convert = FALSE the default, at least until we can get things off the v2 endpoint? I know conversion has been the default behavior for a long time, but I'm concerned we might be steering users, maybe esp. new ones, towards potentially problematic behavior.

As @context-dependent noted, that change would just mean that users by default get the same data they get from a web download with default settings (still with some bonuses like cleaning up the metadata row & a column map)

Originally posted by @jmobrien in #278 (comment)

@juliasilge
Collaborator Author

We can consider whether this change is a good idea, or whether we should solve these types of problems in a better way such as outlined in #267. I am concerned about changing a very old default (predates my involvement with this package), especially when we don't have nice ways to communicate this kind of change to users and we think the situation will change moving away from the old v2 endpoint.

@jmobrien
Collaborator

Thanks. I feel the same way, in that it's something I'm reluctant to touch given its history. I also personally don't care for the feature and always use it turned off, but for that reason I may also lack the perspective of those who do use it.

As noted elsewhere, there's this new includeLabelColumns param that offers something similar:

[screenshot: Qualtrics API reference showing the includeLabelColumns parameter]

So, we could theoretically do something that would link up recodes and labels, without calls to any external endpoints.
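As a rough sketch of what "linking up recodes and labels" could look like (all column names and values below are invented for illustration, not anything fetch_survey() returns today): pair a recode column with its label counterpart, derive the mapping, and order the factor levels by recode value.

```r
# Toy data standing in for one response column plus its paired label
# column of the kind includeLabelColumns would return (values invented):
recodes <- c(2, 1, 3, 1)
labels  <- c("Disagree", "Agree", "Neutral", "Agree")

# Derive the recode -> label mapping from the paired columns, then
# build a factor whose level order follows the recode values:
lookup <- unique(data.frame(recode = recodes, label = labels))
lookup <- lookup[order(lookup$recode), ]
f <- factor(labels, levels = lookup$label)
levels(f)  # "Agree" "Disagree" "Neutral"
```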

That said, the more I think about it, it's not quite the same, and I think there are some important considerations for how to get this right:

  1. "recode" ≠ "choice" - they're not necessarily the same (though they often are). So factoring with those would work differently from the existing convert approach. OTOH, recodes that differ from choices might actually be closer to what users usually want w/r/t the ordinal relationships among responses.
  2. "choice" ≠ "order" - it's actually possible to have, say, "choices" 1-5 that consistently display as, say, {3, 4, 1, 2, 5} on respondents' screens. Also, a survey designer can create this by accident from the web interface, and it can be really hard to even know it happened (I encountered this issue myself). I think this concern applies to the current convert as well, and if so is something we should make a point of addressing.
  3. "choice" ≠ "answer" - in, e.g., matrix questions, the rows (different subquestions) are "choices", while the responses are "answers". Answers also have the same ID/order separation. Not sure where this stands re:current convert.
  4. unendorsed responses - the current approach at least has the benefit of trying to get all response options as levels, even those that might have had no respondent select them. includeLabelColumns probably would lose that.
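To illustrate point 4 with a base-R sketch (the choice set here is invented): supplying the full choice list from the survey definition as the factor levels preserves options nobody selected, whereas factoring only on observed label values would drop them.

```r
# Full choice set as it would come from the survey definition; note
# "Unsure" is never endorsed in the observed responses:
all_choices <- c("Yes", "No", "Unsure")
responses   <- c("Yes", "Yes", "No")

# Using the full choice set keeps "Unsure" as a level with count 0:
f <- factor(responses, levels = all_choices)
table(f)  # Yes 2, No 1, Unsure 0
```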

@jmobrien
Collaborator

@juliasilge re: your wise point in #278 about not annoying users with warnings, one thing I'm curious about. In the tidyverse packages I've seen a "warn on first instance in a session" behavior for certain actions, e.g.:

require(tidyverse)

vbls <- 
  c("mpg", "drat")

# Passing an external vector straight to select() triggers tidyselect's
# deprecation note, which is displayed only on first use, not every call:
mtcars |> 
  select(vbls)

If we did end up changing our minds and decided to change the default, how hard would it be to do something like that?

@juliasilge
Collaborator Author

@jmobrien It's not hard; it's controlled by arguments to rlang::inform().
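A minimal sketch of that rlang mechanism: `.frequency = "once"` plus a `.frequency_id` makes the message fire only the first time per session (the function name and id below are hypothetical, not existing qualtRics code).

```r
library(rlang)

# Hypothetical helper: announce a changed default once per session.
notify_default_change <- function() {
  inform(
    "fetch_survey() now defaults to convert = FALSE.",
    .frequency = "once",
    .frequency_id = "fetch_survey_convert_default"  # id is invented
  )
}

notify_default_change()  # message shown
notify_default_change()  # silent for the rest of the session
```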

@jmobrien
Collaborator

@juliasilge thanks, that's useful to have in the toolbox.

FYI, all the above is essentially why the PR for fetch_description() happened, and why I also wrote a separate R package to process what came out of it. So, I've already "done" all the above, sort of--it's just that it was really "bespoke" (hacky) and not something I want to port over here.

It was hacky in part because the response schema weren't yet published at the time. Now they're out so they could be a great resource, but I'm still a little unclear where/how to get the specific endpoint response schema that describes everything that might come back from fetch_description(). Do you (or anyone) know where that might be?

@juliasilge
Collaborator Author

I think the documentation here is pretty good for that endpoint. Does it have what you were thinking of?

@jmobrien
Collaborator

jmobrien commented Sep 7, 2022

Sort of--yes, the schema they're now publishing here are what I'm interested in. But using the web display for that isn't great.

The main issue is similar to the one I had before when mining the survey description downloads themselves--the trees describing the survey/schema are of arbitrary depth, so it's not easy to map out where everything important is/would be.

A smaller challenge here is that the schema references other sub-schema, e.g. the Questions object that embeds what comes from here (which is something we'd need if we want to properly update survey_questions()).

Here's what I know so far. The schema for each endpoint's docs page isn't part of the webpage source; it's imported via JS-based calls to more programmatic API reference files served from the company Stoplight, e.g. these things I dug out:

(These are JSON; Firefox has a nice built-in viewer for formatting them, but apologies if it's a mess on your screen.)
https://stoplight.io/api/v1/projects/cHJqOjk3NDQ/nodes/9d0928392673d-get-survey
https://stoplight.io/api/v1/projects/cHJqOjk3NDQ/nodes/YXBpOjYwOTM2-qualtrics-survey-api
https://stoplight.io/api/v1/projects/qualtricsv2/publicapidocs/nodes/reference/surveyDefinitions.json

That first link looks like the raw data we would want, just without the organizing/labeling the web page offers. I (or someone) could put the time in to manually build that framework from that raw data, sure--but I'm wondering if someone who knew things like web scraping and/or conventions of API publishing would see how this could be done faster + in a more maintainable way. Both are way outside my expertise, though.
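One low-dependency angle on the arbitrary-depth problem, just as a sketch: recursively walk the parsed JSON (in real use, from something like jsonlite::fromJSON(..., simplifyVector = FALSE) on those Stoplight files; here a toy nested list stands in) and record the path to every terminal field. That at least yields a flat map of where things live regardless of nesting depth.

```r
# Collect "paths" to every leaf of a nested list, the shape parsed JSON
# takes when simplification is turned off:
schema_paths <- function(x, prefix = character()) {
  if (!is.list(x) || length(x) == 0) {
    return(paste(prefix, collapse = "/"))
  }
  nms <- names(x)
  if (is.null(nms)) nms <- as.character(seq_along(x))
  unlist(lapply(seq_along(x), function(i) {
    schema_paths(x[[i]], c(prefix, nms[[i]]))
  }))
}

# Toy stand-in for a fragment of a survey-definition response:
toy <- list(
  result = list(
    Questions = list(
      QID1 = list(
        QuestionType = "MC",
        Choices = list(`1` = list(Display = "Yes"),
                       `2` = list(Display = "No"))
      )
    )
  )
)
schema_paths(toy)
# "result/Questions/QID1/QuestionType"
# "result/Questions/QID1/Choices/1/Display"
# "result/Questions/QID1/Choices/2/Display"
```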

@jmobrien
Collaborator

jmobrien commented Sep 7, 2022

Side note--these same Stoplight files are probably where we'd get any standardized list of metadata vars such as @chrisumphlett suggested in #272.
