
Consider making convert = FALSE the default in fetch_survey() #281

Open
juliasilge opened this issue Aug 31, 2022 · 8 comments

@juliasilge
Collaborator

@juliasilge what would you think about making convert = FALSE the default, at least until we can get things off the v2 endpoint? I know conversion has been the default behavior for a long time, but I'm concerned we might be steering users, maybe esp. new ones, towards potentially problematic behavior.

As @context-dependent noted, that change would just mean that users by default get the same data they get from a web download with default settings (still with some bonuses like cleaning up the metadata row & a column map)

Originally posted by @jmobrien in #278 (comment)

@juliasilge
Collaborator Author

We can consider whether this change is a good idea, or whether we should solve these types of problems in a better way such as outlined in #267. I am concerned about changing a very old default (predates my involvement with this package), especially when we don't have nice ways to communicate this kind of change to users and we think the situation will change moving away from the old v2 endpoint.

@jmobrien
Collaborator

Thanks. I feel the same way, in that it's something I'm reluctant to touch given its history. I also personally don't care for the feature and always use it turned off, but for that reason I may also lack the perspective of those who do use it.

As noted elsewhere, there's this new includeLabelColumns param that offers something similar:

[screenshot: Qualtrics API reference showing the includeLabelColumns parameter]

So, we could theoretically do something that would link up recodes and labels, without calls to any external endpoints.
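As a rough sketch of what "linking up recodes and labels" could look like (all column names and values below are invented for illustration, not anything fetch_survey() returns today): pair a recode column with its label counterpart, derive the mapping, and order the factor levels by recode value.

```r
# Toy data standing in for one response column plus its paired label
# column of the kind includeLabelColumns would return (values invented):
recodes <- c(2, 1, 3, 1)
labels  <- c("Disagree", "Agree", "Neutral", "Agree")

# Derive the recode -> label mapping from the paired columns, then
# build a factor whose level order follows the recode values:
lookup <- unique(data.frame(recode = recodes, label = labels))
lookup <- lookup[order(lookup$recode), ]
f <- factor(labels, levels = lookup$label)
levels(f)  # "Agree" "Disagree" "Neutral"
```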

That said, the more I think about it, it's not quite the same, and I think there are some important considerations for how to get this right:

  1. "recode" ≠ "choice" - they're not necessarily the same (though they often are). So factoring with those would work differently from the existing convert approach. OTOH, recodes that differ from choices might actually be closer to what users usually want w/r/t the ordinal relationships among responses.
  2. "choice" ≠ "order" - it's actually possible to have, say, "choices" 1-5 that consistently display as, say, {3, 4, 1, 2, 5} on respondents' screens. Also, a survey designer can create this by accident from the web interface, and it can be really hard to even know it happened (I encountered this issue myself). I think this concern applies to the current convert as well, and if so is something we should make a point of addressing.
  3. "choice" ≠ "answer" - in, e.g., matrix questions, the rows (different subquestions) are "choices", while the responses are "answers". Answers also have the same ID/order separation. Not sure where this stands re:current convert.
  4. unendorsed responses - the current approach at least has the benefit of trying to get all response options as levels, even those that might have had no respondent select them. includeLabelColumns probably would lose that.
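To illustrate point 4 with a base-R sketch (the choice set here is invented): supplying the full choice list from the survey definition as the factor levels preserves options nobody selected, whereas factoring only on observed label values would drop them.

```r
# Full choice set as it would come from the survey definition; note
# "Unsure" is never endorsed in the observed responses:
all_choices <- c("Yes", "No", "Unsure")
responses   <- c("Yes", "Yes", "No")

# Using the full choice set keeps "Unsure" as a level with count 0:
f <- factor(responses, levels = all_choices)
table(f)  # Yes 2, No 1, Unsure 0
```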

@jmobrien
Collaborator

@juliasilge re: your wise point in #278 about not annoying users with warnings, one thing I'm curious about. In the tidyverse packages I've seen a "warn on first instance in a session" behavior for certain actions, e.g.:

require(tidyverse)

vbls <- 
  c("mpg", "drat")

# Passing an external vector straight to select() triggers tidyselect's
# deprecation note, which is displayed only on first use, not every call:
mtcars |> 
  select(vbls)

If we did end up changing our minds and decided to change the default, how hard would it be to do something like that?

@juliasilge
Collaborator Author

@jmobrien It's not hard; it's controlled by arguments to rlang::inform().
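A minimal sketch of that rlang mechanism: `.frequency = "once"` plus a `.frequency_id` makes the message fire only the first time per session (the function name and id below are hypothetical, not existing qualtRics code).

```r
library(rlang)

# Hypothetical helper: announce a changed default once per session.
notify_default_change <- function() {
  inform(
    "fetch_survey() now defaults to convert = FALSE.",
    .frequency = "once",
    .frequency_id = "fetch_survey_convert_default"  # id is invented
  )
}

notify_default_change()  # message shown
notify_default_change()  # silent for the rest of the session
```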

@jmobrien
Collaborator

@juliasilge thanks, that's useful to have in the toolbox.

FYI, all the above is essentially why the PR for fetch_description() happened, and why I also wrote a separate R package to process what came out of it. So, I've already "done" all the above, sort of--it's just that it was really "bespoke" (hacky) and not something I want to port over here.

It was hacky in part because the response schema weren't yet published at the time. Now they're out so they could be a great resource, but I'm still a little unclear where/how to get the specific endpoint response schema that describes everything that might come back from fetch_description(). Do you (or anyone) know where that might be?

@juliasilge
Collaborator Author

I think the documentation here is pretty good for that endpoint. Does it have what you were thinking of?

@jmobrien
Collaborator

jmobrien commented Sep 7, 2022

Sort of--yes, the schema they're now publishing here are what I'm interested in. But using the web display for that isn't great.

The main issue is similar to the one I had before when mining the survey description downloads themselves--the trees describing the survey/schema are of arbitrary depth, so it's not easy to map out where everything important is/would be.

A smaller challenge here is that the schema references other sub-schema, e.g. the Questions object that embeds what comes from here (which is something we'd need if we want to properly update survey_questions()).

Here's what I know so far. The schema for each endpoint's docs page isn't part of the webpage source; it's imported via JS-based calls to more programmatic API reference files served from the company Stoplight, e.g. these things I dug out:

(These are JSON; Firefox has a nice built-in viewer for formatting them, but apologies if it's a mess on your screen.)
https://stoplight.io/api/v1/projects/cHJqOjk3NDQ/nodes/9d0928392673d-get-survey
https://stoplight.io/api/v1/projects/cHJqOjk3NDQ/nodes/YXBpOjYwOTM2-qualtrics-survey-api
https://stoplight.io/api/v1/projects/qualtricsv2/publicapidocs/nodes/reference/surveyDefinitions.json

That first link looks like the raw data we would want, just without the organizing/labeling the web page offers. I (or someone) could put the time in to manually build that framework from that raw data, sure--but I'm wondering if someone who knew things like web scraping and/or conventions of API publishing would see how this could be done faster + in a more maintainable way. Both are way outside my expertise, though.
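One low-dependency angle on the arbitrary-depth problem, just as a sketch: recursively walk the parsed JSON (in real use, from something like jsonlite::fromJSON(..., simplifyVector = FALSE) on those Stoplight files; here a toy nested list stands in) and record the path to every terminal field. That at least yields a flat map of where things live regardless of nesting depth.

```r
# Collect "paths" to every leaf of a nested list, the shape parsed JSON
# takes when simplification is turned off:
schema_paths <- function(x, prefix = character()) {
  if (!is.list(x) || length(x) == 0) {
    return(paste(prefix, collapse = "/"))
  }
  nms <- names(x)
  if (is.null(nms)) nms <- as.character(seq_along(x))
  unlist(lapply(seq_along(x), function(i) {
    schema_paths(x[[i]], c(prefix, nms[[i]]))
  }))
}

# Toy stand-in for a fragment of a survey-definition response:
toy <- list(
  result = list(
    Questions = list(
      QID1 = list(
        QuestionType = "MC",
        Choices = list(`1` = list(Display = "Yes"),
                       `2` = list(Display = "No"))
      )
    )
  )
)
schema_paths(toy)
# "result/Questions/QID1/QuestionType"
# "result/Questions/QID1/Choices/1/Display"
# "result/Questions/QID1/Choices/2/Display"
```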

@jmobrien
Collaborator

jmobrien commented Sep 7, 2022

Side note--these same Stoplight files are probably where we'd get any standardized list of metadata vars such as @chrisumphlett suggested in #272.
