Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "_id" field to Browse Form Data view #338

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

m4tx
Copy link

@m4tx m4tx commented Mar 12, 2017

This adds _id field to "Browse Form Data" view's table (/<username>/forms/<form-id>/instance#/<submission-no>)

I'm not sure about whether this patch's quality is appropriate (as the column is essentially hardcoded in the js source). Let me know if there's a nicer place to put that in.

@willywutang
Copy link

I would like to provide more background about why this patch can be very useful. In the initial stages of case management, often a new case will have no pre-existing identifier. Therefore, we must generate or assign an identifier that can be used to track this case. The UUID --- which is already exposed in the data view --- can be used for this purpose, but is far too long and clumsy for many applications. Alternatively, _id can be used as an identifier unique to this form. When quality control staff review data submitted to this form, they can then extract this _id identifier from the data view thanks to this patch.

@Tomatosoup97
Copy link

^ up: @dorey @esmail @jnm

@dorey
Copy link
Contributor

dorey commented Aug 14, 2017

We are not planning to continue using the _id column in future features for a few reasons:

  • It is not unique across servers
  • UUID exists and is unique
  • _id exposes the auto-incrementing database index. (more of an annoyance, really)

I don't think we will want to merge this change into master because UUID would be more future proof.

I'd like to note: in upcoming features (in kobotoolbox/kpi and kobotoolbox/formpack repositories), we have decided to use a new index that says "this is the nth submission for this form" because, as you noted, uuid is quite clumsy. This value was originally going to be gleaned from the index of the database query, but it might also make sense to store this value in a column of its own.

@jeverling
Copy link
Contributor

jeverling commented Aug 14, 2017 via email

@willywutang
Copy link

Thanks for the quick reply @dorey. A few comments:

  • We deal with many use cases where ids must occasionally be transcribed to paper, and consequently UUIDs can never work because of their length. Hence, we took this approach.
  • I'm not clear about why uniqueness across servers is important, as I imagine most users are dealing with one server.
  • AFAIK, exposing _id poses no security risk ... but we agree that seeing large jumps in _id within one form, due to high activity in other forms, can be surprising and annoying to users.

Some questions:

  • Have you had other users raise this issue before, and if so, what workarounds did they use, if any?
  • If you add this new "nth submission" index, do you plan to provide tools to help existing users migrate smoothly? For instance, will existing records have their nth submission index calculated retroactively?

After replying to our questions, we may close this PR.

@dorey
Copy link
Contributor

dorey commented Aug 14, 2017

Thanks @jeverling . I had forgotten that UUID changes on save. I don't think this was by design, so I will bring this up for discussion as something to be fixed in coming months.

@dorey
Copy link
Contributor

dorey commented Aug 14, 2017

@willywutang

  • Uniqueness of IDs across servers and databases is a practice that lends itself towards portable, mergeable datasets.
  • Reliance on auto-incrementing IDs does create security risks in our API. It makes it easy for a script to crawl an API and find data that has been marked as "public", but is not intended to be publicly listed. (ie. the owner only wants people with the link to see the data)
  • When we add an nth-submission index, it will be paired with a management command that allows server administrators to populate that field retroactively, yes.

@Tomatosoup97
Copy link

Reliance on auto-incrementing IDs does create security risks in our API. It makes it easy for a script to crawl an API and find data that has been marked as "public", but is not intended to be publicly listed

If exposing _id is a security risk, doesn't it indicate a security breach in your API, then? _id can be accessed by anyone in the API who have permissions for given submission (by using /api/v1/data/{form_id} endpoint). In this patch we are not really exposing it to the user - the user already had access to this property, we just made it conveniently visible in the data table. Please correct me if I'm wrong.

Not that i'm pushing to merge this change, just discussing the concept.

@dorey
Copy link
Contributor

dorey commented Aug 14, 2017

Yes. form_id as an incrementing integer is also problematic for the same reason. And this is not directly impacted by this PR. However, we do not want to move the application URL's in a direction which will be abandoned in short time. If we add a new URL to access individual submissions we want it to continue to work for the forseeable future.

On the other hand, because of the maintainability issues with the Kobocat repo, all the kobocat URLs might be phased out in the coming months once the features are ported over to KPI. (It will truly be a "legacy" interface that will stop receiving newly deployed projects).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants