Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import nexus file as ObservationMatrix #3929

Merged
merged 99 commits into from
May 24, 2024

Conversation

kleintom
Copy link
Contributor

Some questions and comments

  • Do we want to make ObservationMatrix documentable and document it with the nexus file, or should that be an option?
  • I added the beginnings of Document filter queries so that I could match on nexus file extensions when selecting a document. If that's not something you want to be able to filter on, no problem just let me know and I can revert it. If it is:
    • Right now I'm passing around a text string '.nex, .nxs' indicating nexus extensions, repeating that string in javascript, ruby, and specs - what's the right way to do that?
    • Are there other nexus extensions we should recognize here? .nexus (.tre?)
  • Do we want an option to generate descriptor short names like mx has?
  • I assume that ExceptionNotifier in production sends an email to somebody when a background job fails?
  • Currently ExceptionNotifier is configured by default to ignore the following errors: ["ActiveRecord::RecordNotFound", "Mongoid::Errors::DocumentNotFound", "AbstractController::ActionNotFound", "ActionController::RoutingError", "ActionController::UnknownFormat", "ActionController::UrlGenerationError"] - ActiveRecord::RecordNotFound is one I can theoretically hit in the background if I pass a bad document id (which I don't), so currently I'm converting that error to TaxonWorks::Error. I just wanted to check that it's intentional that we ignore RecordNotFound in background jobs (even if it's not intentional it may not be worth changing just for this).

…m a file

This commits the result of
`bin/rails generate job import_nexus --queue import_nexus`

Testing that import_nexus jobs can be created and run:
* From `bin/rails c`, do `::ImportNexusJob.perform_later()`
* From an app shell, do `QUEUE=import_nexus rake jobs:workoff` - it should report a nexus_import job was run (if you `puts` some text from your job you should see the text in your console)
… Import Nexus task

I tried as much as I could to write things so they could be reused in a Filter Documents task (in case that ever comes to be and if the facet I've written is desired there). In that event maybe we'll want to filter by document type as well, and maybe the user should be able to enter an extension and/or type.
… radio list

So you can select an extension and then go back to no extensions selected (which returns the most recent documents, which may or may not be available from other tabs depending on recent documentation activity).
… import

For now the import code is in a function called `runit` on the ObservationMatrix controller, for ease of testing.

Still many TODOs and decisions about how things should work, but this version will import a basic nexus matrix into TW.
As I noted at the defineModel() for the options ref, and as the vue docs point out: if you define your model ref default in defineModel, the default is not passed to the parent - so if your options, like matrix name, are never touched, then you wind up passing an empty options hash to the controller and it raises on an empty required param object.

The solution is simply to assign the default outside of defineModel().
…tu taxon matching

This is what was initially intended of course.
…names

That's what we were already doing, this just makes it optional.
…ng a new one

This is required for Qualitative Observations since they have a unique validation.
Note that the particular names assigned to gap states has implications for which existing character states will match the ones being imported (if the user chooses the option to match imported characters to existing ones), in ways users might not be able to guess unless they know the algorithm used to generate gap names.
…not just the first

There can be multiple descriptors with the same name
…e, use it for preview

That's what we were already doing, now there's an option to turn it off
…requests

On GET, as opposed to POST, booleans are passed as strings, not bools
…or preview as import

This supports repeated taxa, though the ui doesn't currently differentiate repeats.
…th preview and import

This supports duplicate nexus descriptors, though the matrix ui won't differentiate duplicates.
… a little

With the file read error combined with the parse error, I think the message
'Nexus parse error: Couldn't find Document with 'id'=0' works fine.
I never felt comfortable putting those in the model, but I'm still kinda fuzzy on why it's okay to have them in the controller just because they rely on current_session_id (does that get passed into a method in the model sometimes or should code that relies on session_current_project_id really not be in the model?).
They'll get discarded if they ever make it back to nexus_parser again.
@mjy
Copy link
Member

mjy commented May 16, 2024

Ignoring whitespace on those attributes is not good. We'll have to find another route. Can we pre-process the nexus names with the same string cleaning method before match? I'm actually working on some related improvements, I think it should be possible.

@mjy
Copy link
Member

mjy commented May 16, 2024

@kleintom I haven't looked at the UI here at all yet, but we can plan for a 2 stage release, first is just match as best as possible, second (post release) is a UI that lets users manually choose target when no match, or over-ride choice that was made as well. I think a "create all as new" is going to be useful as well. Again, I haven't looked at UI, so apologies if this is variously addressed.

@mjy
Copy link
Member

mjy commented May 16, 2024

Maybe another issue: on long descriptor names the ui gets pretty jumbled, th

Yes, definitely, a matrix renderer is another issue- that reminds me of another OS project I was involved in a long time ago.. for another story/issue.

@jlpereira
Copy link
Member

Maybe another issue: on long descriptor names the ui gets pretty jumbled, though you can more or less read the names when they change to black on hover: Screenshot 2024-05-15 at 18-13-44 TaxonWorks - _tasks_observation matrices_view

I pushed a fix for this, now long descriptors should be cut and visible on hover:

image

…ame as for Convert

A little awkward that those options are only listed for Convert, below the Preview button.
@kleintom
Copy link
Contributor Author

Can we pre-process the nexus names with the same string cleaning method before match?

Yes that worked out fine (I think). Importing those 15 morphobank nexus files and then importing them again with otus and descriptors matched results in no new otus, descriptors, character states, or observations.

I think a "create all as new" is going to be useful as well. Again, I haven't looked at UI, so apologies if this is variously addressed.

No problem, that's helpful. We do already support 'create all as new' if you just don't select any of the matching options, but I can see where being able to adjust the matching that does occur could be very helpful in the future.

One other related comment: currently you can select 'match otu by taxon' and 'match otu by name' at the same time: it matches by taxon first, then by name. That's what made sense to me, but I'm not actually sure matching both at the same time is useful, and if so if we should give an option in which order to match. Feedback can wait though.

I pushed a fix for this, now long descriptors should be cut and visible on hover:

Thanks, looks great to me!

@mjy
Copy link
Member

mjy commented May 17, 2024

@kleintom Thanks for the matching, sounds good. In your mind are the next steps to have us do some testing and get you feedback?

@kleintom
Copy link
Contributor Author

In your mind are the next steps to have us do some testing and get you feedback?

Yes, I think things have pretty much settled on my end, testing and feedback would be great, thanks!

@mjy
Copy link
Member

mjy commented May 22, 2024

@kleintom two requests upon first pass (looking great):

  • Development uses a new namespace for the nilify method, you'll need to change Utilities::Strings to Utilities::Rails::String. I.e. go ahead and pre-merge development into the PR.
  • Capitalize first word in option labels throughout

@mjy
Copy link
Member

mjy commented May 23, 2024

@jlpereira Can you take a very quick look at the vue here?

@kleintom please see the comment on the capitalized symbols, other than that things look ready for merge from the back-end side.

A non-blocking minor comment would be that we're going to ultimately try and move away from controller specs, and towards feature if we want to test functionality. I don't think it's worth blocking this merge with a port of the controller to feature here (they are significantly trickier to implement sometimes), but a note for future reference.

The one catch here is that if you send an empty options hash param to the controller, rails doesn't include a parameter for it, and then your params.require(:options) fails (it would also fail if options *was* sent as {}...).

My old workaround was to set the default values for options in vue (where really to solve the empty options hash issue I would have only needed to set one of them) - which I've removed here in favor of (re)creating an empty hash in the controller and then permitting it as the options hash. I like that better - I no longer need to list out all of my options in vue so that I can assign them defaults (the matching and the citation params are now encapsulated in their own components), and the new rails workaround just acknowledges that rails discards #empty? params.
@kleintom
Copy link
Contributor Author

Feeling good from my end as long as you're happy with the new fixes.

@mjy mjy merged commit 9ec47ba into SpeciesFileGroup:development May 24, 2024
5 checks passed
@mjy
Copy link
Member

mjy commented May 24, 2024

@kleintom Thanks, this is a fantastic contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants