Import nexus file as ObservationMatrix #3929

kleintom · 2024-04-29T15:41:43Z

Some questions and comments

Do we want to make ObservationMatrix documentable and document it with the nexus file, or should that be an option?
I added the beginnings of Document filter queries so that I could match on nexus file extensions when selecting a document. If that's not something you want to be able to filter on, no problem, just let me know and I can revert it. If it is:
- Right now I'm passing around a text string '.nex, .nxs' indicating nexus extensions, repeating that string in JavaScript, Ruby, and specs; what's the right way to do that?
- Are there other nexus extensions we should recognize here? .nexus (.tre?)
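One possible answer to the repeated-string question is to keep the list in a single Ruby constant and derive the display string from it (a JavaScript copy could then be served from the back end rather than typed by hand). A minimal sketch; the module name `NexusFileTypes` and its methods are hypothetical, not existing TaxonWorks code, and only the two extensions already in use are listed:

```ruby
# Hypothetical single source of truth for nexus file extensions.
module NexusFileTypes
  # Only the extensions currently recognized; .nexus/.tre are still an open question.
  EXTENSIONS = %w[.nex .nxs].freeze

  # True when the filename ends in one of the known extensions (case-insensitive).
  def self.nexus_filename?(filename)
    EXTENSIONS.include?(File.extname(filename.to_s).downcase)
  end

  # The '.nex, .nxs' string that is currently repeated by hand.
  def self.display_string
    EXTENSIONS.join(', ')
  end
end

NexusFileTypes.nexus_filename?('matrix.NEX') # => true
NexusFileTypes.display_string                # => ".nex, .nxs"
```

Specs could then assert against `NexusFileTypes::EXTENSIONS` instead of a literal string.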
Do we want an option to generate descriptor short names like mx has?
I assume that ExceptionNotifier in production sends an email to somebody when a background job fails?
Currently ExceptionNotifier is configured by default to ignore the following errors: ["ActiveRecord::RecordNotFound", "Mongoid::Errors::DocumentNotFound", "AbstractController::ActionNotFound", "ActionController::RoutingError", "ActionController::UnknownFormat", "ActionController::UrlGenerationError"]. ActiveRecord::RecordNotFound is one I can theoretically hit in the background if I pass a bad document id (which I don't), so currently I'm converting that error to TaxonWorks::Error. I just wanted to check that it's intentional that we ignore RecordNotFound in background jobs (even if it's not intentional, it may not be worth changing just for this).
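A minimal sketch of that conversion, with stand-in classes so it runs without Rails or TaxonWorks; `load_nexus_document!` and the hash standing in for `Document.find` are hypothetical. Raising inside `rescue` keeps the original error available as `cause`, so the RecordNotFound details aren't lost:

```ruby
# Stand-ins so the sketch runs outside Rails/TaxonWorks; in the app both classes already exist.
module ActiveRecord
  class RecordNotFound < StandardError; end
end

module TaxonWorks
  class Error < StandardError; end
end

# Hypothetical loader used by a background job; `documents` stands in for Document.find.
def load_nexus_document!(id, documents)
  documents.fetch(id) do
    raise ActiveRecord::RecordNotFound, "Couldn't find Document with 'id'=#{id}"
  end
rescue ActiveRecord::RecordNotFound => e
  # Re-raise as an error ExceptionNotifier doesn't ignore; Ruby sets e.cause automatically.
  raise TaxonWorks::Error, "Nexus parse error: #{e.message}"
end
```

Assuming TaxonWorks::Error isn't itself on the ignore list, ExceptionNotifier would then report the failure instead of silently dropping it.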

…m a file This commits the result of `bin/rails generate job import_nexus --queue import_nexus`.

Testing that import_nexus jobs can be created and run:
* From `bin/rails c`, do `::ImportNexusJob.perform_later()`
* From an app shell, do `QUEUE=import_nexus rake jobs:workoff`; it should report that a nexus_import job was run (if you `puts` some text from your job, you should see the text in your console)

… Import Nexus task I tried as much as I could to write things so that they could be reused in a Filter Documents task (in case that ever comes to be, and if the facet I've written is desired there). In that event we may want to filter by document type as well, and perhaps the user should be able to enter an extension and/or type.

… radio list So you can select an extension and then go back to no extensions selected (which returns the most recent documents, which may or may not be available from other tabs depending on recent documentation activity).

… import For now the import code is in a function called `runit` on the ObservationMatrix controller, for ease of testing. There are still many TODOs and decisions about how things should work, but this version will import a basic nexus matrix into TW.

As I noted in a comment at the defineModel() call for the options ref, and as the Vue docs point out, if you set your model ref's default in defineModel, that default is not passed to the parent. So if your options (like matrix name) are never touched, you wind up passing an empty options hash to the controller, which then raises on the empty required param object. The solution is simply to assign the default outside of defineModel().

Note that the particular names assigned to gap states have implications for which existing character states will match the ones being imported (if the user chooses the option to match imported characters to existing ones), in ways users might not be able to guess unless they know the algorithm used to generate gap names.

I never felt comfortable putting those in the model, but I'm still kinda fuzzy on why it's okay to have them in the controller just because they rely on current_session_id (does that get passed into a method in the model sometimes, or should code that relies on session_current_project_id really not be in the model?).

@mjy

mjy · 2024-05-16T19:15:50Z

Ignoring whitespace on those attributes is not good. We'll have to find another route. Can we pre-process the nexus names with the same string cleaning method before match? I'm actually working on some related improvements, I think it should be possible.

mjy · 2024-05-16T19:17:52Z

@kleintom I haven't looked at the UI here at all yet, but we can plan for a 2-stage release: first, just match as well as possible; second (post-release), a UI that lets users manually choose a target when there's no match, or override the choice that was made. I think a "create all as new" is going to be useful as well. Again, I haven't looked at UI, so apologies if this is variously addressed.

mjy · 2024-05-16T19:48:55Z

Maybe another issue: on long descriptor names the ui gets pretty jumbled, th

Yes, definitely, a matrix renderer is another issue; that reminds me of another OS project I was involved in a long time ago... for another story/issue.

jlpereira · 2024-05-16T20:50:06Z

Maybe another issue: on long descriptor names the ui gets pretty jumbled, though you can more or less read the names when they change to black on hover:

I pushed a fix for this, now long descriptors should be cut and visible on hover:

kleintom · 2024-05-17T14:17:29Z

Can we pre-process the nexus names with the same string cleaning method before match?

Yes, that worked out fine (I think). Importing those 15 MorphoBank nexus files and then importing them again with otus and descriptors matched results in no new otus, descriptors, character states, or observations.
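The approach that replaced "ignore whitespace" is to run incoming nexus names through the same cleaner used on stored names, then compare exactly. A minimal sketch; `clean_name` is a hypothetical stand-in that assumes the project's nil_squish_strip behaves like squish plus nil-for-blank (the real Utilities::Rails::String method may differ), and `match_names` is illustrative only:

```ruby
# Hypothetical stand-in for the project's string cleaner: collapse runs of
# whitespace to one space, strip the ends, and return nil for blank input.
def clean_name(str)
  cleaned = str.to_s.gsub(/[[:space:]]+/, ' ').strip
  cleaned.empty? ? nil : cleaned
end

# Index existing names by their cleaned form (a no-op if they were cleaned on save),
# then look up each nexus name by its cleaned form. Unmatched names map to nil.
def match_names(nexus_names, existing_names)
  index = existing_names.each_with_object({}) do |name, h|
    key = clean_name(name)
    h[key] ||= name if key
  end
  nexus_names.to_h { |n| [n, index[clean_name(n)]] }
end
```

Because both sides go through one function, the stored attributes keep their exact whitespace rules and only the incoming names are normalized.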

I think a "create all as new" is going to be useful as well. Again, I haven't looked at UI, so apologies if this is variously addressed.

No problem, that's helpful. We do already support 'create all as new' if you just don't select any of the matching options, but I can see how being able to adjust the matching that does occur could be very helpful in the future.

One other related comment: currently you can select 'match otu by taxon' and 'match otu by name' at the same time, in which case it matches by taxon first, then by name. That's what made sense to me, but I'm not actually sure matching both at the same time is useful, and if it is, whether we should give an option for the order in which to match. Feedback can wait though.
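If an order option is ever wanted, one cheap way to support it is to treat each selected matching option as a lookup and try them in a configurable order; with nothing selected, every row falls through to "create new". A sketch with hashes standing in for the taxon (cached name) and OTU-name lookups; `match_otu` and the sample data are hypothetical:

```ruby
# Each strategy maps a nexus row name to an existing OTU (or nil);
# the order of the array is the match priority.
def match_otu(nexus_name, strategies)
  strategies.each do |lookup|
    otu = lookup[nexus_name]
    return otu if otu
  end
  nil # no match: a new OTU would be created
end

by_taxon = { 'Aus bus' => :otu_from_taxon }
by_name  = { 'Aus bus' => :otu_from_name, 'Cus dus' => :otu_cus }

match_otu('Aus bus', [by_taxon, by_name]) # => :otu_from_taxon
match_otu('Aus bus', [by_name, by_taxon]) # => :otu_from_name
match_otu('Cus dus', [])                  # => nil (nothing selected: create as new)
```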

I pushed a fix for this, now long descriptors should be cut and visible on hover:

Thanks, looks great to me!

mjy · 2024-05-17T14:22:30Z

@kleintom Thanks for the matching, sounds good. In your mind are the next steps to have us do some testing and get you feedback?

kleintom · 2024-05-17T14:41:36Z

In your mind are the next steps to have us do some testing and get you feedback?

Yes, I think things have pretty much settled on my end, testing and feedback would be great, thanks!

mjy · 2024-05-22T13:07:42Z

@kleintom two requests upon first pass (looking great):

- Development uses a new namespace for the nilify method; you'll need to change Utilities::Strings to Utilities::Rails::String, i.e. go ahead and pre-merge development into the PR.
- Capitalize the first word in option labels throughout.

app/controllers/observation_matrices_controller.rb

mjy · 2024-05-23T15:34:57Z

@jlpereira Can you take a very quick look at the vue here?

@kleintom please see the comment on the capitalized symbols; other than that, things look ready for merge from the back-end side.

A non-blocking minor comment: we're ultimately going to try to move away from controller specs and towards feature specs if we want to test functionality. I don't think it's worth blocking this merge with a port of the controller specs to feature specs here (they are sometimes significantly trickier to implement), but it's a note for future reference.

app/javascript/vue/tasks/observation_matrices/import_nexus/components/ImportOptions.vue

app/javascript/vue/tasks/observation_matrices/import_nexus/components/CitationOptions.vue

app/javascript/vue/tasks/observation_matrices/import_nexus/components/ImportPreview.vue

app/javascript/vue/tasks/observation_matrices/import_nexus/components/ImportList.vue

My misunderstanding. This reverts commit e8bb42e.

Thanks for the suggestion José!

The one catch here is that if you send an empty options hash param to the controller, Rails doesn't include a parameter for it, and then your params.require(:options) fails (it would also fail if options *was* sent as {}...). My old workaround was to set default values for the options in Vue (although to solve the empty options hash issue I really would only have needed to set one of them); I've removed that here in favor of (re)creating an empty hash in the controller and then permitting it as the options hash. I like that better: I no longer need to list out all of my options in Vue just to assign them defaults (the matching and citation params are now encapsulated in their own components), and the new Rails workaround simply acknowledges that Rails discards #empty? params.
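The same workaround can be expressed as "fetch with an empty-hash default, then whitelist". A plain-Ruby analogue using a Hash so it runs standalone; the option keys and `import_options` are hypothetical. In a controller, the corresponding ActionController::Parameters call would be along the lines of `params.fetch(:options, {}).permit(...)`:

```ruby
# Hypothetical whitelist of import option keys.
PERMITTED_OPTION_KEYS = %w[matrix_name match_otus_by_taxon match_otus_by_name].freeze

# Rails drops a param whose value is an empty hash, so a missing 'options' key
# is read as {} instead of raising the way params.require(:options) would.
def import_options(params)
  params.fetch('options', {}).slice(*PERMITTED_OPTION_KEYS)
end

import_options({})                                              # => {}
import_options({ 'options' => { 'matrix_name' => 'M', 'x' => 1 } }) # => {"matrix_name"=>"M"}
```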

…ing saved

kleintom · 2024-05-24T15:40:57Z

Feeling good from my end as long as you're happy with the new fixes.

mjy · 2024-05-24T18:18:41Z

@kleintom Thanks, this is a fantastic contribution.

kleintom added 30 commits April 23, 2024 17:21

SpeciesFileGroup#2029 Add nexus_parser gem and Vendor code

a03100b

SpeciesFileGroup#2029 Create a new task to import a nexus file into a…

9f2635f

… matrix

Support selection_options on Document for SmartSelector

1aef7c0

SpeciesFileGroup#2029 Use a SmartSelector to select a document to import

f55760d

SpeciesFileGroup#2029 Support New option in nexus file smart selector

c7d28ff

Support filtering by extension type on Documents

c413afb

SpeciesFileGroup#2029 Move document selection to its own component

fb1969d

Update model description comments in Document.rb

e5bef9c

document filter fixup

332e2b6

SpeciesFileGroup#2029 Fetch and display preview OTUs from nexus file …

f508326

…import

SpeciesFileGroup#2029 Display preview of descriptors from nexus file

f7a9afe

SpeciesFileGroup#2029 Match otus against cached instead of name for o…

47dcf39

…tu taxon matching This is what was initially intended of course.

SpeciesFileGroup#2029 Make an option to match otus to taxon (cached) …

dd5f035

…names That's what we were already doing, this just makes it optional.

SpeciesFileGroup#2029 Add option to match otus to the db by otu name

3eff618

SpeciesFileGroup#2029 Check for an existing Observation before creati…

316f2c3

…ng a new one This is required for Qualitative Observations since they have a unique validation.

SpeciesFileGroup#2029 Check all Descriptors for a name/states match, …

651d8ad

…not just the first There can be multiple descriptors with the same name
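Checking every same-named descriptor, rather than stopping at the first, could look like the sketch below. The `Descriptor` struct and the set-style comparison of state names (order ignored) are assumptions for illustration, not the PR's code:

```ruby
# Hypothetical minimal descriptor: a name plus the names of its character states.
Descriptor = Struct.new(:name, :state_names)

# Consider every descriptor with the nexus name, not just the first, and return
# the first whose state names match as a set (state order assumed irrelevant).
def match_descriptor(nexus_name, nexus_state_names, descriptors)
  wanted = nexus_state_names.sort
  descriptors
    .select { |d| d.name == nexus_name }
    .find { |d| d.state_names.sort == wanted }
end
```

If state names are compared this way, any generated gap-state names take part in the comparison too, which is why the choice of gap names affects which existing descriptors match.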

SpeciesFileGroup#2029 Refactor loading a nexus file from a document_i…

df49782

…d somewhat

SpeciesFileGroup#2029 Add option to match descriptors to db using nam…

06d760c

…e, use it for preview That's what we were already doing, now there's an option to turn it off

SpeciesFileGroup#2029 Fix option params handling of booleans for GET …

ceb3980

…requests On GET, as opposed to POST, booleans are passed as strings, not bools
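A minimal sketch of turning string booleans from a GET query back into real booleans before they're used. `cast_boolean`, `cast_option_booleans`, and the option keys are hypothetical; in Rails, `ActiveModel::Type::Boolean.new.cast` does something similar with a broader list of false values:

```ruby
# Strings treated as false when an option arrives as a query-string value.
FALSE_STRINGS = %w[false 0 f off].freeze

# Real booleans pass through; blank becomes nil; anything else is true unless listed above.
def cast_boolean(value)
  return value if value == true || value == false
  return nil if value.nil? || value.to_s.strip.empty?
  !FALSE_STRINGS.include?(value.to_s.strip.downcase)
end

# Cast only the boolean option keys; leave others, like matrix_name, alone.
def cast_option_booleans(options, boolean_keys)
  options.to_h { |k, v| [k, boolean_keys.include?(k) ? cast_boolean(v) : v] }
end
```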

SpeciesFileGroup#2029 Find matched otus all at once, doing the same f…

ee5487f

…or preview as import This supports repeated taxa, though the ui doesn't currently differentiate repeats.
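Finding matches "all at once" while still supporting repeated taxa can be sketched as one batched lookup over the distinct names, then a per-row mapping so each repeat gets its own result. Here `lookup` is a hypothetical stand-in for a single query that returns a hash keyed by name:

```ruby
# Resolve every nexus row with one batched lookup, then map the results back
# row by row so repeated names each get their match.
def match_rows(row_names, lookup)
  found = lookup.call(row_names.uniq) # one call for all distinct names
  row_names.map { |name| found[name] }
end
```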

SpeciesFileGroup#2029 Factor out descriptor matching and use it in bo…

810259a

…th preview and import This supports duplicate nexus descriptors, though the matrix ui won't differentiate duplicates.

SpeciesFileGroup#2029 Tidy up error catching on nexus file load/parse…

281969d

… a little With the file read error combined with the parse error, I think the message 'Nexus parse error: Couldn't find Document with 'id'=0' works fine.

SpeciesFileGroup#2029 Use 'gap' instead of 'tw_gap' for un-named nexu…

d071bd3

…s gap states Per @mjy (thanks!).

SpeciesFileGroup#2029 Don't use [] brackets in generated names

b2d935a

They'll get discarded if they ever make it back to nexus_parser again.

kleintom added 3 commits May 16, 2024 20:44

Merge branch 'development' into 2029_import_nexus_file

19c7b43

SpeciesFileGroup#2029 cleanup

4b05d51

SpeciesFileGroup#2029 Clarify that the Preview matching options are s…

0ac0c1d

…ame as for Convert A little awkward that those options are only listed for Convert, below the Preview button.

kleintom added 3 commits May 22, 2024 22:51

Merge branch 'development' into 2029_import_nexus_file

97f48d4

SpeciesFileGroup#2029 Update nil_squish_strip namespace

fae2e71

SpeciesFileGroup#2029 Capitalize first word of all options

e8bb42e

mjy reviewed May 23, 2024

app/controllers/observation_matrices_controller.rb (outdated)