Import_Op.apply() is really slow... #347

Open
benweinberg89 opened this issue Jan 17, 2023 · 1 comment


Thank you for your bug report!

Please fill out the following template.

PLATFORM (Mac, PC, Linux, other):

Mac, M1 Pro

OPERATING SYSTEM (eg OSX 10.7, Windows 8.1):

OS 10.12

SEVERITY (Critical? Major? Minor? Enhancement?):

Enhancement

DESCRIPTION:

  • What were you trying to do?
    Importing many flow tube objects into a single Experiment object is very slow. My guess is that Experiment.add_events() slows down as the set of events grows.
  • What happened?
    It starts fast, then slows down as it iterates through more tubes. It took about 1 hour to import 450 samples into a single Experiment object.
  • What did you expect to happen?
    I would like it to run much faster.

Don't forget to attach the log file to this bug report!

If you are having trouble with a particular FCS file, please attach that file too.


benweinberg89 commented Mar 5, 2023

Alright, I started digging into this, and the slow steps seem to be in Experiment.add_events().
In particular, the check below gets slower as the Experiment object grows:

# check that the conditions for this tube exist in the experiment
# already
if( any(True for k in conditions if k not in self.conditions) or \
    any(True for k in self.conditions if k not in conditions) ):
    raise util.CytoflowError("Metadata for this tube should be {}"
                             .format(list(self.conditions.keys())))

This line is also slow:
meta_type = self.conditions[meta_name].dtype
Commenting out the first code block (and making sure I always set all my conditions correctly) and hard-coding meta_type to "category", which I use for all my conditions anyway, dramatically speeds up the import: something that took an hour now takes just over 5 minutes. There is still some inefficiency, which I may dig into further.
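
For reference, here is a minimal sketch of one way the check could be kept without paying for it repeatedly. It assumes (I have not confirmed this) that self.conditions is a computed property that does real work on every access, so reading it inside the generator expression repeats that work for each condition name. SlowExperiment, check_original and check_cached are hypothetical names made up for this illustration; they are not part of cytoflow.

```python
class SlowExperiment:
    """Hypothetical stand-in for an experiment; not cytoflow's actual class."""

    def __init__(self, condition_names):
        self._condition_names = list(condition_names)
        self.property_reads = 0  # counts how often `conditions` is rebuilt

    @property
    def conditions(self):
        # Assumption: the real property does work that grows with the amount
        # of data already imported. Here we only count the accesses.
        self.property_reads += 1
        return {name: "category" for name in self._condition_names}


def check_original(experiment, conditions):
    """Same shape as the check quoted above: the property is read repeatedly."""
    return (any(True for k in conditions if k not in experiment.conditions)
            or any(True for k in experiment.conditions if k not in conditions))


def check_cached(experiment, conditions):
    """Read the property once, then compare the two sets of condition names."""
    existing = experiment.conditions
    return set(conditions) != set(existing)


if __name__ == "__main__":
    tube = {"Dox": 1.0, "IP": 2.0, "Time": 3.0}

    a = SlowExperiment(tube)
    print(check_original(a, tube), a.property_reads)

    b = SlowExperiment(tube)
    print(check_cached(b, tube), b.property_reads)
```

In this toy version the original form reads the property once per tube condition plus once more, while the cached form reads it once. The same local variable could also serve the meta_type lookup, again assuming the property access itself is the expensive part.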
