cannot allocate memory #29

Open
dabitz opened this issue Jan 25, 2024 · 13 comments

Comments

@dabitz

dabitz commented Jan 25, 2024

Hi,

Thanks a lot for the very nice tool!

I am trying to phase the subgenomes from this hexaploid haplotype-phased genome (9Gb), but somehow I always get stuck with the error message cannot allocate memory, despite changing the memory option several times... Any help with that is appreciated.

Cheers
André
...
24-01-25 07:23:35 [INFO] Loading kmer matrix from jellyfish
24-01-25 07:23:35 [INFO] Start Pool with 40 process(es)
24-01-25 07:23:57 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_53.fasta_15.fa
24-01-25 07:28:54 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_60.fasta_15.fa
24-01-25 07:29:21 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_5.fasta_15.fa
24-01-25 07:30:13 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_57.fasta_15.fa
24-01-25 07:30:47 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_61.fasta_15.fa
24-01-25 07:30:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_54.fasta_15.fa
24-01-25 07:31:00 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_22.fasta_15.fa
24-01-25 07:31:36 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_50.fasta_15.fa
24-01-25 07:31:46 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_52.fasta_15.fa
24-01-25 07:32:25 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_48.fasta_15.fa
24-01-25 07:32:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_42.fasta_15.fa
24-01-25 07:32:38 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_47.fasta_15.fa
24-01-25 07:32:44 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_55.fasta_15.fa
24-01-25 07:32:49 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_4.fasta_15.fa
24-01-25 07:33:38 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_35.fasta_15.fa
24-01-25 07:33:47 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_40.fasta_15.fa
24-01-25 07:33:53 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_25.fasta_15.fa
24-01-25 07:34:02 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_27.fasta_15.fa
24-01-25 07:34:12 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_38.fasta_15.fa
24-01-25 07:34:22 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_37.fasta_15.fa
24-01-25 07:35:11 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_41.fasta_15.fa
24-01-25 07:35:17 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_26.fasta_15.fa
24-01-25 07:35:28 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_33.fasta_15.fa
24-01-25 07:35:40 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_65.fasta_15.fa
24-01-25 07:35:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_28.fasta_15.fa
24-01-25 07:36:01 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_7.fasta_15.fa
24-01-25 07:36:12 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_17.fasta_15.fa
24-01-25 07:36:21 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_36.fasta_15.fa
24-01-25 07:36:32 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_30.fasta_15.fa
24-01-25 07:36:44 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_14.fasta_15.fa
24-01-25 07:37:41 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_18.fasta_15.fa
24-01-25 07:37:57 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_63.fasta_15.fa
24-01-25 07:38:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_1.fasta_15.fa
24-01-25 07:38:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_16.fasta_15.fa
24-01-25 07:38:27 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_31.fasta_15.fa
24-01-25 07:38:36 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_12.fasta_15.fa
24-01-25 07:38:49 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_11.fasta_15.fa
24-01-25 07:39:01 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_62.fasta_15.fa
24-01-25 07:39:07 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_23.fasta_15.fa
24-01-25 07:39:18 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_64.fasta_15.fa
24-01-25 07:39:23 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_66.fasta_15.fa
24-01-25 07:39:37 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_39.fasta_15.fa
24-01-25 07:39:55 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_15.fasta_15.fa
24-01-25 07:40:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_3.fasta_15.fa
24-01-25 07:40:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_21.fasta_15.fa
24-01-25 07:40:29 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_24.fasta_15.fa
24-01-25 07:41:21 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_29.fasta_15.fa
24-01-25 07:41:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_34.fasta_15.fa
24-01-25 07:41:40 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_32.fasta_15.fa
24-01-25 07:41:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_56.fasta_15.fa
24-01-25 07:42:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_8.fasta_15.fa
24-01-25 07:42:20 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_9.fasta_15.fa
24-01-25 07:42:32 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_10.fasta_15.fa
24-01-25 07:42:43 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_13.fasta_15.fa
24-01-25 07:42:55 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_2.fasta_15.fa
24-01-25 07:43:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_19.fasta_15.fa
24-01-25 07:43:20 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_20.fasta_15.fa
24-01-25 07:43:30 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_45.fasta_15.fa
24-01-25 07:43:38 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_46.fasta_15.fa
24-01-25 07:43:44 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_6.fasta_15.fa
24-01-25 07:43:51 [INFO] 62557073 kmers in total
24-01-25 07:43:51 [INFO] Filtering differential kmers
Traceback (most recent call last):
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/bin/subphaser", line 33, in <module>
sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 797, in main
pipeline.run()
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 422, in run
d_mat = dumps.filter(d_mat, lengths, self.sgs, outfig=histfig, #d_targets=d_targets,
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Jellyfish.py", line 487, in filter
for kmer, freqs, tot_freq in pool_func(_filter_kmer, args, self.ncpu,
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/RunCmdsMP.py", line 336, in pool_func
pool = multiprocessing.Pool(processors)
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
self._repopulate_pool()
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/popen_fork.py", line 70, in _launch
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

@zhangrengang
Owner

How much is the RAM of your computer?

@dabitz
Author

dabitz commented Jan 25, 2024

That's the thing. I am running from a cluster with 500G RAM and 64 threads

@zhangrengang
Owner

How about the peak memory? A large genome certainly requires a lot of memory, but I can run the wheat genome (14Gb, 140M kmers, 21 chromosomes) with 1 TB of RAM.
If it actually exceeds the 500G of RAM, you may try increasing -lower_count to reduce the number of kmers, or reducing the number of chromosomes in the config file. If necessary, you may also try decreasing the chunksize in /netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py from:

        # matrix
        logger.info('Loading kmer matrix from jellyfish')   # multiprocessing by kmer
        chunksize = None if self.pool_method == 'map' else 20000

to

        # matrix
        logger.info('Loading kmer matrix from jellyfish')   # multiprocessing by kmer
        chunksize = None if self.pool_method == 'map' else 200

By the way, if your hexaploid is an autohexaploid, there is no reason to spend time trying SubPhaser.
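
For readers unfamiliar with the chunksize setting mentioned above, here is a minimal sketch of how `chunksize` changes batching in Python's `multiprocessing.Pool.imap`. It is not SubPhaser's `pool_func`; the worker `kmer_total`, the helper `n_tasks`, and the toy rows are hypothetical. Smaller batches mean less data queued per task, at the price of more inter-process messages. Whether that resolves a failure at pool start-up is not established in this thread.

```python
import math
import multiprocessing


def kmer_total(row):
    """Hypothetical worker: return a k-mer and the sum of its per-chromosome counts."""
    kmer, counts = row
    return kmer, sum(counts)


def n_tasks(n_items, chunksize):
    """Number of batches the pool dispatches for n_items at a given chunksize."""
    return math.ceil(n_items / chunksize)


def run(rows, processes=2, chunksize=200):
    # imap hands items to workers in batches of `chunksize`; results keep input order.
    with multiprocessing.Pool(processes) as pool:
        return list(pool.imap(kmer_total, rows, chunksize=chunksize))


if __name__ == "__main__":
    total_kmers = 62557073  # k-mer count reported in the log above
    for size in (20000, 200):
        print(size, "->", n_tasks(total_kmers, size), "batches")
    # Toy input only: 1000 fake k-mers with three counts each.
    toy_rows = [("ACGTACGTACGTACG", [i, i + 1, i + 2]) for i in range(1000)]
    print(run(toy_rows, chunksize=200)[:2])
```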

@dabitz
Author

dabitz commented Jan 25, 2024

Thanks! I will try on our HPC cluster with 1TB or adjust the parameters as you suggested. How long should the whole run take?
I am not sure it is an autopolyploid; there is some evidence that it is a hybrid between an autotetraploid and a diploid.

@zhangrengang
Owner

In general, 1-2 days are needed for a large genome.

@dabitz
Author

dabitz commented Jan 26, 2024

Somehow this is strange... running on our HPC node with 1TB, the job exits with:
Resource usage summary:

CPU time   :   2301.09 sec.
Max Memory :     69212 MB
Max Swap   :    754180 MB

Max Processes  :        44
Max Threads    :        48

@zhangrengang
Owner

Are you using SLURM, which limits memory according to the number of processes?

@dabitz
Author

dabitz commented Jan 26, 2024

We use LSF, but I set the memory limit to 980G and the job still exits. It seems that the memory limit was not even reached before it exits.
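
One thing that may be worth checking at this point: `os.fork()` can fail with `ENOMEM` even when resident memory is well below the job limit, because the kernel or a per-process limit may account for virtual (address-space) memory rather than resident memory. The sketch below assumes a Linux node and only reports the address-space limit and the kernel commit accounting. Whether either of these applies to this LSF job is an assumption, not something the thread confirms.

```python
import resource


def fmt_limit(value):
    """Render an rlimit value in GiB, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**30:.1f} GiB"


def address_space_limit():
    """Soft and hard limit on virtual memory for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return fmt_limit(soft), fmt_limit(hard)


def read_meminfo(path="/proc/meminfo",
                 keys=("MemTotal", "CommitLimit", "Committed_AS")):
    """Return selected /proc/meminfo fields in kB (empty dict if unreadable)."""
    found = {}
    try:
        with open(path) as handle:
            for line in handle:
                name, _, rest = line.partition(":")
                if name in keys:
                    found[name] = int(rest.split()[0])
    except OSError:
        pass
    return found


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Kernel overcommit policy (0, 1 or 2), or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("RLIMIT_AS (soft, hard):", address_space_limit())
    print("meminfo (kB):", read_meminfo())
    print("vm.overcommit_memory:", overcommit_mode())
```

Running this inside the same job script as the failing run would show the limits that the job actually inherits.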

@zhangrengang
Owner

That is strange. You may try setting -cpu to 1 to see the memory cost.
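
A minimal way to record the memory cost of such a single-process run, assuming Linux and a command that can be launched as a child process. The helper name `peak_rss_of` is hypothetical, and the command passed to it is a placeholder, since the thread does not show the full SubPhaser invocation.

```python
import resource
import subprocess
import sys


def peak_rss_of(cmd):
    """Run cmd and return (exit code, peak resident memory in MiB).

    RUSAGE_CHILDREN reports the largest waited-for descendant, not the sum
    over all worker processes, so the figure is most meaningful for a
    single-process run.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # On Linux, ru_maxrss is given in kilobytes.
    return proc.returncode, usage.ru_maxrss / 1024


if __name__ == "__main__":
    # Placeholder command: allocate about 50 MiB in a child Python process.
    code, mib = peak_rss_of([sys.executable, "-c", "x = bytearray(50 * 2**20)"])
    print(f"exit code {code}, peak RSS {mib:.0f} MiB")
```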

@dabitz
Author

dabitz commented Feb 5, 2024

It did advance a bit, but still failed.

24-02-04 08:21:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_19.fasta_15.fa
24-02-04 08:22:07 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_20.fasta_15.fa
24-02-04 08:22:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_45.fasta_15.fa
24-02-04 08:22:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_46.fasta_15.fa
24-02-04 08:22:41 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_6.fasta_15.fa
24-02-04 08:22:48 [INFO] 62557073 kmers in total
24-02-04 08:22:48 [INFO] Filtering differential kmers
24-02-04 08:22:48 [INFO] Start Pool with 1 process(es)
24-02-04 08:28:46 [INFO] Processed 10000000 kmers
24-02-04 08:34:51 [INFO] Processed 20000000 kmers
24-02-04 08:40:59 [INFO] Processed 30000000 kmers
24-02-04 08:47:03 [INFO] Processed 40000000 kmers
24-02-04 08:52:35 [INFO] Processed 50000000 kmers
24-02-04 08:58:40 [INFO] Processed 60000000 kmers
24-02-04 09:00:08 [INFO] After filtering, remained 4 (0.00%) differential (freq >= 200) and 56 (0.00%) candidate (freq > 0) kmers
24-02-04 09:00:08 [INFO] Plot /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer_freq.pdf
24-02-04 09:00:44 [INFO] New check point file: /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_CBC_k15_q200_f2.kmer.mat.ok
24-02-04 09:00:44 [INFO] ###Step: Cluster
24-02-04 09:00:44 [INFO] Performing bootstrap of 1000 replicates, with each replicate resampling 50% data with replacement
24-02-04 09:01:29 [INFO] Bootstrap: mean Adjusted Rand-Index: 0.9635; mean V-measure score: 0.9538
24-02-04 09:01:29 [INFO] Subgenome assignments: OrderedDict([('scaffold_1', 'SG1'), ('scaffold_4', 'SG2'), ('scaffold_16', 'SG3'), ('scaffold_18', 'SG2'), ('scaffold_25', 'SG3'), ('scaffold_28', 'SG3'), ('scaffold_7', 'SG2'), ('scaffold_11', 'SG2'), ('scaffold_12', 'SG2'), ('scaffold_14', 'SG3'), ('scaffold_63', 'SG2'), ('scaffold_66', 'SG5'), ('scaffold_41', 'SG2'), ('scaffold_47', 'SG2'), ('scaffold_52', 'SG3'), ('scaffold_54', 'SG2'), ('scaffold_55', 'SG2'), ('scaffold_57', 'SG2'), ('scaffold_5', 'SG2'), ('scaffold_37', 'SG2'), ('scaffold_38', 'SG2'), ('scaffold_40', 'SG3'), ('scaffold_42', 'SG2'), ('scaffold_48', 'SG2'), ('scaffold_22', 'SG3'), ('scaffold_23', 'SG2'), ('scaffold_17', 'SG2'), ('scaffold_35', 'SG2'), ('scaffold_36', 'SG2'), ('scaffold_65', 'SG2'), ('scaffold_26', 'SG2'), ('scaffold_27', 'SG2'), ('scaffold_30', 'SG2'), ('scaffold_31', 'SG2'), ('scaffold_33', 'SG2'), ('scaffold_39', 'SG4'), ('scaffold_50', 'SG2'), ('scaffold_53', 'SG3'), ('scaffold_60', 'SG2'), ('scaffold_61', 'SG2'), ('scaffold_62', 'SG2'), ('scaffold_64', 'SG2'), ('scaffold_15', 'SG4'), ('scaffold_3', 'SG5'), ('scaffold_21', 'SG3'), ('scaffold_24', 'SG3'), ('scaffold_29', 'SG3'), ('scaffold_34', 'SG4'), ('scaffold_32', 'SG3'), ('scaffold_56', 'SG2'), ('scaffold_8', 'SG2'), ('scaffold_9', 'SG2'), ('scaffold_10', 'SG2'), ('scaffold_13', 'SG2'), ('scaffold_2', 'SG3'), ('scaffold_19', 'SG3'), ('scaffold_20', 'SG3'), ('scaffold_45', 'SG6'), ('scaffold_46', 'SG6'), ('scaffold_6', 'SG1')])
24-02-04 09:01:29 [INFO] Outputing chromosome - subgenome assignments to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.chrom-subgenome.tsv
24-02-04 09:01:29 [INFO] Outputing significant differiential kmer - subgenome maps to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.sig.kmer-subgenome.tsv
24-02-04 09:01:29 [INFO] Start Pool with 1 process(es)
24-02-04 09:01:29 [INFO] 3 significant subgenome-specific kmers
24-02-04 09:01:29 [INFO] 3 SG1-specific kmers
24-02-04 09:01:29 [INFO] run CMD: Rscript /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer.mat.R
24-02-04 09:01:31 [INFO] Outputing PCA plot to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer_pca.pdf
Traceback (most recent call last):
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/bin/subphaser", line 33, in <module>
sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 797, in main
pipeline.run()
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 469, in run
cluster.pca(outfig, n_components=self.nsg, sg_color=self.colors,)
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Cluster.py", line 50, in pca
X_pca = pca.fit_transform(self.data)
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 383, in fit_transform
U, S, Vt = self._fit(X)
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 430, in _fit
return self._fit_full(X, n_components)
File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 446, in _fit_full
raise ValueError("n_components=%r must be between 0 and "
ValueError: n_components=6 must be between 0 and min(n_samples, n_features)=4 with svd_solver='full'

However, it produced the two attached PDFs, which I assume indicate a pretty much autohexaploid origin, right?

CBC_k15_q200_f2.kmer_freq.pdf
CBC_k15_q200_f2.kmer.mat.pdf
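
The `ValueError` at the end of the traceback above is scikit-learn's check that `n_components` cannot exceed `min(n_samples, n_features)` with the full SVD solver. The run requested 6 components while the smaller matrix dimension was 4, the same number as the differential kmers reported in the log. The plain-Python sketch below mirrors that constraint as a pre-check. The function names are hypothetical, and the 60 x 4 matrix shape (chromosomes by kmers) is an assumption based on the log.

```python
def max_pca_components(n_samples, n_features):
    """Largest n_components a full-SVD PCA accepts for a matrix of this shape."""
    return min(n_samples, n_features)


def check_n_components(n_components, n_samples, n_features):
    """Return n_components if it is valid, otherwise raise a descriptive error."""
    limit = max_pca_components(n_samples, n_features)
    if not 0 <= n_components <= limit:
        raise ValueError(
            f"n_components={n_components} exceeds min(n_samples, n_features)={limit}; "
            "too few rows or columns survived filtering"
        )
    return n_components


if __name__ == "__main__":
    # Assumed shape: 60 chromosomes (rows) by 4 differential k-mers (columns).
    try:
        check_n_components(6, n_samples=60, n_features=4)
    except ValueError as err:
        print("would fail:", err)
    print("ok with", check_n_components(3, n_samples=60, n_features=4), "components")
```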

@zhangrengang
Owner

The error occurs because there are too few differential kmers (only four). But it is too early to say it is an autohexaploid. You may set -nsg 3 and -baseline 2, or prune the three allelic chromosome sets so that three homoeologous chromosome sets remain, like wheat's ABD assembly. Even if it were an allohexaploid (for example AABBDD), the current settings identify differential kmers by comparing the homologous chromosome pairs (e.g. the two As).
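
A toy illustration of why comparing homologous pairs can leave almost no differential kmers. It assumes that a kmer counts as differential when its highest per-chromosome count is at least `min_fold` times the count at rank `baseline`. That reading of `-baseline`, the function names, and all counts below are assumptions made for illustration, not SubPhaser's actual implementation.

```python
def fold_change(counts, baseline=1, floor=1e-9):
    """Ratio of the highest count to the count at rank `baseline` (0 = highest)."""
    ranked = sorted(counts, reverse=True)
    return ranked[0] / max(ranked[baseline], floor)


def is_differential(counts, min_fold=2, baseline=1):
    """True if the top count exceeds the baseline-ranked count by min_fold."""
    return fold_change(counts, baseline) >= min_fold


if __name__ == "__main__":
    # Hypothetical counts of one A-specific k-mer in one homoeologous group.
    phased = [100, 98, 2, 1, 3, 2]  # A1, A2, B1, B2, D1, D2
    pruned = [100, 2, 3]            # A, B, D after dropping allelic copies

    # Against the second-highest value, the two A copies cancel each other out.
    print(is_differential(phased, baseline=1))
    # Against the third-highest value, the A-specific signal reappears.
    print(is_differential(phased, baseline=2))
    # With one copy per subgenome, the default comparison is enough.
    print(is_differential(pruned, baseline=1))
```

Under these assumptions, either change suggested above (a lower-ranked baseline, or one chromosome set per subgenome) stops the two A copies from masking the kmer.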

@dabitz
Author

dabitz commented Feb 19, 2024

Ok, thanks a lot for the suggestion. I have finally managed to run SubPhaser using the unphased genome version and, as initially suspected, it looks pretty much like an autohexaploid except for a few chromosomes... Due to introgression, maybe?
CBC_hap1k15_q200_f2.circos.pdf
CBC_hap1k15_q200_f2.LTR_Gypsy.tree.pdf
CBC_hap1k15_q200_f2.ltr.insert.density.pdf
CBC_hap1k15_q200_f2.kmer_pca.pdf
CBC_hap1k15_q200_f2.kmer.mat.pdf

@zhangrengang
Owner

Yes, it looks like an autohexaploid. You may generate a kmer histogram and a Smudgeplot (https://github.com/KamilSJaron/smudgeplot) for cross-validation. The plots can be generated from whole-genome HiFi reads.
Whether there is introgression is hard to say based on these results alone.
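
To show what a kmer histogram contains, here is a toy sketch in plain Python: it counts kmers in a few sequences and then tallies how many distinct kmers occur once, twice, and so on. The function names and reads are hypothetical. For whole-genome HiFi reads a dedicated kmer counter would be needed, since an in-memory `Counter` does not scale to that volume, and this sketch does not merge reverse-complement kmers.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer without ambiguous bases across the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTTT"]  # toy reads, not real data
    hist = kmer_histogram(count_kmers(reads, k=4))
    for multiplicity in sorted(hist):
        print(multiplicity, hist[multiplicity])
```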
