GCTX parsing is not thread safe. #68

Open
nborunov-integral opened this issue Jan 19, 2021 · 0 comments

nborunov-integral commented Jan 19, 2021

Here is the code I'm using:
```python
import cmapPy.pandasGEXpress.parse_gctx as parse_gctx
import time
from threading import Thread

res = []
threads = []

def read(idx):
    # Read a single row (by row index) from the Level 5 GCTX file and time it
    print(f'Start reading {idx}')
    t = time.time()
    res.append(parse_gctx.parse('GSE92742_Broad_LINCS_Level5_COMPZ.MODZ_n473647x12328.gctx', ridx=[idx]))
    t = (time.time() - t)
    print(f'Done reading {idx} in {t} seconds')

# Parallel run: one thread per row index
threads.append(Thread(target=read, args=(6000,)))
threads.append(Thread(target=read, args=(12000,)))
threads.append(Thread(target=read, args=(5000,)))
threads.append(Thread(target=read, args=(300,)))
threads.append(Thread(target=read, args=(40,)))
threads.append(Thread(target=read, args=(800,)))

all_t = time.time()

for t in threads:
    t.start()

for t in threads:
    t.join()

all_t = time.time() - all_t

print(f'The End in {all_t} seconds')

# Sequential run: six single-row reads, one after another
all_t = time.time()
res = []
for idx in [234, 4351, 6233, 9087, 987, 97]:
    read(idx)

all_t = time.time() - all_t

print(f'The End in {all_t} seconds')
```
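
A slightly tidier version of the same benchmark, offered only as a sketch: the threads above append to `res` in completion order, so each result loses its association with `idx`. The hypothetical helper `timed_reads` below keeps results keyed by row index and times either a sequential or a thread-pool run. It takes the reader as a parameter, so it is assumed you pass something like `lambda i: parse_gctx.parse(path, ridx=[i])`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple, TypeVar

T = TypeVar("T")


def timed_reads(
    read_one: Callable[[int], T],
    indices: Iterable[int],
    threads: int = 0,
) -> Tuple[Dict[int, T], float]:
    """Run read_one(idx) for every index; return ({idx: result}, wall-clock seconds).

    threads=0 reads sequentially; threads>0 uses a thread pool of that size.
    (Hypothetical helper, not part of cmapPy.)
    """
    indices = list(indices)
    start = time.perf_counter()
    if threads > 0:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            # pool.map yields results in input order, so zip pairs each idx with its own result
            results = dict(zip(indices, pool.map(read_one, indices)))
    else:
        results = {idx: read_one(idx) for idx in indices}
    return results, time.perf_counter() - start
```

For example, `timed_reads(lambda i: parse_gctx.parse(path, ridx=[i]), [6000, 12000, 5000, 300, 40, 800], threads=6)` would reproduce the threaded run, and the same call with `threads=0` the sequential one.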

And here is the output:
```
Start reading 12000
Start reading 6000
Start reading 5000
Start reading 800
Start reading 40
Start reading 300
Done reading 12000 in 337.7198541164398 seconds
Done reading 6000 in 337.7183690071106 seconds
Done reading 800 in 338.19431233406067 seconds
Done reading 300 in 338.36488699913025 seconds
Done reading 5000 in 339.04932618141174 seconds
Done reading 40 in 339.0456030368805 seconds
The End in 339.0754089355469 seconds
Start reading 234
Done reading 234 in 55.63448905944824 seconds
Start reading 4351
Done reading 4351 in 55.87116312980652 seconds
Start reading 6233
Done reading 6233 in 55.85987401008606 seconds
Start reading 9087
Done reading 9087 in 55.898045778274536 seconds
Start reading 987
Done reading 987 in 56.020151138305664 seconds
Start reading 97
Done reading 97 in 56.393441915512085 seconds
The End in 335.67835783958435 seconds
```
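
The two wall-clock totals in this output can be turned into a per-read cost and a speedup figure. The arithmetic below uses only the numbers printed above; a speedup near 1.0 means the six threads gained essentially nothing over reading one row at a time.

```python
# Wall-clock totals copied from the output above (seconds).
threaded_total = 339.0754089355469
sequential_total = 335.67835783958435
n_reads = 6

per_read = sequential_total / n_reads        # average cost of one read run on its own
speedup = sequential_total / threaded_total  # > 1 would mean the threads helped

print(f"~{per_read:.1f} s per read, speedup {speedup:.2f}x with {n_reads} threads")
# prints: ~55.9 s per read, speedup 0.99x with 6 threads
```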

As you can see, reading one record takes about 55 seconds when I read the records sequentially. When I read them in parallel threads, the whole run takes about as long as the sequential one (about 339 s vs. about 336 s), instead of the roughly 55 seconds I expected for all six threads together.
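
One plausible explanation, offered as an assumption rather than a diagnosis: `parse_gctx` reads the file through h5py, and h5py serializes calls into the HDF5 library with a global lock, so threads blocked on that lock cannot overlap their reads. The self-contained sketch below models that with a hypothetical `fake_read` that sleeps while holding a `threading.Lock` (this is not cmapPy or h5py code). With the lock, N threads take about N times one read, the same pattern as the output above; without it, the reads overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a process-wide library lock (not the real h5py lock).
_library_lock = threading.Lock()


def fake_read(seconds: float, use_lock: bool) -> None:
    """Simulate one slow read; with use_lock, only one thread can be 'inside the library'."""
    if use_lock:
        with _library_lock:
            time.sleep(seconds)
    else:
        time.sleep(seconds)


def wall_time(n_threads: int, seconds: float, use_lock: bool) -> float:
    """Wall-clock time for n_threads concurrent fake reads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(fake_read, seconds, use_lock) for _ in range(n_threads)]
        for f in futures:
            f.result()  # re-raise any exception from the worker
    return time.perf_counter() - start


if __name__ == "__main__":
    for use_lock in (True, False):
        t = wall_time(6, 0.2, use_lock)
        print(f"lock={use_lock}: {t:.2f} s for 6 threads (one read = 0.2 s)")
```

If this model holds, more threads will not help. Process-based parallelism (for example `concurrent.futures.ProcessPoolExecutor`, with each worker opening the file itself) is the usual way around a per-process lock, though any gain here would also depend on disk throughput.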
