File-based cache can slow down experiments #422

Open
2 tasks
MichaelRoeder opened this issue Oct 26, 2022 · 3 comments

Problem

Once the file-based sameAs cache grows large, serializing it to disk takes a noticeable amount of time. During this time, the serializing thread holds all semaphore permits of the cache, so no other thread can use the cache.

In the following thread status output, the first thread is blocked (WAITING in Semaphore.acquire) because the second thread is writing the cache to a file (performCacheStorage).

eTConfig(XXX,XXX,"QA","STRONG_ENTITY_MATCH")
state=WAITING
progress=null
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:116)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMeanings(SameAsRetrieverUtils.java:39)
org.aksw.gerbil.semantic.sameas.SameAsRetrieverUtils.addSameURIsToMarkings(SameAsRetrieverUtils.java:32)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getPreparedDataset(AbstractDatasetConfiguration.java:100)
org.aksw.gerbil.dataset.AbstractDatasetConfiguration.getDataset(AbstractDatasetConfiguration.java:74)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:122)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig(XXX,XXX,"QA","STRONG_ENTITY_MATCH")
state=RUNNABLE
progress=100.0% of dataset
java.io.FileOutputStream.writeBytes(Native Method)
java.io.FileOutputStream.write(FileOutputStream.java:326)
java.io.ObjectOutputStream$BlockDataOutputStream.writeBlockHeader(ObjectOutputStream.java:1890)
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1875)
java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1108)
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.performCacheStorage(FileBasedCachingSameAsRetriever.java:258)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.requestUri(FileBasedCachingSameAsRetriever.java:184)
org.aksw.gerbil.semantic.sameas.impl.cache.FileBasedCachingSameAsRetriever.retrieveSameURIs(FileBasedCachingSameAsRetriever.java:135)
org.aksw.gerbil.semantic.sameas.impl.AbstractSameAsRetrieverDecorator.addSameURIs(AbstractSameAsRetrieverDecorator.java:43)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:560)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:167)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
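
The blocking pattern visible in these two traces can be reproduced in isolation. The following is only a minimal sketch, not the actual FileBasedCachingSameAsRetriever code: the class name, `MAX_READERS`, `tryRead` and `store` are hypothetical names chosen for illustration, and it assumes that readers take one permit while the storing thread takes all of them.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a cache guarded by a counting semaphore.
class SemaphoreBlockingSketch {
    static final int MAX_READERS = 4; // assumed permit count, not GERBIL's value
    static final Semaphore PERMITS = new Semaphore(MAX_READERS);

    // A reader needs a single permit; returns false if none became free in time.
    static boolean tryRead(long timeoutMillis) {
        try {
            if (!PERMITS.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        PERMITS.release();
        return true;
    }

    // The storing thread takes every permit for the whole duration of the write.
    static void store(Runnable slowWrite) {
        PERMITS.acquireUninterruptibly(MAX_READERS);
        try {
            slowWrite.run();
        } finally {
            PERMITS.release(MAX_READERS);
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if a read attempt made during the write timed out.
    static boolean readerBlockedWhileWriting() {
        CountDownLatch writeStarted = new CountDownLatch(1);
        CountDownLatch finishWrite = new CountDownLatch(1);
        Thread writer = new Thread(() -> store(() -> {
            writeStarted.countDown();  // all permits are held from here on
            awaitQuietly(finishWrite); // simulates a long serialization
        }));
        writer.start();
        awaitQuietly(writeStarted);
        boolean readSucceeded = tryRead(50);
        finishWrite.countDown();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !readSucceeded;
    }

    public static void main(String[] args) {
        System.out.println("reader blocked during write: " + readerBlockedWhileWriting());
        System.out.println("reader succeeds afterwards: " + tryRead(50));
    }
}
```

In this sketch the reader only times out because it uses `tryAcquire` with a timeout; with a plain `acquire()`, as in the first trace, it would simply park until the write finishes.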

Solution

  • Improve the writing speed (if possible)
  • Allow reading operations while writing the cache to file
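
One possible way to approach the second point is to copy the cache in memory while holding a lock and to serialize the copy after releasing it. This is a sketch under assumptions, not the change that was made in GERBIL: `SnapshotCacheSketch` and its methods are hypothetical, the cache is assumed to be a map from a URI to a set of URIs, and a `ReentrantReadWriteLock` is swapped in for the semaphore. Wrapping the stream in a `BufferedOutputStream` might also help with the first point, but that has not been measured here.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache that stays readable while it is written to disk.
class SnapshotCacheSketch {
    private final Map<String, Set<String>> cache = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    Set<String> get(String uri) {
        lock.readLock().lock();
        try {
            Set<String> uris = cache.get(uri);
            return uris == null ? null : new HashSet<>(uris);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(String uri, Set<String> sameAs) {
        lock.writeLock().lock();
        try {
            cache.put(uri, new HashSet<>(sameAs));
        } finally {
            lock.writeLock().unlock();
        }
    }

    void storeTo(File file) {
        // Step 1: copy under the read lock. Other readers are not blocked,
        // and writers only wait for the in-memory copy, not for the disk.
        HashMap<String, HashSet<String>> snapshot = new HashMap<>();
        lock.readLock().lock();
        try {
            for (Map.Entry<String, Set<String>> e : cache.entrySet()) {
                snapshot.put(e.getKey(), new HashSet<>(e.getValue()));
            }
        } finally {
            lock.readLock().unlock();
        }
        // Step 2: serialize the private copy without holding any lock.
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(snapshot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small self-check: stores one entry and returns the file size in bytes.
    static long demo() {
        try {
            File file = File.createTempFile("sameas-cache", ".bin");
            file.deleteOnExit();
            SnapshotCacheSketch cache = new SnapshotCacheSketch();
            cache.put("http://example.org/a", Collections.singleton("http://example.org/b"));
            cache.storeTo(file);
            return file.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written: " + demo());
    }
}
```

The trade-off is memory: the snapshot temporarily doubles the footprint of the cache, which may matter for exactly the large caches that trigger this issue.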

MichaelRoeder commented Oct 27, 2022

It seems that the low number of changes allowed before the cache is written to disk caused GERBIL to write the cache at least once per minute. Making the threshold configurable and increasing it to 100k improved the runtime of GERBIL QA considerably (b277873).
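
To illustrate why the threshold matters, here is a rough sketch of a change counter that triggers a write once a configurable number of changes has accumulated. It is not the code from the commit: the class, the property key `sameas.cache.maxChanges` and the `>=` comparison are assumptions made for the example; only the value 100k comes from the comment above.

```java
// Hypothetical change counter deciding when the cache should be stored.
class ChangeThresholdSketch {
    static final String PROPERTY = "sameas.cache.maxChanges"; // assumed key, not GERBIL's
    static final int DEFAULT_MAX_CHANGES = 100_000;

    private final int maxChanges;
    private int changes = 0;

    ChangeThresholdSketch(int maxChanges) {
        this.maxChanges = maxChanges;
    }

    // Reads the threshold from a JVM system property, falling back to the default.
    static ChangeThresholdSketch fromSystemProperties() {
        return new ChangeThresholdSketch(Integer.getInteger(PROPERTY, DEFAULT_MAX_CHANGES));
    }

    // Records one change; returns true when the caller should store the cache.
    synchronized boolean recordChange() {
        changes++;
        if (changes < maxChanges) {
            return false;
        }
        changes = 0;
        return true;
    }

    // Counts how many writes a given number of changes would trigger.
    static int countWrites(int threshold, int numberOfChanges) {
        ChangeThresholdSketch counter = new ChangeThresholdSketch(threshold);
        int writes = 0;
        for (int i = 0; i < numberOfChanges; i++) {
            if (counter.recordChange()) {
                writes++;
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        System.out.println(countWrites(10, 1000));      // prints 100
        System.out.println(countWrites(100_000, 1000)); // prints 0
    }
}
```

With a small threshold, every burst of new URIs leads to a full serialization of the cache; raising the threshold trades fewer blocking writes against losing more unsaved entries if the process stops unexpectedly.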

@MichaelRoeder

We applied the same change to the master branch.
