Samplebase is a database that is:
- in-process/serverless
- file-local: all data lives in a user-defined path and can be moved around
- document-based/ad-hoc: no need to define a model beforehand
- thread-safe: multiple readers, one writer
The core functionality is provided by the `Sample` (document) object. On top of that, there are a few utility methods to help with processing Samples in parallel.
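To illustrate the file-local, serverless idea, here is a minimal sketch of a document store where the directory itself is the database. This is purely conceptual: the file layout, the JSON schema, and the helper names `create_doc`/`read_doc` are assumptions for illustration, not samplebase's actual implementation.

```python
import json
import os
import uuid


def create_doc(data_dir, args):
    """Write one document as its own JSON file; the directory is the database."""
    os.makedirs(data_dir, exist_ok=True)
    name = uuid.uuid4().hex  # an auto-generated name, in the spirit of sample names
    with open(os.path.join(data_dir, name + ".json"), "w") as f:
        json.dump({"args": args, "result": None}, f)
    return name


def read_doc(data_dir, name):
    """Read one document back; any process with access to the path can do this."""
    with open(os.path.join(data_dir, name + ".json")) as f:
        return json.load(f)
```

Because each document is an ordinary file under one path, the whole database can be moved or copied around like any other directory.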
Suppose there is a task that you can solve for given input arguments:

```python
result = solve(**args)
```

A Sample is a pair of such `args` and the corresponding `result`.
Imagine you now have thousands of different `args` that you want to sample. Samplebase enables you to separate the creation of Samples from the execution of the `solve` operation and from the analysis of the results.
First, define the task to be executed:

```python
import samplebase

def solve(x=None, y=None):
    # a lengthy calculation
    return {"product": x * y}
```
Span the space of arguments that you want to sample. Here, we create two samples:

```python
data_dir = "/my/data/dir"
samplebase.create_sample(data_dir, args={"x": 2, "y": "barbara"})
samplebase.create_sample(data_dir, args={"x": 3, "y": "og"})
```
Map the function `solve` over the samples, which are identified via their location on disk and their auto-generated names:

```python
names = samplebase.names_of_samples(data_dir)
samplebase.run_parallel(func=solve, prefix=data_dir, sample_names=names)
```
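Conceptually, this step maps the function over the stored samples by name and writes each result back next to its arguments. A plain-Python analogue might look like the sketch below; the on-disk JSON layout and the helper name `run_all` are illustrative assumptions, not samplebase's real format or API.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def run_all(func, data_dir, names):
    """Apply func to each stored sample's args and write the result back.

    Illustrative only: assumes one JSON file per sample with
    {"args": ..., "result": ...}, which is an assumption here.
    """
    def work(name):
        path = os.path.join(data_dir, name + ".json")
        with open(path) as f:
            doc = json.load(f)
        doc["result"] = func(**doc["args"])  # result = solve(**args)
        with open(path, "w") as f:
            json.dump(doc, f)

    with ThreadPoolExecutor() as pool:
        list(pool.map(work, names))  # drain the iterator to surface errors
```

Because each worker touches only its own file, finished samples become readable one by one rather than all at once.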
Look at the results:

```python
samples = samplebase.list_of_samples(data_dir)
for s in samples:
    print(s.result["product"])
# barbarabarbara
# ogogog
```
This last part can safely be executed in another interpreter/notebook even while samples are still being processed. That was the main motivation: parallel access with a serverless architecture, so you can look at results while some samples are still being computed. If this does not convince you, consider tinydb.
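The concurrent-read idea can be sketched in plain Python: a reader simply lists whatever documents exist on disk and skips the ones that have no result yet. The one-JSON-file-per-sample layout and the `None`-means-unfinished convention below are assumptions for illustration, not samplebase's actual format.

```python
import json
import os


def finished_results(data_dir):
    """Collect results from documents that already have one; skip unfinished docs."""
    results = []
    for fname in sorted(os.listdir(data_dir)):
        if not fname.endswith(".json"):
            continue
        with open(os.path.join(data_dir, fname)) as f:
            doc = json.load(f)
        if doc.get("result") is not None:  # assumption: None means still running
            results.append(doc["result"])
    return results
```

A reader in a second interpreter can call this repeatedly and see results appear as the writer finishes each sample.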