Support using (wrapped?) Gen distributions in PClean models? #13

Open
marcoct opened this issue Mar 5, 2021 · 1 comment
Comments

@marcoct
Contributor

marcoct commented Mar 5, 2021

Also related to https://github.com/probcomp/GenDistributions.jl and probcomp/Gen.jl#362

@marcoct
Contributor Author

marcoct commented Mar 5, 2021

It does seem useful to discuss whether PClean and Gen are really intended to use different sets of modeling primitives. In some cases, they could be the same primitives, but with different -- and less jargon-y -- names. But I could also imagine that most PClean users won't need to model the low-level numerical data types that most Gen distributions are based on.

This came up because I saw a date field in my data set. My initial reaction was "PClean probably needs a Date type" with D/M/Y integer fields. But then I thought, well -- aren't dates basically integers counted from some day 0? So I just need a distribution on integers, and Gen has that. But I think for dates, and most other data appearing in PClean data sets, there are many possible representations, each of which could be optimized for expressing a different kind of knowledge.

A key question is whether to (i) encourage lower-level logic like constructing dates to take place in user code for now (e.g. if I wanted to model dates, I could use Ints and then write the manual String conversion code in the model, I think) -- an approach that would make it natural to overlap with Gen's distributions, or (ii) stay with the current pattern of adding distributions for higher-level data types that have more specialized semantics, like 'Date'.
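To make option (i) concrete, here is a rough sketch in plain Julia of treating a date as an integer offset from a day 0 and converting it to a string by hand. It uses only the `Dates` standard library. The epoch, the helper names (`offset_to_date`, `date_to_offset`, `format_dmy`), and the D/M/Y format are assumptions made for illustration, and `rand(lo:hi)` is just a stand-in for whatever integer distribution the model would actually use (e.g. something like Gen's `uniform_discrete`). None of this is PClean syntax.

```julia
using Dates

# Assumption: day 0 is an arbitrary epoch picked for this sketch.
const EPOCH = Date(1970, 1, 1)

# Integer offset (days since EPOCH) -> Date, and back again.
offset_to_date(n::Integer) = EPOCH + Day(n)
date_to_offset(d::Date) = Dates.value(d - EPOCH)

# The "manual String conversion" step: render an integer offset as D/M/Y text.
format_dmy(n::Integer) = Dates.format(offset_to_date(n), dateformat"dd/mm/yyyy")

# Stand-in for drawing from an integer distribution over a plausible date range.
lo = date_to_offset(Date(2021, 1, 1))
hi = date_to_offset(Date(2021, 12, 31))
n = rand(lo:hi)

println(format_dmy(n))  # some date in 2021, formatted as dd/mm/yyyy
```

With this representation the model only ever reasons about integers, and all knowledge about calendars lives in the conversion helpers, which is roughly the trade-off against a dedicated 'Date' distribution in option (ii).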

It also seems like a user might be able to get pretty far with the string distributions provided. It's not obvious to me when more distributions and other primitives need to be added, or what the process for adding them could be.

@marcoct marcoct changed the title Add support for using (or wrapping) Gen distributions in PClean models Add support for using (or wrapping) Gen distributions in PClean models? Mar 5, 2021
@marcoct marcoct changed the title Add support for using (or wrapping) Gen distributions in PClean models? Support using (or wrapping) Gen distributions in PClean models? Mar 5, 2021
@marcoct marcoct changed the title Support using (or wrapping) Gen distributions in PClean models? Support using (wrapped?) Gen distributions in PClean models? Mar 5, 2021