Delay proposals of values with no likelihoods in the current subproblem #9

alex-lew · 2021-03-04T06:20:29Z

Consider the program

class NameWithNickname begin
  true_name ~ string_prior(1, 30) preferring all_first_names
  nickname ~ string_prior(1, 30) preferring all_first_names
end
class Person begin
  fname ~ NameWithNickname
  lname ~ string_prior(1, 30) preferring all_last_names
end
class Record begin
  person ~ Person
  name ~ uniform([person.fname.true_name, person.fname.nickname])
end

The problem here is that each record is assumed, by design, to show either the person’s true first name or their nickname. When PClean first processes a record, however, it tries to initialize both latent fields. Suppose you see a person’s first name, and PClean correctly infers that it is a full first name. The nickname has no likelihood in that subproblem, so it is filled in with a draw from the prior. If the person’s nickname then shows up in a later record, that record can’t be assigned to the same “person” object, because the object already has some other, generated-from-the-prior nickname.

If we could delay the proposal of the "other" latent until we have evidence for it, we could circumvent this issue and do accurate inference in models like this.

This is also very relevant for data integration across multiple sources, where different sources may report different attributes.
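
To make the failure mode concrete, here is a toy Python sketch of eager versus delayed proposals. It is not PClean code and ignores probabilities entirely. The `Person` dataclass, `can_explain`, `absorb`, and the prior draw "Sasha" are all made up for illustration; an unset field is modeled as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # None means "not proposed yet"; a string means the value is committed.
    true_name: Optional[str] = None
    nickname: Optional[str] = None


def can_explain(person: Person, observed: str) -> bool:
    """Return True if the person could have produced this observed name.

    A record's name is one of [true_name, nickname], so the observation is
    explainable if it equals a committed field, or if some field is still
    unset and could therefore be proposed to equal it.
    """
    return any(v is None or v == observed
               for v in (person.true_name, person.nickname))


def absorb(person: Person, observed: str) -> None:
    """Commit the observation to the first unset field, unless it already matches."""
    if observed in (person.true_name, person.nickname):
        return
    if person.true_name is None:
        person.true_name = observed
    elif person.nickname is None:
        person.nickname = observed
    else:
        raise ValueError("both fields are already committed to other values")


# Eager initialization: the nickname was drawn from the prior ("Sasha" is a
# hypothetical draw) before any record provided evidence for it.
eager = Person(true_name="Alexander", nickname="Sasha")

# Delayed proposal: only the field with evidence has been committed.
delayed = Person(true_name="Alexander")

# A later record shows the nickname "Alex".
eager_ok = can_explain(eager, "Alex")      # the nickname slot is already taken
delayed_ok = can_explain(delayed, "Alex")  # the nickname can still be proposed
if delayed_ok:
    absorb(delayed, "Alex")
```

In this sketch the eager person cannot absorb the second record, while the delayed person ends up with both names filled in from evidence. A real implementation would also have to account for the unset field when scoring the partially initialized object, which this toy does not attempt.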


alex-lew added the research label Mar 4, 2021
