
Allow user to choose between safe/unsafe/fastcholesky inverse in Gaussian nodes #76

Open
bvdmitri opened this issue Jan 10, 2022 · 7 comments

Comments

@bvdmitri
Member

bvdmitri commented Jan 10, 2022

This might be a meta flag for Gaussian nodes.
For example, by default we might assume Hermitian structure and use meta = AssumeHermitian(), which falls back to fastcholesky.

cholinv(::AssumeHermitian, something) = fastcholesky(something)

In other cases we might prefer pinv(), e.g. meta = AssumeNonHermitian()

cholinv(::AssumeNonHermitian, something) = pinv(something)
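The dispatch idea above can be sketched as a runnable example. The meta type names and the plain-Cholesky fallback here are illustrative stand-ins (not the actual ReactiveMP/fastcholesky implementations):

```julia
using LinearAlgebra

# Hypothetical meta types sketching the proposal; names are illustrative,
# not part of ReactiveMP
struct AssumeHermitian end
struct AssumeNonHermitian end

# Hermitian assumption: use a Cholesky-based inverse
# (standing in for fastcholesky)
cholinv(::AssumeHermitian, A) = inv(cholesky(Hermitian(A)))

# No Hermitian assumption: fall back to the pseudo-inverse
cholinv(::AssumeNonHermitian, A) = pinv(A)

A = [2.0 1.0; 1.0 2.0]
cholinv(AssumeHermitian(), A) * A      # ≈ I
cholinv(AssumeNonHermitian(), A) * A   # ≈ I as well, via pinv
```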

@bartvanerp , @albertpod

@bartvanerp
Member

I am not sure whether this should be a flag for Gaussian nodes. I think it is better to always constrain the cholinv method. Cholesky by definition only applies to Hermitian/symmetric matrices. Therefore cholinv should actually only be defined as cholinv(A::Hermitian) instead of cholinv(A::AbstractMatrix).

This issue might be better formulated as whether we wish to constrain covariance/precision matrices to always be Hermitian. I am not sure whether we have cases where this does not apply.

@albertpod
Member

albertpod commented Jan 10, 2022

Agree with @bartvanerp that cholinv should be applied to Hermitian matrices only.

However, we can easily encounter situations in which the positive-definiteness property is violated.

                 (u)
                  |
    (x_t_prev) --> + --> = --> (x_t)
                        | 
                 (H)--> *
                        |
                (c')--> *
                        |
                        N
                        |
                      (y_t)

Assuming x_t in R, H is a 2x1 vector, and c' is a transposed unit vector of length 2. N denotes a Gaussian node.
This is a fair setup from the ReactiveMP.jl perspective; moreover, it's also fair mathematically: there is nothing but a linear transformation of a Gaussian x_t. However, if you look at the marginal between the * edges, you can easily spot that it results from the multiplication of two degenerate distributions.

Another 'easy to spot' example is from the rule itself:

@rule typeof(*)(:out, Marginalisation) (m_A::PointMass{ <: AbstractVector }, m_in::UnivariateNormalDistributionsFamily) = begin
    a = mean(m_A)

    μ_in, v_in = mean_var(m_in)

    # TODO: check, do we need correction! here? (ForneyLab does not have any correction in this case)
    # TODO: Σ in this rule is guaranteed to be ill-defined, has rank equal to one and has determinant equal to zero
    μ = μ_in * a
    Σ = mul_inplace!(v_in, a * a')

    return MvNormalMeanCovariance(μ, Σ)
end

You are almost always guaranteed to end up with a non-Hermitian "covariance" matrix in this case.
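A minimal sketch of why the Σ produced by this rule is degenerate, with illustrative values for the messages (a and v_in are made up for the example):

```julia
using LinearAlgebra

a    = [1.0, 2.0]        # mean of the PointMass message m_A (illustrative)
v_in = 0.5               # variance of the univariate input message (illustrative)

# Covariance as computed by the rule above: an outer product scaled by v_in,
# so it has rank one by construction
Σ = v_in * (a * a')

rank(Σ)        # 1
isposdef(Σ)    # false, so a plain Cholesky factorization of Σ fails
```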

@bartvanerp
Member

I agree that sometimes we end up with improper distributions, for which the covariance/precision matrix is not positive definite. However, I believe this does not mean that the matrix is not Hermitian. Maybe I am wrong here, but the cholesky function just assumes that the covariance/precision matrix can be represented by just its lower/upper triangular part.

What happens in #69 is the following:

  • We assume that the matrix is symmetrical/Hermitian. Then fastcholesky will calculate the cholesky decomposition by using only the lower/upper triangular of the matrix. The other part of the matrix is used for intermediate storage to prevent extra allocations.
  • If the matrix turned out to be positive definite and the algorithm finishes, great job.
  • If the algorithm fails (because of taking the square root of a negative number, a result of the matrix not being positive definite), then the algorithm returns a flag, after which we call cholesky(Positive, A), which takes care of the indefiniteness of the matrix.
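The try-then-correct strategy in the steps above can be sketched as follows. This is a simplified stand-in, not the actual fastcholesky code, and the diagonal shift in the fallback merely mimics what cholesky(Positive, A) from PositiveFactorizations.jl does:

```julia
using LinearAlgebra

# Hedged sketch of "attempt Cholesky, fall back to a corrected
# factorization on failure"
function cholinv_sketch(A::AbstractMatrix)
    F = cholesky(Hermitian(A); check = false)  # use only the triangular part
    if issuccess(F)
        return inv(F)        # matrix was positive definite: done
    end
    # Fallback: shift the diagonal just enough to make the matrix
    # positive definite (a crude substitute for cholesky(Positive, A))
    λ = abs(minimum(eigvals(Hermitian(A)))) + sqrt(eps(eltype(A)))
    return inv(cholesky(Hermitian(A + λ * I)))
end

cholinv_sketch([2.0 1.0; 1.0 2.0])   # positive definite: fast path
cholinv_sketch([1.0 2.0; 2.0 1.0])   # indefinite: corrected fallback path
```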

@bvdmitri
Member Author

@bartvanerp the problem here is that cholesky(Positive, A) also assumes that the matrix is symmetrical/Hermitian. But it accepts slightly non-symmetrical/non-Hermitian matrices, makes a "smart" correction, and then executes the standard cholesky method.

This issue came up from our conversation with Albert. There are some matrices that do have a proper inverse, but they are neither symmetrical nor Hermitian. In some situations (IMO rare, but anyway) a user might prefer to use a different inverse method if they know the structure of their improper covariance matrix. E.g.

julia> A = [ 1.0 1.0; 0.0 1.0 ]
2×2 Matrix{Float64}:
 1.0  1.0
 0.0  1.0

julia> inv(A) * A
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0

julia> ReactiveMP.cholinv(A) * A
2×2 Matrix{Float64}:
 1.0  1.0
 0.0  1.0

I do think, though, that having such an inverse for, let's say, a covariance matrix might make little sense from a mathematical point of view. As a workaround, a user can always define such methods with the @rule macro on their side with a custom meta and use whatever inverse method they prefer.

@bartvanerp
Member

bartvanerp commented Jan 10, 2022

I am not sure whether I am following entirely here, so better to discuss this in person.

There are some matrices that do have a proper inverse, but they are neither symmetrical nor Hermitian

So then we just use inv() right?

if they know the structure of their improper covariance matrix.

What do you mean by improper here? If we are dealing with an arbitrary non-symmetric covariance/precision matrix, then we can make it symmetric by just redistributing its elements without any loss of generality. Then we can call cholinv and solve the problem:

julia> B = [1.0 0.5; 0.5 1]
2×2 Matrix{Float64}:
 1.0  0.5
 0.5  1.0

julia> ReactiveMP.cholinv(B) * B
2×2 Matrix{Float64}:
 1.0  -1.11022e-16
 0.0   1.0
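The "redistributing its elements" step can be sketched as replacing A with its symmetric part (A + A')/2, which leaves every quadratic form x'Ax unchanged (the matrix values here are illustrative):

```julia
using LinearAlgebra

A = [1.0 0.8; 0.2 1.0]    # non-symmetric "covariance" candidate (illustrative)
S = (A + A') / 2          # redistribute the off-diagonal mass symmetrically

x = [0.3, -1.2]
x' * A * x ≈ x' * S * x   # true: quadratic forms agree, no loss of generality
issymmetric(S)            # true, so a cholinv restricted to Hermitian applies
```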

@bvdmitri
Member Author

So then we just use inv() right?
then we can make it symmetric by just redistributing its elements without any loss of generality

Yes and yes. But neither of these options can be controlled by a user in message update rules. In message update rules we always use cholinv at the moment. I opened this issue to discuss whether we need to provide the user some sort of control over matrix inversion in our rules, or just let them define their own rules with whatever inverse method they prefer.

@bvdmitri bvdmitri changed the title Allow user to choose between safe/unsafe/fast cholesky in Gaussian nodes Allow user to choose between safe/unsafe/fastcholesky inverse in Gaussian nodes Jan 10, 2022
@bvdmitri
Member Author

We recently discussed an associated issue with cholinv. The fastcholesky function tries to correct the matrix as much as possible, even if it is obviously not positive definite, e.g.

julia> fastcholesky([ 0.0 0.0; 0.0 0.0 ])
LinearAlgebra.Cholesky{Float64, Matrix{Float64}}
L factor:
2×2 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 1.0   ⋅
 0.0  1.0

The remedy would be to assume that any matrix that is passed to the Gaussian is proper. For non-proper non-invertible matrices we must implement a special type of matrix, e.g. MatrixWithInverse, which would carry the inverse without actually calling the fastcholesky.
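The proposed MatrixWithInverse could look roughly like the sketch below. The struct name comes from the comment above; the field names and accessor are hypothetical, not an existing ReactiveMP API:

```julia
using LinearAlgebra

# Hypothetical wrapper carrying a matrix together with its (pre-computed)
# inverse, so that rules never need to call fastcholesky on it
struct MatrixWithInverse{M <: AbstractMatrix}
    matrix  :: M
    inverse :: M
end

# A cholinv-like access simply returns the stored inverse
stored_inv(m::MatrixWithInverse) = m.inverse

A = [1.0 1.0; 0.0 1.0]            # invertible but non-Hermitian
m = MatrixWithInverse(A, inv(A))
stored_inv(m) * m.matrix          # ≈ I, without any Cholesky factorization
```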
