Expressing the final weights learnt as a parametric equation #106

Open · ekta1007 opened this issue Jul 14, 2017 · 8 comments

ekta1007 commented Jul 14, 2017

@ibayer After learning the model weights (with X_test.columns as the parameter names), how do I express the model as a functional relationship, so it can be productionized?
i.e. f(x1 * -1.29966466e-03 + x2 * 1.78455648e+01 + ... )

fm.w_ = [ -1.29966466e-03  1.78455648e+01  -2.05648306e-01  -2.40578327e+00  4.44556106e+00  9.42411346e-02  1.82644589e+00  2.35087155e+00  -4.14614164e-01  1.52788247e+00  6.72193895e-01  -1.51634745e-01  1.96703805e+00  -7.19508942e-01  -3.00903099e-01  8.13209301e-01]
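
For reference, a minimal sketch (not fastFM's own prediction code) of the second-order FM score that such a functional relationship boils down to, written against the fitted attributes fm.w0_, fm.w_ and fm.V_, with V_ treated as a (rank, n_features) array as in the code later in this thread; it uses the O(rank · n) reformulation of the pairwise sum:

import numpy as np

def fm_score(x, w0, w, V):
    """Second-order FM score:
    y_hat(x) = w0 + <w, x> + sum_{i<j} <V[:, i], V[:, j]> * x_i * x_j,
    evaluated with the O(rank * n) identity instead of the explicit double loop."""
    x = np.asarray(x, dtype=float)
    V = np.asarray(V, dtype=float)   # shape (rank, n_features)
    linear = w0 + np.dot(w, x)
    vx = V.dot(x)                    # one value per factor row
    pairwise = 0.5 * (np.sum(vx ** 2) - np.sum(np.dot(V ** 2, x ** 2)))
    return linear + pairwise

# e.g. fm_score(x, fm.w0_, fm.w_, fm.V_) for one feature row x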

ibayer (Owner) commented Jul 14, 2017

ekta1007 (Author) commented Jul 14, 2017

@ibayer - thanks for your prompt response. Looking at eqn 1: my X is encoded as {0, 1} values along with some continuous variables (see the sample below). Which of the calls below returns the y of eqn 1? I have fm.w0_, fm.w_ & fm.V_.

fm.fit_predict(X_train, y_train, X_test)        # class labels, 1/0

fm.fit_predict_proba(X_train, y_train, X_test)  # class probabilities

fm.predict(X_test)                              # a continuous number, e.g. 15.35047575

Additional note:
# my fm is initialised as
fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank, random_state=seed, copy_X=True)
# sample X_train[i] = [969.0, 1.0, 24.3618275, 1.0, 1.0, 0.4, 0.161803, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0]

ekta1007 (Author) commented Jul 26, 2017

@ibayer @takuti @macks22 @chezou @bdaskalov - I got y_pred from fm.predict(X_test), which I tested 1:1 against Bayer eqn 1 / Rendle eqn 2; it is a continuous number (e.g. [1.0584077976, 0.00105908767392]). How do I get the class labels & predicted probabilities back?

My use case is storing the model params (w0_, w_, V_) over n iterations of different test-data sets (my test data is ~1.8 TB) and then computing the final y_pred (a probability) at run time, so I can't call fm.fit_predict_proba(X_train, y_train, X_test) at runtime. I should therefore use an equivalent of the find_prediction function below, for computational efficiency at runtime.

Pasting some quick code, in case someone has a similar question.

import numpy as np

def get_first_order_weights(x_test, w_):
    """First-order term <w, x>."""
    return float(np.dot(np.asarray(x_test), np.asarray(w_)))

def get_weight(V_, i, j, rank=2):
    """Returns the interaction weight <v_i, v_j> of features i, j.
    See also equation 2, Steffen Rendle (2011, SIGIR)."""
    weight = 0.0
    for rank_ in range(rank):
        weight += V_[rank_][i] * V_[rank_][j]
    return weight

def get_second_order_weights(x_test, V_, rank=2):
    """Pairwise term: sum over i < j of <v_i, v_j> * x_i * x_j."""
    second_order_w = 0.0
    for i in range(len(x_test)):
        for j in range(i + 1, len(x_test)):
            if x_test[i] != 0 and x_test[j] != 0:
                weight = get_weight(V_, i=i, j=j, rank=rank)
                second_order_w += weight * x_test[i] * x_test[j]
    return second_order_w

def find_prediction(x_test, w0_, w_, V_, rank=2):
    """FM score y_hat = w0 + <w, x> + pairwise interactions."""
    return w0_ + get_first_order_weights(x_test, w_) + get_second_order_weights(x_test, V_, rank=rank)


"""Quick test """
    y_pred_proba=0.666498763558 # what we want. This is the output of fm.fit_predict_proba(X_train, y_train,X_test) for THIS x_test
    y_pred=0.551379351199 # what find_prediction(x_test, w0_, w_, V_) gives
    hyper_param_=[ 0.72957556,  1.09654054,  4.28621216,  0.80880482, -0.26278131,
            0.17551987, -0.17272419]
    x_test = [0, 0, 0, 1, 0, 1, 0, 0]
    w0_ =0.44717137049755357
    rank=2 # Rank : The rank of the factorization used for the second order interactions.
    w_ = [ 0.13673361, -0.50175393, -0.43582785,  0.91480033,  0.64150534,
             0.85911802, -0.20877941, -0.20461079]
    V_ =[[-0.30315417, -0.01520948,  0.35000127,  0.54788385, -0.26731813,
             -0.07202204,  0.74163199,  0.25263453],
            [ 0.73052313,  0.93649875,  0.55294677,  1.23317741, -0.88026332,
             -1.321992  , -0.44626548,  0.27878056]]
    
    get_y_prediction=find_prediction(x_test, w0_, w_, V_,rank=rank)
    print get_y_prediction #0.551379337337

get_y_prediction gives 0.551379337337, which corroborates the output of y_pred = fm.predict(X_test). I need the class probabilities, which should corroborate the output of y_pred = fm.fit_predict_proba(X_train, y_train, X_test). How do I get these?

What I tried

  1. Sigmoid function: 1.0 / (1 + math.exp(-1 * float(y_pred))) gives 0.6344555517817589, which is different from 0.666498763558 (what we want). In the paper @ibayer mentions that MCMC classification can be modelled with Probit (MAP), Probit or Sigmoid loss functions, but I couldn't see an option to specify the loss function (I was hoping I could use the model params to get the prob. value back if this was clear).

  2. Modelling as a Probit link function

2a)

    from scipy.stats import norm
    x_beta = float(np.matrix(x_test) * np.matrix(w_).transpose())
    norm.cdf(x_beta)  # gives 0.96196167; Y = Φ(Xβ + ε), cumulative normal CDF

2b)

    def phi(x):
        """Cumulative distribution function of the standard normal distribution."""
        return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0

    x_beta = float(np.matrix(x_test) * np.matrix(w_).transpose())  # see x_test & w_ above
    phi(x_beta)  # gives 0.9619616715011631; Y = Φ(Xβ + ε)
  3. Mapping as a Logit link function, which gives 0.6394991222434503

     import math
     import numpy as np
     x_beta = float(np.matrix(x_test) * np.matrix(w_).transpose())  # see x_test & w_ above
     math.pow(1 + math.pow(x_beta, -1), -1)  # intended: Pr(Y=1|X) = 1 / (1 + exp(-X'β)); as written this computes 1 / (1 + 1/Xβ)
    

Which link function can I use?
Here, fm is initialized as

fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank, random_state=seed, copy_X=True)
fm.fit_predict(X_train, y_train,X_test)

ekta1007 (Author) commented:

@ibayer - Here are the results of the probabilities returned by fm.fit_predict_proba (red), the sigmoid of the back-calculated y_pred_hat (a real number which itself maps to fm.predict(X_test)) (green), and the probability values corrected by the median % difference between red and green (yellow). I'd be glad if you could suggest how to get the probabilities to stack up ~1:1 with those returned by fm.fit_predict_proba.

[screenshots: fm.fit_predict_proba probabilities (red) vs. back-calculated sigmoid probabilities (green) vs. median-corrected values (yellow)]

ekta1007 (Author) commented Jul 27, 2017

@ibayer - Also, from the documentation here: http://ibayer.github.io/fastFM/tutorial.html#bayesian-probit-classification-with-mcmc-solver

"Probit regression uses the Cumulative Distribution Function (CDF) of the standard normal Distribution as link function. Mainly because the CDF leads to an easier Gibbs solver then the sigmoid function used in the SGD classifier implementation. The results are in practice usually very similar."

but the results are actually even further from fm.fit_predict_proba when using a probit link function (both standard normal & normal).

[screenshots: fm.fit_predict_proba probabilities vs. probit-link values (standard normal and normal)]

import math
import numpy as np

def find_probit_normal(x, std, mean):
    # normal density (PDF) with the given mean and std; note this is not the CDF
    deno = std * math.sqrt(2 * math.pi)
    num = math.exp(-((x - mean) ** 2) / (2.0 * std * std))
    return num / deno

std, mean = np.std(y_train), np.mean(y_train)

def find_probit_std_normal(x):
    # standard normal density (PDF); note this is not the CDF
    return math.exp(-(x * x) / 2.0) / math.sqrt(2 * math.pi)

where I initialize x as y_pred (real valued). Recall that y_pred maps to eqn 2 (pg. 4), Rendle 2011 / Bayer eqn 1 (2016).
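
For comparison, a minimal sketch of the probit link as the quoted tutorial describes it: the standard normal CDF Φ (scipy.stats.norm.cdf), applied to the full FM score (w0_ plus the first- and second-order terms, i.e. the find_prediction value above) rather than to Xβ alone; note the find_probit_* functions above evaluate the normal density, not the CDF. Even so, with the mcmc solver fit_predict_proba averages over many parameter draws, so a probability rebuilt from a single stored draw will generally not reproduce it exactly:

from scipy.stats import norm

def probit_probability(x, w0_, w_, V_, rank=2):
    # full FM score: bias + first-order + pairwise terms (see find_prediction above)
    y_hat = find_prediction(x, w0_, w_, V_, rank=rank)
    # probit link: P(y = 1 | x) = Phi(y_hat), the standard normal CDF
    return norm.cdf(y_hat)

# e.g. probit_probability(x_test, w0_, w_, V_, rank=rank) with the "Quick test" values above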

ibayer (Owner) commented Jul 27, 2017

Can you summarize again what you are trying to achieve? I saw you use mcmc somewhere in your code.
Please keep in mind

It’s also possible to just call predict on a trained MCMC model but this returns predictions that are solely based on the last parameters draw. These predictions can be used for diagnostic purposes but are usually not as good as averaged predictions returned by fit_predict.

http://ibayer.github.io/fastFM/tutorial.html#bayesian-probit-classification-with-mcmc-solver
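
To illustrate the quoted point with a small usage sketch (mcmc.FMClassification as already used in this thread; X_train, y_train, X_test assumed to be defined as in the earlier comments): fit_predict_proba returns probabilities averaged over the MCMC parameter draws, whereas predict on the fitted object, and hence anything rebuilt from a single stored (w0_, w_, V_), reflects only the last draw:

from fastFM import mcmc

fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=2, random_state=123)

# probabilities averaged over the MCMC parameter draws (the recommended predictions)
y_proba_avg = fm.fit_predict_proba(X_train, y_train, X_test)

# continuous score based solely on the last parameter draw (diagnostic use only)
y_last_draw = fm.predict(X_test)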

ekta1007 (Author) commented Aug 4, 2017

@ibayer - Simplifying the question.

  1. I have trained the model on some 1200+ files (~1.8 GB), with 50 files in each run.

  2. I store w0_, w_, V_ from each run and compute the prediction using the code above (which is a real-valued number).
    Question: how do I get the probability values (y_hat) back for each X vector?

    I tried using probit, logit and sigmoid as above, but even if these are based on just one draw, I don't get values that are close (I test this by storing the prob. values and then reverse-engineering them with the functions above).

ibayer (Owner) commented Aug 7, 2017

Looks to me like you use the mcmc solver. If that's the case then

I store w0_, w_, V_ from each run

doesn't make sense (I assume that by "run" you mean one call to fit_predict_proba()). In this case I recommend using a different solver.
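
A sketch of what a different solver could look like, assuming the sgd.FMClassification API described in the fastFM tutorial (the regularisation and step-size values below are placeholders, and the 0/1 labels from this thread are assumed); with a point-estimate solver, storing w0_, w_, V_ and scoring offline is meaningful, and predict_proba applies the sigmoid link at prediction time:

from fastFM import sgd
import scipy.sparse as sp
import numpy as np

# fastFM classifiers expect labels in {-1, +1}; the thread uses 0/1 labels
y_train_pm1 = 2 * np.asarray(y_train) - 1

fm = sgd.FMClassification(n_iter=1000, init_stdev=0.1, rank=2,
                          l2_reg_w=0.1, l2_reg_V=0.1, step_size=0.01)
fm.fit(sp.csc_matrix(X_train), y_train_pm1)

# point-estimate parameters that can be stored and reused offline
w0_, w_, V_ = fm.w0_, fm.w_, fm.V_

# sigmoid-link probabilities at prediction time
y_proba = fm.predict_proba(sp.csc_matrix(X_test))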
