Expressing the final weights learnt as a parametric equation #106

Open · ekta1007 opened this issue Jul 14, 2017 · 8 comments

ekta1007 commented Jul 14, 2017

@ibayer After learning the model weights (with X_test.columns as the parameter names), how do I express the model as a functional relationship, so it can be productionized?
i.e. f(x1 * -1.29966466e-03 + x2 * 1.78455648e+01 + ... )

fm.w_ = [ -1.29966466e-03  1.78455648e+01  -2.05648306e-01  -2.40578327e+00  4.44556106e+00  9.42411346e-02  1.82644589e+00  2.35087155e+00  -4.14614164e-01  1.52788247e+00  6.72193895e-01  -1.51634745e-01  1.96703805e+00  -7.19508942e-01  -3.00903099e-01  8.13209301e-01]
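
For reference, a minimal sketch (not fastFM's own prediction code) of the second-order FM score that such a functional relationship boils down to, written against the fitted attributes fm.w0_, fm.w_ and fm.V_, with V_ treated as a (rank, n_features) array as in the code later in this thread; it uses the O(rank · n) reformulation of the pairwise sum:

import numpy as np

def fm_score(x, w0, w, V):
    """Second-order FM score:
    y_hat(x) = w0 + <w, x> + sum_{i<j} <V[:, i], V[:, j]> * x_i * x_j,
    evaluated with the O(rank * n) identity instead of the explicit double loop."""
    x = np.asarray(x, dtype=float)
    V = np.asarray(V, dtype=float)   # shape (rank, n_features)
    linear = w0 + np.dot(w, x)
    vx = V.dot(x)                    # one value per factor row
    pairwise = 0.5 * (np.sum(vx ** 2) - np.sum(np.dot(V ** 2, x ** 2)))
    return linear + pairwise

# e.g. fm_score(x, fm.w0_, fm.w_, fm.V_) for one feature row x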

ibayer (Owner) commented Jul 14, 2017

ekta1007 (Author) commented Jul 14, 2017

@ibayer - thanks for your prompt response. Looking at eqn 1: my X is encoded as {0, 1} values along with some continuous variables (see the sample below). Which of the calls below returns the y of eqn 1? I have fm.w0_, fm.w_ & fm.V_.

fm.fit_predict(X_train, y_train, X_test)        # class labels, 1/0

fm.fit_predict_proba(X_train, y_train, X_test)  # class probabilities

fm.predict(X_test)                              # a continuous number, e.g. 15.35047575

Additional note:
# my fm is initialised as
fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank, random_state=seed, copy_X=True)
# sample X_train[i] = [969.0, 1.0, 24.3618275, 1.0, 1.0, 0.4, 0.161803, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0]

ekta1007 (Author) commented Jul 26, 2017

@ibayer @takuti @macks22 @chezou @bdaskalov - I got y_pred from fm.predict(X_test), which I tested 1:1 against Bayer eqn 1 / Rendle eqn 2; it is a continuous number (e.g. [1.0584077976, 0.00105908767392]). How do I get the class labels & predicted probabilities back?

My use case is storing the model params (w0_, w_, V_) over n iterations of different test-data sets (my test data is ~1.8 TB) and then computing the final y_pred (a probability) at run time, so I can't call fm.fit_predict_proba(X_train, y_train, X_test) at runtime. I should therefore use an equivalent of the find_prediction function below, for computational efficiency at runtime.

Pasting some quick code, in case someone has a similar question.

import numpy as np

def get_first_order_weights(x_test, w_):
    """First-order term <w, x>."""
    return float(np.dot(np.asarray(x_test), np.asarray(w_)))

def get_weight(V_, i, j, rank=2):
    """Returns the interaction weight <v_i, v_j> of features i, j.
    See also equation 2, Steffen Rendle (2011, SIGIR)."""
    weight = 0.0
    for rank_ in range(rank):
        weight += V_[rank_][i] * V_[rank_][j]
    return weight

def get_second_order_weights(x_test, V_, rank=2):
    """Pairwise term: sum over i < j of <v_i, v_j> * x_i * x_j."""
    second_order_w = 0.0
    for i in range(len(x_test)):
        for j in range(i + 1, len(x_test)):
            if x_test[i] != 0 and x_test[j] != 0:
                weight = get_weight(V_, i=i, j=j, rank=rank)
                second_order_w += weight * x_test[i] * x_test[j]
    return second_order_w

def find_prediction(x_test, w0_, w_, V_, rank=2):
    """FM score y_hat = w0 + <w, x> + pairwise interactions."""
    return w0_ + get_first_order_weights(x_test, w_) + get_second_order_weights(x_test, V_, rank=rank)


"""Quick test """
    y_pred_proba=0.666498763558 # what we want. This is the output of fm.fit_predict_proba(X_train, y_train,X_test) for THIS x_test
    y_pred=0.551379351199 # what find_prediction(x_test, w0_, w_, V_) gives
    hyper_param_=[ 0.72957556,  1.09654054,  4.28621216,  0.80880482, -0.26278131,
            0.17551987, -0.17272419]
    x_test = [0, 0, 0, 1, 0, 1, 0, 0]
    w0_ =0.44717137049755357
    rank=2 # Rank : The rank of the factorization used for the second order interactions.
    w_ = [ 0.13673361, -0.50175393, -0.43582785,  0.91480033,  0.64150534,
             0.85911802, -0.20877941, -0.20461079]
    V_ =[[-0.30315417, -0.01520948,  0.35000127,  0.54788385, -0.26731813,
             -0.07202204,  0.74163199,  0.25263453],
            [ 0.73052313,  0.93649875,  0.55294677,  1.23317741, -0.88026332,
             -1.321992  , -0.44626548,  0.27878056]]
    
    get_y_prediction=find_prediction(x_test, w0_, w_, V_,rank=rank)
    print get_y_prediction #0.551379337337

get_y_prediction gives 0.551379337337, which corroborates the output of y_pred = fm.predict(X_test). I need the class probabilities, which should corroborate the output of y_pred = fm.fit_predict_proba(X_train, y_train, X_test). How do I get these?

What I tried

  1. Sigmoid function: 1.0 / (1 + math.exp(-1 * float(y_pred))) gives 0.6344555517817589, which is different from 0.666498763558 (what we want). In the paper @ibayer mentions that MCMC classification can be modelled with Probit (MAP), Probit or Sigmoid loss functions, but I couldn't see an option to specify the loss function (I was hoping I could use the model params to get the prob. value back if this was clear).

  2. Modelling as a Probit link function

2a)

    from scipy.stats import norm
    x_beta = float(np.matrix(x_test) * np.matrix(w_).transpose())
    norm.cdf(x_beta)  # gives 0.96196167; Y = Φ(Xβ + ε), cumulative normal CDF

2b)

    def phi(x):
        """Cumulative distribution function of the standard normal distribution."""
        return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0

    x_beta = float(np.matrix(x_test) * np.matrix(w_).transpose())  # see x_test & w_ above
    phi(x_beta)  # gives 0.9619616715011631; Y = Φ(Xβ + ε)
  3. Mapping as a Logit link function, which gives 0.6394991222434503

     import math
     import numpy as np
     x_beta = float(np.matrix(x_test) * np.matrix(w_).transpose())  # see x_test & w_ above
     math.pow(1 + math.pow(x_beta, -1), -1)  # intended: Pr(Y=1|X) = 1 / (1 + exp(-X'β)); as written this computes 1 / (1 + 1/Xβ)
    

Which link function can I use?
Here, fm is initialized as

fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=rank, random_state=seed, copy_X=True)
fm.fit_predict(X_train, y_train,X_test)

ekta1007 (Author) commented:

@ibayer - Here are the results of the probabilities returned by fm.fit_predict_proba (red), the sigmoid of the back-calculated y_pred_hat (a real number which itself maps to fm.predict(X_test)) (green), and the probability values corrected by the median % difference between red and green (yellow). I'd be glad if you could suggest how to get the probabilities to stack up ~1:1 with those returned by fm.fit_predict_proba.

[screenshots: fm.fit_predict_proba probabilities (red) vs. back-calculated sigmoid probabilities (green) vs. median-corrected values (yellow)]

ekta1007 (Author) commented Jul 27, 2017

@ibayer - Also, from the documentation here: http://ibayer.github.io/fastFM/tutorial.html#bayesian-probit-classification-with-mcmc-solver

"Probit regression uses the Cumulative Distribution Function (CDF) of the standard normal Distribution as link function. Mainly because the CDF leads to an easier Gibbs solver then the sigmoid function used in the SGD classifier implementation. The results are in practice usually very similar."

but the results are actually even further from fm.fit_predict_proba when using a probit link function (both standard normal & normal).

[screenshots: fm.fit_predict_proba probabilities vs. probit-link values (standard normal and normal)]

import math
import numpy as np

def find_probit_normal(x, std, mean):
    # normal density (PDF) with the given mean and std; note this is not the CDF
    deno = std * math.sqrt(2 * math.pi)
    num = math.exp(-((x - mean) ** 2) / (2.0 * std * std))
    return num / deno

std, mean = np.std(y_train), np.mean(y_train)

def find_probit_std_normal(x):
    # standard normal density (PDF); note this is not the CDF
    return math.exp(-(x * x) / 2.0) / math.sqrt(2 * math.pi)

where I initialize x as y_pred (real valued). Recall that y_pred maps to eqn 2 (pg. 4), Rendle 2011 / Bayer eqn 1 (2016).
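
For comparison, a minimal sketch of the probit link as the quoted tutorial describes it: the standard normal CDF Φ (scipy.stats.norm.cdf), applied to the full FM score (w0_ plus the first- and second-order terms, i.e. the find_prediction value above) rather than to Xβ alone; note the find_probit_* functions above evaluate the normal density, not the CDF. Even so, with the mcmc solver fit_predict_proba averages over many parameter draws, so a probability rebuilt from a single stored draw will generally not reproduce it exactly:

from scipy.stats import norm

def probit_probability(x, w0_, w_, V_, rank=2):
    # full FM score: bias + first-order + pairwise terms (see find_prediction above)
    y_hat = find_prediction(x, w0_, w_, V_, rank=rank)
    # probit link: P(y = 1 | x) = Phi(y_hat), the standard normal CDF
    return norm.cdf(y_hat)

# e.g. probit_probability(x_test, w0_, w_, V_, rank=rank) with the "Quick test" values above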

ibayer (Owner) commented Jul 27, 2017

Can you summarize again what you are trying to achieve? I saw you use mcmc somewhere in your code.
Please keep in mind

It’s also possible to just call predict on a trained MCMC model but this returns predictions that are solely based on the last parameters draw. These predictions can be used for diagnostic purposes but are usually not as good as averaged predictions returned by fit_predict.

http://ibayer.github.io/fastFM/tutorial.html#bayesian-probit-classification-with-mcmc-solver
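
To illustrate the quoted point with a small usage sketch (mcmc.FMClassification as already used in this thread; X_train, y_train, X_test assumed to be defined as in the earlier comments): fit_predict_proba returns probabilities averaged over the MCMC parameter draws, whereas predict on the fitted object, and hence anything rebuilt from a single stored (w0_, w_, V_), reflects only the last draw:

from fastFM import mcmc

fm = mcmc.FMClassification(n_iter=100, init_stdev=0.1, rank=2, random_state=123)

# probabilities averaged over the MCMC parameter draws (the recommended predictions)
y_proba_avg = fm.fit_predict_proba(X_train, y_train, X_test)

# continuous score based solely on the last parameter draw (diagnostic use only)
y_last_draw = fm.predict(X_test)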

ekta1007 (Author) commented Aug 4, 2017

@ibayer - Simplifying the question.

  1. I have trained the model on some 1200+ files (~1.8 GB), with 50 files in each run.

  2. I store w0_, w_, V_ from each run and compute the prediction using the code above (which is a real-valued number).
    Question: how do I get the probability values (y_hat) back for each X vector?

    I tried using probit, logit and sigmoid as above, but even if these are based on just one draw, I don't get values that are close (I test this by storing the prob. values and then reverse-engineering them with the functions above).

ibayer (Owner) commented Aug 7, 2017

Looks to me like you use the mcmc solver. If that's the case then

I store w0_, w_, V_ from each run

doesn't make sense (I assume that by "run" you mean one call to fit_predict_proba()). In this case I recommend using a different solver.
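
A sketch of what a different solver could look like, assuming the sgd.FMClassification API described in the fastFM tutorial (the regularisation and step-size values below are placeholders, and the 0/1 labels from this thread are assumed); with a point-estimate solver, storing w0_, w_, V_ and scoring offline is meaningful, and predict_proba applies the sigmoid link at prediction time:

from fastFM import sgd
import scipy.sparse as sp
import numpy as np

# fastFM classifiers expect labels in {-1, +1}; the thread uses 0/1 labels
y_train_pm1 = 2 * np.asarray(y_train) - 1

fm = sgd.FMClassification(n_iter=1000, init_stdev=0.1, rank=2,
                          l2_reg_w=0.1, l2_reg_V=0.1, step_size=0.01)
fm.fit(sp.csc_matrix(X_train), y_train_pm1)

# point-estimate parameters that can be stored and reused offline
w0_, w_, V_ = fm.w0_, fm.w_, fm.V_

# sigmoid-link probabilities at prediction time
y_proba = fm.predict_proba(sp.csc_matrix(X_test))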
