Model types

Independent of whether you choose an Encoding- or RSA-style approach to building your model, the PCM toolbox distinguishes between a number of different model types, each of which has its own model class.

Fixed models

In fixed models, the second moment matrix \(\mathbf{G}\) is exactly predicted by the model. The most common example is the Null model \(\mathbf{G} = \mathbf{0}\). This is equivalent to assuming that there is no difference between any of the activity patterns. The Null model is useful if we want to test whether there are any differences between experimental conditions. An alternative Null model would be \(\mathbf{G} = \mathbf{I}\), i.e. to assume that all patterns are uncorrelated and equally far away from each other.

Fixed models also occur when the representational structure can be predicted from some independent data. One example, shown below, is to predict the structure of finger representations directly from the correlational structure of finger movements in everyday life (Ejaz et al., 2015). Importantly, fixed models only predict the second moment matrix up to a proportional constant. The width of the distribution will vary with the overall scale or signal-to-noise level. Thus, when evaluating fixed models we usually allow the predicted second moment matrix to be scaled by an arbitrary positive constant (see Model Fitting).

Example

An empirical example for a fixed representational model comes from Ejaz et al. (2015). Here, the representational structure of 5 finger movements was compared to the representational structure predicted by the way the muscles are activated during finger movements (Muscle model), or by the covariance structure of natural movements of the 5 fingers. That is, the predicted second moment matrix is derived from data completely independent of our imaging data.

Models are a specific class, inherited from the class Model. To define a fixed model, we simply need to load the predicted second moment matrix and define a model structure as follows (see Application examples):

import numpy as np
import PcmPy as pcm   # the Python PCM toolbox

# modelM holds the two predicted second moment matrices (muscle, natural stats)
M1 = pcm.model.FixedModel('null',np.eye(5))    # Makes a Null model
M2 = pcm.model.FixedModel('muscle',modelM[0])  # Makes the muscle model
M3 = pcm.model.FixedModel('natural',modelM[1]) # Makes the natural stats model
M = [M1, M2, M3] # Join the models in a list for fitting

When evaluating the likelihood of a data set under the prediction, the PCM toolbox still needs to estimate the scaling factor and the noise variance, so even in the case of fixed models, an iterative maximization of the likelihood is required (see below).

Component models

A more flexible model is to express the second moment matrix as a linear combination of different components. For example, the representational structure of activity patterns in the human object recognition system in inferior temporal cortex can be compared to the response of a convolutional neural network that is shown the same stimuli (Khaligh-Razavi & Kriegeskorte, 2014). Each layer of the network predicts a specific structure of the second moment matrix and therefore constitutes a fixed model. However, the representational structure may be best described by a mixture of multiple layers. In this case, the overall predicted second moment matrix is a linear sum of the weighted component matrices:

\[\mathbf{G}= \sum_{h}{\exp(\theta_{h})\mathbf{G}_{h}}\]

The weights for each component need to be positive - allowing negative weights would not guarantee that the overall second moment matrix is positive definite. Therefore, we use the exponential of the weighting parameter here, such that we can use unconstrained optimization to estimate the parameters.

For fast optimization of the likelihood, we require the derivative of the second moment matrix with respect to each of the parameters. This derivative can then be used to calculate the derivative of the log-likelihood with respect to the parameters (see section 4.3. Derivative of the log-likelihood). In the case of linear component models, this is easy to obtain:

\[\frac{\partial \mathbf{G}}{\partial {\theta }_{h}}=\exp(\theta_{h}) {\bf{G}}_{h}\]
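As a minimal illustration (not the toolbox implementation), the predicted second moment matrix and its derivatives for a component model can be computed in a few lines of numpy. Here Gc is assumed to be a (HxKxK) array of component matrices and theta a vector of H parameters:

import numpy as np

def component_G(theta, Gc):
    w = np.exp(theta)                  # positive component weights
    G = np.einsum('h,hij->ij', w, Gc)  # G = sum_h exp(theta_h) * G_h
    dG_dTheta = w[:, None, None] * Gc  # dG/dtheta_h = exp(theta_h) * G_h
    return G, dG_dTheta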

Example

In the Finger demo example, we have two fixed models, the muscle model and the natural statistics model. One question that arises in the paper is whether the observed structure is better fit by a linear combination of the natural statistics and the muscle activity structure. So we can define a third model, which allows an arbitrary mixture of the two types:

MC = pcm.ComponentModel('muscle+nat',[modelM[0],modelM[1]])

Feature models

A representational model can also be formulated in terms of the features that are thought to be encoded in the voxels. Features are hypothetical tuning functions, i.e. models of what the activation profiles of single neurons could look like. Examples of features would be Gabor elements for lower-level vision models, elements with cosine tuning functions for different movement directions for models of motor areas, and semantic features for association areas. The actual activity profile of each voxel is a weighted combination of the features, \(\mathbf{u}_p = \mathbf{M} \mathbf{w}_p\). The predicted second moment matrix of the activity profiles is then \(\mathbf{G} = \mathbf{MM}^{T}\), assuming that all features are equally strongly and independently encoded, i.e. \(E \left(\mathbf{w}_p\mathbf{w}_p^{T} \right)=\mathbf{I}\). A feature model can now be flexibly parametrized by expressing the feature matrix as a weighted sum of linear components:

\[\mathbf{M}= \sum_{h} \theta_h \mathbf{M}_{h}\]

Each parameter \(\theta_h\) determines how strong the corresponding set of features is represented across the population of voxels. Note that this parameter is different from the actual feature weights \(\mathbf{W}\). Under this model, the second moment matrix becomes

\[\mathbf{G}=\mathbf{M}\mathbf{M}^{T}=\sum_{h}\theta_{h}^{2}\mathbf{M}_{h}\mathbf{M}_{h}^{T}+\sum_{i}\sum_{j \neq i}\theta_{i}\theta_{j}\mathbf{M}_{i}\mathbf{M}_{j}^{T}.\]

From the last expression we can see that, if features that belong to different components are independent of each other, i.e. \(\mathbf{M}_{i} \mathbf{M}_{j}^{T} = \mathbf{0}\), then a feature model is equivalent to a component model with \(\mathbf{G}_h = \mathbf{M}_{h}\mathbf{M}_{h}^{T}\). The only technical difference is that we use the square of the parameter \(\theta_h\), rather than its exponential, to enforce non-negativity. Thus, component models assume that the different features underlying each component are encoded independently in the population of voxels - i.e. knowing something about the tuning to a feature of component A does not tell you anything about the tuning to a feature of component B. If this cannot be assumed, then the representational model is better formulated as a feature model.

By the product rule for matrix derivatives, we have

\[\frac{\partial \mathbf{G}}{\partial \theta_h} = \mathbf{M}_{h}\mathbf{M}(\boldsymbol{\theta})^{T} + \mathbf{M}(\boldsymbol{\theta})\mathbf{M}_{h}^{T}\]
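The same logic can be sketched in numpy for a feature model (again purely illustrative). Here Mc is assumed to be a (HxKxQ) array of feature component matrices and theta a vector of H parameters:

import numpy as np

def feature_G(theta, Mc):
    M = np.einsum('h,hkq->kq', theta, Mc)  # M(theta) = sum_h theta_h * M_h
    G = M @ M.T                            # predicted second moment matrix
    # dG/dtheta_h = M_h M(theta)^T + M(theta) M_h^T  (product rule)
    dG_dTheta = np.einsum('hkq,jq->hkj', Mc, M) + np.einsum('kq,hjq->hkj', M, Mc)
    return G, dG_dTheta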

Correlation model

The correlation model class is designed to model the correlation between specific sets of activity patterns. This problem often occurs in neuroimaging studies: for example, we may have 5 actions that are measured under two conditions (e.g. observation and execution), and we want to know to what degree the activity patterns of observing an action are related to the patterns measured when executing the same action.

Fixed correlation models: We can use a series of models, each of which tests the likelihood of the data under a fixed correlation between -1 and 1. This approach allows us to determine how much evidence we have for one specific correlation over another. Even though the correlation is fixed for these models, the variance structure within each of the conditions is flexibly estimated. This is done using a component model within each condition.

\[\begin{split}\mathbf{G}^{(1)} = \sum_{h}{\exp(\theta^{(1)}_{h})\mathbf{G}_{h}}\\ \mathbf{G}^{(2)} = \sum_{h}{\exp(\theta^{(2)}_{h})\mathbf{G}_{h}}\\\end{split}\]

Usually \(\mathbf{G}_{h}\) is the identity matrix (all items are equally strongly represented), or a matrix that allows individual scaling of the variances for each item. Of course, you can also model any between-item covariance. The overall model is nonlinear, as the two components interact in the part of the \(\mathbf{G}\) matrix that encodes the covariance between the patterns of the two conditions (\(\mathbf{C}\)). Given a constant correlation \(r\), the overall second moment matrix is calculated as:

\[\begin{split}\mathbf{G}= \begin{bmatrix} \mathbf{G}^{(1)} & r\mathbf{C} \\ r\mathbf{C}^T & \mathbf{G}^{(2)} \end{bmatrix}\\ \mathbf{C}_{i,j} = \sqrt{\mathbf{G}^{(1)}_{i,j}\mathbf{G}^{(2)}_{i,j}}\end{split}\]
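As a small illustrative sketch (the toolbox assembles this inside the model's predict function), the block matrix for a given fixed correlation r can be built as follows, where G1 and G2 are the within-condition second moment matrices:

import numpy as np

def fixed_corr_G(G1, G2, r):
    C = np.sqrt(G1 * G2)   # C_ij = sqrt(G1_ij * G2_ij), element-wise
    return np.block([[G1, r * C],
                     [r * C.T, G2]])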

If the parameter within_cov is set to True, the model also adds within-condition covariances, which are not part of the between-condition covariance \(\mathbf{C}\). So the correlation that we are modelling is the correlation between the pattern components related to the individual items, after the pattern component related to the overall condition (i.e. observe vs. execute) has been removed.

The derivative of that part of the matrix (\(r\mathbf{C}\)) with respect to the parameters \(\theta^{(1)}_{h}\) then becomes

\[\frac{\partial \left(r\mathbf{C}_{i,j}\right)}{\partial \theta^{(1)}_h} = \frac{r}{2 \mathbf{C}_{i,j}} \mathbf{G}^{(2)}_{i,j} \frac{\partial \mathbf{G}^{(1)}_{i,j}}{\partial \theta^{(1)}_h}\]

These derivatives are automatically calculated in the predict function. From the log-likelihoods for each model, we can then obtain an approximation for the posterior distribution.

Flexible correlation model: We can also use a flexible correlation model, which has an additional model parameter for the correlation. To avoid bounds on the correlation, this parameter is the Fisher-z transformation of the correlation, and can take any value in \((-\infty,\infty)\).

\[\begin{split}\theta=\frac{1}{2}\log\left(\frac{1+r}{1-r}\right)\\ r=\frac{\exp(2\theta)-1}{\exp(2\theta)+1}\\\end{split}\]

The derivative of \(r\) with respect to \(\theta\) can be derived using the product rule:

\[\frac{\partial r}{\partial \theta} = \frac{2 \exp(2 \theta)}{\exp(2\theta)+1} - \frac{\left(\exp(2\theta)-1\right)\left(2 \exp(2 \theta)\right)}{\left(\exp(2\theta)+1\right)^2} = \frac{4 \exp(2 \theta)}{\left(\exp(2\theta)+1\right)^2}\]
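In code, the mapping from the unbounded parameter to the correlation, together with the derivative used during optimization, amounts to the following illustrative sketch (this mapping is simply r = tanh(theta)):

import numpy as np

def theta_to_r(theta):
    e = np.exp(2 * theta)
    r = (e - 1) / (e + 1)              # r = tanh(theta)
    dr_dtheta = 4 * e / (e + 1) ** 2   # derivative derived above
    return r, dr_dtheta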

Example

For a full example, please see the Correlation models.

Free models

The most flexible representational model is the free model, in which the predicted second moment matrix is unconstrained. Thus, when we estimate this model, we simply derive the maximum-likelihood estimate of the second moment matrix. This model is mainly useful if we want to obtain an estimate of the maximum likelihood that could be achieved with a fully flexible model, i.e. the noise ceiling (Nili et al., 2014).

In estimating an unconstrained \(\mathbf{G}\), it is important to ensure that the estimate will still be a positive definite matrix. For this purpose, we express the second moment matrix as \(\mathbf{G} = \mathbf{AA}^{T}\), where \(\mathbf{A}\) is an upper-triangular matrix (Diedrichsen et al., 2011; Cai et al., 2016). The parameters are then simply all the upper-triangular entries of \(\mathbf{A}\).
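For illustration, this parametrization can be sketched in a few lines of numpy: the parameter vector fills the upper-triangular entries of \(\mathbf{A}\), and \(\mathbf{G}\) is reconstructed as \(\mathbf{AA}^{T}\):

import numpy as np

K = 5                                      # number of conditions (example value)
theta = np.random.randn(K * (K + 1) // 2)  # one parameter per upper-triangular entry
A = np.zeros((K, K))
A[np.triu_indices(K)] = theta              # fill the upper triangle (incl. diagonal)
G = A @ A.T                                # positive semi-definite by construction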

Example

To set up a free model, simply create a new model of type FreeModel:

M5 = pcm.model.FreeModel('ceil',n_cond)   # n_cond: number of experimental conditions

If the number of conditions is very large, the crossvalidated estimation of the noise-ceiling model can become rather slow. For a quick and approximate noise ceiling, you can also use an unbiased estimate of the second moment matrix from pcm.util.est_G_crossval to determine the parameters - essentially the starting values of the complete model. This will lead to slightly lower noise ceilings as compared to the full optimization, but to large improvements in speed.
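A hedged sketch of this shortcut is shown below; the argument names Y, Z and part_vec (data, condition design matrix, partition vector) and the return values of est_G_crossval are assumptions here, so check them against the pcm.util documentation:

# Quick, approximate noise ceiling: set the free model's parameters from a
# crossvalidated estimate of G instead of running the full optimization.
G_hat, _ = pcm.util.est_G_crossval(Y, Z, part_vec)   # assumed call signature
M5 = pcm.model.FreeModel('ceil', n_cond)
M5.set_theta0(G_hat)   # use the crossvalidated estimate as starting values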

Custom model

In some cases, a hypothesis cannot be expressed by a model of the types mentioned above. Therefore, the PCM toolbox allows the user to define their own custom model. In general, the predicted second moment matrix is a non-linear (matrix-valued) function of some parameters, \(\mathbf{G}=F\left(\theta\right)\). One example is a representational model in which the width of the tuning curve (or the width of the population receptive field) is a free parameter. Such parameters would influence the features, and hence also the second moment matrix, in a non-linear way. Computationally, such non-linear models are not much more difficult to estimate than component or feature models, assuming that one can analytically derive the matrix derivatives \(\partial \mathbf{G} / \partial \theta_{h}\).

To define a custom model, the user needs to define a new model class, inherited from the abstract class pcm.model.Model. The main requirement is to define the predict function, which takes the parameters as input and returns both G and the partial derivatives of G with respect to each of these parameters. The derivatives are returned as a (HxKxK) tensor, where H is the number of parameters.

class CustomModel(Model):
    # Constructor of the class
    def __init__(self, name, ...):
        Model.__init__(self, name)
        ...

    # Prediction function
    def predict(self, theta):
        G = ...          # Calculate the second moment matrix
        dG_dTheta = ...  # Calculate the derivatives of the second moment matrix
        return (G, dG_dTheta)

    # Initialization function
    def set_theta0(self, G_hat):
        """
        Sets theta0 based on the crossvalidated second-moment matrix

        Parameters:
            G_hat (numpy.ndarray)
                Crossvalidated estimate of G
        """
        # The function can use G_hat to get good starting values,
        # or just start at fixed values
        self.theta0 = ...

Note that the predict function is repeatedly called by the optimization routine and needs to execute fast. That is, any computation that does not depend on the current value of \(\theta\) should be performed outside the function and stored in the object.
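As a purely illustrative sketch (not part of the toolbox), a custom model with a single scaling parameter could look like the following; the import path and the n_param attribute are assumptions, and anything that does not depend on theta (here the fixed structure G0) is stored once in the constructor so that predict stays fast:

import numpy as np
from PcmPy.model import Model   # assumed import path for the abstract Model class

class ScaledModel(Model):
    """Toy custom model: G = exp(theta_0) * G0, a fixed structure with a free scale."""
    def __init__(self, name, G0):
        Model.__init__(self, name)
        self.G0 = np.asarray(G0)   # precomputed once; does not depend on theta
        self.n_param = 1           # attribute name assumed for illustration
        self.theta0 = np.zeros(1)

    def predict(self, theta):
        G = np.exp(theta[0]) * self.G0
        dG_dTheta = G[np.newaxis, :, :]   # dG/dtheta_0 = exp(theta_0) * G0 = G
        return (G, dG_dTheta)

    def set_theta0(self, G_hat):
        # start the scale roughly at the observed overall variance
        scale = max(np.trace(G_hat) / np.trace(self.G0), 1e-8)
        self.theta0 = np.array([np.log(scale)])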