**************************************************************************
* bvsgs_spgi: Bayesian Variable Selection - Gibbs Sampler, Selection prior
*             Models with groups of variables and interactions
**************************************************************************
			(Matlab version 5 required)

REFERENCE: 
----------
_ Brown, P.J., Vannucci, M. and Fearn, T.,
  Multivariate Bayesian variable selection and prediction
  Journal of the Royal Statistical Society B, 60(3), 1998, pp. 627-641.

_ Delouille, V., Bayesian variable selection with related predictors,
  MSc thesis, University of Kent at Canterbury, UK, 1998.

		************************
		LIST OF MATLAB FUNCTIONS
		************************

bvsgs_spgi.m	--	Main program
order12.m       --      called by bvsgs_spgi
shorten.m       --      called by bvsgs_spgi
comb.m          --      called by shorten
gibbs_sp.m	--	Gibbs sampler
itergs_sp.m     --	called by gibbs_sp
bernoulli.m	--	called by itergs_sp
gofg_sp.m	--	log relative probability function
Complete.m      --      called by gofg_sp.m
prior.m         --      log-prior probability function - called by gofg_sp
priorstr.m      --      called by prior.m
priorweak.m     --      called by prior.m
priorgam.m      --      called by priorstr.m
priorstrD.m     --      called by priorstr.m
priorweme.m     --      called by priorweak.m
priorweD.m      --      called by priorweak.m
gamprod.m       --      called by priorgam, priorweme
gamsum.m        --      called by priorweme
gamsumD.m       --      called by priorweD
repliche.m   	--	search for replicates
probord.m	--	posterior and marginal probs + ordering
pbvs_spgi.m	--	prediction



		*****
		USAGE
		*****

[Gamma, GammaD, logProb, logProbD, PostGamD, MargGam, SWITCH]= ...
bvsgs_spgi(gamprec, X,Y, delta,k, w, v1,pm,nDum)

Inputs:		
-------
gamprec,	starting binary vector.
		If gamprec=[] the program asks for r 
		to select a starting vector with first r
		elements equal to 1
X,		independent variables, n by p
Y,            	response variables, n by q
delta, k,     	hyperparameters of the Inverse Wishart prior
w,		hyperparameters of the Bernoulli prior.
                For the STRONG HEREDITY principle, i.e. if
                a 2-way interaction effect X_jX_k (denoted X_jk) is
                allowed to be selected (or present) only when both
                X_j and X_k are selected, w is decomposed as
		w = [wm p11 eta wmD p11D], where
		wm = Bernoulli prior hyperparameter for independent main effects;
                p11 = prob(X_jk present|X_j and X_k present), where X_j and X_k
		are independent variables (with j=k, we have a quadratic term);
                eta = prob(X_jk present|X_j absent and/or X_k absent), for all
		variables (independent and groups of variables); eta is usually
		taken to be small (e.g. eta=0.00001);
                wmD = prob(X_j present), where X_j is a group of variables (often
		dummy variables);
                p11D = prob(X_jk present|X_j and X_k present), where X_j or
		X_k is a group of variables.

                For the WEAK HEREDITY principle, i.e. if
                a 2-way interaction effect X_iX_k (denoted X_ik) is
                allowed to be selected (or present) when one or both of
                X_i and X_k are selected, w is decomposed as
		w = [wm p11 p10 p00 wmD p11D p10D], with
                wm = Bernoulli prior hyperparameter for independent main
                effects;
                p11 = prob(X_ik present|X_i and X_k present), where X_i and X_k
                are independent variables (with i=k, we have a quadratic term);
                p10 = prob(X_ik present|X_i absent,X_k present)
		    = prob(X_ik present|X_i present,X_k absent), where X_i and X_k
		are both independent effects;
                p00 = prob(X_ik present|X_i and X_k absent) for both
		independent effects and groups of variables; p00 is usually taken
		to be small (e.g. p00=0.00001);
                wmD = prob(X_i present), where X_i is a group of variables (often
                dummy variables);
                p11D = prob(X_ik present|X_i and X_k present), where X_i or X_k is
		a group of variables;
                p10D = prob(X_ik present|X_i absent,X_k present)
                     = prob(X_ik present|X_i present,X_k absent), where X_i or X_k
		is a group of variables.
v1,            	hyperparameter normal selection prior
                (column vector (p by 1) of standard deviations)
		(e.g. v1=c*sqrt(diag(pinv(X'*X))), with c a 'well chosen'
                constant, for example c=1, 0.1 or 10)
pm,             number of independent main effects (i.e. not the variables that are
		in a group)
nDum,           vector whose components give the actual number of      
                variables in each group.
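
As an illustration only, the hyperparameter vectors described above might
be assembled as follows. The simulated X and every numeric value (0.2, 0.5,
0.1, c=1, delta=3, k=0.01) are placeholders chosen for this sketch, not
values recommended by the authors. The call itself is left commented out
because the program prompts for the Gibbs parameters interactively.

```matlab
% Sketch: assemble w and v1 for bvsgs_spgi (all numeric values are placeholders).
n = 30;  p = 6;
X = randn(n, p);                         % simulated predictors, for illustration
X = X - ones(n,1)*mean(X);               % data must be centered

% Strong heredity: w = [wm p11 eta wmD p11D]
wm = 0.2;  p11 = 0.5;  eta = 0.00001;  wmD = 0.2;  p11D = 0.5;
w_strong = [wm p11 eta wmD p11D];

% Weak heredity: w = [wm p11 p10 p00 wmD p11D p10D]
p10 = 0.1;  p00 = 0.00001;  p10D = 0.1;
w_weak = [wm p11 p10 p00 wmD p11D p10D];

% Normal selection prior: p-by-1 column vector of standard deviations
c  = 1;
v1 = c*sqrt(diag(pinv(X'*X)));

% delta = 3;  k = 0.01;                  % placeholder Inverse Wishart values
% [Gamma, GammaD, logProb, logProbD, PostGamD, MargGam, SWITCH] = ...
%     bvsgs_spgi([], X, Y, delta, k, w_strong, v1, pm, nDum);
```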


Outputs:
--------
Gamma,        	all visited vectors (in sparse form)
		(Gamma(1,:) contains the starting vector)

GammaD,		distinct visited vectors, ordered according 
		to their (normalized) relative post prob (matrix in sparse form)

logProb,	log-relative post probs of all visited vectors

logProbD,       log-relative posterior probabilities of distinct visited vectors

PostGamD,	normalized ordered relative probs of distinct visited vectors

MargGam, 	marginal probs of components

SWITCH,		number of component switches (out of p) from iteration to iteration
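
Because Gamma and GammaD are returned in sparse form, a short sketch of how
they might be inspected may help. The matrix below is toy data typed in by
hand, not output of the program, and the variable names start, nIncl and
Gfull are hypothetical.

```matlab
% Sketch: inspecting a sparse matrix of visited vectors (toy data).
Gamma = sparse([1 1 0 0; 1 0 1 0; 1 0 1 0]);   % 3 visited vectors, p = 4
Gfull = full(Gamma);                 % ordinary (full) matrix of zeros and ones
start = find(Gamma(1,:));            % variables included in the starting vector
nIncl = full(sum(Gamma, 2));         % number of included variables per vector
```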


Functions called by bvsgs_spgi:
-------------------------------
order12, shorten, gofg_sp, gibbs_sp, repliche, probord


Notes:
------
_ Data must be centered
_ The independent effects (main effects and interactions) must be
  provided first.
  After the independent effects, the order is as follows: if, for
  example, nDum=[3 2], the order is :
  D_11, D_12,D_13,D_21,D_22 (main effects of grouped variables),
  X_1D_11,X_1D_12,X_1D_13 (first independent effect multiplied by the
  variables of the first group),...,
  X_pmD_11,X_pmD_12,X_pmD_13 ( last independent effect multiplied by the 
  variables of the first group), X_1D_21,X_1D_22(first independent effect
  multiplied by the variables of the second group),...,X_pmD_21,X_pmD_22
  (last independent effect multiplied by the variables of the second
  group). The programs `processing.m' and `process_gi.m' generate the
  interaction terms in the correct order (see below).
_ The program asks for the Gibbs parameters (initial number of variables included,
  number of iterations).
_ QR matrices are updated every m iterations (m provided by the user).
_ Programs use sparse matrices. To convert to the full form, use
  the Matlab function full.m
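
The column order required for the grouped variables can be spelled out with
a small sketch that only builds the labels. It uses nDum=[3 2] as in the
note above; pm=2 is an arbitrary toy value and the cell array `labels' is a
hypothetical name introduced here.

```matlab
% Sketch: labels of the grouped columns, in the order the program expects.
pm = 2;  nDum = [3 2];               % toy values
labels = {};
% main effects of the grouped variables: D_11, D_12, D_13, D_21, D_22
for g = 1:length(nDum)
  for j = 1:nDum(g)
    labels{end+1} = sprintf('D_%d%d', g, j);
  end
end
% interactions: group by group, each independent main effect times the
% variables of that group
for g = 1:length(nDum)
  for i = 1:pm
    for j = 1:nDum(g)
      labels{end+1} = sprintf('X_%dD_%d%d', i, g, j);
    end
  end
end
nGrouped = length(labels);           % equals sum(nDum)*(1+pm)
```

These grouped columns follow the independent effects in X.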



			**********
			PREDICTION
			**********

[BayesPred,LSPred,ILS,IB]=pbvs_spgi(X,Y,Xf,Yf,PostGamD,GammaD,numero,v1,inde,nDum,pm)

Inputs:
-------
 X,            independent variables - calibration data
 Y,            response variables - calibration data
 Xf,           independent variables - future data
 Yf,           response variables - future data
 PostGamD,     normalized ordered relative probabilities
               of distinct visited vectors
 GammaD,       distinct visited vectors, ordered according to
               their (normalized) relative post. prob. (PostGamD)
 numero,       number of most likely models for Bayes prediction
 v1,           hyperparameter - normal selection prior
               (column vector (p by 1) of standard deviations)
 inde,         number of independent effects (main effects +
               second-order terms), can be computed as `inde=order12(pm)'
 nDum,         vector whose components give the actual number of
               variables in each group.
 pm,           number of independent main effects
            
Outputs:
--------
 BayesPred     Bayes prediction with the 'numero' most likely models
 LSPred        Least Squares prediction with the best model
 ILS           Indices of selected variables for LS prediction
 IB	       Indices of selected variables for Bayes prediction
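
To make the role of GammaD and PostGamD in prediction concrete, here is a
minimal sketch of a least-squares prediction from the top-ranked model on
toy data. It is not the code of pbvs_spgi. The renormalization of the
weights over the `numero' best models is an assumption, and the per-model
Bayes coefficients, which depend on the prior, are not reproduced.

```matlab
% Sketch with toy data: least-squares prediction from the top-ranked model.
X  = [-1 0 1; 0 1 -1; 1 -1 0; 0 0 0];    % centered calibration predictors
Y  = [-2; 0; 2; 0];                      % centered calibration response
Xf = [1 1 0];                            % one future observation
GammaD   = sparse([1 0 0; 1 1 0]);       % models ordered by probability
PostGamD = [0.7 0.3];
numero   = 2;

ILS    = find(GammaD(1,:));              % variables in the most probable model
B      = X(:,ILS) \ Y;                   % least-squares fit on those columns
LSPred = Xf(:,ILS) * B;

% Weights for averaging over the numero best models (assumed renormalization)
wts = PostGamD(1:numero) / sum(PostGamD(1:numero));
```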

                
                                
                        ***************
                        PARALLEL CHAINS
                        ***************

Use bvsgs_spgi.m to obtain the chains
(ex. [Gamma1, GammaD1, logProb1, logProbD1, PostGamD1, MargGam1, SWITCH1]= ...
		bvsgs_spgi(gamprec, X,Y, delta,k, w, v1,pm,nDum);
     [Gamma2, GammaD2, logProb2, logProbD2, PostGamD2, MargGam2, SWITCH2]= ...
		bvsgs_spgi(gamprec, X,Y, delta,k, w, v1,pm,nDum);   )

Pool together distinct visited vectors
(ex. Gamma = [GammaD1' GammaD2']';   )

Pool together log-relative post probabilities of distinct visited vectors
(ex. logProb = cat(2, logProbD1, logProbD2);   )

Use repliche.m to search for replications
(ex.[GammaD, logProbD]=repliche(Gamma, logProb);  )

Use probord.m to get normalized posterior and marginal probs and
to order the distinct visited vectors according to probability
(ex.[GammaD, logProbD, PostGamD, MargGam]=probord(logProbD, GammaD,inde,nDum,pm);)

Do prediction using PostGamD and GammaD
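
The pooling steps above can be sketched on toy data as follows. Here
`unique' and a max-shifted exponential are stand-ins for what repliche.m and
probord.m are described as doing; their actual implementations are not
reproduced. The sketch assumes that a given vector has the same log-relative
probability in every chain, and takes the marginal probability of a
component as the sum of the normalized probabilities of the models that
contain it.

```matlab
% Sketch with toy data: pool two chains, drop replicates, normalize, order.
GammaD1 = sparse([1 0 0; 1 1 0]);   logProbD1 = [-3.0 -1.0];
GammaD2 = sparse([1 1 0; 0 0 1]);   logProbD2 = [-1.0 -2.0];

Gamma   = [GammaD1' GammaD2']';              % pooled vectors, one per row
logProb = cat(2, logProbD1, logProbD2);      % pooled log-relative probs

% Stand-in for repliche: keep one copy of each distinct row
[Gu, idx] = unique(full(Gamma), 'rows');
lp = logProb(idx);

% Stand-in for the normalization: shift by the maximum before
% exponentiating to avoid underflow, then rescale to sum to one
post = exp(lp - max(lp));
post = post / sum(post);

% Order by decreasing probability and compute marginal probabilities
[negp, ord] = sort(-post);
PostGamD = -negp;
GammaD   = Gu(ord, :);
MargGam  = PostGamD * GammaD;
```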


			**************************
			PRE-PROCESSING OF THE DATA
			**************************
[Y,X,pm,combin]=processing(nbresp,YX)

Inputs:
-------
 nbresp,    number of response variables.
 YX,        matrix of raw data: the first `nbresp' columns hold the
            response variables Y and the remaining columns hold the
	    independent predictors (main effects).


Outputs:
--------
 Y,        matrix of centered response variables;
 X,        matrix of regressors in which the main effects have been
           centered and the quadratic and 2-way interaction
           terms have been added;
 pm,       number of main effects;
 combin,   contains the link between index of the variables and the
           corresponding main effect or interaction term.
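
A rough sketch of this kind of pre-processing on toy data is given below.
It is not the code of processing.m. The order of the added columns, the
layout of `combin' (a row [i 0] for main effect X_i, a row [i k] for the
product X_i*X_k) and the choice not to recenter the products are all
assumptions made for the illustration.

```matlab
% Sketch (hypothetical column order): center, then append second-order terms.
nbresp = 1;
YX = [1 2 3; 2 1 5; 4 3 1; 5 6 2];           % toy data: 1 response, 2 predictors
n  = size(YX, 1);
Y  = YX(:, 1:nbresp);
Xm = YX(:, nbresp+1:size(YX,2));
Y  = Y  - ones(n,1)*mean(Y);                 % centered responses
Xm = Xm - ones(n,1)*mean(Xm);                % centered main effects
pm = size(Xm, 2);

X = Xm;
combin = [(1:pm)' zeros(pm,1)];              % [i 0] marks main effect X_i
for i = 1:pm
  for k = i:pm                               % k = i gives the quadratic term
    X = [X, Xm(:,i).*Xm(:,k)];
    combin = [combin; i k];                  % [i k] marks the product X_i*X_k
  end
end
```

With pm main effects this layout gives pm*(pm+3)/2 columns in X.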


[DX] = process_gi(Xpm,dataD,nDum)

Inputs:
-------
 Xpm,      matrix with main effects of independent variables,
           the data must be centered.
 dataD,    main effects of the variables that are grouped, also
           centered.
 nDum,     number of variables in each group (the order of the
           groups in `nDum' must correspond to the order in `dataD').


Outputs:
--------
DX,        matrix containing the main effects of grouped variables
           and the centered interactions between grouped variables and
           independent main effects.
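
The layout described for DX can be sketched as below. This is an
illustration on toy data, not the code of process_gi.m; the helper
variables `first', `cols' and `Z' are hypothetical names.

```matlab
% Sketch of the DX layout: grouped main effects, then centered interactions.
n = 5;  pm = 2;  nDum = [2 1];
Xpm   = [1 2; 2 0; 3 1; 4 5; 5 2];
Xpm   = Xpm - ones(n,1)*mean(Xpm);           % centered independent main effects
dataD = [1 0 1; 0 1 0; 0 0 1; 1 0 0; 0 1 1];
dataD = dataD - ones(n,1)*mean(dataD);       % centered grouped variables

DX = dataD;                                  % grouped main effects come first
first = cumsum([1 nDum(1:length(nDum)-1)]);  % first column of each group
for g = 1:length(nDum)
  cols = first(g):(first(g)+nDum(g)-1);      % columns of group g in dataD
  for i = 1:pm
    Z  = (Xpm(:,i)*ones(1,nDum(g))) .* dataD(:,cols);   % X_i times each D_gj
    DX = [DX, Z - ones(n,1)*mean(Z)];                    % centered interactions
  end
end
```

The result has sum(nDum)*(1+pm) columns, to be placed after the independent
effects in X.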


************************************
Copyright (c) 1997 Marina Vannucci
Modified by Veronique Delouille 1998
************************************
