
Data File Format For BigN

There are two sets of files: input files and output files.

1 Input File
The input consists of 4 data files (data.dat, location.dat, centroid.dat, and nb.dat), one parameter
file, and two initial value files.
data.dat contains the observed data, including the design matrix. location.dat lists all the locations,
centroid.dat lists all the centroids (or knots), and nb.dat describes the neighborhood structure. The
first two files are needed for any setting. centroid.dat is needed when the macro-level setting is
exponential or Matern, while nb.dat is needed when it is CAR.
The parameter file, called "infile" here, contains the parameters for the software, including the prior
parameters.
The software runs two chains in sequence. init0.dat and init1.dat hold the initial values for these two
chains.
The format of each file is as follows.

• data.dat
The first row is the number of observations and the second row is the number of covariates.
The observed data start on the third row, one row per observation. The first column
is the id of the observation's location, followed by the response value and then the design matrix.
Note that location ids start from 0 and must be consecutive integers.
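As an illustration of the data.dat layout described above, here is a minimal reader sketch. It is not part of BigN. It assumes whitespace-separated fields and one design-matrix column per covariate, neither of which the description states, and it reads the id rule as "the distinct ids form 0, 1, 2, ... with no gaps". The function name `parse_data_dat` and the sample values are made up.

```python
def parse_data_dat(text):
    """Parse the text of a data.dat file.

    Assumptions (not stated in the format description): fields are
    whitespace-separated and the design matrix has one column per covariate.
    """
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_obs = int(rows[0][0])  # row 1: number of observations
    n_cov = int(rows[1][0])  # row 2: number of covariates
    records = rows[2:]
    if len(records) != n_obs:
        raise ValueError(f"expected {n_obs} observations, found {len(records)}")

    loc_ids, y, X = [], [], []
    for fields in records:
        if len(fields) != 2 + n_cov:
            raise ValueError(f"expected {2 + n_cov} columns, found {len(fields)}")
        loc_ids.append(int(fields[0]))            # column 1: location id
        y.append(float(fields[1]))                # column 2: response value
        X.append([float(v) for v in fields[2:]])  # remaining columns: design matrix

    # Assumed reading of the id rule: distinct ids are 0..k-1 with no gaps.
    distinct = sorted(set(loc_ids))
    if distinct != list(range(len(distinct))):
        raise ValueError("location ids must be 0, 1, 2, ... with no gaps")
    return loc_ids, y, X


# Made-up sample: 3 observations, 2 covariates, 2 locations.
sample = """3
2
0 1.5 1.0 0.2
1 2.5 1.0 0.4
1 0.5 1.0 0.6
"""
loc_ids, y, X = parse_data_dat(sample)
print(loc_ids, y, X)
```

Running such a check before starting the chains can catch a miscounted header or a gap in the ids early.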

• location.dat
The first row is the number of distinct locations. The second row is the number of subregions,
denoted nc. The next nc rows give the number of distinct locations within each subregion, in
the order of subregion id.
The locations follow immediately, one row per location. The first column is the location id and the
second column is the corresponding subregion id. The third and fourth columns are longitude and
latitude. Note that, like location ids, subregion ids start from 0 and must be consecutive integers.
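The location.dat layout has a header whose length depends on nc, which is easy to get wrong by one row. The sketch below reads it and cross-checks the per-subregion counts against the location rows. It assumes whitespace-separated fields; the function name `parse_location_dat` and the coordinates in the sample are hypothetical.

```python
def parse_location_dat(text):
    """Parse the text of a location.dat file (whitespace-separated, assumed)."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_loc = int(rows[0][0])  # row 1: number of distinct locations
    nc = int(rows[1][0])     # row 2: number of subregions
    counts = [int(rows[2 + k][0]) for k in range(nc)]  # next nc rows
    records = rows[2 + nc:]
    if len(records) != n_loc:
        raise ValueError(f"expected {n_loc} locations, found {len(records)}")

    locations = []
    seen = [0] * nc  # locations actually found in each subregion
    for fields in records:
        loc_id, sub_id = int(fields[0]), int(fields[1])
        lon, lat = float(fields[2]), float(fields[3])
        if not 0 <= sub_id < nc:
            raise ValueError(f"subregion id {sub_id} outside 0..{nc - 1}")
        seen[sub_id] += 1
        locations.append((loc_id, sub_id, lon, lat))

    if seen != counts:
        raise ValueError(f"declared counts {counts} do not match rows {seen}")
    if sorted(loc[0] for loc in locations) != list(range(n_loc)):
        raise ValueError("location ids must be 0..n-1 with no gaps")
    return nc, counts, locations


# Made-up sample: 3 locations in 2 subregions (2 in subregion 0, 1 in subregion 1).
sample = """3
2
2
1
0 0 -93.1 44.9
1 0 -93.2 45.0
2 1 -92.5 44.0
"""
nc, counts, locations = parse_location_dat(sample)
print(nc, counts, locations)
```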

• centroid.dat
Each row is a centroid. The first column is the corresponding subregion id, and the second and
third columns are longitude and latitude.

• nb.dat
Each row contains the neighborhood information for one subregion. The first column is the subregion
id, and the second column is the number of neighbors of this subregion. The rest of the row lists
the ids of the neighboring subregions.
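Because each nb.dat row has a variable length, a quick consistency check can be useful. The sketch below reads the rows and verifies the declared neighbor counts. The symmetry check is an extra assumption: CAR neighborhoods are normally symmetric, but the description does not say whether the software requires or verifies this. Fields are assumed to be whitespace-separated, and both function names are hypothetical.

```python
def parse_nb_dat(text):
    """Map each subregion id to its list of neighbor ids."""
    neighbors = {}
    for ln in text.splitlines():
        fields = [int(v) for v in ln.split()]
        if not fields:
            continue  # skip blank lines
        sub_id, n_nb, nb_ids = fields[0], fields[1], fields[2:]
        if len(nb_ids) != n_nb:
            raise ValueError(
                f"subregion {sub_id}: declared {n_nb} neighbors, listed {len(nb_ids)}"
            )
        neighbors[sub_id] = nb_ids
    return neighbors


def asymmetric_pairs(neighbors):
    """Return (i, j) pairs where j is listed for i but i is not listed for j."""
    return [
        (i, j)
        for i, nbs in neighbors.items()
        for j in nbs
        if i not in neighbors.get(j, [])
    ]


# Made-up sample: three subregions in a chain 0 - 1 - 2.
sample = """0 1 1
1 2 0 2
2 1 1
"""
neighbors = parse_nb_dat(sample)
print(neighbors, asymmetric_pairs(neighbors))
```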

• infile
The first row contains 7 numbers. The first two indicate the macro- and micro-level covariance
structures, respectively: 0 means CAR, 1 means exponential and 2 means Matern. The next
three are the burn-in iterations, the sample iterations and the thinning parameter. The sixth is
the reporting period for the acceptance rate. If it is small, the software writes to the screen
(or to the assigned file) very frequently. The last number is the number of pre-burn-in iterations,
which are used to adjust the parameters of the jumping distributions to reach an acceptance rate
of about 20%. All the parameters relating to the jumping distributions may be adjusted during the
pre-burn-in period.
Values for ν and φ are always required, even when they are not used.
The second row contains 4 numbers indicating whether the Matern or exponential parameters are
fixed or random: 0 means fixed and 1 means random. The first two are for the macro ν and φ
(the Matern or exponential parameters), and the remaining two are for the micro ones. These are
required even when, for example, the macro setting is CAR. They take effect only under the
matching setting, in which case uniform priors are assumed.
The next row contains 2 numbers indicating the prior choice for the macro- and micro-level
variance parameters: 0 means a uniform prior and 1 means an inverse Gamma prior.
The next two rows are the lower and upper bounds for the priors of macro ν and φ. The next row
holds the parameters for the prior of the macro variance: the lower and upper bounds if the prior
is uniform, or the scale and shape parameters if the prior is inverse Gamma.
The next rows are the parameters for the micro-level ν's, followed by the parameters for the
micro-level φ's. The last row holds the parameters for the micro-level variance parameters (the
same prior is assumed for each one). Again, the parameters for ν and φ are required even if the
program does not use them.
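To make the first three rows of infile concrete, the sketch below decodes them into named settings and derives which data files a given macro-level setting needs. It is an illustration only: it assumes whitespace-separated integer fields, assumes the micro flags come in the order ν then φ, and leaves the later prior-parameter rows unparsed because their exact layout is not spelled out here. All function and key names are hypothetical.

```python
COVARIANCE = {0: "CAR", 1: "exponential", 2: "Matern"}
VARIANCE_PRIOR = {0: "uniform", 1: "inverse Gamma"}


def parse_infile_header(text):
    """Decode rows 1-3 of infile; later rows are not interpreted."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    row1 = [int(v) for v in rows[0]]
    row2 = [int(v) for v in rows[1]]
    row3 = [int(v) for v in rows[2]]
    if (len(row1), len(row2), len(row3)) != (7, 4, 2):
        raise ValueError("expected 7, 4 and 2 numbers on rows 1, 2 and 3")

    macro, micro, burn_in, n_sample, thin, report_every, pre_burn_in = row1
    flags = ("macro_nu", "macro_phi", "micro_nu", "micro_phi")  # assumed order
    return {
        "macro": COVARIANCE[macro],
        "micro": COVARIANCE[micro],
        "burn_in": burn_in,
        "n_sample": n_sample,
        "thin": thin,
        "report_every": report_every,
        "pre_burn_in": pre_burn_in,
        "random": {name: bool(v) for name, v in zip(flags, row2)},
        "variance_prior": {
            "macro": VARIANCE_PRIOR[row3[0]],
            "micro": VARIANCE_PRIOR[row3[1]],
        },
    }


def required_files(macro):
    """Input files needed for a macro-level setting, per the description."""
    files = ["data.dat", "location.dat", "infile", "init0.dat", "init1.dat"]
    files.append("nb.dat" if macro == "CAR" else "centroid.dat")
    return files


# Made-up header: CAR macro level, exponential micro level.
sample = """0 1 1000 5000 10 100 500
0 0 1 1
1 1
"""
cfg = parse_infile_header(sample)
print(cfg)
print(required_files(cfg["macro"]))
```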

• init0.dat and init1.dat
Both files have the same format.
In this program, the jumping distribution for β is multivariate normal with a diagonal covariance
matrix. For ν, φ and σ we use lognormal distributions, and for the spatial residuals we use a normal
distribution. The variance is set separately for the macro-level residuals and for each set of
micro-level residuals. Once again, values for ν and φ are always required, even when they are not
used. If ν and φ are fixed, their initial values must be the desired fixed values.
The first row is the scaling parameter for the jumping distribution of β. It is followed by the value
and variance of each β, one row per β. The scaling parameter controls the acceptance rate for β
and may be adjusted during the pre-burn-in period.
The following three rows are the values and the variances of the lognormal jumping distributions for
macro ν, φ and σ, respectively. The variances may be adjusted during the pre-burn-in period.
Next comes the variance of the jumping distribution for the macro-level spatial residuals, and
then their initial values.
The parameters for each subregion follow. For each subregion, the first three rows are the
parameters for ν, φ and σ, as for the macro-level ones. These are followed by the variance of the
jumping distribution for the spatial residuals in this subregion and then by the corresponding
initial values.
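To show how a "value and variance" pair for ν, φ or σ could drive a lognormal jumping distribution, here is a generic Metropolis-Hastings step. It is a sketch, not BigN's implementation. It assumes the proposal is lognormal centered on the log of the current value, and that the variance is that of the underlying normal on the log scale. The names `propose_lognormal` and `mh_step`, and the Gamma target used in the demonstration, are hypothetical.

```python
import math
import random


def propose_lognormal(x, var, rng):
    """Draw x_new with log(x_new) ~ Normal(log(x), var), so x_new stays positive."""
    return x * math.exp(math.sqrt(var) * rng.gauss(0.0, 1.0))


def mh_step(x, log_target, var, rng):
    """One Metropolis-Hastings step with a lognormal jumping distribution.

    The lognormal proposal is not symmetric: q(x | x_new) / q(x_new | x)
    equals x_new / x, which enters the acceptance ratio as a log term.
    """
    x_new = propose_lognormal(x, var, rng)
    log_ratio = (log_target(x_new) - log_target(x)
                 + math.log(x_new) - math.log(x))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return x_new, True
    return x, False


# Demonstration on a stand-in target: Gamma(shape=2, rate=1), up to a constant.
def log_gamma21(x):
    return math.log(x) - x


rng = random.Random(0)
x, accepted, draws = 1.0, 0, []
for _ in range(2000):
    x, ok = mh_step(x, log_gamma21, 1.0, rng)
    accepted += ok
    draws.append(x)
rate = accepted / len(draws)
print(f"acceptance rate: {rate:.2f}")
```

In a scheme like this, a larger variance gives bolder jumps and a lower acceptance rate, which is the kind of quantity a pre-burn-in period can tune.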

2 Output File
There are 9 output files for each chain, indexed by chain number (0 or 1). dic.dat contains the model
setting parameters, DIC, pD and the computation time. beta.dat is the sample of β. sigphinuD.dat
contains the sample of the macro-level parameters. nusub.dat, sigsub.dat and phisub.dat are the
samples of the micro-level parameters. sp.dat has the spatial residual values at each location. w.dat
is the sample of the macro-level spatial residuals. wsub.dat is the mean of the micro-level spatial
residuals. Only the mean is saved because the file might otherwise be too large.
