Variogram Lab for QERM 550, Spring 2008

The objectives of this lab are to: Our first main task will be to simulate some correlated data. The point is that then we KNOW what the correlation structure is and can evaluate how well the variogram performs. Later, we will look at some real data.

To start, we need to make sure the appropriate R packages are available. For this lab, we will need:

RandomFields: used to simulate correlated spatial data,
geoR: used to compute and plot the variogram,
akima: contains a nice helper function
maps: allows us to draw maps of the US

If these packages are not yet installed on your system, load them from CRAN using:

> install.packages("RandomFields")
inserting the appropriate package name.

Simulating Correlated Data

> require(RandomFields) 	# we need this package loaded for the GaussRF() function 
Look at the help for the GaussRF function. It mainly requires you to specify a set of locations where the data should be simulated, as well as the correlation function to use. That's the C(h) from the lecture.

Here's a simple function to create a Gaussian field using a spherical correlation model.
test.GaussRF <- function(dim)
{ 
	model <- "spherical"
	mean <- 0
	variance <- 4
	nugget <- 1
	scale <- 10
	alpha <- 1
	step <- 1
	x <- seq(0, dim, step)
	y <- seq(0, dim, step)
	f <- GaussRF(x=x, y=y, model=model, grid=T, 
		param=c(0, variance, nugget, scale))
	par(pty="s")
	image(x,y,f)

	test.grid <- expand.grid(x=seq(0, dim, step), y=seq(0, dim, step))
	return((cbind(test.grid, f=as.vector(f)))
}
This function should create and plot an image of a Gaussian correlated field. Does it work? Do you believe the data to be correlated? Why? Look at the help file for the GaussRF function. What do the variance, nugget and scale parameters control?

Before proceeding with the variogram calculations, let's do some EDA of this simulated file.
  1. Calculate the and plot the mean value in the x and y directions separately. Do you see any evidence of trend? Are these two plots adequate to evaluate the presence of trend? If not, what else might you want to look at?
  2. Calculate the overall mean and variance of the plot. What do these values represent?
  3. Plot a histogram of the simulated data. What do you see and is this what you would expect?
Repeat the above calculations with a dim value of 20 and of 100. What happens as the size of the plot varies? Why?

So far we've been simulating spherical data because when we get to variogram analysis, the sill is easier to pick out. If you ever have a need to simulate other correlation structures, such as exponential data, you just need to change the model and include the appropriate parameters. Again, see the help for the GaussRF function for guidance.

Simple Variogram Analysis

Let's take the simulated data from above and calculate the variogram. You will need to load the geoR package (require(geoR)) for access to these functions. Then, you need to convert the data into the right from, using the as.geodata() function. Finally, the call to variog() computes the actual variogram. Peruse the help file to understand what it is doing what the options are, and what values are returned.
sim50 <- test.GaussRF(50)
sim50.gd <- as.geodata(sim50)
sim50.variog <- variog(sim50.gd)
You can quickly plot the variogram by calling plot(sim50.variog). What do you expect to see, and does this give you the expected results? Everyone's plots should be different, but I suspect that something strange may be happening at higher lags.

In class we discussed some practical rules. If our grid is 50x50, what is the largest lag we should be computing the variogram up to? You can set this value using the max.dist parameter to the variog function. Adjust this appropriately, and replot the variogram. Is the situation improved?

The other practical consideration was to ensure at least 30 pairs per lag distance. You can check this by looking at sim50.variog$n (or whatever your object is called). Are we ok here?

Now, to evaluate isotropy, we will use the variog4() function. The function basically works the same, so use it to calculate the directional variogram and then plot it. (Note this function is a little slow, so be patient.) What does this suggest about whether the given result is isotropic? Comment specifically on whether you think the range values are equivalent and if the sill values are equivalent.

Next, lets evaluate the range (and so variation) of computed results at each lag within the variogram. We are going to look at what is referred to as a "variogram cloud".

# first calculate the variogram using the option="cloud" parameter
sim50.vcloud <- variog(sim50.gd, option="cloud", max.dist=25)
# and create a boxplot of this data ... this seems a little klugey, but the 
#  point is to create a nice looking boxplot...
boxplot(sim50.vcloud$v~as.factor((sim50.vcloud$u %/% 2.5)*2.5))
What does the box plot show? How would you interpret the results?

Finally, let's try to fit the data with a variogram model. We know what the true model is, because we simulated it. So let's see what the parameter estimates are when we fit a spherical model to it. Look at the help file for the variofit() function. You need to pass it the variogram data, initial parameter estimates including the partial sill (namely the difference between the nugget and the sill) and range, and what covariance model you want to use. For example
sim50.ols <- variofit(sim50.variog, c(3.5, 10), cov.model="spherical")
(Note: if you didn't set the max.dist parameter when you calculated the empirical variogram, your parameter estimates will be highly biased!)

What are your parameter estimates? You will need to look at the help file to figure out the object names for the nugget, partial sill and range. How do these compare to the original values used within the simulation? If they are different, why do you suspect that might be? You might want to compare the parameter estimates from each of the different grid dimensions. What happens as the grid size goes up?

The Scallop Data

The following data and sample code are taken from Kalunzy et al (1998), S+ Spatial Stats User Manual.

The scallop data is available here. Save this file to your computer in your current R directory. Read in the scallops data (command below) and quickly look at what's happening for each column. The important columns are "lat" and "long", which are the latitude and longitude where the catch occurred, "tcatch" which is the total catch and is the sum of the "prerec" or prerecruits and "recruits".

> scallops.df <- read.table("scallopsdf.txt", sep=",", header=T)
> summary(scallops.df)
It seems we have highly skewed data, so let's perform a log transformation on the catch data.
> scallops.df[,"lgcatch"] <- log(scallops.df$tcatch+1)
For this next part, we want to draw a map with the actual locations of the catch, and on top of that we will add a contour plot of the log of the catch amount.
> require(maps)		# needed for the map() function call
> require(akima)	# needed for the interp() function call

draw.scallopmap <- function() {
	# draw the background as the map of the NE coast of the US
	map("usa", xlim=c(-75.25,-71), ylim=c(38.2, 41.5), fill=T, col="gray")
	
	# add the points where the catch occurred
	points(scallops.df$long, scallops.df$lat, cex=0.75, pch=20)
	
	# overlay the contour plots
	contour(interp(scallops.df$long, scallops.df$lat, scallops.df$lgcatch), 
		add=T, nlevels=4)
}

> draw.scallopmap()
Next, here are some functions to rotate the scallops data. You can play with this more if you want to, but it turns out that 52 degrees is pretty close to the `best' rotation. You'll see the results when you call the rotate.scallops() function.
rotate.axis <- function(xy, theta)
{
	# convert the rotation angle to radians
	theta.rad <- theta*(pi/180)
	# create the rotation matrix
	newx <- c(cos(theta.rad), sin(theta.rad))
	newy <- c(-sin(theta.rad), cos(theta.rad))
	# perform the matrix multiplication to perform the rotation
	XY <- as.matrix(xy) %*% cbind(newx, newy)
	# return the value
	return(as.data.frame(XY))
}

rotate.scallops <- function()
{
	scall.rot <- rotate.axis(scallops.df[c("long", "lat")], 52)
	plot(scall.rot$newx, scall.rot$newy)	
	return(scall.rot)
}
Finally, use these functions to create the rotated data set, and then convert it to its geodata version, which is merely the x,y and response data.
> scall.rot <- rotate.scallops()
> scallops.gd <- as.geodata(cbind(scall.rot, scallops.df$lgcatch))

The Homework

Now you should be off and running. Take the rotated scallops data, perform simple EDA, calculate the variogram, look for isotropy, and try to fit a model. Comment on what you find.

Last updated: 5/23/08.