The issue of spatial sampling is more involved than we have time to go into today. For further reading on sampling in archaeology, please see:
Banning, E. B. (2000). The Archaeologist's Laboratory: The Analysis of Archaeological Data. Plenum Publishers, New York. pp. 75-85.
Drennan, R. (1996). Statistics for Archaeologists: A Commonsense Approach. Plenum Press, New York. pp. 132-134.
Shennan, S. (1997). Quantifying Archaeology. Edinburgh University Press, Edinburgh. pp. 361-400.
How big should a sample be? There is no fixed rule of proportionality for sample sizes: the required size is determined by the variability (standard deviation) of the phenomenon being sampled and by the statistical strength (confidence level and margin of error) you want in the results.
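To make this concrete, here is a minimal sketch of the standard sample-size calculation for estimating a mean, assuming you can supply an estimate of the standard deviation and a margin of error you are willing to tolerate (the numbers below are hypothetical, not from our survey):

```python
import math

def sample_size(std_dev, margin_of_error, z=1.96):
    """Minimum n needed to estimate a mean to within +/- margin_of_error
    at the confidence level implied by z (1.96 corresponds to 95%)."""
    return math.ceil((z * std_dev / margin_of_error) ** 2)

# Hypothetical example: sherd density varies with sd = 12 sherds/m2
# and we want the estimate to be within +/- 3 sherds/m2.
print(sample_size(12, 3))  # 62 sample units
```

Notice how the required n grows with the square of the variability: noisier phenomena demand much larger samples.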
In this example we want to collect all artifacts within a 1 m2 area around sample points placed on all of the archaeological sites, with the number of points based on the area (m2) of each site. To implement this we would use a GPS to guide us to the sampling locations in the field. This is a type of cluster sampling, because you will collect every artifact within a given 1 m2 collection location.
If we want a circular area of 1 m2 we can either try to lay out a box with measuring tapes, or we can use the dog-leash method and delimit a circle 0.56 m in radius. Since the area of a circle is A = pi * r^2, solving for r gives:
r = sqrt(A / pi) = sqrt(1 / pi) ≈ 0.56 m
Thus the most expedient way to do this is to navigate to the sample locations selected below and, at each location, draw out a circle 56 cm in radius and collect every artifact within it.
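If you ever need a different collection area, the same arithmetic generalizes; here is a quick sketch:

```python
import math

def dogleash_radius(area_m2):
    """Radius of a circle with the given area: r = sqrt(A / pi)."""
    return math.sqrt(area_m2 / math.pi)

print(round(dogleash_radius(1.0), 3))  # 0.564 m, i.e. roughly 56 cm
```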
Look at the values in [Shape_Area]. Because this is a geodatabase feature class (as opposed to a shapefile), the Shape_Area field is generated automatically and is always up to date.
Scroll quickly down the list. What are the largest and smallest sites you see? We want to sample a little from the small sites, and proportionally more from the larger ones.
Here is one possible solution.
We are taking the Logarithm of what?
Note that the output consists of nice round integers (good for sampling counts), not decimals. Wouldn't you expect decimal values from the log calculation above? You get integers because the field type was Integer, so the result is rounded to whole numbers.
Note that one of the values is negative, which isn't usable as a sample count; but it belongs to a very small site, so we can simply choose not to sample it.
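For reference, here is a sketch of the same allocation logic outside the field calculator. The site areas and the base-10 logarithm are assumptions for illustration (the lab doesn't state the base), and the integer rounding mimics the Integer field behavior noted above:

```python
import math

# Hypothetical site areas in m2; in the lab these come from [Shape_Area]
areas = {"site_001": 12500.0, "site_002": 830.0, "site_003": 0.2}

# samp = round(log10(area)), mimicking the Integer field's rounding
samp = {sid: round(math.log10(a)) for sid, a in areas.items()}
print(samp)  # {'site_001': 4, 'site_002': 3, 'site_003': -1}

# Drop non-positive allocations (e.g. the negative value from the tiny site)
samp = {sid: n for sid, n in samp.items() if n > 0}
print(samp)  # {'site_001': 4, 'site_002': 3}
```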
Now we can run the stratified sampling method in Hawth's Analysis Tools using the samp values.
Look at the "Sites_Rand1" attribute table and notice that each record has an ArchID value linking it to the site # in question. This could be very useful when you're actually navigating to these sampling locations. This will obviously be a one-to-many relationship with many samples from each site.
Zoom in to a single large site and look at the results. Is this a sample you would like to collect? It would involve revisiting many places, so you might instead choose to sample from fewer sites based on further information about them.
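Under the hood, a stratified random sampler like Hawth's is doing something along these lines for each site polygon (a sketch using shapely, with a made-up rectangular site; rejection sampling from the bounding box is one simple way to draw uniform points inside a polygon):

```python
import random
from shapely.geometry import Point, Polygon

def random_points_in_polygon(poly, n, seed=None):
    """Draw n uniform random points inside poly by rejection sampling
    from its bounding box -- one stratum's worth of sample locations."""
    rng = random.Random(seed)
    minx, miny, maxx, maxy = poly.bounds
    points = []
    while len(points) < n:
        p = Point(rng.uniform(minx, maxx), rng.uniform(miny, maxy))
        if poly.contains(p):
            points.append(p)
    return points

# Hypothetical site polygon with a samp value of 3
site = Polygon([(0, 0), (40, 0), (40, 25), (0, 25)])
for p in random_points_in_polygon(site, 3, seed=42):
    print(round(p.x, 1), round(p.y, 1))
```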
There is a tool for mobile GIS work in ArcPad that allows you to construct a similar sampling strategy while you're still in the field. Thus you could carry out this kind of cluster sampling immediately, while still at the site, rather than after the fact as we're doing here. In that case you'd navigate to the point locations with GPS, dog-leash out a circle 56 cm in radius, and collect its contents.
A similar approach could be used, for example, to sample the types of plants being grown in mixed-cultivation farm plots. The study of Iraqi mortality since 2003 that we discussed also used cluster sampling: the researchers would identify a neighborhood and then visit every household within it. They used GPS in the 2004 study, but in the 2006 study they sampled along main streets and side streets, which opened them up to criticism that the sample was biased because main streets were more dangerous.
See Shennan (1997: 382-385) for more information on applying these methods to archaeology.
There are two basic categories of measures of point pattern dispersal.
In an example of this kind of analysis we might ask: is there spatial structure to the concentrations of Late Intermediate period pottery in the Callalli area?
The first basic problem is that we collected sherds into a variety of spatial proveniences: ceramic_point locations, sites, ceramic loci, even lithic loci in some cases. This is where the All_ArchID layer comes in handy. You'll recall that All_ArchID is a point layer showing the centroid of ALL spatial proveniences from our survey, regardless of file type. It is the master record, the spinal column that joins all the various tables. However, we still have a one-to-many problem when it comes to getting point locations for every sherd: there are many sherds in one site, for example, but only one centroid to which the sherd records can be joined for analysis.
One way to solve this is to reverse-engineer a MANY-to-ONE join. This is not something ArcMap does naturally, because we are effectively stacking points on top of points, which is a bad way to organize your data. For these particular analyses, however, it can be useful.
Why did we do that? Recall that we started with Ceramics data so this way we're ONLY including ceramics point locations.
We've now created a point layer that has one point for every diagnostic sherd we analyzed from this region. This happened because of the directionality of the join: by reversing the one-to-many relationship we got a ton of ONEs and no MANYs. In other words, every MANY is now its own ONE, each with a point location. We can now look at the point patterns.
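If it helps to see the join logic outside ArcMap, here is a sketch of the same reversed join in pandas, with hypothetical field names apart from ArchID: merging the MANY side (sherds) onto the ONE side (centroids) yields one stacked point per sherd:

```python
import pandas as pd

# ONE centroid per spatial provenience, keyed by ArchID
centroids = pd.DataFrame({
    "ArchID": [101, 102],
    "x": [231450.0, 231620.0],
    "y": [8260310.0, 8260475.0],
})

# MANY analyzed sherds per provenience
sherds = pd.DataFrame({
    "ArchID": [101, 101, 101, 102],
    "rim_diam_cm": [14.0, 22.0, 18.0, 16.0],
})

# Each sherd record picks up its provenience's coordinates:
# three sherds from site 101 all stack on the same centroid.
sherd_points = sherds.merge(centroids, on="ArchID", how="inner")
print(sherd_points)
```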
Look at the two circles. What are they depicting? It is also possible to assign weights to these measures: the "Mean Center" tool, in the same area of ArcToolbox, will show you the mean center based on a particular numeric field.
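The computation behind the Mean Center tool is simple enough to sketch (ignoring projection details, which the tool handles for you); weights pull the center toward heavily weighted points:

```python
import numpy as np

def mean_center(x, y, weights=None):
    """Mean center of a point set; pass weights for the weighted version."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, float)
    return (np.average(x, weights=w), np.average(y, weights=w))

xs, ys = [0, 4, 10], [0, 2, 6]
print(mean_center(xs, ys))                     # unweighted center
print(mean_center(xs, ys, weights=[1, 1, 4]))  # pulled toward the last point
```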
These analyses quantify the autocorrelation of points based on a particular attribute value.
Spatial autocorrelation reflects Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things." We will examine this using a particular numeric attribute field.
In this example we don't have many numeric fields to choose from, because not many quantitative measures were taken on these ceramics. We do, however, have rim diameter measurements.
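The global statistic that ArcGIS's Spatial Autocorrelation tool reports is Moran's I. As a rough sketch of what it computes, assuming an inverse-distance conceptualization of spatial relationships (the tool offers several) and made-up rim diameters:

```python
import numpy as np

def morans_i(x, y, values):
    """Global Moran's I with inverse-distance weights. Values near +1
    mean similar values cluster; near 0 means a random arrangement."""
    x, y, v = (np.asarray(a, float) for a in (x, y, values))
    n = len(v)
    # Pairwise inverse-distance weights, zero on the diagonal
    d = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, 1.0 / d, 0.0)
    z = v - v.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical rim diameters (cm): big rims in one corner, small in another
print(morans_i([0, 1, 10, 11], [0, 1, 10, 11], [20, 21, 8, 9]))  # ~0.65
```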
Look at the distribution highlighted in blue. Do the points look clustered or randomly distributed? Keep in mind that some points are stacked on top of one another because of the many-to-one join from earlier.
Study the resulting box. Does it confirm your visual impression of the distribution? The text at the bottom of the box shows you the proper statistical language to use when describing these results. This test is often used as an early step in data exploration, revealing patterns that might suggest further investigation.
One final analysis will display the degree of clustering visually, and it produces standardized index values that are more easily comparable between analyses. It displays clusters of points with similar values as well as clusters of heterogeneous values, and it also allows you to compare non-numerical values.
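Cluster/outlier analyses of this kind are typically built on a local version of Moran's I (Anselin's Local Moran). Here is a sketch of the core classification, assuming a row-standardized spatial weights matrix W: HH and LL mark clusters of similar values, while HL and LH mark points surrounded by dissimilar values:

```python
import numpy as np

def local_moran(values, W):
    """Local Moran's I for each point plus its cluster/outlier quadrant."""
    v = np.asarray(values, float)
    z = (v - v.mean()) / v.std()
    lag = W @ z     # average standardized value of each point's neighbors
    I = z * lag     # positive where a point resembles its neighbors
    quadrant = np.where((z >= 0) & (lag >= 0), "HH",
               np.where((z < 0) & (lag < 0), "LL",
               np.where((z >= 0) & (lag < 0), "HL", "LH")))
    return I, quadrant

# Three points where everyone is everyone else's neighbor (toy weights)
W = np.array([[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]])
print(local_moran([10.0, 11.0, 2.0], W))
```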
If the red and blue circles do not appear you may have to load the Cer2_Per_Lyr file manually from wherever you saved it.
These values are Z scores, which means the numbers are expressed in standard deviations. At a 95% confidence level you cannot reject the null hypothesis (of random distribution) unless the Z score exceeds +/- 1.96 standard deviations.
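If you want to see where the 1.96 cutoff comes from, here is a small sketch converting a Z score into a two-tailed p-value under the standard normal distribution:

```python
from math import erf, sqrt

def two_tailed_p(z):
    """Two-tailed p-value for a standard-normal Z score."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(round(two_tailed_p(1.96), 3))  # ~0.05: the 95% confidence cutoff
print(round(two_tailed_p(0.80), 3))  # ~0.424: cannot reject randomness
```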
More information is available in the tool's documentation and from the author of the geostatistics tool in an online forum.
Which points represent which time periods? Look in the attribute table to see which field contains the periods (LIP-LH, LH, etc.). The field name has probably been changed to something strange by the join. Go to Layer Properties > Labels and label by the period field name. Now you can see the homogeneous and heterogeneous clusters and where they fall.
You're encouraged to read the help files and additional reading before attempting to apply these functions in your research.