Week 6 In-Class Exercises - Points and pattern analysis

PART I. Random samples and point pattern analysis

A. Download and Install Hawth's Tools for statistical analysis

We will use a third party tool to generate random sampling locations. There are thousands of useful scripts and tools for ArcGIS that you can download for free from

http://arcscripts.esri.com/


Go there now and change the "All ESRI Software" field to "ArcGIS Desktop".
There are many other useful tools on Arcscripts for completing a variety of tasks. It's often a good place to go to extend the capabilities of ArcGIS.

Search for "random points" and you'll get a half-dozen tools related to this function.
Download Hawth's Analysis Tools for ArcGIS 9 by Hawthorne Beyer

Don't download it from the Arcscripts site, however, because it is out of date on there (as he notes on the webpage). Click on the "www.spatialecology.com/htools" link and get the latest version from there.

Download and unzip Hawth's Tools to a folder, and install it by double-clicking on "htools_setup.exe"
Use all the default settings during install.

Open Arcmap and right-click in the vacant space on the toolbar area and select "Hawth's Tools" from the list. A small toolbar button will appear; drag it into an open position on your toolbar.

B. Cluster sampling for Fieldwork

In order to evaluate the strength of a particular spatial distribution, many spatial analysis techniques compare a known distribution to a randomly generated sample of point locations. We will use random sampling methods in two applications.

  1. For collecting samples in the field, we will have the GIS generate a number of random point locations for sampling within a polygon feature, where the sample size (count) is proportional to the area of the feature being studied.
  2. We will use randomly generated point scatter to learn about the point density spatial analysis tools in ArcGIS.

The issue of spatial sampling is more involved than we have time to go into today. For further reading on sampling in archaeology, please see:

Banning, E. B. (2000). The Archaeologist's Laboratory: The Analysis of Archaeological Data. Plenum Publishers, New York. pp. 75-85.
Drennan, R. (1996). Statistics for Archaeologists: A Commonsense Approach. Plenum Press, New York. pp. 132-134.
Shennan, S. (1997). Quantifying Archaeology. Edinburgh University Press, Edinburgh. pp. 361-400.

How big should a sample size be? There is no fixed rule about proportionality of the size of samples because sample sizes are determined by the variability (standard error) of the phenomenon being sampled and the desired statistical strength of the results.

  • In this exercise we will use Site_A and the DEM from the Callalli data, so add those to your project.

In this example we want to collect artifacts within a 1 m2 area around sample points at all the archaeological sites, with the number of samples per site based on its area (m2). To implement this we would use a GPS to guide us to these sampling locations in the field. This is a type of Cluster Sampling because you will collect all artifacts within each 1 m2 collection location.
To delimit a 1 m2 area we can either try to lay out a square with measuring tapes, or we can use the dog-leash method and delimit a circle that is 0.56 m in radius. Since the area of a circle is pi*r^2, then
r = sqrt (1 / 3.1416) and therefore r = 0.56 m

Thus the most expedient way to do this is to navigate the sample locations selected below and for each location draw out a circle 56 cm in radius and collect every artifact in that circle.
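The radius arithmetic above can be checked with a few lines of Python (a minimal sketch, not part of the ArcGIS workflow):

```python
import math

# Area of a circle: A = pi * r^2, so r = sqrt(A / pi).
# For a 1 m^2 collection circle:
area_m2 = 1.0
radius_m = math.sqrt(area_m2 / math.pi)
print(round(radius_m, 2))  # -> 0.56
```

So a 56 cm dog-leash gives a collection circle of almost exactly 1 m2.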

  • Open the Attribute Table for Sites_A.
  • Options > Add Field…
  • Add an integer field called "Samp"

Look at the value in [Shape_Area]. Because this is a Geodatabase feature (as opposed to a Shapefile) the Shape_Area field is generated automatically and always up-to-date.


Scroll quickly down the list. What are the largest and the smallest sites you see? We want to sample a little bit from the small sites, and proportionally more from the larger sites.


Here is one possible solution.

  • Right-click "Samp" and Calculate Values…
  • Use this formula: Log ( [Shape_Area] / 10 )
We are taking the Logarithm of what?
Note that the output is nice round integers (good for sampling), not decimals. Wouldn't you expect decimal values from the Log calculation above? Because the field type is Integer, the result is rounded to whole numbers.
Note that one of them is a negative value, which isn't meaningful as a sample count, but it's a very small site so we can consider not sampling it.
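Here is a small Python sketch of what that field calculation does. It assumes the Field Calculator's Log is the natural logarithm (as in VBScript) and that the Integer field rounds to the nearest whole number; the site areas are hypothetical:

```python
import math

def sample_count(shape_area_m2):
    """Mimic the field calculation Log([Shape_Area] / 10), assuming a
    natural logarithm, with the result rounded to an integer as the
    Integer field type does."""
    return round(math.log(shape_area_m2 / 10.0))

# Hypothetical site areas:
print(sample_count(500))    # a ~500 m^2 site -> 4 samples
print(sample_count(50000))  # a ~5 ha site -> 9 samples
print(sample_count(5))      # smaller than 10 m^2 -> a negative "count"
```

The logarithm is what keeps the sample counts proportionally modest: a site 100 times larger gets only a few more samples, not 100 times as many.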

Now we can run Hawth's Tools' stratified sampling methods using the Samp values.

  • Go to Hawth's Tools > Sampling Tools > Generate Random Points…
  • Click the "Web Help" button at the bottom and skim through his description of the tools.
  • Select Layer: Site_A
  • Enforce Minimum Distance between points: 2
  • At the bottom of the page choose "Generate number of points per polygon specified in this field:" and choose "Samp"
  • Once you've done that you need to go up to "Polygon Unique ID Field" and choose ArchID from that list.
  • Output the sampling points to a file called "Sites_Rand1" in the Callalli folder.


Look at the "Sites_Rand1" attribute table and notice that each record has an ArchID value linking it to the site # in question. This could be very useful when you're actually navigating to these sampling locations. This will obviously be a one-to-many relationship with many samples from each site.

Zoom in to a single large site and look at the results. Is this a sample you'd like to collect? It would involve revisiting many places so maybe you would choose to sample from fewer sites instead based on further information about those sites.
There is a tool for mobile GIS work in ArcPad that allows you to construct a similar sampling strategy while you're still in the field. That way you could carry out a cluster sampling method immediately while you're still at the site, not after-the-fact as we're doing here. In this case, you'd navigate to these point locations with GPS, dog-leash out a circle that is 56 cm in radius, and collect the contents.

A similar approach could be used, for example, to sample the types of plants grown in mixed-cultivation farm plots. The study of Iraqi mortality since 2003 that we discussed also used cluster sampling: the researchers would identify a neighborhood and then visit every household in it. They used GPS in the 2004 study, but in the 2006 study they sampled along main streets and side streets, which opened them up to criticism that the sampling was biased by the dangerousness of main streets.

See Shennan (1997: 382-385) for more information on applying these methods to archaeology.


Part II. Point pattern dispersal studies

There are two basic categories of measures of point pattern dispersal.

  1. The first category uses only the point locations: the points all share the same characteristics, so only their spatial position differentiates them. An example of this might be body sherds of a particular paste type scattered throughout a site or a region.
  2. The second category uses some attribute, often numeric but sometimes non-numeric, that can be assessed relative to spatial location.

A. Spatial Structure for Point locations without values

In an example of this kind of analysis we might ask: is there spatial structure to the concentrations of Late Intermediate period pottery in the Callalli area?

The first basic problem is that we collected sherds into a variety of spatial proveniences: ceramic point locations, sites, ceramic loci, even lithic loci in some cases. This is where the All_ArchID layer comes in handy. You'll recall that All_ArchID is a point layer showing the centroid of ALL spatial proveniences from our survey, regardless of the file type. It is like the master record, or the spinal column, that joins all the various tables. However, we still have a 1-to-Many problem with respect to getting point locations for every sherd: there are many sherds in one site, for example, and only one centroid to join each sherd location to in order to conduct analysis.

One way to solve this is to reverse-engineer a MANY-to-ONE join. This is not something that is done naturally in Arcmap because we're effectively stacking points on top of points, which is a bad way to organize your data. However for these particular analyses it can be useful.

1. Create a Ceram_Lab2_All_ArchID layer

  • Load Ceram_Lab2 and load All_ArchID from the Callalli database.
  • Recall that Ceram_Lab2 is a non-spatial database, so the Source tab in the table of contents must be selected to even view it.
  • Right-click Ceram_Lab2 and choose Joins and Relates > Join
  • Use All_ArchID in field 2 then use ArchID for field 1 and ArchID for field 3. Click Advanced… and choose Only Matching Records

Why did we do that? Recall that we started with Ceramics data so this way we're ONLY including ceramics point locations.

  • Go to Tools > Add XY Data…
  • Choose "Ceram_Lab2_All_ArchID"
  • And for X and Y choose "All_ArchID.X" and "All_ArchID.Y".
  • For reference system choose Projected > UTM > WGS1984 Zone 19 S
  • Look at the resulting attribute table in Ceram_Lab2_All_ArchID. There should be 113 records.
  • Note that the file is "[layername] Events": it is not a regular Feature or shapefile. You must choose "Data > Export Data..." and save it out in order to proceed with this exercise. Export it to a Geodatabase Feature class (rather than a Shapefile) so that the long fieldnames will not become truncated.

We've now created a point layer that has one point for every diagnostic sherd we analyzed from this region. This occurred because of the directionality of the join: we broke the 1:Many cardinality by reversing it, so we got a ton of ONES and no MANYS. In other words, all the MANYS are their own ONES, each with a point location. We can now look into the point location patterns.
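The reversed join can be sketched in plain Python with two hypothetical miniature tables (the ArchIDs and coordinates below are made up for illustration): each sherd record (the MANY side) picks up the centroid of its provenience (the ONE side), so every sherd ends up with its own point.

```python
# All_ArchID (the ONE side): ArchID -> centroid (X, Y)
centroids = {
    101: (231500.0, 8263400.0),
    102: (232900.0, 8261750.0),
}

# Ceram_Lab2 (the MANY side): (sherd id, ArchID)
sherds = [("s1", 101), ("s2", 101), ("s3", 102)]

# "Only Matching Records": skip any sherd whose ArchID has no centroid.
points = [(sid, aid, *centroids[aid]) for sid, aid in sherds if aid in centroids]
for row in points:
    print(row)  # one point row per sherd, with stacked coordinates per site
```

Note how s1 and s2 get identical coordinates; that is the point-on-point stacking discussed above.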

2. Point structure for Late-Intermediate/Late Horizon period points

  • Open Attribute table for Ceram_Lab2_All_ArchID and sort Ascending on Ceram_Lab2.Period_gen
  • Select all the records that have LH as a value
  • In the Toolbox choose Spatial Statistics Tools > Measuring Geographic Distributions > Standard Distance
  • Choose the Ceram_Lab2_All_ArchID file as an input, for output call it CerLH_StanDist and leave it as one standard deviation.
  • Run the operation and look at the resulting circle.
  • Return to the Attribute Table for Ceram_Lab2_All_ArchID and choose just the LIP-LH rows and rerun the operation. Call the output "CerLIP_StanDist".

Look at the two circles. What are these depicting? It is also possible to assign weights to these measures. The "Mean Center" tool, in the same area of Arctoolbox, will show you the mean center based on particular numeric values.
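Under the hood, the unweighted Standard Distance is just the root of the pooled variance of the X and Y coordinates around the mean center. A sketch with hypothetical coordinates (the real tool works on your projected UTM values):

```python
import math

def standard_distance(points):
    """Radius of the 1-standard-deviation circle around the mean center:
    sqrt(sum((x - xbar)^2)/n + sum((y - ybar)^2)/n)."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    return math.sqrt(
        sum((x - xbar) ** 2 for x, _ in points) / n
        + sum((y - ybar) ** 2 for _, y in points) / n
    )

# Four hypothetical points on a 2 m square:
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(standard_distance(pts))  # -> sqrt(2), about 1.414
```

A tight cluster of LH points gives a small circle; a spread-out scatter gives a large one, which is what you are comparing between the two periods.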


B. Point pattern distributions for points with values.

These analyses quantify the autocorrelation of points based on a particular attribute value.

Spatial Autocorrelation reflects Tobler's First Law of Geography where "All things are related, but near things are more related than distant things". We will examine this using a particular numerical attribute field.

In this example we don't have many numerical fields to choose from because not many quantitative measures were taken from these ceramics. We do have rim diameter measurements.

  • Sort the Ceram_Lab2_All_ArchID table by "Diameter" and select only the rows that have values (hint: sort Descending). We could use Part:rim attribute to select these, but a few rims were too small to get diameter values for and so their value is empty.

Look at the distribution highlighted in blue. Do they look clustered or randomly distributed? Keep in mind that some points are stacked on top of each other from the Many to One stacking from earlier.

  • In Toolbox choose Spatial Statistics Tools > Analyzing Patterns > Spatial Autocorrelation
  • Choose the Ceram_Lab2_All_ArchID table
  • Choose Ceram_Lab2.Diameter for the Field
  • Check the box for Display Graphically
  • Run the function

Study the resulting box. Does this confirm your visual impression of the distributions? The text at the bottom of the box shows you the proper statistical language to use when describing these results. This test is often used as an early step in data exploration to reveal patterns in the data that might suggest further exploration.
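The statistic behind this tool is Global Moran's I. The sketch below implements it with inverse-distance weights (one of the conceptualizations the tool offers); the coordinates and rim diameters are hypothetical, and the tool additionally computes the Z score that the sketch omits:

```python
import math

def morans_i(coords, values):
    """Global Moran's I with inverse-distance weights:
    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2"""
    n = len(values)
    m = sum(values) / n
    num = 0.0
    W = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            w = 1.0 / d if d > 0 else 0.0  # coincident (stacked) points get no weight
            W += w
            num += w * (values[i] - m) * (values[j] - m)
    den = sum((v - m) ** 2 for v in values)
    return (n / W) * (num / den)

# Hypothetical rim diameters at four point locations: similar values
# sit near each other, so I should come out positive (clustered).
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
diams = [12.0, 13.0, 24.0, 25.0]
print(round(morans_i(coords, diams), 3))
```

Values of I near +1 indicate that similar diameters cluster together, near -1 that dissimilar values are neighbors, and near 0 a random arrangement.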

C. Cluster / Outlier Analysis (Anselin Local Moran's I)

One final analysis displays the degree of clustering visually, and it produces standardized index values that are more easily comparable between analyses. It highlights clusters of points with values similar in magnitude, as well as clusters with heterogeneous values.

This analysis allows you to compare non-numerical values.
  • Unselect all records in all tables (go to the Selection menu and choose "Clear Selected Features").
  • In Toolbox choose Spatial Statistics Tools > Mapping Clusters > Cluster/Outlier Analysis with Rendering
  • For Input Feature choose Ceram_Lab2_All_ArchID table
  • For Input Field choose Ceram_Lab2.Period_gen (note that these are non-numerical)
  • For Output Layer save a file called Cer2_Per_Lyr
  • For Output Feature save a file called Cer2_Per (note that these file names cannot be the same. If it does not run the first time try using different file names for the Output Layer and Output Feature)
  • Run the operation

If the Red and Blue circles do not appear you may have to load the Cer2_Per_Lyr file manually from wherever you saved it.

These values are Z scores, meaning the numbers reflect standard deviations. At a 95% confidence level you cannot reject your null hypothesis (of random distribution) unless the Z score exceeds +/- 1.96 standard deviations.
More information is available in the documentation and from the tool's author in an online forum.
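The decision rule is mechanical and can be written out directly (the wording of the three outcomes below is my own, not the tool's):

```python
def interpret_z(z, critical=1.96):
    """Interpret a Z score at the 95% confidence level (critical value 1.96)."""
    if z > critical:
        return "significant clustering of similar values"
    if z < -critical:
        return "significant dispersion / dissimilar neighbors"
    return "cannot reject random distribution"

print(interpret_z(2.3))   # -> significant clustering of similar values
print(interpret_z(-0.4))  # -> cannot reject random distribution
```

A Z score of, say, 1.5 therefore falls short of significance even though it leans toward clustering.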

Which points represent which time periods? Look in the Attribute Table and see which Field contains the Periods (LIP-LH, LH, etc). The field name is probably changed to something strange. Go to layer properties > Labels and label by the Period field name. Now you can see the homogeneous and heterogeneous clusters, and where they fall.

You're encouraged to read the help files and additional reading before attempting to apply these functions in your research.