Exercises completed during class and lab hours.
See newer versions of these workshops here: Geospatial Archaeology, spring 2015
In this exercise we will learn how to create a polygon shapefile by digitizing polygons from a satellite image of the pyramids of Giza.
You just brought in raw point data from an XY text table. This is a very powerful technique for importing raw data tables from books, online data sets, and a variety of input tools like GPS units, total stations, and laser transits.
As you can see a powerful part of GIS is the ability to combine tabular data with geographical location. The Spatial Join function is very useful for combining datasets that are not otherwise compatible because they use different indexing systems or naming conventions, or they are in different languages. The common reference of spatial position brings them together.
This week's class will focus on data organization and management for GIS applications to anthropological research. Specifically, we will spend time discussing the importance of relational databases and means of organizing and querying data organized into related tables.
I will also describe approaches to indexing data for anthropological fieldwork: ID number systems at various spatial scales and methods for linking tables between ID number series.
In lab this week we will focus on a-spatial data management techniques in ArcGIS. Joining and relating tables are powerful tools, but they also create a structure that can make queries and representation more difficult.
Download Callalli Geodatabase.
Look at the data in Arcmap.
Display. Turn off ArchID_centroids.
Discuss file formats: coverages, shapefiles, GDB. Raster / GRID.
Problem 1:
Ceramics Lab results. Cleaning up the database logic.
Working at the artifact level of analysis, the ID number system is problematic because you cannot refer to a single artifact from an index field.
What are some solutions?
1. Open the Ceramics_Lab2 attribute table.
Do you now have a six digit ID#? That’s your new unique ID number that allows you to refer to individual artifacts.
2. Cut off the "diameter" measure for a more efficient design.
This is somewhat extreme "database normalization", but you can see the principle behind it.
Other selection methods. Try selecting data in the Ceram_p file (not the lab file) using Select by Attributes. The text you enter in the box is a modified SQL query. Try to select all of the Bowls from the LIP period and look at the selection on the map.
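If, for example, the vessel form and time period are stored in fields called Form and Period (these field names are a guess; check your own attribute table), the query might read something like:
[Form] = 'Bowl' AND [Period] = 'LIP'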
Problem 2:
Show the relationship between the size of the archaeological site (m2) and the proportion of obsidian artifacts in the Lab2 analysis.
The basic problem here is that there are MANY records for each site. In order to represent the data on the map we need to collapse the contents of the Many table into per-site units so the One table can symbolize it.
Steps:
1. Problem. You’ve explored the data in Site_A and in Lithics_Lab1 and you recognize that you have to collapse the values in Lithics_Lab1 before you can show them on the map. You can collapse table values with the Pivot Tables tool and then by Summarizing on the SITEID column.
First, make sure there are no rows selected in the "Lithics_Lab1" table.
Open the Pivot Tables function in the toolbox under Data Management > Tables > Pivot Tables
2. Pivoting the data table. In the Pivot Tables box select “Lithics_Lab1” as the input table. This is the table with the quantity of data we want to reduce. We are effectively reshaping the original table so that the values of one field become columns.
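In the dialog the parameters might look roughly like this (the Pivot Field and Value Field names below are placeholders for illustration; check your own table for the exact field names):
Input Table: Lithics_Lab1
Input Field(s): SITEID
Pivot Field: <the material-type field>
Value Field: <a quantity field, such as a count or weight>
Output Table: (accept the default)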
You can use the default filename. It should end up in your geodatabase. Have a look at the contents of the output table. Note that there are many SITEID values that are NULL. That is because some artifacts were collected outside of sites. Note, also, that the MATERIAL TYPE values have become column headings (Obs, NotObs), but there are still duplicate sites in the rows, as you can see in the SITEID column.
Look at the resulting table. Is there only one SITEID value per row? Why are there so many with a <NULL> SITEID value? Where did that data come from?
3. Representing the data. Now that the results for each site are found on a single row you can Join the table to the Site_A feature.
Look at the resulting table. Does it contain all the information you need collapsed onto rows? Look at the number of records. There were 88 in the original Site_A table, now there are fewer (or there will be if you update by scrolling down the table). Why is that? The original question asked about the relationship between site size and presence of obsidian. How are you going to measure site size?
4. Symbolizing the data. You can show these data using a number of tools. For example, for a simple map you could divide the weight of obsidian by the weight of non-obsidian per site and symbolize by the ratio of obsidian over non-obsidian.
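For example, after adding a new numeric field to the joined table you could fill it with the Field Calculator (Calculate Values...) using an expression along these lines, assuming the pivoted columns are named Obs and NotObs as above (and watching out for records where NotObs is 0):
[Obs] / [NotObs]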
Here, we will do something slightly more complex and use a pie chart to show the relationship with respect to size.
As mentioned in class, next week we will focus on methods of bringing data into GIS. Which project area do you plan to work in? We will begin acquiring data (imagery and topographic data) next week, so please decide on a project area by then.
You will need Latitude/Longitude coordinates in WGS1984 for the four corners of a bounding box around your study area. An easy way to get these values is in GoogleEarth. You'll want to pick a region about the size of a county in the US for this assignment.
Write down these four corner coordinate values. We'll use them in class next week.
Exercises: finding datasets, global coverage:
Data acquisition methods for anthropological research.
This class session will focus on means of incorporating new and existing datasets into a GIS environment.
Incorporating new and existing data into a GIS is a major obstacle to applying digital analysis methods to anthropological research. How do you acquire digital data in remote locations with poor access to electricity? How do you reconcile differences in scale between datasets?
These methods each deserve a one-quarter class of their own, and there is insufficient time to go into detail on remote sensing or total station data acquisition.
This week we will focus on the input of digital data. There are a variety of imagery and scanned map sources available on the web and in the first exercise you will georeference a scanned map of South America (7mb).
Note that this JPG map has Lat/Long coordinates on the margins of the map. These will allow us to georeference the data provided we know the coordinate system and datum that the Lat/Long graticule references. If we did not have the Lat/Long graticule we might still be able to georeference the map by matching features such as the hydrology network, major road junctions, and other landscape features to satellite imagery in a similar way. But what projection and coordinate system were they using? Notice that the lines of longitude converge towards the south. This is a projection used for displaying entire continents, known as "Azimuthal Equidistant".
Note that there is a lot of error in the result. The errors are shown as blue Error lines in Arcmap. Perhaps we should have georeferenced this map to a Conic projection. There are helpful resources online for learning about map projections.
As you'll see in the exercise below, it's much easier to georeference GoogleEarth imagery because the latitude/longitude graticule consists of right-angles. It is a flat geographic projection.
Acquiring and Georeferencing data for your study area.
Visit the area in GoogleEarth (download free GoogleEarth installer, if necessary) and georeference data using these instructions
Finally, as part of this assignment:
Incidentally, with ArcGIS 9.2 the GoogleEarth vector format (KML) is supported for direct read/write from within Arcmap.
1. Next, we will acquire topographic data for your study area from the SRTM data (CGIAR).
Instructions for converting GeoTIFF DEM data to GRID (for both SRTM and ASTER DEM data)
setnull ( [GRIDNAME] < 0 , [GRIDNAME] )
That command sets all topographic values less than 0 (below sea level) and the no-data collar area to <NULL>.
If your study area is on the edge of various tiles you can merge them into a single large GRID by using the following method:
Zoom out to show the whole study area and go to "Options..." in the Spatial Analyst, choose "Extent" tab, and scroll the list upward to "Set the Extent to the same as Display".
Return to the Spatial Analyst > Raster Calculator and do this
mosaic ( [GRIDNAME1] , [ GRIDNAME2], [GRIDNAME3] )
with as many grids as you need.
2. To get topographic data that is 30m (3x higher resolution than SRTM), try to acquire topographic data from ASTER. The problem with ASTER is that it is from individual satellite images and so there are a lot of edge issues, and also any clouds will appear as big holes in your dataset. However, ASTER is probably the best free DEM layer you'll get in less-developed nations of the world, and you also can get a 15m resolution satellite image that registers perfectly to the DEM for further analysis. It's definitely worth the time searching for ASTER imagery.
To be clear: by locating a cloud free ASTER scene you can get both 30m DEM (topography) and 15m multiband imagery from that particular scene.
ASTER DEM arrives as GeoTiff and it can be converted to GRID using the same instructions mentioned above (section B1) for SRTM data.
ASTER imagery is processed to the L1B level and delivered in a multiband format called HDF-EOS Swath. The HDF format cannot be directly imported into ArcGIS, but you can convert these files to GeoTIFF using free software following the directions posted here.
Visit the Univ of Maryland data depot and acquire a scene of Landsat TM+ imagery for your area using Map search.
Some countries serve geographic datasets online and you may download these from government-specific servers. For example, there are many datasets for regions in the United States on federal and state GIS servers and the European Union serves a variety of datasets online as well.
Researchers working in other countries will find that policies vary widely for two principal reasons. First, in the case of many less-developed countries, there may be little detailed digital data available to the public. Second, many countries have strict copyright restrictions on government data, and serving those datasets online for reuse in a GIS and in maps for publication would violate copyright laws.
In these cases, global datasets like the raster data presented above, and a few global vector data sources, are particularly valuable. Vector data at 1:1 million and 1:250,000 scale are available in the public domain from an older US Government project known as Vector Smart Map (VMAP).
The official server for VMAP data is NIMA's Raster Roam server; however, it is easier to acquire the VMAP0 and VMAP1 datasets from MapAbility.com. These data are in WGS84, decimal degrees. ArcGIS can read the VPF format directly and you can save the data out as a Geodatabase or Shapefile.
You can download the VMAP0 level data on a country by country basis, rather than in large tiles, at the Digital Chart of the World.
The less-detailed VMAP0 data is largely consistent with the ESRI World dataset that you may have on your computer. The 2006 edition of ESRI Data and Maps comes on five DVDs with the global SRTM topographic data set included.
There have been efforts to get the US Government to release the entire VMAP1 dataset because it is not technically classified (it is "limited distribution") and as an unclassified, tax-payer funded project there is an obligation to release the data under the Freedom of Information Act.
Other useful sources of free geographical data are indexed online at websites like GlobalMapper. Searching the web with terms like your geographical area and "shapefile" can sometimes turn up useful project-specific datasets served online.
These US Government data are in the public domain, but in publications and websites using these data you should include a line thanking the source. It might read something like this
"We want to thank NASA and JPL for allowing the use of their ASTER and Landsat Images."
Further discussion of database normalization and relates, overlays, buffers, and other means of working with GIS data in a relational database structure.
In class presentations.
Manipulating data through queries, overlays, and other methods becomes more complex with relational databases, but understanding how to work with these data is fundamental to organizing a project in a digital structure.
1. Linking related tables through a primary key with a spatial index
We learned that anthropological data often involves working with related tables that are linked (Joined or Related) through unique ID numbers. The theory behind the design of relational databases is known as Database Normalization.
When detailed data is held in non-spatial tables (from lab analysis, for example) it can be difficult to connect the data in these attribute tables back into physical space for analysis and mapping. Recall that we used a Primary Key (ArchID) for indexing all spatial proveniences and then a CatID key for individual artifacts within those spatial areas.
One expedient solution for assigning space to non-spatial attributes is to have a single point type feature that contains a single point for every ArchID number regardless of whether it is the location of a locus of ceramics inside of a site (a polygon inside a site polygon), an isolated projectile point (a geographical point), or a single core at a site (point inside a polygon). All ArchID numbers are reconciled in a single point feature class and therefore space can be assigned to every item in some form and all these non-comparable attribute tables can be effectively “zipped together” into space.
What about the artifacts collected from line and polygon features? How do you mix spatial location for points and polygons into a single file? For the purposes of organization, you can turn lines and polygons into centroids. Centroids are defined in a number of ways, but think of them as the center of gravity for a line or polygon. In the Callalli database the All_ArchID shapefile is a collection of point locations and centroids for all of the ArchID numbers from the Callalli area. Keep in mind that a number of items with different CatID (from lab work) can be assigned to a single ArchID (hence, a single point), and that there are a number of limitations to using centroids for analysis.
This method is a solution for dealing with the “Many to Many relationships” situation.
If you don't already have it, download, unzip, and load the data from the Callalli geodatabase.
Quick exploratory task: Make a map showing the locations of all obsidian bifaces and projectile points because we want to know if they are close to rivers.
One possible solution to this task is as follows
Look at the data organization. As this is a non-spatial table it may not appear in the Table of Contents unless you have the “Source” tab selected at the bottom of the list.
The two major problems are:
Take a look at the Attribute table for the ArchID layer. This is a derivative feature containing centroids for lines and polygons, and point locations for point features, covering every ArchID. Look in the table and notice that it’s a Point type geometry file with a single Point record for each ArchID. The “Filetype” field tells you what file type the original ArchID number belongs to, so we can relink these centroids to their original polygons if needed down the road. Also notice the PhotoURL field contains links to site photos. We’ll explore that in the future.
We need to link the table “Lithics_Lab2” with the All_ArchID points… but there are many Lab results for each ArchID, so we cannot use the JOIN function because this is a one-to-many (1:M) situation.
Instead we will use the RELATE command
- ARCHID
- Lithics_Lab2
- ARCHID
- type in “ArchID-LithLab2”
Now we need a selection in the Lithics_Lab2 table.
Another way to make or modify a selection is as follows
Your line should look like this
[Form] = 'Biface Broken' OR [Form] = 'Biface' OR [Form] = 'Proj Point Broken' OR [Form] = 'Proj Point'
You’ve got a subset of the table. Notice the number of artifacts that are selected. Now in order to see these geographically we need these features selected in the All_ArchID point layer.
The All_ArchID attribute table will appear with only those records selected that contain obsidian bifaces and projectile points. How many are there? Many fewer; what happened to the number of records highlighted in the previous selection?
Close this attribute table and you will see on the map that those point locations are highlighted. Are they close to rivers?
In this part we learned that a single point feature (All_ArchID) can be used to give geographical position to non-spatial tabular data, provided the data have some reference to spatial position (such as a unique ARCHID number). Through the use of Relates linked through a common reference table, the All_ArchID table ties different data sets together using space as a common referent. You can quickly create selections in one dataset and move that selection into other datasets. Think of the All_ArchID table as the spinal column that links different elements of the skeleton.
Note that we used two methods of querying: the first was to just Sort Ascending for a particular field and select from that group. The second was to use text-based queries to construct an explicit query. For simple selections the first method is easier, but for complex queries, non-contiguous groups, or to store the query for later reference the latter is better.
Keep in mind that All_ArchID is only centroids so they don’t have the full information of the original points, lines, and polygons. However, this single spine can be used to jump between related selections.
For example, if we Relate All_ArchID to the Site_A feature we could find out the size of the sites that have obsidian bifaces. Are the sites with obsidian predominantly large or small sites?
Let's try that. Without losing your previous selection,
- SITEID
- Site_A
- ARCHID
- call this “All_ArchID-Site_A”
These techniques, combined with Pivot Tables and Joins that you learned in labs during weeks 1 and 2, give you the basic tools to explore and manipulate data in databases organized through relationships. One of the advantages of using the ESRI Geodatabase instead of the older Shapefile format is that you can build permanent Relationships directly into the geodatabase so you don’t have to recreate them for each Arcmap MXD project.
There are still some cumbersome elements to a related-tables organization; for example, symbolizing and labeling through relates is still not easily available in Arcmap. However, with clean data organization you can move through joins and relates to produce special joins for specific map projects that require this kind of symbology.
Buffers are extremely versatile in GIS. They are commonly used for expanding or contracting a polygon by a constant value, but they can also be used for other useful functions. For example, you can use buffers to convert points and lines into polygon areas (corridors). You can use buffers to evaluate the spatial proximity of one theme, like a road, to another theme, like an archaeological site.
Exploratory question: what is the relationship between the presence of obsidian in sites and the proximity to rivers in the Callalli area?
We will use the “Obsidian by Site” table you created in Lab 2.
Recall that this table aggregated lithic locus data by site in order to compare the density of obsidian and non-obsidian by site.
There are numerous ways to calculate distances in Arcmap. We will explore two methods today.
The Obsidian_By_Site table is a-spatial so you have to Join that table to a spatial table. Let’s use All_ArchID because we know that every value in the Obsidian_By_Site table will have an entry in the All_ArchID point file (can you think of any problems with using points instead of site Polygons for this analysis?)
- SITEID
- Obsidian_By_Site table
- SITEID
Now we need to know where these features fall with respect to the hydro buffers layer underneath it. Use the toolbox Analysis Tools > Overlay > Identity… to compile all the attributes from the two layers into a point layer.
Use All_ArchID as the Input Feature and HydroBuff as the Identity feature. Click OK.
Look in the attribute table of "All_ArchID_Identity" and notice you have a DISTANCE field (distance from river, by buffer class) and Site_Sum_Obsidian. These results could be explored numerically (is there a meaningful relationship?), or they could be symbolized together.
What if you want to know the actual distance in meters between a point and the river, and not the grouped buffer class of these distances? For this we need to use the Raster data model (continuous values).
What happened? Right-click the output layer (EucDist_hydr2) and choose Zoom to Layer.
Look carefully at the classes produced. Those are JUST for visualization.
Have a careful look at the a-spatial output table (by default it's called zstat) and see if you understand the results.
Note that the Value field is ArchID. That means you can use a Join to connect All_ArchID to the zstat table and connect the output table to spatial locations through that join.
Again, these can be symbolized or explored numerically.
Continuous data is represented in the form of cellular or "gridded" data in a GIS. This data model is suitable for working with themes that consist of continuous values across space. Examples include: elevation, vegetative cover, temperature, and barometric pressure.
In archaeological applications a raster theme might include all artifacts of a particular class, such as obsidian flakes, across a space, or all the occupation areas from a particular time period. Generally the rasters are thematically simple: one attribute value is shown varying across space. Data is organized logically into layers or surfaces. With Raster data we are moving beyond the simple data management and cartographic applications of GIS into the more interesting realm of theoretical and analytical problems that you could not accomplish without a computer.
You’ve already worked with Rasters in previous labs in this class. The GoogleEarth imagery, the Landsat scene, and the SRTM elevation data are all examples of raster data.
You should write down these densities for later reference below.
- Site_A: the site boundary polygon. For the purposes of this demonstration all the sites in Site_A have a minimum density of 1 flake per square meter.
- Lithic_A: the lithic locus polygon. Here the density is determined from the C1_DENS field. Low = 3 flakes / m2, Med = 7 flakes/m2, and High = 15 flakes/m2.
These data were gathered using a Mobile GIS system. With the GPS running I walked around different densities of artifacts I perceived on the ground.
Input Features: Lithic_A
Field: C1_DENS
Output Cell Size: 1 (meter)
Output Raster: <click the Folder and go to your /callalli/ work dir, create a new folder called "Lithics", and save this raster, called "LociDens", into /callalli/lithics/>
We want to repeat this operation with the Site_A data, but as these are not lithics-specific (they are merely site boundaries) we need to prepare the Site_A file for conversion to raster.
The format is
MERGE ( [GRID1] , [GRID2] )
and so on, substituting the GRID names for GRID1 etc. In this case, GRID1 should be "Reclass of LociDens" and GRID2 is "SiteDens".
Unlike the Mosaic command you used previously, the Merge command does NOT blend the GRIDs together; the cell values remain discrete. Also, the order of the GRIDs in the list is important: GRID1 is on top of GRID2 in the output. Since Sites are spatially bigger than Loci, we want SiteDens at the bottom.
Notice that what you have is a single raster surface representing a theme "Lithic density" from different GPS derived files. The cells themselves are 1 m2 and the value of the cell is the estimated density of lithics found in that square area.
If you’re satisfied with the output we need to save this file in a good place. The output of your raster calculator command is a temporary GRID file called “Calculation”. To make this a permanent file, right-click > Make Permanent…, put the file in /callalli/lithics/, and call it “LithicDens”.
Save this file. We will work with it further in the coming week.
5. Some words about working with Raster data in ArcGIS.
GRID files are often single band, but they can be combined into a multiband GRID. Thus you could have Chert and Obsidian lithic layers separated into different GRIDs but stored together. We used the Combine Bands command in the Week 3 lab to combine Landsat bands that were stored as individual GRID files into a single larger R,G,B file. This command is an update of the GRIDSTACK command.
GRID is an internally compressed binary format that is the native raster format for ESRI products. Transformations of other raster formats that ArcGIS supports, such as TIFF, JPG, and IMAGINE (IMG), often require converting each band separately into a GRID, doing the operation (such as reprojecting to a new coordinate system/datum), and then re-combining the GRIDs.
GRID files do not always behave as you expect because GRID is an old format that doesn’t necessarily play well with the Windows file system.
Here are some hints for working with GRID files.
- keep file names short and try to avoid having any spaces in your paths. This means “My Documents” is out because there’s a space in the folder name, and there are also spaces earlier in the path as well. Often people will work in “C:\GISDATA\” or something simple like that. ArcGIS9 is better at handling spaces in GRID filenames, but it is still best to avoid this practice.
- Look at your GRID files in Windows Explorer. Browse to My Computer and locate your data and look in the folder where your raster is located. There is a folder named for the GRID, but there’s also an “info” directory and a .AUX file. The point is, this isn’t a normal file system format and you can’t move GRIDs in Windows! Use ArcCatalog to rename, copy, and delete GRID files.
- It is important to be tidy and under control when working with GRID files because they can be very large. You don’t have space to store mistakes and mix them with good data. Pay attention to where you’re saving files and as soon as possible (before finishing for the day) go back and clean up the experimental junk files and the files from intermediary steps that inevitably result from working in Spatial Analyst.
- keep GRID files segregated into their own folders whenever possible. This makes it easier to deal with GRID files later because they don’t share an info directory. You CAN move GRID files, zip them, copy them around in Windows as long as you take their enclosing folder and you don’t touch the stuff next to the info directory. If you see an info directory, don’t touch anything associated with GRIDs in that folder.
- Finally, Spatial Analyst creates a lot of Calculations and these are stored in the TEMP folder that is specified in Spatial Analyst > Options… You don’t want to leave your stuff as Calculations in this class because they will get wiped off the drive. Right Click all the Calculations you want to save and choose Make Permanent… and consider putting them in their own folder!
Ok, enough eye-candy for now. Quit ArcScene and get back to work in Arcmap...
The DEM data you acquired from SRTM can be used cartographically for representation of the land surface, but it also forms the basis for a number of analyses that you may compute.
Let's say we're designing a project where we need to ignore all slopes over 15 degrees because they are too steep for the features we're looking for. How do we delimit those areas?
1. Calculating slopes:
2. Choose Spatial Analyst > Reclassify….
3. Choose Spatial Analyst > Convert > Raster to Vector…
1. Hillshade - cartographic
Note that these data are Z values in meters in a UTM (metric) horizontal coordinate space. When you work with SRTM data, chances are it will be in decimal degrees for horizontal units but metric Z (altitude) units. Because of this inconsistency, use a Z factor of 0.00001 when producing Hillshades from a Lat/Long DEM such as your SRTM dataset: one degree is roughly 111,000 m, so a Z factor of about 1/111,000 ≈ 0.00001 puts the vertical and horizontal units on a comparable scale.
2. Add contour lines
3. Highlight and label index contours.
Cartographers usually have bold contours at regular intervals to help the eye follow the lines.
MOD( "CONTOUR" , 100) = 0
Do you understand what this query does? Modulus is the remainder after a division operation. Thus, this query selects the contours that can be evenly divided by 100.
You want it to read: [index] = [CONTOUR] for the selected rows.
You'll have two copies of Contours25. Rename one of them Contours100.
How does it look? See if you can put a MASK behind the contour in Symbol > Properties > Mask…, then set the white Halo to 1.5.
Click map to see a topo map (diagnostic ceramics theme) for the Callalli area from my dissertation.
Finally, this week the task is to use these techniques to develop an attractive topographic map for your study region using your SRTM data.
The issue of spatial sampling is more involved than we have time to go into today. For further reference on sampling in archaeology please see
Banning, E. B. (2000). The Archaeologist's Laboratory: The Analysis of Archaeological Data. Plenum Publishers, New York. Pp. 75-85.
Drennan, R. (1996). Statistics for archaeologists: A commonsense approach. Plenum Press, New York. pp.132-134.
Shennan, S. (1997). Quantifying archaeology. Edinburgh University Press, Edinburgh. Pp. 361-400.
How big should a sample size be? There is no fixed rule about proportionality of the size of samples because sample sizes are determined by the variability (standard error) of the phenomenon being sampled and the desired statistical strength of the results.
In this example we want to collect artifacts within 1 m2 area of sample points from all the archaeological sites based on area (m2) of the sites. In order to implement this we would use a GPS to guide us to these sampling locations in the field. This is a type of Cluster Sampling because you will collect all artifacts within a particular 1m2 collection location.
If we want a circular area that is 1 m2 we can either try to make a box with measuring tapes, or we can use the dog-leash method and delimit a circle that is 0.56 m in radius. Since the area of a circle is pi*r^2, then
r = sqrt (1 / 3.1415), and therefore r = 0.56 m
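(As a check: 3.1415 × 0.56² ≈ 0.99 m², close enough to 1 m² for a field collection circle.)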
Thus the most expedient way to do this is to navigate the sample locations selected below and for each location draw out a circle 56 cm in radius and collect every artifact in that circle.
Look at the value in [Shape_Area]. Because this is a Geodatabase feature (as opposed to a Shapefile) the Shape_Area field is generated automatically and always up-to-date.
Scroll quickly down the list. What are the largest and the smallest sites you see? We want to sample a little bit from the small sites, and proportionally more from the larger sites.
Here is one possible solution.
We are taking the Logarithm of what?
Note that the output is nice round integers (good for sampling) and not decimal. Wouldn't you expect decimal values from the Log calculation above? It's because the Field type was Integer so it rounds the result to whole numbers.
Note that one of them is a negative value, which isn't great, but it's a very small site so we can consider not sampling it.
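To see why a logarithm is a reasonable choice here: with a base-10 log, for example, a site 100 times larger than another gets only log10(100) = 2 more sample points, so larger sites are sampled more heavily, but not 100 times more heavily. (The exact expression used for the samp field may differ; this is just the general idea.)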
Now we can run Hawth's tools Stratified Sampling methods using the samp values.
Look at the "Sites_Rand1" attribute table and notice that each record has an ArchID value linking it to the site # in question. This could be very useful when you're actually navigating to these sampling locations. This will obviously be a one-to-many relationship with many samples from each site.
Zoom in to a single large site and look at the results. Is this a sample you'd like to collect? It would involve revisiting many places so maybe you would choose to sample from fewer sites instead based on further information about those sites.
There is a tool for mobile GIS work in Arcpad that allows you to construct a similar sampling strategy while you're still in the field. Thus you could do a cluster sampling method immediately while you're still at the site and not after-the-fact as we're doing here. In this case, you'd navigate to these point locations with GPS and dog-leash out a circle that is 56cm in radius and collect the contents.
A similar approach could be used, for example, to sample from the types of plants being grown in mixed-cultivation farm plots. The study of Iraqi mortality since 2003 that we discussed also used cluster sampling methods: they would identify a neighborhood and then visit every household in that neighborhood. They used GPS in the 2004 study, but they used main streets and side streets to perform the sampling in the 2006 study, which opened them up to criticism regarding sampling bias due to the dangerousness of main streets.
See Shennan (1997: 382-385) for more information on applying these methods to archaeology.
There are two basic categories of measures of point pattern dispersal.
In an example of this kind of analysis we might ask: is there spatial structure to the concentrations of Late Intermediate period pottery in the Callalli area?
The first basic problem is that we collected sherds into a variety of spatial proveniences: ceramic point locations, sites, ceramic loci, even lithic loci in some cases. This is where the All_ArchID layer comes in handy. You'll recall that All_ArchID is a point layer showing the centroid of ALL spatial proveniences from our survey regardless of the file type. This is like the master record or the spinal column that joins all the various tables. However, we still have a one-to-many relationship issue with respect to getting point locations for every sherd. There are many sherds in one site, for example, and only one centroid to join the sherd locations to in order to conduct analysis.
One way to solve this is to reverse-engineer a MANY-to-ONE join. This is not something that is done naturally in Arcmap because we're effectively stacking points on top of points, which is a bad way to organize your data. However for these particular analyses it can be useful.
Why did we do that? Recall that we started with Ceramics data so this way we're ONLY including ceramics point locations.
We've now created a point layer that has one point for every diagnostic sherd we analyzed from this region. This occurred because of the directionality of the join. We broke the 1:Many cardinality by reversing it, so we got a ton of ONES and no MANYS. In other words, all the MANYS are their own ONES, each with a point location. We can now look into the point location patterns.
Look at the two circles. What are these depicting? It is also possible to assign weights to these measures. The "Mean Center" tool, in the same area of Arctoolbox, will show you the mean center based on particular numeric values.
These analyses quantify the autocorrelation of points based on a particular attribute value.
Spatial Autocorrelation reflects Tobler's First Law of Geography where "All things are related, but near things are more related than distant things". We will examine this using a particular numerical attribute field.
In this example we don't have many numerical fields to choose from because not many quantitative measures were taken from these ceramics. We do have diameters from rim diameter measurements.
Look at the distribution highlighted in blue. Do they look clustered or randomly distributed? Keep in mind that some points are stacked on top of each other from the Many to One stacking from earlier.
Study the resulting box. Does this confirm your visual impression of the distributions? The text at the bottom of the box shows you the proper statistical language to use when describing these results. This test is often used as an early step in data exploration to reveal patterns in the data that might suggest further exploration.
One final analysis will display the degree of clustering visually and it produces standardized index values that are more easily comparable between analyses. This analysis displays clusters of points with values similar in magnitude, and it displays the clusters with heterogeneous values.
This analysis allows you to compare non-numerical values
If the Red and Blue circles do not appear you may have to load the Cer2_Per_Lyr file manually from wherever you saved it.
These values are Z scores, which means that the numbers reflect standard deviations. At a 95% confidence level you cannot reject your null hypothesis (of random distribution) unless the Z scores are in excess of +/- 1.96 standard deviations.
More information is available in the documentation and by the author of the geostatistics tool in an online forum.
Which points represent which time periods? Look in the Attribute Table and see which Field contains the Periods (LIP-LH, LH, etc). The field name is probably changed to something strange. Go to layer properties > Labels and label by the Period field name. Now you can see the homogeneous and heterogeneous clusters, and where they fall.
You're encouraged to read the help files and additional reading before attempting to apply these functions in your research.
Comparing point patterns
This exercise will introduce functions found in the Geostatistical Analyst toolbar.
Introduction
The lab is organized around the following question in the Callalli data:
We know that obsidian is found about 15 km to the southwest of the area, but that chert cobbles are found in the local geology and the cobbles can be found in streambeds. We noted during fieldwork that chert artifacts are commonly found along the banks of the streams where there appeared to have been chert tool manufacture occurring. Obsidian appeared to be primarily located in discrete deposits that were imported from the source region.
On the principal of least effort we might expect that larger artifacts, particularly cores and concentrations of flakes, would be encountered close to the rivers in the case of chert but not in the case of obsidian. We could structure this in terms of hypothesis testing (H0: no relationships between weight of artifacts and proximity to streams; H1: concentrations of heavy chert artifacts are found in proximity to streams). However, a visual exploration of the relationship is also informative.
The Lithics_Lab2_All_ArchID table is constructed the same way as we created the Ceram_Lab2_All_ArchID last week. It is a de-normalized table where points were created for every artifact from our lab analysis using the best spatial provenience available.
Note that there are 390 artifact rows.
Look at the Attribute Table and observe the types of data involved. We have projectile points and other lithic tools, and a small sample of flakes. It turns out these flakes are only from a couple of sites, so let's exclude them. Also, we only want to work with Chert and Obsidian, so first we will select out the Chert non-flakes using a Definition Query.
Your query should look something like
("Lit_Mat" = 'Chert' AND "Form" <> 'Flake Complete') and ("Lit_Mat" = 'Chert' AND "Form" <> 'Flake Broken')
Build it yourself so you remember how… don't just copy that text!
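Incidentally, once you have built it, note that the query above is logically equivalent to the more compact form:
"Lit_Mat" = 'Chert' AND "Form" <> 'Flake Complete' AND "Form" <> 'Flake Broken'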
Note that we can now work with subsets of that single table without altering the original table. How many records are found in each Attribute Table?
B. Spatially explore the table values by Frequency.
First, turn on the Geostatistical Analyst. This takes two steps
We have Coinciding Samples… Choose “Include All”.
Why do we have spatially coinciding samples?
Note that you can interactively select by
- spatial position in the map (using the blue selection arrow)
- attribute characteristics from the table
- frequency (histogram) of values in Explore window
This tool is very useful for data exploration and pattern recognition.
This display also allows you to include histograms as well as values like Count, Mean, St. Dev., etc. on your map layout (using the Add to Layout button) where you can also export them to PDF for use in Word documents, etc.
Apply the Cluster/Outlier Analysis function that we used in lab last week
Input Feature: Chert Tools…
Input Field: Wt_g10
Output Layer File (click the Folder and name it: ChertWtLyr1)
Output Feature Class (click the Folder and name it: ChertWt1)
Look at the patterns. Zoom in and investigate the red and blue dots. These are features with statistically significant autocorrelation (positive or negative).
The values are Z scores, which means that the numbers reflect standard deviations. For example, at a 95% confidence level you cannot reject the null hypothesis (of random distribution) unless the Z scores are in excess of +/- 1.96 standard deviations.
Look at those values in the Histogram and in the Attribute table. Can you figure out why they have high/low spatial autocorrelation values?
The Geostatistical Analyst provides tools for examining these kinds of patterns in detail.
In this brief introduction we will simply produce an IDW surface using the default values and compare it visually to the known patterns.
We don't have time to explore all the features here, but this window is concerned with the directionality and magnitude of the interpolation.
This window displays the interpolation algorithm and allows you to reject specific interpolated (predicted) values that diverge from the Measured values.
Study the results. Compare the Clusters/Outliers results (red/blue dots) with the geostatistical wizard output.
It is somewhat difficult to compare the concentrations of heavy chert and obsidian artifacts from these rasters because they obscure each other.
You can compare the prediction strength in both layers
There are other powerful interpolation tools in the Geostatistical Analyst including kriging and semivariogram production. If you choose to use these methods you should research the most appropriate interpolator for your task.
The Subset function is also quite powerful in Geostatistical Analyst. It allows you to set aside a portion of your data for validation purposes. We will use this function in the coming weeks in the locational modeling exercise.
As discussed in class, predictive models range in complexity from a simple collection of landform criteria for planning efficient survey strategies, to complex logistic regression functions where the contribution of each variable is made explicit. As described in the readings, these various models can be categorized by three major elements: the types of input data, the types of output data (form of results), and the goals of the model.
A simple Rule-based survey model can be constructed by assembling various criteria and progressively delimiting the terrain included in the survey. For example, one might survey all terrain in a river valley that is less than 500m from the highest river terrace and on slopes under 15 degrees. The construction of such a model in Arcmap would involve several Spatial Analyst operations.
First, all the land below the high terrace is delimited with a polygon.
Next, the distance from that polygon can be calculated using
Toolbox > Spatial Analyst Tools > Distance > Euclidean Distance...
For a more complex approach, consider using a "cost distance" model such as Tobler's hiking function based on travel time, which can be implemented in
Toolbox > Spatial Analyst Tools > Distance > Cost Distance...
The resulting distance GRID can be reclassified into two regions using Spatial Analyst > Reclassify...: cells within the specified travel distance (or time) coded 1, and cells outside it coded 0.
Next, the land under 15 degrees of slope can be selected by reclassifying the output from a Spatial Analyst Slope calculation, again into a GRID coded 1 (under 15 degrees) and 0 (over).
Finally, these output GRIDs can be combined in raster calculator (GRID1 + GRID2) and all the places with "2" in the output are in the overlap zone between the two GRID files.
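For example, a cell that is within the travel-distance limit (1) but on a slope over 15 degrees (0) sums to 1 and is excluded; only cells scoring 1 + 1 = 2 meet both criteria and fall within the proposed survey area.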
The output might be converted to Vector (polygon) and mobile GIS can be used to guide the survey to remain within the bounds specified with the model.
Consider doing a swath of "100%" coverage in order to test the assumptions of your model. What kinds of sites were found with the 100% survey that were missed in the model survey?
Keep in mind that you are not modeling human behavior -- you are modeling the areas with high probability of encountering sites in modern contexts.
This week we will make use of a lab developed by UCSB Geography that uses a simple Rule-based predictive model based on Anabel Ford's research on the border of Belize and Guatemala. We will take the results of this model and evaluate them in JMP.
Begin by going to the UC Santa Barbara Geography 176b lab 4 page.
Go through Exercise II of Lab 4 "Developing a Predictive Model".
Note: choose a larger sample. We want at least 100 sites in the output.
At the end of the Geography lab, when you have arrived at the point with a file entitled Arch_Sites_ID_Freq, return to this page.
Now export this data to a comma-delimited text file.
Analyze > Fit Y by X (Use the SITE field for Y)
Under Analyze > Fit Model
See the discussion on this webpage for interpreting the results
We will further discuss analysis in class.
Introduction: Non-Cartesian Distance Units
Spatial analysis in anthropology is frequently based on Cartesian coordinates that describe space in terms of regular units on a grid coordinate system. The UTM system is a good example of a Cartesian coordinate system in metric units. GIS permits the calculation of distance costs in non-Cartesian units that, some have argued, are a better reflection of real perspectives and human (agent) decision-making.
Consider your commute to the University today. You may calculate it in terms of linear distance units like miles or kilometers, but you likely arrived here by any number of ways: bike, bus, walking, or private car. Furthermore, there are added complexities to some modes of travel, such as catching the proper bus or finding parking for your car.
In terms of time budgeting, you might be better off considering the commute to the university in terms of time units (hours) rather than linear distance units. In terms of a general model we could assign costs to certain features: wading across the river perhaps takes 5x longer per meter than crossing land. Bushwhacking through chaparral takes 10x longer. These types of costs can be implemented systematically in the GIS in order to convert distance measures in Cartesian space (e.g., UTM) into time. More explicit ecological approaches might convert costs into calories.
Clearly these kinds of cost-surfaces will involve many assumptions about transportation and impedance.
In this week's exercise we will perform a cost distance calculation using a customized cost function that reflects empirical data published by Tobler, 1993. Tobler's function estimates time (hours) to cross a raster cell based on slope (degrees) as calculated on a DEM.
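For reference (this form is not given in the lab data, but it is consistent with the figures used below), Tobler's hiking function is usually written as
W = 6 * exp( -3.5 * ABS( tan(slope) + 0.05 ) )
where W is walking speed in km/hr. On flat ground this gives 6 * exp(-0.175) ≈ 5.04 km/hr, matching the 5.037 km/hr figure used in the vertical factor table below.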
I have converted data provided by Tobler into a structure that ArcGIS will accept as a "Vertical Factor Table" for the Path Distance function. In other words, ArcGIS makes these sorts of calculations easy by allowing you to enter customized factors to the cost function.
First, skim over the ArcGIS 9 description of the PathDistance function.
PathDistance is an elaboration of the Cost Distance algorithm, as discussed in class.
This exercise begins by calculating distance and estimating time the old fashioned way, as you would do on a map with a ruler.
We will begin by creating an isotropic cost surface and then generating 15 min isolines on this surface.
Hint: Sort Ascending on the ArchID column, and hold down the Ctrl key while you click the left-most column in the table for those two rows.
Input Feature: All_ArchID
Output Dist Raster: Iso_dist1
<click folder and save it in the same DEM folder>
Leave the other fields blank.
- Before clicking OK click the Environments… button at the bottom of the window.
- Click the General Settings… tab and look at the different settings. Change the Output Extent field to read "Same as layer callalli_dem".
The output is a raster showing the metric distance from the two sites.
Let's assume for the purposes of this example that hiking speed is 4 km/hr regardless of the terrain, which works out to 1000m each 15 min.
How would you construct that equation in Raster Calculator to get the raster to display in units of minutes?
If you've done it right the output raster (Calculation) will range in value from
0 to 112.614
Hint: divide the raster by 66.66
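For example, assuming the distance raster from above is named Iso_dist1 and is in meters, the Raster Calculator expression could simply be
[Iso_dist1] / 66.66
since 4 km/hr = 4000 m per 60 min ≈ 66.66 m per minute, so dividing meters by 66.66 gives walking time in minutes.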
What do you think of the results of these lines?
This photo was taken facing south towards Taukamayo which lies in the background on the right side of the creek in the center-left of the photo. Taukamayo is the western-most of the two sites you just worked with in Arcmap. You can see how steep the hills are behind the site and on the left side of the photo you can see the town of Callalli for scale. Clearly the land is very steep around the site, especially in those cliff bands on the south end of the DEM. There are also cliffbands on the north side of the river but the river terrace in the foreground where the archaeologists are standing is quite flat.
We will use the ArcWorkstation GRID program for this calculation because the custom (directionally dependent) slope calculation is not supported directly in the Arcmap CostDistance or PathDistance functions.
ArcGRID dates to before the Shapefile and Geodatabase formats were used, so we must work with GRID rasters in this program.
Convert the two points to a GRID layer
Input: All_ArchID
Field: ARCHID
Output (default)
Output raster: click folder and save into /DEM/ folder. Call it "sites"
Note: For today's exercise you cannot have spaces anywhere in the path to these files.
Turn off the other layers and look closely at the two sites in the raster you just created. Do you see a square at each site?
You will see a description of the function with a breakdown of how it is issued. It looks like this with a variety of required and optional arguments.
PATHDISTANCE(<source_grid>, {cost_grid}, {surface_grid}, {horiz_factor_grid}, {horiz_factor_parm}, {vert_factor_grid}, {vert_factor_parm}, {o_backlink_grid}, {o_allocate_grid}, {max_distance}, {value_grid})
The formula used by PATHDISTANCE to calculate the total cost from cell a to cell b is:
Cost_distance = Surface_distance * Vertical_factor * ((Friction(a) * Horizontal_factor(a) + Friction(b) * Horizontal_factor(b)) / 2)
In ArcInfo GRID the PATHDISTANCE function can be used to conduct anisotropic distance calculations if we supply it with a customized Vertical Factor table with the degree slope in Column 1 and the appropriate vertical factor in Column 2. The vertical factor acts as a multiplier; multiplying by a Vertical Factor of 1 has no effect. The vertical factor table we will use was generated in Excel and reflects Tobler's Hiking Function by using the reciprocal of Tobler's function. This formula was used in Excel:
TIME (HOURS) TO CROSS 1 METER (the reciprocal of meters per hour):
=0.000166666*(EXP(3.5*(ABS(TAN(RADIANS(slope_deg))+0.05))))
Thus for a 0 degree slope, which Tobler's function computes as 5.037 km/hr, the result should be 1/5037.8 = 0.000198541.
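As a check against the table below, at a 5 degree slope tan(5°) ≈ 0.0875, so 0.000166666 * EXP(3.5 * (0.0875 + 0.05)) = 0.000166666 * EXP(0.481) ≈ 0.00027 hours per meter, which matches the 0.000269672 entry for 5 degrees.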
You can download the file that we will use for these values here.
Right-click ToblerAway.txt, choose "Save Link as..." and save it in the same folder as your GRID files (the reciprocal ToblerTowards.txt is used for travel in the opposite direction, towards the source).
An abbreviated version of this ToblerAway file looks like this
Slope (deg)    Vertical Factor
-90            -1
-80            -1
-70            2.099409721
-50            0.009064613
-30            0.001055449
-10            0.00025934
-5             0.000190035
-4             0.000178706
-3             0.000168077
-2             0.000175699
-1             0.000186775
0              0.000198541
5              0.000269672
10             0.000368021
30             0.001497754
50             0.012863298
70             2.979204206
80             -1
90             -1
on my computer this command reads: &wo c:\gis_data\anth197\dem\
Substitute "sites" for source_grid and "callalli_dem" for the DEM grid in the above function. Make sure you type it exactly! If you have an error, use the up arrow key to re-type the previous command and the left/right arrow keys to move around. Include the # signs; # tells GRID to use the default value for that argument.
Why aren't the values larger? Recall what units the ToblerAway function calculates.
We need to multiply this result by 60 because… it's in hours, and we want minutes.
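In GRID this is a one-line map algebra statement. Assuming your PATHDISTANCE output was named aniso_hours (the name used later in this exercise), it would look something like:
aniso_minutes = aniso_hours * 60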
Using the Info tool (the blue 'i') you can point-sample distances along the way.
The differences between the two surfaces are interesting
Look at the result. What are the high and low values showing?
Recall that you generated a Backlink grid called Aniso_bklk as part of your ArcGRID work. We will now use that to calculate least-cost paths between the two sites and other sites in the region.
Look at the results. Recall how Backlink Grids work (read the Help on COST SURFACES); do the values 0 through 8 make sense?
Choose four sites from ArchID for this calculation
588, 611, 811, 835
It should look something like this
[ARCHID] = 588 OR [ARCHID] =611 OR [ARCHID] = 811 OR [ARCHID] = 835
Input Feature: All_ArchID
Destination Field: ArchID
Input Cost Distance Raster: aniso_hours (use the hours not the minutes one)
Input Cost Backlink Raster: aniso_bklk
Note that the output is a raster GRID file. Wouldn't these be more useful as Polylines?
Input Raster: <the name of your cost paths file>
Field: Value
Geometry type: Polyline
Zoom in and explore the results with only the following layers visible:
Callalli_DEM, Hydro_lines100k, aniso_minutes isolines, and the cost-paths you just generated. Note how the paths avoid steep hills and choose to follow valleys, as you might do if you were planning to walk that distance.
The Lab 4 assignment (due next week at the beginning of class) involves conducting these analyses on your own dataset.
To quit GRID type "quit" at the prompt.
There are several types of viewshed analysis available in ArcGIS.
We will visit two of these types using the Callalli study area.
Instead of using the data from before, I've prepared new files because we need a larger DEM and a Sites shapefile with time periods assigned.
Download the View.Zip file and uncompress to a path with no spaces.
As with other data used in this class, this Sites dataset is based on my dissertation data set, but the attributes are not the actual data so don't compare this Sites file with what is in my dissertation.
Look at the coordinates. Is this in UTM or in Decimal Degrees? With Viewshed analysis linear units (like Meters) are necessary to accurately calculate distances.
Look at the Attribute table for Sites. You're familiar with most of these attributes from previous labs.
We'll use the "Period" field to differentiate sites from different time periods in this study.
The values are
Formative (2000BC - AD400)
MH Middle Horizon (AD400-AD1100), there are no MH sites in this dataset
LIP Late Intermediate Period: (AD1100-AD1476)
LH Late Horizon (AD1476-AD1532)
Please see the explanation on this webpage: ArcGIS Help - Viewshed
The optional viewshed parameters are SPOT, OFFSETA, OFFSETB, AZIMUTH1, AZIMUTH2, VERT1, VERT2, RADIUS1, and RADIUS2. These will be used if they are present in the observer feature attribute table.
The only two attribute variables we are concerned with at this point are OFFSETA (height of viewer) and RADIUS2 (the maximum radius of the viewshed calculation).
The viewer's eyes are at 1.5m above the ground, and we want to know the visible area within 5 km of each site.
Precision sets the total number of digits the field can hold, while Scale is the number of digits to the right of the decimal point. This allows the field to store 1.5.
Since we want to limit the radius of the view to 5 km, what value should go in this field? What are the spatial units of this map?
You can use the Calculate Values... command to fill the RADIUS2 field with the value 5000 all the way down.
Input: Colca_Aster
Observer Points: Sites
Use the defaults and allow the output to go into Temporary.
Look at the result. What is this showing? Is this informative?
This helps you to visualize what is happening.
What are some problems with this output?
If we use this quantitatively do you see any potential issues with edge effects?
To quickly resolve problems in this dataset we have to subset the data:
Can you think of another way to solve the Edge effects problem? Two simple solutions: (1) throw out samples close to the edges, (2) try to acquire a larger DEM in the first place.
What is missing from the previous analysis was a theoretical question guiding the inquiry. Obviously, you can select a single site and run a viewshed and find out the resulting area. But how is this different from just going to the site and looking around and taking photos?
You could also select a line and find the viewshed along a road, for example. However, computers can answer interesting questions by looking at general patterns with larger datasets. This is where computers give you an advantage because you're repeating many small operations.
We want to answer questions such as:
How do viewsheds change based on the type of site that is being evaluated?
Are Late Intermediate sites more or less likely to be constructed such that they can view one another?
We can calculate separate viewsheds for each group of sites in our table by manually selecting each group from the Attribute Table and running the viewshed calc.
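For example, to pull out the Formative sites you could use Select By Attributes with a query like the following (check the exact strings stored in the Period field):
[Period] = 'Formative'
and then repeat with 'LIP' and 'LH' for the other two groups.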
This is a good application for ModelBuilder, except that Batch processing is not available in ModelBuilder until version ArcGIS 9.2 and we are still using ArcGIS 9.1 in the lab.
Have a look at the output GRID Attribute table. The VALUE field indicates how many of the sites can view a given cell, and the COUNT field indicates how many cells fall into that category.
In other words, in Form5km GRID the VALUE = 3 row shows that 1716 cells are visible from 3 sites but no one cell is visible from all four Formative sites in this group.
We could calculate the number of cells in view per site, or other types of averages.
For example, in the Sites Attribute table there is a SiteType field that indicates that some of the LIP sites are Pukaras. These are hilltop fortresses that have notoriously good views. However… there's a problem. The problem lies in the biased sample we have, because sites tend to aggregate around resource patches or other features.
In these areas our dataset has high "intervisibility" because it reflects aggregation within the 5km buffer.
In other words, many sites can see a particular cell in those situations where there are many sites in a small area!
We can get around this problem by calculating viewshed for random site locations as representative of "typical" views in the area. We will then compare our site views against this typical viewshed.
Another way to calculate viewsheds is more powerful yet because it allows you to compare the result against a "typical" viewshed for the area.
In version 9.1 of ArcGIS we still need to use the command-line GRID program to accomplish this task. Thus we have to convert our files to coverage or GRID to use them in the older GRID program.
I created a Shapefile with 500 random points in Hawth's Tools (available from Arcscripts and SpatialEcology.org, as you learned previously in this class), with a minimum of 50m spacing between each point.
Download Random.zip and unzip into the same folder as the Colca DEM folder.
Look at the Attribute table and note that the OffsetA and Radius2 fields were added, as above.
We must convert the Rand500 point file to Coverage in order to use it with GRID.
Choose
Toolbox > Conversion > To Coverage > Feature Class to Coverage….
Save the coverage as "rand500_cov" into the /view/dem_colca/ folder.
That way everything is in one folder.
When it completes return to Arcmap and add the resulting RandVisib grid.
Type 'q' to quit GRID.
Now use the Spatial Analyst > Zonal Statistics... tool to answer the question: sites from which of the three time periods have the highest mean visibility with respect to the RandVisib grid?