Week 4 - Queries, relates, and buffers

Classroom Topics

Further discussion of database normalization and relates, overlays, buffers, and other means of working with GIS data in a relational database structure.

In class presentations.

In-Class Labwork, Week 4

Manipulating data through queries, overlays, and other methods become more complex with relational databases, but understanding how to work with these data is fundamental to organizing a project in digital structure.

1. Linking related tables through primary key with a spatial index

We learned that anthropological data often involves working with data in related tables that are linked (Joined or Related) through unique ID numbers. The theory behind design of relational databases is known as Database Normalization.

When detailed data is held in non-spatial tables (from lab analysis, for example) it can be difficult to connect the data in these attributes tables back into physical space for analysis and mapping. Recall that we used a Primary Key (ArchID) for indexing all spatial proveniences and then a CatID key for individual artifacts within those spatial areas.

The ArchID Centroid solution

One expedient solution for assigning space to non-spatial attributes is to have a single point type feature that contains a single point for every ArchID number regardless of whether it is the location of a locus of ceramics inside of a site (a polygon inside a site polygon), an isolated projectile point (a geographical point), or a single core at a site (point inside a polygon). All ArchID numbers are reconciled in a single point feature class and therefore space can be assigned to every item in some form and all these non-comparable attribute tables can be effectively “zipped together” into space.

What about the artifacts collected from line and polygon features? How do you mix spatial location for points and polygons into a single file? For the purposes of organization, you can turn lines and polygons into centroids. Centroids are defined in a number of ways, but think of them as the center of gravity for a line or polygon. In the Callalli database the All_ArchID shapefile is a collection of point locations and centroids for all of the ArchID numbers from the Callalli area. Keep in mind that a number of items with different CatID (from lab work) can be assigned to a single ArchID (hence, a single point), and that there are a number of limitations to using centroids for analysis.

This method is a solution for dealing with the “Many to Many relationships” situation.

 

Queries and related tables in Callalli data

 

If you don't already have it, download, unzip, and load the data from the Callalli geodatabase.

Part A. Query and display relates

Quick exploratory task: Make a map showing the locations of all obsidian bifaces and projectile points because we want to know if they are close to rivers.

One possible solution to this task is as follows

  • Open Attribute Table for non-spatial table “Lithics_Lab2”

Look at the data organization. As this is a non-spatial table it may not appear in the Table of Contents unless you have the “Source” tab selected at the bottom of the list.

The two major problems are:

  1. These are non-spatial, so how do you show them on the map?
  2. The attributes are spread across two fields so how can you easily select them for display?

Take a look at the Attribute table for the ArchID layer. This is derivative feature containing centroids for lines and polygons, and point locations for all instances of each ArchID field. Look in the table and notice that it’s a Point type geometry file with a single Point record for each ArchID. The “Filetype” field tells you what file type the original ArchID number belongs to so we can relink these centroids to their original polygons if needed down the road. Also notice the PhotoURL field contains links to site photos. We’ll explore that in the future.

We need to link the table “Lithics_Lab2” with All_ArchID points… but there are many Lab results for each ArchID so we can not use the JOIN function because this is a one to many (1:M) situation.

Instead we will use the RELATE command

  • Right click All_ArchID and choose Joins and Relates… > Relates…
  • Read the description and then use these values in the Relate box:
  1. ARCHID
  2. Lithics_Lab2
  3. ARCHID
  4. type in “ArchID-LithLab2”

 

Now we need a selection in the Lithics_Lab2 table.

 

  • Open the Attribute table for Lithics_Lab2.
  • Right-click Lit_Mat field and choose Sort Ascending
  • Scroll down until you get to the Obsidian artifacts and click and drag down in the left-most column to select these records. Now all Obsidian artifacts should be highlighted. The next problem is that we only want some of the obsidian artifacts.

 

Another way to make or modify a selection is as follows

  • click Options.. at the bottom and choose “Select by Attributes”. We can use this box to do a modified SQL query of the table. In the Method field choose “Select from current selection”.
  • Double-click [Form] and notice how it pops up in the text box below. We’re going to build up an SQL query using graphical tools. Click “=” then Click “Get Unique Values”. From this list double-click “Biface Broken”. Click “OR” then click [Form] and ‘=’ again, then choose “Biface”, then repeat these steps adding “Proj Point” and “Proj Point Broken” to your selection criteria.
  • Click the Verify button. Did you pass the test?

Your line should look like this

[Form] = 'Biface Broken' OR [Form] = 'Biface' OR [Form] = 'Proj Point Broken' OR [Form] = 'Proj Point'

  • Click “Apply”

You’ve got a subset of the table. Notice the number of artifacts that are selected. Now in order to see these geographically we need these features selected in the All_ArchID point layer.

  • Click on the Options… button and go to “Related Tables…” and choose AllArchID-LithicsLab2.

The All_ArchID attribute table will appear with only those records selected that are contain obsidian bifaces and proj points. How many are there? Many fewer, what happened to the number of records highlighted in the previous selection?

Close this attribute table and you will see on the map that those point locations are highlighted. Are they close to rivers?

In this part we learned that a single point feature (All_ArchID) can be used to give geographical position to non-spatial tabular data provided that they have some reference to spatial data (such as a unique ARCHID number). Through the use of Relates and linked through a common reference table the All_ArchID table links different data sets using space as a common referent. You can quickly create selections in one dataset and move that selection into other datasets. Think of the All_ArchID table as the spinal column that links different elements of the skeleton.

Note that we used two methods of querying: the first was to just Sort Ascending for a particular field and select from that group. The second was to use text-based queries to construct an explicit query. For simple selections the first method is easier, but for complex queries, non-contiguous groups, or to store the query for later reference the latter is better.

Keep in mind that All_ArchID is only centroids so they don’t have the full information of the original points, lines, and polygons. However, this single spine can be used to jump between related selections.

For example, if we Relate All_ArchID to the Sites_A field we could find out the size of the sites that have obsidian bifaces. Are the sites with obsidian predominantly large or small sites?

 

Lets try that. Without losing your previous selection,

  • right-click All_ArchID and choose Joins and Relates…> Relates… and for relates use
  1. SITEID
  2. Site_A
  3. ARCHID
  4. call this “All_ArchID-Site_A”
  • Now right-click “AllArchID” and go to the Attribute Table.
  • At the bottom go to Options… > Related Tables… note that you now have several Relates to choose from. Choose the one to Site_A.
  • Now look at the Site_A attribute table. How many sites are selected? Notice how our selection is getting smaller as data is aggregated into these inclusive groups such as Sites.
  • Click the Selected table at the bottom of the screen. You can see that the Shape_Area field shows the m2 of these sites. A frequency distribution on site sizes can be found by right clicking the Shape_Area field and choosing “Statistics…”.

These techniques, combined with Pivot Tables and Joins that you learned in labs during weeks 1 and 2, give you the basic tools to explore and manipulate data in databases organized through relationships. One of the advantages of using the ESRI Geodatabase instead of the older Shapefile format is that you can build permanent Relationships directly into the geodatabase so you don’t have to recreate them for each Arcmap MXD project.

There are still some cumbersome elements to related tables organization, for example symbolizing and labeling through relates is still not easily available in Arcmap. However with clean data organization you can move through joins and relates to produce special joins for specific map projects that require this kind of symbology.

 

 

PART II – Buffers

Buffers are extremely versatile in GIS. They are commonly used for expanding or contracting a polygon by a constant value, however they can also be used other useful functions. For example, you can use buffers convert points and lines into polygon areas (corridors). You can use buffers to evaluate spatial proximity from one theme, like a road, on another theme, like an archaeological site.

Exploratory question: what is the relationship between the presence of obsidian in sites and the proximity to rivers in the Callalli area?

We will use the “Obsidian by Site” table you created in Lab 2.

Recall that this table aggregated lithic locus data to by site in order to compare the density of obsidian and non-obsidian by site.

There are numerous ways to calculate distances in Arcmap. We will explore two methods today.

1. Multiple Buffers

  • Buffer the rivers by opening to the red Toolbox icon choosing Analysis Tools > Proximity > Multiple Ring Buffers…
  • Input Features: choose rivers in “Hydro_lines100k”
  • Output Feature Class: hydrobuff (specific your output location by clicking the Folder icon)
  • For distances use 25, 50, 250, 500, 5000 (click the Plus between each one).
  • Click Ok.

The Obsidian_By_Site table is a-spatial so you have to Join that table to a spatial table. Let’s use All_ArchID because we know that every value in the Obsidian_By_Site table will have an entry in the All_ArchID point file (can you think of any problems with using points instead of site Polygons for this analysis?)

  • Right-click “All_ArchID” and choose Joins and Relates…. > Joins
  1. SITEID
  2. Obsidian_By_Site table
  3. SITEID
  • Then click Advanced… and choose “Keep only matching records…” (why are we doing that step?... read the text. This is an exclusive, inner join)

Now we need to know where these features fall with respect to the hydro buffers layer underneath it. Use the toolbox Analysis Tools > Overlay > Identity… to compile all the attributes from the two layers into a point layer.

Use All_ArchID as the Input Feature and HydroBuff as the Identity feature. Click OK.

Look in the attribute table of All_ArchID_Identity” and notice you have a DISTANCE field (distance from river, by buffer class) and Site_Sum_Obsidian. These results could be explored numerically (is there a meaningful relationship?), or they could be symbolized together.

  • For example, right click “All_ArchID_Identity” and choose Symbology.
  • Show: Quantities, Graduated Colors.
  • Value: Site_Sum_Not_Obsidian
  • Normalization: Distance

Quantitative Proximity Distance

What if you want to know the actual distance in meters between a point and the river, and not the grouped buffer class of these distances? For this we need to use the Raster data model (continuous values).

 

  • In the Toolbox choose
  • Spatial Analyst Tools > Distance > Euclidean Distance…
  • Input Raster/Feature: Hydro_lines100k
  • Cellsize: 90
  • Click OK

What happened? Zoom to Layer of the output layer (EucDist_hydr2).

Look carefully at the classes produced. Those are JUST for visualization.

  • Under Spatial Analyst toolbar choose “Zonal Statistics…”
  • Zone Dataset: All_ArchID_identity
  • Zone Field: All_ArchID_ARCHID
  • Value Raster: EucDist_hydr2

Have a careful look at the a-spatial output table (by default its called zstat) and see if you understand the results.

Note that the Value field is ArchID. That means you can use a Join to connect All_ArchID to the zstat table and connect the output table to spatial locations through that join.

Again, these can be symbolized or explored numerically.