Week 2 - Data organization and management

Exercises:

Classroom Topics

This week's class will focus on data organization and management for GIS applications to anthropological research. Specifically, we will spend time discussing the importance of relational databases and means of organizing and querying data organized into related tables.

I will also describe approaches to indexing data for anthropological fieldwork. The ID number system at various spatial scales and methods for linking tables between ID number series.

 

In Class Lab Exercises, week 2

In lab this week we will focus on a-spatial data management techniques in ArcGIS. Joining and relating tables is are powerful tools, but it also creates a structure that can make queries and representation more difficult.

 

Download Callalli Geodatabase.

Look the data in Arcmap.

  • There are a few major file groups: Ceramics, Lithics, Sites
  • What is the function of these fields: ArchID, CatID, LocusID, SiteID

Display. Turn off ArchID_centroids.

  • show sites as tan, ceramics as red, lithics as lt green. Orange circles = sherds, green triangles = points.
  • hydrology = blue. Lg river, small river.

Discuss file formats: coverages, shapefiles, GDB. Raster / GRID.

 

Problem 1:

Ceramics Lab results. Cleaning up the database logic.

Working at the artifact level of analysis the ID number system is problematic because you cannot refer to a single artifact from an index field.

What are some solutions?

1. Open the Ceramics_Lab2 attribute table.

  • Click Add field… in the menu below.
  • Create a Long Integer field and call it “ArchCat”. Make a second one called “ArchCat_temp”
  • Do you see both ArchCat fields in the table? Scroll to the right.
  • Make sure no rows are selected before the next step
  • Right click the “ArchCat_temp” field title column
  • Choose “Calculate Values…” in the menu.
  • Multiply the contents of ArchID by 1000 into this field.
  • Right click the “ArchCat” field title column
  • Choose “Calculate Values…” in the menu.
  • Add the contents of “CatID” to “ArchCat_temp” into this field.

Do you now have a six digit ID#? That’s your new unique ID number that allows you to refer to individual artifacts.

2. Cut off the “diameter” measure for more efficient design

This is somewhat extreme "database normalization", but you can see the principal behind it.

  • Right click the Diameter column and sort Descending.
  • Select in blue the rows with values > 0
  • Click Export… and save to a new table called Rim_Diameters
  • Look at the new table. Where is your ArchCat field? It’s back on the right side. The goal here is to have a very efficient database structure with little redundancy and minimal empty cells.
  • Open the toolbox (red toolbox in the top toolbar) and go to Data Management Tools > Fields > Delete Fields…
  • Choose the RimDiameters table.
  • Select All, then uncheck “Diameters” and “ArchCat” (at the end of the list).
  • Go to the Ceramics_Lab2 table and delete the Diameters field.
  • Right click the Ceramics_Lab2 table and click “Joins and Relates… > Join” and join the table to Ceramics table to the RimDiameters table. Think about the fields that you will use to join. Is this a One-One or a One-Many link?

Other selection methods. Try selecting data in the Ceram_p (not the lab file) file using Select by Attributes. The text you enter in the box is a modified SQL query. Try to select all of the Bowls from the LIP period and look at the selection on the map.

 

Problem 2:

Show relationship between size the archaeological site (m2) and proportion of obsidian artifacts in the lab2 analysis.

The basic problem here is that there are MANY fields in each site. In order to represent the data on the many site we need to collapse the contents of the Many table into units so the One table can symbolize it.

Steps:

1. Problem. You’ve explored the data in Site_A and in Lithics_Lab1 and you recognize that you have to collapse the values in Lithics_Lab1 before you can show them on the map. You can collapse table values with a tool called Pivot Tables and then by Summarizing on the SiteID column.

First, make sure there are no row selected in the "Lithics_Lab1" table.

Open the Pivot Tables function in the toolbox under Data Management > Tables > Pivot Tables

2. Pivoting the data table. In the Pivot Tables box select “Lithics_Lab1” as your pivot table. This is the table that has the quantity of data we want to reduce. We are effectively swapping the rows and columns in the original table.

  • Click in the Input fields box and read the Help on the right (click Show Help button if it’s not on).
  • “Define records to be included in the pivot table”. Think about this… we can only link these lab results to space if we can tie them to spatial units. Space, in this exercise, is defined by SITE areas. Therefore we ONLY want to deal with data in terms of the site that falls into. If you have records logged by other criteria other (too many boxes checked) you would have too many rows in your pivot table and it will not mesh with the Site_A feature. Therefore select only SITEID.
  • Click in the Pivot Field box and read the help. “Generate names of the new fields”. Fields are columns in tables therefore you are choosing the field that will become your column headings. We are trying to distinguish Obsidian from non-obsidian therefore pick MATERIAL.
  • Click in the Value field and read the help. “Populate the new fields”. In other words this is the value be summed for each site per material type. Chose Peso_Orig (spanish for Weight, measured in grams).
  • The result should include the following: for each site with obsidian or non-obsidian we will add report the weight of each material in separate columns that can easily be summed.

You can use the default filename. It should end up in your geodatabase. Have a look at the contents of the output table. Note that there many SITEID values that are NULL. That is because there were artifacts collected outside of sites in some cases. Note, also that the MATERIAL TYPE values have become column headings (Obs, NotObs), but there are still duplicate sites in rows as you can see in the SITEID column.

  • We need to Summarize by the SITEID column in order to collapse these into a single line per Site. right-click the title and go to Summarize on SITEID.
  • Next, check sum in "Obsidian" and check sum in "Not Obsidian". Name the table "Obsidian_by_Site".

Look at the resulting table. Is there only one SITEID value per row? Why are there so many with a <NULL> SITEID value? Where did that data come from?

3. Representing the data. Now that the results for each site is found on a single line you can Join it to the Site_A feature.

  • Right-Click the Sites_A feature and go down to Joins and Relates… and choose Join…
  • Make sure the top field is for “Attributes from a Table”. Then jump down to the #2 question in the middle and choose your output Pivot Table from Step 2. Next, select the appropriate fields in fields #1 and #3 on which to base a join. You should be able to join ARCHID in the Site table with SITEID in the pivot table.
  • Click “Advanced…” and read the different descriptions. This distinction is also known as the “Outer Join” vs “Inner Join” relationship. Choose “matching records” (i.e., inner join, or exclusive rather than inclusive join).

Look at the resulting table. Does it contain all the information you need collapsed onto rows? Look at the number of records. There were 88 in the original Site_A table, now there are fewer (or there will be if you update by scrolling down the table). Why is that? The original question asked about the relationship between site size and presence of obsidian. How are you going to measure site size?

4. Symbolizing the data. You can show these data using a number of tools. For example, for a simple map you could divide the weight of obsidian by the weight of non-obsidian per site and symbolize by the ratio of obsidian over non-obsidian.

Here, we will do something slightly more complex and use a pie chart to show the relationship with respect to size.

  • Go to Sites > Properties > Symbology…
  • Choose Show > Charts… > Pie
  • Choose the bottom two fields on the Field Selection and click the right arrow.
  • Change the colors of the two fields you selected to two contrasting fields.
  • Click OK and look at the result.
  • What about the size of the site? Click Site_A Properties > Symbolize … again
  • Click the Size… button at the bottom center. Choose “Vary size using a Field…” and choose “Shape_Area” as the Size.. field and “Log” as the Normalize field.

Zoom in and out and notice how the symbols change as you pan around. Zoom way in. It looks better that way, doesn't it?

Experiment with other ways of symbolizing these data.

Save this project, we will work on it more later.

Homework

As mentioned in class, next week we will focus on methods of bringing data into GIS. Which project area do you plan to work in? We will begin constructing acquiring data (imagery and topographic data) next week so please decide on a project area by then.

You will need Latitude/Longitude coordinates in WGS1984 for the four corners of a bounding box around your study area. An easy way to get these values in in GoogleEarth. You'll want to pick a region about the size of a County in the US for this assignment.

Write down these four corner coordinate values. We'll use them in class next week.