Version 4.1.6.2
Pivot Tables are extremely useful, and have been credited as one of the reasons Excel replaced Lotus 1-2-3 as the spreadsheet of choice. But they are also very confusing to most spreadsheet users. The reason is that they are very flexible and powerful, which means that it is easy to get confused. The term pivot is actually a clue to how you can use them. Imagine a table of summary data where you can rotate the table so that rows become columns, depending on your needs. That is what the pivot part refers to.
The other thing you want to pay attention to here is that the power of pivot tables comes from using raw data, not summarized data. As an example, the biostatistics data we used in the previous tutorials was raw data. You had an individual row for each observation, and multiple measurements were made for each individual, covering a variety of variables. But for this analysis I want to use another data set, which I found at the University of South Florida. It contains fictional data of sales at a company, and this is the canonical example of using pivot tables. Note that each row of the data set is for an individual order.
The rules for using pivot tables are not too bad, but pay attention:
- You cannot have any empty rows or columns. Sometimes people insert blanks for formatting purposes, but if you wish to use pivot tables they must be removed. The reason for this is that Calc searches in all four directions from the cell you selected to locate your data, and if it encounters a completely blank row or column it treats that as a signal of the end of the data.
- Always select only one cell when starting the pivot table. The program will automatically infer the whole range. If you select more than one cell it will assume you are putting in a list, and the sorting will be mixed up.
- The data must have simple linear structure, i.e., the normal form. For instance means you cannot have data divided into different columns that is essentially the same data. In our data set, we would not want to have sales for the Northeast region in one column, sales for the South in another, etc.
Also, while it is possible to use external data sources for this kind of analysis, I will stick to doing this with data that is already in a spreadsheet.
Creating the Pivot Table
Click on a cell somewhere within your data, then go to Data–>Pivot Table–>Create. You should get a window like this:
Note that along the right side of this window all of your fields are listed. If for some reason you do not get all of your fields, highlight all of the columns you want before you create the Pivot Table, and that should take care of it. This window is called the DataPilot dialog
Table Fields
To set your different field in the Pivot Table you just drag-and-drop each field to the appropriate area in the window.
- Page Fields – This is a place to potentially limit the data to one value in one of your columns. For instance, if we look at the Region field we may want to look individually at each region, so putting this field in Page Fields will let us do this.
- Data Fields – This area must contain at least one variable. Fields in this area are aggregated, and the obvious candidate for that in our sample data set is the Total field that records total sales on each order. The idea is that you are going to add up the sales for some set of values to be defined in the Row and Column fields.
- Column Fields – Whatever you put here will be a column in the resulting Pivot Table. As a general rule, if you have two possible fields to use, make one a column field and one a row field.
- Row fields – And these will be rows in the resulting Pivot Table
Here is an example. It was created with Region as the Page Field, Total as the Data Field, and Rep and Item as the row fields. No column fields were used in this example.
With a little practice you can see how to use these fields. The Page field allows you to select one, several, or all categories of the selected field. Since looking at regions is a pretty reasonable thing to do, it makes sense to use this. Total is the only field that you would want to aggregate. By this I mean that you are adding together all of the individual orders that fit the classification. As an example, looking at Representative Andrews we see a total of $298.65 in Pencil sales, but looking at the raw data we see that this came from three separate orders. I don’t see any other field here where you want to do that kind of calculation. Note also that you could have gotten this answer laboriously by going through the orders, sorting them, and adding up the individual orders to get the totals, but the Pivot Table does this almost effortlessly.
The remaining question is about Row and/or Column fields. The fields that are suitable for this kind of thing are in one sense similar to the Page Field options. You will note that an old friend of ours has returned, and that is the distinction between qualitative and quantitative variables. Data Fields will pretty much need to be quantitative variables, and we only have two here, Total and Units. Either one can be used in some types of analysis, but that depends on the rest of your fields as to which makes sense. Given the Page Field and Row Fields we already chose, Units would make no sense because the numbers are not comparable. How do you add the number of pencils to the number of binders and get a meaningful total? You can’t. But that is partly because we started out to compare sales by region anyway. But Page Fields, Row Fields, and Column Fields will all be qualitative. We have three good candidates here, Region, Rep, and Item, and we used all of them in our analysis. But they could be interchanged depending on the analysis we want to focus on. As to whether they should Rows or Columns, that is simply a matter of presentation. For example, here is basically the same Pivot Table except that Item is now a column field:
This is probably better than the first Pivot Table example for two reason. First, it is definitely easier to read. But the second reason is that it allows you to get sub-totals by both of your Fields. You can see total sales for each Rep, and total sales for each Item, by putting one in each area of the layout. That is a good thing to remember.
The final option here, creating a Pivot Table with both Rep and Item as Column Fields, produces a Pivot Table that is very wide and just about unreadable.
Changing the Layout
Suppose you put fields in the wrong area, and did not realize it until after you create the Pivot Table? That is an easy fix. Note that in the top left cell (A1 in my sample spreadsheet) it has the word “Filter”. If you right-click on that cell a menu of options opens, and the very first item in this menu is “Edit Layout…”. If you click on this the DataPilot window will open again, and you change things simply by dragging fields to where you want them. You can drag a field form one layout area to another, you can add a field you didn’t have before, or you can remove a field by dragging it outside of the layout area. So you have a lot of flexibility here.
The sample spreadsheet I used for this tutorial can be downloaded from here.
Listen to the audio version of this post on Hacker Public Radio!