Until last Friday, it had been almost a full decade since I last launched a copy of Microsoft Access. I spent a fair amount of my undergraduate career getting familiar with it and the concepts of relational databases, and SQL, but once I started working with full-featured database servers like Oracle and Microsoft SQL Server, Access just looked like a cute little toy piece of software that I had simply outgrown.
But thanks to the need for some quick and dirty data analysis at work last week, I have a new respect for Access 2007 and one of its coolest features: The Pivot Chart.
The Background
One of our systems at work runs a bunch of SQL queries on a schedule – either every x minutes or at a specific time of the day – and then gives us the results in an easy to use format. We’ve got well over 100 different queries setup this way – some of them run as frequently as every four minutes, and some that are complex enough that they more than a few seconds to finish.
If one of the scheduled queries takes longer than 30 seconds to finish, the application server considers it a database timeout, proceeds to terminate the process, and then emails me the portion of its log file containing the ID number of the query and date and time of when the error occured.
The Problem
I get a varying number of error messages in my inbox throughout the day, and eventually I was able to make the following simple observations:
- I saw error messages for one particular query, #110, a lot more than for the others.
- Errors often appeared in clusters. So, if I got an error for one query, it was not surprising to have it followed up by a series of other errors in the following minutes.
- There were hardly any errors during the evenings or weekends.
- Most of them seemed to happen in the late morning and mid-afternoon.
So, after collecting hundreds of error messages in my email for the last few months, I finally opened up Access and created what is probably my most simple database table ever. It has just two columns: One for the error ID number, and the other for the date/time stamp of when the error happened. I then took some time to dump in everything up until the beginning of April, giving me a good sample to work with.
First Up: The PivotTable
One of Microsoft Excel’s best features is the PivotTable – the surprisingly powerful tool that allows you to view and summarize data in new ways. Microsoft later added PivotTable support to MS Access, including the 2003 and 2007 versions.
After getting my data in, I switched my table into PivotTable view and started rotating my columns between various positions. The end result looked a little like this:
Now, I don’t know about you, but this doesn’t mean a whole lot to me. What it actually does, is give you the total count for each of the queries we’ve gotten errors for, but it’s pretty hard to tell by looking at this data.
Enter the PivotChart
Thankfully, Microsoft has a companion tool to the PivotTable: The PivotChart. The PivotChart, simply, lets you visualize a PivotTable. Because we humans are visual creatures, this gives us a better chance of understanding the data and or spotting trends or patterns.
For example, this is what my PivotTable from above looks likes like when I show it as a bar chart:
That’s a little more meaningful, right? In fact, the result is an almost classic power law curve. While this chart is interesting, and confirmed my theory that query #110 was the biggest problem child, it didn’t really tell me anything new.
Digging Deeper with Date/Time
In the case of my little project here, it turns out that having a Date/Time column provides a ton more opportunities to slice and dice my data. Because there are so many different ways we think about time (days, hours, seconds, minutes, weeks, months, etc.), being able to visual my errors against different representations of it creates some interesting possibilities.
Error Trends Over Time
The next chart I created gave me a general snapshot of the total number of error message received for each day, starting on April 1st. I also added a trend line so I could know – not guess – if we’re generally trending up or down in the number of errors we get.
This one confirmed my thought that a lot less errors are generated on the weekends, but also gave me a new insight: Mondays (and to a lesser extent, Fridays) tend to see more errors than other days.
Charting Error Trends Throughout the Day
Next, I drilled down further to test my assumptions about the times of the day where we see the most errors:
I thought this one was pretty interesting. You can see that we have high rates of errors in the late morning and after lunch – the noon hour is suspiciously lower. Combine this knowledge with low after-hours and weekend error rates, and it builds a pretty strong case that the number is correlated with high loads on our database server as our employees and our customers use our applications.
Charting Micro Trends
Finally, I charted the distribution of errors by minute throughout the hour. As soon as you see this one, you can immediately spot some obvious patterns:
Based on the timing of the errors as shown here, I went back and took a look at the schedules our queries are set to run on. Query #110, the one that has almost as many errors as all the others combined is set to run every 3 minutes – 3 times more often than any other query. Almost all of the other queries that have frequent errors are set to run every 10 minutes. So, if these get kicked off on the hour, then on the 10th, 20th, 30th minute, etc., and then wait a full 30 seconds before timing out, it would explain the patterns we see above: high numbers of errors on the first and second minute of every 10 minute cycle.
I took my last chart one step further and broke out the errors for each minute by query type, represented by a different color. I don’t have a legend for you, but as a reference, query #110 is represented by the dark purple color:
This view of the data lines up quite nicely with our query schedule, providing some supporting evidence.
That’s Great. Now What?
By experimenting with our data using the PivotChart feature, we were able to discover some trends and patterns that weren’t obvious just by looking at the stream of text. But, the real question is, what do we do now, armed with new knowledge?
I can think of a wide range of options, ordered here by most easy to implement to most difficult to implement:
- Reschedule Queries: I can experiment by changing the timing of some of the frequently run queries so that they fire during odd minutes, hopefully leveling out the peaks and valleys of the “errors grouped by minute” chart.
- Increase the Database Timeout Period: Currently, a query is killed if it runs for more than 30 seconds. We can try increasing this value by small amounts to see how it impacts the number of errors we see.
- Optimize Problem Queries: I can load up each of the queries, examine its query execution plan, and see if there are opportunities for further performance optimization.
- Reexamine Our Task Scheduler: I’m not intimately familiar with the software that runs our tasks, as I didn’t program it, but our developers might be able to revisit the code that schedules and runs these queries to see if there are ways to run them more efficiently.
- Increase Resources on the Database Server: As a last resort, I might be able to use these charts as evidence that we need to take steps to increase memory and/or CPU power on our SQL Server in order to handle our peak load.
Wrapping it Up
So, I hope you see the power available to you in the copy of Access you probably already have installed on your PC. PivotTables and PivotCharts do take some time to wrap your head around, as they’re not exactly intuitive, they might be worth a look if you have a sudden need (or desire) for some quick and dirty data analysis.
One reply on “Guerrilla Data Analysis using Pivot Charts in Microsoft Access 2007”
wow . . . I just finished some classes in Access and there was not much interest in PivotCharts/Tables.