
photo credit: therefromhere
I am a big fan of Google Spreadsheets, they give you most of the features of a desktop “office” based spreadsheet but your data is stored online, no more need to carry your data on a USB stick.
I’m also a huge fan of databases, SQL is a fantastic tool for querying data.
Now you can combine the best of both worlds: store your data in a spreadsheet and then summarise the data using SQL.
I have uploaded a sample spreadsheet that shows this in action here.
As I’m training for the Edinburgh Marathon (feel free to donate here : http://www.justgiving.com/26miles4chas) I wanted a way to log my data and also summarise it the way that I wanted.
Data Entry
All data should be entered in the sheet entitled “Logger”, simply copy the last row with data, paste it on the row below and then enter your values. The cells in grey contain formula so should not be changed.
All data should be entered in KM, I also have a miles to KM converter on the page if you prefer imperial measures. Once you have logged your runs, you will want to see how they are summarised over the day, week, month or year.
Using SQL
When I was trying to get this to work I could not find an example that worked that’s why I’m writing this.
The basic syntax of a SQL statement is:
SELECT “the columns you want to see”
FROM “wherever the data is stored”
WHERE “you filter the rows of data to show”
GROUP BY “if you want to summarise the data”
ORDER BY “if you want your results sorted”
There are a couple of differences when you are using the Google Query syntax:
You don’t need the “FROM” clause in the query it’s self, it’s passed into the function as a parameter.
There’s an additional “LABEL” clause that allows you to change the column headings in your output.
Using The Query Function
A simple example would be:
=Query(Logger!A1:P999, “select C, sum(E) group by C order by C label sum(E) ‘Distance KM’”)
The first parameter defines the data that you will be querying (the from clause).
The second parameter defines the query, in this case, showing the week and the total KM run in that week.
Note you must always refer to the column letters, even if you have put titles in row 1.
These query statements can get quite long and hard to read. What I like to do is to store the queries in a separate workbook “Queries” and then refer to them:
=QUERY(Logger!A1:P999,Queries!B2)
Feel free to take a copy of my spreadsheet and put it to good use!
Follow me on Twitter 









This is really interesting and has me thinking about a need it can fill for me. Any idea if one can do CROSSTAB queries in this way? For example, if I have 6 teams in a hockey league, and for each game there could be three outcomes: win, lose, draw. Can I set up a query that will list the six teams down the left column, list the three outcomes across the top and count the number of W, L, D for each team and fill in the table?
Thanks, again for this!
Show me a sample of your source data and I’ll prototype this for you.
That’s a very nice offer. A little background: I’m the statistician for a local wiffle ball league, and I’m trying to get the whole process onto google docs and off MS Access. So here’s a link to the current working Google Spreadsheet:
https://spreadsheets.google.com/pub?key=0Apgq3e_CQsPCdHVsenhLMGJTaFdaUldQMldMQlBXUFE&hl=en&output=html
The ABs tab is the dummy data. Queries Tab has the three queries I have set up and they work, including the Batting query which uses the Pivot command. The three middle tabs are the query results. So I think I can get what I need so far, which is a tabulation of hits, outs, etc, grouped by batter.
What I need to do now is pull this data into another Sheet or another Google Spreadsheet, so I can do post processing to calculate batting averages, slugging averages, etc, and then sort the data to show the batting leaders.
Any tips, thoughts, advice would be very kind. Thanks for the offer.
Being based in Scotland I’m not too familiar with the stats or even the rules of Wiffle Ball.
What I would say is, any sheet in a spreadsheet can be used as the source of a query:
Sheet1 contains your raw data.
Sheet2 contains the results of a query on Sheet1
Sheet3 contains the results of a query on Sheet2
etc
Does that help?
Ross -
This is an excellent tutorial. Sharing the spreadsheet enabled your short tutorial to have so much more impact than just describing how to do the query. This is an outstanding example of the power of Google docs and learning.
Bless you. 10 minutes after reading this post, I accomplished a QUERY in a Google Docs spreadsheet that I had unsuccessfully spent 2 hours trying to do using their online documentation. It took me a bit of trial and error to discover the ability to create multi-line cells in a Google Docs spreadsheet using (Apple key – return) at the end of each line, but doing so ends all of the nonsense with punctuation marks to parse multi-part queries. Thanks again for posting your explanation and a public copy of your spreadsheet.
thanks very much for this – I’ve been looking everywhere for examples of a query that hides the label in a query result. Hopefully this will help other people on the same quest
My problem query was
=query(‘spreadsheet’!A$1:C$1501,”select sum(C) where B=’New Visitors’ and (A=’”&B4&”‘ or A=’”&B4&”/’) “)
which was creating “sum” and then the sum of C beneath it.
I tried sticking various flavours of “label” near the SUM, with no luck.
What I missed was that the label bit needs to be right at the end
=query(‘spreadsheet’!A$1:C$1501,”select sum(C) where B=’New Visitors’ and (A=’”&B4&”‘ or A=’”&B4&”/’) label sum(C) ””)
thankyou!