
photo credit: therefromhere
I am a big fan of Google Spreadsheets, they give you most of the features of a desktop “office” based spreadsheet but your data is stored online, no more need to carry your data on a USB stick.
I’m also a huge fan of databases, SQL is a fantastic tool for querying data.
Now you can combine the best of both worlds: store your data in a spreadsheet and then summarise the data using SQL.
I have uploaded a sample spreadsheet that shows this in action here.
As I’m training for the Edinburgh Marathon (feel free to donate here : http://www.justgiving.com/26miles4chas) I wanted a way to log my data and also summarise it the way that I wanted.
Data Entry
All data should be entered in the sheet entitled “Logger”, simply copy the last row with data, paste it on the row below and then enter your values. The cells in grey contain formula so should not be changed.
All data should be entered in KM, I also have a miles to KM converter on the page if you prefer imperial measures. Once you have logged your runs, you will want to see how they are summarised over the day, week, month or year.
Using SQL
When I was trying to get this to work I could not find an example that worked that’s why I’m writing this.
The basic syntax of a SQL statement is:
SELECT “the columns you want to see”
FROM “wherever the data is stored”
WHERE “you filter the rows of data to show”
GROUP BY “if you want to summarise the data”
ORDER BY “if you want your results sorted”
There are a couple of differences when you are using the Google Query syntax:
You don’t need the “FROM” clause in the query it’s self, it’s passed into the function as a parameter.
There’s an additional “LABEL” clause that allows you to change the column headings in your output.
Using The Query Function
A simple example would be:
=Query(Logger!A1:P999, “select C, sum(E) group by C order by C label sum(E) ‘Distance KM’”)
The first parameter defines the data that you will be querying (the from clause).
The second parameter defines the query, in this case, showing the week and the total KM run in that week.
Note you must always refer to the column letters, even if you have put titles in row 1.
These query statements can get quite long and hard to read. What I like to do is to store the queries in a separate workbook “Queries” and then refer to them:
=QUERY(Logger!A1:P999,Queries!B2)
Feel free to take a copy of my spreadsheet and put it to good use!
This is really interesting and has me thinking about a need it can fill for me. Any idea if one can do CROSSTAB queries in this way? For example, if I have 6 teams in a hockey league, and for each game there could be three outcomes: win, lose, draw. Can I set up a query that will list the six teams down the left column, list the three outcomes across the top and count the number of W, L, D for each team and fill in the table?
Thanks, again for this!
Show me a sample of your source data and I’ll prototype this for you.
That’s a very nice offer. A little background: I’m the statistician for a local wiffle ball league, and I’m trying to get the whole process onto google docs and off MS Access. So here’s a link to the current working Google Spreadsheet:
https://spreadsheets.google.com/pub?key=0Apgq3e_CQsPCdHVsenhLMGJTaFdaUldQMldMQlBXUFE&hl=en&output=html
The ABs tab is the dummy data. Queries Tab has the three queries I have set up and they work, including the Batting query which uses the Pivot command. The three middle tabs are the query results. So I think I can get what I need so far, which is a tabulation of hits, outs, etc, grouped by batter.
What I need to do now is pull this data into another Sheet or another Google Spreadsheet, so I can do post processing to calculate batting averages, slugging averages, etc, and then sort the data to show the batting leaders.
Any tips, thoughts, advice would be very kind. Thanks for the offer.
Being based in Scotland I’m not too familiar with the stats or even the rules of Wiffle Ball.
What I would say is, any sheet in a spreadsheet can be used as the source of a query:
Sheet1 contains your raw data.
Sheet2 contains the results of a query on Sheet1
Sheet3 contains the results of a query on Sheet2
etc
Does that help?
Ross -
This is an excellent tutorial. Sharing the spreadsheet enabled your short tutorial to have so much more impact than just describing how to do the query. This is an outstanding example of the power of Google docs and learning.
Bless you. 10 minutes after reading this post, I accomplished a QUERY in a Google Docs spreadsheet that I had unsuccessfully spent 2 hours trying to do using their online documentation. It took me a bit of trial and error to discover the ability to create multi-line cells in a Google Docs spreadsheet using (Apple key – return) at the end of each line, but doing so ends all of the nonsense with punctuation marks to parse multi-part queries. Thanks again for posting your explanation and a public copy of your spreadsheet.
thanks very much for this – I’ve been looking everywhere for examples of a query that hides the label in a query result. Hopefully this will help other people on the same quest
My problem query was
=query(‘spreadsheet’!A$1:C$1501,”select sum(C) where B=’New Visitors’ and (A=’”&B4&”‘ or A=’”&B4&”/’) “)
which was creating “sum” and then the sum of C beneath it.
I tried sticking various flavours of “label” near the SUM, with no luck.
What I missed was that the label bit needs to be right at the end
=query(‘spreadsheet’!A$1:C$1501,”select sum(C) where B=’New Visitors’ and (A=’”&B4&”‘ or A=’”&B4&”/’) label sum(C) ””)
thankyou!
What about this problem in reverse? …using Google Spreadsheets (pivot tables) to do analysis and dynamic charting of SQL data. Any advice on a direction?
When you say “SQL Data” do you mean data held in a relational DB?
If so – I’m not aware of any way of querying data that is not already “in” Google.
Looks like there is a service now https://developers.google.com/apps-script/service_jdbc
What if I want to import Google Form to the Database instead. Would that work, too?
I’m not “really” a developer now so I’m not the best person to answer that question, maybe Matt has an opinion?
Based on the API calls in the documentation in his comment I would suggest it’s possible but, as to “how” to code it ……
Do you think it would be possible to do the opposite as well?
For example, I’d like to pump an SQL query results into a Google Spreadsheet so that I can use Google Charts and have them update automatically for reporting.
Does anyone know if this exists already?
If you have a look at the documentation that Matt links to above that shows you the API calls to access access a JDBC database. I don’t have a database at the moment that i can test this with unfortunately.
Have you been able to use the substring function in Google SQL queries? If so, can you provide an example?
I am trying to select only certain text from a column of my data.
Not able to test it just now but rather than using a SQL substring; add a new column and do the substring on the sheet it’s self.
thank you for posting an actual worksheet to work with.
i’ve been having issues referencing a ‘where’ as a dynamic cell.
i’m trying to create a large db for all my service invoice data and be able to call it based on client and date.
any help would be most appreciated.
Here is what I did, I have the query defined in one cell and then use a function to modify it.
”
select *
where Col1 > date ’2008-01-01′
order by Col1 desc
limit **Limit**
”
Then
”
=substitute(B2, “**Limit**”, Dates!$B$4)
”
Where B2 is the cell with the query and Dates!$B$4 is my dynamic value
In my query I then refer to the result of the substitution.
This allows me to edit the contents of B4 and my results redraw !
Please watch out for those darned “smart-quotes” re-type them in your editor !
Thank you. You just saved me hours of wasted time.