SQL Aggregate Functions
In today's podcast episode (which can be played by clicking the PLAY button above) you'll be learning all about the aggregate functions that exist in SQL.
What the heck is an aggregate function? Well that's what I'm going to try and teach you today, and I promise, it's not a difficult concept to grasp. Just think of an aggregate function as a method that you're calling that will process data in your database and return a value. Obviously the returned value will depend on which of the aggregate functions you choose to use.
So that begs an obvious question, what are the aggregate functions that we can use in SQL? I'm glad you asked, here's the ones that I use all the time in MySQL:
Okay, great! So now we know what the names of these functions are, now let's see some examples of them in use!
Well, before we jump into what all these functions do, we need to have a table to test all this stuff on right? So let's create a table that just holds some simple data points:
create table data_points ( data_points_id int(11) auto_increment primary key, data_point int ); insert into data_points (data_point) value (1); insert into data_points (data_point) value (2); insert into data_points (data_point) value (3); insert into data_points (data_point) value (4); insert into data_points (data_point) value (5); insert into data_points (data_point) value (6); insert into data_points (data_point) value (7); insert into data_points (data_point) value (8);
Sweet! Now we've got a table that will hold some simple data points (in the form of
ints). Now let's talk about our aggregate functions.
What this particular function does is to identify the maximum value in a give set of values (in this case our set of values is whatever we're choosing to select from in our database table).
So, given that we have our
data_points table, let's try and see what the
MAX value is in that particular table:
select max(data_point) from data_points;
When we run this code, we get one value returned, the value
8. This makes sense, as the number 8 is the largest value we have stored in the table. Pretty straight forward right?
So that should go without saying that we can find the minimum value just as easily by running this code:
select min(data_point) from data_points;
And as you would guess, we get the value
1 returned when we run the script above.
Piece of cake right? Well, let's get a little bit more complex. What if our table had more than just numeric values? What if our table had
varchar values in a particular column? Let's test that scenario by adding a
varchar_data_point column to our table. We'll just write a script to
alter the existing table to add an additional column:
alter table data_points add column varchar_data_point varchar(20);
This is a new piece of code, we haven't seen how to edit a table just yet. All we've seen is how to create and delete tables. Well it's pretty straight forward to edit a table, you just tell it what table you want to edit, then tell it what you want to add or remove from the table.
But let's get back on track and focus on putting some
varchar values in the table and run our
MAX aggregate function on it.
insert into data_points (varchar_data_point) value ('Trevor'); insert into data_points (varchar_data_point) value ('Bob'); insert into data_points (varchar_data_point) value ('Zoolander'); insert into data_points (varchar_data_point) value ('Eric'); insert into data_points (varchar_data_point) value ('Christina'); insert into data_points (varchar_data_point) value ('Vanessa');
Okay, so now we've got some real data in the
varchar_data_point column. So let's take a look at what happens when we try to find the max value:
select max(varchar_data_point) from data_points;
When we run the above script, the result is
'Zoolander'. You see, SQL realizes that this particular column is of type
varchar so it changes the algorithm for finding the maximum value by first sorting the values in alphabetical order and picking the last one in the list.
So then we can probably figure out what we'll get when we look for the
MIN value right?
select min(varchar_data_point) from data_points;
This query returns
'Bob', which makes sense given the fact that it'll sort the column alphabetically.
Alright, so we've got a good understanding of how to use SQL Max and SQL Min, now let's look at the other functions.
SQL Sum and SQL Avg
These two functions are also very easy to understand. Performing a SQL Sum just means it's going to add up all the values in the column that you give to it, likewise the SQL Avg function will add up all the values and then divide by the number of values added.
So let's take a look at the syntax for running these functions:
select sum(data_point) from data_points;
Unsurprisingly, the result of this is
36, which is just the addition of each of the data points that we inserted at the beginning of this post (the numbers 1 through 8).
Let's take a look at finding the average of all these numbers:
select avg(data_point) from data_points;
When you run the average, all it does is divide the sum by the number of values, which is 36 divided by 8. So the resulting value is
4.5000, piece of cake!
The only special thing to note here is that in MySQL, it only keeps 4 digits of accuracy in the fraction part of the number. If you want to change the accuracy of the returned value, you can tweak it with the
cast() function. Here's an article on it if you're interested: www.w3resource.com/sql/aggregate-functions/avg-decimal-places-using-cast-within-avg.php
Also, one last thing to note is that if you don't specify a column with numeric values, you'll likely just get the value of
0 returned to you… as there were no numbers to sum up.
This brings us to the final aggregate function that we'll talk about today. The SQL Count function is quite useful, as it can quickly count the number of rows that exist in a table (either specifically by column name, or even as generic as ALL of the columns in the table).
Here's an example of counting the entries of JUST one specific column in our table:
select count(data_point) from data_points;
This query will return the value of
8. Now if you had a keen eye, you may have guessed that it would return a value of
14 because we actually have 14 rows in total in our
data_points table. Eight of those rows have values in the
data_point column and six have values in the
varchar_data_point column (for a total of 14). But, I assure you that it will return
8 as the count.
So why is this?
It's because we specifically asked it to count the
data_point column by specifying
select count(data_point)! Now if we changed it to count the
varchar_data_point values, what would it return?
select count(varchar_data_point) from data_points;
As expected, it returns
6, since there are only six varchar data points in our table.
Great, so now how do we count ALL the rows in the table? We use an asterisk in place of our column name like so:
select count(*) from data_points;
This query will count and return the TOTAL number of rows in the table, which in this case is
Now it's your turn to play around with aggregate functions
Be sure to use all the scripts I've included throughout this post and create your own table with data. Mess around with the data and play with the aggregate functions to see if you can get it to do anything unexpected. If you find any unexpected results, let me know in the comments section below and we'll see if we can learn something from the experience!