meta data for this page
Aggregate functions
The aggregate functions – COUNT, SUM, AVG, MAX, MIN and LIST – don't handle NULL in the same way as ordinary functions and operators. Instead of returning NULL as soon as a NULL operand is encountered, they only take non-NULL fields into consideration while computing the outcome. That is, if you have this table:
MyTable
ID | Name | Amount |
---|---|---|
1 | John | 37 |
2 | Jack | NULL |
3 | Jim | 5 |
4 | Joe | 12 |
5 | Josh | NULL |
…the statement select sum(Amount) from MyTable returns 54, which is 37 + 5 + 12. Had all five fields been summed, the result would have been NULL. For AVG, the non-NULL fields are summed and the sum divided by the number of non-NULL fields.
There is one exception to this rule: COUNT(*) returns the count of all rows, even rows whose fields are all NULL. But COUNT(FieldName) behaves like the other aggregate functions in that it only counts rows where the specified field is not NULL.
Another thing worth knowing is that COUNT(*) and COUNT(FieldName) never return NULL: if there are no rows in the set, both functions return 0. COUNT(FieldName) also returns 0 if all FieldName fields in the set are NULL. The other aggregate functions return NULL in such cases. Be warned that SUM even returns NULL if used on an empty set, which is contrary to common logic (if there are no rows, the average, maximum and minimum are undefined, but the sum is known to be zero).
Now let's put all that knowledge in a table for your easy reference:
Table 7. Aggregate function results with different column states
Function | Results | ||
---|---|---|---|
Empty set | All-NULL set or column | Other sets or columns | |
COUNT(*) | 0 | Total number of rows. | Total number of rows. |
COUNT(Field) | 0 | 0 | Number of rows where Field is not NULL. |
MAX, MIN | NULL | NULL | Max or min value found in the column. |
SUM | NULL | NULL | Sum of non-NULL values in the column. |
AVG | NULL | NULL | Average of non-NULL values in the column. This equals SUM(Field) / COUNT(Field).a |
LISTb | NULL | NULL | Comma-separated string concatenation of non-NULL values in the column. |
a. If Field is of an integer type, AVG is always rounded towards 0. For instance, 6 non-null INT records with a sum of -11 yield an average of -1, not -2.
b. LIST was added in Firebird 2.1
The GROUP BY clause
A GROUP BY clause doesn't change the aggregate function logic described above, except that it is now applied to each group individually rather than to the result set as a whole. Suppose you have a table Employee, with fields Dept and Salary which both allow NULLs, and you run this query:
SELECT Dept, SUM(Salary) FROM Employee GROUP BY Dept
The result may look like this (the row where Dept is <null> may be at the top or bottom, depending on your Firebird version):
DEPT SUM ====== ===================== <null> 219465.19 000 266643.00 100 155262.50 110 130442.81 115 13480000.00 120 <null> 121 110000.00 123 390500.00
First notice that the people whose department is unknown (NULL) are grouped together, although you can't say that they have the same value in the Dept field. But the alternative would have been to give each of those records a “group” of their own. Not only would this possibly add a huge number of lines to the output, but it would also defeat the purpose of grouping: those lines wouldn't be aggregates, but simple SELECT Dept, Salary rows. So it makes sense to group the NULL departments by their state and the rest by their value.
Anyway, the Dept field is not what interests us most. What does the aggregate SUM column tell us? That all salaries are non-NULL, except in department 120? No. All we can say is that in every department except 120, there is at least one employee with a known salary in the database. Each department may contain NULL salaries; in department 120 all the salaries are NULL.
You can find out more by throwing in one or more COUNT() columns. For instance, if you want to know the number of NULL salaries in each group, add a column COUNT(*) – COUNT(Salary).
Counting frequencies
A GROUP BY clause can be used to report the frequencies with which values occur in a table. In that case you use the same field name several times in the query statement. Let's say you have a table TT with a column A whose contents are { 3, 8, NULL, 6, 8, -1, NULL, 3, 1 }. To get a frequencies report, you could use:
SELECT A, COUNT(A) FROM TT GROUP BY A
which would give you this result:
A COUNT ============ ============ -1 1 1 1 3 2 6 1 8 2 <null> 0
Oops – something went wrong with the NULL count, but what? Remember that COUNT(FieldName) skips all NULL fields, so with COUNT(A) the count of the <null> group can only ever be 0. Reformulate your query like this:
SELECT A, COUNT(*) FROM TT GROUP BY A
and the correct value will be returned (in casu 2).
The HAVING clause
HAVING clauses can place extra restrictions on the output rows of an aggregate query – just like WHERE clauses do in record-by-record queries. A HAVING clause can impose conditions on any output column or combination of columns, aggregate or not.
As far as NULL is concerned, the following two facts are worth knowing (and hardly surprising, I would guess):
- Rows for which the HAVING condition evaluates to NULL won't be included in the result set. (“Only true is good enough.”)
- HAVING <col> IS [NOT] NULL is a legal and often useful condition, whether <col> is aggregate or not. (But if <col> is non-aggregate, you may save the engine some work by changing HAVING to WHERE and placing the condition before the GROUP BY clause. This goes for any condition on non-aggregate columns.)
For instance, adding the following clause to the example query from the GROUP BY paragraph:
...HAVING Dept IS NOT NULL
will prevent the first row from being output, whereas this one:
...HAVING SUM(Salary) IS NOT NULL
suppresses the sixth row (the one with Dept = 120).