Using the SQL with clause to group by calculated values

So here’s a question that cropped up recently: In an SQL Select statement, can you group by a calculated field (scalar function)?

The short answer is: No.

The longer answer is: Of course you can.

It did occur to me, last week, to simply post the script that I used to achieve this but there are a number of other things going on in there and I suspect that the unannotated code would be more confusing than clarifying. It is a useful, technique, however and one that I can see myself using again, so here is a simplified version of the problem along with the solution (file1 and field names have been changed for the sake of clarity).

So here’s the question: Can you provide a report showing the latest invoice number and invoice date for each customer (some selection criteria may apply)?

Since we have a sales file for which the sh_invoice_number field is updated when the transaction is invoiced, the first pass of this report is pretty simple:

select sh_sales_customer, max(sh_invoice_number), max(sh_invoice_date)
from sales_history
group by sh_sales_customer

Now for the wrinkle.

The sh_sales_customer field is the delivery point for the sales transaction. In many case, this is the same as the invoice account but there are cases where one invoice account encompasses several delivery points. Finance folk don’t care where goods are shipped, they just want to know where to send the invoice.

We have a file, account_numbers which maps the delivery point to the invoice account, if necessary. So the script needs to be changed so that it checks this file and uses either the retrieved value or the sales_customer, depending on the results of the lookup. The With clause is your friend.

You can’t Group By the calculated field, but you can use the With clause to factor the sh_sales_customer calculation into a subquery. And this is what we end up with:

with
 customer_accounts as 
   (select dp_delivery_point as ca_delivery_point,
           case 
              when an_invoice_account is not null then an_invoice_account 
              else dp_delivery_point
              end as ca_customer
    from delivery_points
    left outer join account_numbers on an_delivery_point = dp_delivery_point)
select ca_customer, max(sh_invoice_number), max(sh_invoice_date)
from sales_history
join customer_accounts on ca_delivery_point = sh_sales_customer
group by ca_customer

The delivery_points file contains all delivery points along with shipping information.

Footnotes

1 In case any of you youngsters are feeling confused by the terminology: a file (in this context) is a table, a record is a row and a field is a column. I am aware that I have the rather bad habit of using these terms interchangably, but I’m too lazy to go back and reword this post.

Addenda

The field and file names should be reasonably self explanatory, but if you are struggling with what belongs where, here are some definitions:

TABLE_NAME            COLUMN_NAME           DATA_TYPE         LENGTH   NUMERIC_SCALE 
SALES_HISTORY         SH_SALES_CUSTOMER     CHAR                   8                -
SALES_HISTORY         SH_INVOICE_NUMBER     INTEGER                4               0 
SALES_HISTORY         SH_INVOICE_DATE       DATE                   4                -

DELIVERY_POINTS       DP_DELIVERY_POINT     CHAR                   8                -

ACCOUNT_NUMBERS       AN_INVOICE_ACCOUNT    CHAR                   8                -
ACCOUNT_NUMBERS       AN_DELIVERY_POINT     CHAR                   8                -

The real files contain much more information, of course, but I have stripped these out in order to keep things reasonably simple.