So here’s a question that cropped up recently: In an SQL Select statement, can you group by a calculated field (scalar function)?
The short answer is: No.
The longer answer is: Of course you can.
It did occur to me, last week, to simply post the script that I used to achieve this but there are a number of other things going on in there and I suspect that the unannotated code would be more confusing than clarifying. It is a useful, technique, however and one that I can see myself using again, so here is a simplified version of the problem along with the solution (file1 and field names have been changed for the sake of clarity).
So here’s the question: Can you provide a report showing the latest invoice number and invoice date for each customer (some selection criteria may apply)?
Since we have a sales file for which the
sh_invoice_number field is updated when the transaction is invoiced, the first pass of this report is pretty simple:
select sh_sales_customer, max(sh_invoice_number), max(sh_invoice_date) from sales_history group by sh_sales_customer
Now for the wrinkle.
sh_sales_customer field is the delivery point for the sales transaction. In many case, this is the same as the invoice account but there are cases where one invoice account encompasses several delivery points. Finance folk don’t care where goods are shipped, they just want to know where to send the invoice.
We have a file,
account_numbers which maps the delivery point to the invoice account, if necessary. So the script needs to be changed so that it checks this file and uses either the retrieved value or the
sales_customer, depending on the results of the lookup. The With clause is your friend.
You can’t Group By the calculated field, but you can use the With clause to factor the
sh_sales_customer calculation into a subquery. And this is what we end up with:
with customer_accounts as (select dp_delivery_point as ca_delivery_point, case when an_invoice_account is not null then an_invoice_account else dp_delivery_point end as ca_customer from delivery_points left outer join account_numbers on an_delivery_point = dp_delivery_point) select ca_customer, max(sh_invoice_number), max(sh_invoice_date) from sales_history join customer_accounts on ca_delivery_point = sh_sales_customer group by ca_customer
delivery_points file contains all delivery points along with shipping information.
1 In case any of you youngsters are feeling confused by the terminology: a file (in this context) is a table, a record is a row and a field is a column. I am aware that I have the rather bad habit of using these terms interchangably, but I’m too lazy to go back and reword this post.
The field and file names should be reasonably self explanatory, but if you are struggling with what belongs where, here are some definitions:
TABLE_NAME COLUMN_NAME DATA_TYPE LENGTH NUMERIC_SCALE SALES_HISTORY SH_SALES_CUSTOMER CHAR 8 - SALES_HISTORY SH_INVOICE_NUMBER INTEGER 4 0 SALES_HISTORY SH_INVOICE_DATE DATE 4 - DELIVERY_POINTS DP_DELIVERY_POINT CHAR 8 - ACCOUNT_NUMBERS AN_INVOICE_ACCOUNT CHAR 8 - ACCOUNT_NUMBERS AN_DELIVERY_POINT CHAR 8 -
The real files contain much more information, of course, but I have stripped these out in order to keep things reasonably simple.