Using SQL to update one file from another

With a recent interface change, I was asked to go back and fix all of the affected historical data so that it matched the new requirements. Updating a field in one file with a value from a different file is something I have done several times in the past (far too many ah-hoc queries have been launched, if truth be told) and, while you do need to take a bit of care, the approach is pretty simple.

So here is an example (some names have been changed to protect the proprietary):

update target_file upd
set target_field = (select source_field from source_file
                    where source_key_1 = substr(upd.target_key, 1, 16)
                    and digits(source_key_2) = substr(upd.target_key, 17, 5) )
where exists (select source_field from source_file
              where source_key_1 = substr(upd.target_key, 1, 16)
              and digits(source_key_2) = substr(upd.target_key, 17, 5) )

The script is pretty simple. For each record in target_file for which an associated record can be found in source_file, populate target_field with the value in source_field. Obviously, the select clauses will need to reflect the relevant keys of whatever files you happen to be using.

Inevitably, there is a gotcha: for each record in target_file that you want to update, there must be exactly one record returned by the subquery. Handling this can be split into two parts.

The first part is handled by the where exists clause which ensures that the script will only attempt to update records in target_file if there is a record in source_file with which to update it. This ensures you don’t get caught out by subqueries that return zero records.

The second part involves that the subquery returns no more than one record for each record in target_file. This, unfortunately, cannot be solved generically – you just need to be a bit careful to ensure that the subquery selection is returning unique records. If in doubt, a variation on the below SQL can be used to validate.

select source_key_1, digits(source_key_2), count(*)
from source_file
group by source_key_1, digits(source_key_2)
having count(*) > 1

If you can’t find a unique selection criteria, the distinct clause may help and, if all else fails, try arbitarily using either max() or min().

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s