Lag and Lead September 21, 2007
Posted by Duncan in Oracle, PeopleSoft, SQL.trackback
I’ve previously looked at the RANK and DENSE_RANK analytic functions, so thought I’d expand a little by mentioning LAG (and LEAD).
Analytic Functions are extensions to Oracle SQL that enable common tasks to be accomplished faster than using large blocks of SQL or PL/SQL. They are apparently ‘under review by the ANSI SQL committee for inclusion in the SQL specification’, although whether this will ever happen or not is anyones guess.
The LAG function solves (at least for those of us on Oracle) the oft-encountered issue of retrieving data from a prior row.
The standard way for dealing with this is either to have multiple ‘self-joins’ to the same table, or to just use a blank space and populate those fields needing prior row data in a subsequent piece of SQL.
The example below shows how to code SQL to resolve an issue most of us have faced at least once in our careers … you’re pulling back all the employees in JOB who’ve changed departments, but we also want to know which department they’ve come from.
i.e. | EMPLID | EFFDT | NEW_DEPT | OLD_DEPT |
SELECT *
FROM (SELECT j.emplid, j.effdt, j.deptid, LAG (j.deptid, 1, NULL) OVER (PARTITION BY j.emplid, j.empl_rcd ORDER BY j.effdt, j.effseq) old_deptid
FROM ps_job j) j1
WHERE j1.deptid <> j1.old_deptid
The above SQL has an inline view with some columns from the job record, plus the LAG function which provides us with the Deptid from the prior row. The main select just chooses those rows where the current Deptid is different from the previous Deptid.
The LAG function works as follows:
LAG (j.deptid, 1, NULL) OVER (PARTITION BY j.emplid, j.empl_rcd ORDER BY j.effdt, j.effseq)
The first field (j.deptid) is the field that you want the previous value of. The 1 is how many rows to step back, and the NULL is what value is to be used when there is no prior row. The ‘Partition By’ fields are what divides the data into ‘result sets’, in this case we only want it to look back through the job rows for one person at a time, if there are no more rows for that emplid and we encounter data for the next emplid then stop and treat them as a new result set, don’t treat it as a Deptid change. The ‘order by’ fields just tell the function how to sort the rows before searching through them.
The LEAD function only differs from LAG in that it searches for the next row, rather than the previous one.
Quick to write and quick to run …