jump to navigation

An Alternative way of writing Effdt clauses October 7, 2007

Posted by Duncan in Oracle, PeopleSoft, SQL.
trackback

I’ve covered the RANK analytic function before but neglected to mention a useful application for it, and that is for rewriting Effdt and Effseq clauses more efficiently.

The usual way to return the job row (this could be any table, I’m just picking job as a common example) that’s effective for each Employee at a specific date is probably something similar to this:

SELECT emplid, empl_rcd, effdt, effseq
  FROM ps_job j
 WHERE j.effdt =
    (SELECT MAX (j1.effdt)
       FROM ps_job j1
      WHERE j1.emplid = j.emplid
        AND j1.empl_rcd = j.empl_rcd
        AND j1.effdt <= '10/AUG/2007')
   AND j.effseq =
    (SELECT MAX (j2.effseq)
       FROM ps_job j2
      WHERE j2.emplid = j.emplid
        AND j2.empl_rcd = j.empl_rcd
        AND j2.effdt = j.effdt)

Execution Cost = 3010, time = 1 sec.

SELECT emplid, empl_rcd, effdt, effseq
  FROM (
SELECT emplid, empl_rcd, effdt, effseq,
  RANK () OVER
       (PARTITION BY emplid, empl_rcd
            ORDER BY effdt DESC
                   , effseq DESC) rnk
           FROM ps_job
          WHERE effdt <= '10/AUG/2007') j
 WHERE rnk = 1

Execution Cost = 665, time = 78 msecs.

Not only is the code shorter and easier, the database execution cost is less than a quarter of the ‘standard’ method and the execution time is less than a tenth.

Explain plans and costs are obviously variable depending upon your environment, the tables and indexes and the volume of data; however it should be clear from the example above that there are sometimes quicker ways than the ‘standard’ method.

Advertisements

Comments

1. Noons - October 8, 2007

Excellent! Thanks for the great hint on analytics.
I always suspected they could b effectively used for Peoplesoft and their quaint timeseries format, this just confirms it.

Will definitely give it a go.

2. fambaus - May 6, 2008

Excellent tip. What about in a multi join query?

3. PeopleSoft Tipster - May 9, 2008

There are no issues with using this method in a multi-join query. You’d just list the next table after the alias on the inline view like this:

SELECT J.emplid, J.empl_rcd, J.effdt, J.effseq, P.NAME
FROM (
SELECT emplid, empl_rcd, effdt, effseq,
RANK () OVER
(PARTITION BY emplid, empl_rcd
ORDER BY effdt DESC
, effseq DESC) rnk
FROM ps_job
WHERE effdt <= ‘10/AUG/2007′) j,
PS_PERSONAL_DATA P
WHERE j.rnk = 1
and p.emplid = j.emplid

(This SQL hasn’t been tested, so I may have typo’ed or missed something obvious, but hopefully the method is clear)

4. Matthew Jacobson - June 10, 2008

This is not complete information. The ‘faster execution’ only applies when you have a relatively low number of job rows in your PS_JOB table. The more rows you have the faster method A will be compared to Method B.

With 320,000 rows in PS_JOB –

cost for A is 24,020
cost for B is 40,912

I’d suggest sticking with the obvious approach rather than attempting to use fancy analytical clauses.

5. PeopleSoft Tipster - June 23, 2008

This perhaps isn’t the place to get involved in a ‘do analytics work’ debate. They give greater database performance and developer productivity, and the gain is – in general – larger the greater the volume of data analysed.

Saying that, there will obviously be exceptions, and as I said in the article title, this is an alternative – at no point did I say ‘everyone should always do it this way from now forwards’. Before deciding what’s right for you investigate all methods and chose the fastest. Most of the time, that’ll be the analytic method due to reduced passes over the data through the use of partition result sets, and hence leading to fewer queries/subqueries (cutting out self-joins particularly), but not always.

I’ve used the functions on significantly larger data volumes than 320,000 rows and they’ve saved me much time and effort.


Sorry comments are closed for this entry

%d bloggers like this: