Dimensionally Modelling A Recursive Hierarchy


Creative Commons License photo credit: gadl

I was recently asked how to include a recursive hierarchy into a dimensional model.

What do I mean by a recursive hierarchy? This is when an entity (a table) relates to it’s self. Take the following example from an Oracle database:

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
7369 ‘SMITH’ ‘CLERK’ 7902 ’17-DEC-1980′ 800 NULL 20
7499 ‘ALLEN’ ‘SALESMAN’ 7698 ’20-FEB-1981′ 1600 300 30
7521 ‘WARD’ ‘SALESMAN’ 7698 ’22-FEB-1981′ 1250 500 30
7566 ‘JONES’ ‘MANAGER’ 7839 ‘2-APR-1981’ 2975 NULL 20
7654 ‘MARTIN’ ‘SALESMAN’ 7698 ’28-SEP-1981′ 1250 1400 30
7698 ‘BLAKE’ ‘MANAGER’ 7839 ‘1-MAY-1981’ 2850 NULL 30
7782 ‘CLARK’ ‘MANAGER’ 7839 ‘9-JUN-1981’ 2450 NULL 10
7788 ‘SCOTT’ ‘ANALYST’ 7566 ’09-DEC-1982′ 3000 NULL 20
7839 ‘KING’ ‘PRESIDENT’ NULL ’17-NOV-1981′ 5000 NULL 10
7844 ‘TURNER’ ‘SALESMAN’ 7698 ‘8-SEP-1981’ 1500 0 30
7876 ‘ADAMS’ ‘CLERK’ 7788 ’12-JAN-1983′ 1100 NULL 20
7900 ‘JAMES’ ‘CLERK’ 7698 ‘3-DEC-1981’ 950 NULL 30
7902 ‘FORD’ ‘ANALYST’ 7566 ‘3-DEC-1981’ 3000 NULL 20
7934 ‘MILLER’ ‘CLERK’ 7782 ’23-JAN-1982′ 1300 NULL 10

From this we can see that there is a recursive hierarchy within this table between the empno column and the mgr column, the mgr column shows the employee that is the manager for each row.

For example:

7934-Miller is managed by 7782-Clark who in turn is managed by 7839-King who is the president.
ie:

7839-King 7566-Jones 7788-Scott 7876-Adams
. . 7902-Ford 7369-Smith
. 7698-Blake 7499-Allen
7521-Ward
7654-Martin
7844-Turner
7900-James
.
. 7782-Clark 7934-Miller .

From this source data I’m going to create an employee dimension:

EMPNO ENAME JOB MGR HIREDATE
7369 ‘SMITH’ ‘CLERK’ 7902 ’17-DEC-1980′
7499 ‘ALLEN’ ‘SALESMAN’ 7698 ’20-FEB-1981′
7521 ‘WARD’ ‘SALESMAN’ 7698 ’22-FEB-1981′
7566 ‘JONES’ ‘MANAGER’ 7839 ‘2-APR-1981’
7654 ‘MARTIN’ ‘SALESMAN’ 7698 ’28-SEP-1981′
7698 ‘BLAKE’ ‘MANAGER’ 7839 ‘1-MAY-1981’
7782 ‘CLARK’ ‘MANAGER’ 7839 ‘9-JUN-1981’
7788 ‘SCOTT’ ‘ANALYST’ 7566 ’09-DEC-1982′
7839 ‘KING’ ‘PRESIDENT’ NULL ’17-NOV-1981′
7844 ‘TURNER’ ‘SALESMAN’ 7698 ‘8-SEP-1981’
7876 ‘ADAMS’ ‘CLERK’ 7788 ’12-JAN-1983′
7900 ‘JAMES’ ‘CLERK’ 7698 ‘3-DEC-1981’
7902 ‘FORD’ ‘ANALYST’ 7566 ‘3-DEC-1981’
7934 ‘MILLER’ ‘CLERK’ 7782 ’23-JAN-1982′

And an Employee Fact:

EMPNO SAL COMM
7369 800 NULL
7499 1600 300
7521 1250 500
7566 2975 NULL
7654 1250 1400
7698 2850 NULL
7782 2450 NULL
7788 3000 NULL
7839 5000 NULL
7844 1500 0
7876 1100 NULL
7900 950 NULL
7902 3000 NULL
7934 1300 NULL

From this we can see each employees salary and commission by joining on the employee number.

What we cannot see is what 7782-Clarks salary bill is, ie, his own salary and all of his subordinates. This is what we need to model.

This is resolved by builting an intermediate “bridging” table. This bridging table is used to hold every valid permutation of the relationships. You must remember to include the reflective relationship: each rows relationship with it’s self, to get each employees “own” salary and the transitive relationship, each rows relationship with its sibling records. This would give us:

Parent Child Levels
7839 7839 0
7839 7566 1
7839 7788 2
7839 7876 3
7839 7902 2
7839 7369 3
7839 7698 1
7839 7499 2
7839 7521 2
7839 7654 2
7839 7844 2
7839 7900 2
7839 7782 1
7839 7934 2
7566 7566 0
7566 7788 1
7566 7876 2
7566 7902 1
7566 7369 2
7788 7788 0
7788 7876 1
7876 7876 0
7902 7902 0
7902 7369 1
7698 7698 0
7698 7499 1
7698 7521 1
7698 7654 1
7698 7844 1
7698 7900 1
7499 7499 0
7521 7521 0
7654 7654 0
7844 7844 0
7900 7900 0
7782 7782 0
7782 7934 1
7934 7934 0

This maps out every permutation of parent to child and the number of levels between them. The parent key will link to the dimension table and the child key will link to the fact table.

You can also add a “BottomLevel” flag to indicate the bottom level nodes if this is a reporting requirement.

The level field can be used, for example, 0 will give a direct mapping, ie show employees their own records. You could use level = 1 to see one level down, ie staff who report directly to the parent. You could ignore the level field and see all subordinate records.

All of the samples have been directly typed rather than being copied from the Oracle emp table so there is the chance of some transcription errors. If you find any please let me know and I’ll correct them.

4 thoughts on “Dimensionally Modelling A Recursive Hierarchy”

  1. Jings, this is a bit of a deep topic for a blog Ross !! It’s amazing what a bit of googling for Dimensional Modelling can turn up !!

    Take care, AB =:-)

  2. I know, a colleague asked me how this could be done and this was the easiest way to get it to him.
    If it drives more traffic to my site, all the better !

  3. There is a correction required in the emp number for clark in the below like from 7762 to 7782
    “For example:

    7934-Miller is managed by 7762[7782]-Clark who in turn is managed by 7839-King who is the president.
    ie:”

Leave a Reply