Using the Denormalized Design
Data Warehouse tables can be joined when reporting, but frequently joins are unnecessary because the information sought for reporting is already present in the main table being used for the report.
This is best illustrated with an example.
In TeamConnect, every Invoice record must have a Vendor, and a Vendor's information is contained in a Contact record. In schema terms, that means that a row in table T_INVOICE contains a value in the VENDOR_ID column, and you would use that value as a foreign key to join to the PRIMARY_KEY column in T_CONTACT. The T_CONTACT row would contain much of the information about the vendor, but several important pieces of information, such as addresses and phone numbers, would be found in other tables that are also related to T_CONTACT. If your report needed to include the vendor address, you would need to join column T_CONTACT.DEFAULT_ADDRESS_ID to column J_CONT_ADDRESS.PRIMARY_KEY, then retrieve address information from J_CONT_ADDRESS.
Contrast this with Data Warehouse. If you are reporting on table WH_INVOICE, that table already contains several pieces of information about the vendor:
- VENDOR_PRIMARY_KEY
- VENDOR_CONTACT_TYPE
- VENDOR_NAME
- VENDOR_FIRST_NAME
- VENDOR_MIDDLE_NAME
If your report requires only vendor name, then you do not need any join at all. If you need more information about the vendor, such as vendor address, you would join WH_INVOICE.VENDOR_PRIMARY_KEY to WH_CONTACT.PRIMARY_KEY. You would then have access to not just the DEFAULT_ADDRESS_ID, but also to these columns:
- ADDRESS_TYPE
- STREET
- CITY
- STATE
- COUNTY
- COUNTRY
- ZIP_CODE
So to report on invoices with just vendor name requires one join in TeamConnect and none in Data Warehouse. To report on invoices with vendor name and address requires two joins in TeamConnect and one in Data Warehouse.
But the real benefit comes when reports increase in complexity. If this invoice report also displayed vendor phone and vendor email information, that would require two additional joins in TeamConnect, for a total of four. In Data Warehouse, that information is already present in WH_CONTACT, so no new joins are necessary - only one join is required.
When might you still need to do joins in Data Warehouse? When you are seeking uncommon information that is not part of the denormalized information found in most tables. For example, if you want a vendor address other than the default address, or the name of the user who approved adding the vendor originally, you will need to join to other tables.