Data Warehouse Initial Loading and Refreshing

TeamConnect Data Warehouse initial loading is invoked manually, using a batch or shell file reserved for the initial load. A refresh is performed by scheduling the ETL job to execute at the desired refresh interval.

Initial Loading

If you have an existing Data Warehouse 4.x database, you can skip these Initial Loading steps and proceed to Refreshing the Data Warehouse.

The very first time this job runs it may require a long time to complete, because it will be loading a large volume of data from the source database to the target database. This first run should be done at a time when TeamConnect is stopped. Subsequent runs of this job will take much less time and it is not necessary to stop TeamConnect during the job. If stopping TeamConnect for a long time causes operational issues, see Initial Loading Times for a possible solution.

To perform the initial loading

  1. Navigate to the datawarehouse directory.
  2. Run one of the following files:
    • For Windows, run the following batch file:
      TeamConnect_Warehouse_initial.bat
    • For Linux, run the following shell file:
      ./TeamConnect_Warehouse_initial.sh
  3. Initial loading may fail with a Java out-of-memory error message.
    If so, for Windows, change the following line in the kitchen.bat file:
    set OPT=-Xmx1024m
    For Linux, change this line in kitchen.sh:
    javamaxmem=1024
  4. Change 1024 to a higher value such as 1536 (megabytes). Your machine must have sufficient RAM to reserve this much memory for the job. Then rerun the initial loading job.

Note: If you inadvertently run this job against an already-populated database, the custom field tables will be truncated and reloaded. This will increase the run time but will have no other harmful effects.
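The Linux change described in steps 3 and 4 can also be scripted. The sketch below edits a stand-in copy of kitchen.sh (created here so the example is self-contained); on a real install you would edit the shipped file in place, and the 1536 value is only an example:

```shell
# Demo only: create a stand-in kitchen.sh so this sketch runs anywhere.
printf 'javamaxmem=1024\n' > kitchen.sh.demo

NEW_MAX=1536   # megabytes; the machine must have this much RAM to spare
# Replace the existing javamaxmem value, keeping a .bak backup of the file.
sed -i.bak "s/^javamaxmem=[0-9]*/javamaxmem=${NEW_MAX}/" kitchen.sh.demo
grep '^javamaxmem=' kitchen.sh.demo
```

The `.bak` backup makes it easy to restore the original memory setting if the larger heap causes problems on the machine.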

How the ETL Job Works

Each transformation in the Data Warehouse ETL job uses the WH_LAST_REFRESH table to determine which records need to be part of the refresh. Each transformation reads the LAST_REFRESH_DATE from the WH_LAST_REFRESH table and compares this value to the MODIFIED_ON date of each row in the source tables. If the MODIFIED_ON date is greater than the LAST_REFRESH_DATE, the row is refreshed in the Data Warehouse.
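The date comparison the transformations perform can be illustrated with a small stand-alone sketch. The keys, dates, and file names below are fabricated for the example; the real job reads the database, not flat files:

```shell
# Sketch of the incremental-refresh test: keep only rows whose MODIFIED_ON
# is later than LAST_REFRESH_DATE. ISO-format dates compare correctly as
# strings, which is what the awk comparison below relies on.
LAST_REFRESH_DATE='2024-03-01'

# Fabricated source rows: key,MODIFIED_ON
cat > source_rows.demo <<'EOF'
101,2024-02-15
102,2024-03-05
103,2024-03-20
EOF

# Print the keys of rows that would be refreshed.
awk -F, -v last="$LAST_REFRESH_DATE" '$2 > last { print $1 }' source_rows.demo
```

Here rows 102 and 103 were modified after the last refresh, so only they would be picked up; row 101 is left alone.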

Note: For the ETL process to run, the appropriate JDBC database driver must be placed in the DW directory, e.g., DW_FOLDER/libext/JDBC/ojdbc6.jar (for Oracle).

In the very first ETL run, there won't be any values in WH_LAST_REFRESH, so every source row will be refreshed in the Data Warehouse. Subsequent runs will have much less activity than the first run.

This job produces a log file, TeamConnect_Warehouse_date_time.log, in the ./TCWH_logs subdirectory, where date and time are numbers representing the time when the job was started. This file contains process logging messages at the level of detail you chose during configuration.
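Because the date_time portion of the log name sorts lexically, the newest run can be found with a sorted listing. The sketch below creates a demo log directory and two fabricated logs so it runs standalone; on a real system the ETL job writes these files:

```shell
# Demo only: create the log directory and two sample run logs.
mkdir -p TCWH_logs
printf 'INFO start\nERROR sample failure\n' > TCWH_logs/TeamConnect_Warehouse_20240101_0900.log
printf 'INFO start\nINFO job finished\n'    > TCWH_logs/TeamConnect_Warehouse_20240102_0900.log

# The date_time part of the name sorts lexically, so the last entry in a
# sorted listing is the most recent run.
latest=$(ls TCWH_logs/TeamConnect_Warehouse_*.log | sort | tail -n 1)
echo "newest log: $latest"
if grep -q 'ERROR' "$latest"; then
    echo "errors found in newest run"
else
    echo "no errors in newest run"
fi
```

A check like this is a convenient thing to run right after a scheduled refresh to confirm the job completed cleanly.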

By default, the kettle.properties parameter WH_REFRESH_LOOKUPS is set to NO. This means that during installation the Data Warehouse tables related to categories and currencies (WH_CATEGORIES and WH_CURRENCIES) are populated with data from the source database, but those tables are not refreshed during any later ETL jobs.

If you have changed your TeamConnect design in ways that affect those tables, and you want your changes to be reflected in Data Warehouse, you must edit kettle.properties to change the parameter value to YES. Then the ETL jobs will pick up changes to those tables, and will continue to do so until you change the parameter value back to NO.

The ETL job also removes deleted records from Data Warehouse. These deletions occur after the changed rows are loaded into the warehouse. Each table in the source database has a trigger that captures the deleted record keys in a table called WH_REMOVED_RECORDS. WH_REMOVED_RECORDS is read and each corresponding key in the warehouse is removed in accordance with the relationships between the Data Warehouse tables (also referred to as cascade deletion).
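The deletion pass can be pictured as filtering the warehouse rows against the captured keys. In the sketch below, flat files stand in for WH_REMOVED_RECORDS and a warehouse table (all keys and values are fabricated); the real job issues SQL deletes and handles the cascade relationships between tables:

```shell
# Fabricated keys captured by the source-database triggers.
cat > removed_keys.demo <<'EOF'
102
104
EOF

# Fabricated warehouse rows: key,value
cat > warehouse_rows.demo <<'EOF'
101,alpha
102,beta
103,gamma
104,delta
EOF

# First pass reads the removed keys into a lookup table; second pass
# prints only the warehouse rows whose key is NOT in that table.
awk -F, 'NR==FNR { gone[$1]=1; next } !($1 in gone)' removed_keys.demo warehouse_rows.demo
```

Rows 102 and 104 drop out, mirroring how the ETL job removes warehouse records whose keys appear in WH_REMOVED_RECORDS.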

The following table lists the triggers in the source database.

Source Database Triggers

| Trigger Name                 | Table Name       |
|------------------------------|------------------|
| WH_AUDIT_REMOVAL_ACCOUNT     | T_ACCOUNT        |
| WH_AUDIT_REMOVAL_APPOINTMENT | T_APPOINTMENT    |
| WH_AUDIT_REMOVAL_CONTACT     | T_CONTACT        |
| WH_AUDIT_REMOVAL_DOCUMENT    | T_DOCUMENT       |
| WH_AUDIT_REMOVAL_EXPENSE     | T_EXPENSE        |
| WH_AUDIT_REMOVAL_HISTORY     | T_HISTORY        |
| WH_AUDIT_REMOVAL_INVOICE     | T_INVOICE        |
| WH_AUDIT_REMOVAL_INVOLVED    | T_INVOLVED       |
| WH_AUDIT_REMOVAL_LINE_ITEM   | J_INVC_LINE_ITEM |
| WH_AUDIT_REMOVAL_MILESTONE   | T_MILESTONE      |
| WH_AUDIT_REMOVAL_PROJECT     | T_PROJECT        |
| WH_AUDIT_REMOVAL_TASK        | T_TASK           |

Refreshing the Data Warehouse

First, determine whether the design of your TeamConnect source database has changed since the last refresh. This includes adding new custom objects or new custom fields. If such changes exist in the source database, you must run some extra jobs before running the actual refresh. For detailed information about what to do in this situation, see Changes in Your TeamConnect Design.

To refresh Data Warehouse

  1. Navigate to the datawarehouse directory and:
    • For Windows, run the following batch file:
      TeamConnect_Warehouse_Refresh.bat
    • For Linux, run the following shell file:
      ./TeamConnect_Warehouse_Refresh.sh
  2. For periodic refreshes, use the Windows Task Scheduler or UNIX cron to run the batch file or shell script named above on a schedule. Verify that the scheduled job starts TeamConnect_Warehouse_Refresh.bat or TeamConnect_Warehouse_Refresh.sh from the datawarehouse directory.
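On Linux, for example, a nightly 2:00 AM refresh could be scheduled with a crontab entry along these lines. The /opt/datawarehouse path is an assumption for illustration, and the cd matters because the script must start from the datawarehouse directory:

```
# m h dom mon dow  command
0 2 * * * cd /opt/datawarehouse && ./TeamConnect_Warehouse_Refresh.sh >> TCWH_logs/cron.log 2>&1
```

Redirecting the script's output to a file alongside the ETL logs makes it easier to diagnose a refresh that fails before the job's own logging starts.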