Best Practices
Reviewing these tips will make your Data Warehouse deployment simpler and more reliable.
Initial Loading Times
If your source database is very large, the time required for the initial loading may be many hours, and it may be impractical to restrict use of the source database for this long. One possible workaround for this situation is to run the initial load against a backup copy of the source database by following these steps:
- Run the installation as described in Installing and Configuring Data Warehouse. This process creates an entire warehouse schema in the target database and makes a few schema changes in the source database. It is fast and does not require exclusive use of the databases.
- Make a backup of the source database, and use it to load a separate, new database.
- Modify the kettle.properties configuration file, as described in step 3 of New Installations, so that it temporarily points to the new database as the source (a sketch follows this list).
- Run the initial load as described in Initial Loading.
- Again, modify the kettle.properties configuration file, this time pointing to the original source database.
- Run a refresh as described in Refreshing the Data Warehouse.
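The exact variable names used for database connections in kettle.properties depend on your installation, so the names below are hypothetical placeholders; use the names that actually appear in your file (see New Installations). This sketch only illustrates the kind of change involved in temporarily repointing the source connection at the copy loaded from the backup:

# Hypothetical variable names -- use the ones actually defined in your kettle.properties.
# Original source database, commented out while the initial load runs against the copy:
# SOURCE_DB_HOST=tcdb01
# SOURCE_DB_NAME=teamconnect
# Temporary source: the new database loaded from the backup.
SOURCE_DB_HOST=tcdb02
SOURCE_DB_NAME=teamconnect_copy
SOURCE_DB_USER=tc_user
SOURCE_DB_PASSWORD=********

After the initial load completes, restore the original values before running the refresh.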
Database Tuning
The run time of Data Warehouse jobs can be greatly affected by the values of certain database parameters. Check the logs for your jobs and experiment with parameter settings to see their effects on performance.
Here are some important database parameters for Oracle, including some example values for the "target" database.
- db_block_size: a large block size is recommended (16K or higher, depending on your operating system)
- db_cache_size: 1824 MB
- shared_pool_size: 256 MB
- sga_max_size: 8 GB
- undo_retention: 4 hours
- undo tablespace size: 20 GB
- sessions: 665
- processes: 600
- sort_area_size: 1048576
- open_cursors: 800
These are examples only, not requirements.
For the "source" database, example parameter values include sga_max_size of 1 GB and open_cursors value of 300.
For SQL Server, make sure the transaction log is large enough and that enough RAM is allocated to the "target" database server. For SQL Server target databases, the bulk-logged recovery model is recommended because it minimizes log space and permits high-performance bulk copy operations.
Important: Data Warehouse, when first installed, contains no database indexes at all. You should manually create indexes where appropriate to speed up your most common queries.
Changes in Your TeamConnect Design
Important: When new custom fields are handled by running one of these files, column names in some Data Warehouse views may change, and an existing column name may now refer to different data than it did previously. This is caused by the way duplicate column names (in the source database) are handled by table WH_CF_MAPPING (see Helper Table Definitions). Check for this situation and, if necessary, revise your reports to use the changed column names.
Troubleshooting
You should always check the log of a Data Warehouse job after it completes. Only the log file will tell you whether an error occurred during the run. Search the log for error messages. If an error is found, search upward for the string "opening" to determine which transformation was running when the error occurred.
Caution: If an error occurs and it is not caught quickly, Data Warehouse will contain inaccurate information that will impact the quality of your reporting.
This is an excerpt of the text at the end of a log file after a successful run:
2007/12/20 13:27:02 - Kitchen - Finished!
2007/12/20 13:27:02 - Kitchen - Start=2007/12/20 12:44:12.184, Stop=2007/12/20 13:27:02.851
2007/12/20 13:27:02 - Kitchen - Processing ended after 2570 seconds.
This is an excerpt of text showing a failed run:
2007/11/03 07:19:04 - Kitchen - Finished!
2007/11/03 07:19:04 - Kitchen - ERROR (version 2.5.2, build 252005 from 2007/10/23 19:31:44) : Finished with errors
2007/11/03 07:19:04 - Kitchen - Start=2007/11/02 15:49:33.359, Stop=2007/11/03 07:19:04.125
2007/11/03 07:19:04 - Kitchen - Processing ended after 55770 seconds.
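If you want to automate this check, the following minimal Python sketch (Python is not part of Data Warehouse; any scripting tool will do) scans a Kitchen log for error lines and reports the most recent preceding "opening" line, so you can see which transformation was running when the error occurred. Adjust the matched strings if your log format differs.

import sys

def check_kitchen_log(path):
    # Track the most recent "opening" line so an error can be traced back
    # to the transformation that was running when it occurred.
    last_opening = None
    errors_found = False
    with open(path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            text = line.strip()
            if "opening" in text.lower():
                last_opening = text
            if "ERROR" in text:
                errors_found = True
                print(f"Line {number}: {text}")
                if last_opening:
                    print(f"  Last transformation opened: {last_opening}")
    if not errors_found:
        print("No errors found.")

if __name__ == "__main__":
    # Usage: python check_kitchen_log.py <path-to-kitchen-log>
    check_kitchen_log(sys.argv[1])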