Data Warehouse FAQ
Installation and Initial Load
Do I need to be on TeamConnect 5.0 to use Data Warehouse 5.0?
Yes, you must be on TeamConnect 5.0 in order to install Data Warehouse 5.0. The installation mainly upgrades your warehouse database; it makes only minor changes to TeamConnect, such as registering the current version of the warehouse application.
Can I upgrade from Data Warehouse 4.x without re-loading all data in the warehouse?
Yes. The installation scripts will migrate your current schema. You can then skip the Initial Loading steps and proceed to Refreshing the Data Warehouse.
Can I run an initial load offline (using a backup of my TeamConnect database) and then restore the warehouse database to my production environment?
Yes. You should not run any Refresh jobs in production until you restore the warehouse database. After restoring to production, run a Refresh to bring the warehouse up-to-date. See the online documentation for steps under Initial Loading Times.
Why is data missing after initial load?
The initial load collects only records whose timestamps are earlier than the maximum modifiedOn timestamp, which is computed by scanning all records when the initial load begins. You must run a Refresh job after the initial load to collect records that were added later. Expect this first refresh to take a while, because it will collect not only new records but also any records that were changed since the initial load began.
My initial load job terminated with an error before it finished. What should I do now?
Run the initial load job again, using the same command (TeamConnect_Warehouse_initial.sh or TeamConnect_Warehouse_initial.bat). It will re-start, skipping over entities that have already completed.
What if I don’t want the initial load to resume?
Run the WH_REMOVE.bat or WH_REMOVE.sh script to delete the warehouse database. Then repeat the installation and upgrade steps.
Refresh Jobs
Why are some of the changes users made in the last few minutes before a refresh not included?
This is a consequence of how time boundaries are calculated. The warehouse applies an offset when computing the upper boundary of each refresh window. The default is zero (0) seconds; we recommend raising the offset in production environments, or anywhere data is being actively changed. An offset of 300 means that when a job starts, it subtracts 5 minutes from the job start time, so a job starting at 15:00 includes only records that were modified earlier than 14:55. The offset is a safety margin: a record that a TeamConnect user saves at, say, 14:59 might not be committed to the database until 15:01, and if the boundaries were based strictly on job start times, these last-minute updates might be missed permanently. The lower boundary of each window is the previous job’s upper boundary, so in this example -- assuming the previous refresh started at 14:00 -- the job collects only records with timestamps between 13:55 and 14:55. Records updated after 14:55 will be collected in the next refresh -- a bit later, but not forgotten.
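For those who want the arithmetic spelled out, here is a minimal shell sketch of the upper-boundary calculation. The variable names are ours for illustration; they are not used by the warehouse scripts.

    OFFSET=300                               # seconds subtracted from the job start time
    JOB_START=$(date +%s)                    # refresh job start, e.g. 15:00:00
    UPPER_BOUNDARY=$((JOB_START - OFFSET))   # e.g. 14:55:00 -- records modified at or
                                             # after this instant wait for the next refresh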
I made an update to a record soon after the last refresh but the changes weren’t included in the next refresh. Why not?
As explained above, the upper boundary (the cut-off time for changes) is the starting time of the current refresh job minus the offset. If a record was created or updated before the job started, and then modified again after the job started, its modifiedOn timestamp now exceeds the upper boundary. If the refresh had not yet collected that record, it is excluded from the current refresh. The changes will appear in the following refresh, unless the same pattern recurs.
I’m certain that the refresh job missed some records a while ago. How can I sweep those changes into the warehouse?
Edit the WH_LAST_REFRESH table. On the row that begins with "ALL", change the UPPER_REFRESH_DATE value to a time earlier than the timestamps of the records that were missed. This value determines the lower time boundary for the next refresh. Any records whose timestamps are equal to or later than this time will be re-collected and updated during the next refresh.
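For example, on an Oracle back end the change could be made from the command line as sketched below. The connection string, the timestamp, and the key column name (ENTITY_NAME) are assumptions -- check your WH_LAST_REFRESH schema for the actual column that holds the "ALL" value.

    sqlplus wh_owner@WHDB <<'EOF'
    -- Rewind the next refresh's lower boundary to a time before the missed changes
    UPDATE WH_LAST_REFRESH
       SET UPPER_REFRESH_DATE = TO_DATE('2023-01-15 08:00', 'YYYY-MM-DD HH24:MI')
     WHERE ENTITY_NAME = 'ALL';
    COMMIT;
    EOF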
How frequently should refreshes be scheduled?
It depends on how long a typical refresh takes to complete. Most users will want to ensure that a refresh finishes before the next one starts. The Data Warehouse prevents a refresh from starting while another is still in progress by skipping the new job. When this happens, any accumulated changes won’t appear in the warehouse until the following refresh. We suggest scheduling refreshes no closer than one hour apart until you’re confident of job duration. If your refreshes complete in well under an hour, you can experiment with scheduling them more frequently.
Scheduled jobs should use the refresh scripts provided (.bat or .sh files). These scripts contain additional code to check for previous jobs that are still running.
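On Linux or Unix hosts, a cron entry is one common way to schedule the refresh script. A sketch assuming hourly refreshes; the installation path, script name, and log location are illustrative, so substitute the refresh script shipped with your installation:

    # m h dom mon dow  command
    0 * * * * /opt/teamconnect/warehouse/TeamConnect_Warehouse_refresh.sh >> /var/log/wh_refresh.log 2>&1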
Why did my e-mail notification subject say “Second Refresh Stopped”?
The message means that the previous refresh job was still running. In order to prevent a collision, the refresh job was skipped (not launched).
Instead of skipping the next job when a previous refresh is still running, can the new job wait until the previous one finishes and then start?
The out-of-the-box warehouse refresh commands don’t provide functionality for waiting. However, the OOTB scripts create a lock file when a refresh job starts and delete it when the job terminates, so clients could implement their own code that polls for the existence of the lock file (or the process ID it references) before launching the next refresh, as in the sketch below.
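A minimal wrapper sketch in Bourne shell, assuming the lock file sits at a known path. Both the path and the refresh script name below are illustrative, not OOTB defaults.

    #!/bin/sh
    # Wait for any in-progress refresh to finish, then launch the next one.
    LOCK=/opt/teamconnect/warehouse/refresh.lock    # hypothetical lock-file location
    while [ -f "$LOCK" ]; do
        sleep 60                                    # poll once a minute
    done
    /opt/teamconnect/warehouse/TeamConnect_Warehouse_refresh.sh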
Auditing
How can I determine what’s taking my refresh so long?
The refresh audit table (AUDIT_TBL) shows the number of records, starting/ending times and elapsed times for all jobs. Scroll to the bottom of the table to locate the rows that took the most time. If you need more detailed information, see the logs.
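For example, a quick way to surface the slowest steps on an Oracle back end is sketched below. The column names (ENTITY_NAME, RECORD_COUNT, START_TIME, END_TIME, ELAPSED_TIME) and the connection string are assumptions -- check your AUDIT_TBL definition for the actual names.

    sqlplus wh_owner@WHDB <<'EOF'
    -- Ten audit rows with the longest elapsed times
    SELECT *
      FROM (SELECT ENTITY_NAME, RECORD_COUNT, START_TIME, END_TIME, ELAPSED_TIME
              FROM AUDIT_TBL
             ORDER BY ELAPSED_TIME DESC)
     WHERE ROWNUM <= 10;
    EOF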
How do I verify whether the warehouse is up-to-date?
Run the overall audit count comparison utility (audit[.sh|.bat]).
My audit count shows discrepancies. Should I be worried that data’s missing?
The audit count can show minor discrepancies due to changes made in the database while it runs -- for example, records added after the script begins, or records added and then deleted since the last refresh (see Known Issue DWE-2303).
Will the audit comparison tool identify which records are missing?
No, it only compares counts.
Will the audit comparison tool identify all objects whose counts don’t match?
No, it only compares main (top-level) entities. We’re considering adding counts for sub-entities in a future release.
Other
Is this release faster or slower than the previous one?
Testing to date indicates a performance improvement for initial load but not for refresh. However, we expect refresh performance to be better for large deltas because we’ve enabled much larger database commit sizes. Also, only data changed since the last refresh window is collected. Formerly, the application re-collected up to a day’s worth of previous changes, even when few records had changed.
Why can't I create certain custom fields?
Creating a custom field with a name in the format <EntityName>_ID (for example, INVOICE_ID on the Invoice object) will cause Data Warehouse to fail. Data Warehouse already uses column names of that form in the same tables that store custom fields, so a custom field named this way collides with an existing column. The conflict occurs only in Data Warehouse tables, but it means any field named this way cannot be added to the Data Warehouse.