Loading and Refreshing
**Instructions for Refreshing the Data Warehouse are at the bottom of this page.**
Initial Loading
The Initial Load, or large one-time load, is the process of loading data through the ETL for the very first time. The very first time this job runs it may require a long time to complete, because it will be loading a large volume of data from the source database to the target database.
The initial load will happen as part of the install and will not require any end-user action. After the initial load, users will be responsible for running the Refresh process.
How the ETL Job Works
The graphic below details how the CMO ETL job works. Each step in the process is explained in detail.
Replication: The first step in the ETL process is to move data from the transaction database (CMO) to replication. This is a built-in SQL Server function that runs automatically (once it is setup). The replication process is initiated by running a script to create publication and subscription, which then builds the database. Note: CDC for Replication is added after the subscription and publication are complete. CDC is available only in SQL Server Enterprise. A script, provided by the BI team will setup replication specific to the version of the Transactional database.
ETL (Extract > Transform > Load): The ETL process will be provided using SSIS, which is included with SQL Server. The purpose of this process is to move data from the Replication DB to the DW. This will be accomplished using a “staging” database to provide a separation of complexity.
Logging: Each stage of the ETL should provide logging for any type of error including invalid FKs, missing PKs, unexpected errors, and package execution information.
SSIS ETL Packages
The tables below are descriptions of the two packages which truncate the destination database and moves all ETL supported data from the source:
Package #1: Replication > Staging | Package #2: Staging > Data Warehouse |
|
|
Package #3: Replication CDC > Staging | Package #4: Staging CDC > Data Warehouse |
These two packages only move data that has been updated/added/deleted since the last ETL execution. This allows the ETL to maintain close to real-time data by reducing overall execution time from hours to minutes. | |
|
|
Refreshing the Data Warehouse
The Data Warehouse must be updated, or Refreshed, on a regular basis to ensure the information derived from it is current. The instructions in this section detail how to refresh the Data Warehouse on SQL Standard and SQL Enterprise.
Considerations:
- Must be logged into SSMS with a Windows user ID or with proper permissions to run scripts.
- For SQL Standard, run the DW Full job.
- For SQL Enterprise, run the DW CDC job.
- WARNING NOTE: Do not run the DW Full job on enterprise versions. This job is run (only) once by the installer.
- WARNING NOTE: Do not run the DW Full job on enterprise versions. This job is run (only) once by the installer.
Refreshing on SQL Server Enterprise
- In SSMS, Navigate to SQL Server and expand it.
- Expand Jobs.
- Right click on DW CDC and select Start job at Step...
- IMPORTANT NOTE: Do NOT run DW FULL when running the refresh on Enterprise. This job will be run only once, during install.
- IMPORTANT NOTE: Do NOT run DW FULL when running the refresh on Enterprise. This job will be run only once, during install.
- Click Start. Observe the progress in the popup.
- When this process is complete, there should be 2 checkmarks in the screen (shown above). Click Close when this process is complete.
If an error occurs follow the instructions in the To Monitor or review in case of errors section of these instructions.
Refreshing on SQL Server Standard
- In SSMS, navigate to SQL Server Agent and expand it.
- Expand Jobs.
- Right click on DW FULL and select Start Job at Step...
- Click Start. Observe the progress in the popup.
- When this process is complete, there should be 2 checkmarks in the screen (shown above). Click Close when this process is complete.
If an error occurs follow the instructions in the To Monitor or review in case of errors section of these instructions.
To Monitor or Review in case of Errors:
- Expand Integration Services Catalog.
- Right click SSISDB
- The default name for this folder is SSISBD, but the name is relative to what your company configures.
Navigate through the menu path: Reports > Standard Reports > select All Executions.
- The default name for this folder is SSISBD, but the name is relative to what your company configures.
- Click the Refresh button at the top of the page, as the page will not refresh automatically.