What is a Data Warehouse
Data Warehouse (DWH) is database (DB) specifically designed for extracting data and reporting. Data Warehouses are designed to gather data from data sources to perform business intelligence Analytics and reporting. The schema can follow multiple design patterns, each of which have benefits depending on the purpose of the design. Since data extraction is the main purpose of the database, the speed to extract is more important than the speed to load, which differs from a transactional database where both are necessary.
It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
Entity Relationship Diagram (ERD)
Click here to view the CMO Data Warehouse ERD.
Using Customized Terminology in CMO
Some of the terminology within the CMO web application is customizable. However, the CMO Data Warehouse does not recognize custom terms and will use the Best Practice terminology when reporting. If your company has custom terms please use the Standard Terminology for Features as a reference.
Data Warehouse User Guide
- System Requirements
- List of system requirements and prerequisites for installing the CMO Data Warehouse.
- Installation Guide
- Installation instructions for the CMO Data Warehouse.
- Removing the Data Warehouse
- Remove the CMO Data Warehouse from your system.
- Loading and Refreshing
- Instructions and details on loading and refreshing the CMO Data Warehouse.
- Commonly used SQL Scripts
- Commonly used SQL Queries to retrieve data from the CMO Data Warehouse
Helpful Data Warehouse Terms
|CMO’s software and/or name of CMO’s software’s database
|A term used for analyzing data. This can be as simple as a basic report, to interactive analytics, to predictive analysis.
|Database (DB) Schema
|The structure of a database - how data is stored and related to other data.
|Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance.
|This is the database tied to the application - it handles all transaction to/from the application.
|This database “mirrors” the transaction database, but only the selected columns/tables required within the ETL process. Its purpose is to relieve the strain on the transaction DB by offloading ETL/Report processing from the transaction to the replication DB. This is typically “real time”, but could see a slight delay depending on the size of the transactions hitting the transaction DB.
|This is an optional DB for an ETL process. For CMO’s purpose, Staging was included to separate the complexities of the CMO DB from the final ETL process to the data warehouse (which has its own set of complexities).
|Extract > Transform > Load (ETL)
|The process of moving data from one location to another. Often times the structure of the data has to be modified before reaching the destination, which is the transformation step of ETL.
|Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations
|A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse
|A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month."
|Data warehouses usually store many months or years of data. This is to support historical analysis.
|SQL Server Integration Services (SSIS)
|A tool provided by Microsoft that allows you to visually design an ETL process. There are many types of sources, transformation, destinations, logging, etc. needed by ETL processes, and SSIS allows you to better manage these complex structures in a more graphical interface.
|Change Data Capture (CDC)
|This is a tool SQL Server provides, which stores all data added, modified, and deleted from the database into special tables. This allows an ETL process to only process data that needs to be updated. Whereas truncating the destination DBs and moving all data can take hours, the ETL process can run in a matter of minutes depending on the number of changes that occurred since the last CDC ETL process ran.