Dimensional Models, Datalake, Data Warehouse, Data Mart, Functions, Trigger and Store Procedure

Dimensions Models

What is Dimensional Modeling ?

Dimensional model is a database design technique developed specifically for data warehouse design. In addition to maximizing query performance, it aims to provide a database structure that end users can simply comprehend and write query for. It has proven to be a significant advancement that can be accessed directly by end users, and it has taken the lead in data warehouse architecture in actual practice.

Why do we need to use Dimensional Modeling, Strengths and Weaknesses of Dimensional Models?

So, why do we need to use Dimensional Modeling while we have ER model

For example, if you have a database with more than one hundred tables and it is not standardized, after standardizing the database, the number of tables will increase. This is because standardization tends to increase the number of tables in order to reduce data redundancy and ensure data independence, while also maximizing data integrity by way to create a constraint on data, to check validity of data, ensure accuracy, and integrity of data.

So after standardizing the database, it very difficult to write a query to get data, because you need to know about association between tables to write query to get data. For that reason, ER not suitable for data retrieval, and we need change Dimensional Modeling for end users can simply write a query.

Strengths and weaknesses of Dimensional Models

The strengths is, it can help us reduce database complexity by reducing the number of tables and links between tables, query performance optimization. It make end user easy to query. And another strong point in Dimensional model is prevent the join of scheme, prevent unwanted changes in customer behavior.

But the strengths is the weaknesses, the limit of the structure of the schema limit the query can be written to the Dimensional. That mean This means the designer must have a good idea in advance about the questions users will want to ask. And Dimensional modeling may not be suitable for very complex data relationships, as it relies on simple structures to organize data.

When do we need to use Dimensional Modeling ?

Reporting and analysis: Dimensional modeling is very suitable for reporting and analysis, because it easy to query to get data
Design data warehouse
User need to understand the relationship between data
Design system towards customer requirements

How to design one Dimensional Modeling ?

Five step to creating a Dimensional Modeling:

Step-1: Identifying the business objective : This step we must exact identification for the business like Sales, Marketing,
Step-2: Identify Facts and Dimensions in Dimensional Data Model: Defined the Fact table and Dimension table
Step-3: Identify the Attributes for Dimensions: Example : Store Dimension has attributes like nam store, address, manager,...
Step-4: Define the Granularity for Business Facts: the granularity is the level of detail for business problems
Step-5: Building of Schema: There are two popular schemes: Star Schema and Snowflake Schema.

Data Lake

Data Lake is centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data. It can store data in its native format and handle all kinds of different data, ignoring size limitations. It provides high data volume to increase analysis performance and native integration.

Data Lake provides a scalable and secure platform that allớ businesses to: import any data from any system from physical systems, physical, cloud . Store any type or volume of data with high reliability: real-time or batch mode data processing, and analyze data using SQL, Python, R or any other language, third party data or analytics application.

Data Warehouse

Data Warehouse is a type of data storage management system designed to enable and support business intelligence (BI) operations, especially analytics. A data warehouse is only intended to perform queries and analysis and often contains large amounts of historical data. A data warehouse consolidates and compiles massive volumes of data from several sources, including external partner systems and applications for marketing, sales, finance, and customer-facing functions. The data warehouse stores this processed data so that it is readily accessible to decision makers.

A data warehouse provides information that caters to a specific topic. Those topics could be sales, promotions, inventory, etc. For example, if you want to analyze your company's sales data, you need to build a data warehouse that focuses on sales. Such a warehouse would provide valuable information such as “who were your best customers last year?” or “who is likely to be your best customer next year?”

Data Mart

A Data mart is a stripped-down version of a Data Warehouse and is designed for use by a specific department, unit, or group of users within an organization. For example: Marketing, Sales, Human Resources or finance. It is usually controlled by a single department within an organization.

Data mart typically only pulls data from a few sources compared to a Data Warehouse. Data mart is smaller in size and more flexible than a Data Warehouse.

Function

Function is an object in the database consisting of a set of statements grouped together to execute sql statements and created for reuse. In SQL Server, functions are stored and you can pass parameters and return values. One Function must return a value, if function not return a value, that is a void function. The void function is often used to perform tasks that do not need to return a value. Function can be called from stored procedure while stored procedure cannot be called from function. A function is compiled and executed each time that function is called.

Trigger

Trigger is SQL procedure that is executed on server side when any event occurs like Insert, Update or Delete. In addition, Trigger is also understand as a special stored procedure (because it has no parameters) that is automatically executed as soon as an event causes the data to be changed. Server DB is a place to store, manage Triggers, and is used when users want to check the integrity of the DB. In SQL, the term trigger is often used to check constraints (many tables/tables).

There are two classes of triggers in SQL Server:

DDL (Data Definition Language) trigger: This type of trigger fires when events change the structure (like creating, modifying or dropping a table). Or during server related events such as security changes or statistics update events. The DDL trigger can only be fired after the DDL event has occurred.
DML (Data Modification Language) trigger: This is the most used trigger type. In this case, the trigger event is a data modification statement. It can be an insert, update, or delete statement on a table. The DML trigger can be fired before, after, or during a DML event.

FOR or AFTER [[INSERT, UPDATE, DELETE] Trigger: Run after data checking actions of Rules, Constraint. The data has been temporarily changed in the table

INSTEAD OF Trigger: Run before data check actions, the data has not been changed, the action of updating data can be replaced with other action

Syntax to initialize trigger:

CREATE TRIGGER name_trigger

ON { name_table }

[ WITH <Options> ]

{ FOR | AFTER | INSTEAD OF }

{ [INSERT], [UPDATE] , [DELETE] }

Trigger in each database system has some differences like:

In PostgreSQL

Trigger in PostgreSQL can be trigged by events other than Insert, Delete, Update. For example in PostgreSQL can be trigged by the Truncate event.

In My SQL

Triggers in MySQL can only be triggered by data changes, such as adding, updating or deleting a record.

Store Procedure

A stored procedure is a group of pre-compiled SQL statements (prepared SQL code) that can be reused again and again.

They can be used to perform a wide range of database operations such as inserting, updating, or deleting data, generating reports, and performing complex calculations. Stored procedures are very useful because they allow you to encapsulate (bundle) a set of SQL statements as a single unit and execute them repeatedly with different parameters, making it easy to manage and reuse code.

Procedures have similar structure as functions; they accept parameters and perform operations when we call them; but the difference between them is that SQL stored procedures are simpler to write or create, whereas functions have a more rigid structure and support fewer clauses and functionality.

Reference

From ER Models to Dimensional Models: Bridging the Gap between OLTP and OLAP Design, Part I

https://www.analyticsvidhya.com/blog/2023/02/understanding-dimensional-modeling/

https://stackoverflow.com/questions/17604959/why-we-use-dimensional-model-over-denormalized-relational-model

https://www.cumul.io/blog/dimensional-data-model-for-embedded-analytics

https://learndatamodeling.com/blog/dimensional-data-modeling/

https://www.geeksforgeeks.org/dimensional-data-modeling/

https://www.metabase.com/learn/databases/data-mart-data-warehouse-data-lake

https://learn.microsoft.com/en-us/sql/relational-databases/triggers/ddl-triggers?view=sql-server-ver16

https://learn.microsoft.com/en-us/sql/relational-databases/triggers/dml-triggers?view=sql-server-ver16

thienhang.com