AlgoDaily - Data Modeling and Design

Dimensional Data Modeling

Dimensional data modeling is a technique for designing data warehouses and decision support systems. It organizes data into dimensions and facts so that the data is easy to query, aggregate, and analyze.

A dimensional model has two main components: dimensions and facts. Dimensions hold the descriptive attributes of business entities, such as customers, products, and dates. Facts hold the numeric, measurable data about business events, such as sales amounts, quantities, and revenue.

As an example, we can build a simple dimensional model with Python's pandas library. Suppose we have a dataset of customers, their orders, and the products they purchased. We want a dimension model made up of the 'CustomerID' and 'CustomerName' columns and a fact model made up of the 'OrderID', 'Product', 'Quantity', and 'Price' columns.

In the accompanying code, we define a function create_dimensional_model() that builds a sample DataFrame and then splits it, by column selection, into the dimension model and the fact model, returning both.

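The dimension model, dimension, consists of the 'CustomerID' and 'CustomerName' columns, while the fact model, fact, consists of the 'OrderID', 'Product', 'Quantity', and 'Price' columns. Because this fact model does not carry 'CustomerID', its rows cannot be joined back to the dimension model; a production star schema would keep that key in the fact table.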
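One way to keep facts joinable to their dimension is sketched below. It is an illustration under stated assumptions, not the lesson's own code: the `orders` data, the names `customer_dim` and `order_fact`, and the choice to let one customer place two orders are all hypothetical. The dimension is deduplicated to one row per customer, and the fact table keeps 'CustomerID' as a foreign key.

```python
import pandas as pd

# Hypothetical flat order data; customer 1 (Alice) places two orders.
orders = pd.DataFrame({
    'CustomerID': [1, 2, 1],
    'CustomerName': ['Alice', 'Bob', 'Alice'],
    'OrderID': [101, 102, 103],
    'Product': ['Apple', 'Banana', 'Cherry'],
    'Quantity': [2, 3, 4],
    'Price': [1.0, 1.5, 2.0],
})

# Dimension table: descriptive attributes, one row per customer.
customer_dim = (
    orders[['CustomerID', 'CustomerName']]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Fact table: one row per order, with measures and the
# CustomerID foreign key that points back to the dimension.
order_fact = orders[['OrderID', 'CustomerID', 'Product', 'Quantity', 'Price']]

print(customer_dim)
print(order_fact)
```

With this layout the dimension stays small even as orders accumulate, and every fact row can still be resolved to a customer through 'CustomerID'.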

Dimensional data modeling supports efficient analysis and reporting by giving the data a simple, intuitive structure. Users can slice and aggregate the facts along different dimensions, such as customer segment, product, or time period.

import pandas as pd


def create_dimensional_model():
    """Build a sample DataFrame and split it into dimension and fact models."""
    # Sample data: one order per customer
    data = {
        'CustomerID': [1, 2, 3, 4, 5],
        'CustomerName': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'OrderID': [101, 102, 103, 104, 105],
        'Product': ['Apple', 'Banana', 'Cherry', 'Durian', 'Elderberry'],
        'Quantity': [2, 3, 4, 1, 5],
        'Price': [1.0, 1.5, 2.0, 0.5, 2.5]
    }
    df = pd.DataFrame(data)

    # Dimension model: descriptive customer attributes
    dimension_model = df[['CustomerID', 'CustomerName']]
    # Fact model: order-level measures
    fact_model = df[['OrderID', 'Product', 'Quantity', 'Price']]

    return dimension_model, fact_model


# Call the function and unpack both models
dimension, fact = create_dimensional_model()

print('Dimension Model:')
print(dimension)

print('Fact Model:')
print(fact)
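
To illustrate what analyzing data across a dimension can look like, here is a minimal sketch. It rests on assumptions that go beyond the lesson's code: the fact table carries a 'CustomerID' foreign key, 'Price' is treated as a unit price, and the names `customer_dim`, `order_fact`, and `revenue_by_customer` are hypothetical. The facts are joined to the customer dimension and a derived revenue measure is summed per customer.

```python
import pandas as pd

# Hypothetical customer dimension: one row per customer.
customer_dim = pd.DataFrame({
    'CustomerID': [1, 2],
    'CustomerName': ['Alice', 'Bob'],
})

# Hypothetical fact table: one row per order, keyed by CustomerID.
order_fact = pd.DataFrame({
    'OrderID': [101, 102, 103],
    'CustomerID': [1, 2, 1],
    'Quantity': [2, 3, 4],
    'Price': [1.0, 1.5, 2.0],  # assumed to be the unit price
})

# Derived measure: revenue per order line.
order_fact['Revenue'] = order_fact['Quantity'] * order_fact['Price']

# Join facts to the dimension, then aggregate along it.
revenue_by_customer = (
    order_fact.merge(customer_dim, on='CustomerID', how='left')
    .groupby('CustomerName', as_index=False)['Revenue']
    .sum()
)

print(revenue_by_customer)
```

The same join-then-group pattern would apply to any other dimension, for example a product or date table, as long as the fact table holds the matching key.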
