Thursday, February 23, 2012

Logically Modelling Requirements for Analytical Systems

August 28, 2011 by Kushel  
Filed under B.I., Latest

The relational model for database design is a common approach for operational systems [1]. The logical level, which defines entities, attributes, and the relationships between, models normalised entities minimising data redundancy. A flexible model for retrieving data, it is geared towards transaction processing and not optimized for analytical use. The data structure is not obvious to the business user and likely to be hidden behind a front end [2]. Simpson and Witt depict the business user working with the data modeller in defining system requirements, where the business user’s participation in the design process ends [3].

Kimball believes this approach is hazardous for analytical modelling [4]. The type of requirements differ with an emphasis on defining aggregated measures like profit, with context supplied by the intersections of base data, or ‘dimensions’ [5]. Running analytical queries against a relational database is likely to result in performance issues [6, 7]. A logical analytical model needs to be easy to navigate with a data structure easily understood by the business user for the database to be used effectively [2]. The model should be created with dialog between the business user and data modeller. This approach can provide clarification for existing requirements, highlight limitations, additional detail, and provide an insight into the business through visualisation. This technique describes dimensional modelling. Kimball states, “Dimensional modelling models the proposed solution from “a user perspective, rather than a data perspective”.” [3]. Dimensional modelling takes data from the transactional system and restructures it into a multidimensional cube creating an optimised, logical model of the data [6].

An analytical matrix like Kimball’s Bus Matrix [6] is a tabular matrix where measures are listed against common dimensions. Marking the intersection signifies a relationship. It depicts which measures are achievable from which set of common dimensions, however does not detail the granular levels of each dimension. It is unclear which slicing of dimensions can be achieved and the potential range of analytical measures the system can provide. It is not instinctive to navigate although there is some clue to the data structure [2]. The author feels that the model is conceptual, placing an emphasis on the business rules and requirements without the critical detail of the logical data structure and relationships between entities.

The Starnet Model [5] shows hierarchical attributes in granular order along a radial line. Each line represents a separate dimension with the most atomic side meeting other dimensions at a centre point. It depicts the granularities available for OLAP operations, like roll-up and roll-down navigation [5]. However there is no representation of the analytical measures. Without these measures the data modeller cannot grasp the underlying aims of the system to produce an optimised model for the requirements. The Sun Model bridges this gap [8].

The Sun Model displays dimensions in granular order along a radial line however the most atomic sides meet at a centre circle containing a list of measures. Each listed measure can be analysed by all combinations of the surrounding dimensions [8]. The model is easy to navigate and provides a bridge between the business user and the physical modeller. This is where the Sun Model excels beyond the Star Schema, the preferred model for “user understandability and predictable high performance, for both atomic and aggregated data.” [8].

Each dimension in a Star Schema is a table listing all granular attributes, and linked to the central Fact table which contains the measures. [11] The granularity of dimension is not depicted. In [8], the Sun model is converted into a physical Star Schema. Kimball states, “The physical model should mirror the logical model as much as possible” [9] indicating that the category of model depends on the level of detail. The reason the Sun Model excels is from the perspective of the business user in understanding, navigating and slicing hierarchies.

The writer has provided a case for the requirements of a good logical dimensional model and highlighted the advantages of the Sun Model over other more prominent alternatives. The success of dimensional modelling is based on the business user expressing their analytical needs correctly. Once the system is created, structural changes may prove difficult due to conformed dimensions producing duplicate data [1]. Both the ER and dimensional approaches to analytical modelling have deficiencies [1, 6]. Inmon [1] and Kimball [6] maintain the two styles are fundamentally different, but others [2] have shown this may not be the case. As systems continue to get faster and memory and storage continues to get larger and cheaper, performance issues may become less prominent. With this in mind, combined modelling techniques such as the Snowflake Schema [12] which attempts to combine the best aspects of both relational and dimensional techniques, may then become more widely used.

References

[1] Inmon, W. H. Building the Data Warehouse, Fourth Edition, Wiley, 2005, pp. 357-376.

[2] Moody, D. L. and Kortink, A. R. From ER Models to Dimensional Models Part I: Bridging the Gap between OLTP and OLAP Design, Journal of Business Intelligence, 8, Summer 2003.

[3] Simpson, G. C. and Witt G. C. Data Modelling Essentials, Third Edition, Elsevier, 2005, pp. 16, figure 1.3.

[4] Kimball, R. Is ER Modelling Hazardous to DSS, DBMS Online, October 1995.

[5] Simpson G. C. and Witt G. C. Data Modelling Essentials, Third Edition, Elsevier, 2005, pp. 485.

[6] Kimball, R. The Data Warehouse Toolkit, Second Edition, Wiley, 2002

[7] Han, J. and Kamber, H. Data Mining: Concepts and Techniques, Second Edition, Elsevier, 2006, pp. 110-126.

[8] Whitehorn, M. Business Intelligence for Small Businesses with SQL Server, iTExpertMag.com, June 2009.

[9] Kimball, R. The Data Warehouse Lifecycle Toolkit, Second Edition, Wiley, 1998

[10] Hoberman, S. Physical or Logical, Information Management Magazine, October 2006.

[11] Malinowski, E. and Zimányi, E. Hierarchies in a Multdimensional Model: From Conceptual Modelling to Logical Representation, Data & Knowledge Engineering 59, 2006, pp. 348-377

[12] Moody, D. L. and Kortink, A. R. From ER Models to Dimensional Models Part II: Advanced Design Issues, Journal of Business Intelligence, Autumn 2003.

  • Twitter
  • Facebook
  • Delicious
  • Bebo
  • StumbleUpon
  • Digg
  • Blogger Post
  • WordPress
  • Share/Bookmark

Related posts:

  1. Recency, Frequency and Intensity (RFI)
  2. Software Development Models – Quick And Dirty Guide

Comments

2 Responses to “Logically Modelling Requirements for Analytical Systems”
  1. Radaker says:

    Howdy, the team is just building the 1st site, exploring and working out what on earth is necessary. This web site jammed out there instantly. I am energized with this, as well as enjoy the design of your site. Do you make me aware what exactly “theme” it is?

  2. Kushel says:

    Hi, i’ve customised a studiopress theme.
    For more details, click:

    http://www.shareasale.com/r.cfm?b=241369&u=482816&m=28169&urllink=&afftrack=

Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!

Communicate Through Technology is Digg proof thanks to caching by WP Super Cache!

Communicate Through Technology is Digg proof thanks to caching by WP Super Cache!