And there it was again, the lively discussion about the use of a Link Satellite. The arguments for and against went round in circles.
This time, the place of the debate was the DVEE Consortium Summit in Rotterdam, the annual meeting of Data Vault & Ensemble Enthusiasts (DVEE) and a worldwide event for exchanging ideas with experts in the field.
The discussion, as is often the case, included many and varied arguments for and against assigning a Satellite to a Link in Data Vault.
It would be helpful to place these arguments in the context of a more formal decision-making guide, so that modelers can clearly see when a certain decision makes the most sense.
In the first part of this two-part article, I look at the background of the discussion from Diego Pasión's point of view, and present his approach to tackling this modeling decision and capturing the various perspectives in a decision-making guide. The second part deals with the bitemporal aspects related to these decisions.
What's the background?
Data Vault is a physical data modeling method that aims to store data in an efficient and flexible way, with complete technical historization. In many cases, the focus is very much on the physical implementation, which can make it difficult to discuss the business causes of problems that surface during the modeling of a Data Vault, let alone find a suitable solution.
Satellites on Links are technically possible, and have long been described in the Data Vault literature, for example to capture the context of a Unit of Work (UOW).
A solution and argumentation assistance
As is the case for many areas in the world of data, not only do ideas and methods evolve over time, but approaches do so as well.
This is also the case for Diego Pasión: the well-known coach of the DMCE team at FastChangeCo is now convinced that a much better Data Vault model can be designed with a business/domain data model than without one. After all, the modeling of the information (business) is separate from the physical implementation.
This thinking guides him when discussions about the pros and cons of Link Satellites become heated, or when he explains why there should be no Link Satellites ‘per se.’
In situations such as these, Diego likes to use a simple example to show how a Data Vault model can be derived from domain-oriented data modeling. The results of this example provide a guideline on whether or not Satellites should be modeled on Links.
Diego starts with a domain-oriented data model and explains that it shows the following information: “An ‘employee’ has to ‘work in’ a ‘department’. Or the other way around: Many ‘employees’ may ‘work in’ a ‘department’.”
“The ‘employee’ and ‘department’ entities have attributes,” he continues, “which describe an instance of each of them. The reason for the relationship between ‘employee’ and ‘department’ is that employees work in a department.”
“If you start from a domain-oriented data model (like the employee-department data model shown before), which is generally recommended,” says Diego, “then during the instantiation of a logical data model (LDM) into a physical Data Vault model (DV) an entity ‘automatically’ becomes a Hub and a Satellite in the first step, and a relationship becomes a Link without a Satellite.”
The following simple rules therefore apply in the ‘Diego guideline’:
- Entity attribute(s) (identifying - How to get one and only one record): Becomes the business key in the Hub
- Entity attribute(s) (describe, measure - What is important about an entity?): Becomes context attributes in the Satellite
- Relationship (What is the reason for a relationship between two entities?): Becomes a Link
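As a rough illustration (not part of Diego's presentation), the three rules can be sketched in a few lines of Python. The class names, the `hub_`/`sat_`/`link_` prefixes, and the attribute names such as `employee_no` are assumptions made for this sketch only.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    identifying: list[str]   # attributes that identify exactly one record
    descriptive: list[str]   # attributes that describe or measure the entity


@dataclass
class Relationship:
    name: str
    entities: list[str]      # names of the related entities


def derive_data_vault(entities, relationships):
    """Apply the three rules and return the derived DV objects as dicts."""
    hubs, satellites, links = [], [], []
    for e in entities:
        # Rule 1: identifying attributes become the business key in the Hub.
        hubs.append({"name": f"hub_{e.name}", "business_key": e.identifying})
        # Rule 2: descriptive attributes become context attributes in the Satellite.
        satellites.append({"name": f"sat_{e.name}",
                           "parent": f"hub_{e.name}",
                           "context": e.descriptive})
    for r in relationships:
        # Rule 3: a relationship becomes a Link (without a Satellite).
        links.append({"name": f"link_{r.name}",
                      "hubs": [f"hub_{n}" for n in r.entities]})
    return {"hubs": hubs, "satellites": satellites, "links": links}


# Hypothetical attributes for the employee/department example.
model = derive_data_vault(
    [Entity("employee", ["employee_no"], ["name"]),
     Entity("department", ["department_code"], ["department_name"])],
    [Relationship("works_in", ["employee", "department"])],
)
```

In this sketch every Satellite hangs off a Hub, and the Link carries nothing but references to the Hubs it connects.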
“From a business point of view, in terms of an LDM,” Diego continues, “there can never be a Link Satellite because relationships themselves never have descriptive attributes.
The relationship between ‘employee’ and ‘department’ would be updated in the ‘employee’ entity if the employee changes department.”
Diego looks around as he presents this example to the audience.
“If a relationship in the LDM had descriptive context, then the data modeler would model an associative entity in the LDM. This associative entity would incorporate the attributes that describe the relationship, but, according to the guideline this would simply become a Hub and Satellite in the physical Data Vault model.”
“Diego, can you show us an example of this?”
Diego creates the following DV model based on the LDM shown above following his guideline and says: “A Hub and a Satellite are physically modeled from the entities ‘employee’ and ‘department.’ The relation ‘works in’ becomes the Link between the two Hubs.”
Separation of concerns
“But what happens if the employee changes department?” Diego is asked.
Diego considers this for a moment, and then continues to explain: “An LDM does not ‘take care’ of technical historization, but this is still of interest of course. This is why, at implementation level, Data Vault supports technical historization for the descriptive context in Satellites as a fundamental tenet. Otherwise, in a physical data model that is derived ‘one-to-one’ from an LDM, there would be no historization, only updates.
Whether this is okay depends on the application that builds on the physical data model.”
Diego says that he is generally in favour of looking at business and temporal aspects separately: “The moment technical historization comes into the discussion, things look different. Because in a one-to-many relationship, a change must also be documented from a technical perspective. For example, when employee Sophie moves from department A to department B.”
Diego thinks about this for a moment, then he says: “This is best resolved by applying standard patterns in the physical model, and removing the discussion about technical historization from the logical level.”
“In general, a ‘Load Date Timestamp’ column (LDTS, or, as we call it in our data models, the Inscription Timestamp) is used in the Data Vault Satellites purely for the technical historization/versioning of the data,” Diego explains.
“This column is part of the physical data model, the instantiation of the LDM, as this is where versioning (instead of updating) of the data is required to 'preserve' previous states of the data.”
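To make the idea of versioning instead of updating concrete, here is a minimal Python sketch. It assumes a Satellite can be represented as a plain list of rows and that a new version is written only when the descriptive context has changed; the function name `load_satellite`, the key `E-1`, and the dates are hypothetical.

```python
from datetime import datetime


def load_satellite(satellite, hub_key, context, ldts):
    """Append a new version if the context differs from the latest one."""
    versions = [row for row in satellite if row["hub_key"] == hub_key]
    latest = max(versions, key=lambda row: row["ldts"], default=None)
    if latest is None or latest["context"] != context:
        # Existing rows are never updated; a new row with its own LDTS is added.
        satellite.append({"hub_key": hub_key, "ldts": ldts, "context": context})
    return satellite


sat_employee = []
load_satellite(sat_employee, "E-1", {"name": "Sophie"}, datetime(2024, 1, 1))
load_satellite(sat_employee, "E-1", {"name": "Sophie"}, datetime(2024, 2, 1))     # unchanged: no new row
load_satellite(sat_employee, "E-1", {"name": "Sophie M."}, datetime(2024, 3, 1))  # changed: new version
```

After these three loads the list holds two versions for the same Hub key, and the earlier state is preserved next to the current one.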
“To stay with the example,” Diego continues, “if the employee Sophie now moves from department A to department B, this event adds another data record to the Link. In the data model shown above, the relationship would no longer be unique, because it is not clear from the data in which department Sophie is currently working.”
The solution Diego prefers for this scenario (a one-to-many relationship) is to add an (end-dating) Satellite to the Link, whose only task is to look after the technical historization of the Link.
With the help of a so-called driving key in the Link, the one-to-many changes in the Link can be, and are, documented historically correctly by the Satellite.
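The following Python sketch shows one way such an end-dating Satellite could behave, with the employee acting as the driving key. It is a simplification under stated assumptions: the Link and its Satellite are plain lists, `OPEN_END` is an assumed marker for "still active", the dates are made up, and the end date is updated in place for brevity, where an insert-only implementation would append a closing row instead.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # assumed marker for "still active"


def load_relationship(link, link_sat, employee, department, ldts):
    """Load one employee/department pair; 'employee' is the driving key."""
    pair = (employee, department)
    if pair not in link:
        link.append(pair)  # the Link itself only stores distinct pairs
    active = [row for row in link_sat
              if row["link"][0] == employee and row["end_ts"] == OPEN_END]
    if any(row["link"] == pair for row in active):
        return  # nothing changed for this driving key
    for row in active:
        row["end_ts"] = ldts  # close the previously active relationship
    link_sat.append({"link": pair, "ldts": ldts, "end_ts": OPEN_END})


def current_department(link_sat, employee):
    """Return the department of the active Link row for the driving key."""
    for row in link_sat:
        if row["link"][0] == employee and row["end_ts"] == OPEN_END:
            return row["link"][1]
    return None


link, link_sat = [], []
load_relationship(link, link_sat, "Sophie", "A", datetime(2024, 1, 1))
load_relationship(link, link_sat, "Sophie", "B", datetime(2024, 6, 1))
print(current_department(link_sat, "Sophie"))  # B
```

The Link alone holds both pairs and cannot say which one is current. The Satellite answers that question, and it still does so if Sophie later moves back to department A, because the pair (Sophie, A) is reopened in the Satellite without adding a new Link row.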
The descriptive attributes
“The physical Data Vault data model would change again if the information model, the LDM, changes,” notes Diego.
“Such a change can be, as already mentioned, the addition of descriptive attributes to the relationship,” explains Diego.
“An example is when the following applies to the relationship between ‘Employee’ and ‘Department’:
Sophie's work start date (‘Work Begin Date’) in the new department B is three months in the future.
Sophie will then take on the role of data modeler (‘Assignment’) for three months (time period: ‘Work Begin Date’, ‘Work End Date’).”
Diego presents the revised business data model / information model with the new information to his audience:
Diego looks around as he shows the LDM to the audience.
“The relationship between ‘Employee’ and ‘Department’ now has descriptive context, and to accommodate this the data modeler has added an associative entity to the LDM.
This new object contains the descriptive attributes of the relationship: ‘Work Begin Date’, indirectly the duration (time period: ‘Work Begin Date’, ‘Work End Date’) and ‘Assignment’.
Like other attributes, the business time period describes the relationship or the ‘normal’ entity.”
Enhancement of the 'Diego Guide'
Diego reminds the group of what he said earlier:
“If you always start from the business data model, which is generally recommended,” says Diego, “then when you instantiate an LDM into a physical DV model the entity ‘automatically’ becomes a Hub with a Satellite, and a relationship becomes a Link without a Satellite.”
The audience nods in agreement.
“If we apply our guidelines to this information model,” Diego continues, “then the associative entity in the physical data model becomes a Hub and a Satellite, and not a Satellite on the Link, right?”
Again, the audience nods in agreement. Diego thinks about his statement for a moment.
“We need to make a small addition, the 4th rule, in the guide,” Diego continues. “According to the current guidelines, the associative entity, with its two relationships, would lead to two Links in the physical data model. But that would not align with the LDM.”
“What do you mean, Diego?”
“An associative entity and the ‘two associated relationships’ are one relationship in the true sense. It’s one relationship with descriptive attributes. That's why we data modelers in LDM need the associative entity trick.
In the DV, we implement this special relationship in such a way that one Link, one Hub and one Satellite are created from the associative entity.”
“Diego, can you please show us an example again?”
With the help of the modified guide, Diego creates a DV model from the LDM he showed earlier, and explains his approach.
“As previously shown, the entities ‘Employee’ and ‘Department’ are each used to physically model one Hub (rule 1) and one Satellite (rule 2). The relationship ‘works in’ previously existing in the LDM was further developed into an associative entity ‘Employee Works In Department’ based on the descriptive attributes.”
In the 'Diego Guide' the following simple rules apply:
- Entity attribute(s) (Identifying - How to get one and only one record): Becomes the business key in the Hub
- Entity attribute(s) (Describe, Measure - What is important about an entity?): Becomes context attribute(s) in the Satellite
- Relationship (What is the reason for a relationship between two entities?): Becomes a Link
- Associative entity (relationship with descriptive attributes or a many-to-many relationship): The entity becomes one Hub and one Satellite (rule 1 and 2), the ‘two’ or more relationships become one Link.
“As a result, the 4th rule now applies. The physical DV model changes in that the existing Link between the two Hubs is extended by one additional Hub with one Satellite.”
“The task of the ‘Hub Employee Works In Department’ and its Satellite is to map the original ‘many-to-many’ relationship with its descriptive attributes correctly.”
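The 4th rule can be sketched in Python in the same spirit. This is only an illustration: the function name, the object prefixes, and especially the business key `assignment_no` are assumptions, since the article does not say how the associative entity is identified.

```python
def derive_associative(name, identifying, descriptive, related_entities):
    """Rule 4: one Hub, one Satellite, and one Link from an associative entity."""
    # Rules 1 and 2 apply to the associative entity itself.
    hub = {"name": f"hub_{name}", "business_key": identifying}
    satellite = {"name": f"sat_{name}",
                 "parent": hub["name"],
                 "context": descriptive}
    # The 'two' relationships collapse into a single Link that connects
    # the related Hubs and the new Hub of the associative entity.
    link = {"name": f"link_{name}",
            "hubs": [f"hub_{n}" for n in related_entities] + [hub["name"]]}
    return hub, satellite, link


hub, sat, link = derive_associative(
    "employee_works_in_department",
    identifying=["assignment_no"],  # hypothetical business key
    descriptive=["work_begin_date", "work_end_date", "assignment"],
    related_entities=["employee", "department"],
)
```

The descriptive attributes end up in a Satellite on the new Hub, so the single Link still has no Satellite carrying business context.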
For comparison, Diego again shows the previous DV model with the one-to-many relationship:
Final thoughts
“Thanks, Diego, that's really helpful. This is a clear way to explain when Link Satellites make sense.”
“By the way, the 4th rule also applies to many-to-many relationships, regardless of whether these relationships have descriptive attributes or not. This is because a many-to-many relationship must ultimately always be resolved by an associative entity,” adds Diego.
“We as a team should include this guideline in our Data Vault modeling rules,” everyone agrees.
Diego is happy that his approach is appreciated. This way, he can help the audience as a mentor and coach.
Then Diego remembers that he has something to mention about naming conventions in LDM. But that's another story. More on that in the next article in the series. Be sure to check back.
So long,
Dirk
Hi anonymous,
the DVEE actually has nothing to do with the Data Vault 2.0 Standard Committee. Nevertheless, all forms and variants of Data Vault are discussed there. In the article itself, I refer to Data Vault in general, regardless of a specific variant. The procedure described applies to all Data Vault types.
I hope this clears up any possible confusion.
Kind regards,
Dirk