Hi Lars and fellow anchor-modellers,
Thanks to you and the rest of the team behind anchor modelling. It's very interesting work.
I'm writing a thesis on data warehousing and am currently trying to compare anchor modelling to the data vault. Because of this I would like to make the anchor model source-driven and not do any cleansing. That should come in a later layer.
While modeling the anchor I'm trying to model name-value pairs which I will be receiving from a source system. In the vault this is one table containing a name and a value column. In anchor modelling I can't quite work out how to receive multiple pairs and still keep their pairing.
The input of one loading cycle and one entity looks like this:
FK_ENTITY NAME VALUE
---------- ---------------- ------------------------------
Yes, you can implement “name-value pairs” in the way you suggest, but I would not recommend it. I think it hurts quite a lot.
I would instead recommend that you create one anchor and an attribute for each value you want to store.
One question that comes to mind: what happens if “category 2, Some Text” arrives with a new value, let's say “category 2, some other text”? How should you handle that? Should you historize the tie and create a new anchor, or should you keep the tie as it is and historize the attribute? It's not clear to me…
If it is an attribute on an anchor, it would only be a new version with a new timestamp.
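To make the "new version with a new timestamp" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (EN_CA2_Entity_Category2, Category2, ChangedAt) and the helper functions are made up for illustration and only loosely follow anchor naming conventions; treat it as a sketch under those assumptions, not as the layout a modelling tool would generate.

```python
import sqlite3


def build_attribute_demo():
    """Create an in-memory database with one anchor and one historized attribute."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Anchor: holds nothing but the surrogate identity.
        CREATE TABLE EN_Entity (EN_ID INTEGER PRIMARY KEY);

        -- Historized attribute (hypothetical name): one row per version.
        CREATE TABLE EN_CA2_Entity_Category2 (
            EN_ID     INTEGER NOT NULL REFERENCES EN_Entity (EN_ID),
            Category2 TEXT    NOT NULL,
            ChangedAt TEXT    NOT NULL,  -- ISO date string, so it sorts correctly
            PRIMARY KEY (EN_ID, ChangedAt)
        );
        """
    )
    return con


def set_category2(con, en_id, value, changed_at):
    """Record a value as a new version; earlier versions are never updated."""
    con.execute("INSERT OR IGNORE INTO EN_Entity (EN_ID) VALUES (?)", (en_id,))
    con.execute(
        "INSERT INTO EN_CA2_Entity_Category2 (EN_ID, Category2, ChangedAt) "
        "VALUES (?, ?, ?)",
        (en_id, value, changed_at),
    )


def latest_category2(con, en_id):
    """Return the most recent value for the entity, or None if there is none."""
    row = con.execute(
        "SELECT Category2 FROM EN_CA2_Entity_Category2 "
        "WHERE EN_ID = ? ORDER BY ChangedAt DESC LIMIT 1",
        (en_id,),
    ).fetchone()
    return row[0] if row else None


def category2_history(con, en_id):
    """Return every version as (value, changed_at), oldest first."""
    return con.execute(
        "SELECT Category2, ChangedAt FROM EN_CA2_Entity_Category2 "
        "WHERE EN_ID = ? ORDER BY ChangedAt",
        (en_id,),
    ).fetchall()
```

Calling set_category2 twice for the same EN_ID with different timestamps leaves both rows in place; the anchor and any ties pointing at it are untouched, which is the point of historizing the attribute rather than the tie.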
You are right. I failed to see that a combination of name and value could receive its own key, represented by the anchor. I will definitely use your approach!
This means I would use the already existing anchor "EN_Entity", and tie it to an anchor "IN_Information" that contains the attributes name and value. The tie shall be called "EN_has_IN_multiple", complemented by a knot isActive of boolean type when historized.
The information is currently not meant to change; it would be supplied initially and that's it. However, if it did change, I suppose I would just save the new value in a new IN_Information anchor, using both name and value as the ID. The old IN_ID would then be deactivated in the historized "EN_has_IN_multiple" tie, using the isActive knot that would then be necessary (just off the top of my head).
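Roughly, the tie variant described above could look like the following sketch in Python with sqlite3. It is deliberately simplified and rests on several assumptions: the name and value attributes are folded into the IN_Information table instead of getting their own attribute tables, the isActive knot is collapsed into a plain 0/1 column on the tie, the EN_Entity anchor is left out, and the ChangedAt column and helper functions are invented for the example.

```python
import sqlite3


def build_tie_demo():
    """Create an in-memory database with the IN anchor and the historized tie."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Simplification: name and value live on the anchor table itself.
        CREATE TABLE IN_Information (
            IN_ID    INTEGER PRIMARY KEY,
            IN_name  TEXT NOT NULL,
            IN_value TEXT NOT NULL,
            UNIQUE (IN_name, IN_value)  -- name + value identify the instance
        );

        -- Historized tie; isActive stands in for the boolean knot.
        CREATE TABLE EN_has_IN_multiple (
            EN_ID     INTEGER NOT NULL,
            IN_ID     INTEGER NOT NULL REFERENCES IN_Information (IN_ID),
            isActive  INTEGER NOT NULL CHECK (isActive IN (0, 1)),
            ChangedAt TEXT    NOT NULL,
            PRIMARY KEY (EN_ID, IN_ID, ChangedAt)
        );
        """
    )
    return con


def information_id(con, name, value):
    """Return the IN_ID for a name-value pair, creating it on first sight."""
    con.execute(
        "INSERT OR IGNORE INTO IN_Information (IN_name, IN_value) VALUES (?, ?)",
        (name, value),
    )
    return con.execute(
        "SELECT IN_ID FROM IN_Information WHERE IN_name = ? AND IN_value = ?",
        (name, value),
    ).fetchone()[0]


def set_link(con, en_id, name, value, is_active, changed_at):
    """Add a new version of the tie between an entity and a name-value pair."""
    con.execute(
        "INSERT INTO EN_has_IN_multiple (EN_ID, IN_ID, isActive, ChangedAt) "
        "VALUES (?, ?, ?, ?)",
        (en_id, information_id(con, name, value), int(is_active), changed_at),
    )


def active_pairs(con, en_id):
    """Return the (name, value) pairs whose latest tie version is active."""
    return con.execute(
        """
        SELECT i.IN_name, i.IN_value
        FROM EN_has_IN_multiple AS t
        JOIN IN_Information AS i ON i.IN_ID = t.IN_ID
        WHERE t.EN_ID = ?
          AND t.isActive = 1
          AND t.ChangedAt = (SELECT MAX(t2.ChangedAt)
                             FROM EN_has_IN_multiple AS t2
                             WHERE t2.EN_ID = t.EN_ID AND t2.IN_ID = t.IN_ID)
        ORDER BY i.IN_name, i.IN_value
        """,
        (en_id,),
    ).fetchall()
```

A changed value then costs two tie rows per change: one that deactivates the old pair and one that activates the new pair.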
I have trouble envisioning the "attribute-historisation"-solution you proposed. I love the sound of it, and think it could be done like this: The combination of IN_name and IN_value would not form the IN_ID. Instead the EN_ID and the IN_name would be the IN_ID, and then the IN_value can change meaningfully. However, is IN_Information an anchor if used like this? It seems to behave like a tie with an attribute.
Anchor Modeling was designed for modeling specifics with strong typing, so forcing an NVP approach, generics without typing, onto the technique feels like the wrong way of going about solving a problem. Even if it is doable, I wouldn't recommend it.
Either use a different technique, designed for NVP and possibly not even relational, for the staging area and then later move the data into an anchor model, or dynamically build a specific and typed (or, less desirably, untyped) model. The first time "Category 6" appears for "Entity type C", create an attribute for "Category 6" on the "Entity type C" anchor, and if that anchor doesn't already exist, create it. That way, you can also track statements like "'Entity type A' is related to 'Entity type C' under the 'Category 8' condition (and only between May and August)".
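As a rough illustration of the dynamic option, the sketch below (Python with sqlite3) creates the anchor and the attribute table the first time a given entity type and category show up, and simply adds a new version on later loads. Everything here is an assumption made for the example: the slug and load_pair helpers, the table naming, and the column names. It also stores every value as TEXT, so it corresponds to the less desirable untyped variant; a typed version would need a rule for choosing each attribute's data type.

```python
import re
import sqlite3


def slug(label):
    """Turn a source label such as 'Category 6' into a safe SQL identifier."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")
    if not cleaned:
        raise ValueError(f"cannot derive an identifier from {label!r}")
    return cleaned


def load_pair(con, entity_type, entity_id, category, value, changed_at):
    """Store one name-value pair, creating anchor and attribute on first sight."""
    anchor = slug(entity_type)
    attribute = f"{anchor}_{slug(category)}"

    # Identifiers only contain [A-Za-z0-9_] after slug(), so quoting them is safe.
    con.execute(f'CREATE TABLE IF NOT EXISTS "{anchor}" (ID INTEGER PRIMARY KEY)')
    con.execute(
        f'CREATE TABLE IF NOT EXISTS "{attribute}" ('
        f' ID INTEGER NOT NULL REFERENCES "{anchor}" (ID),'
        " AttrValue TEXT NOT NULL,"   # untyped: everything is stored as text
        " ChangedAt TEXT NOT NULL,"
        " PRIMARY KEY (ID, ChangedAt))"
    )

    # The values themselves are always passed as bound parameters.
    con.execute(f'INSERT OR IGNORE INTO "{anchor}" (ID) VALUES (?)', (entity_id,))
    con.execute(
        f'INSERT INTO "{attribute}" (ID, AttrValue, ChangedAt) VALUES (?, ?, ?)',
        (entity_id, value, changed_at),
    )


def table_names(con):
    """List the tables that exist so far, in alphabetical order."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Loading ("Entity type C", "Category 6") twice with different timestamps creates the two tables once and leaves two versions in the attribute table.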
What is the exact reasoning behind the decision to use NVP?
Since you were talking about Data Vault, I am expecting you intend to store the NVP (EAV) data in a more structured model in a later step?
However, what many people who resort to NVP (EAV) modeling fail to recognize is that what they think is a problem solver has only become someone else's headache. If you look at the process of turning data into usable information as a value chain, whatever refinements you neglect to do before your data model will instead have to be taken care of by whatever comes after your model. And, well, then things can go wrong... :)