Data driven, domain driven and use case driven modeling

Hennie de Nooijer
Hi,

Question: in one of Lars's presentations there is a diagram positioning Anchor Modeling. Dimensional modeling and Data Vault are also mentioned.

Anchor = domain driven
DV = data driven
dimensional = use case driven

Could you explain this in more depth? Why is that, and how did you come to this conclusion?

Gr
Hennie

Re: Data driven, domain driven and use case driven modeling

roenbaeck
Administrator
The reasoning behind the triangle is that it is impossible for any (successful) modeling technique to position itself in only one of the three corners. You always need a bit of all three perspectives (domain, data, use case) in order to create a good model. However, different techniques tend to distribute themselves so that they sit closer to different corners.

In Anchor Modeling focus is put heavily on the domain. If something is weird in your model then you have misunderstood something about the domain. Return to the domain expert with the appropriate questions in order to resolve the problem. We care more for how the domain works than what source data looks like or what queries are expected.

In Data Vault (and this was before I had ever heard about a business data vault) you rely heavily on source data structure. For example, natural keys are identified very early in the modeling process. If something doesn't load well into the model, you have misunderstood something about your source data. Return to the source data definitions in order to resolve the problem. DV cares more for how the sources look than what the domain is or what queries are expected.

In Dimensional Modeling you rely heavily on the expected queries. If a query runs slowly or is impossible to answer, then you have misunderstood the use cases. Return to the users and ask them what they want answered in order to resolve the problem. DM cares more for catering to the use cases than what the actual domain looks like or how the sources are structured.

Re: Data driven, domain driven and use case driven modeling

roenbaeck
Administrator
Related to this is the fact that many modelers fail to see the whole picture, that is, to take a holistic source-to-target perspective. For example, let's view a solution containing a database from its endpoints: as a system that consumes human output in order to produce human input. In such a system, any genericity you introduce at one level of the process must be resolved into specifics again at some other level in order to be intelligible. While genericity may save you work on one level, it only pushes that work to another level. The same holds for data integration: any data that is not integrated close to the sources must be integrated at some other point before reaching the target.

Looking at many different data warehousing techniques, many of them only differ in where they position the database in the process of getting from the source to the target. The sum of the ETL work being done in the staging area or in order to produce data marts should roughly remain the same though.

A modeling technique that can retain flexibility while still being able to model specifics, that can provide auditability, and that can allow ad-hoc querying should IMHO be much preferable to those that cannot do so easily. I think this is the direction we are moving in with Anchor Modeling, as well as DV when you use many simultaneous flavors (raw/source + business + metadata).

Re: Data driven, domain driven and use case driven modeling

Hennie de Nooijer
OK, thanks for your answers.

Re: Data driven, domain driven and use case driven modeling

roenbaeck
Administrator
Hennie, I am also following your blog. You mention that it "took quite some time to create the model". Have you got any suggestions on how we could improve the modeling tool in order to speed up the modeling process? What exactly did you think took too much time, compared to other modeling tools?

Re: Data driven, domain driven and use case driven modeling

Ivo
Lars, Hennie,

In our Anchor Modeling implementations we have chosen what Lars would classify as a Data Vault approach: our models are based on the source files. A source table becomes an Anchor, and foreign keys between source tables give rise to ties between Anchors.
We chose this approach since it leads to a more straightforward ETL, and we took the position that ETL is the bottleneck (both in development cost and in performance).
Later, we use views on the model to mold the data to the reporting domain. We find there are generally only minor differences between the source-file domain and the reporting domain.

This approach means that the source files determine most of the model (apart from the choice of whether to knot an attribute or not). Once the source tables and their relations are well defined, one could auto-generate a model that would need only minor tweaking. We are convinced that such an automatic generation tool would significantly reduce modelling time (and human error).
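To make the idea concrete, here is a minimal sketch in Python of how a draft model could be derived from source-table metadata. It only illustrates the mapping described above (source table becomes Anchor, foreign key becomes tie). The `SourceTable` structure and the `derive_model` function are hypothetical names invented for this example and are not part of the Anchor Modeling tool. The decision whether to knot an attribute is deliberately left to the modeler.

```python
from dataclasses import dataclass, field


@dataclass
class SourceTable:
    """Hypothetical description of one source table."""
    name: str
    key: str                       # primary key column
    columns: list[str]             # all columns, including the key
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> referenced table


def derive_model(tables):
    """Derive a draft model: one anchor per table, one tie per foreign key.

    Returns (anchors, ties), where anchors maps anchor name -> attribute names
    and ties is a list of (from_anchor, to_anchor, via_column) tuples.
    """
    anchors = {}
    ties = []
    for table in tables:
        # Key and foreign-key columns carry identity or relations, not attributes.
        anchors[table.name] = [
            col for col in table.columns
            if col != table.key and col not in table.foreign_keys
        ]
        for column, target in table.foreign_keys.items():
            ties.append((table.name, target, column))
    return anchors, ties


# Example input (assumed, for illustration only).
tables = [
    SourceTable("Customer", "customer_id", ["customer_id", "name", "city"]),
    SourceTable("Order", "order_id", ["order_id", "customer_id", "amount"],
                {"customer_id": "Customer"}),
]
anchors, ties = derive_model(tables)
```

With the example input, `anchors` holds the attributes `name` and `city` for `Customer` and `amount` for `Order`, and `ties` holds a single tie from `Order` to `Customer`. A real generator would still need the manual tweaking mentioned above, for instance for knots and historization.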

Similarly, the ETL is almost entirely defined by the model. After several iterations we ended up with an ETL in SSIS that consists largely of SQL scripts. These scripts are determined by the model, and we have made good progress in standardisation and parametrisation. The scripts could also be auto-generated.
Again, we believe that auto-generating the standard scripts based on the model should save a lot of development time and increase accuracy.
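
As a rough illustration of what parametrised, model-driven scripts could look like, the following Python sketch fills one SQL template per attribute. The template text, the `staging` schema, the table naming and the shape of the `model` dictionary are all assumptions made for this example. They are not the scripts described above, nor the naming convention of the Anchor Modeling tool, and a real load would also have to handle historization and reloads.

```python
from string import Template

# Hypothetical load template: one statement per attribute of an anchor.
LOAD_ATTRIBUTE = Template(
    "INSERT INTO ${anchor}_${attribute} (${anchor}_id, ${attribute})\n"
    "SELECT s.${key}, s.${attribute}\n"
    "FROM staging.${anchor} AS s\n"
    "WHERE s.${attribute} IS NOT NULL;"
)


def generate_load_scripts(model):
    """Return one SQL load statement per attribute in the model.

    model maps anchor name -> {"key": key column, "attributes": [names]}.
    """
    scripts = []
    for anchor, spec in model.items():
        for attribute in spec["attributes"]:
            scripts.append(
                LOAD_ATTRIBUTE.substitute(
                    anchor=anchor, attribute=attribute, key=spec["key"]
                )
            )
    return scripts


# Example model (assumed, for illustration only).
model = {"Customer": {"key": "customer_id", "attributes": ["name", "city"]}}
scripts = generate_load_scripts(model)
```

Because every statement comes from the same template, a change in the loading pattern only has to be made once, which is where the gain in accuracy would come from.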