Does anyone know if there is an implementation of Anchor modeling on Redshift?
Redshift is a MPP columnar database like Vertica.
It seems to have table/join elimination and uses the PostgreSQL dialect as SQL lanuage.
In a MPP it is important to choose the distribution key correctly. There is a Vertica implementation, so it seems to be possible on a MPP database.
In Redshift you also need to choose a sorting key. It is like a clustered index in SQL Server. It arranges the data sorted on disk. I suppose the standard clustered index strategy on SQL Server should be ported to the sort key strategy on Redshift.
Maybe after de Data Modeling zone conference we know more. There is talk about the Vertica implementation. Curious if the used the flex store option to tune stuff.
Would be cool to implement an Anchor model on Redshift and see if it performs well.
In Vertica there are three distribution models, and they happen to coincide with Anchor modeling constructs. I belive this should be transferrable to other MPP platforms as well.
Broadcast The data is available locally (duplicated) on every node in the cluster. This suits knots very well, since they may be joined from both ties and attributes anywhere.
Distribute The data is split according to some operation across the nodes. A modulo operation on the identity column in the anchor and the attribute tables is a good split method.
Project Data necessary in order to do a cross-node join is stored across the nodes, for all directions of the join. Ties fit this purpose perfectly and does not introduce any overhead thanks to only containing the columns on which joins are done.
All in all, Anchor modeling fits MPP extremely well, since the constructs turn out to be designed in such a way that the MPP is utilized as efficiently as possible. A knotted attribute or tie can be resolved locally on a node, and a join over a tie cannot be more efficient, while the assembly of the results from different nodes is trivial.
Incidentally, if you search about issues in Vertica, a lot of implementations suffer from having to create lots of projections or broadcasts of large tables in order to support ad-hoc querying. This is a natural result of using less normalized models. Anchor modeling, on the contrary, works "out of the box" for ad-hoc querying. Furthermore, what should be projected and broadcast is known beforehand, and not tuning work a DBA would have to worry about on a daily basis.