SQL Server query to generate AM xml of information_schema metadata

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

SQL Server query to generate AM xml of information_schema metadata

delostilos
Hi,

I made a small query that uses the information in the information_schema of SQL Server to generate an Anchored version of the model. I've tested is against the  SQL Server Sakila demo database.

It generates a uni-temporal version of the AM.
Run the query and copy-paste the result in a file.
Save the file and load it in the AM modeler. Have fun.

Regards,
JJ.

The query:
-- Generate an anchor XML based on the information_schema metadata
-- The Business keys (BK's) are non historized all other stuff is historized
-- It is not a perfect generator. You can use to get an indication how an Achorized version of
-- your model will look like. To give you a nice start :-)    
-- The query will try to figure out if there are N:M tables used that are implemented as Ties
-- All other Ties are based on the FK's
-- Each Anchor get's an extra Validity Attribute to track the existence of the BK
-- Each N:M Tie get's an extra Validity Knot to track the existence of the N:M relationship
-- I hope it helps to define an Anchor model based on a '3NF' model in a database
-- It will work nice if the '3NF' model is based on 'natural/business' keys.
  with unique_keys as (
    -- get the unique keys columns
select table_catalog
     , table_schema
     , table_name
     , column_name
  from information_schema.key_column_usage
 where constraint_name in (select constraint_name 
                             from information_schema.table_constraints where constraint_type in ('PRIMARY KEY','UNIQUE'))
)
,      foreign_keys as ( 
    -- get the foreign key columns  
select table_catalog
     , table_schema
     , table_name
     , column_name
  from information_schema.key_column_usage
 where constraint_name in (select constraint_name 
                             from information_schema.table_constraints where constraint_type in ('FOREIGN KEY'))
)
-- filter out the n:m tables pk/ux that have a pk/ux that is covered by the combined fk's
, fk_minus_unique as (
select *
  from unique_keys
except
select * 
  from foreign_keys    
)
-- get the tables that become anchors 
, anchor_tables as (
select distinct table_catalog
     , table_schema
     , table_name
  from fk_minus_unique
)
-- build the xml of the anchors and attributes
-- the 'replace ....' function set is the equivalent of an initcap function
, anchor_and_attribute as (
select case row_number() over(partition by table_schema,table_name order by ordinal_position)
            when 1 then
       '<anchor mnemonic="'+substring(upper(table_name),1,1)+substring(upper(table_name),len(table_name)-2,1)+'"'
    + '        descriptor="'+replace(replace(replace(replace(replace(replace(replace(replace(
		replace(replace(replace(replace(replace(replace(replace(
		replace(replace(replace(replace(replace(replace(replace(
		replace(replace(replace(replace(replace(
			' '+replace(table_name,'_',' '),
		' a',' A'),' b',' B'),' c',' C'),' d',' D'),' e',' E'),' f',' F'),
		' g',' G'),' h',' H'),' i',' I'),' j',' J'),' k',' K'),' l',' L'),
		' m',' M'),' n',' N'),' o',' O'),' p',' P'),' q',' Q'),' r',' R'),
		' s',' S'),' t',' T'),' u',' U'),' v',' V'),' w',' W'),' x',' X'),
		' y',' Y'),' z',' Z'),' ','')+'"'
    + '        identity="int">'
    + '        <metadata capsule="' +table_schema+'"'
    + '                  generator="true"/>' 
    + '<attribute mnemonic="VLI"'
    + '           descriptor="ValidityIndicator"'
    + '           timeRange="datetime2(7)"'
    + '           dataRange="smallint">'
    + '           <metadata capsule="dbo"'
    + '                     restatable="false"'
    + '                     idempotent="true"/>'
    + '</attribute>'
            else 
       '' 
       end 
    + '<attribute mnemonic="'+substring(upper(column_name),1,2)+substring(upper(column_name),len(column_name),1)+'"'
    + '           descriptor="'+replace(replace(replace(replace(replace(replace(replace(replace(
		replace(replace(replace(replace(replace(replace(replace(
		replace(replace(replace(replace(replace(replace(replace(
		replace(replace(replace(replace(replace(
			' '+replace(column_name,'_',' '),
		' a',' A'),' b',' B'),' c',' C'),' d',' D'),' e',' E'),' f',' F'),
		' g',' G'),' h',' H'),' i',' I'),' j',' J'),' k',' K'),' l',' L'),
		' m',' M'),' n',' N'),' o',' O'),' p',' P'),' q',' Q'),' r',' R'),
		' s',' S'),' t',' T'),' u',' U'),' v',' V'),' w',' W'),' x',' X'),
		' y',' Y'),' z',' Z'),' ','')+'"'
    + case when exists (select 1 from unique_keys where c.table_catalog = table_catalog and c.table_schema = table_schema and c.table_name = table_name and c.column_name = column_name)
            then 
       ''
            else 
       '           timeRange="datetime2(7)"'
         end  
    + '           dataRange="'+data_type
    + case when coalesce(character_maximum_length, numeric_precision,datetime_precision) is not null
             and data_type not like '%int%'
             and coalesce(character_maximum_length, numeric_precision,datetime_precision) > 0
            then case when numeric_scale is null 
                      then '('+cast(coalesce(character_maximum_length, numeric_precision,datetime_precision) as varchar)+')'
                      else '('+cast(coalesce(character_maximum_length, numeric_precision,datetime_precision) as varchar)+','+cast(numeric_scale as varchar)+')'
                 end
            else ''
       end     
    + '">'
    + '           <metadata capsule="'+table_schema+'"'
    +  case when exists (select 1 from unique_keys where c.table_catalog = table_catalog and c.table_schema = table_schema and c.table_name = table_name and c.column_name = column_name)
             then 
       ''
             else 
       '                     restatable="false"'
    + '                     idempotent="true"'
        end          
    + '/>'
    + '</attribute>'
    + case when row_number() over(partition by table_schema,table_name order by ordinal_position) 
               = count(1) over(partition by table_schema,table_name)
            then 
       '</anchor>'
            else 
       ''
       end as xml    
  from information_schema.columns c
 where table_schema not in ('pg_catalog','information_schema') 
   and exists (select 1 
                 from anchor_tables a
                where a.table_catalog = c.table_catalog
                  and a.table_schema = c.table_schema
                  and a.table_name = c.table_name
              )
   and not exists (select 1
                     from foreign_keys f
                    where f.table_catalog = c.table_catalog
                      and f.table_schema = c.table_schema
                      and f.table_name = c.table_name
                      and f.column_name = c.column_name
                  )
    -- filter out the last_updated, it is the change tracking date
   and column_name not in ('last_update')
)
-- generate the ties based on FK's
, ties as (
select '<tie timeRange="datetime2(7)">'
    + '     <anchorRole role="parent"'
    + '                 type="'+substring(upper(tc.table_name),1,1)+substring(upper(tc.table_name),len(tc.table_name)-2,1)+'"'
    + '                 identifier="false"/>'
    + '     <anchorRole role="child"'
    + '                 type="'+substring(upper(fc.table_name),1,1)+substring(upper(fc.table_name),len(fc.table_name)-2,1)+'"'
    + '                 identifier="true"/>' 
    + '     <metadata capsule="'+tc.table_schema+'"'
    + '               restatable="false"'
    + '               idempotent="true"/>'
    + '</tie>' as xml
   from INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
   join information_schema.table_constraints tc
     on rc.UNIQUE_CONSTRAINT_CATALOG = tc.CONSTRAINT_CATALOG
	and rc.UNIQUE_CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA
	and rc.UNIQUE_CONSTRAINT_NAME = tc.CONSTRAINT_NAME
   join information_schema.table_constraints fc
     on rc.CONSTRAINT_CATALOG = fc.CONSTRAINT_CATALOG
	and rc.CONSTRAINT_SCHEMA = fc.CONSTRAINT_SCHEMA
	and rc.CONSTRAINT_NAME = fc.CONSTRAINT_NAME
 where tc.table_schema not in ('pg_catalog','information_schema')
   --and tc.constraint_type = 'FOREIGN KEY'
   and exists (select 1
                 from anchor_tables a
                where a.table_catalog = fc.table_catalog
                  and a.table_schema = fc.table_schema
                  and a.table_name = fc.table_name
              )
)
-- generate the ties based on n:m tables
, ties_nm as (
select case when row_number() over(partition by fc.table_catalog, fc.table_schema, fc.table_name order by tc.table_name) = 1
            then 
       '<tie timeRange="datetime2(7)">'
            else 
       ''
       end     
    + '     <anchorRole role="'+replace(replace(replace(replace(replace(replace(replace(replace(
		replace(replace(replace(replace(replace(replace(replace(
		replace(replace(replace(replace(replace(replace(replace(
		replace(replace(replace(replace(replace(
			' '+replace(tc.constraint_name,'_',' '),
		' a',' A'),' b',' B'),' c',' C'),' d',' D'),' e',' E'),' f',' F'),
		' g',' G'),' h',' H'),' i',' I'),' j',' J'),' k',' K'),' l',' L'),
		' m',' M'),' n',' N'),' o',' O'),' p',' P'),' q',' Q'),' r',' R'),
		' s',' S'),' t',' T'),' u',' U'),' v',' V'),' w',' W'),' x',' X'),
		' y',' Y'),' z',' Z'),' ','')+'"'
    + '                 type="'+substring(upper(tc.table_name),1,1)+substring(upper(tc.table_name),len(tc.table_name)-2,1)+'"'
    + '                 identifier="true"/>'
    + case when row_number() over(partition by fc.table_catalog, fc.table_schema, fc.table_name order by fc.constraint_name)
                = count(1) over(partition by fc.table_catalog, fc.table_schema, fc.table_name)
            then 
       '     <knotRole role="valid"'
    + '               type="VAL"'
    + '               identifier="false"/>' 
    + '     <metadata capsule="'+tc.table_schema+'"'
    + '               restatable="false"'
    + '               idempotent="true"/>'
    + '</tie>'
            else 
       ''
       end as xml    
  from INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
   join information_schema.table_constraints tc
     on rc.UNIQUE_CONSTRAINT_CATALOG = tc.CONSTRAINT_CATALOG
	and rc.UNIQUE_CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA
	and rc.UNIQUE_CONSTRAINT_NAME = tc.CONSTRAINT_NAME
   join information_schema.table_constraints fc
     on rc.CONSTRAINT_CATALOG = fc.CONSTRAINT_CATALOG
	and rc.CONSTRAINT_SCHEMA = fc.CONSTRAINT_SCHEMA
	and rc.CONSTRAINT_NAME = fc.CONSTRAINT_NAME
 where tc.table_schema not in ('pg_catalog','information_schema')
   --and tc.constraint_type = 'FOREIGN KEY'
   and not exists (select 1
                 from anchor_tables a
                where a.table_catalog = fc.table_catalog
                  and a.table_schema = fc.table_schema
                  and a.table_name = fc.table_name
              )
)
-- uni temporal header  
select '<schema format="0.98">'
    + '<metadata changingRange="datetime2(7)"' 
    + '          encapsulation="metadata"' 
    + '          identity="int"' 
    + '          metadataPrefix="Metadata"'
    + '          metadataType="int"' 
    + '          metadataUsage="false"' 
    + '          changingSuffix="ChangedAt"' 
    + '          identitySuffix="ID"'
    + '          positIdentity="int"'
    + '          positGenerator="true"' 
    + '          positingRange="datetime"' 
    + '          positingSuffix="PositedAt"' 
    + '          positorRange="smallint"' 
    + '          positorSuffix="Positor"' 
    + '          reliabilityRange="smallint"'
    + '          reliabilitySuffix="Reliability"' 
    + '          reliableCutoff="1"' 
    + '          deleteReliability="0"' 
    + '          reliableSuffix="Reliable"' 
    + '          partitioning="false"' 
    + '          entityIntegrity="true"' 
    + '          restatability="false"' 
    + '          idempotency="true"' 
    + '          assertiveness="false"' 
    + '          naming="improved"' 
    + '          positSuffix="Posit"' 
    + '          annexSuffix="Annex"' 
    + '          chronon="datetime2(7)"' 
    + '          now="sysdatetime()"' 
    + '          dummySuffix="Dummy"' 
    + '          versionSuffix="Version"' 
    + '          statementTypeSuffix="StatementType"' 
    + '          checksumSuffix="Checksum"' 
    + '          businessViews="false"' 
    + '          equivalence="false"' 
    + '          equivalentSuffix="EQ"' 
    + '          equivalentRange="smallint"' 
    + '          databaseTarget="SQLServer"' 
    + '          temporalization="uni"/>'
 union all
-- the validation knot used in n:m tie to track validity
select '<knot mnemonic="VAL"' 
    + '      descriptor="Validity"' 
    + '      identity="smallint"' 
    + '      dataRange="smallint">'
    + '      <metadata capsule="dbo"' 
    + '                generator="false"/>'
    + '</knot>'
 union all
-- the anchors with their attributes 
select *
  from anchor_and_attribute
 union all
-- the ties based on the FKs 
select *
  from ties
 union all
-- the ties based on the n:m tables (FKs cover the complete PK) 
select *
  from ties_nm  
 union all
-- the footer 
select '</schema>' 
/*
 * header
 * knots
 * anchor + attributes, add validity attribute
 * ties (fk based)
 * ties (table based), add validity knot 
 * footer
 * 
 */

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQL Server query to generate AM xml of information_schema metadata

roenbaeck
Administrator
Amazing! I need to test this right away :) Thanks for contributing this. May we add it to the generated code as a stored procedure?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQL Server query to generate AM xml of information_schema metadata

delostilos
Hi,

It is not a perfect generator. There will be cases that the generated Anchor model is not the best solution. It needs some tweaking for the different model construction you can have. But it is a nice start.

Yes, you may add it as a stored procedure.

Regards,
JJ.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQL Server query to generate AM xml of information_schema metadata

Chris
Hi JJ,

Very nice script! Is there a reason that you chose to use nested REPLACE statements versus a single UPPER command?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQL Server query to generate AM xml of information_schema metadata

delostilos
Hi Chris,

I originally made the query in PostgreSQL with this piece of code to create PascalCase identifiers:

replace(initcap(replace(table_name,'_',' ')),' ','')

Just did a search to find a equivalent of initcab in SQL Server and found the replace variant.

Now the query builds an Anchor for a table that is a N:M table with attributes in a way that each FK results in a single Tie. It only generates binary Ties in this case.
I think a better approach is to build a tie for all the FK's combined and connect the Anchor to that Tie.
table1 -|--|< N:M table >|--|- table2
wil transform in
Anchor(1) --1--< Tie >---n----\
                             Anchor(N:M) 
Anchor(2) --1--< Tie >---n----/
but a better way would be (I think)
Anchor(1) --n--/  n  \
              <  way  >--1-- Anchor(N:M)
Anchor(2) --n--\ Tie /
If I have some spare time, I'll try to make a version that creates this construction.

Regards,
JJ.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQL Server query to generate AM xml of information_schema metadata

roenbaeck
Administrator
For cross referencing, here's the blog topic on automatic generation of Anchor models:
http://www.anchormodeling.com/?p=1078
Loading...