How to achieve idempotency, or does it matter?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

How to achieve idempotency, or does it matter?

Shamus Fuller
Consider an architecture that has:
1) a read/write responsibility separation pattern is used for data access, call it CQRS or similar if you prefer. The writes could be done using Service Bus topics and queues, establishing processing pipelines, etc
2)Eventual consistency is achieved using transactional behavior via messaging, similar to what's described here: http://blogs.msdn.com/b/clemensv/archive/2011/10/06/achieving-transactional-behavior-with-messaging.aspx

Anchor modelling seems very useful for such an architecture. The asynchronous arrival of data benefit alone enables small messages, to write single attributes instead of whole 'rows' of mostly unchanging data as seen in relational models.

Two phase transaction commits across boundaries is not always possible, and is not really even desirable. To enable a message based transaction system, idempotency seems to be necessary. How would the idempotent quality be achieved? Or would it even matter? I can't quite place it, but it seems that in some way, a bitemporal, or time annexed system may naturally be able to relax the idempotent requirement. In some sense, who cares if the same message comes in twice? Does that even make sense?
Reply | Threaded
Open this post in threaded view
|

Re: How to achieve idempotency, or does it matter?

roenbaeck
Administrator
Incidentally, I did some work on supporting idempotency before the summer. I will make it a selectable option in the tool for the generated code, scheduled to be added during November. In order to achieve it, some changes have to be made to the triggers. Following is the proof-of-concept showing how it is done:

drop table MySource;
 drop table MyTarget;
 drop view lMyTarget;
 drop function sMyTarget;

 go
 create function sMyTarget (
    @identifier int,
    @value char(1),
    @changed date
 )
 returns int -- {0,1}
 as begin return (
    select
            COUNT(*)
    where (
            select top 1
                    value
            from
                    MyTarget
            where
                    identifier = @identifier
            and
                    changed < @changed
            order by
                    changed desc
    ) = @value
 );
 end
 go

 create table MyTarget (
    identifier int not null,
    value char(1) not null,
    changed date not null,
    constraint rsMyTarget check (
            dbo.sMyTarget(identifier, value, changed) = 0
    ),
    constraint pkMyTarget primary key (
            identifier,
            changed
    )
 );

 go
 create view lMyTarget
 as
 select
    identifier,
    value,
    changed
 from
    MyTarget t
 where
    t.changed = (
            select
                    MAX(s.changed)
            from
                    MyTarget s
            where
                    s.identifier = t.identifier
    );
 go

 create table MySource (
    identifier int not null,
    value char(1) not null,
    changed date not null
 );

 -- Insert 1.000.0000 records
 with rowGen(n) as (
       select 0
       union all
       select n + 1
       from rowGen
       where n < 999999
 )
 insert into MySource
 select
       n % 1000, -- 1000 distinct ids
       char(64 + ver),
       DATEADD(day, n, '2000-01-01')
 from (
       select
               n,
               ntile(5) over (partition by n % 10 order by n) as ver
       from
               rowGen
 ) x
 option (maxrecursion 0);

 create clustered index ixMySource on MySource(identifier, changed);


 go
 create trigger iMyTarget on lMyTarget
 instead of insert
 as
       with insertedAndNumbered as (
               select
                       ROW_NUMBER() over (partition by identifier order by changed) as N,
                       identifier,
                       value,
                       changed
               from
                       inserted
       ),
       insertedWithPreviousValue as (
               select
                       identifier,
                       value,
                       changed,
                      case
                                when N%2=1 then 
                                     MAX(case when N%2=0 then value end) 
                                     over (partition by identifier, N/2)
                                else 
                                     MAX(case when N%2=1 then value end) 
                                     over (partition by identifier, (N+1)/2)
                        end as previousValue
               from
                       insertedAndNumbered
       )
    insert into MyTarget
    /*
    output
               'INSERT' as op,
               inserted.*
       */
    select
               i.identifier,
               i.value,
               i.changed
    from
               insertedWithPreviousValue i
       where
               previousValue is null
       or
               value <> previousValue;

 go
 create trigger uMyTarget on lMyTarget
 instead of update
 as
    insert into MyTarget
    /*
    output
            'UPDATE' as op,
            inserted.*
    */
    select
            i.*
    from
            inserted i
    left join
            deleted d
    on
            d.identifier = i.identifier
    and
            d.value = i.value
    where
            d.identifier is null;
 go

 -- disable constraint while loading (safe since triggers remove
 restatements)
 alter table MyTarget nocheck constraint all;

 -- takes 10 seconds to run on a medium sized production server (without
 constraints)
 -- takes 30 seconds to run on a medium sized production server (with
 constraints)
 merge into lMyTarget as t
 using MySource as s
 on t.identifier = s.identifier
 when matched
 then update set
    t.identifier = s.identifier,
    t.value = s.value,
    t.changed = s.changed
 when not matched
 then insert
    values(s.identifier, s.value, s.changed);

 alter table MyTarget check constraint all;

 --select * from lMyTarget;
Reply | Threaded
Open this post in threaded view
|

Re: How to achieve idempotency, or does it matter?

Shamus Fuller
Thank you, that makes sense. My thinking was not giving proper weight to the temporal capabilities and i was not clearly separating the techniques available from how they would be generally handled in a relational model.

Also, restatement and timeline annexing were adding to my bewilderment. I can see now that assuming restatement is not allowed, idempotency is easily handled by considering one of the time values, changing time for example. My thought about how idempotency might be irrelevant came from not distinguishing changing time and recording time. Idempotency for recording time, doesn't seem to matter, strictly speaking. In some sense, who cares if multiple systems or sources register recording times. In some cases, that would be a desired scenario and I can see how it could be supported. If restatement is allowed, then maybe more treatment of changing time is needed for example, but it is certainly possible. With timeline annexing, idempotency could exist along each concurrent timeline or along information sources independently.
Reply | Threaded
Open this post in threaded view
|

Re: How to achieve idempotency, or does it matter?

roenbaeck
Administrator
Disallowing restatements and enabling idempotency should only be done in an environment where data are expected to arrive synchronously. In a situation where data may arrive out-of-changing time order you could otherwise end up with data loss. Consider the following:

> Message 1, with changing time 42 and value: A arrives.
> Message 2, with changing time 43 and value: B arrives.
> Message 3, with changing time 44 and value: A arrives.

Compared to the following:

> Message 1, with changing time 42 and value: A arrives.
> Message 2, with changing time 44 and value: A arrives.
> Message 3, with changing time 43 and value: B arrives.

Idempotency (with an optional restatement constraint) would in the latter case discard the second message, as its value is identical to the one already stored. That the decision to do so was incorrect can only be determined after the third message has arrived.

The proof of concept outlined above did not work out well enough with respect to existing functionality, and I had to change it. Following is how the insert trigger on the latest view may be altered for one specific example to enable idempotency:

ALTER TRIGGER [dbo].[itST_Stage] ON [dbo].[lST_Stage]
INSTEAD OF INSERT
AS
BEGIN
   SET NOCOUNT ON;
   DECLARE @now DATETIME2(7) = SYSDATETIME();
   DECLARE @v INT, @maxV INT;
   -- don't do this for non-generators
   DECLARE @ST TABLE (
      Row int identity(1,1) not null primary key,
      ST_ID int not null
   );
   INSERT INTO [dbo].[ST_Stage](
      ST_Dummy
   )
   OUTPUT
      inserted.ST_ID
   INTO
      @ST
   SELECT
      null
   FROM
      inserted
   WHERE
      inserted.ST_ID is null;

   -- don't forget metadata if used
   DECLARE @inserted TABLE (
	  ST_ID int not null,
	  ST_NAM_Stage_Name varchar(42) null,
	  ST_NAM_ChangedAt date null,
	  ST_LOC_Stage_Location varchar(42) null
   );
   INSERT INTO @inserted
   SELECT
      ISNULL(i.ST_ID, a.ST_ID),
      i.ST_NAM_Stage_Name,
      ISNULL(i.ST_NAM_ChangedAt, @now),
      i.ST_LOC_Stage_Location
   FROM (
	  SELECT
		 *,
		 ROW_NUMBER() OVER (PARTITION BY ST_ID ORDER BY ST_ID) AS Row
	  FROM
	     inserted
   ) i
   LEFT JOIN 
      @ST a
   ON
	  a.Row = i.Row;

   -- non-generator version:
   /*
   DECLARE @inserted TABLE (
	  ST_ID int not null,
	  ST_NAM_Stage_Name varchar(42) null,
	  ST_NAM_ChangedAt date null,
	  ST_LOC_Stage_Location varchar(42) null
   );
   INSERT INTO @inserted
   SELECT
      i.ST_ID,
      i.ST_NAM_Stage_Name,
      ISNULL(i.ST_NAM_ChangedAt, @now),
      i.ST_LOC_Stage_Location
   FROM 
      inserted i
   WHERE
      i.ST_ID is not null;   
   */
   
   -- for every idempotent attribute
   DECLARE @versioned TABLE (
	  ST_ID int not null,
	  ST_NAM_Stage_Name varchar(42) null,
	  ST_NAM_ChangedAt date null,
  	  ST_NAM_Version int not null,
  	  PRIMARY KEY(ST_NAM_Version, ST_ID)
   );
   INSERT INTO @versioned 
   SELECT
      ST_ID, 
	  ST_NAM_Stage_Name, 
	  ST_NAM_ChangedAt,
	  ROW_NUMBER() OVER (PARTITION BY ST_ID ORDER BY ST_NAM_ChangedAt)
   FROM
      @inserted
   WHERE
	  ST_NAM_Stage_Name is not null;
   
   SELECT @maxV = MAX(ST_NAM_Version) FROM @versioned;
   SET @v = 0;
   WHILE(@v < @maxV) 
   BEGIN
       SET @v = @v + 1;
	   INSERT INTO [dbo].[ST_NAM_Stage_Name](
		  ST_ID, 
		  ST_NAM_Stage_Name, 
		  ST_NAM_ChangedAt
	   )
	   SELECT
		  v.ST_ID, 
		  v.ST_NAM_Stage_Name, 
		  v.ST_NAM_ChangedAt
	   FROM 
		  @versioned v
	   LEFT JOIN
	      lST_Stage [ST]
	   ON
	      [ST].ST_ID = v.ST_ID
	   WHERE
	      v.ST_NAM_Version = @v
	   AND (
		  [ST].ST_NAM_Stage_Name is null
	   OR
	      [ST].ST_NAM_Stage_Name <> v.ST_NAM_Stage_Name
	   )
   END
   INSERT INTO [dbo].[ST_LOC_Stage_Location](
      ST_ID, 
      ST_LOC_Stage_Location
   )
   SELECT
      i.ST_ID,
      i.ST_LOC_Stage_Location
   FROM 
      @inserted i
   WHERE
      i.ST_LOC_Stage_Location is not null;
END
Reply | Threaded
Open this post in threaded view
|

Re: How to achieve idempotency, or does it matter?

roenbaeck
Administrator
The idempotency as described in the thread did not work when inserting retroactive changes, so the trigger logic has been rewritten in the test version to cope with the situation.

http://www.anchormodeling.com/modeler/test