Why are ULID ids required for deduplicating events?

  • 23 March 2023
  • 5 replies

Hi. I want to prevent the same event from being triggered more than once, to avoid duplicate customer messages, etc. According to the documentation:

  1. “If two events contain the same id, we won’t process the event multiple times”
  2. “The id must be a ULID”

Item 1 is great and is what one would expect in order to prevent duplication. I’m just wondering why item 2 is required. One could transform an existing id into a valid ULID, but that’s just additional complexity for no apparent benefit. Can someone clarify?

Also, what would happen if we generated a ULID using a fixed constant timestamp (or the timestamp for the start of the day) and injected an existing ID into the randomness part? That would be a valid ULID. Would that work?
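A minimal sketch of that idea in Python, to make the question concrete. It assumes the standard ULID layout (10 Crockford Base32 characters for the 48-bit millisecond timestamp, followed by 16 characters for the 80 bits of randomness) and an existing id that is a non-negative integer. The helper names `encode_base32` and `ulid_from_existing_id` are hypothetical, not part of any Customer.io library.

```python
from datetime import date, datetime, timezone

# Crockford Base32 alphabet: digits plus uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_base32(value: int, length: int) -> str:
    """Encode a non-negative integer as a fixed-width Crockford Base32 string."""
    if value < 0 or value >= 32 ** length:
        raise ValueError("value does not fit in the requested width")
    chars = []
    for _ in range(length):
        value, remainder = divmod(value, 32)
        chars.append(CROCKFORD[remainder])
    return "".join(reversed(chars))


def ulid_from_existing_id(existing_id: int, day: date) -> str:
    """Hypothetical helper: timestamp part = start of `day` in UTC,
    randomness part = the existing id instead of random bits."""
    start_of_day = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    timestamp_ms = int(start_of_day.timestamp() * 1000)
    if not 0 <= timestamp_ms < 2 ** 48:
        raise ValueError("timestamp does not fit in 48 bits")
    if not 0 <= existing_id < 2 ** 80:
        raise ValueError("existing id does not fit in 80 bits")
    return encode_base32(timestamp_ms, 10) + encode_base32(existing_id, 16)


if __name__ == "__main__":
    # The same (id, day) pair always yields the same 26-character string.
    print(ulid_from_existing_id(654321, date(2023, 3, 23)))
```

Because nothing here is random, the result is only as unique as the existing id: two different events with the same id and the same day would collide, which is exactly the behaviour wanted for deduplication.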

Thanks in advance.


Best answer by katie_judd 24 March 2023, 19:42



Userlevel 2

Hi there! Thanks for posting your question. 😄

Currently we require ULID formatting because ULIDs can be sorted by creation order, thanks to the timestamp that is encoded when each one is created. Like UUIDs, ULIDs are designed to be unique: each one combines that timestamp with random bits, so the chance of generating the same ULID twice is essentially zero. This helps us ensure that events are only deduplicated when they’ve occurred at the same time/instance.

Userlevel 2

To answer the second part of your question - as long as you’re still making sure each ULID is unique you shouldn’t have any issues creating a fixed timestamp per day. 

Thanks @katie_judd. Does customerio use the ULID-embedded timestamps for some purpose?

Userlevel 2

You’re welcome @ppedruzzi! We don’t need to use the timestamps within the ULID. We require the format to standardize our deduplication logic and to help our users avoid accidentally recreating IDs.

So, considering that the embedded timestamp is not used in any way, it follows that any unique event id compatible with the ULID format should work for event deduplication. Technically, the ULID format is roughly a 26-character alphanumeric string that excludes the letters I, L, O, and U. Starting with ‘0’ is enough to avoid overflowing the 48-bit timestamp.
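As a rough illustration of that format rule, here is a small Python check. It is a sketch only: it assumes uppercase Crockford Base32 and a first character of 0 to 7 (which keeps the timestamp within 48 bits), and it says nothing about how Customer.io actually validates ids. The name `looks_like_ulid` is hypothetical.

```python
import re

# 26 characters from Crockford's Base32 alphabet (no I, L, O, U).
# The first character is limited to 0-7 so the 48-bit timestamp cannot overflow.
ULID_PATTERN = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


def looks_like_ulid(candidate: str) -> bool:
    """Return True if `candidate` has the shape of a ULID (format check only)."""
    return ULID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    for value in ("01ARZ3NDEKTSV4RRFFQ69G5FAV", "not-a-ulid"):
        print(value, looks_like_ulid(value))
```

A check like this only confirms the shape of the string; uniqueness is still up to whoever generates the ids.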

For instance, if one has a unique numeric id of limited length, it can simply be wrapped in a ULID-compatible string: numeric id 654321 becomes ULID id 00000000000000000000654321. Of course, the timestamp in this case would be zero (1 January 1970), but that doesn’t matter.
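In Python, that wrapping could look like the sketch below. It assumes the existing id is a non-negative integer of at most 25 digits, so the result always starts with ‘0’; `wrap_numeric_id` is a hypothetical helper name.

```python
def wrap_numeric_id(numeric_id: int) -> str:
    """Left-pad a non-negative integer id with zeros to 26 characters."""
    if numeric_id < 0:
        raise ValueError("id must be non-negative")
    digits = str(numeric_id)
    if len(digits) > 25:
        # Keep at least one leading zero so the timestamp part cannot overflow.
        raise ValueError("id is too long to wrap safely")
    return digits.zfill(26)


if __name__ == "__main__":
    print(wrap_numeric_id(654321))
```

Decimal digits are all valid Crockford Base32 characters, so the padded string stays within the ULID alphabet, and distinct numeric ids always map to distinct strings.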

Can you confirm this understanding? If it is correct, it means that although you recommend ULIDs for event ids, you only enforce the ULID format. Knowing that actually makes things easier for clients, since they can reuse their existing non-ULID ids.