Critical Instability and Event Omission in Event Streams (Early Access)

Feature: Event streams (Early access)

Description: The Event Stream feature is showing critical performance instability and unreliable delivery of events to the configured AWS EventBridge destination. This directly impacts downstream systems that rely on these events for real-time processing and synchronization.

1. Inconsistent Latency

Event delivery latency is highly unstable, varying wildly between near-instantaneous and excessively delayed.

  • Observed Behavior: Delivery time from Auth0 to EventBridge ranges from less than one second to multiple minutes for events of the same type and context.

  • Impact: This unpredictability makes it impossible to rely on the stream for time-sensitive, near-real-time operations.
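For anyone who wants to quantify the spread on their own side, here is a minimal sketch of how the latency could be measured. It assumes each delivered event carries an ISO 8601 emission timestamp and that the consumer records its own arrival time; the function names and the timestamp format are illustrative assumptions, not part of the Auth0 or EventBridge APIs.

```python
from datetime import datetime, timezone
from statistics import median


def delivery_latency_seconds(emitted_at: str, received_at: datetime) -> float:
    """Seconds between the event's own timestamp and its arrival.

    Assumption: `emitted_at` is an ISO 8601 string such as
    "2025-01-01T00:00:00Z" taken from the event payload.
    """
    # Normalize a trailing "Z" so older Python versions can parse it.
    emitted = datetime.fromisoformat(emitted_at.replace("Z", "+00:00"))
    return (received_at - emitted).total_seconds()


def summarize(latencies):
    """Return min, median, and max of the observed latencies (in seconds)."""
    ordered = sorted(latencies)
    return {"min": ordered[0], "median": median(ordered), "max": ordered[-1]}
```

Collecting these numbers per event type makes it easier to show that events of the same type and context differ by orders of magnitude.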

2. Complete Event Omissions (Missing Events)

Events are sometimes completely omitted or lost in the stream, even when events of the same type for the same resource are delivered successfully both before and after the omission.

  • Specific Example: For a single user entity, the system correctly received multiple subsequent user.updated events, yet the initial, crucial user.created event was never delivered.

  • Crucial Detail: More than 24 hours after the expected creation time, the Event Stream UI still shows no failed delivery attempts for the missing event, which suggests the event was dropped before reaching the stage where a delivery failure would be recorded.

  • Impact: Data integrity is compromised, as essential initial state information is never received, leading to synchronization errors and incomplete records in connected systems.
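A rough way to spot this kind of gap on the consumer side is sketched below. It assumes the received events have been flattened into dictionaries with a `type` and a `user_id` key; that shape and the function name are hypothetical and would need adapting to the real payload.

```python
from collections import defaultdict


def users_missing_created(events):
    """Return IDs of users seen in the stream without a user.created event.

    Assumption: each event is a dict with "type" and "user_id" keys
    (a hypothetical, simplified shape).
    """
    seen_types = defaultdict(set)
    for event in events:
        seen_types[event["user_id"]].add(event["type"])
    # A user with other events but no user.created is a candidate omission.
    return sorted(
        uid for uid, types in seen_types.items() if "user.created" not in types
    )
```

Users created before the stream was enabled would also be flagged, so the result is a list of candidates to check rather than proof of an omission.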

Use-case: Integration with AWS EventBridge

We have run into this behaviour multiple times.

It would be great if the product could be improved.

Thanks

Hi @paul.sivtsov – Thank you for the feedback, and apologies for the bad experience. A recent bug fix went out for Event Omissions. Please check for v202540.519.0.

Regarding Event Delays, can you tell me which Public Cloud space the affected tenant is operating in?

Oh yes please. This is such an important feature but we need better performance to ensure it’s usable.

ACK. Understood. We're immediately making changes to our Latency Monitoring so that it distinguishes latency driven by customers who may have poorly performing Webhook handlers from overall problems with our own delivery platform.

Hey Brian, thanks for your contribution here.

It was primarily US-4 and US-5.