The schema definition for logs appears to be inaccurate

Problem statement

We have noticed that the schema definition for tenant logs published in your Management API documentation shows the response body’s scope property as type string.

However, when I receive logs that have values in scope, the values are always arrays of strings.

I should mention that these logs are received and processed by our custom webhook via a Log Stream. I have attached 2 screenshots to show what I’m talking about.

Why is there an apparent difference between the format of these two logs?

Solution

The short answer is that we have different sources/teams that write to tenant logs. We are in the process of streamlining all fields in the tenant logs and later this year (in 2023) we hope to publish an official schema.

What this means is that certain fields could have different property types based on the source that wrote the tenant log.

The long answer is that the apparent inconsistency in the “scope” attribute format arises because Auth0 tries to optimize log data representation based on the context in which logs are presented.

  1. When logs are accessed via the Management API, the ‘scope’ attribute is defined as a string. This is because API responses are designed to be consumed by developers or applications, and providing ‘scope’ as a single string allows for easier processing and ingestion by the consuming application.

  2. On the other hand, when logs are observed as a log-stream event, the ‘scope’ attribute can be either a single string or an array of strings. This is because Log Streams have a broader audience, including real-time analysis, log storage, and human-readable exports. Rendering ‘scope’ as an array of strings provides a more flexible data structure, allowing Log Stream consumers to parse and understand the data more easily, depending on their specific use case.
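Until a single schema is published, a consumer can defensively accept both shapes. The following is a minimal sketch, not an official Auth0 helper: the function name normalize_scope is hypothetical, and it assumes that the string form is a whitespace-delimited list of scopes, as in OAuth 2.0. Verify that assumption against your own log samples before relying on it.

```python
from typing import Any, List


def normalize_scope(raw: Any) -> List[str]:
    """Return the scope value as a list of strings, whatever shape it arrived in.

    Hypothetical helper for illustration; not part of any Auth0 SDK.
    """
    if raw is None:
        # Field missing or null: treat as "no scopes".
        return []
    if isinstance(raw, str):
        # Assumption: the string form is whitespace-delimited (OAuth 2.0 style).
        return raw.split()
    if isinstance(raw, (list, tuple)):
        # Array form: coerce each entry to a string.
        return [str(item) for item in raw]
    # Any other type is unexpected; fail loudly rather than guess.
    raise TypeError(f"unexpected scope type: {type(raw).__name__}")
```

In a webhook handler, the value read from the log event (wherever the scope field sits in your payload) can be passed through this function before any further processing, so the downstream code only ever deals with a list of strings.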

In conclusion, while the representation of the ‘scope’ attribute in logs may differ, this is by design, to better cater to the specific contexts and use cases in which the logs are consumed. We understand that this may cause some confusion, but these design decisions aim to optimize the log data presentation for each channel.

As noted in the ‘short’ answer, work is underway to create a more rational schema.