Bug: Management API users endpoint fails in several ways.

I’ve got a pending ticket with support that I’d like to share here in case others struggle with the same or related issues.

The query searches users with Lucene syntax:

curl -H "Authorization: Bearer $token" \
  "https://my-tenant.auth0.com/api/v2/users?fields=user_metadata&include_fields=true&q=_exists_:user_metadata.myCheckObj.myCheckProp&search_engine=v2"

(The token was an API Explorer Client token in this case.)
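
For scripted use, the same request URL can be built with the query string percent-encoded, which sidesteps shell quoting problems with `&` and `:`. This is only a minimal sketch using the Python standard library; the function name `build_users_search_url` and its arguments are made up for illustration and are not part of any Auth0 SDK.

```python
from urllib.parse import urlencode


def build_users_search_url(tenant_domain: str, metadata_path: str) -> str:
    """Build the /api/v2/users search URL used in the curl command above.

    metadata_path is the dotted path under user_metadata that must exist,
    for example "myCheckObj.myCheckProp".
    """
    params = {
        "fields": "user_metadata",
        "include_fields": "true",
        "q": f"_exists_:user_metadata.{metadata_path}",
        "search_engine": "v2",
    }
    # urlencode percent-encodes reserved characters such as ":" in the query.
    return f"https://{tenant_domain}/api/v2/users?{urlencode(params)}"


if __name__ == "__main__":
    print(build_users_search_url("my-tenant.auth0.com", "myCheckObj.myCheckProp"))
```

The returned URL still needs the `Authorization: Bearer` header from the curl example when it is sent.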

The query has three issues, listed by priority:

  1. Query returns outdated user_metadata object.

After adding new properties to the user_metadata object, they are visible when browsing the user_metadata editor in the dashboard/user details, but the query returns the old object without the added props.

  2. Query fails with a 503 error quite often.

  3. Query fails with a 500 error in a new tenant before the first user is created.

If you call the query on a fresh tenant and it doesn’t work, this could be the issue. The response is pasted below. Immediately after creating a user (even if you delete it again), the query starts working (returning [] when there are no users).

{"statusCode":500,"error":"Internal Server Error","message":"An internal server error occurred","errorCode":"IndexMissingException[[my-tenant] missing]"}

We can work around 2 and 3, but 1 is pretty annoying.
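
A rough sketch of how the workarounds for the 503 and the missing-index 500 might look in a script: retry with exponential backoff on 503, and treat a 500 whose body mentions `IndexMissingException` as an empty result. The `fetch` callable, the function name, and the retry settings are assumptions for illustration; `fetch` stands in for whatever HTTP client actually performs the GET.

```python
import json
import time
from typing import Callable, Tuple

# fetch() is a hypothetical callable that performs the HTTP GET and returns
# (status_code, body_text); plug in urllib, requests, or an SDK call.
Fetch = Callable[[], Tuple[int, str]]


def search_users_with_workarounds(
    fetch: Fetch,
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Run a user search, retrying on 503 and tolerating a missing index."""
    for attempt in range(retries + 1):
        status, body = fetch()
        if status == 200:
            return json.loads(body)
        if status == 500 and "IndexMissingException" in body:
            # Fresh tenant that never had a user: treat as an empty result.
            return []
        if status == 503 and attempt < retries:
            # Presumably transient: back off exponentially, then retry.
            sleep(base_delay * 2 ** attempt)
            continue
        raise RuntimeError(f"user search failed with HTTP {status}: {body}")
    raise RuntimeError("no attempts were made")  # only if retries < 0
```

This does nothing for the stale user_metadata problem; it only covers the two failures that return an error status.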

Edit:

Thanks for updating, that’s in line with the support answers and my experience.

On #3: this is particularly relevant for multi-stage deployments.

Multi-stage deployments are done to ensure that an Auth0 production tenant is set up flawlessly by a deterministic process such as a setup script. Instead of manually playing around with options in production until “things work”, you develop the script in staging, verify it in test, and run it exactly once in prod.

This is particularly important, because it is required, in environments with regulated IT (e.g. Good Clinical Practice).

We discovered some other issues with multi-stage deployments in Auth0:

  • When you create an Auth0 tenant, it may come with different functionality depending on when you create it (e.g. new client grants are introduced in newly created tenants only, for now).
  • That is fine, but it appears you can’t make an existing tenant mimic that behaviour.
  • Also, it seems you can’t “reset” a tenant to remove all your customization and return to the Auth0 default.
  • And if you try to delete and re-create it: once deleted, a tenant name cannot simply be re-claimed but remains blocked.

So the bottom line is that your deploy script may work in staging but not in production, and you have no way of “cleaning out” your tenant to rule that out.
It looks like some of the tenant mechanism is currently under development, so hopefully we’ll get an API-controlled “soft reset” that lets us clean our tenant and set it up with the most recent Auth0 default configs.

Thanks for sharing your discoveries and workarounds for the benefit of others. In relation to the individual situations, there are a few things worth noting.

For #1 and #2, this was likely a temporary situation related to the open incident affecting user search functionality. At this time, most tenants that did not have an excessive number of metadata fields have been migrated to a new search cluster, so for those tenants the situation should now be more stable.

For #3, it does seem to be an issue unrelated to the incident, and given that it’s a bit of an edge case, your proposed workaround of ensuring that at least one user is created is possibly the best course of action.
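
In a setup script, the workaround for #3 could look roughly like the sketch below: create a placeholder user and delete it again right away. The `api` helper, the function name `prime_user_index`, the default connection name, and the generated email and password are all assumptions for illustration; the tenant's password policy may call for a different password, and `api` stands in for whatever authenticated client the script already uses.

```python
import secrets
from typing import Callable, Optional
from urllib.parse import quote

# api() is a hypothetical helper: api(method, path, payload) sends an
# authenticated Management API request and returns the parsed JSON body
# (or None for an empty response).
Api = Callable[[str, str, Optional[dict]], Optional[dict]]


def prime_user_index(api: Api, connection: str = "Username-Password-Authentication") -> str:
    """Create and immediately delete a placeholder user.

    Returns the user_id of the removed placeholder user.
    """
    payload = {
        "connection": connection,  # assumed database connection name
        "email": f"placeholder-{secrets.token_hex(4)}@example.com",
        "password": secrets.token_urlsafe(24),  # may not satisfy every policy
    }
    created = api("POST", "/api/v2/users", payload)
    user_id = created["user_id"]
    # User ids such as "auth0|123" contain "|", so encode them for the path.
    api("DELETE", f"/api/v2/users/{quote(user_id, safe='')}", None)
    return user_id
```

Running this once right after tenant creation, before the first search query, should be enough according to the behaviour described in the original post.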