Refresh Token Unknown or Invalid before expiry "403: Unknown or invalid refresh token"

Problem statement

A high number of invalid refresh token errors occur in production: “APIException: Request failed with status code 403: Unknown or invalid refresh token.”

Symptoms

Status code:
403

Response body:

{
    "error": "invalid_grant",
    "error_description": "Unknown or invalid refresh token."
}

Troubleshooting

Refresh Token Expiration
Check the token lifetime settings. With a long absolute lifetime, a token can expire after the log entry for its issuance has already aged out of tenant log retention, which makes it difficult to determine when the token was first issued.

Refresh Token Rotation
Check the Rotation toggle and the Reuse Interval setting.
If a reuse interval is set (called “leeway” in some documents), each request made with the original refresh token within that interval produces another token in the same family.

Tenant Logs:
Query the logs for type: “ferrt” to confirm whether a token was reused.

Also, check the Context Data tab in the log entry to see the token's “family.”
familyId:
The group to which the refresh token belongs.
tokenCounter:
The index of the refresh token. With rotation enabled, each refresh produces a child token. tokenCounter N means that this is the Nth refresh token in the rotation.
latestCounter:
Shown only when an older refresh token is used. It is the total number of refresh tokens issued so far. latestCounter X with tokenCounter Y indicates that the Xth (newest) refresh token should have been used, but the older Yth refresh token was reused instead, causing the error.
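The relationship between the three fields can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the context data has already been extracted into a flat dictionary, which may not match the exact nesting in a real log entry.

```python
def describe_refresh_token_event(context: dict) -> str:
    """Summarize the rotation state found in a log entry's context data.

    Assumption: `context` is a flat dict holding familyId, tokenCounter and
    (optionally) latestCounter. Real log entries may nest these differently.
    """
    family_id = context.get("familyId", "<unknown>")
    token_counter = context.get("tokenCounter")
    latest_counter = context.get("latestCounter")

    if token_counter is None:
        return f"family {family_id}: no tokenCounter in context data"
    # latestCounter is only present when an older token was presented.
    if latest_counter is None or latest_counter == token_counter:
        return f"family {family_id}: token #{token_counter} is the newest one"
    behind = latest_counter - token_counter
    return (f"family {family_id}: token #{token_counter} was reused, but "
            f"token #{latest_counter} is the newest ({behind} rotation(s) behind)")
```

A large gap between the two counters suggests that the client kept an old token for several rotations, which points toward a storage problem or a second component rotating the token.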

When a refresh token (for example, refresh token A) is exchanged, the API returns a new refresh token B. Refresh tokens A and B are considered to be a “family.” For more information, see Automatic Reuse Detection. If refresh token A is used again, the API invalidates the family, including refresh token B.


Cause

(1) Token Expiration
The absolute lifetime of the token had elapsed.
One impacted user had logged in only once and had been using refresh tokens ever since. The error was observed 90 days after their initial login, so the token had reached its absolute lifetime.
Because 90 days exceeds the tenant log retention period, this was difficult to troubleshoot without additional logging.
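The date arithmetic behind this case can be sketched as follows. The helper names are hypothetical, and the 30-day retention used in the usage note is an assumed value for illustration only; check the actual retention period of your tenant.

```python
from datetime import datetime, timedelta, timezone


def absolute_expiry(first_login: datetime, absolute_lifetime: timedelta) -> datetime:
    """Return the moment the refresh token family reaches its absolute lifetime."""
    return first_login + absolute_lifetime


def issuance_visible_in_logs(first_login: datetime, error_time: datetime,
                             log_retention: timedelta) -> bool:
    """Return True if the original login is still inside the log retention window
    at the time the error is investigated."""
    return error_time - first_login <= log_retention


# Example values (assumptions): login on 1 Jan 2024, 90-day absolute lifetime.
first_login = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires_at = absolute_expiry(first_login, timedelta(days=90))
```

With an assumed retention of 30 days, `issuance_visible_in_logs(first_login, expires_at, timedelta(days=30))` is false, which is why the first issuance could not be found in the tenant logs.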

(2) Custom Logic
The customer had custom logic that handled refresh tokens outside of our SDK.
The SDK handled the login flow and saved the first refresh token. That refresh token was then used and rotated elsewhere without the SDK being informed. As a result, the SDK reused the first refresh token, and all subsequent tokens in the family were invalidated.
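The sequence above can be reproduced with a toy model of rotation and reuse detection. This is a simplified simulation written for illustration, not the actual server implementation: the class, the token format, and the absence of a reuse interval are all assumptions.

```python
import itertools


class TokenFamilyServer:
    """Toy authorization server that rotates refresh tokens and detects reuse.

    Assumptions: no reuse interval (leeway), one valid token per family.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self._current = {}     # family id -> the only currently valid token
        self._family_of = {}   # token -> family id
        self._revoked = set()  # families invalidated after reuse was detected

    def _issue(self, family_id: int, counter: int) -> str:
        token = f"rt-{family_id}-{counter}"
        self._family_of[token] = family_id
        self._current[family_id] = token
        return token

    def login(self) -> str:
        """Start a new family and return its first refresh token."""
        return self._issue(next(self._ids), 1)

    def refresh(self, token: str) -> str:
        """Exchange a token for its successor, or reject it."""
        family_id = self._family_of.get(token)
        if family_id is None or family_id in self._revoked:
            raise PermissionError("Unknown or invalid refresh token.")
        if self._current[family_id] != token:
            # An older token was presented: invalidate the whole family.
            self._revoked.add(family_id)
            raise PermissionError("Unknown or invalid refresh token.")
        counter = int(token.rsplit("-", 1)[1]) + 1
        return self._issue(family_id, counter)


def is_rejected(server: TokenFamilyServer, token: str) -> bool:
    """Try to refresh with `token` and report whether the server rejected it."""
    try:
        server.refresh(token)
    except PermissionError:
        return True
    return False
```

In this model, if the SDK stores token A from `login()`, the custom logic calls `refresh(A)` and receives B, and the SDK later calls `refresh(A)` again, that call is rejected and B is rejected afterwards as well, matching the behavior described above.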

(3) 200 Refresh Tokens Limit
Another area to check is whether the logs show refresh tokens being removed by the resource cleanup that runs when more than 200 refresh tokens have been issued per user per application. These tenant logs appear with “type”: “resource_cleanup” and may also explain why a refresh token was invalidated unexpectedly.
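If the tenant logs have been exported, a small script can show which users are affected by a given event type. The sketch below assumes each exported entry is a dictionary with a "type" field and a "user_id" field; the function name is hypothetical and the entry shape should be checked against your actual export.

```python
from collections import Counter


def count_events_by_user(logs: list, event_type: str) -> Counter:
    """Count log entries of one type per user.

    Assumption: each entry is a dict with "type" and (usually) "user_id".
    """
    counts = Counter()
    for entry in logs:
        if entry.get("type") == event_type:
            counts[entry.get("user_id", "<unknown>")] += 1
    return counts
```

Running it once with "resource_cleanup" and once with "ferrt" makes it easier to see whether the users reporting the error are the same ones whose tokens were cleaned up.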

(4) App/Device/Network Conditions
The issue may stem from one of the following scenarios.

  1. A network failure prevents the app from receiving the response from Auth0 that contains the new refresh token.
  2. A storage failure on the device prevents the refresh token from being stored correctly.
  3. The app crashes or is closed by the user in the middle of the flow.

Solution

(1) Token Expiration

  1. Let the user reauthenticate and get a new refresh token when the token has expired.
  2. Use refresh token rotation.

Also, adding logging or shortening the token lifetime may make troubleshooting easier.
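One way to trigger reauthentication is to recognize the error response shown under Symptoms. The sketch below is a minimal example that relies only on the status code and the "error" field from that response; the function name is hypothetical, and how the login flow is restarted depends on the SDK in use.

```python
import json


def needs_reauthentication(status_code: int, body: str) -> bool:
    """Return True when a token response looks like a rejected refresh token.

    Matches the response shown under Symptoms: HTTP 403 with
    "error": "invalid_grant" in a JSON body.
    """
    if status_code != 403:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("error") == "invalid_grant"
```

When the function returns true, the stored refresh token can be discarded and the user sent through the login flow again instead of retrying the refresh.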

(2) Custom Logic
Fix the defect in your custom logic.
Do not handle refresh tokens outside of the SDK unless you have a good business reason.

(4) App/Device/Network Conditions

  1. A network failure prevents the app from receiving the response from Auth0 that contains the new refresh token.
  2. A storage failure on the device prevents the refresh token from being stored correctly.
  3. The app crashes or is closed by the user in the middle of the flow.

For case #1, the application can check network health before starting the refresh token call, which may help avoid network-related issues.
Increasing the reuse interval may also help in some cases: if the application detects that it did not receive the new refresh token, it can retry with the existing token within the allowed reuse interval.
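The retry idea for case #1 could look roughly like the sketch below. It is not tied to any SDK: `send_refresh` is a hypothetical callable that performs the token request and raises ConnectionError when no response arrives, and the reuse interval passed in must match the value configured for the application.

```python
import time


def refresh_with_retry(send_refresh, refresh_token, reuse_interval_s,
                       retry_delay_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Retry a refresh call with the SAME token while the reuse interval allows it.

    The window is measured from the start of the first attempt, which is a
    conservative estimate of when the server first saw the token.
    """
    deadline = clock() + reuse_interval_s
    while True:
        try:
            return send_refresh(refresh_token)
        except ConnectionError:
            # Give up if the next attempt would land outside the reuse
            # interval; reusing the token after that could trip reuse detection.
            if clock() + retry_delay_s >= deadline:
                raise
            sleep(retry_delay_s)
```

The `clock` and `sleep` parameters exist only so the behavior can be tested without waiting. Once the window has passed, the safer fallback is to ask the user to log in again rather than to keep retrying.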

For case #2, it may be possible to access the refresh token from the app. For example, see this link for the Swift SDK.

Please note that accessing the refresh token directly is not officially supported or recommended in all of our SDKs. However, if it is accessible, this may help to build a solution that checks whether the refresh token is stored correctly.
Example flow:

  • While the app performs a refresh token call, it can access the new refresh token through the shared storage interface and keep a copy in memory.
  • Then, it can read the new refresh token back from storage and compare it with the copy held in memory.
  • If they differ, it can write the token again and repeat the comparison until the token is stored successfully.
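The flow above can be sketched as a write-and-verify loop. The storage interface here is hypothetical (any object with `write` and `read` methods), and the loop caps the number of attempts instead of retrying forever, which is a design choice of this sketch rather than part of the flow described above.

```python
from typing import Optional, Protocol


class TokenStore(Protocol):
    """Hypothetical storage interface; a real app would wrap its secure storage."""

    def write(self, token: str) -> None: ...

    def read(self) -> Optional[str]: ...


def store_and_verify(store: TokenStore, new_token: str, max_attempts: int = 3) -> bool:
    """Write the token, read it back, and retry until both copies match.

    `new_token` is the in-memory copy received from the refresh call.
    Returns False if the token could not be verified after max_attempts.
    """
    for _ in range(max_attempts):
        try:
            store.write(new_token)
            if store.read() == new_token:
                return True
        except OSError:
            # Treat an I/O error as a failed attempt and try again.
            pass
    return False
```

If the function returns false, the app still holds the valid token in memory, so it can report the storage failure or ask the user to log in again on the next launch.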

For case #3, there is not much that can be done as of today. However, this should be rare.