For several hours now, whenever we attempt to log in or sign up to our application in our tenant ‘wyz-example@us’, authentication has been failing.
We looked in our tenant logs and saw numerous instances of the message: "rejecting request of a tenant under quarantine"
Why is our tenant being ‘quarantined’ and what can we do to fix it?
Depending on the exact nature of the problem, logins and/or signups to one or more applications will fail.
If you look in the tenant logs, you will find instances of the error description “rejecting request of a tenant under quarantine”.
For a more detailed explanation: Webtask containers are used to run your tenant’s extensibility code, such as:
- Custom database scripts
- Some Auth0 extensions
When a tenant experiences a short but steep traffic spike (e.g. induced by buggy customer code or the unavailability of an endpoint), the assigned “containers” within webtask can suddenly become overwhelmed and return errors. These are the errors that lead to a 429 quarantine error.
Webtask is designed to handle periodic spikes. However, it does not give an unbounded amount of compute to each tenant. So the most likely reasons that you are seeing these errors are:
- The tenant was already at the maximum number of containers issued per tenant
- The traffic spiked too quickly
As a first step, try to track down when this problem first occurred in your tenant environment.
Start by searching the tenant logs, using this string:
description:"rejecting request of a tenant under quarantine"
This will help you to identify the earliest occurrence of the problem. One possible limitation here is that if you have only a basic paid subscription, you are limited to two days' worth of tenant logs. The tenant logs will also tell you the name of the affected application(s).
Now that you know the name of the application that is causing the problem and the date and time the problem was first observed, think back to any changes that you made at that time. You could revert to the previous version of the application code and see if that solves the problem.
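If you prefer to pull the same information programmatically, the search above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the Management API v2 log search endpoint (`/api/v2/logs` with `q`, `sort` and `per_page` parameters) and assumes each log entry carries `date`, `client_name` and `description` fields. The function names and the tenant domain are hypothetical.

```javascript
// The search string from the step above (note the straight quotes).
const QUARANTINE_QUERY =
  'description:"rejecting request of a tenant under quarantine"';

// Build a log-search URL, oldest entries first.
// Assumption: Management API v2 /api/v2/logs with q, sort and per_page.
function buildLogSearchUrl(tenantDomain, perPage = 100) {
  const params = new URLSearchParams({
    q: QUARANTINE_QUERY,
    sort: 'date:1',
    per_page: String(perPage),
  });
  return `https://${tenantDomain}/api/v2/logs?${params}`;
}

// Summarise log entries: earliest occurrence and affected application names.
// Assumption: entries have ISO 8601 `date`, `client_name` and `description`.
function summariseQuarantineLogs(entries) {
  const hits = entries.filter((e) =>
    (e.description || '').includes('tenant under quarantine'));
  const times = hits.map((e) => Date.parse(e.date)).filter(Number.isFinite);
  const names = hits.map((e) => e.client_name).filter(Boolean);
  return {
    firstSeen: times.length ? new Date(Math.min(...times)).toISOString() : null,
    applications: [...new Set(names)].sort(),
    count: hits.length,
  };
}
```

You would request the URL with a Management API token that is allowed to read logs, then pass the returned array to `summariseQuarantineLogs`. Keep in mind that the result can only go back as far as your log retention allows.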
Another possible explanation relates to Auth0 Actions. To check this, choose Actions from the left-side menu in your dashboard and navigate to the Actions that you have configured. It’s possible that you have deployed a ‘buggy’ Action or updated an existing one, or that an Action calls an API endpoint that does not respond in a timely fashion. The result is that the Action behaves in a ‘greedy’ fashion, consuming an excessive amount of webtask resources.
Do any deployments or changes coincide with the time and date that the problem was first observed in the tenant logs? If yes, try disabling that Action and see if that solves the problem. If it is not clear what is causing the problem, you may need to add ‘console.log’ statements to print the values of key parameters at critical points in your code.
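As an illustration of that kind of instrumentation, here is a minimal sketch of a post-login Action that logs key values and how long each step takes. `timedStep` and `lookupProfile` are hypothetical names introduced for this example; `lookupProfile` stands in for whatever external call your own Action makes. Avoid logging secrets or tokens.

```javascript
// Hypothetical helper: run one step and log its outcome and duration.
async function timedStep(label, fn, log = console.log) {
  const started = Date.now();
  try {
    const result = await fn();
    log(`[debug] ${label} ok after ${Date.now() - started} ms`);
    return result;
  } catch (err) {
    log(`[debug] ${label} failed after ${Date.now() - started} ms: ${err.message}`);
    throw err;
  }
}

// Post-login handler. The third parameter is an assumption made so the
// sketch can run on its own; a real Action receives only (event, api).
async function onExecutePostLogin(event, api, lookupProfile = async () => ({})) {
  console.log(`[debug] start client=${event.client.name} user=${event.user.user_id}`);
  const profile = await timedStep('lookupProfile', () =>
    lookupProfile(event.user.user_id));
  console.log(`[debug] profile keys=${Object.keys(profile).join(',')}`);
}

// In the Actions editor the handler would be exported like this:
// exports.onExecutePostLogin = onExecutePostLogin;
```

A step whose logged duration is consistently high, or that fails repeatedly, is a good candidate for the change that coincided with the first quarantine error.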
- If you have a custom database, some possible triggers might include:
  - You have recently updated one or more scripts and introduced ‘buggy’ code
  - One or more of your scripts needs to access an external API but that service is currently unreachable
- If the error can be traced to an extension, make sure that you are running the latest version. Try uninstalling and re-installing the extension.
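For the unreachable-API case in a custom database script, one defensive pattern is to fail fast instead of waiting on the external service. The following is only a sketch under stated assumptions: `withTimeout` and `verifyCredentials` are hypothetical, and the extra `verifyCredentials` and `timeoutMs` parameters exist so the example can run on its own (a real Login script takes `email`, `password` and `callback`).

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
// Note: this stops waiting, it does not cancel the underlying request.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Login script shape: login(email, password, callback).
// `verifyCredentials` is a stand-in for your external API call and must
// return a promise that resolves to the user profile.
function login(email, password, callback, verifyCredentials, timeoutMs = 2000) {
  withTimeout(verifyCredentials(email, password), timeoutMs, 'credential check')
    .then(
      (profile) => callback(null, profile),
      (err) => {
        // Log the cause so it can be found later in the script logs.
        console.log(`login script error: ${err.message}`);
        callback(new Error('Upstream user store unavailable'));
      }
    );
}
```

The two-argument form of `then` is used so that `callback` is invoked exactly once, either with the profile or with an error.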
A final possibility might be that this is a genuine service outage, in which case it is likely that multiple customers would be affected.
If you have worked through the steps above but have failed to identify the cause of the problem, please create a support request via the Support Center or contact us here in the forum through a private message to our @support group.