Intermittent Management API Timeouts for Apps Hosted in AWS

Problem statement

Random timeouts occur when calling the Auth0 Management API from an application or API hosted in AWS. The typical manifestation is:
“My Java/Python/Golang application is receiving intermittent connection-reset errors while talking to the Auth0 Management API”

Example errors:

HTTPSConnectionPool(host='domain.auth0.com', port=443): Read timed out. (read timeout=5.0)
I/O error on POST request for "https://domain.auth0.com/api/v2/endpoint": domain.auth0.com:443 failed to respond; nested exception is org.apache.http.NoHttpResponseException: domain.auth0.com:443 failed to respond
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='domain.auth0.com', port=443): Read timed out. (read timeout=5.0)
read tcp 10.3.44.61:53066->104.16.82.103:443: read: connection reset by peer

Cause

The root cause has been identified as unexpected timeout behavior from the AWS NAT gateway. Specifically, the NAT gateway times out any idle connection after 350 seconds; if a client later attempts to reuse that connection, the NAT gateway responds with a TCP RST, which confuses the client application.

The AWS documentation explicitly states:

“When a connection times out, a NAT gateway returns an RST packet to any resources behind the NAT gateway that attempt to continue the connection (it does not send a FIN packet).”

Before the migration to the new network edge, the prior architecture's AWS NLB implemented a 350-second timeout, but with friendlier behavior that most applications could handle gracefully.

The new network edge implements a similar timeout, but at 400 seconds. Because its timeout is longer, the edge does not issue any keep-alive traffic to preserve the connection until after the NAT gateway's timeout has already expired. Thus, if the client application has not sent any traffic or keep-alive segments to an Auth0 endpoint within the 350-second window, the NAT gateway silently severs the connection, producing the timeout errors shown above.
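
To illustrate the failure mode, the following sketch (Python with the requests library, matching the errors above) holds a pooled connection idle past the 350-second window and then reuses it. The domain and endpoint are placeholders; run from behind an affected NAT gateway, the second call surfaces the RST as a read timeout or connection-reset error:

    import time
    import requests

    # Minimal reproduction sketch (assumptions: the client sits behind an AWS
    # NAT gateway, and domain.auth0.com stands in for a real tenant domain).
    session = requests.Session()  # the session pools and reuses TCP connections

    # First request opens a TCP connection. Without a token it returns 401,
    # which is fine here: only the underlying socket matters.
    session.get("https://domain.auth0.com/api/v2/users")

    # Stay idle past the NAT gateway's 350-second timeout.
    time.sleep(360)

    # The session reuses the now-severed pooled connection; the NAT gateway
    # answers with a TCP RST, surfacing as a read timeout or connection reset.
    session.get("https://domain.auth0.com/api/v2/users", timeout=5)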

Solution

There are two possible solutions to prevent these timeout errors:

  • At the application level, shorten the TCP keepalive timeout to less than 350 seconds via socket options (see the Python sketch after this list).
  • At the OS level, shorten the TCP keepalive timeout to less than 350 seconds. On typical Linux hosts, including EC2 instances, this can be done by setting net.ipv4.tcp_keepalive_time to a value below 350 seconds in the host's sysctl configuration (see the example below).
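
A minimal sketch of the application-level option, assuming the Python requests/urllib3 stack seen in the errors above; the KeepAliveAdapter class name and the specific probe values are illustrative, and TCP_KEEPIDLE is Linux-specific:

    import socket

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.connection import HTTPConnection

    # Enable TCP keepalive and start probing after 120 seconds of idle time,
    # well under the NAT gateway's 350-second limit. The values are examples;
    # anything below 350 seconds works.
    KEEPALIVE_OPTIONS = HTTPConnection.default_socket_options + [
        (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
        (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 120),  # idle time before first probe
        (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30),  # interval between probes
        (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 4),     # failed probes before giving up
    ]

    class KeepAliveAdapter(HTTPAdapter):
        """Transport adapter that applies the keepalive socket options."""
        def init_poolmanager(self, *args, **kwargs):
            kwargs["socket_options"] = KEEPALIVE_OPTIONS
            super().init_poolmanager(*args, **kwargs)

    session = requests.Session()
    session.mount("https://", KeepAliveAdapter())
    # session.get("https://domain.auth0.com/api/v2/users") now reuses
    # connections that are kept alive across long idle periods.

Mounting the adapter on the session applies the socket options to every pooled connection, so keepalive probes begin well inside the NAT gateway's 350-second window.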
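
For the OS-level option, an illustrative sysctl configuration for a Linux host (the file path and values are examples):

    # /etc/sysctl.d/99-tcp-keepalive.conf
    # Send the first keepalive probe after 120 seconds of idle time,
    # then probe every 30 seconds, giving up after 4 failed probes.
    net.ipv4.tcp_keepalive_time = 120
    net.ipv4.tcp_keepalive_intvl = 30
    net.ipv4.tcp_keepalive_probes = 4

Apply the settings with sysctl --system. Note that these kernel settings only affect sockets that have SO_KEEPALIVE enabled, so the application or its HTTP library must still opt in to TCP keepalive.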