Recent migration from virtual machine to lambdas has broken auth in production

Hi all,

I’m not entirely sure where the problem may lie, so I’m dumping everything I can think of here.

Context

I am using Auth0's Express middleware (express-openid-connect 2.16.0) to authenticate users accessing my Node-based web app.

This is my configuration:

auth({
    session: {
      store: new RedisStore({ client: redisClient }),
    },
    // I manually specify which endpoints need auth when defining them
    authRequired: false,
    issuerBaseURL: process.env.ISSUER_BASE_URL,
    baseURL: process.env.BASE_URL,
    clientID: process.env.CLIENT_ID,
    clientSecret: process.env.CLIENT_SECRET,
    secret: process.env.SESSION_SECRET,
    idpLogout: true,
    errorOnRequiredAuth: true,
    routes: {
      postLogoutRedirect: '/logged-out',
    },
    authorizationParams: {
      response_type: 'code',
      audience: process.env.EXTERNAL_API_AUDIENCE,
      scope: process.env.EXTERNAL_API_SCOPE,
    },
})

It’s been working great for a long time.

What changed

Recently I decided to move this web app from a virtual machine (AWS EC2) to AWS Lambda. This seemed like a trivial change using the serverless-http library.

The problem

Locally this works fine. I’ve been using Serverless Framework to simulate lambdas on my machine, and the auth flow works correctly:

  1. I hit /login
  2. The middleware redirects to my issuer URL
  3. Auth0 redirects back to /callback
  4. The middleware saves the session and redirects back to /

In production, I also get through the auth flow all the way to step 4, and I even see a Success Login entry in the Auth0 logs. However, I am not authenticated, even though a session containing an access token is persisted in Redis.

Observations

  • Locally, the /callback endpoint sets an appSession cookie when it gets a code back. This doesn't seem to happen in production. Instead, a different cookie, auth_verification, is present on the final requests.
  • Although I don't set attemptSilentLogin to true, in production I can see the skipSilentLogin cookie being set, which suggests silent login is being attempted. Locally, I have to log in again every time I log out, but in production I don't. This suggests my config is not being respected in production, and yet response_type is respected.
  • Locally, the stored sessions have only two keys: id_token and state. In production there are more: id_token, scope, access_token, refresh_token, expires_at and token_type.
  • When I removed response_type so that the default (id_token) is used, this again works fine locally, but in production I get a 400 (Bad Request) on my /callback route. I can see an id_token and state in the request, indicating the call from my tenant is correct. However, I get the following error:
BadRequestError: checks.state argument is missing
    at ResponseContext.callback (/var/task/node_modules/express-openid-connect/lib/context.js:354:15)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

Thoughts

  • My first guess is that, despite specifying Redis for session storage, the requests still rely on state kept in memory in some way (as suggested by the checks.state argument is missing error above). In production, each request may hit an entirely different process. However, this doesn't explain why I see the information persisted to my Redis store.
  • My second guess is that I have misconfigured my production tenant in some way, which would be surprising since it has worked well until now.
  • My last guess is that /callback fails to parse the request because of how my Lambda is set up in production (as suggested by the 400 when using id_token as the response_type).

Question

Does anyone have any advice on how I can troubleshoot this issue, or any ideas on what may be going wrong?

Based on the exception from /callback, it seems your app/middleware may not be receiving the state. Is there a way to track parameters through each layer of your cloud infrastructure? Is there an API Gateway or a load balancer in front of it?

Hi @phi1ipp. I have an ALB in front of the Lambda. I'm not entirely sure how to track the request data during the auth flow, since the middleware doesn't expose it for inspection. However, I just deployed an unauthenticated test route, and it received both a POST body and query parameters, which suggests the middleware should be seeing any data passed from the tenant.
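One way to see what actually reaches /callback, without relying on the auth middleware, is a small logging middleware mounted before auth(). The sketch below is hypothetical: the name logAuthCookies and the list of watched cookie names are assumptions based on the cookies mentioned in this thread. It only parses the raw Cookie request header and reports which of those cookies arrived.

```javascript
// Hypothetical debugging middleware: logs which auth-related cookies
// arrive on each request. Cookie names are taken from this thread.
const WATCHED_COOKIES = ["appSession", "auth_verification", "skipSilentLogin"];

// Parse a raw "Cookie" request header into a { name: value } object.
function parseCookieHeader(header) {
  const cookies = {};
  if (!header) return cookies;
  for (const part of header.split(";")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue; // skip fragments without a name=value pair
    const name = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (name) cookies[name] = value;
  }
  return cookies;
}

// Returns an Express-style (req, res, next) middleware.
function logAuthCookies(log = console.log) {
  return (req, res, next) => {
    const cookies = parseCookieHeader(req.headers.cookie);
    const present = WATCHED_COOKIES.filter((name) => name in cookies);
    log(`${req.method} ${req.url} auth cookies: ${present.join(", ") || "(none)"}`);
    next();
  };
}
```

Registered with app.use(logAuthCookies()) ahead of app.use(auth({...})), the CloudWatch logs should show whether auth_verification comes back on /callback and whether appSession appears on the request to / afterwards.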

I quickly checked the library's source and saw that it uses a WeakMap to store configuration values and other objects. So you may be right that Lambda is the reason this is happening to you. After the middleware sends an unauthenticated user to Auth0, it has to store the state somewhere so it can validate the state Auth0 sends back to /callback, and usually that is persisted in a session.
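The state round-trip described above does not need per-process memory if the value travels in a cookie; the auth_verification cookie seen in this thread appears to play that role. The sketch below is a simplified illustration, not express-openid-connect's actual code: the function names beginLogin and checkState are hypothetical and the cookie attributes are illustrative only. It shows why a lost cookie leaves nothing to compare against on /callback.

```javascript
// Simplified illustration (NOT the library's implementation) of carrying
// the OAuth "state" value in a cookie between /login and /callback.
const crypto = require("node:crypto");

// /login: generate a random state and the Set-Cookie value that carries it.
// Cookie attributes here are illustrative, not the library's defaults.
function beginLogin() {
  const state = crypto.randomBytes(16).toString("hex");
  const payload = encodeURIComponent(JSON.stringify({ state }));
  const setCookie = `auth_verification=${payload}; Path=/; HttpOnly; Secure`;
  return { state, setCookie };
}

// /callback: compare the state returned by the IdP with the one from the cookie.
function checkState(cookieValue, returnedState) {
  if (cookieValue === undefined) {
    // The cookie never made it back, so there is no expected state.
    // This is roughly the situation behind "checks.state argument is missing".
    throw new Error("checks.state argument is missing");
  }
  const expected = Buffer.from(JSON.parse(decodeURIComponent(cookieValue)).state);
  const actual = Buffer.from(String(returnedState));
  // timingSafeEqual requires equal lengths, so check that first.
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}
```

Because the expected state lives in the browser rather than in a particular process, this pattern survives requests landing on different Lambda instances; it only breaks if a cookie is dropped somewhere between the app and the browser.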

I’m not by any means the final authority here. Let’s see if some Auth0 folks come to answer your question.

After a weekend of debugging that I'll never get back, I figured out what it was.

It wasn't an Auth0 setup problem. It turns out the middleware library I'm using sets multiple cookies (multiple Set-Cookie headers) in a single response when handling /callback, which AWS doesn't handle well out of the box. I had to make some infrastructure tweaks, but now the auth flow is working as expected.
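To see why several cookies in one response are a problem: Node keeps repeated Set-Cookie headers as an array, while a Lambda integration without multi-value headers exchanges a map of header name to a single string. The sketch below assumes the adapter keeps only the last value; what real adapters and integrations actually drop may differ. The cookie names come from this thread, but their values, attributes and order are placeholders, not what the library necessarily emits.

```javascript
// Collapse Node-style headers (arrays allowed) into one string per name.
// ASSUMPTION: last value wins; real adapters may keep a different one.
function toSingleValueHeaders(nodeHeaders) {
  const out = {};
  for (const [name, value] of Object.entries(nodeHeaders)) {
    out[name] = Array.isArray(value) ? String(value[value.length - 1]) : String(value);
  }
  return out;
}

// Placeholder response headers with two cookies in a single response.
const callbackHeaders = {
  location: "/",
  "set-cookie": [
    "appSession=<session-id>; Path=/; HttpOnly; Secure",
    "auth_verification=; Path=/; Expires=Thu, 01 Jan 1970 00:00:00 GMT",
  ],
};

const collapsed = toSingleValueHeaders(callbackHeaders);
console.log(collapsed["set-cookie"]);
```

Only one of the two cookies reaches the browser. Under the last-wins assumption above, the appSession cookie is the one lost, so the redirect to / arrives without a session cookie even though the session itself was written to Redis.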

Thanks a lot for sharing it with the rest of the community, and glad you figured it out!

@rb03 I'm also facing this exact same issue. You mentioned that you made some infrastructure tweaks to resolve it. Could you share what they were? We're exploring several solutions for this issue as well.

Hi @atshubh. Unfortunately this depends on what you have in front of your Lambdas. The tweaks required would be different for API Gateway versus an ALB, for example.

Whatever you are using, I would look into how to enable multi-value headers.
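For an ALB, multi-value headers are a target group setting (to my knowledge the attribute is lambda.multi_value_headers.enabled). When it is on, the event carries multiValueHeaders and the function is expected to answer with multiValueHeaders too, so each Set-Cookie value survives. For API Gateway, REST API proxy responses also accept multiValueHeaders, and HTTP APIs using payload format 2.0 take cookies in a separate cookies array. The hand-rolled sketch below is not serverless-http's implementation. It only shows the two ALB response shapes, picking one based on whether the incoming event has multiValueHeaders; the function name toAlbResponse is hypothetical.

```javascript
// Build an ALB -> Lambda response in either single- or multi-value form.
// Illustration only; an adapter such as serverless-http normally does this.
const { STATUS_CODES } = require("node:http");

function toAlbResponse(event, statusCode, nodeHeaders, body = "") {
  const base = {
    statusCode,
    statusDescription: `${statusCode} ${STATUS_CODES[statusCode] || ""}`.trim(),
    isBase64Encoded: false,
    body,
  };
  if (event && event.multiValueHeaders) {
    // Multi-value mode: every header becomes an array of strings,
    // so several Set-Cookie values can be returned together.
    const multiValueHeaders = {};
    for (const [name, value] of Object.entries(nodeHeaders)) {
      multiValueHeaders[name] = (Array.isArray(value) ? value : [value]).map(String);
    }
    return { ...base, multiValueHeaders };
  }
  // Single-value mode: one string per header name.
  // ASSUMPTION: keep the last value; which one is dropped depends on the adapter.
  const headers = {};
  for (const [name, value] of Object.entries(nodeHeaders)) {
    headers[name] = String(Array.isArray(value) ? value[value.length - 1] : value);
  }
  return { ...base, headers };
}
```

With the setting enabled, a 302 from /callback carrying both appSession and auth_verification cookies keeps both entries in multiValueHeaders["set-cookie"]; with it disabled, only one string fits in headers["set-cookie"].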