I learned a valuable lesson last week.
First, some context. One of DrupalEasy's clients is a large international industry association that is currently running a large Drupal 9 site with about a dozen custom modules and well over 150 contributed modules. We were brought onto the project a couple of years ago to do some custom module development, but until recently weren't all that involved in the overall site maintenance and development.
As part of a recent task we've been working on, I stumbled on an issue that was occurring only on my local environment, and not in a way that was super-consistent (or so it seemed).
The issue manifested itself only on some publicly-facing pages (never on an admin UI page) as a 502 Bad Gateway error. Never as a PHP error, and never displaying any additional information to help me debug.
Initial debugging effort
Another developer and I were both using the same version of DDEV on our local environments, with similar points of the code base checked out (and similar databases). But I was the only one seeing the 502 Bad Gateway issue.
I first spent more hours than I care to admit trying to chase down the cause using Drupal-y debugging techniques, including a few hours in Xdebug. One potential clue that looked promising was that if I changed the front-end theme to Olivero, the issue went away. This led me down a rabbit-hole of thinking it had to do with a theme preprocess function or an issue with one of the blocks configured to appear in the main theme. Neither of these theories panned out after several hours of effort.
*Give yourself a gold medal if you've already figured this one out at this point. 🥇
Change of focus
I then turned my attention toward trying to figure out why I was having the issue but my co-developer on the project wasn't, despite the fact that we appeared to have identical setups. We compared DDEV config settings, Git check-out points, and databases. Toward the end of one of our Zoom debugging sessions, we realized that he wasn't using the default settings.local.php file while I was - finally, another clue! We didn't have time to investigate this difference immediately, but it did turn out to be the clue that I needed to solve the riddle.
Based on a tip/reminder from someone else I had mentioned the issue to, I decided to check the DDEV logs via ddev logs, where I saw more than a few instances of this:
upstream sent too big header while reading response header from upstream
That was all I needed to see.
*Give yourself a silver medal if you've figured it out now. 🥈
The cause
As part of our Professional Module Development workshop, we spend a couple of weeks talking about Drupal caching. One of the things we cover is the ability to output Drupal cacheability metadata to the response header of each page. This is useful to see all the cache tags and contexts that are used on any given page. This debugging data is output to the response header of the page when the following service parameter is set:
http.response.debug_cacheability_headers: true
*Give yourself a bronze medal if you've figured it out now. 🥉
The solution
If the page is complex enough, then it is possible for the number of cache tags and contexts to be so large that it maxes out the allowed size of a response header. Could a large site with more than 150 enabled contributed modules fall into this category? Absolutely.
Still, why was my local site hitting this issue and my co-developer's site not? Because I always use a settings.local.php for all of my local sites. It is part of my standard workflow. Since the default Drupal core-provided example.settings.local.php file includes the development.services.yml file, and the http.response.debug_cacheability_headers parameter is set to true by default in development.services.yml, the mystery was solved.
A quick test (flipping it to false on my local) confirmed it.
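For reference, here is a minimal sketch of that flip, assuming the stock setup where settings.local.php loads sites/development.services.yml. The file location is an assumption based on Drupal core's defaults and may differ in your project; any other parameters in that file are left out here.

```yaml
# sites/development.services.yml (assumed location; only the relevant parameter is shown)
parameters:
  # Stop Drupal from adding cache tag/context debug headers to every response.
  http.response.debug_cacheability_headers: false
```

Service parameters are compiled into the container, so the change should only take effect after a cache rebuild (for example, ddev drush cr).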
Lessons learned
- When a 502 Bad Gateway happens, check the web server logs before the Drupal/PHP logs.
- When only one developer can produce an error, focus on local development environment configuration before Drupal debugging.
The pixel art image used in this blog post was generated by the DALL-E project of OpenAI.
Comments
Thank you very much for sharing this information! I had the same problem after updating to D10.1 and this fixed it. It's not even my largest Drupal site on D10. Strange… Maybe a bad module?
Wow. Great article, and I think this is the first time in my 20+ years of software development that I found an article that solved my issue on the first try in no time flat. Thanks for saving me a ton of hours and frustration!
Thank you. I am running Valet with a Docker mysql and was getting a 502 on certain page loads of my Drupal instance. I had given up trying to troubleshoot the issue until I read this article.
Great stuff.