Preventing Drupal from Handling 404s for Performance

The .htaccess file included with Drupal tells Apache to send all 404 requests to Drupal to handle. While this is great in some cases, the performance degradation can have a huge impact on a site that has millions of users.

When Drupal processes a 404, it has to run a full bootstrap: Apache hands the request to PHP, PHP loads the Drupal files, connects to the database, and runs some queries. This is quite expensive when Apache could simply say “Page not found” without incurring any of that overhead.

Now you might say your site doesn’t have any broken URLs because you haven’t changed any. Well, that’s great, but as your site grows, it is going to become a target for spammers and hackers. They are going to start requesting all sorts of files to see if they can find an exploit. Instead of bootstrapping Drupal each time to tell them that a DLL file doesn’t exist, it would be much better if Apache could just say that, saving resources for your real users.

So, what can you do? How can you stop Drupal from handling 404s but not break modules like imagecache?

Imagecache is one of the few modules that relies on Drupal’s 404 handling. It is a very smart module that automatically resizes images. Instead of resizing every single image as it is uploaded, it only resizes an image when it is requested, which is great. So if we’re going to tell Drupal not to handle 404s, we need to be careful not to break this highly useful module.

To see this in action, visit the ParentsClick Network and test out some 404s. You’ll notice that 404s for files and Drupal paths show the same page. The following is the procedure we used to prevent Drupal from handling 404s.

A note: this functionality should really be in core. The .htaccess code used below comes from this patch, slightly modified to prevent Drupal from handling 404s completely. The code below is tested and working on Drupal 5.

Step 1 – Update your .htaccess file


  - ErrorDocument 404 /index.php
  + # path to your 404 file (Apache comments must sit on their own line)
  + ErrorDocument 404 /sites/all/themes/foundation/404.php

  + # skipping /files/ makes it work with imagecache
  +  RewriteCond %{REQUEST_FILENAME} !-f
  +  RewriteCond %{REQUEST_URI} !^/files/
  +  RewriteCond %{REQUEST_URI} \.(png|gif|jpe?g|s?html?|css|js|cgi|ico|swf|flv|dll)$
  +  RewriteCond %{REQUEST_URI} !^404.%1$
  +  RewriteRule ^(.*)$ 404.%1 [L]

     RewriteCond %{REQUEST_FILENAME} !-f
     RewriteCond %{REQUEST_FILENAME} !-d
     RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

This stops Apache from passing 404s to Drupal (by removing the /index.php line) and tells it to serve a specific file instead when it encounters a 404, such as a missing image or CSS file.

Step 2 – Tell Drupal to stop on 404s too


In your template.php, inside _phptemplate_variables(), add this code:

  // show custom 404 page
  $headers = drupal_get_headers();
  if (strpos($headers, 'HTTP/1.1 404') !== FALSE) {
    // make sure this path = ErrorDocument in .htaccess above
    include_once './sites/all/themes/foundation/404.php';
    exit();
  }

This tells Drupal to serve this 404 page if it can’t find the path. The benefit is that your designers can work on a single file that handles 404s for both Apache and Drupal. It also stops Drupal from fully executing. In Drupal 6 this could happen much earlier using the preprocess templating functions.
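For Drupal 6, where themes use preprocess functions in place of _phptemplate_variables(), the same check might look roughly like the sketch below. It is untested and rests on assumptions: the theme is named foundation (taken from the path above), and foundation_headers_have_404() is a hypothetical helper, not part of Drupal.

```php
<?php
/**
 * Returns TRUE if the header string contains a 404 status line.
 *
 * Hypothetical helper for this sketch; it is not a Drupal function.
 */
function foundation_headers_have_404($headers) {
  return strpos($headers, 'HTTP/1.1 404') !== FALSE;
}

/**
 * Page preprocess function for an assumed theme named "foundation".
 */
function foundation_preprocess_page(&$vars) {
  // drupal_get_headers() returns the headers Drupal has set so far,
  // joined into one string.
  if (foundation_headers_have_404(drupal_get_headers())) {
    // make sure this path = ErrorDocument in .htaccess above
    include_once './sites/all/themes/foundation/404.php';
    exit();
  }
}
```

Keeping the string check in its own function makes it easy to try outside Drupal before wiring it into the theme.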

Step 3 – Create a 404 file


Create a 404.php file (or 404.html, or whatever you want) and place it wherever you want. Make sure to update both the ErrorDocument line in the .htaccess and the Drupal template code to point to this file.
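As a starting point, a bare-bones 404.php could look something like this. It is only a sketch: the markup is a placeholder for your own design, and foundation_404_markup() is a hypothetical helper name, not something Drupal or Apache provides.

```php
<?php
/**
 * Builds the markup for the error page.
 *
 * Hypothetical helper for this sketch; replace the markup with your own.
 */
function foundation_404_markup($requested_path) {
  // Escape the path so a crafted URL cannot inject markup into the page.
  $safe_path = htmlspecialchars($requested_path, ENT_QUOTES, 'UTF-8');
  return '<html><head><title>Page not found</title></head>'
    . '<body><h1>Page not found</h1>'
    . '<p>Nothing was found at <code>' . $safe_path . '</code>.</p>'
    . '</body></html>';
}

// Send a 404 status even if this file is requested directly.
if (!headers_sent()) {
  header('HTTP/1.1 404 Not Found');
}

// REQUEST_URI may be missing (for example on the command line).
$path = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
print foundation_404_markup($path);
```

Because the file does not depend on Drupal, it behaves the same whether Apache serves it as the ErrorDocument or the template code includes it.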

And voilà!

I was just looking for this

I was just looking for this too, thanks for the post and I’ll definitely help out testing the patch on d.o.!

Great idea. I get tired of

Great idea. I get tired of seeing “randomexploited.dll” show up in my watchdog logs.

You could also maintain the content of your 404 page within Drupal, and periodically write the output (via wget/curl) to a 404.html file in your root directory. This way your 404 file will have an up-to-date version of your menu/navigation, and if desired, the content of the page can be managed by someone else.

Ted, good thought. Also,

Ted, good thought. Also, pbull – I think that’s a cool idea.

I’ve filed an issue against customerror.module here regarding that. In my opinion, customerror is the error module of choice.

thanks..

thanks..

That’s really a great idea

That’s really a great idea to improve the performance on bigger Drupal websites. I found your blog searching for a better method to handle 404 errors and I think I will do it this way. So thanks again!

The following is an excerpt

The following is an excerpt from step 2 above:

Step 2 – Tell Drupal to stop on 404s too “This tells Drupal to serve up this 404 page if it can’t find the path. The benefit of this is your designers can work on the same file that handles 404s for both Apache and Drupal. It also stops Drupal from fully executing. In Drupal 6 this could happen much earlier using the preprocess templating functions.”

does anyone know how to do this for Drupal 6?

@itaine, you actually don’t

@itaine, you actually don’t need to do that for Drupal 6. The above trick still works nicely, I’ve used it on a few Drupal 6 sites now.

I have been looking all over

I have been looking all over for this. I will test it out tomorrow!

I just tried the above

I just tried the above modifications to the .htaccess and my site returned a 500 Error.

Is there something different for Drupal 6?

BJR if you’re getting a 500

BJR, if you’re getting a 500 Error, you have a .htaccess typo. To help pinpoint the problem, run this command: apachectl configtest. That will show which line is causing the syntax error. More on that here.

I have copied each line from

I have copied each line from this site. Maybe I am misunderstanding what to add or not add.

I am using Drupal 6.10

I am supposed to add the + lines and remove the – lines, correct?

(without adding the + or – of course)

This is the exact bug I am

This is the exact bug I am trying to fix; it arrived after moving from Joomla. Drupal is not handling the 404 for clean node URLs properly.

http://drupal.org/node/432384

Be sure you properly set

Be sure you properly set admin/settings/error-reporting and also have your 404 handler in .htaccess pointing to index.php as well.

Yeah I checked all that. I

Yeah I checked all that. I tried another .htaccess patch also and it returned the same error.

I am very frustrated with all this.

I noticed, however, that your site does the same thing.

All my old indexed URLs that contain index.php in them are returning 200 for the home page of my site.

I tested a fake article URL for your site with index.php in the URL and it did the same thing, returned me to the homepage instead of page not found.

Ya Drupal has some strange

Ya, Drupal has some strange 404 handling with the menu structure. One way around this is to write a quick module, use hook_menu(), look at $_GET['q'], and then call drupal_not_found(). I’ve used that on a few sites.

Hey thanks for your help and

Hey thanks for your help and advice. I believe this is a bit over my head at the moment, with tax season and a world of other things.

Basically I need to redirect any URL with index.php coming in to an error page.

I have an issue posted with Drupal as this can be a major issue with anyone moving from another CMS

You’re right that is a

You’re right, that is a trickier issue to get around. It would require some fancy htaccess or some fancy hook_menu work to check whether the path exists and return a 404 through Drupal.

Hi, Does anyone know how

Hi,

Does anyone know how to make this work on Drupal 6.x (6.13, to be exact)?

I have used this modification on Drupal 5.x flawlessly in the past, but now it doesn’t seem to work on Drupal 6.x.

Thanks for your help

@Quoc: I’m wondering the same

@Quoc: I’m wondering the same thing, and am working to try to find a solution. The issue here is that in Drupal 6, the function _phptemplate_variables has been deprecated. Here’s a read on that: http://drupal.org/node/223430.

I wonder if function _preprocess_page () could be good for this, by injecting some of the same code above (it would also have to be verified to work with Drupal 6, however).

  // show custom 404 page
  $headers = drupal_get_headers();
  if (strpos($headers, 'HTTP/1.1 404') !== FALSE) {
    // make sure this path = ErrorDocument in .htaccess above
    include_once './sites/all/themes/foundation/404.php';
    exit();
  }
