When I go to my website I get "Service Unavailable" (this may happen all the time or only intermittently).
The following discussion explains what causes the Service Unavailable message, and how to troubleshoot:
We implement shared hosting on IIS such that each website on our Windows webserver is assigned its own unique application pool/worker process (actually, these two terms are slightly different, but for the sake of this conversation, we'll say an application pool is a worker process, and I may use them interchangeably here). In general, the webserver kernel receives the request, then hands it off to the worker process for the site.
This worker process is basically a thin process and support infrastructure for your user code to run in. The Service Unavailable message occurs when the webserver kernel accepts the request and turns to hand it to the worker process, but the worker process has crashed or is unresponsive. You get "Service Unavailable".
The reason you can refresh your browser and the page suddenly loads is because the webserver is smart enough to realize that the worker process has crashed, and it spins up a new one... the next request that comes in through the webserver kernel will be handed to that new worker process.
This is how you can have Service Unavailable one minute, and the page loads fine the next. Also note that the webserver includes some built-in "rapid-fire failure" protections... i.e., if your worker process crashes X times in Y minutes, the application pool will not launch any more worker processes... and you'll get "Service Unavailable" until there has been some manual intervention (which resets that rapid-fire counter).
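To make the "X times in Y minutes" idea concrete, here is a minimal sketch of such a counter. It is only an illustration of the concept, not IIS's actual implementation: the class name RapidFailCounter, its methods, and the example values (3 crashes in 300 seconds) are all hypothetical.

```python
from collections import deque


class RapidFailCounter:
    """Toy model of "X crashes in Y minutes" protection (hypothetical, not IIS code)."""

    def __init__(self, max_crashes, window_seconds):
        self.max_crashes = max_crashes        # the "X" in the text
        self.window_seconds = window_seconds  # the "Y", expressed in seconds
        self.crash_times = deque()
        self.pool_stopped = False

    def record_crash(self, now):
        """Record a crash at time `now` (seconds); return True if a new worker may start."""
        if self.pool_stopped:
            return False
        self.crash_times.append(now)
        # Forget crashes that have aged out of the window.
        while self.crash_times and now - self.crash_times[0] > self.window_seconds:
            self.crash_times.popleft()
        if len(self.crash_times) >= self.max_crashes:
            self.pool_stopped = True
        return not self.pool_stopped

    def manual_reset(self):
        """Administrator intervention: clear the counter and allow workers again."""
        self.crash_times.clear()
        self.pool_stopped = False
```

With RapidFailCounter(max_crashes=3, window_seconds=300), the first two crashes still get a replacement worker, the third crash inside the window stops the pool, and only manual_reset() brings it back.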
So the next logical question is: what causes the worker process to crash? Basically, two things:
a) A flaw in the user code which results in an unrecoverable event, causing a "crash" or "access violation" of the worker process. You may not see the crash every time you access the site, because the flaw may surface only under a certain set of conditions, like you chose certain options on the webpage, or you passed certain variables... or even a hacker is passing certain variables which your code is unable to handle.
Consider this scenario, 3 people hit your site, one immediately after the other:
- visitor1 might be hitting the right combination that causes the problem and the worker process to crash... he probably experiences a timeout...
- visitor2 hits the site while that worker process is still crashed, before the webserver spins up a replacement worker process, so he gets a "Service Unavailable"
- visitor3 hits the site after the new worker process is spun up, and loads the page fine.
- visitor2 hits refresh, the page loads fine.
- no further problems until visitor1 comes back and hits the magic button... whatever it is that causes the actual crash.
This should illustrate why the problem can seem "random" or intermittent... i.e., it happens one time and not the next.
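The visitor scenario above can be played out with a tiny toy model. This is a deliberately simplified sketch: the Site class and its request method are hypothetical, and it assumes the replacement worker process is ready by the time the next request arrives.

```python
class Site:
    """Toy model: one worker process behind the webserver kernel (hypothetical)."""

    def __init__(self):
        self.worker = "running"

    def request(self, triggers_bug=False):
        """Return what a visitor would see for a single request."""
        if self.worker == "crashed":
            # The kernel finds no healthy worker: this visitor gets the error,
            # and the webserver spins up a replacement for the next request.
            self.worker = "running"
            return "Service Unavailable"
        if triggers_bug:
            # The flawed code path takes the worker process down mid-request.
            self.worker = "crashed"
            return "timeout"
        return "page loads"


site = Site()
for visitor, hits_bug in [("visitor1", True), ("visitor2", False), ("visitor3", False)]:
    print(visitor, "->", site.request(triggers_bug=hits_bug))
```

Only visitor1's request touches the flaw, yet it is visitor2 who sees the error, which is why the symptom rarely points straight at the cause.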
b) The other possible cause is that the worker process is terminated by a server administrator. As a worker process runs your code handling visits, etc, it will use memory. The average shared webhosting site's worker process will use about 50MB of memory. If you open a database connection and load a record set, more memory will be used. Once the connection is released and the response sent to the visitor, memory is released.
However, due to code which doesn't properly release memory (called a memory leak) or other considerations, a worker process can begin consuming what we consider an excessive amount of memory... excessive within the context of affordably priced shared hosting services. On a dedicated server, this isn't an issue (unless you run the server memory low enough to cause performance issues), but for shared hosting, this is a concern. Shared hosting assumes resource utilization appropriate for shared hosting levels, and moving above those levels is considered excessive. For 3Essentials, we start to review your memory usage if your worker process exceeds 100MB of memory. For instance, if we find your worker process is using 150MB of RAM for a sustained period of time, we may terminate the worker process. If a user visits at that time, they could get the Service Unavailable message. The webserver will spin you up another worker process, and the site will load normally... if memory utilization on your worker process continues to be excessive, we'll contact you to notify you that your site is using excessive memory, and that you need to reduce memory utilization or move to a dedicated server.
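If you want to watch for this on your own test server, the "sustained period" idea can be sketched as a simple check over periodic memory samples. This is an illustration only, not the tooling we use: the function name sustained_over is hypothetical, and since "sustained" is not given a fixed duration above, min_seconds is an assumption you would choose yourself.

```python
def sustained_over(samples, threshold_mb, min_seconds):
    """Return True if memory stayed above threshold_mb for at least min_seconds.

    samples: list of (timestamp_seconds, memory_mb) pairs, sorted by time.
    """
    run_start = None  # timestamp where the current above-threshold run began
    for ts, mb in samples:
        if mb > threshold_mb:
            if run_start is None:
                run_start = ts
            if ts - run_start >= min_seconds:
                return True
        else:
            # Usage dropped back under the threshold: the run is broken.
            run_start = None
    return False
```

For example, samples taken once a minute could be checked against the 100MB review level or the 150MB figure mentioned above; a brief spike that falls back under the threshold does not count as sustained.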
If we haven't contacted you about excessive memory utilization on your site, then this isn't your issue.
So... that leaves a flaw in the actual website code on your site that is causing your application pool/worker process to crash.
If you're running a 3rd party product (like WordPress, DNN or others), you should review any updates to that product which might address a known issue, whether a bug or an exploit which someone is using to cause the crash on your site. Even if that product or version is considered stable, you might suspect any plug-ins or add-on modules. This is especially true if the problem began at the time you installed a new plug-in or module. Many of these modules are open source and may be written by new developers who may not fully test their code.
If you're running your own code, then it's time to review your own code.
Unfortunately, IIS (or any webserver software for that matter) doesn't make diagnosing a worker process crash easy. It only logs a few generic messages in the event log, and those basically say "the worker process crashed". The only real way to identify the cause of the crash is to install a tool like IISState or DebugDiag, which attaches a debugger to the worker process and performs a stack trace and memory dump at the time of the crash. We don't allow this on our production servers due to its performance impact. You would need to perform this in your own test environment.
Your next question might be "well, I'm not sure how to generate the crash, since I don't know what the cause is". The first step would be to review your website/HTTP logs for the periods of time where you know the crash occurred. These logs capture all visits to your site, and contain a lot of info about what page was accessed, etc. You can find your HTTP logs at (ftproot)/statistics/logs, under the FTP root for your domain. Note that the timestamps in the log are UTC time. Look for log entries around the time you know a crash occurred... you might find a specific POST or page accessed that could lead you to identify what is leading to the crash.
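As a starting point for that review, here is a small sketch that pulls out the log entries near a known crash time. It assumes your logs are in the W3C Extended format that IIS writes by default (a "#Fields:" header line followed by space-separated entries); the function name entries_near and the five-minute window are hypothetical choices, and you should check the header of your own log files before relying on any field name.

```python
from datetime import datetime, timedelta


def entries_near(log_lines, crash_time_utc, window_minutes=5):
    """Yield dicts for log entries within window_minutes of crash_time_utc.

    Assumes W3C Extended format: a "#Fields:" line names the columns, and
    each entry includes "date" and "time" columns in UTC.
    """
    fields = []
    window = timedelta(minutes=window_minutes)
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip blanks, other directives, and entries before any header
        entry = dict(zip(fields, line.split()))
        if "date" not in entry or "time" not in entry:
            continue
        stamp = datetime.strptime(entry["date"] + " " + entry["time"],
                                  "%Y-%m-%d %H:%M:%S")
        if abs(stamp - crash_time_utc) <= window:
            yield entry
```

Remember to convert the time you saw the error into UTC before passing it in, since that is what the log timestamps use. Entries with a POST method or an error status just before the crash are usually the first ones worth a look.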
No luck there? The next step would be to duplicate your site on a development/testing server, as close as you can get to the production environment (hosted on our servers). Then try to simulate the same site visits in an attempt to cause the crash. If you can cause the crash, then you can install the debugger, and then you'll have a crash dump and stack trace to analyze.
There's much temptation to just change something on your site and see if that makes a difference, then change something else... etc... but by far the most effective approach is to identify the cause of the crash via a debug/crash dump. Once you can cause the crash, use the debug tools to capture the debug dumps and stack trace, and then either analyze the stack trace log yourself, pay someone to perform an analysis (for example, Microsoft PSS) or post to a newsgroup like microsoft.public.inetserver.iis to see if anyone will do a free analysis. Only after analysis of the crash can you determine what is truly going wrong.
3Essentials does not provide consulting on code debugging, use of Microsoft debug tools, or crash analysis... but we're glad to provide you this article to help you better understand the nature of the Service Unavailable message!
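One way to simulate those visits is to replay logged requests against your test server. The sketch below is a rough starting point under several assumptions: the log entries have already been parsed into dictionaries keyed by W3C field names ("cs-method", "cs-uri-stem", "cs-uri-query"), the function names replay_urls and replay are hypothetical, and only GET requests are replayed because the HTTP logs do not record POST bodies. Point it only at your own test server, never at production.

```python
import urllib.error
import urllib.request


def replay_urls(entries, test_base_url):
    """Build test-server URLs from logged GET requests (POSTs are skipped)."""
    urls = []
    for entry in entries:
        if entry.get("cs-method") != "GET":
            continue
        url = test_base_url.rstrip("/") + entry["cs-uri-stem"]
        query = entry.get("cs-uri-query", "-")
        if query != "-":  # "-" marks an empty field in W3C logs
            url += "?" + query
        urls.append(url)
    return urls


def replay(urls, timeout=10):
    """Request each URL in order; return (url, status code or error text) pairs."""
    results = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results.append((url, resp.status))
        except urllib.error.HTTPError as exc:
            results.append((url, exc.code))  # e.g. 503 is "Service Unavailable"
        except (urllib.error.URLError, OSError) as exc:
            results.append((url, str(exc)))  # timeouts, refused connections, etc.
    return results
```

A 503 or a timeout in the results, followed by normal responses, is the same pattern described earlier and suggests the request just before it is worth reproducing under the debugger.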