Picture the situation. Mail flow has been working just fine for days, weeks, or months, when suddenly it stops for no apparent reason. Your first instinct would be to log onto the Exchange server itself and start checking the logs, but what if you cannot do that? Maybe it’s a server that is managed by another team. Maybe it is an Exchange server handled by a partner. Whatever the reason, not being able to log onto the Exchange server doesn’t mean there’s nothing you can do.
Exchange returns a number of SMTP response codes that can indicate problems with the system. You may see them in the reply another SMTP server receives when it tries to deliver email over an SMTP connection, or you can glean them yourself by using TELNET to connect to the server and send a message from the command line. One of the most commonly encountered errors that can stop all mail flow is the 4.3.1.
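If you would rather script that TELNET session than type it by hand, the same conversation can be sketched with Python’s standard smtplib module. This is only a minimal sketch: the host name and both addresses in the usage comment are placeholders, and the helper names `probe_smtp` and `first_failure` are made up for this illustration. It walks through the connection banner, EHLO, MAIL FROM, and RCPT TO, recording each reply and stopping at the first failure, so no message is actually sent.

```python
import smtplib


def probe_smtp(host, sender, recipient, port=25, timeout=15):
    """Walk the SMTP envelope commands and return a list of (step, code, text) replies.

    Stops at the first reply of 400 or higher. No DATA command is issued,
    so no message is actually delivered.
    """
    replies = []
    smtp = smtplib.SMTP(timeout=timeout)
    try:
        # connect() returns the banner reply instead of raising on a non-220 code.
        code, text = smtp.connect(host, port)
        replies.append(("connect", code, text.decode(errors="replace")))
        if code != 220:
            return replies
        steps = (
            ("ehlo", smtp.ehlo, ()),
            ("mail", smtp.mail, (sender,)),
            ("rcpt", smtp.rcpt, (recipient,)),
        )
        for step, command, args in steps:
            code, text = command(*args)
            replies.append((step, code, text.decode(errors="replace")))
            if code >= 400:
                break
    finally:
        try:
            smtp.quit()
        except (smtplib.SMTPException, OSError):
            smtp.close()
    return replies


def first_failure(replies):
    """Return the first (step, code, text) reply with a 4xx or 5xx code, or None."""
    return next((reply for reply in replies if reply[1] >= 400), None)


# Example usage (placeholder host and addresses, replace with your own):
# replies = probe_smtp("mail.example.com", "me@example.org", "user@example.com")
# print(first_failure(replies))
```

A reply whose text begins with 4.3.1 at any of these steps is the response code discussed here.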
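A 4.3.1 response code will be accompanied by the text “Insufficient System Resources.” When Exchange is unable to process messages because of resource constraints, it refuses to accept inbound mail and returns this error code to the sending server, where it shows up in the SMTP reply and in any resulting NDR, so that other admins are aware of the issue. The leading 4 marks the failure as temporary, so a sending server will normally queue the message and retry rather than give up immediately.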
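To pick this code out of a log file or a captured session, a small parser helps. The sketch below assumes the usual reply layout of a three-digit SMTP code, an optional enhanced status code such as 4.3.1, and then free text. The function name `parse_reply` is made up for this illustration, and the sample reply lines are illustrative rather than captured from a real server.

```python
import re

# Three-digit reply code, a space or hyphen, an optional enhanced status code, then text.
REPLY_RE = re.compile(
    r"^(?P<basic>[2-5]\d{2})[ -]"
    r"(?:(?P<enhanced>[245]\.\d{1,3}\.\d{1,3})\s+)?"
    r"(?P<text>.*)$"
)

SEVERITY = {
    "2": "success",
    "3": "intermediate",
    "4": "transient",
    "5": "permanent",
}


def parse_reply(line):
    """Split one SMTP reply line into its codes and text; return None if it does not parse."""
    match = REPLY_RE.match(line.strip())
    if match is None:
        return None
    enhanced = match.group("enhanced")
    # Prefer the class digit of the enhanced code, fall back to the basic code.
    class_digit = (enhanced or match.group("basic"))[0]
    return {
        "basic": int(match.group("basic")),
        "enhanced": enhanced,
        "text": match.group("text"),
        "severity": SEVERITY[class_digit],
    }


reply = parse_reply("452 4.3.1 Insufficient system resources")
print(reply["enhanced"], reply["severity"])
```

A "transient" result means the sending side should retry later, which is why mail tends to pile up in queues rather than bounce straight away.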
“System resources” is a broad term that can refer to either disk or memory issues. In the case of a disk issue, the most common cause is that the volume containing the Exchange database is out of space, but it could also be the disk holding the log files. The error might also indicate that the database has reached its configured size limit, but since the default in Exchange 2010 is 1 TB, that’s less likely to be the case. In any of these cases, you (or that server’s admin) need to start looking into disk-based limits.
You might also be running into a memory issue. Exchange might simply be out of memory, or there may be so many handles open that no more can be allocated. In either case, you need to get to the root of the memory issue. Rebooting the server or restarting the Information Store may get mail moving again, but only for a short time. Identifying what is opening so many handles, or increasing RAM to meet the demands of the user base, are both appropriate actions to take once you have identified the actual problem. If you cannot log onto that server, you won’t get much further with this.
Of course, the likelihood of you having to troubleshoot an issue without any access to the servers is pretty slim, but your first indication of a problem may very well be that 4.3.1. Knowing what it means gives you a jump start on fixing the issue.