Well, this was a first for us, and it happened on one of our own sites. We thought it would be useful to post the solution so that any of you with a similar problem can solve it as quickly as possible.
Google Webmaster Tools sent us a message saying that Google could not crawl thecornwallseoco.co.uk. According to Google, we had to review the following:
If the site error rate is 100%:
Using a web browser, attempt to access http://www.thecornwallseoco.co.uk/robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot.
If your robots.txt is a static page, verify that your web service has proper permissions to access the file.
If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure.
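As a rough illustration of the first check in Google's message, the sketch below requests the same URL once with a browser-style user agent and once with Googlebot's user agent string, then compares the responses. The function names and the browser user agent string are our own assumptions for illustration. Bear in mind that a request sent from your own machine only imitates Googlebot's user agent, not its IP addresses, so a block based on IP address will not show up here.

```python
import urllib.error
import urllib.request

# Assumed browser string for illustration; any ordinary browser user agent will do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
# Googlebot's user agent string as it appears in server logs.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_status(url, user_agent, timeout=10):
    """Return the final HTTP status code for url, or None if the connection fails."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # urlopen follows redirects, so this is the status of the final response.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions that carry the status code.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failures, refused connections and timeouts end up here.
        return None


def diagnose(browser_status, googlebot_status):
    """Turn the two status codes into a short, human-readable hint."""
    if browser_status is None and googlebot_status is None:
        return "unreachable for both: likely a connection or hosting issue"
    if browser_status == googlebot_status:
        return "same response for both user agents: no user-agent filtering seen"
    return "responses differ: check firewall and server rules for Googlebot"


def check(url):
    """Fetch url with both user agents and return the hint from diagnose()."""
    return diagnose(fetch_status(url, BROWSER_UA), fetch_status(url, GOOGLEBOT_UA))
```

Calling check("http://www.thecornwallseoco.co.uk/robots.txt") runs both requests and returns one of the three hints. It is a quick first look, not a replacement for the fetch tool inside Google Webmaster Tools, which makes the request from Google's own network.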
We took the following steps to address this:
1) The first check was to make sure robots.txt was formatted correctly, and it was. Most robots.txt files contain only two lines of text, a User-agent line and a Disallow line, unless you want to prevent spiders from accessing certain parts of your site.
2) A number of sites around the net offer their own tools to simulate how Googlebot views your site, for example http://www.evolvedwebsites.com.au/googlebot/. We prefer, however, to use the feature built into Google Webmaster Tools: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=158587. Using this tool we could see that Google was unable to access robots.txt. A useful tip is to then try to fetch other pages of your site, such as the root and the about us page, to see if you have the same problem. If you do, the issue is more than likely not with robots.txt itself.
3) The next stage was to get our hosting company to check the load balancer to make sure that neither the file nor any other part of the site was being blocked. The load balancer log should show an entry along the lines of:
thecornwallseoco.co.uk 22.214.171.124 - - [09/Apr/2013:05:05:44 -0500] "GET /robots.txt HTTP/1.1" 301 743 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Googlebot's request reached the server and received a 301 redirect rather than an error, so it was clear that the file was not being blocked.
4) The next step was for the host to check whether the site was visible around the world using Host-Tracker http://host-tracker.com.
5) Our hosting company then requested the user agent and the origin IP so they could determine whether there was a connection issue. Our host identified the cause, and it was indeed a connection issue.
6) Confirm that all is as it should be by using the Google Webmaster Tools feature mentioned in point 2. You will hopefully get a success on the files you fetch.
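Steps 1 and 3 above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper names are our own, the two-line robots.txt is the common allow-all form (not necessarily the exact file on our site), and the log layout is inferred from the single entry shown in step 3, written with plain quotes and hyphens as it would appear in the raw log.

```python
import re
import urllib.robotparser

# Assumed allow-all robots.txt: every crawler, nothing disallowed.
ALLOW_ALL = "User-agent: *\nDisallow:"


def allows_googlebot(robots_text, path="/"):
    """Step 1: check whether the given robots.txt rules let Googlebot fetch path."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch("Googlebot", path)


# Layout inferred from the one log entry in step 3:
# host, client IP, two unused fields, timestamp, request, status, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_line(line):
    """Step 3: split one log entry into named fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry


def googlebot_blocked(entry):
    """True when a Googlebot request received a 4xx or 5xx response."""
    return "Googlebot" in entry["user_agent"] and entry["status"] >= 400


# The entry from step 3, with plain quotes and hyphens.
SAMPLE = (
    'thecornwallseoco.co.uk 22.214.171.124 - - [09/Apr/2013:05:05:44 -0500] '
    '"GET /robots.txt HTTP/1.1" 301 743 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
```

Running parse_line(SAMPLE) gives a status of 301 for /robots.txt, and googlebot_blocked() on that entry is False, which matches what our host saw. If your own log uses a different layout, the pattern will need adjusting before the results mean anything.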
Hope this helps any of you who have a similar problem.