
Badly Wired

Alan's technical notebook - WordPress and other stuff


Google not able to fetch robots.txt

27th January 2015 by Alan


Recently I was getting a message from Webmaster Tools (WMT): "Googlebot can’t access your site!"

Over the last 24 hours, Googlebot encountered 87 errors while attempting to access your robots.txt. To ensure that we didn’t crawl any pages listed in that file, we postponed our crawl. Your site’s overall robots.txt error rate is 64.4%


Recommended action

  • If the site error rate is 100%:
    • Using a web browser, attempt to access http://xxxxx.com/robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot.
    • If your robots.txt is a static page, verify that your web service has proper permissions to access the file.
    • If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure.
  • If the site error rate is less than 100%:
    • Using Webmaster Tools, find a day with a high error rate and examine the logs for your web server for that day. Look for errors accessing robots.txt in the logs for that day and fix the causes of those errors.
    • The most likely explanation is that your site is overloaded. Contact your hosting provider and discuss reconfiguring your web server or adding more resources to your website.
    • If your site redirects to another hostname, another possible explanation is that a URL on your site is redirecting to a hostname whose serving of its robots.txt file is exhibiting one or more of these issues.

After you’ve fixed the problem, use Fetch as Google to fetch http://xxxxx.com/robots.txt to verify that Googlebot can properly access your site.
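The advice in the message about examining the web server logs for a given day can be roughed out in a few lines of Python. This is only a sketch: it assumes the server writes the usual Apache common/combined log format, and the sample lines (IP addresses, sizes, timestamps) are made up for illustration, not taken from my real logs. The function name robots_status_by_day is my own.

```python
import re
from collections import Counter

# Assumes the Apache common/combined log format; adjust if your LogFormat differs.
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]+\] '
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def robots_status_by_day(lines):
    """Count requests for /robots.txt, keyed by (day, HTTP status)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        # Ignore any query string when comparing the path.
        if m and m.group("path").split("?")[0] == "/robots.txt":
            counts[(m.group("day"), int(m.group("status")))] += 1
    return counts

# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '66.249.66.1 - - [26/Jan/2015:10:01:02 +0000] "GET /robots.txt HTTP/1.1" 301 245 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [26/Jan/2015:10:05:09 +0000] "GET /robots.txt HTTP/1.1" 301 245 "-" "Googlebot/2.1"',
    '203.0.113.7 - - [26/Jan/2015:11:00:00 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '66.249.66.1 - - [27/Jan/2015:09:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 67 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    for (day, status), n in sorted(robots_status_by_day(SAMPLE).items()):
        print(day, status, n)
```

To run it against a real log, pass an open file object instead of SAMPLE. Anything other than a 200 against /robots.txt on a bad day is worth a closer look.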

 

But when I manually looked at my robots.txt in my browser, all looked fine!

When I fetched it in WMT, however, it came back as inaccessible. This bamboozled me for a little while. I did some searching and found that others had had similar issues, but no one seemed to have got to the root of the problem.

So I started to look at my .htaccess file and recalled that I was redirecting my whole site to HTTPS (see this post), and thought perhaps that was the issue.
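A browser quietly follows redirects, which is why robots.txt can look fine there while something else is going on underneath. One way to see the raw response is to make a single request and not follow any redirect. The Python below is a minimal sketch of that idea using only the standard library; the function name fetch_without_redirects is my own, and sending a Googlebot-style User-Agent only imitates the header, not a request from Google's network.

```python
import http.client
from urllib.parse import urlsplit

# Googlebot-style User-Agent header; the request still comes from your own machine.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch_without_redirects(url, user_agent=GOOGLEBOT_UA):
    """Make one GET request and return (status, Location header or None).

    http.client never follows redirects, so a 301/302 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=10)
    try:
        conn.request("GET", parts.path or "/", headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling fetch_without_redirects("http://xxxxx.com/robots.txt") with your own hostname in place of the placeholder would return something like a 301 plus an https:// Location if a site-wide redirect is in force, or a 200 with no Location if robots.txt is served directly.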

The next step was to exclude robots.txt from the HTTPS redirect. When I tested again in WMT, all was well and Googlebot fetched the file fine. Problem solved. All I had to do was tweak my .htaccess file as follows:

RewriteEngine On

# only apply the redirect when the request did not arrive over HTTPS
RewriteCond %{HTTPS} off

# exclude robots.txt from the redirect
RewriteCond %{REQUEST_FILENAME} !robots\.txt

# force https with a permanent (301) redirect
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
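To make the logic of the two RewriteCond lines explicit: consecutive conditions are ANDed by default, so the redirect fires only when the request is not HTTPS and the filename does not match robots\.txt. The Python below is just a model of that decision, not how Apache evaluates it internally; the function name and the example paths are hypothetical.

```python
import re

def needs_https_redirect(https_on, request_filename):
    """Model of the rule above: redirect only if BOTH conditions hold."""
    not_https = not https_on                                          # RewriteCond %{HTTPS} off
    not_robots = re.search(r"robots\.txt", request_filename) is None  # RewriteCond ... !robots\.txt
    return not_https and not_robots

if __name__ == "__main__":
    # Hypothetical filesystem paths, for illustration only.
    print(needs_https_redirect(False, "/var/www/html/index.php"))   # plain HTTP page
    print(needs_https_redirect(False, "/var/www/html/robots.txt"))  # plain HTTP robots.txt
    print(needs_https_redirect(True, "/var/www/html/index.php"))    # already HTTPS
```

One thing the model makes visible is that the pattern is unanchored, so any filename containing robots.txt is also excluded from the redirect. If that matters, an anchored test against %{REQUEST_URI} would be tighter, though I have not tested that variant here.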

 

But then I realised that perhaps this was not required.

What is really required is simply to register the https:// version of the site (https://mysite.com) in WMT, not the http:// version!

Thoughts?

 


About Alan

I'm Alan from Fullworks Digital Ltd, where I develop WordPress plugins.

My day job consists of developing new code and solutions, along with supporting my WordPress plugin users.

I started as a professional programmer in 1979 and have been involved in the IT of business technology in virtually every area that exists.

Badlywired.com is my technical notebook, my aide memoire of the many interesting facts that I come across and 'how to' recipes of things I do infrequently. As I spend a lot of time gathering parts of solutions from the internet and assembling them into my own solutions, and also just learning how to do things, this blog is one way of giving something back to the online community that has helped me extensively.


Copyright © 2021 · Badly Wired