In my previous post, I noted how the robots.txt file on my Pattern by Etsy website pattern.usefulcomponents.com had a rule that disallowed Google from crawling *all* the listings.  I don't create or control this file; Pattern/Etsy do.  I suggested that this was bad for business and inexplicable.
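For context, the offending rule would look something like this (the path here is my paraphrase of what the file does, not a verbatim copy of Pattern's robots.txt):

    User-agent: *
    Disallow: /listing/

A Disallow rule like that tells every crawler, Googlebot included, not to fetch any URL under that path.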

So, after a bit of research, I found this:

----------------------------

Question:

"I have a client that has a robots.txt file that is blocking an entire subdomain, entirely by accident. Their original solution, not realizing the robots.txt error, was to submit an xml sitemap to get their pages indexed.

I did not think this tactic would work, as the robots.txt would take precedent over the xmls sitemap. But it worked... I have no explanation as to how or why.

Does anyone have an answer to this? or any experience with a website that has had a clear Disallow: / for months , that somehow has pages in the index?"

Answer:

"The robots file will avoid google to show further information on the disallowed pages but it doesn't prevent indexation.

They're still indexed (that's why you're seeing them) but with no meta desc nor text taken from the page because google wasn't allowed to retrieve more information.

If you want them to start showing info, you'll jsut need to remove that rule from the robots.txt and soon you'll start seeing those pages information showing, but if you want them out of the index you can use GWT to remove them from the index after you've included in each page the noindex meta tag which is the only command which will prevent indexation."

------------------------------------
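For reference, the noindex meta tag the answer mentions is a single line in a page's <head>:

    <meta name="robots" content="noindex">

Note the catch, though: Google has to be allowed to crawl a page before it can see that tag, so the robots.txt block has to come off first.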

So, all that carefully crafted text, meta tags on images, etc. in my Pattern listings will be expressly ignored by Google, because Etsy/Pattern has deliberately sabotaged the entire thing in the robots.txt file.
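If you want to check this for yourself, Python's standard-library robotparser reads a site's robots.txt and tells you whether a given crawler may fetch a given URL. A minimal sketch (the listing path below is a hypothetical example, not one of my actual listings):

    import urllib.robotparser

    # Fetch and parse the live robots.txt for the Pattern site
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://pattern.usefulcomponents.com/robots.txt")
    rp.read()

    # Ask whether Googlebot may crawl a listing page
    # (this listing path is made up for illustration)
    url = "https://pattern.usefulcomponents.com/listing/12345/example-item"
    print(rp.can_fetch("Googlebot", url))  # False if listings are disallowed

If that prints False, Google is barred from reading the page body, and everything in it goes unseen.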

Well done, Etsy. Another winner!