i am displeased
Remember when I was singing the praises of Google Sitemaps, only to quickly reconsider? Well, I'm moving from "reconsidering" to "being kind of pissed off".
For those who don't know, the idea behind the sitemap is to give Google a specially formatted file that says "here's where my content is, here's when it was updated, and here's how important each piece of it is relative to the rest". It's supposed to make the Googlebot that crawls your site work more efficiently, and give you better results. Personally, I'm sick of having old-style URLs (e.g. 001234.php) showing up for our site.
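For the curious, here's a rough sketch of what goes into such a file. It's a minimal Python example, not the tool that actually generated my sitemap. It assumes the sitemaps.org 0.9 namespace (Google's own schema URL may differ), and the `build_sitemap` helper, the last-modified date and the priority value are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; adjust if your tool expects another.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod, priority) tuples.

    lastmod is a 'YYYY-MM-DD' string; priority is a float from 0.0 to 1.0.
    build_sitemap is a hypothetical helper, not part of any Google tool.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc                     # where the content is
        ET.SubElement(url, "lastmod").text = lastmod             # when it was updated
        ET.SubElement(url, "priority").text = f"{priority:.1f}"  # relative importance
    return ET.tostring(urlset, encoding="unicode")


# Illustrative entry: the date and priority here are placeholders, not real data.
example_xml = build_sitemap([
    ("http://www.zunta.org/blog/archives/2005/08/30/sshirking_work_1/index.php",
     "2005-08-30", 0.8),
])
print(example_xml)
```

Each `<url>` entry carries the three things described above: the location, the last-modified date, and a priority relative to the rest of the site.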
But so far the sitemap hasn't managed to do anything except banish every included URL from Google's systems entirely. Which is pretty much exactly the opposite of what it's supposed to do. I posted the following message to the Sitemaps Google Group; I'll let you know if I hear anything back.
I hope someone can help me figure out what's going on. Last week I submitted a sitemap for my blog (http://www.zunta.org/sitemap.xml). Everything seems to be working properly according to my Google Sitemaps account dashboard. However, since submitting the sitemap every page that is in it has been excluded from the index, including many that I know used to have relatively good pageranks. I know that there have been some recent hiccups with the site: operator, but this applies to other queries as well. I wrote an SSH tutorial with the word "sshirking" in its title a while ago that got a number of links and attained a high pagerank for the unusual word "sshirking". The proper permalinked URLs used to be among the top hits; now they can't be found anywhere in the index (as proven by entering the full url as a query, e.g. http://www.zunta.org/blog/archives/2005/08/30/sshirking_work_1/index.php).
What's more, the old versions of these pages -- from before I changed permalink naming styles -- are still in the index. http://www.zunta.org/blog/archives/004498.php was the original URL of the above link (it now redirects to the proper URL). Only this second, less descriptive URL (which is NOT in the sitemap) is still in Google's index. It's only the files included in the sitemap that have been dropped from the index.
I tried deleting and resubmitting the map, and have patiently waited since May 18 for a new crawl to include the results. Nothing so far. Can anyone tell me what's going on? Right now it seems that having a sitemap achieves nothing other than nuking your results from the index entirely.

Comments
It kind of grosses me out to think of some bot crawling all over your site while I'm reading it. I mean, I know it's really just ones and zeroes and hard drives whirring away off in Silicon Valley somewhere. But the mental image I get is closer to this guy lurking at every turn.
Thanks for that.