9/24/2005

Google Sitemaps Explained: How to Get Your Site Indexed with Google Sitemaps

Three Ways to Index Your Site with Google Sitemaps (Difficult, Hard, and Easy)

Google has recently implemented a program where any webmaster can create a Sitemap of their Site and submit it for indexing by Google. It is a quick and easy way for you to keep your site constantly indexed and updated in Google.

The program is appropriately called Google Sitemaps.

To make the best use of Sitemaps, you need an XML file on your site that tells Google about your pages and any updates or changes to them. XML (Extensible Markup Language) is everywhere these days. You have probably seen the orange XML logo on many web sites, and it's often associated with blogging because blogs use XML/RSS feeds to syndicate their content.

Today RSS is known mostly as 'Really Simple Syndication', but the acronym originally stood for 'Rich Site Summary'. XML is simply a markup language, like HTML, and here it is used to syndicate your content to all interested parties.

And the interested party in this case is Google. By creating Sitemaps, Google is really asking webmasters to take charge of the indexing and updating of their sites. Basically, doing the Googlebot's job!

This is a 'Good' thing! With the number of new web sites growing rapidly, indexing all this material will become a challenge, even with the resources of Google. With Sitemaps, webmasters can take charge and make sure their site is crawled and indexed.

Please note, indexing your site with Sitemaps won't improve your rankings in Google. You will still be competing with the other sites in Google for top positions. But with Sitemaps you can make sure all your pages are crawled and indexed quickly by Google.

There are some other big advantages of using Google's Sitemaps - mainly you have control over a few key variables, attributes or tags. To explain this as simply as possible, your XML powered sitemap file will have this simple code for each page of your site:


<url>
  <loc>http://www.yoursite.com/</loc>
  <priority>1.0</priority>
  <lastmod>2005-07-03T16:18:09+00:00</lastmod>
  <changefreq>daily</changefreq>
</url>


Along with 'urlset' tags at the beginning and end of your code, and an XML version indication - that's basically your XML file! File size will depend on the number of webpages you have.
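As a rough illustration of how such a file fits together, here is a minimal Python sketch that wraps one or more page entries in 'urlset' tags and adds the XML version line. The function name build_sitemap and the page dictionaries are made up for this example, and the namespace value is an assumption based on the 0.84 Sitemaps schema, so check it against Google's protocol documentation before using it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the Google Sitemaps 0.84 schema; verify before use.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(pages):
    """Return the sitemap XML (as a string) for a list of page dicts.

    Each dict needs a "loc" key; "priority", "lastmod" and "changefreq"
    are optional and are written only when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in ("priority", "lastmod", "changefreq"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML version indication by hand.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    {
        "loc": "http://www.yoursite.com/",
        "priority": "1.0",
        "lastmod": "2005-07-03T16:18:09+00:00",
        "changefreq": "daily",
    },
])
```

Writing sitemap_xml to a file named sitemap.xml would give you something you can upload to your site.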

Taking a closer look at this XML file:

location - http://www.yoursite.com/ - the URL of your webpage

priority - you set the priority you want Google to place on that page within your site. You can prioritize your pages: 0.0 is the lowest, 1.0 is the highest, and 0.5 is in the middle. This is only relative to the other pages on your site; it will not affect your rankings against other sites. Why is this important? You have certain pages on your site that are more important than others (home page, high profit page, opt-in page, etc.). By placing high priority on these pages, you tell Google which of your pages you consider most important.

last modified - when you last modified that page. This timestamp allows crawlers to avoid recrawling pages that haven't changed.

change frequency - you can tell Google how often you change that particular page: never, weekly, daily, hourly, and so on. If you frequently update your page, this could be extremely important.
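If you edit these values by hand, a small check can catch typos before you upload the file. The sketch below is only an illustration: the function name check_entry is invented for this example, and the list of change frequency values is the set defined in the Sitemap protocol as I understand it (always, hourly, daily, weekly, monthly, yearly, never).

```python
# Change frequency values accepted by the Sitemap protocol (assumed list).
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_entry(priority, changefreq):
    """Return a list of problems found in one page's priority and changefreq."""
    problems = []
    # Priority must fall between 0.0 (lowest) and 1.0 (highest).
    if not 0.0 <= float(priority) <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    if changefreq not in VALID_CHANGEFREQ:
        problems.append(f"unknown changefreq: {changefreq!r}")
    return problems
```

An empty list means both values look fine, so check_entry("1.0", "daily") reports no problems.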

Why do I need an XML Generator?

In order for this XML sitemap file on your site to be constantly updated, you need a Generator that will spider your site, list all the URLs, and automatically feed them to Google, thus constantly updating your site in Google's massive index or database. Keep in mind, Google also gives you the option of submitting a simple text file with all your URLs.

There is already a flood of these generators popping up, each offering a different way of generating your XML powered sitemap file. More are probably appearing as you read this. But let's look at three ways to generate your XML file.

Difficult - Google's Python Generator

That's a relative term; if you know your server like the back of your hand and installing scripts doesn't scare the bejesus out of you, you're probably smiling at the word "difficult." Google supplies a link to a generator (Google XML Generator) which you can download and set up on your server. It will cough up your sitemap XML file and automatically feed it to Google.

In order for this Generator to work, Python version 2.2 must be installed on your web server - many servers don't have this. If you know what you're doing, this will probably be a good choice.

You don't need a Google account to use Sitemaps, but it's encouraged because you can track your sitemap's progress and view diagnostic information. If you already have a Google Account for another service (Gmail, Google Alerts, etc.), just use that one to sign in and follow the directions from there.

To submit your Sitemap using an HTTP request, issue your request to the following URL:

www.google.com/webmasters/sitemaps/ping?sitemap=sitemap_url
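Because the sitemap address travels inside another URL, it needs to be URL-encoded first. Here is a minimal Python sketch of building that request URL. The helper name build_ping_url is made up for this example, and the http:// scheme in front of the address is an assumption, since the address above is given without one.

```python
from urllib.parse import quote

# Ping address from the article; the "http://" scheme is assumed.
PING_ENDPOINT = "http://www.google.com/webmasters/sitemaps/ping"


def build_ping_url(sitemap_url):
    """Return the ping URL with the sitemap address URL-encoded."""
    # safe="" makes quote() encode ":" and "/" as well.
    return PING_ENDPOINT + "?sitemap=" + quote(sitemap_url, safe="")


ping_url = build_ping_url("http://www.yoursite.com/sitemap.xml")
```

This sketch only builds the URL. To actually send the request you could pass ping_url to urllib.request.urlopen, or simply paste it into your browser.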

Hard - A PHP Code Generator

This is a PHP Generator that you can place on your server. This generator will spider your site and produce an XML sitemap file. Download phpSitemapNG and upload it to your server. Run the generator to get your XML sitemap file and send it to Google.

Again, this is only hard to do if you don't know your way around PHP files or scripts.

Easy - Free Online Generator

These Generators are popping up everywhere, and Google keeps a list of these 'third party suppliers' of generators on their site. Find them at: Google's List of Third Party Generators

One of the easiest to use is www.xm-sitemaps.com. You can index up to 500 pages with this online Generator very quickly: it will go into your site, spider it, and list all your pages in the sitemap XML file Google needs to index your site. You can download this file, Compressed or Non-Compressed, and make minor changes such as setting the priority, changing the frequency, etc.

Then upload this file as sitemap.xml to the root directory of your server, i.e. where you have your homepage. Then notify Google Sitemaps of your XML file and you're in business.

Of course, there is one drawback: if you constantly add pages to your site, you will also need to add these pages to your XML sitemap file. This won't be much of a problem unless you're adding pages to your site daily. In that case you will need something like the PHP or Python generator to do all this for you automatically.

Google is still the major search engine on the web so getting your pages indexed and updated quickly is the major reason to use Google Sitemaps. If you want your site to remain competitive it's probably the wisest route to take.


About The Author


To learn more about the different services and programs offered by Google,

click here: Google Adsense & Google Adwords.

Copyright (c) 2005 Titus Hoskins of bizwaremagic.com. This article may be freely distributed if this resource box stays attached.

Google Sitemaps and You


By Trevor Bauknight


A while ago, we looked ( http://www.cafeid.com/art-rss.shtml ) at the recent news that Microsoft had decided to embrace RSS in a big way in its upcoming releases of Internet Explorer and Windows "Longhorn" and determined that this was a Good Thing. This time, we're taking a look at implementing Google Sitemaps, a similar technology developed by Google in order to help you define your site more effectively to the search-engine behemoth.

This is not a ticket to a higher Google ranking (at least not that we know about); but it is a useful tool that lets you apply RSS-like control to your website's interactions with the Googlebot.

RSS (Really Simple Syndication) is the current heavyweight of so-called "disruptive technologies" (loosely defined as those that have the effect, if not developed with the intention, of changing the way we use technology in general) and its use is skyrocketing among content providers looking for a way to get their content in front of more eyes and ears. But RSS originally stood for Rich Site Summary, a standard way of cataloging your site's content for third-party aggregators.

Google Sitemaps have a similar function, in that they are an XML-based way to describe website content in a standard, predictable way; but they differ in that Sitemaps are intended for the Googlebot's eyes only, rather than for any third-party. Think of them as an automated way to make sure Google knows about your site's content (please note, however, that Google does not guarantee inclusion of your content based solely on the presence of a Sitemap file).

This sounds like a very specific undertaking, but the importance of Google to getting your site's content noticed simply cannot be overstated. And with Google's expanding reach into more and more areas of Web content presentation, chances are that the information your Sitemap provides will eventually find some use you haven't yet thought about. That's what disruptive technology is all about, and Google has become one of the more innovative champions of such technological advances.

Where To Start:

The first thing you should do as a website developer is create a Google Account for yourself or your company. This will allow you to do other things besides access the Sitemaps infrastructure; but we'll leave that for another day. Create the account ( https://www.google.com/accounts/NewAccount ) and then proceed to the Sitemaps area at ( https://www.google.com/webmasters/sitemaps/login ).

Once you've logged in, you'll see the sparse Sitemaps interface. Don't be fooled, however, because like the simple interface to its search engine, this one hides quite a bit of information regarding the creation and use of Sitemaps, presenting it in digestible bites as you walk through the process.

There's probably more there than you need to know at this point, provided you don't have a huge site with a need for multiple Sitemaps and so on. But if you do have such a site, the information is there for creating truly complex Sitemaps and Sitemap Indices referencing many Sitemaps and you can familiarize yourself with that as needed. For now, we'll concentrate on what's required to establish a Sitemap for our site at Cafe ID (http://www.cafeid.com).

Like creating RSS feeds, creating a Google Sitemap is as simple as putting together an XML file at the root level of your site that describes the site according to the instructions that Google has laid out. You can use any text editor for this purpose, but some editors do a better job of helping you create properly formatted XML files. We heartily recommend two that cost money, BBEdit on Mac OS X ( http://www.barebones.com ) and Macromedia's Homesite on Windows ( http://www.macromedia.com/software/homesite/ ), but there are excellent free alternatives out there and when it comes to text editors, personal preferences take on an almost religious importance, so we won't proselytize about that here.

The Googlebot recognizes several Sitemap formats, ranging from a simple list of URLs to Sitemaps already created using something called the "Open Archive Initiative protocol for metadata harvesting", a format apparently popular with library collections. The OAI protocol is an advanced XML specification that you don't need to worry about if you don't already understand. An intermediate XML format is what we recommend, over the simple URL list, because of the additional information you can associate with each constituent URL of your site.

If you do want to just get started quickly, simply create a text file that looks like this:

http://www.example.com/catalog?item=1
http://www.example.com/catalog?item=11

making sure that the file lists one URL per line, that no URL contains embedded newline characters, and that the file uses the UTF-8 text encoding (check your text editor settings). Also, your Sitemap may not contain more than 50,000 URLs, and all URLs must be fully formed, since they will be used directly during the Googlebot's crawl.
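If you'd rather not check those rules by eye, a few lines of Python can do it. This is only a sketch: the function name make_text_sitemap is invented for the example, and "fully formed" is interpreted here as "starts with http:// or https:// and names a host", which is my assumption rather than Google's wording.

```python
from urllib.parse import urlsplit

MAX_URLS = 50000  # limit per Sitemap file mentioned above


def make_text_sitemap(urls):
    """Return the UTF-8 bytes of a one-URL-per-line Sitemap file."""
    if len(urls) > MAX_URLS:
        raise ValueError("a Sitemap may not contain more than 50,000 URLs")
    for url in urls:
        # Check the raw string first: a URL must not span several lines.
        if "\n" in url or "\r" in url:
            raise ValueError("embedded newline in URL: %r" % url)
        parts = urlsplit(url)
        # Assumed meaning of "fully formed": a scheme and a host are present.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError("URL is not fully formed: %r" % url)
    return ("\n".join(urls) + "\n").encode("utf-8")


data = make_text_sitemap([
    "http://www.example.com/catalog?item=1",
    "http://www.example.com/catalog?item=11",
])
```

Writing data to a file opened in binary mode keeps the UTF-8 encoding intact regardless of your system's default.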

Getting Fancy:

The more advanced format isn't much more difficult to create and lets you specify additional information about each URL. The protocol is described fully here (https://www.google.com/webmasters/sitemaps/docs/en/protocol.html) and is too detailed to cover in this article. Your finished file will look something like this, except (hopefully) with more URLs specified:

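<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.cafeid.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.cafeid.com/art-over.shtml</loc>
    <changefreq>weekly</changefreq>
  </url>
</urlset>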

Your Sitemap's location dictates what URLs can be included in it. A Sitemap placed at the root level of your site can specify any URLs on that site, while a Sitemap placed at www.yoursite.com/images can not include URLs under www.yoursite.com/banners, for example.
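The location rule is easy to test mechanically. The following Python sketch compares a page URL against the directory that holds the Sitemap; the function name in_scope is made up for this example, and the rule it applies (same scheme, same host, path under the Sitemap's directory) is my reading of the paragraph above.

```python
from urllib.parse import urlsplit


def in_scope(sitemap_url, page_url):
    """Return True if page_url may be listed in the Sitemap at sitemap_url."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Directory containing the Sitemap, with a trailing slash.
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    same_site = (sitemap.scheme, sitemap.netloc) == (page.scheme, page.netloc)
    # Treat a bare host such as http://www.yoursite.com as the path "/".
    return same_site and (page.path or "/").startswith(base_dir)
```

With this, in_scope("http://www.yoursite.com/images/sitemap.xml", "http://www.yoursite.com/banners/top.gif") is False, while the same page checked against a Sitemap at the root of the site is True.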

You can take as much or as little advantage as you like of the additional XML tags available in this format. Each <url> needs to include at least the <loc> specification, but need not include the other three, and all URLs in a Sitemap file must be encapsulated within the <urlset> tag. We recommend using at least the <lastmod> tag and the <changefreq> flag to let the Googlebot know how often it should check your site for updated content. Be sure to change the date, and maybe even the time, specified in the <lastmod> tag any time you actually update your site.

One more caveat is that your URL specifications must be XML-encoded, similarly to the way they're encoded under RSS. What this means is spelled out in detail here (http://www.w3.org/TR/REC-html40/appendix/notes.html), but essentially, what you're doing is converting a URL like http://www.yoursite.com/view?widget=3&count>2 to look like this: http://www.yoursite.com/view?widget=3&amp;count&gt;2 (Note the substitution of the entities &amp;amp; and &amp;gt; for the "&" and ">" symbols.)
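You don't have to do this conversion by hand. As a small illustration, Python's standard library has an escape function that performs exactly this kind of substitution; the variable names here are only for the example.

```python
from xml.sax.saxutils import escape

raw_url = "http://www.yoursite.com/view?widget=3&count>2"

# escape() replaces "&", "<" and ">" with their XML entities.
encoded_url = escape(raw_url)
```

The value of encoded_url is what belongs inside the location tag of your Sitemap file. If you build the file with an XML library rather than by pasting strings together, the library normally does this escaping for you.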

Done. Now What Do I Do With It?:

You're almost home. Upload the Sitemap file you create to your server and then add the URL of the file itself using your Google Sitemaps account. You don't need to use the account, but doing so will allow you to keep track of what you've uploaded.

You're welcome to compress your Sitemap file using gzip, found typically on Mac OS X, Linux and BSD (normal PC zipping won't work, although you can certainly find a third-party gzip program for your Windows box). Click the "Add Your First Sitemap" link on the main Sitemaps page after you've logged into your Google Sitemaps account, and that's all there is to it!
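If you don't have a gzip program handy, Python's standard library can produce the same format on any platform. This is a minimal sketch; the function name gzip_sitemap and the sample Sitemap text are invented for the example.

```python
import gzip


def gzip_sitemap(xml_text):
    """Return the gzip-compressed bytes of a Sitemap given as a string."""
    return gzip.compress(xml_text.encode("utf-8"))


compressed = gzip_sitemap("<urlset></urlset>")
```

Writing compressed to a file opened in binary mode, named for instance sitemap.xml.gz, gives you the file to upload in place of the uncompressed one.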

You can use your Sitemaps account to keep track of and receive diagnostic information about your Sitemap submissions. You don't need to create a Sitemaps account, however, and if you already have a Google account for receiving Alerts, for accessing the Web Developer APIs and so on, your existing account will work as a Sitemaps account automatically.

Google has already played a significant role in shifting the paradigm of discovering the Web from doing so by following links to doing so by searching, and the company shows no signs of slowing down. Subscribing may well be the next paradigm, based on the flexibility of the protocols that put content syndication in the hands of mere mortals, and getting your content cataloged in these formats should be among your first priorities. The web browser and operating system are adjusting quickly to this new paradigm, and you should be too.


