Dumb SEO Questions

(Entry posted by Tony McCreath in the Dumb SEO Questions community on Facebook, Saturday, December 13, 2014.)

Are there any resources that explain the sort of URL normalisation Googlebot does?

Some normalisation is obvious, like ignoring case in domain names, but others are more interesting.

Would Googlebot treat each of the following pairs of URLs as the same address:

/page?
/page

/page#
/page

/page?#
/page

/page?a=1&b=2
/page?b=2&a=1

/page?a=
/page

/category//page
/category/page

/category/../page
/page

/category/./page
/category/page

/page%2dname
/page%2Dname
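For comparison, the generic equivalence rules in RFC 3986 section 6.2.2 (case, percent-encoding and dot-segment normalisation) can be sketched in Python. This is a minimal sketch, not Googlebot's documented behaviour: the host `example.com` is a placeholder, `normalize` is a hypothetical helper, and dropping the fragment and an empty `?` are crawler-style assumptions (RFC 3986 section 6.2.3 says `http://example.com/?` cannot be assumed equivalent to the same URL without the `?`).

```python
import re
from urllib.parse import urlsplit

# RFC 3986 section 2.3: characters that never need percent-encoding.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def normalize_percent_encoding(s: str) -> str:
    """Decode %XX triplets that encode unreserved characters and uppercase
    the hex digits of all other triplets (RFC 3986 sections 6.2.2.1-6.2.2.2)."""
    def fix(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, s)


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments (RFC 3986 section 5.2.4).
    Empty segments, as in "//", are deliberately left alone."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./") or inp == "/.":
            inp = "/" + inp[3:]
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]
            if out:
                out.pop()  # drop the last segment written to the output
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment, with its leading "/", to the output.
            end = inp.find("/", 1)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def normalize(url: str) -> str:
    """Hypothetical comparison key for an absolute http(s) URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()  # simplification: also lowercases any userinfo
    path = remove_dot_segments(normalize_percent_encoding(parts.path)) or "/"
    query = normalize_percent_encoding(parts.query)
    # Assumptions, not RFC requirements: the fragment is always dropped
    # (browsers do not send it to the server) and so is an empty "?".
    return f"{scheme}://{host}{path}" + (f"?{query}" if query else "")
```

Under these rules the `?`, `#`, `?#`, `..`, `.` and `%2d`/`%2D` pairs collapse to the same key, while `?a=`, `//` and the reordered parameters stay distinct, because RFC 3986 says nothing about empty values, empty path segments or parameter order.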

I want to get a good handle on this before I delve into how the GWT parameter tool works, so the main thing I'm interested in is parameter order normalisation.
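Parameter order normalisation is not part of RFC 3986; it relies on the convention that a query string is a list of `name=value` pairs. Below is a hypothetical sketch of how a crawler could build an order-independent key, assuming `&` is the only separator; `sort_query_params` is an illustrative name, and whether Googlebot does anything like this is not established in this thread.

```python
from urllib.parse import urlsplit


def sort_query_params(url: str) -> str:
    """Hypothetical crawler key: reorder "&"-separated query tokens by name.

    Only safe if the site really ignores parameter order.
    """
    parts = urlsplit(url)
    if not parts.query:
        return url
    tokens = parts.query.split("&")
    # list.sort is stable, so repeated names (e.g. ?c=1&c=0) keep their
    # relative order; values stay as raw, still-encoded strings.
    tokens.sort(key=lambda token: token.split("=", 1)[0])
    return parts._replace(query="&".join(tokens)).geturl()


print(sort_query_params("/page?b=2&a=1"))  # /page?a=1&b=2
```

Sorting by name only, with a stable sort, is a deliberate choice: some applications read repeated parameters in order, so `?c=1&c=0` is not rewritten as `?c=0&c=1`.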

This question begins at 00:00:36 into the clip; it can also be watched on YouTube starting at 00:00:36.

YOUR ANSWERS

Selected answers from the Dumb SEO Questions G+ community.

  • Tony McCreath: Are there any resources that explain the sort of URL normalisation Googlebot does?

    Some normalisation is obvious, like ignoring case in domain names, but others are more interesting.

    Would Googlebot treat each of the following pairs of URLs as the same address:

    /page?
    /page

    /page#
    /page

    /page?#
    /page

    /page?a=1&b=2
    /page?b=2&a=1

    /page?a=
    /page

    /category//page
    /category/page

    /category/../page
    /page

    /category/./page
    /category/page

    /page%2dname
    /page%2Dname

    I want to get a good handle on this before I delve into how the GWT parameter tool works, so the main thing I'm interested in is parameter order normalisation.
  • Federico Sasso: Not sure about Google, but for what it's worth, I certainly do all of them with the VSS spider. I don't remember any Google resources; I came to these conclusions mainly after delving into the RFCs. Hope this helps.
  • Tony McCreath: I do some of them with my crawler. I was thinking of making it work as close to Googlebot as possible.

    The parameter tool and other Google features imply that they do manipulate URLs. In fact, the parameter tool may be a clue to the mechanics they use: they are examining individual parameters, so are they determining uniqueness from parameter values rather than from the query string as a whole? And what happens if a website does not play by the rules:

    /page?aaaaaaaaa
    /page?=aaaaaaaaa
    /page?????aaaa????
    /page?======
  • Federico Sasso: Interesting.
    There is no such thing as a real "rule" when it comes to URL parameters: query-string tokens describing name-value pairs are merely a common convention, and those are all valid URLs too (not sure about multiple question marks, actually). Google certainly has to deal with them too.
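On the question of multiple question marks: RFC 3986 section 3.4 defines `query = *( pchar / "/" / "?" )`, so `?` characters after the first one are legal query data, and all four of Tony's odd examples are syntactically valid. The sketch below uses a hypothetical helper, `inspect_query`, to check a query against that grammar and to show how Python's `urllib.parse.parse_qsl`, standing in for a generic name-value parser, reads each one. The URL `/page?aaaaaaaaa=` is an extra example added only for comparison.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# RFC 3986 section 3.4:  query = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
QUERY_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def inspect_query(url: str):
    """Return the raw query, whether it matches the RFC 3986 query grammar,
    and how a name=value parser (parse_qsl) interprets it."""
    query = urlsplit(url).query  # everything after the first "?", before "#"
    is_valid = QUERY_RE.fullmatch(query) is not None
    pairs = parse_qsl(query, keep_blank_values=True)
    return query, is_valid, pairs


# Tony's examples, plus "/page?aaaaaaaaa=" added for comparison.
for url in ["/page?aaaaaaaaa", "/page?=aaaaaaaaa", "/page?????aaaa????",
            "/page?======", "/page?aaaaaaaaa="]:
    print(url, inspect_query(url))
```

With `keep_blank_values=True`, `?aaaaaaaaa` and `?aaaaaaaaa=` both parse to `[('aaaaaaaaa', '')]`, so a crawler that keyed pages on parsed parameters rather than on the raw query string would merge two URLs that a server is free to treat differently.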

View the original question in the Dumb SEO Questions community on G+ (Saturday, December 13, 2014).
