Dumb SEO Questions

(This entry was posted by Saurabh Rawat in the Dumb SEO Questions community on Facebook on Saturday, August 15, 2015.)

Blocking URLs in robots.txt

URL example:

How do I block these types of URLs (% based) in robots.txt?

This question begins at 00:46:01 into the clip. Watch this question on YouTube, commencing at 00:46:01.


Selected answers from the Dumb SEO Questions G+ community.

  • Saurabh Rawat: URL example:

    How do I block these types of URLs (% based) in robots.txt?
  • Federico Sasso: To block URLs having a % symbol in any part of the path:
    Disallow: *%

    Beware of "friendly fire": are you sure you aren't blocking too much?

    Hint: test it yourself using Google Search Console (yes, you can copy and paste your candidate robots.txt code there without having to publish it first).
  • Saurabh Rawat: Disallow: /*%*
    What about this code, is it right?
  • Federico Sasso: While correct, it's redundant: the initial / is already implicitly included by the first *, and the last * is simply ignored.
  • Saurabh Rawat: Thank you so much!
  • Tony "Tiggerito" McCreath: What is your main objective here? Blocking them will not stop them being indexed, or stop people visiting them.
  • Saurabh Rawat: I want to stop a duplicate content issue. Webmaster Tools is showing a duplicate meta tags error in the HTML Improvements section. Most of the duplicate meta tags are being created by the %-based URLs on my website.
  • Tony "Tiggerito" McCreath: Your best move is to stop those URLs from returning a 200 "OK" code. They should return a 404 "Not Found" code, which will get them excluded from the report.

    Or you could have them 301 "Permanent Redirect" to the correct URLs.

    Don't block them in robots as that will just stop bots from finding out that they are invalid URLs.

    Try and find out what is linking to those invalid URLs and see if you can get them fixed.
  • Saurabh Rawat: I can't redirect 50,000 pages... I think robots.txt can stop the duplicate issue? Am I right?
  • Tony "Tiggerito" McCreath: If there is a pattern to the invalid URLs then you may be able to redirect them in one or two rules. If not, you want them to 404.

    robots.txt is not a good solution; it's more a way to hide the problem: that your website returns the same content over many different URL variations.
  • Saurabh Rawat: Thank you so much!
  • Dave Elliott: The other thing you should sort out is your URLs! It looks like your URL has a random line return in it; e.g. your page, instead of being called india-exports-trade-data.aspx, is actually called india-exports-trade-data.aspx<br />.

    This is obviously not needed! What CMS are you using? This is weird behavior. It looks like there is something wrong within your web.config file; have you edited it at all? Do you have a module installed that rewrites your URLs?
  • Saurabh Rawat: Great suggestion, I'll consult with my developer about that. Thank you. Cheers!
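Since the site appears to run on ASP.NET (aspx pages, a web.config), Tony's "one or two rules" suggestion could be done with the IIS URL Rewrite module. A hypothetical sketch, assuming the bad URLs are simply a valid .aspx path with junk appended (the rule name and pattern are illustrative and would need adjusting to the real URLs):

```xml
<!-- Hypothetical fragment for web.config; requires the IIS URL Rewrite module.
     Assumes invalid URLs look like "page.aspx<junk>" and 301-redirects them
     to the clean "page.aspx". -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="StripJunkAfterAspx" stopProcessing="true">
        <match url="^(.+\.aspx).+$" />
        <action type="Redirect" url="{R:1}" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

If no such pattern exists, letting the invalid URLs return 404 is the simpler fix, as noted above.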
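To see why Federico's rule catches the %-based URLs, it helps to know how Google-style wildcard matching works: * matches any run of characters, a trailing $ anchors the end of the path, and a rule otherwise matches any path it covers from the start. A rough Python sketch of that matching logic (google_style_match is a hypothetical helper, not part of any official parser; it only approximates Google's documented behavior):

```python
import re

def google_style_match(pattern: str, path: str) -> bool:
    """Approximate Google-style robots.txt rule matching:
    '*' matches any run of characters, a trailing '$' anchors
    the end of the path, and matching starts at the beginning
    of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

# 'Disallow: *%' matches any path containing a percent sign...
assert google_style_match("*%", "/india-exports-trade-data.aspx%3Cbr%20/%3E")
# ...but leaves clean URLs alone.
assert not google_style_match("*%", "/india-exports-trade-data.aspx")
# 'Disallow: /*%*' behaves identically, which is why the extra parts are redundant.
assert google_style_match("/*%*", "/india-exports-trade-data.aspx%3Cbr%20/%3E")
```

Note that the stdlib urllib.robotparser does not implement this wildcard extension, which is why testing candidate rules in Google Search Console, as suggested above, is the safer check.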

View the original question in the Dumb SEO Questions community on Facebook (Saturday, August 15, 2015).