Dumb SEO Questions

(This entry was posted by Saurabh Rawat in the Dumb SEO Questions community on Facebook on Saturday, August 15, 2015.)

Blocking URLs in robots.txt

URL example:
/india-exports-trade-data.aspx%3Cbr%20/%3E

How do I block these types of URLs (%-based) in robots.txt?

This question begins at 00:46:01 into the clip; you can watch it on YouTube from that timestamp.

YOUR ANSWERS

Selected answers from the Dumb SEO Questions G+ community.

  • Saurabh Rawat: URL example:
    /india-exports-trade-data.aspx%3Cbr%20/%3E

    How do I block these types of URLs (%-based) in robots.txt?
  • Federico Sasso: To block URLs having a % symbol in any part of the path:
    Disallow: *%

    Beware of "friendly fire": are you sure you aren't blocking too much?

    Hint: test it yourself in Google Search Console (yes, you can copy and paste your candidate robots.txt code there without having to publish it first). [A rough matching sketch appears after this thread.]
  • Saurabh Rawat: Disallow: /*%*
    What about this code, is it right?
  • Federico Sasso: While correct, it's redundant: the initial / is already implicitly covered by the first *, and the trailing * is simply ignored.
  • Saurabh Rawat: Thank you so much.
  • Tony "Tiggerito" McCreath: What is your main objective here? Blocking them will not stop them from being indexed, or stop people from visiting them.
  • Saurabh Rawat: I want to stop a duplicate-content issue. Google Webmaster Tools is showing a duplicate meta tags error in the HTML Improvements section. Most of the duplicate meta tags are being created by the %-based URLs on my website.
  • Tony "Tiggerito" McCreath: Your best move is to stop those URLs from returning a 200 "OK" code. They should return a 404 "Not Found" code, which will get them excluded from the report.

    Or you could have them return a 301 "Moved Permanently" redirect to the correct URLs.

    Don't block them in robots.txt, as that will just stop bots from finding out that they are invalid URLs.

    Try to find out what is linking to those invalid URLs and see if you can get the links fixed.
  • Saurabh Rawat: I can't redirect 50,000 pages... I think robots.txt can stop the duplicate issue? Am I right?
  • Tony "Tiggerito" McCreath: If there is a pattern to the invalid URLs, then you may be able to redirect them with one or two commands. [A sketch of this pattern-based approach appears after this thread.] If not, you want them to 404.

    robots.txt is not a good solution. It's more of a way to hide the problem: that your website returns the same content over many different URL variations.
  • Saurabh Rawat: Thank you so much.
  • Dave Elliott: The other thing you should be sorting out is your URLs! It looks like your URL has a random line break in there; e.g. your page, instead of being called india-exports-trade-data.aspx, is actually called india-exports-trade-data.aspx<br />. [A quick decoding check appears after this thread.]

    This is obviously not needed! What CMS are you using? This is weird behavior. It looks like there is something wrong within your web.config file; have you edited it at all? Do you have a module installed that rewrites your URLs?
  • Saurabh Rawat: Great suggestion, I'll consult with my developer about that. Thank you. Cheers!
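
Federico's "test before you publish" hint is worth taking literally. Before relying on Google Search Console, you can get a rough preview of what Disallow: *% would catch with the sketch below. It is an unofficial approximation of Googlebot-style matching, assuming only the two wildcard forms Google documents (* matches any run of characters, a trailing $ anchors the end of the path) and prefix matching for everything else; the blocks helper and the candidates list are illustrative, not from the thread.

    import re

    def blocks(pattern: str, path: str) -> bool:
        """True if a robots.txt Disallow pattern matches a (still percent-encoded) URL path."""
        anchored = pattern.endswith("$")
        if anchored:
            pattern = pattern[:-1]
        # '*' matches any run of characters; everything else is literal.
        regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
        if anchored:
            return re.fullmatch(regex, path) is not None
        # Disallow rules are prefix matches: the pattern only needs to
        # match the start of the path, not the whole thing.
        return re.match(regex, path) is not None

    candidates = [
        "/india-exports-trade-data.aspx%3Cbr%20/%3E",  # junk URL from the question
        "/india-exports-trade-data.aspx",              # clean URL - must stay crawlable
        "/reports/q%26a.aspx",                         # legitimate encoded '&' - friendly fire!
    ]
    for path in candidates:
        print(f"Disallow: *% blocks {path!r}: {blocks('*%', path)}")

Running this shows the rule blocks the junk URL and leaves the clean one crawlable, but it also flags the third URL: any legitimately percent-encoded address would be blocked too, which is exactly the "friendly fire" Federico warns about.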
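
Tony's "301 if there's a pattern, otherwise 404" advice can be expressed in a few lines. Since the site appears to be ASP.NET (web.config comes up in the thread), a real fix would live in an IIS rewrite rule or in the application code; purely as a language-neutral illustration, here is a hypothetical Python sketch. The respond helper and the assumption that every valid page ends in .aspx are illustrative, not from the thread.

    from urllib.parse import unquote

    PAGE_SUFFIX = ".aspx"

    def respond(path: str):
        """Decide how to answer a request: 200 for the real page,
        301 for junk URLs with a recognizable pattern, 404 otherwise."""
        decoded = unquote(path)  # '%3Cbr%20/%3E' -> '<br />'
        if decoded.endswith(PAGE_SUFFIX):
            return 200, path                     # clean URL: serve the page
        cut = decoded.find(PAGE_SUFFIX)
        if cut != -1:
            # Junk like '<br />' trailing a real page name: one rule
            # redirects every affected variant to its clean form.
            return 301, decoded[: cut + len(PAGE_SUFFIX)]
        return 404, None                         # nothing recognizable: let it 404

    print(respond("/india-exports-trade-data.aspx%3Cbr%20/%3E"))  # (301, '/india-exports-trade-data.aspx')
    print(respond("/india-exports-trade-data.aspx"))              # (200, '/india-exports-trade-data.aspx')
    print(respond("/no-such-page"))                               # (404, None)

The point of the sketch is Tony's: because the junk follows one pattern, a single rule handles every affected URL, so "I can't redirect 50,000 pages" isn't actually a blocker.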
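
Dave's diagnosis is easy to verify: the %-escapes in the problem URL are just a percent-encoded HTML line break (%3C is '<', %20 is a space, %3E is '>'). A quick check in Python:

    from urllib.parse import unquote

    # Decode the percent-escapes in the problem URL.
    print(unquote("/india-exports-trade-data.aspx%3Cbr%20/%3E"))
    # -> /india-exports-trade-data.aspx<br />

So, as Dave suspects, a literal <br /> tag is ending up inside the links themselves, which is why the site serves the same content at both the clean and the junk address.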

View the original question in the Dumb SEO Questions community on Facebook (Saturday, August 15, 2015).