web-dev-qa-db-fra.com

Redirect the Google crawler to a different robots.txt file via .htaccess

I have been searching Google for an answer all day and still haven't found one.

I have a virtual subdomain, www.static.example.com, which mirrors www.example.com. This means the subdomain and the main domain share a single document root.

I want to send crawlers to a different robots.txt file, robots_static.txt, whenever they see .static in the URL; in that file I will block indexing with a Disallow rule. I want to do this because I have duplicate content in Google's search results: the subdomain serves exactly the same content as the main domain.

Does anyone know how I can make crawlers see robots_static.txt instead of robots.txt?

Here is what I have come up with so far:

RewriteCond %{HTTP_Host} ^www.static.*$ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
RewriteRule ^robots\.txt /robots_static.txt [NC,L]

But when I check in Webmaster Tools, it still sees robots.txt as my robots file instead of robots_static.txt, so it crawls and indexes everything twice.

What did I do wrong? Thanks.
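One way to check whether the rule fires at all, assuming Apache 2.4 and access to the server or virtual-host configuration (this directive is not accepted in .htaccess), is to raise mod_rewrite's log level temporarily and watch the error log while requesting http://www.static.example.com/robots.txt. On Apache 2.2 the equivalent was RewriteLog together with RewriteLogLevel.

```apache
# Server or <VirtualHost> configuration only (Apache 2.4+), not .htaccess.
# Writes mod_rewrite's rule-by-rule processing to the error log.
# Remove it afterwards: high trace levels slow the server down noticeably.
LogLevel alert rewrite:trace3
```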

EDIT: This is my .htaccess file:

##
# @package      Joomla
# @copyright    Copyright (C) 2005 - 2013 Open Source Matters. All rights reserved.
# @license      GNU General Public License version 2 or later; see LICENSE.txt
##

##
# READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE!
#
# The line just below this section: 'Options +FollowSymLinks' may cause problems
# with some server configurations.  It is required for use of mod_rewrite, but may already
# be set by your server administrator in a way that dissallows changing it in
# your .htaccess file.  If using it causes your server to error out, comment it out (add # to
# beginning of line), reload your site in your browser and test your sef url's.  If they work,
# it has been set by your server administrator and you do not need it set here.
##

## Can be commented out if causes errors, see notes above.
Options +FollowSymLinks

## Mod_rewrite in use.

RewriteEngine On

RewriteEngine On
RewriteCond %{HTTP_Host} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_Host}/$1 [R=301,L]




RewriteCond %{HTTP_Host} ^www.static.*$ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
RewriteRule ^robots\.txt /robots_static.txt [NC,L]


## Begin - Rewrite rules to block out some common exploits.
# If you experience problems on your site block out the operations listed below
# This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to base64_encode data within the URL.
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
# Block out any script that includes a <script> tag in URL.
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL.
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL.
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Return 403 Forbidden header and show the content of the root homepage
RewriteRule .* index.php [F]
#
## End - Rewrite rules to block out some common exploits.

## Begin - Custom redirects
#
# If you need to redirect some pages, or set a canonical non-www to
# www redirect (or vice versa), place that code here. Ensure those
# redirects use the correct RewriteRule syntax and the [R=301,L] flags.
#
## End - Custom redirects

##
# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root).
##

# RewriteBase /

RewriteCond %{THE_REQUEST} ^GET.*index\.php [NC]
RewriteCond %{THE_REQUEST} !/system/.*
RewriteRule (.*?)index\.php/*(.*) /$1$2 [R=301,L]
RewriteCond %{THE_REQUEST} ^GET

## Begin - Joomla! core SEF Section.
#
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
#
# If the requested path and file is not /index.php and the request
# has not already been internally rewritten to the index.php script
RewriteCond %{REQUEST_URI} !^/index\.php
# and the request is for something within the component folder,
# or for the site root, or for an extensionless URL, or the
# requested URL ends with one of the listed extensions
RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
# and the requested path and file doesn't directly match a physical file
RewriteCond %{REQUEST_FILENAME} !-f
# and the requested path and file doesn't directly match a physical folder
RewriteCond %{REQUEST_FILENAME} !-d
# internally rewrite the request to the index.php script
RewriteRule .* index.php [L]
#
## End - Joomla! core SEF Section.

<FilesMatch "\.(ico|pdf|flv|jpg|ttf|jpg|jpeg|png|gif|js|css|swf)$">
Header set Expires "Wed, 15 Apr 2020 20:00:00 GMT"
Header set Cache-Control "public"
</FilesMatch>

<ifModule mod_headers.c>
    Header set Connection keep-alive
</ifModule>

########## Begin - Remove Etags
    #
    FileETag none
    #
    ########## End - Remove Etags
user3474818

Google's crawlers will always request /robots.txt from your subdomain, never /robots_static.txt, which would mean nothing to them.

RewriteCond %{HTTP_Host} ^www\.static\..*$ [NC]
# In .htaccess the leading slash is stripped before matching, so make it optional
RewriteRule ^/?robots\.txt$ /robots_static.txt [L]

When /robots.txt is requested on your www.static host, /robots_static.txt will be served as if it were /robots.txt.
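A sketch of how this rule could sit in the .htaccess from the question, assuming Apache with mod_rewrite enabled and robots_static.txt saved in the shared document root next to robots.txt. It would go right after `RewriteEngine On`, before the Joomla! rules. In a per-directory (.htaccess) context mod_rewrite removes the leading slash before matching the pattern, so `^/?robots\.txt$` matches there as well as in a `<VirtualHost>` block. To keep the mirror out of the crawl, robots_static.txt would contain `User-agent: *` followed by `Disallow: /`.

```apache
RewriteEngine On

# Only on the mirror host (www.static.example.com); [NC] = case-insensitive.
RewriteCond %{HTTP_HOST} ^www\.static\. [NC]
# Internal rewrite: the crawler still requests /robots.txt but receives the
# contents of robots_static.txt. The optional leading slash lets the rule
# work both in .htaccess and in server/virtual-host configuration.
RewriteRule ^/?robots\.txt$ /robots_static.txt [L]
```

Because this is an internal rewrite rather than an [R] redirect, the URL stays /robots.txt, which is the only path crawlers ask for. Opening http://www.static.example.com/robots.txt in a browser should then show the Disallow rules, while http://www.example.com/robots.txt keeps serving the normal file.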

Dave Lozier