Hello,
I would like to detect on my site whether a visitor is a human or a robot.
I wrote a script along these lines:
Code:
$userAgent = getenv('HTTP_USER_AGENT');
if ( !ereg('([bB]ot|[sS]pider|[yY]ahoo|Rambler|Yahoo|AbachoBOT|accoona|ASPSeek|CocoCrawler|
FAST-WebCrawler|Lycos|MSRBOT|Scooter|AltaVista|eStyle|Scrubby|ia_archiver|ai_archiver|WebCrawler|
abacho | Abacho | adibot | Adifco | spiderman | AESOP | albertbot | AlbertBot | fast | All The Web | overture | All The Web | alpavista | Alpavista | mercator | Alta Vista | scooter | Alta Vista | altavista | AltaVista | aranhabot | Amazon.com | amfibi | Amfibi | amibot | Amidalla | anthill | Anthill | antibot | AntiSearch | aquiseeker | Aquiseeker | arks | Arks | ask | Ask | atom | AtomZ | attentio | Attentio | amiga-aweb | AWeb-Amiga | baiduspider | Baidu | msnbot | Bing | bitmagic | BitMagic | biz360 | Biz360 | blekko | Blekko | bnf.fr_bot | BnF | bspider | BSpider | busca | BuscaPique | cassandra | Cassandra | ccbot | CCBot | cityreview | CityReview | clushbot | Clush | comodo | Comodo | lwp-trivial | CPAN | twiceler | Cuill | cydralspider | Cydral | daum | Daum | deepindex | DeepIndex | Dillo | Dillo | pompos | Dir.com | dittospider | Ditto | tagword | DMOZ | dnabot | DNA | domnutch | DomNutch | dotbot | Dot | ecxi | Ecxi | enigmabot | Enigma | speedy | EntireWeb | envolk | Envolk | euripbot | Eurip | arachnoidea | Euroseek | exactseek | Exact Seek | exabot | ExaLead | architextspider | Excite | fastbot | Fast Search | fast-webcrawler | Fast Web Crawler | finebot | FineSearch | freecrawl | FreeCrawl | gaisbot | Gais | geckobot | Gecko | geohasher | GeoHasher | geonabot | Geona | gigabot | GigaBlast | ocelli | GlobalSpec | googlebot | Google | mediapartners | Google AD-Sense | adsbot | Google AD-Words | feedfetcher | Google Feedfetcher | gralon | Graal | heritrix | Heritrix | homerbot | HomerWeb | toutatis | Hoppa | hotbot | HotBot | aitcsrobot | HTML Index Search | almaden | IBM | ichiro | Ichiro | sidewinder | InfoSeek | ultraseek | InfoSeek | inktomi | Inktomi | ia_archiver | Internet Archive | internetseer | InternetSeer | irlbot | IRL-Crawler | isidorus | Isidorus | ixquick | IxQuick | jyxobot | Jyxo | kmccrew | KmcCrew | kumkie | KumKie | larbin | Larbin | legs | Legs | lexibot | LexiBot | spiderguy | Lexis-Nexis | lexxebot | Lexxe | libertyw | LibertyW | 
linkchecker | LinkChecker | grub | LookSmart | mantraagent | LookSmart | martini | LookSmart | lycos | Lycos | t-rex | Lycos | vachercher | Lycos | wobot | Magellan | magpie | Magpie | mj12bot | Majestic-12 | casper | MaMa | megabot | MegaGlobe | mlbot | MetaDataLabs | sandcrawler | Microsoft | msrbot | Microsoft Research | henrilerobotmirago | Mirago | henrythemiragorobot | Mirago | findwhat | Miva | miva | Miva / FindWhat | msnbot-media | MSN Media | dumrobo | Naver | naverbot | Naver | dloader | Naver | netcraft | NetCraft | netvibes | NetVibes | nomad | Nomad | gulliver | NorthernLight | nutch | Nutch | openbot | OpenFind | openwebspider | OpenWeb | orangebot | Orange | panelbot | PanelBot | patwebbot | PatSearch | peerbot | PeerBot | picosearch | Pico Search | psbot | PicSearch | pipeliner | Pipeline Search | powermarks | Powermarks | purebot | PureBot | sapo | Sapo | sbider | SBIder | imspider | ScanSoft | scoutjet | ScoutJet | scrubby | Scrub the Web | scilla.pl | Scylla | fluffy | Search Hippo | searchspider | SearchSpider | seoengbot | SEO Engine | sightquest | SightQuest | simplepie | SimplePie | asterias | SingingFish | sitespider | Site Spider | sledink | Sledink | sleipnir | Sleipnir | slik | Slider | slysearch | SlySearch | snapbot | Snap.com | snoopy | Snoopy | sogou | Sogou Spider | solomono | Solomono | sosospider | Soso | spiderku | SpiderKU | suchtop-bot | Suchtop-Bot | summify | Summify | swoogle | Swooglebot | sygol | Sygol | synobot | Synomia | szukacz | Szukacz | taco | Taco Bell | teoma | Teoma | directhit | Teoma | tide | Tide | tineye | TinEye | titan | Titan | tovero | Tovero | twengabot | Twenga | ubicrawler | Ubi | underscorebot | UnderScore | utse | Utse | verticrawl | VertiCrawl | voila | Voila | echo | Voila | amiga | Voyager | voyager | Voyager | w3mir | W3Mir | appie | Walhello | wasabot | Wasa | archive.org_bot | Web Archive | root | Web Core | sitewinder | Webwasher | wget | WGet | winona | WhatUseek | surveybot | WhoIs | wikia | 
Wikia | wikiwix | Wikiwix | willow | Willow | vagabondo | WiseGuys | wisenut | WiseNut | zyborg | WiseNut | yacy | Yacy | yahooseeker | Yahoo! | slurp | Yahoo! | yandex | Yandex | yaub | Yauba | yellspider | Yell | yeti | Yeti | zao | Zao | zealbot | Zeal | zibber | Zibber | zibie | Zibie | zoomspider | Zoom)', $userAgent))
{
// My instruction
}
However, several Google bots apparently get past my user-agent detection.
Did I forget something?
The IPs concerned are: 66.249.72.50, 66.249.72.115, 66.249.72.12, etc.
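
For comparison, here is a minimal sketch of the same kind of check written with preg_match, since ereg was deprecated in PHP 5.3 and removed in PHP 7.0. Two things differ from the script above: the `i` flag makes the match case-insensitive, and the alternatives carry no padding spaces (in the pattern above, an entry such as ` mediapartners ` only matches when the user agent contains the word with a literal space on each side). The function name `looksLikeBot` and the short token list are assumptions made for illustration, not a complete or authoritative list of crawlers.

```php
<?php
// Hypothetical helper, not part of the original script.
function looksLikeBot(string $userAgent): bool
{
    // Short illustrative token list (an assumption); extend it as needed.
    $tokens = ['bot', 'spider', 'crawl', 'slurp', 'mediapartners', 'feedfetcher', 'ia_archiver'];

    // Escape regex metacharacters in each token; '/' is the pattern delimiter.
    $alternatives = array_map(function ($token) {
        return preg_quote($token, '/');
    }, $tokens);

    // No spaces around the alternatives; the "i" flag ignores case.
    $pattern = '/' . implode('|', $alternatives) . '/i';

    return preg_match($pattern, $userAgent) === 1;
}

// getenv() returns false when the variable is not set, hence the cast.
$userAgent = (string) getenv('HTTP_USER_AGENT');
if (!looksLikeBot($userAgent)) {
    // My instruction
}
```

A user agent is only a string sent by the client, so this check can be bypassed by anything that sends a browser-like value; it identifies crawlers that announce themselves, nothing more.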