#1
Article Discussion: Protect Against Invaders by SPAM-Proofing Your Website
Benjamin Pfeiffer discusses how to SPAM-proof your website. He explains how to use JavaScript and mod_rewrite to stop SPAMbots and Spybots from finding email addresses on your website. He also talks about how to find and set up the .htaccess file and gives examples of robots and how to block them.
Read the full article here: Protect Against Invaders by SPAM-Proofing Your Website
|
#2
This is a very good article for webmasters. Every webmaster should at least be applying the .htaccess method if they have the knowledge to do so correctly.
Here is a list of some more harvester user agents:
CherryPickerSE/1.0
CherryPickerElite/1.0
Crescent Internet ToolPak HTTP OLE Control v.1.0
EmailCollector/1.0
EmailSiphon
EmailWolf 1.00
ExtractorPro
Mozilla/2.0 (compatible; NEWT ActiveX; Win32)
Hexed
|
#3
99% of web users will have JavaScript-enabled browsers. If I could have the other 1% as customers I would be a very, very happy cloth cat.
I do use the JavaScript technique to hide mail addresses on a couple of my sites, but that technique makes the e-mail address inaccessible to a few people, so I don't use it on my business site (which specialises in web accessibility). If only we all used Linux servers and could take advantage of .htaccess. :-)
|
#4
Hey Hexed, good list; these should get people started. You can also disallow the above-mentioned bots in robots.txt as another measure to stop unwanted spiders from grabbing your email. But then again, not all of them obey robots.txt, so mod_rewrite would be the way to go. Vord, you're right, most people have JavaScript-enabled browsers. I felt it fair to mention that there are some, as you mentioned, who surf in text-only form and/or prefer it that way (whether they are blind, can't read, or just plain technophobic).
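As a rough illustration of the mod_rewrite approach mentioned above, here is a minimal sketch that generates .htaccess rules from a list of user-agent prefixes. The helper names (`escapeForRewriteCond`, `buildHtaccessRules`) and the sample list are assumptions made for this example, not something taken from the article; it also assumes the server runs Apache with mod_rewrite enabled.

```javascript
// Hypothetical helper: escape regex metacharacters and spaces so the
// user-agent prefix is matched literally inside a RewriteCond pattern.
function escapeForRewriteCond(prefix) {
  return prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&").replace(/ /g, "\\ ");
}

// Hypothetical helper: build mod_rewrite rules that answer any request whose
// User-Agent header starts with one of the prefixes with 403 Forbidden.
function buildHtaccessRules(prefixes) {
  // With no conditions the RewriteRule would block everyone, so emit nothing.
  if (prefixes.length === 0) return "";
  const lines = ["RewriteEngine On"];
  prefixes.forEach((prefix, index) => {
    const isLast = index === prefixes.length - 1;
    // Every condition except the last one is chained with [OR].
    lines.push(
      "RewriteCond %{HTTP_USER_AGENT} ^" +
        escapeForRewriteCond(prefix) +
        (isLast ? "" : " [OR]")
    );
  });
  // "-" means no substitution; [F] sends 403 Forbidden, [L] stops processing.
  lines.push("RewriteRule .* - [F,L]");
  return lines.join("\n");
}

const rules = buildHtaccessRules(["EmailSiphon", "EmailWolf", "ExtractorPro", "CherryPicker"]);
console.log(rules);
```

The output can be pasted into an .htaccess file. Keep in mind that this only matches on the user-agent string the bot chooses to send.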
|
#5
Good article. I use robots.txt and JavaScript. There is a nifty little tool called hide-my-code. You mark the code, press a button, and the JS is automatically generated. An example of the code for my email address is below...
Code:
<script language="JavaScript" type="text/javascript">
<!--
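// Encoded markup: the first number (9) is the key, and every following
// number is a character code multiplied by sqrt(9) = 3.
var c="9-180-291-96-312-342-303-306-183-102-327-291-315-324-348-333-174-315-330-306-333-192-291-294-291-321-351-345-135-315-330-348-303-342-330-303-348-135-327-291-342-321-303-348-315-330-309-138-300-303-102-186-315-330-306-333-192-291-294-291-321-351-345-135-315-330-348-303-342-330-303-348-135-327-291-342-321-303-348-315-330-309-138-300-303-180-141-291-186";
var ac=c.split("-");
var divisor=Math.sqrt(Number(ac[0]));
var s="";
// Divide each number by the divisor to recover the original characters.
for(var i=1;i<ac.length;++i){s+=String.fromCharCode(Number(ac[i])/divisor);}
// Write the decoded mailto link into the page.
document.write(s);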
//--> </script>
http://www.vollversion.de/download/..._code_1739.html (In German but straightforward).

I also use an extensive robots.txt...

# Despicable and evil robots to keep out
User-agent: grub-client
Disallow: /

User-agent: grub
Disallow: /

User-agent: looksmart
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: WebmasterWorldForumBot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Oracle Ultra Search
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: searchpreview
Disallow: /

All kinds of evil and wicked robots there. Not just email scrapers but web scrapers and all kinds of other dodgy things. I never thought about mod_rewrite at the time, and I think I'll be adding these bots there too. With more than 100 spam mails a day still coming in, I think I need to!
Alan
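For anyone curious how the number string in the script above is put together, here is a minimal sketch of an encoder and matching decoder, assuming the scheme is what the decoding loop implies: the first number is a key, and every later number is a character code multiplied by the square root of that key. The function names and the example address are made up for illustration; this is not the hide-my-code tool's actual source.

```javascript
// Hypothetical re-implementation of the number-string scheme from the post:
// the first number is a key, each later number is charCode * sqrt(key).
function encodeMarkup(markup, key = 9) {
  const factor = Math.sqrt(key);
  if (!Number.isInteger(factor)) {
    throw new Error("key must be a perfect square so the numbers stay whole");
  }
  const numbers = [key];
  for (let i = 0; i < markup.length; i++) {
    numbers.push(markup.charCodeAt(i) * factor);
  }
  return numbers.join("-");
}

// Reverses encodeMarkup using the same arithmetic as the loop in the post.
function decodeMarkup(encoded) {
  const parts = encoded.split("-").map(Number);
  const factor = Math.sqrt(parts[0]);
  return parts
    .slice(1)
    .map((n) => String.fromCharCode(n / factor))
    .join("");
}

// Placeholder address, not a real one.
const link = '<a href="mailto:someone@example.com">someone@example.com</a>';
const encoded = encodeMarkup(link);
console.log(encoded);
console.log(decodeMarkup(encoded) === link);
```

Since the decoder is only a few lines, this is obfuscation rather than real protection: it only helps against harvesters that do not execute JavaScript.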
|
#6
Holy ****.
Thanks for the robots.txt list! This will come in handy when I begin my plan to take over the world. However, some robots will not pay attention to robots.txt, and then you will need to apply the .htaccess method in order to keep them off your server.
Hexed
|
#7
Webby,
you really have a long, long list.
|
#8
Good article... thanks for the info.
I will use it on my websites. Thanks.
|
#9
Actually, many of the listed robots ignore robots.txt. It is better to use .htaccess together with a server-side script that checks how many requests a client has made and bans those that download pages too fast. Using .htaccess alone may not always work, because most grabbers allow the user-agent string to be changed.
BTW, you may want to add the HTTrack website grabber to your robots.txt. At least, I see it quite often in my log files.
Last edited by Luki: October 3rd, 2004 at 02:12 AM.
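The request-counting idea above could look roughly like the following sketch: a sliding-window counter that bans a client once it exceeds a request limit. The class name, the limits, and the in-memory storage are all assumptions for illustration; a real deployment would have to hook this into the server and persist the ban list somewhere.

```javascript
// Hypothetical sliding-window counter: ban a client that makes more than
// `maxRequests` requests within `windowMs` milliseconds.
class RequestRateTracker {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.hits = new Map();   // client IP -> timestamps of recent requests
    this.banned = new Set(); // client IPs that exceeded the limit
  }

  // Records one request; returns true if it is allowed, false if the
  // client is (or has just become) banned.
  record(ip, now = Date.now()) {
    if (this.banned.has(ip)) return false;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(ip) || []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.hits.set(ip, recent);
    if (recent.length > this.maxRequests) {
      this.banned.add(ip);
      this.hits.delete(ip);
      return false;
    }
    return true;
  }
}

// Example: allow at most 3 requests per second from one address.
const tracker = new RequestRateTracker(3, 1000);
[0, 100, 200, 300].forEach((t) => console.log(tracker.record("203.0.113.7", t)));
```

Unlike user-agent matching, this keys on behaviour, so it keeps working when a grabber fakes its user-agent string. The trade-off is that the limit has to be tuned so that fast human visitors and legitimate search engine spiders are not banned.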
|
#10
Thanks Webby - Great list - will be really useful
|
|
#11
Unicode...??
Hi!
I am a newbie to website design and am just learning as I go about designing my website. I had read somewhere that you could hide your email address from being harvested by bots by converting it into Unicode format. Is that all hogwash? It seems there is much more to being invisible to unfriendly bots than that, I guess. I don't know if I can comfortably edit the .htaccess file, but robots.txt may be worth giving a whirl, for whatever it's worth. I presently only have my email address converted into Unicode. Good article.
Regards,
Pickofindia
|
#12
Well, harvesters, like anything else, are getting more sophisticated, so not really: most can now recognize Unicode-encoded addresses. Robots.txt will stop the few that "obey" it, but most ignore it. An .htaccess banned list works for "active email links", but it can't be used on all servers (Windows servers are the biggest problem). The best course (today, anyway) is a JavaScript link that pops open a window, with the active link placed there. Bots cannot parse data in the JavaScript, at least not yet! ;)
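To illustrate why the "Unicode" trick is weak, here is a small sketch, assuming that trick means writing each character as an HTML numeric character reference such as &#64; for @. The function names and the sample address are invented for this example. Encoding takes one line, and so does undoing it, which is all a harvester needs.

```javascript
// Hypothetical helper: write every character as a decimal HTML entity.
function toNumericEntities(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Hypothetical helper: what a harvester could do to reverse the encoding.
// Handles decimal (&#64;) and hexadecimal (&#x40;) references.
function fromNumericEntities(encoded) {
  return encoded.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, code) => {
    const isHex = code[0] === "x" || code[0] === "X";
    return String.fromCodePoint(parseInt(isHex ? code.slice(1) : code, isHex ? 16 : 10));
  });
}

// Placeholder address, not a real one.
const address = "someone@example.com";
const hidden = toNumericEntities(address);
console.log(hidden);
console.log(fromNumericEntities(hidden) === address);
```

A browser renders the encoded form as a normal address, which is why the link stays live, but any bot that decodes entities before searching for addresses sees it just as clearly.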
|
#13
Once again a 5-month-old thread gets bumped up...
Oh well, might as well follow along. This is the best tool I have found to hide your email: it encodes it so no bot can understand it, but it is still a live email link. Check it out here: http://www.wbwip.com/wbw/emailencoder.html
I don't know if this is the Unicode you are talking about, but all I can say is that since I used this to hide my email addresses, my spam has dropped to an absolute zero! I love it!
|
#14