Search Engine Articles
 
  #1  
May 5th, 2004, 10:00 AM
SEO Chat
Article Discussion: Protect Against Invaders by SPAM-Proofing Your Website

Benjamin Pfeiffer discusses how to SPAM-proof your website. He explains how to use Javascript and mod_rewrite to stop SPAMbots and Spybots from finding email addresses on your website. He also talks about how to find and set up the .htaccess file and gives examples of robots and how to block them.


Read the full article here: Protect Against Invaders by SPAM-Proofing Your Website

  #2  
May 5th, 2004, 04:24 PM
hexed

This is a very good article for webmasters. Every webmaster should at least be applying the .htaccess method if they have the knowledge to do so correctly.

Here is a list of some more harvester user agents:


CherryPickerSE/1.0
CherryPickerElite/1.0
Crescent Internet ToolPak HTTP OLE Control v.1.0
EmailCollector/1.0
EmailSiphon
EmailWolf 1.00
ExtractorPro
Mozilla/2.0 (compatible; NEWT ActiveX; Win32)

Hexed
  #3  
May 5th, 2004, 05:33 PM
VORD
99% of web users will have JavaScript-enabled browsers. If I could have the other 1% as customers I would be a very, very happy cloth cat.

I do use the JavaScript technique to hide mail addresses on a couple of my sites, but that technique makes the e-mail address inaccessible to a few people so I don't use the technique on my business site (which specialises in web accessibility).

If only we all used Linux servers and could take advantage of .htaccess.

:-)

  #4  
May 6th, 2004, 02:44 PM
Phoenix
Quote:
CherryPickerSE/1.0
CherryPickerElite/1.0
Crescent Internet ToolPak HTTP OLE Control v.1.0
EmailCollector/1.0
EmailSiphon
EmailWolf 1.00
ExtractorPro
Mozilla/2.0 (compatible; NEWT ActiveX; Win32)


Hey Hexed, good list; these should get people started. You can also disallow the bots mentioned above in robots.txt as another measure to stop unwanted spiders from grabbing your email address. But then again, not all of them obey robots.txt, so mod_rewrite would be the way to go.
Vord, you're right, most people have JavaScript-enabled browsers. I felt it fair to mention that there are some, as you mentioned, who surf in text-only form and/or prefer it that way (whether they are blind, can't read, or are just plain technophobic).

  #5  
May 6th, 2004, 04:22 PM
Webby
Good article. I use robots.txt and JavaScript. There is a nifty little tool called hide-my-code: you mark the code, press a button, and the JS is generated automatically. An example of the code for my email address is below...
Code:
<script language="JavaScript" type="text/javascript">
<!--
var c="9-180-291-96-312-342-303-306-183-102-327-291-315-324-348-333-174-315-330-306-333-192-291-294-291-321-351-345-135-315-330-348-303-342-330-303-348-135-327-291-342-321-303-348-315-330-309-138-300-303-102-186-315-330-306-333-192-291-294-291-321-351-345-135-315-330-348-303-342-330-303-348-135-327-291-342-321-303-348-315-330-309-138-300-303-180-141-291-186";var ac=c.split("-");var s="";for(i=1;i<ac.length;++i){s+=String.fromCharCode(Number(ac[i])/Math.sqrt(Number(ac[0])));}document.write(s);
//--> </script>

http://www.vollversion.de/download/..._code_1739.html (in German, but straightforward).
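
For anyone curious how the generated snippet works: the first number in the string acts as a key, and every number after it is a character code multiplied by the square root of that key (the key here is 9, so each code is tripled). Below is a minimal sketch of an encoder and decoder under that reading. The names `encodeMarkup` and `decodeMarkup` and the example address are made up for illustration and are not part of the hide-my-code tool.

```javascript
// Sketch of the number scheme used in the snippet above.
// encodeMarkup/decodeMarkup are hypothetical names, not the tool's API.

// key should be a perfect square (e.g. 9) so the multiplier is a whole number.
function encodeMarkup(markup, key) {
  const factor = Math.sqrt(key);
  const codes = markup.split("").map((ch) => ch.charCodeAt(0) * factor);
  return [key, ...codes].join("-");
}

// Reverses encodeMarkup: the first field is the key, the rest are scaled codes.
function decodeMarkup(encoded) {
  const parts = encoded.split("-");
  const factor = Math.sqrt(Number(parts[0]));
  return parts
    .slice(1)
    .map((n) => String.fromCharCode(Number(n) / factor))
    .join("");
}

// Example with a placeholder address (an assumption, not a real mailbox).
const link = '<a href="mailto:someone@example.com">someone@example.com</a>';
const encoded = encodeMarkup(link, 9);
console.log(decodeMarkup(encoded) === link);
```

In a page, the decoded string would be written out with `document.write(decodeMarkup(encoded))`, as the generated snippet does. The address is only hidden from harvesters that do not run JavaScript.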

I also use an extensive robots.txt...

# Despicable and evil robots to keep out

User-agent: grub-client
Disallow: /

User-agent: grub
Disallow: /

User-agent: looksmart
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: WebmasterWorldForumBot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Oracle Ultra Search
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: searchpreview
Disallow: /

All kinds of evil and wicked robots there. Not just email scrapers but web scrapers and all kinds of other dodgy things. I never thought about mod_rewrite at the time, and I think I'll be adding them there too. With 100+ spam mails a day still coming in, I think I need to!
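
Since the same names would have to be repeated in mod_rewrite, one option is to generate the rules from a single list. The following is a rough sketch only, assuming Apache with mod_rewrite enabled and that a case-insensitive prefix match on the User-Agent header is what is wanted. `buildRewriteRules` is a made-up helper, and its output should be checked by hand before it goes into a live .htaccess file.

```javascript
// Hypothetical helper: turns a list of user agent names into mod_rewrite lines.

// Escape regex metacharacters, then spaces (an unescaped space would end the pattern).
function escapeForRewriteCond(name) {
  return name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&").replace(/ /g, "\\ ");
}

function buildRewriteRules(names) {
  // With no conditions, the RewriteRule would block every visitor, so emit nothing.
  if (names.length === 0) return "";
  const lines = ["RewriteEngine On"];
  names.forEach((name, i) => {
    // Every condition except the last is chained with OR.
    const flags = i === names.length - 1 ? "[NC]" : "[NC,OR]";
    lines.push(`RewriteCond %{HTTP_USER_AGENT} ^${escapeForRewriteCond(name)} ${flags}`);
  });
  // Answer matching requests with 403 Forbidden and stop processing.
  lines.push("RewriteRule .* - [F,L]");
  return lines.join("\n");
}

console.log(buildRewriteRules(["EmailSiphon", "Wget/1.6", "Mata Hari"]));
```

Because the match is anchored at the start of the string, an entry such as `Wget` already covers `Wget/1.6` and `Wget/1.5.3`, so the list fed to the helper can be shorter than the robots.txt above.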
Alan
  #6  
May 6th, 2004, 04:39 PM
hexed
Holy ****.


Thanks for the robots.txt list! This will come in handy when I begin my plan to take over the world.

However, some robots will not pay attention to the robots.txt, and then you will need to apply the .htaccess method in order to keep them off your server.

Hexed

  #7  
May 6th, 2004, 10:44 PM
dejaone
Webby,


You really have a long, long list.

  #8  
May 12th, 2004, 03:49 AM
gabi2oo4
Good article... thanks for the info.
I will use it on my websites.

Thanks

  #9  
October 3rd, 2004, 01:42 AM
Luki
Actually, many of the listed robots ignore robots.txt. It is better to use .htaccess together with a server-side script that checks how many requests the client has made and bans those that download pages too fast. Using just .htaccess may not work all the time, because most of the grabbers allow changing the user agent string.
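
A rough sketch of that kind of server-side check: count each client's requests inside a sliding time window and ban any client that exceeds a limit. The `RateBanList` class, the window size, the request limit, and the ban duration are all assumptions chosen for illustration. A real script would also need to persist its state and exempt legitimate search engine crawlers.

```javascript
// Hypothetical in-memory request counter; not taken from any particular library.
class RateBanList {
  constructor({ maxRequests = 20, windowMs = 10000, banMs = 3600000 } = {}) {
    this.maxRequests = maxRequests; // requests allowed per window
    this.windowMs = windowMs;       // length of the sliding window
    this.banMs = banMs;             // how long a ban lasts
    this.hits = new Map();          // client IP -> recent request timestamps
    this.bannedUntil = new Map();   // client IP -> time the ban expires
  }

  // Returns true if the request should be served, false if the client is banned.
  allow(ip, now = Date.now()) {
    const banEnd = this.bannedUntil.get(ip);
    if (banEnd !== undefined) {
      if (now < banEnd) return false;
      this.bannedUntil.delete(ip); // ban has expired
    }
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(ip) || []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.hits.set(ip, recent);
    if (recent.length > this.maxRequests) {
      this.bannedUntil.set(ip, now + this.banMs);
      this.hits.delete(ip);
      return false;
    }
    return true;
  }
}

// Example: at most 3 requests per second, with a 5 second ban.
const limiter = new RateBanList({ maxRequests: 3, windowMs: 1000, banMs: 5000 });
console.log(limiter.allow("192.0.2.1"));
```

Keying on the client address rather than the user agent string is the point of the design: a grabber can rename itself, but it cannot easily hide how fast it is fetching pages.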

BTW you may want to add the HTTrack Web site grabber to your robots.txt. At least I see it quite often in my log files.



Last edited by Luki : October 3rd, 2004 at 02:12 AM.

  #10  
October 4th, 2004, 06:09 AM
cnile
Thanks Webby - Great list - will be really useful

  #11  
October 10th, 2004, 03:15 AM
Pickofindia
Unicode...??

Hi !

I am a newbie to website design and am just learning as I go about designing my website. I had read somewhere that you could keep your email address from being harvested by bots by converting it into Unicode format...?? Is that all hogwash...? Seems there is much more to being invisible to unfriendly bots than that, I guess.

Don't know if I can comfortably edit the .htaccess file, but robots.txt may be worth giving a whirl, for whatever it's worth. I presently only have my email address converted into Unicode.

Good article.

Regards,

Pickofindia

  #12  
October 19th, 2004, 12:51 AM
fathom
Quote:
Originally Posted by Pickofindia
Hi !

I am a newbie to website design and am just learning as I go about designing my website. I had read somewhere that you could keep your email address from being harvested by bots by converting it into Unicode format...?? Is that all hogwash...? Seems there is much more to being invisible to unfriendly bots than that, I guess.

Don't know if I can comfortably edit the .htaccess file, but robots.txt may be worth giving a whirl, for whatever it's worth. I presently only have my email address converted into Unicode.

Good article.

Regards,

Pickofindia


Well, harvesters, like anything else, are getting more sophisticated, so not really: most can now decode Unicode-encoded addresses.
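
To illustrate why this kind of encoding offers little protection: turning encoded characters back into plain text takes only a few lines, so any harvester can do it. This is a small sketch, assuming the "unicode format" in question means HTML numeric character references such as `&#64;` for `@`. The function names and the example address are invented for illustration.

```javascript
// Hypothetical helpers showing how cheaply entity-encoded text is reversed.

// Encode every character as a decimal numeric character reference.
function encodeAsEntities(text) {
  return text
    .split("")
    .map((ch) => `&#${ch.charCodeAt(0)};`)
    .join("");
}

// Decode both hexadecimal (&#x40;) and decimal (&#64;) references.
function decodeEntities(html) {
  return html
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(Number(dec)));
}

// Example with a placeholder address (an assumption, not a real mailbox).
const hidden = encodeAsEntities("someone@example.com");
console.log(hidden);
console.log(decodeEntities(hidden));
```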

Robots.txt will stop the few that "obey", but most ignore robots.txt.

A .htaccess banned list works for "active email links" but can't be used on all servers (Windows servers are the biggest problem).

Best course (today anyway): a JavaScript link that pops open a window, with the active link there. Bots cannot parse data in the JavaScript -- at least not yet! ;)
  #13  
October 19th, 2004, 01:01 AM
fryman
Once again a 5-month-old thread gets bumped up...

Oh, well, might as well follow along

This is the best tool I have found to hide your email: it encodes the address so no bot can understand it, yet it is still a live email link.
Check it out here

http://www.wbwip.com/wbw/emailencoder.html

Don't know if this is the Unicode you are talking about, but all I can say is that since I started using this to hide my email addresses, my spam has dropped to an absolute zero!

I love it!
  #14  
October 19th, 2004, 01:24 AM
fathom