Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


Scripting Language Based
Spam and Mail filtering


DNS checks are probably the most efficient way to detect spam, and in combination with other header checks they can usually get you to a ~98% accuracy level. So it makes sense to construct a spam filter hierarchically, first canning "header-detectable" spam and then analyzing the body for the rest (say 2%) of the messages. The approach implemented in SpamAssassin, with weights, is very questionable, as in this case you usually need the results of all of the checks to compute the weight.
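A minimal sketch of the hierarchical idea, in Python for brevity: cheap checks run first and the first "spam" verdict stops processing, so the expensive body check only sees what is left. The message is a plain dict, and every field name and rule here is a hypothetical stand-in, not a real filter.

```python
from typing import Callable, List, Optional, Tuple

# A check looks at a message (here a plain dict) and returns "spam",
# or None when it has no opinion. All field names are hypothetical.
Check = Callable[[dict], Optional[str]]


def dns_check(msg: dict) -> Optional[str]:
    # Assumes the caller has already done the reverse-DNS lookup.
    return "spam" if msg.get("has_reverse_dns") is False else None


def header_check(msg: dict) -> Optional[str]:
    # A message without a Date header counts as header-detectable spam.
    return None if msg.get("headers", {}).get("Date") else "spam"


def body_check(msg: dict) -> Optional[str]:
    # Stand-in for the expensive body analysis.
    return "spam" if "viagra" in msg.get("body", "").lower() else None


# Ordered from cheapest to most expensive.
CHECKS: List[Check] = [dns_check, header_check, body_check]


def classify(msg: dict) -> Tuple[str, int]:
    """Return (verdict, number of checks actually run)."""
    for ran, check in enumerate(CHECKS, start=1):
        if check(msg) == "spam":
            return "spam", ran  # short-circuit: later checks never run
    return "ham", len(CHECKS)
```

The contrast with a weighted scheme is the early return: a weighted filter has to run every check before it can sum the score.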

IMHO Perl is one of the best languages for writing spam filters, and a Perl interface to the MTA can help. But with current servers all these milter-based ideas are plain vanilla overkill. You can get the same result purely by moving messages as files from one Sendmail (or other MTA) instance to another. This solution scales to approximately 1 million messages per day, and it is also simpler and more reliable than milter-based solutions.

Actually Sendmail could benefit from incorporating a scripting language too (it could also replace the dinosaur macro-based rewriting). Sendmail badly sucks, but it is still widely used :-).

Perl-based spam filters are usually open source. I know only one commercial Perl-based spam filter and it sucks ;-) ActiveState developed PerlMx (now PureMessage), see perl.com Filtering Mail with PerlMx [Oct. 10, 2001], which looks like an expensive commercial variant of Procmail written in Perl. With all due respect to Perl, this is a weak, overpriced product that does not even come close to the free (and far from perfect) SpamAssassin. Actually PureMessage is a nice demonstration of the level of degradation of commercial solutions, where all development effort goes into marketing and interface and none into fundamental algorithms. That permits open source to compete with them despite much smaller manpower. What is really funny is that Sophos bought ActiveState with the explicit goal of entering the anti-spam market. Paradoxically, they managed to choose probably the weakest product that ActiveState had.

Procmail proved to be a really useful free tool that was available in the right place at the right time, but its age now shows and it is very limited in its filtering capabilities. You can use Procmail to invoke Perl scripts, and that approach is one of the simplest effective strategies to fight spam. This combined procmail+Perl approach can probably remove 80% or slightly more of spam with a very low number of false positives. One popular approach is cross-checking headers for RFC compliance and mercilessly junking deviations not covered by the whitelists. But you need to analyze your mail stream before implementing this measure and create exception lists (whitelists), as many legitimate sources abuse the SMTP protocol (FedEx and many other Fortune 100 companies).
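As an illustration only, here is a small Python sketch of the "check headers, junk deviations, but honor a whitelist" idea. The three checks are assumptions chosen for brevity (a From address with a dotted domain, a parsable Date, a Message-ID, the last being only recommended rather than required by the mail format RFCs), and the whitelist entry is hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_tz

# Hypothetical exception list of sender domains known to send sloppy headers.
WHITELIST = {"fedex.example"}


def header_problems(raw_message):
    """Return (sender_domain, list of header deviations found)."""
    msg = message_from_string(raw_message)
    problems = []

    _, addr = parseaddr(msg.get("From", ""))
    domain = addr.rpartition("@")[2].lower()
    if "@" not in addr or "." not in domain:
        problems.append("bad From")

    date = msg.get("Date")
    if not date or parsedate_tz(date) is None:
        problems.append("bad Date")

    if not msg.get("Message-ID"):
        problems.append("no Message-ID")

    return domain, problems


def verdict(raw_message):
    domain, problems = header_problems(raw_message)
    if domain in WHITELIST:
        return "deliver"  # whitelisted senders bypass the header checks
    return "junk" if problems else "deliver"
```

Running `header_problems` over a sample of real mail before enabling `verdict` is one way to build the exception list that the paragraph above calls for.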

Catherine Hampton has developed a set of Procmail configuration files, named the "Spam Bouncer", which takes Procmail to the limit by implementing a spam blocking scheme using just pure Procmail regex features. This is a very limited approach, and it has value mainly as a Procmail tutorial. One of the interesting capabilities of the Spam Bouncer that might deserve further attention from implementers of similar tools is its ability to divide potentially objectionable mail into two different levels of suspicion: blatant spam messages and questionable messages.

Again, Perl is a much better tool, and the combined approach (procmail+Perl) is probably the first choice to consider. It proved to be simple and scalable on regular midrange Solaris hardware (4 CPUs, 8G of RAM) up to approximately half a million messages per day.

It is very important to understand that spam changes the nature of email and unfortunately a "spam filter" further amplifies this effect. Nothing can compensate for this deterioration of the mail environment due to the spam filter, but one can easily amplify the negative effects of spam by using too much zeal in spam filtering ;-). The road to hell is paved with good intentions. The fact is that the combination of "spam + a spam filter" turns email from a reasonably reliable delivery mechanism (the "old email" environment) into a new variation of "Alice in Wonderland" (the "new email" environment). This "new email environment" represents a really unreliable/capricious mechanism that can arbitrarily block useful mail, so delivery of any mail is no longer assured. From this point of view the importance of whitelists cannot be overestimated, as they restore predictability for at least part of the address space.

An "overzealous" spam filter can make this situation much worse by completely killing a weak useful signal: that's the law of unintended consequences of adopting a weak commercial solution that I tried to stress. The problem here is not with false negatives (spam that slips through) but with false positives (useful messages classified as spam). If you get 1000 emails and filter 990 of them as spam, with just one useful message among those 990 and one spam message among the 10 delivered, then you have lost 10% of your useful mail while your overall filtering accuracy is 99.8%. And one useful message lost per 1K messages classified as spam is actually typical for top of the line solutions.
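The arithmetic behind the 1000-message example can be checked in a few lines of Python. The counts are the ones from the example (990 messages filtered, one of them useful, and one spam message among the 10 delivered); counting both mistakes gives an overall accuracy of 99.8%, while the share of useful mail lost is 10%.

```python
def rates(spam_caught, spam_missed, ham_lost, ham_delivered):
    """Return (overall accuracy, fraction of useful mail lost)."""
    total = spam_caught + spam_missed + ham_lost + ham_delivered
    accuracy = (spam_caught + ham_delivered) / total
    ham_loss = ham_lost / (ham_lost + ham_delivered)
    return accuracy, ham_loss


# 990 filtered (989 spam + 1 useful), 10 delivered (9 useful + 1 spam).
accuracy, ham_loss = rates(spam_caught=989, spam_missed=1,
                           ham_lost=1, ham_delivered=9)
print(f"accuracy {accuracy:.1%}, useful mail lost {ham_loss:.0%}")
```

The two numbers diverge because useful mail is only about 1% of the stream, so a single misfiled message is a large share of it.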

That means that if you are aggressive with spam filtering then you can really hurt users.  And most users are conservative and still have expectations of  the "old email" predictable environment while actually they need to operate in a completely different (unpredictable) "new email"  environment.

The problem with spam is not just useless or obnoxious messages but the fact that it pollutes the stream of incoming email that a person relies upon. That means that the marking mode is not much better than the spam blocking mode: marking the subject line at the gateway (and delivering the messages to a special folder, for example the Spam folder in Lotus Notes or Netscape Messenger, using a client filter rule) does not help much because it's very easy to overlook misidentified important mail. But it has one tremendous advantage: it makes unnecessary the ridiculously complex Web-based interface (often backed by PostgreSQL or MySQL) and other means of retrieving email from quarantine. Actually I think that implementing those solutions is an almost sure sign of incompetent email architects.

What is actually important here is flexibility. A user who has a local folder does not depend on central infrastructure, and correcting a mistake is just a simple move from one folder to another. Please note that I am talking about a business situation, where a single missed email might mean lost business, etc., not about home mail, but this argument is partially applicable to the home email stream too. Just as crazy and stupid people tend to destroy communication in a group, crazy and stupid spam filters are destroying email: people sometimes miss very important emails due to misconfigured or badly written spam filters in both enterprise and home environments.

Summarizing the argument above, an overzealous spam filter in a business situation looks more like a Trojan horse that harms the business than a useful addition to mail (especially if it is implemented at the gateway level with blocking mode as the default). IMHO the part of IS that implemented a fascist (and stupid) spam filtering solution without taking into account the interests of the business really deserves very close scrutiny and is ripe for outsourcing :-)

Actually the current situation with commercial spam filters reminds me of the first generation of virus scanners: the quality is extremely questionable (partially due to the problem of "too much zeal" that I mentioned above), and products suffer from deadly "creeping featurism/excessive complexity" in the interface (if a Web interface is implemented it probably constitutes more than 50% of the development effort, see PureMessage as an example). Taking into account the limitations of the current technology it is very important to know where to stop. IMHO the key today is not "almost complete elimination of spam" but user friendliness and minimization of false positives.

I would advocate a rule-based filter with simple rules (that can be dynamic, for example in SN and subject line checking) and a user-friendly interface over complex "God knows how it figured it out" types of filters, especially Bayesian filters (BTW "bogofilter" is definitely written in the wrong language ;-). Even SpamAssassin, written in a language more suitable for the task (Perl), evolved from a more or less simple (and reasonable) tool into a complex (and rather unpredictable) beast :-).

BTW the whole idea of assigning probabilities to individual words and using Bayesian logic is a very attractive but false direction. IMHO text pattern recognition, even in its most primitive form (static regular expressions), is more predictable from the user's standpoint (and thus more reliable).

My feeling is that unless you have flexible, user-controlled exception lists, it's very unclear how you can diminish the chances of filtering out an extremely important letter (a miss that essentially kills the usefulness of the filter once and forever for a particular user), the problem that I outlined above. If the exception list is dynamic and user controlled, then the recipient's address can be added to the exception list each time I send e-mail to somebody. That prevents the reply from being blocked and is already something useful.
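A rough sketch of such a dynamic, user-controlled exception list, in Python. The class and method names are hypothetical; the only behavior shown is the one described above: every address I write to becomes an allowed sender, and the user can remove entries at will.

```python
from email.utils import parseaddr


def _norm(address):
    # Reduce 'Bob <Bob@Example.org>' to 'bob@example.org'.
    return parseaddr(address)[1].lower()


class ExceptionList:
    """Per-user list of senders whose mail is never filtered."""

    def __init__(self):
        self._addrs = set()

    def note_outgoing(self, recipient):
        # Called for each recipient of each message the user sends.
        addr = _norm(recipient)
        if addr:
            self._addrs.add(addr)

    def remove(self, address):
        self._addrs.discard(_norm(address))

    def allows(self, sender):
        return _norm(sender) in self._addrs
```

For example, after `wl.note_outgoing("Bob <Bob@Example.org>")` a reply from `bob@example.org` passes `wl.allows(...)` and can skip the filter entirely.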

Summarizing I would like to state:

Spam seems to have demonstrated that in its current implementation SMTP mail has outlived its usefulness and needs to be enhanced or replaced to deal with the changed environment.

If you need a spam filter I would look into the following features:

But what we probably need is not a better mousetrap (spam filter), but a new mail protocol (or a revision of SMTP, which is an outdated protocol anyway) that helps to restore confidence in email, maybe along with a strong legal framework for bringing obnoxious spammers to justice. Of course not all spammers are created equal, but some of them might benefit from some period of complete isolation from society.
 

Dr. Nikolai Bezroukov



Notes:
  • Those pages are written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • This is a Spartan WHYFF (We Help You For Free) site. It cannot replace the best teachers and the best books.
  • The site contains some obsolete pages as it develops like a living tree... Some links on older pages are broken. Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate it if you could mail us a correct link.


 

Old News ;-)

John Beck's Weblog

I've spent a lot of time over the past couple of months trying out some new (and some not so new) anti-spam techniques. Note that this article assumes some familiarity with sendmail m4 macros; see $CFDIR/README for background and all sorts of details on these, where $CFDIR is one of:

These techniques are in the form of FEATURE and HACK m4 macros (the difference being that the former are provided and blessed by sendmail.org / Solaris whereas the latter are not, though a HACK may evolve into a FEATURE in a future release). For a HACK, one would use

        HACK(`hack-name')dnl

in one's .mc file, likewise

        FEATURE(`feature-name')dnl

When installing hacks, one must create $CFDIR/hack (if it does not already exist) and place hack-name.m4 in that directory. Note that the sendmail distribution comes with such a sub-directory but Solaris does not.

Also, to explain some terms used below: the access list is enabled by the FEATURE(`access_db') macro; details on this are in $CFDIR/README, both in its sub-section in the FEATURES section, and in the ANTI-SPAM CONFIGURATION CONTROL section. And FEATURE(`delay_checks') is strongly recommended, as it is needed to enable the overrule by an OK entry in the access list that I mention in a few places; this feature is also described in its subsection in the FEATURES section, as well as in the "Delay all checks" sub-section of the ANTI-SPAM CONFIGURATION CONTROL section.

Anyway, onto the details. In the order I started deploying them:

Overall, spam getting thru my personal domain's mail server to my users (including myself, my wife, my siblings, our mom, etc.) has dropped about 90% since I started using these techniques, despite the ever-increasing spam trends on the rest of the Internet. (January 26, 2005 04:16 PM)

 

My simple solution to spam (Score:5, Informative)
by KalvinB (205500) on Saturday January 03, @08:33PM (#7870109)
(http://www.icarusindie.com/)
Spammers need images to get past word filters and to make an ad "stand out." Images can't be sent with the e-mail so src tags are used. href tags are also used for links they expect people to click on. "http://" is a unique identifier that absolutely cannot be obfuscated or it will not work. You can add a lot of junk before an @ symbol but eventually the real link must be there. Simply block that link and poof, no more spam from spammers advertising using that domain. You can block countless spammers by blocking a single 100% unique URL that no legitimate e-mail will ever contain.

The full write up [icarusindie.com] of my take on what I see as horribly flawed ways to combat spam and source code for the custom programs I use to strip links out of e-mails.

I have an example of spam posted there where everything is just a mess in the e-mail. The headers are forged, the text is all obfuscated. But there, clear as day is an "HTTP://"

Poof, killed the spam domain. And there's no way to circumvent my method except by not having links of any form in the e-mail. If you put a link in a spam, I will find it and I will block it.
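The poster's own programs are on the linked site and are not reproduced here. As a rough illustration of the idea only, the following Python sketch pulls every "http://" or "https://" link out of a message, discards any junk before an "@", and compares the real host against a blocklist. The regular expression and the blocklist entry are assumptions.

```python
import re
from urllib.parse import urlsplit

# Matches a URL up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

# Hypothetical list of spamvertised hosts.
BLOCKED_HOSTS = {"spam.example"}


def linked_hosts(text):
    """Return the set of lowercase hostnames linked from the text."""
    hosts = set()
    for url in URL_RE.findall(text):
        try:
            # .hostname drops any 'user:pass@' prefix and the port.
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, nothing to extract
        if host:
            hosts.add(host)
    return hosts


def is_spamvertised(text):
    return bool(linked_hosts(text) & BLOCKED_HOSTS)
```

This sketch matches hosts exactly; a spammer using throw-away subdomains would require matching on the registered domain instead.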

Slashdot SpamAssassin Gets a Promotion

Re:Bout Time! (Score:5, Informative)
by Just Some Guy (3352) <kirk+slashdot.strauser@com> on Monday June 28, @09:45AM (#9550230)
(http://subwiki.honeypot.net/ | Last Journal: Wednesday December 31, @03:36PM)
I "augmented" SpamAssassin with an extremely tight Postfix ruleset. A remote server has to jump through these hoops before SA ever gets a crack at it:

1. HELO Filtering

  1. Reject any connection that doesn't start with HELO or EHLO.
  2. Allow any host on my LAN to continue on to step 2.
  3. Reject any host not on my LAN that sends a hostname or IP of a machine on my LAN.
  4. Reject non-FQDN hostnames (ala "mailserver").
  5. Reject invalid hostnames (ala "432$@@112").
  6. Let everyone who makes it this far continue on to step 2.

2. Sender Filtering

  1. Allow authenticated senders to continue on to step 3.
  2. Allow hosts on my LAN to continue on to step 3.
  3. Reject non-FQDN sender domains ("foo@bar").
  4. Reject unknown sender domain ("foo@imaginarydomain.com") - after all, if I can't resolve their domain, then I couldn't reply to them anyway, right?
  5. Let everyone who makes it this far continue on to step 3.

3. Recipient Filtering

  1. Reject non-FQDN recipient domains (they'd bounce anyway).
  2. Reject unknown recipient domains (same as above).
  3. Allow authenticated users to send their mail and stop processing.
  4. Allow hosts on my LAN to send their mail and stop processing.
  5. Reject mail from anyone else that isn't to one of my domains, or one I'm an MX for.
  6. Use SPF to reject spoofed email.
  7. Use the relays.ordb.org, list.dsbl.org, and sbl-xbl.spamhaus.org DNS blackhole lists.
  8. Greylist all email not coming in from or going out to peer MXes.
  9. Pass everything else to step 4.

4. Content Filtering and Delivery

  1. Use ClamAV to reject viruses. This takes a big load off SpamAssassin.
  2. Use SpamAssassin to tag messages.
  3. Use Cyrus's Sieve to reject high-probability spam, put medium-probability messages into a "review" folder, and filter everything else into the appropriate folders.

I reject over 95% of all incoming mail before it ever gets to SpamAssassin. This means that SA's success rate isn't as good as on other systems (since I weed out all of the obvious spam), but my mailbox is happy and shiny.

SpamAssassin is a brilliant last line of defense, but I wouldn't advise just dumping your raw incoming stream into it. Much of the useful information about a message isn't available to spamd (such as your list of local domain names, relay domains, etc.) and you should consider using a set of cheaper filters to flush out the blatant chaff.
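The poster's actual Postfix ruleset is not shown, so the following is only a Python sketch of the decision logic of the HELO stage (step 1 in the list above), not Postfix configuration. The LAN range and local hostname are hypothetical, and the protocol-level check that the session starts with HELO or EHLO is left out.

```python
import ipaddress
import re

LAN = ipaddress.ip_network("192.168.1.0/24")  # hypothetical local network
LOCAL_NAMES = {"mail.example.org"}            # hypothetical local hostnames
LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def _on_lan(text):
    """True if text is an IP literal (optionally in [brackets]) on the LAN."""
    try:
        return ipaddress.ip_address(text.strip("[]")) in LAN
    except ValueError:
        return False


def helo_verdict(helo_name, client_ip):
    if _on_lan(client_ip):
        return "ok"  # LAN hosts skip the remaining HELO checks
    name = helo_name.strip().rstrip(".").lower()
    if name in LOCAL_NAMES or _on_lan(name):
        return "reject: claims to be one of our hosts"
    labels = name.split(".")
    if len(labels) < 2:
        return "reject: not fully qualified"
    if not all(LABEL.match(label) for label in labels):
        return "reject: invalid hostname"
    return "ok"
```

Each rule is cheap and needs no message content, which is why this kind of stage can run before anything is handed to SpamAssassin.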

Re:Great News! (Score:5, Interesting)
by NigritudeUltramarine (778354) on Saturday June 26, @04:59AM (#9535803)
A success rate of 95% really sucks when (like me) you get just over 2,500 spams a day. That'd still mean around 125 spams a day would be getting through. (I've had the same email address since the early 1990's, back when there was no reason to keep your email address "secret.")

Personally I do use SpamAssassin, but as an intermediate step.

First step: Check a whitelist of known senders. Deliver if the sender is on the list, AND the message originated from an IP subnet that I allow for them personally.

Second step: Scan with SpamAssassin. If the score is really high (above 20) throw it the hell out.

Third step: If the score is less than 20, and the person wasn't whitelisted, run the message through TMDA [tmda.net] and politely tell the sender I'm not sure who they are, and I get a lot of spam, and could you please click this link to prove that you're a real person.
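The poster wrote the real system in PHP with MySQL; the Python sketch below only restates the three steps as one routing function. The whitelist contents are hypothetical, the SpamAssassin score is assumed to be supplied by the caller, and a score of exactly 20 (which the description leaves open) is treated as a challenge here.

```python
from ipaddress import ip_address, ip_network

# Hypothetical whitelist: sender address -> subnets allowed for that sender.
WHITELIST = {
    "friend@example.org": ["198.51.100.0/24"],
}


def route(sender, client_ip, sa_score):
    """Return 'deliver', 'discard', or 'challenge' for one message."""
    allowed = WHITELIST.get(sender.lower(), [])
    ip = ip_address(client_ip)
    # Step 1: known sender AND coming from a subnet allowed for them.
    if any(ip in ip_network(net) for net in allowed):
        return "deliver"
    # Step 2: very high SpamAssassin score is thrown out.
    if sa_score > 20:
        return "discard"
    # Step 3: everything else gets a challenge (TMDA in the original).
    return "challenge"
```

Note that a whitelisted address arriving from an unexpected subnet falls through to steps 2 and 3, which is what protects against forged sender addresses.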

I've been using this three-step system for eighteen months now, and out of over one million messages that have come into my mailbox (really), exactly FOUR spam messages have made it all the way through. Apparently the spammers decided to go ahead and click on the little link, or they used a real person's return address, and when that person got the autoreply, they were too stupid to understand what was going on.

Even better, I have not received ANY indication that I've lost any messages; at least, no one has ever mentioned anything about an email that I didn't get.

I've got five other people at my domain using the same system, although for not quite as long (one for fifteen months, three for about a year, and one for just a month now); they have all had similar success.

So based on those numbers I'd estimate a success rate of 99.9997% for eliminating spam (which is, admittedly, COMPLETELY INSANE), and a false-positive (or at least "lost message") rate of 0% so far (fingers crossed). A few people have had to confirm their messages, of course, but I've whitelisted them as that happens.

I actually wrote all the connecting code in PHP, believe it or not, with a MySQL database as a backend. It's invoked using .qmail files. PHP is indeed good for things other than web pages; and was a little bit easier for me to maintain and deal with than Perl. The whole thing is less than 25KB of code. There is also a web backend which I use to configure it; that adds another 40KB.

The whole system took about twelve hours of programming to set up, on one Saturday.

Now, for correspondence to companies (such as Microsoft, or Amazon.com), I use a different scheme (although it's handled by the same PHP code). I create a unique email address for each of them, which ONLY allows mail to or from that domain (for example "rptamazon@mydomain.com" only allows messages from amazon.com). Those addresses are also easily cancellable, individually, if the company starts to annoy me with spam. Basically, each email address can be assigned its own unique whitelist, and can be cancelled individually at any time, through the little web interface.

I also have a number of email addresses for things such as customer support for our company (I write computer software). I'm using the same system for those, also, but instead of checking whitelists based on the sender, I've found a simple way to do it is to check for ANY of our product names anywhere in the message body or subject. If the message doesn't mention any of them, it sends a simple autoreply back similar to that in (3) above, but mentioning that the message didn't seem to be about any of our products, but if it was, please click here, blah blah. We don't have a high volume of support messages (about one or two a day; we're a small company) but in the last year only three or four people have had to click through like that, and, honestly, their support requests were so f*cked up anyways that I'd rather it just dropped them on the floor.
;-)

Then, as a very last step in all this, I also catch all email sent to invalid addresses in my various domains (which come to over 5,000 messages a day), and report those as spam to Vipul's Razor [sourceforge.net]. Which helps out the community, and me indirectly because my SpamAssassin installation also uses the Razor.

3.0, late-July, early August (Score:5, Informative)
by chathamhouse (302679) on Saturday June 26, @04:31AM (#9535751)
(http://www.chathamhouse.org)
3.0.0pre1 was made available last week.

It will apparently take another month or so to finalize the weighting of the rules.

I've put 3.0.0pre1 on a production system that filters ~350k messages per day. With some tweaking of the RBL, bayes, and AWL rules, it is much (~10%) more efficient at tagging spam than 2.63, which I'm running on a parallel server that also sees ~350k messages/day (load balancing is your friend).

More info:
http://www.au.spamassassin.org/full/3.0.x/dist/build/3.0.0_change_summary

sorting mail by spamassassin score (Score:5, Informative)
by David Jao (2759) * <djao@dominia.org> on Saturday June 26, @03:08AM (#9535570)
(http://dominia.org/djao/)
I'd like to delete anything with a score > 15, simply store anything with a score > 5, and send an auto-reply for scores between 5 and 10 indicating that the message was marked as spam and I'll probably never look at it.

I can't speak for auto-replies, but you can do the sorting part client-side. The key is that spamassassin adds a line like "X-Spam-Level: *****" where the number of *'s is the score of the email. Almost any email client can filter mail to different folders based on headers. The unary representation of the spam score ensures that even a primitive filter can work.

For example, one popular client is Microsoft Outlook, and there are several web pages in google (such as this one [carleton.ca]) that explain how to reroute mail to specific folders depending on the spamassassin score.
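The same header-based sorting can be sketched in Python: count the asterisks in X-Spam-Level and pick a folder. The folder names and both thresholds below are assumptions chosen for illustration, not SpamAssassin defaults.

```python
from email import message_from_string


def spam_stars(raw_message):
    """Number of '*' characters in the X-Spam-Level header (0 if absent)."""
    header = message_from_string(raw_message).get("X-Spam-Level", "")
    return header.count("*")


def folder_for(raw_message, review_at=5, junk_at=10):
    # Thresholds are hypothetical; tune them to your own mail stream.
    stars = spam_stars(raw_message)
    if stars >= junk_at:
        return "Junk"
    if stars >= review_at:
        return "Review"
    return "Inbox"
```

Because the score is written in unary, a client that can only do substring matching gets the same effect by testing whether the header contains a run of, say, five asterisks.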

Get the owner, not the dog..... (Score:5, Insightful)
by Univac_1004 (643570) on Saturday June 26, @04:13AM (#9535710)
(Last Journal: Monday June 21, @11:35PM)
Spam Assassin, while a very clever program, is as misdirected as the "Canned Spam" legislation. It has no effect on the real economics of spam: who pays for it.

Somebody is paying for the spamming, and we know exactly who it is. The URL of that organization is prominently displayed in every item of spamail. It is the advertiser.

The advertiser is right there out in the open, easy to locate. If they're not, the spam isn't doing its job, and wouldn't have been sent. And easy to locate means easy to go after, easy to sue, to fine, DoS or whatever.

Dinging the advertisers, and dinging them hard, will instantly put the spammers out of business.

Spamming can be eliminated without blocking, white lists, or anti-spoofing RFC's. Just go to where it's pointing.

To draw an [ugly, graphic] picture: a dog comes and poops on sidewalk in front of my house, and I step in it. Yelling at the dog is going to be only moderately successful, building a poop filter is difficult, messy, and leaky (as Spam Assassin demonstrates) . Following the dog's leash and fining the owner is what works.

The owner doesn't bring the dog back since s/he doesn't want to pay another fine.

No owner, no dog, no spam.

Get the owner.

OT: Spam Cannibal (Score:2)
by gilgongo (57446) on Saturday June 26, @06:50PM (#9539590)
(http://www.hatters.org.uk/ | Last Journal: Tuesday July 29, @04:19PM)
As it seems now obligatory to mention anti-spam systems whenever a /. story mentions spam, I thought I'd add the following:

Please have a look at Spam Cannibal [spamcannibal.org]

It's an interesting concept that if correctly deployed (big "if") by even a relatively few admins around the world, could really make a difference to the amount of spam on the net. It can also protect hosts against DoS attacks of various kinds.

Don't get me wrong, I'm not astroturfing this (much...). It has flaws - there are those who think blacklisting is a bad idea, and I can see their point of view on that - but I just think Spam Cannibal needs more visibility as an approach.

Challenge-Response schemes are more effective (Score:2, Interesting)
by cpghost (719344) on Saturday June 26, @04:59AM (#9535802)
(http://www.cordula.ws/)
Filtering spam generates way too many false positives. Challenge/Response schemes are IMHO much more effective. TMDA [tmda.net] and similar programs can be configured with whitelists for your regular mail partners, auto-whitelists for everyone who confirms their e-mail identity, and, if necessary, with blacklists too.

Re:DSpam (Score:4, Interesting)
by Chief Typist (110285) on Saturday June 26, @11:42AM (#9537291)
(http://www.iconfactory.com/)
The best feature of DSPAM, in my opinion, is that the SPAM never leaves the mail server.

The bad messages go into a quarantine on the server and can be reviewed by the end user using a web-based interface (looking for false positives.) In the press of a button, that quarantine can be emptied, freeing up disk resources on the server.

Other SPAM solutions (like SpamAssassin) mark the message and continue with delivery. What's the point in downloading the SPAM to your mail client just to throw them away?
 

MsExchange.org -- antispam info. Mainly for Ms Exchange, but contains some interesting general links too.

Slashdot AOL Blocking Spammers' Web Sites

Re:Is this a *smart* idea? (Score:5, Interesting)
by DocSnyder (10755) on Saturday March 20, @07:58AM (#8620216)
(http://docsnyder.de/)
I don't know, whether this is such a brilliant idea - if this gets widely adopted it can't be long before some idiot will get the idea of paying for a spam to "advertise" one of his competitors just to get HIS site blocked...

I'm sure AOL won't block any joe-jobbed targets but only bulletproof servers hosted at Chinanet, Telecom Malaysia, Procergs.com.br etc. which have been spamvertised by known spam gangs.

This is *really* a good idea - Alan Ralsky uses several "throw-away" domains per spam run, but only a handful of different servers to host his crap. Null route these and Ralsky can enlarge his own penis.

This is mandatory for webmails (Score:5, Interesting)
by chrysalis (50680) on Saturday March 20, @07:59AM (#8620220)
(http://www.pureftpd.org/)
The company I'm working for provides a free webmail service ( http://www.skymail.fr ).

This kind of service frequently gets abused by spammers. There are two ways they abuse it:

1) they open an account, just to have a valid address in order to bypass basic spam filters. Then, they send their spam through other servers using this address as the sender.

2) they use scripts to send spam through the service, as any regular user would. This is extremely annoying.

For 1) we publish SPF for all domains we send mail from. Now, it's up to people to enable SPF on their mail servers.

For 2) we filter _all_ packets coming from China, Korea, Nigeria and addresses listed in Spews and Spamhaus databases. That's about 13000+ filtered networks. Thanks to OpenBSD packet filter, it's trivial to set up and it doesn't introduce any slowdown.
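The poster does this in the OpenBSD packet filter, whose configuration is not shown. Purely to illustrate the lookup itself, here is a Python sketch that tests an address against a list of blocked networks; the two networks are documentation ranges standing in for the real 13000+ entries.

```python
import ipaddress

# Hypothetical stand-ins for the real list of filtered networks.
BLOCKED_NETS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/25"),
]


def is_blocked(ip_text):
    """True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip_text)
    return any(addr in net for net in BLOCKED_NETS)
```

A linear scan like this is fine for a sketch but would be slow for thousands of networks on every packet, which is one reason to leave the job to a packet filter with proper table lookups.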
 

Stopping Spam and Trojan Horses with BSD (Brett Glass)

This tutorial describes how to configure BSD systems to use DNS blacklists, procmail, mail "sanitizing" scripts, daemons that watch logs for evidence of spamming and "mail bombing," and similar utilities. Prevention of unauthorized relaying and detection and blocking of outbound spam are also discussed. Countermeasures against address harvesting and privacy invasion techniques such as "Rumplestiltskin" attacks, fingerd scans, tracking via identd, e-mail cookies, and malicious image tags in HTML mail are covered in detail.

Plussed users and spammer e-mail harvesting (Diaries)

By tbc Mon Mar 24th, 2003 at 10:02:22 PM EST  

I use the "plussed user" feature of sendmail. I searched for plus sign email at Google, and the first ten results weren't too helpful. If nothing else, I hope this diary entry changes that after Google indexes it.

I delete most spam and think nothing of it. Then I started getting spam on my cellphone, and I started logging them on my spammer blacklist wiki page. I got spam a couple days ago, though, that warrants a diary entry. 8 copies of the same message were sent to addresses undeniably harvested off my Web pages: timc+web+writing@divide.net, timc+issre2k@divide.net, timc+web+cancer@divide.net, timc+ca125@divide.net, timc+uflaccid@divide.net, timc+hacks@divide.net, timc+geekcode@divide.net, and timc+web@divide.net.

My ISP supports this feature, which allows mail addressed to timc+anything@divide.net to be delivered to timc@divide.net, and I have procmail rules that delete all mail sent to plain timc@divide.net.

"Plussed users" are explained at sendmail.org. Not all ISPs support this, but it's easy enough to try. Just send a message to yourself with +anything appended to your regular e-mail account name and see if you get it. I tested yahoo.com and hotpop.com; neither one supports it.
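To make the address format concrete, here is a small Python sketch that splits a plussed address into user, tags, and domain, and applies the rule described above (mail to the plain, untagged address is deleted). The function names are hypothetical; the actual delivery is done by sendmail and procmail, not by this code.

```python
def split_plussed(address):
    """Split 'user+tag1+tag2@domain' into (user, [tags], domain)."""
    local, _, domain = address.rpartition("@")
    user, *tags = local.split("+")
    return user, tags, domain.lower()


def keep(address):
    # Mirrors the rule in the text: only tagged addresses are kept.
    _, tags, _ = split_plussed(address)
    return bool(tags)
```

For example, `split_plussed("timc+web+writing@divide.net")` gives the user `timc`, the tags `web` and `writing`, and the domain `divide.net`, so the tags show which page the address was harvested from.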

I pepper my Web pages with these tagged e-mail addresses so I know why people are writing to me. Each of the hyperlinked plussed users in this article's introduction triggers a Google search to see which page the spammer was harvesting from.

Here's the spam, with my commentary.

Received: from [198.126.104.216] by 207.76.102.240 with ESMTP id XSZCPC; Sat, 22 Mar 03 08:36:11 +0400
Received: from [175.59.87.96] by 198.126.104.216 with ESMTP id ZDJEED; Sat, 22 Mar 03 08:20:11 +0400
From: "Joyce Bryant" <FYI@mail.com>

The new 2003 edition of the xxxxxxxx xxxxxxxx
xxxxxxxxx is out!  It includes comprehensive and
updated information on xxxxxxxx xxxxxxxxx, xxxx,
xxxxxxxxx, xxxxxxxxxx, xxxxxxxxxx, xxxxxx xxxxx
xxxxxxxx, xxxxxxxxxxxxxxxxxxxx, email addresses
and much more. The cost of the xxxxxxxxx is $285.

To order the xxxxxxxx xxxxxxxx xxxxxxxxx, please
print this email, complete the information below
and fax it to 905-751-0199 (tel: 905-751-0919).

...

To unsubscribe:  Send a blank email to: FYI@mail.com
with "Remove" in the subject line.

Yeah, right.

See also: the c2 wiki's SpamProof page.  

Slice the Spam into workable chunks (Score:1)
by JumperCable (673155) on Sunday January 04, @12:01AM (#7870832)

Everyone is complaining that no solution works against the spam problem. True, there is no single magic bullet. But instead of throwing up our hands and yelling that we are screwed and let the bastards over run us, we need to break the problem down into workable chunks.
 

InformationWeek Spam Spam Nation November 10, 2003

Pinpointing the origin of spam, a necessary step for effective law enforcement, is one of the thorniest problems, because of the mutability of message-header information and "relay raping," the practice of using open server relays to conceal the path of a message. And anti-spam tools don't help, Richter contends. "All these technology companies are doing is taking legitimate marketers who aren't causing problems and filtering our mail because that's all they can catch consistently," he says.

... ... ...

Internet service providers put a lot of effort into combating spam, blocking illegitimate incoming messages and bouncing spammers sending out messages from their systems. While technology can be employed to automate the identification and blocking of unsolicited bulk E-mail, catching and legally removing a spam sender remains a human-driven process. "The way we find out that spam has traveled across our network is when we receive a complaint from a user," says Craig Silliman, director of the network and facilities legal team for MCI. Mary Youngblood, abuse team manager for EarthLink Inc., says it can take months to get a resilient spammer off the network through the legal system.

Youngblood at EarthLink says for this reason, the ISP relies on monitoring tools to seek out spammers: "We look at E-mails themselves, we look at the products they're selling, we look at how many times our automatic processes had to end the connection with their mail machine because of 'user unknowns' [undeliverable mail], we look at our spam filters."

Spammers, she says, make no effort to fine-tune lists to get higher-percentage response rates. "They don't think that way. What they say is, 'Gee, if I get a one-out-of-a-thousand response, think how much I would get if I doubled my E-mail,'" she says. "Spammers deal in volume, instead of only sending E-mail to those who want it." Of course, it's possible to disagree about whether permission was given to receive messages. Many of those who believe they've been spammed, Richter says, received the unwanted E-mail as a result of their own actions, such as registering for prizes at Web sites.

... ... ...

Atkins sees the cost of enforcement as a problem. "Most of the spam out there breaks existing consumer-protection, criminal, or fraud laws," she says, echoing similar concerns voiced by ePrivacyGroup's Everett-Church. "But spammers are hard to prosecute. They hide, they lie, they cheat, and it costs a lot of money to track them down and build a case against them. That is money a lot of states don't have."

Richter concurs. "The people who these laws are supposed to be trying to attack, they're not going to be affected," he says. "The guy overseas isn't affected."

The Next Step in the Spam Control War: Greylisting -- another name for the idea is "tempfailing".

Greylisting got its name because it is kind of a cross between black- and white-listing, with mostly automatic maintenance. A key element of the Greylisting method is this automatic maintenance.

The Greylisting method is very simple. It only looks at three pieces of information (which we will refer to as a "triplet" from now on) about any particular mail delivery attempt:

  1. The IP address of the host attempting the delivery
  2. The envelope sender address
  3. The envelope recipient address

From this, we now have a unique triplet for identifying a mail "relationship". With this data, we simply follow a basic rule, which is:

If we have never seen this triplet before, then refuse this delivery and any others that may come within a certain period of time with a temporary failure.
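The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the class name, the in-memory dictionary, the one-hour delay and the use of reply code 451 as the temporary failure are all choices made for this example, and record expiry is left out.

```python
import time

class Greylist:
    """Minimal in-memory greylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, delay_seconds=3600, clock=time.time):
        self.delay_seconds = delay_seconds  # initial block period (assumed: 1 hour)
        self.clock = clock                  # injectable so the example can fake time
        self.first_seen = {}                # triplet -> timestamp of the first attempt

    def check(self, client_ip, mail_from, rcpt_to):
        """Return an SMTP-style reply code: 451 (try again later) or 250 (accept)."""
        triplet = (client_ip, mail_from, rcpt_to)
        now = self.clock()
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.delay_seconds:
            return 451  # unknown triplet, or still inside the initial delay
        return 250      # the sender retried after the delay: accept

# Usage with a fake clock (addresses are documentation placeholders).
now = [0.0]
greylist = Greylist(delay_seconds=3600, clock=lambda: now[0])
print(greylist.check("192.0.2.1", "a@example.com", "b@example.org"))  # first attempt
now[0] = 4000.0
print(greylist.check("192.0.2.1", "a@example.com", "b@example.org"))  # retry after the delay
```

A real MTA that retries gets through on a later attempt; a fire-and-forget sender never comes back, so its message is never accepted.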

Since SMTP is considered an unreliable transport, the possibility of temporary failures is built into the core spec (see RFC 821). As such, any well-behaved message transfer agent (MTA) should attempt retries if given an appropriate temporary failure code for a delivery attempt (see below for discussion of issues concerning non-conforming MTAs).
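Seen from the sending side, the retry behaviour comes down to the first digit of the SMTP reply code: 2xx means success, 4xx a transient failure, 5xx a permanent one. The sketch below shows that classification only; the function name and the returned labels are invented for this example.

```python
def sender_action(reply_code):
    """Map an SMTP reply code to what a well-behaved sending MTA does next."""
    first_digit = reply_code // 100
    if first_digit == 2:
        return "delivered"        # positive completion
    if first_digit == 4:
        return "queue_and_retry"  # transient failure: keep the message, try again later
    if first_digit == 5:
        return "bounce"           # permanent failure: report back to the sender
    return "other"                # e.g. intermediate replies, not handled in this sketch

for code in (250, 451, 550):
    print(code, sender_action(code))
```

Greylisting relies entirely on the middle branch: software that treats a 4xx reply like a 5xx, or ignores it, never delivers.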

During the initial testing of Greylisting, it was observed that the vast majority of spam appears to be sent from applications designed specifically for spamming. These applications appear to adopt the "fire-and-forget" methodology. That is, they attempt to send the spam to one or several MX hosts for a domain, but then never attempt a true retry as a real MTA would. From our testing, this means that currently, based on a fairly conservative interpretation of testing data, we see effectiveness of over 95%, and that is with no legitimate mail ever being permanently blocked.

This blocking comes with a minimal price in terms of local resources. Assuming the use of a local datastore for the triplet and other metadata, there is no required network traffic caused by Greylisting other than that associated with the connection itself. Since we are not checking the contents of the message at all there is very little processing overhead, unlike many other spam blocking methods.

There is one effect that could be seen as either a positive or negative. Since the Greylisting method delays acceptance of unknown mail, that will generate a little more work for the sending MTA of legitimate mail. The flip side is that it generates a lot more work and smarts for the spammer's systems, hopefully enough to make the costs of spamming higher, possibly even to the point of making spamming unprofitable for some of them.

The best part is that since we never permanently fail a message delivery, as long as the delivering MTAs are well behaved, we should never cause a legitimate mail to bounce. There should never be a false positive!

Slashdot The Next Step in Fighting Spam Greylisting

Re:security through obscurity, again? (Score:5, Interesting)
by blakestah (91866) on Friday June 20, @02:48PM (#6256300)
(http://www.keck.ucsf.edu/~dblake)
The thing that is wrong is the SMTP protocol, and most people's conception of a spammer. Once you see a few "confessions of ex-spammers", everything changes.

There are people out there who pay $10000 in startup costs, and then make $2000/week for spamming. The $10000 gets them software written by knowledgeable internet security experts. This software finds any and every way to anonymize the email spam, and finds lists of people to spam.

As long as knowledgeable internet security experts are getting paid good cash to enable spammers, and SMTP doesn't change, spam will only continue to get worse. There needs to be a fundamental change in SMTP protocols. It oughta take the spammers about 2 days to fix their MTA bug to get around greylisting.

Re:security through obscurity, again? (Score:4, Insightful)
by SillySlashdotName (466702) on Friday June 20, @03:14PM (#6256584)
I see that, in fine /. tradition, you didn't RTFA.

From the article: If we have never seen this triplet before, then refuse this delivery and any others that may come within a certain period of time with a temporary failure. (emphasis added)

Later in the article it goes into much more detail about the delay, how long to delay if the triplet has not been seen before, life time of the whitelist, etc.

It also talks about configuring the times - they mention the default delay is 1 hour, but that their records suggest that 1 minute would have caught 99% of the same spam messages - "The data collected during testing showed that more than 99% of the mail that was blocked with the tested setting of 1 hour would still have been blocked with a delay setting of only 1 minute. At that point, having a larger initial delay will definitely help, as it gives time for other blocking methods to act. For this reason, it is suggested that at least a one hour delay value be kept as a default, since spammers will start adapting as soon as this method becomes known and starts being used."

Re:security through obscurity, again? (Score:5, Interesting)
by letxa2000 (215841) on Friday June 20, @04:48PM (#6257552)
(http://www.geocities.com/efaxslams)
is reject the mails on the greylist after holding the connection for, say, 10 minutes. That will help deter spamming software,

I doubt it. I would assume the spam software would have a timeout, and I doubt it's ten minutes. If they want to hit-and-run and aren't even willing to make a second delivery attempt when an error code is returned, I doubt they're going to wait 10 minutes. I'm sure that within 30 seconds or less they'll consider it a dead connection and hang up.

Problem is, I used to have my sendmail HANG UP in real-time on an incoming connection as soon as it realized a message was spam. I.e., the incoming message was filtered in the DATA phase and if it was spam I hung up immediately. It worked great and it felt good, but there were many spam programs that took the disconnection as some kind of TCP/IP failure and immediately tried again. So I had one day where a single message was attempted to be delivered about 30,000 times as the spammer connected, I hung up, spammer software said "Oops, let me try again!" About one delivery attempt every second or so.

I'd be willing to bet if you put a 10 minute timeout in sendmail you'll see lots of spammer software disconnecting sooner and just trying again. It takes more of their resources, but takes more of yours, too.

Re:security through obscurity, again? (Score:5, Insightful)
by blakestah (91866) on Friday June 20, @03:33PM (#6256813)
(http://www.keck.ucsf.edu/~dblake)
RTFA!

There is no magical waiting period or re-try period that cannot be trivially coded around. And, with good money on the line, will be trivially coded around.

You don't get it. Really smart people are getting paid a whole lot of money to make programs to exploit every possible crack in the way we send email. There is no general rule to spammers, except that it is a lot of money and they are very clever. Little bandaids are not going to stop this one - there needs to be a much more fundamental change. And I am not talking about laws against spam - I am talking about changes in the protocols we use to send email.

Re:your first mistake (Score:5, Interesting)
by Henry Stern (30869) <henry@stern.ca> on Friday June 20, @03:29PM (#6256774)
(http://www.stern.ca/)

It means they have to do retries...that means spam runs take longer, especially since they have to run...then wait for a locally defined timeout, and run all those addresses again

AND they have to do it from the same IP.

Not to mention that if this is used in conjunction with other collaborative tools (i.e. RBL, checksums), by the time that the spamming MTA can return, its IP address will have been submitted to MAPS/etc. and the contents of the message will have been submitted to Razor/Pyzor/DCC.

I think that this greylisting idea will be pretty hard to beat by Joe spammer. Since the game of spam detection is pretty much an arms race, slowing him down will probably be enough to turn the battle in your favour.
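For readers unfamiliar with how an RBL-style check is asked, here is a minimal sketch of the usual DNS blacklist convention: the octets of the client's IPv4 address are reversed and prepended to the list's zone, and the resulting name is looked up in DNS (a listed address normally resolves, an unlisted one does not). The zone name below is a placeholder, and the sketch only builds the query name; it makes no network calls.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name looked up to ask a DNS blacklist about an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

# Placeholder address and zone, for illustration only.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Because the lookup is a single DNS query, it is cheap enough to repeat when a greylisted sender comes back for its retry.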

Re:can't believe their numbers (Score:5, Informative)
by McDutchie (151611) on Friday June 20, @02:49PM (#6256312)
(http://slashdot.org/)
Eh, open relays are soooo 20th century. :) Actually most open relays today are either blocked or closed, and newly installed MTAs are secure against third-party relaying by default, so this spam method is dying out [it-analysis.com]. Most spam today is sent either directly to the receiving MTA, through open proxies, or through formmail.pl and similar exploits.

Tempfailing is not new and unique (Score:5, Informative)
by HiKarma (531392) * on Friday June 20, @02:39PM (#6256198)
This idea isn't so new or unique. It's been discussed a fair bit on the ASRG [ietf.org] mailing list under the name "tempfailing".

First I heard of it was from Landon Noll and Mel Pleasant. It is noted in brief as one of the techniques in this plan to end spam [templetons.com] (though their plan, which did include the triplets, is not laid out in full there.)

It is a worthwhile technique for a little while, and if spammers were rational, would be worthwhile for some time to come. But spammers are not rational, and already this technique is not as useful as would be hoped.

Do a Google Search for Tempfailing [google.com] especially in ASRG to see statistics etc.

Re:1 false positive is not acceptable. (Score:5, Interesting)
by pclminion (145572) on Friday June 20, @03:00PM (#6256426)
Wrong. 1 false positive can be acceptable, and in fact is probably better than how things are now.

At USENIX '03 there was a paper presented on artificial intelligence techniques for spam detection. I can't provide a link since only USENIX members can download the paper (at this point, at least). I was a coauthor of that paper.

One of the things we've discovered in our research is that some classes of filters (most notably, the one I have been developing along with a few other individuals) are actually more effective at correctly classifying email than humans are. That is to say, you can train the learning algorithm on mostly-correctly-classified data, then re-run it over the training data, and almost miraculously, it discovers all kinds of email in the training set that was incorrectly classified.

I.e., this filter has discovered mail that I myself incorrectly thought was spam. It's scary, because there's a lot of it.

To assume that a human will always be 100% accurate at classifying their own email isn't just arrogant, it's plain wrong. Newer filters that will be introduced in the near future might possibly be more accurate than you, a frail human, could ever be.
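The "train, then re-run over the training data" idea can be illustrated with a toy classifier. The commenter does not say which algorithm the paper used, so the sketch below swaps in a plain naive Bayes word model with add-one smoothing; the function names and the tiny data set are invented for the example, and it assumes both labels occur at least once.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label), label is 'spam' or 'ham'. Returns the counts."""
    token_counts = {"spam": Counter(), "ham": Counter()}
    message_counts = Counter()
    for text, label in messages:
        token_counts[label].update(text.lower().split())
        message_counts[label] += 1
    return token_counts, message_counts

def classify(text, token_counts, message_counts):
    """Return the label with the higher naive Bayes log-probability."""
    vocab = set(token_counts["spam"]) | set(token_counts["ham"])
    total_messages = sum(message_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_tokens = sum(token_counts[label].values())
        score = math.log(message_counts[label] / total_messages)
        for token in text.lower().split():
            # Add-one smoothing so unseen tokens do not zero out the probability.
            score += math.log((token_counts[label][token] + 1) / (total_tokens + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def suspicious_labels(messages):
    """Train on everything, then list messages whose given label the model disputes."""
    token_counts, message_counts = train(messages)
    return [(text, label) for text, label in messages
            if classify(text, token_counts, message_counts) != label]

training = [
    ("cheap pills online", "spam"), ("cheap pills now", "spam"),
    ("buy cheap pills", "spam"), ("cheap pills offer", "spam"),
    ("meeting notes attached", "ham"), ("lunch meeting tomorrow", "ham"),
    ("project meeting notes", "ham"),
    ("cheap pills today", "ham"),  # deliberately mislabeled by the "human"
]
print(suspicious_labels(training))
```

The one deliberately mislabeled message is the one the model disputes, which is the effect the commenter describes, on a much smaller scale.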

How about Habeas' haiku method? (Score:4, Interesting)
by siskbc (598067) on Friday June 20, @02:56PM (#6256372)
The best idea I've seen in YEARS was to have people start using a specific, original poem as their signatures. Then, the author granted license to anyone who WASN'T sending spam. Therefore, they could sue any spammer for copyright infringement if they used it, and you could train your mail filter to look for the signature. Once SpamAssassin took it up, it pretty much snowballed. See story here [wired.com]

Re:Bayesian Filtering (Score:2)
by anti$pam (682702) on Friday June 20, @04:42PM (#6257503)
The key is to make spammers not make money!

If people start adopting anti-spam technologies we would reduce the return spammers get from sending spam. Reduce this enough and the spamming business will no longer be profitable.

POPFile is great. I've also used SAProxy (http://saproxy.bloomba.com/) under windows and it works great too.

Again, the idea is not to eliminate all spam, but to reduce the return rate, and therefore the money made by spammers.
 
Published a paper? (Score:4, Informative)
by Call Me Black Cloud (616282) on Friday June 20, @02:58PM (#6256400)
Where? To me, publishing a paper means your writing appeared in some peer-reviewed journal (where the "peers" are acknowledged as domain experts). What you did was put up a web page. With a donation link at the bottom.

For others looking for a solution, try POPFile [sourceforge.net]. Open source, cross platform, gives me 96% accuracy.

One more thing: "practically eliminates" is not the same as "eliminates".

Re:Published a paper? (Score:4, Insightful)
by vidarh (309115) <vidarh@hokstad.name> on Friday June 20, @03:33PM (#6256824)
(http://www.personalnames.com/ | Last Journal: Friday April 04, @04:47AM)
To me publishing a paper in a peer reviewed journal instead of on the web would mean that I'd expect audience to be reduced to a ridiculously small fraction of people that might be interested. If I wanted to publish something I'd do it on the web first, and if it stacks up people I respect would start talking about it and link to it.

Yes, I realize that "serious" science still expects things to be published in peer reviewed journals, but in most cases I can't help but think that getting the article out there would be more useful. Sure, peer review is important, and somewhere to look for some kind of verification of the value of a paper is useful. But I much prefer the Research Index [researchindex.com] way, where I can get a good indication of the value of a paper by looking at how many people have cited a paper and WHO have cited a paper.

Anyway, pretending that putting up a document on a website is somehow less publishing a paper than having it printed in a journal, is just plain elitist. You should probably be a bit more critical of papers that are published that you don't know have been through a proper review, especially if you're not a domain expert yourself, but being aware of the source is something that you always need to be.

Delaying email by one hour! (Score:5, Insightful)
by pjrc (134994) <paul@pjrc.com> on Friday June 20, @03:04PM (#6256484)
(http://www.pjrc.com/ | Last Journal: Thursday June 27, @05:31PM)
From the linked paper:

An hour is short enough that in most cases, users will not notice the delay.

I'm wondering how I'm going to explain that to a new customer over the phone who says "I'll just email that file right now so we can go over it together".

Re:Delaying email by one hour! (Score:5, Insightful)
by vidarh (309115) <vidarh@hokstad.name> on Friday June 20, @03:24PM (#6256712)
(http://www.personalnames.com/ | Last Journal: Friday April 04, @04:47AM)
Agreed. I've been involved in operating a larger (hundreds of thousands of active users) mail system a couple of years ago, and users would complain if their mail took more than seconds. We had to upgrade our system at one point because rapid growth had made mail delivery take a couple of minutes on average, and it caused bad publicity - a lot of users had a clear expectation that e-mail should be delivered in a few seconds and that if it didn't something was wrong.

I think changing that perception of e-mail as near instant will be incredibly hard. And if you succeed it will just move even more traffic over to the IM networks and cause spamming of IM networks to escalate instead.

Bogofilter does pretty well for a client filter (Score:4, Interesting)
by lxdbxr (655786) on Friday June 20, @03:15PM (#6256612)
(http://www.oenone.demon.co.uk/)
The summary does not seem completely accurate; since the greylisting MTA sends an SMTP temp failure there should never be any false positives as long as the sending MTA is vaguely RFC-compliant (sadly not true I suspect). Or at least that was my reading of the paper...

I'm currently using Bogofilter [sourceforge.net] (and looking into CRM114 [sourceforge.net]) and getting better than 99% accuracy (about 1 in 200 false negatives at the moment) and very very few false positives (maybe 2 in 5000 messages).

Of course these are MUA level filters (and yes, I know, I've already "paid" with bandwidth to download the spam) - however since the proposed "greylister" would have to be installed as the MTA at major ISPs (as the authors note) I'm not convinced that is more likely to get widespread adoption than the various sorts of adaptive client-based filtering now available, particularly as it requires a database to back the method up.

As far as I am concerned the major factor in a spam filter should be zero false positives - personally I don't mind reviewing one or two spams a week but I get really annoyed if I were to lose a real message (note the two false positives I have sent to date with bogofilter contained forwarded sales pitches along with a message).
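The two error figures quoted in this comment measure different things, and it helps to keep them apart. The arithmetic below is a small sketch; it assumes that "1 in 200" refers to spam messages that slipped through and "2 in 5000" to messages wrongly flagged, which the comment does not state precisely.

```python
def rate(errors, total):
    """Error rate as a percentage."""
    return 100.0 * errors / total

false_negative_rate = rate(1, 200)   # "about 1 in 200" spam messages missed
false_positive_rate = rate(2, 5000)  # "maybe 2 in 5000 messages" wrongly flagged
print(f"false negatives: {false_negative_rate:.2f}%")
print(f"false positives: {false_positive_rate:.2f}%")
```

A single "accuracy" number hides this split, which matters when the cost of a lost real message is much higher than the cost of reviewing a stray spam.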

97%? not impressive. It's POPfile for me (Score:4, Informative)
by YE (23647) on Friday June 20, @03:24PM (#6256710)
I get 98-98.5% accuracy with POPfile [sourceforge.net]. I get about 200 mails a day, of which around 30% spam. I get about 1 false negative a day, and maybe 2 or 3 false positives a month. It's a personal solution and as such is much more attractive to me than something server-based which has to be installed by a [typically VERY uncooperative] BOFH.

I use it experimentally for general mail classification (business/personal/a variety of mailing lists etc., all in all 7 buckets) on my home machine, and it works fine in these conditions too, although the accuracy is a bit lower (around 95%).

Greylisting is dead (Score:1)
by MasTRE (588396) on Friday June 20, @05:27PM (#6257886)
All of you naysayers out there (I'd be one too if I said it but I won't, read on to find out why) are making a terrible, terrible assumption: that every mail system admin out there will jump on the greylisting bandwagon and implement this.

Back in reality, a lot less than 0.01% will actually implement this technique, especially after reading this thread. So, it's a non-issue. Greylisting is dead.

I'm skeptical (Score:2)
by chrysalis (50680) on Friday June 20, @05:15PM (#6257791)
(http://www.pureftpd.org/)
Greylisting mainly relies on this (quote) :

"These applications appear to adopt the "fire-and-forget" methodology. That is, they attempt to send the spam to one or several MX hosts for a domain, but then never attempt a true retry as a real MTA would."

I strongly disagree. A vast majority of spammers actually use real mail servers like Qmail. Or strange spam-specific software with support for retries.

Apart from Spam Assassin, I'm using OpenBSD built-in "spamd" ip-based filter. A quick look at the spamd log files shows that the same spammers retry over and over, usually during 7 days.

What I like in Greylisting is that it actually prioritizes mails. A mail coming from a known source will be processed before a mail coming from an unknown source (that will have to wait for the next try). Not really an antispam feature, but still nice to have.

Anti-Spam Techniques: Honeypot spam detection! (Score:4, Informative)
by mabu (178417) on Friday June 20, @07:06PM (#6258537)
Aside from the obvious of getting the authorities to crack down on the existing illegal activities (relay hijacking, violation of TOS of ISPs, header forging, etc.) which is the only true solution, I think there are much better approaches than this "greylisting" method.

The problem with the greylist method is it still slows down mail service, and potentially more than the relay blacklist features. The objective here is that end-user/networks should not be penalized in the fight against spam. We already waste too many resources, and according to my latest mail server stats, more than 65% of our inbound mail is UCE. I'm fed up with more than half my e-mail bandwidth being crap my users didn't request so more resource allocation on a local level in the fight against spam is counterproductive!

Here's a very clever, much more practical method I found recently.

A company in Canada has set up what it calls SORBS [sorbs.net]: Spam and Open Relay Blocking System.

What's different from their blacklist is that they maintain "honeypots" strategically located around the Internet. These are servers they specifically set up as inbound mail relays, but never for legitimate purposes. If the servers get [select] mail activity, it's assumed to not be legitimate and it flags the source as a potential spammer... it makes a lot of sense. You create a domain name, but don't promote it in any legitimate manner, and/or you seed spam lists with these e-mail addresses and then let the spammers send to your key systems around the internet and *bam*, they're identified in real time, and then added to a blacklist.

I really like this idea. Like any other system, it has the potential for abuse but the beauty is the identity of the honeypot systems is kept secret, so it's very difficult for anyone other than spammers to exploit the network.
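The honeypot logic described in this comment is simple enough to sketch. The following is a toy model only and says nothing about how SORBS is actually built; the class name, the trap address and the IP addresses are placeholders.

```python
class SpamTrap:
    """Toy spamtrap: any delivery to a trap address blacklists the connecting IP."""

    def __init__(self, trap_addresses):
        self.trap_addresses = {address.lower() for address in trap_addresses}
        self.blacklist = set()

    def observe(self, client_ip, rcpt_to):
        """Record one delivery attempt; return True if the source is now blacklisted."""
        if rcpt_to.lower() in self.trap_addresses:
            self.blacklist.add(client_ip)
        return client_ip in self.blacklist

trap = SpamTrap(["bait@trap.example"])
print(trap.observe("203.0.113.7", "bait@trap.example"))   # mail to the trap address
print(trap.observe("198.51.100.4", "alice@example.org"))  # mail to an ordinary recipient
```

Since the trap address is never used for legitimate mail, a single hit is treated as enough evidence, which is also why keeping the trap addresses secret matters.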

Slashdot Confronting Address Space Hijackers

Spammers, scorched earth and stolen subnets (Score:5, Interesting)
by Xeger (20906) <slashdot@@@tracker...xeger...net> on Wednesday June 11, @04:13PM (#6174798)
(http://www.eatgod.com/)
This article raises an interesting point. When a spammer successfully hijacks address space and uses it to send spam, his IPs are naturally going to appear on various blacklists before too long.

The problem isn't limited to blacklists, either. Bayesian spam filters [paulgraham.com] will quickly learn to recognize Received-From headers bearing the stolen IPs. Collaborative hashing filters [sourceforge.net] will also be affected, to a degree.

So...the spammer steals a subnet, uses it to spam for awhile, and then is either shut down or abandons his activities. He leaves behind a zone of "scorched earth" -- addresses that effectively cannot host a mail transfer agent. It is now the job of the next legitimate recipient to clean up the spammer's mess. He might not even notice anything's wrong until half his emails have gone missing and the other half are bounced with mysterious messages. Having identified the problem, it is now up to him to track down various blacklists and get his addresses removed. The damage done to the Bayesian and collaborative filters simply cannot be undone. Mail will be lost.

To me, this is the real tragedy. Once an address block has been used for spamming, it's effectively ruined until someone inherits it and puts a great deal of time and effort into restoring its good reputation.

i've seen this firsthand (Score:3, Interesting)
by Tancred (3904) on Wednesday June 11, @07:02PM (#6176336)
I'm part of the IP Admin group of a large international ISP and have seen this firsthand. New customers routinely ask us to route space, and sometimes it's difficult to tell if it's theirs or not what with all the mergers, acquisitions and renaming of companies. There's definitely more scrutiny of these requests than there was a year ago.

A few months ago spammers started to hijack IP space that was registered to companies that are now out of business, which means that most likely nobody is going to notice what they've done.

After a while it's almost like getting squatters' rights - I've been using it and nobody else has a real claim to it, so it's mine.

SecurityFocus HOME News Cracking Down on Cyberspace Land Grabs

Network operators were galvanized by a particularly brazen case in April, when a trail of spam led to the discovery that no less than six /16s -- nearly 400,000 addresses -- had been misappropriated from Trafalgar House, a British construction and shipping conglomerate that's now part of Aker Kvaerner, headquartered in Norway. From the U.K., Cox discovered that the perpetrators conned the American Registry for Internet Numbers (ARIN) into changing the contact information for the space. One of the /16s was traced to a Dutch spammer, and the other five to a mysterious company called "Fedfinancial Corp."
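The "nearly 400,000 addresses" figure is easy to check: a /16 leaves 16 bits for hosts, so each block holds 2^16 addresses. A quick sketch with Python's standard `ipaddress` module (the network used is just a placeholder, not one of the hijacked blocks):

```python
import ipaddress

# 10.0.0.0/16 is a placeholder; any /16 holds the same number of addresses.
per_block = ipaddress.ip_network("10.0.0.0/16").num_addresses
print(per_block, 6 * per_block)
```

Six such blocks come to 393,216 addresses, which matches the article's rounded number.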

Fedfinancial managed to convince ARIN that it had been contracted to provide network management services for Trafalgar. ARIN won't say exactly how it was swindled, but registration records show the grifters had an authentic-looking e-mail address at a newly-minted "traf-infosystems.net" domain, and a genuine street address with matching voice and fax telephone numbers. But the phone numbers ring to Nevada and Offshore Business Formation, a company that sets up corporations for a fee, and takes orders over the Web. Public records show that they incorporated Fedfinancial as a Nevada corporation last January, on behalf of an unnamed client. The street address is also theirs.

ARIN president Ray Plzak says the registry doesn't comment on specific cases, but acknowledged that address space hijacking is a problem. "We have measures in place to detect these kinds of things, and we have a set of procedures that we follow to verify information, and we're continuously looking into ways of improving that" says Plzak. "No procedure is ever 100% perfect, and we recognize that."

Once the ARIN record for a block of space has been tweaked, the new "owner" can show it to a network access provider as proof that he has the right to use the addresses. Kacperski found three providers for his purloined L.A. County block; anyone who questioned his sudden good fortune was treated to a tall tale about an old friend who bequeathed Kacperski the mammoth space when his company went bankrupt.

Anti-spammers argue that access providers should be more skeptical when someone comes in with a ridiculously large allocation. "If it's a customer connecting with T1 and walking in with a /16, or two or three of them, this is something that should set off some alarm bells," says Schlichting. But additional vigilance goes against an access provider's financial interest -- they make money by connecting people, not by turning them away.

And until spammers discovered the technique, IP hijacking was largely considered a dishonest but forgivable path to acquiring old, unused address space belonging to defunct companies. The perpetrators were what the Spamhaus Project describes as "a few crufty geeks" in search of "cheap digs." The scam is victimless in that it normally targets dormant allocations that are otherwise going to waste, in many cases taking blocks of space that belong to defunct companies, or, like the Trafalgar House space, have long faded from corporate memory.

But like the mob moving in on a neighborhood poker game, spammers have turned a once-harmless misdemeanor into an organized and well-funded scheme. Internet defenders shudder at the thought of large portions of the net's real-estate under the control of anonymous rogue entities. "There's no accountability. You don't know who really owns this particular address space. You have no way of finding out," says Schlichting. Some even worry that malefactors will go a step further, and begin hijacking address space that's already in active use. "This whole episode has identified huge weaknesses in the Internet's own infrastructure," says Cox. "What we've seen happen is trivial compared to what we've seen possible."

InformationWeek Messaging Anti-Spam Program Raises Backfire Fears June 5, 2003

In light of EarthLink's announcement and the prospect of millions more users sending challenges, many list administrators already have vowed to ignore them, effectively barring recipients who employ the technique.

"They can get pretty overwhelming is a nice polite way of putting it," said David Farber, a former Federal Communications Commission chief technologist who runs a 25,000-member list on technology.

Though Farber is sympathetic to the war on spam--up to half his inbox is junk--he considers challenge-based techniques too simplistic.

EarthLink's spam filter blocks up to 80 percent of spam. But spam has increased sixfold over the past 18 months.

The company decided to offer its customers the challenge-response option because cranking up spam filtering would only cause more legitimate mailings to get tossed by mistake, said Jim Anderson, vice president of product development.

"It's as close to a silver bullet as you're going to get," Anderson said. "We're simply providing a tool for customers to retake control of the inbox from spammers."

Others deem challenge-response a knee-jerk reaction.

"I'm worried people are going to implement systems like that too quickly because they are so desperate," said Eric Thomas, chief executive of L-Soft International Inc., a Swedish company that makes the popular Listserv mailing list software. "The cure might be worse than the ailment."

America Online now blocks up to 80 percent of incoming E-mail traffic, or more than 2 billion messages a day.

But company spokesman Nicholas Graham says AOL won't adopt challenge-response because having to send out 2 billion challenges a day would tax the system. And why create delays for subscribers?

"They don't want to hear 'You got mail and you just have to wait a few minutes longer,'" Graham said. "They expect to get E-mail quickly and responses quickly."

Anderson said EarthLink has developed the system over several months to minimize the burden on users and list administrators.

Standards call for messages from mailing lists to come with a priority code marked "list" or "bulk." EarthLink's software wouldn't challenge such messages. But because spammers can easily incorporate such coding, such messages would be sorted to a "suspect mail" folder.
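The sorting described here can be sketched as a three-way decision. This is a guess at the logic, not EarthLink's code: it assumes the "priority code" is the conventional `Precedence:` header, and the function name, folder names and addresses are invented for the example.

```python
from email.message import EmailMessage
from email.utils import parseaddr

def triage(msg, approved_senders):
    """Decide what a challenge-response filter does with one message."""
    sender = parseaddr(str(msg.get("From", "")))[1].lower()
    if sender in approved_senders:
        return "inbox"
    precedence = str(msg.get("Precedence", "")).strip().lower()
    if precedence in ("list", "bulk"):
        return "suspect_folder"  # not challenged, but not trusted either
    return "send_challenge"

msg = EmailMessage()
msg["From"] = "News <news@lists.example.org>"
msg["Precedence"] = "bulk"
print(triage(msg, approved_senders={"friend@example.com"}))
```

The middle branch is the weak spot the article points out: a spammer can add the same header, so such mail can only be set aside, not trusted.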

The pre-approved sender scheme also has difficulties because it doesn't work well with Yahoo Groups and other services where multiple list members post.

Online receipts from Amazon.com and other E-commerce sites also create problems; because they are automated, they won't respond to challenges.

Robert Craddock, chief executive of challenge-response developer DirectPop.net, said that although the system requires legitimate senders to do more work, "I don't think that's a lot to ask in this day and age when everybody's E-mail box is getting inundated."

Some spam experts question whether such techniques will even work. They believe spammers will figure out how to automate responses to challenges, and will also learn to make messages appear to come from pre-approved senders or to be "challenges" themselves, said John Levine, a board member of the Coalition Against Unsolicited Commercial E-Mail.

"It's very easy to come up with things that look like a solution," Levine said. "Lots of people say this will solve everything, spam won't be a problem anymore. Of course, they said the same things about a variety of previous techniques."

Unsuspecting Computer Users Relay Spam

As spam has proliferated — and with it the attempts by big Internet providers to block messages sent from the addresses of known spammers — many mass e-mailers have become more clever in avoiding the blockades by aggressively bouncing messages off the computers of unaware third parties.

In the last two years, more than 200,000 computers worldwide have been hijacked without the owners' knowledge and are currently being used to forward spam, according to AOL and other Internet service providers. And each day thousands of additional PC's are compromised at companies, institutions and — most commonly of all — homes with high-speed Internet connections shared by two or more computers.

"The spammers have mutated their techniques," said Ronald F. Guilmette, a computer consultant in Roseville, Calif., who has developed a list of computers that are forwarding spam. "Today, if you are trying to do a really mass spamming, it is de rigueur to do it in an underhanded manner."

Just last Thursday, 17 law enforcement agencies and the Federal Trade Commission issued a public warning about some of the ways spammers now commandeer computers to evade detection. The officials translated the warning into 11 languages because many of the exploited computers are known to be in China, South Korea, Japan and other countries with heavy Internet use.

Mostly, the spammers are exploiting security holes in existing software, but increasingly they are covertly installing e-mail forwarding software, much like a computer virus. For some, hacking is no longer about pranks, but making a profit.

"This is not about a hacker trying to show off, or give you a hard time," said William Hancock, chief security officer for Cable and Wireless, the British telecommunications company. "This is about money. As long as there are people who want spam to go out, this is not going to go away."

Spam fighters say that some software is too easy to exploit and should be fixed. Moreover, computer users can take technical precautions to safeguard their machines. But not everyone will bother to take those steps, even if he or she discovers having been dragooned into the spammers' global army.

To begin with, most users do not see much effect when their computer has been co-opted. Surfing the Web from the victimized computer may be slower than usual but that is not always easy to detect. In most cases, the owners' e-mail addresses are not added to the spammed messages, so there is no need to worry that friends and associates will think the PC owners have suddenly started peddling herbal Viagra.

Indeed, the only way most users even become aware of such hijackings is when they receive telephone calls or e-mail from their Internet service providers saying a piece of spam was traced back to their machines.

"People are shocked," said Bobby Arnold, a network abuse engineer at Earthlink, the big Internet provider. "Someone will say, `I thought my computer was running a little slow, but I had no idea it was being used to send spam.' "

Some of the victims of the hidden spammers are revolted to learn, Mr. Arnold said, that they are aiding the hucksters and pornographers responsible for what many Internet users consider the medium's great blight. The truly offended rush to safeguard their machines.

But others, who see no direct impact to themselves, simply shrug off the problem, Internet providers say. Intent on reducing their network clutter, the providers then often try to cajole them into cooperating — and, if that fails, will sometimes cut off a user's service.

Sometimes people do find that someone has been sending spam and using their e-mail address as the sender, but this does not mean that their computers were used. Nothing on the Internet verifies that an e-mail message was actually sent by the person listed in the "From" address, which is one reason fighting spam is so hard.

And spammers like to send e-mail that appears to be from their enemies or names chosen at random. The legitimate owners of those addresses are often left to clean out hundreds or thousands of complaints from their e-mailboxes.

When a computer receives an e-mail message, it does record a code number, called an Internet protocol address, that can be traced to the computer that is connecting to it. But often e-mail is passed from one machine to another and the identity of the original sender cannot be verified.
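The recorded addresses normally end up in the message's `Received` headers, one per hop. As a rough sketch (the function name `relay_ips` and the bracketed-IPv4 pattern are assumptions of this example, and real headers vary between mail servers), they can be pulled out like this:

```python
import re
from email import message_from_string

# Many mail servers write the connecting address as [a.b.c.d]; this pattern
# is an assumption and will miss other formats, including IPv6.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def relay_ips(raw_message):
    """Return bracketed IPv4 addresses from Received headers, newest hop first."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(IPV4_IN_BRACKETS.findall(header))
    return ips
```

Only the header added by the recipient's own server can be relied on. Everything below it was written by machines outside the recipient's control and may be forged, which is why the original sender often cannot be verified.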

Indeed, the rapid rise in the number of spammers trying to hijack innocent computers is a direct result of their desire to hide their own Internet protocol addresses from spam blockers. Most commonly, they are taking advantage of a backdoor in much of the software that office users or people with high-speed connections at home often install to share an Internet link among several computers — or so-called proxy servers. Some other types of e-mail and Web surfing software, typically run by larger companies, can also be taken advantage of if security features are not properly set up.

Because it essentially enables one computer to masquerade as another, a proxy server is an ideal tool for anyone seeking to use the Internet anonymously. So proxy servers are used by people in some countries to visit Web sites blocked by government censors. They are also used by hackers trying to attack other machines. And they are perfect for spammers trying to avoid filters.

None of these uses would be possible if the owners of the proxy servers made sure to configure them for access only by authorized users. But whether from laziness or ignorance, many users of proxy servers leave them open to anyone on the Internet.
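What "access only by authorized users" amounts to, in its simplest form, is an address allow-list checked before the proxy forwards anything. The following is a minimal sketch, not the configuration of any particular proxy product; the names `ALLOWED_NETWORKS` and `is_authorized` and the chosen address ranges are assumptions for the example.

```python
import ipaddress

# Hypothetical allow-list: private ranges typical of a home or office LAN.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("10.0.0.0/8"),
]

def is_authorized(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside one of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)
```

A proxy that refuses connections for which `is_authorized` is false serves the local network but is useless to an outside spammer. An open proxy is one that effectively answers yes to everyone.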

AnalogX Proxy, a free proxy-server program that has been downloaded by more than a million people, is automatically in the open state when it is first installed. Mark Thompson, the author of AnalogX, said he had rebuffed the requests of many antispam activists to distribute the software with the security features already activated because doing so would make it harder to set up.

"The biggest plug for the proxy is it is really easy to get it running," he explained. Mr. Thompson said he did try to achieve a compromise by revising the program to give people a warning about security problems every time it starts.

Even so, Wirehub, a Dutch Internet service provider, says that 45,000 of the 150,000 open proxy servers it has identified as sending spam appear to be using AnalogX.

To find all these vulnerable machines, spammers and other hackers deploy computers that do nothing more than try to connect to millions of computers across the Internet, looking for open proxy servers to exploit.

At the Flint Hills School, "it was pretty amazing how fast our vulnerability was picked up by the spammers," Robert Hampton, the school's director of technology, said recently. Once the problem was identified, the school was able to fix it immediately.

Spammers and hackers trade or sell lists of open proxy servers on dozens of Web sites. And other sites sell software a would-be spammer can use to find new servers.

In the last six months, an increasingly common trick has been for spammers to attach rogue e-mail-forwarding software to other e-mail messages or hide it in files that are meant to emulate songs on music sharing sites like KaZaA.

As with all such hacker contraptions, and much spam, it is difficult to figure out who is behind these programs. But there is some evidence that one of the major spam-sending programs, known as Jeem, originated in Russia, which has been a fertile ground for both spammers and hackers.

Last October, Michael Tokarev, a Russian computer programmer active in the worldwide antispam effort, noticed a lot of spam in Russian that offered bulk-mailing services. The messages were identical, but they came from many different computers. He investigated and found they were forwarded by a program, calling itself Jeem, that had not been seen before.
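The pattern he noticed, identical text arriving from many unrelated machines, is easy to look for mechanically. The sketch below is not Mr. Tokarev's method, which the article does not describe; the function name `find_bulk_bodies`, the `min_sources` threshold, and the whitespace and case normalisation are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

def find_bulk_bodies(messages, min_sources=3):
    """Group message bodies by digest and report those seen from many sources.

    `messages` is an iterable of (source_ip, body) pairs. Returns a dict
    mapping each body digest to the set of source addresses that sent it.
    """
    sources_by_digest = defaultdict(set)
    for source_ip, body in messages:
        # Normalise whitespace and case so trivial differences do not
        # hide otherwise identical messages.
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        sources_by_digest[digest].add(source_ip)
    return {d: s for d, s in sources_by_digest.items() if len(s) >= min_sources}
```

An exact digest only catches messages that are truly identical after normalisation; spam that inserts random text into each copy would need a fuzzier comparison.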

Mr. Tokarev said that in December, a Russian forum for spammers called Carderplanet.com contained a posting offering to sell the Internet addresses of open proxy servers, for $1 each, that appeared to be machines infected with Jeem. "Since the last week of December, several big U.S. spammers started to use those Jeems, too," Mr. Tokarev wrote in an instant message interview last week.

Machines infected with Jeem, which is especially hard to find because it keeps switching its identity on the computers it borrows, seem to be used these days mostly by spammers selling pornography, David Ritz, a volunteer spam fighter, said. Using a software monitoring tool he helps run, Mr. Ritz last week examined the messages sent to Internet news groups from just one home computer infected with Jeem. On one day last week, this computer sent 773 pornographic news postings with subjects like "Lolita paradise" and "N.U.D.E —— L,O,L,I,T,A,S."

"Open proxies are the single greatest threat to the integrity of the network that we see now," he said.

AOL, which has made fighting spam a central part of its marketing thrust, is taking what some see as radical action against open proxy servers. It will no longer accept any incoming e-mail sent directly from the computers of individual home users with high-speed service. This will not affect most home users because they typically do not run e-mail servers on their own computers but connect their e-mail programs to servers run by their Internet providers. But a handful of advanced users and small businesses do run their own e-mail servers co