2009-02-28 17:31

Why is there so much spam?

Email is an open standard, or rather several standards, as well as being a means of communication. Something people might ask from time to time is “Why is there so much unsolicited bulk email?”, although for some the answer is obvious. Others might never ask the question, even though they use email, because they believe the spam problem has been beaten. It hasn’t, and it is costing email providers and society a lot of money.

I have been thinking about this issue on and off for a while, and more so recently. While I have neither come up with a means of ending all spam nor decided that the current email system should be abandoned, I do have a collection of thoughts which help me frame the issue and might help other people in some way.

The numbers

Network hardware company Cisco said that in 2008 there were 200 billion spam emails sent every day. With about 1.6 billion people online, that is an average of 125 spam emails per person per day. So even if anti-spam technology is on average 99% effective, everyone should expect to get at least 1 spam email per day on average.

Another interesting number is that only 1 in 12.5 million spam emails results in a “purchase” by the recipient, a “conversion” rate of less than 0.00001%. Of those 200 billion spams per day, that would mean 16,000 purchases per day. That number might be a bit unrealistic, but the same study that produced the conversion rate says the Storm botnet could have been generating $3.5 million income per year.

Those are some interesting numbers, and perhaps a little bit of a Fermi estimate on my part, but they don’t answer the original question. I’ve already mentioned the existence of botnets, and I think that is a good general response when asked what the source of spam is, but perhaps another good response would be “Windows”.
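The arithmetic behind the headline figures is easy to check. This is only a minimal sketch using the numbers quoted above; the variable names are mine, and the 99% filter rate is the assumed average from the text, not a measured value.

```python
# Figures quoted in the post (Cisco's 2008 estimate and the conversion-rate study).
SPAM_PER_DAY = 200_000_000_000   # spam emails sent per day
PEOPLE_ONLINE = 1_600_000_000    # people online
FILTER_EFFECTIVENESS = 0.99      # assumed average anti-spam hit rate
SPAMS_PER_PURCHASE = 12_500_000  # one purchase per 12.5 million spams

# Average spam sent to each person, before and after filtering.
spam_per_person = SPAM_PER_DAY / PEOPLE_ONLINE
spam_reaching_inbox = spam_per_person * (1 - FILTER_EFFECTIVENESS)

# Conversion rate as a percentage, and the purchases it implies per day.
conversion_rate_percent = 100 / SPAMS_PER_PURCHASE
purchases_per_day = SPAM_PER_DAY / SPAMS_PER_PURCHASE

print(f"spam per person per day:  {spam_per_person:.0f}")
print(f"still reaching the inbox: {spam_reaching_inbox:.2f}")
print(f"conversion rate:          {conversion_rate_percent:.6f}%")
print(f"purchases per day:        {purchases_per_day:.0f}")
```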
Maybe blaming Windows is cheap point scoring, and people argue that if Linux were more popular there would be more Linux botnets. Even if that rather vague statement is true, I think that if Microsoft didn’t have a monopoly share of the operating system market, they would be forced to compete more and make more secure software. It could also be said that Microsoft are limited in what changes they can make to Windows because of the need for backwards compatibility, but the longer you put off doing a secure rewrite, the harder it gets and the more your customers get attacked while waiting.

Looking at the numbers some more, then, let’s imagine that soon there will be 2.2 billion people online, that they will each have on average 2 email addresses (a work address and a home address, for instance), and that about 90% of those email accounts are accessed from a Windows PC. Counting roughly one PC per person, that is about 2 billion Windows PCs with access to about 4 billion email accounts. Now assume that 2% of those Windows machines are infected with a key-logging, spam-sending virus at any given time. This works out as 40 million infected machines, and so at least 40 million compromised email accounts, even if each machine gives up only one.

So, even if the global email system were tightened so much that emails could only be sent through legitimate email accounts, each limited to 50 emails per day, 2 billion spam emails per day would still be possible. A 100-fold drop in the amount of spam might sound good, but it would only be a drop back to 2002 levels.

Also, it wouldn’t be hard for the botnet to inject adverts into the inbox file on the infected PC, or to edit HTML pages on the fly so that the user saw spam in their webmail which they couldn’t delete. Those 40 million Windows users, at least, would be guaranteed to see spam. Perhaps, though, if the money made by spammers decreased in the same proportion, a botnet would be generating $35,000 per year rather than $3.5 million, which might not be profitable enough to bother with.
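The rate-limited scenario can be worked through the same way. This is a rough sketch only: it starts from the rounded 2 billion figure and assumes one compromised account per infected machine. The constant names are mine.

```python
# Inputs taken from the estimate above; all of them are guesses, not measurements.
EXPOSED = 2_000_000_000                 # the post's rounded "about 2 billion" figure
INFECTED_PERCENT = 2                    # share infected at any given time
DAILY_LIMIT = 50                        # emails allowed per account per day
CURRENT_SPAM_PER_DAY = 200_000_000_000  # Cisco's 2008 figure
STORM_INCOME_PER_YEAR = 3_500_000       # dollars, from the conversion-rate study

# Assumption: each infected machine yields exactly one usable account.
infected_accounts = EXPOSED * INFECTED_PERCENT // 100

# Spam still possible when every compromised account is capped at the daily limit.
limited_spam_per_day = infected_accounts * DAILY_LIMIT

# How far spam volume falls, and what income falls to if it scales the same way.
drop_factor = CURRENT_SPAM_PER_DAY // limited_spam_per_day
scaled_income = STORM_INCOME_PER_YEAR // drop_factor

print(f"compromised accounts:  {infected_accounts:,}")
print(f"spam per day at limit: {limited_spam_per_day:,}")
print(f"drop factor:           {drop_factor}")
print(f"scaled botnet income:  ${scaled_income:,} per year")
```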
Alternatively, using a less theoretical figure for the number of infected machines, like 9 million, that would be 450 million emails per day, assuming each infected machine could only send from one email account. If, however, email account details were harvested and stored centrally (and users didn’t change their passwords frequently), a given machine in the botnet could be using several email accounts at a time.

You’re advocating a…

While I am mostly just playing around with the numbers here, I have inadvertently suggested that spam might become unprofitable if certain restrictions on email were in place. I thought these restrictions were conceivable, but I set out to propose a system that was unacceptably strict, to show that even then spam would still be possible and profitable. Perhaps, though, these restrictions are more than just conceivable and are not unacceptably strict, and perhaps I have shown that spam would become unprofitable if they were implemented. If so, it is time to break out the trusty universal crackpot spam solution rebuttal. I won’t include it here, but I will deal with some of its major points.

The most obvious is that “Mailing lists and other legitimate email uses would be affected”, which any rate-limiting plan would suffer from. I would say, though, that it is rare for someone to want to email more than 50 complete strangers in a day. Perhaps my anti-spam system, if you can call it that, could be extended to let people bypass the limit with a special email header. I know that “SMTP headers should not be the subject of legislation”, but this would be a purely technological system: the recipient’s email client could drop all email that carries this “I’m a mailing list” header unless the recipient has previously subscribed to that mailing list (which the client could detect from the sent emails, or the inbox).
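A client-side check along those lines might look something like the sketch below. The header name `X-Bulk-Mail` is made up for illustration (no such standard header is being claimed), and the set of subscribed senders is assumed to have been built already, for instance from the sent folder.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical header name for the "I'm a mailing list" marker.
BULK_HEADER = "X-Bulk-Mail"


def should_accept(raw_message: str, subscribed_senders: set[str]) -> bool:
    """Decide whether the client keeps a message.

    Mail without the bulk header is treated exactly as it is today.
    Mail with the header is kept only if the sender is a list the
    recipient has previously subscribed to.
    """
    msg = message_from_string(raw_message)
    if msg[BULK_HEADER] is None:
        return True
    # parseaddr splits "Name <addr>" into (name, addr); we only need the address.
    _, sender = parseaddr(msg.get("From", ""))
    return sender.lower() in subscribed_senders


bulk = "From: News <list@example.org>\nX-Bulk-Mail: yes\nSubject: hi\n\nbody\n"
personal = "From: friend@example.org\nSubject: hi\n\nbody\n"

print(should_accept(bulk, {"list@example.org"}))
print(should_accept(bulk, set()))
print(should_accept(personal, set()))
```

A client that knows nothing about the header simply never calls a check like this, so it behaves as it does now.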
There is also the issue of “chain letters” and similar “I must send this to all my friends” emails, including legitimate things like invitations to birthday parties, which are a sort of “informal” mailing list. Again, though, the special header could be added by the SMTP server when it sees a large number of recipients in the To, Cc or Bcc fields, and recipients could decide whether to accept the email based on the sender. These emails would count only once towards the quota, regardless of the number of recipients, so the sender would not be penalised for having lots of friends. Spammers could abuse this loophole by sending emails to trusting friends of the infected person, but hopefully friends could contact each other by means other than email and tell the one who was infected to unplug their network cable.

When people apply the spam solution rebuttal, I feel that one particular entry is often misunderstood: “Requires immediate total cooperation from everybody at once”. What tends to happen is that either the person putting forward the spam solution doesn’t realise that their system provides no benefit until it has 100% adoption, or, more often, the person criticising the solution doesn’t realise that 10% adoption would lead to 10% less spam. An ISP could limit its residential customers to sending perhaps 100 emails per day (with an opt-out which could be arranged over the phone) and, based on the reaction to that, then reduce the limit to 50. The limit could also be averaged over a week, in case on one busy day you had to coordinate lots of different people doing different things. I know that “Any scheme based on opt-out is unacceptable”, but I think that refers to people having to opt out of every spam email they receive, whereas I am proposing that the small number of users who need to send more than 50 emails to strangers every day opt out once, with their ISP.
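To make the quota rules concrete, here is a small sketch of a per-account limiter that averages the limit over a week and counts each message once however many recipients it has. The class and method names are hypothetical, and a real mail server would need persistence and proper timestamps rather than bare day numbers.

```python
from collections import deque


class WeeklyAveragedQuota:
    """Per-account send quota, averaged over a sliding window of days."""

    def __init__(self, daily_limit=50, window_days=7):
        self.window_days = window_days
        self.budget = daily_limit * window_days  # e.g. 350 messages per week
        self.sent_days = deque()                 # day number of each accepted message

    def try_send(self, day, recipients):
        """Return True if a message may be sent on `day` (non-decreasing ints).

        `recipients` is deliberately not counted: a message costs one unit
        of quota regardless of how many people it is addressed to.
        """
        # Forget messages that have fallen out of the window.
        while self.sent_days and self.sent_days[0] <= day - self.window_days:
            self.sent_days.popleft()
        if len(self.sent_days) >= self.budget:
            return False
        self.sent_days.append(day)
        return True


quota = WeeklyAveragedQuota()
busy_day = [quota.try_send(0, ["a@example.org", "b@example.org"]) for _ in range(120)]
print(all(busy_day))  # a busy day above 50 is fine while the weekly budget lasts
```

With the defaults, a single day of 120 messages is allowed, but the account cannot exceed 350 messages in any seven-day window.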
Phasing it in

As mentioned, the system must provide benefits at each stage of its implementation, and not rely on universal adoption. It should be easy to see how spam is reduced every time rate limits are introduced, and that these limits do not have to be universal to have a positive effect. I have also almost explained how the phasing in works for the special mail header. Basically, if an ISP or webmail provider adds the header, that is extra information for the client, and if they don’t, the client is no worse off. Likewise, if the client understands the header, that reduces the amount of spam someone receives, and if it doesn’t, they are no worse off. This does mean that both the client and the server have to support the protocol for it to work, but if that were a reason to abandon the idea then HTTPS would never have been implemented.

The more complicated case is the issue I haven’t really talked about yet, which is how we restrict computers to sending email only through accounts where rate limits can be applied. With webmail this is a simple software change at the provider, and presumably webmail providers already have some sort of limit on the number of emails that can be sent. Webmail will probably always have the problem of bots signing up for accounts, so there will be an arms race with CAPTCHAs, but I think that you can generate AI-hard problems without needing an AI, so I am confident that this is at least an easier problem than filtering spam.

To secure accounts that use SMTP directly, I think we need ISPs to start blocking outgoing port 25 except to their own SMTP servers, where they can apply the rate limiting. Again, I would accept a phone-based opt-out for people who really do want to use a third-party SMTP server. If an ISP doesn’t do this blocking, we are in no worse a position than we are now, and each one which does helps to reduce spam a little.
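The port 25 rule itself is simple enough to state as a function. This is only an illustration of the policy, not how an ISP would actually enforce it (that would be done in a firewall or router); the server address is a placeholder from the documentation range and the function name is mine.

```python
SMTP_PORT = 25

# Placeholder address for the ISP's own mail relay (hypothetical).
ISP_SMTP_SERVERS = {"192.0.2.25"}


def allow_outgoing(dest_ip, dest_port, customer_opted_out=False):
    """Return True if a residential customer's outgoing connection is allowed."""
    if dest_port != SMTP_PORT:
        return True   # only direct SMTP is restricted
    if customer_opted_out:
        return True   # customer asked (e.g. by phone) to use third-party SMTP
    # Otherwise SMTP may only go to the ISP's servers, where rate limits apply.
    return dest_ip in ISP_SMTP_SERVERS


print(allow_outgoing("198.51.100.7", 25))  # direct SMTP to a third party
print(allow_outgoing("192.0.2.25", 25))    # SMTP via the ISP's relay
```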
The same incremental argument applies to SPF and DKIM, which can be layered on top to stop spammers impersonating various domains, and which should give clients even more information to base decisions on.

So, how big a problem is it that an infected machine would be able to send 50 emails a day to everyone in its address book?