Over here in Outlook.com (and Office 365), we hate spam (and phishing, and malware). We’re doing everything we can, every single day, to keep it out of your Inbox. But we know that many of you hate spam as much as we do, and that’s where the Spam Fighters program comes in.
Most of you reading this will have no idea what this “Spam Fighters” program is. So, for those of you who don’t know what it is, and the few of you who do, here’s a rundown.
You may already be familiar with Smartscreen, the spam filter that powers Outlook.com and is also used in other Microsoft products like Edge (the new web browser in Windows 10). Smartscreen works by machine learning, and it was using machine learning before “machine learning” became an industry buzzword.
In order for Smartscreen to be both predictive and effective, it has to learn on a corpus of good and bad email. While we can always find sources of spam (through honeypots and user feedback reports), it can be difficult to acquire a corpus of good email. That’s where the Spam Fighters program comes in.
The Spam Fighters program asks a random sample of Outlook.com users whether they’d like to volunteer to help fight spam. These invitations are sent periodically, and recipients are chosen at random to get a good cross-section of users. If you agree, then at most once a day we’ll send you a copy of an email that was meant for you and ask you to grade it: is it spam, or non-spam?
This is all down-sampled from your own email stream; you aren’t looking at anyone else’s email. You take a look at the message and vote on what you think it is. A small wrapper at the top of the message contains a banner explaining what the message is, the voting buttons, and the Not junk and Junk buttons.
Below is an example:
These are all copies of messages that were already delivered to you; the originals are not held up waiting for your vote.
When you do vote:
- First, your choice of spam or non-spam is recorded for that particular message.
- Your choice is then compared to the verdict the spam filter reached when the message was scanned:
a) Did you say it was spam and the filter agreed? Then everything is good.
b) Did you say it was non-spam and the filter agreed? Then everything is good.
c) Did you say it was spam but the filter said it was non-spam? Then we have a false negative (missed spam).
d) Did you say it was non-spam but the filter said it was spam? Then we have a false positive (good email classified as spam).
- Your vote is then compared against the votes of all other users receiving similar email. Does everyone overwhelmingly agree with you? Disagree with you? Or are the votes split?
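The comparison steps above can be sketched in a few lines. This is a minimal illustration, not Smartscreen code: the names `Outcome`, `compare_vote`, and `vote_agreement`, and the 80% cut-off for "overwhelming" agreement, are hypothetical choices made for the example.

```python
from enum import Enum


class Outcome(Enum):
    AGREE = "agree"                    # cases a) and b)
    FALSE_NEGATIVE = "false negative"  # case c): missed spam
    FALSE_POSITIVE = "false positive"  # case d): good mail marked as spam


def compare_vote(user_says_spam: bool, filter_says_spam: bool) -> Outcome:
    """Compare one Spam Fighter vote with the filter's verdict at scan time."""
    if user_says_spam == filter_says_spam:
        return Outcome.AGREE
    return Outcome.FALSE_NEGATIVE if user_says_spam else Outcome.FALSE_POSITIVE


def vote_agreement(votes: list, threshold: float = 0.8) -> str:
    """Summarize votes (True = spam) from users who received similar mail.

    `threshold` is a hypothetical cut-off for "overwhelming" agreement.
    """
    if not votes:
        return "no votes"
    spam_share = sum(votes) / len(votes)
    non_spam_share = (len(votes) - sum(votes)) / len(votes)
    if spam_share >= threshold:
        return "overwhelmingly spam"
    if non_spam_share >= threshold:
        return "overwhelmingly non-spam"
    return "split"
```

For example, `compare_vote(True, False)` flags a missed spam, and five votes of which four say spam come out as "overwhelmingly spam".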
The votes from all users across the entire Spam Fighters program are combined, and the graded messages are assembled into a corpus. Smartscreen then learns across numerous features within each message: sending IP, sending domains, authentication status, headers, message body, attachments, encodings, and so forth. This feeds into our IP reputation and into the Smartscreen spam filtering algorithm, which classifies spam, malware, and phishing as well as legitimate email. It’s updated multiple times per day.
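Smartscreen’s actual feature set is not public, so the following is only a sketch of the *kinds* of per-message signals listed above, pulled out with Python’s standard `email` package. The function name `extract_features`, the returned keys, and the header-parsing heuristics are all hypothetical.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Matches a bracketed IPv4 literal such as "[203.0.113.7]" in a Received header.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def extract_features(raw: str) -> dict:
    """Pull a few per-message signals of the kinds the post lists (hypothetical)."""
    msg = message_from_string(raw)

    # Received headers are prepended hop by hop, so the top-most one is written
    # by the receiving server and usually names the IP that connected to it.
    received = msg.get_all("Received", [])
    ip_match = BRACKETED_IPV4.search(str(received[0])) if received else None

    # Sending domain taken from the From address.
    _, from_addr = parseaddr(str(msg.get("From", "")))
    from_domain = from_addr.rpartition("@")[2].lower() if "@" in from_addr else None

    # SPF/DKIM/DMARC verdicts as recorded by the receiving server.
    auth_header = str(msg.get("Authentication-Results", ""))
    auth = {}
    for mech in ("spf", "dkim", "dmarc"):
        found = re.search(rf"\b{mech}=(\w+)", auth_header)
        auth[mech] = found.group(1).lower() if found else None

    # Walk the MIME tree for attachments, transfer encodings, and body text.
    attachments, encodings, body_words = [], set(), 0
    for part in msg.walk():
        if part.is_multipart():
            continue
        encodings.add(str(part.get("Content-Transfer-Encoding", "7bit")).lower())
        filename = part.get_filename()
        if filename:
            attachments.append(filename)
        elif part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            body_words += len(payload.decode("utf-8", "replace").split())

    return {
        "sending_ip": ip_match.group(1) if ip_match else None,
        "from_domain": from_domain,
        "auth": auth,
        "header_count": len(msg.keys()),
        "attachments": attachments,
        "encodings": sorted(encodings),
        "body_word_count": body_words,
    }
```

A real filter would turn dictionaries like this into model inputs and pair each one with the aggregated Spam Fighter label; this sketch stops at extraction.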
So, by participating in the Spam Fighters program and voting on a regular basis, you really are helping to fight spam, and your vote counts!
Your email is down-sampled, and a Spam Fighter vote happens at most once per day. It also depends on your email stream, so it may not happen every day. The key point to remember is that machine learning depends on random sampling.
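One way to get "at most one message per day, chosen at random, and none if there was no mail" is reservoir sampling with a reservoir of size one. The post does not say how the daily pick is actually made, so treat this as an illustrative technique; `pick_daily_sample` and its `send_probability` knob (for days when no vote is requested) are hypothetical.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def pick_daily_sample(messages: Iterable[T], rng: random.Random,
                      send_probability: float = 1.0) -> Optional[T]:
    """Choose at most one of the day's messages uniformly at random, or None.

    Reservoir sampling with a reservoir of size one: the k-th message replaces
    the current pick with probability 1/k, so each of n messages ends up chosen
    with probability 1/n without buffering the whole day's stream.
    """
    chosen: Optional[T] = None
    seen = 0
    for message in messages:
        seen += 1
        if rng.randrange(seen) == 0:  # true with probability 1/seen
            chosen = message
    # No mail today, or (hypothetically) no vote requested today.
    if seen == 0 or rng.random() >= send_probability:
        return None
    return chosen
```

Because each day’s pick ignores what was picked before, nothing stops the same kind of message from being chosen on consecutive days.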
Because the sampling is random, Spam Fighter voting does not keep a history of the messages previously sent to you for grading. Thus:
- It is entirely possible that you may get two similar, or even identical, messages in a row. Or more.
- We’re not asking you to vote on messages because we don’t know whether they’re spam. Sometimes we do know, sometimes we don’t. Rather, we’re building a cross-section of email that is representative of the entire user base, and training on that corpus of labeled messages.
- Similarly, it’s entirely possible that legitimate messages will end up in the Spam Fighters program, asking you to vote on them. “Don’t you know that’s my Amazon newsletter?” Or “This is a message from Microsoft! How can you not know whether your own email is spam?!” But here again, we’re looking to learn from a representative section of all email: legitimate or not, bulk or not, newsletter or not, transactional or not, from Microsoft or not. It’s important that even newsletters get into the training system.
Remember that machine learning is based upon random sampling. Artificially bumping certain types of messages up or down changes the way the filter operates and makes it less accurate in the future. That’s also why you can’t sign up for the program yourself: users are selected at random, though participation is entirely voluntary. You can opt out at any time by clicking the “stop participating” link in the email (or, if you’re busy but still want to keep fighting spam, you can simply ignore the email from time to time). You really are helping.
- When you vote Spam or Not spam, the vote is not directly tied to the equivalent message in your Inbox. That is, if a spam is in your Inbox and you get a Spam Fighter email that sampled that exact spam, clicking “Spam” in the Spam Fighter email doesn’t report the message as spam back to us, doesn’t add the sender to Blocked Senders, and doesn’t move the original message out of your Inbox into Junk Email if you haven’t yet moved or deleted it. The reverse is also true for clicking “Not spam” on a message that was originally detected as spam.
Instead, your vote is recorded and used as an input for future messages. To rescue a message (for false positives) or remove it from your Inbox (for missed spam), you’ll need to use the regular Outlook.com tools.
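Two of the points in the list above, getting identical messages in a row and why newsletters must not be filtered out of sampling, follow from simple probability. This sketch assumes each pick is an independent draw in proportion to the mail stream; the group names and shares in the usage note are made up for illustration, and `repeat_probability` and `corpus_composition` are hypothetical helpers.

```python
def _normalize(weights: dict) -> dict:
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {k: v / total for k, v in weights.items()}


def repeat_probability(stream_share: dict) -> float:
    """Chance that two consecutive picks come from the same group of similar mail.

    With no history, each pick is an independent draw in proportion to the
    stream, so P(repeat) is the sum of each group's share squared.
    """
    shares = _normalize(stream_share)
    return sum(p * p for p in shares.values())


def corpus_composition(stream_share: dict, inclusion_weight: dict) -> dict:
    """Expected make-up of the training corpus when groups are up- or down-weighted.

    Groups missing from `inclusion_weight` keep weight 1.0 (plain random sampling).
    """
    weighted = {g: s * inclusion_weight.get(g, 1.0) for g, s in stream_share.items()}
    return _normalize(weighted)
```

With a made-up stream of 50% newsletters, 30% personal mail, and 20% spam, about 38% of consecutive picks land in the same group. Excluding newsletters (`{"newsletter": 0.0}`) doubles spam’s share of the corpus from 20% to 40%, so a filter trained on it would see a distorted picture of real mail.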
That’s an overview of the Spam Fighters program in Outlook.com. For those of you involved in it, we thank you for your participation.