How to catch Russian spam using MSWregexp

 

Blocking spam that is written in a foreign language can be hard, and I have been asked a number of times how to do it. That is why I have made this tutorial, so it is easier for anybody to configure MSWregexp to catch such mails.


The first task is to find a copy of a mail that contains the foreign language you would like to block. It must be in the standard RFC mail format, as Mailsweeper files normally are.

Here you can see the mail opened in Notepad to show how it looks as raw code. Note that Notepad does not show the text correctly, because it does not decode it with the charset "Windows-1251" as the header tells the mail client to do; it simply shows the standard ANSI text version.
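To illustrate why the text looks wrong in Notepad, here is a minimal Python sketch. Python is not part of MSWregexp, and the sample word is chosen for illustration only; it is not taken from the spam mail. It decodes the same bytes once as Windows-1251 and once as ISO-8859-1.

```python
# Sample word chosen for illustration only; it is not from the spam mail.
russian = "Привет"

# Bytes as they would be stored in a mail body sent as Windows-1251.
raw_bytes = russian.encode("cp1251")

# What a mail client shows when it honours the declared charset.
as_mail_client = raw_bytes.decode("cp1251")

# What a tool shows when it reads the same bytes as ISO-8859-1.
as_iso_8859_1 = raw_bytes.decode("iso-8859-1")

print(as_mail_client)
print(as_iso_8859_1)
```

The Russian letters А to я occupy the bytes 0xC0 to 0xFF in Windows-1251, and those bytes are accented Latin letters in ISO-8859-1, so the second line prints a run of accented characters instead of Cyrillic.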


 


If you rename the .MSG file to an .EML file and double-click it, it will open in Outlook Express, and you can see how the mail looks when the receiver views it in a mail client. Remember to scan the mail for viruses before you do this, because it can very easily contain bad content, and you do not want that to execute on your computer.
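If you would rather not open a suspicious mail in a mail client, the decoded text can also be inspected with a script. The following is only a sketch using Python's standard email package, which is not part of MSWregexp. The message is built in memory with a made-up sender and body, so nothing here comes from the real spam mail; with a real file you would read its bytes from disk instead.

```python
from email import message_from_bytes

# Hypothetical raw mail, built in memory for illustration only.
raw_mail = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="windows-1251"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "Привет".encode("cp1251")
    + b"\r\n"
)

msg = message_from_bytes(raw_mail)

# The charset the header tells the mail client to use.
declared_charset = msg.get_content_charset()

# Undo the transfer encoding, then decode with the declared charset.
body_bytes = msg.get_payload(decode=True)
body_text = body_bytes.decode(declared_charset)

print(declared_charset)
print(body_text.strip())
```

This only reads the mail as data, so no attachment or script in it is executed.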

 

The next task is to copy the .MSG file into the folder where you have the MSWregexp application. When you have done that, open a DOS prompt and go to the mswregexp folder.
Try running MSWregexp on the .MSG file to see if it detects the mail as spam.



As you can see, it says that it has not detected this mail as spam. That is because I use an empty mswregexp.ini for this tutorial, as you can see in the screen dump below.


As you can see in the INI file, it uses ISO-8859-1 to read the whole file. MSWregexp will therefore see the whole mail text the same way as when it is opened in Notepad, and not the way the mail client sees it, because the mail client uses a different encoding for the body parts.

The next task is to see how MSWregexp actually sees the content of the body parts that contain the Russian text. To do that, you can run MSWregexp with the /showpart command to output how it reads the content.

First look at the subject by using /showpart:msg_subject

Note that the text is not shown as Russian text.

Now look at the mail body. To show both the text body and the decoded HTML body as one string, use /showpart:msg_textdecodehtml

As with the subject, the content is not shown as Russian.

Now simply copy and paste the letters you would like to catch into the INI file and build an expression that can catch the mail.

As you can see, I have split each letter into its own string, so it is easier to read and edit when there are new things you would like to add.
The basic idea is that every time it finds one of the letters, it adds 25 to the mail score. Which value to use is a bit of a guess: it must be high enough to catch all spam mails that contain the strings, without giving any false positives, so you may have to fine-tune the score a number of times when you see it detect some mails wrongly.
The expressions are placed under the [subjecttextdecodehtml] section, which tells MSWregexp to look for the expressions in the subject, the text body and the decoded version of the HTML body.
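The scoring idea can be sketched in a few lines of Python. This is not how MSWregexp is implemented and it does not use the INI syntax; it only mimics the arithmetic. The value 25 per match and the limit of 10000 come from this tutorial, while the letter list, the function names and the choice to treat a score equal to the limit as spam are assumptions made for illustration.

```python
import re

SCORE_PER_MATCH = 25    # value used in this tutorial
SPAM_THRESHOLD = 10000  # score a mail must have to count as spam

# Hypothetical letter list: common Russian letters as they appear
# when Windows-1251 bytes are read as ISO-8859-1.
LETTER_PATTERNS = [
    c.encode("cp1251").decode("iso-8859-1") for c in "оеаинт"
]


def score_text(text, patterns=LETTER_PATTERNS, per_match=SCORE_PER_MATCH):
    """Return (total score, matches per pattern) for the given text."""
    total = 0
    hits = {}
    for pattern in patterns:
        count = len(re.findall(re.escape(pattern), text))
        if count:
            hits[pattern] = count
            total += count * per_match
    return total, hits


def is_spam(score, threshold=SPAM_THRESHOLD):
    # Assumption: a score equal to the threshold also counts as spam.
    return score >= threshold


# Illustrative sentence, viewed the way an ISO-8859-1 reader would see it.
sample = "Привет, это тест".encode("cp1251").decode("iso-8859-1")
score, hits = score_text(sample)
print(score, is_spam(score))
```

A short sentence scores far below the limit, while a long Russian body with hundreds of matching letters passes it. That is why the score per letter has to be tuned against real mails.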

Next, test the mail with the new INI file to see what it detects and what the score is. To do that, use /showtest:* to show at the console what it has found, where, and how many times, for each of the expressions in the INI file.
Note: because of the long output I had to split it into two pictures and cut some of it away, but it still shows the important things.


As you can see, the mail now scores 17000, which is more than the 10000 it must have to be detected as spam. The Logtext string also tells in which section each string was found.

 

The tasks shown in this tutorial are basically the same for each spam mail that you want to configure mswregexp to detect. To make it easier to build more complex regular expressions, I have included the GUI app, where you can paste the text and build the expression much faster. When it is ready, you can copy it to the INI file.

 

Forum post about this tutorial
http://www.tooms.dk/forum/topic.asp?TOPIC_ID=43