WikiMatrix

#1 2006-10-17 11:49:12

andi
Administrator
From: Berlin, Germany
Registered: 2005-11-17
Posts: 188
Website

Anti-Spam Technologies

Hi fellow developers!

Because my wiki was recently hit by spam, I'd like to discuss anti-spam technologies with you. I'm interested in which technologies you use and how well they work. I'll start with what DokuWiki currently does:

Delayed indexing makes sure search engines won't index spammed pages: only pages that haven't been modified for a configurable period are allowed to be indexed. This seems to work well.
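A minimal sketch of delayed indexing, assuming the wiki knows each page's last-modified Unix time and emits a robots meta tag per page. The constant `INDEX_DELAY` and the function `robots_meta` are hypothetical names, not DokuWiki's actual code.

```python
import time

# Hypothetical setting: how long (in seconds) a page must stay unchanged
# before search engines are allowed to index it.
INDEX_DELAY = 60 * 60 * 24 * 5  # five days

def robots_meta(last_modified, now=None, delay=INDEX_DELAY):
    """Return the robots meta tag for a page, given its last-modified Unix time."""
    if now is None:
        now = time.time()
    if now - last_modified >= delay:
        return '<meta name="robots" content="index,follow" />'
    # Recently edited: the change may be unreviewed spam, so keep it out of indexes.
    return '<meta name="robots" content="noindex,nofollow" />'
```

Spam that is reverted within the delay never reaches a search index, which removes the spammer's payoff.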

rel=nofollow can be used, but in my experience spammers don't care about it.
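For reference, rel="nofollow" asks search engines not to pass ranking credit through a link, so it takes away the reward even if spammers keep posting. A rough sketch, assuming rendered HTML with simple, well-formed anchor tags; `add_nofollow` is a hypothetical helper, and a real engine would better add the attribute in its renderer than patch HTML with a regex.

```python
import re

# Opening <a ...> tags that point to an external http(s) URL and have no rel attribute yet.
_A_TAG = re.compile(
    r'<a\s+(?![^>]*\brel=)([^>]*\bhref="https?://[^"]*"[^>]*)>',
    re.IGNORECASE,
)

def add_nofollow(html):
    """Add rel="nofollow" to external links that don't already carry a rel attribute."""
    return _A_TAG.sub(r'<a \1 rel="nofollow">', html)
```

Internal links (relative hrefs) are left untouched.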

The best method so far is checking submitted edits against the blacklist provided by http://chongqed.org; unfortunately, this only works for known spam URLs.

What other methods do you use? What's your experience with CAPTCHAs?

Andi


Careful: I'm the lead developer of WikiMatrix and DokuWiki so my posts may be biased ;-)


#2 2006-10-18 13:04:12

michal_frackowiak
Member
Registered: 2006-08-11
Posts: 11

Re: Anti-Spam Technologies

andi wrote:

Hi fellow developers!

Because of a recent spamming of my Wiki, I'd like to discuss Anti-Spam technologies with you. I'm interested in what technologies you use and how well they work.

What other methods do you use? What's your experience with CAPTCHAs?

Andi

Hi,

there are two kinds of spam: "automatic spam" and "manual spam".

For automatic spam, CAPTCHAs might help, but some bots can already read them to some extent. If a spambot author thinks writing a bot for your engine is worth it, he will do it. I am also using cookie tokens (which also prevent "session riding"), and they might help against bots.
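The exact cookie token mechanism isn't described here, so the sketch below assumes the common "double-submit cookie" pattern: when the edit form is served, the same random token goes into a cookie and into a hidden form field, and a save is accepted only if both match. `issue_token` and `verify_submission` are hypothetical names.

```python
import hmac
import secrets

def issue_token():
    """Create a random token to send both as a cookie and as a hidden form field."""
    return secrets.token_urlsafe(32)

def verify_submission(cookie_token, form_token):
    """Accept an edit only if the hidden field echoes the cookie value.

    A bot that posts straight to the save URL without loading the edit
    form (and keeping its cookies) won't have a matching pair. A foreign
    site can't read the cookie either, which is what blocks session riding.
    """
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison so the token can't be guessed via timing.
    return hmac.compare_digest(cookie_token, form_token)
```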

For both manual and automatic spam, I am thinking about implementing some kind of Bayesian filtering and have been experimenting with DSPAM for wiki use. Hard rules alone are not enough.
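To illustrate what Bayesian filtering means here, a tiny multinomial naive Bayes classifier with add-one (Laplace) smoothing. This is not DSPAM, which is far more elaborate; the class name, tokenizer and labels are hypothetical, and a real filter would be trained on reverted spam versus accepted edits.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayesSpamFilter:
    """Minimal naive Bayes spam/ham classifier with add-one smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        tokens = tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: share of training edits carrying this label.
            log_p = math.log((self.docs[label] + 1) / (total_docs + 2))
            # +1 reserves probability mass for tokens never seen in training.
            denom = sum(self.counts[label].values()) + len(vocab) + 1
            for tok in tokens:
                log_p += math.log((self.counts[label][tok] + 1) / denom)
            log_scores[label] = log_p
        # Turn log scores back into a probability without underflow.
        m = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - m)
        ham = math.exp(log_scores["ham"] - m)
        return spam / (spam + ham)
```

Unlike a hard rule, the score adapts as new spam is fed back into training.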

It would be very nice to create a service similar to http://akismet.com/ that checks whether the difference between two versions of a document can be considered spam. This could work as a web service and eventually make all of us (developers of all the wiki engines, admins and users) happy ;-)
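The "difference between two versions" could be computed with a plain line diff before sending anything to such a service. A minimal sketch using Python's standard `difflib`; `added_text` is a hypothetical helper, and the call to Akismet or any other checker is deliberately left out.

```python
import difflib

def added_text(old, new):
    """Return only the lines a revision added compared to the previous one.

    Checking just the added lines keeps a long, legitimate page from
    diluting a short spam insertion.
    """
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff prefixes added lines with "+ "; "- ", "  " and "? " lines are skipped.
    return "\n".join(line[2:] for line in diff if line.startswith("+ "))
```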

I would also encourage building mechanisms that make it easier for users to mark spam revisions.

good luck! michal

Update: I have just checked, and Akismet can be used for more than blogs; there are many libraries that support it (http://akismet.com/development/). In my spare time I will try testing it with wikis too.

Last edited by michal_frackowiak (2006-10-18 14:33:19)


Wikidot.com - free wiki publishing network
http://www.wikidot.com


#3 2006-10-19 12:19:01

andi
Administrator
From: Berlin, Germany
Registered: 2005-11-17
Posts: 188
Website

Re: Anti-Spam Technologies

Hi Micha!

Thanks for your reply. The Akismet idea is really interesting - I will have a look at it.

Manual spam has not been a big problem in my wiki yet. However, I was recently hit by an automatic spammer who used a different proxy for each request and even changed his user-agent string. I guess your cookie token mechanism might help in this case. Could you elaborate on how exactly it works?

Andi



#4 2006-10-20 01:18:54

PeterThoeny
Member
From: San Jose, CA
Registered: 2005-12-14
Posts: 230
Website

Re: Anti-Spam Technologies

Wiki spam is a growing problem on public wiki sites. In fact, it is not limited to wikis; any website that users can update, such as blogs and bulletin boards, is a potential spam target. What can you do as an administrator of a public wiki site?

* Rule number one: Enable spam protection.
* Rule number two: Remove spam as quickly as possible when it happens. Reason: spammers identify easy targets by searching for known spam keywords. Spamming pays off on sites where spam survives long enough for search engines to pick up the content.

Public TWiki sites have been spam targets for some time already; the problem is discussed on TWiki.org. I can see in the twiki.org logs that there are many failed spam attacks every day. TWiki has a BlackListPlugin that is quite effective at fighting spam. The plugin gets updated every time a new spam twist is discovered, such as an HTML redirect obfuscated in a JavaScript eval statement. The BlackListPlugin fights spam on several fronts:

* Multiple registrations by the same IP address in rapid succession
* Multiple page saves by the same IP address in rapid succession
* Saving text with known wiki-spam (spam list is maintained and shared by TWiki, MoinMoin and MediaWiki sites)
* Attaching files with known wiki-spam
* Attaching files with JavaScript eval statements
* Manually maintained BLACKLIST of malicious IP addresses
* Automatically updated BANLIST of IP addresses with suspicious activities
* Registration form with magic number in hidden form field to make scripted registrations harder
* rel="nofollow" attribute on external URLs, which defeats the purpose of spamming TWiki sites
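How BlackListPlugin computes its registration "magic number" isn't stated here, so the following is only one plausible approach, under stated assumptions: the hidden field holds the issue time plus an HMAC over it, keyed with a server-side secret. A forged, expired or suspiciously fast submission is rejected. `SECRET`, `make_magic_field` and `check_magic_field` are hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret, never sent to clients

def make_magic_field(now=None):
    """Value for a hidden form field: issue time plus an HMAC over it."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"

def check_magic_field(value, now=None, min_age=3, max_age=3600):
    """Reject forms that were forged, submitted too late, or filled in too fast."""
    try:
        ts, sig = value.split(":", 1)
        issued = int(ts)
    except (AttributeError, ValueError):
        return False  # missing or malformed field
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    age = (time.time() if now is None else now) - issued
    # Humans need a few seconds to fill in a form; scripts usually don't wait.
    return min_age <= age <= max_age
```

A script can still fetch the form first, so this only raises the bar, as the list above says: it makes scripted registrations harder, not impossible.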

Related links on wiki spam:

* http://www.structuredwikis.com/peter_2006-08-06.html - my blog entry on wiki spam
* http://en.wikipedia.org/wiki/Link_spam - link spam info on Wikipedia
* http://en.wikipedia.org/wiki/Wikipedia:Spam - guidelines on how to address wiki spam on Wikipedia
* http://chongqed.org/fightback.html - chongqed.org, fighting wiki spam
* http://c2.com/cgi/wiki?WikiSpam - wiki spam info on Ward's original wiki
* http://www.usemod.com/cgi-bin/mb.pl?WikiSpam - wiki spam info on MeatBall wiki
* http://arch.thinkmo.de/cgi-bin/spam-merge - MoinMoin's merged spam list
* http://openwiki.com/ow.asp?WikiSpam - wiki spam info on OpenWiki
* http://spamhuntress.com/ - blog by Ann Elisabeth covering web spam

-- Peter AT StructuredWikis DOT com - http://www.structuredwikis.com/ - http://twiki.org/


#5 2006-11-09 21:15:33

andi
Administrator
From: Berlin, Germany
Registered: 2005-11-17
Posts: 188
Website

Re: Anti-Spam Technologies

I'm still fighting spam. I wrote an Akismet plugin [1] for DokuWiki; unfortunately, Akismet seems to produce a lot of false positives, probably because it's tuned for blog comments.

I also wrote a plugin [2] that integrates the "Bad Behavior" toolkit [3]. It blocks a lot of bad user agents, but also a small number of legitimate users, because their (dynamic) IPs are listed on certain blacklists. Despite this, I'd say Bad Behavior works well, but it isn't enough to protect a wiki on its own.

Peter, can you give some more info about the MoinMoin/TWiki/MediaWiki spam blacklist? Who maintains and adds to it?

I also have some questions about reverting spam. What tools do you have for reverting spam? Can you mass-revert an automated spam attack? Can you submit spam to blacklists like chongqed with one click? Do you list spam and its reversion in the changelog? Do you treat reverts as minor edits?

Andi

[1] http://wiki.splitbrain.org/plugin:akismet
[2] http://wiki.splitbrain.org/plugin:badbehaviour
[3] http://www.homelandstupidity.us/software/bad-behavior/



#6 2006-11-10 02:48:59

oyejorge
Member
Registered: 2006-08-20
Posts: 5
Website

Re: Anti-Spam Technologies

Hi guys, thanks for the ideas; I've been looking into adding more prevention methods to WikyBlog.

A couple of things I like, which were also easy to implement, are requiring cookies and limiting how often unregistered users can edit (an idea from phpBB, configurable to the needs of each site).
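A minimal sketch of a configurable edit-frequency limit for unregistered users, assuming edits are keyed by client IP and kept in memory. `EditThrottle` is a hypothetical class, not WikyBlog's or phpBB's code; a real site would persist the timestamps.

```python
import time

class EditThrottle:
    """Allow at most one anonymous edit per `min_interval` seconds per IP."""

    def __init__(self, min_interval=60):
        self.min_interval = min_interval  # configurable per site
        self.last_edit = {}  # ip -> time of the last accepted anonymous edit

    def allow(self, ip, registered, now=None):
        if registered:
            return True  # registered users are not throttled in this sketch
        now = time.time() if now is None else now
        last = self.last_edit.get(ip)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_edit[ip] = now
        return True
```

Because it is keyed by IP, this won't stop an attacker who rotates proxies for every request, as Andi described earlier in the thread; pairing it with the cookie requirement helps there.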

Josh

* http://www.wikimatrix.org/show/WikyBlog
* http://www.wikyblog.com


-  Josh Schmidt
   WikyBlog Developer
   http://www.wikyblog.com - http://www.wikimatrix.org/show/WikyBlog


#7 2006-11-11 10:26:17

PeterThoeny
Member
From: San Jose, CA
Registered: 2005-12-14
Posts: 230
Website

Re: Anti-Spam Technologies

On the shared spam list: I think it all started at MoinMoin [1]. Any wiki maintainer can use the merged list [2] for spam filtering; it currently has over 6000 entries. Entries are URLs or URL fragments in regular expression format. Only a handful of people add to the list; I contribute at [3]. Contact me by e-mail if you would like to get involved in contributing spam signatures to the shared list, and I can introduce you to its maintainer.

If you plan to add a spam filtering feature to your wiki based on the merged list, I recommend also maintaining a local spam list and a local whitelist. The local spam list is a defence against immediate attacks. I had to add a local whitelist because I do not consider some entries on the shared list to be spam, such as alphaworks.ibm.com or tinyurl.com. The MediaWiki folks in particular have a different idea of what counts as spam.
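A sketch of the shared list plus local spam list plus local whitelist arrangement described above. It assumes each list is plain text with one regular expression per line and `#` starting a comment line; check the actual format of the merged list before relying on that. `compile_list`, `spam_urls` and the URL pattern are hypothetical.

```python
import re

# Rough pattern for external URLs in wiki text (an assumption; adjust to your markup).
URL_RE = re.compile(r"https?://[^\s\]\[<>\"']+", re.IGNORECASE)

def compile_list(lines):
    """Compile one regex per non-empty, non-comment line; skip malformed entries."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            patterns.append(re.compile(line, re.IGNORECASE))
        except re.error:
            pass  # one broken entry shouldn't disable the whole list
    return patterns

def spam_urls(text, shared, local_spam, local_white):
    """Return URLs in `text` that hit a blacklist entry and no whitelist entry."""
    hits = []
    for url in URL_RE.findall(text):
        if any(p.search(url) for p in local_white):
            continue  # local whitelist overrides both blacklists
        if any(p.search(url) for p in shared + local_spam):
            hits.append(url)
    return hits
```

The whitelist is checked first so that entries like tinyurl.com on the shared list can be overridden locally.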

On reverting spam: TWiki has a "roll back top revision of page" feature, which is restricted to admins. I have not seen large-scale scripted spam on TWiki sites that have the BlackListPlugin installed, so at this time there is no need for a batch-mode rollback feature.

[1] http://moinmoin.wikiwikiweb.de/AntiSpamGlobalSolution
[2] http://arch.thinkmo.de/cgi-bin/spam-merge
[3] http://twiki.org/cgi-bin/view/TWiki04/B … n#SpamList


#8 2006-12-11 13:53:15

Ren
Member
Registered: 2006-12-11
Posts: 1

Re: Anti-Spam Technologies

My idea is to ask for human verification (a CAPTCHA) only if the submitted revision contains external URLs the wiki hasn't seen before. That way, trivial edits such as spelling corrections remain hassle-free.
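A minimal sketch of this rule, assuming the wiki keeps a set of every external URL already present on its pages. `needs_captcha`, `known_urls` and the URL pattern are hypothetical.

```python
import re

# Rough pattern for external URLs (an assumption; adjust to the wiki's markup).
URL_RE = re.compile(r"https?://[^\s\]\[<>\"']+", re.IGNORECASE)

def needs_captcha(old_text, new_text, known_urls):
    """Ask for a CAPTCHA only if the edit introduces external URLs not seen before.

    `known_urls` is a set of external URLs already present anywhere in the
    wiki; spelling fixes and other edits without new links pass freely.
    """
    new_urls = set(URL_RE.findall(new_text)) - set(URL_RE.findall(old_text))
    return bool(new_urls - known_urls)
```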


#9 2007-01-23 00:01:18

david
Member
From: Paris
Registered: 2005-11-29
Posts: 2
Website

Re: Anti-Spam Technologies

Wiclear uses a list of blacklisted words. I began using it two months ago, and spam fell from 20 to 30 a day to 3 or 4 a week. Of course, each time a new spammer shows up, chances are the spam will get through the filter, but it will get through only once :-)

Wiclear also uses a list of banned IPs, but I'm not sure this is really effective.

I'm also thinking of implementing simple heuristics to reject spam, something like "more than 60% of the content is links to external sites" => spam.
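One way to make the 60% rule concrete is to measure it over non-whitespace characters. A minimal sketch, assuming links appear either as bare URLs or in a `[[http://...|label]]` style syntax; the link pattern, `link_ratio` and `looks_like_spam` are hypothetical and should be adapted to Wiclear's markup.

```python
import re

# Bare external URLs or [[http://...|label]]-style wiki links (assumed syntax).
LINK_RE = re.compile(r"\[\[https?://[^\]]*\]\]|https?://\S+", re.IGNORECASE)

def link_ratio(text):
    """Fraction of non-whitespace characters that belong to external links."""
    total = len(re.sub(r"\s", "", text))
    if total == 0:
        return 0.0
    linked = sum(len(re.sub(r"\s", "", m)) for m in LINK_RE.findall(text))
    return linked / total

def looks_like_spam(text, threshold=0.6):
    return link_ratio(text) > threshold
```

Measuring characters rather than counting links means a page with one long URL and no prose still scores high.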

I don't plan to use human confirmation (aka CAPTCHAs). They are often easy to defeat, and they hurt usability a lot (one more thing to read, type and click) as well as accessibility...



Forum powered by PunBB (© 2002–2005 Rickard Andersson)