How much money do spammers make from your website?

Pedram Hayati
Stratsec, BAE Systems, Perth, Australia
pedram.hayati@stratsec.net

Nazanin Firoozeh
Erasmus Mundus Data Mining and Knowledge Management, University of Pierre and Marie Curie, Paris, France
nazanin.firoozeh@etu.upmc.fr

Vidyasagar Potdar
Anti Spam Research Lab, School of Information Systems, Curtin University, Perth, Australia
v.potdar@curtin.edu.au

Kevin Chai
Australian Institute of Innovative Health, University of New South Wales, Sydney, Australia
k.chai@unsw.edu.au

ABSTRACT

Despite years of researchers’ contributions in the domain of spam filtering, the question of how much money spammers can make has largely remained unanswered. The value of spam-marketing on the web can be determined by discovering the cost of distributing spam in Web 2.0 platforms, and the success ratio of turning a spamming campaign into a profitable activity. Currently, the existing literature offers limited knowledge of the nature of spam distribution in web applications, and few public methods for estimating the turnover rate for spammers. Therefore, we adopted a methodological approach to address these issues and measure the value of spam-marketing on the web. Using current spam tactics, we targeted 66,226 websites in both English and non-English languages. We launched a spam campaign and set up a website to replicate spam practices. We posted spam content to 7,772 websites, which resulted in 2,059 unique visits to our website and 3 purchase transactions over a period of one month. The total conversion visit rate for this experiment was 26.49% and the purchase rate was 0.14%.

Categories and Subject Descriptors
K.4.4 [Electronic Commerce]: Cybercash, digital cash, distributed commercial transactions, electronic data interchange, intellectual property, payment schemes, security.

General Terms
Measurement, Documentation, Performance, Design, Economics.

Keywords
Spam Conversion, Spam Marketing, Target Harvesting, Pharmacy, Spammers Revenue, X-Rumer, Spam tactics, Advertising tactics.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CUBE 2012, September 3–5, 2012, Pune, Maharashtra, India. Copyright 2012 ACM 978-1-4503-1185-4/12/09…$10.00.

1. INTRODUCTION
Several efforts have been made to estimate how much money spammers can make from email spam [3]. However, efforts to answer this question for spam-based marketing on the web are still lacking.

Moreover, this question gives rise to further questions, such as how many people are purchasing spam-advertised products, and what is the cost of distributing spam on the web? Answers to these questions can give anti-spam communities significant insight and help them improve their spam filtering technologies. A new generation of spam has emerged which targets Web 2.0 platforms, and is different from the older forms of spam distributed through emails and instant messengers. This new form of spam is known as Spam 2.0 and is defined as “propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications” [8]. Recent research indicates that 75% or more of the pings originating from blogs to search engines are spam [14], and recent statistics show that the amount of spam in comments increased dramatically in 2011 [1]. This overwhelming volume of spam content is degrading the quality of information and services on the web. In fact, Spam 2.0 has grown to have a far greater reach and impact than traditional spam.

These statistics reveal the productivity of Web 2.0 environments for spamming purposes. However, there is currently no report on the value of spam-marketing or the profits spammers can make on the web.

The value of spam-marketing on the web can be determined on the basis of two factors: the cost of distributing spam in Web 2.0 platforms, and the success ratio of turning a spamming campaign into a profitable venture. Currently, knowledge of the nature of spam distribution in web applications, and of public methods to estimate the turnover rate for spammers, is limited. Therefore, we adopted a methodological approach to address these issues and measure the value of spam-marketing on the web. This approach involved:



• Discovering and implementing common spam distribution tactics on the web.

• Harvesting spam targets.

• Constructing a website to sell products.

• Advertising the website using spam distribution tactics.

• Recording the number of website impressions and product sales.

Our study involved developing a spam advertisement campaign for a host website, distributing spam content on existing Web 2.0 applications, and recording the response rate. We studied current spamming tactics on the web to implement a framework to harvest potential websites (targets) and distribute spam content. We constructed a website that imitated an online retail pharmacy website. To avoid any harm to website visitors, we disabled some website functionality, such as credit card payment. Additionally, we implemented a tracking system on the website to record the response rate.

We distributed spam messages based on this methodology to 98,358 harvested websites. We measured the amount of delivered and filtered spam posts, the response rate and the ratio of sales. Finally, we derived the value of spam-marketing on Web 2.0 platforms by analyzing the results of the experiment. We received 2,059 unique visits on our retail pharmacy website from the 7,772 websites on which we had successfully posted spam messages. This resulted in three product purchase transactions during a one-month experiment period. Therefore, the total conversion visit rate was 26.49% and the purchase rate was 0.14%.

The remainder of this paper is organized as follows: Section 2 provides some background information. Section 3 discusses our methodology and framework. Section 4 covers the experiments conducted. The outcome and results of the paper are discussed in Section 5, while conclusions are drawn in Section 6.

2. RELATED WORKS
There have been numerous studies that have focused on spam filtering in Web 2.0 applications [6, 9-11]. Research [23] has proposed a content-based method to detect spam in forums and blogs. They claim that the document complexity of normal messages is much higher than that of spam, since spam messages are auto-generated and their generation costs are quite low. The authors of [12] have proposed a spam filtering method to detect unrelated tweets in Twitter. They used a machine learning classifier to identify spam tweets from the tweet content. Research [2] focused on detecting malicious users who posted spam tweets on Twitter. Their content-based method was based on extracting discriminating features from the content of the tweets and spam profiles. Similar work was done by the authors of [25] to detect spam profiles. However, they created a social graph to model users and spam users in Twitter. Research [5] proposed a method to detect spam campaigns on online social networks, specifically on Facebook. Their method involved text-based similarity and Uniform Resource Locator (URL) correlation between different wall posts.

One of the most popular prevention-based methods to stop automated requests and automated distribution of spam content is the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) [24]. CAPTCHA is a challenge and response test to distinguish human users from automated requests. CAPTCHA challenges are easy for human users to solve, but hard for automated programs to bypass. CAPTCHA has been widely used in emails and on the web. Many web applications include CAPTCHA as a part of their content submission and registration process.

Research [21] has proposed a rule-based framework to discover search engine crawlers and camouflaged web robots. They utilized navigation patterns, such as session length, set of visited webpages, and request method type (e.g. GET, POST, etc.) for classification. Their main assumption was that the navigational patterns of web robots are different from those of human users. Similarly, [17] has proposed a malicious robot detection method by employing both rule-based and machine learning algorithms.

Several works have also focused on estimating the revenues of spammers as well as the cost of sending and receiving spam in email systems. Research [4] estimated that the average revenue per spam message was 0.00434 Euro in 2004. Research [13] found that only 5 responses out of 500,000 spam messages generated enough revenue in their experiment. Additionally, a report by [22] identified that it was economical for spammers even if they received only one response per 10,000 spam emails.

A report by [26] showed that the annual cost of spam management for the magazine subscribers was $2,388,091 in 2003. In addition, Nucleus Research estimated that in 2003 and 2004 the cost of spam management for each employee was $874 and $1,934 per year respectively [16]. Research [19] has estimated the accumulated cost factors involved in the distribution of Spam 2.0, including the cost of storage, bandwidth, human resources, annual productivity and software. The authors estimated costs from the points of view of spammers, application providers, content providers and content consumers. In the extension of their study [18], they estimated that the cost of 10,000 Spam 2.0 posts, profiles and personal messages was about AUD $3 for a self-hosted web server.

The closest work to this study is the research done by the authors of [3]. They proposed a methodology to measure the actual spam marketing conversion rate, including the costs associated with sending email spam and the number of sales generated by spammers. Two major spamming campaigns were investigated: one for spreading executable programs, including malware, and the other for spreading advertisements for a pharmaceutical website to find out the number of sales. They infiltrated the Storm botnet in order to distribute spam email messages. Their conversion rate was found to be under 0.00001%. Our focus, in contrast, is on discovering the costs and revenues associated with Spam 2.0 and not email spam, and so we employ Spam 2.0 distribution tools, rather than botnets.

3. METHODOLOGY
In this section, we describe the research methodology adopted for conducting our research. The methodology comprised 7 parts: Ethical Approval, Website Setup, Advertisement Design, Selection of Spam Tactics, Target Harvesting, Implementation and Launch, and Analysis and Investigation. We explain each of these components in detail in the following subsections.

3.1 Ethical Approval
To meet the ethical requirements, we did not record any personal details of the users or their payment methods, including credit card numbers. We took all possible steps to ensure that our experiment did not harm the website users in any way.

3.2 Website Setup
The underground nature of spamming on the web makes data collection a challenging and difficult task. In order to procure data for our study, we decided to create an online retail pharmacy website, and record the number of sales. As illustrated in Figure 1, we created the website content to imitate pharmaceutical spam websites and replicated similar product categories, prices, images, descriptions and website structure. However, we modified our website to comply with ethical research policies, which involved:

• Disabling the payment page: The aim of this research was to record the users’ intention of purchasing the products and not actual sales. Therefore, we disabled the payment section of the website; when users proceeded to the “checkout” page, we redirected them to a blank page.

• Disabling the personal information page: We also disabled the personal information page where users would record their personal information, including contact details and shipping address.

Figure 1. Screenshot of the created website in the experiment

We implemented a tracking module to monitor user navigation and record visitor behaviour on the website. Six parameters were tracked in our tracking module:

• Date and time of visit

• Demographic information for each visitor

• URL of the site the visitors were referred from

• Visited webpages’ URLs and their frequency

• Number of visitors that “checked out”

• Whether the visitor was a web robot

Web crawlers could also visit our website since we had published the website URL on a number of Web 2.0 websites. So, we identified and removed all web crawler data from our dataset by evaluating their IP address and user agent details.

3.3 Advertisement Design
We designed the advertisements (ads) based on a study of real spam messages. We classified ads into two types, relevant and irrelevant ads. Content of the former was related to the content on the spam website (i.e. related to pharmaceutical medicine), while content of the latter was not related to website content. For each type of ad, we included a URL linking back to our website, along with an identifier to allow us to identify different ads when visitors accessed the website from comments, online forum posts and personal messages. Additionally, we created ads in two languages (i.e. English and Farsi) to measure the impact of spam messages in languages other than English. Figure 2 shows a sample of the irrelevant advertisement that we used in the experiment.

Thanks for the post. It was very informative. I have also found a similar article at www.spamwebsite.com

Figure 2. Sample of an irrelevant English advertisement for comment

3.4 Selection of Spam Tactics
The major part of our methodology was to identify and select the current tactics for Spam 2.0 distribution. The use of automated tools, including auto-submitters (i.e. spambots), is common for fast distribution of spam messages in Web 2.0 platforms. XRumer, AutoPligg, ScrapeBox and SEnuker are examples of auto-submitters for posting spam messages on blog comments, forums and guest books [20]. We made use of XRumer as our main tool to post spam ads for our website. XRumer has many inbuilt features to bypass common filtering mechanisms (e.g. CAPTCHA and email validation), create user accounts, and reply to forum threads. We adopted a number of tactics to post spam content, which are discussed below.

3.4.1 Comment
Comments are used by website visitors to share their thoughts or provide feedback. For example, visitors can share their opinions by commenting on blog articles. However, commenting systems can be misused, and have actually been misused widely, to distribute spam content [15]. Therefore, we also targeted commenting systems on a number of blogs to post our ads.

Additionally, we targeted forums (online discussion boards) in our spam campaign. The creation of a user account is necessary in the majority of online forums in order to post content. Therefore, we used our automated tool to create a user account, log in to the forum using this user account, and post new threads or reply to existing posts to publish our ads.

3.4.2 Referrer URL
Suppose webpage A has a link to webpage B. If a user navigates from A to B, webpage A is the referrer URL for B. This information is passed to web servers by the user’s web browser through HTTP headers. A referrer URL can provide valuable information for web operators. For example, a web operator can identify webpages that have linked to their website. Referrer URLs, along with website traffic data, are logged by back-end applications. Sometimes, this information is publicly available to website visitors, and can also be indexed by search engine crawlers. However, there is no guarantee of the integrity of the referrer URL data sent from a user’s browser because it can be tampered with. Therefore, a referrer URL can be modified to include spam links. In this study, we modified the referrer URL to link to our website, while our tools accessed the main page of each targeted website.
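The ad identifier described in Section 3.3 and the referrer header described in Section 3.4.2 could be read from a visit log roughly as follows. This is a minimal sketch, not the tracking module used in the study: the paper does not say how identifiers were encoded, so the query parameter name `ad`, the host names and the function name `classify_visit` are all assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

AD_PARAM = "ad"  # hypothetical query parameter carrying the ad identifier


def classify_visit(landing_url: str, referer: Optional[str]) -> dict:
    """Attribute one logged visit to an ad and summarise its Referer header."""
    query = parse_qs(urlparse(landing_url).query)
    # None when the visitor removed the identifier or typed the address directly.
    ad_id = query.get(AD_PARAM, [None])[0]

    referrer_host = urlparse(referer).hostname if referer else None
    return {
        "ad_id": ad_id,
        "referrer_host": referrer_host,
        # The header is supplied by the client, so it is a hint, not proof of origin.
        "direct_or_unknown": referrer_host is None,
    }


# Example with made-up URLs: a visit that arrived from a blog comment link.
visit = classify_visit(
    "http://pharmacy.example/?ad=rel-en-01",
    "http://blog.example/post/1",
)
```

Because both the identifier and the header can be removed or altered by the visitor, counts derived this way are best treated as estimates.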

3.4.3 Personal Message
The majority of web applications, such as forums, provide a facility through which the website’s members can send personal messages to other members. Personal messages can be sent to one or many members, while the sender’s email address or IP address might not be available to the receivers. We adopted this tactic and sent personal messages containing our ads to users of these web applications.

3.5 Target Harvesting
This part of the methodology involved collecting a list of websites on which to distribute our ads. In one of our previous studies, we had found that spam users typically queried search engines to discover web applications [7]. Therefore, we also adopted the tactic of using search engines to extract a list of target websites. Specific keywords were used as inputs for search engines in order to extract the target websites. For example, a common practice to generate a list of websites running common web applications is to query search engines for keywords in the URL structure. Executing a search query such as inurl:exampleforum.php would return websites that contain exampleforum.php in their URL structure. Based on similar queries, we extracted a list of websites that ran common web applications (e.g. open source software packages, such as phpBB, Wordpress, Wikimedia). Additionally, we included pharmaceutical keywords, such as “health, medicine, drug, etc.” as input for search engines to extract relevant target websites. We repeated the same practice without the pharmaceutical keywords (i.e. with unwanted words) to generate irrelevant target websites for each type of our created ads. Some of the automated spam tools previously mentioned also contain a list of targeted forums and blogs, which we also utilized in our study.

3.6 Implementation
Figure 3 provides the overall design of our experiment to replicate spam campaigns.

Figure 3. Overall campaign design

The design had 6 components: an ad campaign machine, a web server to host the pharmacy website, an anonymous virtual private network, a firewall, proxy servers, and target websites. We hosted our ad campaign on VMware ESX4 servers in a highly controlled and monitored environment. The ad campaign used the XRumer tool on the Windows XP operating system. All outbound and inbound ports, except port 80 (the default port for HTTP connections), were blocked. We used a dedicated broadband Internet connection and monitored all the traffic to and from the server. We employed a virtual private network (VPN) service to route all the traffic through anonymous channels. Additionally, we used public and private proxy servers to hide the traces of our campaign. Our campaign infrastructure ensured that no harm was caused by our project and that the experiment was within ethical bounds.

Table 1: Summary of Experiments

| # | Dataset | Ad Type | Tactic | # Websites | Date(s)¹ |
|---|---------|---------|--------|------------|----------|
| 1 | Health related websites collected from the provided lists. | Relevant (English) | Comment | 1,150 | 26 May |
| 2 | Non health related websites collected from the provided lists. | Irrelevant (English) | Comment | 2,071 | 27 May |
| 3 | Health related websites collected from search engines. | Relevant (English) | Comment & ref. URL | 74,625 | 3 Jun – 10 Jun |
| 4 | Non-health related websites collected from search engines. | Irrelevant (Farsi) | Comment & ref. URL | 2,071 | 9 Jun, 17 Jun |
| 5 | Mass personal messages targeting health related websites collected from search engines. | Irrelevant (English) | Comment & ref. URL | 18,172 | 21 Jun – 23 Jun |
| 6 | Selected websites from Exp. 3² | Relevant (English) | Comment (reply) & ref. URL | 110 | 24 Jun |
| 7 | Selected websites from Exp. 3² | Irrelevant (English) | Comment (reply) & ref. URL | 110 | 24 Jun |

¹ Dates are subject to server, network and Internet connection issues.
² A selected list of websites from Exp. 3 on which we successfully submitted the ad was used in this experiment.

One of the challenges we expected to face was search engines blacklisting our IP address. Most search engines are equipped with facilities to detect automated search requests. For example, if a search engine receives many simultaneous and similar requests from one host, it may try to block future requests. In extreme cases, search engines may even block IP addresses and not accept any further requests from them. In order to get around this issue, we devised a distributed strategy where the search queries were sent to search engines via different proxy servers with different IP addresses. This was done to ensure anonymity, but it was still possible that some anonymous proxy servers might reveal client IP addresses by forwarding this information from the original request to the destination. Therefore, we evaluated the anonymity of each proxy server before using it in our campaign to protect our IP address. Additionally, the use of third-party search engines provided us with a further layer of security in terms of hiding our identity.

To imitate spam practices we had to overcome a number of challenges. First, we were required to automatically bypass varying levels of CAPTCHA. Secondly, we were required to automatically register user accounts on forums and blogs. Finally, we needed a way to automatically bypass the email verification process. Once we were able to register an account on a website by bypassing all three challenges, we began posting advertisements in the form of forum posts or blog comments. We also used the same advertisements to send personal messages to forum members. This was done in a parallel fashion whereby we sent and received up to 10 simultaneous requests by using different proxy servers.

Figure 4a. The daily network bandwidth used in the entire experiment by our campaign (sent traffic)

Figure 4b. The daily network bandwidth used in the entire experiment by our campaign (received traffic)
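Section 3.6 notes that a search engine may block a host that sends many simultaneous, similar requests. Seen from the defender's side, such a check could be approximated with a per-IP sliding-window counter. The sketch below is only an illustration of that idea: the class name `RequestRateMonitor`, the threshold and the window length are assumptions, and real search engines are not documented in the paper as using this exact rule.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flag clients that exceed max_requests within a sliding time window."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0):
        self.max_requests = max_requests      # assumed threshold, not from the paper
        self.window_seconds = window_seconds  # assumed window length
        self._hits = defaultdict(deque)       # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request is accepted, False if it should be blocked."""
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Example: at most 3 requests per second per client address.
monitor = RequestRateMonitor(max_requests=3, window_seconds=1.0)
decisions = [monitor.allow("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
```

A counter keyed on the client address alone is exactly what rotating through proxy servers sidesteps, which is why the study's distributed strategy was effective against it.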

4. EXPERIMENTAL SETUP
We present the details relating to the datasets, experimental design and results of our experiment in this section.

4.1 Dataset
We harvested a list of 98,358 websites including 91,797 relevant (i.e. pharmaceutical), 2,071 irrelevant, and 2,340 Farsi language websites. Target harvesting was conducted for approximately one month (20/5/2010 to 17/6/2010) in order to retrieve 95,137 unique websites. The remaining 3,221 websites were obtained from website lists provided by our spam tools.

4.2 Experiments
We split the dataset into 6 different subsets for performing 7 experiments. A summary describing each of our experiments is provided in Table 1.

5. RESULTS
5.1 Traffic Analysis
Figure 4 shows the daily network bandwidth used by our tool over the entire experiment. The highest bandwidth usage for sending and receiving data was on the 15th of June, which was related to Exp. 4. This may be because the non-English language webpages that we observed are larger in size. During the periods from the 28th of May to the 2nd of June and from the 18th of June to the 20th of June, the bandwidth usage was close to zero since no experiment was executed during these periods. The overall bandwidth used for received and sent traffic during each experiment is shown in Table 2.

Table 2: Received and Sent Traffic for Each Experiment

| Exp. | Running time | Received Traffic (MB) | Sent Traffic (MB) |
|------|--------------|-----------------------|-------------------|
| 1 | 26 May | 661 | 66.2 |
| 2 | 27 May | 355 | 37.6 |
| 3 | 3 Jun – 10 Jun | 9,502 | 1,241.1 |
| 4 | 9 Jun & 17 Jun | 30,600 | 2,637.1 |
| 5 | 21 Jun – 23 Jun | 6,160 | 897 |
| 6 | 24 Jun | 80.4 | 18.9 |
| 7 | 24 Jun | 80.4 | 18.9 |
| Total | 26 May – 24 Jun | 36,523.66 | 4,680.31 |

The total network bandwidth usage for received and sent traffic was 41,203.97MB in one month. We observed that the total network bandwidth usage was quite low given the large number of targeted websites used by our tool. This shows that there is a relatively low cost involved in distributing spam.³

³ The average monthly fee for 41,360MB (40GB) bit-cap in Australia was about $38 in 2011.

5.2 Distribution Rate
Our spam tool could reach only 66,226 websites out of the 98,358 targeted websites due to broken website URLs as well as proxy server and network failures. Table 3 presents the number of websites that we successfully published our ads on (7,772), versus the number of websites targeted by our tool (66,226). The overall distribution rate for the entire experiment was 11.73%, with 16.18%, 0.03%, and 2.88% for the comment, personal message and comment reply spam tactics respectively. Additionally, our tool was unable to post ads on web applications that had custom modifications, complex CAPTCHA and other customized anti-spam protection within the website. Experiments 1 and 2 had the highest distribution rates. This suggests that the datasets collected from spam tools were more reliable. The low distribution rate in experiment 5 could be due to the following issues:

• Restrictions placed on new users for sending personal messages. For example, some forum platforms only allow users who have performed a specific number of activities to use the personal messaging facility.

• Restrictions placed on obtaining account names of other members. Our tool required a list of users in order to send personal messages. The majority of the targeted websites (11,853) did not make this list available to new users.

We attempted to reply to our submitted posts in experiments 6 and 7. Hence, we selected a list of targets (110) on which we had successfully posted our ads in experiment 3. However, the website moderators removed the majority of our submitted ads and our tool was not able to distribute comment replies on 87 of these targeted websites.

Table 3: Success rate for each experiment

| Exp. # | # Posted | # Targeted | % Ratio |
|--------|----------|------------|---------|
| 1 | 306 | 836 | 36.6% |
| 2 | 474 | 1,531 | 30.96% |
| 3 | 6,905 | 44,536 | 15.5% |
| 4 | 75 | 1,047 | 7.16% |
| 5 | 6 | 18,166 | 0.03% |
| 6 | 3 | 98 | 3.06% |
| 7 | 3 | 110 | 2.72% |

5.3 Conversion Ratio
We received 2,059 unique visits⁴ on our website from the 7,772 ads that we had successfully posted (ref. Section 3.2), resulting in 3 product purchase transactions during the entire period of the experiment. Therefore, the total conversion visit rate was 26.49% and the purchase rate was 0.14%. Figure 5 illustrates the geographic hit-map location of each visit to our website.

⁴ We have removed all the known web robot visits from this figure (e.g. search engine crawlers).

Figure 5. Geographic hit-map location of the website visitors

By studying the identifier on each visited URL, we found that 805 visits originated from comment and personal messaging spam, and the remaining 1,254 visits were from the referrer URL spam tactic, as shown in Table 4.

Table 4. Number of unique visits per experiment

| Experiment | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|------------|---|---|---|---|---|---|---|
| # of visits through Referrer URL | - | - | 1,040 | 172 | 40 | 1 | 1 |
| # of visits through comment and PM / direct | 2 | 6 | 145 | 584 | 54 | 7 | 7 |

However, the figures in Table 4 do not explicitly show the actual number of visits for each experiment, for the following reasons:

• To determine the origin of each visit we analyzed the referrer URL associated with each visit. However, there is no guarantee of the integrity of the referrer URL, as users could also directly enter our website address and visit the website.

• Users could remove the URL identifiers before visiting our website.

• We only counted the URL identifiers to measure the number of visits through referrer URLs.

Overall, the total number of visits through referrer URLs was higher than the number of visits through comments or personal messages. This possibly shows user awareness of spam content at different places in web applications (e.g. web operators place more trust in referrer URLs than in links within comments and personal messages).
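The rates quoted in this section follow directly from the reported counts: the distribution rate is posted/targeted, the visit rate is visits/posted, and the purchase rate is purchases/visits. The sketch below recomputes them from the paper's figures; the class name `Funnel` and its field names are illustrative choices, not terminology from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Funnel:
    """Counts for one spam campaign, from targeting through to purchase."""
    targeted: int   # websites the tool could reach
    posted: int     # websites where an ad was published
    visits: int     # unique visits with known web robots removed
    purchases: int  # checkout attempts recorded on the website

    @property
    def distribution_rate(self) -> float:
        return 100.0 * self.posted / self.targeted

    @property
    def visit_rate(self) -> float:
        return 100.0 * self.visits / self.posted

    @property
    def purchase_rate(self) -> float:
        return 100.0 * self.purchases / self.visits


# Figures reported in the paper for the whole one-month campaign.
campaign = Funnel(targeted=66_226, posted=7_772, visits=2_059, purchases=3)
summary = {
    "distribution %": campaign.distribution_rate,
    "visit %": campaign.visit_rate,
    "purchase %": campaign.purchase_rate,
}
```

With these inputs the unrounded values are roughly 11.736%, 26.493% and 0.1457%, so the reported 11.73% and 0.14% appear to be truncated rather than rounded.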

A total of 756 visits to the website resulted from 75 posted ads on Farsi websites (experiment 4), which, per posted ad, was about 40 times more than the 1,867 visits from 7,697 posted ads on English websites. This possibly shows the effect of language and cultural characteristics on the success of Spam 2.0 distribution. Furthermore, 1,864 visits occurred after the experiment period (25/6/2010 to 17/10/2010). This shows that some of the ads remained in Web 2.0 applications for 4 months after the experiments concluded. Each website visit can bring monetary value for spammers through a cost per click (CPC) service as well as the revenue from selling the products. This revenue could belong either to the spammer, if they host the website, or to the owner who pays commission to the spammers to bring visitors to the website.

6. CONCLUSION AND FUTURE WORKS
This research investigated the question of how much money spammers can make. For the study, we adopted a methodological approach to discover and implement common tactics for spam distribution on the web, harvest spam targets, construct a website to sell products, advertise the website via spam distribution tactics and record the number of website impressions and product sales. We conducted our experiment over a period of a month, and targeted 66,226 websites out of 98,358 harvested websites. Our dataset included both English and non-English (Farsi) websites. We employed 3 spam tactics, namely comment, personal message and referrer URL spam, using a common spam tool. We set up a campaign to replicate spam practices. Using our tool, we posted spam content to 7,772 websites, which resulted in 2,059 unique visits to our website and 3 purchase transactions. Therefore, the total conversion visit rate was 26.49% and the purchase rate was 0.14%. We found that the referrer URL spam tactic (60% of total visits) was more successful than comment and personal message spam. Custom modified web applications, complex CAPTCHAs and customized anti-spam protection tools on web applications were the main reasons for our tool not being able to post spam content on some websites. We observed that user awareness, language and cultural characteristics influenced the ratio of user clicks, as we found that the spam content on non-English websites attracted 40 times more visitors than on English language websites. There are several ways to expand this experiment. For example, similar experiments can be conducted for different domains, such as music instead of pharmacy.

7. REFERENCES

[1] Akismet, 2011. About Akismet - The automattic spam killer. Available at: http://akismet.com/about/ [Accessed March 23, 2011].

[2] Benevenuto, F. et al., 2010. Detecting Spammers on Twitter. In CEAS2010. Redmond, Washington.

[3] Chris, K. et al., 2008. Spamalytics: an empirical analysis of spam marketing conversion. In Proceedings of the 15th ACM conference on Computer and communications security.

[4] Gansterer, W. et al., 2005. Phases 2 and 3 of Project “Spamabwehr I”: SMTP Based Concepts and Cost-Profit Models. Available at: http://eprints.cs.univie.ac.at/717/.

[5] Gao, H. et al., 2010. Detecting and characterizing social spam campaigns. In Proceedings of the 17th ACM conference on Computer and communications security. Chicago, Illinois, USA: ACM, pp. 681-683.

[6] Hayati, P. & Potdar, V., 2009. Toward Spam 2.0: An Evaluation of Web 2.0 Anti-Spam Methods. In 7th IEEE International Conference on Industrial Informatics. Cardiff, Wales.

[7] Hayati, P. et al., 2009. HoneySpam 2.0: Profiling Web Spambot Behaviour. In 12th International Conference on Principles of Practise in Multi-Agent Systems. Nagoya, Japan: Lecture Notes in Artificial Intelligence, pp. 335-344.

[8] Hayati, P. et al., 2010. Definition of Spam 2.0: New Spamming Boom. In IEEE International Conference on Digital Ecosystems and Technologies (DEST 2010). Dubai, UAE: IEEE.

[9] Hayati, P. et al., 2010. Behaviour-Based Web Spambot Detection by Utilising Action Time and Action Frequency. In The 2010 International Conference on Computational Science and Applications (ICCSA 2010). Fukuoka, Japan.

[10] Hayati, P. et al., 2010. Web Spambot Characterising using Self Organising Maps. International Journal of Computer Systems Science and Engineering.

[11] Hayati, P. et al., 2010. Rule-Based Web Spambot Detection Using Action Strings. In The Seventh Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS 2010). Redmond, Washington.

[12] Irani, D. et al., 2010. Study of TrendStuffing on Twitter through Text Classification. In CEAS2010. Redmond, Washington.

[13] Judge, W.Y.P. & Alperovitch, D., 2005. Understanding and Reversing the Profit Model of Spam. In Workshop on Economics of Information Security 2005 (WEIS 2005). Boston, MA, USA.

[14] Kolari, P. et al., 2007. Towards Spam Detection at Ping Servers. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM).

[15] Live-Spam-Zeitgeist, 2009. Some Stats, Akismet. http://www.akismet.com

[16] Nucleus Research, 2003. Spam: the silent ROI killer, Report.

[17] Park, K. et al., 2006. Securing Web Service by Automatic Robot Detection. In USENIX 2006 Annual Technical Conference Refereed Paper.

[18] Ridzuan, F., Potdar, V. & Singh, J., 2011. Storage Cost of Spam 2.0 in a Web Discussion Forum. In ACM International Conference Proceedings Series. 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS 2011). Perth, Western Australia.

[19] Ridzuan, F., Potdar, V. & Talevski, A., 2010. Key Parameters in Identifying Cost of Spam 2.0. In 24th IEEE International Conference on Advanced Information Networking and Applications (AINA 2010). Perth, Western Australia.

[20] Shin, Y., Gupta, M. & Myers, S., 2011. The Nuts and Bolts of a Forum Spam Automator. In USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET). Boston, MA, U.S.A.

[21] Tan, P.-N. & Kumar, V., 2002. Discovery of Web Robot Sessions Based on their Navigational Patterns. Data Mining and Knowledge Discovery, 6(1), pp. 9-35.

[22] The Economist, 2007. Tech.view: Canning spam | The Economist. Available at: http://www.economist.com/node/9805795 [Accessed August 3, 2011].

[23] Uemura, T., Ikeda, D. & Arimura, H., 2008. Unsupervised Spam Detection by Document Complexity Estimation. In Discovery Science. pp. 319-331. Available at: http://dx.doi.org/10.1007/978-3-540-88411-8_30.

[24] von Ahn, L. et al., 2003. CAPTCHA: Using Hard AI Problems for Security. In Advances in Cryptology - EUROCRYPT 2003. pp. 646-646. Available at: http://dx.doi.org/10.1007/3-540-39200-9_18.

[25] Wang, A., 2010. Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach. In Data and Applications Security and Privacy XXIV. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, pp. 335-342. Available at: http://dx.doi.org/10.1007/978-3-642-13739-6_25.

[26] Windows & .NET Magazine, 2004. The Secret Cost of Spam IT Management. Available at: http://www.itmanagement.com/whitepaper/the-secret-cost-of-spam/ [Accessed August 3, 2011].