
Article

The business and politics of search engines: A comparative study of Baidu and Google’s search results of Internet events in China

new media & society 2014, Vol. 16(2) 212–233 © The Author(s) 2013 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/1461444813481196 nms.sagepub.com

Min Jiang

UNC Charlotte, USA; University of Pennsylvania, USA

Abstract

Despite growing interest in search engines in China, relatively few empirical studies have examined their sociopolitical implications. This study fills several research gaps by comparing query results (N = 6320) from China’s two leading search engines, Baidu and Google, focusing on accessibility, overlap, ranking, and bias patterns. Analysis of query results of 316 popular Chinese Internet events reveals the following: (1) after Google moved its servers from Mainland China to Hong Kong, its results are equally if not more likely to be inaccessible than Baidu’s, and Baidu’s filtering is much subtler than the Great Firewall’s wholesale blocking of Google’s results; (2) there is low overlap (6.8%) and little ranking similarity between Baidu’s and Google’s results, implying different search engines, different results and different social realities; and (3) Baidu rarely links to its competitors Hudong Baike or Chinese Wikipedia, while their presence in Google’s results is much more prominent, raising search bias concerns. These results suggest search engines can be architecturally altered to serve political regimes, arbitrary in rendering social realities and biased toward self-interest.

Keywords

Baidu, bias, China, Google, Hudong, Internet events, ranking, search engine, Wikipedia

Corresponding author: Min Jiang, Department of Communication Studies, UNC Charlotte, Charlotte, NC 28223, USA. Email: [email protected]

Introduction

In societies overloaded with information, search engines play a crucial role in locating, organizing and distributing information and knowledge. China is no exception. Among China’s 538 million netizens, 79.7% reported using search engines, making online search the second most popular online activity (China Internet Network Information Center (CNNIC), 2012). Although users often trust search engines more than their own judgment (Pan et al., 2007), search operations are far from perfect. A blind faith in search engines to deliver trustworthy results is further complicated by government requests to filter information (Villeneuve, 2008).

Google, worth US$190 billion, and Baidu, worth US$42 billion, capture 15% and 78% of China’s search market, respectively (Yahoo! Finance, 2012). In January 2010, four years after entering China, Google announced it would stop censoring its results in response to cyber attacks and security breaches. It eventually moved its servers to Hong Kong, a free speech zone, and started directing its Mainland Chinese users to its uncensored Hong Kong site. Consequently, the censorship burden shifted to Beijing, which claims ‘Internet sovereignty’ over its territory (Jiang, 2010a). Such events highlight search engines’ information gatekeeping functions, especially their importance to nation-states.

Previously, research has examined various aspects of online searching, such as search technology (Brin and Page, 1998), search retrieval (Jansen et al., 2007), search politics (Introna and Nissenbaum, 2000), search quality (Taylor, 2013), social impact (Halavais, 2008) and legal implications (Goldman, 2006). With regard to online searching in China, Western research focused on censorship and policy, while Chinese studies emphasized business strategy, technology and user behavior. Despite a growing interest in search engines in China, few studies have empirically examined actual search results and their implications beyond censorship. This study reports differences between Google’s and Baidu’s search results in China.
After an overview of the literature on the increasing commercial and political nature of search engines, I review studies of Google’s and Baidu’s search results in China, followed by a summary of four major dimensions of search results research. Through an empirical study, I analyze Baidu’s and Google’s top 10 query results of 316 Chinese Internet events, focusing on accessibility, overlap, ranking and bias patterns. Results are reported and discussed.

Literature review

The business and politics of search engines

Search firms, largely trusted by the public, reveal little about how their searches work. Hinman chides that ‘[n]ever before will so few have controlled so much with so little public oversight or regulation’ (2008: 74). Although the details of search algorithms remain commercial secrets, their mode of operation is largely known (Granka, 2010). Online searching involves three major steps: crawling, indexing and ranking (Brin and Page, 1998). In crawling, a search engine reads and downloads a webpage, looking for updates. Indexing creates a catalogued database of crawled webpages. Ranking is the ordering of results by a search engine when the user queries. Search engines do not find webpages in real time. Instead, they algorithmically retrieve content crawled and indexed periodically. Ranking is critical in determining the relative prominence of webpages and the ordering of results.
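The three steps can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the three example pages, the function names and the term-frequency scoring are hypothetical and far simpler than the proprietary algorithms of Baidu or Google. The point it shows is only that ranking operates on an index built ahead of time, not on the live web.

```python
from collections import defaultdict

# Hypothetical miniature "web": URL -> page text (stands in for downloaded pages).
PAGES = {
    "http://example.org/a": "search engines rank web pages",
    "http://example.org/b": "web search and web crawling",
    "http://example.org/c": "cooking recipes",
}

def crawl(pages):
    """Crawling: read each page and split its text into lower-case terms."""
    for url, text in pages.items():
        yield url, text.lower().split()

def build_index(crawled):
    """Indexing: map each term to the pages containing it, with term counts."""
    index = defaultdict(dict)
    for url, terms in crawled:
        for term in terms:
            index[term][url] = index[term].get(url, 0) + 1
    return index

def rank(index, query):
    """Ranking: order indexed pages by summed term frequency for the query."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Highest score first; ties broken by URL so the order is deterministic.
    return sorted(scores, key=lambda u: (-scores[u], u))

# The index is built before any query arrives, not at query time.
INDEX = build_index(crawl(PAGES))
print(rank(INDEX, "web search"))
```

A page that was never crawled, or was dropped from the index, cannot appear in any ranking, which is why the composition of the index matters as much as the ordering rule.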


Figure 1. Top search results for ‘Lawyer’ in Baidu. Note: Circles, arrows and emboldened text in English have been added by the author for illustration. Baidu uses ‘promotion’ instead of ‘ads’ to denote paid rankings and does not always use a colored background to distinguish promoted rankings from organic ones. The screen was captured on May 28, 2012. Results are subject to change depending on the evolving process of ad bidding.

Three major components – linguistic cues (e.g., webpage title), popularity cues (e.g., inbound links or ‘votes’ for a webpage) and user cues (e.g., user clicks) – make up the algorithmic rules used to rank the vast amount of web content (Granka, 2010). Search engines are often seen as impersonal, given their largely automated operations. However, editorial judgment is inevitable. Ranking and weighting criteria are built into search algorithms. Google’s PageRank uses a popularity metric, treating inbound links to a website as popularity votes, with votes from more popular sites weighted more heavily than those from lesser-known ones. Such a metric favors majority over quality (Cho et al., 2005), often giving preference to those with financial power (Introna and Nissenbaum, 2000). Its limitation is noted by the Google founders themselves: ‘We expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers’ (Brin and Page, 1998).

Search firms are essentially advertising companies whose products are free of charge to users. By tracking user behavior via cookies, firms profit by selling user information to advertisers as commodities. Google displays ads separate from organic results.1 Baidu, however, continues to mix ‘promoted’ results with organic ones in China (see Figure 1). In recent years, Google has expanded its collection of user data, raising serious privacy concerns (Zimmer, 2007). Research also shows search engine development is driven overwhelmingly by market and technological factors, whereas fairness and representativeness, determinants of quality media content, are not major concerns to developers (Couvering, 2007). Lately, the push to deliver more relevant results and precisely targeted ads not only deepens privacy concerns but also draws public attention to the ‘filter bubble’ phenomenon, whereby search engines can ‘form an opinion’ about users, customize results and shape our tastes and preferences (Pariser, 2011).

In China, search has an unparalleled political dimension. Online censorship, the suppression of online speech directly by the state or indirectly by Internet companies for security, political, social and economic reasons, was found to be ‘pervasive’ in the country (Faris and Villeneuve, 2008). Internet firms are required to follow Chinese legal provisions to enforce filtering of a wide range of information, from the kinds threatening national security, unity and stability to those spreading rumor, pornography and violence, often vaguely defined (Qiu, 1999/2000). An evolving blacklist of banned words is kept by Internet service/content providers (ISPs/ICPs), because failing to do so can jeopardize their operating licenses (Levy, 2011).
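The popularity-vote logic attributed to PageRank earlier in this section can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not Google’s production algorithm: the three-page link graph is hypothetical, the damping factor of 0.85 is the value commonly cited from Brin and Page (1998), and the function name and fixed iteration count are choices made here for brevity.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of inbound links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among the pages it "votes" for.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A and B both link to C; C links back to A.
GRAPH = {"A": ["C"], "B": ["C"], "C": ["A"]}
scores = pagerank(GRAPH)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, A ends up far ahead of B although each has at most one inbound link, because A’s single vote comes from the highly ranked C. The sketch thus shows the ‘rich-get-richer’ tendency discussed above: a page nobody links to (B) stays at the baseline however good its content is.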

Comparative research on search results in China

Multiple methods have been used to examine search engines in China. Overall, research from China tends to be technology-focused, market-driven and light on politics, whereas Western researchers have focused overwhelmingly on censorship. Besides surveys and experiments used to assess the Chinese search market and user behaviors (CNNIC, 2011), perceptions of Baidu’s and Google’s effectiveness (Liu et al., 2010), and the two search engines’ retrieval performance (Tan et al., 2005), researchers have also engaged in comparative search results analysis.

Comparing search results from selected queries across different search engines is typical information retrieval research. This method evaluates search quality and search engine properties by identifying search results patterns. For example, Wang and Liu (2007) examined Baidu’s, Google’s and Yahoo’s results based on 11,171 queries in China and found that the overlap was 7.8% between Baidu and Google and 3.9% across all three search engines. Fei (2010) assessed the efficiency and ranking quality of five search engines in China. Using five types of queries (titles, PDF, URL, phrases and general search), the study collected 50,000 search results and found Google was superior to Baidu when queries were titles, PDF, URL and general searches, but was indistinguishable from Baidu using phrases. In terms of ranking quality, Baidu was the best with PDF queries and Google was the best with titles.

While Chinese research largely steers clear of politics, China’s political censorship of searching has been scrutinized in the West. Reporters without Borders (2006) conducted the first search engine filtering study, comparing Google, Microsoft, Yahoo! and Baidu. ‘Sensitive’ keywords were picked to query search engines from outside China.2 Yahoo! turned out to be the worst offender.
However, as China implemented the Great Firewall (GFW; a national filtering system erected at its border), queries from abroad to Yahoo! and Baidu (with servers in China) were filtered by the GFW, whereas those made to Google and Microsoft (with servers outside China) were unfiltered.

In a more sophisticated study, Human Rights Watch (2006) compared Google, Microsoft, Yahoo! and Baidu. It tested 25 websites and 25 keywords in each search engine both inside and outside China to distinguish filtering by the GFW and by search engines. Baidu’s filtering was found to be more extensive than Google’s or Microsoft’s Chinese versions. Observed variance was attributed to a mixture of reasons: the GFW, search firm filtering and proprietary algorithms.

Similarly, the Search Monitor Project (Villeneuve, 2008) compared search results of Google, Microsoft, Yahoo! and Baidu, all with their servers in China at the time. Not only did the project examine search result variations, it also evaluated Google’s and Yahoo’s Chinese versions against their ‘global’ versions. Sixty keywords were used, covering politically sensitive topics and social taboos. The study found that Google censors considerably less. Although foreign search engines provided alternative information, politically sensitive content was often blocked. Search filtering, instead of following orders from a single authority, was tailored by individual firms with a significant degree of flexibility.

Search results: accessibility, overlap, ranking and bias

Formerly the domain of information and computer science, search results research has evolved to probe the embedded values in search engines and their implications for knowledge, equality, diversity and democracy (Spink and Zimmer, 2008). Technical research on search engine stability (Bar-Ilan, 1999) and search results relevance, overlap and ranking (Spink et al., 2006; Vaughan, 2004) has been adapted to examine search engines’ rewriting of the past (Hellsten et al., 2006), equality in website coverage (Vaughan and Thelwall, 2004), search bias (Cho et al., 2005; Edelman, 2011; Mowshowitz and Kawaguchi, 2002; Pariser, 2011; Wright, 2011) and search filtering (Villeneuve, 2008). Discussed here are four dimensions of search results research pertaining to search engines’ political, social and commercial implications in China – accessibility, overlap, ranking and bias – although it is recognized that this research agenda may include other aspects, such as relevance, stability, coverage and precision (Machill et al., 2008; Vaughan, 2004).

I define accessibility as the availability of search results to the user for a given query. Accessibility is framed as network connectivity, not the inherent quality of search results to be understandable or the percentage of the Web indexed by a search engine (Vaughan and Thelwall, 2004). Inaccessibility can be caused by normal connectivity problems, ‘bad links’ (e.g., defunct domain name) and censorship (e.g., by search engines, other ISPs and governments) (Faris and Villeneuve, 2008). The aforementioned Chinese search filtering studies all focus on accessibility.

Besides accessibility, overlap and ranking patterns are windows into search engines’ differences as well. For a given query, overlap refers to a URL appearing in the returned results across multiple search engines.
Ranking similarity examines the degree to which the same search results appear across search engines in similar ranking positions, usually operationalized by comparing the top one, top three and top five returned results (Spink et al., 2006). Both dimensions are assessed based on the first 10 results, since most users never go past the first page of a web search. Prior research has consistently identified low overlap and ranking similarity between search engines, largely attributed to differences in search firms’ crawling, indexing and ranking practices. For example, Spink and colleagues (2006) detected an overlap of 11.4% between any two of the four search engines (Google, Yahoo!, MSN Search and Ask Jeeves), where 7% of the four search engines’ number 1 results matched.

Finally, search bias is perhaps the most complex and controversial. Algorithmic secrecy prevents spammers from gaming the system but also deters accurate assessment of bias. Google has been accused of biased presentation of its own views (e.g., its position on network neutrality), favored placement of its own services (e.g., Google Product Search) and disfavored rankings of its competitors’ sites (e.g., Foundem case) (Edelman, 2011). Baidu faces similar charges in China. It is the target of China’s first anti-trust investigation, brought by its competitor Hudong Baike (a Chinese Wikipedia-like site), which sued Baidu for US$124 million for demoting the site’s ranking in favor of Baidu’s own encyclopedia service (Agarwal and Round, 2011).

Research on search bias has produced conflicting definitions and empirical evidence. Mowshowitz and Kawaguchi (2002) define bias as lack of ‘representativeness’ of retrieved content. Cho et al. (2005) argue that popularity-based PageRank is inherently biased against new webpages of good quality, perpetuating the ‘rich-get-richer’ phenomenon. Couvering (2007) reframed search bias as an outcome of economic conflict inherent in search development. Taking these views into account, I define search bias as practices where search results ‘systematically favor certain types of content over others’ (Goldman, 2006: 189). Specifically, search results bias includes biased display of search firms’ own views, favored placement of search engines’ own services and disfavored treatment of competitors’ sites (Edelman, 2011).
Recently, Edelman (2011) found instances of Google’s search results bias across search engines, searches and over time, while Wright (2011) argued that Google’s practice is not inherently harmful and discovered own-content bias (favored inclusion and ranking of one’s own content) in Microsoft’s Bing to be more salient than in Google.
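One simple way own-content bias and disfavored treatment of competitors might be quantified is to count how often a result list points to a search engine’s own domains versus those of its rivals. The sketch below is an illustration only. The domain lists, the function names and the sample URLs are all assumptions introduced here, and a real analysis would need a vetted list of each firm’s services.

```python
from urllib.parse import urlparse

# Hypothetical domain lists, for illustration only.
SITE_DOMAINS = {
    "Baidu": ("baidu.com",),
    "Google": ("google.com", "google.cn", "google.com.hk"),
    "Chinese Wikipedia": ("zh.wikipedia.org",),
    "Hudong Baike": ("hudong.com",),
}

def site_of(url):
    """Return the tracked site a URL belongs to, or None if it is untracked."""
    host = urlparse(url).hostname or ""
    for site, domains in SITE_DOMAINS.items():
        # Match the domain itself or any of its subdomains.
        if any(host == d or host.endswith("." + d) for d in domains):
            return site
    return None

def site_counts(results):
    """Count how many results in a list point to each tracked site."""
    counts = {site: 0 for site in SITE_DOMAINS}
    for url in results:
        site = site_of(url)
        if site is not None:
            counts[site] += 1
    return counts

# Made-up result list for one query.
sample = [
    "http://baike.baidu.com/view/1.htm",
    "http://zhidao.baidu.com/question/2.html",
    "http://zh.wikipedia.org/wiki/Example",
    "http://news.sina.com.cn/c/3.shtml",
]
print(site_counts(sample))
```

Comparing such counts between two engines for the same queries shows whether one engine links to its own services, or omits a rival’s, more often than the other. Counts alone cannot show intent, since differences in crawling and indexing also affect which sites appear.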

Research question

Despite its importance, online search in China is under-explored. Firstly, research has been largely confined to technical aspects of information retrieval, information literacy, market analysis and political censorship, but rarely questions search engines’ larger social implications. Moreover, no empirical study has assessed Baidu’s and Google’s search results bias in China. Finally, no empirical study has examined how effective GFW’s filtering is after Google moved its servers to Hong Kong. To address these research gaps, this study proposes to compare Baidu’s and Google’s search results in terms of accessibility, overlap, ranking and bias.

In addition, this project moves beyond a mere focus on censorship by using current events as search queries. With an expanding boundary of public discourse, Chinese netizens actively consume news and debate on the issues of the day that define the country’s public life (Jiang, 2010b), making online news the fourth most popular online activity in China (CNNIC, 2012). So instead of using only politically sensitive terms as keywords, this study queries search engines with ‘Internet events’ (Jiang, forthcoming). Widely circulated online, news of such events attracts the attention of many netizens even though the events themselves may not exert a large social impact. The shift away from censorship toward Internet events will generate findings more reflective of everyday search experience in China. The study poses the following research question (RQ) pertaining to the political, social and commercial implications of Baidu’s and Google’s search results in China:

RQ: To Mainland Chinese search engine users, how do search results differ between Baidu and Google (Chinese versions) along the four dimensions of accessibility, overlap, ranking and bias over popular Internet events in China?

Table 1. Distribution of 316 popular Chinese Internet events in 2009 by types.

Event type | # of cases | Event type example
Government Internet policies, legal cases | 117 | Government’s anti-vulgarity campaign
Pop culture | 63 | ‘Knock off’ Spring Festival Gala: Grassroots spoof official gala
Exposure | 59 | Zhou Jiugeng: corrupt official’s luxurious lifestyle exposed online
Internet industry | 31 | Netease ready to launch World of Warcraft (China edition)
International events | 25 | Twitter: Emerging media giant
Individual or group rights defense | 15 | Deng Yujiao’s self-defense against officials’ sexual advances
Nationalism | 3 | Russian–Shanghai Embassy’s website hacked: Protest against Russian sinking of Chinese cargo ship
Other | 3 | Police saves suicidal netizen’s life using IP address

Note: The author categorized the events by their primary types based on an interview with the blogger about his focus and organizing principles. Grounded theory was adopted in coding the events by types.

Data collection

This section details the process of data collection, focusing on query sample, archiving and aspects of comparison. Previous studies have used anywhere from one query to tens of thousands as a sample (Jansen et al., 2007). Given its focus on popular Internet events, this study adopts blogger Wen Yunchao’s archive of 316 Internet events in 2009 as the query sample.3 Wen, a former news editor, is now a social activist. Although his blog is blocked in Mainland China for being too critical of government policies, his collection of Internet events is comprehensive and is widely circulated. In fact, his 2009 event list covers all the officially sanctioned Top 10 Internet Events of 2009 (South Net, 2010). Table 1 shows the most salient categories of events are ‘government Internet policies, legal cases’ (117 events), ‘pop culture’ (63 events) and ‘exposure’ (59 events). Overall, this list includes a wide range of topics: official (e.g., state anti-vulgarity campaign), neutral (e.g., Netease’s launch of World of Warcraft), trivial (e.g., grassroots knock-off Spring Gala) and critical (e.g., Liu Xiaobo). Although the sample is critical and emphasizes state Internet policies, it reflects diverse netizen interests very well.

To archive data, all in simplified Chinese, the author, a Chinese native, entered information about the 316 events into a spreadsheet where each event is organized by date and title. After reading each blog entry, I prepared a query for each event. In most cases, original blog titles were kept as the query. For example, the most popular event in 2009 was ‘Eluding the Cat.’ To cover up the death of a detainee, Yunnan police floated a ridiculous excuse, claiming the young man died by accident while playing hide and seek, or ‘eluding the cat’ in prison (South Net, 2010). For a few events, such as ‘Blogger Inaccessible,’ ‘2009’ is added to provide temporal context.

The spreadsheet was then sent via secure email to seven researchers in China: six in the south, one in the north.4 One researcher in southern China collected data for Baidu and Google by querying the general search box of Baidu (www.baidu.com) and Google (www.google.cn). Following Jansen and colleagues (2007), the webpage containing the first 10 results was saved. After archiving was completed in the last two weeks of October 2010, the author checked the dataset for completeness. Five other researchers in the same southern city then copied the first 10 textual hyperlinks of each webpage into the Excel document, retaining the embedded URL of each result.5 They also recorded inaccessible links and the reasons for inaccessibility. This yielded a total of 3160 links for Baidu and 3160 for Google. The author checked the dataset for missing or duplicated data. Data collection was completed in December 2010.

The research group sorted the inaccessible links into three categories: normal connectivity problem, ‘bad links’ and search results filtering. Figure 2 maps out search results filtering, performed by the GFW, search engines and domestic ISP/ICPs.
While filtering by the GFW is centralized, filtering by Baidu or other domestic ISP/ICPs is decentralized. Not only are users’ queries to Google filtered by the GFW; Baidu’s results pointing to foreign content must also pass through the GFW. We tested all inaccessible links from both China and the US right after archiving to separate filtering of foreign content by the GFW and Baidu from technical failures. However, we could not detect or prove whether search engines pre-filter certain results from their index entirely or lower their rankings manually.

We recognize that search opacity and personalization raise a number of issues. Results may vary by location, user, time, etc., thus making research efforts unreplicable (Hargittai, 2007). To minimize external influences, researchers in China were asked not to enable cookies that track users on their computers or log into their Baidu/Google account during data collection. Internet Explorer was set as the default browser. To control for geographical and other differences, a researcher in northern China collected search results on the first five online events of each month in 2009 for Baidu and Google at the same time. The sample was then compared to results of those events in the larger dataset. Disparity between them is shown to be small, as the overlap for Baidu reached 95% and Google 97%. Also, Baidu did not launch search personalization until September 2011 (Tencent, 2011). Google’s personalization was minimized by disabling cookies and data triangulation from two locations. Although it is impossible to control for all factors, the sample displays relatively high consistency and captures a valuable snapshot of the two search engines in time, yielding results not atypical of what users might experience.

Figure 2. Web filtering of search results in China.
[Diagram: a Mainland China netizen reaches domestic content through ISP/ICP search results filtering and ‘bad links’ (various scenarios), and through Baidu (blockpage, partial filtering); foreign content, including Google in Hong Kong (SafeSearch for family filters pornography), passes through the Great Firewall’s centralized filtering (‘connection reset’, IP blocking, URL filtering, DNS tampering, keyword blocking, packet filtering).]
Note: The Great Firewall employs various filtering methods such as internet protocol blocking, keyword blocking and packet filtering (Faris and Villeneuve, 2008) where the notice ‘Your connection has been reset’ is often displayed. Search filtering by Baidu uses blockpage (returning no results) and partial filtering (returning fewer than 10 results). Google’s SafeSearch filters pornographic content. Filtering of domestic search results is more complex. Unless the domestic Internet service/content provider (ISP/ICP) clearly displays a censoring notice (e.g., ‘The webpage has not passed review’ or ‘The webpage is removed for violating related national laws and regulations’), ISP/ICP censorship of search results is indistinguishable from normal causes of ‘bad links’, such as real system failure or removal of content by authors themselves. Bad links, from the average Chinese searcher’s perspective, are commonly experienced as follows: (1) webpage non-existent (website displaying content non-existent or removed notice, error page or auto-directed to homepage); (2) user auto-directed to a third-party domain registration site; (3) system failure notices (connectivity problem, server not found, connection reset).

We then compared results from Baidu and Google along the following dimensions:

1. accessibility: availability of results from the location of the query;
2. overlap: a URL appearing twice in the first 10 returned results for a given query across the two search engines (e.g., ranked first in Baidu and fifth in Google);
3. ranking: the appearance of the same URL across the two search engines in their top one, top three, and top five results, in the same or different ranking (e.g., ranked first in Baidu and third in Google);
4. bias patterns: own-content bias (favored inclusion and ranking of search engines’ own content) and other-content bias (exclusion from and lowered ranking of rivals’ sites).

Inaccessible links, recorded by Chinese researchers during archiving, are singled out for qualitative analysis to detect patterns. For overlap and ranking assessment, the author and a Chinese-speaking student in the US compared Baidu’s and Google’s results separately. We marked overlapping and same-ranking pairs. For search bias, we extracted from the dataset links that direct users to: (1) Baidu’s or Google’s own services; and (2) Chinese Wikipedia and Hudong Baike, given their open dispute.6 We used Excel’s ‘find’ function to identify matching results from the dataset. If any of the 316 queries already contains ‘Baidu,’ ‘Google,’ ‘Wikipedia’ or ‘Hudong Baike,’ the corresponding matching results are discarded from analysis.

Table 2. Distribution of Baidu’s and Google’s link accessibility.

Link category | Baidu | Google
Total number of links | 3160 | 3160
Links inaccessible from China | 171 | 400
Bad links | 125 | 91
Links inaccessible due to search engine filtering | 22 (partial filtering or blockpage by Baidu) | 10 (by Google SafeSearch auto-block)
Links inaccessible from China but accessible from the US (blocked by GFW) | 24 | 299 (among them, 180 ‘connection reset’)
Links inaccessible due to GFW blocking | 24 | 299

GFW: Great Firewall.
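The overlap and ranking dimensions defined in this section can be expressed compactly in code. The sketch below is hypothetical and was not part of the study, in which coders marked pairs by hand. It assumes two ordered lists of URLs per query and treats two results as the same only when their URL strings match exactly, so it ignores variants such as trailing slashes or tracking parameters. The function names and sample URLs are invented for illustration.

```python
def overlap_pairs(baidu, google, depth=10):
    """Return (url, baidu_rank, google_rank) for URLs found in both top lists."""
    google_rank = {url: pos for pos, url in enumerate(google[:depth], start=1)}
    return [
        (url, pos, google_rank[url])
        for pos, url in enumerate(baidu[:depth], start=1)
        if url in google_rank
    ]

def same_ranking_pairs(pairs):
    """Keep only the overlapping pairs that hold the same position in both lists."""
    return [pair for pair in pairs if pair[1] == pair[2]]

def top_k_matches(baidu, google, k):
    """Count URLs in the top k of both lists, in the same or a different position."""
    return len(set(baidu[:k]) & set(google[:k]))

# Made-up results for a single query (shortened to three links per engine).
baidu_results = ["http://a.example/1", "http://b.example/2", "http://c.example/3"]
google_results = ["http://z.example/9", "http://a.example/1", "http://c.example/3"]

pairs = overlap_pairs(baidu_results, google_results)
print(pairs)                      # overlapping URLs with both rank positions
print(same_ranking_pairs(pairs))  # overlapping URLs at identical positions
print([top_k_matches(baidu_results, google_results, k) for k in (1, 3, 5)])
```

Summing the length of `pairs` over all 316 queries and dividing by the 3160 possible pairs would give an overlap rate of the kind reported in the analysis.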

Data analysis Accessibility Among Google’s 3160 search results, 400, or 11.1%, were inaccessible from Mainland China. A total of 171 of Baidu’s 3160 search results were inaccessible, 4.7% of its total. Of Google’s 400 inaccessible links, 180 were blocked entirely by the GFW for political or whimsical reasons, while 10 were auto-blocked by Google’s SafeSearch function because the query contained pornographic references (Table 2). Of the remaining 210 inaccessible links on Google, 119 were actually accessible from the US and 91 were classified as bad links (see Table 2, Figure 2). In total, 299 Google search results were blocked by the GFW, nearly three-quarters of all Google’s inaccessible links. In contrast, among Baidu’s 171 inaccessible links, 22 were inaccessible either because a query was blocked entirely by Baidu (e.g., a query on Liu Xiaobo, dissident and Nobel peace laureate) or because Baidu returned fewer than five results for a query (results are pre-filtered to comply with state laws). Of the remaining 149 inaccessible links for Baidu, 24 were actually accessible from the US (meaning they were blocked by the GFW), leaving 125 as bad links. Altogether, the GFW censored 299 of Google’s links and 24 of Baidu’s. While users are occasionally notified by ISPs that the content was deleted due to violation of ‘relevant laws,’ such practice is by no means standardized. Our data suggests that after the Google–Beijing conflict, the GFW assumed the burden of censorship, previously enforced by Google. Qualitative analysis of the blocked topics produces a more complex picture. Among the 19 topics blocked by the GFW for Google’s results, some were politically sensitive (see Table 3, last six entries). Other topics, such as public censure of a forced installation of the Green Dam filtering software on all computers sold in China, were deemed delicate initially, but they are no longer banned. However, the blocking for the query ‘Jia

222


Table 3. Nineteen topics blocked entirely by the Great Firewall for Google. 1. ‘Jia Junpeng’ viral online 2. ‘Demolition Man’ 3. 20 pornographic literary websites closed 4. Zhou Jiugeng got 11 years in prison 5. Hangzhou Speeding Accident culprit Hu Bing interviewed 6. Wenzhou ‘House Purchase Gate’ 7. Minister Li Yizhong responds to Green Dam debacle 8. Liu Yiming arrested 9. Li Kaifu: ‘Goodbye, Google’ 10. Hu Qiheng supports real name registration online 11. BIT professor Hu Xingdou sues Beijing XinNet Co. and Suzhou web censors 12. Attention on Sichu Earthquake Anniversary press conference 13. Wu Baoquan defamation case retried 14. Sexy photos of ‘Maritime Girl’ (blocked by Google’s SafeSearch) 15. Circumvention software TOR ineffective in China 16. Economist/Activist Feng Zhenghu became Twitter star 17. Artist/Activist Ai Weiwei documentary ‘Disturbing the Peace’ 18. Internet Human Rights Declaration 19. Online ‘Yellow Ribbons’ for Liu Xiaobo Note: Events are arranged on a continuum of ‘sensitivity’ roughly from the least to the most politically sensitive.

Junpeng,’ a harmless, fabricated post left in China’s World of Warcraft calling Jia to go home to eat was quite arbitrary. In our dataset, Baidu blocked far fewer topics, its filtering much subtler. The only topic blocked entirely was Liu Xiaobo. Other topics yielding fewer than five results or many bad links largely mirror those blocked in Google. Compared to the GFW’s wholesale blocking for Google, Baidu’s filtering is more fine-tuned. A closer look reveals: (1) Baidu occasionally provides links to overseas Chinese content including, in this dataset, a link to a Voice of America article critical of Beijing, although it was blocked by the GFW; (2) more than half of Baidu’s links on critical queries were broken, mostly due to ISP content removal; and (3) accessible Baidu results for ‘sensitive’ queries (e.g., Ai Weiwei) carry political tones neutral or agreeable to authorities’ views. To re-evaluate the accessibility of blocked content, a year later, in October 2011, one of the researchers in southern China entered the same heavily blocked queries again in both Baidu and Google. Among the 19 initially banned topics for Google, only the last six (see Table 3) were kept blocked, thus reducing the GFW-blocked Google search results from 190 to 60. Baidu’s results patterns remained largely the same. Among the original nine contentious topics, two were completely blocked: besides ‘Liu Xiaobo,’ the query ‘Feng Zhenghu’ was banned this time. Keyword blocking seems subject to change depending on the political winds of the time. Overall, our data show that Google’s results are equally if not more likely to be inaccessible than Baidu’s, although for different reasons: GFW filtering for the former and bad links for the latter. The combination of cruder

filtering by the GFW and subtler filtering by Baidu, although imperfect, proves effective in blocking alternative political content in Mainland China.

Table 4. Overlapping results distribution between Baidu and Google by event.

Number of overlapping URLs for a given query    Number of events
1 pair (different ranking)                      75
1 pair (same ranking)                           22
2 pairs                                         28 × 2 = 56 (16 same-ranking pairs)
3 pairs                                         18 × 3 = 54 (6 same-ranking pairs)
4 pairs                                         1 × 4 = 4
5 pairs                                         1 × 5 = 5
Total number of overlapping pairs               216
Total number of same-ranking pairs              44
Top 1 results matched                           24

Overlap and ranking

To assess overlap and ranking, two Chinese researchers coded the data, with excellent overall inter-coder reliability: Cohen’s kappa (k) was 0.85 for overlap and 0.94 for ranking. Out of a total of 3160 pairs between the two search engines, 216 pairs of links share the same URL (blocked content not considered). This yields an overlap of 6.8%, that is, 6.8% of Baidu’s and Google’s search results share the same URLs (see Table 4).

The distribution of these 216 overlapping pairs by event is as follows: one event has five overlapping pairs; another event has four overlapping pairs; 18 have three overlapping pairs; 28 have two overlapping pairs; 22 have a single overlapping pair sharing the same ranking in Baidu and Google (e.g., both URLs are ranked fourth); and 75 have a single overlapping pair that does not share the same ranking. Altogether, 44 pairs share the same ranking across Baidu and Google. None of the event queries yields a set of results with more than five overlapping pairs (see Table 4). The event with the most overlapping pairs is a story on Chinese Internet giant Netease’s CEO, William Ding, who started a big farm in light of the deplorable food safety standards in China.

In terms of ranking similarity, 22 pairs have matched top one results; 61 have matched top three results; and 84 have matched top five results (with the same or different rankings), corresponding to 0.7%, 1.9% and 2.7% out of a total of 3160 potentially matching pairs of links. Even if one excludes inaccessible links (171 for Baidu and 400 for Google), leaving a total of 2589 potentially matching pairs of links, the percentages of ranking similarity for top one, top three and top five results remain quite low at 0.8%, 2.4% and 3.2%, respectively.
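The overlap and ranking-similarity figures above are simple ratios, and a short sketch may make the procedure concrete. The Python below is illustrative only and is not the coding procedure used in the study, which relied on human coders: the function names, the toy URLs and the assumption that each list contains no duplicate URLs are hypothetical. It matches URLs across two ranked result lists, counts same-ranking pairs, and reproduces the reported percentages from the counts given in the text.

```python
from typing import Dict, List, Tuple


def overlapping_pairs(baidu_urls: List[str], google_urls: List[str]) -> List[Tuple[int, int]]:
    """Return (Baidu rank, Google rank) for each URL found in both lists.

    Ranks start at 1. Assumes each list has no duplicate URLs.
    """
    google_rank = {url: rank for rank, url in enumerate(google_urls, start=1)}
    return [
        (rank, google_rank[url])
        for rank, url in enumerate(baidu_urls, start=1)
        if url in google_rank
    ]


def pct(count: int, possible: int) -> float:
    """Share of `count` in `possible`, as a percentage rounded to one decimal."""
    return round(100 * count / possible, 1)


def summarize(results: Dict[str, Tuple[List[str], List[str]]], per_query: int = 10) -> Dict[str, float]:
    """Aggregate overlap statistics over all events (one query per event)."""
    total = same_rank = 0
    for baidu_urls, google_urls in results.values():
        pairs = overlapping_pairs(baidu_urls, google_urls)
        total += len(pairs)
        same_rank += sum(1 for b, g in pairs if b == g)
    return {
        "overlapping_pairs": total,
        "same_ranking_pairs": same_rank,
        "overlap_rate_pct": pct(total, per_query * len(results)),
    }


# Toy data (hypothetical URLs), three results per query instead of ten.
toy = {
    "event A": (["u1", "u2", "u3"], ["u3", "u2", "u9"]),
    "event B": (["v1", "v2", "v3"], ["w1", "w2", "w3"]),
}
toy_summary = summarize(toy, per_query=3)

# Counts reported in the text: 316 events x 10 results = 3160 possible pairs.
reported_overlap = pct(216, 3160)
accessible_pairs = 3160 - 171 - 400  # excluding inaccessible links
```

With the counts from the text, `pct(216, 3160)` gives 6.8, and `accessible_pairs` is 2589, the denominator behind the 0.8%, 2.4% and 3.2% figures.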
While the study’s overlap rate of 6.8% is similar to the rate of 7.78% between Google and Baidu in Wang and Liu’s study (2007), both are much lower than the average overlap rate of 11.4% between any two search engines in Spink et al.’s study (2006).

Compared to 7% of matching top one results in Spink et al. (2006), this study’s 0.8% is almost negligible. As Wang and Liu (2007) observed, declining overlap and ranking similarity between search engines is an ongoing trend among English- and Chinese-language search engines. Various factors may have contributed to it, not least an enlarged Web, differing methods of crawling, indexing and ranking, and political filtering.

Table 5. Frequency counts of Baidu, Google, Hudong and Wikipedia services in search results.

Service                                  Baidu   Google
Baidu Zhidao (Q&A)                          20        7
Baidu Docs (document sharing)               17        5
Baidu Space (personal space, blogs)         17       72
Baidu Baike (encyclopedia)                  10       13
Baidu Tieba (post bar)                      14        0
Baidu Pictures                               6        0
Baidu Videos                                 4        0
Baidu Search                                 1        1
Baidu Services (aggregated)                 89       98
Google                                       0        0
Hudong Baike                                 1       31
Wikipedia                                    0       16
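Frequency counts such as those in Table 5 could, in principle, be tallied by mapping each result URL to the service that hosts it. The sketch below is a hypothetical illustration, not the study's coding procedure: the host names in `SERVICE_HOSTS` (other than zh.wikipedia.org, which the paper's notes mention) are assumptions, and the sample URLs are invented. The second half simply checks that the per-service counts in Table 5 add up to the aggregated totals of 89 and 98.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse

# Assumed host names for illustration only; the study does not list them.
SERVICE_HOSTS = {
    "zhidao.baidu.com": "Baidu Zhidao",
    "wenku.baidu.com": "Baidu Docs",
    "hi.baidu.com": "Baidu Space",
    "baike.baidu.com": "Baidu Baike",
    "tieba.baidu.com": "Baidu Tieba",
    "www.hudong.com": "Hudong Baike",
    "zh.wikipedia.org": "Wikipedia",
}


def classify(url: str) -> str:
    """Map a result URL to a service label, or 'Other' if the host is unknown."""
    host = urlparse(url).hostname or ""
    return SERVICE_HOSTS.get(host, "Other")


def service_counts(urls: Iterable[str]) -> Counter:
    """Count how many result URLs fall under each service label."""
    return Counter(classify(u) for u in urls)


# Invented sample URLs.
sample = [
    "http://baike.baidu.com/view/1.htm",
    "http://zh.wikipedia.org/wiki/X",
    "http://example.com/a",
    "http://baike.baidu.com/view/2.htm",
]
sample_counts = service_counts(sample)

# Per-service counts from Table 5 (Zhidao, Docs, Space, Baike, Tieba,
# Pictures, Videos, Search), in Baidu's and in Google's results.
TABLE5_BAIDU = [20, 17, 17, 10, 14, 6, 4, 1]
TABLE5_GOOGLE = [7, 5, 72, 13, 0, 0, 0, 1]
baidu_total = sum(TABLE5_BAIDU)
google_total = sum(TABLE5_GOOGLE)
```

A real pipeline would need a more careful host list and human verification, since services change domains over time.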

Bias patterns

For bias patterns, the study focuses on a search engine’s tendency to favor its own content and disfavor its rivals in linking and ranking patterns. In terms of links to Baidu’s services, Baidu and Google are on par, with 89 and 98 links, respectively. Frequency counts (see Table 5) show that of the 89 times Baidu references its own services, 20 are links to Baidu Zhidao (a Q&A service), 17 to Baidu Docs (a document sharing service), 17 to Baidu Space (its personal space, blogging service), 14 to Baidu Tieba (a community posting service) and 10 to Baidu Baike (a Wikipedia-like site). In contrast, Google results include 98 links to Baidu services: 72 to Baidu Space, 13 to Baidu Baike, 7 to Baidu Zhidao, 5 to Baidu Docs and 1 to Baidu Search.

Their comparison yields some interesting patterns. While Baidu’s 89 links to its own services are spread somewhat evenly across different types, Google refers to Baidu Space 72 times, perhaps due to Baidu Space’s popularity and the algorithmic weight Google assigned to blogging services. Neither Baidu nor Google provides any links to Google’s services. Such an absence is understandable given Google’s partial exit from China and limited provision of its services in China.

On the other hand, some intriguing patterns emerged in the Chinese online encyclopedia spaces. In our dataset, the numbers of Baidu’s references to Baidu Baike, Hudong Baike and Chinese Wikipedia are 10, 1 and 0, respectively, as opposed to 13, 31 and 16 in Google’s. At the time of our data collection in late 2010, Hudong Baike had nearly 5 million entries, Baidu Baike 3 million and Chinese Wikipedia about 300,000 (Chinese Wikipedia, 2012). At the time, Hudong had about 66% more entries than Baidu Baike. Within such contexts, our data raise questions about the disproportionately feeble presence of Hudong and Chinese Wikipedia in Baidu’s results and their prominence in Google’s.

Besides linking patterns, this study also examines ranking patterns by comparing how a search engine ranks its own services and its competitors’. In particular, ranking patterns of Baidu Baike, Hudong Baike and Wikipedia are appraised (see Table 6). When Baidu’s major services (Baidu Q&A, Baidu Docs, Baidu Space and Baidu Baike) are aggregated, their rankings in the search results of Baidu and Google are on par with each other. Overall, Baidu’s services are ranked in the top five results in both search engines about 60% of the time. However, Baidu Docs is more prominent in Baidu’s top five results, appearing about 65% of the time, compared to 40% in Google’s. Baidu Space also stands out in Google’s rankings of search results.

In addition, Baidu ranks Baidu Baike prominently. Baidu Baike appears more frequently in Baidu’s top spot (five times and 50% of the time) than in Google’s (once and 7.7% of the time). Baidu’s rankings of Baidu Baike also eclipse Google’s rankings of Hudong and Chinese Wikipedia for the top spot, 12.9% and 6.3% of the time, respectively. However, when aggregated, Baidu Baike is only slightly more noticeable in Google’s top five search results (76.9%) than Baidu’s (70%). Moreover, Google ranks Hudong Baike (12.9%) more frequently in the top spot than Chinese Wikipedia (6.3%). However, in the top five search results, Chinese Wikipedia is ranked slightly higher (75%) than Hudong Baike (61.3%).

Table 6. Baidu, Hudong and Wikipedia content ranked in Baidu and Google.

Service (search engine)            Top 1   Top 3   Top 5   Top 10
Baidu Services in Baidu *              8      23      38       64
Baidu Services in Google *            11      35      56       97
Baidu Q&A (in Baidu)                   2       8      11       20
Baidu Q&A (in Google)                  1       2       3        7
Baidu Docs (in Baidu)                  1       5      11       17
Baidu Docs (in Google)                 0       1       2        5
Baidu Space (in Baidu)                 0       4       9       17
Baidu Space (in Google)                9      26      41       72
Baidu Baike (in Baidu)                 5       6       7       10
Baidu Baike (in Google)                1       6      10       13
Hudong Baike in Baidu                  0       0       1        1
Hudong Baike in Google                 4      10      19       31
Chinese Wikipedia in Baidu             0       0       0        0
Chinese Wikipedia in Google            1       4      12       16

Note: * ‘Baidu Services in Baidu’ and ‘Baidu Services in Google’ are aggregates based on four Baidu services, excluding Baidu Post Bar, Pictures, Videos and Search.
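The percentage shares discussed in this subsection follow from the counts in Table 6: each share is the number of a service's links ranked within a given cutoff divided by its total number of links in the top ten. The sketch below recomputes a few of them as a worked check. The dictionary layout and function name are illustrative choices, not part of the study, and only a subset of Table 6 rows is included.

```python
import math

CUTOFFS = (1, 3, 5, 10)

# (service, search engine) -> counts at Top 1, Top 3, Top 5, Top 10 (from Table 6).
TABLE6 = {
    ("Baidu Baike", "Baidu"): (5, 6, 7, 10),
    ("Baidu Baike", "Google"): (1, 6, 10, 13),
    ("Baidu Docs", "Baidu"): (1, 5, 11, 17),
    ("Baidu Docs", "Google"): (0, 1, 2, 5),
    ("Hudong Baike", "Google"): (4, 10, 19, 31),
    ("Chinese Wikipedia", "Google"): (1, 4, 12, 16),
}


def share(service: str, engine: str, cutoff: int) -> float:
    """Percentage of a service's top-ten links that rank within `cutoff`."""
    counts = TABLE6[(service, engine)]
    within_cutoff = counts[CUTOFFS.index(cutoff)]
    total_links = counts[-1]  # all links appear within the top ten
    return 100 * within_cutoff / total_links


# Top-spot shares: Baidu Baike in Baidu vs. encyclopedias in Google.
top1_baike_in_baidu = share("Baidu Baike", "Baidu", 1)         # 5 / 10
top1_baike_in_google = share("Baidu Baike", "Google", 1)       # 1 / 13
top1_hudong_in_google = share("Hudong Baike", "Google", 1)     # 4 / 31
top1_wiki_in_google = share("Chinese Wikipedia", "Google", 1)  # 1 / 16

# Top-five shares in Google.
top5_wiki_in_google = share("Chinese Wikipedia", "Google", 5)  # 12 / 16
top5_hudong_in_google = share("Hudong Baike", "Google", 5)     # 19 / 31
```

The values are left unrounded on purpose: 1/16 is exactly 6.25%, and Python's built-in `round` would turn that into 6.2 rather than the conventional 6.3, so any rounding for presentation is best done explicitly.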



Discussion

Over the years, search engines have become increasingly powerful in shaping our information choices and have attracted public scrutiny. Not only is political filtering a high-profile public issue, there is also growing concern that search engines, like legacy media, may exert undue influence through mainstreaming, hyper-commercialism and consolidation (Diaz, 2008). This study has empirically examined the political, social and commercial implications of search engines in the Chinese context.

Internet filtering

This study has offered some empirical evidence that after Google moved its servers to Hong Kong, the GFW continued to police search engines in China effectively. Consequently, Google’s search results in Mainland China are a function of both Google’s algorithms and the GFW filtering system. Baidu’s results, on the other hand, are determined by its own algorithms plus self-censorship and filtering by the GFW (see Figure 2). On the surface, Baidu’s results seem more accessible than Google’s, as ‘harmful’ content is pre-filtered. A closer look at the data reveals that bad links are the main cause of inaccessibility for Baidu in China, while a considerable number of results from Google are blocked by the GFW. Compared to Baidu’s subtler filtering, the GFW’s wholesale blocking of Google’s results is much blunter. These results are consistent with a recent national survey on search engine use in China (CNNIC, 2011) where 44% of respondents reported (and objected to) the prevalence of ‘garbage information’ and bad links on Baidu while citing ‘slow response’ and ‘system breakdown’ as the Achilles’ heel for Google.

With filtering implemented through the GFW and Internet firms, traditional top-down propaganda has evolved into a two-tiered system, embodying both centralized and decentralized features. While Party propaganda was traditionally carried out by the state’s own media channels including the press, radio and TV networks, digital propaganda is characterized by decentralization where filtering is delegated to individual Internet firms. However, decentralized control does not negate certain centralized state propaganda functions. Besides maintaining the effectiveness of the GFW, the state publicity department regularly issues orders to block or remove ‘dangerous’ content (Levy, 2011). Not monolithic by any means, the two-tiered online filtering system has been largely effective, growing in parallel with the expanding Chinese Internet.
Search filtering has a particularly perverse effect on public discourse and memory, as search engines are molded to reproduce prevailing ideologies. Given search engines’ centrality in the information universe, when public discourse is removed at the whim of search firms or the state’s will, particular conversations or events, for many, practically did not happen. Blocking by search algorithm or removal of content from computer memory for censorship purposes is more than a ‘rewrite’ of the past (Hellsten et al., 2006): it is tantamount to virtual political disappearance. The loss of search memory contributes to public amnesia in such a way that makes search engines part of the new repertoire of the state’s disciplinary mechanisms (Foucault, 1980).


Search and social reality

Search engines derive their power not only from their engineering superiority and a remarkable ability to serve as an advertising platform, but also from their potential to mediate between users and social reality. As supercharged metamedia (McLuhan, 2001) of our time whose form is as important as the content it carries, search engines ‘bring the online world into focus. … [They] can redirect, reveal, magnify, and distort’ (Grimmelmann, 2010: 435). As fantastic man-made assemblages, search engines can help shape our perception of the world to produce a sense of social reality. While Berger and Luckmann (1966) posit that reality is socially constructed, postmodernists such as Boorstin (1962) and Baudrillard (1981) went further to argue that in a media-saturated world, our encounter with events and reality itself is nothing more than a simulated experience often perceived to be more ‘real’ than the event or reality itself. Search engines’ potential in shaping this simulated reality has been largely overlooked in the literature, despite frequent references to their importance in constructing and distributing knowledge (Halavais, 2008).

As this study shows, not only can search engines be architecturally altered to reproduce dominant political values, different search engines also offer different results and different realities. Although previous studies also found low overlap and ranking similarity between search engines (Spink et al., 2006; Wang and Liu, 2007), they have fallen short, arguably, by merely reporting such technical differences without probing search engines’ larger sociopolitical implications for the construction of knowledge and social realities. The not-so-small divergence between Baidu’s and Google’s search results alerts us to the arbitrariness with which search engines produce social realities, as well as the means and rules with which they determine presence and prominence, especially over controversial public issues.
If search rankings cannot be perfect, they are, at their very best, simulated approximations of a constructed ‘natural order’ of things. However, user trust in search engines tends to turn algorithmically generated reality into truth in its own right. The programmable nature of search engines makes it possible for political authorities, search firms and other powerful interest groups to shape and control social realities via search.

Search engine bias

Lately, both Baidu and Google have become targets of unfair competition complaints (Agarwal and Round, 2011; Edelman, 2011). Meanwhile, little empirical work has examined the extent to which search engines might favor their own content and discriminate against that of others. By investigating Baidu’s and Google’s linking and ranking bias patterns, this study finds Baidu rarely links to its competitors Hudong Baike or Chinese Wikipedia, while their presence in Google’s results is much more prominent. At the time of our data collection, Hudong Baike had 66% more wiki entries than Baidu Baike, yet was referenced only once by Baidu in our dataset, compared to Google’s 31 times. Chinese Wikipedia had only 300,000 wiki entries compared to Hudong’s 5 million entries and Baidu Baike’s 3 million, and yet appeared in Google results 16 times as opposed to none in Baidu. Moreover, Baidu Baike was ranked much more favorably by Baidu than by Google. These results raise search bias concerns for both Baidu and Google.



There are certainly limitations to assertions of search bias. The size of the query sample, 316, is relatively small and not entirely randomized. One may argue the relative significance of a Wikipedia-type service is not determined by the number of entries alone and may be influenced by other factors. Yet, the number of entries in a user-generated site is arguably a good proxy for its online presence and reputation. The sheer absence of Hudong and Chinese Wikipedia in Baidu’s results and their prominence in Google’s seems disproportional and questionable. It raises the possibility that both search engines might favorably treat a particular service. Charges of search engine bias face almost insurmountable hurdles, however. It is difficult to prove search bias given the myriad factors in ranking and relevance computation. Moreover, search results, interpreted as ‘opinions,’ are also currently recognized by US courts as free speech ‘entitled to full constitutional protection’ under the First Amendment (Bracha and Pasquale, 2008: 1151), despite the fact that search companies often unequivocally claim to be algorithmically objective. One may reasonably argue search ranking by definition has to discriminate (Goldman, 2006; Grimmelmann, 2010), but intentional bias or power abuse hardly seems defensible if they were the cause of harm to advertisers or users.

Methodological reflections and research limitations

Methodologically, the dynamic nature of search poses major challenges to search research. The author cautions that research results may not be generalizable.

Firstly, our sample is non-random, with an emphasis on government policies and controversial public affairs, which may have lowered link accessibility, overlap and ranking similarity, while increasing search bias patterns. Including such sensitive queries as ‘Ai Weiwei’ and ‘Liu Xiaobo’ may seem to contradict the study’s intent to reflect average users’ experience while generating more inaccessible links. However, it is entirely possible for Chinese users to search these terms. In fact, ‘Ai Weiwei’ has not been blocked in Baidu given his artistic legacies, including the Beijing Olympic Stadium. What counts as ‘sensitive’ is also subject to change. It is precisely around the edges that the social fabric of search is defined.

Secondly, the enormous size of the Web and ever-shifting search dynamics further complicate sampling procedures. This study sampled ‘older’ events in 2009 and collected data in late 2010. The dataset is more stable than that of ongoing events yet faces link ‘decay’ as Web content refreshes. So what the study captures is a snapshot of the two search engines in time. Its results may or may not hold in the future but do serve as a useful baseline for comparisons. Besides a well-defined focus, future research may include (semi-)automatic, longitudinal archiving/indexing strategies to chart the evolution of search.

Thirdly, this study does not use multiple queries for each event, given resource constraints. Moreover, search queries, derived from a critical blogger with slight modification by the author, may contain the influence of both with unpredictable results.
Multiple queries for the same event may be desirable if researchers wish to examine the impact of linguistic variations on search results with proper means to compare data across query strings, topics and search engines.


Search personalization, adopted by Baidu in September 2011 and by Google in late 2009 (Tencent, 2011), may also have an impact on the study, despite efforts to minimize it. Personalization poses particular challenges to generalized research findings. At the time of our data collection, Baidu had not yet switched on personalization, but Google presumably had, despite being outside Mainland China. Besides taking technical precautions, the study minimized the impact of personalization through data triangulation at two geographical locations. Without disclosure from search firms regarding personalization, variation patterns cannot be fully accounted for. Future research may collect large random datasets to discern search preferential patterns or use more precise, contextual measures to assess the evolution, quality or bias of search results.

Conclusion

Web search has become a basic economic and social utility in the information society, popularly viewed as objective, complete and even divine. Although the impartiality of search engines has been called into question by scholars and high-profile incidents such as Google’s clash with Beijing, political filtering is often seen as an exception to the otherwise impersonal and impartial operation of search engines. This study has argued that search engines are products of specific economic, social and political forces.

The study has contributed to a better understanding of such forces in China, filling several research gaps. It has revealed that after Google moved from the Chinese Mainland to Hong Kong, its results are equally if not more likely to be inaccessible than Baidu’s, due to filtering by the GFW and bad links. Baidu’s filtering is more subtle than the GFW’s wholesale blocking of Google’s results. Moreover, there is little overlap or ranking similarity between Baidu’s and Google’s results, implying different search engines, different results and different social realities. This study also finds Baidu rarely links to its competitors Hudong Baike or Chinese Wikipedia, while their presence in Google’s results is much more prominent, raising search bias concerns. Together, these results suggest that search engines can be architecturally altered to serve political regimes, arbitrary in rendering social realities and biased toward self-interest.

Large gaps still exist in Chinese search engine research. While various search filtering mechanisms and patterns have been identified and tested, their corrosive impact upon Chinese social life and politics is far from clear. We know relatively little about user awareness of, attitudes toward and means of coping with search filtering beyond anecdotal accounts. Neither do we know much about the actual influence of search differences and search personalization on user knowledge and perception.
In addition, future studies could compare search quality (e.g., quality of indexing, hits and usability) more systematically across various search engines in China. Baidu’s unique practice of mixing ‘promoted’ and organic results, as well as its impact on the user and the bottom line, are worth closer examination too. Almost without doubt, the ongoing anti-trust case brought against Baidu by Hudong Baike will push forward investigation into search monopoly and search bias in unprecedented ways. ‘We shape our tools, and thereafter our tools shape us’ (McLuhan, 2001: xi). This study has made a start in identifying some of the biases and implications of search engines in China, but their impact on Chinese users and society is sure to receive much more scholarly scrutiny in the years to come.

Acknowledgements

The author thanks Abbas Akhtar, founder of Vidpk.com and Acumen Fund Global Fellow, for technical consultation. Three anonymous reviewers, editors of this issue, as well as Nick Jankowski, Jon Crane, Wendy Anderson, Ashely Esarey, Ang Peng Hwa, Hu Yong, Xiao Qiang, Lokman Tsui and Teresa Harrison provided valuable feedback. Huijing Yang assisted with coding. Graduate students Yujie Liu, Lingyao Yuan, Youfei Jiang, Jingyi Chen, Lei Zhao, Xiaoxuan Wang, Ziyi Peng, Zhipeng Xun, Liuning Zhu, Shuai Zhu, Jingjing Zhao and Ruming Chen worked on early iterations of the project.

Funding

This work was partly supported by a UNC Charlotte Faculty Research Grant (FRG #1-11214).

Notes

1. Organic results are results relevant to the query, not influenced by advertisements. Both Google and Baidu adopt an open auction model where the highest bidder/advertiser gets placed most prominently. Google displays ‘ads’ and ‘sponsored links’ typically at screen top, bottom and right with a colored background, separate from organic results. Baidu displays ads in similar areas, but uses the ‘promoted’ label instead of ‘ads.’ Unlike Google, Baidu does not always place ‘promoted’ results in colored background and continues to mix ‘organic’ and ‘promoted’ results (see Figure 1).
2. Sensitive keywords are those deemed dangerous or harmful by authorities. They encompass a wide range, subject to change. For instance, ‘June 4,’ used to denote Tiananmen protests in 1989, is thought to threaten national security; ‘Tibet independence’ is thought to be harmful to national unity; and ‘sexy photo’ is often blocked for vulgarity. See Villeneuve (2008) for a partial list of ‘sensitive’ keywords.
3. Wen’s collection (http://wenyunchao.com; now defunct) was archived on April 2, 2010. The dataset is available upon request.
4. All data collectors are Chinese, college-educated, Internet-savvy professionals. They were modestly compensated. For safety reasons, their names and locations will remain anonymous.
5. Visual links (pictures, videos or maps) and sponsored (advertising) links are excluded.
6. A filtered version of Wikipedia in Simplified Chinese (http://zh.wikipedia.org) is accessible in Mainland China.

References

Agarwal M and Round D (2011) The emergence of global search engine: trends in history and competition. Competition Policy International 7: 115–132.
Bar-Ilan J (1999) Search engine results over time. A case study on search engine stability. Cybermetrics 2/3(1). Available at: http://cybermetrics.cindoc.csic.es/articles/v2i1p1.html
Baudrillard J (1981) Simulacra and Simulation. Ann Arbor, MI: University of Michigan Press.
Berger P and Luckmann T (1966) The Social Construction of Reality: A Treatise in the Sociology of Knowledge. Garden City, NY: Anchor Books.
Boorstin D (1962) The Image: A Guide to Pseudo-Events in America. New York: Vintage.
Bracha O and Pasquale F (2008) Federal search commission? Access, fairness and accountability in the law of search. Cornell Law Review 93: 1149–1209.
Brin S and Page L (1998) The anatomy of a large-scale hypertextual web search engine. Available at: http://infolab.stanford.edu/~backrub/google.html
China Internet Network Information Center (CNNIC) (2011) 2011 Chinese search engine market research report. Available at: http://h.cnnicresearch.cn/download/report/rid/34
China Internet Network Information Center (CNNIC) (2012) 30th statistical survey report on Internet development in China. Available at: http://www.cnnic.cn/research/bgxz/tjbg/201207/P020120719489935146937.pdf
Chinese Wikipedia (2012) Baidu Baike, Hudong Baike, and Chinese Wikipedia. Chinese Wikipedia. Available at: http://zh.wikipedia.org
Cho J, Roy S and Adams R (2005) Page quality: in search of an unbiased web ranking. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, 14–16 June 2005, pp. 551–562.
Couvering E (2007) Is relevance relevant? Market, science, and war: discourses of search engine quality. Journal of Computer-Mediated Communication 12(3): 866–887.
Diaz A (2008) Through the Google goggles: sociopolitical bias in search engine design. In: Spink A and Zimmer M (eds) Web Search: Multidisciplinary Perspectives. Berlin: Springer, pp. 11–34.
Edelman B (2011) Bias in search results? Diagnosis and response. The Indian Journal of Law and Technology 7: 16–32.
Faris R and Villeneuve N (2008) Measuring global internet filtering. In: Deibert R, Palfrey J, Rohozinski R et al. (eds) Access Denied. Cambridge, MA: MIT Press, pp. 5–27.
Fei W (2010) Evaluating the effectiveness of search engines’ search functions. (Doctoral dissertation). Retrieved from China National Knowledge Infrastructure (CNKI) database.
Foucault M (1980) Power/Knowledge. New York: Pantheon Books.
Goldman E (2006) Search engine bias and the demise of search engine utopianism. Yale Journal of Law and Technology 8: 188–200.
Granka L (2010) The politics of search. Information Society 26: 364–374.
Grimmelmann J (2010) Some skepticism about search neutrality. In: Szoka B and Marcus A (eds) The Next Digital Decade. Washington, DC: TechFreedom, pp. 435–459.
Halavais A (2008) Search Engine Society. Cambridge: Polity Press.
Hargittai E (2007) The social, political, economic, and cultural dimensions of search engines: an introduction. Journal of Computer-Mediated Communication 12: 769–777.
Hellsten L, Leydesdorff L and Wouters P (2006) Multiple presents: how search engines rewrite the past. New Media & Society 8: 901–924.
Hinman L (2008) Searching ethics: the role of search engines in the construction and distribution of knowledge. In: Spink A and Zimmer M (eds) Web Search: Multidisciplinary Perspectives. Berlin: Springer, pp. 67–76.
Human Rights Watch (2006) Race to the bottom: corporate complicity in Chinese internet censorship. Human Rights Watch 18:8C. Available at: http://www.hrw.org/sites/default/files/reports/china0806webwcover.pdf
Introna L and Nissenbaum H (2000) Shaping the web: why the politics of search engines matter. Information Society 16: 169–185.
Jansen J, Spink A and Koshman S (2007) Web searcher interaction with the Dogpile.com metasearch engine. Journal of the American Society for Information Science and Technology 58(5): 744–755.
Jiang M (2010a) Authoritarian informationalism: China’s approach to Internet sovereignty. SAIS Review of International Affairs 30(2): 71–89.
Jiang M (2010b) Authoritarian deliberation on Chinese Internet. Electronic Journal of Communication 20(3&4). Available at: http://www.cios.org/www/ejc/v20n34toc.htm
Jiang M (forthcoming) Internet events in China. In: Esarey A and Kluver R (eds) The Internet in China: Online Business, Information, Distribution, and Social connectivity. New York: Berkshire Publishing.
Levy S (2011) In the Plex: How Google Thinks, Works, and Shapes Our Lives. New York: Simon & Schuster.
Liu Z, Zhang F and Chen S (2010) Comparative study on search effectiveness of Google and Baidu based on user experience. Journal of Zhejiang University (Science Edition) 37(5): 605–610.
Machill M, Beiler M and Zenker M (2008) Search-engine research: a European-American overview and systemization of an interdisciplinary and international research field. Media, Culture & Society 30: 591–608.
McLuhan M (2001) Understanding Media: The Extensions of Man. 9th ed. Cambridge, MA: MIT Press.
Mowshowitz A and Kawaguchi A (2002) Assessing bias in search engines. Information Processing & Management 38: 141–156.
Pan B, Hembrooke H, Joachims T et al. (2007) In Google we trust. Journal of Computer-Mediated Communication 12(3). Available at: http://jcmc.indiana.edu/vol12/issue3/pan.html
Pariser E (2011) The Filter Bubble. New York: Penguin Press.
Qiu J (1999/2000) Virtual censorship in China. International Journal of Communications Law and Policy 4: 1–25.
Reporters without Borders (2006) Test of filtering by Sohu and Sina search engines following upgrade. Available at: http://en.rsf.org/china-test-of-filtering-by-sohu-andsina-22-06-2006,18015.html
South Net (2010) Top ten Chinese Internet events in 2009. Available at: http://news.southcn.com/c/2009-12/22/content_7381301.htm
Spink A and Zimmer M (eds) (2008) Web Search: Multidisciplinary Perspectives. Berlin: Springer.
Spink A, Jansen B, Blakely C et al. (2006) A study of results overlap and uniqueness among major web search engines. Information Processing & Management 42: 1379–1391.
Tan D, Lin M and Ye S (2005) A comparative analysis of Chinese Google and Baidu’s results ranking and retrieval efficiency. Journal of Modern Information 3: 87–89.
Taylor G (2013) Search quality and revenue cannibalisation by competing search engines. Journal of Economics & Management Strategy 22(3): 445–467.
Tencent (2011) Baidu announces new homepage: personalized search becomes reality. Tencent Technology, 7 September. Available at: http://tech.qq.com/a/20110907/000479.htm
Vaughan L (2004) New measurements for search engine evaluation proposed and tested. Information Processing & Management 40(4): 677–691.
Vaughan L and Thelwall M (2004) Search engine coverage bias: evidence and possible causes. Information Processing & Management 40(4): 693–707.
Villeneuve N (2008) Search monitor project: toward a measure of transparency. Citizen Lab Occasional paper, no. 1. University of Toronto. Available at: http://citizenlab.org/wp-content/uploads/2011/08/nartv-searchmonitor.pdf
Wang Y and Liu F (2007) Chinese search engine results overlap research report 2007. Chinese Search Behavior Research Lab, Information Management Department, Peking University, China, December 2007.
Wright J (2011) Defining and measuring search bias: some preliminary evidence. International Center for Law and Economics Paper Series, November 2011, George Mason University, Fairfax, VA.
Yahoo! Finance (2012) Get quote. Yahoo! Finance. Available at: http://finance.yahoo.com/
Zimmer M (2007) The quest for the perfect search engine. Dissertation, New York University, New York.


Author biography

Min Jiang is assistant professor of communication at the University of North Carolina at Charlotte and an affiliate researcher at the Center for Global Communication Studies, University of Pennsylvania. Her research explores the intersections of Chinese Internet politics, social activism, media policies and international relations. She is writing a book about the Chinese Internet. She has given talks at Harvard University, Oxford University, Johns Hopkins University and the French Institute of International Relations, among others. Prior to pursuing her doctoral degree in the United States, she had worked as an international news editor at Chinese Central Television and as an assistant to the director for Kill Bill I in her native country, China.