H. John Heinz III College – School of Public Policy and Management Carnegie Mellon University

Privacy Concerns and Information Disclosure: An illusion of control hypothesis

Laura Brandimarte

First Research Paper

DRAFT April 2009

Advisory Committee:
Professor Alessandro Acquisti
Professor George Loewenstein
Professor Linda Babcock

Abstract

This paper investigates one possible explanation for people's conflicting attitudes regarding the protection of private information. The proliferation of studies about crimes such as identity theft and cyber-stalking, together with the sharp increase in the number of victims and the ease with which data can be retrieved on the Web, has raised people's awareness of and concern about the consequences of revealing private information, especially on the Internet. On the other hand, the increasing popularity of online social networks and blogs suggests that more and more people are willing to reveal all kinds of information about themselves to the members of these communities. In order to understand this dichotomy, we introduce and test the hypothesis that people may confuse control over the publication of private information with control over the accessibility and use of that information by third parties. Borrowing the terminology of psychology and behavioral decision research, we refer to this hypothesis as the "illusion of control". We designed two experiments, administered as online surveys to students at Carnegie Mellon University, in which we manipulated control over information publication. The results provide empirical evidence of the illusion of control.

Keywords: illusion of control, online social networks, privacy protection, experimental design

I. Introduction

Oxford, 1952. C. S. ("Jack") Lewis, author, among other renowned writings, of the collection of children's stories "The Chronicles of Narnia", is about to meet for the first time the woman who, a few years later, will become his wife, Helen Joy Gresham. She knows the author quite well from his writings, but she is about to meet the man. She enters the crowded tea room where they agreed to meet and, after vainly sounding out a waiter about Lewis's appearance, she asks aloud: "Anybody here called Lewis?" Jack waves at her from the back of the room and, after a very polite, formal introduction, he can't help showing his unease at suddenly being recognized by all the customers present in the room: "Well, I'm not what you might call a public figure, Mrs. Gresham…" Surprised by the embarrassed reaction of the well-known writer, the feisty, direct American poet observes: "Oh, you're not… I mean, you write all these books and you give all those talks and everything just so everybody'll leave you alone?"

What could be a convincing, rational explanation for contradictory behaviors like the one described in this scene of the 1993 movie "Shadowlands"? Why would someone who wants to remain anonymous and unidentified give lectures open to the public and advertise his books in the city bookstores? C. S. Lewis wasn't doing it for money: he was a famous, successful writer, his books selling very well on both sides of the Atlantic. He probably wasn't doing it for pleasure either: if exposure to the friendly eyes of a couple dozen of his fellow citizens made him uncomfortable, a lecture in front of 200 Londoners should have made him miserable… or maybe not. What if one of the reasons for Jack's annoyed reaction to Joy's approach were the lack of control? When Jack lectures the association of Christian teachers in London, or when he goes to a bookstore to sign dedications for his readers on the first page of "The Lion, The Witch and The Wardrobe", he has previously decided to make himself physically visible and recognizable to his audience; he has voluntarily chosen to add a face to his thoughts and words. On the other hand, when he plans to meet a pen-friend in a tea room, he is not the celebrity unveiling himself to his admirers: he is just a man, a brilliant but introverted intellectual, who agrees to open up to a sharp, interesting woman, and he reluctantly finds himself under the intent gaze of other observers.

Decisions that appear inconsistent when assessed from the classical rational-agent perspective can lose their contradictory nature when analyzed in the context of psychological and behavioral models. Decisions regarding privacy protection are one such example: people are typically very concerned about protecting their lives and private information from the indiscreet eye of outsiders; yet they reveal a lot about themselves to strangers, especially on the Internet and in online social networks. These websites are virtual communities whose members can share potentially every kind of information – from name to occupation, from date of birth to phone number, from pictures and videos to tastes and preferences in music, movies and books – and they have become extremely popular all over the world. Indeed, many users of online social networks compile very detailed profiles of themselves, containing a great deal of private information. But what if they didn't have control over its publication? What if the exact same information had been published by another party?

Consider, specifically, the users of Facebook. They erupted in what Solove (2007) calls "an enormous outcry" when, in 2006, the website launched a feature called News Feed, which displayed, on the first page users saw after logging in, all the latest activities of their "friends": whether they changed their relationship status from "single" to "engaged", whether they posted new pictures or new videos, whether they became "friends" with somebody, and so on. All this information was accessible even before the introduction of News Feed, of course, but one would only see it by exploring each friend's profile individually. Nonetheless, users reacted furiously to the introduction of this feature, suggesting that their privacy concerns strongly depended on the type of information revelation system enacted: a "pulling" model, where one has to go and search for information if one is interested, is perceived as less privacy-invasive than a "pushing" model, where the same amount of information is provided by default, without one asking for it or putting any effort into searching for it. As noted in Boyd (2008), "participants had to shift their default expectation that each action would most likely be unnoticed to an expectation that every move would be announced".

Lack of awareness of the consequences of such all-inclusive information sharing is probably not a plausible explanation for this behavior. First of all, a quick look through the online versions of some major newspapers suggests that the dangers of revealing private information on the web are well advertised and known among Internet users1. Information on identity theft is also easily retrievable: for example, Wikipedia explicitly lists "Browsing social network (MySpace, Facebook, Bebo etc) sites online for personal details that have been posted by users"2 as one of the techniques used by criminals to obtain personally identifiable information (PII) and thus steal their victims' identities. Facebook's own statistics say that the average user has 120 friends on the site; that more than 3.5 billion minutes are spent on Facebook each day (worldwide); that more than 20 million users update their statuses at least once a day; that more than 4 million users become fans of Pages each day – revealing their preferences for celebrities from the worlds of music, sports and show business; that more than 850 million photos and 8 million videos are uploaded to the site each month; and that more than 1 billion pieces of content (web links, news stories, blog posts, notes, photos, etc.) are shared each week. A massive flow of more or less private information that users voluntarily make visible and available on the web. But do they realize that? When deciding whether to make that information visible to others, were they considering the fact that they were also making it available for use by others?

This paper addresses the question of why people are willing to disclose so much private information, and particularly PII, if, at the same time, they express serious concerns about privacy protection. More specifically, we test the hypothesis that one of the psychological mechanisms that urges people to expose themselves to such a large extent is an illusion of control over the information they reveal: since they have control over the publication of their private information, they believe they also have control over the accessibility and use of that information by others. Privacy and control are multifaceted concepts that require careful definition: the level of perceived privacy is certainly related to the amount of control that one has over one's own private information, and this interaction has already been recognized in the literature. But what does control over personal information mean? Is it control over the way in which the information is communicated or circulated? Is it control over the accessibility of data? And which data, specifically?

1 "Beware of Facebook 'Friends' Who May Trash Your Laptop", The Wall Street Journal, January 29, 2009; "Phishing becoming more common on 'trusted' sites – Be skeptical when you're sent a link in an e-mail or posting", The Chicago Tribune, January 8, 2009; "Koobface computer virus attacks Facebook users", The San Francisco Chronicle, December 6, 2008; "How to Friend Mom, Dad, and the Boss on Facebook...Safely", The New York Times, January 30, 2009.
2 http://en.wikipedia.org/wiki/Identity_theft#Techniques_for_obtaining_personal_information

Are privacy-related decisions "rational" – in the economic sense of the word? Do people thoroughly examine the pros and cons of publishing information and act accordingly? Analyzing the processes of decision making and understanding people's motives for acting in a certain way, leading to outcomes that might sooner or later be regrettable, is challenging and interesting in its own right, but it is also pivotal from a policy perspective: it is the first step towards the design and implementation of policies that can guide people to more efficient outcomes. The role of law is one example: given people's attitudes and motivations, sometimes incoherent, and given the unprecedented ease of collecting and disseminating data around the globe in today's "information age", is there anything we can do to protect privacy? Quoting Solove (2007), "what can and should the law do? […] take a libertarian approach and remain as 'hands off' as possible[?] […] adopt an authoritarian approach and attempt to radically limit the ability of people to spread information on the Internet[?] Or […] take some middle-ground approach between these extremes[?]"

The remainder of this paper is structured as follows: Section II provides an overview of the existing literature on privacy, control, and different explanations of people's contradictory privacy-related decisions. Section III presents our hypothesis and describes the empirical methodology we used to test it. Section IV presents and explains the results of our tests. Section V concludes with final remarks and suggestions for future work.

II. Related literature

Traditionally, the concept of privacy of information has overlapped with the concept of control. Quoting Alan Westin (1967), "privacy is the claim of individuals, groups, or institutions to determine for themselves when, how and to what extent information about them is communicated to others". Along these lines, Miller (1971) says that "the basic attribute of an effective right of privacy is the individual's ability to control the circulation of information relating to him". Fried (1984) defines privacy as "not simply an absence of information about us in the minds of others, rather it is the control we have over information about ourselves". And Elgesem (1996) believes that "to have personal privacy is to have the ability to consent to the dissemination of personal information". According to Lessig (2002), privacy, similarly to copyright, is a way of controlling information: "Just as the individual concerned about privacy wants to control who gets access to what and when, the copyright holder wants to control who gets access to what and when".

The digital age and the Internet have certainly reduced our ability to control the flow of information about ourselves. Solove (2007) notices that "with blogs and social network sites […] there will be more instances when information we want to keep on a short leash will escape from our control". The paradox of the Internet is that, on the one hand, it gives people an extraordinary freedom of expression and communication, to an extent never possible before; on the other hand, the Internet also constrains people, because it makes their private information more likely to be diffused in ways that can thwart future opportunities. "As people use the freedom-enhancing dimensions of the Internet, as they express themselves and engage in self-development, they may be constraining the freedom and self-development of others – and even of themselves" (Solove, 2007).

Does this reduced control necessarily imply reduced privacy? In contrast with the literature referred to above, which defines privacy in terms of control, Tavani and Moor (2001) state that "the concept of privacy itself is best defined in terms of restricted access, not control. Privacy is fundamentally about protection from intrusion and information gathering by others. Individual control of personal information, on the other hand, is part of the justification of privacy". This definition is probably the most appropriate in this context: distinguishing privacy from control helps us understand how having control doesn't necessarily mean having privacy, and vice versa. We don't have control when we have to reveal our private information to the Census Bureau, but we certainly expect the government to have high regard for our right to privacy. On the other hand, we have control when we create our profile on Facebook, but we accept the idea that the information we post is no longer private.

Some studies have empirically addressed the relationship between privacy and control. Acquisti, John and Loewenstein (working paper) use field and hypothetical experiments to analyze the discrepancy between the amount of money that people are willing to pay to protect their private information – keeping control of it and thus not having to make it accessible to third parties – and the amount they are willing to accept in order to reveal it – losing control of it and allowing third parties to access it.

In September 2008 the Consumer Reports National Research Center published the results of a Consumer Reports Poll3 showing that most Americans are very concerned about what is being done with their personal information online and want more control over how their information is collected and used. Norberg et al. (2007), focusing on the same topic addressed in this paper, empirically test what they call the "privacy paradox": the discrepancy between individuals' stated intentions to disclose private information and their actual disclosure behaviors. They show that consumers voice concerns that their rights and their ability to control their personal information in the marketplace are being violated. However, despite the complaints, it appears that consumers freely provide personal data.

In the psychological literature, the illusion of control is defined as a cognitive bias by which one is convinced that one can influence an event through one's behavior, when in fact one has no power to affect the outcome. Langer (1975) refers to the tendency of people to "behave as though chance events are subject to control". She emphasizes the fact that people have trouble distinguishing cases where skill is necessary for success from instances where success relies exclusively on chance, a common feeling that is also captured by proverbs like "God helps those who help themselves" or "Fortune favors the bold". This heuristic has been detected in several experiments, which provide significant supporting evidence of the illusion of control, not only when the outcome depends entirely on chance but also when it is contingent on somebody else's behavior. For instance, Sanderson et al. (1989) tested the effect of illusion of control on anxiety and panic attacks caused by inhalation of CO2-enriched air. Compared with patients who believed they had control, patients who believed they could not control the CO2 administration reported a greater number of panic attack symptoms, rated the symptoms as more intense, reported greater anxiety and were significantly more likely to report panic attacks. As another example, Fenton-O'Creevy et al. (2003) report on a study of 107 traders in 4 investment banks in London testing – and finding evidence in support of – the hypothesis that traders with low or moderate propensity to illusion of control perform better than those with high levels: illusion of control is inversely related to performance.

3 http://www.consumersunion.org/pub/core_telecom_and_utilities/006189.html, accessed on April 19, 2009.

One of the most cited studies regarding illusion of control is Henslin (1967), where the author observes that dice shooters seem to believe in "the principle that a hard throw produces a large number, and a soft or easy throw produces a low number. […] Other shooting techniques that are believed to maximize control over the outcome of the dice involve evidencing concentration and effort […] which consists of taking one's time in shooting, of working on it and of talking to the dice".

What we mean by "illusion of control" in the context of privacy is the belief, in the publisher's mind, that direct publication of private information implies control over access to and use of that information by third parties. The argument of this paper is that, even after the individual makes this information about himself accessible to the members of online communities (or even to the larger universe of Internet users), he suffers from an illusion of control over it. Even though he is perfectly aware that the information he posts on his profile or on a blog becomes available to his friends (or to everyone on the Internet), he unconsciously assumes nobody will use it without his authorization, for example by publishing it on web pages outside the social network, or by using it for malicious purposes. The simple fact that the individual is personally responsible for the publication of some information, and has voluntarily and consciously decided to disclose it, makes him feel endowed with the power of controlling it. We believe this is particularly true when information disclosure takes place through information technologies, because the intermediation of IT makes the action of "access" by others seem remote and intangible. This perception of control makes the individual feel comfortable revealing private information, possibly more information than would be optimal and more than he would reveal if he weren't subject to this illusion.

Illusion of control is only one of the mechanisms that help explain apparently inconsistent privacy-related decisions. Other concurring explanations for people's willingness to reveal private information can be derived from the literature and include: misestimation of probabilities (prospect theory), trust, hyperbolic time discounting and immediate gratification, and perfectly rational models of decision making.

Kahneman and Tversky (1979), in their seminal paper on prospect theory, derive a model of decision under risk that is an alternative to classical expected utility theory. As part of that model, they define the value that people attach to a risky prospect as a function of the subjective value of the possible outcomes of the prospect and of an associated weighting function that is related to objective probabilities but is not exactly equal to the probability of each outcome happening. Essentially, the authors claim that people tend to underweight or overweight objective probabilities, and what happens at the extremes of the probability distribution is not clear: people could either underestimate or overestimate very small and very large probabilities. For example, small probabilities of disaster are sometimes entirely ignored (Kunreuther, 1978). The 2007 Identity Fraud Survey Report4, released by Javelin Strategy and Research in February 2007, estimates that 8.4 million US adults were victims of identity fraud that year, corresponding to "only" approximately 2.8% of the US population5. If people underestimate the probability of becoming victims of identity theft, they might consequently underestimate the importance of identity theft preventive actions, such as avoiding revealing private information in online social networks.

Trust in these kinds of websites could also help explain willingness to reveal information. As discussed in Acquisti and Gross (2006), the Facebook is perceived "as a closed, trusted, and trustworthy community" by its users, which might justify the impressively detailed nature of the profiles created therein. Moreover, "the easier it is for people to join and to find points of contact with other users (by providing vast amounts of personal information, and by perusing equally vast amounts of data provided by others), the higher the utility of the network to the users themselves", and therefore the higher the incentive to add as much information as possible to one's own profile.

Another contributing factor could be people's preference for immediate gratification and hyperbolic time discounting. Acquisti (2004) shows how a perfectly rational individual who strongly values privacy might end up revealing more information than would be optimal for him because of self-control problems and time inconsistencies in his optimizing behavior: what appears to be the best action now may not be the best action once the time for taking it actually arrives. These apparent contradictions are due to hyperbolic time discounting, the fact that people tend to value what is close in time much more than what is distant in time, and that the discount factor they apply to future events is not constant (for a complete review of the literature on hyperbolic time discounting see Frederick, Loewenstein and O'Donoghue, 2002). Applying this result to our framework, the immediate benefits of sharing private information in online social networks and the discounted value of preserving privacy (which, on the other hand, might only appear much further away in the future) will be weighed using different discount factors, possibly leading to suboptimal choices – revelation of too much private information.

In that same paper (Acquisti, 2004), the author also presents a model of rational choice in privacy decisions, where he shows that what we have been referring to here as revelation of "too much" information might indeed be the optimal level of revelation, given a certain utility function depending on some expected costs and benefits of private information disclosure. Of course, it is always possible to define a model in which rational optimization leads to the actual, observed choice and behavior, but, as Acquisti notices, "it is unlikely that individuals can act rationally in the economic sense when facing privacy sensitive decisions". Our contribution goes precisely in this direction: motivating privacy-related decisions by abstracting from classical economic rational-agent models and borrowing concepts from the psychology and behavioral economics literature to solve the "privacy puzzle".

4 http://www.javelinstrategy.com/uploads/701.R_2007IdentityFraudSurveyReport_Brochure.pdf
5 This estimate is based on population projections provided by the US Census Bureau and published at http://www.census.gov/popest/national/national.html

III. Experimental design

In order to test for illusion of control in online social networks, we ran two randomized experiments, both of them survey-based. The questions contained in the survey, distributed among CMU students, were essentially the same across the two experiments (we only added a few questions about demographic characteristics in the second experiment and slightly changed the wording of two other questions) and asked about students' life in Pittsburgh and on campus (the list of questions is in the appendix). The first experiment was run in March 2008 and the second one in March 2009. Ostensibly, the justification for the survey was the creation of a new CMU networking website that would be launched at the end of the ongoing semester, and we were looking for students who were willing to become members of the network. Some questions asked about PII, others were privacy intrusive, and others were not privacy intrusive and were asked just with the intent of making the networking-website story more credible. For both experiments, when subjects clicked on the link to the survey, they were randomly redirected to one of two conditions, the manipulation between the two conditions being the amount of control participants had over the publication of their private information.
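As an illustration, the random redirection step could be implemented along the following lines. This is a minimal sketch under our own assumptions; the function name, condition labels and URLs are hypothetical and not part of the original survey software.

```python
import random

# Hypothetical survey URLs for the two conditions: in one, subjects' answers
# build their profile directly; in the other, a researcher publishes it.
CONDITION_URLS = {
    "self_publish": "https://example.edu/survey?cond=self",
    "researcher_publish": "https://example.edu/survey?cond=researcher",
}

def assign_condition(rng: random.Random) -> str:
    """Return the survey URL for a randomly assigned condition."""
    condition = rng.choice(sorted(CONDITION_URLS))
    return CONDITION_URLS[condition]

# Each incoming click on the recruitment link gets an independent draw.
print(assign_condition(random.Random()))
```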

First experiment

The first page of the questionnaire contained three lines of instructions (the exact phrasing can be found in the appendix), explaining that none of the questions required an answer, but that all the answers provided would become part of a profile that would appear on a new CMU networking website under construction, accessible to the CMU community only (students, professors, staff). In one condition, subjects were told that a profile would be automatically created for them, containing the information they provided, and that this profile would be published online once the website was completed, without any intervention by the researcher. In the other condition, they were told that a researcher would collect the data, create a profile for them and publish it on the network. Notice that the manipulation was very subtle, since in neither condition did the subjects actually see the resulting profile, which in fact wasn't truly being created. In the first condition, though, people should have felt more control over their information: they decided exactly what to publish, if they wanted to publish anything at all. In the second condition, on the other hand, the presence of an unknown "researcher" should have made them feel less in control of their information: if they decided to disclose it, they couldn't be perfectly sure of what was going to happen to it, because it ended up in the possession of a third party. This scenario was designed to emulate the condition of online social network users, who can obviously decide if and what to reveal, but might not realize that, once that decision is implemented, they cannot decide how third parties are going to use the revealed pieces of information.

Except for the control manipulation, the surveys were otherwise identical: 38 questions requiring approximately 5 minutes for completion. Of course, the website didn't really exist, so in order to make the setup credible, some questions regarded the everyday life of students at CMU, as one would expect given the nature of the website: what program they were enrolled in, what courses they were taking, how satisfied they were with their program, whether they practiced any sport on campus, and so on. The questionnaire included open-ended, multiple choice and rating questions. There were two main variables of interest: whether the subject decided to answer or not – and in particular whether he/she answered PII and more privacy-intrusive questions – and whether the subject admitted certain sensitive behaviors. We included 3 privacy-invasive questions, regarding cheating at school and on one's partner, 2 mildly intrusive questions regarding the participant's romantic relationships and 7 questions about PII. Technically, the information requested did not consist of PII but of quasi-identifiers: pieces of information that, taken by themselves, do not allow for unique identification, but that do when combined6.

We were especially interested in evaluating the effect of the treatment (control manipulation) on the response rate to these sets of questions. If indeed people suffer from illusion of control, they should be willing to reveal more information in the first condition, where they feel personally responsible for the publication of that information. The fact that, in the second condition, there is a third person between them and the online publication should negatively affect their propensity to reveal private information. If subjects reveal that information in the first but not in the second condition, it means that it is not the publication of their private information per se that disturbs them, but the fact that someone else will publish it for them. One possible confounding factor in this design is that people may be less willing to reveal private information in the second condition not only because they have less control, but also because they don't trust the unspecified "researcher": he may not report the information correctly, he may not store it securely, he may use it maliciously. All this is a consequence of lack of control, of course, but we can't exclude that lack of trust has a direct effect on willingness to reveal information, not mediated by control.

Second experiment

In order to eliminate the confounding factor described above, we ran a second experiment, essentially identical to the first one in terms of design: same type of survey, minor phrasing modifications to a few questions, the addition of 2 questions (gender and age in years), a random sample from the population of CMU students, and the same recruitment methods; what we changed was the control manipulation. The first condition remained unaltered with respect to the first experiment, while in the second condition participants were told that a 50% subset of the profiles created would be randomly picked and published on the new CMU networking website. This way, participants had control over the information revealed, but not over the information published. One might expect them to reveal more in the second condition, because the probability of their private information being published – and therefore visible and available to others – was halved in that case. If, on the other hand, their decision was based on the level of control that they had over the information published on the network, then they should have been willing to reveal more in the first condition: the feeling of control decreases privacy concerns, which leads to greater willingness to reveal sensitive information.

6 According to Sweeney (2002), 87% of the US population is uniquely identified by date of birth, gender, and 5-digit ZIP code.
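As a toy illustration of footnote 6's point – that individually innocuous attributes can become identifying once combined – the following minimal sketch, using made-up records rather than our survey data, counts how many records are unique on the combination of date of birth, gender and ZIP code:

```python
import pandas as pd

# Made-up records for illustration only; none of these come from the survey.
people = pd.DataFrame({
    "dob":    ["1987-03-02", "1987-03-02", "1990-11-15", "1990-11-15"],
    "gender": ["F", "M", "F", "F"],
    "zip":    ["15213", "15213", "15213", "15232"],
})

# Each attribute alone is shared by several records...
print(people["zip"].value_counts())

# ...but the combination singles out every individual in this toy dataset.
combo_counts = people.groupby(["dob", "gender", "zip"]).size()
unique_share = (combo_counts == 1).sum() / len(people)
print(f"{unique_share:.0%} of records are unique on (dob, gender, zip)")
```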

IV. Results

First experiment

For consistency with the terminology of experimental economics, we now refer to the first condition in both experiments as the control condition, which in our context happens to be the one where subjects have more control over the publication of private information; and we refer to the second condition as the treatment condition, where the treatment consists in depriving subjects of control over information publication. This manipulation allows us to separate subjects' feeling of control over revelation from their feeling of control over publication of private information, which according to our hypothesis are two concepts that people usually conflate, leading to a greater propensity to disclose.

In the first experiment, 29 subjects took the survey in the control condition and 32 took the survey in the treatment condition. The survey contained 38 questions: 7 quasi-identifiers – first name, last name, date and place of birth, email and home address, phone number –; 6 questions that we considered privacy invasive – whether the subject was married, whether he/she had a girlfriend/boyfriend, whether he/she ever cheated on her/him, whether he/she cheated on exams or homework, whether he/she saw somebody else cheating and, in that case, whether he/she informed the instructor –; and 25 non-intrusive questions that asked, among other things, about sports practiced, membership in fraternities, sororities or similar university associations, the type of accommodation in which the subjects were living, how often they saw their family, whether they had a job, and so on. 16 questions had open-ended responses, 14 questions had binary – yes/no – responses, 5 were multiple choice questions and 3 were rating questions.

The average response rate (percentage of questions answered, averaged across subjects) was 84% in the control condition and 71% in the treatment condition. This might seem an unexpectedly high response rate, given the little information that participants received about the study, but it can be misleading, given that most questions were not privacy intrusive or quasi-identifiers. The two lowest response rates in the control condition correspond to the questions regarding home address (with an average response rate of 45%) and phone number (48%). The same two questions have, respectively, the third lowest (50%) and lowest (37.5%) response rates in the treatment condition, with the second lowest (41%) corresponding to one of the 3 privacy-intrusive questions, asking subjects whether they ever informed their teacher when they saw somebody cheating during an exam or on a project or homework assignment. Figure 1 shows the percentage of subjects answering each of the questions in the two conditions.

Figure 1: Percentage of subjects answering each question in the control condition (blue, 29 subjects total) and in the treatment condition (red, 32 subjects total). [Bar chart; the horizontal axis lists the 38 survey questions, from "First Name" to "Job", and the vertical axis shows the percentage of subjects answering, from 0% to 100%.]
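The summary statistics just reported can be reproduced from a long-format response table. The sketch below is illustrative only, assuming a hypothetical file and column names ('subject', 'question', 'answered' coded 0/1, 'treatment' coded 0/1) rather than the actual analysis scripts:

```python
import pandas as pd

# Hypothetical long-format data: one row per subject-question pair.
df = pd.read_csv("survey_responses.csv")

# Average response rate: share of questions answered, averaged across
# subjects, separately by condition (reported above as 84% vs. 71%).
per_subject = df.groupby(["treatment", "subject"])["answered"].mean()
print(per_subject.groupby(level="treatment").mean())

# Per-question response rates by condition: the data behind Figure 1.
per_question = df.groupby(["treatment", "question"])["answered"].mean().unstack(0)
print(per_question.sort_values(by=0))
```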

In order to test for illusion of control, we performed a Pearson's chi-square test of independence and we applied a regression approach. For the latter, we used a dummy variable – equal to 1 if the question was answered and zero otherwise – as the dependent variable, and a treatment dummy variable – equal to 1 if the subject was in the treatment condition and zero otherwise – as the explanatory variable. Since the treatment was randomly assigned, we didn't need to control for any effect other than the treatment. We didn't have information on the demographic characteristics of the subjects (they were all encrypted due to confidentiality issues), but we have no reason to believe that subjects should systematically differ across groups; therefore the answers of the subjects in the treatment group can be interpreted as the counterfactual answers that subjects in the control group would have given, had they been exposed to the treatment.

We first considered the response rate separately for each question and regressed it on the treatment variable. The equation of interest is

$q_{ij} = a_j + b \cdot treatment_i + \varepsilon_{ij}$,

where $i \in \{1, \dots, 29\}$ for the control group and $i \in \{1, \dots, 32\}$ for the treatment group indexes the subject, and $j \in \{1, \dots, 38\}$ indexes the question. A negative coefficient on treatment would suggest that people are less willing to reveal private information if they are not personally responsible for its publication (so it would suggest that, when revealing and publishing private information, people suffer from illusion of control).

Using a simple linear probability model, we estimated the equation above with OLS and, in order to take into account the binary nature of the dependent variable, we also estimated a Probit model. For the latter specification, the classical interpretation of an unobserved "latent" variable and a corresponding observed indicator variable applies: we don't observe to what extent people suffer from illusion of control (call this unobserved continuous variable $q^*$), but we observe whether, when provided with that illusion (control condition), they are more or less willing to reveal private information. Therefore $q_{ij}$ will be equal to 1 if people suffer from illusion of control ($q^* > 0$) and zero otherwise. Assuming that the error term in the equation for the unobserved variable is normally distributed, we obtain a standard Probit model. The results are similar in the two specifications, both in terms of sign and magnitude of the effect of the treatment, so we report only the coefficients of the linear probability model, which have a straightforward interpretation as marginal effects of the treatment on the willingness to reveal private information.

Supporting our hypothesis, except for question 14, which asks whether the subject practices any sport on campus, the treatment effect is significant only for quasi-identifiers or for intrusive questions (last name, email address, having a girlfriend or boyfriend, cheating at school, seeing someone else cheat, informing the teacher about it). For brevity, Table 1 only includes the questions with a significant coefficient7. Our explanation for the significant coefficients on questions 36 to 38 is that asking for a self-evaluation of one's own competitiveness on the market after graduation could make subjects feel uncomfortable, because, especially if very optimistic, this evaluation might be seen by their friends (who, they believe, will be able to access their profile) as an expression of arrogance. Similarly, questions about the number of hours spent studying and about having a job might be perceived as intrusive by some participants in this context, because they believe that their answers will be seen (among others) by their professors, who could infer their level of commitment to school from these answers. In order to substantiate this interpretation, we plan to ask for an independent evaluation of the questions' level of invasiveness by subjects who did not participate in the experiment.

A two-tailed test of two proportions shows that the percentage of subjects answering "No" to the question about cheating on exams or homework assignments does not change significantly (10% level) between the control (81.5%) and the treatment condition (75%), so it doesn't seem to be the case that the introduction of the treatment pushed subjects to admit these behaviors more. The question regarding cheating on the partner is the only intrusive question for which the treatment effect is not significant (p-value for chi-squared test = .14), even though the response rate drops from approximately 90% to 75%. As we can see from Table 1, all the coefficients are negative, supporting our hypothesis that people reveal less privacy-intrusive information if they do not feel directly responsible for its publication. On the other hand, the effect of the treatment on other non-intrusive questions is not statistically significant – again, except for questions 14 and 36 to 38.

7 Questions 8 to 12 and 15-16 had a 100% response rate in the control condition; since there is no variation within the group, the treatment variable could perfectly predict the dependent variable for the whole group.

Table 1. OLS coefficients of the regression of response rate on treatment – first experiment. * indicates significance at the 10% level; robust standard errors in brackets; chi-squared statistic and corresponding p-value.
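A minimal sketch of this per-question analysis, under the same hypothetical long-format table used above, reproducing the logic of the linear probability model and the chi-square test rather than the actual estimation code:

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2_contingency

df = pd.read_csv("survey_responses.csv")  # hypothetical file, as above

for q, sub in df.groupby("question"):
    # Questions answered by everyone in one group have no within-group
    # variation, so the treatment dummy would perfectly predict the outcome
    # (see footnote 7); skip them.
    if sub.groupby("treatment")["answered"].nunique().min() < 2:
        continue

    # Linear probability model: answered ~ treatment, robust (HC1) errors.
    lpm = smf.ols("answered ~ treatment", data=sub).fit(cov_type="HC1")

    # Pearson chi-square test of independence on the 2x2 contingency table.
    table = pd.crosstab(sub["treatment"], sub["answered"])
    chi2, p_value, _, _ = chi2_contingency(table, correction=False)

    print(f"{q}: b = {lpm.params['treatment']:.3f}, chi2 p = {p_value:.3f}")
```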

In order to evaluate the overall effect of illusion of control on willingness to reveal private information, we also used a panel approach: 38 observations per individual, for a total of 2,318 answers. A pooled-OLS estimation, with adjusted (clustered) standard errors, and a random effects Probit estimation were performed; the results are summarized in Table 2.

Table 2. OLS and random effects Probit coefficients of the regression of response on treatment. * indicates significance at the 10% level, ** indicates significance at the 5% level; standard errors in brackets, clustered for the OLS regression.
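The pooled estimation with clustered standard errors could be sketched as follows, again under the same hypothetical data layout. Note that statsmodels does not ship a random-effects Probit, so the sketch substitutes a pooled Probit, which is enough to check the sign of the effect:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey_responses.csv")  # hypothetical file, as above

# Pooled OLS over all subject-question observations (38 per subject),
# with standard errors clustered at the subject level.
pooled = smf.ols("answered ~ treatment", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["subject"]}
)
print(pooled.summary())

# Pooled Probit: the coefficient is not a marginal effect (see footnote 8),
# but its sign is directly interpretable.
probit = smf.probit("answered ~ treatment", data=df).fit()
print(probit.params["treatment"])
```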

The results suggest that people suffer from illusion of control: the coefficient in both regressions is negative and significant8, indicating that when people do not feel personally responsible for the publication of their private information, they tend to reveal less. Nevertheless, there are some potential confounding factors that might lead to overestimation of the treatment effect. First of all, the presence of an unspecified researcher in the treatment condition may have pushed subjects to reveal less because of lack of trust, rather than, or besides, lack of control: subjects might think that the researcher will not report their answers correctly, or that he/she may not guarantee the confidentiality of the answers provided, and so on. The second experiment solves this problem.

Lack of awareness

Another confounding factor that deserves careful evaluation is the possibility that some subjects may be unaware of the consequences of disclosing certain types of private information, particularly personally identifiable information. Even though, as we noted in the introduction, Internet users should be familiar with the risks of such behavior, given the great amount of resources available on newspapers' websites and online encyclopedias, we can't rule out the possibility that the subjects who participated in the experiment were not informed. One mitigating factor, though, is that 18 out of the 30 subjects who answered the questions regarding program and courses attended said they were enrolled in programs that fall within the area of computer science and information systems (Computer Science, Software Engineering, Electrical and Computer Engineering, Human Computer Interaction, Information Systems, Carnegie Institute of Technology). This makes the assumption of unawareness harder to justify. We should also mention that most students at CMU are familiar with the use of the Internet and online social networks – 48 subjects actually said that they were members of the Facebook – which should reduce the relevance of this confounding factor even more.

Even if one wanted to be conservative and assume that some online social network users do not know about the consequences of sharing PII with their "friends", one still has to consider that awareness doesn't always explain people's behavior: tobacco smoking and drug use have been scientifically proven to damage people's health and cause addiction, and yet many people consume them. Similarly, everybody recognizes that waste separation and recycling are environmentally friendly practices, but not everybody adopts them. Knowing about the risks of disclosing private information doesn't prevent everybody from doing it. There must be something else behind people's decisions: rational awareness often has only a short-term effect, and its absence can't fully explain inconsistencies in decision making.

Acquisti and Gross (2006), among the several issues addressed in their study, analyze the changes in people's behavior in the context of the Facebook online social network after exposing them to privacy-related information. The authors find that the number of people who actually change their behavior (modify the privacy settings of their profile or even abandon the network completely) after exposure to privacy-related information is small, so even though the overall difference in the information revealed before and after the exposure is statistically significant, the behavior of individual users does not show a significant change. These results are explained by the fact that those few people who changed their profiles changed them dramatically, suggesting some sort of bimodal distribution of privacy awareness, where some people are very privacy sensitive (those whom Ackerman et al., 1999, called "fundamentalists") and others are not sensitive at all ("marginally concerned"). From a policy perspective, this is quite discouraging, since it implies that trying to make people aware of the risks they run by revealing their private information does not always affect their behavior. It would be interesting to check the current distribution of privacy fundamentalists and marginally concerned users. Our sense is that the percentage of people who are very privacy-aware has grown in the last three years (since the time the study by Acquisti and Gross was run): according to Facebook's Chief Privacy Officer, the percentage of members who have ever changed their privacy settings, 1% at the time, is 20% today9.

Leading to similar results, Spiekermann, Grossklags and Berendt (2001) conducted an experiment on people's privacy concerns and attitudes in the context of online shopping. They find that even the most privacy-aware and concerned subjects reveal a lot of private information, regardless of its relevance to the product being bought. This result is quite daunting, especially considering that in this study people were asked to sign a consent form allowing their data to be sold to an unspecified third party. Indeed, it seems that imperfect information is not the main reason for people's choices concerning privacy.

8 The Probit coefficient estimate doesn't tell us the marginal effect of the treatment, but the sign is the same.

9 This is what Randall Stross reports in his New York Times article of March 7th, 2009: http://www.nytimes.com/2009/03/08/business/08digi.html

In April 2008, Gemalto, a US company in the digital security business, released the results of a survey10 conducted on their behalf by TNS Sofres, a market research information provider. Only 22 percent of the subjects felt "very good" about the security of the digital technology they use: if, as the company claims, the sample is representative of the US population, these results suggest that the vast majority of Americans remain wary. Identity theft topped the list of their fears at 74 percent, followed by online bank account hijacking at 44 percent. Twenty-one percent of respondents had already suffered from bank data theft attempts, and 11 percent had had their identity or personal data stolen. In December 2008, Unisys published a report on their Global Security Index11, a measure of public perception of major security issues in 13 countries, including the US. After bankcard fraud, identity theft is reported as the second greatest area of concern across all countries, being the first or second highest concern in 10 of them.

So it seems that, when asked about their privacy concerns, people react by showing high apprehension and awareness of the risks that the digital age has brought about in this respect. However, the way they behave is not consistent with this assertion: they would trade their DNA for a Big Mac (Shostack, 2003), they reveal all sorts of private information to strangers (Acquisti and Grossklags, 2003), and they create amazingly detailed profiles of themselves in online social networks. Typical users of social networks put together and post demographic data, photographs, information about their tastes, their contacts, their education, their jobs and so on, and only a fraction of them restrict the accessibility of their profiles to known "friends".

Second experiment In this experiment we added questions about gender and age (two quasi-identifiers), so that we could include some demographic characteristics in our analysis. With this addition, the survey contained 40 questions and the distribution of questions by type was the same as the one in the first experiment, except that here we have 9 rather than 7 quasi-identifiers, one more open-ended question and one more binary response question. We were able to get a much larger sample of students than for the previous experiment: 67 subjects took the survey in the control condition (34 females and 29 males, 4 missing answers) and 65 took the survey in the treatment condition 10

st

http://www.gemalto.com/php/pr_view.php?id=321, last accessed on February 1 , 2009. http://www.unisyssecurityindex.com/resources/reports/Global%20Security%20Index%20-%20Dec08.pdf, last st accessed on February 1 , 2009. 11

Second experiment

In this experiment we added questions about gender and age (two quasi-identifiers), so that we could include some demographic characteristics in our analysis. With this addition, the survey contained 40 questions, and the distribution of questions by type was the same as in the first experiment, except that here we have 9 rather than 7 quasi-identifiers, one more open-ended question and one more binary-response question. We were able to get a much larger sample of students than in the previous experiment: 67 subjects took the survey in the control condition (34 females and 29 males, 4 missing answers) and 65 took the survey in the treatment condition (28 females and 33 males, 4 missing answers). The age distributions of subjects in the control and treatment conditions were described by the following statistics: the modes were 19 and 20 respectively, the averages were 21.4 and 21.6 (not statistically different: t-stat = 0.05, well below 1.645, the value corresponding to the 10% significance level), and the standard deviations were 2.85 and 2.86.

The average response rate was 89% in the control condition and 87% in the treatment condition, suggesting a much smaller overall effect of the manipulation relative to the first experiment. As in the first experiment, the two lowest response rates in the control condition correspond to the questions regarding home address (with an average response rate of 57%) and phone number (60%). The same two questions also have the lowest response rates in the treatment condition, their averages being 35% and 30% respectively. 27 subjects (40%) in the control condition answered all sensitive questions and provided all quasi-identifiers, while only 12 (18%) did so in the treatment condition. Figure 2 shows the percentage of subjects answering each of the questions in the two conditions.

Figure 2: Percentage of subjects answering each question in the control condition (blue, 67 subjects total) and in the treatment condition (red, 65 subjects total). [Bar chart; the horizontal axis lists the 40 survey questions, from "First Name" to "Job", and the vertical axis shows the percentage of subjects answering, from 0% to 100%.]
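The randomization check on ages reported above amounts to a two-sample t-test. A minimal sketch, assuming the same hypothetical data file now carries an 'age' column:

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("survey_responses.csv")  # hypothetical file, as above

# One age observation per subject, split by condition.
subjects = df.drop_duplicates("subject")
ages_control = subjects.loc[subjects["treatment"] == 0, "age"].dropna()
ages_treatment = subjects.loc[subjects["treatment"] == 1, "age"].dropna()

# Two-sample t-test of equal mean ages across conditions.
t_stat, p_value = stats.ttest_ind(ages_control, ages_treatment)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```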

The regression analysis gives results that are qualitatively similar to the ones obtained in experiment 1, but quantitatively stronger. Again confirming our hypothesis, the treatment effect is significant for the questions about last name, email address, home address and phone number (quasi-identifiers) and cheating (privacy sensitive)12, and not significant for the other questions. The only exception is the positive and statistically significant coefficient (10% level) on question 20 (number of close friends in town), but the effect is much smaller than the one observed for the other questions.

12 Questions 8 to 14, 16, 24 and 29 had a 100% response rate in the treatment condition; since there is no variation within the group, the treatment variable could perfectly predict the dependent variable for the whole group.

Table 3. OLS coefficients of the regression of response rate on treatment – second experiment. * indicates significance at the 10% level; ** indicates significance at the 5% level; robust standard errors in brackets; chi-squared statistic and corresponding p-value.

There was no significant difference in willingness to reveal across genders: in the control condition, the average response rate for females was 90%, while for males it was 95%. In the treatment condition, the average response rate was approximately 87% for both sexes. If we look specifically at quasi-identifiers and privacy-intrusive questions, in the control group females responded to 84% of them on average, while males responded to 92% of them. The corresponding figures for the treatment group were 76% and 75%.

Similarly to what we observed in the first experiment, we couldn't find any statistically significant difference between the percentages of participants who answered "No" to the questions regarding cheating in the two conditions13: we can't reject the null hypothesis that lack of control doesn't affect the propensity to admit certain behaviors. The coefficients from the pooled-OLS panel regression are not statistically significant, as could be inferred from the small difference in the average response rates of the two groups. This is not inconsistent with our hypothesis: having control over the publication of one's private information makes one reveal more identifying or privacy-intrusive information, while it has no effect on responses to tame, non-invasive questions.

Given the randomness of publication in the treatment condition, trust issues are no longer present in this experiment. Awareness remains a possible confounding factor, but the same mitigating considerations as in the first experiment apply. One could argue that another factor that might have lowered subjects' willingness to reveal PII and sensitive information in the treatment condition is that they cared less about the survey overall, because they were not even sure that their profile would be published. If that were the case, though, then subjects in the treatment condition should have been less willing to reveal all types of information, and particularly those pieces of information that required more effort to provide (the program in which they were enrolled and the list of courses attended). We do not observe a significant decrease in willingness to reveal this type of information, which should reduce the relevance of this confounding factor.

13 We recorded the absence of an answer as a negative answer.
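The two-proportion comparisons used in both experiments can be sketched as follows; the counts below are placeholders for illustration, not the actual cell counts from the study:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Placeholder counts of "No" answers (control, treatment) and group sizes;
# substitute the actual tallies, coding a missing answer as "No" (footnote 13).
no_counts = np.array([22, 24])
group_sizes = np.array([27, 32])

# Two-tailed two-proportion z-test of equal "No" rates across conditions.
z_stat, p_value = proportions_ztest(no_counts, group_sizes)
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")
```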

V. Conclusions and directions for future work

Our between-subjects studies provide empirical evidence in support of one possible reason why people tend to reveal private information in online social networks: since they can't distinguish between control over publication and control over access to and use of their private information by others, they feel "safe" when they personally create online profiles and have no concerns about publishing them. But when someone else is responsible for the publication, or when the publication itself becomes uncertain, they feel they have lost control over the access to and use of that information by others (which, in fact, they never had) and, consequently, refrain from revealing it. The overall high response rate reflects people's general tendency to reveal a lot of private information, even though participants in our experiments were given very little detail about the purpose of the study and the actual content of the survey they were about to take. Our results are based on two surveys conducted among students at Carnegie Mellon University, an educated community that is likely to be familiar with the technology of online social networks and aware of the implications of joining them. Nonetheless, they showed high willingness to reveal private information. More than lack of awareness, it seems that this is at least in part due to the particular sense of control that new technologies transmit to users, making them feel endowed with the power of managing the flow of information about themselves that stems from their voluntary willingness to reveal. It is the users' choice to join the network, it is their deliberate and conscious decision to make their profiles meticulously detailed, and it is again their choice to accept "friendship" or not and to make the content of their profiles visible to friends only or to everybody on the network.

On the one hand, the illusory power conveyed by the click of a mouse on the computer screen generates confusion between control over publication of private information and control over the accessibility and availability/usability of that information by others. On the other hand, the voluntary nature of the disclosure makes people perceive it as "safe" relative to the situation where disclosure is solicited or required, in which case reactive devaluation14 might instill suspicion in people and prevent them from revealing private information. This paper addressed the first issue; the voluntary versus required nature of information disclosure deserves more direct and accurate analysis and provides a direction for future research.

Apart from the general problems related to survey-based studies, regarding truthful responses from participants, one limitation of this study is external validity: the Facebook was created as a community specifically dedicated to college or graduate students, and our sample could at most be considered representative of that population. But members of the Facebook today are not only university students: they include younger students, professionals, self-employed people who try to advertise their businesses, and parents who want to monitor their sons' and daughters' activities on the Internet15 – a great variety of users who are very likely to have different sensitivities and concerns about privacy protection. Arguing that our results are generalizable to all these types of users would obviously be inappropriate. In fact, it would be interesting to observe the inter-cultural differences in people's behavior in online social networks; our sample was far too overrepresentative of American students to allow us to address that question. Nonetheless, we believe that our results, specifically tested in the context of online social networks, can be generalized to similar IT environments such as blogs or email: even in these cases, people are likely to confuse control over publication with control over access/use by others, resulting in revelation of "too much" private information.

14 Reactive devaluation is the phenomenon by which the simple fact that an offer or a concession is suggested by somebody other than the self diminishes the apparent value or attractiveness of the offer.
15 Indeed, statistics published by Facebook in April 2009 report that the fastest-growing group of users is 35-year-olds and older.
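To make concrete the kind of comparison on which these conclusions rest, the following Python sketch shows how disclosure rates in the control and treatment conditions of a between-subject design like ours could be compared with a two-proportion z-test. The counts are hypothetical placeholders, and the snippet is purely illustrative rather than the code used for our analysis.

from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: participants who answered a given sensitive question,
# out of all participants, in each condition of a between-subject design.
answered = [68, 49]    # [control condition, treatment condition]
totals = [100, 100]

z_stat, p_value = proportions_ztest(count=answered, nobs=totals)
print("z = %.2f, p = %.3f" % (z_stat, p_value))
# A significantly higher disclosure rate when subjects perceive control over
# publication would be consistent with the illusion of control hypothesis.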

As an extension to these two experiments, we also plan to run a third one directly through the Facebook. The study is on hold pending IRB approval, and therefore its design has not been finalized yet. In order to test whether people confuse control over accessibility with control over use of information, we plan to contact a number of Facebook users with whom we are not "friends" (our profile won't be in the subjects' contact lists) and randomly assign them to one of three conditions. In the first, we will simply ask for "friendship": the Facebook will automatically send subjects a "friend request", which they can accept or ignore. In the second condition, we will add the following message (or similar) to the Facebook automatic request: "I would like to see the pictures and the information on your profile. Would you add me to your list of friends?" In the last condition, we will add the following message (or similar): "I would like to download the pictures and the information on your profile. Would you add me to your list of friends?" The consequences of accepting these three types of friend requests are of course the same: if subjects accept a friend request, regardless of the way in which friendship has been requested, they allow the person they add to their contact list to see and use the information posted on their profile. Therefore, a significant difference in acceptance rates across the three conditions would provide direct evidence that people cannot distinguish between control over accessibility and control over use of private information by others.
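If the data from this third experiment come in as simple acceptance counts per condition, the planned comparison could be run as a chi-square test of independence. The following Python sketch uses hypothetical counts (the experiment has not been run) and is an illustration of the analysis, not a committed design.

from scipy.stats import chi2_contingency

# Hypothetical acceptance counts for the three friend-request conditions:
# rows = conditions, columns = [accepted, ignored].
counts = [
    [45, 55],  # condition 1: automatic friend request only
    [40, 60],  # condition 2: request + "see your pictures/information" message
    [22, 78],  # condition 3: request + "download your pictures/information" message
]

chi2, p_value, dof, expected = chi2_contingency(counts)
print("chi2(%d) = %.2f, p = %.3f" % (dof, chi2, p_value))
# Since accepting any of the three requests has identical consequences, a
# significant difference in acceptance rates across conditions would indicate
# that subjects react to the stated intent (seeing vs. downloading) rather
# than to the actual access they are granting.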

References

Ackerman, Mark S., Cranor, Lorrie F. and Reagle, Joseph, 1999, "Privacy in E-Commerce: Examining User Scenarios and Privacy Preferences", Proceedings of the ACM Conference on Electronic Commerce.

Acquisti, Alessandro, 2004, "Privacy in electronic commerce and the economics of immediate gratification", Proceedings of the 5th ACM Conference on Electronic Commerce.

Acquisti, Alessandro and Gross, Ralph, 2006, "Imagined Communities: Awareness, Information Sharing, and Privacy on the Facebook", Proceedings of the Sixth Workshop on Privacy Enhancing Technologies.

Acquisti, Alessandro and Grossklags, Jens, 2003, "Losses, gains, and hyperbolic discounting: An experimental approach to information security attitudes and behaviors", Proceedings of the 2nd Annual Workshop on Economics and Information Security.

Acquisti, Alessandro, John, Leslie K. and Loewenstein, George, "What is privacy worth?", working paper.

Alba, J.W., Lynch, J., Weitz, B., Janiszewski, C., Lutz, R., Sawyer, A. and Wood, S., 1997, "Interactive Home Shopping: Consumer, Retailer, and Manufacturer Incentives to Participate in Electronic Marketplaces", Journal of Marketing, 61:38-53.

Boyd, Danah, 2008, "Facebook's Privacy Trainwreck: Exposure, Invasion, and Social Convergence", Convergence: The International Journal of Research into New Media Technologies, 14(1):13-20.

Burke, R.R., Harlam, B.A., Kahn, B. and Lodish, L.M., 1992, "Comparing dynamic consumer choice in real and computer-simulated environments", Journal of Consumer Research, 19:71-82.

Degeratu, Alexandru M., Rangaswamy, Arvind and Wu, Jianan, 2000, "Consumer Choice Behavior in Online and Traditional Supermarkets: The Effects of Brand Name, Price, and other Search Attributes", International Journal of Research in Marketing, 17(1):55-78.

Elgesem, Dag, 1996, "Privacy, Respect for Persons, and Risk", in Philosophical Perspectives on Computer-Mediated Communication, C. Ess ed., New York, State University of New York Press.

Fenton-O'Creevy, Mark, Nicholson, Nigel, Soane, Emma and Willman, Paul, 2003, "Trading on illusions: Unrealistic perceptions of control and trading performance", Journal of Occupational and Organizational Psychology, 76(1):53-68.

Fried, Charles, 1984, "Privacy", in Philosophical Dimensions of Privacy, F.D. Schoeman ed., New York, Cambridge University Press.

Goodwin, C., 1991, "Privacy: Recognition of a Consumer Right", Journal of Public Policy and Marketing, 10(1):149-166.

Henslin, James M., 1967, "Craps and Magic", The American Journal of Sociology, 73(3):316-330.

Johnson, C.A., 1974, "Privacy as Personal Control", in Carson, D.H. (ed.), Man-Environment Interactions: Evaluations and Applications: Part 2, Washington, D.C., Environmental Design Research Association, pp. 83-100.

Kahneman, Daniel and Tversky, Amos, 1979, "Prospect Theory: An Analysis of Decision under Risk", Econometrica, 47(2):263-291.

Kunreuther, Howard, 1978, "Disaster insurance protection: public policy lessons", New York, Wiley.

Langer, Ellen J., 1975, "The Illusion of Control", Journal of Personality and Social Psychology, 32(2):311-328.

Lessig, Lawrence, 2002, "Privacy as Property", Social Research, 69(1).

Malhotra, A., Gosain, S. and Lee, Z., 1997, "Push-Pull: The Information Tug of War – A framework for Information Delivery and Acquisition Systems Design", Proceedings of the Third Americas Conference on Information Systems, Indianapolis.

Margulis, S.T., 2003, "Privacy as a Social Issue and Behavioral Concept", Journal of Social Issues, 59(2):243-261.

Miller, Arthur R., 1971, "The Assault on Privacy: Computers, Data Banks, and Dossiers", Ann Arbor, University of Michigan Press.

Norberg, Patricia A., Horne, Daniel R. and Horne, David A., 2007, "The privacy paradox: personal information disclosure intentions versus behaviors", Journal of Consumer Affairs, 41(1):100-126.

Pew Internet & American Life Project, 2007, "Teens, Privacy & Online Social Networks", April 2007, http://www.pewinternet.org/pdfs/PIP_Teens_Privacy_SNS_Report_Final.pdf.

Ross, Lee and Stillinger, Constance, 1991, "Psychological barriers to conflict resolution", Negotiation Journal, 7(4):389-404.

Sanderson, William C., Rapee, Ronald M. and Barlow, David H., 1989, "The Influence of an Illusion of Control on Panic Attacks Induced via Inhalation of 5.5% Carbon Dioxide-Enriched Air", Archives of General Psychiatry, 46(2):157-162.

Shostack, Adam, 2003, "Paying for Privacy: Consumers and Infrastructures", 2nd Annual Workshop on Economics and Information Security, Maryland.

Solove, Daniel J., 2007, "The future of reputation – gossip, rumor, and privacy on the internet", New Haven and London, Yale University Press.

Spiekermann, Sarah, Grossklags, Jens and Berendt, Bettina, 2001, "E-privacy in 2nd Generation E-Commerce: Privacy Preferences versus actual Behavior", Proceedings of the 3rd ACM Conference on Electronic Commerce.

Sweeney, Latanya, 2002, "k-Anonymity: a model for protecting privacy", International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557-570.

Tavani, Herman T. and Moor, James H., 2001, "Privacy Protection, Control of Information, and Privacy-Enhancing Technologies", Computers and Society.

Westin, Alan F., 1967, "Privacy and Freedom", New York, Atheneum.

White, Tiffany Barnett, 2004, "Consumer Disclosure and Disclosure Avoidance: A Motivational Framework", Journal of Consumer Psychology, 14(1-2):41-51.

APPENDIX

First experiment - Questions asked in the two surveys

Q1: First name, Middle name
Q2: Last name
Q3: Date of birth
Q4: Place of birth
Q5: Email address
Q6: Home address
Q7: Phone number
Q8: Do you have a Facebook profile?
Q9: How long have you been in Pittsburgh?
Q10: On a scale from 1 to 10, how do you like the city overall?
Q11: How happy are you here?
Q12: Do you practice any sport?
Q13: If so, which sport do you practice?
Q14: Do you practice on campus?
Q15: How would you rate the sport facilities offered on campus?
Q16: Are you a member of any group/community/fraternity/sorority?
Q17: If so, which group are you a member of?
Q18: How many friends do you have in Pittsburgh?
Q19: How many of those are students at CMU?
Q20: How many are students at other universities in Pittsburgh?
Q21: How much do you enjoy spending your spare time with your friends, relative to the time you spend alone?
Q22: Is your family in Pittsburgh?
Q23: How often do you see your family?
Q24: Are you single or married?
Q25: Do you have a girlfriend/boyfriend?
Q26: Have you ever cheated on your partner (current or ex)?
Q27: Where are you staying? (University housing, Private housing-alone, Private housing-shared)
Q28: Have you ever had troubles with your roommates?
Q29: Would you like to move somewhere else?
Q30: What program are you in?
Q31: Which courses are you taking?
Q32: Have you ever cheated for homework/projects/exams?
Q33: Have you ever seen someone else cheating?
Q34: If so, did you inform the instructor?
Q35: Up to now, how would you rate your program overall on a scale from 1 to 5?
Q36: Do you think it will make you competitive on the job market?
Q37: How many hours/day do you spend studying?
Q38: Are you working at the same time?

Instructions in the control condition

Please read these instructions carefully before you move on. No question/field is required. If you decide to answer, a profile will be automatically created for you, with no intervention by the researcher, and published on a new CMU networking website, which will only be accessible by members of the CMU community, starting from the end of April. The data will not be used in any other way.

Instructions in the treatment condition

Please read these instructions carefully before you move on. No question/field is required. If you decide to answer, your data will be collected by the researcher, who will create a profile for you and publish it on a new CMU networking website, which will only be accessible by members of the CMU community, starting from the end of April. The data will not be used in any other way.

Second experiment - Questions asked in the two surveys

Q1: First name, Middle name
Q2: Last name
Q3: Gender
Q4: Date of birth
Q5: Age (in years)
Q6: Country of birth
Q7: Email address
Q8: Home address
Q9: Phone number
Q10: Do you have a Facebook profile?
Q11: How long have you been in Pittsburgh?
Q12: On a scale from 1 (not at all) to 10 (very much), how do you like the city overall?
Q13: How happy are you here?
Q14: Do you do any sport?
Q15: If so, which sport do you do?
Q16: Do you do any sport on campus?
Q17: How would you rate the sport facilities offered on campus?
Q18: Are you a member of any group/community/fraternity/sorority?
Q19: If so, which group or groups are you a member of?
Q20: How many of the people you know in Pittsburgh do you consider close friends?
Q21: How many of those are students at CMU?
Q22: How many are students at other universities in Pittsburgh?
Q23: Do you enjoy spending your spare time with your friends much more/ with your friends more/ alone more/ alone much more/ with your friends just as much as alone?
Q24: Is your family in Pittsburgh?
Q25: How often do you see your family?
Q26: Are you single or married?
Q27: Do you have a girlfriend/boyfriend?
Q28: Have you ever had a sexual relationship with somebody other than your partner without their knowledge or consent?
Q29: Where do you live? (University housing, Private housing-alone, Private housing-shared)
Q30: Have you ever had troubles with your roommates?
Q31: Would you like to move somewhere else?
Q32: What program are you in? (e.g.: Undergrad Psychology, Grad Math)
Q33: Which courses are you taking at the moment?
Q34: Have you ever cheated for homework/projects (e.g. copy, plagiarize) or on an exam?
Q35: Have you ever seen someone else cheating?
Q36: If so, did you inform the instructor?
Q37: How would you rate the quality of the education you are receiving on a scale from 1 (very bad) to 5 (very good)?
Q38: Do you think it will make you competitive on the job market?
Q39: How many hours a day do you spend studying?
Q40: Are you working at the same time?

Instructions in the control condition

Please read these instructions carefully before you move on. This is not the usual yada-yada. The information you provide will appear on a profile that will be automatically created for you. The profile will be published on a new CMU networking website, which will only be accessible by members of the CMU community, starting at the end of this semester. The data will not be used in any other way. NO QUESTION/FIELD REQUIRES AN ANSWER. Did you understand these instructions? If so, click on Next.

Instructions in the treatment condition

Please read these instructions carefully before you move on. This is not the usual yada-yada. The information you provide will appear on a profile that will be automatically created for you. Half of the profiles created for the participants will be randomly picked to be published on a new CMU networking website, which will only be accessible by members of the CMU community, starting at the end of this semester. The data will not be used in any other way. NO QUESTION/FIELD REQUIRES AN ANSWER. Did you understand these instructions? If so, click on Next.
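For concreteness, the random selection described in the treatment condition could be implemented along the following lines; this is a hypothetical Python sketch, not the code actually used to administer the study, and the profile identifiers are invented for illustration.

import random

def pick_profiles_to_publish(profiles):
    # Randomly pick half of the created profiles for publication on the
    # networking website, as stated in the treatment instructions.
    return random.sample(profiles, k=len(profiles) // 2)

# Example with hypothetical profile identifiers:
published = pick_profiles_to_publish(["profile_%d" % i for i in range(30)])
print(len(published))  # 15 of the 30 profiles would be published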