
Information for People

Laura M. Haas, Steve B. Cousins
IBM Almaden Research Center
Laura at almaden.ibm.com, scousins at us.ibm.com

Abstract

Ordinary people have access to unprecedented volumes of information today. Researchers in the fields of information management (IM) and human-computer interaction (HCI) are reacting to this challenge from their own unique perspectives. Having access to a billion records is cool, but having access to a billion people is awesome. In this paper, we look at recent research from both communities and speculate on how interactions between the communities could enhance the user experience of information.

1. Introduction

The Information Management (IM) and Human-Computer Interaction (HCI) communities have traditionally had different interests and different claims to fame. The IM community has worried about storing vast volumes of data and supporting the complex, mission-critical manipulations of data needed by the business world. Core concerns have included features and functions and, of course, systems issues such as scalability (in both number of users and data volumes), performance and robustness. The traditional motivations were to simplify application development (by raising the level of abstraction beyond bits and bytes), to keep information safe (meeting the ACID test), and to meet the needs of the business using the data. As a result, the IM community’s core technologies include query languages and query optimization, data models, support for particular types of data, concurrency control, transaction processing, security and other “systems” issues.

Meanwhile, the HCI community cares about people and, in particular, about how to make it easier for people to use computers. Its core concerns center on learnability, usability, and accessibility. Since social and physical factors affect how people work, HCI researchers think about supporting collaboration, creating interfaces that work for people with disabilities, and leveraging human perceptual abilities when designing the user experience. Typical HCI competencies include the optimization of interactions, graphical user interfaces, social aspects of interaction, design, user studies, visualization, multi-modal interaction, end-user programming and ethnography.

These two communities rarely work together and, in fact, hardly overlap. A recent call for papers for a major database conference lists forty major topics, only one of which is directly relevant to users¹. The rejection rate for user interface submissions to mainline database conferences is legendary. At the major HCI conferences, a properly conducted user study is de rigueur, and the topics of interest to that community often would not attract mainstream information management practitioners. In fact, the call for papers for an upcoming HCI conference² reads, to an information management person, like a promotional piece for a self-help convention.

Not only are the interests of the two communities distinct, but their styles are also completely different. Both goals and styles keep the communities apart. Where HCI researchers are concerned with making applications more usable on the front end, IM researchers work to make applications easier to implement on the back end. The applications themselves are often implemented by a layer of people who sit between the two communities, making it even less likely that these communities will interact without the help of intermediaries.

Occasionally, of course, both disciplines become interested in similar problems. When they do, the difference in emphasis is instructive. In the late 1970s, for example, two very different (and yet very similar) tools for information storage and analysis were introduced to the world. From the information management community came the relational database; from the user-focused community emerged the spreadsheet. Both are based on a tabular view of data. In both cases, the idea is credited to a brilliant researcher, but one (Codd) worked for a provider of information technology, while the other (Mattessich) was a business school professor interested in helping a particular group of users (accountants). While the information management folks focused on retrieval and management of large amounts of data (how to select just the data desired and retrieve it quickly), the inventors of the spreadsheet focused on simple manipulation of information to accomplish a task. Of course, the two technologies were later wed, so that spreadsheets can store information in relational databases, and relational database users can use spreadsheets to access their data.

Today, the world has shifted, and the emphasis is no longer on building standalone applications that can then be connected to monolithic databases. Instead, information is likely to be spread over many sources, including databases, files, and applications. Both communities are addressing the problem of creating applications over diverse and distributed sources. The information management community, for example, pioneered data federation technologies, while the user-centric folks created “mashups”. Now the IM community is trying to provide some of the solid systems advantages of federation as an underpinning that makes mashups easier to create and more robust [1].

¹ http://www.vldb2007.org/callforpapers.html
² http://www.chi2007.org/

2. Today’s Challenges

With the coming of Web 2.0, there is a shift from a world in which there are very few information producers relative to the number of consumers to one in which much more content is created by end users. People are more likely to be computer-literate, and they are interacting with information in new and unanticipated ways. Contributing to advances in this new world requires knowledge and skills from both the IM and HCI communities. Challenges include, for example, information quality (when data is increasingly produced by random people, how trustworthy is it?), ubiquitous access to data, and finding information in massive (and widely distributed) user-created information collections.

Data in the hands of the people. Information constructed by many people, even if not edited beforehand, can be very valuable. In fact, it has been argued that large groups of people are smarter than an elite few [2]. In recent work in social computing, the HCI community has been building tools to exploit this phenomenon, such as delicious³ and ePinions⁴. But how good is the information? What kinds of tools and methods will make it easier to author high-quality information? HCI researchers approach these problems from a social engineering perspective, while the IM community has worked on such issues as tracking provenance and answering queries in the presence of uncertainty.

Ubiquitous access. The push toward ubiquitous computing has consumed many HCI cycles. Nowadays, people take for granted the ability to be connected almost anywhere, anytime, and that doesn’t always mean from a laptop. Many of the techniques that permit text to be entered on small devices, from phone keypads to stylus input to thumb keyboards, have been published and evaluated in the HCI literature. Similarly, HCI researchers are interested in how various sensors can be deployed in ways that preserve privacy. Recent work has focused on instrumenting the homes of the elderly to help them remain self-sufficient for as long as possible, while reducing the risk that a fall or other medical emergency goes undetected. The IM community, meanwhile, has looked at caching information on mobile devices, at peer-to-peer (P2P) and networked data management, and at other distributed information management issues such as processing queries over streams of sensor data.

Finding a needle in a (growing) haystack. The amount of data available online is huge and growing constantly. Much of it is produced by people, is unstructured, and has insufficient or inaccurate metadata. As a result, human input is often needed today to interpret or filter information. For example, how can a user search a large corpus of photographs? An IM researcher would think about how to store and then index them, what language to use to query them, how to make the metadata searchable, and how to let people search for photographs across multiple repositories. He or she might even borrow from the machine learning community and think about how to categorize the photographs automatically (i.e., generate metadata). An HCI researcher, on the other hand, might first ask how to enlist people to label the pictures. In one clever approach, a computer game was created in which people are shown a picture and try to guess its label [3]. Since the game is a large distributed system, many people are looking at the picture at once. When they agree, they win, and so does the system, because it now has a label for the photograph!

³ http://del.icio.us
⁴ http://www.epinions.com
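The agreement mechanism behind the labeling game in [3] can be illustrated with a small sketch. It is not the game's actual implementation; it assumes, hypothetically, that the guesses players type for one image arrive as a time-ordered stream of (player, word) pairs, and that the first word typed by two different players becomes the label. The function name `agreed_label` and the optional `taboo` list of words to ignore (for example, labels already collected) are our own illustrative choices.

```python
from typing import Iterable, Optional, Tuple


def agreed_label(
    guesses: Iterable[Tuple[str, str]],
    taboo: Iterable[str] = (),
) -> Optional[str]:
    """Return the first word typed by two different players, or None.

    `guesses` is a time-ordered stream of (player_id, word) pairs for one image.
    Words are compared case-insensitively; words in `taboo` are ignored.
    """
    banned = {w.strip().lower() for w in taboo}
    seen: dict[str, set[str]] = {}  # normalized word -> players who typed it
    for player, word in guesses:
        w = word.strip().lower()
        if not w or w in banned:
            continue
        players = seen.setdefault(w, set())
        players.add(player)
        if len(players) >= 2:  # two different players agree: the image gets a label
            return w
    return None


if __name__ == "__main__":
    events = [("a", "Dog"), ("b", "puppy"), ("a", "grass"), ("b", "dog")]
    print(agreed_label(events))  # dog
```

Requiring agreement between different players, rather than accepting any single guess, is what lets the system treat the shared word as a plausible label without an expert reviewing it.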

3. Separate but equal?

Despite these shared challenges, the IM and HCI communities still work largely separately. Their different values, languages and literatures make cooperation difficult. Yet in several of the above examples, the work of the two communities is complementary, and a full solution to the challenge would need both parts. Each community is doing good work that should be exposed to a broader audience. For example, the IM community has produced a number of systems that help to collect, analyze or present particular types of information for specific user groups. One such system is IBM’s WebFountain [4], which performs a tailored crawl of the Web to find information on a given theme, then analyzes, categorizes and cross-references it, and exposes it to privileged users via a powerful (if somewhat arcane) query language. John Battelle [5] wishes for the power of WebFountain for the masses. More recently, DBLife [6] trawls the Web for information of interest to database researchers, then uses text analytics, entity resolution, and a model of the types of events (e.g., talks, paper acceptances) that interest database researchers to create a portal for the IM community. Other examples of IM work that should interest the HCI community include tools for database administration and schema mapping, text analytics, and privacy and de-identification, to name a few.

Likewise, the HCI community has a sizable literature on topics of interest to the IM world, starting, of course, with information visualization [7]. For example, Treemaps and related techniques provide mechanisms for visualizing hierarchically structured information [8], which should matter to an IM community that is focusing so much attention on XML. Domain-specific interactive visualizations can be particularly powerful, a good example being the Baby Name Visualizer [9]. Beyond visualization, work on animated interaction [10], activity-centric computing [11], and interacting with information on small devices [12] should be read and discussed by IM researchers. Designers of information systems should know the HCI literature on design for people with disabilities [13], usability methods [14], and end-user programming [15]. The HCI community, like the IM community, is also concerned with supporting particular communities of users (though it is more interested in social interaction), and it has also done work on privacy, not to mention applications of concern to both communities, such as healthcare informatics.

Interestingly, both communities are starting to cooperate more closely with machine learning (ML) researchers. IM has interacted with ML sporadically over the years on such challenges as data mining (pattern recognition), matching, and discovery. Likewise, HCI has worked with ML on end-user programming and intelligent user interfaces. What powerful new systems and tools could come from pooling all of these diverse talents?

4. Cooperation Needed?

Arguably, both IM and HCI are fast followers rather than innovators. Academic disciplines are much better at focusing on established problems than at identifying new opportunities. Innovation often comes from outsiders who have understood the central ideas of a discipline and use those ideas as components of a new solution. Consider the World-Wide Web, which is now a theme of any conference on IM or HCI. It clearly drew on ideas from hypertext, information retrieval, databases, HCI and other fields, yet none of those communities can claim to have given birth to the Web. In fact, the Web came from the physics community: Tim Berners-Lee built it to solve a problem in his community, using simplified techniques from those other communities. The hypertext community scoffed at the World-Wide Web: it didn’t even have bi-directional links, much less links as first-class objects, so how could it be true hypertext? The database community was similarly uninterested. (This wouldn’t be so much fun to talk about if it had only happened once….)

Neither community was interested in wikis when they were invented. Content management systems already existed; wikis were just a new application, so why should the IM community care? Likewise, the HCI community was unimpressed. Where is the new interaction? Aren’t wikis a step backward from true WYSIWYG editing? Yet today, wikis have proven to be an important way for people to interact with information, and they point to interesting social phenomena for further study, as well as to new challenges in connecting, navigating and searching information.

MySQL and other lightweight databases were much more interesting for what they left out than for any new features they added. This is a classic “innovator’s dilemma” situation: while the established community attacks the “real” problems (scalability, query optimization), a lightweight contender attacks a different market (small, uninteresting databases behind websites) but continues to improve until it ultimately commoditizes the original community’s technology and business [16].

But when members of one community go out of their way to get to know another, amazing results are possible. In 1994, the Stanford Digital Library Initiative started as a collaboration among the database group, the HCI group, and the AI group. One professor from each discipline was a PI on the project, because the funders were intent on breaking down silos. The link between databases and HCI was particularly fruitful and produced a number of theses from each area [17]. The sub-project of the Digital Library Initiative that had the biggest impact didn’t result in a thesis and didn’t quite fit the theme of the larger initiative. In fact, it wasn’t even viewed as the deepest research since, after all, the world already had five successful search engine companies, so what could a new academic initiative possibly add? But the tight pairing of one HCI student and one database student led to Google.

5. Time to Break Down the Silos?

The future, whether of the Web or of the enterprise, requires us to join forces. A joint community would be able to make a stronger contribution to today’s challenges than the individual communities can alone. One of the most pressing problems in today’s world is information overload. Information flows at us from all sides. The volume of email alone has grown to the point that high-school students view email as something for old people (email = junk mail). Digital storage has been growing faster than Moore’s law. Instant messages are being captured and retained, along with voice mail, news feeds, stock price trends, and so on. Thanks to the Web, individuals have access to more information at home than they had at the best academic libraries just a decade ago. How can they get any value out of all this information?

Information overload is a problem that requires skills from both IM and HCI (and probably other disciplines as well) to solve. Rather than chipping away at this problem from separate silos, researchers should form a new community to attack it. Successful HCI researchers in this new field will learn to embrace research approaches and results from the IM field, and successful IM researchers will become more “touchy-feely.” This is not a proposition for the faint of heart, but a combined attack may reach a solution more quickly than either community could alone.

A shared community could help accelerate solutions to any number of problems. Everyone could benefit from better decision support for individuals. Which long-term care insurance is best, if any? Which is the best car for a teenage son to drive? These types of questions require not only finding relevant information but also presenting it in a way that allows users to understand and act on it quickly. Rapid deployment in response to crises similarly creates a need for appropriate information, but it additionally requires coordinating multiple parties, each with its own information sources, processes and tools. Even more exciting is the opportunity to reduce deaths from medical errors dramatically if the communities work together. The right information, presented in the right way, on the right device, at the optimal time, could literally save lives. There are issues here to keep both communities, as well as researchers in medical informatics, gainfully employed for many years to come.

We are testing the value of cross-community collaboration in a number of projects at Almaden. For example, Avatar [18] provides semantic search over text. The text is run past a set of annotators that label portions of the text with the concepts they represent, e.g., “person’s phone number”. But where do the annotators come from? People build them, so we are leveraging our HCI colleagues’ skills not only to make the tools for building annotators user-friendly, but also to apply social tagging principles to the building of annotators, so that they can spread virally.

Indeed, whole new sciences may be born of multidisciplinary research. For example, services science is emerging through the interaction of computer scientists, mathematicians, and economists [19]. But computer science is broad enough that intra-disciplinary work is needed as well. We believe it is time for a new discipline within computer science, a new “information interaction” community. That community should jointly pursue the information-intensive challenges that increasingly face people today.
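The annotator pipeline described for Avatar [18] in this section can be made concrete with a toy sketch: each annotator scans text and emits labeled spans, and a search can then match on concept labels rather than on raw words. This is not Avatar's API. The `Annotation` type, the `phone_annotator`, `annotate` and `search_concept` functions, and the deliberately simple phone-number regular expression (North American style, no word-boundary checks) are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the span begins
    end: int      # character offset just past the span
    concept: str  # concept label, e.g. "PhoneNumber"
    text: str     # the covered text


# Hypothetical convention: an annotator is any function from text to annotations.
Annotator = Callable[[str], List[Annotation]]

# Toy pattern for numbers such as 408-555-0100 or (408) 555-0100.
PHONE_RE = re.compile(r"(?:\(\d{3}\)\s?|\d{3}-)\d{3}-\d{4}")


def phone_annotator(text: str) -> List[Annotation]:
    """Label every match of PHONE_RE with the concept "PhoneNumber"."""
    return [Annotation(m.start(), m.end(), "PhoneNumber", m.group())
            for m in PHONE_RE.finditer(text)]


def annotate(text: str, annotators: List[Annotator]) -> List[Annotation]:
    """Run every annotator over the text and return all spans in document order."""
    spans = [a for annotator in annotators for a in annotator(text)]
    return sorted(spans, key=lambda a: (a.start, a.end))


def search_concept(docs: List[str], concept: str,
                   annotators: List[Annotator]) -> List[int]:
    """Return indices of documents containing at least one span labeled `concept`."""
    return [i for i, d in enumerate(docs)
            if any(a.concept == concept for a in annotate(d, annotators))]


if __name__ == "__main__":
    docs = ["Call Laura at 408-555-0100.", "No numbers here.", "Office: (408) 555-0199"]
    print(search_concept(docs, "PhoneNumber", [phone_annotator]))  # [0, 2]
```

Treating an annotator as a plain function is a simplification, but it illustrates why the text emphasizes who builds annotators: adding a new concept means writing and sharing one more function, without touching the search code.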

6. Acknowledgements

Thanks to Eser Kandogan and Kevin Beyer for their thoughtful comments on this manuscript.

7. Bibliography

[1] A. Jhingran, “Enterprise Information Mashups: Integrating Information, Simply”, Proc. VLDB, Seoul, Korea, September 2006, pp. 3-4.
[2] J. Surowiecki, The Wisdom of Crowds, Random House, 2004.
[3] L. von Ahn and L. Dabbish, “Labeling Images with a Computer Game”, Proc. ACM Conference on Human Factors in Computing Systems (CHI 2004), pp. 319-326. Try it at http://www.espgame.org/
[4] D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins, and J. Zien, “How to build a WebFountain: An architecture for very large-scale text analytics”, IBM Systems Journal, 43(1), 2004.
[5] http://battellemedia.com/archives/000428.php
[6] A. Doan, R. Ramakrishnan, F. Chen, P. DeRose, Y. Lee, R. McCann, M. Sayyadian, and W. Shen, IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1), 2006. Or try it at http://dblife.cs.wisc.edu/
[7] S.K. Card, J. Mackinlay, and B. Shneiderman, Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, 1999.
[8] http://www.cs.umd.edu/hcil/treemap-history/index.shtml
[9] http://babynamewizard.com/namevoyager/lnv0105.html
[10] B.-W. Chang and D. Ungar, “Animation: From Cartoons to the User Interface”, Proc. ACM Symposium on User Interface Software and Technology (UIST 1993), pp. 45-55.
[11] T.P. Moran, A. Cozzi, and S.P. Farrell, “Unified activity management: supporting people in e-business”, Communications of the ACM, (48), 2005, pp. 67-70.
[12] J. Trevor, D. M. Hilbert, B. N. Schilit, and T. K. Koh, “From Desktop to Phonetop: a UI for Web Interaction on Very Small Devices”, Proc. 14th Annual ACM Symposium on User Interface Software and Technology (UIST '01), Orlando, Florida, November 11-14, 2001, ACM Press, New York, NY, pp. 121-130.
[13] ACM Transactions on Accessible Computing, http://www.is.umbc.edu/taccess/index.html
[14] J. Tidwell, Designing Interfaces, O’Reilly Media, 2005.
[15] H. Lieberman, F. Paternò, and V. Wulf, End-User Development, Springer, 2006.
[16] C. M. Christensen and M. E. Raynor, The Innovator’s Solution: Creating and Sustaining Successful Growth, Harvard Business School Press, 2003.
[17] A. Paepcke, S. B. Cousins, H. Garcia-Molina, S. W. Hassan, S. P. Ketchpel, M. Roscheisen, and T. Winograd, “Using Distributed Objects for Digital Library Interoperability”, Computer, 29, May 1996, pp. 61-68.
[18] T.S. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu, “Avatar Information Extraction System”, IEEE Data Engineering Bulletin, 2006.
[19] P. P. Maglio, S. Srinivasan, J. T. Kreulen, and J. Spohrer, “Service Systems, Service Scientists, SSME, and Innovation”, Communications of the ACM, (49), 2007, pp. 81-85.