Get your students to SMILE! Exploring emerging interaction technologies

Torben E. Svane
School of Information Science, Computer and Electrical Engineering
Halmstad University, Halmstad, Sweden
[email protected]

Abstract—Staying updated on new interface developments is crucial to any programming or software design class. This paper reports on experiences from introducing some of the many new post-WIMP I/O possibilities emerging from e.g. mobile and sensor (Microsoft Kinect) technologies. To help students keep the technologies in mind, the acronym SMILE (Speech - Movement - Image - Language - Environment) is introduced. The paper also describes some exercises developed (for C# and JavaScript) as well as lectures and classroom activities prepared especially for introducing the topic.

Keywords—teaching; programming; design; interfaces; WIMP; SILK; SMILE; C#; WAMI; JavaScript.

I. INTRODUCTION

Since the early 1980s [1], much computer interaction has followed the WIMP (Windows-Icons-Menus-Pointers) paradigm. In the mid 1990s [2], a new vision for interaction was labeled SILK (Speech-Image-Language-Knowledge). Since then, the use of Internet technologies has increased drastically, as have communication speeds. New units for interaction, e.g. "smart" mobile phones, tablets and game devices such as the Nintendo Wii and Microsoft Kinect, are today common to numerous consumers on a global scale.

Many computer science and information systems curricula have however not yet (fully or at all) embraced the possibilities of these new technologies in their classes – Halmstad University for one. We have continued our focus on WIMP as new versions of programming languages have been introduced. Even in web programming courses, mobile web projects and exercises have mainly dealt with "pages and databases". Few, if any, have had a focus on what this paper calls SMILE technologies (Speech-Movement-Image-Language-Environment).

Adding new content to class work, e.g. modules on interaction, is however not always an easy task. New equipment is needed in some cases, and shorter, hands-on exercises must be developed and tested. Hardware investments are however reasonably low and, when it comes to mobile applications, students often have their own unit to test on. The challenge is hence mainly one of pedagogy: creating an atmosphere for awareness and inspiration, ensuring that the necessary hardware is available, and creating a set of exercises which enables the students to try out various new modes of interaction, one function at a time.

Our work has focused on a very basic task: to raise general awareness about new interface possibilities, and to provide a few introductory lab exercises/workshops to get the students familiar with some of the hardware/web services available for incorporating more than traditional interaction in their projects. Although this paper concentrates on Windows and web programming, it is important to also mention work that has taken the discussion even further; this is where we hope to go in the development of new lectures in the years to come. Wigdor [3] has elaborated on gestures, touching and languages. Drucker et al. [4] also report on a study involving gesture interfaces and touch displays, in connection with visualization on tablets. In comparison to traditional WIMP UI design, the authors (ibid.) found that the majority of participants in the study both preferred and performed better using a fluid, gesture-based, touch UI.

So, new interfaces are here and seem to be growing increasingly popular for IT interaction. Although not yet a buzzword in peer-reviewed reports, "mobile first" strategies [5], [6] indicate that we will do more with mobile phones, tablets etc. in the years to come, rather than sit in front of traditional computers. If so, should we not, as teaching institutions, also increase our focus on and content about these new technologies? We are probably many academics who have a long journey ahead catching up.

II. SMILE TECHNOLOGIES

Shifts concerning transitions in human-computer interfaces and interaction have been discussed thoroughly during the past decades. One often-used acronym is WIMP [1], which stands for three common presentation screen artifacts (windows, icons and menus) and a common interaction technology (pointing and clicking, using a computer mouse). WIMP as a means for interaction dates back to post-PC times [7] but gained public acceptance with the introduction of the Apple Macintosh in 1984 (ibid.). It is still very common today as an HCI paradigm and in what we teach students to work with in programming classes. The SILK (speech, image, language, knowledge) paradigm took ideas about interaction beyond WIMP. The coining of the term SILK is attributed to Raj Reddy (1996) (ibid.), although it is not explicitly mentioned in the reference [2]. As users, we are often exposed to speech, image and, to some extent, language technology, but never really interact with the knowledge database component (the "K") which drives the others.

As technology progresses into new generations, users are often both willing to explore and quick to adopt new means of interaction. To include current sensing (movement) technologies and the ability of mobile phones and tablets to act as transponders, constantly transmitting data to the environment, we have introduced the acronym SMILE (speech-movement-image-language-environment) as an easy-to-remember abbreviation which can remind the students about the many possibilities to interact – and that the technology evolution most likely has not reached its peak with what we have today.

A. The SMILE components

When we go through examples of the various technologies, we focus on examples from the present, with some reflections on the past, rather than probing too far into the future. The term NUI (natural user interfaces) [8] is also mentioned, although it does not fully cover what we include in SMILE. Some of the technologies mentioned in our run-through of SMILE are:

Speech: not to be confused with the similar concept of voice recognition (which we see as recognizing a specific individual's voice), speech recognition searches for patterns in audio input, most commonly speech generated by the user. Within the speech group we also cover speech synthesis, e.g. from TTS (text-to-speech) engines – hence, speaking to, and being spoken to by, technology. Combining speech and voice recognition, some systems invoke "speaker recognition" to verify authorized use of the device/application. In the "twilight zone" between speech and language is what we have called "emotional sounds", such as a sigh or someone yelling "ouch". Whereas we are programmatically able to detect "ouch", as it has an actual spelling, it is still a challenge to detect the sigh with any greater accuracy (a small code sketch illustrating this point follows after the component list). This is brought up again in the demos for the programming classes. Ending this section more in the emotional than the traditional speech area, we recognize the work on "socially intelligent robots" [9] carried out at MIT at the turn of the millennium. Related to speech recognition is song recognition [10]. We also mention vocaloids (software for synthesized singing; not for composing songs) and research with a focus on singing robots [11], [12].



Movement: not novel per se, changes in location coordinates (x, y and, for some applications, z) came with WIMP. Moving the mouse and detecting its location on a screen is a fundamental function in much human interaction with computer systems. Pressure-sensitive touchscreens added the z-factor, but more as haptic (see the Beyond SMILE section) input than as movements in open space. The new implementations are direct movements by the user, either through gestures or body movements. Another type of movement input comes from the use of GPS (global positioning systems) in e.g. cars, and from the registration of mobile phone locations, if turned on.



Image: As with speech, the image component can include both input and output functions. Input could e.g. be fingerprint or, more commonly, facial recognition, but also three-dimensional detection (depth sensing) as used by the Microsoft Kinect devices (for PC and Xbox). Combining detection and rendering, many immersive games scan the player and render an in-game avatar [13]. Companies such as IKEA [14] use simple-graphic avatars to answer customer questions. User-to-user video conferencing (e.g. with Skype) has gained widespread use on a global scale (not least among international students on exchange at a foreign university). When covering simpler video applications such as webcam and mobile phone video chats, we also mention commercial-grade three-dimensional applications such as the Cisco TelePresence conference system and its 3-D holographic on-stage implementation, and the HP Halo collaboration studio. Another type of 3-D holographic presentation is virtual artists, such as the Japanese holographic vocaloid pop star Miku Hatsune [15], [16]. Image handling can also be rather basic, such as a barcode scanner in a retail store, or information retrieved when using a mobile phone QR (quick response) scanner. In this autumn's classes – and as a continuation of the story of how fast technologies come and go – we will also mention Microsoft's recent decision to terminate their QR code alternative, the Microsoft Tag [17]. Covering Image is a considerable undertaking, and in a two-hour lecture, topic coverage must be restricted. We end our "I" section with examples of transparent OLED (organic light-emitting diode) displays and introduce the concept of augmented reality (AR), with the String app for Apple devices [18] as an example. Google's AR-oriented game Ingress [19] and "near-future available devices" such as Google Glass [20] are also mentioned.

Language: This is currently our weakest section in the overview. Translation services such as Google Translate are well known to essentially all students, and digital dictionaries are used by many Asian students. Hardware for specific purposes, such as the SRI IraqComm Arabic-English translator [21] and similar solutions, indicates an improved performance potential for devices and applications in this area in the years to come. Simple, command-line (URL) web services such as the Google translation API [22] are also mentioned here and shown as a demo for non-IT majors; for IT majors, it is an exercise in the web programming workshops (a sketch of such a call follows after this list). We also mention the Google web translator option, which adds automatic translation functionality to a website.

Environment: with some overlap with the movement section, we define this area as covering "artifacts, transported by a carrier, transmitting and/or receiving data from the surrounding environment". Carriers can e.g. be humans, dogs, cars or boxes. Artifacts can be mobile phones, laptops, transponders, "smart" clothing [23] or NFC (near-field communication) units, to name a few. One reason for including this section is the ongoing shift from the traditional WWW (where we sat by our computers) to the new WWW, meaning a Web-Wired World [24] – a society where we are more or less constantly connected, and where we transmit and receive data to and from our surrounding environment, also at times when we may not be aware of it. In this context, we also mention Reality Mining [25] and technologies that use visitor (mobile phone) data for mapping customer flows, e.g. in a shopping center [26]. Built-in functionality such as compass direction also fits in here (with a short discussion on how this ability can be useful in future mobile and other services).
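To make the speech component concrete in the C# exercises, a minimal sketch along the following lines can be shown. It is an illustrative example only, assuming the desktop .NET System.Speech namespace (the classes used in the actual hand-outs are not reproduced here), and it illustrates the point made above: "ouch" can be placed in a recognition grammar because it has a spelling, whereas a sigh cannot.

    // Illustrative sketch only (assumes the .NET System.Speech namespace);
    // the downloadable workshop hand-outs contain the full, tested instructions.
    using System;
    using System.Speech.Recognition;
    using System.Speech.Synthesis;

    class OuchDemo
    {
        static void Main()
        {
            var synth = new SpeechSynthesizer();
            synth.Speak("Say ouch if it hurts.");              // speech synthesis (TTS)

            using (var recognizer = new SpeechRecognitionEngine())
            {
                // A one-word grammar: "ouch" is detectable because it has a spelling.
                recognizer.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("ouch"))));
                recognizer.SetInputToDefaultAudioDevice();

                RecognitionResult result = recognizer.Recognize();
                if (result != null)
                    synth.Speak("I heard: " + result.Text);    // a sigh, in contrast, would go undetected
            }
        }
    }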
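For the Language component, the kind of simple, URL-based web service call referred to above can be sketched as follows. This is a hedged illustration: the endpoint and response layout follow the Google Translate REST API (v2) as we understand it and should be verified against Google's current documentation, and YOUR_API_KEY is a placeholder.

    // Hedged illustration of a URL-based translation call (Google Translate v2 REST API);
    // verify the endpoint and parameters against current documentation.
    using System;
    using System.Net;

    class TranslateDemo
    {
        static void Main()
        {
            string url = "https://www.googleapis.com/language/translate/v2"
                       + "?key=YOUR_API_KEY&source=en&target=sv&q="
                       + Uri.EscapeDataString("Hello world");

            using (var client = new WebClient())
            {
                // The response is JSON along the lines of:
                // { "data": { "translations": [ { "translatedText": "..." } ] } }
                string json = client.DownloadString(url);
                Console.WriteLine(json);
            }
        }
    }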

B. Beyond SMILE

In some classes we add a seminar to the SMILE package and discuss two more areas for input and output which may be important for students to know about: A for artifacts and H for haptic (turning the acronym into AH, SMILE). We have added these two types of functionality as independent sections as we think they represent changes in what people commonly think technology can be used for. As fairly novel topics, these areas also add to our examples of current technological transitions.

Artifacts: output has traditionally been directed to computer screens (glass) or printed on paper, with sound (audio) as a third but less used option. It is evident to students that Green IT [27] and insights about the need for sustainability have triggered a change from paper to glass. We exemplify how this transition can affect entire industries; a big paper mill, located in the same county as our institution, has recently closed two newsprint machines, leaving hundreds of workers unemployed [28]. Printing does however also take new forms. Having been around for many years (but at considerably higher prices), 3-D printers are now readily available at costs below 400 US$. If there is no need for a printer in your home, printing services are available from online communities. Some researchers see collaborative, open-source developer innovations as having the potential to change entire industries [29], and an ability to cheaply "print things" also holds great potential for students and start-up companies. We hope to see many innovative ideas from our own students, as our first 3-D printer will be installed in a student design studio in 2014.

Haptic: The term haptics refers to sensing and manipulation through the sense of touch [30]. We are all familiar with haptic devices, as both a keyboard and a mouse can be included in this concept [31]. The haptic units covered in our lectures are however different. We start with technology known to many students: old gaming devices (steering wheels) for car racing software using Force Feedback technology; then, we move on to more modern applications such as vibration and accelerometers (built into mobile phones). Tactile applications, such as companion robots [32], are also discussed and displayed. For our classes with IT students, we also mention HVR (haptic voice recognition) [33] as an example of cross-technology initiatives. In short, HVR combines speech recognition with user touch-screen input in an effort to provide better background data for the speech recognition application.

III. WORK

Most of the exercises and lectures have been developed during the past two academic years (2011-2013). Lectures have been used in four classes with a general student audience (non-IT majors) and in three IT major (computer science/information systems) classes. Exercises have been tested in two different types of classes: Windows programming (using Visual Studio) and web programming. Some exercises have been developed together with final-year computer science students, as part of their research projects/internships.

All in all, we have so far developed two lectures, matched with discussions and group exercises, and six shorter programming lab workshops with instructions. Most deal with getting to know the Microsoft Kinect unit (different from the Xbox game device) and controlling it through C#. To promote the idea of "the web as your device" we have also created an exercise that explores the MIT WAMI [34] speech-enabled web browser.

Lectures and programming exercises are free to use and modify. We would appreciate copies of improved slides/texts. You can download our original work from a webpage we hope to update as more lectures and exercises are developed. The URL is http://hh.mywr.net/misc/smile_links.html.

A. SMILE in the classroom (lectures)

Our two lectures introducing SMILE have a common body but differ in technical depth and explanations. We have found that, in general, each new cohort of arriving students seems to be more "tech savvy" than the previous one. Few, however, think about the underlying technologies that must work for them to send a text message, withdraw money from the campus ATM, or connect with their loved ones at home using Skype. To get the students thinking about the rapid change of technology, we start by speaking about well-known, familiar items, e.g. a USB memory stick. Students will of course own sticks with a range of GB capacities, but when we show a 128 MB USB stick from the late 1990s, it often triggers reflection. To build further on this thinking, we do The Jurassic Park Test:

Fig. 1. The Jurassic Park Test slide.

During the lecture, the texts become visible one bullet point at a time. The questions bring out many smiles and nodding heads as students think about younger brothers, sisters and cousins who have never seen, and will never hold in their hands, this symbol of recently but forever (?) obsolete technologies. We then continue with a discussion about other "modern" devices which have disappeared. The move from "fat screens" (CRT monitors) to "flat screens" is an often-mentioned example.

From upgrading technology shifts (hardware to hardware, such as floppy disks to memory sticks, and CRT to LCD monitors) we then turn our attention to shifts in delivery methods and user activity: from hardware to software (such as typing to speech recognition) and from clicking to swiping (mechanical sensing) and gestures (image/motion sensing). Here we also touch upon the future (if any) of laptops, and the move towards on-screen rather than physical keyboards. Some arriving students also still remember arcade and home entertainment dance games. It comes as a surprise to many students that "active" interaction through voice, movements and gestures has been researched and reported on for a long time [35], [36]. When the old technologies are explained in layman's terms, the ongoing transition to SMILE technologies also becomes more evident: from rather huge consoles, to "dance pads" on the living room floor, to plain motion sensing.

To follow up on the Jurassic Park Generation and rapidly changing technologies themes, we look into technology that seems very intuitive to small children today. Here, the aim is to make the students aware of the need to stay updated (in their private as well as business lives) and to realize that the skills and demands of future consumers will not stop at what students in the lecture think is modern. We try to make this evident by setting up a scenario 15 years from now – when most of our current students have formed their own families – and by telling a true story from one of our colleagues (regarding her daughter).

A lecture on "New technologies in e- and m-commerce" (with a more business-oriented focus) has also been created and will be introduced for both audiences from autumn 2013. The lecture covers, among other topics, QR codes, geo-location and payment issues, which are later followed up in a workshop for web design students. All lectures are written in English, are free to use under a CC license, and can be downloaded in both .pdf and .pptx formats from the smile_links web page.

B. SMILE in the computer lab (workshops)

For the Microsoft Kinect (not the Xbox game device), five short exercises (each targeting a 2-hour lab session) have been developed as a final-year CS student project. Another project, also resulting in a similar lab exercise, investigates the WAMI speech-controlled web browser project (from MIT) and uses JavaScript instead of C#. Used mainly for Web Design students, a workshop on e-/m-commerce introduces a number of web services and functions for "modern" use and will build on the lecture we introduce this autumn. Instructions for all workshops are written in English, are free to use, and can be downloaded in .pdf and .docx formats.

1) Kinect: Getting started
This five-page exercise introduces the device and helps the student download all necessary SDK tools. It describes the basic programming functions and vocabulary required to initiate a Kinect programming project.

Fig. 3. The Microsoft Kinect.
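To give a flavor of what the Getting started exercise works towards, a minimal sensor start-up sketch (assuming the Kinect for Windows SDK v1.x and its Microsoft.Kinect assembly) could look as follows; the downloadable hand-out walks through the details.

    // Minimal start-up sketch (Kinect for Windows SDK v1.x, Microsoft.Kinect assembly).
    using System;
    using Microsoft.Kinect;

    class KinectHello
    {
        static void Main()
        {
            KinectSensor sensor = null;

            // Pick the first connected sensor, if any.
            foreach (KinectSensor candidate in KinectSensor.KinectSensors)
            {
                if (candidate.Status == KinectStatus.Connected) { sensor = candidate; break; }
            }
            if (sensor == null) { Console.WriteLine("No Kinect sensor found."); return; }

            // Enable the streams that the later exercises build on.
            sensor.ColorStream.Enable();
            sensor.DepthStream.Enable();
            sensor.SkeletonStream.Enable();

            sensor.Start();
            Console.WriteLine("Kinect running - press Enter to stop.");
            Console.ReadLine();
            sensor.Stop();
        }
    }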

2) Creating a Kinect WPF template
Here, students are taught how to add a Kinect template to their Visual Studio environment from a short, two-page instruction.

Fig. 2. Transitions in input technologies slide.

For the IT major group, we also hold a short discussion exercise focusing on past, present and future interaction, with the aim of deepening their understanding further before the workshops.

Fig. 4. A Kinect WPF template in Visual Studio.

3) Image manipulation
This exercise introduces the color and infrared cameras, image streams, manipulating colors, masking out backgrounds, handling the depth sensor, and tracking players.
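One step in this exercise, turning the color stream into grayscale (cf. Fig. 5), can be sketched roughly as below. The sensor variable continues from the start-up sketch above, the SDK v1.x event model is assumed, and depth-based background masking is left out here.

    // Rough sketch: grayscale conversion of the Kinect color stream (SDK v1.x).
    // Depth-based background masking, as in the full exercise, is omitted.
    sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
    sensor.ColorFrameReady += (s, e) =>
    {
        using (ColorImageFrame frame = e.OpenColorImageFrame())
        {
            if (frame == null) return;

            byte[] pixels = new byte[frame.PixelDataLength];
            frame.CopyPixelDataTo(pixels);

            // The color stream delivers 32-bit BGRA pixels; averaging the B, G and R
            // channels gives a simple grayscale value.
            for (int i = 0; i < pixels.Length; i += 4)
            {
                byte gray = (byte)((pixels[i] + pixels[i + 1] + pixels[i + 2]) / 3);
                pixels[i] = pixels[i + 1] = pixels[i + 2] = gray;
            }
            // pixels can now be copied into e.g. a WriteableBitmap for display.
        }
    };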

6) WAMI
Different from the Kinect exercises, which are built in C#, the WAMI speech-enabled web browser uses a JavaScript API for its functionality. It is not as flexible in word recognition as the Kinect (which can be set to detect specific words), but it has the advantage of being web-based and not requiring any hardware other than a microphone.

Fig. 5. Changing an image to pure grayscale and masking the background.

4) Movement
An extensive exercise (14 pages) covering skeletons, positioning and movements for various applications.

Fig. 8. The MIT WAMI website.
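For the Movement exercise, a minimal skeleton-tracking sketch (again Kinect SDK v1.x, continuing from the start-up sketch above) might look like the following; the 14-page hand-out covers positioning and gestures in far more detail.

    // Rough skeleton-tracking sketch (Kinect SDK v1.x); prints the right-hand position.
    sensor.SkeletonStream.Enable();
    sensor.SkeletonFrameReady += (s, e) =>
    {
        using (SkeletonFrame frame = e.OpenSkeletonFrame())
        {
            if (frame == null) return;

            Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
            frame.CopySkeletonDataTo(skeletons);

            foreach (Skeleton skeleton in skeletons)
            {
                if (skeleton.TrackingState != SkeletonTrackingState.Tracked) continue;

                // Joint positions are given in meters, relative to the sensor.
                SkeletonPoint hand = skeleton.Joints[JointType.HandRight].Position;
                Console.WriteLine("Right hand: {0:F2}; {1:F2}; {2:F2}", hand.X, hand.Y, hand.Z);
            }
        }
    };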

We have also worked with two other Kinect image exercises but have, at the time of this paper's submission, not fully completed the editing. Our goal is to make exercises that are rather fast (a few hours) to complete, and that have a practical application. The next in line is called "the fake postcard" and uses masking and insertion of a background image to produce an image showing a destination behind the masked Kinect image. The result can look something like the prototype in Fig. 9.

Fig. 6. The Connect the Dots exercise.

5) Sounds
The text introduces audio functions and elaborates on speech recognition in a Stroop Effect [37] application.

Fig. 7. The Stroop Effect speech recognition exercise.
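The core of the Stroop application can be illustrated in a few lines: show a color word printed in a mismatching ink color and ask the user to speak the ink color. The sketch below uses the desktop System.Speech classes for brevity; the actual exercise builds on the Kinect unit and its SDK, so the details differ.

    // Illustrative Stroop sketch (System.Speech for brevity; the exercise itself targets the Kinect).
    using System;
    using System.Speech.Recognition;

    class StroopDemo
    {
        static void Main()
        {
            string word = "RED";                    // what the text says
            ConsoleColor ink = ConsoleColor.Green;  // what the text looks like
            string expected = "green";              // the correct spoken answer

            Console.ForegroundColor = ink;
            Console.WriteLine(word);
            Console.ResetColor();
            Console.WriteLine("Say the ink color...");

            using (var recognizer = new SpeechRecognitionEngine())
            {
                recognizer.LoadGrammar(new Grammar(new GrammarBuilder(
                    new Choices("red", "green", "blue", "yellow"))));
                recognizer.SetInputToDefaultAudioDevice();

                RecognitionResult result = recognizer.Recognize();
                if (result != null)
                    Console.WriteLine(result.Text == expected
                        ? "Correct!" : "The Stroop effect strikes: " + result.Text);
            }
        }
    }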

Fig. 9. A prototype of the fake postcard.

IV. CONCLUSIONS

Introducing SMILE, as well as the older WIMP and SILK paradigms (as background, and for a discussion on progression), seems to make it easier to grasp the concepts later in the workshops, although most of the younger students are seasoned users of these technologies. Slide presentations, including what we have named "the Jurassic Park Test", make many students see the rapid change in technologies with new eyes. One limiting factor in introducing new technologies is the lack of equipment (or of adequate numbers of devices, so that more than a few student groups can test their code). Equipment also has a short lifespan: some Xbox Kinect devices bought for another study were targeted for possible use in our labs; the new Xbox One will have different specs (and this could also affect how it is programmed). Overall, covering other topics than "merely the traditional" when it comes to interfaces, as general content and in programming, seems to whet the students' appetite for implementing new functionality and adds to their curiosity about technology development. Course evaluations (although we only have a few so far) are also positive towards our initiative of revamping part of the course content.

ACKNOWLEDGMENT

Parts of the workshop exercises were written (under supervision) by Sjoerd Houben, Thomas More University College, Belgium (Microsoft Kinect). Manuel Aelbrecht, University College Ghent, Belgium, provided input for the WAMI unit. Product names and other trademarks referred to in this paper are the property of the respective trademark holders.

REFERENCES

[1] L. M. Surhone, M. T. Timpledon, and S. F. Marseken, WIMP Computing, VDM Publishing Group (Germany), 2010.
[2] R. Reddy, "Turing Award Lecture: To dream the possible dream". Communications of the ACM 39, 5 (May 1996), pp. 105-112.
[3] D. Wigdor, "Architecting Next-Generation User Interfaces". ACM AVI'10 (Rome, Italy), pp. 16-22.
[4] S. M. Drucker, D. Fisher, R. Sadana, J. Herron, and M. C. Schraefel, "TouchViz: A Case Study Comparing Two Interfaces for Data Analytics on Tablets". ACM CHI 2013 (Paris, France), pp. 2301-2310.
[5] C. N. Quinn, "Mobile Learning: Landscape and Trends", The eLearning Guild Research, 2011. Download at https://commons.lbl.gov/download/attachments/77828943/mobile2011report-f2.pdf
[6] A. Matias and D. F. Wolf, "Engaging Students in Online Courses Through the Use of Mobile Technology". In Cutting-Edge Technologies in Higher Education, Vol. 6, Part D, pp. 115-142. Emerald Group Publishing Limited, 2013.
[7] A. van Dam, "Post-WIMP User Interfaces". Communications of the ACM 40, 2 (February 1997), pp. 63-67.
[8] J. Jain, A. Lund, and D. Wixon, "The Future of Natural User Interfaces". ACM CHI 2011 (Vancouver, BC, Canada), pp. 211-214.
[9] C. Breazeal, "Socially Intelligent Robots: Research, Development and Applications". IEEE International Conference on Systems, Man, and Cybernetics (2001), pp. 2121-2126.
[10] P. Khunarsal, C. Lursinsap, and T. Raicharoen, "Singing voice recognition based on matching of spectrogram pattern". Proceedings of the 2009 International Joint Conference on Neural Networks (IEEE), pp. 3012-3016.
[11] The YAMAHA Vocaloid website: http://www.vocaloid.com/en/
[12] Video on: http://www.youtube.com/watch?v=mfxkhzGqZIs

[13] C. Lee, H. Lee, and K. Oh, "Real-time Image-based 3D Avatar for Immersive Game". ACM VRCAI 2008 (Singapore), article 48 (no page enumeration).
[14] IKEA's Anna avatar: http://www.ikea.com/gb/en/customerservices/faq
[15] B. Fröding and M. Peterson, "Why computer games can be essential for human flourishing". Journal of Information, Communication and Ethics in Society (2013), 11, 2, pp. 81-91.
[16] Video from a concert: http://www.youtube.com/watch?v=O17f3lB7BFY
[17] Microsoft Tag website: http://tag.microsoft.com/home.aspx
[18] String's website: http://www.poweredbystring.com/
[19] Google's Ingress website: http://www.ingress.com/
[20] The Google Glass website: http://www.google.com/glass/start/
[21] M. Akbacak et al., "Recent advances in SRI's IraqComm™ Iraqi Arabic-English speech-to-speech translation system". IEEE International Conference on Acoustics, Speech and Signal Processing (2009), pp. 4809-4812.
[22] Google translation API: https://developers.google.com/translate/
[23] J. Rantanen et al., "Smart Clothing Prototype for the Arctic Environment". Personal and Ubiquitous Computing (2002), 6, 1, pp. 3-16.
[24] http://www.independent.co.uk/life-style/gadgets-and-tech/features/webwired-world-a-guide-to-the-twospeed-online-planet-8168278.html
[25] N. Eagle and A. Pentland, "Reality Mining: Sensing Complex Social Systems". Journal of Personal and Ubiquitous Computing (2006), 10, 4. Can be downloaded at http://hd.media.mit.edu/tech-reports/TR-588.pdf
[26] PathIntelligence website: http://www.pathintelligence.com/
[27] B. Unhelkar, "Green IT: the Next Five Years". IEEE IT Professional (March-April 2011), 13, 2, pp. 56-59.
[28] http://www.euwid-paper.com/news/singlenews/Artikel/stora-enso-toclose-two-newsprint-mills-in-sweden.html
[29] J. P. J. de Jong and E. de Bruin, "Innovation Lessons From 3-D Printing". MIT Sloan Management Review (Winter 2013), 54, 2, pp. 43-52.
[30] H. Z. Tan, "Perceptual user interfaces: haptic interfaces". Communications of the ACM (March 2000), 43, 3, pp. 40-41.
[31] V. Hayward, O. R. Astley, M. Cruz-Hernandez, D. Grant, and G. Robles-De-La-Torre, "Tutorial: Haptic interfaces and devices". Sensor Review (2004), 24, 1, pp. 16-29.
[32] M. Heerink, B. Kröse, V. Evers, and B. Wielinga, "The Influence of Social Presence on Acceptance of a Companion Robot by Older People". Journal of Physical Agents, 2, 2 (June 2008), pp. 33-40.
[33] K. C. Sim, S. Zhao, K. Yu, and H. Liao, "ICMI'12 Grand Challenge – Haptic Voice Recognition". ACM ICMI 2012 (Santa Monica, CA, USA), pp. 363-370.
[34] The WAMI website: http://www.csail.mit.edu/research/playground/wami
[35] C. S. Pinhanez et al., "Physically interactive story environments". IBM Systems Journal (2000), 39, 3/4, pp. 434-455.
[36] M. J. Klein and C. S. Simmers, "Exergaming: virtual inspiration, real perspiration". Young Consumers (2009), 10, 1, pp. 35-45.
[37] R. de Young's website covering the Stroop Effect is a good starting point: http://www.snre.umich.edu/eplab/demos/st0/stroopdesc.html