In J. Zmud et al. (eds) Transport Survey Methods: Best Practice for Decision Making, Emerald, 2013.

Spatiotemporal data from mobile phones for personal mobility assessment

Zbigniew Smoreda, Ana-Maria Olteanu-Raimond, and Thomas Couronné
Sociology and Economics of Networks and Services department, Orange Labs R&D, Paris, France

Abstract

In this paper, we will review several alternative methods of collecting data from mobile phones for human mobility analysis. We will briefly describe cellular phone network architecture and the location data it can provide, and will discuss two types of data collection: active and passive localization. Active localization is something like a personal travel diary. It provides a tool for recording positioning data on a survey sample over a long period of time. Passive localization, on the other hand, is based on phone network data which are automatically recorded for technical or billing purposes. It offers the advantage of access to very large user populations for mobility flow analysis of a broad area. We propose considering cellular network location data as a useful complementary source for human mobility research and provide case studies to illustrate the advantages and disadvantages of each method.

Keywords: mobile phone data, human mobility, localization tracing methods

1. Introduction

In transportation studies, the difficulty of conducting classic travel surveys using self-reporting records (diaries or questionnaires) and the growing need to collect long-term mobility data have drawn attention to automatic data collection systems. In particular, researchers have considered using Global Positioning System (GPS) technologies to supplement the data gathered using more traditional techniques such as paper or electronic travel diaries (e.g. Stopher & Wilmot 2000, Wolf et al. 2001). GPS location data are highly precise, offer a very high recording frequency, and can be used to measure both the speed and direction of an equipped individual or vehicle, meaning that a wide range of information on mobility can be extracted from them. GPS technologies can distinguish between different modes of transportation, even when both are travelling at similar speeds. The newest technologies can even monitor a device’s altitude and capture vertical movements (see: xxx in this volume).

Automated smart card fare collection systems can also be a valuable source of massive amounts of data on public transportation. In cities where these systems have been implemented, they generate a huge quantity of real-time data on transactions completed on public transit vehicles. Since these data are time-referenced and sometimes precisely space-referenced, they can be used to estimate ridership across the transportation network (Pelletier et al. 2011).

We propose considering cellular network data as a useful complementary source for research on mobility. The main purpose of a telephone is, of course, to enable its user to hold remote conversations, and more recently to send SMS or surf the Internet. However, in the case of cellular systems, a connection to the mobile phone antenna infrastructure also makes it possible to collect the handset’s geographical location. Given the extremely broad and rapid spread of mobile phones around the world, the data produced by mobile phone usage seem to offer a valuable tool for research.

Cell phone localization is far less precise than GPS technology (within a hundred meters in densely populated cities, and within several kilometers in rural areas) and offers only infrequent location records. However, it also offers clear advantages compared to GPS-based methods: there is no need to provide subjects with a supplementary device or to develop a data collection facility, and, more importantly, almost the entire population is already equipped for a study. Last but not least, the existing technical network offers the possibility of real-time and continuous localization monitoring.

Mobile phones have become a kind of “personal sensor.” In addition to being used for communication, they also serve as calendars, address books, or even full-fledged personal computers and Internet browsers, so we are essentially constantly carrying sensors in our bags or pockets. We also connect to mobile networks more and more frequently, whether to communicate (via calls or SMS) or to check our email, view the weather forecast, read the newspaper, listen to music, watch videos, etc. Handsets sometimes connect to the network automatically, to collect email or update an installed application or program. Their position is also regularly updated (see below), since operators need to know the approximate location of all the devices connected to their networks. An ever increasing amount of location data on mobile phone users over long time periods (weeks, months, even years) is thus available.

Mobile phone data also offer certain benefits as compared to RFID fare card data.
They cover almost all areas and types of situations (not only transit networks), and are available for public transportation users and non-users alike. A combination of RFID fare card and mobile phone data could serve as the basis for a large-scale transportation research project. We will begin with a brief description of cellular phone network architecture and the location data it can provide before discussing two types of data collection, active and passive localization. Active localization is similar to “augmented” personal diary data collection, as it aims to provide tools for recording positioning data on a classic survey sample over a long time period. Passive localization, on the other hand, is based on phone network data which are automatically recorded for technical or billing purposes. It offers the advantage of access to very large user populations for mobility flow analysis of a broad area. We provide a case study for each method in order to exemplify the technical issues associated with each dataset.

2. Mobile phone networks and localization

In this section, we briefly introduce the location data which can be extracted from mobile phone networks. A cellular network is a radio network of base transceiver stations (BTS), each of which has one or more antennas, distributed over a given area in order to provide the best possible radio coverage via small regions called cells (cf. Figure 1). When linked together, these cells provide radio coverage over a wide geographic area, enabling a large number of portable radio devices (e.g. mobile phones) to access the phone network via BTS, even if they move through more than one cell during transmission. In order to connect the devices in the area covered, the cellular network must identify the position of the mobile phones that call each other using a cell identifier (ID). The cell ID can be translated into BTS geographical coordinates or the center of the cell area, i.e., the approximate geographic position of a device. During movement between cells, the network will command the mobile unit to switch to the next cell in order to avoid dropping the call and ensure an uninterrupted connection. This cellular handover will record all of the cells through which the mobile phone passes during the connection, providing a kind of microtrajectory.

Legend: BTS - Base Transceiver Station; C - Cell; LAC - Location Area Code

Figure 1. Global System for Mobile Communications (GSM) cellular network coverage and its local architecture

In order to manage users’ mobility, the network is also divided into larger (geo) administrative zones called Location Areas (LA).¹ A user moving from one LA to another triggers a Location Area Update (LAU), which provides the cell ID of the new LA, even if the mobile unit is not connected. This procedure is also performed periodically when a device has not been connected to the network for several hours. The cellular network must know the approximate position of all mobile units at all times in order to be able to connect them on demand in a reasonable timeframe. Switching a handset on or off also generates a specific cell record. All of these data can be used to obtain location records on mobile phone users for human mobility studies.

We experimented with three different methodologies based on mobile phone localization: (1) trying to extend the duration of observation by integrating location tracing inside a cell phone, (2) seeking to generalize this method to all types of mobile phone present on a cellular network using active localization, and (3) testing the re-use of large mobile phone datasets (billing or network signaling data) to study population mobility patterns.

¹ Each Location Area is composed of several hundred cells.

3. Mobile phone location collection methods

This section will present the main mobile phone tracing technologies. As discussed above, cellular network localization methods can be roughly divided into two categories: active and passive localization tracing. Active localization entails provoking cell localization of a specific mobile device using mobility-aware software integrated into the device or generating localization through the network. Passive localization involves extracting localization information from mobile phone data collected for other purposes.

3.1. Active handset localization

3.1.1. Integrated collection software

If we define mobile phones as “personal sensors”, the first idea which comes to mind for collecting individual mobility data is capturing location data directly on handsets. In our first trial, we decided to start with this idea and embedded tracing software in a cell phone in order to run an automatic “mobility diary” for a long period of time. In 2005, when the study was developed, the use of a GPS embedded in the phone was unrealistic due to excessive power consumption, even in the case of professional hybrid GSM/GPS devices. The study aimed to monitor several months of mobility and communication practices, making it impossible to ask subjects to carry several supplementary batteries and charge them every six hours. We therefore selected an energy-efficient cell tracking solution integrated into Nokia 3650 phones. Symbian operating system software capable of capturing each cell change was used and a specific communication data collection and transmission module was developed for this study.

Case study

An ethnographic study was carried out using this method of recording users’ successive locations and all types of communication events involving their mobile phones (Licoppe et al. 2008). The study focused on the articulation of the movements and mobile communications of twenty-four users. The scope of this paper only allows us to quickly review a few points which seem relevant to mobility studies in terms of the method used. The procedure of collecting data and transforming them into meaningful individual traces is depicted in Figure 2. Each cell change (NEWCID code, cf. Fig. 2) or communication (call, SMS, internet WAP connection) generates a cell ID record. This cell code must be translated into a geographic location. A time analysis is then performed to calculate stops between two cell ID changes and their duration.
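The time analysis described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the software used in the study: the `(timestamp, cell_id)` record layout, the function name `stops_from_cell_changes`, and the five-minute minimum stop duration are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta

def stops_from_cell_changes(records, min_stop=timedelta(minutes=5)):
    """Derive stops from a time-ordered list of (timestamp, cell_id) records.

    The time spent in a cell is the gap between the first record in that
    cell and the next record carrying a different cell ID. Records in the
    same cell (e.g. a call or SMS) do not interrupt the stay. Dwell periods
    shorter than `min_stop` (a hypothetical threshold) count as movement.
    """
    stops = []
    if not records:
        return stops
    start_time, current_cell = records[0]
    for timestamp, cell_id in records[1:]:
        if cell_id != current_cell:
            duration = timestamp - start_time
            if duration >= min_stop:
                stops.append((current_cell, start_time, duration))
            start_time, current_cell = timestamp, cell_id
    # The stay in the last cell has no observed end, so it is not reported.
    return stops
```

Each stop is returned as `(cell_id, arrival_time, duration)`; translating the cell ID into coordinates is a separate lookup against the operator's cell table.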


Figure 2. Diagram of data collection and decoding using cell tracing on a mobile phone (dotted line - real trajectory; solid line - cell change based trajectory). Each cell change is detected and transformed into geographic coordinates.

Figure 2 also shows the difference between the user’s real trajectory (dotted line) and the trajectory reconstructed using mobile phone data (solid line). This indicates the possible imprecision of mobile phone location data (cell sizes in the areas crossed).

After a short period of observation, we went back to the subjects with the automatically transcribed data, which had been turned into a time-ordered print-out that could be read as a personal diary. The “diary” covered ten consecutive days over the preceding two weeks, so the relevant information was still fairly fresh in the subjects’ minds. They were asked to describe and qualify the type of places and activities which corresponded to the cells visited, moment by moment, and to categorize the people contacted by cell phone as well as their relationship with them. We soon realized that these interviews provided much richer information than the data qualification scope we had set. Every time a discrepancy between the data and the subjects’ memories appeared, they worked hard to make sense of it using the data presented to them. They turned into investigators of their own actions, commenting aloud, for their own benefit and that of the interviewer, on their navigation between the clues provided by the data and their own memories. This went on until they were able to produce an account which was satisfactory for all practical purposes (Licoppe et al., op. cit.). The mobility information provided by the automatic recordings thus seemed to provide subjects with a good basis for analyzing their own movements.

The automatic recording of movements and communications lasted for six months with no real problems: several people disappeared from the sample, but no one refused to continue the study. To illustrate the interest of this approach and the type of results obtained, we have chosen to discuss two cases that display diametrically opposed mobility patterns. Figure 3 shows a spatiotemporal comparison of movement data on two subjects over a six-month period.²

Figure 3. Six-month space-time paths of two selected participants using cell change tracing on a mobile phone.

² This analysis was carried out by Hongbo Yu (Oklahoma State University).

Figure 3 (left) shows location data on a thirty-two-year-old man working in Paris and living in a Parisian suburb during the work week. His wife was living in Cherbourg (Normandy), where he spent all weekends and most holidays. A comparison with the mobile phone logs enabled us to characterize the places visited and to show highly regular mobile behavior over a period of almost two weeks. This was further confirmed when the data were used to examine his movements over a six-month period. Qualifying the places visited over ten days was enough to identify 96 percent of the locations (in terms of cells) visited in six months! We can clearly distinguish his workdays, when he commuted between his flat and his office in Paris, and weekends and other time spent with his wife in Cherbourg. The two holiday periods, one spent in Cherbourg and the other in the Alps during the winter, are also obvious.

Figure 3 (right) represents six months of localizations of a thirty-six-year-old woman living in Paris, who had a very different mobility pattern. The automatic diary data for the two weeks preceding the interview showed numerous daily movements and a high level of overlap of all her activities. Her work and lifestyle fit into the scheme of urban mobility oriented between several stable places: her home, her office in the north of Paris and the association where she taught, as well as many short moves to meet people or to look for new job opportunities. As we can see, most of her mobility is confined to Paris; even when she travels to other regions, continuous short-distance movements are discerned. In her case, mobility is clearly more than mere movement and has a specific function in itself (Urry 2000). Frequent short trips correlated with intensive mobile phone use give us a picture of a form of mobility which produces situations brimming with opportunities for action, which are grasped by a subject focused on creative exploitation of her stays in different places, related to the multiple activities in which she is engaged.

The apparent meaning of mobility behaviors can change during mobility observations which last several months. The first subject’s long-distance mobility now appears static, while the second subject’s short-range movements display real versatility. This would be very difficult to observe using techniques like a one-day travel diary.

The mobile phone cell ID decoding method seems to be headed for obsolescence, however, due to the growing popularity of new generation smartphones equipped with GPS receivers. These new generation phones will most likely be a more suitable technology for protocols based on handset data recording. For example, the Nokia Mobile Data Challenge recently provided data from nearly 200 volunteers in the Lake Geneva region recorded over one year, including GPS locations and motion information (accelerometer) throughout the period (Laurila et al. 2012). This type of information collected via survey participants’ phones could become a worthy supplement to transportation survey data in the near future.

3.1.2. Location platform

The embedded handset software solution discussed above is not entirely satisfactory, despite the rich data it can generate. In particular, the need for a specific device running an advanced Operating System (OS) limits the scope of the sample or means providing subjects with a new handset.
This is especially problematic for research on communications, as a new device can change communication practices.³ We therefore started looking for a solution that does not depend on the handset. The answer was provided by location-based services (LBS), which began to appear in the early 2000s. LBS for mobile phones are based on cell localization, and mobile carriers have established a selective on-demand mobile phone positioning system. Different telecommunications operators use different methods to determine a mobile phone’s position with greater precision. Some are based on a wave propagation model, others on time differences in the signal’s arrival at the antenna. The important thing is that the system must be able to localize every mobile phone connected to the network regardless of its OS. The location platform, usually used for on-demand user localization to provide contextual information, can also be used (as in the case of fleet management vehicle tracking) to periodically localize a sample of mobile phones. The location server sends a localization command to a mobile phone and translates its response into a geographical position.

³ Just as it could be misleading to provide someone who usually drives an old VW Beetle with the latest BMW for a study of driving habits.

Case studies

We developed an alternative version of our original method using the operator’s commercial location platform to periodically localize participants’ mobile phones. In this prototype, the user’s mobile phone is actively localized by a targeted network “ping” without disturbing the user. The presence and mobility traces are then made available to participants on a secure personal Web interface. They can annotate their traces (e.g. with meaningful place names, motivation, means of transportation used) or correct locations by moving the points recorded on the map to their real location (similar to Doherty et al. 2001).

A six-month user test was carried out with a dozen volunteers using a 15-minute location frequency. As shown in Figure 4, this method provides slightly different traces as compared to our initial experiment: communication events are not collected and the frequency at which the mobile phone is located (in this example 15 min) can disguise some short movements. The first issue can be corrected by coupling this method with passive mobile phone data (cf. next section), while the second can be managed by adjusting the location frequency to suit the study’s objectives. The stop computation method (with cell precision) is, however, more or less the same: several consecutive locations in the same cell are considered a stop. Trajectories are defined in a similar way, as consecutive points in space between previously identified stops.
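The stop and trajectory definitions above translate naturally into a run-length grouping of the periodic pings. The sketch below is one possible reading, not the platform's implementation: the function name `segment_pings`, the `(timestamp, cell_id)` tuple layout, and the choice of two pings as the minimum for "several consecutive locations" are assumptions.

```python
from itertools import groupby

def segment_pings(pings, min_consecutive=2):
    """Split time-ordered (timestamp, cell_id) pings into stops and moves.

    A run of at least `min_consecutive` successive pings in the same cell
    is labelled a stop (the value 2 is a hypothetical choice). All pings
    lying between two stops are merged into a single "move" segment,
    i.e. a trajectory.
    """
    segments = []
    for _cell_id, run in groupby(pings, key=lambda ping: ping[1]):
        run = list(run)
        kind = "stop" if len(run) >= min_consecutive else "move"
        if kind == "move" and segments and segments[-1][0] == "move":
            segments[-1][1].extend(run)  # extend the trajectory in progress
        else:
            segments.append((kind, run))
    return segments
```

With a 15-minute ping interval, a stop shorter than the interval cannot be distinguished from movement, which is the limitation noted in the text.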

Figure 4. Location data using a location platform with a 15-min handset positioning frequency (dotted line - real trajectory; solid line - GSM based trajectory). The device is located in its current cell and cell coordinates are recorded every 15 minutes.

Location data can be collected over long periods using this method. It is non-intrusive and participants do not have to do anything to submit the data, which are available to them in real time on a Web interface. They can also replay the sequence of places visited by choosing a specific day. The interface also gives them the opportunity to view their aggregated personal data, i.e., the most visited places, in order to quickly add semantic information on the most important places in their daily lives.

During the test study, we were also able to compare mobile phone traces to GPS localization in an urban context. A mobile phone with an integrated GPS was placed in a car (with tracing frequency set to 60 sec). It was also localized every 15 min using cell position. Figure 5 shows a comparison of the traces. The difference seen in the close-up view (Figure 5-a) is fairly significant. The fifteen-minute pace of cell phone tracking and the cell sizes involved make it impossible to correctly record faster movements along complicated trajectories (in 15 min a car progressing at 20 km/h, the average speed in Paris, will travel 5 km). However, when the image is zoomed out (Figure 5-b) the two trajectories appear similar.

The usefulness of our method thus depends on the objectives of the study. If it aims to analyze movement within a neighborhood, it is clear that observation via mobile phone tracking is not suitable. If, on the other hand, the study aims to investigate mobility patterns on a city or region-wide scale, our method can be very useful. It can be used to select a sample with no concerns about the number of GPS devices required, since participants simply have to be equipped with a mobile phone, which is far from uncommon.⁴
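The 5 km figure quoted for the urban comparison is simply speed multiplied by the sampling interval. As a worked check, taking the same 20 km/h average speed and, purely for comparison, a shorter 6-minute interval:

```latex
d = v \,\Delta t, \qquad
d_{15} = 20\ \tfrac{\text{km}}{\text{h}} \times \tfrac{15}{60}\ \text{h} = 5\ \text{km}, \qquad
d_{6} = 20\ \tfrac{\text{km}}{\text{h}} \times \tfrac{6}{60}\ \text{h} = 2\ \text{km}
```

The distance travelled between two successive localizations therefore shrinks in proportion to the interval, although the cell size still bounds the precision of each individual point.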

Figure 5. Comparison of GSM (blue line) and GPS (green) traces inside Paris, both devices on board a car.

This method was recently applied to a sample of 60 users observed over five weeks in December 2011 (Ramus 2012). In this study, the localization interval was reduced to 6 minutes. Andrienko et al. (2012) have produced a temporally aggregated view of the data collected on the national level. This visualization (see Figure 6) clearly shows users’ routes along the main French highways and railroads, as well as air travel. It is also important to note that in their study, semantic information such as stops and points of interest (home, work, and leisure activities) can be automatically extracted using spatiotemporal analysis, meaning that user intervention is no longer necessary (see also for GPS data: Schüssler & Axhausen 2009). This opens up the possibility of using anonymous passive mobile phone location data for mobility studies.

⁴ This method is operator independent; each national mobile operator has an LBS platform.

Figure 6. Overview of the 6-minute mobile phone localizations over a five-week period in December 2011 (source: Andrienko et al. 2012)

3.2. Passive location data

Up to this point, we have focused on data production methods based on mobile phone network geography. In addition to these active localization methods, a large mass of data is constantly collected by operators for billing purposes or for use in technical network management. These data only cover the company’s customers, but each company’s customers can number in the millions. It is no surprise that this source of data has attracted the attention of mathematicians and physicists working in complexity science. Researchers were initially drawn by the fact that social networks can be built from phone exchanges (Eagle & Pentland 2006; Palla et al. 2006; Onnela et al. 2007; Hidalgo & Rodriguez-Sickert 2008), but call locations soon became a topic of research (Ratti et al. 2006; Gonzalez et al. 2008; Ahas et al. 2010; Song et al. 2010; Wang et al. 2010). Some researchers have even proposed the idea of a “cellular census” to assess urban dynamics (Reades et al. 2007; Kostakos et al. 2009; Couronné, Olteanu et al. 2011), or more globally, to renew the social sciences using electronic personal data (Lazer et al. 2009).

The advantage of mobility analysis based on mobile phone traces (or other spontaneously produced geolocation data) lies in the number of people concerned (the only limit is the number of equipped people in a population) and the longevity of these studies (theoretically, there is no limit to the observation period). In fact, the use of this technique can profoundly change transportation research, which is generally based on infrequent one-shot surveys. It will not replace such surveys, but can provide a framework for a comprehensive and longitudinal study of temporal dynamics, and can be used to capture ephemeral events and fluctuations in day-to-day mobility behavior.


3.2.1. Technical network logs

Cellular networks collect a wide range of technical and communication data in order to recognize authorized devices, provide them with signal, and ensure continuous service. These data can be used for mobility analysis (Caceres et al. 2007). As we described above, a cell grouping called Location Area (LA), which is controlled by a Mobile Switching Center (MSC), forms a middle layer of network management. To operate the network, each MSC maintains a database of “signaling events” involving devices present on its territory. The most common signaling events are calls and handovers (cell changes during a communication), SMS, attachment/detachment information (when the mobile is switched on/off), and Location Area updates (crossing LA border, refreshing position of an inactive device). In order to analyze human mobility in a given area, data must be gathered from all MSCs in the area. This is no easy task because this function is not (yet) native in these technically oriented systems. Figure 7 shows a comparison of location traces drawn from MSC data and the other data collection methods.

Figure 7. Location information using MSC data (dotted line - real trajectory; solid line - GSM based trajectory). The SMS received, call starting cell and associated between-cell handovers, as well as two LA updates provide location points with cell center precision.

As we can see, data are generated by both communication events (cells connected during a call, sending or receiving SMS, and data sessions) and by mobility (handover during communication or movement between LAs). They are heterogeneous in terms of time, a fact which must be taken into account during analysis. As always in the case of cellular data, geographical precision is limited to the cell center, which must be estimated or calculated using the radio propagation model, if available to the researcher.
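Gathering events from several MSCs and exposing their irregular timing can be sketched as follows. This is an illustration under assumptions rather than a description of an operator system: the `SignalingEvent` fields, the event type labels, and both function names are hypothetical, and each per-MSC log is assumed to be already sorted by time.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple

class SignalingEvent(NamedTuple):
    timestamp: float   # seconds since an arbitrary epoch
    device_id: str     # anonymized identifier (hypothetical field)
    event_type: str    # e.g. "call", "handover", "sms", "attach", "lau"
    cell_id: str

def merge_msc_logs(*logs: Iterable[SignalingEvent]) -> Iterator[SignalingEvent]:
    """Merge per-MSC logs, each sorted by time, into one time-ordered stream."""
    return heapq.merge(*logs, key=lambda event: event.timestamp)

def device_trace(events: Iterable[SignalingEvent], device_id: str):
    """Return (timestamp, cell_id, gap) tuples for one device.

    `gap` is the time elapsed since the device's previous record (None for
    the first one), which makes the uneven sampling explicit.
    """
    trace, previous = [], None
    for event in events:
        if event.device_id != device_id:
            continue
        gap = None if previous is None else event.timestamp - previous
        trace.append((event.timestamp, event.cell_id, gap))
        previous = event.timestamp
    return trace
```

In such a trace, gaps of a few seconds (handovers during a call) can sit next to gaps of several hours (periodic LA updates of an idle device), so any speed or stop computation has to account for the gap between records.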


Case study

The shortcomings of mobile phone data may be partially offset by the inclusion of semantic information on the context of observed mobility. The area where mobile phone users live and move is more or less precisely described using topographic data. These data can be used to correct some of the actual paths taken by people on the move, particularly in the case of long-distance trips. In this context, our approach is based on data fusion (cf. Figure 8). It consists in merging different sources of information (e.g. data from mobile phone networks, geographic and socioeconomic data). The integration of socioeconomic data (population density, economic activity...) is essential in order to define and take into account the geographic context and validate assumptions about semantic information.

The approach is divided into three steps. The first step is to model spatiotemporal features which define human mobility, i.e., sources of information and spatiotemporal trajectories (Olteanu-Raimond et al. 2012). The second step is to compute individual trajectories and map match them using geographic data (e.g. topographic or raster data). The final step consists of integrating socioeconomic data in order to better characterize the territory and to add more semantic information to individual trajectories. Once all of these sources of information have been taken into account, spatiotemporal data mining methods are developed to analyze human mobility and behavior.

Figure 8. Human mobility approach to mobile phone data based on data fusion: merging mobile phone data with geographical information to define individual trajectories.

We are interested in adding more semantic information to spatiotemporal trajectories. In this context, a study of long-distance travel mode identification was carried out. A map-matching algorithm was developed to identify the different modes of transportation. The algorithm matches mobile phone location data and vector data (e.g. railways, roads, airports) using the two main steps described below.

(1) The local approach, which entails matching the recorded mobile phone locations to the correct vector data based on two criteria: Euclidean distance and speed between two consecutive recorded locations. More precisely:
(a) for each user $j$ and for each recorded location $P_{ij}$, where $i = 1..N$ ($N$ represents the total number of locations for user $j$), we look for close spatial features in vector data, based on a distance criterion. These features are the candidates for matching with $P_{ij}$;
(b) each candidate $C_k$ is then separately analyzed on the basis of two criteria: the distance between $P_{ij}$ and the candidate to match and the speed between two consecutive recorded locations. Three hypotheses are defined: (i) $P_{ij}$ is matched with candidate $C_k$; (ii) $P_{ij}$ is not matched with candidate $C_k$; and (iii) “we don’t know”. The last hypothesis represents ignorance, i.e., we are unable to make a decision on the criteria. For example, if the speed observed is greater than 180 km/h, it is easy to assume that the mode of transportation is a high speed train, but when the speed observed is 30 km/h, we cannot be sure; it could be a car, a bicycle or a train.

(2) The global approach: this step consists in identifying the route travelled and the mode of transportation. The assumption with the highest number of points is chosen. Since there is no uncertainty at this level, we defined a confidence measure:

$$C = \frac{\mathrm{max}_1 - \mathrm{max}_2}{\mathrm{max}_1 + \mathrm{max}_2} \qquad (1)$$

where $\mathrm{max}_1$ and $\mathrm{max}_2$ represent respectively the first maximum and the second maximum for the same user. If the confidence measure is less than a set threshold, the algorithm raises doubts about the validity of the results. Doubtful results are classified as ambiguous. The cardinality of matching links is 1:1, i.e., one mobile phone point is matched with a geographic feature. It is important to note that the matching algorithm we proposed is not suitable for dense urban areas, and needs to be improved for use at a multi-scale resolution. One possibility is to begin by defining more decision criteria and to propose n:m links (i.e., one mobile phone point n is matched with m geographic features).

One day’s worth of mobile phone MSC data were used to test our approach (~10,000 users). The set of stops was composed of two cities in France: Paris and Strasbourg, located 500 km apart. The matching algorithm, as sketched in Figure 9, can distinguish between the following modes of transportation: plane, train, and car (by highway or national roads).
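The two-step logic and the confidence measure of equation (1) can be illustrated with a simplified sketch. It is not the authors' algorithm: the speed ranges in `SPEED_RANGE`, the 2 km search radius, and both function names are hypothetical, and the three-hypothesis reasoning is reduced to "exactly one plausible mode, or unknown". The default threshold of 0.6 and the four-point minimum are the values the chapter reports for its ambiguous class.

```python
from collections import Counter

# Hypothetical plausible speed ranges in km/h per mode; not from the study.
SPEED_RANGE = {"train": (0, 320), "car": (0, 150), "plane": (150, 900)}

def local_match(candidates, speed_kmh, max_dist_m=2000.0):
    """Local step for one recorded location.

    `candidates` is a list of (mode, distance_m) pairs describing nearby
    vector features. Returns the matched mode, or None for "we don't know"
    (no candidate passes both criteria, or several modes remain plausible).
    """
    plausible = {
        mode
        for mode, dist in candidates
        if dist <= max_dist_m
        and SPEED_RANGE[mode][0] <= speed_kmh <= SPEED_RANGE[mode][1]
    }
    return plausible.pop() if len(plausible) == 1 else None

def global_match(point_modes, threshold=0.6, min_points=4):
    """Global step: vote over the per-point matches of one trajectory."""
    votes = Counter(mode for mode in point_modes if mode is not None)
    if not votes:
        return "non-matched", 0.0
    ranked = votes.most_common(2)
    max1 = ranked[0][1]
    max2 = ranked[1][1] if len(ranked) > 1 else 0
    confidence = (max1 - max2) / (max1 + max2)  # equation (1)
    if len(point_modes) < min_points or confidence < threshold:
        return "ambiguous", confidence
    return ranked[0][0], confidence
```

For instance, a point near both a railway and a road observed at 250 km/h is matched to the train, while the same point at 30 km/h stays undecided and contributes no vote in the global step.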

Figure 9. Paris - Strasbourg study: matching localization from MSC data with vector data. Top, location points and vector routes/rails data superposition; below, points attribution to the paths.


Our dataset produced the following results: 58% of trajectories are matched with the railway network, 23% with the road network, and 2% with airline routes (airports in the two cities); 2% of trajectories are non-matched, and 15% are classified as ambiguous. This last class consists of trajectories composed of fewer than four points or with a confidence measure of less than 0.6.

3.2.2. Billing data

Call Detail Records (CDR), which are the basis of telephone operators' billing systems, are the most popular form of mobile phone data among researchers. They cover each customer's communications history and are recorded for each communication (call, SMS, internet connection...). In practice, operators must archive this information for at least one year for litigation management purposes. The CDR contains the timestamp, call duration and type of call (voice, SMS, data), as well as the code of the cell in which the communication occurs. The system may also record handovers (cells crossed during a call) or the first and the last cell of each call, but in most cases only the cell in which the call begins is recorded. The popularity of CDRs in research is mainly due to their huge size (months and months of communication and mobility behavior traces of millions of people) and their relative ease of extraction and use (they have a standard format, are preprocessed for billing purposes, and are recorded in highly secured databases) rather than to the location data itself. As shown in figure 10, the localization information provided by CDRs is poorer than that obtained using any of the methods discussed above. The phone position records are based solely on the user's communication activities.

Figure 10. Location information using CDRs (dotted line - real trajectory; solid line - GSM based trajectory reconstruction). The only recorded points are the receipt of two SMS and the cell in which the call began.
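To make the sparseness of CDR-based positioning concrete, the following Python sketch models a record with the fields listed above (timestamp, duration, type of communication, cell code) and derives the ordered sequence of cells seen for one user. The class name `CDR`, the field names, the `user_id` field and the sample values are assumptions introduced for illustration and do not reflect an operator's actual record format.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby


@dataclass(frozen=True)
class CDR:
    user_id: str         # hypothetical pseudonymous identifier
    timestamp: datetime  # when the communication started
    duration_s: int      # call duration in seconds (0 for an SMS)
    kind: str            # "voice", "sms" or "data"
    cell_id: str         # cell in which the communication began


def cell_sequence(records):
    """Time-ordered cells for one user, merging consecutive repeats."""
    ordered = sorted(records, key=lambda r: r.timestamp)
    return [cell for cell, _ in groupby(r.cell_id for r in ordered)]


# Invented sample mirroring figure 10: two SMS received and one call.
records = [
    CDR("u1", datetime(2007, 5, 2, 9, 40), 0, "sms", "cell_B"),
    CDR("u1", datetime(2007, 5, 2, 8, 5), 0, "sms", "cell_A"),
    CDR("u1", datetime(2007, 5, 2, 18, 30), 120, "voice", "cell_C"),
]
print(cell_sequence(records))
```

Whatever the real path travelled during the day, the reconstructed trajectory contains only the three cells in which a communication happened.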


Case study

Studies of massive mobile phone location data have, however, been shown to offer great potential for modeling human mobility. For example, González et al. (2008) studied the trajectories of 100,000 mobile phone users whose positions were recorded over a six-month period. They concluded that human trajectories show a high degree of temporal and spatial regularity. Each individual is characterized by a time-independent characteristic travel distance and a significant probability of returning to a few frequently visited locations. Despite the diversity of their travel histories and social characteristics, humans generally follow simple, reproducible patterns. This result was confirmed and extended by Song et al. (2010): "there is a potential 93% average predictability in user mobility, an exceptionally high value rooted in the inherent regularity of human behavior. Yet it is not the 93% predictability that we find the most surprising. Rather, it is the lack of variability in predictability across the population." (p.1020). Following on from this research, we analyzed five months of 2007 CDRs for the Paris region [5] (Schneider et al., 2012). As CDR data are relatively poor in terms of point positioning and the main challenge in this case is identifying the activity locations despite incomplete mobile phone data, the 40,000 most active users were chosen for the analysis, which can, of course, introduce a sample bias [6]. We chose to run this risk because our study was intended to compare mobile phone data with travel surveys on a very general level. We combined information on phone location and time of location to construct small movement-oriented networks (places connected by trips), called mobility motifs. Our study covered closed trips, which should start and end in the same location, in our case the point where the mobile phone was observed at night (or the user's home in a travel survey). The number of motifs is significantly reduced by this constraint, but remains large, with 1,047,008 possibilities for just 6 locations. The same procedure was applied to two household travel surveys, the Paris region Enquête Globale de Transport (2002) and the Chicago Regional Household Travel Inventory (2008) [7]. Each individual's motif was generated for each day and the most common motifs identified. Despite the fact that millions of possible combinations exist, a mere seven describe over 80% of the population, for both phone and survey data (cf. figure 11). Surprisingly, a few simple mobility motifs provide a view of daily activity. The Parisian phone traces and travel survey data reveal the same motifs, as do the Chicago survey data! We can suppose that the extracted motifs are general human mobility characteristics which can be further used with travel time analysis to model and simulate urban activity.
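The construction of a daily motif can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used in the study. A day is given as an ordered list of place labels, a trip is any change of place, and two days are assumed to share a motif when their directed graphs are identical up to a relabeling of places. The function name `daily_motif` and the place labels are hypothetical. The brute-force canonical form is only practical for the small graphs considered here (about six places).

```python
from collections import Counter
from itertools import permutations


def daily_motif(places):
    """Canonical form of the directed graph of one day's trips.

    places: ordered place labels observed during the day. Returns None for
    an empty or non-closed day (first and last place differ), otherwise
    (number_of_places, sorted tuple of relabeled directed edges).
    """
    # Drop consecutive repeats: staying in the same place is not a trip.
    stops = [p for i, p in enumerate(places) if i == 0 or p != places[i - 1]]
    if not stops or stops[0] != stops[-1]:
        return None
    nodes = list(dict.fromkeys(stops))      # distinct places, first-seen order
    edges = set(zip(stops, stops[1:]))      # directed trips between places
    best = None
    # Try every relabeling and keep the lexicographically smallest edge list.
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        form = tuple(sorted((relabel[a], relabel[b]) for a, b in edges))
        if best is None or form < best:
            best = form
    return len(nodes), best


days = [
    ["home"],
    ["home", "work", "home"],
    ["home", "gym", "home"],
    ["home", "work", "shop", "home"],
]
print(Counter(daily_motif(day) for day in days).most_common())
```

In this invented sample, the two round trips collapse into the same motif even though the destinations differ, which is what allows a handful of motifs to cover many different days.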

[5] Approximately 12 million inhabitants, 12,012 km² area.
[6] We have already identified this bias by comparing communication (calls and SMS) to mobility traces (LA updates) in the MSC data. Very mobile people are also more frequent communicators; see Couronné, Smoreda, & Olteanu-Raimond 2011.
[7] Samples are 10,500 households for the EGT and 14,390 households for the CRHTI.


Figure 11. Distribution of daily travel motifs extracted from mobile phone data, Paris (light blue) and Chicago (deep blue) travel surveys. Each daily motif is represented on the top as places (nodes) - trips (arrows) oriented graph (where 1=no move, 2=round-trip between two places, etc.).

In addition to these rather abstract studies, CDRs are also directly tested in the field of travel behavior. While surveys provide only static information on origin-destination flows, the OD matrices derived from mobile phone data reveal the differences in travel demands over time (Calabrese, Di Lorenzo et al. 2011). Wang and colleagues (2010) performed a case study using CDRs for the city of Boston to prove the suitability of this type of data for statistical analysis of the transportation modes used by a large population when fine-grained sampling is unavailable. They conclude that: “The method can be easily implemented and applied in real world and for large populations, so could be a suitable candidate for augmenting existing transportation datasets used for city planning and transportation management” (Wang et al. 2010, p. 323). Similarly, Doyle et al. (2011) proposed a methodology for inferring the mode of transportation which individuals took between two cities in Ireland. Despite the heterogeneous spatial resolution (uneven mobile cell reception areas) and sampling rates (the timing of communications), the large volume of CDR data makes it possible to reconstruct many salient aspects of individual daily routines, such as the most frequently visited locations and the time and periodicity of such visits (Ahas 2010).
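The idea of a time-dependent OD matrix can be sketched in a few lines of Python. This is a simplified illustration and not the estimation method of the studies cited above. It assumes each user's observations have already been reduced to (timestamp, zone) pairs, and it approximates the departure time by the last observation at the origin, since the actual departure is unknown in CDR data. The function name `od_by_hour`, the zone labels and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime


def od_by_hour(observations):
    """Count origin-destination moves per hour of day.

    observations: dict mapping a user id to a list of (timestamp, zone).
    Returns a Counter keyed by (hour, origin_zone, destination_zone).
    """
    flows = Counter()
    for points in observations.values():
        ordered = sorted(points)  # sort by timestamp
        for (t0, origin), (_, dest) in zip(ordered, ordered[1:]):
            if origin != dest:
                # Assumption: the move is dated by the last sighting at origin.
                flows[(t0.hour, origin, dest)] += 1
    return flows


observations = {
    "u1": [(datetime(2007, 5, 2, 8, 0), "A"),
           (datetime(2007, 5, 2, 9, 0), "B"),
           (datetime(2007, 5, 2, 18, 0), "A")],
    "u2": [(datetime(2007, 5, 2, 8, 30), "A"),
           (datetime(2007, 5, 2, 8, 50), "B")],
}
flows = od_by_hour(observations)
print(flows[(8, "A", "B")], flows[(9, "B", "A")])
```

Slicing the counter by hour gives one OD matrix per time slot, which is the kind of temporal variation in travel demand that a single survey-based matrix cannot show.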

4. Conclusion

Cellular network data, although limited in terms of location precision and recording frequency, offer two major advantages for studying human mobility. First, the wide adoption of mobile phones by populations around the world makes it possible to study the behavior of a very large number of individuals. No less important, this type of data allows researchers to choose a specific data collection methodology (active or passive), depending on the objectives of their research.

The main advantage of active data collection is that the quality of the raw data can be improved in terms of location and semantic information by asking subjects to validate or change certain recorded locations and add information about their movements, stops, modes of transportation, etc. Furthermore, the recording frequency can be constant and adjusted as necessary. The disadvantage of this technique, and of surveys in general, is that it can only be applied to a limited number of people who agree to participate in the study. Compared to this technique, GPS data collection provides the advantage of highly accurate location, to within a few meters. However, GPS data collection requires a dedicated device, in certain situations a GPS receiver may lose its satellite signal, and its power consumption makes it difficult to follow a person for more than one day without needing to recharge its battery. The data collected must also be transferred to an external server, increasing the complexity of the project.

Passive data collection can be used to record data on a significant portion of the population over a long period of time. Its main disadvantage is the lack of semantic information (i.e., reasons for travel, modes of transportation, stops, and personal characteristics). Several methods for partially overcoming this lack of information have been proposed in the literature (for example: Spaccapietra et al. 2008; Spinsanti et al. 2010; Andrienko et al., 2011; Olteanu-Raimond et al., 2012). However, the “big issue” for passive mobile phone data today is the need for validation using other measures and survey data. With the exception of a few tests similar to those described above, no concerted research project crossing transportation survey and mobile phone data, which would be comparable to GPS technology validation (Stopher et al. 2007), has yet been undertaken.

Automatic data recording also generates privacy issues. The use of geolocation means that traditional user anonymization no longer offers adequate privacy protection. We could suggest a variety of different methods depending on the research topic involved, ranging from sampling by short time window (daily or weekly) to geographical aggregation of individual localizations. However, the National Regulation Authority has the last word on the matter, and evaluates the research project and its pros and cons in terms of the public good.

The new opportunity provided by this type of behavioral data has led to a rapidly growing field of interdisciplinary research with a genuine interest in human mobility studies based on mobile phone localization, as evidenced, for example, by the number of papers presented at NetMob 2011 (the second conference on the Analysis of Mobile Phone Datasets and Networks). Recent work on the question has also sought to understand the interplay between mobility patterns and social ties (Calabrese, Smoreda et al. 2011; Phithakkitnukoon et al. 2012; Wang et al. 2011), opening up new perspectives not only for complexity research but also for a more intensive dialogue between transportation research and sociology.


5. References

Ahas, R. (2010). Mobile positioning in mobility studies, in M. Büscher, J. Urry, K. Witchger (eds.), Mobile Methods. London: Routledge, 183-199.
Ahas, R., Aasa, A., Silm, S., Tiru, M. (2010). Daily rhythms of suburban commuters' movements in the Tallinn metropolitan area: Case study with mobile positioning data. Transportation Research Part C: Emerging Technologies, 18, 45-54.
Andrienko, G., Andrienko, N., Hurter, C., Rinzivillo, S., Wrobel, S. (2011). From movement tracks through events to places: Extracting and characterizing significant places from mobility data. Proceedings of IEEE Visual Analytics Science and Technology, 161-170.
Andrienko, G., Andrienko, N., Olteanu-Raimond, A.M., Symanzik, J., Ziemlicki, C. (2012). Towards extracting semantics from movement data by Visual Analytics approaches. GIScience workshop on GeoVisual Analytics: Time to Focus on Time, Columbus, OH, September 2012.
Caceres, N., Wideberg, J., Benitez, F. (2007). Deriving origin-destination data from a mobile phone network. Intelligent Transport Systems, 1, 15-26.
Calabrese, F., Di Lorenzo, G., Liu, L., Ratti, C. (2011). Estimating origin-destination flows using mobile phone location data. Pervasive Computing, 10, 36-44.
Calabrese, F., Smoreda, Z., Blondel, V., Ratti, C. (2011). Interplay between telecommunications and face-to-face interactions: A study using mobile phone data. PLoS ONE, 6(7).
Chicago Regional Household Travel Inventory (2008). http://www.cmap.illinois.gov/traveltracker-survey
Couronné, T., Olteanu-Raimond, A.M., Smoreda, Z. (2011). Looking at spatio-temporal city dynamics through mobile phone lenses. Proceedings of the IEEE International Conference of Network of the Future, Paris, November 2011.
Couronné, T., Smoreda, Z., Olteanu-Raimond, A.M. (2011). Chatty Mobiles: Individual mobility and communication pattern. NETMOB Conference, Cambridge, MA, October 2011.
Doherty, S.T., Noel, C., Lee-Gosselin, M., Sirois, C., Ueno, M., Theberge, F. (2001). Moving beyond observed outcomes: Integrating Global Positioning Systems and interactive computer-based travel behaviour surveys. Transportation Research E-Circular, C026, 449-466.
Doyle, J., Hung, P., Kelly, D., McLoone, S., Farrell, R. (2011). Utilising mobile phone billing records for travel mode discovery. ISSC 2011, Trinity College Dublin, June 2011.
Eagle, N., Pentland, A. (2006). Reality mining: sensing complex social systems. Personal and Ubiquitous Computing, 10, 255-268.
Enquête Globale de Transport (2002). http://www.stif.info/IMG/pdf/EGT_2001_2002.pdf
González, M., Hidalgo, C., Barabási, A.L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779-782.
Hidalgo, C., Rodriguez-Sickert, C. (2008). The dynamics of a mobile phone network. Physica A: Statistical Mechanics and its Applications, 387(12), 3017-3024.
Kostakos, V., Nicolai, T., Yoneki, E., O'Neill, E., Kenn, H., Crowcroft, J. (2009). Understanding and measuring the urban pervasive infrastructure. Personal and Ubiquitous Computing, 13, 355-364.
Laurila, J.K., Gatica-Perez, D., Aad, I., Blom, J., Bornet, O. et al. (2012). The Mobile Data Challenge: Big Data for Mobile Computing Research. Proceeding of Mobile Data Challenge Workshop (MDC), PURBA, Newcastle, June 2012.


Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.L. et al. (2009). Computational social science. Science, 323(5915), 721-723.
Licoppe, C., Diminescu, D., Smoreda, Z., Ziemlicki, C. (2008). Using mobile phone geolocalisation for 'socio-geographical' analysis of co-ordination, urban mobilities, and social integration patterns. Tijdschrift voor economische en sociale geografie, 99(5), 584-601.
Olteanu-Raimond, A.M., Couronné, T., Fen-Chong, J., Smoreda, Z. (2012). Le Paris des visiteurs, qu'en disent les téléphones mobiles? Inférence des pratiques spatiales et fréquentations des sites touristiques en Ile-de-France. Revue Internationale de la Géomatique, 22 (in press).
Onnela, J., Saramaki, J., Hyvonen, J., Szabo, G., Lazer, D., Kaski, K., Kertész, J., Barabási, A.L. (2007). Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences, 104(18), 7332-7336.
Palla, G., Barabási, A.L., Vicsek, T. (2007). Quantifying social group evolution. Nature, 446, 664-667.
Pelletier, M.P., Trépanier, M., Morency, C. (2011). Smart card data in public transit: A review. Transportation Research C: Emerging Technologies, 19(4), 557-568.
Phithakkitnukoon, S., Smoreda, Z., Olivier, P. (2012). Socio-Geography of human mobility: A study using longitudinal mobile phone data. PLoS ONE, 7(6).
Ramus, C. (2012). Venez voir à quoi ressemblent vos données de connexion mobile: un angle surprenant! Journal du Net (on-line: http://www.journaldunet.com/ebusiness/expert/51781/venez-voir-a-quoi-ressemble-vosdonnees-de-connexion-mobile---un-angle-surprenant.shtml)
Ratti, C., Pulselli, R.M., Williams, S., Frenchman, D. (2006). Mobile Landscapes: Using location data from cell phones for urban analysis. Environment and Planning B: Planning and Design, 33(5), 727-748.
Reades, J., Calabrese, F., Sevtsuk, A., Ratti, C. (2007). Cellular Census: Explorations in urban data collection. IEEE Pervasive Computing, 6, 30-38.
Schneider, C., Couronné, T., Smoreda, Z., González, M. (2012). Are we in our travel decisions self-determined? Bulletin of the American Physical Society, APS March Meeting 2012, 57(1).
Schüssler, N., Axhausen, K.W. (2009). Processing GPS raw data without additional information. Transportation Research Record, 2105, 28-36.
Song, C., Qu, Z., Blumm, N., Barabási, A.L. (2010). Limits of predictability in human mobility. Science, 327(5968), 1018-1021.
Spaccapietra, S., Parent, C., Damiani, M.L., De Macedo, J.A., Porto, F., Vangenot, C. (2008). A conceptual view on trajectories. Data and Knowledge Engineering, 65, 126-146.
Spinsanti, L., Celli, F., Renso, C. (2010). Where you stop is who you are: Understanding people's activities. 5th Workshop on Behavior Monitoring and Interpretation, Karlsruhe, September 2010.
Stopher, P., FitzGerald, C., Xu, M. (2007). Assessing the accuracy of the Sydney Household Travel Survey with GPS. Transportation, 34, 723-741.
Stopher, P., Wilmot, C. (2000). Some new approaches to designing Household Travel Surveys – Time-Use Diaries and GPS. 79th Annual Meeting of the Transportation Research Board, Washington, D.C., January 2000.


Urry, J. (2000). Sociology Beyond Societies: Mobilities for the Twenty-first Century. London: Routledge.
Wang, D., Pedreschi, D., Song, C., Giannotti, F., Barabási, A.L. (2011). Human mobility, social ties, and link prediction. ACM SIGKDD 2011, San Diego, CA, August 2011.
Wang, H., Calabrese, F., Di Lorenzo, G., Ratti, C. (2010). Transportation mode inference from anonymized and aggregated mobile phone Call Detail Record. 13th International IEEE Annual Conference on Intelligent Transportation Systems, Madeira Island, September 2010.
Wolf, J., Guensler, R., Bachman, W. (2001). Elimination of the travel diary - experiment to derive trip purpose from Global Positioning System travel data. Transportation Research Record, 1768, 125-134.
