This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2018.2869671, IEEE Internet of Things Journal JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. XX, XX XXXX


Calibrated Data Simplification for Energy-efficient Location Sensing in Internet of Things

Mu Zhou, Senior Member, IEEE, Yanmeng Wang, Zengshan Tian, Yinghui Lian, Yong Wang, and Bang Wang

Abstract—The Internet of Things (IoT) has gradually changed the way people live owing to its ability to connect everything together, and accurate location sensing plays a crucial role in achieving this goal. Up to now, as one of the most representative outdoor localization systems, the Global Positioning System (GPS) has been widely used, but its performance may decline dramatically in indoor environments due to the serious multipath effect and signal attenuation caused by complicated indoor structures. Meanwhile, location fingerprint based localization has become popular in indoor environments, and simplifying the calibrated signals used in location database construction has attracted considerable attention because of its practical value in avoiding blind signal sampling. In this paper, we propose to use an information-theoretic lens to construct an energy-efficient location fingerprint database for localization in IoT. Specifically, by analyzing the information loss in signal sampling, we analogize the database construction process to the information propagation process in a lossy channel, and then formulate the relations between sample capacity and localization error from an information-theoretic view. After that, by selecting an appropriate time interval for sampling independent and non-redundant signals, the minimum number of sampled signals under a given expected localization accuracy is determined. Finally, extensive experimental results show that, compared with state-of-the-art approaches, the proposed approach can effectively simplify calibrated data for energy-efficient location database construction in different wireless localization networks.

Index Terms—Localization in IoT, data simplification, location fingerprint database, mutual information, analogical lossy channel.

I. INTRODUCTION

The Internet of Things (IoT) has recently advanced from an experimental technology to what will become the backbone of future customer value for different types of services. With the IoT, people and things are able to connect at any time, at any place, with anything and anyone [1]. To achieve this goal, accurate location sensing is indispensable. Up to now, the widely-used Global Positioning System (GPS) has changed people's habits of outdoor navigation, but its performance may decline dramatically in indoor environments due to the multipath effect and signal attenuation caused by complicated indoor structures [2]. As one of the most representative indoor localization approaches, the location fingerprint based approach collects calibrated data to construct a location fingerprint database, which is then used to conduct localization in real time [3]. However, the database construction involved in this approach is time-consuming and labor-intensive, so it cannot be widely deployed, especially in large-scale environments [4]. To solve this problem, many recent studies focus on calibrated data simplification to construct energy-efficient location fingerprint databases [5]. The location fingerprint database construction approach with limited data calibration collects calibrated data at a few Reference Points (RPs), and then constructs the relations between the signal and physical spaces based on semi-supervised learning [6]. However, this approach is sensitive to the accuracy and timeliness of calibrated data, and it may gradually become unstable as the amount of uncalibrated data increases. Meanwhile, the calibration-free approach is able to construct a location fingerprint database without data calibration, but its performance normally declines as the size of the environment increases [7]. So far, studies on calibrated data simplification for energy-efficient location sensing in IoT have mainly focused on reducing the number of RPs, but have seldom considered reducing the sample capacity at each RP. In addition, most existing studies on minimizing the sample capacity of calibrated data measure the similarity between the sampled signals corresponding to different sample capacities to reduce the redundancy of location fingerprints.

This work was supported in part by the National Natural Science Foundation of China (61771083 and 61704015), Program for Changjiang Scholars and Innovative Research Team in University (IRT1299), and Fundamental and Frontier Research Project of Chongqing (cstc2017jcyjAX0380). M. Zhou, Y. Wang, Z. Tian, Y. Lian, and Y. Wang are with the Chongqing Key Lab of Mobile Communications Technology, School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China. B. Wang is with the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China.
In [8], the Kullback-Leibler Divergence (KLD) is used to calculate the similarity between location fingerprints under different sample capacities, and then a proper similarity threshold is determined to minimize the sample capacity in a Wireless Sensor Network (WSN). In [9], the Operating Characteristics (OC) function is adopted to derive the minimum number of calibrated Received Signal Strength (RSS) samples at each RP, which is found to be no smaller than 22 in an actual indoor environment. The authors in [10] minimize the sample capacity for efficient location fingerprint database construction by using the Execution Characteristic Function (ECF), and observe that the sample capacity at each RP should be larger than 13 to achieve effective indoor WLAN localization. However,

2327-4662 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


these approaches consider neither the sampling interval of calibrated data nor the required localization accuracy. Different from the approaches above, we propose to use an information-theoretic lens to construct an energy-efficient location fingerprint database for localization in IoT. The four main contributions of this paper are summarized as follows. First, by analyzing the information loss in signal sampling for location fingerprint database construction, the database construction process is analogized to the information propagation process in a lossy channel. Second, based on the constraint that the mutual information in the analogical lossy channel places on the entropy of locations in the target environment, an information-theoretic lens is used to formulate the relations between sample capacity and localization error. Third, by selecting an appropriate time interval for sampling independent and non-redundant signals, the minimum sample capacity corresponding to the given expected localization accuracy is derived, which helps simplify calibrated data for energy-efficient location fingerprint database construction in IoT. Finally, experiments conducted in both RSS and Channel State Information (CSI) based localization networks validate the effectiveness of the proposed approach in simplifying calibrated data for energy-efficient location sensing.

The rest of this paper is organized as follows. Section II surveys related work on location fingerprint database construction for localization in IoT. Section III describes the proposed calibrated data simplification approach for energy-efficient location sensing in detail, and the corresponding experimental results are shown in Section IV. Finally, Section V concludes this paper and discusses future work.

II. RELATED WORK

According to their calibration load, existing studies on location fingerprint database construction for localization in IoT can be divided into the following three main categories.

A. Database Construction with Fully-calibrated Data

A large number of studies focus on location fingerprint database construction with fully-calibrated data. Based on the fact that increasing the sample capacity normally reduces the variance of sampled signals, the authors in [11] propose an adaptive variance based RSS model to estimate the distance between the target and pre-calibrated RPs. The authors in [12] rely on calibrated data to construct a location fingerprint database, and then use physical constraints on the distance between different users to improve the real-time capability of the system. By using a Gaussian process regression model to estimate the mean and variance of sampled signals, the approach in [13] improves localization accuracy by 10% compared with the Bayesian based localization approach in [14]. The authors in [15] adopt a linear programming model to depict the random property of RSS data, which improves localization accuracy by about 10% over KNN [16]. A gradient location fingerprint based approach is studied in [17] to achieve target localization and tracking. However, fully-calibrated location fingerprint database construction is time-consuming and labor-intensive, especially in large-scale environments [18-20].

B. Database Construction with Limited-calibrated Data

Combining sparsely collected location fingerprints with crowdsourced signals, the authors in [21] construct a cost-efficient location fingerprint database by semi-supervised learning to reduce the calibration effort. The authors in [22] use manifold alignment to construct the mapping between physical locations and RSS data in manifold space. Based on this, a geometric location disturbance based manifold alignment approach is proposed in [23], which requires only about 1% of the conventional calibration effort for location fingerprint database construction. In [24], by constructing a localization objective function with respect to the RSS data and physical coordinates, an improved manifold alignment approach is adopted to derive a closed-form solution for target localization. The authors in [25] rely on the overlapping relations of adjacent RPs to embed local patches into the location fingerprint database. However, these approaches are normally quite sensitive to the accuracy and timeliness of calibrated data, which degrades the robustness and stability of localization in IoT [26-28].

C. Database Construction with Calibration-free Data

Depending on the transfer relations of both the sampled signals and physical areas, the authors in [29] use a skeleton mapping approach to construct the mapping between the signal and physical spaces without data calibration.
Without using motion sensors, the authors in [30] rely on the PageRank approach to calculate the appearance frequency of each RSS cluster and physical area, and then construct a hotspot mapping from the signal space to the physical space to conduct target localization and motion analysis. By investigating the motion behavior of users in the target environment, the authors in [31] map popular RSS clusters onto frequently visited physical areas for localization. Based on the fact that signal fluctuation varies with the motion behavior of users, the authors in [32] develop a new human-object interaction based device-free localization approach. Based on a hybrid global-local optimization model, the authors in [33] use unsupervised learning to construct a localization database from unlabeled location fingerprints. However, these approaches only achieve area-level localization accuracy, which may even decrease as the size of the environment increases [34-36].

Different from the approaches above, the proposed approach uses an information-theoretic lens to simplify calibrated data and construct an energy-efficient location fingerprint database for localization in IoT. By modeling the database construction process as information propagation in a lossy channel, the relations between the sample capacity and localization errors are formulated. Then, by calculating the mutual information of the analogical lossy channel, the minimum sample capacity corresponding to the given expected localization accuracy is derived.

III. SYSTEM DESCRIPTION

A. Analogy of Location Fingerprint Database Construction

An example of the localization in IoT is shown in Fig. 1, in which we assume that the receiver at each Sampling Point (SP) $w_i$ ($i = 1, \dots, n$) collects the continuous signal $X(w_i) = [x_1(w_i), \dots, x_k(w_i)]$ from $k$ Access Points (APs), where $X(w_i)$ is a time-domain signal such as RSS data, $x_j(w_i)$ ($j = 1, \dots, k$) is the continuous signal from the $j$-th AP at $w_i$, and $k$ and $n$ are the numbers of APs and SPs, respectively.

Fig. 1. An example of the localization in IoT. [Figure: APs AP1, AP2, ..., APk and a sampling point SPi at location wi; the signal received at SPi is Xi = [X1(wi), ..., Xk(wi)].]

As shown in Fig. 2, during location fingerprint database construction, a sequence of discrete signals from each AP is sampled at $w_i$ ($i = 1, \dots, n$), denoted as $Y(w_i) = [y_1(w_i), \dots, y_k(w_i)]$, where $y_j(w_i) = [y_{j1}(w_i), \dots, y_{jN_i}(w_i)]$ ($j = 1, \dots, k$) is the sampled signal from the $j$-th AP at $w_i$, $y_{jl}(w_i)$ is the $l$-th RSS sample in $y_j(w_i)$, and $N_i$ is the length of $y_j(w_i)$ (also called the sample capacity). Then, the statistic characteristics of the sampled signal at each SP, such as the mean and the (discrete or continuous) distribution, are obtained. Together with the location $w_i$, any one or more of these statistic characteristics can be selected to construct the location fingerprint database for localization in IoT. In an actual environment, since signal propagation (including signal blocking and attenuation) differs between different APs and SPs, each SP can ideally be identified by its signal statistic characteristics. However, because the signal collection time is finite, there is an information loss from the continuous signal $X(w_i)$ to the sampled signal $Y(w_i)$, which means that $Y(w_i)$ cannot fully depict the statistic characteristics of $X(w_i)$. In this case, the statistic characteristics of the sampled signal at $w_i$ cannot be accurately distinguished from those at the locations around $w_i$ within a certain region $A_{w_i}$. Thus, a target at $w_i$ may not be located precisely when the area of $A_{w_i}$ is relatively large, which indicates that the localization performance depends significantly on the area of $A_{w_i}$. For simplicity, we assume that $A_{w_i}$ is a square of area $4d^2$ ($= 2d \times 2d$), where $d$ is half of its side length, and that $w_i$ is located at its geometric center.
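To make the fingerprint described above concrete, the following sketch builds one fingerprint entry for a single SP from its sampled signal $Y(w_i)$: the location plus, for each AP, the mean RSS and a discrete distribution (relative frequency per RSS bin). The function name `build_fingerprint`, the 1 dB bin width, and the RSS values are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter
from statistics import fmean


def build_fingerprint(location, sampled, bin_width=1.0):
    """Hypothetical fingerprint of one SP w_i built from Y(w_i).

    location:  coordinates of w_i, e.g. (x, y) in metres.
    sampled:   k RSS sequences y_j(w_i), one per AP, each of length N_i.
    bin_width: RSS quantization step in dB for the discrete distribution
               (an illustrative assumption).
    """
    features = []
    for y_j in sampled:
        n_i = len(y_j)  # sample capacity N_i for this AP
        # Quantize each RSS reading to the nearest multiple of bin_width.
        bins = Counter(round(v / bin_width) * bin_width for v in y_j)
        features.append({
            "mean": fmean(y_j),
            # Relative frequency of each RSS bin: a discrete distribution.
            "distribution": {b: c / n_i for b, c in sorted(bins.items())},
        })
    return {"location": location, "features": features}


# Two APs with N_i = 4 samples each (made-up RSS values in dBm).
fp = build_fingerprint((3.0, 5.0), [[-60, -61, -60, -59], [-72, -72, -71, -73]])
```

Any subset of these statistics can serve as the fingerprint; a smaller $N_i$ makes both the mean and the histogram noisier, which is the information loss the analogy below quantifies.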
Fig. 2. Process of location fingerprint database construction. [Figure: the continuous signal $x_j(w_i)$ is sampled at timestamps $1, 2, \dots, N_i$ to obtain $y_j(w_i)$; one or more types of signal statistic characteristics at $w_i$ (mean, discrete distribution, continuous distribution, and others) are combined with the location of $w_i$ to form the location fingerprint database.]

By dividing the target environment into $M = S/(4d^2)$ regions, where $S$ is the area of the target environment, the process of location fingerprint database construction can be analogized to information propagation in a lossy channel, as shown in Fig. 3. To illustrate this process more clearly, we assume that the target is actually located at or close to $w_i$, and its belonging region $A_{w_i}$ is encoded by the corresponding continuous signal from the APs, $X(w_i)$, which is regarded as the input code sequence. Considering the information loss from $X(w_i)$ to $Y(w_i)$, the signal collection process is equivalent to information propagation in a lossy channel. Finally, the sampled signal $Y(w_i)$, which is regarded as the output code sequence, is used to construct the location fingerprint database and consequently to determine the region to which the target most likely belongs (called the estimated belonging region, $\hat{A}_{w_i}$).

Fig. 3. Information-theoretic view of localization database construction. [Figure: the database construction process (actual belonging region $A_{w_i}$ → signal propagation conditions → continuous signal $X(w_i)$ → signal sampling → sampled signal $Y(w_i)$ → localization → estimated belonging region $\hat{A}_{w_i}$) is analogized to an information propagation process (actual message → encoder → input code sequence → lossy channel → output code sequence → decoder → estimate of message).]

Due to the information loss in the propagation channel, different input code sequences may correspond to similar output sequences, which results in belonging-region mismatching. Taking the target in region $A_{w_i}$ as an example, the probability of belonging-region mismatching is $P_e = P\{\hat{A}_{w_i} \neq A_{w_i}\}$, i.e., the probability of locating the target in a region $\hat{A}_{w_i}$ to which it does not belong. According to the previous study in [14], increasing the region area generally decreases the similarity of signal distributions


at nearby locations. Therefore, on the one hand, given a certain information loss from $X(w_i)$ to $Y(w_i)$, the region area should be large enough to satisfy $P_e \approx 0$ (i.e., the distributions of $Y(w_i)$ in any two nearby regions hardly overlap). On the other hand, to minimize the localization error (defined as the value $d$), the dissimilarity between the $Y(w_i)$ at different locations can be enlarged by increasing $N_i$.

B. Relations of Sample Capacity and Localization Errors

To obtain the lower bound of the sample capacity for a given expected localization accuracy, the relations between the sample capacity and localization errors need to be formulated. Taking the target in $A_{w_i}$ as an example, we use entropy to measure the uncertainty of the regions as $H(A_{w_i}) = -\sum_{j=1}^{M} P(A_{w_j}) \log_2 P(A_{w_j})$, where $P(A_{w_j})$ is the prior probability that the target is located in $A_{w_j}$. If no prior location information about the target is available, the possible locations of the target are regarded as uniformly distributed in the target environment, so $P(A_{w_j}) = 1/M$ is the same constant for every region, and then $H(A_{w_i}) = -\log_2(1/M) = \log_2(S/(4d^2))$.

According to the previous study in [37], the mutual information between the actual and estimated belonging regions, $A_{w_i}$ and $\hat{A}_{w_i}$, equals $I(A_{w_i}; \hat{A}_{w_i}) = H(A_{w_i}) - H(A_{w_i} | \hat{A}_{w_i})$, where $H(A_{w_i} | \hat{A}_{w_i})$ is the conditional entropy of $A_{w_i}$ given $\hat{A}_{w_i}$. Since $0 \le P(A_{w_i} | \hat{A}_{w_i}) \le 1$, we have $H(A_{w_i} | \hat{A}_{w_i}) \ge 0$, and hence $H(A_{w_i}) \ge I(A_{w_i}; \hat{A}_{w_i})$. In particular, when the target is precisely located in its belonging region, i.e., $\hat{A}_{w_i} = A_{w_i}$, we have $H(A_{w_i} | \hat{A}_{w_i}) = 0$ and then $I(A_{w_i}; \hat{A}_{w_i}) = H(A_{w_i})$. Since the localization process shown in Fig. 3 can be described as a Markov chain $A_{w_i} \to X(w_i) \to Y(w_i) \to \hat{A}_{w_i}$, the data processing inequality bounds $H(A_{w_i})$ under the condition of no belonging-region mismatching, i.e., $P_e = 0$, as $H(A_{w_i}) = I(A_{w_i}; \hat{A}_{w_i}) \le I(X(w_i); Y(w_i))$, where $I(X(w_i); Y(w_i))$ is the mutual information between $X(w_i)$ and $Y(w_i)$. Then, we obtain $I(X(w_i); Y(w_i)) \ge \log_2(S/(4d^2))$, from which we find that, if $Y(w_i)$ is used to estimate the belonging region of the target, a smaller $d$ requires a higher correlation between $X(w_i)$ and $Y(w_i)$, i.e., a larger $I(X(w_i); Y(w_i))$.

Since $I(X(w_i); Y(w_i)) = H(X(w_i)) - H(X(w_i) | Y(w_i))$, where $H(X(w_i) | Y(w_i))$ is the conditional entropy of $X(w_i)$ given $Y(w_i)$, the mutual information $I(X(w_i); Y(w_i))$ can be regarded as the reduction in the uncertainty of $X(w_i)$ given $Y(w_i)$. In this case, a $Y(w_i)$ with a larger sample capacity depicts the statistic characteristics of $X(w_i)$ better, which decreases the uncertainty of $X(w_i)$. Based on the analysis above, the sample capacity is regarded as positively correlated with $I(X(w_i); Y(w_i))$. Finally, we formulate the relation between the minimum sample capacity at $w_i$, $\hat{N}_i$, and the expected localization error in the target environment, $d_{\mathrm{exp}}$, as

$$
\begin{aligned}
\hat{N}_i = \arg\min_{N_i}\ & I(X(w_i); Y(w_i)) - \log_2\!\left(\frac{S}{4d_{\mathrm{exp}}^2}\right) \\
\text{s.t.}\ & I(X(w_i); Y(w_i)) - \log_2\!\left(\frac{S}{4d_{\mathrm{exp}}^2}\right) \ge 0
\end{aligned}
\tag{1}
$$

where $d_{\mathrm{exp}}$ is determined by the actual demand for localization accuracy. For instance, area-level localization corresponds to a higher $d_{\mathrm{exp}}$, whereas point-level localization requires a lower $d_{\mathrm{exp}}$.

C. Mutual Information in Analogical Lossy Channel

To calculate $I(X(w_i); Y(w_i))$, we regard the channel used for code sequence propagation (see Fig. 3) as the lossy channel in Fig. 4, in which each sub-channel corresponds to one AP and the signal pairs from different APs, i.e., $x_j(w_i)$ and $y_j(w_i)$ for different $j$, are independent. We then calculate

$$
\begin{aligned}
I(X(w_i); Y(w_i)) &= H(X(w_i)) + H(Y(w_i)) - H(X(w_i), Y(w_i)) \\
&= \sum_{j=1}^{k} H(x_j(w_i)) - \sum_{j=1}^{k} H(x_j(w_i) \,|\, y_j(w_i)) \\
&= \sum_{j=1}^{k} \left[ H(x_j(w_i)) + H(y_j(w_i)) - H(x_j(w_i), y_j(w_i)) \right]
\end{aligned}
\tag{2}
$$

where $H(x_j(w_i))$ and $H(y_j(w_i))$ are the entropies of $x_j(w_i)$ and $y_j(w_i)$, respectively, and $H(x_j(w_i), y_j(w_i))$ is their joint entropy.
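As a numerical aside on the right-hand side of the constraint in (1), the sketch below evaluates $\log_2(S/(4d_{\mathrm{exp}}^2))$, the entropy of $M$ equally likely square regions under the uniform prior used above. The environment area of 400 m² and the function name `region_entropy_bits` are hypothetical, chosen only for illustration.

```python
import math


def region_entropy_bits(area_s, d_exp):
    """log2(S / (4 d_exp^2)): entropy (bits) of M = S / (4 d_exp^2)
    equally likely square regions, i.e. the minimum mutual information
    I(X(w_i); Y(w_i)) that the constraint in (1) demands."""
    m_regions = area_s / (4.0 * d_exp ** 2)  # number of regions M
    return math.log2(m_regions)              # uniform prior: H = log2 M


# Hypothetical 400 m^2 environment and several expected errors (m).
for d in (0.5, 1.0, 2.0, 10.0):
    print(d, round(region_entropy_bits(400.0, d), 3))
```

Halving $d_{\mathrm{exp}}$ multiplies $M$ by 4 and therefore raises the required mutual information by exactly 2 bits; at $d_{\mathrm{exp}} = \sqrt{S}/2$ (10 m in this example) the requirement drops to zero.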

Fig. 4. Analogical lossy channel model. [Figure: sub-channel $j$ ($j = 1, \dots, k$) carries the continuous signal $x_j(w_i)$ from AP$_j$ to the sampled signal $y_j(w_i)$.]

Since $H(x_j(w_i))$ measures the uncertainty of the continuous signal from AP$_j$ at $w_i$, an increase in $H(x_j(w_i))$ leads to increases in $N_i$ and $I(X(w_i); Y(w_i))$, and consequently in the correlation between the continuous and sampled signals. Because the normal distribution maximizes the entropy for a given variance [38], we assume that $x_j(w_i)$ follows a normal distribution (the worst case in terms of uncertainty), which guarantees that $y_j(w_i)$ can fully depict the statistic characteristics of $x_j(w_i)$. In addition, since $y_j(w_i)$ gradually tends to $x_j(w_i)$ as the sample capacity increases, we assume that $y_j(w_i)$ also follows a normal distribution. Therefore, $x_j(w_i)$ and $y_j(w_i)$ are regarded as jointly following a bivariate normal distribution with the joint Probability Density Function (PDF)

$$
f(x_j(w_i), y_j(w_i)) = \frac{1}{2\pi |K_{ij}|^{1/2}} \exp\!\left( -\frac{1}{2} (z_{ij} - \mu_{ij}) K_{ij}^{-1} (z_{ij} - \mu_{ij})^{T} \right)
\tag{3}
$$

where $z_{ij} = [x_j(w_i), y_j(w_i)]$, $\mu_{ij} = [\mu_{X_{ij}}, \mu_{Y_{ij}}]$, $\mu_{X_{ij}}$ and $\mu_{Y_{ij}}$ are the means of $x_j(w_i)$ and $y_j(w_i)$, respectively, and $K_{ij}$ is the covariance matrix

$$
K_{ij} = \begin{bmatrix} \sigma_{X_{ij}}^2 & \rho_{ij} \sigma_{X_{ij}} \sigma_{Y_{ij}} \\ \rho_{ij} \sigma_{Y_{ij}} \sigma_{X_{ij}} & \sigma_{Y_{ij}}^2 \end{bmatrix}
\tag{4}
$$

where $\sigma_{X_{ij}}^2$ and $\sigma_{Y_{ij}}^2$ are the variances of $x_j(w_i)$ and $y_j(w_i)$, respectively, and $\rho_{ij}$ ($0 \le \rho_{ij} \le 1$) is the correlation coefficient between $x_j(w_i)$ and $y_j(w_i)$. Then, the joint entropy


$H(x_j(w_i), y_j(w_i))$ is calculated by

$$
\begin{aligned}
H(x_j(w_i), y_j(w_i)) &= -\int f(z_{ij}) \log_2 f(z_{ij}) \, dz_{ij} \\
&= \int f(z_{ij}) \left[ \frac{\log_2 e}{2} (z_{ij} - \mu_{ij}) K_{ij}^{-1} (z_{ij} - \mu_{ij})^{T} + \log_2\!\left(2\pi |K_{ij}|^{1/2}\right) \right] dz_{ij} \\
&= \log_2 e + \frac{1}{2} \log_2\!\left((2\pi)^2 |K_{ij}|\right) \\
&= \frac{1}{2} \log_2\!\left((2\pi e)^2 |K_{ij}|\right) \\
&= \frac{1}{2} \log_2\!\left((2\pi e)^2 \sigma_{X_{ij}}^2 \sigma_{Y_{ij}}^2 (1 - \rho_{ij}^2)\right)
\end{aligned}
\tag{5}
$$

where the third line uses the fact that the expectation of the quadratic form $(z_{ij} - \mu_{ij}) K_{ij}^{-1} (z_{ij} - \mu_{ij})^{T}$ under a bivariate normal distribution equals 2.

Similarly, we calculate $H(x_j(w_i)) = \frac{1}{2} \log_2(2\pi e \sigma_{X_{ij}}^2)$ and $H(y_j(w_i)) = \frac{1}{2} \log_2(2\pi e \sigma_{Y_{ij}}^2)$, and then obtain $I(X(w_i); Y(w_i)) = \sum_{j=1}^{k} -\frac{1}{2} \log_2(1 - \rho_{ij}^2)$. As shown in Fig. 5, on the one hand, if the sample capacity at $w_i$, $N_i$, equals 0, $x_j(w_i)$ and $y_j(w_i)$ are independent and the corresponding correlation coefficient is $\rho_{ij} = 0$. On the other hand, the distribution of $y_j(w_i)$ gradually tends to that of $x_j(w_i)$ as $N_i$ increases, and consequently $\rho_{ij}$ converges to 1 as $N_i \to \infty$. Therefore, $N_i$ is regarded as positively correlated with $\rho_{ij}$.
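The closed forms above can be checked numerically. The sketch below codes the Gaussian entropies and (5), and confirms that $H(x_j) + H(y_j) - H(x_j, y_j)$ collapses to $-\frac{1}{2}\log_2(1 - \rho_{ij}^2)$ regardless of the standard deviations, so that $I(X(w_i); Y(w_i))$ is a plain sum over the $k$ independent sub-channels. The function names and the values of $\sigma$ and $\rho$ are illustrative assumptions.

```python
import math


def entropy_bits(sigma):
    """Differential entropy (bits) of a univariate normal with std sigma."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)


def joint_entropy_bits(sigma_x, sigma_y, rho):
    """Eq. (5): joint differential entropy (bits) of a bivariate normal."""
    det_k = sigma_x ** 2 * sigma_y ** 2 * (1.0 - rho ** 2)  # |K_ij|
    return 0.5 * math.log2((2 * math.pi * math.e) ** 2 * det_k)


def mutual_information_bits(rhos):
    """I(X(w_i); Y(w_i)) = sum_j -1/2 log2(1 - rho_ij^2), requires rho < 1."""
    return sum(-0.5 * math.log2(1.0 - r ** 2) for r in rhos)


# One sub-channel with hypothetical standard deviations and correlation.
sx, sy, rho = 3.0, 2.5, 0.9
direct = entropy_bits(sx) + entropy_bits(sy) - joint_entropy_bits(sx, sy, rho)
print(direct, mutual_information_bits([rho]))  # the two values agree
```

Because the variances cancel, only the correlation coefficients $\rho_{ij}$ matter, which is why the rest of the derivation tracks $\rho_{ij}$ as a function of $N_i$.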

Fig. 5. Relations of sample capacity and correlation coefficient. [Figure: PDFs of $x_j(w_i)$ and $y_j(w_i)$; for $N_i = 0$, $\rho_{ij} = 0$; as $N_i$ increases, the PDF of $y_j(w_i)$ approaches that of $x_j(w_i)$, and $\rho_{ij} = 1$ as $N_i \to \infty$.]

D. Determination of Sampling Interval

Since signal fluctuation (including small-scale and large-scale fading) varies with the environment, the sampling interval needs to be adjusted according to signal variation to improve the effectiveness and efficiency of localization in IoT. A high sampling frequency may lead to data redundancy during location fingerprint database construction, while a low one may prevent the sampled signal $Y(w_i)$ from depicting the statistic characteristics of $X(w_i)$ well. Thus, it is important to select an appropriate time interval for sampling independent and non-redundant signals at $w_i$. To achieve this goal, we first collect the signal from AP$_j$ ($j = 1, \dots, k$) at $w_i$, $y_j(w_i)$, with a relatively small time interval $t_i^{\mathrm{origin}}$. Then, based on the assumption that $y_j(w_i)$ follows a normal distribution, the autocorrelation coefficient of $y_j(w_i)$ with the time delay $\lambda t_i^{\mathrm{origin}}$ is calculated by

$$
\gamma_{ij}^{\lambda} = \frac{\sum_{l=1}^{N_i - \lambda} \left(y_{jl}(w_i) - E[y_j(w_i)]\right) \left(y_{j(l+\lambda)}(w_i) - E[y_j(w_i)]\right)}{\sum_{l=1}^{N_i} \left(y_{jl}(w_i) - E[y_j(w_i)]\right)^2}
\tag{6}
$$

where $E[\cdot]$ denotes the mean operation, $\lambda$ ($= 0, \dots, N_i - 1$) is the number of lags, $y_{jl}(w_i)$ is the $l$-th RSS sample in $y_j(w_i)$, and $N_i$ is the length of $y_j(w_i)$. In particular, $\gamma_{ij}^{\lambda} = 1$ when $\lambda = 0$. Since a value of $\gamma_{ij}^{\lambda}$ between 0.5 and 1 represents significant correlation [39], the maximum sampling interval at $w_i$ is defined as $\hat{t}_i = (\hat{\lambda}_i + 1) t_i^{\mathrm{origin}}$ to guarantee the independence of the sampled signal, where

$$
\hat{\lambda}_i = \arg\max_{\lambda} \frac{1}{k} \sum_{j=1}^{k} \gamma_{ij}^{\lambda} \quad \text{s.t.} \quad \frac{1}{k} \sum_{j=1}^{k} \gamma_{ij}^{\lambda} \ge 0.5.
$$

Since the statistic characteristics of the sampled signal vary with location, which results in different maximum sampling intervals at different locations, we define the minimum of $\hat{t}_i$ as the maximum sampling interval for the target environment, i.e., $\hat{t}_{\max} = \min_i \hat{t}_i$.

E. Derivation of Minimum Sample Capacity

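Given the expected localization error $d_{\mathrm{exp}}$ and the sampling interval $\hat{t}_{\max}$, we calculate the minimum sample capacity at $w_i$ as

$$
\begin{aligned}
\hat{N}_i = \arg\min_{N_i}\ & \sum_{j=1}^{k} -\frac{1}{2} \log_2(1 - \rho_{ij}^2) - \log_2\!\left(\frac{S}{4d_{\mathrm{exp}}^2}\right) \\
\text{s.t.}\ & \sum_{j=1}^{k} -\frac{1}{2} \log_2(1 - \rho_{ij}^2) - \log_2\!\left(\frac{S}{4d_{\mathrm{exp}}^2}\right) \ge 0
\end{aligned}
\tag{7}
$$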
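Problem (7) can be solved by a direct search once $\rho_{ij}$ is available for candidate sample capacities. The sketch below assumes, as argued around Fig. 5, that $\rho_{ij}$ does not decrease with $N_i$, so the first feasible $N_i$ also minimizes the objective; it then takes the maximum over SPs for the environment-wide capacity. The callable `rho_of_n` and the toy model `toy_rho` ($\rho_{ij} = N/(N + c_j)$) are hypothetical stand-ins for measured correlation curves, not part of the original method.

```python
import math


def channel_mi_bits(rhos):
    """I(X(w_i); Y(w_i)) = sum_j -1/2 log2(1 - rho_ij^2), in bits."""
    if any(r >= 1.0 for r in rhos):
        return math.inf  # perfect correlation: unbounded mutual information
    return sum(-0.5 * math.log2(1.0 - r ** 2) for r in rhos)


def min_sample_capacity(rho_of_n, area_s, d_exp, n_max):
    """N_hat_i in (7): smallest N_i in 0..n_max meeting the constraint.

    rho_of_n: hypothetical callable mapping a candidate N_i to the list of
              k correlation coefficients rho_ij at that capacity.
    Returns None when no candidate up to n_max is feasible.
    """
    target = math.log2(area_s / (4.0 * d_exp ** 2))
    for n in range(n_max + 1):
        # Objective and constraint share one expression; with rho_ij
        # nondecreasing in N_i, the first feasible n is the minimizer.
        if channel_mi_bits(rho_of_n(n)) - target >= 0:
            return n
    return None


def min_capacity_for_environment(per_sp):
    """N_hat_min = max_i N_hat_i (None if some SP is infeasible)."""
    return None if any(c is None for c in per_sp) else max(per_sp)


def toy_rho(n, spreads=(1.0, 2.0, 3.0)):
    """Hypothetical model rho_ij(N) = N / (N + c_j): 0 at N = 0, tends to 1."""
    return [n / (n + c) for c in spreads]


n_hat = min_sample_capacity(toy_rho, area_s=100.0, d_exp=1.0, n_max=1000)
print(n_hat)
```

With $d_{\mathrm{exp}} \ge \sqrt{S}/2$ the target is non-positive, so the search returns $N_i = 0$ immediately.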

From (7), we find that a lower localization error $d$ corresponds to a larger correlation coefficient $\rho_{ij}$ and a higher lower bound $\hat{N}_i$ of the sample capacity. In addition, if the expected localization error is not smaller than the size of the target environment, i.e., $d_{\mathrm{exp}} \ge \sqrt{S}/2$, the term $-\log_2(S/(4d_{\mathrm{exp}}^2))$ in (7) is non-negative, so the constraint already holds with $\rho_{ij} = 0$, and consequently $\hat{N}_i = 0$. In this case, area-level localization accuracy can be achieved without any prior information about $x_j(w_i)$ [40]. Considering that the signal distribution varies with location, which results in different minimum sample capacities for the given sampling interval, we define the maximum of $\hat{N}_i$ as the minimum sample capacity for the target environment, i.e., $\hat{N}_{\min} = \max_i \hat{N}_i$.

Finally, it is noteworthy that, after setting the expected localization error to $d_{\mathrm{exp}}$, we cannot always guarantee $P_e = 0$ for each belonging region even when the sample capacity at each SP, $N_i$, is not smaller than $\hat{N}_{\min}$; however, it is guaranteed that $P_e \neq 0$ when $N_i$ is smaller than $\hat{N}_{\min}$.

IV. EXPERIMENTAL RESULTS

To validate the analytical results above, we conduct experiments in both RSS and CSI based localization networks.

A. RSS based Localization Network

An actual indoor environment consisting of one lab and two straight corridors is shown in Fig. 6, in which the target can receive continuous RSS data from 5 APs. There are 5


SPs, randomly selected in the target environment, for sampling interval determination and minimum sample capacity derivation. In our experiment, 100 RSS samples collected at each SP $w_i$ with the sampling interval $t_i^{\mathrm{origin}} = 1$ s are used to simulate the distribution of $X(w_i)$, while the RSS data at $w_i$ with different sample capacities $N_i$ ($= 0, \dots, 100$) and time delays $\lambda t_i^{\mathrm{origin}}$ ($\lambda = 0, \dots, 100$) are regarded as the sampled signals $Y(w_i)$.

Fig. 6. Environmental layout with RSS based localization network (legend: AP, RP/TP, SP; landmarks: toilets, staircase, and lift).

As shown in Fig. 7, the RSS data are highly autocorrelated when the number of lags is small. Since $\hat{\lambda}_i = 1$ for all the SPs, we obtain that the value $\hat{t}_{\max}$ corresponding to the target environment is $2t_i^{\mathrm{origin}}$ (= 2 s).

Fig. 7. Autocorrelation coefficient of RSS data vs. number of lags (panels: at SP1 to SP5; curves: AP1 to AP5 and their mean; $\hat{\lambda}_i = 1$ at every SP).

To demonstrate the effectiveness of the selected sampling interval, we further compare the correlation coefficient between $X(w_i)$ and $Y(w_i)$ under different sampling intervals in Fig. 8. As can be seen from this figure, the correlation between $X(w_i)$ and $Y(w_i)$ generally rises and tends to 1 as the signal collection time increases, which indicates that a larger sample capacity makes the distribution of $Y(w_i)$ more similar to that of $X(w_i)$. In addition, the variation of the correlation coefficient between $X(w_i)$ and $Y(w_i)$ with the sampling interval $t_i^{\mathrm{origin}}$ (= 1 s) is similar to that with the sampling interval $2t_i^{\mathrm{origin}}$ (= 2 s), whereas it differs considerably from those with the sampling intervals $3t_i^{\mathrm{origin}}$ (for AP4 and AP5) and $4t_i^{\mathrm{origin}}$ (for AP1, AP2, and AP3). Based on this, we find that as the sampling interval increases from 1 s to 2 s, the sampled signal used for constructing the location fingerprint database can still depict the statistical characteristics of $X(w_i)$ well with low data calibration cost.
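The step from the lag analysis in Fig. 7 to the sampling interval can be sketched as follows. The criterion defining $\hat{\lambda}_i$ is not restated in this section, so this sketch assumes, purely for illustration, that $\hat{\lambda}_i$ counts the consecutive lags (starting at lag 1) whose sample autocorrelation exceeds a hypothetical `threshold`, and that $\hat{t}_{\max} = (\max_i \hat{\lambda}_i + 1)\, t_i^{\mathrm{origin}}$. This assumed mapping reproduces 2 s for the RSS case ($\hat{\lambda}_i = 1$) and $t_i^{\mathrm{origin}}$ whenever $\hat{\lambda}_i = 0$. All function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Sample autocorrelation coefficients of series x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [sum(d[i] * d[i + k] for i in range(n - k)) / denom
            for k in range(max_lag + 1)]


def select_lag(x, threshold=0.5, max_lag=20):
    """Hypothetical rule: lambda_hat = number of consecutive lags, starting at
    lag 1, whose autocorrelation exceeds `threshold`."""
    acf = autocorrelation(x, max_lag)
    lam = 0
    for k in range(1, max_lag + 1):
        if acf[k] <= threshold:
            break
        lam = k
    return lam


def sampling_interval(lambda_hats, t_origin):
    """Assumed mapping t_max = (max_i lambda_hat_i + 1) * t_origin over all SPs."""
    return (max(lambda_hats) + 1) * t_origin


print(sampling_interval([1, 1, 1, 1, 1], 1.0))  # 2.0  (lambda_hat = 1 at every SP)
print(sampling_interval([0, 0, 0], 0.01))       # 0.01 (lambda_hat = 0 at every SP)
```

Skipping the strongly correlated lags is what makes consecutive samples roughly independent and nonredundant; the threshold value is the main design knob and would need to match the paper's own definition of $\hat{\lambda}_i$.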

Fig. 8. Correlation coefficient for RSS data (panels: for AP1 to AP5; legend: sampling interval = 1 s, 2 s, 3 s, and 4 s; horizontal axis: signal collection time (s)).
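The trend in Fig. 8 can be explored with a small sketch. How the correlation coefficient between $X(w_i)$ and $Y(w_i)$ is computed is not restated in this section; we assume here that it is the Pearson correlation between the empirical probability mass functions (PMFs) of the two sample sets over the RSS values observed in $X(w_i)$, and that $Y(w_i)$ consists of the first $N_i$ readings taken every `interval_steps` multiples of $t_i^{\mathrm{origin}}$. Both choices, the example readings, and the function names are hypothetical.

```python
import math
from collections import Counter


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def empirical_pmf(samples, support):
    """Relative frequency of each value in `support` among `samples`."""
    counts = Counter(samples)
    return [counts[v] / len(samples) for v in support]


def distribution_correlation(x_full, interval_steps, n_samples):
    """Correlate the PMF of the reference series X(w_i) with that of Y(w_i),
    built from the first n_samples readings taken every interval_steps readings."""
    y = x_full[::interval_steps][:n_samples]
    support = sorted(set(x_full))
    return pearson(empirical_pmf(x_full, support), empirical_pmf(y, support))


# Hypothetical RSS readings (dBm) at one SP, one reading per t_origin.
rss = [-60, -61, -60, -62, -60, -61]
print(distribution_correlation(rss, 1, len(rss)))  # 1.0 up to rounding
print(distribution_correlation(rss, 2, 2))         # about 0.866
```

Sweeping `n_samples` for each `interval_steps` and plotting the result against `n_samples * interval_steps * t_origin` gives curves of the same kind as Fig. 8.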

In our experiment, under the sampling interval of 2 s, the RSS data are sampled at each of the 73 RPs with the sample capacity derived from the expected localization error in the target environment. At the same time, the RSS data at Testing Points (TPs) are also collected to examine the actual localization error. For simplicity, we place the TPs at the same locations as the RPs, so that the information loss caused by the spacing of data calibration can be ignored, and we require the sampling interval and sample capacity in the online localization phase to equal those in the offline location fingerprint database construction phase, so that the information loss caused by signal data deficiency can be ignored. In addition, to ignore the information loss caused by specific localization approaches, the correlation coefficient is used to measure the similarity between the sampled signals at each TP and RP, and then the RP corresponding to the minimum sum of correlation coefficients is recognized as the estimated belonging location.

Fig. 9 compares the sample capacity required by the proposed approach and three other state-of-the-art approaches, namely KLD [8], the OC function [9], and ECF [10], under the given expected localization errors. Concretely, the proposed approach minimizes the sample capacity according to the expected localization accuracy in the target environment, the approach in [8] relies on the KLD between the distributions of $X(w_i)$ and $Y(w_i)$ to infer the minimum sample capacity, and the approaches in [9] and [10] give closed-form solutions
for the minimum sample capacity, namely 22 and 13, respectively. For the comparison, the correlation coefficient is adopted to measure the similarity between the sampled signal at each TP and that at each RP, and then the K-Nearest Neighbor (KNN) algorithm [16] is performed to obtain the estimated location of each TP. Since the signal at each TP and that at the RP with the same location are collected in different time periods, they may follow different distributions, which causes a gap between the expected and actual localization errors. As shown in Fig. 9, under the minimum sample capacity selected by the proposed approach, the actual localization errors are close to the expected ones, especially when more than 2 APs are used.
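The comparison step above (correlation similarity followed by KNN [16]) can be sketched as follows. We assume that each sampled signal is flattened into one vector of equal length (for example, per-AP readings concatenated), that a larger correlation means more similar fingerprints, and that the estimate is the unweighted mean of the coordinates of the K most similar RPs. The function names, the vector layout, the value of K, and the example fingerprints are all hypothetical.

```python
import math


def similarity(a, b):
    """Pearson correlation; a constant signal gets -inf so it ranks last."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return -math.inf
    return cov / math.sqrt(va * vb)


def knn_locate(tp_signal, rp_signals, rp_coords, k=3):
    """Estimate a TP position as the mean coordinate of the k RPs whose
    fingerprints are most correlated with the TP's sampled signal."""
    if not 1 <= k <= len(rp_signals):
        raise ValueError("k must be between 1 and the number of RPs")
    ranked = sorted(range(len(rp_signals)),
                    key=lambda j: similarity(tp_signal, rp_signals[j]),
                    reverse=True)
    nearest = ranked[:k]
    x = sum(rp_coords[j][0] for j in nearest) / k
    y = sum(rp_coords[j][1] for j in nearest) / k
    return (x, y)


# Hypothetical fingerprints (dBm) and RP coordinates (m).
rp_signals = [[-50, -60, -70], [-70, -60, -50], [-60, -50, -70]]
rp_coords = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
tp = [-51, -61, -71]
print(knn_locate(tp, rp_signals, rp_coords, k=1))  # (0.0, 0.0)
print(knn_locate(tp, rp_signals, rp_coords, k=2))  # (0.0, 2.5)
```

Because the correlation coefficient ignores a constant offset, the TP matches the first RP exactly even though every reading differs by 1 dB; this insensitivity to offsets is one reason correlation is a convenient similarity measure here.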

Fig. 9. Sample capacity required by different approaches for RSS data (legend: Proposed, [8], [9], and [10]; panels: with 1 to 5 APs over all AP combinations; axes: sample capacity / AP IDs, expected localization error (m), and actual localization error).

B. CSI based Localization Network

Fig. 10 shows a typical lab environment, in which the target can receive continuous CSI data from 3 APs. Three SPs are randomly selected in the target environment for determining the sampling interval and deriving the corresponding minimum sample capacity. By setting the sampling interval $t_i^{\mathrm{origin}} = 0.01$ s, 200 CSI data are sampled at each SP to simulate the distribution of $X(w_i)$. Here, each CSI datum contains the amplitude information from 30 subcarriers, where the center frequency is $f_0 = 5.75 \times 10^9$ Hz and the carrier frequency of the $q$-th ($q = 1, \cdots, 30$) subcarrier is $f_q = 5.75 \times 10^9 + (4q - 62) \times 0.3125 \times 10^6$ Hz. The amplitudes of different subcarriers are assigned different weights to jointly estimate the effective amplitude of the corresponding CSI data [41].

Fig. 10. Environmental layout with CSI based localization network (legend: AP, RP/TP, SP).

As shown in Fig. 11, the CSI data are highly autocorrelated when the number of lags is small, and meanwhile $\hat{\lambda}_i = 0$ for all the SPs. Based on this, we obtain that the value $\hat{t}_{\max}$ corresponding to the target environment is $t_i^{\mathrm{origin}}$ (= 0.01 s). Fig. 12 compares the correlation coefficient between $X(w_i)$ and $Y(w_i)$ under different sampling intervals, from which we find that the variation of the correlation coefficient with sampling intervals larger than $t_i^{\mathrm{origin}}$ differs considerably from that with the sampling interval $t_i^{\mathrm{origin}}$, especially for AP3. Therefore, for data consistency, the sampling interval is set to 0.01 s in the results that follow.

Fig. 11. Autocorrelation coefficient of CSI data vs. number of lags (panels: at SP1, SP2, and SP3; curves: AP1, AP2, AP3, and their mean; $\hat{\lambda}_i = 0$ at every SP).

Fig. 12. Correlation coefficient for CSI data (panels: for AP1, AP2, and AP3; legend: sampling interval = 0.01 s, 0.02 s, 0.03 s, and 0.04 s; horizontal axis: signal collection time (10 ms)).
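The subcarrier layout given for the CSI data can be checked numerically. The sketch evaluates $f_q$ exactly as stated in the text; as a stand-in for the weighting scheme of [41], which is not reproduced here, it combines per-subcarrier amplitudes with a plain normalized weighted average. The weights, the uniform default, and the function names are hypothetical.

```python
F0_HZ = 5.75e9          # center frequency f_0 from the text
SPACING_HZ = 0.3125e6   # 312.5 kHz unit used in the f_q expression


def subcarrier_frequency(q):
    """Carrier frequency f_q = f_0 + (4q - 62) * 0.3125 MHz, for q = 1..30."""
    if not 1 <= q <= 30:
        raise ValueError("q must be in 1..30")
    return F0_HZ + (4 * q - 62) * SPACING_HZ


def effective_amplitude(amplitudes, weights=None):
    """Hypothetical weighted average of per-subcarrier amplitudes (uniform by default)."""
    if weights is None:
        weights = [1.0] * len(amplitudes)
    if len(weights) != len(amplitudes) or sum(weights) <= 0:
        raise ValueError("weights must match amplitudes and have a positive sum")
    return sum(w * a for w, a in zip(weights, amplitudes)) / sum(weights)


freqs = [subcarrier_frequency(q) for q in range(1, 31)]
print(freqs[0], freqs[-1])  # 5731875000.0 5768125000.0
print(freqs[1] - freqs[0])  # 1250000.0
```

Note that $f_0$ itself is not one of the listed subcarriers: $4q - 62 = 0$ has no integer solution, so $f_0$ lies midway between $q = 15$ and $q = 16$, and adjacent listed subcarriers are 1.25 MHz apart.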

Similar to the RSS based localization, under the given expected localization accuracy, the CSI data are sampled at each of the 30 RPs, and meanwhile the CSI data at TPs placed at the same locations as the RPs are also collected to examine the actual localization error in Fig. 13. As can be seen from this figure, under the selected minimum sample capacity, the actual localization errors gradually approach the expected ones as the number of APs increases.

Fig. 13. Sample capacity required by different approaches for CSI data (legend: Proposed, [8], [9], and [10]; panels: with 1, 2, and 3 APs over all AP combinations; axes: sample capacity / AP IDs, expected localization error (m), and actual localization error).

V. CONCLUSION

In this paper, we propose to use an information-theoretic lens to construct the energy-efficient location fingerprint database for the localization in IoT. By analogizing the location fingerprint database construction process to the information propagation process in a lossy channel, we derive the minimum sample capacity corresponding to a given expected localization accuracy. Compared with the existing state-of-the-art approaches, the proposed approach can accurately estimate the minimum sample capacity under different expected localization accuracies and AP combinations. Furthermore, fusing various types of signals into our approach to obtain a more comprehensive scheme of energy-efficient location fingerprint database construction for the localization in IoT is an interesting direction for future work.

REFERENCES
[1] P. Porambage, J. Okwuibe, M. Liyanage, et al., “Survey on multi-access edge computing for internet of things realization,” IEEE Communications Surveys and Tutorials, vol. 99, pp. 1-1, Jun. 2018.
[2] D. Lymberopoulos and J. Liu, “The Microsoft indoor localization competition,” IEEE Signal Processing Magazine, vol. 34, no. 5, pp. 125-140, Sep. 2017.
[3] M. Zhou, Q. Pu, Z. Tian, et al., “Location fingerprint discrimination maximization for indoor WLAN access point optimization using fast discrete water-filling,” in Proc. IEEE GLOBECOM, San Diego, CA, USA, 2015, pp. 1-6.
[4] B. Wang, S. Zhou, L. Yang, et al., “Indoor positioning via subarea fingerprinting and surface fitting with received signal strength,” Pervasive and Mobile Computing, vol. 23, pp. 23-48, Jun. 2015.
[5] A. Khalajmehrabadi, N. Gatsis, and D. Akopian, “Modern WLAN fingerprinting indoor positioning methods and deployment challenges,” IEEE Communications Surveys and Tutorials, vol. 19, no. 3, pp. 1974-2002, third quarter 2017.
[6] Q. Zhang, M. Zhou, Z. Tian, et al., “Indoor localization using semi-supervised manifold alignment with dimension expansion,” Applied Sciences, vol. 6, no. 11, pp. 1974-2002, Nov. 2016.
[7] C. Wu, Z. Yang, Y. Liu, et al., “WILL: Wireless indoor localization without site survey,” in Proc. IEEE INFOCOM, Orlando, FL, USA, 2012, pp. 64-72.
[8] L. Nga, M. Linh, and L. Thuong, “KLD-resampling with adjusted variance and gradient data-based particle filter applied to wireless sensor networks,” in Proc. National Foundation for Science and Technology Development Conference on Information and Computer Science, Ho Chi Minh City, Vietnam, 2015, pp. 229-234.
[9] M. Zhou, Y. Wei, Z. Tian, et al., “Achieving cost-efficient indoor fingerprint localization on WLAN platform: A hypothetical test approach,” IEEE Access, vol. 5, pp. 15865-15874, Aug. 2017.
[10] M. Zhou, Y. Tang, Z. Tian, et al., “Neighborhood graphing for semi-supervised indoor localization with light-loaded location fingerprinting,” IEEE Internet of Things Journal, vol. 99, pp. 1-1, Nov. 2017.
[11] X. Xu, Y. Tang, X. Wang, et al., “Variance-based fingerprint distance adjustment algorithm for indoor localization,” Journal of Systems Engineering and Electronics, vol. 26, no. 6, pp. 1191-1201, Dec. 2015.
[12] L. Chen, K. Yang, and X. Wang, “Robust cooperative Wi-Fi fingerprint-based indoor localization,” IEEE Internet of Things Journal, vol. 3, no. 6, pp. 1406-1417, Sep. 2016.
[13] S. Kumar, R. Hegde, and N. Trigoni, “Gaussian process regression for fingerprinting based localization,” Ad Hoc Networks, vol. 51, pp. 1-10, Jul. 2016.
[14] M. Youssef and A. Agrawala, “The Horus WLAN location determination system,” in Proc. International Conference on Mobile Systems, Seattle, Washington, USA, 2005, pp. 205-218.
[15] S. He and S. Chan, “Tilejunction: Mitigating signal noise for fingerprint-based indoor localization,” IEEE Transactions on Mobile Computing, vol. 15, no. 6, pp. 1554-1568, Jul. 2016.
[16] P. Bahl and V. N. Padmanabhan, “RADAR: An in-building RF-based user location and tracking system,” in Proc. IEEE INFOCOM, Tel Aviv, Israel, 2000, pp. 775-784.
[17] Y. Shu, Y. Huang, and K. Shin, “Gradient-based fingerprinting for indoor localization and tracking,” IEEE Transactions on Industrial Electronics, vol. 63, no. 4, pp. 2424-2433, Dec. 2016.
[18] Y. Ye and B. Wang, “RMapCS: Radio map construction from crowdsourced samples for indoor localization,” IEEE Access, vol. 6, pp. 24224-24238, Apr. 2018.
[19] F. Zhao, L. Wei, and H. Chen, “Optimal time allocation for wireless information and power transfer in wireless powered communication systems,” IEEE Transactions on Vehicular Technology, vol. 65, no. 3, pp. 1830-1835, Mar. 2016.
[20] Z. Zhang, P. Zhang, D. Liu, et al., “SRSM-based adaptive relay selection for D2D communications,” IEEE Internet of Things Journal, vol. 5, no. 4, pp. 2323-2332, Aug. 2018.
[21] M. Zhou, Y. Tang, Z. Tian, et al., “Semi-supervised learning for indoor hybrid fingerprint database calibration with low effort,” IEEE Access, vol. 5, pp. 4388-4400, Mar. 2017.
[22] S. Sorour, Y. Lostanlen, S. Valaee, et al., “Joint indoor localization and radio map construction with limited deployment load,” IEEE Transactions on Mobile Computing, vol. 14, no. 5, pp. 1031-1043, Jul. 2015.
[23] K. Majeed, S. Sorour, T. Al-Naffouri, et al., “Indoor localization and radio map estimation using unsupervised manifold alignment with geometry perturbation,” IEEE Transactions on Mobile Computing, vol. 15, no. 11, pp. 2794-2808, Nov. 2016.
[24] M. Zhou, Q. Zhang, Z. Tian, et al., “Indoor WLAN localization using high-dimensional manifold alignment with limited calibration load,” in Proc. IEEE ICC, Paris, France, 2017, pp. 1-6.
[25] Y. Liu, J. Chen, and Y. Zhan, “Local patches alignment embedding based localization for wireless sensor networks,” Wireless Personal Communications, vol. 70, no. 1, pp. 373-389, May 2013.
[26] B. Wang, S. Zhou, W. Liu, et al., “Indoor localization based on curve fitting and location search using received signal strength,” IEEE Transactions on Industrial Electronics, vol. 62, no. 1, pp. 572-582, Jan. 2015.
[27] F. Zhao, W. Wang, H. Chen, et al., “Interference alignment and game-theoretic power allocation in MIMO Heterogeneous Sensor Networks communications,” Signal Processing, vol. 126, pp. 173-179, Sep. 2016.
[28] F. Zhao, B. Li, H. Chen, et al., “Joint beamforming and power allocation for cognitive MIMO systems under imperfect CSI based on game theory,” Wireless Personal Communications, vol. 73, no. 3, pp. 679-694, Nov. 2013.
[29] C. Wu, Z. Yang, Y. Liu, et al., “WILL: Wireless indoor localization without site survey,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 4, pp. 839-848, Apr. 2013.
[30] M. Zhou, Y. Wang, Z. Tian, et al., “Indoor pedestrian motion detection via spatial clustering and mapping,” IEEE Sensors Letters, vol. 2, no. 1, Article Number 7500204, Mar. 2018.
[31] M. Zhou, Q. Zhang, Y. Wang, et al., “Hotspot ranking based indoor mapping and mobility analysis using crowdsourced Wi-Fi signal,” IEEE Access, vol. 5, pp. 3594-3602, Mar. 2017.
[32] W. Ruan, Q. Sheng, L. Yao, et al., “Device-free indoor localization and tracking through human-object interactions,” in Proc. IEEE International Symposium on A World of Wireless, Mobile and Multimedia Networks, Coimbra, Portugal, 2016, pp. 1-9.
[33] S. Jung, B. Moon, and D. Han, “Unsupervised learning for crowdsourced indoor localization in wireless networks,” IEEE Transactions on Mobile Computing, vol. 15, no. 11, pp. 2892-2906, Nov. 2016.
[34] B. Wang, Q. Chen, L. Yang, et al., “Indoor smartphone localization via fingerprint crowdsourcing: Challenges and approaches,” IEEE Wireless Communications, vol. 23, no. 3, pp. 82-89, Jun. 2016.
[35] Z. Zhang, T. Zeng, X. Yu, et al., “Social-aware D2D pairing for cooperative video transmission using matching theory,” Mobile Networks and Applications, vol. 23, no. 3, pp. 639-649, Jun. 2018.
[36] F. Zhao, H. Nie, and H. Chen, “Group buying spectrum auction algorithm for fractional frequency reuses cognitive cellular systems,” Ad Hoc Networks, vol. 58, pp. 239-246, Mar. 2016.
[37] I. Bilik, K. Adhikari, and J. R. Buck, “Shannon capacity bound on mobile station localization accuracy in urban environments,” IEEE Transactions on Signal Processing, vol. 59, no. 12, pp. 6206-6216, Dec. 2011.
[38] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: Wiley, 2006, pp. 254-255.
[39] J. Jia, Statistics, 4th ed. Beijing, China: China Renmin University Press, 2011, pp. 167-168.
[40] M. Zhou, Y. Wang, W. Tan, et al., “SCOPE: Sample Capacity Optimization for Positioning Database Establishment in Indoor Wi-Fi Environment,” in Proc. IEEE PIMRC, Bologna, Italy, 2018, pp. 1-5.
[41] N. Patwari, A. O. Hero, M. Perkins, et al., “Relative location estimation in wireless sensor networks,” IEEE Transactions on Signal Processing, vol. 51, no. 8, pp. 2137-2148, Apr. 2003.

Yanmeng Wang received the B.S. degree in telecommunication engineering from the Chongqing University of Posts and Telecommunications, China, in 2016, where she is currently pursuing the M.S. degree. Her current research interests include error bound estimation for indoor WLAN localization and crowd-sourced motion sensing in the wireless environment.

Mu Zhou (SM’17) received the Ph.D. degree from the Harbin Institute of Technology, China, in 2012. He was a joint-cultivated Ph.D. student at the University of Pittsburgh, USA and a Post-doctoral Research Fellow at the Hong Kong University of Science and Technology, China. He is currently a Full Professor with the Chongqing University of Posts and Telecommunications. His current research interests include wireless localization and navigation, signal reconnaissance and detection, and convex optimization and deep learning.

Bang Wang received the B.S. and M.S. degrees from the Department of Electronics and Information Engineering, Huazhong University of Science and Technology (HUST), Wuhan, China, in 1996 and 2000, respectively, and the Ph.D. degree from the Department of Electrical and Computer Engineering, National University of Singapore, in 2004. He is currently a Professor with the School of Electronic Information and Communications, HUST. He has authored or coauthored over 100 technical papers in international conferences and journals. His research interests include wireless networking issues, indoor localization systems, and social computing technologies.

Zengshan Tian received the Ph.D. degree from the University of Electronic Science and Technology of China, in 2002. He is currently a Full Professor with the Chongqing University of Posts and Telecommunications. His current research interests include personal communication, precise localization and attitude measurement, and data fusion.

Yinghui Lian received the B.S. degree in telecommunication engineering from the Chongqing University of Posts and Telecommunications, China, in 2017, where she is currently pursuing the M.S. degree. Her current research interests include channel state information detection and single access point based indoor WLAN localization.

Yong Wang received the Ph.D. degree from the Harbin Institute of Technology, China, in 2018. He was a visiting Ph.D. student at the University of Toronto, Canada. He is currently a Lecturer with the Chongqing University of Posts and Telecommunications, China. His research interests include resource allocation and signal processing in cooperative networks, deep learning, and Wi-Fi indoor localization.
