Maximize Secondary User Throughput via Optimal Sensing in Multi-channel Cognitive Radio Networks

Shimin Gong∗, Ping Wang∗, Wei Liu† and Wei Yuan†

∗ Centre for Multimedia and Network Technology (CeMNet), Nanyang Technological University, Singapore
† Huazhong University of Science and Technology, China
Email: ∗ [email protected], ∗ [email protected], † [email protected], † [email protected]

Abstract—In a cognitive radio network, the full spectrum is usually divided into multiple channels. However, due to hardware and energy constraints, a cognitive user (also called a secondary user) may not be able to sense two or more channels simultaneously. As different channels may have different primary user activities and time-varying channel qualities, an important task is to select which channels to sense and access in a given time period so that the spectrum left available by the primary users can be fully utilized by the secondary user. In this paper, we propose an optimal sensing channel selection policy based on a partially observable Markov decision process (POMDP). The proposed policy takes the time-varying channel state into consideration and aims to optimally exploit spectrum resources for the secondary user. In addition to selecting the optimal channel to sense, we also derive the optimal sensing time, which maximizes the throughput of the secondary user.

Index Terms—Cognitive radio, finite state Markov channel, partially observable Markov decision process (POMDP).

I. I In recent years, with the advances of wireless technologies and the rapid growth of wireless services, there has been a dramatic increase in the demand for radio spectrum. As most of the spectrum has been assigned to license holders for exclusive use, the spectrum has been a scarce resource. However, existing spectrum measurement reports [1], [2] state that the actual usage of licensed spectrum is highly inefficient at some locations and time periods. And the workload in different spectrum bands is rather diverse. Thus, cognitive radio and opportunistic spectrum access (OSA) [3], [4] were proposed to deal with the spectrum under-utilization by letting secondary users access licensed bands without interference to licensed users (also called primary users). Secondary users thus are required to identify the spectrum holes that are temporarily unoccupied by primary users. In this research, we consider a scenario in which the secondary user can sense one channel at a time. When there are multiple channels in the system, an important task is to select which channels to sense and access for a given time period so that the spectrum opportunities can be fully utilized by the secondary user. Some research efforts have been devoted to this research issue. In [5], opportunistic sensing capacity is defined to describe the expected transmission capacity of spectrum band that the secondary users can achieve. And the spectrum selection problem is formulated as binary integer programming to maximize the opportunistic sensing capacity, subject to the

number of available transceivers. In [6], partially observable Markov decision process (POMDP) framework is presented and the sensing policy is determined through dynamic programming. In [7], the sequential spectrum sensing decision problem is formulated as the optimal stopping problem [8] given the sensing order. The authors of [9] intend to find the optimal sensing order and prove that the intuitive sensing order may no longer be optimal when adaptive modulation is enabled. Some other works focus on the channel modeling and spectrum usage prediction so that the expected reward in next channel access can be optimized. In many previous works, e.g., [6], channel is considered stationary with constant rate. Actually, it is fluctuating all the time due to secondary user mobility and channel fading, etc. Therefore, secondary users not only need to identify primary user activities, but also need to identify the channel condition so that an idle channel with best channel condition can be selected for transmission. In this paper, channel condition is considered as an important criterion in selecting sensing channels. We employ finite state Markov channel (FSMC) [10] to model the channel condition transitions and formulate the sensing channel selection problem in POMDP framework, which is different from the work in [11] with complete state information. Furthermore, the optimal sensing time is also considered to maximize the secondary user throughput. The optimal policy in each decision stage includes the sensing channel index and optimal sensing time. The rest of the paper is organized as follows. Section II describes the primary activity model and finite state Markov channel model. In Section III, the secondary user throughput maximization problem is formulated based on POMDP. The numerical results regarding the system performance are presented in Section IV, and finally Section V concludes the paper. II. 
S M Consider a secondary user and a number of licensed channels. The secondary user can sense one channel at a time. We assume the channel access of primary users follows a Markov model, as illustrated in Fig. 1a. r0 and r1 denote the channel busy (due to the primary user activity) and idle state, respectively. Besides the channel state transition caused by primary user activity, the secondary user also observes

α

1− α

r

0

β

r

(busy)

1− β

1

(idle)

Fig. 1a: Markov model of primary activity. z11

z00

z01

r0

z22

z12

r2

r1 z21

z10

Fig. 1b: Finite state Markov channel with K = 3 states.

r00

r01

r10

r11

r20

r21

Fig. 2: Two-dimensional Markov channel state. fluctuated channel condition due to fading effect. This channel characteristic can be modeled as finite state Markov channel (FSMC) [10]. Fig. 1b illustrates a 3-state Markov channel as an example. Let {r0 , r1 , ..., rK−1 } denote the finite set of channel states where K is the number of states. With the maximum of K channel states, the signal-to-noise ratio (S NR) at the receiver, denoted by γ, can be partitioned into K nonoverlapping intervals by thresholds Γk (k ∈ {0, 1, . . . , K − 1}), where Γ0 < Γ1 < . . . < ΓK−1 . The channel is said to be in state k if Γk ≤ γ < Γk+1 . For each state, there is a corresponding transmission rate depending on the bandwidth and modulation. The higher γ, the higher the transmission rate. We assume the system is time slotted. The channel state remains unchanged within a time slot, and state transition (either due to primary user activity or due to channel fading) merely happens at the beginning of each slot. Since the state transitions in Fig. 1a and Fig. 1b are independent, when taking both the primary user activity and fading effect into consideration, the channel state can be modeled as a two-dimensional Markov chain as shown in Fig. 2. Assume the secondary user will exploit the spectrum resource in N channels. The set of channel states for each channel n (n = 1, 2..., N) is denoted as S n = {r00 , r01 , r10 , r11 , ..., rK0 n −1 , rK1 n −1 }, where the total number of states of channel n is 2Kn . For presentation simplicity, we let K1 = K2 =, ..., = KN = K. rij corresponds to the primary state r j (in Fig. 1a) and channel state ri (in Fig. 1b). Considering the 6-state model in Fig. 2, transition matrix {Qi j } in (1) describes the state transition probabilities, where α, β and zi j (i, j = 0, 1, 2) are the

corresponding state transition probabilities shown in Figs. 1a and 1b, respectively. The system state at any time t is given QN by st ∈ S, where S = S 1 × S 2 · · · × S N . M = |S| = n=1 (2Kn ) denotes the total number of states for N channels. Our goal is to design a sensing channel selection policy that makes full use of the channel opportunities. In every decision epoch, secondary user is not only required to avoid collision with primary user, but also intends to seek which channel has best condition and supports highest data rate.      Q =    

z00 (1 − α) z00 (1 − β) z10 (1 − α) z10 (1 − β) 0 0

z00 α z00 β z10 α z10 β 0 0

z01 (1 − α) z01 (1 − β) z11 (1 − α) z11 (1 − β) z21 (1 − α) z21 (1 − β)

z01 α z01 β z11 α z11 β z21 α z21 β

0 0 z12 (1 − α) z12 (1 − β) z22 (1 − α) z22 (1 − β)

    z12 α  . z12 β   z22 α  z22 β (1) 0 0
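The block structure of (1) suggests that $Q$ is the Kronecker product of the fading transition matrix $\{z_{ij}\}$ and a $2 \times 2$ primary activity matrix. The following is a minimal sketch of that construction, not the authors' implementation. The function names are hypothetical, the numeric values are the ones quoted later in Section IV, and the diagonal entries $z_{00}$, $z_{11}$, $z_{22}$ are filled in under the assumption that every row of a transition matrix sums to one.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def primary_matrix(alpha, beta):
    # State order (busy, idle), read off the entries of (1):
    # busy -> idle with probability alpha, idle -> idle with probability beta.
    return [[1.0 - alpha, alpha], [1.0 - beta, beta]]


def fading_matrix(z01, z10, z12, z21):
    # Three-state chain with transitions between adjacent states only.
    # Assumption: diagonal entries are chosen so that every row sums to one.
    return [
        [1.0 - z01, z01, 0.0],
        [z10, 1.0 - z10 - z12, z12],
        [0.0, z21, 1.0 - z21],
    ]


# State order of Q: r_0^0, r_0^1, r_1^0, r_1^1, r_2^0, r_2^1.
Q = kron(fading_matrix(0.1, 0.05, 0.15, 0.1), primary_matrix(0.2, 0.8))

for row in Q:
    print(" ".join(f"{x:.3f}" for x in row))
```

Building $Q$ this way keeps the two independent chains separate, so a different number of fading states only changes `fading_matrix`.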

III. O D P  C S We formulate the above problem into partially observable Markov decision process < S, A, P, K, O, R >. S is the system state as described in Section II. For markov decision problem with incomplete information, belief vector π is taken as state in each epoch. The action a ∈ A of secondary user in each decision epoch specifies the index of the selected channel to sense and related sensing time. P gives the system state transition probability pi j ∈ P from its current state i to next state j. P is related to the channel state transition matrix Q described in (1). The observation model O is used to describe the probability of observing certain state dependent outcome k ∈ K, where K specifies the set of possible observations in each system state. R specifies the instant reward, which depends on the state transition and action taken. In this channel selection problem, Rajk ∈ R denotes the reward in state j when performing action a and making observation k. The instant reward can be defined as the number of packets transmitted by the secondary user in a slot. The optimal decision sequence {a1 , a2 , · · · , at , · · · } forms the policy that maximizes the aggregate reward in a long term. We express this optimization as follows, X X X   0 ∗ Vt∗ (π) = max πi pi j Oajk Rajk + Vt−1 (π ) , (2) a

i

j

k

Oajk

where is the probability of observing k in next state ∗ j when performing action a in state j. Future reward Vt−1 0 is determined by the updated belief vector π , which is a function of current belief vector π given action and resulting 0 observation and denoted as π = Λ(π|a, k). Considering all the possible combinations of channel states among all the channels, the belief vector space could be very huge. Thus, the computation complexity to solve (2) would be very high. However if channel is independent with each other, we can prove that (2) can be transformed to the following equivalent problem which requires much less computation, Vt∗ (Ω)

= max a

X j∈S na

   a X a ∗   T (ω j |a) R j + O jk Vt−1 (Λ(Ω|a, k)) . k

(3)
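To give a feel for the saving, the sketch below compares the number of entries in the joint belief vector $\pi$ over $S_1 \times \cdots \times S_N$ with the total number of entries in the per-channel belief vectors used in (3). It is only a counting illustration; the function names are hypothetical, and the choice of $K = 3$ fading states per channel follows the paper's example.

```python
from math import prod


def joint_belief_size(states_per_channel):
    """Entries in the belief vector pi over the joint space S_1 x ... x S_N."""
    return prod(states_per_channel)


def factored_belief_size(states_per_channel):
    """Total entries in the per-channel belief vectors Omega_1, ..., Omega_N."""
    return sum(states_per_channel)


for n_channels in (3, 5, 10):
    sizes = [2 * 3] * n_channels  # K = 3 fading states times 2 primary states
    print(n_channels, joint_belief_size(sizes), factored_belief_size(sizes))
```

The joint size grows as $(2K)^N$ while the factored size grows as $2KN$.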

The proof of the equivalence of (2) and (3) is given in the Appendix. $\Omega = \{\Omega_n\}$, $n \in \{1, 2, \ldots, N\}$, denotes the belief vectors of the channels, and $\Omega_n = \{\omega^n_i\}$, $i \in \{1, 2, \ldots, |S_n|\}$, denotes the probability distribution over the states $i$ of channel $n$. Thus the system state space increases linearly with the total number of channel states. $T(\omega_j|a) = \sum_{i \in S_{n_a}} \omega^{n_a}_i Q^{n_a}_{ij}$ denotes the one-step transition of the belief vector, where $n_a$ denotes the selected channel index specified by action $a$ and $Q^{n_a}_{ij}$ is the state transition probability of channel $n_a$ as defined in (1). Similar to $\pi'$, the updated belief vector, denoted as $\Omega' = \Lambda(\Omega|a, k)$, is obtained by modifying the current belief vector based on the observation when action $a$ is in effect. $R^a_j = \sum_{k \in \mathcal{K}} O^a_{jk} R^a_{jk}$ denotes the instant reward in state $j$ over all possible observation outcomes given action $a$. By solving (3), a secondary user can make an optimal decision on which channel to sense at each decision epoch. Using an approach similar to that in [6], a suboptimal solution of (3) can be obtained by maximizing the instant reward at each epoch.

Considering $r_i^j$ in our model, when $j = 1$ (the primary user is absent), the secondary user can determine $r_i$ through observation after transmission. However, when $j = 0$ (the primary user is present), the secondary user refrains from transmitting and thus cannot obtain the channel condition information. Therefore, the observation set can be written as $\mathcal{K} = \{r_0^1, r_1^1, \cdots, r_{K-1}^1, r_*^0\}$, where $r_*^0$ denotes channel state $r_i^0$ with any $i \in \{0, 1, \cdots, K-1\}$. The instant reward $R^a_j$ can be expressed as follows.

\[
R^a_j = \sum_{k \in \mathcal{K}} O^a_{jk} R^a_{jk} =
\begin{cases}
\sum_{k \in \mathcal{K} - \{r_*^0\}} O^a_{jk} R^a_{jk}, & j \in \mathcal{K} - \{r_*^0\}, \\
0, & \text{otherwise}.
\end{cases} \tag{4}
\]
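The one-step belief transition $T(\omega_j|a)$ is a row vector times the transition matrix of the sensed channel, and the instant-reward part of (3) is its inner product with the per-state rewards. The sketch below illustrates both on a toy two-state channel; the function names, the toy matrix, and the reward values are assumptions chosen only for illustration.

```python
def predict_belief(omega, Q):
    """Entries T(omega_j | a) = sum_i omega[i] * Q[i][j] for the sensed channel."""
    return [
        sum(w_i * row[j] for w_i, row in zip(omega, Q))
        for j in range(len(Q[0]))
    ]


def expected_instant_reward(omega, Q, rewards):
    """sum_j T(omega_j | a) * R_j^a, i.e. (3) without the future-value term."""
    return sum(t_j * r_j for t_j, r_j in zip(predict_belief(omega, Q), rewards))


# Toy two-state channel with state order (busy, idle); values are placeholders.
Q_toy = [[0.8, 0.2], [0.2, 0.8]]
omega_toy = [1.0, 0.0]    # channel believed busy in the current slot
rewards_toy = [0.0, 1.0]  # the reward is zero in the busy state, as in (4)

print(predict_belief(omega_toy, Q_toy))
print(expected_instant_reward(omega_toy, Q_toy, rewards_toy))
```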

Considering that the observation results may not be perfectly correct, for a secondary user in state $j$, we assume that it indeed observes state $j$ with probability $1 - P^a_f$ and gets a false alarm with probability $P^a_f$. Thus,

\[
O^a_{jk} =
\begin{cases}
1 - P^a_f, & k = j, \ \{j, k\} \subseteq \mathcal{K} - \{r_*^0\}, \\
P^a_f, & k = r_*^0, \ j \in \mathcal{K} - \{r_*^0\}, \\
0, & k \ne j, \ \{j, k\} \subseteq \mathcal{K} - \{r_*^0\}.
\end{cases} \tag{5}
\]

For state $j \in \{r_0^0, r_1^0, \cdots, r_{K-1}^0\}$, the channel is occupied by primary users. If the secondary user makes the correct observation, the secondary transmission is suspended. If the channel state is mis-detected, a collision happens and the transmission fails. In either case, the reward is 0. Thus, combining (4) and (5), the instant reward is given as follows,

\[
R^a_j =
\begin{cases}
(1 - P^a_f) R^a_{jk}, & j \in \mathcal{K} - \{r_*^0\}, \ k = j, \\
0, & \text{otherwise}.
\end{cases} \tag{6}
\]

Assume the channel bandwidth is $W$ and the sampling frequency is $2W$. The false alarm probability $P^a_f$ can then be expressed in terms of the Q function [5], [12],

\[
P^a_f = Q\left( Q^{-1}(1 - P_m)(1 + \lambda) + \lambda \sqrt{\tau_s W} \right), \tag{7}
\]

where $P_m$ is the mis-detection probability threshold, which specifies the collision tolerance bound of the primary user, $\lambda$ denotes the SNR of the primary transmission, and $\tau_s$ denotes the sensing time. Thus $P^a_f$ is a function of the sensing time $\tau_s$ and the channel condition, written as $P^a_f(\tau_s, \lambda)$, and the instant reward in state $j$ when taking action $a$ expands to

\[
R^a_j = (1 - \tau_s / T_s) \, B^a(\lambda_j) \left[ 1 - P^a_f(\tau_s, \lambda^p_j) \right]. \tag{8}
\]

The term $(1 - \tau_s / T_s) B^a(\lambda_j)$ denotes the effective throughput when the observation matches the real state (i.e., $k = j$), where $T_s$ is the length of a time slot and $B^a(\lambda_j) = \log_2(1 + \lambda_j)$ measures the normalized channel capacity of channel $n_a$ specified by action $a$, which can be viewed as a constant multiplicative coefficient given channel state $j$. $\lambda_j$ is the SNR of the secondary transmission, while $\lambda^p_j$ denotes the SNR of the primary transmission, which can be estimated from historical data. From (8), the optimal $\tau_s$ should also be determined such that $R^a_j$ in the selected channel $n_a$ is maximized in each epoch. Thus, the optimal action $a^*$ also specifies the optimal sensing time, denoted as $\tau^*_a$:

\[
\{n^*_a, \tau^*_a\} = \arg\max_a \sum_{j \in S_{n_a}} T(\omega_j|a) \left[ R^a_j + \sum_k O^a_{jk} V_{t-1}^*(\Lambda(\Omega|a, k)) \right]. \tag{9}
\]
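The trade-off in (7) and (8) can be explored numerically: a longer sensing time lowers the false alarm probability but shortens the transmission time. The sketch below evaluates (7) and (8) for a single known channel state and picks the sensing time by a plain grid search, which maximizes only the instant reward and is a stand-in for whatever solver the authors used. The function names, the linear-scale SNR values, and the choice $P_m = 0.1$ are assumptions; $W$ and $T_s$ are the values quoted in Section IV.

```python
from math import log2, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q_func(x):
    """Gaussian tail probability Q(x) = 1 - Phi(x)."""
    return 1.0 - _STD_NORMAL.cdf(x)


def q_inv(p):
    """Inverse of the Q function, valid for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def false_alarm(tau_s, snr_primary, bandwidth, p_miss):
    """False alarm probability of (7); SNR is in linear scale (assumption)."""
    return q_func(q_inv(1.0 - p_miss) * (1.0 + snr_primary)
                  + snr_primary * sqrt(tau_s * bandwidth))


def instant_reward(tau_s, slot_len, snr_secondary, snr_primary, bandwidth, p_miss):
    """Instant reward of (8) for one channel state."""
    capacity = log2(1.0 + snr_secondary)
    p_f = false_alarm(tau_s, snr_primary, bandwidth, p_miss)
    return (1.0 - tau_s / slot_len) * capacity * (1.0 - p_f)


def best_sensing_time(slot_len, snr_secondary, snr_primary, bandwidth, p_miss,
                      steps=1000):
    """Grid search over tau_s in [0, T_s) for the largest instant reward."""
    grid = [slot_len * i / steps for i in range(steps)]
    return max(grid, key=lambda tau: instant_reward(
        tau, slot_len, snr_secondary, snr_primary, bandwidth, p_miss))


# W = 6 kHz and T_s = 10 ms as in Section IV; the other values are hypothetical.
params = dict(slot_len=0.01, snr_secondary=10 ** 0.5, snr_primary=1.0,
              bandwidth=6000.0, p_miss=0.1)
tau_star = best_sensing_time(**params)
print(tau_star, instant_reward(tau_star, **params))
```

A grid search is used here only because it needs no assumptions about the shape of the reward curve.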

The optimal joint selection $a^* = \{n^*_a, \tau^*_a\}$ leads to the highest expected throughput of the secondary user. After determining the policy $a^*$, the secondary user initiates channel sensing in channel $n^*_a$ with parameter $\tau^*_a$ and obtains observation results regarding the channel state. These observation results are used to update the state belief vector. We formulate the updating rules as follows.

\[
\Omega'_n = \Lambda(\Omega|a, k) =
\begin{cases}
I(\Omega_n), & n = n^*_a, \ k \in \mathcal{K} - \{r_*^0\}, \\
\Omega_n Q'_n, & n = n^*_a, \ k = r_*^0, \\
\Omega_n Q_n, & n \ne n^*_a.
\end{cases} \tag{10}
\]

The indicator function $I(\Omega_n)$ returns a vector with all elements equal to zero except the $j$-th element $I_j = 1$ if observation $k$ matches the $j$-th channel state in set $S_n$. $Q'_n$ is the modified transition matrix of channel $n$ given the observation that the primary user is present, which can be obtained by setting $\alpha$ and $\beta$ to 0 in (1).

IV. PERFORMANCE EVALUATION

The optimality of a POMDP with a continuous action space needs special care, and a solution method is provided in [13]. In this simulation, we only consider the instant reward as our objective function for simplicity. Assume there are $N = 3$ primary channels with bandwidth $W = 6$ kHz, and a 3-state FSMC model is considered with the average SNR in each state equal to 0 dB, 5 dB, and 10 dB, respectively. The secondary system operates in time slots with slot length $T_s = 10$ ms. Unless otherwise mentioned, assume $\alpha = 0.2$, $\beta = 0.8$ when modeling the primary activity, and $[z_{01}, z_{10}, z_{12}, z_{21}] = [0.1, 0.05, 0.15, 0.1]$ when modeling the channel state transition.
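The three cases of the belief update rule (10) can be sketched for a single channel as below. This is an illustrative sketch rather than the simulation code: the function names are hypothetical, and a toy two-state chain over (busy, idle) with $\alpha = 0.2$, $\beta = 0.8$ stands in for the $2K$-state channel to keep the example short.

```python
def vec_mat(omega, Q):
    """Row vector times matrix."""
    return [
        sum(w_i * row[j] for w_i, row in zip(omega, Q))
        for j in range(len(Q[0]))
    ]


def update_belief(omega, Q, Q_busy, sensed, observed_state=None):
    """Per-channel belief update following the three cases of (10).

    sensed=False                      -> omega * Q       (channel not selected)
    sensed=True, observed_state=None  -> omega * Q_busy  (primary user observed)
    sensed=True, observed_state=j     -> indicator vector with a one at index j
    """
    if not sensed:
        return vec_mat(omega, Q)
    if observed_state is None:
        return vec_mat(omega, Q_busy)
    return [1.0 if j == observed_state else 0.0 for j in range(len(omega))]


alpha, beta = 0.2, 0.8
Q = [[1 - alpha, alpha], [1 - beta, beta]]  # toy chain, state order (busy, idle)
Q_busy = [[1.0, 0.0], [1.0, 0.0]]           # same matrix with alpha = beta = 0
omega = [0.5, 0.5]

print(update_belief(omega, Q, Q_busy, sensed=False))
print(update_belief(omega, Q, Q_busy, sensed=True))
print(update_belief(omega, Q, Q_busy, sensed=True, observed_state=1))
```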

[Fig. 3 plot: throughput of secondary user (bits/slot/Hz) versus time slots (unit: 10 ms); curves: 6-state modeling and 2-state modeling.]

[Fig. 4 plot: channel efficiency of secondary user versus time slots (unit: 10 ms); curves: CST and OST, each for α = 0.2, β = 0.8 and for α = 0.4, β = 0.6.]

Fig. 3: Secondary user throughput comparison between the proposed scheme (6-state) and that in [6] (2-state).

A. Secondary User Throughput

First, we compare our work with that presented in [6]. In [6], the channel quality variation of the secondary user is not considered, and a secondary user transmits packets at a fixed transmission rate on any available channel. For a fair comparison, we set the transmission rate in [6] to $B^a_0 = \sum_j \gamma_j B^a(\lambda_j)$, which is the average transmission rate in our model. $\gamma_j$ denotes the normalized stationary probability of channel state $j \in \{r_0^1, r_1^1, \cdots, r_{K-1}^1\}$. Fig. 3 compares the secondary user throughput under our scheme and that in [6]. To be consistent with [6], throughput is defined as the number of bits transmitted per slot per unit bandwidth. In the figure, our scheme and the scheme in [6] are referred to as 6-state and 2-state modeling, respectively, which indicates the corresponding number of channel states. The results show that the 6-state case outperforms the 2-state case. In the 6-state case, more knowledge of the channel conditions can be obtained, which helps the secondary user make a better choice when selecting channels.

B. The Impact of Sensing Time on Channel Efficiency

The channel efficiency $f(t)$ is defined as the product of the spectrum opportunity $1 - P_f$ and the normalized transmission time $1 - \tau_s / T_s$. We take the time-average efficiency as $\frac{1}{T} \sum_{t=1}^{T} f(t)$. In the experiment, we first apply the constant sensing time (CST) strategy and then compare it with our proposed joint optimization policy, in which both the optimal channel index and the optimal sensing time (OST) are obtained in each time slot. In the CST method, the user follows the decision process specified in Section III without considering the optimal sensing time in different states. Fig. 4 compares the channel efficiency of the CST and OST methods. For a fair comparison, the sensing time in CST is set to $\tau_s = \sum_j \gamma_j \tau^j_s$, which is the average optimal sensing time in the OST scheme. $\tau^j_s$ denotes the optimal sensing time in state $j$, and $\gamma_j$ is defined in Section IV-A. From Fig. 4, we can see that the OST method performs better than the CST method in terms of channel efficiency. Fig. 4 also plots the channel efficiency under different primary activities.

Fig. 4: Channel efficiency comparison between the constant sensing time strategy (CST) and the proposed optimal sensing time strategy (OST).

C. Heterogeneous Channels

Channels are considered heterogeneous and associated with different transition probabilities, $z_1 = [0.05, 0.15, 0.05, 0.25]$, $z_2 = [0.15, 0.05, 0.05, 0.15]$, and $z_3 = [0.25, 0.05, 0.15, 0.05]$, for channels C1, C2 and C3, respectively. Thus, C3 has the best channel condition (on average) while C1 has the worst one (on average). We compare our scheme with random access, which randomly chooses a channel to sense. If the channel is sensed idle, the secondary user transmits in it. Fig. 5a plots the probability of each channel being chosen for transmission. It shows that the total probability adds up to 0.62 for the proposed method, which indicates a stronger ability to exploit channel opportunities than random access. Moreover, the proposed channel access tends to transmit on the channel with the better condition (C3), and C1 is rarely selected in our method, while in random access all the channels are selected with equal probability. Fig. 5b gives the throughput comparison among the three channels. As the access probability is almost the same for each channel in random access, the throughput differences are caused only by the different transmission rates in different states. The figure shows that the aggregate throughput of our method is significantly higher than that of random access.

V. CONCLUSION

In this paper, we propose an optimal sensing channel selection policy based on a partially observable Markov decision process (POMDP). The secondary user decides which channel to sense based on previous observations of the channels. To maximize the secondary user throughput, the proposed policy jointly chooses the optimal sensing channel and sensing time. Simulation results reveal that, by taking the time-varying channel state into consideration, the proposed method achieves better performance than the existing work, which does not consider channel state variation. With the optimal sensing time, the channel efficiency of our proposed method is also improved compared with the method with a fixed sensing time.
A A P  E  (2)  (3) Before proving the equivalence of (2) and (3) given independent channels, we specify some notations first. Current system

Throughput of secondary user (bits/slot/Hz)

0.7

Optimal Access Random Access

Probability of transmission

0.6 0.5 0.4 0.3 0.2 0.1 0

c1(z ) c2(z ) c3(z ) Total 1 2 3 Channel associated with different transition probabilities

1.6 1.4

Optimal Access Random Access

1.2 1 0.8 0.6 0.4 0.2 0

c1(z1) c2(z2) c3(z3) Total Channels associated with different transition probabilities

(a) Channel access probabilities

(b) Throughput concentrates on best channel

Fig. 5: Performance comparison between the proposed scheme and the random access scheme.   X XX  a X a ∗  N N 2 2 1 1 ∗ O jk Vt−1 (Λ(Ω|a, k)) , ··· ωi1 Qi1 j1 ωi2 Qi2 j2 · · · ωiN QiN jN R j + Vt (Ω) = max a

i1 , j1 i2 , j2

= max a

X

ina , jna

= max a

X

ina , jna

iN , jN

ωnina Qnina jn a

a

a

ωnina Qnina jn a a a

X k

X k

YX  ∗ (Λ(Ω|a, k)) ωrir Qrir jr , Oajna k Rajna k + Vt−1 r,na ir , jr

Oajna k

  ∗ (Λ(Ω|a, k)) . Rajna k + Vt−1

state st = [s1t , s2t , · · · , stN ] is a vector in set S 1 × S 2 · · · S N , and st = i specifies the current state srt = ir for each channel r where r = {1, 2, · · · , N}. Thus we denote i = [i1 , i2 , · · · , iN ]. In the similar way, we denote next state st+1 = j and j = [ j1 , j2 , · · · , jN ]. If channels are independent, then we have πi

Prob(st = i), N N Y Y = Prob(srt = ir ) = ωrir .

=

r=1

pi j

(13)

r=1

Prob(st+1 = j|st = i), N N Y Y = Prob(srt+1 = jr |srt = ir ) = Qrir jr .

=

r=1

(14)

r=1

ωrir ∈ Ω denotes the probability staying in state ir ∈ S r of channel r, where ir is the r-th element of i. Qrir jr represents channel r’s transition probability from state ir to jr . Thus we QN ωrir Qrir jr . have πi pi j = r=1 Once the action a is taken, the next operating channel is determined by na , and the states in other channels has nothing to do with the observation result and reward. Therefore, value function in (2) can be rewritten as (11), and then reduced to (12) since the reward is just related to chosen channel na . Therefore we have the following equivalent form based on (12), Vt∗ (Ω) = max a

X jn a

(11)

k

  X   ∗ T (ω jna |a) Rajna + Oajna k Vt−1 (Λ(Ω|a, k)) , (15) k

P where T (ω jna |a) = ina ωnina Qnina jn denotes one step transition a a a of belief vector as presented in (3).

(12)
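The reduction from the joint sum in (11) to the single-channel sum in (12) can be checked numerically on a small example. The sketch below is a sanity check under assumed toy values, not part of the proof: it sums a function of the chosen channel's next state over all joint state pairs, weighted by $\prod_r \omega^r_{i_r} Q^r_{i_r j_r}$, and compares the result with the sum over the chosen channel alone. All names and numbers are hypothetical.

```python
from itertools import product


def joint_expectation(omegas, Qs, g, chosen):
    """Brute-force sum over all joint (i, j) pairs, as in the first line of (11)."""
    sizes = [len(w) for w in omegas]
    total = 0.0
    for i in product(*(range(s) for s in sizes)):
        for j in product(*(range(s) for s in sizes)):
            weight = 1.0
            for r, (i_r, j_r) in enumerate(zip(i, j)):
                weight *= omegas[r][i_r] * Qs[r][i_r][j_r]
            total += weight * g(j[chosen])
    return total


def reduced_expectation(omegas, Qs, g, chosen):
    """Sum over the chosen channel only, as in (12)."""
    w, Q = omegas[chosen], Qs[chosen]
    return sum(w[i] * Q[i][j] * g(j)
               for i in range(len(w)) for j in range(len(Q[i])))


# Two independent toy channels with two states each (hypothetical values).
omegas = [[0.3, 0.7], [0.6, 0.4]]
Qs = [[[0.8, 0.2], [0.2, 0.8]], [[0.9, 0.1], [0.5, 0.5]]]
payoff = [0.0, 1.5]  # stands in for the bracketed term, indexed by next state


def g(j):
    return payoff[j]


print(joint_expectation(omegas, Qs, g, chosen=0))
print(reduced_expectation(omegas, Qs, g, chosen=0))
```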

R [1] M. Mchenry, “Spectrum white space measurements,” New America Foundation Broadband Forum (June 2003). [2] D. Chen, S. Yin, Q. Zhang, M. Liu, and S. Li, “Mining spectrum usage data: a large-scale spectrum measurement study,” in MobiCom ’09: Proceedings of the 15th annual international conference on Mobile computing and networking. New York, NY, USA: ACM, 2009, pp. 13–24. [3] I. Mitola, J., “Cognitive radio for flexible mobile multimedia communications,” in Mobile Multimedia Communications, 1999. (MoMuC ’99) 1999 IEEE International Workshop on, 1999, pp. 3 –10. [4] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty, “Next generation/dynamic spectrum access/cognitive radio wireless networks: A survey,” Computer Networks, vol. 50, no. 13, pp. 2127 – 2159, 2006. [5] W.-Y. Lee and I. Akyildiz, “Optimal spectrum sensing framework for cognitive radio networks,” Wireless Communications, IEEE Transactions on, vol. 7, no. 10, pp. 3845 –3857, Oct. 2008. [6] Q. Zhao, L. Tong, A. Swami, and Y. Chen, “Decentralized cognitive mac for opportunistic spectrum access in ad hoc networks: A pomdp framework,” Selected Areas in Communications, IEEE Journal on, vol. 25, no. 3, pp. 589 –600, april 2007. [7] J. Jia, Q. Zhang, and X. Shen, “Hc-mac: A hardware-constrained cognitive mac for efficient spectrum management,” Selected Areas in Communications, IEEE Journal on, vol. 26, no. 1, pp. 106 –117, Jan. 2008. [8] T. S. Ferguson, Optimal Stopping and Applications. [Online]. Available: http://www.math.ucla.edu/ tom/Stopping/Contents.html [9] H. Jiang, L. Lai, R. Fan, and H. Poor, “Optimal selection of channel sensing order in cognitive radio,” Wireless Communications, IEEE Transactions on, vol. 8, no. 1, pp. 297 –307, Jan. 2009. [10] H. S. Wang and N. Moayeri, “Finite-state markov channel-a useful model for radio communication channels,” Vehicular Technology, IEEE Transactions on, vol. 44, no. 1, pp. 163 –171, Feb. 1995. [11] A. Min and K. 
Shin, “Exploiting multi-channel diversity in spectrumagile networks,” in INFOCOM 2008. The 27th Conference on Computer Communications. IEEE, 13-18 2008, pp. 1921 –1929. [12] Y.-C. Liang, Y. Zeng, E. Peh, and A. T. Hoang, “Sensing-throughput tradeoff for cognitive radio networks,” Wireless Communications, IEEE Transactions on, vol. 7, no. 4, pp. 1326 –1337, april 2008. [13] J. M. Porta, N. Vlassis, M. T. Spaan, and P. Poupart, “Point-based value iteration for continuous pomdps,” J. Mach. Learn. Res., vol. 7, pp. 2329– 2367, 2006.