
2017 IEEE Third International Conference on Multimedia Big Data

Mining Urban WiFi QoS Factors: A Data Driven Approach

Weihao Zhou, Zhi Wang, and Wenwu Zhu


Graduate School at Shenzhen, Tsinghua University
Tsinghua National Laboratory for Information Science and Technology
zwh14@mails.tsinghua.edu.cn, wangzhi@sz.tsinghua.edu.cn, wwzhu@mail.tsinghua.edu.cn

978-1-5090-6549-3/17 $31.00 © 2017 IEEE
DOI 10.1109/BigMM.2017.12

Abstract: WiFi networks play a significant role in providing today's wireless connectivity; therefore, understanding and improving WiFi network performance is important for today's mobile applications and services. Previous studies conducted to investigate WiFi network performance have generally been performed using specific types of WiFi networks in relatively small areas and have been limited by either the scale of the studied WiFi access points (APs) or by the number of users. This paper describes a country-wide measurement study on WiFi network performance with the goal of determining which factors affect the quality of service (QoS) of high-density WiFi networks. We use a crowdsourced approach to study both the latency and bandwidth experienced by users in different types of WiFi networks. Our findings indicate that (1) WiFi network performance is correlated not only with signal strength but also with factors such as time; (2) the latency and bandwidth experienced by users generally exhibit different patterns and are affected differently by the studied factors; and (3) users experience significantly different QoS from different nearby APs, which suggests that providing users with better information when choosing a WiFi connection can yield more satisfactory results. We also provide insights on how such findings can be utilized to improve AP deployments and optimize association strategies.

Keywords-WiFi; QoS; Data Mining; Latency; Bandwidth

I. INTRODUCTION

Smart mobile devices, such as smartphones and tablets, have become increasingly popular among users. These devices allow users to enjoy the mobile Internet more often than before. WiFi networks have become one of the most popular choices for users to surf the Internet [1] while on the move. Currently, WiFi networks can be found everywhere, from retail stores to hotels, and are used to deliver large amounts of multimedia traffic such as video/audio streaming, online gaming, and so forth. These uses indicate that WiFi networks play a significant role in providing today's wireless connectivity. When many WiFi networks are available in a city, studying network performance and determining which factors affect the WiFi quality of service (QoS) are important problems. Solving these problems will provide us with opportunities to better understand and improve network performance and to recommend better WiFi choices for users so they can achieve better QoS.

In previous studies, network performance measurements and the determination of which factors influence WiFi QoS have generally been performed using specific types of WiFi networks in relatively small areas. Chen et al. [2] measured the network performance of a university campus WiFi network and identified the dominant factors that affect network performance. Divgi et al. [3] focused on commercial WiFi networks in Australia and presented some user activity characteristics. Ghosh et al. [4] studied user behaviors when they were associated with public hotspots using AT&T public datasets. Patro et al. [5] measured the user experience using OpenWrt-based access points. Farshad et al. [6] measured the density of access points and the interference among neighboring access points in a city. Some other large-scale studies have been performed, but, in these studies, the WiFi networks were typically controllable and operational. Biswas et al. [7] monitored large-scale wireless links using customized access points that were controllable with dedicated radio hardware (e.g., Meraki MR18). Sui et al. [8] deployed the lightweight WiFiSeer framework into large-scale operational WiFi networks on a campus to characterize and improve WiFi latency.

[Figure 1: CDF plot. X-axis: Number of Hotspots in Each Cell (1 to 300); Y-axis: CDF; curves: Beijing, Shenzhen, Shanghai, Guangzhou.]

Figure 1. The Density of WiFi Networks in Four Cities in China. Using collected WiFi datasets, we divide the areas of these four cities into 100 x 100-meter cells and count the number of WiFi hotspots in each cell to calculate the density and draw the cumulative distribution function (CDF) curves. Nearly 50 percent of the cells in each city have more than 5 WiFi hotspots, indicating that the density of WiFi networks in cities is high.

However, compared with previous studies, studying network performance and mining the factors that affect WiFi QoS on a large scale in cities remains challenging. There are multiple reasons for this challenge. First, collecting large-scale WiFi datasets in metropolitan areas is difficult because i) WiFi networks are composed of many different types of WiFi networks, including private, public, residential and business networks; ii) WiFi networks constructed by different individuals/organizations for different purposes are not controllable or even always operational; and iii) modifying AP hardware or deploying a specific framework such as WiFiSeer [8] into APs on a large scale is very expensive. Second, WiFi networks are used not only by human users but also by devices such as Roku and SlingTV, computers running automated processes, sensors and other IoT-connected devices. Such attached devices can easily consume enough bandwidth to affect WiFi network performance for human users. Therefore, this mining study requires large numbers of users and devices to contribute to the measurement effort.

To overcome these challenges, we develop a crowdsourced approach to collect WiFi datasets and use machine learning methods (i.e., decision tree) to study both the latency and bandwidth experienced by users in cities. The crowdsourced approach, which is explained in detail in Section II, provides us with the opportunity to collect WiFi datasets easily in the city. The machine learning methods (explained in Section III) allow us to mine the factors that affect urban WiFi QoS.

In total, the WiFi datasets listed in Table I contain data for 10,931,982 unique users, 17,489,226 unique hotspots and 49,854,136 sessions collected over 30 days in 2015, which enable a panoramic view of city WiFi networks on a large scale. To study the factors that influence high-density WiFi QoS in cities, we first select 6 core factors from the WiFi datasets: signal strength, signal-to-noise ratio (SNR), Internet service provider (ISP), number of connections, ping speed, and time (Table III). Then, we use statistical approaches and feature selection methods to estimate the influences by calculating the correlations between each factor and QoS metrics such as latency and bandwidth. The statistical approaches include Pearson correlation coefficients and Kendall rank correlation coefficients. Feature selection methods such as information gain and information gain ratio are used to mine the data.

Using these WiFi datasets and mining methods, we find that (1) signal strength does not have a strong influence on WiFi network performance, whereas the time, SNR and ping speed factors are more strongly correlated with it; (2) the latency and bandwidth experienced by different users on the same WiFi are not correlated, indicating that the good latency/bandwidth enjoyed by users of the same access point is, to some extent, experienced randomly; and (3) users experience significantly different QoS from different nearby access points, indicating that WiFi selection is critical.

Based on our measurements and data mining results, choosing the access point with the highest signal strength among multiple nearby access points does not always guarantee the best QoS. Moreover, the factors that affect WiFi selection also vary depending on different demands, such as low latency or high bandwidth. When users need low latency, they should place more emphasis on the ping speed factor, and when they need high bandwidth, they should emphasize the time factor. Furthermore, we also provide insights on how such findings can be utilized to improve access point deployments and optimize association strategies.

The rest of this paper is organized as follows. In Section II, we provide details about the architecture of the crowdsourced WiFi association system and the preprocessing of the WiFi datasets. The WiFi data mining framework is presented in Section III. The mining experiments of latency and bandwidth are discussed in Section IV. We show related works in Section V and draw conclusions in Section VI.

II. WIFI DATASETS

In this section, we introduce and discuss the WiFi datasets collected by the crowdsourced WiFi network association system. The crowdsourced approach makes it easy to collect datasets from WiFi networks in the city that are not controllable or always operational. The large scale of these WiFi datasets helps overcome the limitations of WiFi mining.

A. System Architecture

[Figure 2: architecture diagram. Labeled components: coordinative association scheduler, association scheduler, WiFi network resource, proactive QoS monitor, user behaviour monitor, QoS and behaviours report, QoS testing, security hotspot filter, user, testing servers, edge network, crowdsourced WiFi network resource pool.]

Figure 2. The architecture of the coordinative WiFi association system: crowdsourced WiFi network information is maintained, and WiFi hotspots are suggested to users based on centralized information.

In this subsection, we illustrate the architecture of the crowdsourced WiFi network monitor system and the data report mechanism. Figure 2 shows the architecture of the coordinated WiFi association system, which consists of four interactive components. Playing the key role in the crowdsourcing-based system, users' mobile devices sense the edge network environment, record the WiFi session processes and QoS metrics and report this measured information to a crowdsourced WiFi network resource pool in the central server. Testing servers used to assist in QoS testing consist of some dedicated content delivery network (CDN) servers, through which most multimedia services are currently delivered. The central server uses information maintained in the resource pool to help us understand network performance and guide hotspot associations.
To deploy this crowdsourced WiFi association system, we collaborate with our industrial partner, the Tencent WiFi team, which runs a popular app installed on smart mobile devices. By combining the crowdsourced approach with this app, we can collect users' hotspot associations and WiFi network environment whenever the app is activated by users. Note that users have allowed the actions of this app and that we strictly protect user privacy. Specifically, the app measures the following information: (1) latency, such as the connection time cost, which is the time from sending an association request to receiving the response; and (2) bandwidth, including both download and upload speeds, which is measured by downloading/uploading a 5 Mb file from/to the testing CDN servers.

B. WiFi Datasets

1) The Whole Datasets: The WiFi datasets were collected from 4 representative cities (i.e., Beijing, Shanghai, Shenzhen and Guangzhou) in China and are composed of two parts: latency data and bandwidth data, as shown in Table I. The latency data records are from Nov. 27 to Dec. 10, 2015, and the bandwidth data records are from Nov. 27 to Dec. 27, 2015. These WiFi datasets have as many as ten million records in each class (i.e., users and hotspots), making it possible to study the factors that influence WiFi QoS on a large scale. This information also indirectly indicates that WiFi network density in Chinese cities is high.

Table I
THE INTRODUCTION OF WIFI DATASETS

Items       Users        Hotspots     Sessions     Days
Latency     6,812,933    12,812,420   38,444,958   14
Bandwidth   4,119,049    4,676,806    11,409,178   30
Total       10,931,982   17,489,226   49,854,136   30*
* All days range from Nov. 27 to Dec. 27, 2015.

2) WiFi Coverage: We investigate how representative cities are covered by WiFi networks. Based on the locations of the WiFi hotspots, we calculate the area in these cities covered by WiFi networks. We assume that a WiFi hotspot can cover a circular area with a radius of 30 meters. Table II lists the WiFi-covered areas in different cities, both in km2 (i.e., the size of the area that is covered by WiFi networks) and as a coverage fraction (i.e., the WiFi-covered area as a fraction of the entire city area). This observation suggests that cities with higher population densities usually have higher rates of WiFi availability.

Table II
WIFI COVERAGE IN REPRESENTATIVE CITIES IN CHINA
THE SIZE OF EACH CITY IS UPDATED IN 2015.

City        WiFi covered area (km2)   Coverage fraction
Beijing     1357.47                   0.083
Guangzhou   950.04                    0.128
Shanghai    1396.40                   0.220
Shenzhen    535.19                    0.274

3) Factors: By analyzing the WiFi datasets, we select 6 core factors from the data items to mine the influences: signal strength, SNR, ISP, number of connections, ping speed, and time (Table III). In Table III, the "Used" column indicates whether these factors are used in latency or bandwidth mining; an "L" indicates latency and a "B" indicates bandwidth. Four of these factors (i.e., SNR, ISP, number of connections and ping speed) are used separately in latency mining and bandwidth mining because some (i.e., number of connections and ping speed) are recorded in the latency dataset and the others are recorded in the bandwidth dataset.

Table III
THE FACTORS OF WIFI DATASETS FOR MINING

ID   Factors                           Value Ranges        Used
1    Signal Strength                   0,1,2,3,4           L B
2    Signal-to-noise Ratio (SNR)       0,1,2,3,4,5,6,7,8   B
3    Internet Service Provider (ISP)   8 classes*          B
4    Number of Connections             0 to 255            L
5    Ping Speed                        0ms                 L
6    Time                              24 Hours            L B
* Includes 46000,46001,46002,46003,46007,46011,20404,45412.

4) QoS: To study the factors that influence WiFi network QoS, we choose 2 classic metrics, latency and bandwidth, to measure the WiFi network QoS. To measure latency, we select the latency factor, which includes the time cost of pinging from users' smart mobile devices to CDN servers. For bandwidth, download and upload speeds are chosen, which provide information about the transmission speed between users' smart mobile devices and CDN servers.

C. Preprocessing

Continuous values are not appropriate for data mining algorithms such as decision tree. Therefore, we first discretize the value ranges of the latency and download speed. As illustrated in Figure 6(a), we find that upload speed and download speed have nearly the same distribution; thus, we choose download speed as a representative case.

Table IV
DISCRETIZATIONS OF LATENCY AND BANDWIDTH

ID   Classes        Latency                 Bandwidth
0    Ex Fast/Low    < 50 ms (16.7%)         < 40 Kbps (8.3%)
1    Fast/Low       50 to 130 ms (24.9%)    40 to 500 Kbps (32.7%)
2    Slow/High      130 to 300 ms (28.3%)   0.5 to 1.6 Mbps (36.6%)
3    Ex Slow/High   > 300 ms (30.1%)        > 1.6 Mbps (22.4%)

To perform discretization, we draw the CDF curves of latency and download speed as shown in Figure 3. After comparing equal-frequency discretization and equal-width discretization, we decide to discretize these value ranges based on real-life demands instead. As shown in Table IV, the
value ranges for latency and download speed are both split into four classes. For latency, we adopt a threshold value of 50 ms to represent extremely fast performance ("Ex Fast"); this performance level allows users to surf the Internet more satisfactorily. For download speed, we find that approximately 22.4 percent of bandwidth records exceed 1.6 Mbps; this speed allows files to be downloaded easily and is represented by "Ex High". Using this discretization approach, network performance can be suitably characterized.

[Figure 3: two CDF plots. (a) Latency Distribution: X-axis Latency (ms), Y-axis CDF, regions Ex Fast / Fast / Slow / Ex Slow. (b) Bandwidth Distribution: X-axis Download Speed (bps), Y-axis CDF, regions Ex Low / Low / High / Ex High.]

Figure 3. From these distribution curves, we divide the value ranges into four classes: Ex Slow/Slow/Fast/Ex Fast for latency and Ex Low/Low/High/Ex High for download speed based on real-life considerations rather than equal-frequency discretization or equal-width discretization. (Note: "Ex" denotes "Extremely".)

III. WIFI DATA MINING FRAMEWORK

This section provides a comprehensive introduction to the WiFi data mining framework, which includes three parts: mining methods, influential factors and WiFi QoS metrics.

A. Mining Methods

The mining targets consist of two parts: (1) calculating the correlations and (2) mining the rules to instruct users how to select better WiFi to achieve a better QoS.

The statistical approaches used include Pearson correlation coefficients (PCC) and Kendall rank correlation coefficients (KRCC), and the feature selection methods consist of information gain (IG) and information gain ratio (IGR).

The PCC and KRCC are generally used to study the linear correlations between each factor and QoS metrics such as latency and bandwidth (strictly speaking, the KRCC measures rank, i.e., monotonic, association). Linear correlations are easy to understand and provide a direct view of the influences. However, correlations are often non-linear, and these statistical approaches are not able to reveal influences from non-linear correlations. Instead, we use IG/IGR methods to reveal the non-linear correlations. The IG indicates a change in the amount of system information. The larger the IG value is, the more important a factor is for the entire system. Although IG is good for studying non-linear correlations, to some extent, the IGR is even better because it also indicates the speed of the change in system information.

To research the rules, we focus on the circumstances that lead to bandwidth and latency that are "Ex Slow" or "Ex High". These targets indicate that it is a classification problem. We must be careful when choosing machine learning algorithms to mine the rules: the selected algorithms should be able to not only tackle the complex relationships and interdependencies but also explain the findings simply and clearly. We conducted some pilot experiments using 10-fold cross-validation on mainstream multi-classification machine learning algorithms, including decision tree, random forests, extra trees, AdaBoost, linear discriminant analysis (LDA) and k-nearest neighbor (KNN). From the accuracies shown in Table V, we find that the decision tree algorithm has the best accuracy in the latency experiment and the fourth best in the bandwidth experiment. By comprehensively considering the accuracy and simplicity of these machine learning algorithms, we decide to use the decision tree algorithm to study the rules based on these selected factors.

Table V
MAINSTREAM MULTI-CLASSIFICATION METHOD ACCURACIES

Methods         Latency      Bandwidth
Decision Tree   0.39658362   0.431763571
Extra Trees     0.31634051   0.438247991
AdaBoost        0.39581812   0.432840988
Random Forest   0.39617334   0.432116284
LDA             0.39602068   0.425746223
KNN             0.29447991   0.338264691

B. Influential Factors

Based on Tencent WiFi datasets, we select six core factors to mine influences on WiFi QoS.

• Signal Strength. The most obvious factor in WiFi networks is the signal strength. In the WiFi datasets, the signal strength value range has five levels: 0, 1, 2, 3, and 4. A larger value indicates a stronger signal; however, whether a stronger signal affects the QoS of WiFi networks is worth researching.
• SNR. SNR measures the signal power relative to the noise that causes interference, so it reflects the usable signal.
• ISP. WiFi networks are constructed by ISPs, and different ISPs offer different services. We need to study how much influence ISPs have on WiFi QoS.
• Number of Connections. Each individual access point may serve as the connection point for many different smart mobile devices. Smart mobile devices connecting to the same access point may compete for network resources. Mining the influence that this factor has on WiFi QoS is necessary.
• Ping Speed. This factor is the speed with which users' smart mobile devices can ping CDN servers. The connection time cost equals the length of the network path from users' smart mobile devices to the CDN server divided by the ping speed.
• Time. The time factor indicates the hour of the day (24 hours in an entire day).
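The IG and IGR measures introduced in Section III-A can be illustrated with a minimal sketch. The paper does not spell out its formulas, so this assumes the standard definitions, IG = H(Y) - H(Y|X) and IGR = IG / H(X), over discretized values; the function names and the toy records are hypothetical and are not taken from the datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(factor, target):
    """IG = H(target) - H(target | factor) for two equal-length sequences."""
    total = len(target)
    groups = {}
    for f, t in zip(factor, target):
        groups.setdefault(f, []).append(t)
    # Weighted entropy of the target inside each factor value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def information_gain_ratio(factor, target):
    """IGR = IG / H(factor); defined as 0.0 when the factor never varies."""
    split_info = entropy(factor)
    if split_info == 0:
        return 0.0
    return information_gain(factor, target) / split_info


# Toy records (made up for illustration): a discretized factor and a QoS class.
signal_level = [0, 0, 1, 1]
qos_class = ["slow", "slow", "fast", "fast"]
ig = information_gain(signal_level, qos_class)
igr = information_gain_ratio(signal_level, qos_class)
```

In this toy case the factor fully determines the class, so both measures reach their maximum; a factor that is independent of the class would give an IG of zero.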
[Figure 4: eight line plots, one value per day. Latency panels, X-axis Date (Nov. 27 to Dec. 10, 2015), curves for Time, Signal Strength, Ping Speed and Number of Connections: (a) Information Gain of Latency, (b) Ratio of Information Gain for Latency, (c) Pearson Coefficient of Latency, (d) Kendall Coefficient of Latency. Bandwidth panels, X-axis Date (Nov. 27 to Dec. 27, 2015), curves for Time, ISP, SNR and Signal Strength: (e) Information Gain of Bandwidth, (f) Ratio of Information Gain for Bandwidth, (g) Pearson Coefficient of Bandwidth, (h) Kendall Coefficient of Bandwidth.]

Figure 4. The influences of six core factors on WiFi latency and bandwidth. We use statistical approaches and feature selection methods to calculate these influences. Figures (a), (b), (c), and (d) show the latency results, while (e), (f), (g), and (h) show the bandwidth results. The X-axis in all the images represents the date.

C. Metrics of WiFi QoS

In this part, we present two traditional metrics, latency and bandwidth, to measure WiFi QoS.

1) Latency: WiFi latency [8] is a critical performance measure for modern real-time interactive mobile Internet applications, including online games, instant messaging and live collaborative applications. As listed in Table III, we research how much influence four factors, namely, signal strength, number of connections, ping speed and time, have on WiFi latency.

2) Bandwidth: Bandwidth is another core metric for measuring WiFi QoS. For users streaming video or audio, activities that consume large amounts of data, bandwidth is more important than latency. In this paper, we select two metrics (download speed and upload speed) to represent bandwidth.

IV. WIFI DATA MINING EXPERIMENTS

In this section, we use IG/IGR feature selection methods, PCC/KRCC statistical approaches and decision tree to perform mining experiments that investigate the effects of the selected factors on latency and bandwidth, and we present the results.

A. Latency

Using the latency dataset, this subsection studies the influences of four factors on WiFi latency. As illustrated in Figure 4(a) and Figure 4(b), in both measurements, the four factors influence latency in the following order of importance: ping speed > signal strength, time > number of connections. Ping speed has the largest degree of influence on latency, providing good evidence for the assertion above. The degrees of influence of signal strength and time are nearly the same, whereas the influence of the number of connections is nearly insignificant. The time cost equals the length of the path divided by the speed. When the numbers of network hops from users' smart mobile devices that are connected to the same access point to the same CDN server are nearly the same, latency is highly correlated with the ping speed.

From Figure 4(c) and Figure 4(d), we observe that the Pearson and Kendall correlations between the four factors and latency are relatively small. As Figure 4(c) shows, the Pearson correlations are very small, below 0.01, which indicates almost no correlation, while Figure 4(d) shows that the Kendall correlations are also small. The Kendall correlations between latency and signal strength on some days are greater than 0.25, but overall, these values do not indicate a strong positive correlation.

From the above analyses, the latency mining can be summarized as follows: the PCC and KRCC tests show that these factors do not have obvious linear correlations with WiFi latency, whereas the IG/IGR research results indicate that ping speed has a greater influence on WiFi latency than the other three factors.

B. Bandwidth

Bandwidth is another core metric for measuring WiFi QoS. As illustrated in Figure 4(e) and Figure 4(f), the study results show that the four factors influence bandwidth in the following order: time > SNR > signal strength > ISP. Time has the highest correlation with bandwidth because the same access point provides only a certain amount of total bandwidth, and people use WiFi networks regardless of their current behavior (e.g., working, shopping or resting). The reason why the SNR factor has more influence than
the signal strength factor is that the SNR reflects the signal strength relative to the noise that causes bandwidth losses. Although WiFi networks are constructed by different ISPs, the ISP factor has little influence on bandwidth because the total bandwidth offered by the ISP is far greater than the bandwidth available through a WiFi access point; in other words, the bottleneck is the AP, not the ISP. However, the total traffic on a given ISP trunk line might explain the slight influence that ISP has in this study.

In Figure 4(g), the mining result from the PCC statistical approach shows that the Pearson correlation values are very small, indicating that the four factors have no obvious linear correlations with WiFi bandwidth. In Figure 4(h), the KRCC results show that the correlation of bandwidth with signal strength is nearly 0.3, which is larger than the correlation of the other factors, indicating that signal strength has a slight correlation with WiFi bandwidth.

In summary, the bandwidth mining results from PCC show that these factors have no linear correlations with bandwidth, while the KRCC results show that signal strength has a small correlation with bandwidth. The IG/IGR research results indicate that time has the largest influence on WiFi bandwidth while SNR has the second largest, and both have more influence than the other factors.

C. Signal Strength

We use the signal strength reported by the users through the WiFi connection session traces and associate it with the reported QoS, which includes users' perceptions of latency and download speed. To study the correlation between signal strength and the QoS metrics, we plot the CDFs of the average latencies experienced by users at different signal strength levels in Figure 5(a) and the CDFs of the average download speeds experienced by users at different signal strength levels in Figure 5(b). The impact of signal strength on bandwidth is obvious (i.e., a higher signal level leads to a higher download speed), indicating that the bottleneck is usually the last hop, i.e., the wireless hop. Thus, it is important to improve the deployment of WiFi networks to reduce interference. In contrast, the impact of signal strength on the latency experienced by users is less obvious. Compared with download speed, the latency caused in a network has a significant impact on QoS [9].

[Figure 5: two CDF plots with one curve per signal level (signal-level=0 to signal-level=4). (a) The Influence on Latency: X-axis Latency (ms), Y-axis CDF. (b) The Influence on Bandwidth: X-axis Download Speed (Kbps), Y-axis CDF.]

Figure 5. The Influence of Signal Strength on Latency and Bandwidth.

D. Combining Latency with Bandwidth

The factors shown in Table III could be used in both latency and bandwidth studies by using the unique BSSID of the same hotspot to combine the WiFi datasets. Based on this idea, we research the correlation between latency and bandwidth at the same access point and obtain the results shown in Figure 6(b). As shown in Figure 6(b), the latency and bandwidth of the same hotspot are nearly independent; in other words, users enjoying good latency/bandwidth are doing so randomly to some extent.

[Figure 6: two plots. (a) Distribution of WiFi bandwidth: X-axis Download/Upload Speeds (bps), Y-axis CDF, curves for Download Speed and Upload Speed. (b) Correlation between Latency and Bandwidth: X-axis # of AP, Y-axes Download Speed (bps) and Ping Latency (ms).]

Figure 6. Mining the correlation between latency and bandwidth combined: (a) the distributions of WiFi download speed and upload speed are nearly the same; therefore, we choose download speed to represent bandwidth. We use the same unique BSSID to select the top 72 APs that appear in both the latency and bandwidth datasets to draw the curves in Figure (b).

E. Decision Tree Method

As shown in Table V, we chose the decision tree method (Figure 7) to model the effects of six factors on WiFi QoS.

In Figure 7(a), the decision tree method achieves an accuracy of 0.4318 in classifying the bandwidth based on four factors. We know that signal strength has an obvious impact on bandwidth. However, from Figure 7(a), we can see the impact of other factors as well: (1) The smaller the signal strength is, the lower the bandwidth is, as shown in the left-bottom and right-bottom branches; (2) Time is associated with user behavior. Here, the value 63 means 21:00 hours. As the figure shows, the time mainly affects the bandwidth around 21:00 (i.e., after 21:00 hours, a signal strength of <= 3.5 results in Low bandwidth; otherwise it results in High bandwidth); (3) As indicated in the footnote for Table III, the "2" ISP indicates the "46002" class. ISP 3 offered Ex High bandwidth when the signal strength was > 2.5 and before approximately 21:00; and (4) The SNR factor provides a chance to diagnose the Ex Low and Low classes shown in the right-bottom branches. These factors affect bandwidth jointly, and we should take all of them into consideration when diagnosing the bandwidth.

In Figure 7(b), the decision tree approach achieves an accuracy of 0.3966. By analyzing Figure 7(b), we can find the following. (1) Ping speed is the deciding factor in many branches, indicating that it is important for latency. (2) In most cases, when the ping speed is large, the latency
[Figure 7: two decision trees. (a) Bandwidth: internal nodes Signal Strength, ISP, Time and SNR; leaf classes Ex Low, Low, High and Ex High. (b) Latency: internal nodes Ping Speed, Signal Strength and Number of Connections; leaf classes Ex Fast, Fast, Slow and Ex Slow.]

Figure 7. Decision tree results from modeling the effects of six factors on WiFi QoS. The colored nodes indicate classifications. Figure (a) shows the factors time, SNR, ISP and signal strength influencing bandwidth. Figure (b) shows the factors ping speed, signal strength and number of connections affecting latency.
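The leaf classes in Figure 7 are the discretized classes of Table IV. The following minimal sketch maps raw measurements to those classes using the thresholds from Table IV. How values that fall exactly on a threshold are handled is not stated in the source; the sketch assigns them to the lower class, and it treats 1 Kbps as 1000 bps. Both choices, and all names, are assumptions made for illustration.

```python
from bisect import bisect_left

# Thresholds taken from Table IV; boundary handling is an assumption.
LATENCY_EDGES_MS = [50, 130, 300]
LATENCY_CLASSES = ["Ex Fast", "Fast", "Slow", "Ex Slow"]

BANDWIDTH_EDGES_BPS = [40_000, 500_000, 1_600_000]
BANDWIDTH_CLASSES = ["Ex Low", "Low", "High", "Ex High"]


def latency_class(latency_ms):
    """Return the latency class for a latency measurement in milliseconds."""
    return LATENCY_CLASSES[bisect_left(LATENCY_EDGES_MS, latency_ms)]


def bandwidth_class(speed_bps):
    """Return the bandwidth class for a download speed in bits per second."""
    return BANDWIDTH_CLASSES[bisect_left(BANDWIDTH_EDGES_BPS, speed_bps)]


# Hypothetical measurements, not taken from the datasets.
labels = [latency_class(ms) for ms in (30, 100, 200, 1000)]
```

Using `bisect_left` keeps the thresholds in one place, so changing the class boundaries only requires editing the two edge lists.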

is Slow or Ex Slow. (3) However, in the right-bottom branch, when the ping speed is <= 1271.5 ms (the prerequisite is that the signal strength must be > 2.5 and the ping speed > 168.5 ms), the latency is classified into the Ex Fast class. (4) Finally, in the left-bottom branch, the number of connections plays an important role in determining the latency class: when the number of connections is <= 29, the latency belongs to the Ex Fast class; otherwise, it belongs to the Ex Slow class. These results indicate that smart mobile devices connecting to the same access point will compete for bandwidth resources. From the above findings, we know that the influences of the various factors are quite complex; the same factor will contribute to different results under different conditions.
Based on these study results, we provide some suggestions to improve AP deployments and optimize association strategies. When selecting a WiFi network to achieve high bandwidth, we recommend giving more weight to the time and SNR factors. The influence of SNR on WiFi QoS is larger than the influence of the signal strength. This knowledge provides opportunities to improve access point deployment to decrease interference among high-density WiFi networks in cities. When selecting a WiFi network to achieve low latency, we recommend connecting to the WiFi network that has the fastest real-time ping speed and the smallest number of connections to achieve better QoS.

V. RELATED WORKS
A. Usage and Measurement of WiFi Networks
802.11-based WiFi networks have emerged as an attractive solution that can provide network connectivity in places where individuals spend considerable amounts of time. Several studies have investigated WiFi network usage. Afanasyev et al. [10] investigated the role that city-wide WiFi deployments play in the increasingly diverse access network spectrum and observed that a diverse set of mobility patterns map well to the archetypal use cases for traditional access technologies. Efstathiou et al. [11] proposed a decentralized approach for WiFi sharing, in which private WiFi networks that are in range but belong to others would provide Internet access to mobile users. Soroush et al. [12] studied how mobile users utilize dense deployments of WiFi APs for concurrent WiFi connections. In particular, the practical issues of access-point discovery and DHCP lease acquisition using a single wireless channel were investigated through a prototype. The access point association decision is generally made by the client locally and selfishly. This behavior has been analyzed through game theory in different network settings [13]. Biswas et al. [14] studied wireless network behaviors using traces collected from a cloud-based network management system.

B. WiFi Network Performance and Improvement
Some efforts have been devoted to WiFi network performance. Gupta et al. [15] studied the factors responsible for the poor performance of dense WiFi networks and found that traffic asymmetry is a major factor in performance degradation in such environments. The limited number of orthogonal channels in 802.11 wireless networks results in overlapped channels among multiple access points, a situation known as co-channel access points [16] [17]. These access points inevitably suffer from higher interference, more collisions and, consequently, sub-optimal throughput. Sundaresan et al. [18] observed that Internet access links can significantly affect the performance users achieve because different ISPs use different policies and traffic-shaping strategies, and there is no best ISP for all users.

C. AP selection
There are many approaches to AP selection. Currently, most devices only utilize a subset of the available information to make choices based on simple assumptions concerning wireless performance. The traditional preference is to connect to access points with stronger RSSI, and this is a common approach [19], but stronger signal strength does not ensure better performance [20]. Some implementations utilize historical or actively measured client-side information in addition to RSSI [21]. However, it is time-consuming and bothersome for each user device to test each nearby AP.
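
To make the contrast between RSSI-based selection and the recommendations of Section IV concrete, the following is a minimal sketch, not the method used in this study. It assumes that each candidate AP can be described by a few client-side measurements, and it treats "ping speed" as a ping time in milliseconds where lower is better. The class, function names, field names, and sample values are all hypothetical, and the time-of-day factor is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One nearby AP as seen by a client; every field is a hypothetical input."""
    ssid: str
    rssi_dbm: float    # received signal strength, higher is stronger
    snr_db: float      # signal-to-noise ratio, higher is better
    ping_ms: float     # most recent ping time, lower is better (assumed unit)
    connections: int   # devices currently associated with the AP


def pick_by_rssi(candidates):
    """Traditional baseline: choose the AP with the strongest signal."""
    return max(candidates, key=lambda c: c.rssi_dbm)


def pick_for_latency(candidates):
    """Prefer the lowest ping time, then the fewest connected devices."""
    return min(candidates, key=lambda c: (c.ping_ms, c.connections))


def pick_for_bandwidth(candidates):
    """Prefer the highest SNR, using RSSI only to break ties."""
    return max(candidates, key=lambda c: (c.snr_db, c.rssi_dbm))


# Illustrative values only; they are not taken from the measured datasets.
candidates = [
    Candidate("cafe", rssi_dbm=-45.0, snr_db=18.0, ping_ms=220.0, connections=41),
    Candidate("mall", rssi_dbm=-60.0, snr_db=27.0, ping_ms=35.0, connections=12),
    Candidate("metro", rssi_dbm=-70.0, snr_db=22.0, ping_ms=35.0, connections=30),
]

print(pick_by_rssi(candidates).ssid)
print(pick_for_latency(candidates).ssid)
print(pick_for_bandwidth(candidates).ssid)
```

In this made-up example the strongest-signal AP is neither the lowest-latency nor the highest-SNR choice, which is the situation the recommendations are meant to address. A lexicographic ordering is only one possible way to combine the factors; a weighted score or the learned decision tree could be substituted.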

VI. CONCLUSION
In this paper, we focus on mining the factors that influence the QoS of urban high-density WiFi networks. We develop a crowdsourced approach, combined with a popular app installed on smart mobile devices, to collect WiFi datasets easily in four representative cities in China. The large scale of these WiFi datasets helps overcome the limitations of WiFi mining. To the best of our knowledge, we are the first to conduct a country-level measurement study on WiFi network performance and to investigate which factors affect urban high-density WiFi latency and bandwidth. First, to understand WiFi network performance, we select six core factors that determine the QoS metrics and use feature selection methods and statistical approaches to study the correlations. Second, we use the decision tree method to model rules based on these factors, which can guide improvements to AP deployments and association strategies. Finally, for users who need to achieve low latency and high bandwidth simultaneously, our mining results show that these two targets cannot be simultaneously guaranteed; we will study this problem more deeply in the future.

ACKNOWLEDGMENT
We would like to thank our industrial partner, the Tencent WiFi team, for providing the WiFi datasets. This work is supported in part by funding from the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.

REFERENCES
[1] P. S. Henry and H. Luo, "Wifi: what's next?" IEEE Communications Magazine, vol. 40, no. 12, pp. 66-72, Dec 2002.
[2] X. Chen, R. Jin, K. Suh, B. Wang, and W. Wei, "Network performance of smart mobile handhelds in a university campus wifi network," in Proceedings of the 2012 ACM Conference on Internet Measurement Conference, ser. IMC '12.
[3] G. Divgi and E. Chlebus, "Characterization of user activity and traffic in a commercial nationwide wi-fi hotspot network: global and individual metrics," Wireless Networks, vol. 19, no. 7, pp. 1783-1805, 2013.
[4] A. Ghosh, R. Jana, V. Ramaswami, J. Rowland, and N. K. Shankaranarayanan, "Modeling and characterization of large-scale wi-fi traffic in public hot-spots," in INFOCOM, 2011 Proceedings IEEE, April 2011, pp. 2921-2929.
[5] A. Patro, S. Govindan, and S. Banerjee, "Observing home wireless experience through wifi aps," in Proceedings of the 19th Annual International Conference on Mobile Computing & Networking, ser. MobiCom '13. ACM, 2013.
[6] A. Farshad, M. K. Marina, and F. Garcia, "Urban wifi characterization via mobile crowdsensing," in 2014 IEEE Network Operations and Management Symposium (NOMS), May 2014.
[7] S. Biswas, J. Bicket, E. Wong, R. Musaloiu-E, A. Bhartia, and D. Aguayo, "Large-scale measurements of wireless network behavior," SIGCOMM Comput. Commun. Rev., Aug. 2015.
[8] K. Sui, M. Zhou, D. Liu, M. Ma, D. Pei, Y. Zhao, Z. Li, and T. Moscibroda, "Characterizing and improving wifi latency in large-scale operational networks," in Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, ser. MobiSys '16, 2016.
[9] C. Ly, C.-H. Hsu, and M. Hefeeda, "Improving Online Gaming Quality using Detour Paths," in ACM International Conference on Multimedia (Multimedia), 2010.
[10] M. Afanasyev, T. Chen, G. M. Voelker, and A. C. Snoeren, "Usage patterns in an urban wifi network," Networking, IEEE/ACM Transactions on, vol. 18, no. 5, 2010.
[11] E. C. Efstathiou, P. A. Frangoudis, and G. C. Polyzos, "Controlled wi-fi sharing in cities: A decentralized approach relying on indirect reciprocity," Mobile Computing, IEEE Transactions on, vol. 9, no. 8, pp. 1147-1160, 2010.
[12] H. Soroush, P. Gilbert, N. Banerjee, B. N. Levine, M. Corner, and L. Cox, "Concurrent wi-fi for mobile users: analysis and measurements," in Proceedings of the Seventh COnference on emerging Networking EXperiments and Technologies. ACM.
[13] W. Xu, C. Hua, and A. Huang, "Channel assignment and user association game in dense 802.11 wireless networks," in IEEE International Conference on Communications, 2011, pp. 1-5.
[14] S. Biswas, J. Bicket, E. Wong, R. Musaloiu-E, A. Bhartia, and D. Aguayo, "Large-scale measurements of wireless network behavior," in Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. ACM.
[15] A. Gupta, J. Min, and I. Rhee, "Wifox: Scaling wifi performance for large audience environments," in Proceedings of the 8th international conference on Emerging networking experiments and technologies. ACM, 2012, pp. 217-228.
[16] W. Gosling, "A simple mathematical model of co-channel and adjacent channel interference in land mobile radio," IEEE Transactions on Vehicular Technology, vol. 29, 1980.
[17] A. Baid, M. Schapira, I. Seskar, J. Rexford, and D. Raychaudhuri, "Network cooperation for client-AP association optimization," in International Workshop on Resource Allocation and Cooperation in Wireless Networks, 2012.
[18] S. Sundaresan, W. De Donato, N. Feamster, R. Teixeira, S. Crawford, and A. Pescape, "Broadband internet performance: a view from the gateway," in ACM SIGCOMM computer communication review, vol. 41, no. 4. ACM, 2011.
[19] F. Xu, C. C. Tan, Q. Li, G. Yan, and J. Wu, "Designing a practical access point association protocol," in 2010 Proceedings IEEE INFOCOM, March 2010, pp. 1-9.
[20] G. Judd and P. Steenkiste, "Fixing 802.11 access point selection," SIGCOMM Comput. Commun. Rev., vol. 32, no. 3, pp. 31-31, Jul. 2002.
[21] A. J. Nicholson, Y. Chawathe, M. Y. Chen, B. D. Noble, and D. Wetherall, "Improved access point selection," in Proceedings of the 4th International Conference on Mobile Systems, Applications and Services, ser. MobiSys '06. ACM.
