Big Data Driven Mobile Cellular Networks Modelling, Experiments, And Applications
Abstract. The proliferation and pervasive use of mobile devices results in the accumulation of
massive amounts of wireless data. Mobile big data can be profitable only if suitable analytics
and learning methods are utilized to extract meaningful knowledge and hidden patterns. In this
article, we propose a novel mobile big data architecture consisting of five layers: the data
storage layer, the data fusion layer, the data security layer, the data analysis layer and the data
application layer. The functionality of each layer is presented. We consider one illustrative
case under this architecture, namely, user experience prediction by leveraging machine
learning techniques. In practice, mobile big data analytics can be used for network planning
and parameter dimensioning to facilitate network design, deployment and operation.
1. Introduction
The technological revolution has facilitated the proliferation and pervasive use of digital devices, such
as smartphones, sensors and Internet of Things (IoT) devices. As a result, a massive amount of
heterogeneous structured and unstructured data, called mobile big data (MBD), is generated by
these devices and carried by mobile cellular networks[1].
Historically, the value of such large amounts of MBD was underestimated until the introduction of
big data analytics, which can extract meaningful knowledge and patterns from raw data by exploiting
machine learning methods. The hidden knowledge and patterns revealed from raw mobile data can help
to improve the performance of mobile cellular networks and to maximize operators' revenue. Compared
with conventional big data problems, such as user profiling[2] and sentiment analysis and opinion
mining[3], MBD analytics has the following distinctive features.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
CTCE 2018 IOP Publishing
IOP Conf. Series: Materials Science and Engineering 466 (2018) 012074 doi:10.1088/1757-899X/466/1/012074
First, the volume of MBD is enormous. According to Cisco's 2014 Visual Networking Index report,
mobile data were projected to exceed data from wired devices by 2018, constituting 61% of data traffic.
By 2020, the amount of data traversing the Internet was expected to reach 1 billion gigabytes per month.
Because of this continuous growth, the time available for processing collected data for decision making
can be relatively short. Therefore, MBD analytics must be executed rapidly to keep pace with newly
collected data samples. Second, MBD has temporal and spatial characteristics. The sources of MBD
are smartphones, sensors and IoT devices. Mobile devices and some types of nomadic IoT devices
move independently among many locations, which gives rise to the spatiotemporal features of MBD.
Measurements on a CDMA2000 network revealed that wireless data traffic is bursty and exhibits strong
diurnal patterns[4]. Third, in addition to its large volume and spatiotemporal characteristics, MBD
differs from conventional big data because its data acquisition units are spread across all logical
network entities. The sources of traditional analytics, such as charging and billing systems and
operation systems, are essentially centralized, whereas the sources of MBD are scattered across the
infrastructure, such as cell sites, core network equipment, the operation and maintenance centres (OMC)
of various vendors, and customer complaint departments. The higher dimensionality of such data can
support better inference and provide valuable insights. However, high dimensionality also gives rise
to data heterogeneity: the data are generated by different sources in disparate formats, and their
differing granularities make data fusion more challenging.
In this paper, we propose a novel mobile big data architecture consisting of five layers: the data
storage layer, the data fusion layer, the data security layer, the data analysis layer and the data
application layer. The functionality of each layer is presented. Under this architecture, we consider one
illustrative case, namely, user experience prediction, by leveraging machine learning techniques. In
practice, MBD analytics can be used for network planning and parameter dimensioning to
facilitate cellular network design, deployment and operation.
2. Dataset Description
Our research is based on real-world mobile datasets collected in Chongqing, one of the largest cities
in China. The city has a population of approximately 3 million, and the operator has a user
penetration of 2/3. The dataset records the data traffic of 1.6 million anonymous mobile users on a
Saturday in 2014. Raw mobile data can generally be categorized into five types: application-related
data, network-related data, link-related data, user-related data and operation-related data.
•Application data describe user applications, such as application types (instant messaging, video,
web services, etc.), flow rates, flow volumes, and the number of TCP segment retransmissions
associated with each flow. These data are usually collected by deep packet inspection executed at
application servers.
•Network data contain information such as the coordinates of the base stations, the allocated system
bandwidth, key performance indicators (accessibility, retainability, quality, etc.) and the various
signalling interactions between users and networks. These data are usually collected from the base
stations and core network equipment.
•Link data include channel quality information between the users and the base station, obtained from
channel measurements performed by the users or the base stations.
•User data include the behaviours and preferences of users, e.g., locations, mobility, routines and
experiences. Unlike the three preceding types, user data are not obtained directly from wireless
networks but are derived from those three types through data analytics. For instance, user locations
can be estimated by positioning algorithms based on the channel information contained in link data.
Additionally, user locations may be extracted from the hypertext transfer protocol (HTTP) request
records in application data when subscribers use location-based social apps or services (Yelp,
Google Maps, Foursquare, etc.).
•Operation-related data include customer care/ticket information, provisioning data, handset agent
records, and turn-up/test records.
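As an illustration of deriving user data from link data, the following sketch estimates a user's position with a weighted centroid, a simple and widely used positioning heuristic that we substitute here; the paper does not state which positioning algorithm is used. It assumes base-station coordinates in a local planar frame (metres) and a per-station received power in dBm reported by the user; the function names are hypothetical.

```python
from typing import List, Tuple


def dbm_to_mw(dbm: float) -> float:
    """Convert a received power in dBm to milliwatts (linear scale)."""
    return 10 ** (dbm / 10.0)


def weighted_centroid(measurements: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Estimate a user's (x, y) position from (x_bs, y_bs, power_dbm) tuples.

    Each base station's coordinates are weighted by the linear received power,
    so stronger (usually closer) stations pull the estimate towards themselves.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    weights = [dbm_to_mw(p) for _, _, p in measurements]
    total = sum(weights)
    x = sum(w * bx for w, (bx, _, _) in zip(weights, measurements)) / total
    y = sum(w * by for w, (_, by, _) in zip(weights, measurements)) / total
    return x, y


# Hypothetical report: two stations 100 m apart, the first heard 10 dB stronger.
print(weighted_centroid([(0.0, 0.0, -80.0), (100.0, 0.0, -90.0)]))  # about (9.09, 0.0)
```

Weighting in the linear domain makes a 10 dB advantage count ten times as much; weighting by raw dBm values would barely distinguish the stations.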
MongoDB and the Hadoop distributed file system (HDFS), the database of our architecture can keep
costs low while achieving rapid response.
the real-time network performance in terms of traffic flow significantly deviates from the empirical
value. Additionally, divergence metrics can be applied to detect abrupt changes in the empirical
PDFs of relevant features.
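One way to realise such divergence-based change detection is sketched below, assuming the Kullback-Leibler (KL) divergence as the metric; the text does not name a specific divergence. The empirical PDFs are approximated by histograms over fixed bin edges, a small constant keeps empty bins from making the divergence infinite, and the alarm threshold of 0.5 nats is an arbitrary placeholder that would need tuning on real data.

```python
import bisect
import math


def empirical_pdf(samples, edges, eps=1e-6):
    """Histogram samples over fixed bin edges and normalise to a probability vector.

    Values below edges[0] or above edges[-1] are clamped into the first/last bin.
    eps is added to every bin so that no probability is exactly zero.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for s in samples:
        i = bisect.bisect_right(edges, s) - 1
        counts[min(max(i, 0), n_bins - 1)] += 1
    total = sum(counts) + eps * n_bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in nats over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def abrupt_change(reference, current, edges, threshold=0.5):
    """Flag a change when the current window's PDF drifts away from the reference PDF."""
    p = empirical_pdf(current, edges)
    q = empirical_pdf(reference, edges)
    return kl_divergence(p, q) > threshold


# Hypothetical traffic samples (arbitrary units), not from the paper's dataset.
edges = [0, 2, 4, 6, 8]
reference = [1, 2, 3, 4] * 25
print(abrupt_change(reference, reference, edges))  # False
print(abrupt_change(reference, [7] * 100, edges))  # True
```

KL divergence is asymmetric; a symmetric alternative such as the Jensen-Shannon divergence could be swapped in without changing the surrounding logic.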
we need to classify the mobile users in our dataset into two categories: satisfied users and dissatisfied
users. This application process consists of three major steps.
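The classifier used for this step is not described in this part of the text. As a minimal, hypothetical sketch of the final classification step, the snippet below fits a logistic-regression model (our stand-in, not necessarily the authors' method) by batch gradient descent on two hypothetical per-user features, mean throughput in Mbit/s and TCP retransmission ratio in percent, using a handful of synthetic, made-up users.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # p - y is the gradient of the log-loss with respect to the linear score.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 for 'satisfied' and 0 for 'dissatisfied'."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Synthetic users: [mean throughput (Mbit/s), TCP retransmission ratio (%)].
X = [[8.0, 0.5], [6.0, 1.0], [7.5, 0.8], [1.0, 6.0], [0.5, 8.0], [1.5, 5.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = satisfied, 0 = dissatisfied (made-up labels)
w, b = train_logistic(X, y)
print(predict(w, b, [7.0, 0.7]), predict(w, b, [1.0, 7.0]))  # 1 0
```

A linear model keeps the learned weights interpretable: their signs show which features push a user towards the dissatisfied class.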
These abstracted features are verified by the correlation coefficient, which measures the statistical
relationship between two variables. We introduce the η² correlation coefficient to measure the overall
relationship between a continuous dependent variable (DV) and a categorical independent variable (IV).
η² ranges from 0 to 1, where 1 indicates the strongest relationship and 0 indicates no relationship.
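For reference, η² is the between-group sum of squares divided by the total sum of squares, η² = SS_between / SS_total, with SS_between = Σ_k n_k (ȳ_k − ȳ)² and SS_total = Σ_i (y_i − ȳ)², where ȳ_k and n_k are the mean and size of category k and ȳ is the grand mean of the DV. The minimal sketch below computes it directly; using a continuous feature such as throughput as the DV and the satisfied/dissatisfied label as the IV is our assumption about how it would be applied here.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def eta_squared(dv: Sequence[float], iv: Sequence[Hashable]) -> float:
    """Eta squared between a continuous DV and a categorical IV.

    eta^2 = SS_between / SS_total, i.e. the share of the DV's variance that is
    explained by the IV's categories; the result lies in [0, 1].
    """
    if len(dv) != len(iv) or not dv:
        raise ValueError("dv and iv must be non-empty and of equal length")
    grand_mean = sum(dv) / len(dv)
    groups = defaultdict(list)
    for value, category in zip(dv, iv):
        groups[category].append(value)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2 for vals in groups.values()
    )
    ss_total = sum((v - grand_mean) ** 2 for v in dv)
    # A constant DV has no variance to explain; report 0 rather than divide by zero.
    return ss_between / ss_total if ss_total > 0 else 0.0


# Hypothetical feature values for two satisfied and two dissatisfied users.
print(eta_squared([1, 2, 3, 4], ["sat", "sat", "dis", "dis"]))  # 0.8
```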
5. Conclusion
In this article, we propose a novel mobile big data architecture that consists of five layers. The data
storage layer stores large amounts of data collected from different sources. The data fusion layer then
handles data extraction, transformation and loading (ETL), and the data security layer guarantees data
integrity, availability and confidentiality. Next, the processed data are fed into the data analysis
layer, where a variety of analytical techniques are applied. Finally, the data application layer applies
machine learning methods to extract hidden knowledge and patterns. Under this architecture, and by
leveraging machine learning techniques, mobile big data analytics can be used for network planning and
parameter dimensioning to facilitate network design, deployment and operation.
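The order in which data pass through the five layers can be sketched as a chain of functions. Everything here is a hypothetical stand-in: the source names (`dpi`, `omc`), the field names, the ETL rules and the threshold rule in the application layer are assumptions, and the security layer only illustrates integrity (a SHA-256 digest per record), not availability or confidentiality.

```python
import hashlib
import json
from collections import defaultdict


def digest(rec):
    """SHA-256 of a record's canonical JSON form (sorted keys)."""
    return hashlib.sha256(json.dumps(rec, sort_keys=True).encode()).hexdigest()


def storage_layer(sources):
    """Store raw records from every source, tagging each with its origin."""
    return [dict(rec, source=name) for name, recs in sources.items() for rec in recs]


def fusion_layer(records):
    """Minimal ETL: drop records without a user id and unify the volume field name."""
    fused = []
    for rec in records:
        if "user" not in rec:
            continue
        rec = dict(rec)
        vol = rec.pop("volume", 0)
        rec["bytes"] = rec.get("bytes", vol)
        fused.append(rec)
    return fused


def security_layer(records):
    """Integrity only: attach a digest so later layers can detect tampering."""
    return [dict(rec, sha256=digest(rec)) for rec in records]


def analysis_layer(records):
    """Verify each digest, then aggregate traffic volume per user."""
    totals = defaultdict(int)
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "sha256"}
        if digest(body) != rec["sha256"]:
            raise ValueError("integrity check failed")
        totals[rec["user"]] += rec["bytes"]
    return dict(totals)


def application_layer(totals, heavy_threshold):
    """Toy application: list heavy users, e.g. as an input to capacity planning."""
    return sorted(u for u, b in totals.items() if b >= heavy_threshold)


sources = {
    "dpi": [{"user": "u1", "bytes": 900}, {"user": "u2", "bytes": 100}],
    "omc": [{"user": "u1", "volume": 300}, {"cell": "c7"}],
}
totals = analysis_layer(security_layer(fusion_layer(storage_layer(sources))))
print(totals)  # {'u1': 1200, 'u2': 100}
print(application_layer(totals, heavy_threshold=1000))  # ['u1']
```

In a fuller system, a trained model (for example, a user-satisfaction classifier) would take the place of the threshold rule in `application_layer`.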
6. Acknowledgments
This work was partly supported by the National Natural Science Foundation of China under Grant
Nos. 61701034 and 61531007 and partly supported by the China Postdoctoral Science Foundation under
Grant No. 2017M620696.
7. References
[1] Cisco, "Cisco Visual Networking Index: Global mobile data traffic forecast update, 2013–2018,"
White Paper, February 2014.
[2] S. N. Schiaffino and A. Amandi, “Intelligent user profiling,” Artificial Intelligence, pp. 193–216,
2009.
[3] B. Liu, “Sentiment analysis and opinion mining,” Synthesis Lectures on Human Language
Technologies, vol. 5, no. 1, pp. 1–10, 2012.
[4] C. Williamson, E. Halepovic, H. Sun, and Y. Wu, "Characterization of CDMA2000 cellular data
network traffic," in Proc. 30th Anniversary IEEE Conference on Local Computer Networks, 2005,
pp. Z000–719.