2019 Virtualized CPE Services Have Finally Arrived Via Service Delivery Platforms
Ian Wheelock
Engineering Fellow, CPE Solutions
CommScope
4300 Cork Airport Business Park, Kinsale Road, Cork, Ireland
00353-86-235-2712
[email protected]
Charles Cheevers
CTO CPE Solutions
CommScope
3871 Lakefield Dr, Suwanee, GA 30024
678-473-8507
[email protected]
List of Figures
Figure 1 - Factors Driving New Software Services
Figure 2 - Traditional Software Development Model
Figure 3 - Agile Development Model
Figure 4 - Key HW Elements of Gateway
Figure 5 - Proposed home gateway, applicable to multiple access types
Figure 6 - Sample of Potential Services Considered
Figure 7 - Proposed Router/Services Platform close-up
Figure 8 - Example GW Memory Trends
Figure 9 - Flash and RAM Organisation
Figure 10 - Flash Memory Organisation
Figure 11 - HW/OS/Application Layering
Figure 12 - Key RDK-B Software Layers and Components
Figure 13 - prplWrt Organisation
Figure 14 - Future Router Stack Architecture
Figure 15 - prpl High and Low Level APIs
Figure 16 - OVSDB/OVS Based Architecture
Figure 17 - NFLua Packet Interception and Agent Architecture
Figure 18 - prpl Service Delivery Platform/Life Cycle Management
Figure 19 - Native Apps vs Container Apps
Figure 20 - Docker Ecosystem
Figure 21 - SDP/LCM and Orchestration Overview
Figure 22 - SDP/LCM POC Configuration
Figure 23 - Native, Containers and Virtual Machines
Figure 24 - vCPE with Cloud Services
Figure 25 - Potential Service Load over 24hr Period
Figure 26 - Multitude of Options for Virtualised CPE
This model has worked. However, on mobile phones or laptops, adding new software features typically
does not require an OS upgrade. Why can't gateways follow that model, or use something far more agile
with fewer moving parts, to enable faster feature and service delivery to subscribers?
This paper concentrates on the architectures and platform options that exist today for service delivery
beyond the monolithic image system, and examines their pros and cons, including how virtualization
techniques in both the gateway and the cloud can be used. Details relating to RAM, flash, and CPU
resources are also covered, along with cloud-based applications built on application traffic tunneling,
which are compared as potential alternatives to thicker gateway-hosted services. The paper is organized
into the following sections:
• New software delivery options that are beginning to appear in the industry
• What they mean for operators, OEMs and 3rd party software/service vendors
• How they might be applied to existing and future gateway platforms
• Impact on RAM/flash/CPU resources
• How to manage or orchestrate these services
• The tradeoff between thick gateway services vs virtualized cloud services
• How gateway traffic filtering and tunneling enables these services
This big bang approach is repeated over and over as new features are requested, or improvements/bug
fixes need to be incorporated. In some instances, depending on the type of change involved, more focused
testing can be performed, getting the final monolithic upgrade ready for deployment. This lengthy
development process is typical within the industry, and has remained in place as a compromise for
managing the risk of launching a completely new release to thousands or millions of deployed gateways.
1.2. RDK
In most cases this development process is as streamlined as it is going to get. Some newer platforms like
RDK enable more control around the build environment, and focus on continuous development and
integration of new features, working towards constant integration and deployment of these into the field.
This model depends on a lot of automated testing, significant logging support, and the ability to move
new releases into the field quickly.
In most cases where RDK has been deployed with a high release cadence, the operator involved has been
tightly coupled with the actual development process, sharing bug tracking and build systems with OEMs
they have partnered with, and requiring their own development teams to be able to guide the overall
release planning/development of features as well as deal with issues arising from the field (performing
triage, collection of logs, etc.). There are definite benefits from this high cadence approach in terms of
quick turnaround of new features and bug fixes, but the model does require the cable operator to get down
and dirty with the development process, as well as owning the release, system test and deployment
processes.
All of this costs money, moving the operator into more of an OEM/development role. For most operators,
developing software in-house is not how they make money from their business. The RDK
codebase/architecture can still be used in the traditional development model, where an operator works
with an OEM to release a set of new features and updates at a much lower cadence - maybe once every
6/9/12 months - while building on the stability of known RDK releases from the RDK community.
Both these approaches require either the OEM to do significant development or have the operator get tied
into the development process, possibly at an uncomfortable level.
This approach normally requires the OEM to develop target platform layer interfaces that the 3rd party
library requires. If source code can be provided, this gets built directly by the OEM themselves, with the
OEM still needing to develop the target platform layer, as well as any other management control/logging
functions to fit into the existing platform. (The main reason source code is not normally shared is to
protect the associated Intellectual Property Rights (IPR) from being exposed to OEMs or other parties.)
The following system architecture is proposed for MSOs to consider. This uses a mixture of container
based orchestrated software services on in-home devices, as well as offering a hybrid option for services
provided partially in the home and mostly in the cloud (through tunneling and the use of iptables or ovs
on the in-home platform). The architecture also shows some tighter integration of services within the
platform itself, for when some software needs higher performance access to networking or other lower
layer services. The architecture can apply to almost any WAN access, with D3.0, D3.1 and PON all shown.
2. Variables
A lot of variables shape this architecture including: the device type; available RAM; available
storage/flash (and read/write ability of same); the type of containers to be used; the number of services
that may need to reside on the platform; whether these services are provided by the MSO directly,
contracted partners, or third-party developers.
Other factors include: the supported services for the devices themselves; the types of low-level or high-
level APIs available; the infrastructure for hosting services on the device; the types of services being
considered (tightly integrated networking applications, or apps that only need IP connectivity); as well as
what type of access to the hardware or local software stack is needed and which managed API interfaces to
use, etc.
Once the overall view of a containerized approach for local applications within devices is agreed, other
decisions must be made regarding the type of container infrastructure to use and the type of orchestration
required.
Figure 8 - Example GW Memory Trends (RAM and flash, in MB, from 2000 to 2019)
Depending on the routing software stack itself, most of this storage may already be consumed. Some
firmware stacks using OpenWrt can actually be made to operate within 4MB flash and 32MB RAM, but
in reality, need more like 128MB flash and 256MB RAM to fully support operator features.
Another decision made with broadband devices is to use a “dual image” option for firmware image
storage, where two complete compressed images are stored in the flash storage, effectively limiting the
maximum image size to under 50% of the available flash memory. This is done to have a backup image,
in case an image has been corrupted in flash (due to various possible reasons).
Thankfully, RAM and flash pricing has corrected over the last two years, meaning prices have come down
(for different reasons for RAM and flash). However, operator purchasing decisions regarding RAM and flash
have had consequences for what feature upgrades may apply to existing deployed devices. In some cases
there is simply no space left to factor in any local extensions, and alternatives such as hybrid or
virtual cloud services must be considered.
Another aspect of broadband devices is that, because a single monolithic firmware image contains all the
software the system operates with, any minor change requires a complete replacement of that image. Even
though this appears quite inefficient, there are significant operational benefits in knowing that a
population of devices is running version #N or version #N-1 of firmware.
Most broadband devices limit flash storage to be READONLY, with only the bootloader or firmware
upgrade process being able to write anything to flash. This is a major issue when considering the
download and storage of software components separate to the main firmware image. In some platforms,
read/write of flash is already supported, but other platforms may need bootloader/code refactoring to
accommodate this mode of operation.
New software services packaged in containers (and similar) tend to be overlaid on top of the existing
firmware image, in some instances taking advantage of features/libraries within the platform image. Other
software however may need to be integrated directly with the existing platform image, possibly replacing
or adding functionality.
This idea of live patching of the platform itself brings considerable complexity and risk from the point of
view of both modifying the actual system properly and ensuring that a patch does not cause any issue to
the running system. Also, the management of a mixed population of devices that may have different
levels of “patching” applied may present significant operational overhead.
3.2. Compression
Given the nature of compression, the firmware image is likely to be much larger in RAM when
decompressed. The compressed firmware image is typically CRC/MD5/signature checked, before any
attempt to decompress/execute code, to make sure the image has not suffered any corruption while
resident in flash (or due to misprogramming) and that it is a properly signed image.
In nearly all these cases there is a need to support OverlayFS (a key Linux feature, in mainline since 3.18)
to ensure any configuration elements or read-write locations are handled separately to the container read-
only space in flash. This approach can reduce the overall amount of expensive RAM required (for storage
purposes) on a platform at the cost of adding additional flash, and allows for flexibility in adding extra
flash using either onboard eMMC or via plug-in USB/xSD devices.
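As an illustration, the following minimal sketch shows how such an OverlayFS mount might be set up by the platform's container setup scripts; all of the paths are hypothetical. Writes land in the upper (read-write) directory while reads fall through to the read-only image held in flash, so the firmware-resident files never need to be modified.

import subprocess

# Hypothetical paths: the read-only container image lives in flash, while the
# writable upper/work directories sit on a separate read-write partition
# (internal flash, eMMC, or a plug-in USB/SD device).
LOWER = "/rom/containers/app1"       # read-only rootfs delivered with the firmware
UPPER = "/data/overlay/app1/upper"   # writable layer (configuration, runtime state)
WORK = "/data/overlay/app1/work"     # OverlayFS scratch area (same filesystem as UPPER)
MERGED = "/run/containers/app1"      # mount point the container actually sees

subprocess.run(["mkdir", "-p", UPPER, WORK, MERGED], check=True)

# Standard OverlayFS mount options (kernel >= 3.18).
subprocess.run(
    ["mount", "-t", "overlay", "overlay",
     "-o", f"lowerdir={LOWER},upperdir={UPPER},workdir={WORK}",
     MERGED],
    check=True,
)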
4.1. RDK
RDK-B/-M itself has an internal CCSP bus (based on DBus) that acts as the backbone of the system,
connecting the core RDK-B subsystems together with protocol adapters and software components. New
software can be added into this system and have complete first class access to the inner workings of the
platform. Support for low level interfaces, such as Wi-Fi HAL, or Cable Modem HAL, or
Ethernet/Switch HAL (utopia) is also included, as is support for managing configuration
settings/NVRAM. External protocol adapters for TR-069, SNMP and the Comcast developed WebPA
interface allow management access to the system.
4.2. OpenWrt
OpenWrt also has an internal bus, uBus, that acts as its backbone for enabling communication and control
between all the internal elements that are used for routing and management. It uses “uci” for its
configuration management, and offers many of the services that RDK-B/-M offers and that are typically
expected in a gateway stack. prplWrt packages some new carrier-class components on top of OpenWrt.
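For illustration, the short sketch below shows how a script or service might drive the OpenWrt stack through the standard ubus and uci command-line front ends; the SSID value is hypothetical, and production software would normally go through the managed APIs discussed later rather than calling these tools directly.

import json
import subprocess

# Query WAN interface status over uBus; "network.interface.wan" and its
# "status" method are standard OpenWrt objects.
raw = subprocess.run(
    ["ubus", "call", "network.interface.wan", "status"],
    capture_output=True, text=True, check=True,
).stdout
wan_status = json.loads(raw)
print(wan_status.get("up"), wan_status.get("ipv4-address"))

# Change a configuration value through uci and apply it; uci maintains the
# /etc/config/* files that the rest of the stack consumes.
subprocess.run(["uci", "set", "wireless.@wifi-iface[0].ssid=OperatorSSID"], check=True)
subprocess.run(["uci", "commit", "wireless"], check=True)
subprocess.run(["/etc/init.d/network", "reload"], check=True)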
4.3. Proprietary
Proprietary or OEM stacks, such as ARRIS Touchstone or ARRIS 9.x, all offer the same type of
functionality as RDK-B/M and OpenWrt. Each stack breaks down the control functions required for each
of the underlying subsystems to implement the various software and protocol requirements for a gateway.
Other subsystems in the SOC, such as the low-level packet acceleration and switching functions are
harder to get standard drivers for, as each hardware supplier does things differently matching their HW
architecture. However, with advances such as Open vSwitch (where switching is performed in software),
Switch Abstraction Interface (SAI) and “switchdev” it is possible to tie in these low-level hardware
features in an abstract and performant way into the chosen router stack (as long as the SOC provider
supports these features!).
The various stacks mentioned all have internal buses for connecting their various HALs and
adapters/components together. One straightforward way of exposing software interfaces to 3rd party
software is to simply provide access to the internal bus. In a number of instances (say high performance
network interfacing software), this is exactly how 3rd party software is integrated, using the internal bus as
well as tight integration with low level driver interfaces. Such integrations can be challenging (requiring
legal agreements for source code sharing, engineering access/etc.) and because two or more codebases
become so tightly coupled, the only option of releasing bug fixes or enhancements is to release a
completely new firmware load (going against the need for speedy releases). Such tight integration may
also require more software development resources to achieve the final deliverables.
Using this model for delivering the majority of new software and services cannot scale. Such a model
would also threaten the security and robustness of the stack itself, something to be avoided. What is
needed is a set of defined interfaces that can be supplied to 3rd party/Independent Software Vendors
(ISVs) to allow them to work somewhat independently of the detailed underpinnings of the firmware stack,
although they will still likely need the platform tool chain to build their software. The following
sections outline the different APIs that are available with PRPL, OVS/OVSDB, NFLua and an internal
CommScope API. These interfaces are not only critical for so-called “native” software integration where
software is built into the monolithic firmware image but also critical for container based options.
OpenSync is also described and offers an OVSDB interface that allows a hybrid model of native code
developed for the gateway that also interfaces with a remote/cloud system that may be running additional
cloud applications.
The prpl High Level API has been designed from the ground up as a platform abstraction layer that
enables new services to be easily integrated into GW devices. The HL-API consists of a
definition of 30+ primary features typically used in a GW as well as a model on how this can be
integrated into multiple industry stacks, including OpenWrt and RDK-B. CommScope is currently
reviewing the use of the HL-API on RDK-B, and what it will take to work over D-Bus*. The HL-API is
not limited to higher layer services being added to the device, it also supports the idea of new underlying
system components being added to a platform that can increase system functionality (and having this
available to other software layers). The HL-API and prplAdapter also support features critical to enable
3rd party software to be added to platforms, particularly in the areas of access control and “user
management”. These areas are fundamental to enabling and restricting what elements of the gateway
platform can be interacted with or controlled by software services.
A key part of prplWrt and the higher-level APIs is to provide a so-called “prplAdapter” component that
provides access as well as access-control to the inner operation of the routing/platform stack. This
interface approach is meant to help the development of services required by operators, as well as exposing
certain APIs to 3rd party application developers. Even though prpl has focused on OpenWrt, the major
effort on the higher- and lower-level APIs is considered stack agnostic, and the expectation is that these
interfaces will be available on RDK-B and other router stacks. CommScope is currently involved in an
exercise to identify the work effort for mapping prpl High-level API to RDK-B, while prpl is also pushing
the use of certain APIs into the RDK-B community for Wi-Fi management.
One example of software using both the prpl HL-API and low-level API is the prplMesh implementation,
where a software platform exposes control of the EasyMesh controller using the HL-API while also using
the prpl LL-API (namely cfg80211) for the EasyMesh agent, interacting with the low-level control and
management functions of the Wi-Fi chipset. The portable prplMesh implementation for EasyMesh will
run on any platform that supports the LL and HL prpl APIs, as well as exposing the necessary interfaces
to allow 3rd party Wi-Fi optimization systems to interact with the prplMesh EasyMesh controller function.
The set of APIs provide a significant abstraction layer to support development of both applications on the
system, as well as exposing stack information (including status and monitoring information) to remote
management platforms.
The OpenSync solution is currently limited to Wi-Fi (on supported platforms) and some visibility into L2
switching. Some additional features such as basic device identification, basic speed test, and QoS control
exist along with tunneling support of home network traffic between proprietary Plume Wi-Fi PODs back
to the home gateway. The use of OVS for complete switch management is being considered on multiple
platforms. Retrofitting it on older SoC platforms may have some challenges due to existing SoC supplied
slow-path/fast-path handling and having to deal with very specific WAN access handling. However,
where it has been ported, there is an option of dealing with everything relating to packet handling directly
in software in the Linux kernel.
The use of Open vSwitch/OVSDB in OpenSync has the potential to bring an SDN control plane to the
operator subscriber network, and could in theory be coupled with hybrid cloud applications where traffic
is selected in the home, and delivered using GRE tunnels to cloud applications that provide various
software and networking functions, similar to how a WAG works today, but dealing with much more than
just Wi-Fi hotspot related traffic.
The solution is a potential alternative to OVS. The NFLua kernel component integrates with the existing
Linux netfilter and iptables frameworks for packet inspection. The model enables userspace agents to
interact directly with the packets that NFLua intercepts, with the ability to operate on these packets locally (using various
protocol plugins for different protocols) or to act as an agent to a cloud entity that can process these
packets remotely, possibly using more complex or capable functionality not possible in the gateway
footprint.
Typically, Docker is used on extremely capable hardware platforms with plenty of RAM and Flash as
well as large CPU resources, something quite different to embedded platforms like broadband gateways.
As such, companies have been investigating more lightweight options to achieve equivalent function for
gateways. Broadband Forum developed the TR-157 approach many years ago, including a key element
known as Software Module Management (SMM). The SMM system provides the basis for a new Life Cycle
Management (LCM) approach.
This system deals with the full lifecycle management of software components, from arranging the
download, to the provisioning and running/monitoring, and eventual removal of the software within a
gateway platform. LXC is used for application containers. The solution operates with existing ACSs
relying on TR-069 (or, in the future, USP), requiring some additional capability in the ACS to help with
orchestration of which SW components are positioned on which gateways, etc. The system provides a
complete solution and is planned to be open sourced into the prpl Foundation, providing an option that
can be ported to any router platform for managing software components in a consistent way. Other
companies have also been working on similar approaches and the hope within the community is that we
can bring multiple parties together to create a common solution.
The choice of container option will likely have an impact on the build system for the devices being used.
In terms of LXC containers and cgroups/namespace approach, it is possible to get much smaller container
images as a result of reusing the available dynamic libraries within the primary firmware image root
filesystem. A challenge with this approach however is the tight coupling required as a result of having to
build the LXC container applications as part of the overall firmware image process. When working with
internal SW teams, this is not a major issue, but there may be the usual “sharing problems” if 3rd party
software companies need access to this build system.
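A minimal sketch of such an application container is shown below, assuming LXC 3.x configuration key names and hypothetical paths; the point to note is the read-only bind mounts that re-use the shared libraries already present in the firmware root filesystem, which is exactly what keeps the downloadable image small and, at the same time, ties it to a specific firmware build.

import subprocess
from pathlib import Path

# Minimal application-container configuration (LXC 3.x key names; older
# releases used lxc.rootfs / lxc.utsname instead).
config = """\
lxc.uts.name = app1
lxc.rootfs.path = dir:/data/apps/app1/rootfs
lxc.net.0.type = veth
lxc.net.0.link = br-lan
lxc.net.0.flags = up
# Re-use the host's shared libraries via read-only bind mounts
lxc.mount.entry = /lib lib none ro,bind 0 0
lxc.mount.entry = /usr/lib usr/lib none ro,bind 0 0
"""
Path("/data/apps/app1").mkdir(parents=True, exist_ok=True)
Path("/data/apps/app1/config").write_text(config)

# Start the container against that config and confirm it is running.
subprocess.run(["lxc-start", "-n", "app1", "-f", "/data/apps/app1/config"], check=True)
subprocess.run(["lxc-info", "-n", "app1"], check=True)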
A further challenge is that the firmware image version and the versions of libraries contained in the
root filesystem may change due to upgrades, fixes, new features, etc., and any LXC container application
may be incompatible with those changes. This results in the need for very careful feature and change
planning in order to avoid a permanent state of development.
In addition to resource management and resource limits for new software and services, a key requirement
to consider is how to interface with the main routing platform. In some cases, the integration requirements
for new software can be limited to an IP and TCP/UDP port mapping, whereas other integrations need to
directly interact with the local platform. Clearly defined interfaces (like all those described earlier) are a
must to ensure coordinated access to the platform is maintained. Such interfaces enable 3rd party software
providers to understand how to interact with the platform, while providing a defined bridge that the
platform software can marshal in terms of access control rights (which application can interact with
which subsystem) and abstraction (allowing underlying systems to be modified while maintaining a
consistent northbound interface).
In most cases the target platform for containerisation will require at least Linux 3.18, and preferably the
latest kernel available.
Multiple industry efforts are underway to add LXC containers to embedded GW platforms, with options
being discussed with most of the Tier-1 operators.
A major benefit of Docker is how it creates containers, where every file/library/application required is
packaged into a single container image, enabling it to be distributed in a highly portable fashion. As a
result, a Docker container can be deployed on a gateway, a local laptop, or a physical or virtual machine
in a datacenter or a cloud provider environment. The portability of the Docker container means that many
more software providers can develop their applications to run on Docker, enabling a very rich and vibrant
market space. A Docker container is a runnable instance of an application image. Like LXC it can be
started and stopped using a Docker API or a CLI. Docker relies on Linux services (either natively in a
Linux kernel, or through “Linuxkit”) and uses namespaces in the same way as LXC does to provide the
required workspace isolation for the container to operate within. Namespaces offer process, networking,
inter-process communication, mount/filesystem, and some kernel isolation.
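As a simple illustration of that API, the sketch below uses the docker-py client to start a container with explicit memory and CPU caps so that it cannot starve the gateway's routing functions; the image name, limits and restart policy are purely illustrative.

import docker

# Connect to the local Docker Engine (unix:///var/run/docker.sock by default).
client = docker.from_env()

# Run a container with resource limits appropriate to an embedded gateway.
container = client.containers.run(
    "registry.example.com/operator/iot-agent:1.0",
    detach=True,
    name="iot-agent",
    mem_limit="64m",                 # cap RAM usage
    nano_cpus=250_000_000,           # roughly a quarter of one CPU
    restart_policy={"Name": "on-failure", "MaximumRetryCount": 3},
)
print(container.id, container.status)

# The same client object can later stop and remove the container.
container.stop()
container.remove()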
In addition to these fundamental features, Docker introduces a whole host of extra functionality to be able
to manage and interact with containers, enabling eco-systems to be built to fully manage and orchestrate
the operation of large numbers of Docker images/containers over vast “fleets” of compute resources.
Balena concentrates on using RAM and storage more conservatively and focuses on atomicity and
durability of container pulling. These facets are ideal in the context of embedded systems (compared to
the more traditional cloud systems that Docker is targeted at).
The CommScope gateway platform relies on the open source Docker Engine to provide the framework
for hosting containers. A Docker Client is added to the gateway to manage and control access to the
Docker environment. Remote Docker clients are also supported to assist with the installation of containers
as well as querying status/etc.
SMM depends on Execution Environments (EE), Deployment Units (DU) and Execution Units (EU). The
Docker Engine is equivalent to the EE, providing a platform for hosting applications that are effectively
sandboxed to the rest of the gateway/host system. Docker Images are equivalent to the DU, providing a
way of managing the specific files/etc. associated with the application being downloaded. The EU is the
active running Docker Container executing within the Docker Engine/EE environment.
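That mapping can be illustrated with the standard Docker client calls below (registry and image names are hypothetical): pulling the image corresponds to installing a DU, and the running container is the EU executing inside the Docker Engine/EE.

import docker

client = docker.from_env()            # the Docker Engine acts as the TR-157 EE

# DU operations: install/update/remove map onto image pull and removal.
client.images.pull("registry.example.com/operator/parental-control", tag="2.1")

# EU operations: the running container is started, stopped and monitored
# inside the EE.
eu = client.containers.run(
    "registry.example.com/operator/parental-control:2.1",
    detach=True, name="parental-control",
)
print(eu.id, eu.status)

eu.stop()                             # EU stop
eu.remove()                           # EU removal
client.images.remove("registry.example.com/operator/parental-control:2.1")  # DU uninstall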
Docker containers are either pre-downloaded or downloaded from the Docker Registry. Interactions with
the Docker Registry, including authenticating access, are all logged to ensure diagnostic information can
be reviewed in the event of issues.
The current model is to use the CommScope Container API to control Docker Container application objects
on the gateway platform itself. Extensions such as providing access control to local Linux services
and Dbus access are also provided. From an operational perspective, as some platforms are flash limited,
the Docker Engine itself is run time installed into RAM, as are the other Docker Container images.
The running of the Docker Engine on the platform requires allocation of resources from the gateway for
any Docker Applications being deployed. The current gateway-based Docker support covers installing,
enabling, uninstalling, and disabling applications using either an External Docker Client or the Docker
Configuration file. The main features of the SMM are provided to report on status/etc. of the
Docker Engine and running applications.
Some of the Docker applications include the McAfee security gateway as well as SamKnows. Other
applications are being considered, along with internally developed features.
This POC chose to use the primary features of the TR-157 Software Module Management (SMM)
specification, providing a generic interface that can be mapped into TR-069 for ACS management or made
available for other agents to use with other orchestrators. The LCM
component provided external access to execute the available Life Cycle Management API methods, while
also being responsible for fetching packages/containers, retrieval of information about packages from the
local filesystem, as well as delivering the required applications to the Execution Environment to run.
The POC demonstrated the use of multiple Execution Environments (EE) allowing for mixed service
operation. A Base EE was used to allow specific components that did not require any separation, such as
new native images, to be upgraded into the main root filesystem. A key feature of the Base EE was to allow
direct patching of the main OpenWrt system, enabling the installation of a new native package directly
into the running system. The use of the Base EE also allowed for a bit more package information to be
included to be able to authenticate the packages, etc. A so-called Native EE was added to enable root
filesystem separation, meaning that a new Native package would not overwrite anything in the base root
filesystem, enabling isolation from the running system. The final EE was the Container EE, where new
3rd party applications needed isolation from the main system and would have limits placed upon all
resource usage, preserving system stability.
In all cases the LCM was responsible for managing the different EEs, where it would perform actions on
the EE and deal with the returned status. Operations such as install, uninstall, start and stop were all
supported.
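The sketch below gives a flavour of that dispatch model; the class and method names are illustrative rather than taken from the POC code, with opkg standing in for the Base EE's native patching and LXC for the Container EE.

import subprocess

class ExecutionEnvironment:
    # Minimal interface the LCM expects from every EE (illustrative only).
    def install(self, target: str) -> None: ...
    def uninstall(self, target: str) -> None: ...
    def start(self, target: str) -> None: ...
    def stop(self, target: str) -> None: ...

class BaseEE(ExecutionEnvironment):
    # Patches the running OpenWrt system directly with a native package.
    def install(self, target):
        subprocess.run(["opkg", "install", target], check=True)
    def uninstall(self, target):
        subprocess.run(["opkg", "remove", target], check=True)

class ContainerEE(ExecutionEnvironment):
    # Isolated 3rd party applications with namespace/RAM/CPU limits (LXC here).
    def start(self, target):
        subprocess.run(["lxc-start", "-n", target], check=True)
    def stop(self, target):
        subprocess.run(["lxc-stop", "-n", target], check=True)

class LifeCycleManager:
    # Routes SMM requests (install, uninstall, start, stop) to the right EE.
    def __init__(self):
        self.ees = {"base": BaseEE(), "container": ContainerEE()}
    def handle(self, ee_name: str, operation: str, target: str) -> None:
        getattr(self.ees[ee_name], operation)(target)

# e.g. an ACS request mapped by the TR-069 agent into:
lcm = LifeCycleManager()
lcm.handle("container", "start", "smartthings")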
The ACS platform was used to issue commands to the CPE platform, where they were handed over to the
LCM to perform all actions related to running services within the platform. The System EE listed above
allowed for the ACS to request package updates (OpenWrt) to be applied to the running system, while the
Supervisor EE was used to actually run the isolated applications. The Supervisor EE is responsible for
handling the environment that applications run within.
The Supervisor EE supported features such as package verification and install/remove, service
startup/shutdown, as well as isolation (including limiting namespace, RAM and CPU). The OpenWrt
Summit 2018 demonstration showed some basic containers running, as well as a more complex setup that
involved Samsung SmartThings integrated in a container, downloaded into the system and using a local
Zigbee USB dongle to interact with an external Zigbee lightbulb. Other aspects such as CPU resource
limitation were also demonstrated. All of the interactions in the demonstration were controlled using the
SMM functionality on the connected ACS.
The SDP/LCM system as currently defined delivers a complete solution for managing containers and
even native applications on embedded gateways. It offers orchestration through the connected ACS
(although ACS platforms probably require custom extensions to really hope to act as orchestration
systems), and works on OpenWrt. Work is ongoing to get this functionality working on RDK and
hopefully the overall SDP/LCM solution software will be opensourced at some stage.
The work began on a platform with 512MB RAM and 4GB flash. The POC team already had extensive
experience with Nomad Server and were using this to understand the client side and how this would scale.
The Nomad Agent (that runs on a client device, such as a gateway) includes support for so-called “Task
Drivers”, allowing it to manage multiple types of execution environments, including Docker, Isolated
Fork/Exec, Raw Fork/Exec, LXC, Java, QEMU, Rkt, and custom drivers. For the purpose of the POC, Docker
was the chosen environment.
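A trimmed example of what registering such a Docker job against the Nomad server's HTTP API looks like is shown below; the server address, datacenter name, image and resource figures are all illustrative.

import requests

NOMAD = "http://nomad-server.example.com:4646"   # Nomad's default HTTP port

# A cut-down JSON form of a Nomad job: one task group, one Docker task.
job = {
    "Job": {
        "ID": "iot-agent",
        "Name": "iot-agent",
        "Type": "service",
        "Datacenters": ["home-gateways"],
        "TaskGroups": [{
            "Name": "iot",
            "Count": 1,
            "Tasks": [{
                "Name": "iot-agent",
                "Driver": "docker",   # the Task Driver chosen for the POC
                "Config": {"image": "registry.example.com/operator/iot-agent:1.0"},
                "Resources": {"CPU": 200, "MemoryMB": 64},
            }],
        }],
    }
}

# Register the job; Nomad returns an evaluation ID that can be tracked.
resp = requests.post(f"{NOMAD}/v1/jobs", json=job, timeout=10)
resp.raise_for_status()
print(resp.json())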
The POC demonstrated that the platform was well capable of delivering the required services. The amount
of available flash was considerably more than on most embedded platforms today, where for example a
gateway might only be designed with 128MB flash, which is only 3.2% of that available in the POC
platform. The POC was capable of demonstrating the use of Nomad for orchestrating the local services, as
well as showing that an IOT container application was able to function and access a local Zigbee interface
on the gateway.
This model has not really succeeded. The costs of offering the vCPE platform in the cloud while also
offering gateway HW in the home never quite added up to something that was more economic than a
dedicated gateway in the home. The truth is that vCPE hardware platforms, especially with Wi-Fi, are not
too different in costs compared to equivalent home gateway platforms, in most cases the actual SOC is the
same and offers the same MIPS processing power. The main difference would be RAM/flash costs, with a
vCPE platform requiring less of both (though, ironically, it could be forced to include more than
required just to hit the RAM/flash price sweet spot).
So, is vCPE dead? The answer is no, as some very good pieces of vCPE can be used. The idea of isolating
certain traffic flows and certain Virtual Network Function (VNF) software to the cloud is an idea that has
persisted and been demonstrated to work well. In this case a traffic tunnel connects the home gateway to
the remote cloud VNF, where all the hard work is performed. One of the main examples of this is “Wi-Fi
Public HotSpot” services. The traffic to be tunneled is simply the traffic carried on one of the Wi-Fi
SSIDs that the gateway offers. Every data packet received from that SSID is tunneled using a
‘softGRE’ tunnel to the cloud VNF. The cloud VNF terminates the tunnel, extracts the traffic and
operates a Wireless Access Gateway (WAG) function that deals with AAA and all the required traffic
management (DHCP, etc.) and encapsulation/decapsulation, before dispatching the traffic off to the
internet. This model is one of the first real examples of vCPE and has been widely adopted.
However, it’s a very basic option, using a coarse traffic filter (the entire SSID) to isolate traffic. The main
function the gateway must provide is the ability to isolate such traffic and pack it into a SoftGRE tunnel
connected to the cloud VNF, so it definitely minimizes SW complexity in the gateway.
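A minimal sketch of that gateway-side function is shown below; the interface names and IP addresses are hypothetical. A layer-2 gretap tunnel towards the WAG is simply bridged with the hotspot SSID interface, so every frame received on that SSID is carried to the cloud VNF and nothing on the bridge touches the subscriber's private LAN.

import subprocess

WAN_IP = "198.51.100.23"   # gateway WAN address (illustrative)
WAG_IP = "203.0.113.10"    # cloud WAG/VNF terminating the tunnel (illustrative)

# Layer-2 GRE (gretap) so subscriber Ethernet frames are carried transparently;
# the WAG handles DHCP, AAA and forwarding at the far end.
subprocess.run(["ip", "link", "add", "gre-hotspot", "type", "gretap",
                "local", WAN_IP, "remote", WAG_IP], check=True)
subprocess.run(["ip", "link", "set", "gre-hotspot", "up"], check=True)

# Bridge the hotspot SSID interface (wlan1 here) with the tunnel.
subprocess.run(["ip", "link", "add", "br-hotspot", "type", "bridge"], check=True)
subprocess.run(["ip", "link", "set", "br-hotspot", "up"], check=True)
for iface in ("wlan1", "gre-hotspot"):
    subprocess.run(["ip", "link", "set", iface, "master", "br-hotspot"], check=True)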
More advanced versions of vCPE have started to be developed, using more fine-grained data plane
filtering options. In a lot of cases, traffic that is filtered must be transported out to a remote cloud VNF
where the actual software processing occurs, typically through a SoftGRE or equivalent tunnel.
Alternative options also exist where this traffic could be handed over to a local container or software
component, mixing the different models where it makes sense.
Data Plane
Traditionally, Linux network tools, such as iptables, have been used to manipulate traffic flows, providing
low level filtering and redirection/etc. These tools are used by some of the key networking functions
within the routing platform, but typically are not open to higher layer software components, as they have
the potential (if used incorrectly) of wrecking the network packet forwarding of a system. No real
programmatic API has been developed to expose this interface to 3rd parties. However, Software Defined
Networking (SDN) does offer some new ways around this.
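For reference, the sketch below shows this traditional approach: an iptables mark plus a policy-routing rule steers one class of traffic (DNS, as an illustrative example) over a tunnel interface towards a cloud VNF. The mark value, routing table number and interface names are hypothetical.

import subprocess

# Mark DNS traffic arriving from the LAN bridge.
subprocess.run(["iptables", "-t", "mangle", "-A", "PREROUTING",
                "-i", "br-lan", "-p", "udp", "--dport", "53",
                "-j", "MARK", "--set-mark", "0x10"], check=True)

# Route anything carrying that mark via the tunnel to the VNF.
subprocess.run(["ip", "rule", "add", "fwmark", "0x10", "table", "100"], check=True)
subprocess.run(["ip", "route", "add", "default", "dev", "gre-vnf", "table", "100"],
               check=True)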
The basic tools of SDN, such as OpenFlow and Open vSwitch, have offered the ability to isolate incoming
traffic flows on a platform and modify or redirect such flows for additional software processing, including
forcing a flow to be sent out an interface that happens to be a tunnel or another local interface, possibly
connected to a container. New software approaches for gateways are starting to reuse this type of
processing.
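The OVS equivalent is sketched below; the bridge name, client MAC address, OpenFlow port number and VNF address are illustrative. A GRE port is added to the bridge, and an OpenFlow rule steers one client's traffic out through it while everything else follows normal L2 forwarding.

import subprocess

# Attach a GRE tunnel port to the home bridge; the cloud VNF terminates it.
subprocess.run(["ovs-vsctl", "add-port", "br-home", "gre-vnf", "--",
                "set", "interface", "gre-vnf",
                "type=gre", "options:remote_ip=203.0.113.10"], check=True)

# The OpenFlow port number OVS assigned to gre-vnf ("ovs-ofctl show br-home");
# assumed to be 10 here for brevity.
GRE_OFPORT = 10

# Steer one client's traffic into the tunnel; all other traffic keeps using
# the bridge's NORMAL forwarding behaviour.
subprocess.run(["ovs-ofctl", "add-flow", "br-home",
                f"priority=100,dl_src=aa:bb:cc:dd:ee:ff,actions=output:{GRE_OFPORT}"],
               check=True)
subprocess.run(["ovs-ofctl", "add-flow", "br-home",
                "priority=0,actions=NORMAL"], check=True)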
The benefit of this model is that once the software agent is enabled on the gateway platform, then any
interesting traffic flows can be dispatched via a tunnel interface to a remote cloud VNF, without requiring
a new firmware release on the gateway.
Integrating the opensource OVSDB and Open vSwitch (OVS) into a gateway has enabled OpenSync to
exert very fine-grained control over traffic passing through a platform, with the possibility of redirecting
such traffic to a tunnel interface for carriage to a cloud VNF. The benefit of open source OVS is that it is
possible for 3rd party software to also use the same infrastructure if required.
A similar packet interception model that embeds an NFLua component linking to the Linux kernel
network packet handling has also been developed. This has been used to provide sophisticated AI driven
cybersecurity and network intelligence features for network operators. The ability to deploy an agent and
then dynamically reconfigure its basic rules provides a very powerful model that allows for independent
upgrades/etc. without having to involve an operator at all. Such an agent module could also be repurposed
to provide a packet filtering option, like OpenFlow, to redirect traffic to a remote cloud VNF.
In these traffic interception/filtering/redirection cases, the traffic is either hair-pinned out to the cloud
VNF and sent back to the gateway, is completely consumed by the remote service, or dispatched to a local
agent present in the gateway that also performs processing or other software handling. Using these tools
enables easier manipulation of the data plane than ever before and offers more organized control over
how to isolate traffic and direct it to software components (local or remote). More effort is being put in by
SoC providers to ensure that hardware acceleration can also be applied to this traffic manipulation,
ensuring that software can access the high speeds expected from gateways.
Control Plane
As mentioned, a lot of container systems have their own proprietary backends for controlling how
containers are deployed and operated on compute platforms (e.g. gateways in this case). These tools are
more concerned with treating the containers as black boxes and satisfying the “label” of resource
requirements that come with the container.
In the case of the ARRIS Docker Container POC, additional support was provided to allow the
manipulation of the Docker system from a remote ACS by using TR-069 extensions mapped into the TR-
157 SMM system, allowing some more native (from an operator perspective) management to be
employed. Kubernetes was not used to provide orchestration in this instance.
The SDP/LCM system that Vodafone has created also uses a similar model to ARRIS, relying on TR-157
EE, EU and DUs to enable a very flexible control system for managing sophisticated software delivery
options and life cycle management. This system also relies on the use of TR-069 to assist with
orchestration/etc.
Nomad is another orchestration system capable of flexibly managing many different images (via the Nomad
agent). It is capable of dealing with Balena, Docker and LXC container images, as well as many other
image options. Like Kubernetes, Nomad can scale very well in a data center setting, coping with very
high container deployment scenarios. However, Nomad and Kubernetes may not be able to scale to the
required number of containers when deployed in an operator environment with thousands or millions of
devices with multiple containers per device.
Existing ACS platforms may be able to cope with the scale of unique devices, but need additional
“orchestration” extensions to be added to them. ACS platforms already deal with firmware image
management at this scale today.
What currently is not understood is the level of concurrency of applications running within gateways.
What this means is whether the limited resources in a gateway are going to be under pressure if multiple
applications are deployed, and if some clever orchestration technique will be required to constantly add
and remove applications on demand or on a timed basis.
What is also not understood today is if operators will only allow their own curated container based
software and services to run on these gateway platforms, or will decide to open up and potentially
monetize the platform, allowing 3rd party applications to run, similar to the Google Play Store or iOS
App Store. Given the high-level interfaces and various access controls available with these, it does appear
as a possibility, and may allow for hybrid mobile applications and other software services (such as IOT
systems) to be developed that rely on an “always on presence” in the home rather than having to pay for
high latency cloud based servers.
In terms of concurrency and high application counts, one of the easiest ways of addressing this is to
basically ensure sufficient storage and memory is available in the platform. Such an approach means
applications are rarely removed and replaced with other applications, thus avoiding a never ending game
of Tetris that the orchestration system must play – constantly trying to fit apps into available space. This
does come at a slight cost of extra storage (the RAM can be freed up if an app is no longer active) but
removes the need for a complex orchestrator.
If the orchestration complexity outlined can be removed (through extra storage/etc.), then it’s quite
feasible to believe that the existing NMS/ACS systems that already manage vast numbers of broadband
devices should be capable of supporting the required orchestration function. Existing TR-069 systems
maybe usable, but the upgraded Universal Services Platform (USP/TR-369) protocol from the Broadband
The days of all software being delivered as a monolithic firmware image are numbered. The availability
of all the required elements to create new portable software is very encouraging. The new data plane and
control plane options give application developers (including ISVs, open source developers and the
MSO community) the option of creating new applications not considered before. Along with the system
high level and low-level APIs, developers are able to bundle all their required libraries and executables
within a container based system (be it LXC, Docker, Balena or others), and have these orchestrated on to
gateway platforms. The addition of LCM/SDP, as well as the reuse of Docker/Kubernetes, Nomad, or TR-
069/USP based orchestration systems, will give cable operators more control over what to deploy and
when/how to deploy it.
Right now there are a multitude of Docker based container applications, while only a few 3rd party
container applications have been totally focused on embedded gateways. Expect this to change very soon
as hardware profiles change and the various software layers and interfaces are developed and adopted in
the multiple routing platforms that exist in the embedded broadband gateway world.
Native applications and container applications need to take advantage of the new APIs, HALs and Service
Delivery Platforms that are emerging to ensure fast adoption on to gateway platforms.
SDP/LCM and Docker are good options to consider for container deployments, with other platforms like
Nomad also to be considered in this space. However, orchestration systems that can manage the scale of
broadband gateway deployments and mixed deployed services have not been realized yet, resulting in the
potential use of existing or future ACS (TR-069/USP based) to handle this workload.
Getting these new software services into gateways is essential for MSOs to entice and retain subscribers.
OpenSync; https://www.opensync.io/documentation