Enabling Edge Devices that Learn from Each Other: Cross Modal Training for Activity Recognition

ABSTRACT
Edge devices rely extensively on machine learning for intelligent inferences and pattern matching. However, edge devices use a multitude of sensing modalities and are exposed to wide-ranging contexts. It is difficult to develop separate machine learning models for each scenario, as manual labeling is not scalable. To reduce the amount of labeled data and to speed up the training process, we propose to transfer knowledge between edge devices by using unlabeled data. Our approach, called RecycleML, uses cross modal transfer to accelerate the learning of edge devices across different sensing modalities. Using human activity recognition as a case study, over our collected CMActivity dataset, we observe that RecycleML reduces the amount of required labeled data by at least 90% and speeds up the training process by up to 50 times in comparison to training the edge device from scratch.

CCS CONCEPTS
• Computing methodologies → Transfer learning; Neural networks; Learning latent representations; • Hardware → Sensor applications and deployments;

KEYWORDS
edge devices, transfer learning, cross modality, shared latent representation, activity recognition

ACM Reference format:
Tianwei Xing, Sandeep Singh Sandha, Bharathan Balaji, Supriyo Chakraborty, and Mani Srivastava. 2018. Enabling Edge Devices that Learn from Each Other: Cross Modal Training for Activity Recognition. In Proceedings of EdgeSys '18: International Workshop on Edge Systems, Analytics and Networking, Munich, Germany, June 10–15, 2018 (EdgeSys '18), 6 pages. https://doi.org/10.1145/3213344.3213351
∗ Both authors contributed equally to this work.

1 INTRODUCTION
Edge devices are typically equipped with a wide variety of sensing modalities for tracking environmental markers. To provide insights and enable context-aware applications (e.g., user activity recognition [25], workout tracking [22], speech recognition [8]), the data collected on these devices are used to train deep neural network models. However, to fully realize the learning-at-the-edge paradigm, several challenges still need to be addressed. In particular, the model training process needs to handle insufficient labeled data and the heterogeneity in inter-device sensing modalities.
As a step towards addressing these concerns, we propose RecycleML, a mechanism to transfer knowledge between edge devices. Our approach is guided by the observation that application-specific semantic concepts can be better associated with features in the higher layers (close to the output side) of a network model [5]. This observation allows us to conceptualize the layers of the different networks as an hourglass model, as shown in Figure 1. The lower half of the hourglass corresponds to the lower layers (close to the input side) of the individual models (trained on specific sensing modalities). The narrow waist is the common layer (latent space) into which the lower layers project their data for knowledge transfer. The upper half of the hourglass comprises the task-specific higher-layer features, which are trained in a targeted fashion for task-specific transfer.
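The hourglass structure can be made concrete with a short sketch. The following minimal PyTorch illustration is not the paper's architecture: the layer widths, the 128-dimensional waist, and the fully-connected encoders are hypothetical placeholders; only the overall shape (modality-specific lower layers projecting into a common latent space, with task-specific higher layers on top) follows the description above.

```python
# Minimal sketch of the hourglass model: per-modality lower layers, a shared
# latent "waist", and task-specific higher layers. Dimensions are assumptions.
import torch
import torch.nn as nn

LATENT_DIM = 128   # width of the hourglass waist (assumed)
NUM_CLASSES = 7    # 7 activities, as in the CMActivities dataset


def make_lower_layers(input_dim: int) -> nn.Module:
    """Lower half of the hourglass: modality-specific layers that project
    raw features into the shared latent space."""
    return nn.Sequential(
        nn.Linear(input_dim, 256), nn.ReLU(),
        nn.Linear(256, LATENT_DIM), nn.ReLU(),
    )


class HourglassModel(nn.Module):
    """One modality's model: lower layers -> shared latent -> task head."""
    def __init__(self, lower: nn.Module, head: nn.Module):
        super().__init__()
        self.lower = lower      # trained per modality
        self.head = head        # task-specific higher layers, can be reused

    def forward(self, x):
        return self.head(self.lower(x))


# Example: a vision model and an IMU model that meet at the same latent waist
# and share the same task-specific head. Input dimensions are placeholders.
activity_head = nn.Linear(LATENT_DIM, NUM_CLASSES)
vision_model = HourglassModel(make_lower_layers(input_dim=1024), activity_head)
imu_model = HourglassModel(make_lower_layers(input_dim=300), activity_head)
```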
To evaluate RecycleML, we emulate edge devices with three sensing modalities, namely vision, audio and inertial (IMU) sensing, as shown in Figure 2. We perform zero-shot learning [23], i.e., we use zero training labels, across different sensing modalities when they are performing the same classification task. We achieve this by training the target edge device model to have the same latent space as the source model. RecycleML can also learn to expand the classification tasks of the transferred model with very few training examples.
Our results across a mix of sensory substitutions and task transfers show that, over our collected CMActivity dataset, RecycleML reduces the amount of labeled data required to train edge devices by at least 90% and speeds up the training process by up to 50 times after doing knowledge transfer using unlabeled data.
Our contributions are as follows:
(1) We combine the idea of transfer learning (lower-layer transfer) with sensory substitution (higher-layer transfer) and propose a unified framework in which the knowledge in every part of a network can be transferred.
We explore two different methods of task transfer:
• PureTransfer directly uses the higher layers of model M_X for the new model M_Y. In this case no further training is needed and no labeled data is required.
• Transfer+LimitedTrain freezes the network g, adds higher layers to M_Y, and retrains only the higher layers using limited labeled data.
In the first scenario, since the tasks are the same, we can use both methods. In the second and third scenarios, direct transfer of the higher layers from model M_X to model M_Y does not work, as M_X does not give the desired output. Hence, we use the second method. In our experiments, we evaluate scenario (i) of task transfer using both PureTransfer and Transfer+LimitedTrain, and scenario (ii) using Transfer+LimitedTrain.
In our experiments, we used the output of the last hidden layer, after removing the final output layer from model M_X, as the f transformation. Here f and g serve as shared latent representations across modalities. We add a single task-specific layer to g to generate model M_Y. In future work, we will explore different choices of f and the addition of multiple task-specific output layers to g.
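The transfer step implied by this description can be sketched as follows. This is a hedged illustration, not the paper's code: the mean-squared-error alignment loss, the optimizer settings, and the helper names are assumptions, since the exact procedure is defined in Section 2.2.1 (not reproduced in this excerpt).

```python
# Hedged sketch: train the target modality's lower network g so that its latent
# output matches f(x) on synchronized unlabeled pairs, then build the task head
# either by reusing the source head (PureTransfer) or by training a new head on
# a small labeled set (Transfer+LimitedTrain). Loss choice is an assumption.
import torch
import torch.nn as nn


def align_latent(f_source: nn.Module, g_target: nn.Module,
                 paired_loader, epochs: int = 500, lr: float = 1e-3):
    """Match g(y) to f(x) on time-synchronized, unlabeled (x, y) pairs."""
    f_source.eval()                      # pre-trained source lower layers stay fixed
    opt = torch.optim.Adam(g_target.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x_src, y_tgt in paired_loader:
            with torch.no_grad():
                target_latent = f_source(x_src)
            loss = mse(g_target(y_tgt), target_latent)
            opt.zero_grad()
            loss.backward()
            opt.step()


def pure_transfer(g_target: nn.Module, source_head: nn.Module) -> nn.Module:
    """PureTransfer: reuse the source model's higher layers unchanged."""
    return nn.Sequential(g_target, source_head)


def transfer_limited_train(g_target: nn.Module, head: nn.Module,
                           labeled_loader, epochs: int = 100, lr: float = 1e-3):
    """Transfer+LimitedTrain: freeze g and train only the new head on a
    small labeled set."""
    for p in g_target.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for y_tgt, labels in labeled_loader:
            loss = ce(head(g_target(y_tgt)), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return nn.Sequential(g_target, head)
```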
3 EVALUATION

3.1 Dataset
For our experiments, we collected a new dataset, called CMActivities, composed of videos for the vision and audio modalities and corresponding IMU data (accelerometer and gyroscope) from sensors on the left and right wrists. We collected 767 videos of roughly 10 seconds each from 2 users doing 7 different activities at 6 locations. Every video contains a single activity and is used to label the vision, audio and IMU data. The total duration of collected data for each modality is 125 minutes.

Table 1: Description of CMActivities dataset (Activity, Number of Videos, Duration (sec))

We checked the time synchronization across the devices and observed a maximum time difference of 0.5 seconds between the observer smartphone and the user smartphone. We leave it for future work to explore the effect of poor time synchronization across devices in observing the same event. We expect the knowledge transfer capabilities of RecycleML to degrade as the time difference between devices increases.
The details of CMActivities are shown in Table 1. The data collection was done at different locations, with the two users wearing separate sets of clothes at each location, so as to make sure that the trained classifier learns the activity features and is least affected by environmental factors. We split the 767 videos and IMU sessions into three parts: a training dataset (624), a testing dataset (71) and a personalization dataset (72). The training and testing datasets contain 7 activities at 5 different locations, and the personalization dataset contains 5 activities at the 6th location. The Go Upstairs and Go Downstairs activities are not present in the personalization dataset.
The training dataset is further split into 3 parts: a Pre-Training set, a Transfer set and a LimitTrain set. The personalization dataset is split into PersonalTrain and PersonalTest sets. The testing dataset is used only for evaluation. The frame rate of the video is 29 fps, and the sampling frequencies of audio and IMU are 22050 Hz and 25 Hz respectively. We use a window of 2 seconds to extract vision, audio and IMU features from the dataset, with a sliding window of 0.4 seconds between consecutive windows. In the case of vision and IMU, we use the raw features directly as input to the models. We extracted features from the raw audio data using Librosa [16] and use them as the input features. Specifically, we extract mel-frequency cepstral coefficients (MFCC) [15], power spectrogram [6], mel-scaled spectrogram, spectral contrast [13] and tonal centroid features (tonnetz) [10].
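A minimal sketch of this windowing and audio featurization is given below, assuming librosa defaults where the paper does not specify parameters; the number of MFCCs and the mean-over-frames aggregation are illustrative choices.

```python
# Hedged sketch of the 2 s / 0.4 s sliding-window audio featurization described
# above. n_mfcc and the aggregation of per-frame features are assumptions.
import numpy as np
import librosa

SR = 22050        # audio sampling rate, as stated in the text
WIN_SEC = 2.0     # window length in seconds
HOP_SEC = 0.4     # stride between consecutive windows


def window_features(y: np.ndarray, sr: int = SR) -> np.ndarray:
    """Concatenate the five audio features named in the text for one window."""
    power = np.abs(librosa.stft(y)) ** 2                      # power spectrogram
    return np.concatenate([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).mean(axis=1),
        power.mean(axis=1),
        librosa.feature.melspectrogram(y=y, sr=sr).mean(axis=1),
        librosa.feature.spectral_contrast(S=np.sqrt(power), sr=sr).mean(axis=1),
        librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr).mean(axis=1),
    ])


def sliding_windows(y: np.ndarray, sr: int = SR):
    """Yield one feature vector per 2 s window, advancing 0.4 s at a time."""
    win, hop = int(WIN_SEC * sr), int(HOP_SEC * sr)
    for start in range(0, len(y) - win + 1, hop):
        yield window_features(y[start:start + win], sr)


# Example usage: features = list(sliding_windows(librosa.load("clip.wav", sr=SR)[0]))
```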
In total, we have 11976 samples in training (5000 samples for the Pre-Training set, 6000 samples for the Transfer set, and 976 samples for the LimitedTrain set), 1377 samples in testing, and 1592 samples in personalization (475 samples for the PersonalTrain set and 1117 samples for the PersonalTest set) for each modality.

Table 2: Testing accuracy of baseline models
(c) IMU Network is a CNN. It has 2 convolutional modules (convolution layer + max-pooling layer), 3 fully-connected layers and an output layer. 57K parameters are trainable in this network.
Table 2 shows the summary of the individual models. The models are trained using the training dataset and tested on the testing dataset. These baseline models are trained using the SGD [4] and Adam [14] optimizers with a learning rate of 0.001. We save the models with the best test accuracy after training for 500 epochs.
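A rough sketch of such an IMU network is shown below. Only the module counts come from the text; the channel counts, kernel sizes, and the assumed input of two-wrist accelerometer and gyroscope streams at 25 Hz are hypothetical and will not reproduce the reported 57K parameter count exactly.

```python
# Hedged sketch of the IMU baseline: 2 conv+pool modules, 3 fully-connected
# layers, and an output layer, per the text. All sizes below are assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 7          # 7 activities in CMActivities
IMU_CHANNELS = 12        # assumed: 3-axis accelerometer + gyroscope on two wrists
WINDOW_LEN = 50          # 2 s window at 25 Hz

imu_net = nn.Sequential(
    # Convolutional module 1
    nn.Conv1d(IMU_CHANNELS, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool1d(2),
    # Convolutional module 2
    nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool1d(2),
    nn.Flatten(),
    # Three fully-connected layers
    nn.Linear(32 * (WINDOW_LEN // 4), 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    # Output layer
    nn.Linear(32, NUM_CLASSES),
)

# Baseline training uses Adam with a learning rate of 0.001, as in the text.
optimizer = torch.optim.Adam(imu_net.parameters(), lr=0.001)
```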
3.3 Knowledge Transfer Results
Knowledge transfer results are presented in Table 3. In the first and second experiments, the vision device D_V1 is trained while the acoustic device D_A1 and the wearable device D_W, respectively, are untrained. In the third and fourth experiments, the wearable device D_W is trained while the vision device D_V2 and the acoustic device D_A2 are untrained. For each of these four transfers, we follow the same procedure. Taking vision device D_V1 to acoustic device D_A1 as an example, we first train the vision model of D_V1 from scratch using the Pre-Training set (5000 samples) of the training dataset. We use the standard SGD optimizer with a learning rate of 0.001, and the training finishes in 500 epochs. We then use D_V1 as a pre-trained device to transfer knowledge to D_A1, following the procedure described in Section 2.2.1. In the knowledge transfer process, we use the Adam optimizer with a learning rate of 0.001 and run it for 500 epochs. The data used in the transfer process are the synchronized, unlabeled vision and sound data from the Transfer set (6000 samples) of the training dataset.
After transfer, the higher layers of the audio model can be created using the two methods, Pure-Transfer and Transfer+LimitedTrain, discussed in Section 2.2.2, since both D_V1 and D_A1 are doing the same task. In the Pure-Transfer method, the audio model uses the output layer of the vision model directly. In Transfer+LimitedTrain, we train a new output layer for the audio model. We select a small labeled set of 500 samples randomly out of the 976 samples in the LimitedTrain set of the training dataset and name it LimitTrainSet. We use the LimitTrainSet to train the output layer of the audio model for 100 epochs using the Adam optimizer. As a comparison, we also trained an audio model from scratch using the same LimitTrainSet for 500 epochs. We use more epochs for training from scratch as it takes more time to converge. The other three transfers are tested in the same way. The audio and IMU models which are trained from scratch use the Adam optimizer. Note: in the Video to IMU transfer, it takes more time to transfer the knowledge, so we perform the knowledge transfer for 1000 epochs. In real implementations, the knowledge transfer process for edge devices can either be done in the background or at a server using unlabeled data, so as to avoid this overhead.
Table 3 shows the knowledge transfer results between devices doing the same task of activity recognition. Model performance is measured by test accuracy. Considering row 1, Trained-Device is the accuracy of the pre-trained device D_V1. Pure-Transfer and Transfer+LimitedTrain are the accuracies of device D_A1 using the two methods respectively. The last cell shows the accuracy of the audio model trained from scratch using LimitTrainSet. As we can see, both Pure-Transfer and Transfer+LimitedTrain achieve better accuracy than training from scratch. This shows that the shared latent feature representation is successful in transferring knowledge across devices of different modalities. We also observe that Transfer+LimitedTrain usually gives the best performance.

Figure 3: Transfer+LimitedTrain converges in 10 epochs whereas training from scratch requires training for around 500 epochs.

In our experiments, we train every model 10 times to preclude the effect of randomness. Based on the results, significance tests (compared to training from scratch) are carried out using a t-test. We find that Transfer+LimitedTrain outperforms training from scratch (p < 0.005) in three cases (Video to Audio, Video to IMU, IMU to Audio), and p < 0.4 for the case of IMU to Video transfer. This is because the video model is complicated and sensitive, and the performance of the video model trained from scratch fluctuates.
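The significance test can be reproduced with a few lines of SciPy. The sketch below assumes an independent two-sample t-test over the 10 runs per configuration (the paper does not state whether the test is paired), and the accuracy values are hypothetical placeholders rather than the paper's results.

```python
# Sketch of the significance test: compare 10 runs of Transfer+LimitedTrain
# against 10 runs of training from scratch. Accuracy values are placeholders.
from scipy import stats

transfer_limited_train_acc = [0.91, 0.90, 0.92, 0.89, 0.91, 0.90, 0.93, 0.92, 0.90, 0.91]
from_scratch_acc           = [0.84, 0.86, 0.83, 0.85, 0.87, 0.84, 0.85, 0.83, 0.86, 0.84]

t_stat, p_value = stats.ttest_ind(transfer_limited_train_acc, from_scratch_acc)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # the paper reports p < 0.005 for three of the four transfers
```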
Gupta et al. [9] perform supervision transfer between labeled RGB images and unlabeled depth and optical flow images. Aytar et al. [2] show that visual knowledge can be transferred from vision to sound.
The prior works either focus on image and text data, or take two modalities (vision and audio) from the same source into consideration. In RecycleML, we consider three commonly available sensing modalities on edge devices from multiple sources, and create a unified representation that bridges them. This allows edge devices to use multimodal knowledge transfer across the different sensing modalities of ambient sensors (vision and audio) and wearable sensors (IMU) for the first time.

5 DISCUSSION
While RecycleML shows promise both in handling the paucity of labeled data and in speeding up model training across multiple modalities, the ability of the approach to generalize to different applications and larger datasets needs further investigation. Furthermore, our experiments indicate that while the trained models can be personalized to a specific environment, they need regularization to generalize to new settings.
For cross modal knowledge transfer using RecycleML, we need unlabeled but synchronized data. In our experiments, since audio and video data are captured by the same device, they are naturally synchronized. In addition, we used the default smartphone timestamps, synchronized through the Network Time Protocol (NTP) [17] service, to synchronize the IMU device with the video and sound device. In real settings, however, edge devices have to be time synchronized in order to observe the same event at the same time.
In our experiments, we chose the fully connected layer (immediately prior to the output layer) as the common latent space. In future work, we plan to explore different choices for the shared representation layer, for efficient sensory substitution and task transfer on edge devices.

6 CONCLUSION
Heterogeneity in the sensing modalities of edge devices, together with the lack of labeled training data, represent two of the most significant barriers to enabling the learning-on-the-edge paradigm. Towards this end, we presented RecycleML, a system that enables multi-modality edge devices to perform knowledge transfer between their models by mapping their lower layers to a shared latent space representation. RecycleML further allows task-specific transfer between models by targeted retraining of the higher layers beyond the shared latent space, reducing the amount of labeled data needed for model training. Our initial experiments, performed using multi-modality data (vision, audio, IMU) for activity recognition, show that a transfer model trained using RecycleML leads to reduced training time and increased accuracy compared to an edge model trained from scratch using limited labeled data.

7 ACKNOWLEDGEMENT
This research was sponsored by the U.S. Army Research Laboratory and the UK Ministry of Defence under Agreement Number W911NF-16-3-0001, by the National Institutes of Health under award #U154EB020404, and by the National Science Foundation under award #1636916. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the funding agencies. The U.S. and UK Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

REFERENCES
[1] Aytar, Y., Castrejon, L., Vondrick, C., Pirsiavash, H., and Torralba, A. Cross-modal scene networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).
[2] Aytar, Y., Vondrick, C., and Torralba, A. SoundNet: Learning sound representations from unlabeled video. In Advances in Neural Information Processing Systems (2016), pp. 892–900.
[3] Ba, J., and Caruana, R. Do deep nets really need to be deep? In Advances in Neural Information Processing Systems (2014), pp. 2654–2662.
[4] Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010. Springer, 2010, pp. 177–186.
[5] Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. DeCAF: A deep convolutional activation feature for generic visual recognition. In International Conference on Machine Learning (2014), pp. 647–655.
[6] Ellis, D. Chroma feature analysis and synthesis. Resources of Laboratory for the Recognition and Organization of Speech and Audio (LabROSA) (2007).
[7] Frome, A., Corrado, G., Shlens, J., Bengio, S., Dean, J., Ranzato, M., and Mikolov, T. DeViSE: A deep visual-semantic embedding model. In Neural Information Processing Systems (NIPS) (2013).
[8] Graves, A., Mohamed, A.-r., and Hinton, G. Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (2013), IEEE, pp. 6645–6649.
[9] Gupta, S., Hoffman, J., and Malik, J. Cross modal distillation for supervision transfer. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on (2016), IEEE, pp. 2827–2836.
[10] Harte, C., Sandler, M., and Gasser, M. Detecting harmonic change in musical audio. In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia (2006), ACM, pp. 21–26.
[11] Hinton, G., Vinyals, O., and Dean, J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
[12] Huang, J., and Kingsbury, B. Audio-visual deep learning for noise robust speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (2013), IEEE, pp. 7596–7599.
[13] Jiang, D.-N., Lu, L., Zhang, H.-J., Tao, J.-H., and Cai, L.-H. Music type classification by spectral contrast feature. In Multimedia and Expo, 2002. ICME'02. Proceedings. 2002 IEEE International Conference on (2002), vol. 1, IEEE, pp. 113–116.
[14] Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[15] Logan, B., et al. Mel frequency cepstral coefficients for music modeling. In ISMIR (2000), vol. 270, pp. 1–11.
[16] McFee, B., Raffel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., and Nieto, O. librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference (2015), pp. 18–25.
[17] Mills, D. L. Internet time synchronization: The Network Time Protocol. IEEE Transactions on Communications 39, 10 (1991), 1482–1493.
[18] Münzner, S., Schmidt, P., Reiss, A., Hanselmann, M., Stiefelhagen, R., and Dürichen, R. CNN-based sensor fusion techniques for multimodal human activity recognition. In Proceedings of the 2017 ACM International Symposium on Wearable Computers (2017), ACM, pp. 158–165.
[19] Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., and Ng, A. Y. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (2011), pp. 689–696.
[20] Radu, V., Lane, N. D., Bhattacharya, S., Mascolo, C., Marina, M. K., and Kawsar, F. Towards multimodal deep learning for activity recognition on mobile devices. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct (2016), ACM, pp. 185–188.
[21] Radu, V., Tong, C., Bhattacharya, S., Lane, N. D., Mascolo, C., Marina, M. K., and Kawsar, F. Multimodal deep learning for activity and context recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 4 (2018), 157.
[22] Shen, C., Ho, B.-J., and Srivastava, M. MiLift: Efficient smartwatch-based workout tracking using automatic segmentation. IEEE Transactions on Mobile Computing (2017).
[23] Socher, R., Ganjoo, M., Manning, C. D., and Ng, A. Zero-shot learning through cross-modal transfer. In Advances in Neural Information Processing Systems (2013), pp. 935–943.
[24] Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. Learning spatiotemporal features with 3D convolutional networks. In Computer Vision (ICCV), 2015 IEEE International Conference on (2015), IEEE, pp. 4489–4497.
[25] Yang, J., Nguyen, M. N., San, P. P., Li, X., and Krishnaswamy, S. Deep convolutional neural networks on multichannel time series for human activity recognition. In IJCAI (2015), pp. 3995–4001.