
Response to Review Comments of Submission

TCE-2024-04-1017
Journal: IEEE Transactions on Consumer Electronics
Manuscript ID: TCE-2024-04-1017
Paper Title: Cloud-Edge Collaborative Scalable Tucker-Based Tensor Computations
for Ubiquitous Consumer Electronics Data
Authors: Huazhong Liu, Weiyuan Zhang, Ren Li, Yunfan Zhang, Jihong Ding,
Guangshun Zhang, Hanning Zhang, Laurence T. Yang
First of all, we would like to thank the reviewers for devoting their valuable time
to reviewing the paper, and we deeply appreciate the constructive comments and
helpful suggestions from all reviewers. We would also like to thank the associate
editor for handling the paper and giving us the chance to revise it. Following the
reviewers' suggestions, we have addressed all review comments. The detailed responses
and descriptions of the revisions are presented below.

Response to the Comments of Reviewer 1

Comment1.1: To handle ubiquitous electronics big data, tensor-based big data
analysis methods are universally exploited. This paper aims to propose a set of tensor
operations based on Tucker decomposition and their scalable computations to mitigate
the curse of dimensionality and enhance tensor-based computation efficiency. This
paper tackles an interesting issue, and the efforts of the authors are clear in
investigating the problem and in writing the manuscript. However, the paper still
needs further improvement to enhance its quality; here are some comments on this
paper.
Response1.1: Thanks for your positive comments.

Comment1.2: This paper discusses CANDECOMP-PARAFAC (CP), tensor train
(TT) and Tucker decompositions. What are their differences, as well as their
advantages and disadvantages?
Response1.2: Thanks for your comment.
1. The CP decomposition represents a tensor as a sum of rank-one tensors, and its
representation is very concise. However, the decomposition algorithms are generally
not stable for high-order tensors, and computing the optimal rank is considered an
NP-hard problem.
2. TT decomposition decomposes a high-order tensor into a sequence of low-order core
tensors, reducing the complexity of representing high-dimensional tensors and
enabling efficient computation for tensors with low numerical rank. TT is
advantageous for large-scale problems but can be challenging for higher-dimensional
or irregular tensors.
3. Tucker decomposition decomposes the original high-order tensor into a
low-dimensional core tensor and a series of factor matrices; the feature data is
concentrated in the core tensor, and each part of the decomposition result has a clear
meaning. Therefore, Tucker decomposition is widely used in various fields. However,
when decomposing high-order tensors, the core tensor is still affected by the curse
of dimensionality, limiting its applicability in environments with constrained
computing and storage resources. A small sketch contrasting the three formats is
given below.
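
To make the contrast concrete, the following is a minimal sketch of the three decompositions using the TensorLy library; the tensor size and ranks are illustrative only, not the settings used in the paper.

```python
# A minimal sketch contrasting CP, Tucker, and TT with TensorLy
# (illustrative sizes and ranks only; not the paper's settings).
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac, tucker, tensor_train

X = tl.tensor(np.random.rand(20, 20, 20))

# CP: a sum of rank-one tensors; concise, but choosing the optimal
# rank is NP-hard and the fit can be unstable for high-order tensors.
cp_tensor = parafac(X, rank=5)

# Tucker: a small core plus one factor matrix per mode; the core still
# has r^N entries, i.e., it suffers the curse of dimensionality.
core, factors = tucker(X, rank=[5, 5, 5])

# TT: a chain of third-order cores; storage grows linearly in the order.
tt_tensor = tensor_train(X, rank=[1, 5, 5, 1])
```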

Comment 1.3: The paper uses many different expressions about computational
paradigms, such as "parallel execution," "scalable scheme," and "distributed manner."
What are their differences?
Response 1.3: Thanks for your constructive comment.
In summary, "parallel execution" means performing multiple computations or processes
simultaneously; its main idea is to divide a large problem into smaller ones and solve
them concurrently. "Distributed manner" emphasizes that data is stored and tasks are
executed on different worker nodes in different locations, which collaborate through
network communication to complete the task. "Scalable scheme" focuses more on whether
the system or method can effectively handle a growing workload when resources such as
computing power, storage space, and bandwidth are increased. A good scalable scheme
should include both parallel execution and distributed methods, as both are means of
achieving system scalability; a toy illustration of the parallel idea is given below.
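
As a toy illustration of "parallel execution" (a local, single-machine sketch of our own; the names and worker count are illustrative, and a distributed version would place the chunks on networked nodes instead):

```python
# Toy "parallel execution": split one large job into chunks, solve them
# concurrently, then merge the partial results.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def partial_inner(chunks):
    a_chunk, b_chunk = chunks
    return float(np.dot(a_chunk, b_chunk))   # one worker's sub-problem

def parallel_inner(a, b, n_workers=4):
    pairs = list(zip(np.array_split(a, n_workers),
                     np.array_split(b, n_workers)))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_inner, pairs))  # merge step

if __name__ == "__main__":
    x = np.random.rand(1_000_000)
    print(parallel_inner(x, x))
```

A scalable scheme would then be judged by how this behaves as both the number of workers and the data size grow.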

Comment 1.4: Is there only one way to achieve intra-core parallelism as shown in
Fig.7? This seems different from the intra-core parallelism shown in Fig.5.
Response 1.4: Thanks for your constructive comments.
There is more than one intra-core parallel scheme; Figure 7 only illustrates the
idea of intra-core parallelism. Specifically, Figure 7 represents the idea of
intra-core parallel partitioning when computing between the core tensor and vectors.
Similarly, intra-core parallel partitioning between the core tensor and factor
matrices can also be performed in this way, while Figure 5 reflects the idea of
intra-core parallel partitioning when performing operations between different core
tensors. A sketch of the core-vector partitioning idea is given below.
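
For intuition, here is a minimal sketch of the core-vector partitioning idea (the shapes, block count, and the mode chosen for splitting are all illustrative; the actual scheme follows Fig. 7):

```python
# Intra-core partitioning sketch: split the core tensor along one mode,
# let each sub-task contract its block with the vectors of the other
# modes, then merge the partial results.
import numpy as np

G = np.random.rand(8, 8, 8)                    # core tensor (illustrative)
v1, v2, v3 = (np.random.rand(8) for _ in range(3))

blocks = np.array_split(G, 4, axis=0)          # 4 independent sub-tasks

# Each "worker" contracts its block over modes 2 and 3.
partials = [np.einsum('ijk,j,k->i', B, v2, v3) for B in blocks]

# Merge the partial fibers and finish the mode-1 contraction:
result = np.concatenate(partials) @ v1         # = G x1 v1 x2 v2 x3 v3
```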

Comment1.5: In section 6, the inner product is used as an example of an
intra-Tuckercore scalable model, but Fig.9 also reflects the idea of an
inter-Tuckercore scalable model. It is better to provide an explanation for this
phenomenon.
Response1.5: Thank you for your insightful comment.
In this section, we aim to explain intra-core parallelism, while the Tucker-based
inner product involves operations between the core tensor and vectors, as well as
operations between different core tensors. Therefore, it is possible to further
implement intra-core parallelism on the basis of inter-core parallelism. Using the
Tucker-based inner product as an example is intended to help readers better
understand how intra-core parallelism works.
In the revised version, this point is clarified as follows: "The intra-Tuckercore
scalable model is well-suited for scalable implementations of Tucker decomposition
that involve core tensor operations, including multilinear contraction and the tensor
inner product. The Tucker-based inner product operation involves the multilinear
contraction operation, so we take it as a typical example to analyze."

Comment1.6: What is the reason for choosing different parameters, such as orders,
dimensionalities, ranks, etc., for different tensor operations during experimental
design? The authors need to provide a detailed explanation.
Response1.6: Thank you for your insightful comment.
Choosing different parameters for different tensor operations is determined by their
computational characteristics. The experimental indicators are related to execution
time, so we analyze the time complexity of the different operations to determine
which parameters to vary. According to the complexity analysis in Section VI, the
time complexity of the Tucker mode-n product is O(I^2 r), so changing the
dimensionality I of the original tensor significantly affects the amount of data in
the factor matrix. Therefore, we chose to vary the dimensionality of the original
tensor in these experiments to better understand its impact. For the same reason, the
time complexity of the Hadamard product and the inner product is O((r/n)^(2N)); thus,
we chose to vary the rank r of the core tensor to obtain the experimental results for
the Hadamard product and the inner product.
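
For intuition on why the mode-n product cost is governed by I and r, note that in Tucker format the product with a matrix only touches the n-th factor matrix; a minimal sketch of our own (the helper name is hypothetical, and this is not the paper's scalable algorithm) is:

```python
# In Tucker format (G; U1,...,UN), the mode-n product with a matrix A
# only replaces the n-th factor: (G; ...) x_n A = (G; ..., A @ Un, ...).
# The cost is one (J x I_n) @ (I_n x r_n) multiply, hence the I^2 r term
# when J = I_n = I and r_n = r.
import numpy as np

def tucker_mode_n_product(core, factors, A, n):
    new_factors = list(factors)
    new_factors[n] = A @ factors[n]   # the core tensor is untouched
    return core, new_factors
```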

Comment1.7: The serial-parallel ratio is an important indicator for measuring
experimental results, but it is not reflected in Fig.10. Why?
Response1.7: Thank you for your comment.
Adding the serial-parallel ratio makes the experimental results clearer. In the
revised version, we have corrected this by depicting the serial-parallel comparison
using line graphs in Fig. 10.

Comment1.8: What is the precision in the relevant experiments of Tucker rounding?
The paper did not explain this, which is really confusing.
Response1.8: Thanks for your helpful comment.
Precision is the prescribed accuracy used when executing Tucker decomposition in the
Tucker rounding method; for its formal definition and further explanation, please
refer to Ref. [32]. In the revised version, we have rewritten the sentence as follows:
"...using different levels of precision [32] and compare the dimensions of the core
tensor ..."

Comment1.9: The font thickness of tensor symbols in Figures should be consistent,
and there are also issues with the paper's layout.
Response1.9: Done. Thank you for your comment.
In the revised version, we have ensured consistent font thickness for tensor symbols in
Figures and addressed layout issues in the paper for improved presentation.
Response to the Comments of Reviewer 2
Comment2.1: Efficiently analyzing the ubiquitous data generated by consumer
electronic devices poses a challenging problem. By leveraging tensor-based data
analysis techniques, this paper proposes a set of tensor operations based on Tucker
decomposition results along with their scalable computations. This paper also
introduces the Tucker rounding method for further compression of the core tensor to
better suit cloud-edge collaboration. This topic is very meaningful, but I suggest
the authors make the following modifications and corrections to complete it.

Response2.1: Thanks for your positive comments.

Comment2.2: In section III, Mode-n Kronecker product is mentioned two times, but
there is still a lack of explanation for this operation.
Response2.2: Thanks for your valuable comment.
The mode-n Kronecker product is a widely used operation: it applies the Kronecker
product only along the n-th mode (order) of another tensor, leaving the remaining
modes untouched. We have provided detailed explanations in the revised version, and
a sketch of one plausible reading is given below.
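
For concreteness, here is a hedged sketch of one plausible fiber-wise reading of this definition (the helper name mode_n_kron is hypothetical, and the formal definition added to the revised Section III is authoritative): two tensors that agree on every mode except mode n have their corresponding mode-n fibers combined by the ordinary Kronecker product.

```python
# One plausible fiber-wise reading of the mode-n Kronecker product:
# A and B agree on every mode except mode n, and each pair of
# corresponding mode-n fibers is combined with the ordinary Kronecker
# product, so mode n grows from I_n and J_n to I_n * J_n.
import numpy as np

def mode_n_kron(A, B, n):
    A_m = np.moveaxis(A, n, 0).reshape(A.shape[n], -1)   # fibers as columns
    B_m = np.moveaxis(B, n, 0).reshape(B.shape[n], -1)
    # Column-wise Kronecker product of the two fiber matrices:
    C_m = np.einsum('ik,jk->ijk', A_m, B_m).reshape(-1, A_m.shape[1])
    rest = tuple(s for i, s in enumerate(A.shape) if i != n)
    return np.moveaxis(C_m.reshape((A.shape[n] * B.shape[n],) + rest), 0, n)
```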

Comment2.3: The paper divides tensor operations into three categories, but the
reasoning for doing so is too brief. I think the authors should provide additional
explanations.
Response2.3: Thanks for your comment.
We divided tensor operations into three categories based on their functions.
1. Operations that take values from a single tensor are classified as basic
operations, mainly including extracting values, extracting fibers, extracting slices,
and extracting sub-tensors (illustrated in the sketch below).
2. Operations that cause numerical changes within a tensor or compute a certain value
about a tensor are classified as mathematical operations, mainly including the
Hadamard product, addition, subtraction, inner product, and Frobenius norm.
3. The contraction operation is a special operation that changes the order and rank
of the tensor, mainly including the Tucker mode-n product and multilinear contraction.
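
The basic (extraction) operations in category 1 amount to index selection; a minimal NumPy sketch with illustrative indices:

```python
# Basic extraction operations on a third-order tensor (indices are
# illustrative only): value, fiber, slice, and sub-tensor.
import numpy as np

X = np.random.rand(4, 5, 6)

value  = X[1, 2, 3]        # a single entry
fiber  = X[:, 2, 3]        # mode-1 fiber: fix all indices but one
slc    = X[1, :, :]        # slice: fix exactly one index
subten = X[0:2, 1:4, :]    # sub-tensor: restrict each mode to a range
```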

Comment2.4: The proposed Tucker rounding algorithm in section IV is used to
compress the core tensor, but why is the input of the pseudocode the original tensor?
Maybe the input should be the Tucker decomposition format.
Response2.4: Thanks for your helpful comment.
Tucker rounding is indeed a method of compressing the core tensor, so the input
should be the Tucker decomposition result of the original tensor. In the revised
version, we have corrected the pseudocode.
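
To illustrate the corrected interface, here is a generic recompression sketch under our assumptions (QR orthogonalization of the factors plus a truncated Tucker of the small core); it shows the intended input/output shape, not the exact steps of the paper's pseudocode:

```python
# Sketch of Tucker rounding with the corrected input: it consumes a
# Tucker format (core, factors) and returns a recompressed one.
import numpy as np
import tensorly as tl
from tensorly.tenalg import multi_mode_dot
from tensorly.decomposition import tucker

def tucker_rounding(core, factors, new_ranks):
    # Orthogonalize each factor and push its R part into the core.
    qs, rs = zip(*(np.linalg.qr(U) for U in factors))
    small_core = multi_mode_dot(tl.tensor(core), rs)
    # Truncate the (small) core only, never the full tensor.
    inner_core, inner_factors = tucker(small_core, rank=list(new_ranks))
    return inner_core, [Q @ W for Q, W in zip(qs, inner_factors)]
```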

Comment2.5: This paper points out that the time complexity of scalable tensor inner
product operations is O(r^(2N)), while performing the same operation on the original
tensors is only O(I^N). How do you explain the advantage of the scalable computing?
Response2.5: Thanks for your constructive comment.
Although the exponent of the inner-product time complexity based on Tucker
decomposition is 2N, r is the rank of the core tensor, which is much smaller than the
dimensionality I of the original tensor. Therefore, in general, the scalable tensor
inner product operations still have a computational efficiency advantage. From the
above analysis, it can be inferred that the scalable approach also has advantages in
terms of space complexity. A sketch of the core-space computation is given below.
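
To make the advantage concrete, here is a minimal sketch of our own (assuming TensorLy; the helper name is hypothetical) computing the inner product entirely in core space via the identity <X, Y> = <G_X, G_Y x_n (U_n^T V_n)>, so no I^N-sized object is ever formed:

```python
# Inner product of two Tucker tensors X = (Gx; {Un}), Y = (Gy; {Vn})
# computed in core space: <X, Y> = <Gx, Gy x_n (Un^T Vn) for all n>.
import numpy as np
import tensorly as tl
from tensorly.tenalg import multi_mode_dot

def tucker_inner(core_x, factors_x, core_y, factors_y):
    # Small (r_x,n x r_y,n) couplings between the factor matrices:
    mats = [U.T @ V for U, V in zip(factors_x, factors_y)]
    projected = multi_mode_dot(tl.tensor(core_y), mats)  # core_x's shape
    return float(np.sum(np.asarray(core_x) * np.asarray(projected)))
```

Only r-sized cores and I x r factors are ever touched, which is why r << I makes the O(r^(2N)) bound favorable in practice.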

Comment2.6: The experimental section involves some typical Tucker-based tensor
computations based on different scalable models. What are their differences and the
selection criteria among these models? Can you provide a detailed explanation?
Response2.6: Thanks for your insightful comment.
The experimental section includes four different scalable models of Tucker-based
tensor computations to encompass both inter-core and intra-core computational
models as presented in Section 5. Specifically, the Tucker Mode-n product employs
inter-core parallelism, the multilinear vector product utilizes intra-core parallelism,
and both the Hadamard product and the inner product leverage a combination of inter-
core and intra-core parallelism. This selection aims to demonstrate the versatility and
efficiency of our proposed methods across various computational scenarios,
showcasing their applicability in Tucker-based computations.

Comment2.7: In Fig. 10, when the number of working nodes is 1, I believe that the
serial and parallel execution times should be exactly the same. Why does your
experimental result appear to have better parallel performance instead? In fact, setting
the node to 1 is meaningless for serial and parallel execution.
Response2.7: Thanks for your insightful comment.
When the number of nodes is 1, both serial and parallel computations are performed on
a single node, and there is no difference between the two. The execution times should
be the same, so it is not necessary to show this case in the experimental results.
The small difference in the experiment is the error introduced by averaging over
multiple repeated sets of experiments. In the revised version, we have removed the
single-node experiment on the real-world dataset.

Comment2.8: Long sentences and typing errors should be removed to improve the
readability of this paper.
Response2.8: Done. Thanks for your comment.
In the revised version, we have polished this paper carefully several times and
corrected grammatical mistakes and flawed sentences. Besides, we have also asked a
native speaker to proofread the paper again.
Response to the Comments of Reviewer 3
Comment3.1: This paper is of great significance in addressing the computational
efficiency of handling consumer electronics data. By introducing the calculation
rules of Tucker-based tensor operations and their scalable computing architecture,
this paper improves the computational efficiency and provides a feasible solution for
data processing in cloud-edge collaborative environments. However, I have the
following comments for improving your paper.
Response3.1: Thanks for your positive comments.

Comment3.2: In the abstract, the authors illustrate the effectiveness of the
experiments, but the percentage of efficiency improvement is not given. I suggest
that the authors add it.
Response3.2: Thanks for your comment.
We have provided a detailed explanation in the abstract to quantify the efficiency
improvement of the experiments, better demonstrating their effectiveness. The revised
version is as follows: "Extensive experimental results demonstrate that the scalable
Tucker-based tensor computation method significantly improves computational
efficiency, achieving an average efficiency improvement of 2 to 5 times compared to
serial execution."

Comment3.3: When discussing the cloud-edge computing architecture in Fig. 1, what is
its association with tensor decomposition techniques, especially Tucker
decomposition? The authors should provide necessary explanations to help readers
better understand the association and motivation.
Response3.3: Thank you for your insightful comment.
The cloud-edge computing architecture is inherently distributed, offering the
advantage of widespread and flexible deployment at the edge. Tensor decomposition,
such as Tucker decomposition, is highly suitable for this architecture because it can
decompose large-scale tensors into smaller parts. If these smaller parts can be stored
and computed at the edge, it reduces data storage and transmission loads, thus
improving overall computational efficiency.
By storing and computing tensor decomposition results at the edge, we can process
and analyze data more efficiently, making full use of the distributed cloud-edge
architecture. This integration allows for more effective resource utilization.

Comment3.4: In the paper, "dimension of tensor" and "dimensionality of tensor" are
frequently used. What is their difference? I think it isn't very clear when these two
expressions are used interchangeably.
Response3.4: Thank you for your constructive suggestion.
In fact, "dimensionality" and "dimension" have the same meaning in the paper.
In the revised version, we have uniformly adopted the expression "dimensionality"
throughout.
Comment3.5: In section 3, there is inconsistency in the naming format of theorems,
the authors should unify the format. For example, the naming rule of Theorem 8 is
Tucker-based Frobenius Norm, while the previous naming does not include the prefix
of Tucker-based.
Response3.5: Thank you for your valuable comment.
In the revised version, we have standardized the naming format of all theorems to
ensure consistency. Specifically, we have updated the naming rule of Theorem 8 to
align with the previous naming conventions, removing the "Tucker-based" prefix to
maintain uniformity.

Comment3.6: The time complexity analysis in Section 6 indicates that parallel
computing has better communication time than serial computing, while the
experimental conclusion explains that parallel communication overhead is greater
than serial, which seems contradictory. How do you explain this phenomenon?
Response3.6: Thank you for your constructive comment.
In Section 6, the time complexity analysis indicates that parallel computing can
achieve better communication time due to simultaneous data transfers, which
theoretically reduces the overall communication latency compared to serial
computing. However, in practice, the experimental conclusion shows that the parallel
communication overhead can be greater than that of serial computing. This
discrepancy arises due to several factors:
1. Overhead of Managing Parallel Processes: Parallel computing involves additional
overhead in managing and synchronizing multiple processes or threads. This overhead
can sometimes outweigh the benefits of reduced communication time, especially in
situations with high parallel granularity.
2. Latency of Communication Setup: Establishing multiple communication channels
in parallel computing can introduce additional latency. The setup time for initiating
these channels can contribute significantly to the overall communication overhead.
3. Granularity of Data Distribution: The efficiency of parallel communication also
depends on how well the data is distributed across the processes. If the data
distribution is not optimal, it can lead to imbalances and increased communication
times, thereby reducing the effectiveness of parallel computing.
These factors were not reflected in the complexity analysis, leading to the seemingly
contradictory results. We appreciate the opportunity to explain and clarify this point.

Comment3.7: This paper mentions cloud-edge collaboration frequently; how is it
reflected in the experiments?
Response3.7: Thank you for your comment.
We use EdgeCloudSim to simulate the edge computing environment. This tool is designed
specifically for edge computing scenarios: EdgeCloudSim builds on CloudSim to address
the specific needs of edge computing research and to support the necessary computing
and networking functions. EdgeCloudSim provides a modular architecture in which each
module addresses a specific aspect of edge computing and clearly defines its
interfaces with the other modules. The system consists of five main modules, namely:
core simulation, network, load generator, mobility, and edge orchestrator. The
overall architecture is shown in the Figure below. Each module contains configuration
files whose simulation parameters can be adjusted to realize different cloud-edge
computing environments. In the experiments, the network and edge orchestrator modules
were mainly used to calculate the delay in the task allocation process.

Comment3.8: The representation of tensors should be standardized; for example, the
tensor G in the proof process of Formula 12 in section 3 should be underlined.
Response3.8: Thank you for your valuable comment.
We have carefully reviewed all the formulas in the paper and made corresponding
modifications, including underlining the tensor G to ensure consistency and
standardization of representation.

Comment3.9: The authors should further polish the English writing.
Response3.9: Done. Thanks for your comment.
In the revised version, we have polished this paper carefully several times and
asked a native speaker to proofread the paper again.
