Reviewer 2 of CDC12 submission 1652
Comments to the author
======================
PAPER SUMMARY
=============
The goal of the paper is to present an average consensus
algorithm that is robust to random, independent link failures.
The authors first define the network model used. Then, the
authors review a previous algorithm called ratio-consensus,
where each node maintains two variables updated in
parallel, and the estimate of the mean is obtained by
computing the ratio of these variables. This previous
algorithm is guaranteed to converge to the correct average
only in the case of perfect communications.
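For concreteness, the reliable-link ratio-consensus update can be sketched as follows (the graph, initial values, and iteration count below are this reviewer's illustrative choices, not taken from the paper):

```python
def ratio_consensus(out_neighbors, y0, iters=100):
    """Ratio consensus (push-sum style) with perfectly reliable links.

    Each node keeps a numerator y and a denominator z, splits both
    equally among its out-neighbors and itself, and estimates the
    average as the ratio y/z.
    """
    n = len(y0)
    y, z = list(y0), [1.0] * n
    for _ in range(iters):
        new_y, new_z = [0.0] * n, [0.0] * n
        for i in range(n):
            share = 1.0 / (len(out_neighbors[i]) + 1)  # self-loop included
            for j in out_neighbors[i] + [i]:
                new_y[j] += share * y[i]
                new_z[j] += share * z[i]
        y, z = new_y, new_z
    return [y[i] / z[i] for i in range(n)]

# strongly connected directed graph on 3 nodes, true average 6.0
est = ratio_consensus([[1], [2], [0, 1]], [3.0, 6.0, 9.0])
```

Since the update matrix is column-stochastic and the graph is strongly connected, every entry of est approaches the true average.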
The paper then proposes adding extra states that store a
scaled running sum at each node, together with the running
sums transmitted by neighboring nodes; the states are then
updated from differences of these quantities. The authors
show that this scheme is equivalent to a Markov chain where
the states are given by the original states plus quantities
that keep track, for each node, of the updates that have
been transmitted but not received by the nodes. Note that
the transition matrices of the states are influenced by the
link failures.
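The running-sum idea can also be sketched concretely; the following toy simulation is this reviewer's own reconstruction in the spirit of the scheme (the graph, drop probability, and iteration count are illustrative assumptions, not the paper's exact protocol):

```python
import random

def robust_ratio_consensus(out_neighbors, y0, p_drop=0.3, iters=400, seed=1):
    """Ratio consensus over lossy directed links (illustrative sketch).

    Each node keeps running totals (sig) of everything it has ever sent
    on each outgoing link; a receiver remembers the last total it saw
    (rho) and adds the difference, so mass lost to dropped packets is
    recovered at the next successful reception.
    """
    rng = random.Random(seed)
    n = len(y0)
    y, z = list(y0), [1.0] * n          # numerator and denominator states
    links = [(i, j) for i in range(n) for j in out_neighbors[i]]
    sig_y = {e: 0.0 for e in links}     # cumulative mass sent per link
    sig_z = {e: 0.0 for e in links}
    rho_y = {e: 0.0 for e in links}     # last cumulative mass received
    rho_z = {e: 0.0 for e in links}
    for _ in range(iters):
        share = [1.0 / (len(out_neighbors[i]) + 1) for i in range(n)]
        for i, j in links:              # update outgoing running sums
            sig_y[(i, j)] += share[i] * y[i]
            sig_z[(i, j)] += share[i] * z[i]
        new_y = [share[i] * y[i] for i in range(n)]   # retained self-share
        new_z = [share[i] * z[i] for i in range(n)]
        for i, j in links:
            if rng.random() > p_drop:   # transmission succeeds
                new_y[j] += sig_y[(i, j)] - rho_y[(i, j)]
                new_z[j] += sig_z[(i, j)] - rho_z[(i, j)]
                rho_y[(i, j)] = sig_y[(i, j)]
                rho_z[(i, j)] = sig_z[(i, j)]
        y, z = new_y, new_z
    return [y[i] / z[i] for i in range(n)]

# strongly connected directed graph on 3 nodes, true average 6.0
est = robust_ratio_consensus([[1], [2], [0, 1]], [3.0, 6.0, 9.0])
```

Even with a substantial fraction of packets dropped, the in-flight differences sig - rho conserve total mass, so each entry of est still converges to the average.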
Using properties of Markov chains, the authors show
convergence of the states with probability one, and show
that the ratio of variables used for estimating the
consensus value can be computed infinitely often.
This then implies convergence to the correct average.
COMMENTS
========
The paper proposes an interesting and elegant analysis of
an extension of an existing consensus algorithm to the
case of unreliable links. The paper is in general
understandable, and all the proofs appear to be correct
(except for a minor mistake, see below).
The main critique that can be made of the paper concerns
its presentation. The authors spend a great deal of space
in the first 5 pages on issues that could be treated more
compactly and that could probably be moved to other places
where they are more relevant (see details below).
Conversely, the technical part (sections V and VI) mostly
follows the pattern claim/proof/claim/proof, with little or
no high-level comment on where the results are heading. In
addition, the closing of the paper feels rushed.
Finally, as a minor complaint, the given assumptions are
somewhat impractical: ensuring that each node knows its
number of outgoing edges (possibly using just one edge)
could be cumbersome in practice.
The following is a list of possible improvements/typos that
the authors should address.
-The analysis of [4] given in the review of prior work is
not accurate. The work of [4] uses re-transmissions to
effectively lower the packet loss probabilities for each
link. However, it does not require the use of acks as
mentioned by the authors.
-In fact, the work of [4] uses an intuition similar to the
one used in this paper. The main difference is that in [4]
the quantities that have been transmitted but not received
are tracked in explicit variables which are used in
corrective iterations. On the other hand, in the method
proposed in this paper, these variables are implicit and
the proof techniques are quite different. The authors
should include a closer comparison of [4] with the proposed
method.
- The first three pages (e.g., column 1 of page 2 and of
page 3) devote a large amount of space to aspects of the
problem that are quite simple to grasp. The authors should
make these sections more compact, and spend more time
giving high-level guidance in the technical parts of
sections V and VI.
- Some parts of the first sections should be moved where
they are more relevant. Specifically, the paragraph named
"A detour" should be moved near (39), where the authors can
say that in [5] z_k is always non-zero, but, in the present
case, this is not guaranteed, so a threshold is needed to
avoid divisions by zero. Similarly, section
III.A should be moved to *after* the introduction of the
algorithm. In this way, it would make more sense for the
reader and would function as a good introduction to section
IV.
- The authors should make clearer that the aggregate
states used for the analysis are the original states and
the fictitious states \nu, all stacked in the same vector y.
- In property M3, the constant c seems to actually be
1/max_i({D_i})
- As mentioned before, section V.B and VI should provide a
high-level overview of the architecture of the proof before
going into the details.
- In (27), it should be "T_k" instead of "T_{lk}"
- Eq. (33) is a repetition of (31)
- The constants mu and mu_z in (42) should depend on i.
- Making clear the limits for i and j in section VI would
improve the clarity of the presentation (see, e.g., the
sums in (37), (40), (41)).
- The authors should highlight that, in (37) and in Theorem
1 and its proof, the limits of j in the sum over all the
states are not important, since the \nu variables are zero
at k=0. This helps clarify why pi^\ast is the desired
average.
- Page 8, first column: "at least one of the elements of
\tilde{z}_{j-1} must be greater than or equal than 1/n"
should be "[...] than 1/(n+|E|)", where "|E|" is the number
of edges in the network. This typo does not significantly
affect the rest of the proofs.
- This reviewer could not understand the last sentence of
the concluding remarks.
Reviewer 3 of CDC12 submission 1652
Comments to the author
======================
This paper studies protocols for distributed averaging over
links which drop some messages. A novel protocol is
proposed and proven to converge in this setting. The
convergence proof follows by relating the operation of the
protocol to the evolution of a Markov chain on an augmented
state space, and then (carefully) adapting tools for
studying convergence of forward products of stochastic
matrices.
Overall the paper is well-written. It contains a number of
interesting insights and the main result constitutes a
novel contribution. Given the wide interest in consensus
and distributed averaging, this work is certainly relevant
to the community. I thus recommend the paper be accepted
for publication. Some suggestions and comments are included
below.
This paper seems to combine a number of ideas that have
been floating around in the literature in order to obtain a
very interesting protocol which is resilient to packet
drops. The approach to correcting for dropped packets
through the use of the additional state variables sigma and
rho seems similar to the approaches described in [4], as
well as in some of the literature on broadcast gossip
algorithms. From the practical standpoint, this approach
requires each node to use additional memory proportional to
the number of (incoming and outgoing) neighbors. This is a
non-trivial extension of standard distributed averaging
algorithms, and, to be fair, this should be stated more
clearly in the abstract and/or introduction.
Another practical concern has to do with the fact that the
messages \sigma_k[i] have unbounded magnitude (as k -->
\infty). This has non-trivial implications for
implementation of this approach: in a coded system, the
number of bits required to represent \sigma_k grows with
time, and subsequently, the time required to transmit it at
each iteration also grows (assuming the bitrate of each
transmitter remains fixed). Is there a more practical
work-around?
This approach also assumes that each node knows its
out-degree. The literature on neighbor discovery with
which this reviewer is familiar deals either with discovering
symmetric links or focuses only on outgoing links. Knowing
the out-degree in a directed network seems like a strong
assumption. It would help if the authors could include a
reference if one is available.
The authors refer to [5] throughout as the source of the
so-called "ratio consensus" algorithm. However this
algorithm appears to be identical to the Push-Sum algorithm
proposed by Kempe, Dobra, and Gehrke in their 2003 FOCS
paper, "Gossip-based computation of aggregate information".
In the discussion of reliable links, it may be worth also
mentioning that Approach 1 (acks) requires bi-directional
communication links, and so it doesn't apply to directed
networks.
It seems important that every link in G successfully
supports a transmission every once in a while. I expected
to see a condition similar to the "bounded delay"
conditions that typically appear in work on distributed
averaging with delayed communications. Is there a
fundamental reason why this isn't necessary in your work,
or is it an artifact of the assumption that link success
events are iid, and so the time between successful
transmissions (analogous to the delay) is bounded whp?
Finally, with regards to the conclusion, indeed my first
thought upon encountering property M4 was that it is not at
all critical. However, it seems necessary in the proof of
Lemma 1 to conclude that W_k is scrambling, and it isn't
immediately obvious why W_k having non-zero columns would
be sufficient.
Typos:
- There is a comma missing between k and the \sum at the
bottom of page 3.
- In Lemma 4, "instances" should be "instants".
Reviewer 4 of CDC12 submission 1652
Comments to the author
======================
The paper proves convergence of the robust consensus scheme
in the presence of unreliable links. The topic is of
interest, but the presentation of the paper should be
substantially improved. Subsections IV.B and V.A should be
shortened in favor of a numerical example that better
clarifies the effectiveness of the proposed algorithm.
Moreover, the overall benefit of reaching agreement on the
ratio $\pi^*$ (eq. (37)) is not clear: $\pi^*$ is the
average consensus value only if $z_0[j]=1$ for all $j$
(which is the assumption made in the paper). The authors
should explain the advantages of agreeing on $\pi^*$ for a
general $z_0[j]$.
Minor comment:
- the sum $\sum_{j=1}^{k}\mu_{k}[i]$ in Section III.B seems
wrong
Reviewer 5 of CDC12 submission 1652
Comments to the author
======================
1. This paper is written in very difficult, if not obscure,
language. The purpose and objectives are not clear.
2. The possible applications in the technical world were
not identified.
3. Convergence of the algorithm was not properly
demonstrated.
4. Conclusions should not include any references or
abbreviations.
5. This paper may need to be rewritten.