Reviewer 2 of CDC12 submission 1652

Comments to the author
======================

PAPER SUMMARY
=============

The goal of the paper is to present an average consensus algorithm robust to random and independent link failures. The authors first define the network model. They then review a previous algorithm called ratio consensus, in which each node maintains two variables updated in parallel, and the estimate of the mean is obtained by computing the ratio of these variables. This previous algorithm is guaranteed to converge to the correct average only in the case of perfect communication. The paper then proposes to add additional states that store a scaled running sum at each node, together with the corresponding running sums transmitted by neighboring nodes; the states are then updated from differences of these quantities. The authors show that this scheme is equivalent to a Markov chain whose states are the original states plus quantities that keep track, for each node, of the updates that have been transmitted but not yet received. Note that the transition matrices of the states are influenced by the link failures. Using properties of Markov chains, the authors show convergence of the states with probability one, and show that the ratio of variables used for estimating the consensus value can be computed infinitely often. This in turn implies convergence to the correct average.

COMMENTS
========

The paper proposes an interesting and elegant analysis of an extension of an existing consensus algorithm to the case of unreliable links. The paper is in general understandable, and all the proofs appear to be correct (except for a minor mistake, see below). The main critique that can be made of the paper concerns the presentation. The authors spend a lot of space in the first 5 pages on issues that could be treated more compactly and that could probably be moved to other places where they are more relevant (see details below).
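As an aside on the summary above: the ratio-consensus (push-sum) mechanism can be sketched in a few lines. This is a reviewer's toy reconstruction under perfect communication, with an assumed directed-ring topology and illustrative variable names x and z; it is not the paper's actual algorithm.

```python
import numpy as np

# Directed ring of n nodes; each node keeps half its mass and sends
# half to its single out-neighbor, so the weight matrix is column-stochastic.
n = 4
x = np.array([3.0, 1.0, 4.0, 2.0])    # initial values (average = 2.5)
z = np.ones(n)                        # companion variable, z_0[j] = 1

W = np.zeros((n, n))
for j in range(n):
    W[j, j] = 0.5                     # keep half the mass
    W[(j + 1) % n, j] = 0.5           # send half to the next node

for _ in range(100):
    x = W @ x
    z = W @ z

estimates = x / z                     # the ratio recovers the average
print(estimates)                      # each entry close to 2.5
```

Because the weights are column-stochastic, the total mass sum(x) is preserved at every step, which is why the ratio x/z recovers the exact average; this is also the property that a single dropped packet destroys, motivating the robust extension analyzed in the paper.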
Conversely, the technical part (Sections V and VI) mostly follows the pattern claim/proof/claim/proof, with little or no high-level comment on where the results are heading. In addition, the closing of the paper feels rushed. Finally, as a minor complaint, the given assumptions are somewhat impractical: implementing the assumption that each node knows the number of its outgoing edges (possibly using just one edge) could be cumbersome in practice.

The following is a list of possible improvements/typos that the authors should address.

- The analysis of [4] given in the review of prior work is not accurate. The work of [4] uses re-transmissions to effectively lower the packet-loss probability of each link. However, it does not require the use of acks, as the authors state.

- In fact, the work of [4] uses an intuition similar to the one used in this paper. The main difference is that in [4] the quantities that have been transmitted but not received are tracked in explicit variables, which are used in corrective iterations. In the method proposed in this paper, on the other hand, these variables are implicit, and the proof techniques are quite different. The authors should include a closer comparison of [4] with the proposed method.

- The first three pages (e.g., column 1 of page 2 and of page 3) devote a large amount of space to aspects of the problem that are quite simple to grasp. The authors should make these sections more compact and instead give more high-level guidance in the technical parts of Sections V and VI.

- Some parts of the first sections should be moved to where they are more relevant. Specifically, the paragraph named "A detour" should be moved near (39), where the authors can say that in [5] z_k is always non-zero but that, in the present case, this is not guaranteed, so a threshold is needed to avoid divisions by zero. Similarly, Section III.A should be moved to *after* the introduction of the algorithm.
In this way, it would make more sense to the reader and would function as a good introduction to Section IV.

- The authors should make clearer that the aggregate states used for the analysis are the original states and the fictitious states \nu, all stacked in the same vector y.

- In property M3, the constant c seems to actually be 1/max_i({D_i}).

- As mentioned before, Sections V.B and VI should provide a high-level overview of the architecture of the proof before going into the details.

- In (27), it should be "T_k" instead of "T_{lk}".

- Eq. (33) is a repetition of (31).

- The constants mu and mu_z in (42) should depend on i.

- Making the limits of i and j in Section VI explicit would improve the clarity of the presentation (see, e.g., the sums in (37), (40), (41)).

- The authors should highlight that, in (37) and in Theorem 1 and its proof, the limits of j in the sum over all the states are not important, since the \nu variables are zero at k=0. This helps in understanding why \pi^\ast is the desired average.

- Page 8, first column: "at least one of the elements of \tilde{z}_{j-1} must be greater than or equal to 1/n" should read "[...] to 1/(n+|E|)", where |E| is the number of edges in the network. This typo does not significantly affect the rest of the proofs.

- This reviewer could not understand the last sentence of the concluding remarks.

Reviewer 3 of CDC12 submission 1652

Comments to the author
======================

This paper studies protocols for distributed averaging over links that drop some messages. A novel protocol is proposed and proven to converge in this setting.
The convergence proof follows by relating the operation of the protocol to the evolution of a Markov chain on an augmented state space, and then (carefully) adapting tools for studying convergence of forward products of stochastic matrices.

Overall the paper is well written. It contains a number of interesting insights, and the main result constitutes a novel contribution. Given the wide interest in consensus and distributed averaging, this work is certainly relevant to the community. I thus recommend the paper be accepted for publication. Some suggestions and comments are included below.

This paper seems to combine a number of ideas that have been floating around in the literature in order to obtain a very interesting protocol that is resilient to packet drops. The approach of correcting for dropped packets through the use of the additional state variables sigma and rho seems similar to the approaches described in [4], as well as in some of the literature on broadcast gossip algorithms.

From a practical standpoint, this approach requires each node to use additional memory proportional to the number of its (incoming and outgoing) neighbors. This is a non-trivial extension of standard distributed averaging algorithms, and, to be fair, this should be stated more clearly in the abstract and/or introduction.

Another practical concern has to do with the fact that the messages \sigma_k[i] have unbounded magnitude (as k --> \infty). This has non-trivial implications for the implementation of this approach: in a coded system, the number of bits required to represent \sigma_k grows with time, and consequently the time required to transmit it at each iteration also grows (assuming the bitrate of each transmitter remains fixed). Is there a more practical work-around?

This approach also assumes that each node knows its out-degree.
The literature on neighbor discovery that this reviewer is familiar with deals either with discovering symmetric links or focuses only on outgoing links. Knowing the out-degree in a directed network seems like a strong assumption. It would help if the authors could include a reference, if one is available.

The authors refer to [5] throughout as the source of the so-called "ratio consensus" algorithm. However, this algorithm appears to be identical to the Push-Sum algorithm proposed by Kempe, Dobra, and Gehrke in their 2003 FOCS paper, "Gossip-based computation of aggregate information".

In the discussion of reliable links, it may be worth also mentioning that Approach 1 (acks) requires bi-directional communication links, and so it does not apply to directed networks.

It seems important that every link in G successfully supports a transmission every once in a while. I expected to see a condition similar to the "bounded delay" conditions that typically appear in work on distributed averaging with delayed communication. Is there a fundamental reason why this is not necessary in your work, or is it an artifact of the assumption that link-success events are i.i.d., so that the time between successful transmissions (analogous to the delay) is bounded w.h.p.?

Finally, with regard to the conclusion: indeed, my first thought upon encountering property M4 was that it is not at all critical. However, it seems necessary in the proof of Lemma 1 to conclude that W_k is scrambling, and it is not immediately obvious why W_k having non-zero columns would be sufficient.

Typos:
- There is a comma missing between k and the \sum at the bottom of page 3.
- In Lemma 4, "instances" should be "instants".
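To make the running-sum correction mechanism concrete, the toy simulation below is this reviewer's reconstruction of the idea (in Python, on an assumed directed ring with i.i.d. drops; the names sigma and rho follow the notation used above, but everything else is an assumption, not the authors' pseudocode). Each node broadcasts a cumulative sum, and a receiver applies the difference between the newly received sum and the last one it saw, so mass lost to earlier drops is recovered at the next successful reception.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
p_drop = 0.3                                  # i.i.d. per-link drop probability
out = {j: [(j + 1) % n] for j in range(n)}    # directed ring, out-degree 1

x = np.array([3.0, 1.0, 4.0, 2.0])            # initial values (average = 2.5)
z = np.ones(n)                                # z_0[j] = 1 for all j

sigma_x = np.zeros(n)                         # running sum broadcast by each node
sigma_z = np.zeros(n)
rho_x = np.zeros((n, n))                      # rho_x[i, j]: last sigma_x[j] node i received
rho_z = np.zeros((n, n))

for _ in range(400):
    # Each node folds this step's outgoing share into its running sum ...
    sigma_x = sigma_x + x / 2.0
    sigma_z = sigma_z + z / 2.0
    new_x = x / 2.0                           # ... and keeps the other half for itself.
    new_z = z / 2.0
    for j in range(n):
        for i in out[j]:
            if rng.random() > p_drop:         # successful transmission on link j -> i
                new_x[i] += sigma_x[j] - rho_x[i, j]   # deliver all outstanding mass
                new_z[i] += sigma_z[j] - rho_z[i, j]
                rho_x[i, j] = sigma_x[j]
                rho_z[i, j] = sigma_z[j]
    x, z = new_x, new_z

print(x / z)                                  # each entry approaches 2.5
```

Total mass (the entries of x plus the undelivered differences sigma_x[j] - rho_x[i, j]) is conserved on every sample path, which is the informal reason the ratio still converges to the exact average despite drops. Note also that sigma grows without bound in this sketch, which is exactly the coding concern raised above.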
Reviewer 4 of CDC12 submission 1652

Comments to the author
======================

The paper proves convergence of a robust consensus scheme in the presence of unreliable links. The topic is of interest, but the presentation of the paper should be strongly improved. Subsections IV.B and V.A should be shortened in favor of a numerical example that better clarifies the effectiveness of the proposed algorithm. Moreover, the overall benefit of reaching agreement on the ratio $\pi^*$ (eq. (37)) is not clear: $\pi^*$ equals the average consensus value only if $z_0[j]=1$ for all $j$ (which is the assumption made in the paper). The authors should explain the advantages of reaching agreement on $\pi^*$ for a general $z_0[j]$.

Minor comment:
- the sum $\sum_{j=1}^{k}\mu_{k}[i]$ in Section III.B seems wrong (presumably the summand should be $\mu_{j}[i]$).

Reviewer 5 of CDC12 submission 1652

Comments to the author
======================

1. This paper is written in very difficult, if not obscure, language. The purpose and objectives are not clear.
2. The possible applications in the technical world were not identified.
3. Convergence of the algorithm was not properly demonstrated.
4. The conclusions should not include any references or abbreviations.
5. This paper may need to be rewritten.