----------------------- REVIEW 1 ---------------------
PAPER: 153
TITLE: Non-Bayesian Learning in the Presence of Byzantine Agents
AUTHORS: Lili Su and Nitin Vaidya
----------- Review -----------
The authors study the problem of non-Bayesian learning, where the agents of a network have to learn an unknown "state of the world" \theta from a finite set of possible states, despite the presence of some Byzantine agents that can misbehave in arbitrary ways. At each round, each agent receives a "private signal" about the state of the world, communicates with her neighbors, and updates her "belief", a probability distribution over the possible states of the world. The authors propose an algorithm (an update rule for the agents) and prove that, under some assumptions, the belief of every non-Byzantine agent eventually converges almost surely to the distribution concentrated on the actual state of the world.
On the positive side, the problem of Byzantine fault tolerance in non-Bayesian learning seems an important one in distributed computing. The solution proposed in this paper builds upon a recent result on Byzantine consensus by one of the authors and looks interesting.
On the negative side, the main convergence result (Theorem 4) requires "Assumption 2", which in turn requires "Assumption 1", and it is not completely clear to me how strong the restrictions are that those assumptions impose on the topology of the original graph and on the location of the Byzantine nodes. Moreover, there is no explicit discussion of the rate of convergence of the algorithm, even though something might be derived from the proof of Theorem 4.
I have not checked all the details in the proofs, but they look sound. Overall I think the paper gives a sufficient contribution and fits the scope of DISC, so I lean toward acceptance.
----------------------- REVIEW 2 ---------------------
PAPER: 153
TITLE: Non-Bayesian Learning in the Presence of Byzantine Agents
AUTHORS: Lili Su and Nitin Vaidya
----------- Review -----------
This is an interesting paper. It brings together two relatively disjoint sets of literature: the problem of distributed learning and the problem of Byzantine agreement. As a result, I think this would be a very valuable paper to include in the DISC program.
The basic setup is that a set of agents are collectively trying to observe some fact about the world, and it is assumed that each agent knows the distribution of its reading as a function of the real state. The goal is for every non-faulty agent to eventually output the real ground truth.
The general approach is based on average Byzantine consensus (and Tverberg's theorem). It builds nicely on ideas from previous work on iterative Byzantine consensus, showing how to apply those ideas to the learning framework to ensure agreement.
A few comments to the authors below:
Comments about the introduction:
(1) The claim in the abstract and intro that we can't just use Byzantine agreement as a black box because communication depends on local observations is unclear. A lot of the meaning relies on the word "effective", and that just isn't clear. At this point in the paper, it is mysterious why nodes can't just take an observation of the world and run a standard Byzantine agreement (in the convergent sense) on that observation. That would ignore the extra information provided by the marginal distributions of the local signals, but it would provide something. I think the point isn't that basic Byzantine agreement wouldn't work at all; it's that Byzantine agreement would yield a much weaker solution.
(2) I gather the term non-Bayesian is standard in the learning literature, but it feels kind of misleading. Defining an approach by what it is NOT seems strange.
The problem *specification* isn't Bayesian or non-Bayesian. The problem specification is to learn the true ground truth being observed. So the problem isn't non-Bayesian. And it is true that your solution isn't Bayesian, in the sense that it doesn't use a Bayes-rule update function. But by that logic, the entire (average) consensus literature is also non-Bayesian, as is lots of other research. Again, it is strange to define the algorithm by what it is NOT; there are many other things that it is also NOT.
So if you were one of the first papers in this area, I would encourage you to remove the term non-Bayesian. But I'm guessing that using that term in the title ties this paper to the existing literature in an important way and so you can't change that.
(3) It would be nice if you could say a bit more in the intro about the algorithm itself and the precise results. I went looking for a precise statement of the results and was left searching until I found Theorem 4, which is at the end and not even self-contained (due to Assumption 2).
(4) Do you know anything about the convergence rate? It presumably depends on the quality of the agents' observations. Are there any simple cases where you can say something useful (e.g., if you assume the distribution of each observation is Gaussian with mean equal to the real ground truth)? Perhaps in other cases the convergence time can be arbitrarily long? Perhaps at least discuss this in the introduction.
Three model questions for the authors:
(1) In the model section, should there be some assumption that the private signals provide useful information? For example, imagine that \ell(x | \theta) is uniform over all x (i.e., regardless of the ground truth, the observation is just random noise). Maybe you exclude this later in the paper, but reading Section 2, I'm not seeing it. [Update after reading more: I think you resolve this in Section 4.1, but I would like to see a discussion of this earlier.]
(2) Assuming that each agent's signal does provide some useful information, couldn't each node come to its own decision (with no communication) in the infinite limit? That is, over enough time, a given agent could simply learn the correct output from its own observations.
(3) In that case, I see two possible goals for collaboration. One is that some agents have observations that provide useful information and others do not. (And those that do not have useful observations know it, because you assume the marginal distributions are known.) In that case, it is particularly important to state precisely the assumption that the collection of observations provides sufficient information. The other possible goal is to converge faster via communication than alone. In that case, we need theorems not just about "in the limit" but about the rate of convergence.
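On question (1) above, the degenerate case is easy to see concretely: with a likelihood that is uniform over signals, a Bayesian update returns the prior unchanged, so no amount of observation helps. A minimal illustration (the three-state prior and helper function are made up for this sketch, not taken from the paper):

```python
import math

def bayes_update(belief, lik_column):
    """Posterior over states, given likelihoods l(x | theta) for the observed x."""
    post = [b * l for b, l in zip(belief, lik_column)]
    z = sum(post)
    return [p / z for p in post]

prior = [0.5, 0.3, 0.2]
uniform = [1 / 3, 1 / 3, 1 / 3]   # l(x | theta) identical for every theta
posterior = bayes_update(prior, uniform)

# The signal is pure noise, so the posterior equals the prior.
assert all(math.isclose(a, b) for a, b in zip(posterior, prior))
```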
Question about the results:
----------------------- REVIEW 3 ---------------------
PAPER: 153
TITLE: Non-Bayesian Learning in the Presence of Byzantine Agents
AUTHORS: Lili Su and Nitin Vaidya
----------- Review -----------
The paper develops an algorithm for nodes within a network to learn a "state", via local communication, in the presence of adversarial Byzantine nodes.
The point of departure is the non-Bayesian model of the recent work of Jadbabaie et al., in which each node iteratively updates its belief as the geometric mean of its neighbors' beliefs and of its own Bayesian belief update.
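That update rule can be sketched in a few lines. The signal model, the uniform merge weights, and the toy network below are illustrative assumptions, not details from the paper (which also adds the Byzantine-tolerant filtering this sketch omits):

```python
import math
import random

def bayes_update(belief, signal, likelihood):
    """One Bayesian update of a belief over states, given one private signal."""
    post = [b * likelihood(signal, theta) for theta, b in enumerate(belief)]
    z = sum(post)
    return [p / z for p in post]

def geometric_merge(dists, weights):
    """Weighted geometric mean of probability distributions, renormalized."""
    n_states = len(dists[0])
    merged = [math.exp(sum(w * math.log(d[t]) for d, w in zip(dists, weights)))
              for t in range(n_states)]
    z = sum(merged)
    return [m / z for m in merged]

# Toy run: 3 fault-free agents on a complete graph, 2 states, true state 0.
# Each private signal equals the true state with probability 0.7.
random.seed(0)
TRUE = 0
lik = lambda s, th: 0.7 if s == th else 0.3
beliefs = [[0.5, 0.5] for _ in range(3)]
for _ in range(100):
    signals = [TRUE if random.random() < 0.7 else 1 for _ in beliefs]
    new = []
    for i, b in enumerate(beliefs):
        own = bayes_update(b, signals[i], lik)          # local Bayesian step
        others = [beliefs[j] for j in range(3) if j != i]
        new.append(geometric_merge([own] + others, [1 / 3] * 3))
    beliefs = new
# Every agent's belief concentrates on the true state.
```

Because the merge is a geometric mean, it averages log-likelihood ratios, so the accumulated evidence of all agents drifts toward the true state at a rate governed by the informativeness of the signals.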
p2: "In this paper, we consider static networks for ease of exposition, although we believe that our results can be easily generalized to time-varying networks." - the candor of the "we believe" here is commendable, though on p4 this caveat is omitted.
p1,p2: "the *effective* communication networks are *dependent* on all the random local observations" - it's not clear exactly what this statement means, particularly with the two italicized words
p1,p3: what does "the likelihood of the cumulative private signals" mean?
Although there are occasional missing-word typos throughout (see below), the paper is generally well written.
Typos:
p1: "agents collaboratively" - missing word "to"
p1,p2: "On the other hand," -> "Moreover,"?
p2: "fully Bayesian belief update rule" - missing word "a" or "the"?
p2: "the fault-tolerant version the non-Bayesian framework" - missing word "of"
p4: "attracted significant amount" - missing word "a"
p4: "the past work mostly focus on" -> "focuses"
p4: "the steps inOne-Iter" - missing space