----------------------- REVIEW 1 ---------------------
PAPER: 93
TITLE: Fault-Tolerant Multi-Agent Optimization: Optimal Distributed Algorithms
AUTHORS: Lili Su and Nitin Vaidya
----------- Review -----------
The problem here is to agree on a scalar value x that optimizes a function that is the sum of individual convex differentiable objective functions h_i(x) for each process i, even though h_i is only known to individual processes i and up to f of those processes are Byzantine. The paper seems to be a capstone of a series of technical reports leading up to the current results, which show that even though it's not possible to optimize the real objective function, it is possible to optimize in the limit a weighted sum of the h_i functions in which some of the processes may be assigned weight zero.
At first glance this seems like a very strange kind of solution. If we imagine that the original problem is to figure out where to set the thermostat, and that some people prefer it warmer and some colder, the solution is to throw out the opinions of any outliers and reach agreement only among the most agreeable remaining processes. Indeed something like this is both necessary (because the Byzantine processes might insist that only temperatures above 8000 K are acceptable, and we can't tell if these processes are in fact Byzantine or just want to get a very close suntan) and the core of the algorithm, which is doing gradient descent based on a weighted average of gradients that excludes the f most extreme gradients in either direction. But it seems a little hard to motivate a priori, and it might be helpful to have some discussion of when a group of users would accept optimizing the modified objective function instead of the original one.
Some additional oddities of the solution are that in guarantees convergence to agreement only in the limit, and even the agreed value is not stable (although the processes will continue to agree on what to set the thermostat to, this value may go up and down over time as the Byzantine processes revise their claimed objectives). It's not clear that any algorithm could do better than this, although it seems that perhaps adding some sort of weighting of old values could at least slow it down.
As an optimization algorithm, this is not terribly exciting, as gradient descent to optimize a convex function is pretty standard stuff. The main contribution is showing how to make it robust against Byzantine inputs and coming up with a formal characterization of the quality of the resulting solution. This is a significant contribution and argues for accepting the paper.
The presentation of the paper is good overall, if a little formal. The introduction could use some examples of how one might use solutions with these characteristics.
----------------------- REVIEW 2 ---------------------
PAPER: 93
TITLE: Fault-Tolerant Multi-Agent Optimization: Optimal Distributed Algorithms
AUTHORS: Lili Su and Nitin Vaidya
----------- Review -----------
The paper studies the performance of a version of the distributed optimization algorithm under the presence of adversaries or faulty agents. In this sense, the paper aims to generalize the results on robust consensus algorithms to distributed optimization problem. The main focus of the paper is on Byzantine faults. For the case where the underlying network is fully connected, the authors provide an algorithm, which relies on exchanging the states and and also the gradients of the local objective functions.
Overall, the problem is of interest and will be suitable for presentation.
The main drawback of the algorithm is that the network is assumed to be fully connected, and also the exchange of the gradients. This reviewer is not sure why one needs to rely on exchanging gradients for achieving the objective. The authors mention that the algorithm where the agents ignore the values of the extreme neighbours, similar to the one implemented for consensus algorithm, does not achieve the objective and that is the main reason that the exchange of the gradients is required. This step is not clear for this reviewer. In particular, suppose that the agents know (or have a bound) on the number of faulty agents (say $ F $), and they implement an algorithm similar to the distributed optimization algorithm where the agents ignore the values of the extreme neighbours in their neighbourhood (which would actually be the same for all agents when the network is complete). The underlying directed graph that shows the communications between "non-faculty" agents in each time!
is strongly connected, and hence the modified distributed optimization algorithm should still remain in the convex hull (in particular, these agents should still be able to achieve consensus), but as the authors noted, the convergence will be affected by the sequence of left-eigenvectors corresponding to the sequence of strongly connected graphs. Hence, it seems that one should be able to prove convergence to the convex hull, without the exchange of the gradients. Now, it might be true that states of these agents still wonder in the convex hull of their individual minimizers (not clear if this can happen given that the underlying network is complete). But this has to be shown, to justify relying on the exchange of gradients, which seems slightly unrealistic.
Also, the analysis of the results are overly involved and complicated for a conference paper. I think the exposition and readability of the paper can be improved.
I suggest to discuss in the related work section the following paper:
Maurice Herlihy, Sergio Rajsbaum, Michel Raynal, Julien Stainer:
Computing in the Presence of Concurrent Solo Executions. LATIN 2014: 214-225
----------------------- REVIEW 3 ---------------------
PAPER: 93
TITLE: Fault-Tolerant Multi-Agent Optimization: Optimal Distributed Algorithms
AUTHORS: Lili Su and Nitin Vaidya
----------- Review -----------
General description:
The paper discusses the problem of optimization in a Byzantine, synchronous distributed system. Each agent has its own optimization function (real-valued domain),
and the idea is to find a global real value that ideally represents a "balanced" convex combination of the functions across the agents.
Contributions:
C1) Show that the convex combination of the agent functions may not be admissible when the number of non-zero indexes of the convex combination is big enough;
C2) Provide an algorithm that gives a 2-approximation within each index of the optimal convex combination for as many non-zero indexes as possible, in order to guarantee the existence of the output;
The authors claim a third contribution, a "constrained" version that limits the domain space of the optimization function. I see that contribution as part of a different work, referenced from here.
All the details in that constrained version are in this external work. Granted, many of the principles discussed here seem like they would apply directly to the proofs of the constrained version.
Merit:
The problem is interesting to the distributed computing community, and serves as basis to similar algorithms in different settings, particularly
i) the constrained version mentioned by the authors;
ii) asynchronous Byzantine systems.
The impossibility result (C1) is an interesting result (although relatively direct, simple).
The algorithm represents an instance of a conceptually simple protocol that presents enough challenges to constitute a new problem. While the general technique of the algorithm is based on old principles (such as discard extremal inputs to eliminate Byzantine influence), the convergence argument is nontrivial. The main reason is that gradients in the iterative conversion are dealt with independently from the current value from the functions computed in other processes.
I like the insight that the optimization problem fundamentally relies on the convexity of the solution space. This can be useful to immediately apply the algorithm to other appropriate domains.
Presentation:
The authors do a good job describing the high-level ideas of the proofs, and what are the main difficulties demonstrating the results.
The text is well-written and it is easy to understand the overall proof strategy from the main text.
On page 11, "When agents crash..."
- You should probably put this in a section called "Discussion and Future Work".
On page 11, "We have only considered synchronous...":
- Same comment as above
On page 11, "Open problems":
- Same comment as above
Criticism:
"although our algorithm does provably exhibit the desired behavior as above (i.e., approximate equality) after adequately large finite number of iterations."
- I would like to see a better discussion on this. You show that the step sizes are asymptotically bounded for a specific choice of $\lambda$ over time, in Proposition 1,
and I wonder how it relates to the convergence in a finite number of steps.
"While the algorithm structure above resembles the prior algorithms for Byzantine consensus, the key difficulty in proving the desired admissibility result arises due to the possibility that the Byzantine agents send different (erroneous) gradients to different non-faulty agents"
- This is a problem, because you can rely on protocols that eliminate this problem even when the N > 3t (and you mention this further ahead in the paper). I don't see why you're pointing this out here.
I think this is a well-written paper, yet the techniques are standard in the area (the proof of theorem 1, trimming for values in R). The problem is interesting, however, but I would like to see a more elaborate discussion on other possibilities of implementing this protocol, possibly using standard algorithms such as approximate consensus. Considering all the observations above, I will settle for (weak) accept.
Specific corrections:
Page 2, third paragraph: f(x) -> g(x)
Definition 2: I think readers would be more comfortable with the distance definition expressed in terms of argmin instead of using "inf".
Page 9, inequality (10), and following line: I don't see the definition of L anywhere. Did I miss it?
Page 9, Sec 6.2, first paragraph: "is asymptotically in Y". Maybe you're missing a word here?
Page 11, second to last paragraph: "withX" -> "with X"
Appendix B, case (ii): min (argmin p1(x)) -> min (argmin p2(x))
Please add a "Discussion and Future Work" section. Some readers like to look at the abstract and then at the conclusions before getting into the details.