>>> ----------------------- REVIEW 1 ---------------------
>>> PAPER: 21
>>> TITLE: Differentially Private Distributed Optimization
>>> AUTHORS: Zhenqi Huang, Sayan Mitra and Nitin Vaidya
>>>
>>> OVERALL EVALUATION: 1 (weak accept)
>>>
>>> ----------- REVIEW -----------
>>> Summary:
>>>
>>> This paper considers the private distributed optimization problem (PDOP) in which n nodes are required to satisfy the following even though an adversary can hear the communication between all nodes: (1) minimize a global cost function (the global cost function is defined as the sum of individual cost functions), and (2) prevent the adversary from deducing the individual cost function of any node.
>>>
>>> Under certain assumptions (convexity, compactness, the n-by-n matrix that represents the communication graph being doubly stochastic, robust connectivity), the paper considers a class of synchronous iterative distributed algorithms that work in the following way: in each round, a node generates a value to broadcast with function R which takes the current estimate of the optimal point and the round number as its parameters, and then broadcasts this value. Within the same round, each node receives values from other nodes and stores them in a buffer and then calls function F which aggregates the values stored in the buffer subject to the edge weights of the current graph, and generates a new value. This new value is fed into function U which basically generates the new estimate of the optimal point that redices the cost function of the node. A particular algorithm can be generated by instantiating functions R, F, and U.
>>>
>>> It is said that an iterative distributed algorithm solves PDOP, if (1) the expected distance between the estimates of the optimal point of any pair of nodes converges to 0 (this property is called Convergence), (2) the limit (w.r.t. round number) of the expected distance between the average estimate of the optimal point among all nodes and the actual optimal point is upper bounded by some value d >= 0 (this property is called Accuracy; the lower the d value, the better the accuracy), and (3) the algorithm preserves what is called epsilon-differential privacy (which ensures that, with significant probability, the adversary does not gain information regarding the individual cost function of an agent, the lower the epsilon value, the better the privacy).
>>>
>>> The paper provides an instantiation of functions R, F, and U. Within function R, noise drawn from the Laplace distribution is added to the estimate of the optimal point to generate the value to be broadcast. Within function F, a weighted sum z of the received values with respect to the edge weights of the current graph is calculated. Function U is instantiated to basically perform gradient descent with a certain learning rate on the individual cost function given z. The calculated value is projected to the domain of consideration to generate a new estimate of the optimal point.
>>>
>>> The authors formally prove that the provided algorithm satisfies Convergence, epsilon-differential privacy, and Accuracy, and also addresses the trade-off between d and epsilon; having all other parameters being fixed, d is in the order of 1/epsilon^2. The authors observe this trade-off through experiments.
>>>
>>> Comments:
>>>
>>> - The problem definition introduces a series of graphs (over which communication occurs in the rounds) which have weights on their edges. At this point, it is unclear what the weights are modeling. Later on it appears that the weights are used in the algorithm (the value received by i from neighbor j is multiplied by the weight on that edges before being added to the sum). Thus the weights are part of the algorithm, not part of the system, and should be introduced with the algorithm, not with the problem statement. How does the algorithm compute these weights in each round?
>>>
>>> - The biggest issue that I have faced in getting a high-level understanding of the paper was relating Definition 4 with its informal description presented in the 2nd column of page 4. How can the inequality in Definition 4 represent the fact that an adversary cannot gain information about any node's individual cost function? An example and/or a more detailed explanation would be really helpful.
>>>
>>> - The paper assumes the matrix that represents the communication graph to be doubly stochastic. Also, it is mentioned there exists distributed algorithms to derive a doubly stochastic matrix among a network. This would imply that the algorithm presented in this paper might not work well if the network is highly-dynamic. Furthermore, it seems like it is important for all nodes to have the same view of the matrix in each round. So, it seems like within each round, the graph topology must be fixed and the duration of a round should be at least the time it takes to complete the distributed algorithm that derives the doubly stochastic network. Relaxing this restriction might serve as a good topic for future work.
>>>
>>> - As presented in the paper, I could see that a good application for PDOP would be military applications. Are there any other applications for PDOP?
>>>
>>> -- small comments:
>>>
>>> - (page 2, bottom of 1st column) It is said that "y=Proj_X(x) if y \in X and ||y-x|| <= ||z-x|| for any z \in R^n \ X". Shouldn't "z \in R^n \ X" be replaced with "z \in X" since y should be compared with every other point in X?
>>>
>>> - (page 3, middle of 2nd column) "Receive(y_i)" should be "Broadcast(y_i)".
>>>
>>> - (page 4, definition 1 and 2) I'm guessing that \mathbb{E} represents the expectation.
>>>
>>> - It might be helpful to use the term "gradient descent" when explaining subroutine U. (and maybe consider the use of the term "learning rate" for the step size)
>>>
>>> - (page 5, middle of 1st column) "The initial noise depends on the three parameters, c,p,q,epsilon ...". I guess that "three" should be replaced with "four".
>>>
>>> - (page 6, statement of lemma 4) I am guessing that "=" should be replaced with "<=" since the last part of the proof of lemma 4 tells us that \Delta(t) <= 2C_2\sqrt(n)\gamma_t.
>>>
>>> - (page 6, beginning of section 4.3) "A(t)" should be "A_t".
>>>
>>> - (page 8, 2nd column, Example 2) "PDOP" should replace "DPOD".
>>>
>>>
>>> ----------------------- REVIEW 2 ---------------------
>>> PAPER: 21
>>> TITLE: Differentially Private Distributed Optimization
>>> AUTHORS: Zhenqi Huang, Sayan Mitra and Nitin Vaidya
>>>
>>> OVERALL EVALUATION: -1 (weak reject)
>>>
>>> ----------- REVIEW -----------
>>> The paper proposes a solution to solving in a distributed way an optimization problem.
>>> In this problem one strives to find a global optimum given a (private) cost function
>>> at each process. Besides approaching the global minimum a secondary goal is to hide
>>> from any observer (looking at the messages exchanged) the actual cost functions of the
>>> processes.
>>>
>>> The paper is quite mathematical, so unfortunately I was not able to fully follow it through.
>>> My main concern is independent of this, however, as my main concerns relate to the notion
>>> of privacy introduced in the paper. It seems that the notion of privacy is one of the main
>>> contributions of the paper, so to make I believe my concerns are critical, but by no means
>>> impossible to resolve. As far as I understand, the notion of privacy is based on the relation
>>> between the probabilities of two observations given two (neighbouring) problems.
>>>
>>> To me there is two problems with this definition:
>>>
>>> The fundamental problem is that here we compare $\mathbb{P}(R^{-1}(P,Y,x)$ with
>>> $\mathbb{P}(R^{-1}(P',Y,x)$ for two problems P and P', but the probability spaces
>>> for the two problems are independent of each other, and indeed the measure was
>>> \mathbb{P}_P was only defined with respect to a single problem P. But in Eq. (4)
>>> there are no subscripts on the probability measures compared (see above). I was not
>>> able to find an assignment of subscripts to the two that I was sure of to make sense.
>>> I hope that this issue is easy to fix (and I am overlooking the simple solution).
>>>
>>> The more practical problem is that differential privacy does not seem suited to the
>>> job it is trying to solve. With respect to Example 1 it seems to say that the adversary
>>> given two sets of observations cannot determine which set of locations the agents are
>>> at exactly. However, in this example being able to determine the approximate position
>>> of some agent might be bad enough. Or put differently, since F (the set of all
>>> possible cost functions) might be infinite, not being able to approximate any given
>>> agents f_i from a single observation is much more useful than not being able to guess
>>> the exact function of some agent given two observations.
>>> Obviously, some might argue that differential privacy is in fact practical; some additional
>>> examples, that include a discussion of the privacy aspect of the problem would help to
>>> make the paper more convincing.
>>>
>>> Minor Comments
>>>
>>> I believe that there is a mistake in the definition of Projection, given the text
>>> the point y should be the point closest to in X, but the formal definition requires
>>> the point to be closer to x than any point not in X. Moreover, the "well known
>>> property" only applies if X is convex which is not explicitly assumed until
>>> assumption 1 one page later.
>>>
>>> Algorithm 1 should return x (or possibly some function of x) but not R.
>>>
>>> Right below algorithm 1: "That is, the Receive(yi) routine ... broadcasts"
>>> I believe it should refer to the broacast routine?
>>>
>>>
>>> ----------------------- REVIEW 3 ---------------------
>>> PAPER: 21
>>> TITLE: Differentially Private Distributed Optimization
>>> AUTHORS: Zhenqi Huang, Sayan Mitra and Nitin Vaidya
>>>
>>> OVERALL EVALUATION: 2 (accept)
>>>
>>> ----------- REVIEW -----------
>>> In this paper, the authors formulate a private distributed optimization problem (PDOP) in terms of differential privacy. In a PDOP, N cooperative agents are required to minimize a global cost function f that is the sum of N local cost functions each of which is associated with an individual agent. The agents may exchange their estimates of the optimal solution of the global cost function, but are required to keep their own local cost functions differentially private from an adversary with access to all the communications among them. An epidemic-like algorithm is proposed in this paper to tackle such a problem. The defined problem is important; the proposed algorithm is novel and analytical results are solid. An acceptation of this paper is recommended. Yet, some suggestions are as follows.
>>>
>>> 1. The iterative algorithm relies on synchronous rounds. Is this synchronization among agents an essential requirement for the proposed mechanism? Or it is just for the convenience of analysis. Will the algorithm still work in an asynchronous manner? It won’t be easy to synchronize a large number of N agents.
>>>
>>> 2. It is not clear how fast the proposed algorithm will converge under the four given parameters of the algorithm, in particular, the parameter of accuracy level, d. Section 4.5 only discusses possible ways to tune these parameters. It appears that the algorithm may take infinite time to converge as the step size is also decaying.
>>>
>>> 3. There are some typos. For examples, in the 5th line of the last paragraph of subsection 3.3, there is a extra ‘of’ after /sigma-algebra. In the 8th line of the first paragraph of subsection 4.5, three words “the three parameters” are misplaced before the accuracy level ‘d.’
>>>
>>
>