Reviewer 5 of 2018 ACC submission 1409
Comments to the author
======================

This paper deals with "privatised" versions of the distributed gradient descent algorithm on a graph. It appears to be the conference version of preprints [36-38] by the authors. The authors introduce three algorithms in which they add noise to the parameters estimated by distributed gradient descent in such a way that the noise eventually balances out; they provide a convergence rate analysis to show how this additional privacy affects the constant and not the actual rate.

* The proposed algorithms are rather well introduced, as are the comparisons with standard DGD. However, their actual privacy and the comparison with usual privacy setups (differential privacy) are treated too quickly and too late in the paper, which leaves the reader questioning the interest of the proposed algorithms while reading.

* I fail to see an actual use case for the Functional Sharing version; could the authors provide one?

* The numerical illustrations are not very convincing to me:
i) Polynomial optimization: the functions are rather simple, but more importantly the convergence speed depends heavily on the initialization. It seems strange that RSS-LB performs much better than standard DGD; the reviewer wonders whether this is due to the initialization or to a particular realization.
ii) Machine learning: I suppose that the optimized function is non-convex for the MNIST deep neural network training, so I wonder what point it actually makes. For the logistic regression, an objective function value would make more sense in an optimization context, since accuracy can be distorted by regularization and overfitting issues.

Minor comments:
* [Assumptions] The compactness of X is crucial: it is a rather strong assumption that implies both A-1-3 (X^* is bounded) and that A-2-2 => A-2-1. The text of the assumptions could thus be simplified to highlight this.
* [Sec. V] In the title "Proof of Theorem 3", "Theorem" is misspelled.

Reviewer 2 of 2018 ACC submission 1409
Comments to the author
======================

The authors propose a series of algorithms for distributed optimization with some privacy guarantees. The paper is well written; nevertheless, there are some concerns about the provided results. A borderline rejection is suggested. Find the comments below:

If the authors consider fixed undirected graphs, why do they single out the assumption of strongly connected graphs? It seems that in this scenario the graph being connected is enough.

It is not clear whether the authors consider a fixed or a time-varying graph. On the one hand, they define the set of neighbors to be fixed, but in their definition of the gossip matrix they use the subindex k.

Apart from boundedness, there is never any intuition on how to sample the random variables s^{i,j}; in fact, in their Algorithm 1 there is never a sampling stage where realizations of these random variables are drawn. How can they, in step 1, assign the value 0 to a random variable?

The boundedness assumption on the random noise variables also implies that the agents require extra information on the number of agents in the network.

Again, why does the matrix B depend on k? If the authors use the Metropolis weights and the graph is fixed, then there is no reason for this: the weights are determined by the topology alone, as the sketch below illustrates.
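For concreteness, here is a minimal sketch of the standard Metropolis weights on a fixed undirected graph (this is the reviewer's own illustration, not the authors' code; the function name is hypothetical). The matrix is built once from the topology alone, so no iteration subindex k is needed:

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis weights for a fixed undirected graph.

    adj: symmetric 0/1 adjacency matrix (no self-loops).
    Returns a symmetric, doubly stochastic weight matrix that
    depends only on the topology -- it carries no iteration index k.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        # Self-weight absorbs the remaining mass so each row sums to 1.
        W[i, i] = 1.0 - W[i].sum()
    return W

# Example: a path graph on three nodes.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(metropolis_weights(A))  # rows and columns each sum to 1
```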
MAIN Concerns:

1) In Theorem 1, the authors claim that their algorithmic proposal "solves" the distributed optimization problem. However, one required input for their algorithm is the parameter MAX_ITER, which, by the way, seems to be discarded in Algorithm 3. This MAX_ITER is never discussed; how can the authors find a value for this parameter? Moreover, what is the definition of a solution for the problem, since clearly this algorithm will only produce approximate ones in finite time? They talk about asymptotic convergence, yet their algorithms execute in finite time.

2) The authors claim their algorithms achieve some trade-off between privacy and solution quality of the distributed optimization problem. On the solution side, Theorem 3 provides a result that relates the magnitude of the added noise to the convergence rate of the algorithm. However, there is no way to quantify the effect of \Delta on the privacy-preservation side. The results in Theorem 2 talk about preserving privacy but do not relate in any manner to the parameter value \Delta.

NOTATION: f is used both for a function and for the number of adversarial agents, which creates confusion.

The experimental results on the machine learning example seem interesting; nevertheless, deep neural network objectives are known to be non-convex, thus invalidating the initial convexity assumption required for the results of the paper to hold.

For Theorem 3, the authors refer to [16] for their convergence results. Nevertheless, [16] works in a different setup of time-varying directed graphs.

Reviewer 4 of 2018 ACC submission 1409
Comments to the author
======================

The content and the focus of the paper are very interesting and relevant.

In many places in the paper you have used the terminology 'feasible objective function' and 'feasible instances of Problem 1'; it is used especially in the key results (e.g., claims and theorems). In my opinion this is unclear and confusing. When you use 'feasible set', its definition is clear; however, the terminology 'feasible function' is not. So I want you to be more precise about this.

You have mentioned that "When each agent's local objective function is restricted to be any polynomial of a bounded degree, the set of feasible functions forms an additive group. Theorem 2 makes a claim regarding the privacy achieved using function sharing in this case." However, in Assumption A1 you exclude nonconvex functions, so the set of feasible polynomials is not closed under additive inverses and hence is not a group. Therefore, the claim above, in particular "Theorem 2 makes a claim regarding the privacy achieved using function sharing in this case", is not really true. Please refine the sentence appropriately.
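To make the last point concrete (the reviewer's illustration, not material from the paper): the additive inverse of a convex polynomial is concave, so under Assumption A1 the feasible functions are closed under addition but lack inverses, forming a monoid rather than a group. A minimal numerical sketch:

```python
import numpy as np

def midpoint_convex_on_grid(f, xs, tol=1e-9):
    """Numerically check midpoint convexity of f on a 1-D grid."""
    return all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + tol
               for a in xs for b in xs)

xs = np.linspace(-2.0, 2.0, 21)
f = lambda x: x ** 2      # convex polynomial of degree 2 (satisfies A1)
g = lambda x: -(x ** 2)   # its additive inverse: concave, excluded by A1

print(midpoint_convex_on_grid(f, xs))  # True
print(midpoint_convex_on_grid(g, xs))  # False -> no inverse within A1
```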