======= Review 1 =======

> *** Contributions: What are the major issues addressed in the paper? Do you 
consider them important? Comment on the novelty, creativity, impact, and 
technical depth in the paper.

This paper proposes a novel scheme for guaranteeing Byzantine fault tolerance. 
The proposed method is based on network coding, and to the best of my knowledge 
is a new application of this technique.  The paper does not provide very much 
theoretical analysis (although there is a proof of correctness), relying instead 
on testbed evaluation.  The performance improvement, as measured in the testbed, 
appears to be relatively minor.

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

The paper presents a new approach to an important problem, as well as a new 
application of network coding.  The experimental data provide comparison between 
the proposed scheme and existing methods.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be 
brief.]

The improvement of the proposed method over existing methods is not reflected in 
the simulation plots.  Not much theoretical analysis.

> *** Detailed Comments: Please provide detailed comments that will help the TPC 
assess the paper and help provide feedback to the authors.

My first comment is that the title of the paper is somewhat misleading.  Based 
on the title, the paper appears to be an experimental/testbed study of existing 
Byzantine fault tolerance methods, when in fact it proposes and implements a new 
method.  Also, it is not clear why the contribution is motivated by data 
centers, since Byzantine faults are present in other contexts as well.

The paper proposes an approach (referred to as NCBA) to correcting Byzantine 
faults based on network coding.  While this is a novel application of network 
coding, it  appears to rely on existing network coding tools.  Also, the paper 
would benefit from additional explanation of why the particular coding scheme 
used in this work was selected, as well as some theoretical or experimental 
evaluation of other possible encoding methods.

The paper demonstrates the effectiveness of NCBA through experimental 
evaluation.  The simulation plots in Figures 3-6, however, seem to show that 
NCBA either does not improve on the performance of existing methods (Figures 3 
and 4) or performs worse (Figures 5 and 6, left-most data points).  This casts 
doubt on the practical usefulness of the proposed method.  Hence, while the use 
of network coding in ensuring fault tolerance is an interesting concept, the 
paper does not make the case that it is a viable alternative to the current 
state of the art.

> *** Recommendation: Your overall rating (Please try giving as few borderlines 
as possible). B = (top 30% of reviewer's perception of all INFOCOM submissions, 
but not top 20%) - accept if there is room (3)

======= Review 2 =======

> *** Contributions: What are the major issues addressed in the paper? Do you 
consider them important? Comment on the novelty, creativity, impact, and 
technical depth in the paper.

There are two standard protocols for Byzantine fault tolerant state machine 
protocol that are applicable for data centers: one is the basic one introduced 
by Lamport and then the second one which s a somewhat simplified (i.e., with 
some additional assumption) version that uses collision resistant hash functions 
to reduce the network traffic. The current paper proposes new interesting 
approach to do the same using the well concept of network coding that has been 
fairly extensively used in other applications. The new approach does not use any 
hash function and hence provides a correct solution as opposed to a correct 
solution with a high probability; and, of course, the new schemes is more 
expensive in terms of communication cost but is computationally cheaper. The 
paper also does a comprehensive experimental investigation to compare the three 
schemes and show that the new scheme is at least as good as the previous ones, 
but does not depend on the goodness of the chosen hash functions.

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

The use of the of the concept of network encoding to solve the problem is novel, 
interesting and demonstrates the usefulness of network coding in yet another 
application domain.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be 
brief.]

I do not see much except I would like to see some theoretical analysis of the 
proposed NCBA scheme; then that is outside the stated scope of the paper.

> *** Detailed Comments: Please provide detailed comments that will help the TPC 
assess the paper and help provide feedback to the authors.

There are two standard protocols for Byzantine fault tolerant state machine 
protocol that are applicable for data centers: one is the basic one introduced 
by Lamport and then the second one which s a somewhat simplified (i.e., with 
some additional assumption) version that uses collision resistant hash functions 
to reduce the network traffic. The current paper proposes new interesting 
approach to do the same using the well concept of network coding that has been 
fairly extensively used in other applications. The new approach does not use any 
hash function and hence provides a correct solution as opposed to a correct 
solution with a high probability; and, of course, the new schemes is more 
expensive in terms of communication cost but is computationally cheaper. The 
paper also does a comprehensive experimental investigation to compare the three 
schemes and show that the new scheme is at least as good as the previous ones, 
but does not depend on the goodness of the chosen hash functions.

> *** Recommendation: Your overall rating (Please try giving as few borderlines 
as possible). B+ = (top 20% of reviewer's perception of all INFOCOM submissions, 
but not top 10%) weak accept (4)

======= Review 3 =======

> *** Contributions: What are the major issues addressed in the paper? Do you 
consider them important? Comment on the novelty, creativity, impact, and 
technical depth in the paper.

This paper implements and evaluates three different Byzantine Fault-Tolerant 
(BFT) state machine replication protocols for data centers, and the authors 
propose a network coding based BFT protocol that performs comparably with 
Digest.

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

This paper represents the first implementation of BFT with network coding, which 
can trade off between communication cost and computational cost in computing 
hash functions.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be 
brief.]

My concern regarding the paper is related to the significance of trade-off 
between communication cost and computational cost in computing hash functions.

> *** Detailed Comments: Please provide detailed comments that will help the TPC 
assess the paper and help provide feedback to the authors.

The authors claim that "even though NCBA introduces 50% more communication cost 
than PBFT does, it is compensated by reducing the computational cost of hashing. 
Through extensive experiments, we verified that NCBA performs at least as well 
as Digest, without relying on any cryptographic assumption on the hardness of 
breaking the hash function."

I really doubt the practical significance of the above claim. First of all, 
standard hash functions such as SHA-256 has been widely applied almost to every 
aspect of the network/communication/internet/security protocols. I don't see any 
problem using it.

Hash functions are designed to be efficient in computation; paying 50% more 
communication overhead to reduce the computation cost of the hash function just 
is not the right way to pursue in my opinion.

That said, it is nice to see a new approach implementing BFT based on network 
coding, even though its significance is not justified too well.

> *** Recommendation: Your overall rating (Please try giving as few borderlines 
as possible). B = (top 30% of reviewer's perception of all INFOCOM submissions, 
but not top 20%) - accept if there is room (3)

======= Review 4 =======

> *** Contributions: What are the major issues addressed in the paper? Do you 
consider them important? Comment on the novelty, creativity, impact, and 
technical depth in the paper.

This paper investigates the Byzantine Fault-Tolerant (BFT) replication protocol 
design problem for data center, online storage, and cloud computing. Instead of 
using hashing function and relying on its good collision-reduction property, the 
proposed protocol uses RS codes for state broadcasts among replicas. Such RS 
codes allow the replicas to efficiently determine the faulty one(s) among 
themselves.

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

The strength of the paper is the unique application of RS code into faulty 
tolerancy and cloud computing. The problem is interesting and solution is novel. 
The paper is well-written.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be 
brief.]

Please see the detailed comments below.

> *** Detailed Comments: Please provide detailed comments that will help the TPC 
assess the paper and help provide feedback to the authors.

This paper investigates the Byzantine Fault-Tolerant (BFT) replication protocol 
design problem for data center, online storage, and cloud computing. Instead of 
using hashing function and relying on its good collision-reduction property, the 
proposed protocol uses RS codes for state broadcasts among replicas. Such RS 
codes allow the replicas to efficiently determine the faulty one(s) among 
themselves.

The strength of the paper is the unique application of RS code into faulty 
tolerancy and cloud computing. The problem is interesting and solution is novel. 
The paper is well-written. 

I have several comments: 

In Figure 1, prepare phase, shouldn't peer 3 broadcast to peers 1 and 2 as well? 
In addition, blue color arrows in this figure can't be distinguished in 
black/white printout and should be avoided. 

In first paragraph of Section IV.B. The third issue (a faulty peer Pi broadcasts 
Detected_i=TRUE even though it should be FALSE). Under what circumstances will 
such an error occur? 

In the following paragraph, Y will ignore X when X ignores Y (as discussed by 
the authors). Will this affect whether other fault-free nodes will detect the 
faulty Y? Majority vote?

The linear lines in Figure 3 are not very informative. The authors are suggested 
to present results that help readers understand the contents better. 
Furthermore, why did the authors include SHA-256 with SHA-512? The results look 
rather similar.

> *** Recommendation: Your overall rating (Please try giving as few borderlines 
as possible). A = (top 10% of reviewer's perception of all INFOCOM submissions) 
strong accept (5)