---------- Forwarded message ---------
From: ICDCN 2020
Date: Fri, Sep 20, 2019 at 12:53 PM

Dear Zhuolun Xiang,

We are happy to inform you that your submission 112: Global Stabilization for Causally Consistent Partial Replication was selected for presentation as a full paper at ICDCN 2020. Congratulations!

You will soon receive another email with instructions for preparing and submitting the proceedings version of your paper. Please note that it will be due by **October 18**.

You will find the reviews below. This year we received 47 submissions to the Distributed Computing track and we accepted 14 full papers.

Thank you for submitting your work to ICDCN. We look forward to seeing you in Kolkata!

Best regards,
Keren Censor-Hillel and Arpita Patra
ICDCN 2020 Distributed Computing Track Chairs

SUBMISSION: 112
TITLE: Global Stabilization for Causally Consistent Partial Replication

----------------------- REVIEW 1 ---------------------
SUBMISSION: 112
TITLE: Global Stabilization for Causally Consistent Partial Replication
AUTHORS: Zhuolun Xiang and Nitin Vaidya

----------- Overall evaluation -----------
(weak accept)

This paper studies how to apply global stabilization for implementing causal consistency in partially replicated distributed storage systems. The authors propose algorithms for both clients and servers. In these algorithms, each operation is associated with a timestamp, which is some physical clock time. The main idea of the proposed algorithms is to serialize all operations and the resulting versions by their timestamps such that causal consistency is satisfied. When a server returns a version of some key to a client, it has to guarantee that all of its causal dependencies are visible to the client. This is accomplished by applying the global stable time, defined as the time point such that the server has received from the other servers all updates with timestamps less than this time. So, a server can return a version of some key only when the global stable time reaches a certain point.

The key contribution of this paper is the computation of the global stable time in a partially replicated system. To compute the global stable time, a server needs to track heartbeat messages from two sets. The first set represents the local dependencies of a key and consists of the edges in the augmented share graph through which other servers can directly send updates to this server. The second set represents the remote dependencies of a key and consists of all edges through which other servers can send updates to servers that are in the same server set as this server. Each server decides which servers to send heartbeat messages to based on the above two sets, and computes the global stable time based on the heartbeat messages it receives from them (a rough sketch of my reading of this computation follows the main comments below). The global stable time computed is guaranteed to be optimal for general partial replication. This paper also presents an optimization of the remote update visibility latency by allowing clients to migrate among different servers. Simulation results demonstrate that the proposed mechanism has better performance than a full replication system in terms of visibility latency and message overhead.

Main comments:
1. This paper solves the problem of implementing causal consistency in a partially replicated distributed storage system by using global stabilization.
2. The algorithms presented in this paper are correct and clear.
3. The simulation part shows the effectiveness of the proposed algorithms. However, it only compares performance with a full replication system. I would suggest that the authors compare their algorithm with some known partial replication system.
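For concreteness, the following sketch illustrates my reading of the global stable time computation summarized above. It is not the authors' pseudocode; names such as Server, local_deps, remote_deps, and latest_ts are my own (Python-style):

    class Server:
        def __init__(self, server_id, local_deps, remote_deps):
            self.server_id = server_id
            # Servers on augmented-share-graph edges that can send updates
            # directly to this server (local dependencies of its keys).
            self.local_deps = set(local_deps)
            # Servers that can send updates to servers sharing a server set
            # with this server (remote dependencies).
            self.remote_deps = set(remote_deps)
            # Latest update/heartbeat timestamp received from each tracked server.
            self.latest_ts = {s: 0 for s in self.local_deps | self.remote_deps}

        def on_heartbeat(self, sender, ts):
            # Channels are FIFO, so timestamps from a given sender are non-decreasing.
            if sender in self.latest_ts:
                self.latest_ts[sender] = max(self.latest_ts[sender], ts)

        def global_stable_time(self):
            # All updates with timestamps <= GST have been received from every
            # tracked server, so versions up to GST are safe to expose to clients.
            return min(self.latest_ts.values()) if self.latest_ts else 0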
----------------------- REVIEW 2 ---------------------
SUBMISSION: 112
TITLE: Global Stabilization for Causally Consistent Partial Replication
AUTHORS: Zhuolun Xiang and Nitin Vaidya

----------- Overall evaluation -----------

The paper proposes a protocol for the implementation of a distributed multi-version key-value store, where keys, together with their values and versions, are replicated for fault tolerance. Replication is done on the servers of the distributed system in such a way that a server is allowed to store only a subset of the keys of the domain. That is why the paper defines the data model as general partial replication. The proposed protocol for the implementation of GET (i.e., read) and PUT (i.e., write) operations ensures that the store guarantees causal consistency, meaning that it returns values to a client that are consistent with causality. At a high level, given the definition of a causality relation on operations (e.g., like the one of Definition 3 in the paper) and a pair of operations (a, b) that belongs to the relation, a client that has seen the effects of b also sees the effects of a. Causal consistency is guaranteed by relying on a global stabilization technique, which is usually adopted in the context of causal distributed data stores and allows the computation of a global stabilization time at a replica, up to which it is safe (with respect to causality) to read values.

The paper also claims that the proposed protocol is optimal in terms of the remote update visibility latency in the context of partial replication. It also provides the results of a simulation that aims to compare the potential efficiency of the proposal with respect to an existing implementation of a distributed causally consistent key-value store, i.e., GentleRain. The motivation of the proposal is explicitly focused on the lack, in the literature, of causally consistent data stores that implement partial replication as their data model.

The paper has some potential, but it has several weaknesses as well. The details of the protocol are not clearly explained, and that makes the evaluation of the paper really hard, in my opinion. Furthermore, the strength of the motivation is not clearly proved. As an example, I could argue that the main competitor, GentleRain, supports general partial replication, where each server can store a subset of the keys. I provide more details below. In general I would recommend an improvement of the quality of the presentation, which will certainly improve the relevance of the proposal and make its novelty clear. It is also hard to assess the novelty of a proposed implementation of a distributed key-value store without a real implementation and a system-to-system comparison.
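For reference, here is my paraphrase of the GentleRain-style reading rule that the protocol appears to rely on. This is an illustrative sketch only, not the paper's pseudocode; the names Version, visible_version, gst, and writer are mine:

    from dataclasses import dataclass

    @dataclass
    class Version:
        key: str
        value: object
        ts: int       # physical-clock timestamp assigned by the PUT
        writer: int   # id of the client that issued the PUT

    def visible_version(versions, key, gst, client_id):
        # A version is safe to return if its timestamp is covered by the
        # server's global stabilization time (all of its causal dependencies
        # have already been installed) or if the requesting client wrote it.
        candidates = [v for v in versions
                      if v.key == key and (v.ts <= gst or v.writer == client_id)]
        return max(candidates, key=lambda v: v.ts, default=None)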
* More details (in order of appearance in the paper) *

- Abstract, page 1: The abstract provides details about related work, which is misleading. The abstract should summarize the proposal in a very direct way.

- Abstract, page 1: "by the previous work" -> "by previous works".

- Abstract, page 1: "However, all previous designs…within one data center.". I think this is true. However, the authors do not use it as a motivation anymore. Instead, general partial replication becomes the motivation, where each server can store an arbitrary subset of the data. As far as I understand, the proposal does not assume any grouping of servers into different data centers, which is different. To the best of my knowledge, in GentleRain, which is the competitor of the proposal, the domain is partitioned into n shards, and each shard has m replicas. Therefore, there can be n*m servers, each storing just one shard, hence a subset of the data. The only constraint is that there must be m data centers, and each data center has to maintain all of the n shards. Therefore, the whole domain is made available to clients connected to a data center by the servers of that data center.

- Abstract, page 1: "general partially replication" -> "general partial replication".

- Abstract and introduction, page 1: I would use just one dedicated section for the details about related work. Currently, part of the related work appears both in the abstract and in the introduction. Also, did the authors consider a comparison with Wren (DSN'18) as well? Wren provides non-blocking reads in a partitioned causally consistent data store.

- Introduction, page 1: "many systems provide consistency guarantees when clients access the data." -> What is the definition of a consistency guarantee when a client accesses data? I did not really get the purpose of this sentence. Maybe the authors meant that applications are developed based on the consistency guarantees that the stores provide?

- Introduction, page 1: "its emerging applications in social networks." -> Why is that the case for social networks? That should be justified.

- Introduction, page 2: "However, there are several constraints on the design of GentleRain." -> As already explained above, the difference between this proposal and that of GentleRain is not that clear to me. They both support sharding of the data, and then each shard is replicated. In GentleRain a client does not need to read from a remote data center, though; the protocol of this paper does not have that constraint, which can entail an additional delay on read operations. But it is not clear what impact that has on the computation of the GST. For instance, GentleRain uses a heartbeat mechanism as well. This needs to be clarified, e.g., with a step-by-step comparison, since the authors claim that their protocol structure is inspired by GentleRain. As far as I understand, the protocol uses a priori knowledge of the clients' accesses to save heartbeat messages that are used for computing the GST. If so, how practical would that be?

- Introduction, page 2: "Under these constraints, the global stabilization approach is simple and straightforward." -> This sentence is not fair. I think it merely minimizes the importance of previous contributions.

- Section 2, page 2: How is S_c related to S? Is S the set of servers? Also, is C the set of clients?

- Section 2, page 2: About the definition of "the set containing all clients' server sets" -> That is not formally correct. Shouldn't that be G = {S_c : c \in C}? Also, G is a collection of subsets of S, hence |G| \leq 2^n. But isn't |G| = m?

- Section 2, page 2: Definitions of point-to-point, reliable, and FIFO are needed.

- Section 2, page 2: The authors should formally define what a version is. As it stands, they write about the creation of a new version without having defined one.

- Section 2, page 2: Does the origin of the PUT matter in the definition? Please, the authors should also define operations in a more formal way. Does GET return a value? Where is that value defined? What is the domain of keys and values?

- Section 2.1, page 2: "Only keys shared by multiple replicas are showed in Figure 2." -> The authors should explain what Figure 2 depicts; otherwise, what is described here seems unrelated. For example: are "k" and "a" keys? What does a vertex represent in Figure 2? And what about the edges?
- Section 2.1, page 2: "we define an share graph" -> "we define a share graph".

- Section 2.1, page 2: "define a augmented share graph" -> "define an augmented share graph".

- From page 3 on: Please, the authors should consider using V1, V2, etc., as identifiers of versions.

- Section 2.1, Example, page 3: "G^a consists 7" -> "G^a consists of 7".

- Section 2.1, Example, page 3: I thought that servers were identified by numbers in 1, ..., n. Are h, i, and j in that interval? Where is that explained? Can i be equal to 1, 2, 3, or 4? What is the value of n?

- Section 2.1, Example, page 3: What is the definition of connected sets of edges?

- Section 2.2, Definition 3, page 3: Isn't this itself the definition of a causality relation on operations?

- Section 2.2, Definition 3, page 3: The definition of happened-before uses the term "happens". So what does "happens" mean?

- Table 1, page 3: "server set that client c can access". Is the set determined a priori in some way?

- Section 2.2, Definition 5, page 3: Mix of English and logic. Please, use \lor to denote "or".

- Section 2.2, Definition 5, page 3: What does "returned from" mean? Is K stored on i?

- Section 2.2, Definition 5, page 3: I wonder about the validity of the definition when the client does not execute GET(k). Is K visible to c in that case?

- Section 2.2, second bullet of Definition 6, page 3: What if c never executes a GET on k? Or would this be part of Definition 5?

- Section 3, page 4: It is not clear from the description why the proposed protocol needs two different scalars for PUT and GET, respectively.

- Section 3, page 4: Basically, the authors do not describe the algorithm. They should describe what the pseudocode defines.

- Section 3, line 2 of Algorithm 1, page 4: Why does j have to be different from i?

- Section 3, page 4: "PUT at other servers on some key" -> Is that "some key" or the same key that the client is executing the GET on?

- Section 4, page 5: "is much more complicated than GentleRain" -> "Much more complicated" is neither formal nor technical. The authors should be more accurate; for instance, state that it is more expensive because it requires more communication steps, etc.

- Section 4.1, first column, page 5: What is the definition of vertex repetition? What is the meaning of a direct edge in an undirected graph? Which cycle are the authors talking about in the example of L_i(k) in Figure 3? Also, the text of this column has to be broken down into multiple formal definitions.

- Section 4.1, first column, page 5: Why is 3 in R_i(g) in the example? What is the path that the authors are considering there?

- Section 4.1, first column, page 5: "c'" -> Maybe c would be more accurate, since c is what appears in the figure.

- Section 4.1, page 5: "along which the causal dependencies of key k's version may be sent" -> This does not seem to be specified in the definition.

- Section 4.2, page 6: Isn't this redundant? See the third paragraph of Section 3.

- Section 5, page 6: The statements of the lemmas and the proof sketch of Theorem 1 are sound to me.

- Section 6, page 6: "clients are allowed to migrate among the servers freely without extra delays" -> What is an extra delay? Extra due to what?

- Section 6, page 6: "GST is optimal" -> I do not understand the definition of optimal here. Roughly speaking, I see as optimal something that cannot be improved further. On the other hand, the authors are proving that GST is valid, meaning that if the reading rule adopts GST as defined, then causal consistency is not violated.
- Section 7.2, page 7: I understand the intuition. However, the pseudocode needs to be described.

- Section 8.1, page 8: The evaluation is weak for this kind of protocol since it is a simulation. A system-to-system comparison would be really helpful to quantify the improvement and hence the novelty.

In general, it would be nice to have a name for the proposed algorithm for referencing; "Ours" is not ideal.

----------------------- REVIEW 3 ---------------------
SUBMISSION: 112
TITLE: Global Stabilization for Causally Consistent Partial Replication
AUTHORS: Zhuolun Xiang and Nitin Vaidya

----------- Overall evaluation -----------

The paper proposes an algorithm that guarantees causal consistency in partially replicated systems. The algorithm uses the technique known as "global stabilization time", which has also been used previously in the literature. The authors also provide an evaluation of the performance of their work.

Comments to the authors:

- Some parts of the paper overlap with the authors' work published this year at PODC and referenced in the current paper as a technical report. The authors are invited to explicitly explain which parts are common between the two works. I would see the current work as a complement to the journal version of their PODC paper.

- The simulation section is particularly weak:
----- the authors compare their algorithm only with GentleRain. Why not compare with Eunomia (which compares against GentleRain)? Eunomia is referenced as a technical report, but it has been published at USENIX in 2017.
----- the workload used in the simulation is synthetic. A realistic workload would have been interesting here, especially since similar works in the literature provide performance evaluations with real workloads. The explanation for the use of the ring topology is totally inadequate for the application targeted by the paper.
----- the simulations have been performed only 3 times, which is not conclusive.
----- the confidence level of the simulation results is not shown.
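To illustrate the last two points, here is one way repeated simulation runs could be summarized with a confidence interval; the latency values below are invented purely for the example:

    import math
    import statistics

    # e.g., mean update visibility latency (ms) measured in each simulation run
    runs = [12.4, 13.1, 12.8, 12.6, 13.0]
    mean = statistics.mean(runs)
    # 95% confidence interval using a normal approximation; with only a handful
    # of runs, t-based bounds would be more appropriate.
    half_width = 1.96 * statistics.stdev(runs) / math.sqrt(len(runs))
    print(f"{mean:.2f} ms +/- {half_width:.2f} ms (95% CI, n={len(runs)})")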