TPDS '20

Reviewer 1

Recommendation

Author Should Prepare A Minor Revision

Comments

This paper presents OWebSync, a system to synchronize state in collaborative web-based applications. The paper addresses an important problem, is of interest to the TPDS audience, and is in general easy to follow, although the organization could be improved.

Detailed comments:

The introduction motivates the paper well, but its organization could be improved. The contributions of the paper (briefly discussed in the paragraph "In this paper, we present...") would be better placed after the discussion of existing approaches, right before the last paragraph of the introduction, which presents the roadmap.
When an acronym such as CRM or HR is first used, the full name should be given.
§2.1.2: When there are conflicting edits, what is the right decision from the application's perspective? How does this map to the underlying data structures?
§2.2.1: "Hence, OT is not resilient against message loss in offline situations". This is confusing since, by definition, messages are lost when a client is offline.
§2.2.4 discusses the approach taken in the paper. This would be better suited for the next section, which introduces the system and data model.
§3.1 and §3.2 contain some general background about data structures, which would fit better in §2.
The discussion of the ORMap should provide some context (maybe some examples?) to make the paper accessible to a wider audience.
"First, we extended this data structure with a Merkle-tree that we pattern over the object's logical tree-structure." What's the meaning of "pattern over"?
A figure illustrating the data structures, for instance for the example already given in the paper, would help the reader better understand the concepts being discussed.
Algorithm 1 (which should be labeled as an algorithm, not a "Figure") needs a more detailed explanation, both in the text and as comments in the pseudo-code itself. Understanding all the details of the Merge procedure, in particular, can be challenging.
In Alg. 1, the definition of KV is missing (it is given in the text, but the algorithm should be self-contained).
Along the lines of a previous comment, a figure depicting what happens in Fig. 2 and Fig. 3 would be very useful. How does each entry in the JSON map to the tree?
Even though the paper assumes non-malicious users, MD5 is rather weak, as several semi-automated tools exist that generate collisions for arbitrary content. Why not select a stronger hash function?
§4.3: The GET returns the CRDT if the hash differs from that of its own CRDT at the given path. What happens otherwise? How does the receiver distinguish between no changes and a lost or delayed message? The same applies to REMOVE.
"The synchronization starts with the highest CRDT in the tree" Meaning the root?
§4.3: "If during a merger process, a child..." This paragraph is rather confusing. What "sides" are being discussed? Client and server side? Two distinct clients? The text should be improved, and the problem being addressed should be defined more precisely.
In the evaluation, the text discussing the results and the plots themselves are quite far apart; reorganizing the text so that it is closer to the plots would improve readability. Also, it would be more useful to introduce each benchmark together with its results rather than presenting all benchmarks first and then all the results. I found myself going back and forth in the paper trying to understand the results in light of the benchmark being used.
Regarding the results, some things are not clear:
- "The synchronization times of the succeeded updates are illustrated in ...". Does this mean that some updates might fail, causing clients to diverge? If so, for which techniques, and what is the impact of this on the results and the application?
- What percentiles do the whiskers cover? The sentence "Only at the upper whisker, all of the missed updates are fully synchronized" seems to contradict the caption "Only at the upper whisker, most of the missed updates are synchronized". Is it the 100th percentile? The 99th?
- What is the shape of the tree (depth and fanout) used in OWebSync? What is the impact of different shapes on the results? How is the shape decided in the first place? Can the application have some control over this? This seems to have a significant impact on the results; hence, evaluating some possible configurations, or at least discussing their impact, would benefit the paper.
- The key of Fig. 6 is not very legible; it would be better to place the name of each system below its respective whisker plot. The same applies to the other plots.
- "full synchronization to all clients in the online scenarios": What does this mean? It is unclear how time is measured. Is it the time for all clients to sync after all the updates finish? Is this realistic? If so, what does this mean for offline clients? Do they all come back after the updates finish just to synchronize? If that is the case, I don't think this is a very realistic measurement, as the measurement should be taken while the system is processing other updates.
- What are the assumptions regarding clock synchrony to perform this measurement?
- What is the impact of each technique on server resources? It is mentioned that OT techniques are more resource-intensive, but what about the other systems? Is OWebSync more efficient on server resources than the other approaches? This is a key comparison to be able to fully assess the practicality of OWebSync.
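To make the requests for a data-structure figure and a stronger hash concrete, the following minimal Python sketch shows one possible way a nested JSON document could be mapped onto a hash tree, using SHA-256 instead of MD5. The `merkle` function and its encoding (inner nodes hashed over sorted keys and child hashes) are assumptions for illustration, not OWebSync's actual scheme.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

# A node of the hash tree: (hex digest, children keyed by JSON object key).
Tree = Tuple[str, Dict[str, "Tree"]]


def merkle(value: Any) -> Tree:
    """Map a JSON-like value onto a hash tree (hypothetical encoding).

    Every JSON object becomes an inner node whose hash covers its sorted
    keys and the hashes of its children; any other value is a leaf.
    """
    if isinstance(value, dict):
        children = {key: merkle(child) for key, child in value.items()}
        h = hashlib.sha256()
        for key in sorted(children):  # fixed key order makes the hash deterministic
            h.update(json.dumps(key).encode("utf-8"))
            h.update(children[key][0].encode("ascii"))
        return h.hexdigest(), children
    # Leaf: hash a canonical JSON serialization of the value.
    leaf = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(leaf).hexdigest(), {}
```

With a document such as `{"drawing1": {"object1": {...}, "object2": {...}}}`, editing `object1` changes the hashes of `object1`, `drawing1`, and the root, while the hash of `object2` stays the same; this is what would allow unchanged subtrees to be skipped during synchronization.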

Additional Questions:

Please explain how this manuscript advances this field of research and/or contributes something new to the literature.: This paper presents a technique to improve the synchronization time of collaborative web-based applications subject to frequent offline periods.
Is the manuscript technically sound? Please explain your answer under Public Comments below.: Yes

Which category describes this manuscript?: Research/Technology
How relevant is this manuscript to the readers of this periodical? Please explain your rating under Public Comments below.: Relevant

Are the title, abstract, and keywords appropriate? Please explain under Public Comments below.: Yes
Does the manuscript contain sufficient and appropriate references? Please explain under Public Comments below.: References are sufficient and appropriate
Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under Public Comments below.: Yes
How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain under Public Comments below.: Satisfactory
Please rate the readability of the manuscript. Explain your rating under Public Comments below.: Easy to read
Should the supplemental material be included? (Click on the Supplementary Files icon to view files): Does not apply, no supplementary files included
If yes to 6, should it be accepted:
Would you recommend adding the code/data associated with this paper to help address your concerns and/or strengthen the paper?: Yes

Please rate the manuscript. Please explain your choice.: Good

Reviewer 2

Recommendation

Author Should Prepare A Major Revision For A Second Review

Comments

The problem addressed by the paper is well motivated, and the background is explained precisely and succinctly. The paper convincingly aims at a significant contribution to the state of the art, with relevant practical applications, as illustrated by two use cases.

My main concerns, however, are about Section 3 (data model). While probably none of the issues is a complete show-stopper, significant additional work is required to make that part of the paper unambiguous, convincing, and correct. The main issue lies in the core ORMap CRDT implementation: it is under-specified, contains errors, and lacks any (even informal) attempt to prove that it is actually correct.

A fundamental concern is that the authors claim that their ORMap is based on Shapiro's ORSet, but in fact it is fundamentally different. A usual ORSet checks, in its GET function, whether some key is in the Removed set. The authors' version does not do so, without explanation. In an ORSet, a key may appear more than once (with different unique tags) in the Observed set (if added by multiple nodes independently). Checking for set inclusion is obviously easy, but if you implement a map instead of a set, with key-value pairs, you end up with multiple values for a single key (with different tags), and deciding which value is the "right" one is not obvious. The authors do not address that problem, maybe because their "ORMap" is fundamentally different from a usual ORSet, but they fail to explain that part. Also, an ORSet's delete function deletes *all* instances of a key (all tags), whereas the code in Fig. 1 apparently removes only one. These differences are significant, and it is certainly not obvious that the proposed ORMap data type is correct. At least some better justification of why this data type is a correct CRDT should be provided.
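For reference, the textbook OR-Set semantics this comment compares against can be sketched in a few lines of Python. This is a minimal state-based version with tombstones, written for this review; the class and method names are illustrative and are not taken from the paper.

```python
import uuid
from typing import Hashable, Set, Tuple


class ORSet:
    """Textbook observed-remove set (state-based, with tombstones)."""

    def __init__(self) -> None:
        self.observed: Set[Tuple[Hashable, str]] = set()  # (element, unique tag)
        self.removed: Set[Tuple[Hashable, str]] = set()   # tombstoned (element, tag)

    def add(self, e: Hashable) -> None:
        self.observed.add((e, uuid.uuid4().hex))  # fresh unique tag per add

    def remove(self, e: Hashable) -> None:
        # Tombstone *all* tags of e observed locally, not just one.
        self.removed |= {(x, t) for (x, t) in self.observed if x == e}

    def contains(self, e: Hashable) -> bool:
        # e is present iff some observed tag of e has not been tombstoned.
        return any(x == e for (x, _) in self.observed - self.removed)

    def merge(self, other: "ORSet") -> None:
        # Component-wise union: commutative, associative, and idempotent.
        self.observed |= other.observed
        self.removed |= other.removed
```

Two properties the ORMap would need to match or explicitly replace are visible here: concurrent adds of the same element keep both tags after merge, and a remove tombstones every tag observed at the removing replica, so an add concurrent with that remove still wins.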

I tried to understand the ORMap better by reading its pseudo-code implementation (Fig. 1), but this is a difficult task as well. It uses a "JOIN" function in lines 42 and 46 that is never defined or explained. Possibly this is meant to be a recursive call to the MERGE function, but I am not sure about this. Maybe some other magic happens in that undefined function that addresses some of the issues I raise here? It is hard to tell, as the paper lacks that information.

An important case to be handled by CRDTs is resolving conflicts. Let us consider a very simple case: two nodes concurrently add items with the same key A and different values (1, 2), and different tags. So node 1 has (tag1, A, 1) in its Observed set, and the remote node has (tag2, A, 2) and sends a state update. From what I understand of Fig. 1, if a key already exists at node 1 (e.g., line 34, line 40), the local Observed set is never changed (the only place this happens is line 47, which is reached only for a key that does not exist locally), so based on the code in Fig. 1, tag2 will never make it into node 1's Observed set. In the other direction, by the same argument, the remote node will never add tag1 to its Observed set. While the values (1, 2) might be merged by the leaf-level LWWRegister based on their timestamps, the tags in the Observed sets will remain inconsistent after the merge. If the entry is deleted (by tag ID) on one side, it will not be deleted on the other side, resulting in an observable difference.
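The scenario above can be replayed mechanically. The following Python sketch is a hypothetical model of the merge as this review reads Fig. 1 (a remote tag is adopted only for a key not yet known locally); it is not the authors' code. `Node`, `merge_as_read`, `merge_union`, `remove_all`, and `scenario` are illustrative names, and the union-based merge is shown only as one possible fix.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass
class Node:
    observed: Set[Tuple[str, str]] = field(default_factory=set)       # (tag, key)
    removed: Set[str] = field(default_factory=set)                    # tombstoned tags
    values: Dict[str, Tuple[int, int]] = field(default_factory=dict)  # key -> (timestamp, value)

    def keys(self) -> Set[str]:
        """Keys visible to the application: at least one live tag."""
        return {k for (t, k) in self.observed if t not in self.removed}


def lww(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    return max(a, b)  # higher timestamp wins; ties broken by value


def merge_as_read(local: Node, remote: Node) -> None:
    """Merge as this review reads Fig. 1: a remote tag is adopted only for an unknown key."""
    known = {k for (_, k) in local.observed}
    for tag, key in remote.observed:
        if key in known:
            local.values[key] = lww(local.values[key], remote.values[key])
        else:
            local.observed.add((tag, key))
            local.values[key] = remote.values[key]
    local.removed |= remote.removed


def merge_union(local: Node, remote: Node) -> None:
    """Possible fix: union the tag sets and resolve only the value with LWW."""
    for key, v in remote.values.items():
        local.values[key] = lww(local.values[key], v) if key in local.values else v
    local.observed |= remote.observed
    local.removed |= remote.removed


def remove_all(node: Node, key: str) -> None:
    """OR-Set style remove: tombstone every tag of `key` observed locally."""
    node.removed |= {t for (t, k) in node.observed if k == key}


def exchange(n1: Node, n2: Node, merge: Callable[[Node, Node], None]) -> None:
    """One round of full-state exchange in both directions."""
    s1, s2 = copy.deepcopy(n1), copy.deepcopy(n2)
    merge(n1, s2)
    merge(n2, s1)


def scenario(merge: Callable[[Node, Node], None]) -> Tuple[Node, Node]:
    n1 = Node({("tag1", "A")}, set(), {"A": (1, 1)})  # node 1 adds A=1
    n2 = Node({("tag2", "A")}, set(), {"A": (2, 2)})  # node 2 concurrently adds A=2
    exchange(n1, n2, merge)
    remove_all(n1, "A")  # node 1 then deletes A
    exchange(n1, n2, merge)
    return n1, n2
```

Under `merge_as_read`, node 1 ends with no visible keys while node 2 still shows `A`, because node 1 only ever knew `tag1`. Under `merge_union`, both nodes converge to the same tag sets and both hide `A`.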

Some additional minor notes on Section 3:

The embedded Merkle tree requires calculating hashes over sets. These hashes need to be precisely defined in a deterministic way (a set, on its own, has no order, so simply hashing the concatenation of its arbitrarily ordered contents is not deterministic). A deterministic hash function is a prerequisite; it is of course easy to implement, but I consider it important to specify it explicitly in the paper.
One possibly critical assumption of the authors is that there are no faulty clients. Not only are malicious clients an unsolved problem, but so are accidental faults: for example, a client whose local clock is 10 years in the future could produce a state that cannot be modified for years. This seems to be a real practical problem, and a discussion of whether such a problem can reasonably be mitigated would be a valuable contribution.
Sec. 3.2 explains that "when a value is removed, *it* is added to the Removed set", which confused me (what is added? The value of the key/value pair? Or rather the key? Actually, it is a random ID associated with the key/value pair). This becomes clearer much later when looking at the code (Fig. 1), but I suggest being less ambiguous right at the beginning of this section.
Related to previous comments, the SET function (in Fig. 1) also does not consider the Removed set. If a key's tag (ID) is in the Removed set, it is not correct to just update the value. Instead, a new entry with a new tag needs to be generated in the Observed set (which is also what the authors explain at the beginning of Section 3.2).
p5 s/needs to be send/needs to be sent/
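As an illustration of the deterministic set hash requested above, here is a minimal Python sketch, assuming set entries are (tag, key) string pairs and SHA-256 as the hash; `set_hash` is a hypothetical name, and the paper may of course choose a different canonical encoding.

```python
import hashlib
import json
from typing import Iterable, Tuple


def set_hash(entries: Iterable[Tuple[str, str]]) -> str:
    """Order-independent hash of a set of (tag, key) entries.

    Each entry is serialized canonically, the serializations are sorted,
    and the sorted sequence is hashed, so iteration order never matters.
    """
    encoded = sorted(json.dumps(list(e), separators=(",", ":")) for e in entries)
    h = hashlib.sha256()
    for item in encoded:
        h.update(item.encode("utf-8"))
        h.update(b"\n")  # JSON output never contains a raw newline, so this delimiter is unambiguous
    return h.hexdigest()
```

Sorting canonical serializations is only one option; an alternative is combining per-element hashes with a commutative operation, but that needs more care to avoid trivially colliding sets.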

Section 4 (synchronization protocol):

While the description appears to be valid, the protocol is presented in a very informal way only. A more rigorous specification should be provided.
I don't agree with the statement "the number of messages, and thus the length of the synchronization protocol, is therefore limited to the maximum depth of the Merkle-tree". This is obviously true if you consider a change to a single leaf in the tree, as you need to descend a single path in the tree from root to leaf. But as phrased, it reads as a statement about an arbitrary merge. If you have to merge two significantly modified trees, I assume you will have many more messages. For example, in Fig. 5, if in drawing1 all objects object1..object100 have changed, a push message needs to be sent for each of these 100 objects in step 3, and each of those messages can trigger a PUSH or GET message from client to server in step 4.
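A rough cost model makes the point concrete. The Python sketch below assumes one message per tree node whose hash differs (equal subtrees are pruned); this is a simplification of the protocol in Section 4, and `node_hash` and `sync_messages` are hypothetical names. A hash of the canonical JSON of each subtree stands in for a Merkle hash, since both are equal exactly when the subtree contents are equal (barring collisions).

```python
import hashlib
import json
from typing import Any


def node_hash(value: Any) -> str:
    """Stand-in for a Merkle hash: equal iff the subtree contents are equal."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode("utf-8")).hexdigest()


def sync_messages(local: Any, remote: Any) -> int:
    """Rough message count for a top-down, hash-guided comparison.

    Assumes one message per node whose hash differs; subtrees with equal
    hashes are pruned and cost nothing.
    """
    if node_hash(local) == node_hash(remote):
        return 0
    if not (isinstance(local, dict) and isinstance(remote, dict)):
        return 1  # differing leaf: ship the value
    count = 1  # exchange this node's child hashes
    for key in set(local) | set(remote):
        if key in local and key in remote:
            count += sync_messages(local[key], remote[key])
        else:
            count += 1  # key present on one side only: ship the subtree
    return count
```

With `drawing1` holding `object1`..`object100`, each with a single leaf, changing one object costs 4 messages under this model (root, `drawing1`, the object, its leaf), whereas changing all 100 objects costs 202. The count therefore grows with the number of changed subtrees, not only with the depth of the tree.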

Section 5 (evaluation):
The evaluation is, overall, well explained and convincing. Fig. 6, however, is somewhat unfortunate because most boxes in the box plot are so small that their color is not visible.

References:
Reference [35] is incomplete (lacks proceedings where it was published)
Reference [51] is incomplete (also lacks information where it was published)

Additional Questions:

Please explain how this manuscript advances this field of research and/or contributes something new to the literature.: The paper proposes an optimized form of state-based CRDTs for implementing seamless synchronization of distributed web clients. Its main contributions are efficient state synchronization based on Merkle trees and performance optimization for the communication between server and client. The proposed solution enables not only scalable low-latency online updates, but also efficient operations in disconnected scenarios.
Is the manuscript technically sound? Please explain your answer under Public Comments below.: Partially

Which category describes this manuscript?: Research/Technology
How relevant is this manuscript to the readers of this periodical? Please explain your rating under Public Comments below.: Relevant

Are the title, abstract, and keywords appropriate? Please explain under Public Comments below.: Yes
Does the manuscript contain sufficient and appropriate references? Please explain under Public Comments below.: References are sufficient and appropriate
Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under Public Comments below.: Yes
How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain under Public Comments below.: Satisfactory
Please rate the readability of the manuscript. Explain your rating under Public Comments below.: Readable - but requires some effort to understand
Should the supplemental material be included? (Click on the Supplementary Files icon to view files): Does not apply, no supplementary files included
If yes to 6, should it be accepted: After revisions. Please include explanation under Public Comments below.
Would you recommend adding the code/data associated with this paper to help address your concerns and/or strengthen the paper?: No

Please rate the manuscript. Please explain your choice.: Fair

Reviewer 3

Recommendation

Author Should Prepare A Minor Revision

Comments

Overall, the paper is well written and does a good job of comparing with other similar systems available as JS-based middleware. The authors try to show the merits and limitations of the chosen approach. One interesting contribution is showing the trade-offs between state-based and operation-based designs when considering the likelihood of partitions, confirming that operation-based approaches are lighter when communication is available but less efficient when recovering from partitions.

My main issue with the proposal is the use of a set of tombstones for the removed IDs, since this set will likely grow linearly in workloads where deletions are common. Notice that in state-based delta CRDTs it is possible to track causality compactly for a recursive map and avoid tombstones. This choice in the present paper might be compensated for, since remove-wins semantics is also chosen, and deleting sub-trees could possibly get rid of inner tombstones. In any case, it would be valuable for this paper to discuss this issue in more detail and possibly add a workload that is heavier in deletions and tombstone creation.
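To illustrate the compact alternative mentioned here, the following is a minimal Python sketch, assuming full-state merges between replicas, of an observed-remove set that replaces the tombstone set with a version vector, in the style of the "optimized" OR-Set. `CompactORSet` and its methods are hypothetical names. Note that this sketch is add-wins, whereas the paper uses remove-wins semantics, and extending it to the paper's recursive map is not shown.

```python
from typing import Dict, Hashable, Set, Tuple

Dot = Tuple[str, int]  # (replica id, per-replica counter)


class CompactORSet:
    """Add-wins OR-Set without tombstones; causality is summarized by a version vector."""

    def __init__(self, replica: str) -> None:
        self.replica = replica
        self.entries: Dict[Hashable, Set[Dot]] = {}  # element -> live dots
        self.vv: Dict[str, int] = {}                 # summary of every dot ever seen

    def add(self, e: Hashable) -> None:
        n = self.vv.get(self.replica, 0) + 1
        self.vv[self.replica] = n
        self.entries[e] = {(self.replica, n)}  # new dot supersedes locally observed ones

    def remove(self, e: Hashable) -> None:
        self.entries.pop(e, None)  # no tombstone: the removed dots stay covered by vv

    def contains(self, e: Hashable) -> bool:
        return e in self.entries

    def _seen(self, dot: Dot) -> bool:
        return dot[1] <= self.vv.get(dot[0], 0)

    def merge(self, other: "CompactORSet") -> None:
        merged: Dict[Hashable, Set[Dot]] = {}
        for e in set(self.entries) | set(other.entries):
            mine = self.entries.get(e, set())
            theirs = other.entries.get(e, set())
            # Keep a dot if both sides have it, or if the side lacking it has never seen it
            # (if that side has seen it, its absence means it was removed there).
            keep = ((mine & theirs)
                    | {d for d in mine - theirs if not other._seen(d)}
                    | {d for d in theirs - mine if not self._seen(d)})
            if keep:
                merged[e] = keep
        self.entries = merged
        for r, n in other.vv.items():
            self.vv[r] = max(self.vv.get(r, 0), n)
```

Here the metadata grows with the number of replicas rather than with the number of deletions: after any number of add/remove pairs, nothing is left behind except the version vector.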

Some details:

1. Introduction.

"replicas eventual consistent" -> "replicas eventually consistent"

It is important to clarify the different approaches to delta CRDTs in [17,24] and [18] since, although they share some terminology, they differ. As far as I can tell, Legion uses [18], and the recursive map with common version vectors is in [24].

3.1. LWW

Last-writer-wins registers are not always consistent with causality when clocks get out of sync. A better design is probably a multi-value register (also observed-remove), using the clock timestamp only to choose which value to present among the possibly concurrent values. This is not an issue that requires a change, but it is important to signal the drawback.
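The drawback can be shown with a small Python sketch contrasting a wall-clock LWW register with a version-vector multi-value register. Both classes are simplified illustrations written for this review (state-based, full-state merge), not OWebSync code, and the timestamps in the usage below are made up.

```python
from typing import Dict, List, Optional, Tuple

VV = Dict[str, int]  # version vector: replica id -> counter


class LWWRegister:
    """Last-writer-wins register ordered by (possibly skewed) wall-clock time."""

    def __init__(self) -> None:
        self.ts: float = float("-inf")
        self.value: Optional[str] = None

    def write(self, value: str, wall_clock: float) -> None:
        self.ts, self.value = wall_clock, value

    def merge(self, other: "LWWRegister") -> None:
        if other.ts > self.ts:  # larger timestamp wins, regardless of causality
            self.ts, self.value = other.ts, other.value


def dominates(a: VV, b: VV) -> bool:
    """True if a has seen every event in b (a >= b pointwise)."""
    return all(a.get(r, 0) >= n for r, n in b.items())


class MVRegister:
    """Multi-value register: keeps all causally concurrent writes."""

    def __init__(self, replica: str) -> None:
        self.replica = replica
        self.entries: List[Tuple[VV, str]] = []

    def write(self, value: str) -> None:
        vv: VV = {}
        for seen, _ in self.entries:  # the new write causally follows everything observed
            for r, n in seen.items():
                vv[r] = max(vv.get(r, 0), n)
        vv[self.replica] = vv.get(self.replica, 0) + 1
        self.entries = [(vv, value)]

    def merge(self, other: "MVRegister") -> None:
        pool = self.entries + [e for e in other.entries if e not in self.entries]
        # Drop every entry strictly dominated by another one.
        self.entries = [(v, x) for (v, x) in pool
                        if not any(dominates(w, v) and w != v for (w, _) in pool)]

    def values(self) -> List[str]:
        return sorted(x for _, x in self.entries)
```

If a replica whose clock runs about 10 years ahead writes `x`, a causally later write `y` from a correct replica is silently discarded by the LWW register after merging, while the multi-value register keeps `y`; truly concurrent writes survive side by side, and a timestamp could then be used only to pick which one to display.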

3.2. OR-Map

I found the use of path[0] and path[1..] hard to follow; maybe clarify the notation or add a small concrete example when explaining the implementation.
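As an example of the kind of small concrete illustration suggested, the following Python sketch shows recursive set/get on nested maps, where path[0] is the key handled at the current level and path[1..] (written `path[1:]` in Python) is forwarded to the child. `set_path` and `get_path` are hypothetical names, and the sketch deliberately ignores the CRDT metadata (tags, tombstones, hashes).

```python
from typing import Any, Dict, List


def set_path(node: Dict[str, Any], path: List[str], value: Any) -> None:
    """Recursively set a value: path[0] selects the child, path[1:] is the remainder."""
    head, rest = path[0], path[1:]
    if not rest:
        node[head] = value  # last segment: write the leaf (e.g., an LWW register)
        return
    child = node.setdefault(head, {})  # create the nested map on demand
    set_path(child, rest, value)


def get_path(node: Any, path: List[str]) -> Any:
    """Recursively read a value by consuming one path segment per level."""
    if not path:
        return node
    return get_path(node[path[0]], path[1:])
```

For instance, `set_path(doc, ["drawing1", "object1", "color"], "red")` on an empty `doc` handles `"drawing1"` at the root, then `"object1"` one level down, and finally writes `"color"` at the leaf level.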

Misc:

You might want to also check https://github.com/peer-base/js-delta-crdts, from the IPFS team, that seems to contain a recursive map implementation in JS.

Additional Questions:

Please explain how this manuscript advances this field of research and/or contributes something new to the literature.: The paper presents a state-based CRDT design and web-based middleware in JS for a recursive observed-remove map whose leaves are last-writer-wins registers. The recursive construction resorts to pointers to a key-value store and uses Merkle trees to efficiently detect unchanged sub-portions of the structure and avoid synchronizing them.
While the design uses standard CRDT techniques, such as sets of tombstones instead of version vectors common to the whole tree, it does a good job of motivating with real use cases and doing an extensive comparison with competing designs of web based middlewares.
Is the manuscript technically sound? Please explain your answer under Public Comments below.: Yes

Which category describes this manuscript?: Practice / Application / Case Study / Experience Report
How relevant is this manuscript to the readers of this periodical? Please explain your rating under Public Comments below.: Relevant

Are the title, abstract, and keywords appropriate? Please explain under Public Comments below.: Yes
Does the manuscript contain sufficient and appropriate references? Please explain under Public Comments below.: References are sufficient and appropriate
Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under Public Comments below.: Yes
How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain under Public Comments below.: Satisfactory
Please rate the readability of the manuscript. Explain your rating under Public Comments below.: Readable - but requires some effort to understand
Should the supplemental material be included? (Click on the Supplementary Files icon to view files): Does not apply, no supplementary files included
If yes to 6, should it be accepted:
Would you recommend adding the code/data associated with this paper to help address your concerns and/or strengthen the paper?: No

Please rate the manuscript. Please explain your choice.: Good

TPDS 2020-02-0106 Reviews and Comments

TPDS-2020-02-0106 OWebSync: Seamless Synchronization of Distributed Web Clients
