EuroSys '20

Review A

Overall merit

2. Weak reject

Reviewer expertise

2. Some familiarity

Paper summary

The paper presents the OWebSync middleware for data synchronization in web apps. It is based on combining state-based CRDTs with merkle-trees.

Strengths

Relevant topic
Extensive performance evaluation

Weaknesses

Significance of the contribution unclear
Unconvincing use cases
Questions arise with the performance evaluation

Comments for author

The paper is well written although the provided content could be much shortened, while some important aspects are missing like the detail of the OWebSync algorithm / protocol.

Overall, the contribution is: (1) incremental and (2) not generalizable:

(1) The main feature of OWebSync is that it combines state-based CRDTs with Merkle-trees. But this contribution has already been published at SRDS 2019: Alex Auvolat and François Taiani. Merkle Search Trees: Efficient State-Based CRDTs in Open Networks (reference cited in the paper)
(2) The paper draws general conclusions such as "operation-based approaches cannot achieve acceptable performance in case of network disruptions" and "current state-based approaches using delta-state-based CRDTs are faster to recover than the operation-based ones" but the study focusses on JSON data types and on the available implementations of JSON synchronisation based on OT (ShareDB), operation-based CRDTs (Yjs, Automerge), delta-state CRDTs (Legion) without referring to the main theoretical approaches behind these implementations. For instance, there is no research paper referenced explaining the ShareDB approach as the OT literature is very large and it would be interesting to know which kind of OT algorithm they implemented (e.g., are operation transformations done only on the client sides, only on the server, or both on the server and the clients as performance differs significantly in these cases?). The paper would need to state that the results hold only for a particular type of data and for some targeted implementations.

One of the merits of the paper is the performance evaluation with the setup for the three types of benchmarks: for online and offline modes of collaboration and for measuring the size of the data over a longer time. However, we have no detail about how the compared tools were plugged into the middleware. Comparing different implementations of several tools (some of them include optimizations, some other not) is not equivalent to comparing the synchronization algorithms.

The results obtained are somehow predictable for the data set used for benchmarking. For a document containing 100 and 1000 objects where the number of 1440 updates exceed the size of the document, it is expected that a state-based synchronization where the whole states of the documents are synchronized is more efficient than an operation-based synchronization where synchronization is based on operations. However, for larger documents with fewer updates, the results might not have been the same.

The above further questions the significance of the contribution as the paper clearly states that they do not deal with document that is the most challenging case for interactive web apps. Indeed, the paper assumes that 5s delay is the accepted limit for users both for real-time collaboration and disconnected scenarios. However, the given reference in the paper refers to website response times which is not the same as for editing shared documents. Moreover, users editing documents in real-time expect a higher interactivity than when synchronizing documents after a disconnection (such as by using git) that tolerate higher delays. In the user study by Ignat et al, ECSCW 2015, "How Do User Groups Cope with Delay in Real-Time Collaborative Note Taking", it has been shown that delays less than 5s negatively influence performances in real-time collaborative editing.

Overall, the relevance of the work raises questions, starting with the use cases that are mentioned as motivating scenarios, which have been around for quite some time (at least some variants of those) and for which the added benefit of OWebSync over the SOTA is unclear.

Review B

Overall merit

3. Weak accept

Reviewer expertise

2. Some familiarity

Paper summary

OWebSync: A Web Middleware with State-Based CRDTs for Interactive Groupware Supporting Seamless Synchronization of Distributed Web Clients anonymous.

OWebSync is a novel middleware to implement web-based applications that support offline operations. It relies on a client-server architecture, similarly to Google Docs. An OWebSync application manipulates json objects backed with convergent replicated data types (CvRDT). The key novelty of OWebSync is the use of Merkle trees to maintain and update the shared state.

Strengths

Novel approach
Well written paper

Weaknesses

The proposed merge process can be expensive and no complexity measure is provided.

Comments for author

OWebSync: A Web Middleware with State-Based CRDTs for Interactive Groupware Supporting Seamless Synchronization of Distributed Web Clients anonymous.

The class of applications that would benefit from the OWebSync middleware is large and common today. As OWebSync relies on a central server and CvRDTs, it does not suffer from the metadata explosion problem. Moreover, using Merkle trees reduces the cost of synchronization. The evaluation shows a clear performance improvement over prior works in groupware scenarios with (and without) disconnected operations.

Among the limitations of the approach is the fact that there is a central server which may harm availability (no client-to-client synchronization is possible). The complexity of the merge process (recursive w. messages exchanged back and forth) can be expensive and no complexity measure is provided. Synchronization is sequential at the server and can potentially bottleneck with a lot of clients (e.g., 100s).

Overall, the paper is well-written and the proposed approach promising with respect to the state of the art. I recommend it for acceptance.

Further remarks

p4r. When the child is another ORMap, the hash of it is the combined hash of the hashes of all the children of that ORMap. this should be described in a separate paragraph about Merkle trees.

p4 This data structure supports [..] ORMaps support [..]

p4 when the child again, this is confusing mixing up ORMap and Merkle trees

p4 in a key-value store is listed, which kvs is it question here?

p5 This can be enforced by automatically logging out the user after a month of no usage. at this point, the reader does not know the underlying architecture

p5 in the same path not defined

Fig 3. change the glyph for the server

p7 When the number of child values [..] heavy

p7 Other countries are pushing latencies down to 30 ms [53]. it seems not to matter in this context

p8 The third and last benchmark the workload is not described

p8 succeeded updates not defined

p8 which are currently only used for background synchronization between servers need a citation here

p8 This can be seen in Figure 5 give further details

references 40 and 41 are identical

Review C

Overall merit

3. Weak accept

Reviewer expertise

4. Expert

Paper summary

The paper focuses on the most efficient method for "synchronising" multiple replicas of the same data item. It proposes a state-based algorithm that transmits only the modified parts of the data item, using a Merkle tree patterned on the logical tree structure of the data item. The paper includes a careful experimental comparison against four other approaches. According to the experiments, the OWebSync approach is less efficient in a well-connected environment, but outperforms the others when connectivity is intermittent.

Strengths

Important and timely problem.
Proposes and evaluates a novel approach.
Interesting comparison of different technologies.
Careful experimental evaluation, including both unfavourable and favourable conditions to the authors' proposal.

Weaknesses

Unclear problem statement and focus.
Narrow data model (single large tree-structured object).

Comments for author

Thank you for submitting to EuroSys. This paper is timely and addresses an important topic for builders of actual distributed systems.

High-level comments

The title, abstract and introduction do not clearly state the problem addressed by the paper. The introduction mixes in too much of a state-of-the-art section. The solution is summarised clearly, but this is a bit lost in the excessive verbosity and unclear problem statement.

The term "synchronization" is not defined!!! Your usage of the term is unusual in the systems context.

The focus is on a single large object with a more-or-less-balanced tree structure. Please justify why you think this is the common or interesting case. Your paper deliberately ignores poorly-balanced trees, flat objects such as text, and completely overlooks other interesting structures, such as a table or a graph. You also ignore applications that use multiple objects, and the issue of multi-object consistency. All this needs to be stated explicitly and justified. A better paper would expand the proposal and experiments to include different object structures and multiple objects.

Your argument on object vs Merkle Tree structure is a bit confusing. In Section 3 you pattern the Merkle tree structure on the object's logical tree structure. Yet in Section 7 you show that this is not such a great idea and you introduce so-called Virtual Merkle Trees. Why not just divorce the MT structure from the internal object structure from the start? Why don't you compare to this approach in the experiments?

Section 5, "Performance evaluation" is pretty comprehensive, within the narrow assumptions of the data model (single large tree-structured object). It candidly includes both unfavourable and favourable cases for the author's design.

In the experimental results, I am very surprised at the high absolute values for synchronisation, with all the technologies! Even at the stated 60ms network latency, 1s = 8 network round trip times, instead of the expected 1 to 2 RTTs. Can you explain such high numbers? Providing a detailed breakdown and analysis of where the time goes would make a very useful paper.

Related work: missing a comparison with References 4 and 10, which are closely related.

Detailed comments

Notation: 1L = page 1, left column.

2L: "vector clocks... require one entry per client per object": this is a common misconception, but a misconception nonetheless.

2R Case studies: some important information is missing. How many clients? what is the level of active concurrency? What is the probability of conflict? What is the cost of a conflict, i.e., do we even care when a conflict occurs? Are these real use cases with actual users? are they actually implemented? or are these just thought experiments? The rest of the paper does not give confidence that they are real.

2L Try-out demo: a nice touch, thanks. You should clarify what exactly is being demonstrated (is it one of the apps listed in Section 2.1?) and what lessons can be learned. I noticed that although the 2D placement seems well replicated, object depth is not.

2R operative --> operational

3L "coordination" not defined. Same as synchronization? (also not defined).

3L Section 2.2 contains interesting background information, but its purpose is not clear. Please tell the reader where you are going.

3R Description of Merkle Trees is too vague to be useful. Be precise.

Section 3 section confusingly mixes the semantics and implementation levels.

4L+4R Not clear. If I didn't know LWW and ORmap beforehand, I would not be able to read this.

4R The semantics of "remove" are not clear, especially since there is no "create". Doesn't "remove" actually mean "set to null"?

4R "Merging the ORmap with another one". Why would one merge two independent ORmaps? You probably mean "merging this replica with the value received from another replica of the same ORmap."

4r "does not require constraints between the data": what is this about? What does it mean? Is it relevant?

Figure 1: not clear what this represents, and what is the relation between 1a and 1b. Without an explanation, it is more confusing than helpful.

5L "Another type of conflict occurs when assigning different types of CRDTs to the same path": I have no idea what this means.

5L "remove all tombstones after a month": this is unsafe, as you yourself explain w.r.t. YJS on page 11L.

5L "CvRDTs for ordered lists... could be added in future work": how is this different from a text sequence, which you said in 4R that doesn't fit well into your framework?

6L Explain the meaning of the code snippet.

6L The description of the synchronization primitives is too vague. Who calls them, and when? Formalise the algorithm of the primitives.

Figure 3 lacks context. What exactly was updated?

Footnote 3: I very much appreciate that you make your raw measurements available.

8R "OWebSync performs within the guidelines of 1--2 s for interactive performance". 9R "well below 5 seconds in the online scenario and can be called interactive" These statements are bizarre. The standard measure for "interactive performance" is 50-100ms response time, not 1s. But this is irrelevant, because you are measuring latency (time to propagate an update through the network) not response time. As long as the local app responds immediately, why would users care about latency?

Figures 4--6: the caption should clarify which case is plotted (online vs disconnected). Figure 7: not clear how to read (and not visible when printed in black and white).

Review D

Overall merit

3. Weak accept

Reviewer expertise

2. Some familiarity

Paper summary

This paper presents OWebSync, a system that extends web applications with CRDT-based storage.

Strengths

seems like a relevant problem

not a crowded space - don't know of any systems that tackle it in a principled way

an interesting new twist on an important technology

Weaknesses

writing could be improved - doesn't help convincing the reader of the motivation

only applied to home-brewed applications

Comments for author

Overall this seems like a very relevant piece of research with some interesting new ideas. I particularly liked the fact that not only the problem statement seems relevant, but also it does not seems like a crowded space, since I don't know of any systems that tackle this problem in a principled way. Furthermore, the technical solution includes an interesting new twist on a very relevant technology (CRDTs), which seems to work very effectively in this realistic scenario. That said, the writing needs to be improved (possibly through some careful shepherding).

Regarding the motivation, the initial read was a difficult one because several problems: first, the excessive use of terms whose meaning was unknown to me. E.g., upfront the paper says that the goal is to develop a "web-based middleware for interactive groupware" - perhaps it's my lack of background, but I remained clueless regarding what was the scope of the paper after reading this. Second, the abstract, intro, and background sections fail to make a compelling case for the need for such a system, in large part because the background only focuses on the low-level technologies for CRDTs, and not on the high level systems that are used nowadays to tackle the overall problem of developing collaborative web applications using interactive web pages. As a reader, I would be much more comfortable with the motivation if the introduction would be able to argue that the technologies that are available to tackle this problem are X, Y, and Z, with pointers to papers or systems that provided evidence that this is the way that such applications are being developed today, and then explained in a more careful way why are X, Y, and Z insufficient to successfully address the problem. In particular, it is important to (1) add citations throughout the introduction to point to examples and substantiate claims, and (2) rethink section 2 as pointed out before.

This leads to a related problem, which is the choice of applications. Even though they seem relevant because they are developed in conjunction with industry, they are nonetheless built from scratch. As such, the evaluation in its current state doesn't help understand how seamless is the path for adopting this system. It would be more interesting if the authors could take an existing application, adapt it to use OWebSync, and then report on how difficult it was to adapt it, namely how many lines of code were changed and how much effort was put (e.g., person months).

Review F

Overall merit

2. Weak reject

Reviewer expertise

2. Some familiarity

Paper summary

This paper presents OWebSync, a web-based middleware for interactive groupware with fast resynchronization of offline clients and continuous, interactive synchronization for online clients. OWebSync implements a fine-grained data synchronization model, and leverages state-based Conflict-free Replicated Data Types to automatically resolve conflicts. Additional techniques are employed to achieve the interactive performance, including Merkle-trees, virtual Merkel-tree levels, and message batching. Evaluation shows OWebSync is especially better in operating in and recovering from offline settings and network disruptions.

Strengths

OWebSync looks like a useful system that can support interactive applications.

OWebSync appears to outperform existing systems in offline settings and recovering from network disruptions and scales better over time.

Weaknesses

The work seems incremental: OWebSync mainly combines CRDTs with Merkel trees and some other optimization tricks.

The evaluation setup is very controlled and simplistic.

The evaluation is basically done in the context of one particular scenario (the eDesigners case study).

Comments for author

I guess OWebSync is a useful system to achieve consistent views fast, and it does seem to perform well in certain settings. But other than that, the work seems to be mostly engineering right now. The main technical aspects involve adding Merkle tree, etc. to CRDTs, so overall seem quite incremental.

The evaluation is based on one scenario only. Is eDesigners the most challenging? What about others?

The evaluation setup involves only VMs and emulated network conditions between them. This seems rather simplistic. If you expect the eDesigner app to be run over 4G, or even on a plane, you should have a few testbed machines connected over wireless (4G or WiFi). The conditions of these networks tend to fluctuate a lot, at different times of the day too. Further, if you have a mixture of users connected over different networks (wired or wireless), you might also see more interesting synchronization behavior.

Some breakdown results will be helpful. For example, can you illustrate how Merkle tree, etc. individually helped?

EuroSys '20 Paper #16 Reviews and Comments

Paper #16 OWebSync: A Web Middleware with State-Based CRDTs for Interactive Groupware Supporting Seamless Synchronization of Distributed Web Clients

Review A

Overall merit

Reviewer expertise

Paper summary

Strengths

Weaknesses

Comments for author

Review B

Overall merit

Reviewer expertise

Paper summary

Strengths

Weaknesses

Comments for author

Review C

Overall merit

Reviewer expertise

Paper summary

Strengths

Weaknesses

Comments for author

Review D

Overall merit

Reviewer expertise

Paper summary

Strengths

Weaknesses

Comments for author

Review E

Overall merit

Reviewer expertise

Paper summary

Strengths

Weaknesses

Comments for author

Review F

Overall merit

Reviewer expertise

Paper summary

Strengths

Weaknesses

Comments for author

Comment @A1 by Reviewer A