Submitted: 4 November 2019
Rejected: 14 February 2020
Review A: 2. Weak reject
Review B: 3. Weak accept
Review C: 3. Weak accept
Review D: 3. Weak accept
Review E: 2. Weak reject
Review F: 2. Weak reject
2. Weak reject
2. Some familiarity
The paper presents the OWebSync middleware for data synchronization in web apps. It is based on combining state-based CRDTs with merkle-trees.
The paper is well written although the provided content could be much shortened, while some important aspects are missing like the detail of the OWebSync algorithm / protocol.
Overall, the contribution is: (1) incremental and (2) not generalizable:
One of the merits of the paper is the performance evaluation with the setup for the three types of benchmarks: for online and offline modes of collaboration and for measuring the size of the data over a longer time. However, we have no detail about how the compared tools were plugged into the middleware. Comparing different implementations of several tools (some of them include optimizations, some other not) is not equivalent to comparing the synchronization algorithms.
The results obtained are somehow predictable for the data set used for benchmarking. For a document containing 100 and 1000 objects where the number of 1440 updates exceed the size of the document, it is expected that a state-based synchronization where the whole states of the documents are synchronized is more efficient than an operation-based synchronization where synchronization is based on operations. However, for larger documents with fewer updates, the results might not have been the same.
The above further questions the significance of the contribution as the paper clearly states that they do not deal with document that is the most challenging case for interactive web apps. Indeed, the paper assumes that 5s delay is the accepted limit for users both for real-time collaboration and disconnected scenarios. However, the given reference in the paper refers to website response times which is not the same as for editing shared documents. Moreover, users editing documents in real-time expect a higher interactivity than when synchronizing documents after a disconnection (such as by using git) that tolerate higher delays. In the user study by Ignat et al, ECSCW 2015, "How Do User Groups Cope with Delay in Real-Time Collaborative Note Taking", it has been shown that delays less than 5s negatively influence performances in real-time collaborative editing.
Overall, the relevance of the work raises questions, starting with the use cases that are mentioned as motivating scenarios, which have been around for quite some time (at least some variants of those) and for which the added benefit of OWebSync over the SOTA is unclear.
3. Weak accept
2. Some familiarity
OWebSync: A Web Middleware with State-Based CRDTs for Interactive Groupware Supporting Seamless Synchronization of Distributed Web Clients anonymous.
OWebSync is a novel middleware to implement web-based applications that support offline operations. It relies on a client-server architecture, similarly to Google Docs. An OWebSync application manipulates json objects backed with convergent replicated data types (CvRDT). The key novelty of OWebSync is the use of Merkle trees to maintain and update the shared state.
OWebSync: A Web Middleware with State-Based CRDTs for Interactive Groupware Supporting Seamless Synchronization of Distributed Web Clients anonymous.
OWebSync is a novel middleware to implement web-based applications that support offline operations. It relies on a client-server architecture, similarly to Google Docs. An OWebSync application manipulates json objects backed with convergent replicated data types (CvRDT). The key novelty of OWebSync is the use of Merkle trees to maintain and update the shared state.
The class of applications that would benefit from the OWebSync middleware is large and common today. As OWebSync relies on a central server and CvRDTs, it does not suffer from the metadata explosion problem. Moreover, using Merkle trees reduces the cost of synchronization. The evaluation shows a clear performance improvement over prior works in groupware scenarios with (and without) disconnected operations.
Among the limitations of the approach is the fact that there is a central server which may harm availability (no client-to-client synchronization is possible). The complexity of the merge process (recursive w. messages exchanged back and forth) can be expensive and no complexity measure is provided. Synchronization is sequential at the server and can potentially bottleneck with a lot of clients (e.g., 100s).
Overall, the paper is well-written and the proposed approach promising with respect to the state of the art. I recommend it for acceptance.
Further remarks
p4r. When the child is another ORMap, the hash of it is the combined hash of the hashes of all the children of that ORMap. this should be described in a separate paragraph about Merkle trees.
p4 This data structure supports [..] ORMaps support [..]
p4 when the child again, this is confusing mixing up ORMap and Merkle trees
p4 in a key-value store is listed, which kvs is it question here?
p5 This can be enforced by automatically logging out the user after a month of no usage. at this point, the reader does not know the underlying architecture
p5 in the same path not defined
Fig 3. change the glyph for the server
p7 When the number of child values [..] heavy
p7 Other countries are pushing latencies down to 30 ms [53]. it seems not to matter in this context
p8 The third and last benchmark the workload is not described
p8 succeeded updates not defined
p8 which are currently only used for background synchronization between servers need a citation here
p8 This can be seen in Figure 5 give further details
references 40 and 41 are identical
3. Weak accept
4. Expert
The paper focuses on the most efficient method for "synchronising" multiple replicas of the same data item. It proposes a state-based algorithm that transmits only the modified parts of the data item, using a Merkle tree patterned on the logical tree structure of the data item. The paper includes a careful experimental comparison against four other approaches. According to the experiments, the OWebSync approach is less efficient in a well-connected environment, but outperforms the others when connectivity is intermittent.
Thank you for submitting to EuroSys. This paper is timely and addresses an important topic for builders of actual distributed systems.
High-level comments
The title, abstract and introduction do not clearly state the problem addressed by the paper. The introduction mixes in too much of a state-of-the-art section. The solution is summarised clearly, but this is a bit lost in the excessive verbosity and unclear problem statement.
The term "synchronization" is not defined!!! Your usage of the term is unusual in the systems context.
The focus is on a single large object with a more-or-less-balanced tree structure. Please justify why you think this is the common or interesting case. Your paper deliberately ignores poorly-balanced trees, flat objects such as text, and completely overlooks other interesting structures, such as a table or a graph. You also ignore applications that use multiple objects, and the issue of multi-object consistency. All this needs to be stated explicitly and justified. A better paper would expand the proposal and experiments to include different object structures and multiple objects.
Your argument on object vs Merkle Tree structure is a bit confusing. In Section 3 you pattern the Merkle tree structure on the object's logical tree structure. Yet in Section 7 you show that this is not such a great idea and you introduce so-called Virtual Merkle Trees. Why not just divorce the MT structure from the internal object structure from the start? Why don't you compare to this approach in the experiments?
Section 5, "Performance evaluation" is pretty comprehensive, within the narrow assumptions of the data model (single large tree-structured object). It candidly includes both unfavourable and favourable cases for the author's design.
In the experimental results, I am very surprised at the high absolute values for synchronisation, with all the technologies! Even at the stated 60ms network latency, 1s = 8 network round trip times, instead of the expected 1 to 2 RTTs. Can you explain such high numbers? Providing a detailed breakdown and analysis of where the time goes would make a very useful paper.
Related work: missing a comparison with References 4 and 10, which are closely related.
Detailed comments
Notation: 1L = page 1, left column.
2L: "vector clocks... require one entry per client per object": this is a common misconception, but a misconception nonetheless.
2R Case studies: some important information is missing. How many clients? what is the level of active concurrency? What is the probability of conflict? What is the cost of a conflict, i.e., do we even care when a conflict occurs? Are these real use cases with actual users? are they actually implemented? or are these just thought experiments? The rest of the paper does not give confidence that they are real.
2L Try-out demo: a nice touch, thanks. You should clarify what exactly is being demonstrated (is it one of the apps listed in Section 2.1?) and what lessons can be learned. I noticed that although the 2D placement seems well replicated, object depth is not.
2R operative --> operational
3L "coordination" not defined. Same as synchronization? (also not defined).
3L Section 2.2 contains interesting background information, but its purpose is not clear. Please tell the reader where you are going.
3R Description of Merkle Trees is too vague to be useful. Be precise.
Section 3 section confusingly mixes the semantics and implementation levels.
4L+4R Not clear. If I didn't know LWW and ORmap beforehand, I would not be able to read this.
4R The semantics of "remove" are not clear, especially since there is no "create". Doesn't "remove" actually mean "set to null"?
4R "Merging the ORmap with another one". Why would one merge two independent ORmaps? You probably mean "merging this replica with the value received from another replica of the same ORmap."
4r "does not require constraints between the data": what is this about? What does it mean? Is it relevant?
Figure 1: not clear what this represents, and what is the relation between 1a and 1b. Without an explanation, it is more confusing than helpful.
5L "Another type of conflict occurs when assigning different types of CRDTs to the same path": I have no idea what this means.
5L "remove all tombstones after a month": this is unsafe, as you yourself explain w.r.t. YJS on page 11L.
5L "CvRDTs for ordered lists... could be added in future work": how is this different from a text sequence, which you said in 4R that doesn't fit well into your framework?
6L Explain the meaning of the code snippet.
6L The description of the synchronization primitives is too vague. Who calls them, and when? Formalise the algorithm of the primitives.
Figure 3 lacks context. What exactly was updated?
Footnote 3: I very much appreciate that you make your raw measurements available.
8R "OWebSync performs within the guidelines of 1--2 s for interactive performance". 9R "well below 5 seconds in the online scenario and can be called interactive" These statements are bizarre. The standard measure for "interactive performance" is 50-100ms response time, not 1s. But this is irrelevant, because you are measuring latency (time to propagate an update through the network) not response time. As long as the local app responds immediately, why would users care about latency?
Figures 4--6: the caption should clarify which case is plotted (online vs disconnected). Figure 7: not clear how to read (and not visible when printed in black and white).
3. Weak accept
2. Some familiarity
This paper presents OWebSync, a system that extends web applications with CRDT-based storage.
seems like a relevant problem
not a crowded space - don't know of any systems that tackle it in a principled way
an interesting new twist on an important technology
writing could be improved - doesn't help convincing the reader of the motivation
only applied to home-brewed applications
Overall this seems like a very relevant piece of research with some interesting new ideas. I particularly liked the fact that not only the problem statement seems relevant, but also it does not seems like a crowded space, since I don't know of any systems that tackle this problem in a principled way. Furthermore, the technical solution includes an interesting new twist on a very relevant technology (CRDTs), which seems to work very effectively in this realistic scenario. That said, the writing needs to be improved (possibly through some careful shepherding).
Regarding the motivation, the initial read was a difficult one because several problems: first, the excessive use of terms whose meaning was unknown to me. E.g., upfront the paper says that the goal is to develop a "web-based middleware for interactive groupware" - perhaps it's my lack of background, but I remained clueless regarding what was the scope of the paper after reading this. Second, the abstract, intro, and background sections fail to make a compelling case for the need for such a system, in large part because the background only focuses on the low-level technologies for CRDTs, and not on the high level systems that are used nowadays to tackle the overall problem of developing collaborative web applications using interactive web pages. As a reader, I would be much more comfortable with the motivation if the introduction would be able to argue that the technologies that are available to tackle this problem are X, Y, and Z, with pointers to papers or systems that provided evidence that this is the way that such applications are being developed today, and then explained in a more careful way why are X, Y, and Z insufficient to successfully address the problem. In particular, it is important to (1) add citations throughout the introduction to point to examples and substantiate claims, and (2) rethink section 2 as pointed out before.
This leads to a related problem, which is the choice of applications. Even though they seem relevant because they are developed in conjunction with industry, they are nonetheless built from scratch. As such, the evaluation in its current state doesn't help understand how seamless is the path for adopting this system. It would be more interesting if the authors could take an existing application, adapt it to use OWebSync, and then report on how difficult it was to adapt it, namely how many lines of code were changed and how much effort was put (e.g., person months).
2. Weak reject
4. Expert
OWebSync is a conflict-free replicated data type middleware paper which describes and evaluates the authors' system for explouting application data structure to optimise scalability over operation-based or state-delta based approaches to CRDT.
quite clearly written; a current topic of some interest
limited scale evaluation; no human factors either (i.e. real use).
I like this paper and I like the topic, but I would really like to see a real user trial of such a system. There were many papers on CSCW 20 years back and most included not just the middleware scaling eval, but also some HCI.
2. Weak reject
2. Some familiarity
This paper presents OWebSync, a web-based middleware for interactive groupware with fast resynchronization of offline clients and continuous, interactive synchronization for online clients. OWebSync implements a fine-grained data synchronization model, and leverages state-based Conflict-free Replicated Data Types to automatically resolve conflicts. Additional techniques are employed to achieve the interactive performance, including Merkle-trees, virtual Merkel-tree levels, and message batching. Evaluation shows OWebSync is especially better in operating in and recovering from offline settings and network disruptions.
OWebSync looks like a useful system that can support interactive applications.
OWebSync appears to outperform existing systems in offline settings and recovering from network disruptions and scales better over time.
The work seems incremental: OWebSync mainly combines CRDTs with Merkel trees and some other optimization tricks.
The evaluation setup is very controlled and simplistic.
The evaluation is basically done in the context of one particular scenario (the eDesigners case study).
I guess OWebSync is a useful system to achieve consistent views fast, and it does seem to perform well in certain settings. But other than that, the work seems to be mostly engineering right now. The main technical aspects involve adding Merkle tree, etc. to CRDTs, so overall seem quite incremental.
The evaluation is based on one scenario only. Is eDesigners the most challenging? What about others?
The evaluation setup involves only VMs and emulated network conditions between them. This seems rather simplistic. If you expect the eDesigner app to be run over 4G, or even on a plane, you should have a few testbed machines connected over wireless (4G or WiFi). The conditions of these networks tend to fluctuate a lot, at different times of the day too. Further, if you have a mixture of users connected over different networks (wired or wireless), you might also see more interesting synchronization behavior.
Some breakdown results will be helpful. For example, can you illustrate how Merkle tree, etc. individually helped?
The paper was discussed at the PC meeting. The PC members agreed that the work is relevant and timely. However, they also felt that the paper does not sufficiently highlight the significance of the research contribution over the state of the art. This led to not accept the paper.