Submitted: 22 November 2022
Minor revision: 8 January 2023
Reviewer 1: Accept With No Changes
Reviewer 2: Accept With No Changes
Reviewer 3: Author Should Prepare A Minor Revision
Reviewer 4: Author Should Prepare A Minor Revision
Reviewers appreciate the revision and agree that many points have been addressed in this manuscript. There are still few points to be addressed, especially from the reviewer 3. I suggest to carefully address them in the next round of revision.
Accept With No Changes
In this revised version, the authors have improved the paper much and addressed my concerning questions. So I suggest to accept this paper.
Please rate the manuscript. Please explain your choice.: Good
Accept With No Changes
This review relates to my previous review (reviewer 2).
The authors have addressed all concerns addressed in my review, both in their reply and in the manuscript (revision R1). In particular, I re-read and re-evaluated the voting on round 0 values and agree with the authors that this is the correct setting for the algorithm. Thanks a lot for pointing this out and for adding the explanatory example to the manuscript.
I have also reviewed the authors response to my peer reviewer's concerns. I receive the final judgement to them, but from my perspective, I was satisfied with the responses and the corresponding changes to the manuscript. As such, I recommend acceptance of the manuscript.
Aside from the above, allow me a comment on the authors reply to reviewer 3: "For performance reasons, it is beneficial to remove a Byzantine replica permanently, but this has to be done offline and coordinated by all honest members."
There are a multitude of online and automatic solutions (e.g., interleaving protocol-level agreement with agreement on group membership changes, or, if you add / remove single replicas at a time, just making sure a protocol-level agreement happens between any two such modifications). Please consider some of the old papers on group membership, in case this is of interest for you. These are of course orthogonal concerns and should work with BeauForT out of the box. Same for the rejuvenation of replicas, to maintain healthy majorities.
Please rate the manuscript. Please explain your choice.: Excellent
Author Should Prepare A Minor Revision
I'm roughly persuaded by the authors' responses, except for some remaining confusions or revised suggestions as follows:
In general, I recommend acceptance after a minor revision.
Please rate the manuscript. Please explain your choice.: Good
Author Should Prepare A Minor Revision
The authors have now clarified a number of points raised in the previous review. Let me point out a few aspects, which in my opinion are still unclear in this new version.
- Section 3.1 (model). It is still not clear from this section whether the topology is fixed. Could this be clarified in this section? The evaluation section mentions that "In every run, the network configuration will be different, because replicas will connect to each other randomly to form the gossip network." It is one thing for the replicas to connect to different nodes in a fixed topology, it is another thing for the topology to change. Which one is it here? Also, can the topology change over time?
The text there in the model section now says that "At least 2f+1 replicas should be able to connect to each other." I don't get this point. This seem to require a fully connected subsystem of size 2f+1. Isn't it enough for each node to be able to connect to 2f+1 other nodes?
The text then says "Only if no progress is being made on a new proposal, replicas will close some existing connections and connect to a few different replicas." How does this work? How many replicas will they connect to in practice?
- Algorithm 1. Lines 43-44, VOTE returns (val,r,SIGN(view,round,val,type,r)). Why aren't view, round, and type in the tuple? What is r? Why is it not a parameter of the VOTE function?
Thanks for your reply regarding the commit phase. I'm not sure I fully understand the condition line 30, i.e., LEN(votes)-1>r. Couldn't it be that a replica hasn't received votes for some of the rounds, and so LEN(votes)-1 isn't > r, but we still want the replica to go back to the pre-commit phase because it has received votes for a new round?
- Section 3.2.5 (State-based replication protocol). This section explains how this protocol is used to merge states (which are tuples of the form (value,qc,votes,value',qc')). It is clear how votes and qc' are "merged". It is however still unclear how the other components are "merged". The text says that "a state associated with a higher view number overwrites any older state". However, according to line 44 of Algorithm 1, views are not includes in votes, and therefore neither is qc or qc'. Could this be clarified? Could you clearly explain how each component is updated?
The text says "The process continues until a specific key-value pair is reached". What "specific" key-value pair are you referring to?
The text says "If those n - f are all for the same next value, then no new round be started". Did you mean "is started"?
- Section 3.2.5 (Examples). The examples in this section are really useful to follow what the algorithm does. I'm still unclear on how this protocol satisfies liveness, and another example would help. Here is the scenario. As it appears that replicas vote for the requests they receive first (according to "Imagine now that replica D also receives the votes from A and B between t1 and t2 . If the vote from B comes in first, then D will also vote for v2"), what if a replica is well-placed in the network and its requests reach the other replicas first (before the requests of the other replicas). Won't that prevent the requests from the other replicas from ever being accepted?
- Section 4.1 (Overall architecture). The text there says "At startup, every replica will connect to at least seven other replicas randomly." Why 7 replicas? Is that irrespective of f?
- Section 6 (Related work). A paragraph was added on related leaderless algorithms [51], [52], and [53]. However, the text still does not clearly state how the protocol proposed in this paper differs from the ones listed in that section. Is it the case that [51] requires f+3 rounds in the best case, while your protocol requires less rounds, and that [52] and [53] rely on a fully connected network, while in your case not all replicas need to be connected to all other replicas? Is that a critical distinction? Couldn't one just use a protocol such as Dolev's Reliable Communication protocol as the communication layer of DBFT to support a non-fully connected network?
- Figure 7. Thanks for your answer. You mentioned "For larger networks, the upper whisker is also visible, but the median is still at the line at the bottom, which is also the box of the boxplot." Could this be clarified in the paper too? The figure is hard to read without this information. Also, since the median is the bottom line in the case of BFT-SMaRt, does that mean that BFT-SMaRt performs better?
- Lemma 6 in appendix A.2. Thank you for your reply regarding the meaning of "At least n - f votes are known". Could this also be clarified in the paper?
- Lemma 7 in appendix A.2. Thank you for your reply. Could you clarify what you mean by "exclude" in the proof as you did in your reply?
Please rate the manuscript. Please explain your choice.: Good