SRDS '22 Paper #54 Reviews and Comments

Paper #54 RoBFT: Robust Byzantine Fault Tolerance for Client-centric Mobile Web Applications

Submitted: 26 April 2022
Rejected: 17 June 2022

Review A: 1. Reject
Review B: 3. Weak accept
Review C: 2. Weak reject

Review A

Overall merit

1. Reject

Reviewer expertise

3. Knowledgeable

Paper summary

The paper proposes a BFT protocol and evaluates its implementation.

Strengths

The paper proposes a BFT protocol and evaluates its implementation.

Weaknesses

Considering the past work on BFT, the novelty of the proposed protocol is unclear. The incremental contribution does not seem adequate for an SRDS paper.

Review B

Overall merit

3. Weak accept

Reviewer expertise

3. Knowledgeable

Paper summary

The paper addresses the problem of defining a Byzantine Fault Tolerant (BFT) consensus protocol for client-centric applications.

Strengths

Weaknesses

Comments for author

The paper is motivated by the recent advent of edge computing where many computations are carried on at the client side over devices with limited capabilities in terms of computation power and memory space.

The main contribution is a consensus protocol and its implementation in a browser-based architecture.

My only concerns is about the amount of novelty. The proposed system is basically the result of the combination of existing techniques slightly adapted to work at the browser side. What it is not completely clear to me are the challenging that the authors needed to face to move from a server-based implementation to a browser-based one.

Review C

Overall merit

2. Weak reject

Reviewer expertise

3. Knowledgeable

Paper summary

This paper presents a BFT replication protocol designed to be used in a community-based peer-to-peer setting (motivating example is a shared loyalty program for small merchants). The paper presents the replication protocol, system architecture, and a browser-based implementation. The evaluation compares the protocol to BFT-SMaRt and Tendermint.

Strengths

Weaknesses

Comments for author

The overall idea of the paper is to allow community groups to deploy cooperative peer-to-peer applications without fully trusting one another. To do this, the authors propose a BFT-replicated key value store based on a new BFT replication protocol (RoBFT). The problem is interesting, and the protocol is new. My biggest concerns are that the protocol details were not clear to me, and it was also not fully clear why a new protocol is needed (i.e. why is it impossible to use any of the many existing BFT protocols?).

The protocol proposed appears to be based on Tendermint, but adapts it to work without a leader. The evaluation suggests that the RoBFT protocol improves performance compared to Tendermint, but does not explain where the performance improvement comes from in detail. To understand the benefit of the proposed protocol, it would be useful to have a detailed discussion and comparison with the Tendermint protocol to show where the performance improvement comes from. Is it coming from the design decisions in the protocol, or from implementation details?

I had a hard time understanding the precise details of the protocol. For example, one of the main points is that "Honest replicas will always vote for the value with the most votes in round 0". The idea is that eventually all replicas will have the same view of the votes in round 0, and so this will lead to consensus. However, as far as I can tell, it's not necessary that a single value with "the most votes" actually exists -- what if each replica simultaneously tries to propose a value, such that we have n values each with 1 vote? What happens in this case? Maybe ties can be broken arbitrarily, but this should be specified. The protocol description also states that "If a replica detects that another replica is Byzantine, it will exclude this Byzantine replica permanently, and its votes do not count anymore", but it does not specify precisely how a replica detects that another replica is Byzantine (and is it guaranteed that all correct replicas will agree on which replicas are Byzantine?). Gossip-based communication appears to be an important point for the protocol (since the evaluation claims that this is what allows it to scale to 100 nodes, while BFT-SMaRt struggled in that scenario). But, I didn't see where the gossip protocol is actually described.

It is also not clear to me that limitations of existing BFT replication protocols are really the main challenge that applications like the ones the authors envision face. For example, for such a system to work, it requires that enough of the nodes are online at the time that a particular user wants to perform an operation. For the loyalty program example, this may work if all the vendors are at a farmer's market at the same time and running the protocol on their devices, but what if one of them goes to another event later on their own? Or what if a user wants to check their loyalty points balance -- is that possible? I also don't think the proposed solution handles some of the most problematic types of Byzantine behavior in such a system. For example, what if a malicious merchant decides to simply spend/delete all of a customer's loyalty points?

Author Response

We would like to thank the reviewers for their feedback on our paper.

Reviewer B asks what are the challenges that we needed to face to move from a server-based implementation to a browser-based one? Servers can maintain many steady, low-latency connections to other servers, while browsers in practice can only maintain a few connections. The targeted mobile environment also introduces higher latencies and is more unstable than is typically taken into consideration. This mobile nature is also more challenging than the fact it runs inside a browser, as is explained in the next paragraph. This necessitates a consensus protocol adapted to these circumstances. A crucial contribution here is also the use of BLS. BLs allows us to delay the verification of signatures in the optimistic case, and only at the end verify many signatures at once by verifying one aggregated signature. Earlier prototypes which used elliptic curves without aggregation were too costly for a browser environment.

Reviewer C questions that the proposed protocol is a good fit for the motivating example and asks why a new BFT protocol is needed for the applications being considered? We target an environment with 10-100 lightweight and mobile web clients. Such devices have a permanent yet unstable internet connection over a data subscription and are operational and reactive most of the time. I.e., we assume those mobile devices always have a 3G or 4G connection, but, as you probably experienced yourself, this kind of connection is less stable than a wired connection and short time disruptions are commonplace. Many existing protocols use all-to-all communication, which is simply not possible in a web-based environment. You can keep a connection open to 10-20 other browsers, but after that performance deteriorates quickly. Alternatively, there exist Gossip-based protocols, such as Tendermint, that do not require a connection to every other node. However, Tendermint is leader-based, which in practice means that when this leader fails, consensus will be delayed until the next leader is elected. We have also shown this problem in our evaluation in Figure 7. By removing the leader, and proposing a leaderless, Gossip-based consensus protocol, we get a more stable latency, which does not increase when short network disruptions happen.

Reviewer C asks what happens if each replica simultaneously tries to propose a value, such that we have n values each with 1 vote? In this case, the lexicographical order of the hashes of those values is taken as a tiebreaker.

Reviewer C asks to specify precisely how a replica detects that another replica is Byzantine? There are two obvious ways in which a replica can act Byzantine. It can vote for two different values in the same round. In this case another replica detects this once it sees the two conflicting votes. Every other replica will agree on this Byzantine behavior, as it suffices to show the two conflicting votes. Another way to act Byzantine is to not vote for the winning value, but instead vote for a different value. However, the Gossip protocol requires each node to Gossip their full state for that round, i.e., the votes for the previous rounds are included. This means that when a Byzantine replica votes for a different value, the replica receiving it will detect that the replica is Byzantine. Note that in both cases it is possible that some replicas already decided that one replica is Byzantine, while others have not yet reached this conclusion. The statement “If a replica detects that another replica is Byzantine, it will exclude this Byzantine replica permanently, and its votes do not count anymore.” is only valid for local decisions, another replica can still consider the vote of the Byzantine replica valid and make decisions partly based on that vote. These will still be accepted by the replicas that already detected the Byzantine behavior.

Reviewer C asks where the Gossip protocol is described? It is described in section III.B.4-5.

Reviewer C asks what would happen if a malicious merchant decides to simply spend all of a customer's loyalty points? This is also solved in our prototype. As explained in section IV.A.i: the application should specify an “access control callback”, which will be called each time before voting for a new value. This callback decides whether the new value is acceptable or not. Sending loyalty points from a customer to a merchant is only allowed when it is signed by the private key of the customer, if the signature is not valid, the newly proposed value will never be voted on.

Comment @A1 by Reviewer B

Summary of the discussion: reviewers appreciated the effort spent in clarifying our concerns. Some of our doubts have been solved by the rebuttal but it is still unclear to us the real novelty of the proposed approach and the contribution seems limited.