The problem with Jepsen is that it can prove that systems are unsafe, but it can't be used to prove that systems are safe.
For example Consul did their own Jepsen testing[1] in which they passed them, but Kyle pointed out [2] that they passed them only because they changed timeouts from 1 s to 300 ms effectively making the race condition window smaller.
It's typicaly not tractable to prove that a system is safe. For example, RethinkDB used the raft consensus algorithm, which on its own is proven correct, but due to a bug could in some cases break linearization. Jepsen was able to uncover this.
The best method we currently have for demonstrating that large systems are safe is to try really hard to prove that they are unsafe and fail.
For example Consul did their own Jepsen testing[1] in which they passed them, but Kyle pointed out [2] that they passed them only because they changed timeouts from 1 s to 300 ms effectively making the race condition window smaller.
[1] https://www.consul.io/docs/internals/jepsen.html [2] https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul