
Clock drift is a real thing in distributed systems.

If I have a 5-second lock that expires at 12:01:00 and my clock is off by 1 second, I could write at 12:01:01, potentially after someone else has acquired the lock or, worse, after someone else has written.



But the expiration time is relative to the Redis server's time, and the time measurement on the client side is done relative to the client's time. Other clients don't care about your clock, and you don't care about Redis' clock, AFAIK. From TFM:

> The client computes how much time elapsed in order to acquire the lock, by subtracting from the current time the timestamp obtained in step 1. If and only if the client was able to acquire the lock in the majority of the instances (at least 3), and the total time elapsed to acquire the lock is less than lock validity time, the lock is considered to be acquired.
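
The check quoted above can be sketched in a few lines. This is a minimal illustration, not the Redis implementation: the function name `lock_is_valid` and its parameters are hypothetical, and both timestamps are assumed to come from the client's own clock.

```python
def lock_is_valid(start, end, acks, instances, validity):
    """Apply the quoted rule: majority of instances AND elapsed < validity.

    start, end: client-side timestamps (seconds) taken before and after
                the acquisition attempts, both read from the same clock.
    acks:       number of instances that granted the lock.
    instances:  total number of instances (5 in the quoted example).
    validity:   lock validity time in seconds.
    """
    elapsed = end - start
    majority = instances // 2 + 1  # 3 when instances == 5
    return acks >= majority and elapsed < validity
```

Only the difference `end - start` matters here, so a constant offset between the client's clock and anyone else's does not change the result. A jump in the client's clock between the two readings does.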


What is the effect on the client of a VM live migration "pause"?

or daylight savings change?

or ntp updates which change the time?

Since the system clock can change relative to itself at any time, what effect does that have on the algorithm?


Distributed time is tricky, but it depends on what the intended use is. Under normal operation it will only move forward, for example.

> VM live migration "pause"?

System clock will take a larger step than usual. It won't go backwards.

> daylight savings change?

None. System clock is UTC for a reason.

> ntp updates which change the time?

NTP only drifts the clock (under normal operation).


Not that I think it affects your point, but just off the top of my head:

> System clock will take a larger step than usual. It won't go backwards.

Assumes that checkpoint and restore to a previous time won't happen.

> NTP only drifts the clock (under normal operation).

Unless it's a system like this. Note that this was pulled at about 1pm EST.

  Waiting for clock tick...
  ...got clock tick
  Time read from Hardware Clock: 2016/02/09 22:36:50
  Hw clock time : 2016/02/09 22:36:50 = 1455057410 seconds since 1969
  Tue 09 Feb 2016 02:36:50 PM PST  -1.038948 seconds

> Under normal operation it will only move forward

'Under normal operation' is not really a high bar for distributed systems. After all, the network is reliable 'under normal operation' too.


Use CLOCK_MONOTONIC. Or you could make your client abort when the clock goes backwards.
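
Both suggestions can be sketched in Python. This is an illustration under assumptions, not anything from the Redis client libraries: `elapsed_since`, `ForwardOnlyClock`, and `ClockWentBackwards` are hypothetical names. `time.monotonic()` is the standard-library monotonic clock, which on Linux is typically backed by CLOCK_MONOTONIC.

```python
import time


class ClockWentBackwards(RuntimeError):
    """Raised when a clock reading is earlier than the previous one."""


def elapsed_since(start):
    # time.monotonic() never goes backwards, so this is always >= 0
    # when `start` was also read from time.monotonic().
    return time.monotonic() - start


class ForwardOnlyClock:
    """Wrap a wall-clock source and abort if it ever steps backwards."""

    def __init__(self, source=time.time):
        self._source = source
        self._last = source()

    def now(self):
        t = self._source()
        if t < self._last:
            raise ClockWentBackwards(
                f"clock moved from {self._last} back to {t}"
            )
        self._last = t
        return t
```

A caller would take `start = time.monotonic()` before acquiring the lock and compare `elapsed_since(start)` against the validity time. The wrapper only detects backward steps between two of its own readings; it cannot notice a large forward jump or a pause.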

If someone forks a VM from a running snapshot, you're screwed whether they mess with the clock or not.


> Assumes that checkpoint and restore to a previous time won't happen.

Or migration to a host whose clock is behind the original host's clock.


> NTP only drifts the clock (under normal operation).

That shouldn't be an assumption. NTP may jump the clock if the difference is too high. I once had a system that was connected to two different NTP servers, each with a different clock, and the system would jump the clock every now and then based on which NTP server it thought was more correct.


I don't disagree; I'm just saying that time drift between nodes is not an issue with this algorithm. Also, using a monotonic clock, as antirez says in the post he intends to, will take care of most of these scenarios besides VM migration pauses.


What about requiring PTP sync or shutting down the node? I have no idea about the practical implications; it's just something I thought of. You can get a CDMA time server that supports PTP on eBay for a few hundred bucks.


Offhand, you either need multiple time servers or you might as well have a central lock (since you are bottlenecked in either case). That just punts the issue: now the various time servers need to be in sync, or the same issue comes up.



