> If it takes a long time to copy a database and it gets updated midway through, rsync may give me an invalid database file. The first half of the file is pre-update, the second half of the file is post-update, and they don’t match. When I try to open the database locally, I get an error
Of course! You can't copy the file of a running, active db receiving updates, that can only result in corruption.
> You can't copy the file of a running, active db receiving updates, that can only result in corruption
To push back against "only" -- there is actually one scenario where this works. Copying a file or a subvolume on Btrfs or ZFS can be done atomically, so if it's an ACID database or an LSM tree, in the worst case it will just roll back. Of course, if it's multiple files you have to take care to wrap them in a subvolume so that all of them are copied in the same transaction; simply using `cp --reflink=always` won't do.
Possibly freezing the process with SIGSTOP would yield the same result, but I wouldn't count on that
It can't be done without fs specific snapshots - otherwise how would it distinguish between a cp/rsync needing consistent reads vs another sqlite client wanting the newest data?
I would assume cp uses ioctl (with atomic copies of individual files on filesystems that support CoW like APFS and BTRFS), whereas sqlite probably uses mmap?
I was trying to find evidence that reflink copies are atomic and could not, and LLMs seem to think they are not. So at best it may be a Btrfs-only feature?
> Clones are atomic with regards to concurrent writes, so no locks need to be taken to obtain a consistent cloned copy.
I'm not aware of any of the filesystems that use it (Btrfs, XFS, Bcachefs, ZFS) that deviate from expected atomic behavior, at least with single files being the atom in question for `FICLONE` operation.
Taking an LVM snapshot and then copying the sqlite database from that is sufficient to keep it from being corrupted, but you can have incomplete transactions that will be rolled back during crash recovery.
The problem is that LVM snapshots operate at the block device level and only ensure there are no torn or half-written blocks. The snapshot doesn't know about the filesystem's journal or metadata.
To get a consistent point-in-time snapshot without triggering crash-recovery and losing transactions, you also need to lock the sqlite database or filesystem from writes during the snapshot.
PRAGMA wal_checkpoint(FULL);
BEGIN IMMEDIATE; -- locks out writers
-- trigger your LVM snapshot here
COMMIT;
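The same checkpoint-then-lock pattern can be sketched in Python's stdlib sqlite3 module; the snapshot step is a placeholder callback you would supply (e.g. shelling out to `lvcreate -s ...`):

```python
import sqlite3

def snapshot_with_write_lock(db_path, take_snapshot):
    # isolation_level=None puts the connection in autocommit mode so we
    # control the transaction boundaries explicitly.
    conn = sqlite3.connect(db_path, isolation_level=None)
    # Move all WAL contents into the main database file first.
    conn.execute("PRAGMA wal_checkpoint(FULL);")
    # Take the write lock: other writers block until COMMIT; readers don't.
    conn.execute("BEGIN IMMEDIATE;")
    try:
        take_snapshot()  # placeholder: trigger the LVM snapshot here
    finally:
        conn.execute("COMMIT;")
        conn.close()
```

`BEGIN IMMEDIATE` grabs the write lock without writing anything, so the window where writers are blocked is only as long as the snapshot itself takes.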
You can also use fsfreeze to get the same level of safety.
What does "losing transactions" mean? Some transactions will have committed before your backup and some will have committed after and therefore won't be included. I don't see the problem you are trying to solve?
Whether a transaction had started and gets transparently rolled back, or you had prevented it from starting, what is the difference to you? Either way, you have a point-in-time snapshot; that time is the latest commit before the LVM snapshot.
You're discussing this in terms of "safety" and that doesn't seem right to me.
This isn't really a personal or controversial take on the issue. There are easier ways to back up a sqlite database, but if you want to use LVM snapshots you need to understand how to do it correctly if you want to have useful backups.
Here's a scenario for a note-taking app backed by sqlite:
1. User A is editing a note and the app writes the changes to the database.
2. SQLite is in WAL mode, meaning changes go to a -wal file first.
3. You take the LVM snapshot while SQLite is:
- midway through a write (to the WAL or the main db file)
- or the WAL hasn’t been checkpointed back into the main DB file
4. The snapshot includes:
- a partial write to notes.db
- or a notes.db and notes.db-wal that are out of sync
Result: The backup is inconsistent. Restoring this snapshot later might:
- Cause sqlite3 to throw errors like `database disk image is malformed`
- Lose recent edits
- Require manual recovery or loss of WAL contents
In order to get a consistent, point-in-time backup, where your LVM snapshot captures a single committed state of the database with _no_ intermediate states (like rolled-back transactions or partial writes to the db file), you have to first either:
- Tell SQLite to create a checkpoint (write the WAL contents to the main DB) and suspend writes
- Or flush and block all writes to the filesystem using fsfreeze
The last time I had to call fsfreeze (or xfs_freeze) before creating an LVM snapshot was maybe 17-18 years ago on a system with rather old (even for the time) "stable" versions of everything, where it was not yet integrated.
For decades now, LVM has triggered a freeze before snapshot creation and an unfreeze afterwards, if the filesystem supports it.
> fsfreeze is unnecessary for device-mapper devices. The device-mapper (and LVM) automatically freezes a filesystem on the device when a snapshot creation is requested. For more details see the dmsetup(8) man page.
> freeze_fs
>
> called when VFS is locking a filesystem and forcing it into a consistent state. This method is currently used by the Logical Volume Manager (LVM) and ioctl(FIFREEZE). Optional.
> List of these filesystems include (2016-12-18) btrfs, ext2/3/4, f2fs, jfs, nilfs2, reiserfs, and xfs. Previous list may be incomplete, as more filesystems get support.
Using LVM snapshots is absolutely fine for creating consistent and complete copies of sqlite databases. You have to copy the rollback journal or WAL, too: they are part of the db, and SQLite will checkpoint the WAL into the db file when you open the copy. Every committed transaction before the snapshot is in it. It is complete.
And even if you use a mode below PRAGMA synchronous=FULL ( https://sqlite.org/pragma.html#pragma_synchronous ) -- FULL is needed to ensure durability, as it fsync()s the WAL on every commit -- you are better off than with a system crash: the freeze writes out every dirty page to disk, which is not guaranteed in a crash. In NORMAL mode you might lose transactions that were written to the WAL file but not yet synced to disk.
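To make the "copy the WAL too" point concrete, here is a small Python demo (the paths and `notes` table are invented for the example): copying only the main file silently drops committed-but-uncheckpointed transactions, while copying the `-wal` file alongside it preserves them.

```python
import os, shutil, sqlite3, tempfile

def wal_copy_demo():
    d = tempfile.mkdtemp()
    src = os.path.join(d, "notes.db")
    conn = sqlite3.connect(src, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("CREATE TABLE notes(body TEXT);")
    # Push the schema into the main file, then commit a row that
    # afterwards lives only in notes.db-wal.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE);")
    conn.execute("INSERT INTO notes VALUES ('hello');")

    # Copy 1: main file only -- the committed row is missing.
    shutil.copy(src, os.path.join(d, "only.db"))
    n_without_wal = sqlite3.connect(os.path.join(d, "only.db")) \
        .execute("SELECT count(*) FROM notes").fetchone()[0]

    # Copy 2: main file plus its -wal -- opening the copy replays the WAL.
    shutil.copy(src, os.path.join(d, "both.db"))
    shutil.copy(src + "-wal", os.path.join(d, "both.db-wal"))
    n_with_wal = sqlite3.connect(os.path.join(d, "both.db")) \
        .execute("SELECT count(*) FROM notes").fetchone()[0]
    conn.close()
    return n_without_wal, n_with_wal
```

The first copy opens cleanly and reports zero rows -- no error, just missing data -- which is exactly why a snapshot has to capture the db file and its journal atomically together.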
You are wrong, the whole point of the WAL is to make SQLite crash-consistent. Same as the rollback journal. SQLite will safely roll back partial writes. I don't know where you got the idea that in WAL mode, SQLite ditches its consistency guarantees somehow.
If you can corrupt a SQLite database by pulling the power or simulating it by taking a block-device snapshot, this is a serious SQLite bug and you should report it.
Sorry but your point is just completely wrong and I am not sure why you have this belief. It is extremely normal and safe to backup SQLite via block device or filesystem snapshots. Are you under the impression that SQLite cannot recover from a power loss...? The whole point of using a log is that you can recover from crashes mid-way through updating the database file by replaying the writes from the log.
If it will convince you, I went and asked ChatGPT like you recommended and it agrees:
> If you want to use snapshots:
> Use a filesystem or block-level snapshot tool that guarantees a point-in-time, atomic snapshot (e.g., LVM snapshots, ZFS snapshots, Btrfs snapshots, or VSS on Windows).
If you were going to "Tell SQLite to create checkpoint (write the WAL contents to the main DB) and suspend writes" as you suggest is necessary, why even bother with a snapshot at that point?
I'm sorry you took "you're wrong" as aggressive. I am providing sources though and other users are agreeing with me, will you please look into it? Or ask ChatGPT if SQLite is crash-consistent? Please consider the possibility you are spreading misinformation...
While I run and love litestream on my own system, I also like that they have a pretty comprehensive guide on how to do something like this manually, via built-in tools: https://litestream.io/alternatives/cron/
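For what it's worth, the online backup API that the CLI's `.backup` command wraps is also exposed in Python's stdlib sqlite3 module, so a safe hot copy is a few lines (paths here are placeholders):

```python
import sqlite3

def hot_backup(src_path, dst_path):
    # SQLite's online backup API copies the live database page by page,
    # without blocking readers or writers for long.
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```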
Litestream is really cool! I'm planning to use it to back up and restore my SQLite at the container level, just like that ex-Google guy who started a startup selling a small KVM and had a flood in his warehouse while on vacation. If I'm not mistaken. I would link the perfect guide he wrote here, but there's 0 chance I'll find it. If you understand the reference, please post the link.
Despite the beta label and lack of a 1.x release, I would consider the project pretty stable. We've used it in production for over 18 months to support an offline-first point of sale system. We haven't had any issues with Litestream.
>You can't copy the file of a running, active db receiving updates, that can only result in corruption
There is a slight 'well akshully' on this. A DB flush and FS snapshot where you copy the snapshotted file will allow this. MSSQL VSS snapshots would be an example of this.
Similarly you can rsync a Postgres data directory safely while the db is running, with the caveat that you likely lose any data written while the rsync is running. And if you want that data, you can get it with the WAL files.
It’s been years since I needed to do this, but if I remember right, you can clone an entire pg db live with `pg_backup_start()`, rsync the data directory, `pg_backup_stop()`, and rsync the WAL files written since backup start.
For moving DBs where I'm allowed minutes of downtime, I rsync (slow) first from the live db while it's hot, then stop it, rsync again (fast), then make the new one hot.
Works a treat when other (better) methods are not available.
If the corruption is detectable and infrequent enough for your purposes, then it does work, with a simple “retry until success” loop. (That’s how TCP works, for example.)
Not all corruption is detectable. You could make a copy during a transaction where only a subset of the transaction's saved pages are persisted but all branch & leaf pages are pointed to correctly. That would give you a state of the database that never actually existed and break atomicity.
Well, I don't know rsync that well. If you're saying it doesn't detect changes to files while it's being copied, then I'll believe you.
And, as far as I know, it's impossible to detect absolutely all corruption.
But you can pretty easily detect, e.g., that a file has or has not changed since before you copied it to after, on a system with a basically functioning filesystem and clock, with a reasonable/useful level of confidence.
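A sketch of that "retry until success" loop, using size+mtime as the (imperfect -- mtime granularity is finite) change detector described above:

```python
import os, shutil, time

def copy_with_retry(src, dst, attempts=5, delay=0.1):
    """Copy src to dst; retry if src appears to have changed mid-copy."""
    for _ in range(attempts):
        before = os.stat(src)
        shutil.copy2(src, dst)
        after = os.stat(src)
        # Same size and mtime before and after the copy is a reasonable
        # signal that we got a consistent copy of an unchanging file.
        if (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns):
            return True
        time.sleep(delay)
    return False
```

Of course this only detects writes that touch the metadata; it does nothing for the atomicity problems discussed above, which is why snapshots or the backup API are the robust answer.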
> Of course! You can't copy the file of a running, active db receiving updates, that can only result in corruption
Do people really not understand how file storage works? I cannot rightly apprehend the confusion of ideas that would produce an attempt to copy a volatile database without synchronization and expect it to work.
The confusion of ideas here is understandable IMO: people assume everything is atomic. Databases of course famously have ACID guarantees. But it's easy for people to assume copying is also an atomic operation. Honestly if someone works too much with databases and not enough with filesystems it's a mistake easily made.
It was early days... very early days. He didn't have the benefit of trying to help his (metaphorical) grandparents get their emails or worked under a manager who thinks 2023-era ChatGPT is only slightly less reliable than the Standard Model of Physics, if not slightly more.
> Of course! You can't copy the file of a running, active db receiving updates, that can only result in corruption.
For replicating sqlite databases safely there is
https://github.com/benbjohnson/litestream