> If it takes a long time to copy a database and it gets updated midway through, rsync may give me an invalid database file. The first half of the file is pre-update, the second half of the file is post-update, and they don’t match. When I try to open the database locally, I get an error
Of course! You can't copy the file of a running, active db receiving updates, that can only result in corruption.
> You can't copy the file of a running, active db receiving updates, that can only result in corruption
To push back against "only" -- there is actually one scenario where this works. Copying a file or a subvolume on Btrfs or ZFS can be done atomically, so if it's an ACID database or an LSM tree, in the worst case it will just roll back. Of course, if it's multiple files you have to take care to wrap them in a subvolume so that all of them are copied in the same transaction; simply using `cp --reflink=always` won't do.
Possibly freezing the process with SIGSTOP would yield the same result, but I wouldn't count on that
It can't be done without fs specific snapshots - otherwise how would it distinguish between a cp/rsync needing consistent reads vs another sqlite client wanting the newest data?
I would assume cp uses ioctl (with atomic copies of individual files on filesystems that support CoW like APFS and BTRFS), whereas sqlite probably uses mmap?
I was trying to find evidence that reflink copies are atomic and could not, and LLMs seem to think they are not. So at best it may be a Btrfs-only feature?
> Clones are atomic with regards to concurrent writes, so no locks need to be taken to obtain a consistent cloned copy.
I'm not aware of any of the filesystems that use it (Btrfs, XFS, Bcachefs, ZFS) that deviate from expected atomic behavior, at least with single files being the atom in question for `FICLONE` operation.
Taking an LVM snapshot and then copying the sqlite database from that is sufficient to keep it from being corrupted, but you can have incomplete transactions that will be rolled back during crash recovery.
The problem is that LVM snapshots operate at the block device level and only ensure there are no torn or half-written blocks. The snapshot doesn't know about the filesystem's journal or metadata.
To get a consistent point-in-time snapshot without triggering crash-recovery and losing transactions, you also need to lock the sqlite database or filesystem from writes during the snapshot.
PRAGMA wal_checkpoint(FULL);
BEGIN IMMEDIATE; -- locks out writers
-- trigger your LVM snapshot here
COMMIT;
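The same checkpoint-then-lock pattern can be sketched in Python's stdlib sqlite3 module; the snapshot step is a placeholder callback you would supply (e.g. shelling out to `lvcreate -s ...`):

```python
import sqlite3

def snapshot_with_write_lock(db_path, take_snapshot):
    # isolation_level=None puts the connection in autocommit mode so we
    # control the transaction boundaries explicitly.
    conn = sqlite3.connect(db_path, isolation_level=None)
    # Move all WAL contents into the main database file first.
    conn.execute("PRAGMA wal_checkpoint(FULL);")
    # Take the write lock: other writers block until COMMIT; readers don't.
    conn.execute("BEGIN IMMEDIATE;")
    try:
        take_snapshot()  # placeholder: trigger the LVM snapshot here
    finally:
        conn.execute("COMMIT;")
        conn.close()
```

`BEGIN IMMEDIATE` grabs the write lock without writing anything, so the window where writers are blocked is only as long as the snapshot itself takes.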
You can also use fsfreeze to get the same level of safety.
What does "losing transactions" mean? Some transactions will have committed before your backup and some will have committed after and therefore won't be included. I don't see the problem you are trying to solve?
Whether a transaction had started and gets transparently rolled back, or you had prevented it from starting, what is the difference to you? Either way, you have a point-in-time snapshot; that time is the latest commit before the LVM snapshot.
You're discussing this in terms of "safety" and that doesn't seem right to me.
This isn't really a personal or controversial take on the issue. There are easier ways to back up a sqlite database, but if you want to use LVM snapshots you need to understand how to do it correctly if you want to have useful backups.
Here's a scenario for a note-taking app backed by sqlite:
1. User A is editing a note and the app writes the changes to the database.
2. SQLite is in WAL mode, meaning changes go to a -wal file first.
3. You take the LVM snapshot while SQLite is:
- midway through a write (to the WAL or the main db file)
- or the WAL hasn’t been checkpointed back into the main DB file
4. The snapshot includes:
- a partial write to notes.db
- or a notes.db and notes.db-wal that are out of sync
Result: The backup is inconsistent. Restoring this snapshot later might:
- Cause sqlite3 to throw errors like `database disk image is malformed`
- Lose recent edits
- Require manual recovery or loss of WAL contents
In order to get a consistent, point-in-time backup, where your LVM snapshot captures a single committed state of the database with _no_ intermediate states (like rolled-back transactions or partial writes to the db file), you have to first either:
- Tell SQLite to create a checkpoint (write the WAL contents to the main DB) and suspend writes
- Or flush and block all writes to the filesystem using fsfreeze
The last time I had to call fsfreeze (or xfs_freeze) before creating an LVM snapshot was maybe 17-18 years ago on a system with rather old (even for the time) "stable" versions of everything, where it was not yet integrated.
For decades now, LVM has triggered a freeze before snapshot creation and an unfreeze afterwards, if the filesystem supports it.
> fsfreeze is unnecessary for device-mapper devices. The device-mapper (and LVM) automatically freezes a filesystem on the device when a snapshot creation is requested. For more details see the dmsetup(8) man page.
> freeze_fs
>
> called when VFS is locking a filesystem and forcing it into a consistent state. This method is currently used by the Logical Volume Manager (LVM) and ioctl(FIFREEZE). Optional.
> List of these filesystems include (2016-12-18) btrfs, ext2/3/4, f2fs, jfs, nilfs2, reiserfs, and xfs. Previous list may be incomplete, as more filesystems get support.
Using LVM snapshots is absolutely fine for creating consistent and complete copies of sqlite databases. You have to copy the rollback journal or WAL, too: they are part of the db, and SQLite will checkpoint the WAL into the db file when you open the copy. Every committed transaction before the snapshot is in it. It is complete.
And even if you use a mode below PRAGMA synchronous=FULL ( https://sqlite.org/pragma.html#pragma_synchronous ) -- FULL is needed to ensure durability, as it fsync()s the WAL on every commit -- you are better off than with a system crash: the freeze writes out every dirty page to disk, which is not guaranteed in a crash. In NORMAL mode you might lose transactions that were written to the WAL file but not yet synced to disk.
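To make the "copy the WAL too" point concrete, here is a small Python demo (the paths and `notes` table are invented for the example): copying only the main file silently drops committed-but-uncheckpointed transactions, while copying the `-wal` file alongside it preserves them.

```python
import os, shutil, sqlite3, tempfile

def wal_copy_demo():
    d = tempfile.mkdtemp()
    src = os.path.join(d, "notes.db")
    conn = sqlite3.connect(src, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("CREATE TABLE notes(body TEXT);")
    # Push the schema into the main file, then commit a row that
    # afterwards lives only in notes.db-wal.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE);")
    conn.execute("INSERT INTO notes VALUES ('hello');")

    # Copy 1: main file only -- the committed row is missing.
    shutil.copy(src, os.path.join(d, "only.db"))
    n_without_wal = sqlite3.connect(os.path.join(d, "only.db")) \
        .execute("SELECT count(*) FROM notes").fetchone()[0]

    # Copy 2: main file plus its -wal -- opening the copy replays the WAL.
    shutil.copy(src, os.path.join(d, "both.db"))
    shutil.copy(src + "-wal", os.path.join(d, "both.db-wal"))
    n_with_wal = sqlite3.connect(os.path.join(d, "both.db")) \
        .execute("SELECT count(*) FROM notes").fetchone()[0]
    conn.close()
    return n_without_wal, n_with_wal
```

The first copy opens cleanly and reports zero rows -- no error, just missing data -- which is exactly why a snapshot has to capture the db file and its journal atomically together.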
You are wrong, the whole point of the WAL is to make SQLite crash-consistent. Same as the rollback journal. SQLite will safely roll back partial writes. I don't know where you got the idea that in WAL mode, SQLite ditches its consistency guarantees somehow.
If you can corrupt a SQLite database by pulling the power or simulating it by taking a block-device snapshot, this is a serious SQLite bug and you should report it.
Sorry but your point is just completely wrong and I am not sure why you have this belief. It is extremely normal and safe to backup SQLite via block device or filesystem snapshots. Are you under the impression that SQLite cannot recover from a power loss...? The whole point of using a log is that you can recover from crashes mid-way through updating the database file by replaying the writes from the log.
If it will convince you, I went and asked ChatGPT like you recommended and it agrees:
> If you want to use snapshots:
> Use a filesystem or block-level snapshot tool that guarantees a point-in-time, atomic snapshot (e.g., LVM snapshots, ZFS snapshots, Btrfs snapshots, or VSS on Windows).
If you were going to "Tell SQLite to create checkpoint (write the WAL contents to the main DB) and suspend writes" as you suggest is necessary, why even bother with a snapshot at that point?
I'm sorry you took "you're wrong" as aggressive. I am providing sources though and other users are agreeing with me, will you please look into it? Or ask ChatGPT if SQLite is crash-consistent? Please consider the possibility you are spreading misinformation...
While I run and love litestream on my own system, I also like that they have a pretty comprehensive guide on how to do something like this manually, via built-in tools: https://litestream.io/alternatives/cron/
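For what it's worth, the online backup API that the CLI's `.backup` command wraps is also exposed in Python's stdlib sqlite3 module, so a safe hot copy is a few lines (paths here are placeholders):

```python
import sqlite3

def hot_backup(src_path, dst_path):
    # SQLite's online backup API copies the live database page by page,
    # without blocking readers or writers for long.
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```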
Litestream is really cool! I'm planning to use it to back up and restore my SQLite at the container level, just like that ex-Google guy who started a startup selling a small KVM and had a flood in his warehouse while on vacation. If I'm not mistaken. I would link the perfect guide he wrote here, but there's 0 chance I'll find it. If you understand the reference, please post the link.
Despite the beta label and lack of a 1.x release, I would consider the project pretty stable. We've used it in production for over 18 months to support an offline-first point of sale system. We haven't had any issues with Litestream.
>You can't copy the file of a running, active db receiving updates, that can only result in corruption
There is a slight 'well akshully' on this. A DB flush and FS snapshot where you copy the snapshotted file will allow this. MSSQL VSS snapshots would be an example of this.
Similarly you can rsync a Postgres data directory safely while the db is running, with the caveat that you likely lose any data written while the rsync is running. And if you want that data, you can get it with the WAL files.
It’s been years since I needed to do this, but if I remember right, you can clone an entire pg db live with `pg_backup_start()`, rsync the data directory, `pg_backup_stop()`, and rsync the WAL files written since backup start.
For moving DBs where I'm allowed minutes of downtime, I rsync (slow) first from the live db while it's hot, then stop it, rsync again (fast), then make the new one hot.
Works a treat when other (better) methods are not available.
If the corruption is detectable and infrequent enough for your purposes, then it does work, with a simple “retry until success” loop. (That’s how TCP works, for example.)
Not all corruption is detectable. You could make a copy during a transaction where only a subset of the transaction's saved pages are persisted but all branch & leaf pages are pointed to correctly. That would give you a state of the database that never actually existed and break atomicity.
Well, I don't know rsync that well. If you're saying it doesn't detect changes to files while it's being copied, then I'll believe you.
And, as far as I know, it's impossible to detect absolutely all corruption.
But you can pretty easily detect, e.g., that a file has or has not changed since before you copied it to after, on a system with a basically functioning filesystem and clock, with a reasonable/useful level of confidence.
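A sketch of that "retry until success" loop, using size+mtime as the (imperfect -- mtime granularity is finite) change detector described above:

```python
import os, shutil, time

def copy_with_retry(src, dst, attempts=5, delay=0.1):
    """Copy src to dst; retry if src appears to have changed mid-copy."""
    for _ in range(attempts):
        before = os.stat(src)
        shutil.copy2(src, dst)
        after = os.stat(src)
        # Same size and mtime before and after the copy is a reasonable
        # signal that we got a consistent copy of an unchanging file.
        if (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns):
            return True
        time.sleep(delay)
    return False
```

Of course this only detects writes that touch the metadata; it does nothing for the atomicity problems discussed above, which is why snapshots or the backup API are the robust answer.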
> Of course! You can't copy the file of a running, active db receiving updates, that can only result in corruption
Do people really not understand how file storage works? I cannot rightly apprehend the confusion of ideas that would produce an attempt to copy a volatile database without synchronization and expect it to work.
The confusion of ideas here is understandable IMO: people assume everything is atomic. Databases of course famously have ACID guarantees. But it's easy for people to assume copying is also an atomic operation. Honestly if someone works too much with databases and not enough with filesystems it's a mistake easily made.
It was early days... very early days. He didn't have the benefit of trying to help his (metaphorical) grandparents get their emails or worked under a manager who thinks 2023-era ChatGPT is only slightly less reliable than the Standard Model of Physics, if not slightly more.
> Of course! You can't copy the file of a running, active db receiving updates, that can only result in corruption.
For replicating sqlite databases safely there is
https://github.com/benbjohnson/litestream