This looks great! Would it be possible to add the total estimated cost for retrieval? At first sight it looks like downloading at a slower rate is substantially cheaper, even though it is not. I think this would give your users a better idea of how much downloading the data will end up costing.
This looks really interesting to me. One thing that I'm not quite clear on is whether or not backups made using Arq to Glacier can be restored using other software.
I'm mildly concerned about what could happen if I need to restore software in three years on a new version of OS X (or a different OS entirely), possibly after sreitshamer has been hit by a bus and not updated things.
Seems like bandwidth costs would kill this, restoring a TB is $120 in bandwidth alone, then you have the retrieval fee.
I agree with some others that the cheapest option for now is buying more hard drives and keeping them in sync with CrashPlan or something else, until Google Drive's price for 1TB drops a bit further.
I do a rotation of hard drives in my house. I think it would be useful for the "my house burns down" scenario. In other words, I will probably never need the restore, but if I do, cost really isn't an issue.
Don't forget buying a separate facility to store your extra hard drives and your time regularly doing full restoration tests to confirm that you haven't been hit by silent bitrot.
The OP quotes storing a terabyte on Glacier as costing $10 a month. I haven't done mass backup to S3...but I was almost expecting this to be the cost for a TB on S3, not on Glacier.
You can currently get a 3TB USB 3.0 drive for as little as $120. At $10 a month to store 1TB on Glacier, you'll surpass that cost in a year.
I know Glacier data is likely more safe than moving your files to an external HD and then putting it in your safe deposit box...but the HD ends up being cheaper in the medium term and far more flexible and faster. It's unlikely your house will burn down along with all of your available backups...it's even less likely that that situation will happen and that this last-resort external HD will somehow also be lost, right?
The use-case that Glacier is intended for is long-term archival preservation. When archivists say "long-term", they are thinking decades at a minimum.
When you're thinking like that, you need to ask yourself what 1TB actually means. Your USB disk is the equivalent of a VHS tape in 1980... it holds a few hours of video, but to keep the quality of the original, you need to maintain it. Will a USB drive in a safe deposit box work in 20 years? Will you have a USB3 interface handy in 20 years?
Digital media is really tough to archive over long periods of time vs. paper. I worked with a US state archivist on a project, and they struggle with the issue. They have a good handle on the papers of the colonial government from the 1600s, but have a difficult time accessing the early digital records from the 1980s.
So putting stuff in Glacier, and letting Amazon handle storage migration, break/fix, replication, etc. for $120 per TB per year sounds like good value to me.
If we assume that Amazon's service survives the next twenty years, I still don't see how we can reasonably expect to be able to read the files we stored there. File formats become obsolete all the time. If you don't maintain your data, it is quite likely to be useless regardless of where you store it.
That is a problem, but it's a problem that you have with "roll your own infrastructure" as well. Digital data isn't like paper -- archiving doesn't mean keeping tapes in a vault. You need to periodically refresh and check the integrity of the media.
Amazon's service sets an expectation that 99.999999999% of your data will survive intact in any given year. So while you need to maintain your data logically, you can set clear expectations with respect to the integrity of that data.
The fact that Amazon's infrastructure may not be around in 20 years isn't especially relevant either -- neither will any maintained infrastructure. Tapes need to be refreshed, archival storage infrastructure usually ages out in 8-12 years, and you need to do a TCO analysis each time you refresh.
The other thing is that you get physical isolation. Flooding hasn't been a problem in lower Manhattan or Staten Island in recent memory. How many safe deposit boxes do you think were destroyed in those two places recently?
I know Amazon says they have 11 9s, but merely given that there have been 5 mass extinction events in the past 500M years, you probably can't do better than 8 9s. (11 9s gives you even odds that your data will last 70 billion years, as long as you pay Amazon's fee.)
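For anyone who wants to check that parenthetical, the arithmetic is quick; here's a sketch in Python (the only input is the 11-nines annual durability figure):

    import math

    p_loss = 1e-11  # annual loss probability implied by 11 nines of durability

    # Years until a stored object has even odds of still being intact:
    # solve (1 - p_loss)**n = 0.5 for n
    half_life_years = math.log(0.5) / math.log1p(-p_loss)
    print(f"{half_life_years:.2e}")  # ~6.93e10, i.e. roughly 70 billion years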
You're probably right that Amazon holds your data for that long; however, there's another way to look at 11 nines: if you upload one hundred billion (10^11) files to Glacier, you should expect to lose one of them per year, and not more. That's probably realistically achievable.
Compare this with S3, where you will lose 1 in one billion files per year. Shocking!
While physical media formats and hardware interfaces become obsolete all the time, it's something of a myth that file formats do the same. Even proprietary formats can usually be read by digging up a copy of the software that created them (in theory this software may be lost, but typically it is not). The one major exception is DRM'd files where the keys have been lost.
This is truer today than it once was. Go back, say, 15 years and you'll find any number of word processor and presentation formats that are only readable by long-gone software. I agree that, today, we've (mostly) coalesced around a modest number of standards that are either open or, at least, widely used and well-documented.
I'm not sure what you mean by file formats becoming obsolete all the time. This may be true for some proprietary formats that are no longer readable by a modern application, but for any reasonably widespread format there should not be much difficulty. I don't see any reason why we wouldn't be able to read most of our text, image, and video data decades down the road.
Remember, we're supposedly entering the "post-PC" world. How do you intend to open a WordStar document or Quattro in 2040, after 30 years of app stores and sandboxed apps?
The presentation of documents is critical as well. Will complex documents created in Word 95 render properly on the iPad XXXVI in 2040?
Archival file formats like PDF/A, open specification formats like OOXML and open source formats like ODF will help with this issue, but I'm sure there will be many audio and video files that will be unintelligible in 30 years.
The other alternative might be to review your current archival storage format every year, and upgrade while the old interface is still available and new storage formats are coming out.
You are neglecting the value of your time here. If you have photos, videos, or other information that accumulates often, going to the bank on a regular basis to sync your drive to your computer sounds like WAAAAY more hassle than just paying $10 a month on Glacier. Assuming it takes 1 hour a month to back up your info, in order for it to be worth it, you would have to value your time at less than $10/hour, completely discounting the cost of the drive.
Yes, you have a good point on the inconvenience of regular syncing...but I guess my use-case for Glacier would be for files that do not need to be regularly synced.
For example, as a photographer, I have roughly 1.5 TB in photo files. The vast majority of these I will never need to access again because the ones that I've liked, I've either stored in other drives (for example, a small drive of wedding photos for my portfolio or for recent customers) or on online services (yes, I know this is a cost in addition to the external drive, but I was using these services already, and for reasons other than just storage).
So for me, Glacier would be a place where I dump a chronological store of photos, a first-in-last-out kind of system. There's no need to regularly sync any of these photos. There may be a need to go back through certain folders years from now if I decided to compile something for a family album or something bespoke like that. And for that reason, it's necessary to have a last-resort store for these files.
So if I have 1TB of files that need to be rarely accessed and never updated, it's hard to justify spending $10 a month on Glacier (assuming I have a pretty safe external place to store this drive). At the rate that I accumulate files, it should only take me about an hour a year to back up photos (plus the time it takes to find the hiding place for the hard drive).
Actually, even in your use case, the cost of Glacier (or some other service like http://bit-chest.com/ ) is justified. However, you have to see it as a kind of "fire insurance" for your files. Even if the total monetary cost, computing time, and materials to store the files yourself is probably a lot smaller than putting your files in Glacier, what you get from Glacier is that the files there are SAFE. Eleven nines safe. No worries about burning houses, earthquakes, storms, flooded data centres: just safe. Getting that degree of safety would be hard to do on your own, and would probably push your costs into a similar region to what Amazon charges.
Amazon strongly implies that your data is stored triple redundantly. You'll have to buy more than one disk to get triple redundancy. Also, you'll need to "scrub" the data regularly to detect bit rot (100 bytes per TB are expected to go bad every year). And probably store the files with redundant coding to protect against this bit rot.
Plug: I am working on a startup (submitted to this YC round!) to solve this problem using user^Wcustomer-owned hardware for precisely the sort of reasons you describe. I am looking for co-founders. If anybody wants to talk, email is in my profile.
One external online provider really only counts as "one copy", ever. This is primarily because you cannot audit the ongoing storage architecture and processes of any given provider. You're looking for SPOFs, not how many disks may hold data replicas. One software error (or site/account hack) can wipe out all of your data. Or an entirely out-of-band error occurs: the provider goes belly-up.
Cloud storage is awesome in many ways. Yet it doesn't replace your backup strategy, it merely complements it.
'scrubbing' is just a fancy way of saying that the files are read from media, checksums recomputed, and compared against stored checksums. If checksums are different and data was stored redundantly, then recovery is carried out, and correct data is written back to media.
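A minimal sketch of that loop in Python, assuming you keep a JSON manifest of SHA-256 checksums alongside the backup (the manifest format here is made up for illustration):

    import hashlib, json, os

    def sha256_of(path, bufsize=1 << 20):
        # Stream the file back off the media, recomputing its checksum.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def scrub(root, manifest_path):
        # Compare every file against its previously stored checksum.
        with open(manifest_path) as f:
            manifest = json.load(f)  # {relative_path: sha256_hex}
        for rel, expected in manifest.items():
            if sha256_of(os.path.join(root, rel)) != expected:
                print("CORRUPT:", rel)  # repair from a redundant copy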
I'm too lazy and don't really do it with my multiple DVD, CD, and HDD backup directories. But ideally, I should be doing it. My startup will make this sort of thing easy and automatic.
Search for zfec. Allows you to split a file into N chunks, only M of which are required to reconstitute the original. Protects against N-M independent errors/corruptions.
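As a toy illustration of the idea (this is not zfec's actual API; zfec uses Reed-Solomon coding to support arbitrary N and M), here's the simplest case, N = M + 1, done with a single XOR parity chunk:

    from functools import reduce

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data, m):
        # Split data into m equal chunks plus one parity chunk (n = m + 1).
        # Any m of the n chunks suffice to rebuild the original.
        data += b"\x00" * ((-len(data)) % m)  # pad to a multiple of m
        size = len(data) // m
        chunks = [data[i * size:(i + 1) * size] for i in range(m)]
        return chunks + [reduce(xor_bytes, chunks)]

    def rebuild_missing(surviving):
        # XOR of the n - 1 surviving chunks reproduces the lost one,
        # whether it was a data chunk or the parity chunk.
        return reduce(xor_bytes, surviving)

    chunks = encode(b"hello world!", 3)
    assert rebuild_missing(chunks[:2] + chunks[3:]) == chunks[2]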
Sorry, I think I should retract that statement which I seem to have recalled mistakenly. The error rate seems to be quite a bit lower than that, so I will post an article here after I research it thoroughly.
Sorry, my comment above wasn't well thought out. Amazon's durability guarantee is on a per-object basis, not on a per-byte basis. I will post an article here after I research it thoroughly.
It's not about losing the drive: hard drives break all the time. Externally cased ones especially, as they are subject to many more shocks on average. Drives that haven't been spun up in a long time are even more likely to fail upon spin-up.
This is a good point. I was thinking that a drive rarely turned on would last longer than the usual 2-3 years...but it's a bit of a worry if you don't know if it'll turn on at all after 5 years.
I have an old Lacie Rugged 160 GB external drive that's 4.5 years old, and rarely used. It had a 1 year warranty. It's still working. I don't rely on it totally these days, though.
At $10 a month to store 1TB on Glacier, you'll surpass that cost in a year.
Do those $10 include the servant who does the backup and drives the hard disk off-site for you automatically, which would be the equivalent of an automated job with s3cmd/rsync (and whatever you have for Arq and Glacier)?
If not, you are missing something crucial in your equation.
I think it's assumed in the Glacier use-case at hand that the data is not time-critical. The continued existence of 1TB of old photos is important, not immediate access to it after my house burns down.
The backup time and storage for my use-case is mostly one-off. I don't have a need to rsync a folder containing photos from 2008 because I will not be making critical updates to that folder.
You aren't comparing apples to apples. You aren't even comparing apples to gorillas.
First, you would need more than one hard drive. You would need to buy at least two hard drives (and even two may expose you to more risk than you desire). Even if you do opt for just two, and do manual mirroring, it now takes two years to catch up with Glacier.
Second, $240 upfront is more costly in this environment than putting up $10 per month for 24 months. Glacier costs will very likely fall over two years, and that will be reflected in your statement at the end of the month. HDD costs will also fall over the two-year period, but you can't take advantage of those lower rates, since you've already sunk your capital into two outdated hard drives.
Third, Glacier is designed to have data constantly written to it, and only infrequently read from it. This is difficult to pull off with external hard drives that you have locked in a safe at the bank.
Glacier looks great, until you look at the retrieval costs!
If you use it as a backup strategy and you're going to need access to all of your data then you're going to have to extract it from glacier in a relatively short time period. This might cost you an arm and a leg (relative to the storage costs)!
If your use case matches the price structure, in that most of your data is write-and-forget and occasionally you'll need a small piece of it back, then it's a fantastic service, and as last-ditch backup options go it's a whole lot better than nothing at all!
(edit: not sure that calculator is entirely useful since it appears to squeeze the entire download into a single 4 hour period.)
In particular the assumption that "Retrieval time is how long it takes Amazon to fulfill your request. Retrieval time has nothing to do with your download speed." seems to be the root cause.
But this isn't just any online archival solution, it's Glacier, which is being positioned as a last-resort place for files.
Online storage services are useful for continual syncing, which using an external drive would be inconvenient for (if that drive was stored off site). Glacier is for files that you aren't likely to access again for months-or-years-on-end, unless there's an emergency or you're doing something bespoke, like making a family photo album and need to go back through all your files again.
So comparing filling up an external drive and keeping it off-site with using Glacier is not so much apples and oranges.
(I'm the guy behind Arq) It's also useful as a second-tier backup. If you use Time Machine or SuperDuper you may never need to retrieve your Arq backups, but it's nice to know they're there in case your house burns down or somebody steals your computer + hard drives.
BTW, I'm not dissing your product at all (and apologize for going off on a tangent). It could be that Arq is the glue that makes Glacier far superior to any kind of personal storage solution. I'm just bickering about the cost of Glacier alone...but otherwise, I think it's great to have a Glacier-accessible option.
I didn't take it as a criticism of Arq. I wish costs were lower as well. On the other hand, I see a big difference between a hard drive and bottomless storage that's remote, replicated, durable, and "always" accessible, so the cost doesn't seem so high to me. I've been of the opinion for a while now that the safest way to store bits is on spinning hard drives -- safer than DVD, tape, or turned-off hard drives.
I wasn't very interested in Glacier at first, but the more I hear about it, the more I'm convinced it's a perfect companion to time machine backups. Why pay $13/mo for CrashPlan or some other similar service when I can pay a fraction of that for Glacier?
Of course, this makes sense for me because my backup is rather tiny (~300 GB). The "unlimited" plans at backup services are worth it in certain scenarios I guess.
Note that Glacier+Arq also allows you something entirely different from CrashPlan and incremental backups: Data archives. I.e. you can store data there cheaply that you don't have to keep available on your hard drive. Very useful for storing data that's too large to have on your drives all the time, like the pictures and videos mentioned in other comments in this thread.
CrashPlan is cheap because they make it quite difficult for the average user to _actually_ back up all of their data. The interface is all sorts of confusing to non-experts, and I was embarrassed to realize I had fallen for a UI trick that made me think my drive was backing up, when in fact it was only my user directory. I'd made the same mistake on the 6 computers in the family, and had to do a bunch of phone tech support with in-laws.
I paid for a year, so I'm using it now as additional backup, with my primary backup on a local HD and Arq.
And it is 5-10x as expensive as Glacier itself...
Also, assuming Arq stores your data in a 'standard' way, I only have to worry about Amazon going out of service. Here there is also the risk of this small German company.
That is indeed true; there are currently no mechanisms in place for protecting users' data in case of insolvency. However, we're working on getting an escrow agreement in place that would ensure that Glacier vaults can be transferred to their "owners" in the case of insolvency. Additionally, there are some provisions in German GmbH law that give warning before insolvency occurs and allow for clean processing if it does.
Concerning the price, it is indeed a multiple of pure Glacier. However, it includes, apart from lots of convenience, all the auxiliary costs that you'd have to manage yourself with a "pure" Glacier account. As noted in another comment, you'd have to keep an eye on retrieval costs, early-deletion fees, and additional S3 storage for metadata.
If you're not interested in tinkering with IAM keys and retrieval rates, I still think Bit Chest is a good value proposition.
You should have put the disclaimer on your original, blatantly advertising comment.
I prefer to use the space I have on Amazon S3, under my own account with them, paid directly to them, rather than using some other obscure, third-party, closed-beta service.
I edited to add the disclaimer. Thanks for calling me out on that.
I understand that most people on HN will probably use Glacier directly, and that is totally OK. The service that I am offering is not "making it possible", rather it's "making it simple, so much so that your grandmother can use it". You do use Dropbox in the same way, don't you?
Also, the reason it's obscure is that it is less than a week old and has fewer than 40 users. I wouldn't expect any offering of that age/size to be well-known, so I'm quite satisfied with my obscurity.
"ct" is cents. For Europe, that will be Euro cents; all other areas will be billed in USD cents.
I'm disappointed they chose to charge an upgrade fee to former customers. Especially after a year of serious bugs and many users having to replace their backups completely. I'm not sure if I'd still recommend it, but I am happy to see the Glacier support - it looks like a natural fit.
Given the retrieval costs, it seems to be that Glacier is better for an "if all else fails" backup. A cheap-to-store, last resort when your other backups fail. Still, it's nice that Arq calculates the retrieval fee for you.
(I'm the guy behind Arq)
Yes. Unless you want to retrieve very slowly, in which case the "peak-hourly retrieval" fee is lower. Or retrieve just a few things (and stay under the free-retrieval maximum).
The site seems to have gone down at the time of this writing. I am looking for a backup alternative to Time Machine because it seems to have a memory leak. The fraction of available memory just shrinks, even if I quit everything. I've turned Time Machine backups off in System Preferences and rigged launchd to back up only 3 times a day, and now I'm no longer having to reboot every other day.
A shame, because TM seemed to work perfectly in Snow Leopard.
Like many, I welcome the inclusion of Amazon Glacier.
I don't think that alone merited a major version bump[1], but understand that the developer needs to feed his family, so will gladly pay for the upgrade.
[1] The ability to create the backup bucket, and not have it auto-create, for example, would be something that I would consider a major version bump.
Are the metadata stored in S3 or Glacier? Can you choose to recover only particular files? Then you can use Glacier not only as a last resort backup, but also as a defense against bit-rot if say one backup fails and the other is corrupt.
(I'm the guy behind Arq) The backup record data are stored in S3. Otherwise it would take 4 hours just to look something up, which would be unusable. You can choose to recover individual files, just as before.
As a longtime Arq user, I'm bummed there is no way to convert my existing Arq S3 backups to Glacier without uploading everything again. I know this is a limitation of Amazon's Glacier API, so maybe it will be possible in the future..
"In the coming months, Amazon Simple Storage Service (Amazon S3) plans to introduce an option that will allow you to seamlessly move data between Amazon S3 and Amazon Glacier using data lifecycle policies."
In addition to the $25 retrieval fee there will also be a $120 bandwidth fee, so we're looking at $145. And it will take 12 days to complete, at this speed.
Bumping up the speed by a factor of 10 will bump up the retrieval fee by the same factor, so $250 + $120 = $370 total restore cost in a little over a day's time.
Hm. I wonder if you can bill the insurance company for this in case of fire etc.
Alternatively, it would be nice if Amazon allowed several accounts to pool together their retrieval allowance - it's not likely that all of my friends will have their house burn down at the same time.
It's actually a bit more complicated than that. You get 5% of your data back for free every month, but it's prorated daily: (5% of your data) / (number of days in the month) = free data per day. If you store 1TB you can retrieve 50GB/month for free, which is ~1.66GB/day.
If you go over that, it's more expensive: as I understand it, the fee is based on your peak hourly retrieval rate for the month, less a prorated free allowance, multiplied out over the hours in the month.
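To make that concrete, here's a rough model of the whole restore bill in Python. The formula is my reading of the 2012 pricing FAQ (the proration details are assumptions, so treat the output as an estimate); the bandwidth fee from upthread is added on top:

    def restore_cost(stored_gb, retrieved_gb, retrieval_hours,
                     days_in_month=30, retrieval_rate=0.01, bandwidth_rate=0.12):
        # Free allowance: 5% of stored data per month, prorated daily
        # (and, reportedly, prorated again over the 4-hour retrieval window).
        free_hourly_gb = 0.05 * stored_gb / days_in_month / 4

        # The retrieval fee is driven by your *peak* hourly retrieval rate,
        # less the free allowance, times the hours in the month -- which is
        # why retrieving 10x faster costs roughly 10x more.
        peak_hourly_gb = retrieved_gb / retrieval_hours
        retrieval_fee = (max(0.0, peak_hourly_gb - free_hourly_gb)
                         * retrieval_rate * days_in_month * 24)

        # Outbound bandwidth is billed separately, on every byte.
        return retrieval_fee + retrieved_gb * bandwidth_rate

    # 1TB archive, restored in full over ~12 days (288 hours):
    print(round(restore_cost(1000, 1000, 288), 2))  # ~$142, near the ~$145 above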
I wouldn't trust Amazon Glacier. I'm guessing that they have a secret Glacier datacenter at the Arctic Pole. What happens when the glaciers melt? You are done for.