I'm reading the manifesto, and for some reason a story I once heard, possibly apocryphal, came floating up out of my memory: a person was using their PC when a coworker came up asking for a copy of a spreadsheet and handed them a floppy disk. The first person took the disk, inserted it into the PC, launched Lotus 1-2-3, loaded the file, and saved it to the floppy. The second person was incredulous that the first person didn't just copy the file directly from the hard drive to the floppy. The first person replied, "you can do that?"
I have a family member whose standard workflow involved doing the same with Word on Windows. Save As was the only way he knew how to copy things.
He's a baby boomer. I'd assume this is only more common now, with people's main interactions with computing being in an app-focused, hidden-filesystem world.
Yes, but this is not as bad as opening the Word doc, selecting everything (with the mouse), copying it, opening a new empty doc, pasting it in there, and saving. That's how some people copy Word documents. Word documents are the easy case; some other file formats are harder, like Excel sheets with multiple tabs.
Now you may be tempted to think, "people do that?!". Of course they do. If you can imagine it, someone is doing it.
The manifesto's idea that files should have a unique address which any machine can access reminds me of Brian Hauer's rant, which argues that each application should have a single instance with a unique address (for a given user) that any machine can access (http://tiamat.tsotech.com/pao). Put the two together and a person's entire digital life would seamlessly follow them between machines.
I like the proposal of making caching a central design element to work around today's bandwidth limitations. I work with large-ish (a few TB) scientific datasets, and it isn't pleasant to have to choose between (a) storing everything on network storage and suffering slow IO or (b) storing everything locally on every workstation and suffering the need to synchronize data.
> it isn't pleasant to have to choose between (a) storing everything on network storage and suffering slow IO or (b) storing everything locally on every workstation and suffering the need to synchronize data.
That's a solved problem: depending on the workload, it could involve a compute farm with a clustered or distributed file system, a copy data management solution, or CacheFS, for example. There are also solutions for shared storage between containers across multiple nodes.
> That's a solved problem ... compute farm with a clustered or distributed file system, a copy data management solution, or cachefs, for example.
Thank you for suggesting some potential solutions to my data management problems. From googling them, I get the impression that you are thinking primarily of enterprise-scale data management (i.e., organizations with server farms and an IT department), whereas I'm primarily thinking of organizations with < 30 employees (who mostly use desktop software) and a single file server. My particular situation is an academic lab. However, I think these solutions can still work with a little adaptation:
Copy data management seems to be the use of block-level data deduplication or virtual disks on a server in order to decrease the disk utilization per VM or per container. I'm not completely sure; I found mostly marketing documents, and there's no Wikipedia entry. This, as well as clustered/distributed file systems, would apply if we turned the file server into a VM server and had each employee work on a VM via remote desktop. In principle, this could work, and it would let temporary employees (e.g., summer students and visiting scholars) get started quickly with a standard OS environment. I will experiment with this when our last batch of desktop PCs hits end of life.
CacheFS looks useful if we start using NFS to connect to our file server instead of Samba. It looks like Windows 10 (Enterprise) supports NFS caching too (https://technet.microsoft.com/en-us/library/cc976862.aspx). I will try this.
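For reference, the Linux-client side of NFS caching runs through FS-Cache and the cachefilesd daemon; a minimal sketch (the server name and export path are placeholders for whatever the file server actually exports) looks like:

```shell
# Install and start the daemon that provides the on-disk cache
# backing FS-Cache (package name on Debian/Ubuntu-style systems).
sudo apt-get install cachefilesd
sudo systemctl enable --now cachefilesd

# Mount the NFS export with the "fsc" option so file reads are
# cached on the workstation's local disk between accesses.
sudo mount -t nfs -o fsc fileserver:/export/lab /mnt/lab
```

Repeated reads of the same large datasets then come from the local cache rather than the network, which is exactly the middle ground between options (a) and (b) above.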
Definitely a fan of this project, but I'm intensely curious as to why they separated from Camlistore, which seems like a very similar project and is also headed by a key member of the Go core team (Brad Fitzpatrick). Anybody from either of those two projects care to comment?
Motivation: There are hundreds of initiatives trying to solve similar problems, and they could be solved relatively quickly if engineers deigned to work together on a solution instead of splintering off into hundreds of fractured groups.
> The main difference I see is that Camlistore can model POSIX filesystems for backup and FUSE, but that's not its preferred view of the world.
This makes me want to throw things. I'm actually mentally discounting both projects now on the charge that core authors seem to care more about bickering over technical details than implementing working solutions to these society-breaking problems.
Andrew Gerrand worked on both and apparently didn't think Camlistore was the right basis for what they wanted in Upspin. But I'm sure you, who I'm not sure has used either project, know better than Andrew and Brad and Rob.
I am claiming I do, yes, and would happily make my case to any of them for why they should do the hard work of agreeing on minor technical details and merge the two projects. It is the easiest instinct for engineers to "split off and code their own version" over technical disagreements, and why we have a dizzying array of incompatible, half-completed decentralization projects while Facebook and Twitter continue to eat society.
Thank you again for the info/backstory, though. I am just a naysayer who has sat through 1000 pitches of Fitzpatrick's basic thesis back in 2010 and seen excruciatingly minimal progress in the space of "actually making these things work for normal people".
I'm happy to discuss this further -- my life-passion-project is to see decentralization through -- but fear I've overstepped my bounds in this thread and am taking away focus from the project at hand, which I am a supporter of.
I keep a ranking of decentralization projects in terms of how likely they are to succeed and catch on. Camlistore and Upspin have been near the top of my list for years now (Camlistore was the one that originally inspired me to quit my job at Twitch and do decentralization advocacy full-time). I am now slightly less excited about both projects, although they still have incredible potential and I would be overjoyed if either of them met with even minor success.
At this point, I get the sense that Upspin/Camlistore don't really _want_ to succeed in terms of catching mass-market success and disrupting the innovation-stifling tech giants. It seems like they're more interested in scratching their personal itch and being content with that. Totally fine, but I'm going to be slightly less excited about releases from both of these projects in the future unless I get indications that the core team members are willing to escape the same trap that plagues all standardization schemes (https://xkcd.com/927/).
Assuming they got similar adoption, I'm wondering why I should use Upspin rather than Keybase? It seems like Keybase's users and groups are more sophisticated, and its private git support is immediately useful.
I am a big fan of both Upspin and Keybase. In my view they are quite different.
1) Polish vs Openness:
- Keybase is a polished product and a closed identity platform;
- Upspin is open-source plumbing and an (almost) open platform.
2) Focus
- Keybase is a crypto identity platform which happens to have a file storage app;
- Upspin is a file storage platform which happens to have a crypto identity feature.
Note: I call Upspin "almost open" because it does not support running your own key server in a private namespace. All users must use the same public key server. In exchange for a slightly less open platform, Upspin gets a strong guarantee of a single global namespace, which is a really great feature for end users. I think it's great that the project is clear about its priorities and the tradeoffs it's willing to make, and communicates them upfront.
Of course you can trivially run your own key server and your own Upspin universe. But then, of course, you can't talk with other people.
The problem with Upspin is that its keyserver is a single point of control, not that there's a single namespace. You could trivially have a single global namespace with many delegated keyservers using DNS. You know, like e-mail. But the authors unfortunately don't want that.
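To make the email analogy concrete: just as a mail client resolves an MX record for the domain part of an address, a keyserver lookup could be delegated per domain via DNS. A minimal sketch in Go, assuming a hypothetical `_upspin-key._tcp` SRV naming convention (this is not part of Upspin today):

```go
package main

import (
	"fmt"
	"strings"
)

// keyserverSRVName builds the DNS SRV name a client would query to
// discover the keyserver responsible for a user's domain, mirroring
// how MX records delegate mail delivery per domain. The
// "_upspin-key._tcp" label is a made-up convention for illustration.
func keyserverSRVName(user string) (string, error) {
	at := strings.LastIndex(user, "@")
	if at < 0 {
		return "", fmt.Errorf("no domain in user %q", user)
	}
	domain := user[at+1:]
	return "_upspin-key._tcp." + domain, nil
}

func main() {
	name, err := keyserverSRVName("ann@example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
	// A real client would then call
	// net.LookupSRV("upspin-key", "tcp", domain)
	// and fall back to the single public keyserver when the
	// domain publishes no record, preserving the global namespace.
}
```

The namespace stays global because the user name itself carries the domain; only the authority answering for that domain is delegated, exactly as with SMTP.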
The article was written in 2014. If, say, a letter from Mark Twain got published for the first time today, we'd put the year it was written in the title.
Count me among the confused. Works are usually referenced by their publication year and, on Hacker News, I always assume something with a (YEAR) in the title was published in that year.
That seems strange, and the example you give is less applicable than one might first think.
A letter is private correspondence; the date it is written is essentially the date it is published. A manifesto, on the other hand, could be considered a public document, and so its date of publishing may not necessarily be the date it was written.
I get that it was written a while ago and only made available for public consumption recently. However, appending a date to the article name signals to most users that the article is older and may have been previously posted. That's the confusing part, and I think it masks what is essentially a new document in the eyes of the public.