Lack of Turing-completeness can be a feature. Take PDF vs PostScript. The latter is Turing-complete and therefore you cannot jump to an arbitrary page or even know how many pages the document has without running the entire thing first.
By limiting expressiveness you also gain static analysis and predictability. It's not about limiting the potential of computers, it's about designing systems that strike the right balance between the power given to the payload and the guarantees offered to the container/receiver.
For example, it is only because JSON is flat data and not executable that web pages can reasonably call JSON APIs from third parties. There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.
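A quick illustration of that property (a minimal Python sketch; the payload string is made up): parsing JSON only ever builds inert data structures, so even a hostile-looking payload comes out as plain strings and numbers.

```python
import json

# The payload looks scary, but json.loads only constructs data
# (dicts, lists, strings, numbers); nothing in it is executed.
payload = '{"cmd": "rm -rf /", "retries": 3}'
data = json.loads(payload)

assert data["cmd"] == "rm -rf /"   # just a string
assert data["retries"] == 3
```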
If you have a nice data format like s-exprs, it's a fairly simple matter to just aggressively reject any code/data that can't be proven harmless. For example, if you're loading saved game data, just verify that the table contains only tables with primitive data; if there's anything else, throw an error. Then you can safely execute it in a turing-complete environment and be sure it won't cause problems.
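That check is only a few lines in most languages. Here is a sketch in Python (the function name and depth cap are my own choices), assuming the save file has already been parsed into plain dicts and lists:

```python
def check_plain_data(value, max_depth=32):
    """Accept only nested dicts/lists of primitives (the saved-game
    case above); raise ValueError on anything else."""
    if max_depth < 0:
        raise ValueError("nesting too deep")
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, list):
        for item in value:
            check_plain_data(item, max_depth - 1)
        return value
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise ValueError("non-string key")
            check_plain_data(item, max_depth - 1)
        return value
    raise ValueError("disallowed type: " + type(value).__name__)
```

Anything that passes this gate is pure data; only after the gate do you hand it to the Turing-complete environment.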
Speaking for myself, in my ideal world this sort of schema-checking and executing is ubiquitous and easy. Obviously that's not the world today. While there are tools for checking JSON schemata, there doesn't seem to be a standard format. I wonder how hard it would be to implement a Lua schema-checker.
EDN ("Extensible Data Notation") is a relatively new data format designed by Rich Hickey that has versioning and backward-compatibility baked in from the start. It has an extensible type system that lets you define custom types on top of its built-in primitives, and there's no schema.
To define a type, you simply use a custom prefix/tag inline:
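For instance (`#inst` is one of EDN's built-in tagged elements; `#myapp/Person` is a hypothetical application-defined tag):

```clojure
;; built-in tagged element: an RFC-3339 instant
#inst "1985-04-12T23:20:50.52Z"

;; hypothetical user-defined tag: a reader that knows myapp/Person can
;; build a Person value; one that doesn't can still parse the map and
;; carry it around as generic tagged data
#myapp/Person {:first "Fred" :last "Mertz"}
```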
While you can register custom handlers for specific tags, properly implemented readers can read unknown types without requiring custom extensions.
The motivating use case behind EDN was enabling the exchange of native data structures between Clojure and ClojureScript, but it's not Clojure specific -- implementations are starting to pop up in a growing number of languages (https://github.com/edn-format/edn/wiki/Implementations).
Here's the InfoQ video and a few threads from when it was announced:
I've looked at EDN a bit, even started a sad little C# parser. I don't see what it has to do with my previous comment, which is all about how schemas are potentially useful. I'm trying to say that after you check the schema, you don't just read the data, you execute it, and that has the effect of applying the configuration or just constructing the object.
>There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.
Of course there's a "better way": running the code in a sandbox. You could do so using js.js[1], for example. (Of course, replacing a JSON API with sandboxed JS code is likely to be a bad idea. But it is possible.)
You're right inasmuch as I shouldn't have implied that unsandboxed interpretation is the only option.
But my larger point still stands; the fundamental tradeoff is still "power of the payload" vs "guarantees to the container." Even in the case of sandboxed execution, the container loses two important guarantees compared with non-executable data formats like JSON:
1. I can know a priori roughly how much CPU I will spend evaluating this payload.
2. I can know that the payload halts.
This is why, for example, the D language in DTrace is intentionally not Turing-complete.
I agree 100% with you, but #1 isn't completely true. The counterexample is the ZIP bomb (http://en.wikipedia.org/wiki/Zip_bomb) Whenever you unzip anything you got from outside, you should limit the time spent and the amount of memory written.
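That kind of limiting can be sketched with Python's zlib (the function name and the 10 MB cap are my own choices; a real ZIP reader would apply the same cap to each archive member):

```python
import zlib

def safe_decompress(data, max_output=10 * 1024 * 1024):
    """Decompress untrusted zlib data, refusing to produce more than
    max_output bytes -- a guard against decompression bombs."""
    d = zlib.decompressobj()
    out = d.decompress(data, max_output)
    # If input is left over, or the stream didn't finish, the output
    # was truncated by the cap: treat it as hostile.
    if d.unconsumed_tail or not d.eof:
        raise ValueError("decompressed output exceeds limit")
    return out
```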
1. imposing CPU limits incurs inherent CPU overhead and adds code complexity.
2. if those limits are hit, you can't tell whether the code just ran too long or whether it was in an infinite loop.
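Point 2 is easy to see directly. With a wall-clock limit (a Unix-only Python sketch using signal.alarm; the names are mine), a timeout looks identical whether the payload is slow or will never halt:

```python
import signal

def run_with_limit(fn, seconds):
    """Run fn() under a wall-clock limit. On timeout we cannot
    distinguish "still computing" from "infinite loop" -- both
    surface as the same TimeoutError."""
    def on_alarm(signum, frame):
        raise TimeoutError("limit hit: slow, or non-terminating?")
    old = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)
    try:
        return fn()
    finally:
        signal.alarm(0)                    # cancel any pending alarm
        signal.signal(signal.SIGALRM, old)

def spins_forever():
    while True:
        pass
```

`run_with_limit(spins_forever, 1)` raises TimeoutError, exactly as a merely slow but terminating computation would.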
So now if we fully evaluate the options, the choice is between:
1. A purely data language like JSON: simple to implement, fast to parse, decoder can skip over parts it doesn't want, etc.
2. A Turing-complete data format: you have to implement sandboxing and CPU limits (both far trickier security attack surfaces), you have to configure those limits, and when they're exceeded the user doesn't know whether the code was in an infinite loop or just needed more time -- so they may have to re-configure the limits and try again.
Sure, sometimes all the work involved in (2) is worth it, that's why we have JavaScript in web browsers after all. But a Turing-complete version of JSON would never have taken off like JSON did for APIs, because it would be far more difficult and perilous to implement.
I have to agree here. General Turing-completeness was known from the beginning to imply undecidable questions -- about its structure, running time, memory, and so on. I don't think that has a place in the 'data'.
Abstractions exist for a reason -- this is analogous to source/channel coding separation or internet layers. They don't have to be that way, but are there for a reason.
Someone could change my opinion, though. Show me a data format that proves certain things about its behavior and that would be a nice counterexample.
Join me on my crusade to eliminate the use of 'former' and 'latter' in any writing unless the goal is obfuscation. It's error-prone, almost always requires rereading, and is never the clearest choice.
Here's my attempt at a clearer version:
"Take Postscript vs PDF. Postscript is Turing-complete and therefore you cannot jump to an arbitrary page or even know how many pages the document has without running the entire thing first."
Pronouns are fine. Substituting 'the first' and 'the second' would be an improvement. It's specifically 'former' and 'latter' that should be deprecated. I'd be interested in seeing a study comparing readers' comprehension of the various phrasings. What cost in clarity would you be willing to pay?
Back on topic: the reason for PDF's existence is to be a non-Turing-complete subset of PostScript. Features like direct indexing to a page are why Linux has switched to PDF as the primary interchange format.