Lack of Turing-completeness can be a feature. Take PDF vs PostScript. The latter is Turing-complete and therefore you cannot jump to an arbitrary page or even know how many pages the document has without running the entire thing first.
By limiting expressiveness you also gain static analysis and predictability. It's not about limiting the potential of computers, it's about designing systems that strike the right balance between the power given to the payload and the guarantees offered to the container/receiver.
For example, it is only because JSON is flat data and not executable that web pages can reasonably call JSON APIs from third parties. There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.
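A quick illustration of that property (a minimal Python sketch; the payload string is made up): parsing JSON only ever builds inert data structures, so even a hostile-looking payload comes out as plain strings and numbers.

```python
import json

# The payload looks scary, but json.loads only constructs data
# (dicts, lists, strings, numbers); nothing in it is executed.
payload = '{"cmd": "rm -rf /", "retries": 3}'
data = json.loads(payload)

assert data["cmd"] == "rm -rf /"   # just a string
assert data["retries"] == 3
```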
If you have a nice data format like s-exprs, it's a fairly simple matter to just aggressively reject any code/data that can't be proven harmless. For example, if you're loading saved game data, just verify that the table contains only tables with primitive data; if there's anything else, throw an error. Then you can safely execute it in a turing-complete environment and be sure it won't cause problems.
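That check is only a few lines in most languages. Here is a sketch in Python (the function name and depth cap are my own choices), assuming the save file has already been parsed into plain dicts and lists:

```python
def check_plain_data(value, max_depth=32):
    """Accept only nested dicts/lists of primitives (the saved-game
    case above); raise ValueError on anything else."""
    if max_depth < 0:
        raise ValueError("nesting too deep")
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, list):
        for item in value:
            check_plain_data(item, max_depth - 1)
        return value
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise ValueError("non-string key")
            check_plain_data(item, max_depth - 1)
        return value
    raise ValueError("disallowed type: " + type(value).__name__)
```

Anything that passes this gate is pure data; only after the gate do you hand it to the Turing-complete environment.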
Speaking for myself, in my ideal world this sort of schema-checking and executing is ubiquitous and easy. Obviously that's not the world today. While there are tools for checking JSON schemata, there doesn't seem to be a standard format. I wonder how hard it would be to implement a Lua schema-checker.
EDN ("Extensible Data Notation") is a relatively new data format designed by Rich Hickey that has versioning and backward-compatibility baked in from the start. It has an extensible type system that lets you define custom types on top of its built-in primitives, and there's no schema.
To define a type, you simply use a custom prefix/tag inline:
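For instance (`#inst` is one of EDN's built-in tagged elements; `#myapp/Person` is a hypothetical application-defined tag):

```clojure
;; built-in tagged element: an RFC-3339 instant
#inst "1985-04-12T23:20:50.52Z"

;; hypothetical user-defined tag: a reader that knows myapp/Person can
;; build a Person value; one that doesn't can still parse the map and
;; carry it around as generic tagged data
#myapp/Person {:first "Fred" :last "Mertz"}
```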
While you can register custom handlers for specific tags, properly implemented readers can read unknown types without requiring custom extensions.
The motivating use case behind EDN was enabling the exchange of native data structures between Clojure and ClojureScript, but it's not Clojure specific -- implementations are starting to pop up in a growing number of languages (https://github.com/edn-format/edn/wiki/Implementations).
Here's the InfoQ video and a few threads from when it was announced:
I've looked at EDN a bit, even started a sad little C# parser. I don't see what it has to do with my previous comment, which is all about how schemas are potentially useful. I'm trying to say that after you check the schema, you don't just read the data, you execute it, and that has the effect of applying the configuration or just constructing the object.
>There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.
Of course there's a "better way": running the code in a sandbox. You could do so using js.js[1], for example. (Of course, replacing a JSON API with sandboxed JS code is likely to be a bad idea. But it is possible.)
You're right inasmuch as I shouldn't have implied that unsandboxed interpretation is the only option.
But my larger point still stands; the fundamental tradeoff is still "power of the payload" vs "guarantees to the container." Even in the case of sandboxed execution, the container loses two important guarantees compared with non-executable data formats like JSON:
1. I can know a priori roughly how much CPU I will spend evaluating this payload.
2. I can know that the payload halts.
This is why, for example, the D language in DTrace is intentionally not Turing-complete.
I agree 100% with you, but #1 isn't completely true. The counterexample is the ZIP bomb (http://en.wikipedia.org/wiki/Zip_bomb) Whenever you unzip anything you got from outside, you should limit the time spent and the amount of memory written.
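That kind of limiting can be sketched with Python's zlib (the function name and the 10 MB cap are my own choices; a real ZIP reader would apply the same cap to each archive member):

```python
import zlib

def safe_decompress(data, max_output=10 * 1024 * 1024):
    """Decompress untrusted zlib data, refusing to produce more than
    max_output bytes -- a guard against decompression bombs."""
    d = zlib.decompressobj()
    out = d.decompress(data, max_output)
    # If input is left over, or the stream didn't finish, the output
    # was truncated by the cap: treat it as hostile.
    if d.unconsumed_tail or not d.eof:
        raise ValueError("decompressed output exceeds limit")
    return out
```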
1. imposing CPU limits incurs inherent CPU overhead and adds code complexity.
2. if those limits are hit, you can't tell whether the code just ran too long or whether it was in an infinite loop.
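Point 2 is easy to see directly. With a wall-clock limit (a Unix-only Python sketch using signal.alarm; the names are mine), a timeout looks identical whether the payload is slow or will never halt:

```python
import signal

def run_with_limit(fn, seconds):
    """Run fn() under a wall-clock limit. On timeout we cannot
    distinguish "still computing" from "infinite loop" -- both
    surface as the same TimeoutError."""
    def on_alarm(signum, frame):
        raise TimeoutError("limit hit: slow, or non-terminating?")
    old = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)
    try:
        return fn()
    finally:
        signal.alarm(0)                    # cancel any pending alarm
        signal.signal(signal.SIGALRM, old)

def spins_forever():
    while True:
        pass
```

`run_with_limit(spins_forever, 1)` raises TimeoutError, exactly as a merely slow but terminating computation would.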
So now if we fully evaluate the options, the choice is between:
1. A purely data language like JSON: simple to implement, fast to parse, decoder can skip over parts it doesn't want, etc.
2. A Turing-complete data format: you have to implement sandboxing and CPU limits (both far trickier security attack surfaces), you have to configure those limits, and when they're exceeded the user doesn't know whether the code was in an infinite loop or just needed more time -- so they may have to re-configure the limits and try again.
Sure, sometimes all the work involved in (2) is worth it, that's why we have JavaScript in web browsers after all. But a Turing-complete version of JSON would never have taken off like JSON did for APIs, because it would be far more difficult and perilous to implement.
I have to agree here. General Turing-completeness was known from the beginning to imply undecidable questions -- about its structure, running time, memory, and so on. I don't think that has a place in the 'data'.
Abstractions exist for a reason -- this is analogous to source/channel coding separation or internet layers. They don't have to be that way, but are there for a reason.
Someone could change my opinion, though. Show me a data format that proves certain things about its behavior and that would be a nice counterexample.
Join me on my crusade to eliminate the use of 'former' and 'latter' in any writing unless the goal is obfuscation. It's error-prone, almost always requires rereading, and is never the clearest choice.
Here's my attempt at a clearer version:
"Take Postscript vs PDF. Postscript is Turing-complete and therefore you cannot jump to an arbitrary page or even know how many pages the document has without running the entire thing first."
Pronouns are fine. Substituting 'the first' and 'the second' would be an improvement. It's specifically 'former' and 'latter' that should be deprecated. I'd be interested in seeing a study comparing readers' comprehension of the various phrasings. What cost in clarity would you be willing to pay?
Back on topic: the reason for PDF's existence is to be a non-Turing-complete subset of PostScript. Features like direct indexing to a page are why Linux has switched to PDF as the primary interchange format.