All those devs who skipped xpath/xquery/xslt from the bad old days of xml are go...

MilStdJunkie · on July 17, 2023

Xquery, for me, was and remains a core tool for dealing with XML specifications of surreal complexity that verge on madness. BaseX is the "Microsoft Access" xquery application, while eXist is sort of like a full framework, with package management, deployment, and that sort of thing. Other query languages might be more cutting edge, but they either 1) have a lot of stuff I don't need, or, more likely, 2) require a more permissive InfoSec setup than I am typically allowed. "Docker and any other form of virtualization are not permitted on ANY company network regardless of circumstances". Well, ok then.

Generally the next stop after xquery, for me, is text mining, either on R+Python or on Orange ML. If a miner doesn't cut it, then LLM shenanigans.

Also, xpath? It's pretty great. XQuery? Does the job. XSLT? Ok, so NOW that's the feeling of a panic attack. I've been doing XSLT for literal decades, and I still don't know what I'm doing when wrenching on a giant pile of FOP generating funhouse madness. When I am tagged into a data transformation job, I always stress that xquery is the right tool, rather than a confounding nested directory of XSLT using different parsers and different passes like a figure-8 interstate off-ramp. For FOP, though, there's really just one game in town. Although, having said that, me and a bunch of others are doing our damnedest to show that what you're trying to do with XSLT/FO can be done way way way easier with CSS and Paged Media (either via Paged.js or Vivliostyle or any of the other zillion PMM implementations). The downside is you have to wrench some CSS yourself, but honestly, that's probably going to be easier than wrenching on DocBook-XSL or the DITA-OT or one of the MIL-STD XSL packages.

mcswell · on July 17, 2023

Glad to hear that someone else thinks of XSLT the way I do. I had to write some to deal with converting DocBook XML to LaTeX (building on dblatex, but adding some specializations), and besides being verbose (as another commenter here says), I found it virtually impossible to debug. I'd much rather write in Prolog.

cryptonector · on July 17, 2023

The problem with XSLT is how incredibly verbose it is, but maybe that's just the problem with XML. jq is to JSON as XSLT/XPath is to XML, which shows you can have pithiness.

masklinn · on July 17, 2023

> maybe that's just the problem with XML

It’s the problem of having built XSLT out of XML, which was completely unnecessary.

And then with XSLT being so verbose, having cheaped out and made the current node (“.”) so implicit. And the confusion from the dual use of templates as both match/patch and function constructs (it would have worked fine as a shortcut, but it makes grokking how things fit much harder than necessary).

I wouldn’t say jq is really comparable to xslt though, xslt has much more transformative flexibility. It’s closer to xquery.

cryptonector · on July 18, 2023

> I wouldn’t say jq is really comparable to xslt though, xslt has much more transformative flexibility. It’s closer to xquery.

jq isn't structured around data schema transformations, it's true, so writing complex transforms is not as easy as with XSLT, but it is very much possible. Essentially it's a reduction over [possibly paths to] elements of interest in `.` updating the reduction state with the new schema. You can organize your code into functions that do much what XSLs do.
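To make the "reduction over paths to elements of interest" idea concrete without assuming jq is installed, here is a rough Python sketch of the same pattern. The `doc` data, the `paths`, `is_child`, and `step` helpers, and the output shape (child name mapped to parent name) are all invented for illustration; `paths` only loosely mirrors jq's `paths`/`getpath` built-ins.

```python
from functools import reduce

# Hypothetical input: the parents/children shape is invented for illustration.
doc = {
    "parents": [
        {"name": "Bob", "children": [{"name": "Alice", "age": 12}]},
        {"name": "Carol", "children": []},
    ]
}

def paths(node, prefix=()):
    """Yield (path, value) for every node, loosely like jq's paths + getpath."""
    yield prefix, node
    if isinstance(node, dict):
        for key, value in node.items():
            yield from paths(value, prefix + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from paths(value, prefix + (index,))

def is_child(path):
    """A path such as ('parents', 0, 'children', 0) points at a child object."""
    return len(path) == 4 and path[0] == "parents" and path[2] == "children"

def step(state, item):
    """Fold one element of interest into the new schema: child name -> parent name."""
    path, value = item
    parent = doc["parents"][path[1]]
    return {**state, value["name"]: parent["name"]}

# Select the elements of interest, then reduce them into the output schema.
interesting = (item for item in paths(doc) if is_child(item[0]))
child_to_parent = reduce(step, interesting, {})
print(child_to_parent)  # {'Alice': 'Bob'}
```

The selection (`is_child`) and the reshaping (`step`) stay separate, which is roughly the split between a match pattern and a template body in XSLT.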

jerf · on July 17, 2023

"XSLT? Ok, so NOW that's the feeling of a panic attack."

Right, so here's the secret decoder ring of XSLT: Underneath all the complexity, it isn't doing ANYTHING you can't do in your language of choice armed with an XPath library. And it is often incapable of doing even some rather simple things you can do in your language of choice armed with an XPath library.

XSLT is just a terrible programming language. That's all it is. All of the magic is in the XPath part; once you've selected the nodes you're working with, XSLT is a horrifyingly awful way of manipulating them into doing what you want them to do.

XSLT is the intersection of the worst parts of declarative programming with the worst parts of functional programming, wrapped up in one of the worst ways of serializing a programming language. What confuses some people even to this day is that they see "declarative programming" and "functional" and even "standardized serialization" and accidentally credit XSLT with the benefits of such approaches, and then if XSLT doesn't work they blame themselves for failing declarative functional programming in such a wonderful serialization format. They're wrong. It's XSLT failing them, whose origin is also people who thought if they just create something declarative and functional and serialized through XML they were guaranteed to be producing something good because those things are just so Platonically good on their own that they couldn't possibly produce something useless and broken, so it was not necessary to analyze the resulting abomination to see whether it actually fulfilled the goals, because it simply by definition fulfilled the goals by virtue of being declarative and functional and in the bestest serialization ever.

Perhaps there is a declarative, functional XSLT-inspired language that could be written that would be as good as the people bedazzled by the buzzwords think XSLT is. (Though there's no world where such a language is helped by serializing into XML; serializing a language for manipulating XML into XML is actually the worst choice possible because of the nested encoding you inevitably produce!) However, in the meantime, you don't really need to wait around for someone to produce it, because it turns out your favorite general purpose language equipped with an XPath library is already 90%+ of the way there.
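As a minimal sketch of "general purpose language plus XPath", assuming Python and the standard library's `xml.etree.ElementTree` (which implements only a subset of XPath, not the full language): the sample XML and the `children_to_csv` helper are hypothetical, made up for this illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
XML = """
<parents>
  <parent name="Bob">
    <child name="Alice" age="12" />
    <child name="Dan" age="9" />
  </parent>
  <parent name="Carol">
    <child name="Eve" age="15" />
  </parent>
</parents>
"""

def children_to_csv(xml_text):
    """Flatten parent/child elements into CSV rows."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["parent", "child", "age"])
    # The path expressions do the selection; ordinary code does the shaping.
    for parent in root.findall("./parent"):
        for child in parent.findall("./child[@age]"):
            writer.writerow([parent.get("name"), child.get("name"), child.get("age")])
    return out.getvalue()

print(children_to_csv(XML))
```

Everything after the `findall` calls is plain loops and function calls, so it can be stepped through in a normal debugger.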

MilStdJunkie · on July 18, 2023

That's sort of been my secret suspicion. Seriously, when a customer asks me to make some XSLT to turn a pile of XML into a bunch of spreadsheets/tables/whatevers, I try to sit them down and list out the many ways that are - from the standpoint of efficiency . . AND functionality . . AND "retain your sanity"-ness - superior, vastly, sometimes exponentially so. Xquery is one of those, but if the output is more complex than a read of delimited data, then your rec "Functional Language X plus Xpath" is the Way to Go.

Please don't tell anyone I said this, but the hardcore XML people can be a little culty. At untold meetings, I'd walk through all the functionality that they're getting - from a staggeringly gigantic XML system, priced in millions of dollars - and demonstrate, one bullet point after another, how the same functionality is available from a few lines of Asciidoc/ReStructuredText/<insert_your_tech_here>. Then, at the end of everything, I always have to contend with the last one: "But it's not XML". To which I shrug my shoulders: I can't solve existential problems. Why is this specific technology so important? My theory is that there's some very deep and possibly poisoned incentives at work here, which I don't really want to unpack because it would make me sound nasty, but the most innocent of these is the Sisyphean drive to impose labels on the chaos of business and on natural language. I think it might be a neurosis of the industry, but I've seen it outside of aero-def-landia too. Who knows? Maybe someday I too will rue the day I ever turned from the True Faith.

eyelidlessness · on July 17, 2023

While I share your disdain for XSLT, the way I look at the XSLT in a project I maintain is that it’s a straightforward (albeit labor intensive, whether manual or automated) refactor away from being directly ported to a more maintainable FP implementation… whenever I eventually have the bandwidth and buy in to do it.

It’s not so much that XSLT is a terrible language, it’s a terrible syntax. At the end of the day, it’s just a really awkward way to express some pretty basic functions. With a bit of care, it’s easy to imagine transforming arbitrary XSLT to any arbitrary language expressing the same semantics. This is especially true because XSLT itself is so limited in capability.

ketralnis · on July 17, 2023

XML had a bad rap and was certainly abused. But the wealth and quality of the tools for working with it is really unmatched even today.

sacado2 · on July 17, 2023

It's also the only widespread format that can deal with both rich text documents (à la HTML) and complex, structured data (what JSON is good for). It's golden when you need to add tons of complex annotations to a text document.

marcosdumay · on July 17, 2023

XML was a very good idea badly implemented to an absurd level.

A XML-lite that ditches all the noise and stupidity, but keeps the annotated text with user-defined sum-typed tags, and a type specification would be better than any standard we have today.

hollerith · on July 17, 2023

"XML-lite"

XML's creation (or definition) in the mid-to-late 1990s was motivated mainly by a desire for an SGML-lite.

marcosdumay · on July 17, 2023

And it was a great goal, as it made SGML borderline usable.

optymizer · on July 17, 2023

<rant>

JSON is a pain in the ass for organizing and parsing structured data, particularly arrays of things.

It's so much more verbose and disassociates an object's definition from its type. Let's say you have parents with children with names.

   <parents>
     <parent name="Bob">
       <child name="Alice" age="12" />
     </parent>
   </parents>

The root object is 'parents', so you know a bunch of 'parent' elements are going to follow. When you read the <parent> object, you also know it's a parent because the tag says so. You don't even need a <children> element because it's fine to have a list of <child> elements directly after <parent>.

Now here's what I think a typical JSON equivalent would be:

  {
    "parents": [
      {
        "name": "Bob",
        "children": [
          {
            "name": "Alice",
            "age": 12
          }
        ]
      }
    ]
  }

Ok, so whitespace aside, it's less verbose, but look at all the info that's missing.

What "type" is the root object? You'd say "parents", but how did you find that out? You have to know _a priori_ that a field called 'parents' would have to be there. Not a big deal on the root object because it's usually special, but how about a single parent?

Look at the 'parents' array. The only thing hinting at the fact that { "name": "Bob" } is a parent is the fact that it is part of an array, that's attached to the 'parents' field of the parent object. You have to do 'upwards' lookups to determine what this object is. The object itself doesn't have that information. Same thing with { "name": "Alice" }. How do you know that's a child object? You don't. You have to do an upwards lookup.
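A small Python sketch of that difference, using only the standard library. The `xml_kinds` and `json_kinds` helpers are hypothetical names made up for this illustration, and the inputs are the Bob/Alice example squeezed onto one line each.

```python
import json
import xml.etree.ElementTree as ET

XML = '<parents><parent name="Bob"><child name="Alice" age="12" /></parent></parents>'
JSON = '{"parents": [{"name": "Bob", "children": [{"name": "Alice", "age": 12}]}]}'

def xml_kinds(text):
    """Each element names itself: the tag travels with the node."""
    return [(el.tag, el.get("name")) for el in ET.fromstring(text).iter() if el.get("name")]

def json_kinds(node, parent_key=None):
    """A JSON object is anonymous, so its 'kind' must be handed down from the key above it."""
    found = []
    if isinstance(node, dict):
        if "name" in node:
            found.append((parent_key, node["name"]))
        for key, value in node.items():
            found.extend(json_kinds(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(json_kinds(item, parent_key))
    return found

print(xml_kinds(XML))                # [('parent', 'Bob'), ('child', 'Alice')]
print(json_kinds(json.loads(JSON)))  # [('parents', 'Bob'), ('children', 'Alice')]
```

Note that the JSON walker has to carry `parent_key` along as extra state, and even then it only recovers the container's name ('parents', 'children'), not a type for the object itself.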

Now you might say "just tag the elements with their type so you can keep track of what these objects are". Let's try that:

  {
    "type": "parents",
    "parents": [
      {
        "type": "parent",
        "name": "Bob",
        "children": [
          {
            "type": "child",
            "name": "Alice",
            "age": 12
          }
        ]
      }
    ]
  }

Sweet, now we're reaching data representation parity, but if you get an object like this from a third party service, how do you validate it without kicking in the logic to process each element? You'd have to have a 'dry run' version of your logic.

In fact, how do you formally describe the structure of these objects to another service so that the service could guarantee that it is only generating valid objects in the first place? XML Schema was a solid solution for that. JSON Schema had no support anywhere last time I looked at it. Where is it now? It looks like you could use it if you wanted to, but afaict most people generate fly-by-the-seat-of-your-pants JSON objects in code and no 'formal' validation is happening, other than the reply code from the service when the object is actually sent (if you think about it, that's just "testing in prod")
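For what a validation pass separate from the processing logic could look like, here is a hand-rolled Python sketch. To be clear about the assumptions: this is not JSON Schema. The `CHILD`/`PARENT`/`DOCUMENT` shape notation and the `errors` function are invented for illustration; a real project would more likely reach for an actual JSON Schema validator (for Python, the third-party `jsonschema` package is one).

```python
# Hypothetical shape description, invented for this sketch; it is not JSON Schema syntax.
CHILD = {"name": str, "age": int}
PARENT = {"name": str, "children": [CHILD]}
DOCUMENT = {"parents": [PARENT]}

def errors(value, shape, path="$"):
    """Return a list of human-readable problems; an empty list means the value fits."""
    if isinstance(shape, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        found = []
        for key, sub in shape.items():
            if key not in value:
                found.append(f"{path}.{key}: missing")
            else:
                found.extend(errors(value[key], sub, f"{path}.{key}"))
        return found
    if isinstance(shape, list):
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        found = []
        for i, item in enumerate(value):
            found.extend(errors(item, shape[0], f"{path}[{i}]"))
        return found
    # Leaf: shape is a Python type. bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) and shape is not bool:
        return [f"{path}: expected {shape.__name__}"]
    if not isinstance(value, shape):
        return [f"{path}: expected {shape.__name__}"]
    return []

good = {"parents": [{"name": "Bob", "children": [{"name": "Alice", "age": 12}]}]}
bad = {"parents": [{"name": "Bob", "children": [{"name": "Alice", "age": "12"}]}]}
print(errors(good, DOCUMENT))  # []
print(errors(bad, DOCUMENT))   # ['$.parents[0].children[0].age: expected int']
```

The check runs before any business logic touches the object, which is the "dry run" being asked for, just expressed as data rather than as a second copy of the processing code.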

I think version 1 of XSLT, XPath and so on were pretty simple solutions to working with structured data, but people went overboard with trying to shoehorn XML into solving problems best suited for imperative code, so you got XSLT 2.0 (want for loops? no.), XQuery and XPath 2.0 abominations, various weird xlink solutions, imperative code in tags: <script>function foo() { }</script> which introduced a second syntax, and so on.

I understand why the XML world of nonsense had to be stopped, but we threw the baby out with the bathwater.

Don't get me wrong, I like JSON, but I also feel like we collectively took a step back and opted for 'the javascript of structured data representation'.

Maybe the rise of thick, JavaScript heavy clients had a lot to do with it? XML was never easy to work with in JavaScript, which is a shame, seeing as it has the same roots as HTML. I blame the DOM API - it has always been tedious to use in any language that implemented it.

ActionScript had that nice built-in @ syntax for selecting nodes and first-class support for XML in the language (E4X?). How we killed that first-class language support only to turn around and rediscover it in half-baked form as transpiled JSX is beyond me.

</rant>

pwdisswordfishc · on July 17, 2023

Domenic Denicola (aka the man who ruined promises) probably will as well.

https://github.com/whatwg/dom/issues/67

cstrahan · on July 17, 2023

That made me chuckle.

For those not familiar with the promise design controversy:

http://brianmckenna.org/blog/category_theory_promisesaplus

https://github.com/promises-aplus/constructor-spec/issues/24

https://github.com/promises-aplus/promises-spec/issues/94

dgb23 · on July 17, 2023

I guess sometimes Worse is not Better?

kreetx · on July 17, 2023

How did he ruin it?

EDIT: Thanks, sibling!

infogulch · on July 17, 2023

IME their opinions are split between trauma and nostalgia.

smrtinsert · on July 17, 2023

Xslt and xml apis could do neat things. I didn't mind the era.

abrookewood · on July 17, 2023

Starting to see more use of JSON Schemas in my job and can't help but think ... "we had something for this previously".

cryptonector · on July 17, 2023

jq is the XSLT/XPath of JSON.

AbraKdabra · on July 17, 2023

I had that skip until two years ago when I had to use it to parse the OpenVAS API output. I wish I never had to put a stop to that skip, I hated every second of my life working with XPath.