Betty – English-like interface for the command line (github.com/pickhardt)
190 points by jrpt on May 4, 2014 | 81 comments


I wonder if the author of Betty has taken a look at one of the existing Inform (http://inform7.com) interpreter variants used for Interactive Fiction games (what we used to call "text adventures"). They have progressed a lot since the old text adventures of yore, and are now capable of parsing pretty complex English sentences (instead of just "take sword", you can write "examine sword, then take it and give it to the wizard", and the interpreter will even ask you about ambiguities, such as "which sword? the magic sword or the long sword?").

Then again, I'm unsure if the precision needed from command line interfaces can ever be improved with natural language. Natural language is imprecise, ambiguous, redundant, and confusing even to our human interlocutors. Why would we want to interface with computers in English, instead of using a synthetic and concise made-up language? I'm talking specifically about the command line, which is for power users anyway.

(A perfectly valid answer is "because it was fun to write Betty, and that's its own reward", of course).


No, I wasn't aware of that, but that is interesting.

Betty just grew out of my cheat sheet of commands. I was tired of repeatedly looking up things in my cheat sheet or Google, and decided to make this.

The current project has a limited number of commands, but I am hoping that by making it public, others will issue pull requests with things they'd want to use.

Its current state is command line only, which is for power users. But thinking ahead, on the roadmap I wrote that by the time it reaches v1.0, I'd like it to be extensible. This means it can be used from text-to-speech apps or whatever. Some futuristic use cases: I'm sitting on my bed, with my computer on my desk, and say "Betty: next song" and it just works. Or I'm running on my treadmill, talking aloud: I'd say "Betty: open the New York Times in Chrome" then "Betty: read me this article" and it just works. That'd be cool. I don't know specifically how to do everything I'd want to do, but those are the sorts of far future things that would be cool. In its current implementation, it's mainly for not having to keep looking up commands or options.


That would be cool indeed and I hope my comment didn't sound too negative. And you should definitely take a look at Inform (and similar), it's amazing what they can do. I think some interpreters are open source (edit: it seems it's not, I remembered incorrectly).


Ironically, with version 7, Inform itself switched to natural language syntax. The following is valid Inform 7 source code:

`The iron-barred gate is a door. It is north of the Drawbridge and south of the Entrance Hall. It is closed and openable. Before entering the castle, try entering the gate instead. Before going inside in the Drawbridge, try going north instead. Understand "door" as the gate.`

However, what makes a command like "examine sword, then take it and give it to the wizard" complex is not the parsing, but the world model. What if examining the sword reveals an engraving saying "Never hand this to a wizard"? What happens to giving if taking fails? What if the wizard himself is holding the sword? What if he attacks anyone holding a weapon? Etc.

Writing a natural language command line interface seems a lot more straightforward.


Unfortunately, as I understand it, while the "old" v6 toolset is open, the "in" tool that takes "regular" text and turns it into v6 "code" is gratis, but not open sourced:

https://groups.google.com/forum/#!topic/rec.arts.int-fiction...

Some more on how inform7 differs from more traditional interactive fiction systems:

http://www.onlamp.com/pub/a/onlamp/2006/06/08/inside-inform-...

As for this tool: I like it. I like the idea that it's simple, and that the author didn't let details like it being difficult to do "well" or "better" stand in his way. I've thought along the same lines many times, not so much "how did I do _?" (the solution often involves awk, sort, uniq et al.) -- I'm too familiar with the command line for that -- but rather: Jeez, you've got a smart phone. The one guaranteed interface that works is audio in/out. How hard could it be to deploy solid limited-domain voice control coupled with a dead simple state machine for doing stuff like "next song", "stop", "play", "accept call"?

I wonder if these projects (there must have been projects, right, ever since the first uptake of mp3s) get the first part working, then some moron suggests that the user will expect to be able to do "play song such-and-such by such-and-such" -- and then it's no longer (strictly) limited domain, and that doesn't work, so they scrap the whole thing.

[edit: Some more on "newstyle" inform vs traditional IF:] "A Comparison of TADS 3 and Inform 7" http://brasslantern.org/writers/iftheory/tads3andi7.html


Inform was my very first programming language. Now that I code in Ruby all day, I'd probably find something like TADS less frustration prone, but Inform is incredibly powerful for its domain.


Here is another modern interactive fiction system of which I'm the creator and maintainer. It is fully opensource (BSD license): http://code.google.com/p/aetheria/

It can also parse sentences like the one with the sword and the wizard shown above. It's really not very difficult to achieve those things, as imperative sentences have pretty uniform characteristics that make the problem of parsing imperatives much easier than general parsing. As said in another comment, the bulk of the complexity of these systems is in the world model, not in the parsing.
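
To illustrate why imperatives are comparatively easy to parse, here is a minimal sketch (not code from Aetheria or Inform). The verb list, the clause separators and the rule that "it" refers to the most recent direct object are all simplifying assumptions made for the example. Note that it only produces a list of actions; deciding what those actions do in the world model is the hard part mentioned above.

```python
import re

# Hypothetical vocabulary for illustration; a real IF system has a far larger one.
VERBS = {"examine", "take", "give", "drop"}
ARTICLES = {"the", "a", "an"}


def parse_imperatives(sentence):
    """Split a compound imperative into (verb, object, recipient) tuples.

    Clauses are separated by ',', 'then' or 'and'. The pronoun 'it' is
    resolved to the most recent direct object.
    """
    text = sentence.lower().strip(" .")
    clauses = re.split(r"\s*(?:,|\bthen\b|\band\b)\s*", text)
    actions = []
    last_object = None
    for clause in clauses:
        words = [w for w in clause.split() if w not in ARTICLES]
        if not words:
            continue  # empty piece between adjacent separators such as ", then"
        verb, rest = words[0], words[1:]
        if verb not in VERBS:
            raise ValueError(f"unknown verb: {verb!r}")
        recipient = None
        if "to" in rest:
            split_at = rest.index("to")
            recipient = " ".join(rest[split_at + 1:])
            rest = rest[:split_at]
        obj = " ".join(rest)
        if obj == "it":
            if last_object is None:
                raise ValueError("'it' has nothing to refer to")
            obj = last_object
        else:
            last_object = obj
        actions.append((verb, obj, recipient))
    return actions


print(parse_imperatives("examine sword, then take it and give it to the wizard"))
# [('examine', 'sword', None), ('take', 'sword', None), ('give', 'sword', 'wizard')]
```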

A drawback is that, although the system works for several languages including English, the documentation for game developers/IF writers is only in Spanish at the moment. Collaboration for translating it to English would be very welcome.



The very smart people at Wit.AI[1] are building a backend for this sort of thing. You could do worse than hook into that in the short term.

[1] https://wit.ai/


My favorite command-line parser was in the (now-defunct) Ubiquity [edit: not unity] project from Mozilla. It had internationalized parsing, extendable verbs and noun types.

http://mitcho.com/blog/projects/a-demonstration-of-ubiquity-...

edit2: another link https://wiki.mozilla.org/Labs/Ubiquity/Parser_2


Yes! I remember this!

http://vimeo.com/1561578


And "Parser 2" was much better than that! It could predict verbs from nouns and use multi-word verbs without hyphens.


The historical problem with things that parse almost English is that it's hard to remember where the line is drawn between stuff the system will and will not understand. Once upon a time, reading about this stuff, that seemed significant. On reflection, I'm not actually sure how to weight it, relative to the benefits.


What about for the blind?


Every natural language interface will eventually be required to generate a new program. For example, if you say your parser supports a query like "count number of words in file X", someone will want to issue queries like "count number of words starting with character a and at most length 3 in file X" and so on.

Natural language interfaces eventually fall on their faces not because parsing is really hard, but because after you parse, you still have to generate a program on the fly to solve the user's problem.
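
A rough sketch of what "generating a program" means for the word-count example, assuming the parser has already turned the English into a list of filters. The function names `count_words`, `starts_with` and `max_length` are made up for illustration. The point is that the output of parsing has to be something composable, not one fixed command per sentence.

```python
def count_words(text, *predicates):
    """Count whitespace-separated words that satisfy every predicate."""
    return sum(1 for word in text.split() if all(p(word) for p in predicates))


def starts_with(prefix):
    """Build a predicate that accepts words beginning with `prefix`."""
    return lambda word: word.startswith(prefix)


def max_length(limit):
    """Build a predicate that accepts words of at most `limit` characters."""
    return lambda word: len(word) <= limit


text = "an apple and a bat ate all ants"

# "count number of words in file X"
print(count_words(text))  # 8

# "count number of words starting with character a and at most length 3"
print(count_words(text, starts_with("a"), max_length(3)))  # 5
```

Each new kind of request ("ending with", "longer than", "containing a digit") needs another building block, which is where the open-ended part of the problem lives.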


It is indeed a hard problem, but there is some research going on in this area http://people.csail.mit.edu/nkushman/papers/naacl2013.pdf

and the application is definitely worth it, because it lowers the barrier for doing "complicated" stuff for people that are not very computer-savvy


This may be why, when it comes to testing in Ruby, DHH favors the native Test::Unit over Cucumber [1].

1. http://cukes.info


Crazier alternative:

1) Install howdoi https://github.com/gleitz/howdoi

2) Write commands like the following (all work as expected):

    howdoi get time in bash | sh
    howdoi clear bash | sh
    howdoi find current user bash | sh
    howdoi restart computer bash | sh
    howdoi fork bomb | sh
#YOLO


It might be worth mentioning that howdoi is just a wrapper around a Google search (essentially an "I'm Feeling Lucky" search, as far as I can tell)?

It's hardly the same as a curated set of specialized pipes. Nor does it work offline (howdoi configure eth0/enable wireless...).


It actually uses StackOverflow.


Indeed it does, I only read to around:

    SEARCH_URL = 'https://www.google.com/search?q=site:{0}%20{1}'
and didn't look much further.


Add this to your .bashrc:

    function yolo() { howdoi "$@" bash | sh ;}
and use it like:

    $ yolo get time
      1399402426
note: don't actually


Really interesting program, although, as Andrew has said, seems kinda limited. That's not your fault at all; any early project will seem limited in scope. Given some popularity and extra effort, something like this could be the Siri of the command line.

Which gets me thinking -- is stuff like Siri and Google Now really just like this? Core set of pre-set commands surrounded by regex magic to recognize said pre-set commands? Interesting.

Begs the question: is it possible, using current knowledge in machine learning and NLP, to create an English-like interface for #{some_device_or_program_here} that learns and self-develops the English commands from the user? Sort of like how Bayesian spam filters (http://www.paulgraham.com/spam.html) don't have a core preset hardcoded set of Spam-Related Words and classify accordingly, but instead take an initial corpus and then learn and self-develop from the user after that.
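
As a rough sketch of that idea, here is a tiny naive Bayes classifier that maps phrases to command labels and learns from whatever the user teaches it. This is an illustration only, not how Siri, Google Now or Betty work; the class name, the labels `skip_track` and `show_ip`, and the add-one smoothing are assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


class CommandLearner:
    """Tiny multinomial naive Bayes over words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # command -> word -> count
        self.command_counts = Counter()          # command -> number of examples
        self.vocabulary = set()

    def learn(self, phrase, command):
        """Record that the user meant `command` when typing `phrase`."""
        words = phrase.lower().split()
        self.command_counts[command] += 1
        self.word_counts[command].update(words)
        self.vocabulary.update(words)

    def predict(self, phrase):
        """Return the most probable command, or None before any training."""
        if not self.command_counts:
            return None
        words = phrase.lower().split()
        total = sum(self.command_counts.values())
        vocab_size = len(self.vocabulary) + 1  # +1 slot for unseen words
        best_command, best_score = None, -math.inf
        for command, seen in self.command_counts.items():
            counts = self.word_counts[command]
            denom = sum(counts.values()) + vocab_size
            score = math.log(seen / total)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_command, best_score = command, score
        return best_command


learner = CommandLearner()
learner.learn("next song", "skip_track")
learner.learn("skip this song", "skip_track")
learner.learn("whats my ip address", "show_ip")
learner.learn("show my ip", "show_ip")

print(learner.predict("my ip adress"))        # show_ip
print(learner.predict("play the next song"))  # skip_track
```

Every correction the user makes becomes another `learn` call, so phrasing the tool has never seen can still land on the right label once a few overlapping examples exist.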


I think having a prefix (be that "betty", "b" and/or some "mode" for your shell) that signals you want to use natural language (with all its ambiguity) is a good idea. The zsh way of suggesting "did you mean" rather than simply erroring with "command not found/invalid syntax" drives me nuts -- but a lot of people seem to like it. Having a prefix allows you more freedom at "learning" -- ambiguity isn't so terrible if the user expects it (and the simple idea of just listing alternatives that betty uses seems like a great interface. Not as sexy as "I'm feeling lucky"-style, where a (super-)high score wins and middling scores tie and ask the user to pick -- but I think it may win on the principle of least surprise).

Anyway, it does seem that most proof-of-concept voice-control (as opposed to text-controlled) systems use a prefix too: "siri"/"glass"/<Microsoft had one, can't remember which; they also have "xbox">. The idea is that if the mic is always on, you don't want your drones to blow something up just because you jokingly told a friend "kill it with fire" in a voice call. Context is hard to get right for such systems; I expect the Kinect and similar systems can do better (if the user looks at the computer, listen. If the user is already "in conversation with" the computer, listen. Otherwise ignore, unless the user asks for the computer by name).

As for your question, I think it should be relatively easy to train, say, a music-player app to understand stuff like "next song", "accept call", "repeat" -- in any language, using simple statistical methods. Not sure how far you could take it though (for example, dictation software still makes (AFAIK) enough errors that it's not really a viable option if the user can already type reasonably well (or hire an actual stenographer)).


Stuff like "next song", "accept call", etc. can be done with extensive knowledge in machine learning and some clever work.

The really tough bits will be stuff like,

"Siri, check if PBS Idea Channel has uploaded any new videos, please."

How will Siri know you mean the YouTube app? How will Siri know what "check if X has uploaded any new videos" means? How will Siri know you mean "PBS Idea Channel" and not the channel called "PBS Idea"?


Cortana was released with an API to external apps, and that only allows simple pattern matching, so that's similar. Obviously the built-in stuff is more complex. I've no idea about today, but the original Siri was mostly just chaining relationships together by keyword matching in an ontology.

You could probably train a system like this with a word alignment approach if you generated a training corpus. But ideally you'd want to be able to show the system a new manpage and have it map arguments correctly.

Also a false positive in a SPAM filter is bad, but `rm -rf`ing because of the vagaries of the English language is worse.


Other than being fun; I can't quite see the benefit of making my shell more verbose. I'm sure I am not the only one who can get by with the regular commands through mnemonics.

And to further my point; if this was such a good idea, then why has nobody aliased 'ls' to list, 'cp' to copy, (etc.)? Or, while we're at it, 'list files', or 'list files in current directory'. Maybe someone has and I just didn't meet them yet.

I do see the point of a programming language being a bit more verbose, since it makes sharing between humans easier, as well as understanding old codebases – but even there, people are split (Java is too verbose! Ruby is natural! Lisp is too abbreviated! etc.). But using a shell, issuing commands, is fundamentally different from programming, in my opinion. Consider the analogy of a crane (just to avoid cars for once): what would you prefer, waving your finger around in thin air, or using levers? And before you answer that: while the former seems way more cool, it doesn't work well. Getting rid of mechanical controls has been tried experimentally by the Air Force (I read that back in the 90's; sorry, no link to a source now): they were able to interpret commands from the pilots' brainwaves, but manual control was consistently faster. We humans just evolved this way: thinking of moving a plane in a particular direction is slower than moving your hand accordingly to yank a stick – it seems to carry a higher mental load for us. Do you picture your arm moving your mouse? I bet you "just do".

But I digress; though the fundamentals in play are the same. My fingers are fast; why should I slow down the whole thing by making them type more?

Of course, a shell of single-letter commands would be just as useless as an overly verbose shell. The sweet spot, in the end, might be a personal preference and dependent on daily usage and experience. In my case, that's a bit more than the POSIX command set, and – to exaggerate a bit – not writing essays on the command line.


The whole point of this is when you don't remember the command for what you are trying to achieve. (See the author's comment[1]: "Betty just grew out of my cheat sheet of commands. I was tired of repeatedly looking up things in my cheat sheet or Google, and decided to make this.")

It's faster than a google search.

[1] https://news.ycombinator.com/item?id=7696453


I'd say the time spent looking it up (and maybe creating an alias/script) is less than typing it verbosely each time.

I don't see 'betty whats my ip address' as a valid alternative to 'ip addr list'. And in case of 'betty next song' vs. osascript -e 'tell application "iTunes" to skip to next track' I'd just alias the latter to something short & memorable (though I just wrote it from memory).


It becomes interesting when you can tell Betty to do things you don't otherwise know how to do. (How do you tell iTunes to skip this song from the cmd line? I don't know, and I'm too lazy to look it up.)


Do you mean skip it once, or skip it forever?

The first is really an "advance to next track" command.

The latter is a "mark current song as Never Play and advance to next track".

I'd be surprised if there was any kind of interface with iTunes that doesn't have at least one of those capabilities.


This is classically what `apropos` is there for, but I don't see any mention of iTunes in there.



Err, serves me right. Nonetheless, the bottom line of my post should hold true.


Natural language interfaces can be very frustrating when they (inevitably) don't recognize (or worse, misinterpret) your input... I'd rather have commands that are easy to remember (i.e. simple and consistent).


A cute tool, but seems incredibly limited. Why would I use any of these commands more than once if I can just alias them? While some of them might be useful I really just don't see myself using a tool like this.


And just for the kicks:

    function genie() {
      query=`printf "%s+" $@`
      echo $query
      result=`curl -s "https://weannie.pannous.com/api?out=simple&input=$query"`
      echo $result
      say $result 2>/dev/null
    }
    alias hey=genie
    alias how="genie how"
    alias what="genie what"
    alias when="genie when"

    me:~$ what is the root of pi
    It is approximately 1.7725

    me:~$ how old is obama
    Barack Obama is 52 years old


A PR implementing that for Betty: https://github.com/pickhardt/betty/pull/4


me:~$ capital of australia

answer: Sydney.

hmm this is wrong, how to submit a fix for this?


From what I gather the developer is allowing users to use natural language so they won't have to remember arcane commands.

I have looked up how to do _____ multiple times because I don't remember the command or syntax. Creating an alias is nice but then I still have to remember my own alias or look it up.

This tool makes it so you don't have to remember anything. Assuming it works well it will yield a great user-experience.


It's some pretty basic regex, so you're going to have to remember the correct phrasing anyway.


For now. The idea seems to make it English-like so that could simply be a matter of time.


the jump from simple regexes to "English-like" is quite the leap.


Sure, I'd say people should look at this like a MVP. If there's value there I'm sure someone will find a better way to reach that English-like goal.



for real. everything you can do with this you can do with aliases for the most part. Plus, this actively prevents you from actually learning all the various core *nix cmd line utilities.


by printing them before it executes them?


The idea is awesome though.


That's great. However, I don't see an end to capturing all the ambiguity of language, and especially misspellings... can this be contained in a reasonably fast-running script?

  ~/tmp betty my ip
  Betty: Sorry, I don't understand.
  ~/tmp betty my ip adress
  Betty: Sorry, I don't understand.
  ~/tmp betty whats my ip address
  Betty: Running curl ifconfig.me
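
A small sketch of what is probably going on in that transcript, assuming the matching is a strict regex. The two patterns below are hypothetical and are not Betty's actual rules; only the `curl ifconfig.me` command comes from the output above. A looser pattern with optional groups covers the shorter phrasing and the misspelling, but every tolerated variant has to be written out by hand.

```python
import re

# Hypothetical patterns for illustration; these are not Betty's actual rules.
STRICT = re.compile(r"^whats my ip address$")
LOOSE = re.compile(r"^(?:what(?:'?s| is) )?my ip(?: add?ress)?$")


def translate(pattern, phrase):
    """Return the shell command for a phrase, or None if it is not understood."""
    if pattern.match(phrase.lower().strip()):
        return "curl ifconfig.me"
    return None


for phrase in ["my ip", "my ip adress", "whats my ip address"]:
    print(phrase, "->", translate(STRICT, phrase), "|", translate(LOOSE, phrase))
# my ip -> None | curl ifconfig.me
# my ip adress -> None | curl ifconfig.me
# whats my ip address -> curl ifconfig.me | curl ifconfig.me
```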


This is an interesting idea, but I'm not sure it has much use outside of proof of concept. Most people don't need to use the command line, and those who do don't find it that difficult.

In fact, for the most part the command line is very usable once you've learned the philosophy behind it. I remember reading something about the BSD flavors of UNIX along the lines of "BSD is very user friendly, just not for inexperienced users". I see the command line as being similar. It's very powerful when you understand it, but a bit opaque when you're learning it. Fortunately, you can learn the basics in an afternoon.

Now one useful aspect of this is that it makes commands that would require a long command line much simpler than they would otherwise be. However, I find that whenever I have a long command line that I use repeatedly I just write an alias for it and add it to my .zsh-aliases file. I like this solution better because it is more customizable for my workflow.


Well, an argument could be made for having the alias-file be something like:

    alias ll="ls -l"
      doc = """List files -- long list format"""
      natural_match = "list all files long size date owner group"
And have:

    "list all files by size"
match this (as betty does now: running ll "docstring"). Then, when you realize that "list all files by size" really is "ls --sort=size", possibly with "-l", you could just add another definition:

    alias lss="ls -l --sort=size"
      doc = """List files -- sorted by size, long list format"""
      natural_match = "list all files sort by size long format"
      natural_exact_match = "list all files by size"
And now have:

    "list all files by size"
Score higher for lss than for ll (the scoring function is assumed to be some combination of tag count, proximity match, substring match etc., a so-called hard problem, but it only needs to match over defined aliases -- and should try hard to match the "exact_match" part first...).
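
A minimal sketch of such a scoring function, using only tag overlap plus the exact-match shortcut (no proximity or substring matching). `AliasDef`, `score` and `rank` are names invented for this example, and the cap of 0.99 for non-exact matches is an arbitrary assumption so that an exact phrase always wins.

```python
from dataclasses import dataclass


@dataclass
class AliasDef:
    name: str
    command: str
    natural_match: str       # space-separated tags
    exact_match: tuple = ()  # phrases that should always win


def score(alias, query):
    """Return 1.0 for an exact phrase, else the share of query words found in the tags."""
    normalized = " ".join(query.lower().split())
    if normalized in alias.exact_match:
        return 1.0
    words = normalized.split()
    if not words:
        return 0.0
    tags = set(alias.natural_match.split())
    overlap = sum(1 for word in words if word in tags) / len(words)
    return min(overlap, 0.99)  # keep fuzzy matches below an exact match


def rank(aliases, query):
    """Return (score, name) pairs with a non-zero score, best first."""
    scored = [(score(alias, query), alias.name) for alias in aliases]
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)


ll = AliasDef("ll", "ls -l", "list all files long size date owner group")
lss = AliasDef(
    "lss",
    "ls -l --sort=size",
    "list all files sort by size long format",
    exact_match=("list all files by size",),
)

print(rank([ll], "list all files by size"))       # [(0.8, 'll')]
print(rank([ll, lss], "list all files by size"))  # [(1.0, 'lss'), (0.8, 'll')]
```

When the top scores are close, the ranked list can simply be shown to the user to pick from, which matches the "list alternatives" behaviour described earlier in the thread.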

Of course, it should all be paired with a define-command or something that automatically saves these monstrosities to a file or other db. So that you could do:

     # Lots of head scratching and experimentation, until:
     nmcli dev wifi list |sort -k8 -n # list wifi access points by
                                      # signal strength

     # define prev as "list wifi access points by signal strength"
     > displays command and prompts for tags etc, then on "y" at y/n
       prompt saves definition to "alias" file.

Come to think of it, it's a little like mandating/recommending that users document their aliases, and having the shell automatically suggest from the docstrings. I guess zsh actually does a bit of this already (I'm still sticking with bash; it's gotten good enough for me over the years, and I'm always terrified of anything that isn't POSIX/ksh and also isn't bash -- as that just makes scripts brittle...).


exactly .

Also this is not the best example to start with imo,

  betty whats my username
  Betty: Running whoami
  jrp
Maybe a better example should go at the beginning.


Seems like this could be handled more like the Symbolics Genera command line, which offered extensible intelligent commands with context-sensitive completions. Here's a demo by Kalman Reti: http://youtu.be/o4-YnLpLgtk


Interesting project. I know for us geeks the short obscure commands are sometimes easier but something like this can ease people into using command line more.

I applaud the author because typically I don't have the courage to even attempt solving problems like this. And I wonder what the author is going to do about this.

Because well ... natural-language processing just isn't there yet, you can kind of "fake" it a bit by looking for keywords and patterns and regex but it only goes so far.

So beyond a certain point, you end up needing to solve the natural-language processing problem first, so that you can then build Betty.


Betty looks pretty interesting. Adding tab completion would certainly cut down on the amount of typing with the length of the commands. Do the commands also accept short variants? Such as i for iTunes?


Why are computerized natural-language "assistants" all modeled as female? (Siri, Cortana, Betty...) A remnant of the view of women's role as "secretaries"?

It's interesting to see that this applies to real-world products, which are still kind of "dumb" or single-minded (as in default voices for GPS navigation assistants), while in fiction, when computers are smarter than their owners (HAL, Jarvis), they are more often than not represented as male.


I think you can find quite a few counter examples of a female AI, not least of which Cortana. It's possible to imagine S'Jet as an example, though perhaps much closer to Transcendence (movie) than AI.

Really, I think it simply has a lot to do with the voice. I'll be darned if I can recall the source, but it seems that women can usually be understood more clearly and easily than men; that claim may only hold over a lossy channel, but it seems to jibe with my experience (listening to podcasts on a motorcycle, women tend to be easier to make out over the noise, and even Google Maps' female voice is clear over a highway's roar).


AFAIK, J.A.R.V.I.S. was modeled after a butler. As a counterexample for smarter-than-its-owner AI you have Jane from Ender's Game series.


I think it's partially region-based: in the United Kingdom, Siri is male. (Perhaps it's somehow related to butlers, whereas in the United States, secretaries were more common.)

Betty was also likely named Betty because of the last three letters, there's no voice with the application, it's only gendered because of the name.


Well, Siri isn't female (and "Siri" as a name isn't a girl or boy name per se); its voice gender varies depending on the country. Apple uses the highest quality voice they have as the default. If you ask Siri what its gender is, it famously replies "I haven't been assigned a gender".

Even then though, female voice usually wins for phone voice services. Why?

Turns out male voices are more "lossy" over plain-old-telephone-service and over tiny tinny speakers (like on the iPhone).

The frequency range of a female voice maps to less loss in this speaker/service range compared to a deep male voice, where the lower frequencies get lost. So female voices are easier to understand in such circumstances.

This explains in part why OSX has a male voice, because a Mac comes by default with much better speakers than a phone does. Even then deeper voices are still not preferred. The "in a time..." movie trailer voice isn't coming to an AI assistant any time soon.

As for Cortana, the basically nude Halo AI bimbo, you have a point there, awkward choice for a commercial product, but that's Microsoft's problem. They've always had awkward marketing/branding choices, so this fits in.

As for Betty, "Betty the secretary" reference is quite obvious, but as a personal project, I'd say he's forgiven.


This seems like a recipe for disaster.

When you're at a command prompt you should always know exactly what the command you're inputting is going to do. Since Betty necessarily obfuscates your command this seems like a good way for unintended consequences.

The obvious solution to that is make it so Betty never does anything that actually matters. So then the question is; why bother with Betty at all?


That there is masterful copy.

I'll be honest, I have a general software test called "StonerTesting" - if you're high as fuck a lot of your cognitive function is distorted, so successfully using a piece of software (even its website) means your software is in fact quite easy to use.

I'm thinking of offering it as a service.


Chance for my plug: I'm using a similar approach with cloud management at http://webservice.management. It allows you to manage VMs and run tasks across them in English. (Create a small vm with node.js or build lamp stack on vm1 and so on)


Upon seeing the list of commands, I immediately thought "this is great! I could alias betty to b and then maybe a few more aliases here and there. I could take "betty next song" and make it 'b n s'!"

and then went "oooooh, so that's how this all started."


isn't the whole idea of a command line for impatient little people like myself that dislike guis and inefficiency? this sort of defeats the whole purpose, i'm now typing sentences instead of shortcuts


To some people this might be an easier and more efficient way to wrap their head around the CLI.


Is it compatible with fish shell?

edit: answering my own question, it seems to work just fine


Between this and bropages Linux is finally taking shape as a viable contender to MSDOS! :)

Curious how many man hours have been put into Betty so far. Good work pickhardt, looking forward to taking this for a spin. Hope you put in some umask command support in this, I keep screwing that up...


COBOL was supposed to be English-like too wasn't it?


CLI in Ruby, ok.


I like this idea a lot. But why on earth is it written in Ruby?! - I know the answer and it is "ok". But it is probably the reason why it won't ever gain huge momentum, prove me wrong :)


Doesn't really matter what it's written in as long as its logic is easily extendable by someone who doesn't know Ruby. Preferably without having to code at all.


"DSLs" are one of ruby's strengths, no?


What language would you write it in?


Given that this program is mainly several regexp matches against the input string, perhaps perl or awk would be a perfect match here?



c :)


I knew my reply would be heavily hammered by Rubyists. There is nothing wrong with the language, but I think that shell tools should be written in C or in some other low-level language, without complex dependencies on Ruby or Java or Python etc., or any other non-default installation stuff.


You're not only being down-voted by Rubyists. For something that deals heavily with string manipulation and isn't speed-sensitive, C is most definitely not the best language to use.


I'm afraid that ship sailed a long time ago:

    $ for bin in /bin/* /usr/bin/* /usr/local/bin/*; do
    >     if [[ -f $bin && "$(head -c 2 "$bin")" == "#!" ]]; then
    >         echo "$bin"
    >     fi
    > done | wc -l

    544
I imagine most of these are written in Perl (though there's surely a fair amount of Python and shell scripting in there, too), but that's still a high-level language with a complex runtime and dependencies compared to C.

To be fair, there's still about twice as many compiled binaries on that box, but having system commands written in a non-C language is by no means an exception.


I was curious about what the spread would be, so for one data point:

    $ file -b /bin/* /usr/bin/* /usr/local/bin/* | cut -f1 -d, | sort | uniq -c | sort -rn | head -n 10
       1028 ELF 64-bit LSB executable
        388 POSIX shell script
        265 Perl script
        130 ELF 64-bit LSB shared object
        126 Python script
         46 Ruby script
         38 Bourne-Again shell script
         24 symbolic link to `mtools'
         18 setuid ELF 64-bit LSB executable
         16 setgid ELF 64-bit LSB executable
After the top 10, most were symbolic links. So it seems that, after compiled ELF binaries, shell scripts are the most common, with Perl and Python scripts not too far behind, and Ruby making a decent appearance.



