I've tried a number of notebook AIs: Jupyter AI, Hex, Deepnote, and Einblick. The one that worked best for me was Einblick, probably because it's data-aware. With AIs that aren't data-aware, you need to be overly specific when writing prompts, which is annoying, and you keep having to rename/reference the correct dataframes and variables (even more annoying).
Would love to exchange notes on this if you're up for it!
For louie.ai, we've been going for data-aware from the get-go, and more broadly, doing an LLM-first tool design rethink. In the large, as I look around, it feels super early for the dev community in figuring out core genAI notebook tool uses, flows, & assumptions. Likewise, zooming in on individual feature experiments, current tools feel rough & underpowered relative to what we already know is possible.
We've been forced to question a lot as we've been learning from going operational and experimenting with design. Again, if up for it, would love to chat & exchange notes!
By data-aware I mean that the AI leverages additional context about the data to generate code for a given prompt. Let's say you're asking an AI to "build a regression model for column X". To give you a targeted, executable response, the AI needs to know: Which dataframes contain a column named "X"? If there are several such dataframes, which one should be referenced for the regression task? Is X a numeric column, and if not, can it be converted to a numeric column? Does the data need to be normalized beforehand?
If the AI is unable to answer such questions on its own, it will only ever be able to return a generic answer. That's equivalent to typing the prompt into ChatGPT, and the user has to modify the returned code before it actually does what they asked for. That clearly isn't great for an AI that operates on data. A data-aware AI, on the other hand, is able to provide more targeted responses that require much less user intervention because it has access to the broader context.
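To make that concrete, here's a rough sketch of how a tool could gather the answers to those questions before it ever builds a prompt. This is only an illustration with pandas, not how Einblick or any other product actually does it; `describe_column`, the `frames` dict, and the sample dataframes are all made up for the example.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def describe_column(frames, column):
    """Collect the facts a data-aware assistant would need about `column`.

    `frames` is a hypothetical mapping of variable name -> DataFrame,
    standing in for whatever is currently defined in the notebook.
    """
    context = []
    for name, df in frames.items():
        if column not in df.columns:
            continue  # this dataframe is not a candidate
        series = df[column]
        numeric = is_numeric_dtype(series)
        # If the column is not numeric, check whether every non-null value
        # can be coerced to a number without being lost.
        coerced = pd.to_numeric(series, errors="coerce")
        convertible = numeric or bool(coerced.notna().eq(series.notna()).all())
        context.append({
            "dataframe": name,
            "rows": len(df),
            "dtype": str(series.dtype),
            "numeric": numeric,
            "convertible_to_numeric": convertible,
        })
    return context


# Made-up notebook state: two dataframes have a column "X", one does not.
frames = {
    "sales": pd.DataFrame({"X": ["1.5", "2.0", "3.25"], "region": ["a", "b", "c"]}),
    "users": pd.DataFrame({"name": ["ann", "bob"]}),
    "metrics": pd.DataFrame({"X": [10, 20, 30, 40]}),
}
context = describe_column(frames, "X")
for entry in context:
    print(entry)
```

Something like `context` could then be added to the prompt, so the model can pick a dataframe and add a conversion step itself instead of guessing. Choosing between several candidates and deciding on normalization would still need more logic than this.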
A couple of other benefits:
- the AI will have an easier time automatically fixing runtime errors
- it knows how to fix and transform user input into the correct data format, e.g., "san fancisco" => "San Francisco"
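For the second point, a minimal sketch of the idea, assuming the tool can see the distinct values already present in a column. It uses fuzzy matching from Python's standard `difflib`; the helper name `normalize_value`, the `cutoff` of 0.8, and the city list are just assumptions for illustration.

```python
import difflib


def normalize_value(value, known_values, cutoff=0.8):
    """Map user input onto the closest value that exists in the data.

    Returns the input unchanged when nothing is similar enough.
    """
    # Compare case-insensitively, but return the canonical spelling.
    lookup = {v.lower(): v for v in known_values}
    matches = difflib.get_close_matches(
        value.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else value


# Made-up distinct values from a "city" column.
cities = ["San Francisco", "San Diego", "Sacramento"]
fixed = normalize_value("san fancisco", cities)
print(fixed)
```

Without access to the column's actual values, the AI has nothing to match against, which is why this only works for a data-aware tool.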