Agreed. Notebook environments are great for exploration, discovery and pedagogy. They aren't so good for productionizing code.
We found this out the hard way when we tried to productionize ML code written in Jupyter. We had to export to .py files and add boilerplate. This works fine unless there is back-and-forth iteration between modeling and production, which there invariably is: our data scientists had to make changes to the notebook, we had to redo our boilerplate, and so the notebook code and production code were constantly out of sync. This could have been alleviated with automation -- but such automation is bespoke and hard to generalize.
PyCharm has a Scientific Mode (similar to RStudio's IDE approach, where you are actually writing code in a text file but can statefully/interactively run code blocks by pressing Ctrl-Enter). Spyder, MATLAB and a bunch of other IDEs implement this idea too.
This is, I feel, a better middle ground than notebooks between interactive exploration and having production-ready code.
I solve this in my personal workflow by extracting the important bit to a module, editing in that module, and testing/exploring changes in a notebook by reloading the module.
This is how I work as well: all the code I'm actively working with in a Jupyter notebook is directly visible on my screen. Any other code is generally 'finished' and lives in a text editor.
Additionally, I use the following settings in my ipython_config.py file to automatically reload modules:
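The settings themselves didn't survive into the quoted comment, but the standard way to do this looks roughly like the following (the exact config may have differed):

```python
# ipython_config.py -- run `ipython profile create` to generate one.
c = get_config()  # provided by IPython when it loads this file

c.InteractiveShellApp.exec_lines = [
    "%load_ext autoreload",  # enable the autoreload extension
    "%autoreload 2",         # reload all modules before executing code
]
```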
Note that the autoreload features can be very tricky to use safely with Python.
For example, at least in some previous versions, Caffe and TensorFlow made incompatible assumptions about their ability to claim all available GPU memory. So there can be situations where you first import Caffe, then later import TensorFlow with restrictions on its GPU policy. If you naively re-import the Caffe code, it can evict TensorFlow from whatever GPUs it had claimed, and coming up with a group of settings that reliably prevents this, across the possibly different machines where the notebook will be run, is very tricky.
This once led to a huge time sink because someone on my team created a mistaken GitHub issue claiming our TensorFlow model had a bug (since the notebook was producing an error). We spent all this time trying to reproduce it and figure out why it wasn't working, and eventually realized it was because of this hidden auto-reload setting on his specific IPython setup that caused Caffe to evict TensorFlow just for his specific usage pattern, resulting in strange errors because the TensorFlow model was no longer loaded in GPU memory.
There can be other problems too, like auto-reloading modules that have large start-up times (say if they load a very large model into memory). Sometimes you want to re-run a cell without auto-reload, even if you still want selective auto-reload functionality in other parts.
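For the selective case, IPython's autoreload extension does support a narrower mode: `%autoreload 1` reloads only modules explicitly whitelisted with `%aimport`, and `%aimport -name` excludes a module entirely. A rough sketch (module names here are hypothetical):

```python
# In an IPython session, not plain Python:
%load_ext autoreload
%autoreload 1           # reload only modules marked with %aimport
%aimport analysis       # track this fast-loading module
%aimport -heavy_model   # never reload this one (e.g. it loads a large model)
```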
Thanks for explaining the downsides of using this feature. Like all good config options, there are tradeoffs. Luckily I haven't been bitten by it yet, but I'll remember this if I run into issues.
Yes, I tried doing this too but it forces me to flip between the notebook and a separate text editor (for editing that module). It's not that seamless and the context switches were a little expensive (at least for me), but your mileage may vary.
On that console / IDE point you made, IPython can still be quite good for that if you use the interactive shell.
For example, I might make two shell tabs in tmux, and make one a small rectangle towards the bottom of the screen (holds my running IPython session), and a large rectangle above it (holds my Emacs where I’m editing source code).
And I might have a third shell tab somewhere that detects any time source files are changed and re-runs unit tests.
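That watcher can be anything from a shell one-liner with `entr` to a tiny polling script. Here's a minimal stdlib sketch of the mtime-polling approach; the `pytest` command and watched paths are illustrative, and dedicated tools (entr, pytest-watch) handle this more robustly.

```python
import os
import time

def changed_files(paths, mtimes):
    """Return files whose mtime differs from the last snapshot,
    updating the snapshot in place. First sighting counts as changed."""
    dirty = []
    for p in paths:
        mtime = os.stat(p).st_mtime
        if mtimes.get(p) != mtime:
            mtimes[p] = mtime
            dirty.append(p)
    return dirty

def watch(paths, interval=1.0):
    """Poll every `interval` seconds and rerun the test suite on change."""
    mtimes = {}
    changed_files(paths, mtimes)  # prime the snapshot
    while True:
        time.sleep(interval)
        if changed_files(paths, mtimes):
            os.system("pytest -q")
```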
Yes, that's definitely a possibility. Though it would be nice if it were more nicely integrated into an environment like RStudio where you can interactively set breakpoints, watch variables, etc. while still maintaining the interactivity.
I do the tmux/vim too, but for exploratory work the experience is less well-integrated than it could be with an Rstudio-like IDE.
I agree, and the IDE setups can be very valuable for certain use cases or certain preferences. The equivalent thing in the IPython shell approach, basically using a souped-up pudb, is not quite as nicely interactive for the debugging cycle: setting breakpoints, watchpoints, etc. is either a matter of editing them into the source code and re-running, or becoming a master of specifying them on the command line. Both require stepping out of the tight iteration workflow slightly (though to be fair, they also offer more power than the preconfigured options available in the IDE debugger features).
Yes, my IDE is vim but it's a hard sell to a lot of folks... especially having to map a shortcut key to "import ipdb; ipdb.set_trace()" for breakpoints...
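For what it's worth, since Python 3.7 the built-in `breakpoint()` can replace that mapped snippet: it dispatches through the `PYTHONBREAKPOINT` environment variable, so it can be pointed at `ipdb.set_trace` (or disabled entirely) without editing the source. A minimal sketch:

```python
import os

# Disable the hook so this snippet runs non-interactively; setting
# PYTHONBREAKPOINT=ipdb.set_trace instead would drop into ipdb here.
os.environ["PYTHONBREAKPOINT"] = "0"

def f(x):
    breakpoint()  # no-op under PYTHONBREAKPOINT=0
    return x + 1

print(f(1))  # 2
```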
Rodeo [1] was an attempt at such an IDE, but development died, and now that yhat's been acquired there's no sign of any further work on it. I wish the Jupyter folks would push more in this direction (and they are, with JupyterLab), but I get the sense they are really invested in the notebook paradigm.
Well, I guess they are invested in it as a component of the JupyterLab toolbox, but JupyterLab tries to integrate it with consoles and editing windows: https://lwn.net/Articles/748937/