Unsupervised Feature Learning and Deep Learning Tutorial (stanford.edu)
167 points by akaul on May 26, 2013 | hide | past | favorite | 16 comments


I have a question that I don't really know where to ask that relates to unsupervised feature learning (I think).

Let's say I have a many-body problem, such as computing the energy of some number of atoms given their positions. I can calculate this by solving some equation, but it's expensive. If I wanted to apply machine learning to the problem, some form of regression would work if I stuck to a fixed size (some number of atoms in a particular coordinate system).

More concretely, if I had energy as a function of two atoms, I could fit some function through the points and get something halfway reasonable.

However, if I want to have three atoms, I have no idea how to treat it. The learning algorithm has no idea about "atoms" or that their energies might be composable in some way.

What sort of machine learning could learn how the energy should change as a function of the atoms' positions, while "understanding" the underlying physical rules that must be preserved? One of these rules is that the energy must go to a constant (possibly zero, depending on how the energy is defined) if you separate the atoms infinitely far apart, and must go to infinity as the distance goes to zero. The function must also be smooth, with continuous derivatives.

I haven't made my point very well, but perhaps someone would know what literature to read about unsupervised learning algorithms that could "learn" physics.


People have tried applying machine learning to learning many-body potentials in physics for use in speeding up quantum molecular dynamics while maintaining most of the accuracy. What you'd do is say that the total energy of the system is a sum of local potentials on each atom R, where the input is the local environment of atom R:

E = \sum_R e(env(R))

You use some method to create features env(R) that make the translational and rotational invariance of e() easy, with a radial cutoff beyond some distance, and then model e somehow. I think the most promising method is Gaussian Approximation Potentials, which use Gaussian processes to model e and (what they call) a bispectral decomposition to represent the local environment around the atom.
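A toy sketch of that structure, to make it concrete. The neighbour-distance histogram here is only a stand-in for the bispectrum descriptor from the paper, the GP is reduced to its mean prediction, and the training data are random placeholders for per-atom energies from quantum calculations; only the E = \sum_R e(env(R)) decomposition itself is from the comment above.

```python
import numpy as np

CUTOFF = 5.0                       # angstrom; ignore neighbours beyond this
BINS = np.linspace(0.5, CUTOFF, 8)

def env(positions, i):
    """Features for atom i: a histogram of neighbour distances within
    the cutoff. Invariant to rotation, translation, and permutation."""
    d = np.linalg.norm(positions - positions[i], axis=1)
    d = d[(d > 0) & (d < CUTOFF)]
    hist, _ = np.histogram(d, bins=BINS)
    return hist.astype(float)

def rbf(A, B, length=1.0):
    """Squared-exponential kernel between two sets of feature vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length ** 2)

def fit_gp(X, y, noise=1e-3):
    """GP regression, mean prediction only."""
    K = rbf(X, X) + noise * np.eye(len(X))
    return X, np.linalg.solve(K, y)

def gp_predict(model, Xnew):
    Xtrain, alpha = model
    return rbf(Xnew, Xtrain) @ alpha

def total_energy(positions, model):
    """E = sum_R e(env(R)): total energy as a sum of local terms."""
    feats = np.array([env(positions, i) for i in range(len(positions))])
    return gp_predict(model, feats).sum()

# Random stand-ins for (environment, per-atom energy) training pairs.
rng = np.random.default_rng(0)
Xtrain = rng.random((40, len(BINS) - 1))
ytrain = rng.random(40)
model = fit_gp(Xtrain, ytrain)

pos = rng.random((10, 3)) * 4.0
E = total_energy(pos, model)
print(E)
```

Because the total is a sum over atoms, the same fitted e() transfers to systems of any size, which is the whole point of the decomposition.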

http://prl.aps.org/abstract/PRL/v104/i13/e136403 (free arxiv version: http://arxiv.org/abs/0910.1019)

Without the above simplifications, like modelling it as a sum of local potentials and giving env(R) a cutoff, you would indeed just be fitting a 3N-dimensional function in the case of N atoms. It'd be exact, but it'd also blow up pretty badly and be utterly nontransferable. Also, the energy surface isn't necessarily continuous and differentiable -- consider the energy when two atoms move to occupy the same position.

I suspect that the bispectral decomposition to give env(R) could be improved by using unsupervised feature learning to learn better features such as "we're 5 angstrom away from a surface". I've seen talks where people have hand-optimized feature sets to include things like "there's an aromatic ring pointing at us from 5 angstrom away" that a simple function + cutoff might miss.


I think I might have read a different paper on GAP before, this one seems to have some more detail of their philosophy. Thank you very much for the link.

You say that: "Without the above simplifications, like modelling it as a sum of local potentials and giving env(R) a cutoff, you would indeed just be fitting a 3N-dimensional function in the case of N atoms. It'd be exact, but it'd also blow up pretty badly and be utterly nontransferable. Also, the energy surface isn't necessarily continuous and differentiable -- consider the energy when two atoms move to occupy the same position."

Those constraints and representations of the problem are the thing I would want the machine learning algorithm to discover. Is this beyond the scope of state-of-the-art algorithms in machine learning? I understand that making a good choice for the representation of the problem should make the job of the learning algorithm easier, but finding the representation that is best is quite challenging.


I think it's possible, it'd just take serious computing power. There are a number of physically guaranteed symmetries which would be silly to make a machine learn:

a) Permutation symmetry of identical atoms

b) Rotational symmetry of the system

c) Translational symmetry of the system

I suppose it could learn them approximately, given enough examples, but why bother? I think it'd be kind of like not using a convolutional neural net for recognising digits in photos and just using a bazillion more weights and examples.
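To make (a)-(c) concrete, here's a toy check that sorted pairwise distances already respect all three symmetries for free (the descriptor and the random coordinates are chosen purely for illustration):

```python
import numpy as np

def pair_distances(positions):
    """Sorted pairwise distances: unchanged by permutation of atoms,
    rotation, and translation of the whole system."""
    n = len(positions)
    d = [np.linalg.norm(positions[i] - positions[j])
         for i in range(n) for j in range(i + 1, n)]
    return np.sort(d)

rng = np.random.default_rng(1)
pos = rng.random((5, 3))

perm = pos[rng.permutation(5)]                        # a) permutation
Rz = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
rot = pos @ Rz.T                                      # b) rotation (90 deg about z)
shifted = pos + np.array([1.0, -2.0, 0.5])            # c) translation

for p in (perm, rot, shifted):
    assert np.allclose(pair_distances(pos), pair_distances(p))
print("all three symmetries respected")
```

A model fed such a descriptor never has to burn capacity learning these invariances from examples, which is the analogy with weight sharing in a convolutional net.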

I'd say you'd start with a completely general learning model which respects the above symmetries and then see where that takes you.

However, I don't know how you'd make a transferable N-body potential from a model taught only on some number of atoms. Again, I guess it's kind of like training a CNN handwriting recogniser on 256x256 images and then applying it to arbitrary sized images, which you can only do by assuming locality and translational symmetry of the features.


What is the place of support vector machines now that deep learning techniques are going mainstream?


Am I crazy, or is the short-run benefit more about unsupervised feature learning than about classification accuracy? If I recall from my brief reading on the subject, kernel-aided SVMs and Random Forests are basically equivalent to a three-layer deep graphical net, but on painfully feature-engineered inputs. I'd wager that learning features is more useful right now than any benefit of a super-deep architecture.

(I would love for a practitioner to shoot this down or corroborate my hunch-- awfully new to neural nets, having basically read a monograph, played with Theano, and watched some Hinton lectures.)


What did you read, and would you recommend it? I'm looking for some good reading on the subject.


Learn and be well: http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf

Explains and motivates the use of autoencoders (and sparsity constraints) and restricted Boltzmann machines, too.


Thank you!


SVMs may still be useful as a classifier sitting on top of a deep model. Plugging the features learned during unsupervised pre-training into an SVM instead of a neural net is perfectly valid.

Lower layers of the "deep" model are (typically) performing non-linear dimensionality reduction, i.e. generating a set of high-level features which make subsequent classification easier than it would have been on the raw input data.
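A minimal sketch of that pipeline: a fixed random tanh projection stands in for the pre-trained lower layers, and a hand-rolled hinge-loss classifier stands in for a proper SVM solver. All names, data, and hyperparameters here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pre-trained" feature extractor: a fixed random projection
# plus tanh, playing the role of the lower layers of a deep model.
W = rng.normal(size=(2, 8))
def features(X):
    return np.tanh(X @ W)

# Toy two-class data in the raw input space.
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

def train_svm(F, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via sub-gradient descent on the regularised hinge
    loss -- the classifier sits on learned features, not raw inputs."""
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        margins = y * (F @ w)
        viol = margins < 1
        grad = lam * w
        if viol.any():
            grad = grad - (F[viol] * y[viol, None]).mean(axis=0)
        w -= lr * grad
    return w

w = train_svm(features(X), y)
acc = (np.sign(features(X) @ w) == y).mean()
print(acc)
```

Swapping the linear classifier for a kernel SVM (or a softmax layer) changes nothing upstream; the feature extractor is trained once and frozen.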


You can use SVMs for deep learning, for one.

http://books.nips.cc/papers/files/nips25/NIPS2012_1290.pdf


Handouts and two video lectures that accompany this tutorial:

http://www.stanford.edu/class/cs294a/handouts.html


I feel like ensembles are just so much easier to work with, and they can be incredibly accurate, provided you take enough time to fine-tune the parameters and provide the right features. Most ML problems that I deal with fit really well with Gradient Boosting, and it's awesome to be able to see the breakdown of how the decision trees are voting.
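For anyone curious what gradient boosting is doing under the hood, here's a toy regression version with depth-1 trees (stumps): each stump fits the residual of the ensemble so far, and you can inspect every tree's contribution. Data and hyperparameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 200))
y = np.sin(X) + rng.normal(0, 0.1, 200)

def fit_stump(x, r):
    """Best single-threshold split of residuals r under squared error."""
    best_err, best = np.inf, None
    for t in np.unique(x)[::5]:
        left, right = r[x <= t], r[x > t]
        if len(right) == 0:
            continue
        err = (((left - left.mean()) ** 2).sum()
               + ((right - right.mean()) ** 2).sum())
        if err < best_err:
            best_err, best = err, (t, left.mean(), right.mean())
    return best

def boost(x, y, n_trees=100, lr=0.1):
    """Each new stump is fit to the residual of the current ensemble."""
    pred = np.full_like(y, y.mean())
    trees = []
    for _ in range(n_trees):
        t, lv, rv = fit_stump(x, y - pred)
        pred += lr * np.where(x <= t, lv, rv)
        trees.append((t, lv, rv))
    return y.mean(), trees

def predict(x, base, trees, lr=0.1):
    return base + sum(lr * np.where(x <= t, lv, rv) for t, lv, rv in trees)

base, trees = boost(X, y)
mse = np.mean((predict(X, base, trees) - y) ** 2)
print(mse)
```

The `trees` list is exactly the "breakdown of how decision trees are voting": each entry is one threshold and its two leaf values, scaled by the learning rate.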


Using dropout with deep neural networks is an (extremely cheap) way to gain many of the benefits of using an ensemble. If you want a great overview of the dropout technique, watch this tech talk by Hinton: http://www.youtube.com/watch?v=DleXA5ADG78

Also, the recent maxout algorithm (from the Montreal group, authors of Theano), which got state-of-the-art results on several datasets, is essentially an algorithm designed to do particularly well with dropout (as far as I understand it).
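The mechanism itself is tiny. This sketch uses the "inverted" formulation (rescale surviving units at training time, so nothing changes at test time), which differs in bookkeeping, not effect, from the test-time weight halving described in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, train=True):
    """Inverted dropout: at training time, zero each unit with
    probability p and scale survivors by 1/(1-p), so the expected
    activation is unchanged and test time needs no rescaling."""
    if not train:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

h = np.ones((100, 100))
out = dropout(h, p=0.5)            # surviving units become 2.0
print(out.mean())                  # close to 1.0 in expectation
print(dropout(h, train=False).mean())  # exactly 1.0 at test time
```

Because a fresh mask is drawn every forward pass, each minibatch effectively trains a different thinned sub-network, which is where the "cheap ensemble" interpretation comes from.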


You can use Neural Networks in an ensemble, and it works quite well since both ANNs and Decision Trees are highly unstable and prone to overfitting. It does significantly lower your visibility into how the algorithm works though.


Thanks



