Wednesday, March 21, 2018

neural nets: brute force?

I don't think neural nets are brute force or black boxes, but that's what I've heard some people say, so it's what I thought myself for a little while...

Neural nets seem like brute force because you generally need huge training sets to get good results out of them (no comment on what "huge" and "good results" mean).

Now, to compare brute force with neural nets, I first want to "define" what brute force means here. To me, brute force for a neural network would be: take all possible inputs and all possible configurations of the neural net's parameters. For each parameter set, compute the error on every input and combine those errors into a total error for that parameter set (how do we measure and combine errors? I don't know, but I hope it's done continuously). Do this for every parameter set and choose one with minimal total error.
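
To make that concrete, here's a minimal sketch of what "brute force" training could look like, under a pile of assumptions I'm making up: a hypothetical one-neuron network with just two parameters, a coarse grid standing in for "all possible configurations", a small sample standing in for "all possible inputs", and mean squared error as the way of combining errors (since I punted on that above):

    import itertools
    import numpy as np

    # A toy "brute force" trainer: one neuron with two parameters (weight w,
    # bias b), a coarse discretized grid for each parameter, and mean squared
    # error as one arbitrary way to combine the per-input errors.
    def tiny_net(x, w, b):
        return np.tanh(w * x + b)

    # Toy training data standing in for "all possible inputs".
    xs = np.linspace(-1.0, 1.0, 50)
    ys = np.tanh(2.0 * xs + 0.5)  # targets generated with w=2, b=0.5

    grid = np.linspace(-3.0, 3.0, 61)  # 61 candidate values per parameter
    best_params, best_error = None, np.inf
    for w, b in itertools.product(grid, grid):  # every parameter combination
        total_error = np.mean((tiny_net(xs, w, b) - ys) ** 2)
        if total_error < best_error:
            best_params, best_error = (w, b), total_error

    print(best_params, best_error)  # the grid point with the smallest total error

Even this silly two-parameter example needs 61 x 61 = 3,721 error evaluations, and the count multiplies by another 61 for every parameter you add. Real networks have thousands to millions of continuous parameters, so this blows up immediately.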

Caveat: does a minimum even exist? An infimum does, but whether it's attained is a question for analysis. A neural net is a continuous map from input to output, and for a fixed input the map from parameters to error should also be continuous, and we should probably define the parameters on a closed domain (though that depends on the application, I guess). Is that domain compact? That's really up to the application... so "brute force" doesn't even always give us a parameter set unless we have a compact domain for the parameters. Ew... well, unimportant. The point is that neural nets are not brute force, even if they can have trouble stabilizing at a set of parameters. That's something to look into: when do optimal parameters for a neural net exist?
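
A partial answer to that last question, straight from a first analysis course (this is just the extreme value theorem, nothing specific to neural nets): if the total error is a continuous function of the parameters and the parameters range over a compact set, then a minimizer exists.

    % Extreme value theorem: a continuous function on a compact set attains its minimum.
    % Here E is the total error as a function of the parameter vector \theta.
    \[
      E : K \to \mathbb{R} \text{ continuous}, \qquad K \subseteq \mathbb{R}^{n} \text{ compact}
      \quad \Longrightarrow \quad
      \exists\, \theta^{*} \in K \ \text{such that} \
      E(\theta^{*}) = \min_{\theta \in K} E(\theta).
    \]

On a non-compact domain (say, all of R^n, with no bound on the weights), the infimum can genuinely fail to be attained, which is exactly the "ew" above.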

One argument I've heard against the "brute force" accusation (strong word choice much?) uses the example of image processing: how big is the configuration space for an image, and do we really think a neural net is using all of that information?

configuration space of a 720p rgb image

If you look at just one pixel, you have to specify its red, green, and blue components, each of which can take an integer value between 0 and 255 inclusive. That gives 256^3 = 16,777,216 different colors that each pixel can represent.

If you have 1280 by 720 pixel images, then you get 921,600 pixels total. So, for a 720p rgb image, we're looking at 16,777,216^921,600 configurations. Which is a lot. For context, there are apparently about 10^82 atoms in the observable universe [1]. So, do we really think neural nets are working through every conceivable image and updating accordingly?
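
Just to double-check those numbers (plain Python, nothing fancy; the rough figure for the image count is my own arithmetic):

    import math

    colors_per_pixel = 256 ** 3   # 16,777,216 possible RGB values for one pixel
    pixels = 1280 * 720           # 921,600 pixels in a 720p frame

    # The image count itself (16,777,216 ** 921,600) has millions of digits,
    # so work with its base-10 logarithm instead of the number itself.
    log10_images = pixels * math.log10(colors_per_pixel)
    print(f"{colors_per_pixel:,} colors per pixel, {pixels:,} pixels")
    print(f"distinct 720p rgb images ~ 10^{log10_images:,.0f}")  # about 10^6,658,000
    print("atoms in the observable universe ~ 10^82")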

Well, no one is training neural nets on 16,777,216^921,600-image training sets. But usually the images of interest do not span that whole configuration space; they probably cluster in a couple of areas. Perhaps if you restrict to just those areas, neural nets are basically brute-forcing it.

So maybe this argument is a bust? I can't really finish it satisfactorily at this point in time.

looking at the algorithm of neural nets

An argument that makes more sense to me for why neural nets are not brute force: a neural net is actually tracing a path through its parameter space, looking for a local minimum of a cost function. That in no way implies it is trying every conceivable combination of parameter values. It's only looking at (a discrete sampling of) a 1-dimensional path through a parameter space with far more than one dimension! Neural nets don't always even reach the optimal configuration, because they can get stuck in local minima of the cost function. Brute force would at least have found the global minimum (again, if it exists...)!
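
Here's the same toy two-parameter network from the brute-force sketch above, but trained the way a neural net actually trains: following the (here, numerically estimated) gradient downhill from a starting point. The learning rate, step count, and starting point are all made up:

    import numpy as np

    def tiny_net(x, w, b):
        return np.tanh(w * x + b)

    xs = np.linspace(-1.0, 1.0, 50)
    ys = np.tanh(2.0 * xs + 0.5)  # same toy targets as before (w=2, b=0.5)

    def error(w, b):
        return np.mean((tiny_net(xs, w, b) - ys) ** 2)

    w, b = 0.0, 0.0      # arbitrary starting point
    lr, eps = 0.3, 1e-5  # made-up learning rate and finite-difference step
    path = [(w, b)]
    for _ in range(500):
        # Numerical gradient of the error with respect to each parameter.
        dw = (error(w + eps, b) - error(w - eps, b)) / (2 * eps)
        db = (error(w, b + eps) - error(w, b - eps)) / (2 * eps)
        w, b = w - lr * dw, b - lr * db  # one small step downhill
        path.append((w, b))

    print(f"evaluated the error along {len(path)} points of one path;"
          f" final error {error(w, b):.5f}")

The brute-force version had to evaluate every one of its 3,721 grid points; this one only ever sees the parameter settings along the single path it traces, and with an unlucky starting point or a bumpier cost surface it could settle into a merely local minimum, which brute force never would.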

in conclusion...

Neural nets are not brute force. They may seem like it because you start a neural net with "no" (i.e., comparatively little) information about the system it's predicting and then throw a whole lot of training data at it to make it work. And some neural nets do work really, really well. But even though they're "slow" to train, they're not as slow as brute force. And even though they can work really, really well, there are rarely (never? only sometimes?) guarantees that they produce (or even approach) optimal parameters, whereas brute force would be perfect in that regard.

And "brute force" isn't even a good analogy for neural nets! It's more like the neural nets make fewer assumptions, and so they need more training than models that make more assumptions (and thus start off kind of "pre-trained"). They're still a kind of heuristic, without all the glorious guarantees of optimal parameters and horrendously long computation times that come attached to brute force. 
