Thursday, March 15, 2018

my introduction to neural networks

The first encounter: way back in 2010, my advisor told me neural networks were slow and couldn't accomplish complicated behavior. No comment on whether that was true then or is true now.

From my second discussion about neural nets I "learned" that neural nets are essentially brute force. 

Darryl Wade was my third neural nets discussion partner and the first person to describe neural nets to me as a rich topic, the first to explain them as more than a trendy black box. Thanks, Darryl.

what's a neural network?

Say you have some data and some features you want to extract from that data. If you put together a training set of data (input data + their corresponding features/ground truth) you can (try to) set up a neural network to do your feature extraction for any data you collect in the future.

The setup is this: the neural network is made of nodes organized into "layers." You have some input nodes, which map into the first layer of nodes, which map into the next layer of nodes ... which map into the output nodes. Nodes in the same layer do not map into each other.

A deep neural network just has more than one layer of internal/hidden/latent (not input or output) nodes.

Each node takes in input, does something non-linear to it, and passes the result on to the next node. The parameters of the neural net are what define the non-linear functions in each node. When you put training data through the neural net you compare the output (which represents the features) to the ground truth and use the difference of the two to tweak the parameters of the neural net.
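A toy sketch of that layered structure in Python. Everything here is an arbitrary choice for illustration, not a claim about how it must be done: two inputs, one hidden layer of three nodes, one output, tanh as the "something non-linear," random weights as the parameters.

```python
import numpy as np

# Hypothetical tiny network: 2 inputs -> 3 hidden nodes -> 1 output.
# The weights and biases are the parameters; tanh is one common
# choice of nonlinearity among many.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((3, 1))]
biases = [np.zeros(3), np.zeros(1)]

def forward(x):
    """Push an input vector through each layer in turn."""
    for W, b in zip(weights, biases):
        x = np.tanh(x @ W + b)  # linear step, then the nonlinear "something"
    return x

output = forward(np.array([0.5, -1.0]))
print(output.shape)  # (1,)
```

Note that nodes in a layer only see the previous layer's outputs, matching the no-maps-within-a-layer rule above.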

Ok, but really, how do you tweak those parameters? As it trains, the neural network traces a path through the parameter space using hill climbing. The hill climbing is minimizing a cost function (the distance between output and ground truth). Apparently this has a fancy name ... backpropagation. Why do we need a fancy neural nets name for hill climbing? This idea will need more exploration.
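Here's a toy sketch of what that tweaking loop could look like. Caveats: the "net" is a stand-in with just two parameters (a slope and an intercept), and the gradient is computed by brute-force finite differences rather than backpropagation (backpropagation computes the same gradient, just far more efficiently).

```python
import numpy as np

def cost(params, x, y_true):
    """Distance between the net's output and the ground truth."""
    y_pred = params[0] * x + params[1]      # stand-in two-parameter "net"
    return np.mean((y_pred - y_true) ** 2)

def step(params, x, y, lr=0.1, eps=1e-6):
    """One hill-climbing step: estimate the gradient, move downhill."""
    grad = np.zeros_like(params)
    for i in range(len(params)):            # finite-difference gradient
        bumped = params.copy()
        bumped[i] += eps
        grad[i] = (cost(bumped, x, y) - cost(params, x, y)) / eps
    return params - lr * grad               # downhill on the cost

x = np.array([0.0, 1.0, 2.0])
y = 2 * x + 1                               # ground truth: slope 2, intercept 1
params = np.zeros(2)                        # an initial-conditions choice
for _ in range(500):
    params = step(params, x, y)
print(params)  # approaches [2, 1]
```

Even in this tiny version, the initial conditions, the step size `lr`, and the cost metric are all choices you have to make, which is exactly the list below.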


questions that arise from this "definition" 

For a system that I originally thought was grab and go (just set up your cost function and flip the switch!), there sure are a lot of choices to be made before you start training. How does choosing all of the following affect the predictive value of the final parameters and the time needed to reach parameters that are "good enough?"
  • initial conditions
  • ordering of training data
  • step size for hill climbing (once you've picked the direction you're going to move in the parameter space, how far do you go?)
  • nonlinear functions in each node
  • metric for comparing neural net output to training data ground truth
  • And what might interest me the most (as of now): topology of the neural net
And there are probably more ... 
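Just to convince myself these knobs matter: here's the step-size choice biting on the simplest possible cost function, f(w) = w², which has nothing to do with a real net but makes the point.

```python
def descend(step_size, steps=50, w=1.0):
    """Hill-climb down f(w) = w**2 from w = 1.0."""
    for _ in range(steps):
        w -= step_size * 2 * w  # gradient of w**2 is 2w
    return w

print(descend(0.1))  # converges toward 0
print(descend(1.1))  # overshoots every step and diverges
```

Same cost, same starting point, and one step size finds the minimum while the other blows up.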

ideas for future posts and more questions 
  • Neural nets are not brute force. 
  • How does backpropagation work and what problem is it fixing? 
  • What are other ways to use neural nets? 
  • What do the layers do? hierarchical feature extraction?
  • What happens when the topology of the neural net has directed cycles?
  • Can you start with an overkill neural net (lots of layers and lots of nodes) and then prune it?
  • Are there ever substitutions/refactorings of the nodes that give you the same output with related response to training? Maybe you could condense a neural network or refactor it into one that's better understood. 
