Sunday, April 1, 2018

controllability (and observability) (and identifiability?) for neural nets

I used the word "controllability" while writing another post, and it brought an interesting thought to mind: controllability is a technical term. <blank> controllability is the idea that, from any initial condition, you can reach any point in the <blank> space in finite time. <blank> could be filled in by "state," "output," and maybe other spaces associated with your system. You reach the desired point in the <blank> space by changing only the inputs to the system that you have control over.

Controllability is a row-rank type of condition in LTI (and LTV?) systems, so there should be a dual to it that's a column-rank condition... the internet says it's observability. Presumably it means that you can determine the input or state from just the outputs? Yeah, that sounds right, both in terms of the name "observability" and in terms of the technical duality to controllability.
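To make that concrete, here's a minimal numpy sketch of the Kalman rank conditions (the double-integrator example and the function names are mine, just for illustration): (A, B) is controllable iff the matrix [B, AB, ..., A^(n-1)B] has full row rank n, and (A, C) is observable iff stacking C, CA, ..., CA^(n-1) gives full column rank n. The duality is literal: observability of (A, C) is controllability of (A^T, C^T).

    import numpy as np

    def ctrb(A, B):
        # controllability matrix [B, AB, ..., A^(n-1) B]
        n = A.shape[0]
        return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

    def obsv(A, C):
        # observability matrix [C; CA; ...; C A^(n-1)]
        n = A.shape[0]
        return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

    A = np.array([[0.0, 1.0], [0.0, 0.0]])  # double integrator
    B = np.array([[0.0], [1.0]])            # force input
    C = np.array([[1.0, 0.0]])              # measure position only

    print(np.linalg.matrix_rank(ctrb(A, B)))          # 2: full row rank, controllable
    print(np.linalg.matrix_rank(obsv(A, C)))          # 2: full column rank, observable
    print(np.allclose(obsv(A, C), ctrb(A.T, C.T).T))  # True: the duality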

What does controllability have to do with machine learning? Well, it might be nice to know, for example, what the possible outputs are for a neural network (output controllability). That might depend on neural net topology or initial conditions in an interesting way and might inform how we choose those topologies and initial conditions. State controllability might be useful for tuning or pretraining in deep learning. Perhaps state controllability could guarantee that we could tune/pretrain the neural net in a modular fashion, tuning/pretraining one layer at a time? 
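As a toy illustration of why the reachable output set depends on topology (this example is my own, not a standard construction): a net whose final activation is a ReLU can only ever emit nonnegative values, so some outputs are unreachable no matter what input you feed it.

    import numpy as np

    def tiny_net(x):
        # one hidden ReLU unit, then a ReLU output unit;
        # the weights are arbitrary made-up numbers
        h = np.maximum(0.0, 1.0 * x + 0.5)
        return np.maximum(0.0, 1.5 * h)

    xs = np.linspace(-10.0, 10.0, 10001)
    ys = tiny_net(xs)
    print(ys.min(), ys.max())  # min is 0.0: negative outputs are unreachable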

Similarly, could observability be useful? If a system couldn't tell the difference between inputs that you want distinguished, then that would discount the associated neural net topology and initial conditions as plausible for the application. So that's input observability. State observability could have some uses for interpretability. Though... just because you know what the state is doesn't mean it's interpretable by a human, which is the real point of interpretability.
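The failure mode is easy to exhibit (again, just my own toy): a single ReLU maps every non-positive input to zero, so from the output alone those inputs are indistinguishable.

    import numpy as np

    relu = lambda x: np.maximum(0.0, x)
    print(relu(-3.0), relu(-0.1), relu(0.0))  # all 0.0: these inputs can't be told apart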

Can we use controllability (observability) as a way to analyze neural net topology and initial conditions, as a preliminary check to see if they could possibly be appropriate for the application? Or to find the smallest/simplest possible neural nets that could theoretically solve our problems?

Another nice property of a system is identifiability. (Global) identifiability is the idea that, given enough data (any desired inputs and the corresponding outputs from the system), you can determine all of the parameters of the system. There are different relaxations, like local identifiability, which means that, given enough data, you can narrow the system parameters down to a finite number of possibilities.
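One standard way to check local identifiability is to look at the rank of the sensitivity (Jacobian) matrix of the outputs with respect to the parameters; rank-deficiency signals that some combination of parameters can't be pinned down. Here's a minimal sketch with two toy models of my own making:

    import numpy as np

    xs = np.array([0.5, 1.0, 2.0, 3.0])  # sample inputs

    # Model 1: y = (a*b)*x. Only the product a*b ever shows up in the
    # output, so a and b can't both be recovered.
    # Sensitivities: dy/da = b*x, dy/db = a*x.
    a, b = 1.3, 0.7
    J1 = np.column_stack([b * xs, a * xs])
    print(np.linalg.matrix_rank(J1))  # 1 < 2 params: not locally identifiable

    # Model 2: y = a*x + b. Sensitivities: dy/da = x, dy/db = 1.
    J2 = np.column_stack([xs, np.ones_like(xs)])
    print(np.linalg.matrix_rank(J2))  # 2: locally identifiable

No amount of data separates a from b in the first model, because the data only ever constrains their product.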

I've come across identifiability through algebraic statistics, which is a field that... isn't really used in practice yet, as far as I know (says someone from topological data analysis... lol. Though I think persistent homology is used by some statisticians, the work on persistence landscapes to make (finite-dimensional) feature vectors out of persistence diagrams is a step in the right direction, and currently there are a bunch of people working towards getting persistent homology fed into machine learning algorithms). I've gotten the impression that identifiability is a theoretical result that is too technically difficult to use in practice. But identifiability seems to be necessary for state observability, so it should in practice be easier to establish than observability, right? Well, generalizing to observability might make the important facets of the problem clearer, so it could seem easier to humans, but... it's not a good sign.

So, I guess, in principle, we could ask: Can we use identifiability as a way to analyze neural net topology and initial conditions, as a preliminary check to see if they could possibly be appropriate for the application?

For identifiability, my hopes are not high, nor are my expectations. Which might imply that controllability and observability aren't nice to work with, either. I think I've mostly seen them in linear systems, but... a lot of machine learning is linear algebra, right?

Came across another term: accessibility. It's weaker than controllability. But maybe I'll look into it another time...
