Thursday, April 5, 2018

what i do right now

There are a lot of different ways to describe what most people do. It's about the context you choose. I'll try a few different contexts here as I try to describe what I do.

feature extraction

Features are measurable properties of data. You might have a picture, and a feature might be how many humans are in the picture. You might have a song, and a feature might be the key, or the tempo, or a chord progression. A "good" set of features gives important information (whatever "important" means); doesn't have overlap, meaning no two features represent the same information; and can differentiate between the data points being studied.

The important thing to notice about features is that data is not always in a form that makes a given feature easily accessible. Data doesn't necessarily describe features of interest directly; they might be indirectly hinted at in complicated ways. This is unfortunate because statistics generally deals with explicit data, not the hidden nuggets of information that we may really want to work with.

For example, a picture is a bunch of color data. It explicitly states how much red, blue, and green is at each pixel. What does that tell a human vs a computer about how many dogs are in a picture? Humans are good at extracting the number of dogs from pictures, and we can correctly identify the number for lots of pictures. If you run a basic statistics algorithm on image data, though, it's going to do all of its statistics on colors, not numbers of dogs. A computer needs a feature extractor to first find the number of dogs in each picture in a data set and explicitly state that number in the data (this is going to be some sort of computer vision algorithm). Then that information can be sent to statistical or machine learning techniques for more analysis.
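Here's a toy sketch of that difference in Python. Everything in it is made up for illustration: the image is a tiny grayscale grid rather than real RGB data, and `count_blobs` is a hypothetical stand-in for a real computer vision algorithm (it just counts connected dark regions, which is nowhere near counting dogs). The point is only that a statistic on the raw pixels and an extracted feature are different kinds of numbers.

```python
from collections import deque


def count_blobs(image, threshold=128):
    """Count 4-connected regions of pixels darker than `threshold`.

    A toy feature extractor: it turns raw pixel values into one explicit
    number that later statistics can use.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold or seen[r][c]:
                continue
            # Found a dark pixel that belongs to no known blob: flood-fill it.
            blobs += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return blobs


# Made-up 4x5 grayscale image: 0 is black, 255 is white.
image = [
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
    [255,   0, 255, 255,   0],
    [255, 255, 255, 255,   0],
]

# A basic statistic computed directly on the explicit data (pixel values).
mean_brightness = sum(map(sum, image)) / (len(image) * len(image[0]))

# The extracted feature, now stated explicitly.
blob_count = count_blobs(image)

print(mean_brightness, blob_count)
```

The mean brightness says something about the colors; the blob count is the kind of number you would actually hand to a statistical or machine learning technique.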


my work in terms of feature extraction

I work on a feature extractor. It extracts qualitative (and some quantitative) information from point clouds (it can also be used on other types of data sets, but that won't be covered here). The problem with this feature extractor is that it outputs information in the form of a module, which is a mathematical object. As far as I know, there are no statistical or machine learning techniques designed for module data, so we need to translate that module data into acceptable input for statistics/ML techniques. That's what I'm doing.


my work in terms of mathematics

From a mathematical point of view I'm looking for invariants of poset modules. Invariants are features that don't change when you make certain alterations to your module. The alterations -- which we call isomorphisms -- don't change the inherent information of the module, but they do change how the module is presented. It's an idea similar to reducing fractions. \(\frac{2}{4}\) and \(\frac{1}{2}\) are two fractions that we've presented or written down differently, but they actually represent the same quantity. Here the isomorphism is reduction of fractions, and the invariant is the actual quantity or value of the number.
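The fraction analogy can be written out as a small Python sketch. This is only the analogy, not anything about poset modules: the tuples and the `value` helper are names I've made up for illustration, and `fractions.Fraction` from the standard library does the reducing.

```python
from fractions import Fraction


def value(numerator, denominator):
    """The invariant: the quantity a (numerator, denominator) pair represents."""
    return Fraction(numerator, denominator)


# Two presentations, stored as plain pairs so nothing gets reduced yet.
a = (2, 4)
b = (1, 2)

presentations_differ = a != b          # the written forms are not the same
same_value = value(*a) == value(*b)    # but the invariant agrees

print(presentations_differ, same_value)
```

Comparing the pairs directly compares presentations; comparing their values compares the thing that reduction leaves unchanged.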



Can we design a learning algorithm that learns which features are important? In principle I would say that neural nets already do this, but hand-picked feature extraction and selection still seem to be a big part of training a learning algorithm to work correctly. ML and neural nets don't automate data analysis. You still have to choose your features and pick a model, to some extent -- it's just the parameters of that model that get learned.
