My Projects

These are projects that I work on. Each one is like a little science baby to me. *Sniff*.



I painted this optimization surface on the wall of a former apartment. And I got my deposit back!

Protein Identification with Mass Spectrometry

Imagine you built some cool things with Legos and then a friend took them apart, leaving small chunks together. Then later you're trying to remember what you built. You browse through the Lego manual and find several pieces in chunks together that should only be there if you built the X-Wing fighter and a chunk that could have come from Robinhood's castle or from the X-Wing, but you don't find any other pieces that could have come from Robinhood's castle. And so you guess that you built the X-wing.

This is essentially how mass spectrometry-based proteomics works. Proteins, unlike DNA, are not shaped like a big zipper, and so DNA sequencing tricks cannot be used to sequence them (see DNA sequencing primer).. This is especially unfortunate because proteins are actually responsible for almost everything that goes on in a cell, so we're very interested in knowing what proteins are there. One way to do this is to chop up the proteins and sequence (imperfectly) the little pieces, and then try to put them back together to see what proteins were originally there. This works very well, but unfortunately, the problem of putting them back together is still not solved very well, and so we're not very good at finding proteins (yet!).

I'm making algorithms that intelligently decide what proteins are in a sample by looking at pieces of these proteins and putting them back together.

Many past approaches have focused on making lots of simplifying (and often inaccurate) assumptions about the way that your friend took the Legos apart, and then using that to try to "untake them apart." These assumptions are necessary, as taking things apart is always going to be easier than "untaking apart." As they say, you can't turn a sausage grinder backwards and get a pig.

My recent work effectively focuses on having thousands of friends hep me (I know, I know, I just accepted friend requests from bands on myspace) build random things from the Lego manual take the Legos apart over and over and over again, progressively giving more time (and Legos!) to the friends that get pieces that look a lot like the pieces you actually saw. This only requires taking things apart many times. Taking things apart requires fewer assumptions and allows us to build a much better model.

A Fast Randomized Algorithm for Linear/Quadratic Programming

Linear programming problems are basically problems where you're trapped inside a bunch of walls that make a polyhedron (like an oddly-shaped room, a cut gemstone, or a soccer ball with flattened-out facets), and you want to climb to the highest point inside. One way to do this is to climb up the vertices (they would look like corners of a room in 3-space). Other methods sit in the center of the polyhedron and make a huge elliptical balloon that fills it up, and then go toward the highest point on the ballon, and repeat this over and over. These things take time, especially when you consider you're in n-space, which is considerably more tricky than 3-space (*ahem*, when n ≥ 4).

My approach throws all caution to the wind (ask anyone, and they will tell you I have no caution anymore; it has all been dashed to the wind) and takes large random steps up the polyhedron. It would be like pointing a laser pointer randomly upward in the room, finding the wall that it hits, lying flat on that wall (much like Spider-man) and doing it again (so that the ray of light is flat along the wall), and repeating this over and over again. And strangely enough, it climbs very fast, because it effectively randomly samples from the vertices (again, which are like corners in the room) above you. If everyone in a room were given some amount of money, and you were able to repeatedly switch amounts with someone that has more than you, it wouldn't be long before you have the most money in the room. Any time you make a good switch, you'll never have to worry about switching back with anyone poorer than you again (plebeians!). And you can sometimes be very luck and start by switching with an extremely wealthy person early on, and then there are so few people left to switch with, you're bound to get the most money very soon.

This algorithm is often much, much faster than the other approaches. I'd rather be lucky than good!

Svmvia

Svmvia is a project that helps SVM classification (see classification primer). SVMs operate by dividing examples with a hyperplane in a transformed space, but there is a trade-off between whether you want very accurate classification or classification where you feel confident in every decision you make. Imagine you're drawing a line to separate two distantly separated clouds of X's and O's, with the exception that there is one single X right next to the O's. Should you draw the line so that all X's are on one side and all O's on the other, or should you draw the line so that the one stray X is on the wrong side, but the dividing line creates a huge chasm between the two populations of points? In short, make confident decisions where you know you'll make some mistakes, or make inconfident decisions where you don't make any mistakes (but aren't sure you won't make mistakes later)? The best choice is, as you would guess, in the middle.

Svmvia quickly finds the best comprimise, tells you, and then you can get a cup of coffee.