“OK,” I said, “we can do this. All I need is a librarian, an architect, and a planetary scientist.”
But let’s start from the beginning.
Proteins. They are awesome. In the human body, proteins do all the work. They hold things in place, they move things, they make things, they destroy things. They can store and transmit information (and those that do are my favourites). But how might a protein store information?
In very general terms, proteins can store information by changing the state they are in (which, in turn, changes their function). Some, for instance, have different shapes with distinct functions (think Transformers). Some can undergo small chemical modifications. Some can associate with other molecules. Some can move around between different locations within the cell and fulfil different functions there. Some can do several of these things at once. These are, to use the proper scientific term, the coolest of all proteins. The way they function is so complex that we have a hard time predicting their behaviour without using computational models. Yet at the same time, we also have a hard time building computational models in the first place.
Why is that? Say you have a protein that can exist in two different shapes and be chemically modified (or not). If you combine those, that’s 4 possible states (Shape A and modified, Shape A and not modified, Shape B and modified, Shape B and not modified). Add another possible option (for instance, the ability to exist in two distinct parts of the cell), and you get 8 states. Add another one, and you get 16. Soon, the number of possibilities gets way out of hand. We call this “Combinatorial Explosion”. I like it, because it has the word “Explosion” in it. But it also makes my work sort of difficult. Because traditionally, the modelling tools and algorithms we have require us to list every possible state and every possible change that each of those states can undergo. And this is a problem if we have a lot of them.
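To see the doubling in action, here is a minimal Python sketch. The feature names and options are purely illustrative (they are not taken from any real model), and it assumes every feature is independent and has exactly two options, which is a simplification.

```python
from itertools import product

# Hypothetical features, each with two options (illustrative names only).
FEATURES = {
    "shape": ("A", "B"),
    "modification": ("modified", "not modified"),
    "location": ("location 1", "location 2"),
}


def enumerate_states(features):
    """List every combination that picks one option per feature."""
    names = list(features)
    return [dict(zip(names, combo)) for combo in product(*features.values())]


def count_states(n_binary_features):
    """Number of states for n independent two-way features."""
    return 2 ** n_binary_features


states = enumerate_states(FEATURES)
print(len(states))  # three two-way features give 8 states

# Each extra two-way feature doubles the count.
for n in (2, 3, 4, 30):
    print(n, count_states(n))
```

Under these assumptions, thirty two-way features would already give more than a billion states, which is why writing each state out by hand stops being an option very quickly.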
How many is a lot? For instance, during my PhD I worked on a model of a neuronal protein that included one billion possible states, that’s 10⁹ (a one followed by 9 zeros). During my postdoctoral work, I wanted to extend that model a bit. The extended model would have required around 10²⁰ lines of code, that’s a hundred quintillion. And that is … well, too much. I can’t have a model with 10²⁰ lines of code. Not just because I am lazy, but because that would be a lot of lines. How many exactly? What does 10²⁰ mean?
Well, if the average book has around 200 pages and around 50 lines per page, that would mean there are around 10 000 lines per book (or 10⁴). So I would need 10¹⁶ books. What does 10¹⁶ books mean?
Well, the British Library has 25 million books, the Library of Congress has around 29 million. Let’s be generous and say it has 100 million, i.e. 10⁸ books. So, I would need around 10⁸ Libraries of Congress to house my code. What does 10⁸ Libraries of Congress mean?
Well, the largest building of the Library of Congress is around 2 million square feet, which (in real money) is about 2×10⁵ square meters. For 10⁸ Libraries of Congress, I would need something on the order of 10¹³ square meters, or tens of millions of square kilometers. What does that mean?
Well, the moon has a surface area of 37.9 million square kilometers, so that’s pretty close. In other words: I would basically have to cover the surface of the moon with libraries in order to hold all the books that hold all the code that describes my model.
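For anyone who wants to check the back-of-the-envelope arithmetic, here is a short Python sketch. All the figures are the rounded ones used above; the variable names are my own, and it assumes each library takes up only the floor area of that one largest building.

```python
# Rounded figures from the text (order-of-magnitude estimates only).
LINES_OF_CODE = 10**20          # lines needed for the extended model
LINES_PER_BOOK = 200 * 50       # 200 pages x 50 lines = 10 000 lines
BOOKS_PER_LIBRARY = 10**8       # the generous 100 million books
LIBRARY_AREA_M2 = 2 * 10**5     # about 2 million square feet
MOON_AREA_KM2 = 37.9e6          # surface area of the moon

books = LINES_OF_CODE // LINES_PER_BOOK    # 10**16 books
libraries = books // BOOKS_PER_LIBRARY     # 10**8 libraries
area_m2 = libraries * LIBRARY_AREA_M2      # 2 x 10**13 square meters
area_km2 = area_m2 // 10**6                # 1 km^2 = 10**6 m^2

fraction_of_moon = area_km2 / MOON_AREA_KM2

print(f"books needed:     {books:.0e}")
print(f"libraries needed: {libraries:.0e}")
print(f"area needed:      {area_km2:,} square kilometers")
print(f"share of moon:    {fraction_of_moon:.0%}")
```

With these rounded inputs the libraries cover roughly half of the moon, so "pretty close" holds in the order-of-magnitude sense.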
“OOOOOOK,” I said, “we can do this. All I need is a librarian, an architect, and a planetary scientist.”
“Wait a minute,” said my supervisor, “we don’t have a budget for that.”
To be continued