Disease should beware – LBMs have arrived

Humanity’s Michael Geer explains what Large Biological Models are and why they are the next giant leap in health.

Beware the Ides of March

Last Friday was the Ides of March, a day that should be marked as a turning point in our communal fight to stay healthy. The efforts of thousands of people and trillions of dollars have been spent on trying to win our war against chronic disease. Our loved ones and heroes have first lost the joy in their lives and then life itself. My beloved dad, James Geer, died on this day in 2019. Every year, for weeks around this time, I unhealthily repeat to myself “Beware the Ides of March”. I don’t like that. I don’t want to do it anymore. I intend, on this forsaken day, to shine a powerful spotlight on an idea that many are already working on and that I believe is the turning point we have all been waiting for.

100 years of hard work and failure

Unfortunately, most objective measures show we haven’t made much of a dent in chronic disease. Sure, we make “breakthrough” after “breakthrough”, but we don’t see our loved ones losing their lives at a much slower rate. As most of us have read in one article or another, we have had two real wins: our major victory against infection with antibiotics nearly a century ago, and some admittedly clever surgical techniques that, over the years, have saved a small percentage of us from major physical defects or damage, whether present at birth or acquired in life. Beyond those, we have not lengthened our health spans much at all in over a century of earnest, full-on scientific effort.

The turning point

But, and yes, there luckily is a but, this is the beauty of turning points. They are not overnight successes. They come from years of innovation and belief and failure and, yes, a decent amount of dumb luck (or is it fate?). It is my strong belief that the massive first domino has now finally fallen, with the mainstream success of and massive funding now piling into LLMs (Large Language Models; one very popular example, ChatGPT, is an application built on top of OpenAI’s LLM). And what I propose today, to commemorate the Ides of March, is this…

I propose we focus and adjust and, in some lovely cases, continue many of our efforts towards the building of a new variant of Large Model. The Large Biological Model – aka an LBM. Whereas an LLM is trained on large and varied datasets of words (L for Language), an LBM is trained on large and varied datasets of biological measurement values (B for Biological values).
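
To make the words-versus-measurements contrast concrete, here is a minimal sketch of one possible way a person's biological measurements could be turned into a word-like sequence that a model could be trained on. Everything in it is an assumption made for illustration: the marker names, the bin edges (which are not clinical reference ranges) and the helper functions are hypothetical, and binning is only one of several ways numeric values could be encoded.

```python
from bisect import bisect_right

# Hypothetical bin edges per marker. Illustrative values only,
# NOT clinical reference ranges and not taken from any published model.
BIN_EDGES = {
    "glucose_mmol_l": [4.0, 5.6, 7.0],
    "crp_mg_l": [1.0, 3.0, 10.0],
}


def measurement_to_token(marker: str, value: float) -> str:
    """Map one numeric measurement to a discrete token, e.g. 'crp_mg_l:bin0'."""
    edges = BIN_EDGES[marker]
    # bisect_right counts how many edges are <= value, which is the bin index.
    bin_index = bisect_right(edges, value)
    return f"{marker}:bin{bin_index}"


def record_to_tokens(record: dict) -> list:
    """Turn one person's measurements into a token sequence, ordered by marker name."""
    return [measurement_to_token(marker, record[marker]) for marker in sorted(record)]


example = {"glucose_mmol_l": 5.1, "crp_mg_l": 0.4}
print(record_to_tokens(example))
```

Once measurements look like sequences of tokens, the same kind of training used for language could in principle be pointed at them. Whether binning throws away too much information is exactly the sort of open question this article is inviting help on.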

Coining the term Large Biological Model – LBM

So hopefully at this point, I have piqued some geeky and visceral intellectual curiosity in you. Also, I am fairly certain you now have many more questions than answers. Let me then take this moment to set your expectations correctly. I write you today without all the answers. I also would be remiss if I led you to believe that I invented any of this. There are also other traditional terms that have been used to describe similar things, like “foundation/foundational models”. I am, however, here to coin the term Large Biological Model, and am doing so because I believe it is actually fundamental to the massive leap we are about to take in health. We should follow closely the path of LLMs and therefore naming these models LBMs is actually quite key for that to materialise. Let me explain why following the LLM path, with all its nuanced and evolving genius, is so important.

LLMs went against all conventional wisdom

LLMs actually represent a much larger departure from status quo thinking than most truly appreciate. It is also probably important to add that most of us, myself included until very recently, didn’t quite understand what LLMs are under the hood.

Side note: I will not attempt to fully explain here how LLMs are made and what makes them work, but there is an amazing (I do not use that word lightly) explanation video made by Andrej Karpathy. It is well worth the 40 minutes of your time and, I feel, hits that magical balance of being useful for both highly technical and “I can just about handle joining a Zoom call” folks.

LLMs at their base are large files of parameters (weightings) that capture how words are related to each other. They are built by training neural networks on massive amounts of more or less arbitrary text from the web pages of the Internet. The important concept here is that they are not trained on specific information or specific websites. They are trained on fairly arbitrary information and websites, but just a lot of it. This is NOT how most other AI/ML (Artificial Intelligence/Machine Learning) algorithms/models were trained in the past. The status quo thinking was always that the datasets that models were trained on must be clean and carefully curated. We have all heard the saying “garbage in → garbage out”. But something magical happened with LLMs.

It’s alive!

The LLM parameter files appeared, at first, to be fairly useless, as the conventional wisdom would assume. Just a very large file of weightings between randomly found words on the Internet. Thanks a lot for that (← sarcasm). However, when the teams then added in some finetuning using some specific text (e.g. some coding language samples, or some “properly” written business emails, etc.), something eerily magical happened. It was so eerie that many thought the thing had come alive. The massive LLM parameter file went from being useless to almost all-knowing. Cutting to the chase, the patterns in those parameters held a near complete understanding of all the meaning in the English language. But this only became visible when the model was applied directly to particular tasks with a small amount of finetuning data.

This is a very important discovery, as the effort to make sure all data is clean and curated is a major expense of time and money. Not needing that curation when building the base LBM parameter files will make a massive difference.

So what should we do?

Simply put, we have done language and now we must do biological measurements, meaning any non-word data that measures the human body in any way. And now we have the path laid out so neatly in front of us. We now know that we don’t need specific and specially curated data. It doesn’t matter. We need massive amounts of diverse biological measurement data, wherever we can find it. We need to start creating these Large Biological Model parameter files and releasing them for everyone in the world to build upon. By training LBM parameter files, we will bring to life a fairly magically complete understanding of human biology. And, importantly, just as LLMs have become a massive accelerating force in the world in the last 24 months alone, LBMs will likely do exactly the same in the health and science sectors.

Great! You first.

So I am not just coining a term and saying vaya con dios. We are far from the first, but we are putting our money where our mouth is on this. Humanity (the longevity company I co-founded with Pete Ward) spent some time and resources developing an innovative blood model that allows every blood test taken on earth to now deliver a Biological Age to the patient/customer. We had the honor of having that model peer reviewed and published in a Nature science journal. We did, however, do something not very status quo. We didn’t just publish our findings; we actually published all the parameters between all the blood markers measured on 300,000 people in the UK Biobank. Thus was born the seed of Humanity’s first LBM! We are now actively looking for collaborators, and already have a few ready to go, who will help us expand our training set and make our first LBM more and more robust.
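
For readers wondering what publishing the parameters makes possible, here is a minimal sketch of how a set of published weightings could be applied to someone's blood results to produce an age estimate. It assumes a simple weighted-sum model purely for illustration; the actual form of the Humanity model is not described in this article, and every marker name, weight and reference value below is a made-up placeholder, not a published figure.

```python
# Hypothetical parameters: placeholders for illustration only.
# These are NOT the published Humanity model parameters.
INTERCEPT = 40.0
WEIGHTS = {"albumin_g_l": -0.5, "creatinine_umol_l": 0.1}
REFERENCE = {"albumin_g_l": 45.0, "creatinine_umol_l": 70.0}


def biological_age(markers: dict) -> float:
    """Apply the (hypothetical) weighted-sum model to one set of blood results."""
    missing = set(WEIGHTS) - set(markers)
    if missing:
        raise ValueError(f"missing blood markers: {sorted(missing)}")
    # Each marker shifts the estimate in proportion to its distance
    # from its reference value.
    adjustment = sum(
        weight * (markers[name] - REFERENCE[name])
        for name, weight in WEIGHTS.items()
    )
    return INTERCEPT + adjustment


results = {"albumin_g_l": 43.0, "creatinine_umol_l": 80.0}
print(round(biological_age(results), 1))
```

The point of the sketch is that once the parameters are public, anyone can run them against new blood results or extend them with more data, without needing access to the original training set.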

If you upgrade to the Pro subscription level in the Humanity iOS app, you can also already upload any of your recent blood results, have them run through that published model, and get a more accurate Biological Age. Bringing scientific breakthroughs quickly to people is a core part of Humanity.

A call to action

There is so, so much more to say. For the sake of keeping this article readable in one sitting, I will end it here and write about more aspects of this over the coming weeks. Thank you so much to those who have already shared their feedback.

My call to action is this. I know many of you have been working on parts of this for years (as part of my follow-up posts, I want to highlight some of your great work, so please reach out). I know others of you reading this may not feel there is any way for you to contribute directly. I say to both groups of folks, and everyone in between: this is all of our work now. Think about how you can contribute a small part of this. That might be intellectually, by helping us think through the nuances of this better. That may be by opening up biological datasets that can be trained on to build up the LBM parameter file of either the Humanity LBM or others that will surely be released. That may even be because you have experience training LLM parameter files and now want to apply those skills to training an LBM.

There are a myriad of areas here where I personally do not have the experience or skills to anticipate the pitfalls or propose the solutions. Case in point: there are obvious differences between words and biological measurement data, which will no doubt complicate, or at least change, some of the process of training the base LBM parameter file. What I do know is that one or several of you, dear readers, will have those skills and other relevant ones, and together we will solve each issue.

Please reach out to me if you would like to collaborate.

Next year it will be disease’s turn to beware the Ides of March

All our LBM are belong to you. I am done being wary of the Ides of March. I think it’s disease’s turn to worry.

Michael Geer, Humanity Health co-founder

About Michael Geer

Michael Geer is the co-founder of Humanity, a platform that allows users to monitor how fast they are aging and shows them how to slow it down. A serial entrepreneur, Michael has built networks that reached over a billion users, including Badoo and AnchorFree.