neural multiplexer

emerging ai platform paradigms

there have been a lot of ai wearables in the news over the last 6-12 months. the two big, expensive, hyped launches were rabbit and humane. both flopped rather spectacularly. rabbit introduced something called a "large action model at the operating system level," but it turned out to be too slow. the humane founder couldn't articulate why his product was interesting. the most controversial launch, though, came recently: friend.com. the device is cheap ($99) and is meant to be an ai friend that listens to everything happening around you, responding and sending texts throughout the day. the founder is a prolific poaster, and spent a lot on the domain name, which was particularly controversial.

this launch was the most interesting to me because the thesis behind the product is pretty unique. i don't really know if the product is a good idea, but it's much more affordable and lower commitment, and it doesn't pretend to be a fancy new piece of hardware: front and center is a foundation model running in the cloud, not a "smart" device where the computational power lives on the device. basically, "cheap hardware, expensive model" is a take on ai wearables i haven't seen before.

the product made me think about how the ai application platform is being built. there's an interesting dynamic playing out: on one hand, you have large, powerful models that can only run in the cloud; on the other hand, you have tiny devices with far less computing power that can only run smaller models local to the user. it's a dichotomy between local inference and cloud inference: smart devices + dumb models, or dumb devices + smart models. it's a big reason why all the model providers are launching mini versions of their models.
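to make the tradeoff concrete, here's a minimal sketch, with hypothetical model names and made-up thresholds rather than any real provider's api, of how an application might pick a paradigm per request:

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    needs_privacy: bool      # e.g. data that shouldn't leave the device
    latency_budget_ms: int   # how long the user is willing to wait

# hypothetical placeholders for a small on-device model and a large hosted model
LOCAL_MINI_MODEL = "mini-model-on-device"
CLOUD_LARGE_MODEL = "frontier-model-in-cloud"

def choose_backend(req: Request) -> str:
    """smart device + dumb model, or dumb device + smart model, decided per request."""
    if req.needs_privacy:
        return LOCAL_MINI_MODEL    # the data never leaves the device
    if req.latency_budget_ms < 300:
        return LOCAL_MINI_MODEL    # no network round trip
    return CLOUD_LARGE_MODEL       # pay the latency for quality

print(choose_backend(Request("summarize my last conversation", needs_privacy=True, latency_budget_ms=1000)))
```

the point is that the choice is per-request, not per-product, which is exactly why having a mini and a full-size version of the same model is useful.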

nvidia is powering the cloud inference paradigm. apple and google have been building their own chips for local inference. in a way, meta is betting on local inference by releasing open source models that can run anywhere, even though meta serves those same models from its own cloud. the model strategy and approach to cloud vs local inference dramatically changes the distribution mechanism, which is why openai partnered first with microsoft and now with apple. of course, apple is still betting on local inference, since a lot of its ai functionality is powered by its own models.

so far there has been a lot of discussion about ai infrastructure and who the winners will be, but the winners in infrastructure will be determined at the application layer. the infrastructure platforms that meet the needs of the applications people actually use will end up winning, and i see two broad categories: consumer and enterprise. for consumer, i expect local inference to win because consumers are much more performance sensitive, and local inference avoids the network round trip; for enterprise, i expect cloud inference to win because correctness matters more in an enterprise context, and the most capable models only run in the cloud.

cloud inference is definitely more mature, but i think that is going to change as local and edge inference becomes more important for delivering high quality consumer experiences. there's a lot of opportunity in tooling for the ai application layer, and i haven't yet worked with a tool that feels natural; a lot of them still feel like overly complex abstractions over relatively simple functionality. models running in the cloud will need to interact with models running on a device, especially when the on-device model needs "more intelligence."
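here's roughly what i mean by that interaction; the function names and the confidence heuristic below are assumptions, not a real runtime or sdk. the on-device model answers when it can, and escalates to the cloud model, along with whatever context it already built up, when it needs more intelligence:

```python
# hypothetical stand-ins: local_generate for a small quantized model on the device,
# cloud_generate for a hosted frontier model behind an api
def local_generate(prompt: str) -> tuple[str, float]:
    """returns (answer, self-reported confidence) from the on-device model."""
    return "local draft answer", 0.42

def cloud_generate(prompt: str, device_context: str) -> str:
    """hands the request to the cloud model, along with what the device already knows."""
    return f"cloud answer, informed by: {device_context}"

def answer(prompt: str, confidence_threshold: float = 0.7) -> str:
    draft, confidence = local_generate(prompt)
    if confidence >= confidence_threshold:
        return draft    # cheap path: nothing leaves the device
    # escalation path: ship the prompt plus the local draft so the cloud
    # model doesn't start from scratch
    return cloud_generate(prompt, device_context=draft)

print(answer("plan a three-city trip for under $2k"))
```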

one of the biggest gaps i think we will need to fill is a framework for different ai applications to talk to each other. for example, i have my personal assistant ai agent, and delta has their travel agent that helps book and plan trips; my personal assistant agent will need to talk and plan with the travel agent. a naive approach would be for these agents to just communicate with each other in english, but that assumes the two models interoperate well. the context window of the conversation being passed between models will grow quickly, and that often causes llm inference to degrade. in reality, you probably want some sort of mediator between the agents that keeps the collaboration on track and ensures each agent can do its job properly. that is a pretty challenging problem to solve, especially as a reusable framework or sdk, but the solution is probably worth a lot of money!
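as a rough sketch of what a mediator could look like (the agent interface and the summarizer are hypothetical, not an existing sdk): the mediator relays turns and keeps a compressed shared summary, so neither agent has to ingest the full, ever-growing transcript.

```python
from typing import Callable, Protocol

class Agent(Protocol):
    name: str
    def respond(self, message: str, summary: str) -> str: ...

class Mediator:
    """relays turns between two agents and maintains a compressed shared summary,
    so neither agent has to ingest the full, ever-growing transcript."""

    def __init__(self, summarize: Callable[[list[str]], str]):
        self.summarize = summarize          # e.g. a small model that compresses the transcript
        self.summary = ""
        self.transcript: list[str] = []

    def relay(self, sender: Agent, receiver: Agent, message: str) -> str:
        self.transcript.append(f"{sender.name}: {message}")
        # the receiver sees only the latest message plus the compressed summary
        reply = receiver.respond(message, self.summary)
        self.transcript.append(f"{receiver.name}: {reply}")
        self.summary = self.summarize(self.transcript)
        return reply

# toy usage with stub agents standing in for real llm-backed agents
class StubAgent:
    def __init__(self, name: str): self.name = name
    def respond(self, message: str, summary: str) -> str:
        return f"{self.name} acks: {message}"

mediator = Mediator(summarize=lambda t: " | ".join(t[-2:]))   # stand-in for a summarizer model
assistant, travel = StubAgent("assistant"), StubAgent("delta-travel")
print(mediator.relay(assistant, travel, "book a flight sfo to jfk next friday"))
```

the design choice here is that the mediator, not the agents, owns the shared state: each agent only ever sees a short, bounded context, which is what keeps inference quality from degrading as the conversation grows.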