Interesting. But I’m curious about the performance.
A bigger LLM (Mixtral) already struggles to run on my mid-range gaming PC. Trying to run an LLM that isn’t terrible on a standard laptop wouldn’t be a good experience.
I have no idea how this is set up to work technically, but most of the heavy lifting is gonna be on the GPU. I’m not sure that it matters much whether the browser is what’s pushing data to the GPU or some other package.
The thought that the internet could become shitty enough that you need a GPU to browse it really frightens me. If we ever reach that point, it may well be to run an AI that filters out AI-generated spam, which would really depress me 😭
I mean, there was a point where an FPU was a separate chip and wasn’t the norm; now it’s built into the CPU.
I think it’s probably safe to say that, in the future, there will be broader use of parallel processing, since serial processing is running into fundamental limits on what we know how to do under existing laws of physics. That could wind up being part of the CPU. It could live on a separate piece of hardware, which may not necessarily be a “GPU”. Parallel processing hardware entered the PC because the most immediate need was 3D graphics rendering, but as you can see from the LLMs that people are running on GPUs today, that’s not the only application. The parallel compute accelerator cards that Nvidia is selling today for an arm and a leg on servers aren’t aimed at doing 3D graphics.
It may not be 3D graphics rendering or running LLMs that becomes the primary application. But I’d be reasonably comfortable saying that, down the line, there will be more parallel-processing hardware in computers than there is today.
Most people probably don’t have a dedicated GPU, and an iGPU is probably not powerful enough to run an LLM at a decent speed. Also, a decent model requires like 20 GB of RAM, which most people don’t have.
It doesn’t just require 20 GB of RAM, it requires that in VRAM, which is a much higher barrier to entry.
But what if you have an AMD APU? Doesn’t that use your normal RAM as VRAM?
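For a rough sense of where a figure like 20 GB comes from, here is a back-of-the-envelope sketch in Python. It only counts the weights (no context/KV cache or runtime overhead), and it assumes roughly 46.7 billion total parameters for Mixtral 8x7B. The function name `weights_size_gb` is made up for illustration, and real quantized files will differ somewhat from these numbers.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Ignores the KV cache, activations and any runtime overhead, so treat
    the result as a lower bound rather than an exact requirement.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~46.7 billion total parameters for Mixtral 8x7B.
MIXTRAL_PARAMS = 46.7e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weights_size_gb(MIXTRAL_PARAMS, bits)
        print(f"{bits}-bit weights: about {size:.1f} GB")
```

Under these assumptions, the 4-bit case lands in the low 20s of GB, which is in the same ballpark as the "like 20 GB" mentioned above.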
Not exactly. Most integrated chips have a small pool of dedicated VRAM, and then a bit more that they share with the system memory, though it’s generally only a portion, not all of it. As far as I’m aware, it’s only Apple’s unified memory, and maybe some other mobile chips, that have the CPU and GPU share a single memory pool entirely, for better or worse.
But it is worth noting that if you don’t have enough VRAM and have to put part of the model into RAM, the minimum expectation is that you have twice that amount of RAM available. So if you have a GPU with 4 GB of VRAM and a 20 GB model, and need to offload the extra 16 GB to the system, you don’t need 16 GB of RAM, you need 32 GB.
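To spell out that arithmetic, here is a minimal Python sketch. It reads the numbers above as a 20 GB model (the figure mentioned earlier in the thread) on a 4 GB card. The 2x factor is just the rule of thumb from this comment, not a measured or documented requirement, and `ram_needed_gb` is a made-up name for illustration.

```python
def ram_needed_gb(model_gb: float, vram_gb: float, overhead_factor: float = 2.0) -> float:
    """System RAM to budget for the part of a model that doesn't fit in VRAM.

    overhead_factor=2.0 encodes the "twice the amount" rule of thumb from
    the comment above; it is an assumption, not a verified requirement.
    """
    offloaded_gb = max(model_gb - vram_gb, 0.0)  # part that spills out of VRAM
    return offloaded_gb * overhead_factor


if __name__ == "__main__":
    # 20 GB model, 4 GB of VRAM: 16 GB spills over, doubled to 32 GB.
    print(ram_needed_gb(model_gb=20, vram_gb=4))
```

If the whole model fits in VRAM, the function returns 0, since nothing needs to be offloaded.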
Unlikely, at least on non-Nvidia chips. Even on AMD, it’s only the latest four chips that support it. Anything older isn’t going to cut it.
You also need a fairly big amount of VRAM for models like that (4 GB is the minimum for the common kinds, which is more than typical integrated graphics have, or 8 GB if it has to live in system memory). You can get by with system RAM, but the performance will be quite bad, since you’re either relying on the CPU or adding latency from data moving between system RAM and the GPU.