Typically you need about 1GB graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B parameter model. Ouch.
Edit: you can try quantizing it. This reduces the amount of memory required per parameter to 4 bits, 2 bits or even 1 bit. As you reduce the size, the performance of the model can suffer. So in the extreme case you might be able to run this in under 64GB of graphics RAM.
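Rough arithmetic behind those numbers, as a quick sketch (weights only; this ignores KV cache, activations and runtime overhead, which add more on top):

```python
# Back-of-the-envelope memory needed just for the weights of a 405B-parameter
# model at different quantization levels. Ignores KV cache, activations and
# framework overhead.
PARAMS = 405e9

for bits in (16, 8, 4, 2, 1):
    bytes_per_param = bits / 8
    gb = PARAMS * bytes_per_param / 1e9  # 1 GB = 1e9 bytes, matching the "1GB per billion params" rule of thumb
    print(f"{bits:>2}-bit: ~{gb:,.0f} GB")
```

At 1 bit that works out to roughly 51GB of weights, which is where the "under 64GB" figure comes from.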
When the 8-bit quants hit, you could probably lease a 128GB system on RunPod.
Can you run this in a distributed manner, like with Kubernetes and lots of smaller machines?
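Not Kubernetes-native out of the box, but something like vLLM can shard the model across GPUs and across nodes with tensor/pipeline parallelism (multi-node runs go through a Ray cluster, and Ray can itself be hosted on Kubernetes). A minimal sketch, with the parallelism sizes and model ID as illustrative assumptions rather than a tested setup:

```python
from vllm import LLM, SamplingParams

# Illustrative only: shard the weights across 8 GPUs per node (tensor parallel)
# and 2 nodes (pipeline parallel). Multi-node execution assumes a Ray cluster
# that every node has already joined before this script starts.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",  # assumed model ID
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```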
Or you could run it on CPU and system RAM at a much slower rate.
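For the CPU-and-RAM route, something like llama-cpp-python with a quantized GGUF works, just slowly. A sketch, assuming a GGUF quant of the model exists locally and fits in system RAM; the file name and thread count are placeholders:

```python
from llama_cpp import Llama

# Load a quantized GGUF entirely into system RAM; n_gpu_layers=0 keeps all
# layers on the CPU. Path and thread count are placeholders, and a 405B quant
# will still need hundreds of GB of RAM.
llm = Llama(
    model_path="llama-3.1-405b-instruct-q2_k.gguf",  # hypothetical local file
    n_ctx=4096,
    n_threads=32,
    n_gpu_layers=0,
)

out = llm("Q: Why is CPU inference so much slower than GPU inference? A:", max_tokens=64)
print(out["choices"][0]["text"])
```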
Finally! My dumb dumb 1TB RAM server (4x E5-4640 + 32x32GB DDR3 ECC) can shine.
Yeah uh let me just put in my 512GB RAM stick…
Samsung do make them.
Good luck finding 512GB of VRAM.
At work we have a small cluster totalling around 4TB of RAM.
It has 4 cooling units, a cubic metre of PSUs, and it must take up something like 30 m² of floor space.
According to Hugging Face, you can run a 34B model using 22.4GB of RAM max. That fits on an RTX 3090 Ti (24GB).