This works in a 4GB card now: ``` python server.py --model llama-7b-hf --gptq-bits 4 --gptq-pre-layer 20 ```
3.3 KiB
3.3 KiB
This works in a 4GB card now: ``` python server.py --model llama-7b-hf --gptq-bits 4 --gptq-pre-layer 20 ```