text-generation-webui/modules/GPTQ_loader.py
oobabooga 7618f3fe8c Add -gptq-preload for 4-bit offloading (#460)
This now works on a 4GB card:

```
python server.py --model llama-7b-hf --gptq-bits 4 --gptq-pre-layer 20
```
2023-03-20 16:30:56 -03:00
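
As a rough illustration of what `--gptq-pre-layer 20` might mean in practice, here is a minimal sketch. It assumes the flag keeps the first N transformer layers on the GPU and leaves the rest on the CPU, which is how the offloading script in GPTQ-for-LLaMa treats its `pre_layer` argument. The function `assign_devices` and the device labels are hypothetical and are not taken from `GPTQ_loader.py`. The count of 32 layers is the LLaMA-7B decoder depth.

```python
from typing import List


def assign_devices(num_layers: int, pre_layer: int) -> List[str]:
    """Hypothetical helper: label each layer with the device it would live on.

    Assumption: the first `pre_layer` layers stay on the GPU and the
    remaining layers are offloaded to the CPU.
    """
    if num_layers < 0 or pre_layer < 0:
        raise ValueError("num_layers and pre_layer must be non-negative")
    gpu_count = min(pre_layer, num_layers)  # never place more layers than exist
    return ["cuda:0"] * gpu_count + ["cpu"] * (num_layers - gpu_count)


# LLaMA-7B has 32 decoder layers; with a pre-layer value of 20,
# 20 layers would sit on the GPU and 12 on the CPU under this assumption.
placement = assign_devices(num_layers=32, pre_layer=20)
print(placement.count("cuda:0"), placement.count("cpu"))
```

Under this reading, lowering the value would trade GPU memory for more CPU-resident layers, which is presumably what lets the 7B model fit on a 4GB card.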

3.3 KiB