This now works on a 4 GB card:

```
python server.py --model llama-7b-hf --gptq-bits 4 --gptq-pre-layer 20
```
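
For a rough sense of why 20 layers can fit, here is a back-of-the-envelope sketch. It assumes that `--gptq-pre-layer` sets how many transformer layers stay on the GPU (with the rest offloaded to CPU), and that the model has the standard LLaMA-7B shapes (32 layers, hidden size 4096, MLP size 11008). The function names are made up for illustration. The estimate ignores embeddings, quantization scales and zero-points, activations, and the KV cache, so treat it as a lower bound.

```python
# Rough estimate of the quantized layer weights kept on the GPU.
# Assumptions (not from the original post): LLaMA-7B shapes, and that
# --gptq-pre-layer N keeps the first N transformer layers on the GPU.

HIDDEN = 4096          # LLaMA-7B hidden size
INTERMEDIATE = 11008   # LLaMA-7B MLP intermediate size
TOTAL_LAYERS = 32      # LLaMA-7B transformer layer count


def layer_params(hidden: int = HIDDEN, intermediate: int = INTERMEDIATE) -> int:
    """Weight count of one transformer layer (norm weights ignored)."""
    attention = 4 * hidden * hidden      # q, k, v, o projections
    mlp = 3 * hidden * intermediate      # gate, up, down projections
    return attention + mlp


def gpu_weight_bytes(pre_layer: int, bits: int = 4) -> int:
    """Bytes needed for `pre_layer` layers stored at `bits` bits per weight."""
    return pre_layer * layer_params() * bits // 8


if __name__ == "__main__":
    gib = gpu_weight_bytes(20) / 2**30
    print(f"~{gib:.2f} GiB of 4-bit weights for 20 of {TOTAL_LAYERS} layers")
```

Under these assumptions, 20 layers come to a bit under 2 GiB of weights, which leaves headroom on a 4 GB card for everything the estimate leaves out.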