Installation of llmc¶
git clone https://github.com/anonymous-emnlp123/llmc.git
pip install -r requirements.txt
llmc does not need to be installed. To use llmc you only need to add this to the script.
PYTHONPATH=[llmc's save path]:$PYTHONPATH
Prepare the model¶
Currently, llmc only supports models in the Hugging Face format. In the case of Qwen2-0.5B, the model can be found here.
A simple download example can be used:
pip install -U hf-transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download --resume-download Qwen/Qwen2-0.5B --local-dir Qwen2-0.5B
Download the datasets¶
The datasets required by llmc can be divided into calibration datasets and eval datasets. The calibration dataset can be downloaded here, and the eval dataset can be downloaded here.
Of course, llmc also supports online download of datasets, as long as the download in the config is set to True.
Set Configs¶
In the case of smoothquant, the config is here.
base:
seed: &seed 42
model:
type: Qwen2 # Set the model name, which can support Llama, Qwen2, Llava, Gemma2 and other models.
path: # Set model weight path.
torch_dtype: auto
calib:
name: pileval
download: False
path: # Set calibration dataset path.
n_samples: 512
bs: 1
seq_len: 512
preproc: pileval_smooth
seed: *seed
eval:
eval_pos: [pretrain, transformed, fake_quant]
name: wikitext2
download: False
path: # Set eval dataset path.
bs: 1
seq_len: 2048
quant:
method: SmoothQuant
weight:
bit: 8
symmetric: True
granularity: per_channel
act:
bit: 8
symmetric: True
granularity: per_token
save:
save_trans: True # Set to True to save the adjusted weights.
save_path: ./save
Start to run¶
Once you are prepared above, you can run the following commands
PYTHONPATH=[llmc's save path]:$PYTHONPATH \
python -m llmc \
--config ../configs/quantization/methods/SmoothQuant/smoothquant_w_a.yml \
LLMC provides many algorithm configuration files in the configs/quantization/methods directory for reference.
#!/bin/bash
gpu_id=0 # Set the GPU id used.
export CUDA_VISIBLE_DEVICES=$gpu_id
llmc= # Set the save path of llmc.
export PYTHONPATH=$llmc:$PYTHONPATH
task_name=smoothquant_llama_w8a8_fakequant_eval # Set task_name, the file name used to save the log.
# Select a config to run.
nohup \
python -m llmc \
--config ../configs/quantization/methods/SmoothQuant/smoothquant_w_a.yml \
> ${task_name}.log 2>&1 &
echo $! > ${task_name}.pid
FAQ¶
Q1
ValueError: Tokenizer class xxx does not exist or is not currently imported.
Solution
pip install transformers –upgrade
Q2
If you are running a large model and a single gpu card cannot store the entire model, then the gpu memory will be out during eval.
Solution
Use per block for inference, turn on inference_per_block, and increase bs appropriately to improve inference speed without exploding the gpu memory.
bs: 10
inference_per_block: True
Q3
Exception: ./save/transformed_model existed before. Need check.
Solution
The saving path is an existing directory and needs to be changed to a non-existing saving directory.