Model Sparsification
llmc is gradually adding support for sparsification methods. Magnitude, Wanda, and ShortGPT are already implemented, and more algorithms will be supported in the future.
Here is a sample of Wanda’s settings:
base:
    seed: &seed 42
model:
    type: Qwen2 # Model type. Llama, Qwen2, Llava, Gemma2, and other models are supported.
    path: # Set the model weight path.
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: # Set the calibration dataset path.
    n_samples: 512
    bs: 1
    seq_len: 512
    preproc: pileval_smooth
    seed: *seed
eval:
    eval_pos: [pretrain, transformed] # Unstructured sparsification sets the pruned weights to 0 in place, so the sparse model is available right after the transformed stage, with no additional deployment stage.
    name: wikitext2
    download: False
    path: # Set the eval dataset path.
    bs: 1
    seq_len: 2048
sparse:
    method: Wanda
    weight:
        sparsity: 0.5 # Set the model sparsity.
    sparsity_out: False # Whether to use the output of the sparse layer as the input of the next layer.
save:
    save_trans: True # Set to True to save the adjusted weights.
    save_path: ./save
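To make the `sparsity` setting concrete, the following is a minimal, dependency-free sketch of the pruning rule described in the Wanda paper: each weight is scored by its magnitude times the L2 norm of the matching input feature over the calibration tokens, and the lowest-scoring fraction of every output row is set to 0. This is not llmc's implementation; `wanda_prune`, the toy weights, and the toy calibration activations are all hypothetical and chosen only for illustration.

```python
import math


def wanda_prune(weight, calib_inputs, sparsity):
    """Return a copy of `weight` with the lowest-scoring entries of each row zeroed.

    weight:       list of rows, shape [out_features][in_features]
    calib_inputs: calibration activations, shape [n_tokens][in_features]
    sparsity:     fraction of weights removed in every output row
    """
    in_features = len(weight[0])
    # L2 norm of each input feature over all calibration tokens.
    feat_norm = [
        math.sqrt(sum(x[j] ** 2 for x in calib_inputs))
        for j in range(in_features)
    ]
    n_prune = int(in_features * sparsity)
    pruned = []
    for row in weight:
        # Wanda importance score: |W_ij| * ||X_j||_2.
        scores = [abs(w) * feat_norm[j] for j, w in enumerate(row)]
        # Indices of the n_prune lowest scores within this output row.
        order = sorted(range(in_features), key=lambda j: scores[j])
        drop = set(order[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy layer (hypothetical values): 2 output rows, 4 input features.
weight = [[0.5, -0.1, 0.3, 2.0],
          [1.0, 0.2, -0.4, 0.05]]
# Toy calibration activations: feature 1 is large, features 2 and 3 are small.
calib_inputs = [[1.0, 10.0, 0.1, 0.1],
                [1.0, 10.0, 0.1, 0.1]]

sparse_weight = wanda_prune(weight, calib_inputs, sparsity=0.5)
print(sparse_weight)
```

In this toy case the weight 2.0 in the first row is pruned even though it has the largest magnitude, because its input feature is almost always near zero. Plain magnitude pruning would have kept it, which is the main difference between the two methods.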
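Here are some perplexity (PPL) results obtained with Wanda. Each column gives the sparsity ratio (dense is the unpruned model) and, in parentheses, the evaluation dataset:
| Model | dense (c4) | dense (wikitext2) | 0.25 (c4) | 0.25 (wikitext2) | 0.5 (c4) | 0.5 (wikitext2) | 0.75 (c4) | 0.75 (wikitext2) |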
| LLaMa2-7B | 7.26 | 5.47 | 7.46 | 5.61 | 9.25 | 6.85 | 260.42 | 259.91 |
| LLaMa2-70B | 5.71 | 3.32 | 5.76 | 3.4 | 6.49 | 4.17 | 32.5 | 21.66 |
| LLaMa3-8B | 9.44 | 6.13 | 10.01 | 6.47 | 15.07 | 9.68 | 336.62 | 290.38 |
| LLaMa3-70B | 7.16 | 2.85 | 7.44 | 3.22 | 9.96 | 5.81 | 93.99 | 74.78 |
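As a reminder of what the PPL figures measure, here is a minimal sketch of the standard perplexity definition: the exponential of the mean negative log-likelihood per token, where lower is better. It is not llmc's evaluation code; the `perplexity` helper and the example log-probabilities are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned to the
    reference tokens of the evaluation text.
    """
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives every reference token probability 1/4 is, on average,
# as uncertain as a uniform choice among 4 tokens.
uniform_log_probs = [math.log(0.25)] * 8
ppl = perplexity(uniform_log_probs)
print(round(ppl, 6))
```

Read this way, the jump at 0.75 sparsity in the table (for example, above 250 for the 7B and 8B models) means the pruned model is far less certain about each next token than the dense model.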
A comparison with the original Wanda repository is shown below. In this experimental setup, the hyperparameters, calibration dataset, data preprocessing, and evaluation method are all aligned with those used by Wanda.
| Model | Wanda | LLMC |
| LLaMa2-7b | 6.91 | 6.91 |
| LLaMa2-70b | 4.22 | 4.19 |
| LLaMa3-8b | 9.56 | 9.58 |
| LLaMa3-70b | OOM | 5.75 |