Model Sparsification

llmc is gradually adding support for sparsification methods. It currently implements Magnitude, Wanda, and ShortGPT, and more algorithms will be supported in the future.

Here is a sample configuration for Wanda:

base:
    seed: &seed 42
model:
    type: Qwen2 # Set the model type; supported types include Llama, Qwen2, Llava, Gemma2, and other models.
    path: # Set model weight path.
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: # Set calibration dataset path.
    n_samples: 512
    bs: 1
    seq_len: 512
    preproc: pileval_smooth
    seed: *seed
eval:
    eval_pos: [pretrain, transformed] # Unstructured sparsification sets the pruned weights to 0 in place, so the sparse model is already available at the transformed stage and no separate deployment stage is needed.
    name: wikitext2
    download: False
    path: # Set eval dataset path.
    bs: 1
    seq_len: 2048
sparse:
    method: Wanda
    weight:
        sparsity: 0.5 # Target sparsity ratio, i.e. the fraction of weights set to zero.
    sparsity_out: False # Whether to use the output of the sparsified layer as the input to the next layer.
save:
    save_trans: True # Set to True to save the adjusted weights.
    save_path: ./save
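To make the `sparsity` and `sparsity_out` options more concrete, here is a minimal NumPy sketch. It assumes the scoring rule described in the Wanda paper: each weight's magnitude is multiplied by the L2 norm of the corresponding input activation over the calibration tokens, and weights are compared within each output row. It also assumes a toy stack of linear layers with no nonlinearities. The function names `wanda_prune` and `sparsify_layers` are hypothetical and are not llmc's API. How llmc actually propagates block outputs when `sparsity_out` is set is likewise an assumption here, not a description of its implementation.

```python
import numpy as np


def wanda_prune(weight: np.ndarray, calib_inputs: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weight` with its lowest-scoring entries set to 0.

    weight:       (out_features, in_features) weight of a linear layer y = x @ W.T
    calib_inputs: (n_tokens, in_features) calibration activations fed into the layer
    sparsity:     fraction of weights to zero in each output row (e.g. 0.5)
    """
    # Wanda-style score: |W_ij| * ||X_j||_2, with X_j = input feature j over all tokens.
    act_norm = np.linalg.norm(calib_inputs, axis=0)        # shape (in_features,)
    scores = np.abs(weight) * act_norm[np.newaxis, :]      # shape (out, in)

    # Compare weights within each output row and drop the lowest-scoring ones.
    n_prune = int(weight.shape[1] * sparsity)
    pruned = weight.copy()
    if n_prune > 0:
        idx = np.argsort(scores, axis=1)[:, :n_prune]      # lowest scores per row
        np.put_along_axis(pruned, idx, 0.0, axis=1)        # unstructured: zero in place, shape unchanged
    return pruned


def sparsify_layers(weights, calib_x, sparsity, sparsity_out):
    """Prune a toy stack of linear layers one at a time (hypothetical helper).

    sparsity_out=True:  the next layer is calibrated on the sparse layer's output.
    sparsity_out=False: the next layer is calibrated on the dense layer's output.
    """
    pruned_weights = []
    x = calib_x
    for w in weights:
        w_sparse = wanda_prune(w, x, sparsity)
        pruned_weights.append(w_sparse)
        # Propagate activations through either the sparse or the original layer.
        x = x @ (w_sparse if sparsity_out else w).T
    return pruned_weights


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    weights = [rng.standard_normal((8, 8)) for _ in range(3)]
    calib_x = rng.standard_normal((512, 8))  # 512 toy calibration tokens
    for w in sparsify_layers(weights, calib_x, sparsity=0.5, sparsity_out=False):
        print((w == 0).mean())  # each layer: 0.5
```

Because the pruned weights are simply zeroed and the matrix shape is unchanged, the result is already a usable sparse model, which matches the note on `eval_pos` above. Replacing the score with `np.abs(weight)` alone turns this sketch into a per-row magnitude criterion.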

Below are some perplexity (PPL) results for Wanda on C4 and WikiText2, for the dense model and at sparsity levels of 0.25, 0.5, and 0.75:

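| Model | Dense (C4) | Dense (WikiText2) | Sparsity 0.25 (C4) | Sparsity 0.25 (WikiText2) | Sparsity 0.5 (C4) | Sparsity 0.5 (WikiText2) | Sparsity 0.75 (C4) | Sparsity 0.75 (WikiText2) |
|---|---|---|---|---|---|---|---|---|
| LLaMa2-7B | 7.26 | 5.47 | 7.46 | 5.61 | 9.25 | 6.85 | 260.42 | 259.91 |
| LLaMa2-70B | 5.71 | 3.32 | 5.76 | 3.4 | 6.49 | 4.17 | 32.5 | 21.66 |
| LLaMa3-8B | 9.44 | 6.13 | 10.01 | 6.47 | 15.07 | 9.68 | 336.62 | 290.38 |
| LLaMa3-70B | 7.16 | 2.85 | 7.44 | 3.22 | 9.96 | 5.81 | 93.99 | 74.78 |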

The table below compares llmc with the original Wanda repository. In this experiment, the hyperparameters, calibration datasets, data preprocessing, and evaluation methods are all aligned with those used by Wanda.

| Model | Wanda | LLMC |
|---|---|---|
| LLaMa2-7b | 6.91 | 6.91 |
| LLaMa2-70b | 4.22 | 4.19 |
| LLaMa3-8b | 9.56 | 9.58 |
| LLaMa3-70b | OOM | 5.75 |

OOM: out of memory.
