Evaluation results of llama2 with exetorch #3568

Closed
l2002924700 opened this issue May 10, 2024 · 19 comments
Labels: llm: evaluation (Perplexity, accuracy) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@l2002924700 commented May 10, 2024

Hi, kind helpers,
I am new to ExecuTorch and I want to evaluate the llama2 model as described in "https://github.com/pytorch/executorch/blob/main/examples/models/llama2/README.md". My test steps are as follows:

  1. Download the llama2 model from the Hugging Face website.
  2. Install ExecuTorch as described in the GitHub repository "https://github.com/pytorch/executorch".
  3. Run the evaluation with the command:
    "python -m examples.models.llama2.eval_llama --checkpoint /home/LLM-Models/Llama-2-7b/consolidated.00.pth --params /home/LLM-Models/Llama-2-7b/params.json -t /home/LLM-Models/Llama-2-7b/tokenizer.model --group_size 128 --quantization_mode int8 --max_seq_len 2048 --limit 1000"

Then I get the following results:

| Tasks | Version | Filter | n-shot | Metric | Value |
|---|---|---|---|---|---|
| wikitext | 2 | none | 0 | word_perplexity | 16.77 |
| | | none | 0 | byte_perplexity | 1.717 |
| | | none | 0 | bits_per_byte | 0.780 |

Are these results normal? The values seem too high compared with the results reported in "https://github.com/pytorch/executorch/blob/main/examples/models/llama2/README.md".
@mergennachin (Contributor)

I think you should try with -qmode 8da4w instead of int8

cc @digantdesai @kimishpatel

@l2002924700 (Author) commented May 11, 2024

Thank you @mergennachin,
I have used -qmode 8da4w with the command "python -m examples.models.llama2.eval_llama --checkpoint /home/LLM-Models/Llama-2-7b/consolidated.00.pth --params /home/LLM-Models/Llama-2-7b/params.json -t /home/LLM-Models/Llama-2-7b/tokenizer.model --group_size 256 --quantization_mode 8da4w --max_seq_len 2048 --limit 1000". However, I get even higher values:
word_perplexity 26.321708194170252
byte_perplexity 1.8721607782928849
bits_per_byte 0.9047043366764531
What can I do next?
Thank you

@kimishpatel (Contributor)

@Jack-Khuu can you reproduce these numbers on your end?

@iseeyuan added the "llm: evaluation (Perplexity, accuracy)" and "triaged" labels on May 13, 2024
@l2002924700 (Author)

About the wikitext dataset: I downloaded it from "https://huggingface.co/datasets/wikitext/tree/main/wikitext-2-raw-v1". Is this the same dataset download as yours? Thank you.

@Jack-Khuu (Contributor)

@l2002924700 Which commit hash are you using?

@l2002924700 (Author)

@Jack-Khuu the commit hash is 8aaa8b27d493dba10b8553290236799e6dc57829

@Jack-Khuu (Contributor)

@l2002924700 Can you try rerunning with -qmode 8da4w?

The numbers provided in the README were for groupwise 4b

[screenshot: perplexity table from the README for groupwise 4-bit quantization]

@Jack-Khuu (Contributor)

Context: I ran your command on main and got reasonable numbers

python -m examples.models.llama2.eval_llama --checkpoint /home/jackkhuu/llm_files/7b/consolidated.00.pth --params /home/jackkhuu/llm_files/7b/config.json -t /home/jackkhuu/llm_files/7b/tokenizer.model --group_size 128 --quantization_mode int8 --max_seq_len 2048 --limit 1000

wikitext: {'word_perplexity,none': 9.168552791655282, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.5134046290382204, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.5977977630089415, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}

And was able to repro our README with 8da4w

@l2002924700 (Author)

Thank you @Jack-Khuu. Following your suggestion, I reran the test with -qmode 8da4w. The command is as follows:

```
python -m examples.models.llama2.eval_llama -c /home/LLM-Models/Llama-2-7b/consolidated.00.pth --params /home/LLM-Models/Llama-2-7b/params.json -t /home/LLM-Models/Llama-2-7b/tokenizer.model -qmode 8da4w -G 128 --max_seq_len 2048 --limit 1000
```

I got even more puzzling results:
wikitext: {'word_perplexity,none': 25.471536960494742, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.8604115013524496, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.8956217639672349, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
Since we use the same program to evaluate the model, I think my odd results may come from the dataset or the model files.
So could you please share the download URLs for your wikitext dataset and your Llama-2-7b model?
Thank you in advance for your kind help.

@l2002924700 (Author)

I have also tested the llama2 model on the wikitext dataset with llama.cpp. My results are close to the values published in that repository.

Published llama.cpp values:

| Model | Measure | F16 | Q4_0 | Q4_1 |
|---|---|---|---|---|
| 7B | perplexity | 5.9066 | 6.1565 | 6.0912 |

My test results:

| Model | Measure | F16 | Q4_0 | Q4_1 |
|---|---|---|---|---|
| 7B | perplexity | 5.7962 | 5.9625 | 6.0008 |

So I think my model and dataset are the same ones llama.cpp uses. I don't understand why our results differ so widely.
Thank you in advance for your kind answer.
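
(For reference, a wikitext perplexity run in llama.cpp along these lines is presumably what produced the numbers above; the binary name, model path, and flags are assumptions and depend on the llama.cpp version.)

```bash
# Assumed llama.cpp perplexity invocation (paths and quantized model file are placeholders):
./perplexity -m ./models/llama-2-7b/ggml-model-q4_0.gguf \
             -f ./wikitext-2-raw/wiki.test.raw \
             -c 2048   # context size, roughly matching --max_seq_len 2048 above
```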

@Jack-Khuu (Contributor)

We're using the HF download of 7b: https://huggingface.co/meta-llama/Llama-2-7b/tree/main

Are you using the tokenizer/model/params from there?
If so can you share the hashes, just as a sanity check?

For evaluation we're using EleutherAI's lm_eval, so the dataset download is abstracted away there.
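
(A rough sanity check, under the assumption that the harness fetches the wikitext config through Hugging Face `datasets` as usual; the split and column names below are standard for that dataset but are not taken from this thread.)

```bash
# Spot-check the wikitext-2-raw-v1 test split that lm_eval would download on its own:
python -c "from datasets import load_dataset; \
ds = load_dataset('wikitext', 'wikitext-2-raw-v1', split='test'); \
print(len(ds), repr(ds[0]['text'])[:80])"
```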

@l2002924700 (Author)

Hi @Jack-Khuu,
Thank you very much.
The commit hash of my download is 69656aac4cb47911a639f5890ff35b41ceb82e98. In your command

python -m examples.models.llama2.eval_llama --checkpoint /home/jackkhuu/llm_files/7b/consolidated.00.pth --params /home/jackkhuu/llm_files/7b/config.json -t /home/jackkhuu/llm_files/7b/tokenizer.model --group_size 128 --quantization_mode int8 --max_seq_len 2048 --limit 1000

the params file is "config.json", but in the HF repo https://huggingface.co/meta-llama/Llama-2-7b/tree/main there is no such file. I think you renamed the params.json file to config.json, am I right?
My params.json is as follows:
{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-05, "vocab_size": 32000}.
Is my params.json file the same as your config.json?

Thank you in advance.

@Jack-Khuu (Contributor)

Ah I should've clarified, I meant the hashes of your model/params/tokenizer files.
That way we can verify which files are different.

As for the naming: my params file just happens to be named config.json; the contents are the same.

@l2002924700 (Author)

I am sorry, but I don't know how to get the hashes of my model/params/tokenizer files. Could you please show me how to get them?
Thank you in advance.

@Jack-Khuu (Contributor)

You can call md5sum on the files

For example: md5sum tokenizer.model
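
(To hash all three files in one pass, something like the following works; the paths mirror the ones used earlier in this thread.)

```bash
# Hash the checkpoint, params, and tokenizer together for easy comparison:
md5sum /home/LLM-Models/Llama-2-7b/consolidated.00.pth \
       /home/LLM-Models/Llama-2-7b/params.json \
       /home/LLM-Models/Llama-2-7b/tokenizer.model
```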

@l2002924700 (Author)

Thank you @Jack-Khuu .
I have computed the hashes of my model/params/tokenizer files with md5sum, and the results are as follows:

md5sum tokenizer.model
eeec4125e9c7560836b4873b6f8e3025  tokenizer.model
md5sum params.json
faeb3d79269b5783e9a9a0e99956c018  params.json
md5sum consolidated.00.pth
daa8e3109935070df7fe8fc42d34525e  consolidated.00.pth

Would you please help me check whether the hashes of these files are the same as yours?
Thank you in advance.

@Jack-Khuu (Contributor)

Those are the exact same hashes that we're using...

Are you getting the same results on a clean conda instance and install?

  • If the files are the same then the only discrepancy I can think of would be either local changes or env differences
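
(A minimal sketch of what a clean retry could look like; the environment name and Python version are arbitrary, and the setup script name follows the ExecuTorch README of that period, so treat it as an assumption and defer to the current instructions.)

```bash
# Create a fresh conda environment and reinstall ExecuTorch from a clean checkout:
conda create -y -n et-clean python=3.10
conda activate et-clean
git clone https://github.com/pytorch/executorch.git
cd executorch
./install_requirements.sh   # setup script name per the ExecuTorch README at the time (assumed)
# then rerun the eval_llama command from above in this environment
```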

@l2002924700 (Author)

OK, I will try it. Thank you.

@Jack-Khuu (Contributor)

Closing this issue for now, since we aren't able to reproduce your results

Thanks @l2002924700 for surfacing and feel free to spin up a new issue should anything else arise!
