Soumya Ravi
Jan 24, 2025

--

There is some mismatch; see the model printout towards the end.

...
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
...

There are three RMSNorm modules and a rotary embedding in that printout, but they are not in the picture or the Python code.
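For reference, here is a minimal sketch of how such a printout is produced, assuming the Hugging Face transformers library and the meta-llama/Meta-Llama-3-8B checkpoint (an assumption that matches the 4096 hidden size and 128256 vocabulary shown above):

from transformers import AutoModelForCausalLM

# Checkpoint name is an assumption; any Llama-architecture model prints the same submodules.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Printing the model lists every registered submodule, so the per-layer input_layernorm
# and post_attention_layernorm, the final norm, and rotary_emb all appear here even when
# a simplified diagram or from-scratch implementation leaves them out.
print(model)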

--
