I tried to install Salesforce/codegen25-7b-multi_P on my MacBook with Hugging Face transformers 4.45, and it failed with the following error:
.env/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 1590, in __init__
raise AttributeError(f"{key} conflicts with the method {key} in {self.__class__.__name__}")
AttributeError: add_special_tokens conflicts with the method add_special_tokens in CodeGen25Tokenizer
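What seems to be happening: newer transformers versions reject any tokenizer init kwarg whose name shadows a method on the class, and the remote CodeGen25 tokenizer passes `add_special_tokens` as a kwarg. Here is a minimal, illustrative sketch of that guard (not the real transformers code):

```python
# Illustrative stand-in for the guard in newer tokenizer __init__ methods:
# any init kwarg whose name collides with a method raises AttributeError.
class DemoTokenizer:
    def add_special_tokens(self, tokens):
        """Method whose name collides with the init kwarg below."""
        pass

    def __init__(self, **kwargs):
        for key in kwargs:
            if hasattr(self.__class__, key):
                raise AttributeError(
                    f"{key} conflicts with the method {key} in {self.__class__.__name__}"
                )

# DemoTokenizer(add_special_tokens=True) raises the same style of AttributeError.
```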
Going back a few transformers versions gives this error instead:
codegen25-7b-multi/0bdf3f45a09e4f53b333393205db1388634a0e2e/tokenization_codegen25.py", line 149, in vocab_size
return self.encoder.n_vocab
^^^^^^^^^^^^
AttributeError: 'CodeGen25Tokenizer' object has no attribute 'encoder'. Did you mean: 'encode'?
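One plausible reading of this error is an init-order problem: the base class `__init__` in these transformers versions touches `vocab_size` before the subclass has assigned `self.encoder` (the tiktoken encoding). A simplified, hypothetical sketch of that failure mode:

```python
# Hypothetical sketch of the init-order failure (not the real transformers code):
# the base __init__ reads vocab_size before the subclass sets self.encoder.
class FakeBase:
    def __init__(self, **kwargs):
        _ = self.vocab_size  # base init reads the vocab size (simplified stand-in)

class FakeCodeGen25Tokenizer(FakeBase):
    def __init__(self):
        super().__init__()   # blows up here: self.encoder is not assigned yet
        self.encoder = None  # the tiktoken encoding would be set here

    @property
    def vocab_size(self):
        return self.encoder.n_vocab
```

Instantiating `FakeCodeGen25Tokenizer()` raises the same "object has no attribute 'encoder'" error.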
After zipping through older transformers versions I found a note in the release saying the model requires transformers 4.29.2. That version no longer compiles on my current Mac setup because of Rust, failing with this error:
error: could not compile `tokenizers` (lib) due to 1 previous error; 3 warnings emitted
Caused by:
process didn't exit successfully: `rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib
...
error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib -- -C 'link-args=-undefined dynamic_lookup -Wl,-install_name,@rpath/tokenizers.cpython-312-darwin.so'` failed with code 101
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
The Solution
Here is a solution that worked for me. The RUSTFLAGS setting tells rustc to allow the invalid_reference_casting lint, which newer Rust compilers treat as a hard error and which is what breaks the old tokenizers build:
RUSTFLAGS="-A invalid_reference_casting" pip install transformers==4.33.2
Transformers 4.29.2 works as well. Then install torch.
And here is everything together:
virtualenv .env
source .env/bin/activate
RUSTFLAGS="-A invalid_reference_casting" HF_HOME=.cache pip install tiktoken==0.4.0 torch transformers==4.33.2
python test.py
where test.py would be the following:
from transformers import AutoTokenizer, AutoModelForCausalLM

# The custom CodeGen25 tokenizer ships inside the model repo,
# so trust_remote_code is required for the tokenizer.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen25-7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen25-7b-instruct")

# Complete a function definition from its signature.
text = "def hello_world():"
input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
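Since the whole fix hinges on exact version pins, a quick sanity check before running the script can save a model download. Here is a small hypothetical helper (`check_pin` is my own name, not from any library) using the standard library:

```python
# Hypothetical helper to confirm the version pins actually took effect
# inside the virtualenv before running test.py.
import importlib.metadata as md

def check_pin(pkg: str, wanted: str) -> bool:
    """Return True iff pkg is installed at exactly the wanted version."""
    try:
        return md.version(pkg) == wanted
    except md.PackageNotFoundError:
        return False

# e.g. check_pin("transformers", "4.33.2") and check_pin("tiktoken", "0.4.0")
```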