Man

Development & AI | Alper Akgun

Trying Mistral 7B in python and kodlokal

October, 2023

Mistral-7B is a small, yet powerful model for hacking and playing with. Mistral 7B is better than Llama 2 13B on all benchmarks, has natural coding abilities, and 8k sequence length. It’s released under Apache 2.0 licence :>

Let's start with the python code. Make sure you set you have a GPU or in Google colab you set your runtime type as T4 GPU

We start by installing several libraries:

  • - accelerate: A library from HuggingFace that aids in utilizing hardware accelerators like GPUs and TPUs more efficiently.
  • - bitsandbytes: Provides fast gradient compression, beneficial for accelerated training, particularly in distributed scenarios.
  • - sentencepiece: A library for Neural Network-based text processing, often used in tokenization processes for language models.


!pip install git+https://github.com/huggingface/transformers
!pip install -q peft  accelerate bitsandbytes safetensors
!pip install sentencepiece

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers

model_name = "filipealmeida/Mistral-7B-Instruct-v0.1-sharded"

# the bitsandbytes quantization settings
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1
stop_token_ids = [0]

print(f"Successfully loaded the model {model_name} into memory")

text = "[INST] How many neurons does average human cerebrum, cerebellum and major structures in human brain have? [/INST]"

encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
model_input = encoded
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
            

In the second part of this I have tried a fine tuned version of mistral by using kodlokal set up using https://github.com/kodlokal/kodlokal


cd kodlokal/models
wget https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/resolve/main/mistral-7b-openorca.Q4_0.gguf

# change your config.py to the new text model

# Use in emacs https://github.com/kodlokal/kodlokal.el
# Use the following prompt in your emacs setup using

<|im_start|>system
Give a concise answer.<|im_end|>
<|im_start|>user
Create a flask endpoint to upload a file to aws s3.<|im_end|>
<|im_start|>assistant