Fine-tuning Phi-3 using PEFT


PEFT, merge, ship

Randy D
Amazon Employee
Published May 3, 2024
In a previous article I wrote about how to deploy a small language model (SLM) to an AWS IoT Greengrass core device. The model I used was Microsoft's Phi-3 in ONNX format. In this brief follow-up I'm going to look at how to quickly fine-tune this model and ship the new version to a Greengrass core device.
It's not yet possible to fine-tune with PEFT once the model is already in ONNX format, so I instead ran the example fine-tuning script on a g5.48xlarge EC2 instance running the Deep Learning AMI. I made a couple of quick adjustments, namely using DeepSpeed ZeRO-2 instead of ZeRO-3, and disabling the saving of safetensors.
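For reference, the ZeRO-2 change comes down to the stage setting in the script's DeepSpeed config (the snippet below is illustrative, not the script's exact config file), and safetensors saving can be turned off with `save_safetensors=False` in the script's `TrainingArguments`:

```json
{
  "zero_optimization": {
    "stage": 2
  }
}
```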
The output of that script is a fine-tuned PEFT adapter layer. At that point we have a choice: ship the adapter layer to the device, or build a new ONNX variant in the cloud and send that instead. Shipping just the PEFT layer would mean less data transfer, but it would also require keeping the non-ONNX base model on the device in order to merge in the adapter layers and then produce a new ONNX artifact there. I don't think that's always easy or feasible, so I took the second route, building a new ONNX artifact in the cloud and then using the technique from the first article to ship it to the device.
The steps to try out the PEFT model and then output a merged model are below.
from peft import AutoPeftModelForCausalLM
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

model = AutoPeftModelForCausalLM.from_pretrained(
    "checkpoint_dir",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

inputs = tokenizer("<|user|>Tell me a fact about Venus<|end|><|assistant|>", return_tensors="pt")

outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=200)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])

merged_model = model.merge_and_unload()
merged_model.save_pretrained("./checkpoint_merged")
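The prompt string passed to the tokenizer above follows Phi-3's single-turn chat format. As a minimal illustration (the helper name is my own, not part of the script), that format can be captured in a small function:

```python
def phi3_prompt(user_message: str) -> str:
    """Wrap a single user turn in Phi-3's chat markers."""
    return f"<|user|>{user_message}<|end|><|assistant|>"

# Matches the prompt used in the script above
print(phi3_prompt("Tell me a fact about Venus"))
# → <|user|>Tell me a fact about Venus<|end|><|assistant|>
```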
Then I used this command to produce the new ONNX artifact:
python -m onnxruntime_genai.models.builder -i ./checkpoint_merged -o ./phi3-int4-cpu -p int4 -e cpu
And that's it - again, at this point you can use the technique from the previous article to ship the fine-tuned ONNX artifact to the Greengrass core device.

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.
