Journey Series for Generative AI Application Architecture – Fine-tune SLM with Microsoft Olive


In the era of Artificial Intelligence 2.0, every company will build models for its own industry around AI. This process touches many knowledge areas, and Azure OpenAI Service brings very powerful model choices to enterprises, such as gpt-35-turbo-16k, gpt-4, gpt-4-32k, gpt-4-turbo, and gpt-4-vision. However, some specialized industries and traditional enterprises prefer to train their own industry models based on their own underlying structures or applications combined with their own data. The usual way is to fine-tune one of the many third-party SLMs available on Hugging Face. Although Azure AI Studio and Azure Machine Learning Service both provide cloud solutions for data, models, fine-tuning, optimization, compute, and inference deployment, enterprises often prefer to complete this work through a combination of local and cloud resources. With Microsoft Olive, enterprises can meet the needs of such mixed scenarios through simple configuration and complete both fine-tuning and model inference.


What is an SLM (Small Language Model)?

An SLM (Small Language Model) is essentially a scaled-down version of an LLM (Large Language Model). Compared to LLMs, which have hundreds of billions or even trillions of parameters, SLMs have far fewer, typically in the range of millions to billions.

SLM vs LLM

Efficiency: SLMs require less computing power and memory and can be deployed on smaller machines and edge devices. This is very important for enterprise environments with limited computing power, and in the era of the AI PC, local deployment of SLMs becomes even more significant.

Accessibility: Because of their lower resource requirements, SLMs are easier for more developers and enterprises to adopt, and they allow smaller teams and researchers to fine-tune and optimize models locally.

Customization: SLMs are easier to fine-tune for specific domains and tasks. This enables the creation of specialized models for niche applications, resulting in higher performance and accuracy.

There are many good SLMs on Hugging Face, such as Microsoft Phi-2, Meta Llama 2, Mistral 7B, and Google Gemma, all of which can be used for fine-tuning.


Use Microsoft Olive to fine-tune your SLMs

Microsoft Olive is an easy-to-use open-source model optimization tool that covers both fine-tuning and inference in the field of generative AI. It only requires simple configuration: combined with open-source small language models and the related runtime environments (Azure ML or a local GPU, CPU, or DirectML), you can fine-tune a model or run inference through automatic optimization, and find the best model to deploy to the cloud or to edge devices. This allows enterprises to build their own vertical industry models on-premises and in the cloud.

Set up Microsoft Olive

Installing Microsoft Olive is very simple, and it can also be installed for CPU, GPU, DirectML, and Azure ML.


pip install olive-ai

If you wish to run an ONNX model with a CPU, you can use


pip install olive-ai[cpu]

If you want to run an ONNX model with a GPU, you can use


pip install olive-ai[gpu]

If you want to use Azure ML, use


pip install git+https://github.com/microsoft/Olive#egg=olive-ai[azureml]

Notice

  1. The latest version of Microsoft Olive is 0.5.0.

  2. Install in an x86 environment (WSL is recommended).
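
To confirm which version was installed, a quick check such as the following should work:


pip show olive-ai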

Microsoft Olive's Config.json

After installation, you can configure different model-specific settings through the Config file, including data, computing, training, deployment, and model generation.
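
At a high level, an olive-config.json pulls together the pieces described in the sections below. The following is a simplified skeleton for orientation only; each placeholder is replaced by the snippets shown later, and note that in Olive the fine-tuning, conversion, and optimization steps are grouped under a top-level passes section:


    {
        "input_model":  { "...": "the SLM to fine-tune (section 3)" },
        "systems":      { "...": "local or Azure ML compute (section 2)" },
        "data_configs": { "...": "training data, local or in the cloud (section 1)" },
        "passes": {
            "qlora":                     { "...": "fine-tuning algorithm (section 4)" },
            "convert":                   { "...": "ONNX conversion (section 5)" },
            "transformers_optimization": { "...": "ONNX graph optimization (section 5)" }
        },
        "engine":       { "...": "host, target, and output settings" }
    }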

1. Data

Microsoft Olive supports training on both local data and cloud data, and this can be configured in the settings.

Local data settings

You can simply set up the dataset to be used for fine-tuning, usually in JSON format, and adapt it with the data template. The template needs to match the requirements of the model (here it is adapted to the format required by Microsoft Phi-2; for other models, please refer to their required fine-tuning formats). A hypothetical sample of the dataset itself is shown after the configuration below.



    "data_configs": {
        "dataset-default_train": {
            "name": "dataset-default",
            "type": "HuggingfaceContainer",
            "params_config": {
                "data_name": "json", 
                "data_files":"dataset/dataset-classification.json",
                "split": "train",
                "component_kwargs": {
                    "pre_process_data": {
                        "dataset_type": "corpus",
                        "text_cols": [
                            "phrase",
                            "tone"
                        ],
                        "text_template": "### Text: {phrase}\n### The tone is:\n{tone}",
                        "corpus_strategy": "join",
                        "source_max_len": 1024,
                        "pad_to_max_len": false,
                        "use_attention_mask": false
                    }
                }
            }
        }
    },
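
For reference, the data_name "json" setting loads the file with the Hugging Face datasets JSON loader, and each record must provide the phrase and tone columns referenced in text_cols above. A hypothetical sample of dataset/dataset-classification.json (the content is made up purely for illustration):


    [
        { "phrase": "The delivery arrived on time and the device works perfectly.", "tone": "positive" },
        { "phrase": "I have been waiting two weeks for a reply from support.", "tone": "negative" }
    ]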

Cloud data source settings

By linking to the datastore of Azure AI Studio / Azure Machine Learning Service, you can use data stored in the cloud. You can also bring different data sources into Azure AI Studio / Azure Machine Learning Service through Microsoft Fabric and Azure Data to support fine-tuning.


    "data_configs": {
        "dataset_default_train": {
            "name": "dataset_default",
            "type": "HuggingfaceContainer",
            "params_config": {
                "data_name": "json", 
                "data_files": {
                    "type": "azureml_datastore",
                    "config": {
                        "azureml_client": {
                            "subscription_id": "Your Azure Subscription ID",
                            "resource_group": "Your Azure AI Studio / Azure Machine Learning Service Resource Group",
                            "workspace_name": "Your Azure AI Studio / Azure Machine Learning Service Name"
                        },
                        "datastore_name": "Your Azure Datastore Name",
                        "relative_path": "Your Azure Datastore File such as JSON"
                    }
                },
                "split": "train",
                "component_kwargs": {
                    "pre_process_data": {
                        "dataset_type": "corpus",
                        "text_cols": [
                            "phrase",
                            "tone"
                        ],
                        "text_template": "### Text: {phrase}\n### The tone is:\n{tone}",
                        "corpus_strategy": "join",
                        "source_max_len": 1024,
                        "pad_to_max_len": false,
                        "use_attention_mask": false
                    }
                }
            }
        }
    },


2. Computing configuration

If you run locally, you can use local compute resources directly. If you want to use the resources of Azure AI Studio / Azure Machine Learning Service, you need to configure the relevant Azure parameters, the compute name, and so on (a sketch of a purely local alternative follows the Azure example below).



    "systems": {
        "aml": {
            "type": "AzureML",
            "config": {
                "accelerators": ["gpu"],
                "hf_token": true,
                "aml_compute": "Your Azure AI Studio / Azure Machine Learning Service Compute Name",
                "aml_docker_config": {
                    "base_image": "Your Azure AI Studio / Azure Machine Learning Service docker",
                    "conda_file_path": "conda.yaml"
                }
            }
        },
        "azure_arc": {
            "type": "AzureML",
            "config": {
                "accelerators": ["gpu"],
                "aml_compute": "Your Azure AI Studio / Azure Machine Learning Service Compute Name",
                "aml_docker_config": {
                    "base_image": "Your Azure AI Studio / Azure Machine Learning Service docker",
                    "conda_file_path": "conda.yaml"
                }
            }
        }
    },
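
If you prefer to run entirely on a local GPU instead of Azure, the same section can declare a local system and the engine's host and target can then point at it. A minimal sketch, assuming Olive's LocalSystem type (adjust the accelerators to your hardware):


    "systems": {
        "local_system": {
            "type": "LocalSystem",
            "config": {
                "accelerators": ["gpu"]
            }
        }
    },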


Notice

Because the job runs in a container on Azure AI Studio / Azure Machine Learning Service, the required environment needs to be configured. This is defined in conda.yaml.


name: project_environment
channels:
  - defaults
dependencies:
  - python=3.8.13
  - pip=22.3.1
  - pip:
      - einops
      - accelerate
      - azure-keyvault-secrets
      - azure-identity
      - bitsandbytes
      - datasets
      - huggingface_hub
      - peft
      - scipy
      - sentencepiece
      - torch>=2.2.0
      - transformers
      - git+https://github.com/microsoft/Olive@jiapli/mlflow_loading_fix#egg=olive-ai[gpu]


3. Choose your SLM

You can use a model directly from Hugging Face, or select one from the Model Catalog of Azure AI Studio / Azure Machine Learning. Here I take Microsoft Phi-2 as an example.

If you have the model locally, you can use this method



    "input_model":{
        "type": "PyTorchModel",
        "config": {
            "hf_config": {
                "model_name": "model-cache/microsoft/phi-2",
                "task": "text-generation",
                "model_loading_args": {
                    "trust_remote_code": true
                }
            }
        }
    },


If you want to use a model from Azure AI Studio / Azure Machine Learning Service, you can use this method



    "input_model":{
        "type": "PyTorchModel",
        "config": {
            "model_script": "qlora_user_script.py",
            "model_path": {
                "type": "azureml_registry_model",
                "config": {
                    "name": "microsoft-phi-2",
                    "registry_name": "azureml-msr",
                    "version": "11"
                }
            },
             "model_file_format": "PyTorch.MLflow",
             "hf_config": {
                "model_name": "microsoft/phi-2",
                "task": "text-generation",
                "from_pretrained_args": {
                    "trust_remote_code": true
                }
            }
        }
    },


Notice:

  1. When integrating with Azure AI Studio / Azure Machine Learning Service, please check the version number and naming used in the model registry when setting up the model.

  2. All models from the Azure registry need to be set to the PyTorch.MLflow model file format.

  3. You need a Hugging Face account and must bind your Hugging Face access token as a key in Azure AI Studio / Azure Machine Learning (this is what the hf_token setting in the compute configuration refers to).

4. Algorithm

Microsoft Olive encapsulates the LoRA and QLoRA fine-tuning algorithms very well; all you need to configure are the relevant parameters. Here I take QLoRA as an example. A rough mapping of these settings to the underlying libraries is sketched after the configuration.


        "qlora": {
            "type": "QLoRA",
            "config": {
                "compute_dtype": "bfloat16",
                "quant_type": "nf4",
                "double_quant": true,
                "lora_r": 64,
                "lora_alpha": 64,
                "lora_dropout": 0.1,
                "train_data_config": "dataset_default_train",
                "eval_dataset_size": 0.3,
                "training_args": {
                    "seed": 0,
                    "data_seed": 42,
                    "per_device_train_batch_size": 1,
                    "per_device_eval_batch_size": 1,
                    "gradient_accumulation_steps": 4,
                    "gradient_checkpointing": false,
                    "learning_rate": 0.0001,
                    "num_train_epochs":3,
                    "max_steps": 1200,
                    "logging_steps": 10,
                    "evaluation_strategy": "steps",
                    "eval_steps": 187,
                    "group_by_length": true,
                    "adam_beta2": 0.999,
                    "max_grad_norm": 0.3
                }
            }
        },
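
If you are curious what these settings map to underneath, they roughly correspond to the following bitsandbytes quantization and PEFT LoRA configuration. This is only an illustrative sketch; Olive builds the equivalent objects for you during the QLoRA pass:


import torch
from transformers import BitsAndBytesConfig  # 4-bit quantization settings
from peft import LoraConfig                   # LoRA adapter settings

# nf4 4-bit quantization with double quantization and bfloat16 compute,
# matching compute_dtype, quant_type and double_quant above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter matching lora_r, lora_alpha and lora_dropout above
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)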

5. Format conversion

Microsoft Olive supports model format conversion, especially to ONNX. You can configure the conversion format; if it is not configured, the model will be exported in its original format.



        "convert": {
            "type": "OnnxConversion",
            "config": {
                "use_dynamo_exporter": true,
                "target_opset": 18,
                "save_as_external_data": true,
                "all_tensors_to_one_file": true
            }
        },
        "transformers_optimization": {
            "type": "OrtTransformersOptimization",
            "config": {
                "model_type": "phi",
                "use_gpu": false,
                "keep_io_types": false,
                "num_heads": 32,
                "hidden_size": 2560,
                "opt_level": 0,
                "optimization_options": {
                    "attention_op_type": "MultiHeadAttention"
                },
                "save_as_external_data": true,
                "all_tensors_to_one_file": true
            }
        }


It should be pointed out that you can arrange the above steps according to your own needs; it is not necessary to configure all five of them. Depending on your scenario, you can use only some of the passes, for example running conversion and optimization without the fine-tuning step. Finally, you need to configure the relevant engine.



    "engine": {
        "log_severity_level": 0,
        "host": "aml",
        "target": "aml",
        "search_strategy": false,
        "execution_providers": ["CUDAExecutionProvider"],
        "cache_dir": "../model-cache/models/phi2-finetuned/cache",
        "output_dir" : "../model-cache/models/phi2-finetuned"
    }



Run your Microsoft Olive script

On the command line, execute the following in the directory containing olive-config.json:


python -m olive.workflows.run --config olive-config.json 
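
The same workflow can also be started from Python, which is convenient inside notebooks or scripts. Assuming the config file is in the working directory:


from olive.workflows import run as olive_run

# runs the workflow defined in olive-config.json (fine-tuning, conversion, etc.)
olive_run("olive-config.json")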


Thinking about Enterprise AI 2.0 Architecture

With Microsoft Olive, you can better build your enterprise's own AI 2.0 solutions, especially around fine-tuning and model inference.

 


 

We can easily use Microsoft Olive to configure the fine-tuning stage, which is very helpful for the many teams that want to manage an SLM model pipeline. Looking at LLMOps alone, Microsoft Olive is a very important component. (This post focuses only on fine-tuning; model format conversion and evaluation will be introduced in subsequent content.) Some people may feel that the effect of fine-tuning is not obvious; that depends on your data quality and application scenarios. Fine-tuning an SLM needs to be targeted, and it is easiest to apply in combination with an LLM in the industry, for example for content filtering or industry domain knowledge. A fine-tuned SLM is more a supplement to an LLM than a replacement.


Summary


 

In this blog, we learned how to use Microsoft Olive to fine-tune an SLM using a combination of cloud and local resources, so that enterprises can allocate resources more effectively to build their own vertical industry models. In the next blog, I will continue with how to use Microsoft Olive to convert the model format for deployment, and how to verify the effectiveness of the fine-tuned SLM.

Resources

  1. Learn about Microsoft Olive https://microsoft.github.io/Olive/

  2. Learn about Microsoft Phi-2 https://www.microsoft.com/research/blog/phi-2-the-surprising-power-of-small-language-models/

  3. Learn about Azure AI Studio https://learn.microsoft.com/azure/ai-studio/what-is-ai-studio?tabs=home

  4. Learn about the Model Catalog on Azure AI Studio https://learn.microsoft.com/azure/ai-studio/how-to/model-catalog

  5. Fine-tuning related introduction https://platform.openai.com/docs/guides/fine-tuning/fine-tuning-examples
