This post has been republished via RSS; it originally appeared at: Microsoft Research.
Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.
In this episode, Dr. Sheng Zhang, a Senior Researcher at Microsoft Research, joins host Dr. Gretchen Huizinga to discuss “UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition.” In this paper, Zhang and his coauthors present mission-focused instruction tuning, a method for distilling large language models into smaller, more efficient ones for a broad application class. Their UniversalNER models achieved state-of-the-art performance in named entity recognition, an important natural language processing (NLP) task. Model distillation has the potential to make NLP and other capabilities more accessible, particularly in specialized domains such as biomedicine, which could benefit from more resource-efficient and transparent options.
GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract!—of their new and noteworthy papers. Today, I’m talking to Dr. Sheng Zhang, a Senior Researcher at Microsoft Research. Dr. Zhang is coauthor of a paper called “UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition,” and you can read this paper now on arXiv. Sheng Zhang, thanks for joining us on Abstracts!
SHENG ZHANG: Thanks for having me.
HUIZINGA: So in a few sentences, give us a brief introduction or overview of the issue or problem that your research addresses and why we should care about it.
ZHANG: Sure. Well, our research addresses the challenge of efficiently replicating the capabilities of large language models for targeted applications. Particularly, we focus on named entity recognition, or NER, and people should care because this work aims to create more cost-effective and transparent models that can recognize a wide range of entity types across various domains, which is crucial for knowledge extraction and has numerical practical applications.
HUIZINGA: So how does your approach, your particular approach, build on or differ from what’s been done previously in this field?
ZHANG: Well, our approach builds on the idea of instruction tuning, which is used to fine-tune language models to follow human instructions. However, unlike existing work that focuses on tuning models into replicas of large language models in every aspect, we propose a method called mission-focused instruction tuning, where we train a smaller model to specifically excel in a broad application class, such as open information extraction. And in our case study, we focus on named entity recognition, NER, and we demonstrate how targeted distillation from large language models can maximize their capabilities for this application. At the same time, the smaller model, the student model, also preserves generalizability across different semantic types and domains. This approach differs from previous work also because we emphasize the importance of increasing the diversity of input data and generating more comprehensive coverage of entity types, which ultimately leads to better performance in the targeted application.
HUIZINGA: OK. And in the paper, you talk about student models trailing the original large language models by large margins in what you call downstream applications. Give me an example of what downstream application looks like.
ZHANG: Yeah. So we here specifically focus on named entity recognition. That is, identifying named entities in a written text.
HUIZINGA: Ah …
ZHANG: So there’s various types of named entities so the canonical ones, like person, geographic location, organization … And people have, you know, various needs. They can go beyond those coarse-grained types. They can go into very fine-grained types, like athlete or politician …
HUIZINGA: Wow …
ZHANG: … or even, you know, finer-grain types. And you cannot like predefine what types will be considered in your task. That’s why we care about this universal concept of named entity recognition.
HUIZINGA: Well, let’s talk about methodology for a bit. What kind of research methodology did you use, and how did you conduct this research?
ZHANG: We developed a general recipe for targeted distillation from large language models, and in this case, we applied it to open NER. And our methodology consists of two main steps: data construction and mission-focused instruction tuning. For data construction, we sampled inputs from a large corpus across diverse domains, and then we used a large language model, ChatGPT, to annotate entity mentions and their associated entity types in the sampled inputs. This process allowed us to create a dataset with wide coverage of entity types. For mission-focused instruction tuning, we fine-tuned smaller models using our constructed dataset in a conversational-style format. For each entity type in the output, we transformed it into a natural language query and tuned the model to generate structured outputs that contain all entities of that type in the input passage. We also incorporated negative sampling to account for entity types not mentioned in that passage. And besides these two main steps, our research also involved assembling the largest-to-date, and most diverse, NER benchmark for evaluation. We compared the performance of our targeted distillation approach with other state-of-the-art models to demonstrate the effectiveness of our methodology.
HUIZINGA: OK, so you talk about NER as a case study, and you had 43 datasets and nine domains. Give me an example of some of those domains that you pulled from.
ZHANG: Yeah. So one very, you know, typical domain is like news, right. We read news every day, and the news mentions about, you know, people, events, and location. So that’s like a very common domain. And there are other very interesting domains like code. People also write code, and the computer can understand code, but a person would also want to understand code in some different way. So if you have like a code-specific named entity recognition capability, that would be awesome for, you know, some people that want to understand what’s happening in the code.
HUIZINGA: Right. And, and you mentioned programing, or code, but I also see in the paper biomedicine on one kind of complex and academic end and social media on another. So those are wildly different domains that you pulled from. Did you do that for a reason, that spectrum of different kinds of data?
ZHANG: Yes. The reason is that, you know, for some high-value domains like biomedicine, it’s quite expensive to annotate some data to train your model like that. So traditionally, people will have to hire an expert to do that. That is quite expensive and not scalable. And here, in the UniversalNER paper, we propose a way to distill that specific domain knowledge from the large language model. So the whole process is automatic. And the resulting model, you can see, it does pretty well, and maybe equally well, on the model that’s based on, you know, human expert–annotated corpus.
HUIZINGA: So after all this, a research paper presents findings. I imagine you had some interesting discoveries in, in this study. What were your major findings?
ZHANG: Yes. Our major findings were that the targeted distillation approach, specifically here the UniversalNER model we developed, it achieved state-of-the-art performance in named entity recognition across a wide range of entity types and domains. And when we compared it to other models like Alpaca, Vicuna, and InstructUIE, UniversalNER significantly outperformed them in terms of F1 score. This demonstrates the effectiveness of mission-focused instruction tuning for creating more cost-effective and transparent models that can excel in targeted applications such as open NER.
HUIZINGA: So let’s talk a little bit more about real-world impact. Uh, we’ve already discussed a little bit about that. But how would you say, based on these findings, that this impacts the real world and how people will use this?
ZHANG: Yeah, absolutely. I would say our work is very significant in terms of real-world impact because, first of all, NER is a fundamental task in natural language processing, and it plays a crucial role in knowledge extraction, information retrieval, and data mining. And by developing a more cost-effective and transparent model like UniversalNER, which can recognize a wide range of entity types and domains, we enable better performance in these downstream applications. And like I said, this is particularly important in high-value domains, such as biomedicine, where you know specialized expertise is required for annotation and the new entity types keep emerging. Our approach can help save time and resources for effectively recognizing these new entity types without the need for extensive annotated data. And secondly, our work can have a broader impact as it represents a general recipe for targeted distillation from large language models, and this approach can be applied to other application classes, such as, you know, open relation extraction. And this allows researchers and the practitioner to create much smaller models that can be more efficient and transparent while maintaining high performance in their targeted tasks.
HUIZINGA: If there was one thing you want our listeners to take away from this work and you could distill that into a short take, what would it be?
ZHANG: Mm hmm. One key takeaway from our work is that targeted distillation from large language models using our mission-focused instruction tuning can lead to more cost-effective and transparent models that excel in a broader application class. And our application demonstrated that it is possible to harness the capabilities of large language models and distill them into much smaller models that not only maintain generalizability across semantic types and domains but also surpass the performance of their larger counterparts in the targeted application. And this opens up new avenues for research and practical application in various fields, making knowledge extractions and the natural language processing tasks more efficient and accessible.
HUIZINGA: It sounds very promising, and it sounds like you’re excited about it.
ZHANG: Yeah, I’m pretty excited!
HUIZINGA: Well then tell us, given this new vista that you’ve opened up with this UniversalNER, what unanswered questions or unsolved problems still remain in this area, and what’s next on your research agenda?
ZHANG: Yeah. Our work demonstrates the effectiveness of targeted distillation for open NER, but several unanswered questions remain. And I would say the first one is adapting the approach to other application classes. Our method is a general recipe for targeted distillation, and it would be interesting to explore its effectiveness in other broader application classes, such as open relation extraction. And the second one is handling label conflicts and dataset-specific definitions. So in our work, we propose a dataset-specific instruction tuning template to address label conflicts. But more research is needed to better understand and develop methods for harmonizing discrepancies in label definitions across datasets. And the last one is exploring more efficient data construction methods. We used ChatGPT for data construction, but, you know, alternative approaches could be explored to generate more diverse and comprehensive datasets for mission-focused instruction tuning. And as for our research agenda, we plan to continue exploring targeted distillation techniques and apply them to other application classes, as well as investigate ways to improve data construction for better performance and efficiency in real-world tasks.
HUIZINGA: Sounds like you got your work cut out for you.
ZHANG: Yes. [LAUGHS] Thank you.
HUIZINGA: Sheng Zhang, thanks for joining us today. And to our listeners, thanks for tuning in. If you’re interested in learning more about this paper, you can find a link at aka.ms/Abstracts, or you can read the paper on arXiv. See you next time on Abstracts!