
2024 WSDM: Conversational Multi-Document QA


Competition overview

https://sites.google.com/view/wsdm24-docqa

Winning solution

WSDM 2024: multi-document question answering with large language models.

Key tricks:

  • Backbone model: SOLAR-10.7B-Instruct.
  • Hybrid training: use a well-trained model to generate (pseudo) answers for the eval dataset, add them to the original training set, then fine-tune a new model from scratch.
  • Noisy-data filtering: improves the quality of the training data.
  • Model ensemble.
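
The hybrid training trick above can be sketched roughly as follows. This is only a minimal illustration of how the pseudo-labeled eval samples might be merged into the training set, not the winning team's actual code. The function name `build_hybrid_training_set`, the `generate_answer` callable (standing in for the well-trained model), and the `is_pseudo` flag are all hypothetical; the fine-tuning step itself is left out.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]


def build_hybrid_training_set(
    train_set: List[Sample],
    eval_set: List[Sample],
    generate_answer: Callable[[Sample], str],
) -> List[Sample]:
    """Return the training set plus eval samples labeled with model-generated answers."""
    pseudo_labeled: List[Sample] = []
    for sample in eval_set:
        labeled = dict(sample)  # shallow copy so the eval set itself is not modified
        labeled["answer"] = generate_answer(sample)  # pseudo answer from the trained model
        labeled["is_pseudo"] = True  # hypothetical flag, handy for later inspection
        pseudo_labeled.append(labeled)
    return list(train_set) + pseudo_labeled


# Toy usage: a lambda stands in for the well-trained model (assumption).
train = [{"question": "q1", "answer": "a1"}]
evals = [{"question": "q2"}]
merged = build_hybrid_training_set(train, evals, lambda s: "pseudo:" + s["question"])
```

A new model would then be fine-tuned from scratch on `merged` rather than on `train` alone.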

Data sample

Compared with data from other QA settings, each sample additionally carries the conversation history.

{
"uuid": "xxxxx",
"history": [
 {"question": xxx, "history": xxx},
 {"question": xxx, "history": xxx},
 ...
],
"documents": 
[
"Jun 17th through Fri the 21st, 2024 at the Seattle Convention Center, Vancouver Convention Center.", "Workshops within a “track” will take place in the same room (or be co-located), and workshop organizers will be asked to work closely with others in their track ...", 
...
],
"question": "Where will CVPR 2024 happen?",
"answer": "CVPR 2024 will happen at the Seattle Convention Center, Vancouver.",
"keywords": # Will not be given.
[
"Vancouver", "CVPR 2024", "Seattle Convention Center"
] 
}
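
To show how the fields of a sample might be fed to an LLM, here is a minimal sketch that flattens the documents, the history, and the current question into one prompt string. The template (`[Document i]`, `User:`, `Assistant:`) and the function name `build_prompt` are assumptions for illustration, not the format used by the winning solution. The sample above shows the second key of each history turn as "history"; the sketch also accepts a hypothetical "answer" key, since the real key name is not confirmed here.

```python
from typing import Any, Dict, List


def build_prompt(sample: Dict[str, Any]) -> str:
    """Flatten one sample into a single prompt string (hypothetical template)."""
    lines: List[str] = []
    # Documents come first, numbered so the model can tell them apart.
    for i, doc in enumerate(sample.get("documents", []), start=1):
        lines.append(f"[Document {i}] {doc}")
    # Earlier conversation turns, oldest first.
    for turn in sample.get("history", []):
        # Assumption: the reply is stored under "answer" or, as in the sample, "history".
        reply = turn.get("answer", turn.get("history", ""))
        lines.append(f"User: {turn['question']}")
        lines.append(f"Assistant: {reply}")
    # The current question goes last; the model continues after "Assistant:".
    lines.append(f"User: {sample['question']}")
    lines.append("Assistant:")
    return "\n".join(lines)


example = {
    "uuid": "xxxxx",
    "history": [{"question": "What is CVPR?", "history": "A computer vision conference."}],
    "documents": ["doc one", "doc two"],
    "question": "Where will CVPR 2024 happen?",
}
prompt = build_prompt(example)
```

Because "keywords" is not given at test time, the sketch deliberately ignores that field.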