Mixed Chinese-English text + split by punctuation: decimal numbers get split #1084

Open
liudiao1992 opened this issue May 13, 2024 · 3 comments
Labels
bug Something isn't working todolist

Comments

@liudiao1992

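"融资4.15亿美元" (raised 415 million US dollars) turns into "融资四" + "十五亿美元", that is, "raised four" followed by "1.5 billion US dollars".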

@BOCEAN-FENG

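Just don't split by punctuation. If you call the API, you can set the split symbols. The WebUI actually lets you set them too, but the exact place is hard to find.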

@RVC-Boss added the bug and todolist labels May 19, 2024
@originwufish

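Modify the splitting logic of the split function: add one extra check that decides whether to split at a "." based on whether it is adjacent to digit characters.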
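def split(todo_text):
    # `splits` is assumed to be the module-level set of punctuation marks.
    todo_text = todo_text.replace("……", "。").replace("——", ",")
    if todo_text[-1] not in splits:
        todo_text += "。"
    i_split_head = i_split_tail = 0
    len_text = len(todo_text)
    todo_texts = []
    while 1:
        if i_split_head >= len_text:
            break  # The text always ends with punctuation, so the last segment was already appended.
        if todo_text[i_split_head] in splits:
            # Treat "." as a decimal point only when it sits between two digits.
            # Checking the following digit too keeps a trailing "25." from being dropped.
            is_decimal_point = (
                todo_text[i_split_head] == "."
                and 0 < i_split_head < len_text - 1
                and todo_text[i_split_head - 1].isdigit()
                and todo_text[i_split_head + 1].isdigit()
            )
            i_split_head += 1
            if not is_decimal_point:
                todo_texts.append(todo_text[i_split_tail:i_split_head])
                i_split_tail = i_split_head
        else:
            i_split_head += 1
    return todo_texts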
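
A quick way to sanity-check this kind of digit guard is a compact, self-contained variant like the sketch below. It is not the repository's code: the name `split_keep_decimals` and the `SPLITS` set are assumptions made for illustration, and the guard here requires a digit on both sides of the ".", so a sentence ending in a number followed by "." is still split off rather than lost.

```python
# Assumed delimiter set, for illustration only; the project defines its own `splits`.
SPLITS = {",", "。", "?", "!", ",", ".", "?", "!", ":", ":", "…"}


def split_keep_decimals(text: str) -> list[str]:
    """Split after each delimiter, but never at a "." that sits between two digits."""
    if not text:
        return []
    if text[-1] not in SPLITS:
        text += "。"  # guarantee a closing delimiter so the last segment is emitted
    segments = []
    start = 0
    for i, ch in enumerate(text):
        if ch not in SPLITS:
            continue
        between_digits = (
            ch == "."
            and 0 < i < len(text) - 1
            and text[i - 1].isdigit()
            and text[i + 1].isdigit()
        )
        if between_digits:
            continue  # decimal point: keep scanning
        segments.append(text[start:i + 1])
        start = i + 1
    return segments


print(split_keep_decimals("融资4.15亿美元"))
print(split_keep_decimals("今年25."))
```

With this guard, the example from this issue stays in one segment, and a trailing "25." still closes its segment.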

@biupiapa

biupiapa commented May 28, 2024

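In the cut5 function in GPT_SoVITS/inference_webui.py, changing the regular-expression pattern is enough:
punds = r'[,.;?!、,。?!;:]' -> punds = r'\.(?![0-9])|(?<![0-9])\.|[,;?!、,。?!;:…]'
I pulled "." out on its own and applied two lookaround assertions to it, so it counts as a delimiter unless it has a digit on both sides.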
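
To see the effect of the two patterns side by side, here is a minimal sketch. It assumes the pattern is applied with `re.split` and a capturing group so that each delimiter stays attached to its segment; the helper name `cut_on_punctuation` is hypothetical and this is not the actual body of cut5.

```python
import re

# Patterns copied from the comment above.
OLD_PUNDS = r'[,.;?!、,。?!;:]'
NEW_PUNDS = r'\.(?![0-9])|(?<![0-9])\.|[,;?!、,。?!;:…]'


def cut_on_punctuation(text: str, punds: str) -> list[str]:
    """Split text after each delimiter, keeping the delimiter with its segment."""
    # With a capturing group, re.split returns [text, delim, text, delim, ..., text].
    items = re.split(f'({punds})', text)
    merged = ["".join(pair) for pair in zip(items[::2], items[1::2])]
    # Keep a trailing piece that has no closing delimiter.
    if items[-1]:
        merged.append(items[-1])
    return merged


sample = "融资4.15亿美元"
print(cut_on_punctuation(sample, OLD_PUNDS))  # the decimal is cut in two
print(cut_on_punctuation(sample, NEW_PUNDS))  # the decimal survives
```

The "." in "4.15" fails both alternatives (it is followed by a digit and preceded by one), so only that case is excluded from splitting. A "." at the end of a sentence, or next to a letter, still matches.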
