I have a match that looks like this:
Match({'ruleId': 'MORFOLOGIK_RULE_ES', 'message': 'Se ha encontrado un posible error ortográfico.', 'replacements': ['teléfonos', 'teléfono', 'telefotos'], 'offsetInContext': 43, 'context': '...𝒚 𝒛𝒂𝒑𝒊𝒐𝒍𝒂 podemos compartir tus telefonos con el conductor 𝑺𝒊', 'offset': 307, 'errorLength': 9, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Rider > Lost Items > Standard lost item > Driver found riders itemdescripcion del articulo perdido 𝑴𝒆 𝒐𝒍𝒗𝒊𝒅𝒆 𝒖𝒏𝒂 𝒎𝒐𝒄𝒋𝒊𝒍𝒂 𝒏𝒆𝒈𝒓𝒔 ingresa un numero de telefono alternativo incluye el codigo de tu pais informacion sobre el viaje 𝒂𝒏𝒅𝒓𝒆𝒔𝒊𝒕𝒐 𝒚 𝒛𝒂𝒑𝒊𝒐𝒍𝒂 podemos compartir tus telefonos con el conductor 𝑺𝒊'})
The original sentence looks like this:
Rider > Lost Items > Standard lost item > Driver found riders itemdescripcion del articulo perdido 𝑴𝒆 𝒐𝒍𝒗𝒊𝒅𝒆 𝒖𝒏𝒂 𝒎𝒐𝒄𝒋𝒊𝒍𝒂 𝒏𝒆𝒈𝒓𝒔 ingresa un numero de telefono alternativo incluye el codigo de tu pais informacion sobre el viaje 𝒂𝒏𝒅𝒓𝒆𝒔𝒊𝒕𝒐 𝒚 𝒛𝒂𝒑𝒊𝒐𝒍𝒂 podemos compartir tus telefonos con el conductor 𝑺𝒊
The problem is that the reported offset is 307, while the sentence is only 296 characters long.
I think the cause is that the text contains some characters that internally take more than one position in the Unicode encoding (each is composed of multiple code units).
As a result, when I try to map the detection back to the original text I get an error, because the reported position is wrong and does not point to the true position in the text.
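One possible reading of these numbers, offered as an assumption rather than a confirmed diagnosis: LanguageTool runs on Java, whose strings are UTF-16, so each of the styled letters (which lie outside the Basic Multilingual Plane) counts as two units in the reported offset but as one character in Python's `len()`. There are 40 such letters before "telefonos", and 307 - 40 = 267, which is where the word actually starts in the Python string. A minimal sketch of a conversion, using hypothetical helper names `utf16_offset_to_index` and `match_span` that are not part of `language_tool_python`:

```python
def utf16_offset_to_index(text: str, utf16_offset: int) -> int:
    """Convert an offset counted in UTF-16 code units to a Python str index.

    Assumption: the offset was produced by a UTF-16 based tool, so every
    character above U+FFFF was counted as two units (a surrogate pair).
    """
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


def match_span(text: str, offset: int, length: int) -> tuple[int, int]:
    """Return (start, end) Python indices for a UTF-16 offset and length."""
    start = utf16_offset_to_index(text, offset)
    end = utf16_offset_to_index(text, offset + length)
    return start, end


if __name__ == "__main__":
    sample = "𝑺𝒊 telefonos"
    # Two styled letters take 4 UTF-16 units, plus one space: offset 5.
    start, end = match_span(sample, 5, 9)
    print(sample[start:end])
```

With the match above, `match_span(text, match.offset, match.errorLength)` would give the slice to use on the original Python string, provided the UTF-16 assumption holds.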
Your problem is indeed reproducible with the following code:
import language_tool_python  # needed for language_tool_python.utils.correct below
from language_tool_python import LanguageTool

language = "ES"
tool = LanguageTool(language)
text = 'Rider > Lost Items > Standard lost item > Driver found riders itemdescripcion del articulo perdido 𝑴𝒆 𝒐𝒍𝒗𝒊𝒅𝒆 𝒖𝒏𝒂 𝒎𝒐𝒄𝒋𝒊𝒍𝒂 𝒏𝒆𝒈𝒓𝒔 ingresa un numero de telefono alternativo incluye el codigo de tu pais informacion sobre el viaje 𝒂𝒏𝒅𝒓𝒆𝒔𝒊𝒕𝒐 𝒚 𝒛𝒂𝒑𝒊𝒐𝒍𝒂 podemos compartir tus telefonos con el conductor 𝑺𝒊'
print("len(text)", len(text))
matches = tool.check(text)
# Print every match, including its reported offset
for match in matches:
    print(match)
corrected_text = language_tool_python.utils.correct(text, matches)
print(corrected_text)
tool.close()
Note that some words of your text seem to have special formatting.
If you clean your text as follows it seems to work normally:
text_cleaned='Rider > Lost Items > Standard lost item > Driver found riders itemdescripcion del articulo perdido Me olvide una mocjila negrs ingresa un numero de telefono alternativo incluye el codigo de tu pais informacion sobre el viaje andresito y zapiola podemos compartir tus telefonos con el conductor Si'
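If cleaning by hand is not practical, one option is Unicode compatibility normalization: the styled letters in this text are mathematical bold italic symbols, and NFKC maps them to their plain equivalents. The following is a sketch under that assumption; `clean_text` is a hypothetical helper name, and NFKC also rewrites other compatibility characters (ligatures, full-width forms), which may or may not be wanted.

```python
import unicodedata


def clean_text(text: str) -> str:
    """Replace styled Unicode letters with their plain equivalents via NFKC.

    Note: NFKC changes the string, so offsets computed on the cleaned text
    refer to the cleaned text, not to the original input.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    styled = "𝑴𝒆 𝒐𝒍𝒗𝒊𝒅𝒆 𝒖𝒏𝒂 𝒎𝒐𝒄𝒋𝒊𝒍𝒂"
    cleaned = clean_text(styled)
    print(cleaned)
    print(len(styled), len(cleaned))
```

After normalization every character fits in a single UTF-16 unit, so the offsets reported for the cleaned text should line up with Python indices into the cleaned text.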