URLExtract() init really slow #129

gilbd · 2022-05-19T14:10:25Z

Hi, when I use URLExtract() inside a function to parse a DataFrame column, it runs really slowly.
Here is my code:

from urlextract import URLExtract

def extract_urls(lst):
    # A new extractor is constructed on every call, i.e. once per row.
    extractor = URLExtract()
    count = 0
    for text in lst:
        urls_found = extractor.find_urls(text)
        if len(urls_found) > 0 and MY_URL in urls_found:
            count += len(urls_found)
    return count

df['col2'] = df['col1'].apply(extract_urls)

It takes a long time because of the time spent loading the TLDs and acquiring the FileLocks each time the object is created.
Maybe this object could be made a singleton?
Another idea is to load the TLDs just once by making the TLDs object a singleton.
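
To illustrate the "load just once" idea from the caller's side, here is a minimal sketch that caches a single instance with functools.lru_cache. SlowExtractor, get_extractor and count_urls are hypothetical names made up for this example, and SlowExtractor is only a stand-in for an object that is expensive to construct; it is not the library's implementation and its matching logic is a toy.

```python
from functools import lru_cache


class SlowExtractor:
    """Hypothetical stand-in for an object that is expensive to construct."""

    instances_created = 0  # counts constructions, for demonstration only

    def __init__(self):
        # Imagine the TLD list being loaded (and a file lock taken) here.
        SlowExtractor.instances_created += 1

    def find_urls(self, text):
        # Toy matching only: keep whitespace-separated tokens that look like URLs.
        return [w for w in text.split() if w.startswith(("http://", "https://"))]


@lru_cache(maxsize=1)
def get_extractor():
    """Build the extractor on the first call; return the same instance afterwards."""
    return SlowExtractor()


def count_urls(lst, my_url):
    """Count the URLs in every text of lst whose URL list contains my_url."""
    extractor = get_extractor()  # cheap after the first call
    count = 0
    for text in lst:
        urls_found = extractor.find_urls(text)
        if urls_found and my_url in urls_found:
            count += len(urls_found)
    return count


rows = [
    ["see https://a.example and https://b.example"],
    ["nothing here"],
    ["https://a.example"],
]
results = [count_urls(row, "https://a.example") for row in rows]
```

With the real library, get_extractor() would presumably return URLExtract() instead, on the assumption that one instance can safely be reused across calls. Creating the extractor once at module level and using it inside the function passed to apply would have the same effect.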
