importer/twitter: archive the URL shortlinks' destination #1279

Open
bradfitz opened this issue Sep 26, 2019 · 2 comments


@bradfitz
Contributor

If you post a tweet containing a URL, Twitter's URL shortener rewrites it, and Twitter's API returns only the t.co shortlink, not the original URL as tweeted.

Example:

[Screenshot: Screen Shot 2019-09-25 at 8 25 32 PM]

We should index those t.co shortlinks and record where they actually point, in case that shortlink service goes down.

We could just store new attributes on the permanode, like:

  shortlink:<t.co-suffix> = http://domain.tld/target
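
A minimal sketch of how the attribute name above could be derived from a shortlink. The helper name `shortlinkAttr` is hypothetical and not part of Perkeep; it only assumes that the suffix is the single path segment of a `t.co` URL.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shortlinkAttr returns the attribute name and value for one t.co shortlink,
// following the proposed "shortlink:<t.co-suffix>" scheme.
// Hypothetical helper for illustration; not an existing Perkeep function.
func shortlinkAttr(shortURL, target string) (attr, value string, err error) {
	u, err := url.Parse(shortURL)
	if err != nil {
		return "", "", err
	}
	if u.Hostname() != "t.co" {
		return "", "", fmt.Errorf("not a t.co shortlink: %q", shortURL)
	}
	// Assumption: the suffix is the whole path, e.g. "/yVlNlAnRtr".
	suffix := strings.Trim(u.Path, "/")
	if suffix == "" || strings.Contains(suffix, "/") {
		return "", "", fmt.Errorf("unexpected t.co path in %q", shortURL)
	}
	return "shortlink:" + suffix, target, nil
}
```

The importer would then set the returned name and value as one attribute on the tweet's permanode, once per shortlink.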

@sbshah97

Hey, could I help with this issue?
It looks interesting to me, and I'd love to get to understand the project as well.

@rjp

rjp commented May 7, 2022

Looks like the expanded information is in the entities block of the API response these days, e.g.

  json.entities.Urls[0].Url = "https://t.co/yVlNlAnRtr";
  json.user.entities.Url.Urls[0].Display_url = "lawyersgunsmoneyblog.com/author/Scott-L…";
  json.user.entities.Url.Urls[0].Expanded_url = "http://www.lawyersgunsmoneyblog.com/author/Scott-Lemieux/";

I can probably have a look at adding these as new attributes (entities also contains links to mentioned users, etc. which might also be handy to store.)
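
As a rough sketch of what reading the entities block could look like: the type `tweet` and function `expandedLinks` below are hypothetical names, and the code assumes the raw JSON keys are `entities`, `urls`, `url`, `expanded_url` and `display_url`. Go's `encoding/json` matches keys case-insensitively, so the capitalized spellings shown above would decode the same way.

```go
package main

import "encoding/json"

// tweet models only the part of the API response needed here.
// Hypothetical type; the field set is an assumption.
type tweet struct {
	Entities struct {
		URLs []struct {
			URL         string `json:"url"`          // the t.co shortlink
			ExpandedURL string `json:"expanded_url"` // where it points
			DisplayURL  string `json:"display_url"`  // truncated form for display
		} `json:"urls"`
	} `json:"entities"`
}

// expandedLinks maps each shortlink in a tweet's entities block
// to its expanded URL, skipping entries missing either value.
func expandedLinks(tweetJSON []byte) (map[string]string, error) {
	var t tweet
	if err := json.Unmarshal(tweetJSON, &t); err != nil {
		return nil, err
	}
	links := make(map[string]string, len(t.Entities.URLs))
	for _, u := range t.Entities.URLs {
		if u.URL == "" || u.ExpandedURL == "" {
			continue
		}
		links[u.URL] = u.ExpandedURL
	}
	return links, nil
}
```

Because the result is a map, a tweet with several shortlinks yields one entry per link, which is what the attribute-naming question below has to accommodate.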

Is there a document or discussion about naming of attributes (such as what to do if there's more than one shortlink, etc.) anywhere? Or any support for hierarchy in the attributes?

The attr search doesn't seem to support wildcards, which means attr:shortlink:~ (or attr:~shortlink:~) won't find shortlink.0 (or any other variant).
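
To illustrate the kind of prefix match that the attr search would need, here is a small sketch that filters an already-fetched attribute map on the client side. It does not use or model Perkeep's search API; `shortlinkAttrs` is a hypothetical helper and the `shortlink:` prefix is taken from the naming proposed earlier in this issue.

```go
package main

import (
	"sort"
	"strings"
)

// shortlinkAttrs returns, in sorted order, the names of all attributes
// that start with "shortlink:". Hypothetical client-side helper.
func shortlinkAttrs(attrs map[string]string) []string {
	var names []string
	for name := range attrs {
		if strings.HasPrefix(name, "shortlink:") {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	return names
}
```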
