Is scraping shadow DOM an option? #48

Open
jtlimo opened this issue Apr 28, 2022 · 6 comments

Comments

@jtlimo

jtlimo commented Apr 28, 2022

Hi, I'm trying to scrape YouTube charts, without success, because the page uses Polymer / shadow DOM. Could I do that with Geziyor? I'm currently using colly, which doesn't support it.

@musabgultekin
Collaborator

musabgultekin commented Apr 28, 2022

Hi! Thanks for reporting that!

It's not possible right now, since the actions are hardcoded. I'm working on a way to add custom actions.

@musabgultekin
Collaborator

I've added custom actions to the request.
You can now add custom chromedp.Action values to the geziyor.Request object, like req.Actions = []chromedp.Action{XXX}

See TestGetRenderedCustomActions test

738852f

You can then reach into shadow DOMs with an action like this:

chromedp.Click(`document.querySelector('#container').shadowRoot.querySelector('#foo')`, chromedp.ByJSPath),

See this issue: chromedp/chromedp#376

Let me know if this fixes your problem!
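
For elements nested several shadow roots deep, the JS path from the comment above gets long and easy to mistype. Here is a minimal sketch of a helper that builds it from a list of CSS selectors. The names `ShadowJSPath` and `quoteJS` are hypothetical and not part of geziyor or chromedp; the sketch only produces the string that would be passed to an action with `chromedp.ByJSPath`. It assumes every shadow root on the path is open, since `shadowRoot` is null for closed ones.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteJS wraps s in single quotes for use inside a JavaScript expression,
// escaping backslashes and single quotes. (Hypothetical helper.)
func quoteJS(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	s = strings.ReplaceAll(s, `'`, `\'`)
	return "'" + s + "'"
}

// ShadowJSPath builds a JavaScript expression that walks through nested
// open shadow roots. Every selector except the last names a shadow host,
// outermost first; the last selector names the target element.
// (Hypothetical helper, not part of geziyor or chromedp.)
func ShadowJSPath(selectors ...string) string {
	var b strings.Builder
	b.WriteString("document")
	for i, sel := range selectors {
		if i > 0 {
			// Step from the previous match into its shadow root.
			b.WriteString(".shadowRoot")
		}
		b.WriteString(".querySelector(" + quoteJS(sel) + ")")
	}
	return b.String()
}

func main() {
	path := ShadowJSPath("#container", "#foo")
	fmt.Println(path)
	// prints: document.querySelector('#container').shadowRoot.querySelector('#foo')

	// The resulting string would then be used as in the comment above, e.g.
	//   chromedp.Click(path, chromedp.ByJSPath)
}
```

Building the path from selectors keeps each level of nesting visible and avoids unbalanced quotes or parentheses in a hand-written expression.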

@jtlimo
Author

jtlimo commented May 3, 2022

I'll try this today or tomorrow. After testing, I'll let you know whether it works for my case.

@jtlimo
Author

jtlimo commented May 4, 2022

Hey @musabgultekin, I'm trying to scrape YouTube charts, but unfortunately without success. I pushed my code to this repo; if you have time to take a look, I'd appreciate it.

The new error is that the context is cancelled as soon as scraping starts.

@musabgultekin
Collaborator

Yeah, I can look into it tomorrow; I'm busy today. I'll comment here when I find time.

@jtlimo
Author

jtlimo commented Jul 15, 2022

Hi @musabgultekin, any news on this issue? Can I help by providing more information or anything else?

2 participants