Datasets not found on Google Dataset Search #199

Open
maxclac opened this issue Aug 3, 2021 · 10 comments

@maxclac

maxclac commented Aug 3, 2021

Hi,

I am running CKAN 2.9.2 on Ubuntu 20 and I installed the DCAT plugin. I followed the instructions in the README file (activating the structured_data and dcat plugins) so that my datasets would be discovered by Google Dataset Search, but this has not happened so far.

What could I be missing?

Best regards

@metaodi
Member

metaodi commented Nov 5, 2021

Did you verify that the structured data is generated in the frontend (i.e. view the page source and check for a JSON-LD block)? Maybe you have customized your frontend?

Then you could check if the schema validator indicates any errors for your domain (test with the URL of a dataset).
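The "view source" check can also be scripted. The following is a minimal sketch, not part of the extension: it assumes the structured data is embedded in a `<script type="application/ld+json">` element and that the dataset node has a `@type` ending in `Dataset`. The helper names `JsonLdExtractor` and `find_dataset_nodes` are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _is_dataset_type(type_name):
    # Accept "Dataset", "schema:Dataset" and "http(s)://schema.org/Dataset".
    return type_name.split("/")[-1].split(":")[-1] == "Dataset"


def find_dataset_nodes(html):
    """Return all JSON-LD nodes in the page whose @type looks like a Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = []
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block, nothing to inspect
        # A block may be one node, a list of nodes, or wrap nodes in "@graph".
        if isinstance(doc, list):
            nodes = doc
        elif isinstance(doc, dict):
            nodes = doc.get("@graph", [doc])
        else:
            continue
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if any(_is_dataset_type(t) for t in types):
                found.append(node)
    return found
```

To use it, pass in the HTML of a dataset page (for example the saved page source). An empty result suggests the block is missing or does not describe a dataset, which would point to a template or plugin problem rather than a crawling problem.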

@maxclac
Author

maxclac commented Nov 24, 2021

Hi @metaodi, and thank you for your answer. The validator does not indicate any errors, and my URLs appear to be correct.

@anuveyatsu

We also had some issues with indexing datasets by Google Dataset Search. Only a few datasets get indexed.

@sagargg

sagargg commented Feb 10, 2022

Maybe Google Dataset Search requires a standard JSON-LD structure for indexing:
https://developers.google.com/search/docs/advanced/structured-data/dataset#example

@metaodi
Member

metaodi commented Feb 10, 2022

@sagargg this is exactly what this extension provides. But it's hard to tell what went wrong with no further details.

  • Is the JSON+LD block generated?
  • Do you have a robots.txt?
  • Does your site submit a sitemap to Google?
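The robots.txt and sitemap questions can be checked with Python's standard library. This is a rough sketch under a few assumptions: `check_robots` is a hypothetical helper, the example.org URLs are placeholders, and `Googlebot` is used as the user-agent token. Python's matching rules are not identical to Google's, so treat the result as a hint only.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, url, user_agent="Googlebot"):
    """Return (allowed, sitemaps) for the given robots.txt content.

    allowed  -- True if user_agent may fetch url according to the rules
    sitemaps -- list of Sitemap URLs declared in the file (may be empty)
    """
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # access is needed here. Against a live site you could instead call
    # parser.set_url("https://example.org/robots.txt") and parser.read().
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    sitemaps = parser.site_maps() or []  # site_maps() needs Python 3.8+
    return allowed, sitemaps


if __name__ == "__main__":
    blocking = "\n".join([
        "User-agent: *",
        "Disallow: /dataset/",
        "Sitemap: https://example.org/sitemap.xml",
    ])
    print(check_robots(blocking, "https://example.org/dataset/my-data"))
```

If the first value is `False` for a dataset URL, an existing robots.txt is keeping the crawler away from the pages that carry the JSON-LD block.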

@maxclac
Author

maxclac commented Feb 10, 2022

Thank you @metaodi for your answer.

The JSON+LD is correctly formed. As I have no prior experience with letting crawlers access a website, I was not aware that I needed to take care of a robots.txt file and a sitemap. I realize now that it is important to read the Google Search guidelines before using the extension. Are there CKAN-specific instructions for setting up a robots.txt and a sitemap?

@metaodi
Member

metaodi commented Feb 10, 2022

No, there is nothing CKAN-specific. We use this extension on the open data catalogue of the City of Zurich, and it works for us.

See the Google Dataset Search help page for specific instructions: https://datasetsearch.research.google.com/help

Hope this helps.

@maxclac
Author

maxclac commented Feb 10, 2022

Thanks! Is a robots.txt really needed? I thought that, when none is given, Google would just crawl everything.

@metaodi
Member

metaodi commented Feb 10, 2022

No, it's not necessary. But since I don't know your setup, it could be that an existing robots.txt is blocking the Google crawler.

Just something to keep in mind.

@maxclac
Author

maxclac commented Feb 10, 2022

I see. I am not aware of any pre-existing robots.txt in my CKAN instance. Maybe if I explicitly add one, the indexing will work.
