
Export the articles and reports from Google Scholar easily! (The spider structure is inspired by the Scrapy framework)

sheikhartin/google-scholar-scraper


Google Scholar Scraper

Unfortunately, Google Scholar doesn't support exporting search results... I needed the most-cited papers for a research project, and after trying an imperfect script, I decided to write my own.

Important note: The spiders never send more than 2 requests per second to Google Scholar. The reason is that we don't want to solve CAPTCHAs, so it's better to wait a little and act like a human. Changing your IP address from time to time is also a good idea... 😩
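The README doesn't show how the spiders enforce this limit. As a minimal sketch, assuming a single-threaded crawler, a throttle like the hypothetical `Throttle` class below keeps consecutive requests at least 0.5 s apart (2 per second). It is an illustration of the idea, not the project's own code.

```python
import time


class Throttle:
    """Keep consecutive calls to wait() at least 1 / max_per_second apart.

    Hypothetical helper for illustration; not taken from this repository.
    """

    def __init__(self, max_per_second: float = 2.0) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._last_request = None  # time.monotonic() of the previous request

    def wait(self) -> None:
        """Sleep just long enough to respect the rate limit, then record the time."""
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
        self._last_request = time.monotonic()


if __name__ == "__main__":
    throttle = Throttle(max_per_second=2)
    start = time.monotonic()
    for _ in range(3):
        throttle.wait()
        # ...send one request to Google Scholar here...
    # Three requests need two 0.5 s gaps, so this takes at least about 1 s.
    print(f"3 requests took {time.monotonic() - start:.2f} s")
```

Adding a small random delay on top of the fixed interval could make the request pattern look even less mechanical.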

Features

  • Supports multiple languages
  • Customizable date range
  • Sorts by number of citations
  • Sorts by year
  • Searches for articles
  • Searches for case law
  • Searches in a profile by ID
  • Graphical interface

Usage

Install the dependencies:

pip install -r requirements.txt

Run the scraper by passing just a keyword:

python core.py "cryptography"

Customize the date range:

python core.py "metaverse" -s 1997 -e 2018
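One way `-s` and `-e` could translate into a Scholar request is through year-range query parameters. The sketch below assumes Scholar's search page accepts `as_ylo` (earliest year) and `as_yhi` (latest year); `build_search_url` is a hypothetical helper, not a function from `core.py`.

```python
from urllib.parse import urlencode

# Assumption: Scholar's search page accepts q (query), as_ylo (earliest year)
# and as_yhi (latest year). Hypothetical helper, not code from core.py.
SCHOLAR_SEARCH_URL = "https://scholar.google.com/scholar"


def build_search_url(keyword, start_year=None, end_year=None):
    """Return a search URL restricted to [start_year, end_year] when given."""
    if start_year is not None and end_year is not None and start_year > end_year:
        raise ValueError("start_year must not be after end_year")
    params = {"q": keyword}
    if start_year is not None:
        params["as_ylo"] = start_year
    if end_year is not None:
        params["as_yhi"] = end_year
    return f"{SCHOLAR_SEARCH_URL}?{urlencode(params)}"


# Roughly corresponds to: python core.py "metaverse" -s 1997 -e 2018
print(build_search_url("metaverse", 1997, 2018))
# https://scholar.google.com/scholar?q=metaverse&as_ylo=1997&as_yhi=2018
```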

Limit the results to one or more languages:

python core.py "medical" -l en es zh-tw fr

Set the output file path:

python core.py "machine learning" -s 2002 -o exports/most_cited_ml_articles_since_2002.csv
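If the output path points into a folder that doesn't exist yet (such as `exports/` above), something has to create that folder before the file is written. A minimal sketch using the standard `csv` and `pathlib` modules; `save_csv` and the `FIELDS` column names are hypothetical and may not match the columns `core.py` actually writes.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column names; the real export may use different ones.
FIELDS = ["title", "authors", "year", "citations"]


def save_csv(records, output_path):
    """Write dict records to output_path, creating parent folders if needed."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. creates "exports/"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
    return path


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        out = save_csv(
            [{"title": "Example", "authors": "A. Author", "year": 2002, "citations": 1}],
            Path(tmp) / "exports" / "most_cited_ml_articles_since_2002.csv",
        )
        print(out.read_text(encoding="utf-8"))
```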

Sort the output by year:

python core.py "oceanography" -y
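The features list mentions sorting both by number of citations and by year, with `-y` selecting the latter. The sketch below shows both orderings on a hypothetical `Article` record; it assumes most-cited-first for citations and newest-first for years, and places records without a year at the end.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical record type; field names are illustrative."""
    title: str
    year: Optional[int]
    citations: int


def sort_articles(articles: List[Article], by_year: bool = False) -> List[Article]:
    """Return a new list, most cited first, or newest first when by_year is True."""
    if by_year:
        # Key is (has no year, negated year): dated articles first, newest on top.
        return sorted(articles, key=lambda a: (a.year is None, -(a.year or 0)))
    return sorted(articles, key=lambda a: a.citations, reverse=True)


if __name__ == "__main__":
    papers = [Article("A", 2001, 5), Article("B", 2019, 1), Article("C", None, 50)]
    print([a.title for a in sort_articles(papers)])  # ['C', 'A', 'B']
    print([a.title for a in sort_articles(papers, by_year=True)])  # ['B', 'A', 'C']
```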

Search for case law:

python core.py "privacy" -c

Get the articles of a specific profile by its user ID:

python core.py "nms69lqaaaaj" -p -o jeff_dean_articles.csv
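Scholar profiles are served from `https://scholar.google.com/citations?user=<ID>`, and long article lists are split across pages. The sketch below assumes the `cstart` (offset) and `pagesize` query parameters used by the profile page's "Show more" button; `profile_page_url` is a hypothetical helper, not part of this project.

```python
from urllib.parse import urlencode

# Assumption: profile pages are served from /citations?user=<ID> and paginated
# with cstart (offset of the first article) and pagesize. Hypothetical helper.
PROFILE_URL = "https://scholar.google.com/citations"


def profile_page_url(user_id: str, page: int, page_size: int = 100) -> str:
    """Return the URL for the zero-based page of a profile's article list."""
    if page < 0:
        raise ValueError("page must be non-negative")
    params = {"user": user_id, "cstart": page * page_size, "pagesize": page_size}
    return f"{PROFILE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # A crawler would keep fetching pages until one returns no articles.
    for page in range(3):
        print(profile_page_url("nms69lqaaaaj", page))
```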

Make the program quiet:

python core.py "philosophy" -e 1234 -q
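For reference, the flags used in these examples can be collected into a single command-line definition. This `argparse` sketch mirrors only the short options shown above; the `dest` names, help text, and types are assumptions, not copied from `core.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the short flags in this README."""
    parser = argparse.ArgumentParser(description="Export Google Scholar results to CSV.")
    parser.add_argument("keyword", help="search keyword, or a profile user ID with -p")
    parser.add_argument("-s", dest="start_year", type=int, help="earliest year")
    parser.add_argument("-e", dest="end_year", type=int, help="latest year")
    parser.add_argument("-l", dest="languages", nargs="+", metavar="LANG",
                        help="one or more language codes, e.g. en es zh-tw")
    parser.add_argument("-o", dest="output", help="output CSV file path")
    parser.add_argument("-y", dest="sort_by_year", action="store_true", help="sort by year")
    parser.add_argument("-c", dest="case_law", action="store_true", help="search case law")
    parser.add_argument("-p", dest="profile", action="store_true",
                        help="treat the keyword as a profile user ID")
    parser.add_argument("-q", dest="quiet", action="store_true", help="quiet mode")
    return parser


if __name__ == "__main__":
    # Parse a fixed example instead of sys.argv so the demo always runs.
    print(vars(build_parser().parse_args(["oceanography", "-y"])))
```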

Here are some example exports so you can check whether the scraper meets your needs!

License

This project is licensed under the MIT license found in the LICENSE file in the root directory of this repository.
