Google Scholar Web Scraping and CSV Creation

This Jupyter Notebook allows you to scrape Google Scholar search results, extract paper information, and create a CSV file with the following columns: Paper Title, Year of Publication, Author, Publication Journal, and URL of the Paper.

Sample Output

Prerequisites

Before using this notebook, make sure you have the following:

Python 3.10 installed
Required libraries installed (you can install them using pip):
- requests
- beautifulsoup4
- pandas
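
Before running the notebook, a quick check like the one below can confirm that the libraries are importable. This is a minimal sketch and not part of the repository: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the only assumption is that `beautifulsoup4` is imported under its usual module name `bs4`.

```python
from importlib.util import find_spec

# Import names for the libraries listed above. The beautifulsoup4
# package is imported under the module name "bs4".
REQUIRED_MODULES = ["requests", "bs4", "pandas"]


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Any name reported as missing can then be installed with pip before opening the notebook.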

Usage

Clone or download this repository to your local machine.
Open parsing.ipynb in your Jupyter Notebook environment.
Modify the url variable to specify your Google Scholar search query. For example, you can change "tirzepatide" to your desired search term.
```
url = "https://scholar.google.com/scholar?start=0&q=tirzepatide&hl=en&as_sdt=0,5"
```
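
Instead of editing the URL string by hand, the same URL could be assembled from a search term. The sketch below is an illustration, not code from the notebook: `build_scholar_url` and `BASE_URL` are hypothetical names, the parameters are copied from the example URL above, and treating `start` as the result offset used for paging is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/scholar"


def build_scholar_url(query, start=0):
    """Build a search URL with the same parameters as the example URL."""
    # Assumption: "start" is the result offset used for paging.
    params = {"start": start, "q": query, "hl": "en", "as_sdt": "0,5"}
    # safe="," keeps the comma in as_sdt unescaped, matching the example.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


url = build_scholar_url("tirzepatide")
```

Using `urlencode` also takes care of escaping search terms that contain spaces or other special characters.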

Execute the notebook cell by cell. The notebook is divided into sections, each responsible for a specific task:

- Scraping the Google Scholar search results page.
- Extracting paper tags, citation links, and other relevant information.
- Parsing and formatting the data.
- Creating a CSV file with the extracted information.

After executing the entire notebook, the CSV file containing the paper information will be generated. You can find this file in the same directory as the notebook.
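
To illustrate the shape of the output file, here is a minimal sketch that writes rows with the five columns named at the top of this README. It is not the notebook's code: the notebook lists pandas as a dependency, while this sketch swaps in the standard-library `csv` module so it runs without extra installs. `COLUMNS`, `write_papers_csv`, and the sample row are hypothetical placeholders.

```python
import csv
import io

# Column names as listed in this README.
COLUMNS = [
    "Paper Title",
    "Year of Publication",
    "Author",
    "Publication Journal",
    "URL of the Paper",
]


def write_papers_csv(rows, fileobj):
    """Write a header line and one CSV line per paper dict to fileobj."""
    writer = csv.DictWriter(fileobj, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder row for illustration only; not real scraped data.
sample_rows = [
    {
        "Paper Title": "Example title",
        "Year of Publication": "2021",
        "Author": "A Author, B Author",
        "Publication Journal": "Example Journal",
        "URL of the Paper": "https://example.com/paper",
    }
]

buffer = io.StringIO()
write_papers_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` in place of the `StringIO` buffer.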

Customization

You can customize the notebook by modifying the following functions:

- get_paperinfo(paper_url): This function retrieves the content of a Google Scholar page. You can use it to scrape other search results.
- get_tags(doc): Modify this function to select different tags or elements from the page source based on your requirements.
- get_papertitle(paper_tag): If you want to extract additional information from the paper tags, customize this function.
- get_author_year_publi_info(authors_tag): Adjust this function to extract different information from the author tags.
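
As a starting point for customizing the author, year, and journal extraction, the sketch below splits one byline string into its parts. It does not reproduce the notebook's `get_author_year_publi_info`: `split_byline` is a hypothetical helper that works on plain text rather than a tag, and it assumes the byline has the form "authors - journal, year - source", which may not hold for every result.

```python
import re


def split_byline(byline):
    """Split 'authors - journal, year - source' into a dict of parts."""
    # Normalize non-breaking spaces in case the page text contains them.
    text = byline.replace("\xa0", " ")
    parts = [part.strip() for part in text.split(" - ")]

    authors = parts[0] if parts else ""
    venue_year = parts[1] if len(parts) > 1 else ""

    # Look for a four-digit year starting with 19 or 20.
    year_match = re.search(r"\b(?:19|20)\d{2}\b", venue_year)
    if year_match:
        year = year_match.group(0)
        # Everything before the year, minus the trailing comma and space.
        journal = venue_year[: year_match.start()].rstrip(", ")
    else:
        year = ""
        journal = venue_year

    return {"authors": authors, "year": year, "journal": journal}


# Placeholder byline for illustration only.
info = split_byline("A Author, B Author - Example Journal, 2021 - example.com")
```

Fields that cannot be found come back as empty strings, so malformed bylines still produce a complete row for the CSV.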

Issues and Contributions

If you encounter any issues or have suggestions for improvements, please open an issue in this GitHub repository. Contributions and pull requests are welcome!

mirsadra/google-scholar-parser
