The repository and website hosting the peer review process for new Programming Historian lessons
Updated May 23, 2024 - Jupyter Notebook
Run a high-fidelity browser-based crawler in a single Docker container
Makes saving pages in bulk to the Wayback Machine much easier
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
A suite of tools for mirroring and hoarding web pages you visit for later offline viewing, i.e. your own personal Wayback Machine. It can also archive HTTP POST requests and responses, as well as most other HTTP-level data, and follows an "archive everything now, figure out what to do with it later" philosophy.
A Memento Aggregator CLI and Server in Go
Official Python package for ArchiveBox, the self-hosted internet archiving solution.
Home of the official apt/deb package for Ubuntu/Debian-based systems.
ODU Web Science and Digital Libraries Research Group (WS-DL) home page.
🗄️ A simple CLI for converting WARC to Parquet.
Really hacky proof-of-concept HTTP archival using mitmproxy
🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation
Streaming WARC/ARC library for fast web archive IO
A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!
Automatically archive links to videos, images, and social media content from Google Sheets (and more).
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.