
Survivor Library

Various scripts for scraping and parsing survivorlibrary.com.

Keep in mind that this was meant to be a quick-and-dirty project, so things were hot-glued together as I went along.

Requirements

  1. Node.js + npm for parse_html_pages.js
    • I was using v16.13.2 (LTS) at the time of writing.
    • Remember to run npm install before running node parse_html_pages.js.
  2. pdfinfo via poppler-utils
    • Used by one of the Bash scripts to validate the downloaded PDF files.
  3. Bash for the various scripts
    • The Bash scripts were used on a Debian 10 (Buster) machine, which ships with Bash by default. In theory they should also work on Windows (e.g. via Git Bash), but requirement #2 (pdfinfo) may keep them from working as expected there.
  4. curl, which is used to download all the pages.
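
Before running anything, it can help to confirm that these tools are actually available. The following is a minimal sketch and not one of the repository's scripts: require_tools and is_valid_pdf are hypothetical helper names. The PDF check relies only on pdfinfo exiting with a non-zero status when it cannot open or parse a file, which is one way a downloaded PDF could be validated.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository): a quick sanity check
# for the tools listed under Requirements.

# require_tools: print every named command that is not found on PATH
# (or as a builtin) to stderr. Returns 0 if all are present, 1 otherwise.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing requirement: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# is_valid_pdf: succeed only if pdfinfo (from poppler-utils) can parse the file.
# pdfinfo exits non-zero when it cannot open or parse its input; this also
# fails if pdfinfo itself is not installed.
is_valid_pdf() {
    pdfinfo "$1" >/dev/null 2>&1
}
```

For example, source the file and call require_tools node npm pdfinfo curl bash before running the scripts, or is_valid_pdf some-file.pdf || echo "broken download" on a downloaded file.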