Fix typos

parent a578850d37
commit b1f238c965
@@ -1,6 +1,6 @@
-# Survival Library
+# Survivor Library
 
-Various scripts for scraping and parsing survivallibrary.com
+Various scripts for scraping and parsing survivorlibrary.com
 
 Keep in mind it was meant to be a quick-and-dirty project, so things were kind of hotglued together as I went along.
 
@@ -18,7 +18,7 @@ Keep in mind it was meant to be a quick-and-dirty project, so things were kind o
 ## Order of scripts
 
 1. Browser: `get_page_urls_browser.js`
-   1. Add URLs into file `survivallibrary_pages.txt`
+   1. Add URLs into file `survivorlibrary_pages.txt`
 2. Bash: `get_pages_with_pdfs.sh`
    1. This one will take a while, since it downloads the HTML of all the category pages and dumps it into the `pages/` directory.
 3. Node: `parse_html_pages.js`
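For illustration, the parsing step in the list above could look roughly like this. This is a sketch only: the function name `extractPdfUrls` and the regex approach are assumptions, not the actual contents of `parse_html_pages.js`, which may well use a real HTML parser.

```javascript
// Hypothetical sketch of the parse step: scan the category HTML saved
// under pages/ for links ending in .pdf and de-duplicate them.
function extractPdfUrls(html) {
	const matches = html.match(/href="([^"]+\.pdf)"/gi) || [];
	// Strip the href="..." wrapper, keeping only the URL between the quotes.
	const urls = matches.map(m => m.slice(m.indexOf('"') + 1, -1));
	return [...new Set(urls)];
}
```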
@@ -108,14 +108,14 @@ async function parseHtml()
 	await fs.writeFile('./folderLink.sh', folderLinkCmds.join('\n'));
 
 	/**
-	 * It seems the web server for SurvivalLibrary doesn't support
+	 * It seems the web server for SurvivorLibrary doesn't support
 	 * the `Range` HTTP header. We can't just "continue" a download.
 	 *
 	 * I wouldn't be surprised if one (or more) of the PDFs end up corrupted, as we just check if the file _exists_
 	 * before skipping it (if it does exist).
 	 *
 	 * As a workaround, I created `validate_pdfs.sh` to at least validate that the PDFs are valid.
-	 * Keep in mind that most of the PDFs that are invalid, are also corrupted on Survival Library's website.
+	 * Keep in mind that most of the PDFs that are invalid, are also corrupted on Survivor Library's website.
 	 * Meaning it's the _source_ that's corrupt, not the downloaded file specifically.
 	 */
 	const scriptOutput = pdfUrls.map(url => {