mirror of https://github.com/c9fe/22120.git synced 2024-09-11 18:42:29 +02:00
This commit is contained in:
Cris Stringfellow 2021-11-03 04:51:04 +00:00
commit 24ab9c8613
30 changed files with 5683 additions and 0 deletions

.github/FUNDING.yml vendored Normal file

@@ -0,0 +1,3 @@
# These are supported funding model platforms
custom: https://dosyago.com

.gitignore vendored Normal file

@@ -0,0 +1,122 @@
.*.swp
# Bundling and packaging
22120.exe
22120.nix
22120.mac
22120.win32.exe
22120.nix32
#Leave these to allow install by npm -g
#22120.js
#*.22120.js
# Library
public/library/cache.json
public/library/http*
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
# Runtime data
pids
*.pid
*.seed
*.pid.lock
# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov
# Coverage directory used by tools like istanbul
coverage
*.lcov
# nyc test coverage
.nyc_output
# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
.grunt
# Bower dependency directory (https://bower.io/)
bower_components
# node-waf configuration
.lock-wscript
# Compiled binary addons (https://nodejs.org/api/addons.html)
build/Release
# Dependency directories
node_modules/
jspm_packages/
# TypeScript v1 declaration files
typings/
# TypeScript cache
*.tsbuildinfo
# Optional npm cache directory
.npm
# Optional eslint cache
.eslintcache
# Microbundle cache
.rpt2_cache/
.rts2_cache_cjs/
.rts2_cache_es/
.rts2_cache_umd/
# Optional REPL history
.node_repl_history
# Output of 'npm pack'
*.tgz
# Yarn Integrity file
.yarn-integrity
# dotenv environment variables file
.env
.env.test
# parcel-bundler cache (https://parceljs.org/)
.cache
# Next.js build output
.next
# Nuxt.js build / generate output
.nuxt
dist
# Gatsby files
.cache/
# Comment in the public line in if your project uses Gatsby and *not* Next.js
# https://nextjs.org/blog/next-9-1#public-directory-support
# public
# vuepress build output
.vuepress/dist
# Serverless directories
.serverless/
# FuseBox cache
.fusebox/
# DynamoDB Local files
.dynamodb/
# TernJS port file
.tern-port

.npm.release.date Normal file

@@ -0,0 +1 @@
Wed Oct 27 09:49:23 UTC 2021

.npmignore Normal file

@@ -0,0 +1,9 @@
.*.swp
# Bundling and packaging
22120.exe
22120.nix
22120.mac
22120.win32.exe
22120.nix32

LICENSE Normal file

@@ -0,0 +1,138 @@
Copyright Dosyago Corporation & Cris Stringfellow (https://dosaygo.com)
22120 and all previously released versions, including binaries, NPM packages, and
Docker images (including all named archivist1 and any other names)
is re-licensed under the following PolyForm Noncommercial License 1.0.0 and all previous
licenses are revoked.
# PolyForm Noncommercial License 1.0.0
<https://polyformproject.org/licenses/noncommercial/1.0.0>
## Acceptance
In order to get any license under these terms, you must agree
to them as both strict obligations and conditions to all
your licenses.
## Copyright License
The licensor grants you a copyright license for the
software to do everything you might do with the software
that would otherwise infringe the licensor's copyright
in it for any permitted purpose. However, you may
only distribute the software according to [Distribution
License](#distribution-license) and make changes or new works
based on the software according to [Changes and New Works
License](#changes-and-new-works-license).
## Distribution License
The licensor grants you an additional copyright license
to distribute copies of the software. Your license
to distribute covers distributing the software with
changes and new works permitted by [Changes and New Works
License](#changes-and-new-works-license).
## Notices
You must ensure that anyone who gets a copy of any part of
the software from you also gets a copy of these terms or the
URL for them above, as well as copies of any plain-text lines
beginning with `Required Notice:` that the licensor provided
with the software. For example:
> Required Notice: Copyright Dosyago Corporation & Cris Stringfellow (https://dosaygo.com)
## Changes and New Works License
The licensor grants you an additional copyright license to
make changes and new works based on the software for any
permitted purpose.
## Patent License
The licensor grants you a patent license for the software that
covers patent claims the licensor can license, or becomes able
to license, that you would infringe by using the software.
## Noncommercial Purposes
Any noncommercial purpose is a permitted purpose.
## Personal Uses
Personal use for research, experiment, and testing for
the benefit of public knowledge, personal study, private
entertainment, hobby projects, amateur pursuits, or religious
observance, without any anticipated commercial application,
is use for a permitted purpose.
## Noncommercial Organizations
Use by any charitable organization, educational institution,
public research organization, public safety or health
organization, environmental protection organization,
or government institution is use for a permitted purpose
regardless of the source of funding or obligations resulting
from the funding.
## Fair Use
You may have "fair use" rights for the software under the
law. These terms do not limit them.
## No Other Rights
These terms do not allow you to sublicense or transfer any of
your licenses to anyone else, or prevent the licensor from
granting licenses to anyone else. These terms do not imply
any other licenses.
## Patent Defense
If you make any written claim that the software infringes or
contributes to infringement of any patent, your patent license
for the software granted under these terms ends immediately. If
your company makes such a claim, your patent license ends
immediately for work on behalf of your company.
## Violations
The first time you are notified in writing that you have
violated any of these terms, or done anything with the software
not covered by your licenses, your licenses can nonetheless
continue if you come into full compliance with these terms,
and take practical steps to correct past violations, within
32 days of receiving notice. Otherwise, all your licenses
end immediately.
## No Liability
***As far as the law allows, the software comes as is, without
any warranty or condition, and the licensor will not be liable
to you for any damages arising out of these terms or the use
or nature of the software, under any kind of legal claim.***
## Definitions
The **licensor** is the individual or entity offering these
terms, and the **software** is the software the licensor makes
available under these terms.
**You** refers to the individual or entity agreeing to these
terms.
**Your company** is any legal entity, sole proprietorship,
or other kind of organization that you work for, plus all
organizations that have control over, are under the control of,
or are under common control with that organization. **Control**
means ownership of substantially all the assets of an entity,
or the power to direct its management and policies by vote,
contract, or otherwise. Control can be direct or indirect.
**Your licenses** are all the licenses granted to you for the
software under these terms.
**Use** means anything you do with the software requiring one
of your licenses.

NOTICE Normal file

@@ -0,0 +1,7 @@
Copyright Dosyago Corporation & Cris Stringfellow (https://dosaygo.com)
22120 and all previously released versions, including binaries, NPM packages, and
Docker images (including all named archivist1, and all other previous names)
is re-licensed under the following PolyForm Noncommercial License 1.0.0 and all previous
licenses are revoked.

README.md Normal file

@@ -0,0 +1,240 @@
# :classical_building: [22120](https://github.com/c9fe/22120) ![npm downloads](https://img.shields.io/npm/dt/archivist1?label=npm%20downloads) ![binary downloads](https://img.shields.io/github/downloads/c9fe/22120/total?label=binary%20downloads) [![latest package](https://img.shields.io/github/v/release/c9fe/22120?label=latest%20release)](https://github.com/c9fe/22120/releases) [![visitors+++](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fc9fe%2F22120&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=%28today%2Ftotal%29%20visitors%2B%2B%2B%20since%20Oct%2027%202020&edge_flat=false)](https://hits.seeyoufarm.com) [![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fi5ik%2F22120.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fi5ik%2F22120?ref=badge_shield)
:classical_building: - An archivist browser controller that caches everything you browse, and a library server with full-text search to serve your archive.
If you use or like this, don't forget to show your appreciation by [starring this repo](https://github.com/i5ik/22120/stargazers), or [following me](https://github.com/i5ik) 😹
**News - 22120 plus interview featured in [Console - the open source newsletter](https://console.substack.com/p/console-28)**
<span id=toc></span>
----------------
- [Overview](#classical_building-22120---)
* [License](#license)
* [About](#about)
* [Get 22120](#get-22120)
* [Using](#using)
+ [Pick save mode or serve mode](#pick-save-mode-or-serve-mode)
+ [Exploring your 22120 archive](#exploring-your-22120-archive)
* [Format](#format)
* [Why not WARC (or another format like MHTML) ?](#why-not-warc-or-another-format-like-mhtml-)
* [How it works](#how-it-works)
* [FAQ](#faq)
+ [Do I need to download something?](#do-i-need-to-download-something)
+ [Can I use this with a browser that's not Chrome-based?](#can-i-use-this-with-a-browser-thats-not-chrome-based)
+ [How does this interact with Ad blockers?](#how-does-this-interact-with-ad-blockers)
+ [How secure is running chrome with remote debugging port open?](#how-secure-is-running-chrome-with-remote-debugging-port-open)
+ [Is this free?](#is-this-free)
+ [What if it can't find my chrome?](#what-if-it-cant-find-my-chrome)
+ [What's the roadmap?](#whats-the-roadmap)
+ [What about streaming content?](#what-about-streaming-content)
+ [Can I black list domains to not archive them?](#can-i-black-list-domains-to-not-archive-them)
+ [Is there a DEBUG mode for troubleshooting?](#is-there-a-debug-mode-for-troubleshooting)
+ [Can I version the archive?](#can-i-version-the-archive)
+ [Can I change the archive path?](#can-i-change-the-archive-path)
+ [Can I change this other thing?](#can-i-change-this-other-thing)
------------------
## License
This is released into the public domain.
<p align=right><small><a href=#toc>Top</a></small></p>
## About
**This project literally makes your web browsing available COMPLETELY OFFLINE.** Your browser does not even know the difference. It's literally that amazing. Yes.
Save your browsing, then switch off the net and go to `http://localhost:22120` and switch mode to **serve** then browse what you browsed before. It all still works.
**warning: if you have Chrome open, it will close it automatically when you open 22120, and relaunch it. You may lose any unsaved work.**
<p align=right><small><a href=#toc>Top</a></small></p>
## Get 22120
3 ways to get it:
1. Get the binary from the [releases page](https://github.com/c9fe/22120/releases), or
2. Run with npx: `npx archivist1@latest`, or
- `npm i -g archivist1@latest && archivist1`
3. Clone this repo and run as a Node.js app: `npm i && npm start`
Also, coming soon is a Chrome Extension.
<p align=right><small><a href=#toc>Top</a></small></p>
## Using
### Pick save mode or serve mode
Go to http://localhost:22120 in your browser,
and follow the instructions.
<p align=right><small><a href=#toc>Top</a></small></p>
### Exploring your 22120 archive
Archive will be located in `22120-arc/public/library`\*
But it's not public, don't worry!
You can also check out the archive index, for a listing of every title in the archive. The index is accessible from the control page, which by default is at [http://localhost:22120](http://localhost:22120) (unless you changed the port).
\**Note: `22120-arc` is the archive root of a single archive, and by default it is placed in your home directory. But you can change the parent directory for `22120-arc` to have multiple archives.*
<p align=right><small><a href=#toc>Top</a></small></p>
## Format
The archive format is:
`22120-arc/public/library/<resource-origin>/<path-hash>.json`
Inside each JSON file is a JSON object containing the response headers, the response code, the request key, and a base64-encoded response body.
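As a sketch, one such record might look like the following (the field names here are assumptions based on the description above, not a guaranteed schema), shown being decoded in Node.js:

```javascript
// Hypothetical archived record: headers, response code, request key,
// and a base64-encoded body (field names are illustrative assumptions).
const record = {
  key: 'GET https://example.com/',
  responseCode: 200,
  responseHeaders: [{ name: 'Content-Type', value: 'text/html' }],
  body: Buffer.from('<h1>Hello</h1>').toString('base64')
};

// Decoding the body recovers the verbatim bytes the browser received.
const html = Buffer.from(record.body, 'base64').toString('utf8');
console.log(html); // <h1>Hello</h1>
```

Because the body is stored base64-encoded rather than rewritten, the original resource bytes survive the round trip unchanged.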
<p align=right><small><a href=#toc>Top</a></small></p>
## Why not WARC (or another format like MHTML) ?
**The case for the 22120 format.**
Other formats (like MHTML and SingleFile) save translations of the resources you archive. They create modifications, such as altering the internal structure of the HTML, changing hyperlinks and URLs into "flat" embedded data URIs or local references, and require other "hacks" in order to save a "perceptually similar" copy of the archived resource.
22120 throws all that out, and calls rubbish on it. 22120 saves a *verbatim* **high-fidelity** copy of the resources you archive. It does not alter their internal structure in any way. Instead it records each resource in its own metadata file. In that way it is more similar to HAR and WARC, but still radically different. Compared to WARC and HAR, our format is radically simplified, throwing out most of the metadata and unnecessary fields those formats collect.
**Why?**
At 22120, we believe in the resources and in verbatim copies. We don't anoint ourselves as all-knowing enough to modify the resource source of truth before we archive it, just so it can "fit the format" we choose. We don't believe we need to decorate with obtuse and superfluous metadata. We don't believe we should be modifying or altering resources we archive. We believe we should save them exactly as they were presented. We believe in simplicity. We believe the format should fit (or at least accommodate, and be suited to) the resource, not the other way around. We don't believe in conflating **metadata** with **content**; so we separate them. We believe separating metadata and content, and keeping the content pure and unaltered throughout the archiving process, is not only the right thing to do, it simplifies every part of the audit trail, because we know that any modifications between archived copies of a resource are due to changes to the resources themselves, not artefacts of the format or archiving process.
Both SingleFile and MHTML require mutilating modifications of the resources so that the resources can be "forced to fit" the format. At 22120, we believe this is not required (and in any case should never be performed). We see it as akin to lopping off the arms of a Roman statue in order to fit it into a presentation and security display box. How ridiculous! The web may be a more "pliable" medium, but that does not mean we should treat it without respect for its inherent content.
**Why is changing the internal structure of resources so bad?**
In our view, the internal structure of the resource as presented *is the canon*. Internal structure is not just substitutable "presentation"; in fact it encodes vital semantic information such as hyperlink relationships, source choices, and the "strokes" of the resource author as they create their content, even if it's mediated through a web server or web framework.
**Why else is 22120 the obvious and natural choice?**
22120 also archives resources exactly as they are sent to the browser. It runs connected to a browser, and so is able to access the full scope of resources (currently with the exception of video, audio and websockets) in their highest fidelity, without modification, exactly as the browser receives them, and to archive them in the exact format presented to the user. Many resources undergo presentational and processing changes before they are presented to the user. This is the ubiquitous "web app", where client-side scripting enabled by JavaScript creates resources and resource views on the fly. These sorts of "hyper", "realtime" or "client-side" resources, prevalent in SPAs, cannot be archived through the normal archive flow of traditional `wget`-based archiving tools.
In short, the web is an *online* medium, and it should be archived and presented in the same fashion. 22120 archives content exactly as it is received and presented by a browser, and it also replays that content exactly as if the resource were being fetched online. Yes, it requires a browser for this exercise, but that browser need not be connected to the internet. It is only natural that viewing a web resource requires a web browser. And because of 22120 the browser doesn't know the difference! Resources presented to the browser from a remote web site, and resources given to the browser by 22120, are seen by the browser as ***exactly the same.*** This ensures that the people viewing the archive are not let down and are given the chance to have the exact same experience as if they were viewing the resource online.
<p align=right><small><a href=#toc>Top</a></small></p>
## How it works
22120 uses the DevTools protocol to intercept all requests, and caches responses on disk against a key made of the request method and URL. It also maintains an in-memory set of keys so it knows what it has on disk.
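The idea can be sketched roughly like this (a simplified illustration, not the actual 22120 source; the real key format and storage layout may differ):

```javascript
// Simplified sketch: store responses against a (method + URL) key,
// with an in-memory Set tracking which keys are already saved.
const serializeRequest = ({ method, url }) => `${method} ${url}`;
const savedKeys = new Set();

function saveResponse(request, response, store) {
  const key = serializeRequest(request);
  store.set(key, response);  // stand-in for writing the response file to disk
  savedKeys.add(key);        // remember what we have, without re-reading disk
}

const store = new Map();
saveResponse({ method: 'GET', url: 'https://example.com/' }, { code: 200 }, store);
console.log(savedKeys.has('GET https://example.com/')); // true
```

In serve mode, a request whose key is in the set is fulfilled from the store; anything else gets a "not saved" stub.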
<p align=right><small><a href=#toc>Top</a></small></p>
## FAQ
### Do I need to download something?
Yes. But... if you like **22120**, you might love the clientless hosted version coming in the future. You'll be able to build your archives online from any device, without any download, then download the archive to run on any desktop. You'll need to sign up to use it, but you can jump the queue and sign up [today](https://dosyago.com).
### Can I use this with a browser that's not Chrome-based?
No.
<p align=right><small><a href=#toc>Top</a></small></p>
### How does this interact with Ad blockers?
Interacts just fine. The things ad blockers stop will not be archived.
<p align=right><small><a href=#toc>Top</a></small></p>
### How secure is running chrome with remote debugging port open?
Seems pretty secure. It's not exposed to the public internet, and pages you load cannot use the protocol for anything (except to open a new tab, which they can do anyway). There's a potential risk from malicious browser extensions, but we'd need to confirm that, and if so, work out blocks. See [this useful security-related post](https://github.com/c9fe/22120/issues/67) for some info.
<p align=right><small><a href=#toc>Top</a></small></p>
### Is this free?
Yes this is totally free to download and use. It's also open source (under AGPL-3.0) so do what you want with it. For more information about licensing, see the [license section](#license).
<p align=right><small><a href=#toc>Top</a></small></p>
### What if it can't find my chrome?
See this useful [issue](https://github.com/c9fe/22120/issues/68).
<p align=right><small><a href=#toc>Top</a></small></p>
### What's the roadmap?
- Full text search
- Library server to serve archive publicly.
- Distributed p2p web browser on IPFS
<p align=right><small><a href=#toc>Top</a></small></p>
### What about streaming content?
The following are probably hard (and I haven't thought much about):
- Streaming content (audio, video)
- "Impure" request response pairs (such as if you call GET /endpoint 1 time you get "A", if you call it a second time you get "AA", and other examples like this).
- WebSockets (how to capture and replay that faithfully?)
There's probably some way to do this, though.
<p align=right><small><a href=#toc>Top</a></small></p>
### Can I black list domains to not archive them?
Yes! Put any domains into `22120-arc/no.json`\*, eg:
```json
[
"*.horribleplantations.com",
"*.cactusfernfurniture.com",
"*.gustymeadows.com",
"*.nytimes.com",
"*.cnn.co?"
]
```
22120 will not cache any resource whose host matches one of those patterns. Wildcards:
- `*` (0 or more anything) and
- `?` (0 or 1 anything)
\**Note: the `no` file is per-archive. `22120-arc` is the archive root of a single archive, and by default it is placed in your home directory. But you can change the parent directory for `22120-arc` to have multiple archives, and each archive requires its own `no` file if you want a blacklist in that archive.*
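The wildcard semantics above can be sketched as follows (an illustration of the documented rules, not the actual 22120 matcher):

```javascript
// Sketch of the documented wildcard rules: '*' matches 0 or more
// characters, '?' matches 0 or 1. Not the actual 22120 implementation.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters, except the '*' and '?' wildcards.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.?') + '$');
}

const blocked = ['*.horribleplantations.com', '*.cnn.co?'];
const matchers = blocked.map(wildcardToRegExp);
const isBlocked = host => matchers.some(re => re.test(host));

console.log(isBlocked('www.horribleplantations.com')); // true
console.log(isBlocked('edition.cnn.com'));             // true ('?' matches the 'm')
console.log(isBlocked('example.org'));                 // false
```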
<p align=right><small><a href=#toc>Top</a></small></p>
### Is there a DEBUG mode for troubleshooting?
Yes, just make sure you set an environment variable called `DEBUG_22120` to anything non-empty.
For example, on POSIX systems:
```bash
export DEBUG_22120=True
```
<p align=right><small><a href=#toc>Top</a></small></p>
### Can I version the archive?
Yes! But you need to use `git` for versioning. Just initialize a git repo in your archive directory, and when you want to save a snapshot, make a new git commit.
<p align=right><small><a href=#toc>Top</a></small></p>
### Can I change the archive path?
Yes, there's a control for changing the archive path in the control page: http://localhost:22120
<p align=right><small><a href=#toc>Top</a></small></p>
### Can I change this other thing?
There are a few command-line arguments. You'll see their format printed as the first line when you start the program.
For other things you can examine the source code.
<p align=right><small><a href=#toc>Top</a></small></p>

SECURITY.md Normal file

@@ -0,0 +1,17 @@
# Security Policy
## Supported Versions
The following versions of this project are currently supported with security updates.
| Version | Supported |
| ------- | ------------------ |
| Latest | :white_check_mark: |
## Reporting a Vulnerability
To report a vulnerability, contact: cris@dosycorp.com
To view previous responsible disclosure vulnerability reports, mediation write-ups, notes and other information, please visit the [Dosyago Responsible Disclosure Center](https://github.com/dosyago/vulnerability-reports)

app.js Normal file

@@ -0,0 +1,133 @@
import {DEBUG, context, sleep, NO_SANDBOX} from './common.js';
import Archivist from './archivist.js';
import LibraryServer from './libraryServer.js';
import args from './args.js';
const {server_port, mode, chrome_port} = args;
const CHROME_OPTS = !NO_SANDBOX ? [
'--restore-last-session',
`--disk-cache-dir=${args.temp_browser_cache()}`,
`--aggressive-cache-discard`
] : [
'--restore-last-session',
`--disk-cache-dir=${args.temp_browser_cache()}`,
`--aggressive-cache-discard`,
'--no-sandbox'
];
const LAUNCH_OPTS = {
logLevel: DEBUG ? 'verbose' : 'silent',
port: chrome_port,
chromeFlags:CHROME_OPTS,
userDataDir:false,
startingUrl: `http://localhost:${args.server_port}`,
ignoreDefaultFlags: true
}
const KILL_ON = {
win32: 'taskkill /IM chrome.exe /F',
darwin: 'pkill -15 chrome',
freebsd: 'pkill -15 chrome',
linux: 'pkill -15 chrome',
};
let quitting, ChildProcess;
start();
async function start() {
if ( context == 'node' ) {
console.log(`Running in node...`);
process.on('beforeExit', cleanup);
process.on('SIGBREAK', cleanup);
process.on('SIGHUP', cleanup);
process.on('SIGINT', cleanup);
process.on('SIGTERM', cleanup);
console.log(`Importing dependencies...`);
const fs = await import('fs');
const {launch:ChromeLaunch} = await import('chrome-launcher');
await killChrome();
console.log(`Removing 22120's existing temporary browser cache if it exists...`);
if ( fs.existsSync(args.temp_browser_cache()) ) {
console.log(`Temp browser cache directory (${args.temp_browser_cache()}) exists, deleting...`);
fs.rmdirSync(args.temp_browser_cache(), {recursive:true});
console.log(`Deleted.`);
}
console.log(`Launching library server...`);
await LibraryServer.start({server_port});
console.log(`Library server started.`);
console.log(`Waiting 1 second...`);
await sleep(1000);
console.log(`Launching chrome...`);
try {
await ChromeLaunch(LAUNCH_OPTS);
} catch(e) {
console.log(`Could not launch chrome.`);
DEBUG && console.info('Chrome launch error:', e);
process.exit(1);
}
console.log(`Chrome started.`);
console.log(`Waiting 1 second...`);
await sleep(1000);
}
console.log(`Launching archivist and connecting to browser...`);
await Archivist.collect({chrome_port, mode});
console.log(`System ready.`);
}
async function killChrome(wait = true) {
try {
if ( process.platform in KILL_ON ) {
console.log(`Attempting to shut down running chrome...`);
if ( ! ChildProcess ) {
const {default:child_process} = await import('child_process');
ChildProcess = child_process;
}
const [err, stdout, stderr] = (await new Promise(
res => ChildProcess.exec(KILL_ON[process.platform], (...a) => res(a))
));
if ( err ) {
console.log(`There was no running chrome.`);
//DEBUG && console.warn("Error closing existing chrome", err);
} else {
console.log(`Running chrome shut down.`);
if ( wait ) {
console.log(`Waiting 1 second...`);
await sleep(1000);
}
}
} else {
console.warn(`If you have chrome running, you may need to shut it down manually and restart 22120.`);
}
} catch(e) {
console.warn("in kill chrome", e);
}
}
async function cleanup(reason) {
console.log(`Cleanup called on reason: ${reason}`);
if ( quitting ) {
console.log(`Cleanup already called so not running again.`);
return;
}
quitting = true;
Archivist.shutdown();
LibraryServer.stop();
killChrome(false);
console.log(`Take a breath. Everything's done. 22120 is exiting in 3 seconds...`);
await sleep(3000);
process.exit(0);
}

archivist.js Normal file

@@ -0,0 +1,466 @@
import hasha from 'hasha';
import {URL} from 'url';
import path from 'path';
import fs from 'fs';
import args from './args.js';
import {APP_ROOT, context, sleep, DEBUG} from './common.js';
import {connect} from './protocol.js';
import {getInjection} from './public/injection.js';
//import xapian from 'xapian';
// cache is a simple map
// that holds the serialized requests
// that are saved on disk
let Fs, Mode, Close;
const Cache = new Map();
const State = {
Cache,
SavedCacheFilePath: null,
SavedIndexFilePath: null,
saver: null,
indexSaver: null
}
const IGNORE_NODES = new Set([
'script',
'style',
'noscript',
'datalist'
]);
const TextNode = 3;
const AttributeNode = 2;
const Archivist = {
collect, getMode, changeMode, shutdown, handlePathChanged
}
const BODYLESS = new Set([
301,
302,
303,
307
]);
const NEVER_CACHE = new Set([
`http://localhost:${args.server_port}`,
`http://localhost:${args.chrome_port}`
]);
const SORT_URLS = ([urlA],[urlB]) => urlA < urlB ? -1 : 1;
const CACHE_FILE = args.cache_file;
const INDEX_FILE = args.index_file;
const NO_FILE = args.no_file;
const TBL = /:\/\//g;
const HASH_OPTS = {algorithm: 'sha1'};
const UNCACHED_BODY = b64('We have not saved this data');
const UNCACHED_CODE = 404;
const UNCACHED_HEADERS = [
{ name: 'Content-type', value: 'text/plain' },
{ name: 'Content-length', value: '27' }
];
const UNCACHED = {
body:UNCACHED_BODY, responseCode:UNCACHED_CODE, responseHeaders:UNCACHED_HEADERS
}
export default Archivist;
async function collect({chrome_port:port, mode} = {}) {
if ( context == 'node' ) {
const {default:fs} = await import('fs');
Fs = fs;
}
const {library_path} = args;
const {send, on, close} = await connect({port});
const Sessions = new Map();
const Installations = new Set();
const ConfirmedInstalls = new Set();
const DELAY = 500;
Close = close;
Mode = mode;
let requestStage;
loadFiles();
clearSavers();
if ( Mode == 'save' ) {
requestStage = "Response";
// in case we get an updateBasePath call before an interval fires
// and we don't clear it in time, leading us to erroneously save the old
// cache to the new path, we always use our saved copy
State.saver = setInterval(() => saveCache(State.SavedCacheFilePath), 10000);
State.indexSaver = setInterval(() => saveIndex(State.SavedIndexFilePath), 10001);
} else if ( Mode == 'serve' ) {
requestStage = "Request";
} else {
throw new TypeError(`Must specify mode`);
}
on("Target.targetInfoChanged", indexURL);
on("Target.targetInfoChanged", reloadIfNotLive);
on("Target.targetInfoChanged", attachToTarget);
on("Target.attachedToTarget", guard(installForSession, 'attached'));
on("Fetch.requestPaused", cacheRequest);
on("Runtime.consoleAPICalled", confirmInstall);
await send("Fetch.enable", {
patterns: [
{
urlPattern: "http*://*",
requestStage
}
]
});
await send("Network.setCacheDisabled", {cacheDisabled:true});
await send("Network.setBypassServiceWorker", {bypass:true});
await send("Target.setDiscoverTargets", {discover:true});
await send("Target.setAutoAttach", {autoAttach:false, waitForDebuggerOnStart:false, flatten: true});
const {targetInfos:targets} = await send("Target.getTargets", {});
const pageTargets = targets.filter(({type}) => type == 'page');
pageTargets.forEach(attachToTarget);
function guard(func, text = '') {
return (...args) => {
//DEBUG && console.log({text, func:func.name, args:JSON.stringify(args,null,2)});
return func(...args);
};
}
function confirmInstall(args) {
const {type, args:[{value:strVal}], context} = args;
if ( type == 'info' ) {
try {
const val = JSON.parse(strVal);
const {installed:{sessionId}} = val;
if ( ! ConfirmedInstalls.has(sessionId) ) {
ConfirmedInstalls.add(sessionId);
console.log({confirmedInstall:val, context});
}
} catch(e) { /* ignore console messages that aren't valid JSON */ }
}
}
async function reloadIfNotLive({targetInfo}) {
if ( Mode == 'serve' ) return;
const {attached, type} = targetInfo;
if ( attached && type == 'page' ) {
const {url, targetId} = targetInfo;
const sessionId = Sessions.get(targetId);
if ( !!url && url != "about:blank" && !url.startsWith('chrome') && !ConfirmedInstalls.has(sessionId) ) {
console.log({reloadingAsNotConfirmedInstalled:{url, sessionId}});
send("Page.stopLoading", {}, sessionId);
send("Page.reload", {}, sessionId);
}
}
}
async function installForSession({sessionId, targetInfo, waitingForDebugger}) {
const {targetId, url} = targetInfo;
if ( targetInfo.type != 'page' ) return;
if ( Mode == 'serve' ) return;
indexURL({targetInfo});
if ( ! Installations.has(targetId) ) {
if ( sessionId ) {
Sessions.set(targetId, sessionId);
} else {
sessionId = Sessions.get(targetId);
}
if ( sessionId && Mode == 'save' ) {
send("Network.setCacheDisabled", {cacheDisabled:true}, sessionId);
send("Network.setBypassServiceWorker", {bypass:true}, sessionId);
await send("Runtime.enable", {}, sessionId);
await send("Page.enable", {}, sessionId);
await send("Page.addScriptToEvaluateOnNewDocument", {
source: getInjection({sessionId}),
worldName: "Context-22120-Indexing"
}, sessionId);
DEBUG && console.log("Just request install", targetId, url);
}
Installations.add(targetId);
} else if ( ConfirmedInstalls.has(sessionId) ) {
DEBUG && console.log("Already confirmed install", targetId, url);
}
}
async function indexURL({targetInfo:info = {}} = {}) {
if ( Mode == 'serve' ) return;
if ( info.type != 'page' ) return;
if ( ! info.url || info.url == 'about:blank' ) return;
if ( info.url.startsWith('chrome') ) return;
if ( dontCache(info) ) return;
State.Index.set(info.url, info.title);
if ( Installations.has(info.targetId) ) {
const sessionId = Sessions.get(info.targetId);
send("DOM.enable", {}, sessionId);
await sleep(5000);
const {nodes:pageNodes} = await send("DOM.getFlattenedDocument", {
depth: -1,
pierce: true
}, sessionId);
// we collect TextNodes, ignoring any under script, style or an attribute
const ignoredParentIds = new Set(
pageNodes.filter(
({localName,nodeType}) => IGNORE_NODES.has(localName) || nodeType == AttributeNode
).map(({nodeId}) => nodeId)
);
const pageText = pageNodes.filter(
({nodeType,parentId}) => nodeType == TextNode && ! ignoredParentIds.has(parentId)
).reduce(
(Text, {nodeValue}) => Text + nodeValue + ' ',
''
);
if ( false ) {
console.log({
page : {
url: info.url,
title: info.title,
text: pageText
}
});
}
}
console.log(`Indexed ${info.url} to ${info.title}`);
}
async function attachToTarget(targetInfo) {
if ( dontCache(targetInfo) ) return;
const {url} = targetInfo;
if ( !!url && url != "about:blank" && !url.startsWith('chrome') ) {
if ( targetInfo.type == 'page' && ! targetInfo.attached ) {
const {sessionId} = await send("Target.attachToTarget", {
targetId: targetInfo.targetId,
flatten: true
});
Sessions.set(targetInfo.targetId, sessionId);
}
}
}
async function cacheRequest(pausedRequest) {
const {requestId, request, resourceType, responseStatusCode, responseHeaders} = pausedRequest;
if ( dontCache(request) ) {
DEBUG && console.log("Not caching", request.url);
return send("Fetch.continueRequest", {requestId});
}
const key = serializeRequest(request);
if ( Mode == 'serve' ) {
if ( State.Cache.has(key) ) {
let {body, responseCode, responseHeaders} = await getResponseData(State.Cache.get(key));
responseCode = responseCode || 200;
//DEBUG && console.log("Fulfilling", key, responseCode, responseHeaders, body.slice(0,140));
DEBUG && console.log("Fulfilling", key, responseCode, body.slice(0,140));
await send("Fetch.fulfillRequest", {
requestId, body, responseCode, responseHeaders
});
} else {
DEBUG && console.log("Sending cache stub", key);
await send("Fetch.fulfillRequest", {
requestId, ...UNCACHED
});
}
} else if ( Mode == 'save' ) {
const response = {key, responseCode: responseStatusCode, responseHeaders};
let resp;
if ( ! BODYLESS.has(responseStatusCode) ) {
resp = await send("Fetch.getResponseBody", {requestId});
} else {
resp = {body:'', base64Encoded:true};
}
if ( ! resp ) {
DEBUG && console.warn("get response body error", key, responseStatusCode, responseHeaders, pausedRequest.responseErrorReason);
await sleep(DELAY);
return send("Fetch.continueRequest", {requestId});
}
// resp is guaranteed truthy past the early return above
let {body, base64Encoded} = resp;
if ( ! base64Encoded ) {
body = b64(body);
}
response.body = body;
const responsePath = await saveResponseData(key, request.url, response);
State.Cache.set(key, responsePath);
await sleep(DELAY);
await send("Fetch.continueRequest", {requestId});
}
}
function dontCache(request) {
if ( ! request.url ) return false;
const url = new URL(request.url);
return NEVER_CACHE.has(url.origin) || (State.No && State.No.test(url.host));
}
async function getResponseData(path) {
try {
return JSON.parse(await Fs.promises.readFile(path));
} catch(e) {
console.warn(`Error with ${path}`, e);
return UNCACHED;
}
}
async function saveResponseData(key, url, response) {
const origin = (new URL(url).origin);
let originDir = State.Cache.get(origin);
if ( ! originDir ) {
originDir = path.resolve(library_path(), origin.replace(TBL, '_'));
try {
await Fs.promises.mkdir(originDir, {recursive:true});
} catch(e) {
console.warn(`Issue with origin directory ${originDir}`, e);
}
State.Cache.set(origin, originDir);
}
const fileName = `${await hasha(key, HASH_OPTS)}.json`;
const responsePath = path.resolve(originDir, fileName);
await Fs.promises.writeFile(responsePath, JSON.stringify(response,null,2));
return responsePath;
}
function serializeRequest(request) {
const {url, urlFragment, method, headers, postData, hasPostData} = request;
/**
let sortedHeaders = '';
for( const key of Object.keys(headers).sort() ) {
sortedHeaders += `${key}:${headers[key]}/`;
}
**/
return `${method}${url}`;
//return `${url}${urlFragment}:${method}:${sortedHeaders}:${postData}:${hasPostData}`;
}
}
function clearSavers() {
if ( State.saver ) {
clearInterval(State.saver);
State.saver = null;
}
if ( State.indexSaver ) {
clearInterval(State.indexSaver);
State.indexSaver = null;
}
}
function loadFiles() {
try {
State.Cache = new Map(JSON.parse(Fs.readFileSync(CACHE_FILE())));
State.Index = new Map(JSON.parse(Fs.readFileSync(INDEX_FILE())));
State.SavedCacheFilePath = CACHE_FILE();
State.SavedIndexFilePath = INDEX_FILE();
DEBUG && console.log(`Loaded cache key file ${CACHE_FILE()}`);
DEBUG && console.log(`Loaded index file ${INDEX_FILE()}`);
} catch(e) {
DEBUG && console.warn('Error reading file', e);
State.Cache = new Map();
State.Index = new Map();
}
try {
if ( !Fs.existsSync(NO_FILE()) ) {
DEBUG && console.log(`The 'No file' (${NO_FILE()}) does not exist, ignoring...`);
State.No = null;
} else {
State.No = new RegExp(JSON.parse(Fs.readFileSync(NO_FILE()))
.join('|')
.replace(/\./g, '\\.')
.replace(/\*/g, '.*')
.replace(/\?/g, '.?')
);
}
} catch(e) {
DEBUG && console.warn('Error compiling regex from No file', e);
State.No = null;
}
}
function getMode() { return Mode; }
async function changeMode(mode) {
DEBUG && console.log({modeChange:mode});
clearSavers();
saveCache();
saveIndex();
Close && Close();
Mode = mode;
await collect({chrome_port:args.chrome_port, mode});
}
function handlePathChanged() {
DEBUG && console.log({libraryPathChange:args.library_path()});
clearSavers();
// save the cache and index to their old paths
saveCache(State.SavedCacheFilePath);
saveIndex(State.SavedIndexFilePath);
// reload from the new path and update the saved file paths
loadFiles();
}
function saveCache(path) {
if ( context == 'node' ) {
//DEBUG && console.log("Writing to", path || CACHE_FILE());
Fs.writeFileSync(path || CACHE_FILE(), JSON.stringify([...State.Cache.entries()],null,2));
}
}
function saveIndex(path) {
if ( context == 'node' ) {
//DEBUG && console.log("Writing to", path || INDEX_FILE());
//DEBUG && console.log([...State.Index.entries()].sort(SORT_URLS));
Fs.writeFileSync(
path || INDEX_FILE(),
JSON.stringify([...State.Index.entries()].sort(SORT_URLS),null,2)
);
}
}
function shutdown() {
DEBUG && console.log(`Archivist shutting down...`);
saveCache();
Close && Close();
DEBUG && console.log(`Archivist shut down.`);
}
function b64(s) {
if ( context == 'node' ) {
return Buffer.from(s).toString('base64');
} else {
return btoa(s);
}
}

args.js Normal file
@@ -0,0 +1,104 @@
import os from 'os';
import path from 'path';
import fs from 'fs';
const server_port = process.env.PORT || process.argv[2] || 22120;
const mode = process.argv[3] || 'save';
const chrome_port = process.argv[4] || 9222;
const Pref = {};
const pref_file = path.resolve(os.homedir(), '.22120.config.json');
const cacheId = Math.random();
loadPref();
let BasePath = Pref.BasePath;
const archive_root = () => path.resolve(BasePath, '22120-arc');
const no_file = () => path.resolve(archive_root(), 'no.json');
const temp_browser_cache = () => path.resolve(archive_root(), 'temp-browser-cache' + cacheId);
const library_path = () => path.resolve(archive_root(), 'public', 'library');
const cache_file = () => path.resolve(library_path(), 'cache.json');
const index_file = () => path.resolve(library_path(), 'index.json');
console.log(`Args usage: <server_port> <save|serve> <chrome_port> <base_path>`);
updateBasePath(process.argv[5] || Pref.BasePath || os.homedir());
const args = {
mode,
server_port,
chrome_port,
updateBasePath,
getBasePath,
library_path,
no_file,
temp_browser_cache,
cache_file,
index_file
};
export default args;
function updateBasePath(new_base_path) {
new_base_path = path.resolve(new_base_path);
if ( BasePath == new_base_path ) {
return false;
}
console.log(`Updating base path from ${BasePath} to ${new_base_path}...`);
BasePath = new_base_path;
if ( !fs.existsSync(library_path()) ) {
console.log(`Archive directory (${library_path()}) does not exist, creating...`);
fs.mkdirSync(library_path(), {recursive:true});
console.log(`Created.`);
}
if ( !fs.existsSync(cache_file()) ) {
console.log(`Cache file does not exist, creating...`);
fs.writeFileSync(cache_file(), JSON.stringify([]));
console.log(`Created!`);
}
if ( !fs.existsSync(index_file()) ) {
console.log(`Index file does not exist, creating...`);
fs.writeFileSync(index_file(), JSON.stringify([]));
console.log(`Created!`);
}
console.log(`Base path updated to: ${BasePath}. Saving to preferences...`);
Pref.BasePath = BasePath;
savePref();
console.log(`Saved!`);
return true;
}
function getBasePath() {
return BasePath;
}
function loadPref() {
if ( fs.existsSync(pref_file) ) {
try {
Object.assign(Pref, JSON.parse(fs.readFileSync(pref_file)));
} catch(e) {
console.warn("Error reading from preferences file", e);
}
} else {
console.log("Preferences file does not exist. Creating one...");
savePref();
}
}
function savePref() {
try {
fs.writeFileSync(pref_file, JSON.stringify(Pref,null,2));
} catch(e) {
console.warn("Error writing preferences file", pref_file, Pref, e);
}
}

build_setup.sh Executable file
@@ -0,0 +1,19 @@
#!/bin/sh
echo "Installing nexe and upx..."
npm i -g nexe
curl -L -o upx.tar.xz https://github.com/upx/upx/releases/download/v3.96/upx-3.96-amd64_linux.tar.xz
tar -xJf upx.tar.xz
rm upx.tar.xz
sudo cp upx-3.96-amd64_linux/upx /usr/local/bin
rm -rf upx-3.96-amd64_linux
./dl-node.sh
cd ~/.nexe/
chmod +x *
upx * || :
echo "Done"

common.js Normal file
@@ -0,0 +1,33 @@
import path from 'path';
import {fileURLToPath} from 'url';
// determine where this code is running
let Context = 'unknown';
// ignore the possibility that window or global or chrome could be overwritten
const isBrowser = function () { try {return window && window.fetch;}catch(e){ return false;} };
const isNode = function () { try {return global && global.Math;}catch(e){return false;} };
const isExtension = function () { try {return chrome.runtime && chrome.debugger;}catch(e){return false;} };
if ( isNode() ) {
Context = 'node';
} else if ( isBrowser() ) {
Context = 'browser';
if ( isExtension() ) {
Context = 'extension';
}
}
export const context = Context;
export const DEBUG = process.env.DEBUG_22120 || false;
export const NO_SANDBOX = process.env.DEBUG_22120 || false;
export const APP_ROOT = __dirname;
export const sleep = ms => new Promise(res => setTimeout(res, ms));
export function say(o) {
console.log(JSON.stringify(o));
}

compile.sh Executable file
@@ -0,0 +1,10 @@
#!/bin/bash
unset npm_config_prefix
source $HOME/.nvm/nvm.sh
. $HOME/.profile
nvm install v14.15.3
nvm use v14.15.3
npx webpack
npx nexe -t windows -i 22120.js -r "./?.22120.js" -r "./public/*" && \
npx nexe -t linux-x64 -o 22120.nix -i 22120.js -r "./?.22120.js" -r "./public/*" && \
npx nexe -t macos-x64 -o 22120.mac -i 22120.js -r "./?.22120.js" -r "./public/*" && \
npx nexe -t windows-x32 -o 22120.win32.exe -i 22120.js -r "./?.22120.js" -r "./public/*" && \
npx nexe -t linux-x32 -o 22120.nix32 -i 22120.js -r "./?.22120.js" -r "./public/*"

dl-node.sh Executable file
@@ -0,0 +1,15 @@
#!/bin/bash
unset npm_config_prefix
source $HOME/.nvm/nvm.sh
. $HOME/.profile
nvm install v14.15.3
nvm use v14.15.3
npx nexe -t linux-x64 -o bin/hello.nix -i ./hello.js -r "./build/*"
npx nexe -t windows -o bin/hello.exe -i ./hello.js -r "./build/*"
npx nexe -t macos-x64 -o bin/hello.mac -i ./hello.js -r "./build/*"
npx nexe -t windows-x32 -o bin/hello.win32.exe -i ./hello.js -r "./build/*" && npx nexe -t linux-x32 -o bin/hello.nix32 -i ./hello.js -r "./build/*"
rm -rf bin/hello.*

ext/bg_page.html Normal file
@@ -0,0 +1,4 @@
<!DOCTYPE html>
<meta charset=utf-8>
<title>Archivists Anonymous</title>
<script src=bg_script.js type=module></script>

ext/bg_script.js Normal file
@@ -0,0 +1,9 @@
import {save, load} from './storage.js';
import Archivist from '../archivist.js';
import {say} from '../common.js';
console.log("I am the background script.");
console.log({save, load, Archivist});
const {send, on} = Archivist.collect({mode:'save'});

ext/storage.js Normal file
@@ -0,0 +1,29 @@
export async function load(key) {
if ( key == null ) { // == null also matches undefined
throw new Error(`load cannot be used to get everything.`);
}
let resolver;
const promise = new Promise(res => resolver = res);
chrome.storage.local.get(key, items => {
resolver.call(null, items[key]);
});
return promise;
}
export async function save(key, value) {
let resolver;
const promise = new Promise(res => resolver = res);
chrome.storage.local.set({[key]:value}, () => {
if ( chrome.runtime.lastError ) {
throw chrome.runtime.lastError;
}
resolver.call();
});
return promise;
}

hello.js Normal file
@@ -0,0 +1 @@
console.log(`hello...is it me you're looking for?`);

index.js Normal file
@@ -0,0 +1,2 @@
require = require('esm')(module/*, options*/);
module.exports = require('./app.js');

libraryServer.js Normal file
@@ -0,0 +1,154 @@
import fs from 'fs';
import path from 'path';
import express from 'express';
import args from './args.js';
import {say, sleep} from './common.js';
import Archivist from './archivist.js';
const SITE_PATH = path.resolve(__dirname, 'public');
const app = express();
const INDEX_FILE = args.index_file;
let Server, upAt, port;
const LibraryServer = {
start, stop
}
export default LibraryServer;
async function start({server_port}) {
port = server_port;
addHandlers();
Server = app.listen(Number(port), err => {
if ( err ) {
throw err;
}
upAt = new Date;
say({server_up:{upAt,port}});
});
}
function addHandlers() {
const {chrome_port} = args;
app.use(express.urlencoded({extended:true}));
app.use(express.static(SITE_PATH));
if ( !! args.library_path() ) {
app.use("/library", express.static(args.library_path()))
}
app.get('/search', async (req, res) => {
res.end('Not implemented yet');
});
app.get('/mode', async (req, res) => {
res.end(Archivist.getMode());
});
app.get('/archive_index.html', async (req, res) => {
const index = JSON.parse(fs.readFileSync(INDEX_FILE()));
res.end(IndexView(index));
});
app.post('/mode', async (req, res) => {
const {mode} = req.body;
await Archivist.changeMode(mode);
res.end(`Mode set to ${mode}`);
});
app.get('/base_path', async (req, res) => {
res.end(args.getBasePath());
});
app.post('/base_path', async (req, res) => {
const {base_path} = req.body;
const change = args.updateBasePath(base_path);
if ( change ) {
Archivist.handlePathChanged();
Server.close(async () => {
console.log(`Server closed.`);
console.log(`Waiting 1 second...`);
await sleep(1000);
await start({server_port:port});
console.log(`Server restarted.`);
});
res.end(`Base path set to ${base_path} and saved to preferences. Server restarting...`);
} else {
res.end(`Base path not changed.`);
}
});
}
async function stop() {
let resolve;
const pr = new Promise(res => resolve = res);
console.log(`Closing library server...`);
Server.close(() => {
console.log(`Library server closed.`);
resolve();
});
return pr;
}
function IndexView(urls) {
return `
<!DOCTYPE html>
<meta charset=utf-8>
<title>Your HTML Library</title>
<style>
:root {
font-family: sans-serif;
background: lavenderblush;
}
body {
display: table;
margin: 0 auto;
background: silver;
padding: 0.5em;
box-shadow: 0 1px 1px purple;
}
form {
}
fieldset {
border: thin solid purple;
}
button, input, output {
}
input.long {
width: 100%;
min-width: 250px;
}
output {
font-size: smaller;
color: purple;
}
h1 {
margin: 0;
}
h2 {
margin-top: 0;
}
</style>
<h1>22120</h1>
<h2>Internet Offline Library</h2>
<h2>Archive Index</h2>
<ul>
${
urls.map(([url,title]) => `
<li>
<a target=_blank href="${url}">${title||url}</a>
</li>
`).join('\n')
}
</ul>
`
}

manifest.json Normal file
@@ -0,0 +1,24 @@
{
"name": "Archivists Anonymous",
"short_name": "22120",
"version": "1.0",
"manifest_version": 2,
"minimum_chrome_version": "33",
"description": "An archivist browser controller that caches everything you browse, and serves it when you're offline.",
"homepage_url": "https://github.com/dosyago/22120",
"permissions" :
[
"tabs",
"<all_urls>",
"debugger",
"storage",
"unlimitedStorage"
],
"background": {
"persistent": true,
"page": "ext/bg_page.html"
},
"browser_action": {
}
}

package-lock.json generated Normal file
File diff suppressed because it is too large

package.json Normal file
@@ -0,0 +1,46 @@
{
"name": "archivist1",
"version": "1.3.14",
"description": "Library server and an archivist browser controller.",
"main": "index.js",
"bin": {
"archivist1": "22120.js"
},
"scripts": {
"start": "node index.js",
"postinstall": "bash ./build_setup.sh",
"build": "bash ./compile.sh",
"clean": "rm 22120.js *22120.js 22120.???",
"pack": "upx 22120.exe && upx 22120.nix && upx 22120.mac && upx 22120.win32.exe && upx 22120.nix32",
"test": "node-dev index.js",
"save": "node-dev index.js 22120 save",
"serve": "node-dev index.js 22120 serve"
},
"repository": {
"type": "git",
"url": "git+https://github.com/dosyago/22120.git"
},
"keywords": [
"archivist",
"library"
],
"author": "@dosy",
"license": "AGPL-3.0",
"bugs": {
"url": "https://github.com/dosyago/22120/issues"
},
"homepage": "https://github.com/dosyago/22120#readme",
"dependencies": {
"chrome-launcher": "latest",
"esm": "latest",
"express": "latest",
"hasha": "latest",
"node-fetch": "latest",
"ws": "latest"
},
"devDependencies": {
"node-dev": "latest",
"webpack": "latest",
"webpack-cli": "latest"
}
}

protocol.js Normal file
@@ -0,0 +1,285 @@
import {context} from './common.js';
const ROOT_SESSION = "browser";
// actually we use 'tot' but in chrome.debugger.attach 'tot' is
// not a supported version string
const VERSION = "1.3";
function promisify(context, name, err) {
return async function(...args) {
let resolver, rejector;
const pr = new Promise((res,rej) => ([resolver, rejector] = [res,rej]));
args.push(promisifiedCallback);
context[name](...args);
return pr;
function promisifiedCallback(...result) {
let error = err(name);
if ( !! error ) {
return rejector(error);
}
return resolver(...result);
}
}
}
let Ws, Fetch;
async function loadDependencies() {
if ( context == 'extension' ) {
// no need to do anything here
} else if ( context == 'node' ) {
const {default:ws} = await import('ws');
const {default:nodeFetch} = await import('node-fetch');
Ws = ws;
Fetch = nodeFetch;
}
}
export async function connect({port:port = 9222} = {}) {
if ( context == 'extension' ) {
const Handlers = {};
const getTargets = promisify(chrome.debugger, 'getTargets', guardError);
const attach = promisify(chrome.debugger, 'attach', guardError);
const sendCommand = promisify(chrome.debugger, 'sendCommand', guardError);
let resp, firstTarget, targets;
chrome.debugger.onEvent.addListener(handle);
// attach to all existing targets
targets = await getTargets();
targets = targets.filter(T => T.type == 'page' && T.url.startsWith('http'));
for ( const T of targets ) {
if ( ! T.attached ) {
resp = await attach({targetId:T.id}, VERSION);
console.log("attached", {resp});
}
}
if ( targets.length ) {
firstTarget = targets[0].id;
}
await confirmAllAttached();
// discover targets is blocked in extensions
// instead we manually discover via tabs onCreated
let nextAttachConfirmation;
chrome.tabs.onCreated.addListener(async Tab => {
console.log(Tab);
const url = Tab.url || Tab.pendingUrl;
const attachable = url && (url.startsWith('about') || url.startsWith('http'));
if ( attachable ) {
const target = {tabId:Tab.id};
const r = await attach(target, VERSION);
if ( ! firstTarget ) {
firstTarget = Tab.id;
}
console.log("attach", {resp:r});
}
if ( nextAttachConfirmation ) {
clearTimeout(nextAttachConfirmation);
}
nextAttachConfirmation = setTimeout(confirmAllAttached, 200);
});
chrome.tabs.onUpdated.addListener(async (id, changed, Tab) => {
const {url} = changed;
const attachable = url && (url.startsWith('about') || url.startsWith('http'));
if ( attachable && ! Tab.attached ) {
const target = {tabId:id};
const r = await attach(target, VERSION);
if ( ! firstTarget ) {
firstTarget = id;
}
console.log("attach", {resp:r});
}
if ( nextAttachConfirmation ) {
clearTimeout(nextAttachConfirmation);
}
nextAttachConfirmation = setTimeout(confirmAllAttached, 200);
});
return {send, on};
async function on(method, handler) {
let listeners = Handlers[method];
if ( ! listeners ) {
Handlers[method] = listeners = [];
}
listeners.push(handler);
}
async function send(method, params = {}, id = firstTarget) {
let tabId, targetId;
if ( Number.isInteger(id) ) {
tabId = id;
} else if ( typeof id == "string" ) {
targetId = id;
} else {
throw new Error(`Must specify an id to send command to. ${method}`);
}
try {
return await sendCommand(
{targetId, tabId},
method,
params,
);
} catch(e) {
console.warn(`${method}`, e);
return {error:e};
}
}
async function handle(source, method, params) {
const listeners = Handlers[method];
if ( Array.isArray(listeners) ) {
for( const func of listeners ) {
try {
func(method, params, source);
} catch(e) {
console.warn(`Listener failed`, method, JSON.stringify(params), e, func.toString().slice(0,140));
}
}
}
}
function guardError(prefix = '') {
if ( chrome.runtime.lastError ) {
if ( typeof prefix == 'object' ) {
try {
prefix = JSON.stringify(prefix, null, 2);
} catch(e) {
console.warn(e);
prefix = prefix + '';
}
}
const error = `${prefix}: ${chrome.runtime.lastError.message}`;
return error;
}
return false;
}
async function confirmAllAttached() {
resp = await getTargets();
targets = resp.filter(T => T.type == 'page' && T.url.startsWith('http') && !T.attached);
console.assert(targets.length == 0, "We are not attached to some attachable targets", targets);
}
} else if ( context == 'node' ) {
if ( ! Ws || ! Fetch ) {
await loadDependencies();
}
try {
const {webSocketDebuggerUrl} = await Fetch(`http://localhost:${port}/json/version`).then(r => r.json());
const socket = new Ws(webSocketDebuggerUrl);
const Resolvers = {};
const Handlers = {};
socket.on('message', handle);
let id = 0;
let resolve;
const promise = new Promise(res => resolve = res);
socket.on('open', () => resolve());
await promise;
return {
send,
on, ons,
close
}
async function send(method, params = {}, sessionId) {
const message = {
method, params, sessionId,
id: ++id
};
if ( ! sessionId ) {
delete message.sessionId;
}
const key = `${sessionId||ROOT_SESSION}:${message.id}`;
let resolve;
const promise = new Promise(res => resolve = res);
Resolvers[key] = resolve;
socket.send(JSON.stringify(message));
return promise;
}
async function handle(message) {
const stringMessage = message;
message = JSON.parse(message);
if ( message.error ) {
//console.warn(message);
}
const {sessionId} = message;
const {method, params} = message;
const {id, result} = message;
if ( id ) {
const key = `${sessionId||ROOT_SESSION}:${id}`;
const resolve = Resolvers[key];
if ( ! resolve ) {
console.warn(`No resolver for key`, key, stringMessage.slice(0,140));
} else {
Resolvers[key] = undefined;
try {
await resolve(result);
} catch(e) {
console.warn(`Resolver failed`, e, key, stringMessage.slice(0,140), resolve);
}
}
} else if ( method ) {
const listeners = Handlers[method];
if ( Array.isArray(listeners) ) {
for( const func of listeners ) {
try {
func({message, sessionId});
} catch(e) {
console.warn(`Listener failed`, method, e, func.toString().slice(0,140), stringMessage.slice(0,140));
}
}
}
} else {
console.warn(`Unknown message on socket`, message);
}
}
function on(method, handler) {
let listeners = Handlers[method];
if ( ! listeners ) {
Handlers[method] = listeners = [];
}
listeners.push(wrap(handler));
}
function ons(method, handler) {
let listeners = Handlers[method];
if ( ! listeners ) {
Handlers[method] = listeners = [];
}
listeners.push(handler);
}
function close() {
socket.close();
}
function wrap(fn) {
return ({message, sessionId}) => fn(message.params)
}
} catch(e) {
console.log("Error communicating with browser", e);
process.exit(1);
}
} else {
throw new TypeError('Currently only supports running in Node.JS or as a Chrome Extension with Debugger permissions');
}
}

public/index.html Normal file
@@ -0,0 +1,109 @@
<!DOCTYPE html>
<meta charset=utf-8>
<title>Your HTML Library</title>
<style>
:root {
font-family: sans-serif;
background: lavenderblush;
}
body {
display: table;
margin: 0 auto;
background: silver;
padding: 0.5em;
box-shadow: 0 1px 1px purple;
}
form {
}
fieldset {
border: thin solid purple;
}
button, input, output {
}
input.long {
width: 100%;
min-width: 250px;
}
output {
font-size: smaller;
color: purple;
}
h1 {
margin: 0;
}
h2 {
margin-top: 0;
}
</style>
<h1>22120</h1>
<h2>Internet Offline Library</h2>
<p>
View <a href=/archive_index.html>the index</a>
</p>
<form method=GET action=/search>
<fieldset>
<legend>Search your archive</legend>
<input type=search name=query placeholder="search your library">
<button>Search</button>
</fieldset>
</form>
<form method=POST action=/mode>
<fieldset>
<legend>Save or Serve: Mode Control</legend>
<p>
<label>
<input type=radio name=mode value=save>
Save
</label>
<label>
<input type=radio name=mode value=serve>
Serve
</label>
<output name=notification></output>
<p>
<button>Change mode</button>
<script>
{
const form = document.currentScript.closest('form');
form.notification.value = "Getting current mode...";
setTimeout(showCurrentMode, 1000);
async function showCurrentMode() {
const mode = await fetch('/mode').then(r => r.text());
form.notification.value = "";
form.querySelector(`[name="mode"][value="${mode}"]`).checked = true;
}
}
</script>
</fieldset>
</form>
<form method=POST action=/base_path>
<fieldset>
<legend>File system path of archive</legend>
<p>
Set the path to where your archive folder will go
<br>
<small>The default is your home directory</small>
<p>
<label>
Base path
<input class=long type=text name=base_path placeholder="A folder path...">
</label>
<p>
<button>Change base path</button>
<script>
{
const form = document.currentScript.closest('form');
showCurrentLibraryPath();
form.base_path.onchange = e => {
self.target = e.target;
}
async function showCurrentLibraryPath() {
const base_path = await fetch('/base_path').then(r => r.text());
form.querySelector(`[name="base_path"]`).value = base_path;
}
}
</script>
</fieldset>
</form>

public/injection.js Normal file
@@ -0,0 +1,19 @@
export function getInjection({sessionId}) {
return `
{
if ( top === self ) {
const sessionId = "${sessionId}";
const sleep = ms => new Promise(res => setTimeout(res, ms));
install();
async function install() {
console.log("Installing...");
console.info(JSON.stringify({installed: { sessionId, startUrl: location.href }}));
await sleep(500);
console.log("Installed.");
}
}
}
`;
}

public/library/README.md Normal file
@@ -0,0 +1,10 @@
# Default storage directory for library
Remove `public/library/http*` and `public/library/cache.json` from `.gitignore` if you forked this repo and want to commit your library using git.
## Clearing your cache
To clear everything, delete all directories that start with `http` or `https`, and delete `cache.json`.
To clear only content from domains you don't want, delete just those unwanted directories that start with `http` or `https`, and DON'T delete `cache.json`.
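The full-clear recipe above can be wrapped in a small shell helper. This is only a sketch: the default path shown is an assumption based on `args.js` (which resolves the library under `~/22120-arc/public/library`), so pass your actual library path as the first argument.

```shell
# clear_22120_cache: remove every cached origin directory (http*/https*)
# and the cache index from a 22120 library directory.
# The default path is an assumption from args.js; override it via $1.
clear_22120_cache() {
  lib="${1:-$HOME/22120-arc/public/library}"
  rm -rf "$lib"/http* "$lib"/cache.json
}
```

For example, `clear_22120_cache ~/22120-arc/public/library`. To drop only some origins, delete those `http*` directories by hand instead and leave `cache.json` in place.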

upload.sh Executable file
@@ -0,0 +1,14 @@
#!/bin/sh
gpush patch "New release"
description=$1
latest_tag=$(git describe --abbrev=0)
grel release -u dosyago -r 22120 --tag $latest_tag --name "New release" --description '"'"$description"'"'
grel upload -u c9fe -r 22120 --tag $latest_tag --name "22120.exe" --file 22120.exe
grel upload -u c9fe -r 22120 --tag $latest_tag --name "22120.macos" --file 22120.mac
grel upload -u c9fe -r 22120 --tag $latest_tag --name "22120.linux" --file 22120.nix
grel upload -u c9fe -r 22120 --tag $latest_tag --name "22120.linx32" --file 22120.nix32
grel upload -u c9fe -r 22120 --tag $latest_tag --name "22120.win32.exe" --file 22120.win32.exe

webpack.config.js Normal file
@@ -0,0 +1,17 @@
const path = require('path');
const webpack = require('webpack');
module.exports = {
entry: "./app.js",
output: {
path: path.resolve(__dirname),
filename: "22120.js"
},
target: "node",
node: {
__dirname: false
},
plugins: [
new webpack.BannerPlugin({ banner: "#!/usr/bin/env node", raw: true }),
]
};