mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 11:12:40 +01:00

33 Commits

Author SHA1 Message Date
Mike Fährmann
27ec653991
fix bug in test_init and update example URLs 2023-09-14 13:27:03 +02:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
4a3a1f4c87
[komikcast] update domain and fix extraction 2022-12-06 22:00:23 +01:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
fe2b3d57d4
[komikcast] update domain 2022-07-12 23:07:58 +02:00
Mike Fährmann
bef3105121
[komikcast] fix extraction 2021-04-15 17:04:53 +02:00
Mike Fährmann
1a540fbe00
[komikcast] fix extraction 2021-03-28 21:18:58 +02:00
Mike Fährmann
f360778e60
[komikcast] fix extraction 2021-02-27 21:02:52 +01:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
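
A minimal sketch of what dropping '&' from such a character class changes, assuming the classes appear in negated form inside a URL pattern. The pattern, the example.org URL, and the `segment` helper are hypothetical illustrations and are not taken from the gallery-dl source.

```python
import re

# Hypothetical patterns (not from gallery-dl): capture one path segment.
# OLD also treats '&' as a delimiter; NEW stops only at '/', '?' and '#',
# the component delimiters named in RFC 3986.
OLD = re.compile(r"https?://example\.org/chapter/([^/?&#]+)")
NEW = re.compile(r"https?://example\.org/chapter/([^/?#]+)")


def segment(pattern, url):
    """Return the captured path segment, or None if the URL does not match."""
    match = pattern.match(url)
    return match.group(1) if match else None


url = "https://example.org/chapter/a&b?page=2#top"
old_segment = segment(OLD, url)  # capture ends at the '&'
new_segment = segment(NEW, url)  # capture ends at the '?'
```

With the narrower class, an '&' inside a path segment is kept as part of the capture instead of cutting it short.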
Mike Fährmann
bb97e87989
[komikcast] ignore banner image 2019-10-03 17:34:06 +02:00
Mike Fährmann
fe849382d8
[komikcast] improve extraction 2019-04-26 15:14:10 +02:00
Mike Fährmann
eacebf41e4
fix typo in README 2019-03-24 11:03:02 +01:00
Mike Fährmann
fe27154a10
[komikcast] fix extraction
... again
2019-03-23 09:50:39 +01:00
Mike Fährmann
d0f88c35be
[komikcast] fix extraction 2019-03-18 11:12:19 +01:00
Mike Fährmann
6dae6bee37
automatically detect and bypass cloudflare challenge pages
TODO: cache and re-apply cfclearance cookies
2019-03-10 15:31:33 +01:00
Mike Fährmann
0887fb61f4
[komikcast] update test results 2019-03-07 14:55:52 +01:00
Mike Fährmann
f6734142ee
[komikcast] remove 'width' and 'height' info 2019-02-19 15:12:40 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of getting a complete 'filename' from an URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
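
The "now" behaviour described in this commit message can be sketched roughly as follows. This is an illustrative stand-in, not the actual text.nameext_from_url() implementation; the function name `split_name_ext` and its return shape are assumptions made for the example.

```python
from urllib.parse import unquote, urlsplit


def split_name_ext(url):
    """Hypothetical helper: split the last path component of a URL.

    Returns 'filename' without its extension plus a separate 'extension',
    mirroring the "now" layout described in the commit message.
    """
    path = urlsplit(url).path                  # drop query and fragment
    basename = unquote(path.rpartition("/")[2])
    name, dot, ext = basename.rpartition(".")
    if dot:
        return {"filename": name, "extension": ext}
    # No '.' in the last component: everything is the filename.
    return {"filename": basename, "extension": ""}


info = split_name_ext("https://example.org/path/filename.ext")
```

For the example URL from the commit message, `info` holds 'filename' and 'ext' under the two keys, with no separate 'name' entry.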
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results 2019-02-13 13:23:36 +01:00
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
d70db2d555
Revert "[komikcast] fix extraction"
This reverts commit 5507f5ce2e.
2018-10-02 20:38:42 +02:00
Mike Fährmann
5507f5ce2e
[komikcast] fix extraction 2018-09-29 16:37:30 +02:00
Mike Fährmann
1694039de0
[komikcast] update ad-filter 2018-08-15 21:49:44 +02:00
Mike Fährmann
38d4f43cc0
[komikcast] skip ads 2018-08-14 11:17:59 +02:00
Mike Fährmann
f7e7306e5a
[komikcast] update URL pattern and unescape image URLs 2018-05-29 10:35:08 +02:00
Mike Fährmann
7f899bd5d8
Merge branch 'master' into 1.4-dev 2018-05-14 14:50:02 +02:00
Mike Fährmann
e2157f594e
[mangadex] fix manga extraction (closes #84)
Chapter listings for manga now use
https://mangadex.org/manga/<id>/_/chapters/2/
as URL instead of
https://mangadex.org/manga/<id>/_//2/
2018-05-06 17:43:50 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
7073ab7707
[komikcast] update regex to only match manga pages
The 'readerarea' section now includes some (shady) external
Javascript file, which got matched as well.
2018-04-11 15:48:17 +02:00
Mike Fährmann
5f37d40a3e
[komikcast] bypass cloudflare challenge 2018-03-10 16:09:40 +01:00
Mike Fährmann
2dd3aeeeae
[komikcast] add chapter- and manga-extractor (#70) 2018-02-04 22:02:10 +01:00