Mike Fährmann
22647c2626
[naverwebtoon] fix 'title' for comics with empty tags ( #5120 )
2024-01-27 16:24:03 +01:00
Mike Fährmann
3433481dd2
[gofile] update 'website_token' extraction
2024-01-27 01:10:14 +01:00
Mike Fährmann
1f7101d606
[archivedmoe] fix thebarchive webm URLs ( #5116 )
2024-01-27 00:24:41 +01:00
Mike Fährmann
34a4ddc399
[sankaku] add 'id-format' option ( #5073 )
2024-01-26 17:56:08 +01:00
Mike Fährmann
afd20ef42c
[kemonoparty] implement filtering duplicate revisions ( #5013 )
...
set 'revisions' to '"unique"' to have it ignore duplicate revisions
2024-01-26 14:44:15 +01:00
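
The "unique" behaviour described above can be pictured as filtering a list of revisions by a precomputed hash. This is only a minimal sketch, not the extractor's actual code: the function name `unique_revisions`, the `revision_hash` key, and the choice to keep the first occurrence are all assumptions made for illustration.

```python
def unique_revisions(revisions, key="revision_hash"):
    """Return revisions with duplicates removed, keeping the first of each hash.

    'revisions' is assumed to be a list of dicts that each carry a
    precomputed hash under 'key' (hypothetical layout).
    """
    seen = set()
    result = []
    for rev in revisions:
        digest = rev[key]
        if digest in seen:
            continue  # same content as an earlier revision: skip it
        seen.add(digest)
        result.append(rev)
    return result


revs = [
    {"id": 3, "revision_hash": "aaa"},
    {"id": 2, "revision_hash": "bbb"},
    {"id": 1, "revision_hash": "aaa"},  # duplicate of id 3
]
filtered = unique_revisions(revs)
```

Order is preserved, so whichever revision appears first in the input is the one that survives.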
Mike Fährmann
c28475d325
[kemonoparty] fix deleting 'name' in original objects ( #5103 )
...
... when computing 'revision_hash'
regression caused by 3d68eda4
dict.copy() only creates a shallow copy
I know that and still managed to get it wrong ...
2024-01-25 23:46:19 +01:00
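
The shallow-copy pitfall named in this commit message can be reproduced in a few lines. The sketch below is illustrative only: the post layout and both helper names are hypothetical and not taken from the kemonoparty extractor.

```python
import copy


def strip_name_shallow(original):
    rev = original.copy()          # shallow copy: rev["file"] is the SAME dict
    rev["file"].pop("name", None)  # ... so this also mutates original["file"]
    return rev


def strip_name_safe(original):
    rev = copy.deepcopy(original)  # nested dicts are copied as well
    rev["file"].pop("name", None)  # only the copy is modified
    return rev


post_a = {"id": "1", "file": {"name": "a.png", "path": "/data/a.png"}}
safe = strip_name_safe(post_a)        # post_a keeps its 'name'

post_b = {"id": "2", "file": {"name": "b.png", "path": "/data/b.png"}}
shallow = strip_name_shallow(post_b)  # post_b loses its 'name'
```

When only one nested dict needs to change, copying just that dict (`rev["file"] = rev["file"].copy()`) avoids the cost of a full deep copy.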
Mike Fährmann
beacfa7436
[bunkr] update domain to 'bunkr.sk' ( #5114 )
2024-01-25 23:45:41 +01:00
Constantin Hong
21ddeab21d
Dockerfile/chore: Reduce layers, update pip and apk, and shrink the increased image size through optimization.
2024-01-22 12:22:26 +09:00
Mike Fährmann
0502256251
release version 1.26.7
2024-01-21 23:02:50 +01:00
Mike Fährmann
67c99b1366
[patreon] prevent HttpError for stream.mux.com URLs
2024-01-21 22:50:40 +01:00
Mike Fährmann
0d3af0d35b
[tests] ignore 'ytdl' categories when import fails ( #5095 )
2024-01-21 15:31:12 +01:00
Mike Fährmann
f3ad91b44f
[bunkr] update domain ( #5088 )
2024-01-21 03:00:57 +01:00
Mike Fährmann
c7a42880ab
[wikimedia] support fandom wikis ( #1443 , #2677 , #3378 )
...
Wikis hosted on fandom.com are just wikimedia instances
and support its API.
2024-01-21 00:52:02 +01:00
Mike Fährmann
5bf156f0b1
merge #5094 : [webtoons] fix extracting comic and episode name with commas
2024-01-21 00:47:26 +01:00
blankie
df718887c2
[webtoons] fix extracting comic and episode name with commas
2024-01-21 09:50:27 +11:00
Mike Fährmann
b1561a21dc
merge #5091 : [blogger] fix 'lh*-*.googleusercontent.com' URLs
2024-01-20 23:19:54 +01:00
Wiiplay123
6eb62f2140
Combine lh*(-**).googleusercontent.com URL regex into one line.
...
Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2024-01-20 15:53:11 -06:00
Wiiplay123
a6fed628dd
[blogger] Fix lh*.googleusercontent.com forward slash bug, add support for lh*-**.googleusercontent.com
...
Some URLs use "lh(number)-(locale).googleusercontent.com" format, so I added support for those.
Also, "lh(number).googleusercontent.com" formats were broken because the regex was looking for a second forward slash.
Examples:
lh7.googleusercontent.com
lh7-us.googleusercontent.com
2024-01-20 15:07:52 -06:00
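
The two host formats listed in this commit can be matched by a single pattern with an optional suffix. The regex below is a sketch written for illustration and is not the pattern used in the blogger extractor; it assumes the locale part consists of lowercase letters only.

```python
import re

# 'lh' + digits, optionally followed by '-' + locale (assumed: lowercase letters)
HOST_RE = re.compile(r"^lh\d+(?:-[a-z]+)?\.googleusercontent\.com$")


def is_lh_host(host):
    """Return True if 'host' looks like lhN or lhN-locale on googleusercontent.com."""
    return HOST_RE.match(host) is not None


plain = is_lh_host("lh7.googleusercontent.com")
with_locale = is_lh_host("lh7-us.googleusercontent.com")
other = is_lh_host("example.googleusercontent.com")
```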
Mike Fährmann
6f8592eaff
[hbrowse] remove from modules list
2024-01-20 18:25:38 +01:00
Mike Fährmann
0d367ce1b9
[tests] update extractor results
2024-01-20 18:02:36 +01:00
Mike Fährmann
acc94ac187
[realbooru] fix extraction
...
revert ac97aca99c
2024-01-20 17:56:07 +01:00
Mike Fährmann
9599151118
[issuu] fix extraction
2024-01-20 16:44:48 +01:00
Mike Fährmann
9ca6117c67
[hbrowse] remove module
...
website gone
2024-01-20 02:53:44 +01:00
Mike Fährmann
375eefb886
[chevereto] remove 'pixl.li'
...
"Pixl is closing down"
"All images will be deleted January 1st."
2024-01-20 02:21:40 +01:00
Mike Fährmann
321861af7e
[erome] fix 'count' metadata
2024-01-20 00:26:41 +01:00
Mike Fährmann
b41d9bf616
[paheal] fix 'source' metadata
2024-01-19 22:24:39 +01:00
Mike Fährmann
b0a441f1e3
[nitter] remove 'nitter.lacontrevoie.fr'
...
"Fermeture de Nitter / Closing down Nitter"
2024-01-19 19:34:16 +01:00
Mike Fährmann
a1c1e80f67
[giantessbooru] update domain
2024-01-19 14:21:56 +01:00
Mike Fährmann
2007cb2f59
[tests] check extractor category values
2024-01-19 14:21:09 +01:00
Mike Fährmann
fc4e737f67
[wikimedia] include 'sha1' in default filenames
2024-01-19 03:08:43 +01:00
Mike Fährmann
44f2c15a04
[wikimedia] handle 'File:' paths
2024-01-19 03:05:45 +01:00
Mike Fährmann
93b4120e77
[gelbooru] support 'all' and empty tag ( #5076 )
2024-01-18 21:49:33 +01:00
Mike Fährmann
a416d4c3d5
[sankaku] support post URLs with alphanumeric IDs ( #5073 )
2024-01-18 16:23:14 +01:00
Mike Fährmann
ea553a1d55
[wikimedia] generalize ( #1443 )
...
- support mediawiki.org
- support mariowiki.com (#3660 )
- combine code into a single extractor
(use prefix as subcategory)
- handle non-wiki instances
- unescape titles
2024-01-18 15:36:16 +01:00
Mike Fährmann
89066844f4
add 'config_instance' method
...
to allow for a more streamlined access to BaseExtractor instance options
2024-01-18 03:20:36 +01:00
Mike Fährmann
34a7afdbc1
merge #2340 : [wikimedia] add 'article' and 'category' extractors ( #1443 , #2906 )
2024-01-17 00:08:32 +01:00
Mike Fährmann
c3c1635ef3
[wikimedia] update
...
- rewrite using BaseExtractor
- support most Wiki* domains
- update docs/supportedsites
- add tests
2024-01-17 00:08:06 +01:00
Ailothaen
221f54309c
[wikimedia] Improved archive identifiers
2024-01-16 02:32:32 +01:00
Ailothaen
e33056adcd
[wikimedia] Add Wikipedia/Wikimedia extractor
2024-01-16 02:32:25 +01:00
Mike Fährmann
3d68eda4ab
[kemonoparty] add 'revision_hash' metadata ( #4706 , #4727 , #5013 )
...
A SHA1 hexdigest of other relevant metadata fields like
title, content, file and attachment URLs.
This value does NOT reflect which revisions are listed on the website.
Neither does 'edited' or any other metadata field (combinations).
2024-01-16 00:38:10 +01:00
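
One way to derive such a hash is to serialize the relevant fields deterministically and feed the result to SHA1. This is a minimal sketch under stated assumptions, not the extractor's implementation: the field names (`title`, `content`, `file`, `attachments`, `path`) and the JSON serialization are hypothetical choices.

```python
import hashlib
import json


def revision_hash(post):
    """SHA1 hexdigest over title, content, file URL and attachment URLs.

    Field names and layout are assumptions made for this example.
    """
    relevant = {
        "title": post.get("title"),
        "content": post.get("content"),
        "file": (post.get("file") or {}).get("path"),
        "attachments": [a.get("path") for a in post.get("attachments") or ()],
    }
    # sort_keys makes the serialization independent of dict insertion order
    data = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


rev1 = {"title": "t", "content": "c", "file": {"path": "/f.png"},
        "attachments": [{"path": "/a.zip"}], "edited": "2024-01-01"}
rev2 = dict(rev1, edited="2024-01-02")  # only 'edited' differs
rev3 = dict(rev1, title="other")        # a relevant field differs

h1, h2, h3 = revision_hash(rev1), revision_hash(rev2), revision_hash(rev3)
```

Because fields such as 'edited' are left out of the hashed data, two revisions that differ only in those fields produce the same digest.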
Mike Fährmann
4d6ec6958d
[scripts] add 'push --force' to pull-request
2024-01-15 22:37:33 +01:00
Mike Fährmann
799a8206ad
merge #5061 : [webtoons] extract more metadata
...
- author_name
- comic_name
- episode_name
- username
2024-01-15 18:27:12 +01:00
Mike Fährmann
8ffa0cd3c8
[webtoons] small optimization
...
don't extract the entire 'author_area' and
avoid creating a second 'text.extract_from()' object
2024-01-15 18:24:47 +01:00
Mike Fährmann
59cf4b3884
merge #4444 : [2ch] add 'thread' and 'board' extractors ( #1009 , #3540 )
2024-01-15 17:50:34 +01:00
Mike Fährmann
90b382304a
[deviantart] fix KeyError: 'premium_folder_data' ( #5063 )
2024-01-15 17:30:03 +01:00
Mike Fährmann
4cedf378d5
[deviantart] fix AttributeError for URLs without username ( #5065 )
...
caused by 4f367145
2024-01-15 16:28:57 +01:00
Mike Fährmann
68196589c4
[2ch] update
...
- simplify extractor code
- more metadata
- add tests
2024-01-15 04:09:05 +01:00
hunter-gatherer8
6c4abc982e
[2ch] add 'thread' and 'board' extractors
...
- [2ch] add thread extractor
- [2ch] add board extractor
- [2ch] add new entry to supported sites
2024-01-15 03:51:03 +01:00
Mike Fährmann
69726fc82c
[tests] skip tests requiring auth when none is provided
2024-01-14 22:47:16 +01:00
blankie
bb446b1598
[webtoons] extract more metadata
2024-01-14 19:26:49 +11:00