Mike Fährmann
1824267447
[dl:ytdl] implement explicit HLS/DASH handling
...
add '_ytdl_manifest' to specify a manifest type to process
2024-10-16 15:16:21 +02:00
Mike Fährmann
6e7da6310c
[behance] fix video extraction (#5965)
...
This is a lot slower than before, since each video now requires an extra
HTTP request and 'sleep-request' is set to 2s-4s by default.
It now also requires ytdl.
2024-08-10 11:06:54 +02:00
Mike Fährmann
9783d95585
[behance] fix "KeyError: 'fields'" (#5965)
2024-08-08 16:29:56 +02:00
Mike Fährmann
36a64a3aa7
[behance] fix image extraction (#5873)
2024-07-21 10:54:12 +02:00
Mike Fährmann
07cb584231
[behance] add 'modules' option (#4799)
2023-11-17 22:54:38 +01:00
Mike Fährmann
6a753d9ff3
[behance] support 'text' modules (#4799)
2023-11-17 22:54:38 +01:00
Mike Fährmann
fd8f58ad76
[behance] unescape embed URLs (#4742)
2023-10-30 13:38:49 +01:00
Mike Fährmann
3ecb512722
send Referer headers by default
2023-09-19 00:02:04 +02:00
Mike Fährmann
6ae92da57e
Merge branch 'tests'
2023-09-13 21:34:28 +02:00
Mike Fährmann
32da3c70d3
[behance] handle videos without 'renditions' (#4523)
2023-09-12 22:00:04 +02:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
6482f9453b
[behance] fix cookie usage (#4417)
2023-08-18 14:48:20 +02:00
Mike Fährmann
d34195b41d
[behance] fix and update 'user' extractor (#4417)
2023-08-17 16:06:35 +02:00
Mike Fährmann
4d3cf709da
[behance] add 'date' metadata field (#4417)
2023-08-17 15:33:47 +02:00
Mike Fährmann
c689cd9720
[behance] show error for mature content (#4417)
2023-08-17 15:31:37 +02:00
Mike Fährmann
15d7c5a199
[behance] 'items()' -> 'values()'
...
we only need 'size'; 'name' is unnecessary
2023-04-30 13:53:51 +02:00
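A minimal sketch of the kind of change that the 'items()' -> 'values()' commit (15d7c5a199) describes. The dictionary contents and variable names here are hypothetical and not taken from the extractor; the point is only that iterating over `values()` avoids unpacking a key that is never used.

```python
# Hypothetical mapping of rendition name -> size (not real Behance data).
sizes = {"disp": 600, "max_1200": 1200, "original": 4000}

# Before: items() yields (name, size) pairs, but 'name' is never used.
largest_with_items = max(size for name, size in sizes.items())

# After: values() yields only the sizes that are actually needed.
largest_with_values = max(sizes.values())
```

Both expressions produce the same result; the second simply drops the unused variable.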
Mike Fährmann
0fb580135d
[behance] fix extraction (#3980)
2023-04-29 16:18:35 +02:00
Mike Fährmann
dd884b02ee
replace json.loads with direct calls to JSONDecoder.decode
2023-02-09 15:22:00 +01:00
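To illustrate the 'json.loads' replacement in dd884b02ee, here is a small sketch using only the standard library. The variable names and the sample JSON string are assumptions made for this example; how the project actually stores or names its decoder is not shown in this log.

```python
import json

# Create one decoder and keep a reference to its bound 'decode' method,
# so each call goes straight to the decoder without json.loads() in between.
decode = json.JSONDecoder().decode

# Hypothetical payload, used only for demonstration.
raw = '{"id": 1, "fields": ["Illustration", "Branding"]}'
project = decode(raw)
```

For plain JSON text with no extra keyword arguments, `decode(raw)` returns the same object structure as `json.loads(raw)`.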
Mike Fährmann
3a0450adbf
[behance] use default delay between requests (#2507)
2023-01-07 14:49:26 +01:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
dee0d22561
update extractor test results
2022-02-06 21:39:24 +01:00
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
f9096584ab
[behance] fix 'collection' extraction
2021-08-10 00:48:31 +02:00
Mike Fährmann
6b2bce3b7d
[behance] support 'video' modules (closes #1282)
...
(requires youtube-dl to download from m3u8 manifests)
2021-01-29 21:30:14 +01:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
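The effect of dropping '&' from a delimiter character class (commit 968d3e8465) can be sketched with two regular expressions. The host name, the patterns, and the sample URL are hypothetical and chosen only to show the difference; they are not the project's real patterns.

```python
import re

# Old style: '&' is treated as if it ended a URL component.
OLD = re.compile(r"https://example\.org/([^/?&#]+)")
# New style: only '/', '?' and '#' delimit components, as in RFC 3986.
NEW = re.compile(r"https://example\.org/([^/?#]+)")

url = "https://example.org/tom&jerry/projects?page=2#top"

old_segment = OLD.match(url).group(1)  # stops at '&'
new_segment = NEW.match(url).group(1)  # keeps the whole path segment
```

Here the old pattern cuts the first path segment short at the '&', while the new one captures the full segment up to the next '/'.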
Mike Fährmann
ddd6840509
[behance] fix 'collection' extraction
2020-10-11 18:15:41 +02:00
Mike Fährmann
a3fa45bbb1
[behance] get images from 'media_collection' modules
2019-11-27 01:04:33 +01:00
Mike Fährmann
1b9bf4fc6e
[behance] fix 'tags' extraction
2019-10-03 17:36:02 +02:00
Mike Fährmann
3969f9cbbd
[behance] fix collection extraction
2019-07-27 14:26:40 +02:00
Mike Fährmann
61741d7333
provide type information for Queue messages
...
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
1c1367ec5b
[behance] fix empty docstring
2019-02-02 14:41:05 +01:00
Mike Fährmann
45e529ab91
[behance] fix extraction
...
HTML structure for gallery pages changed quite a bit, so it is now using
the embedded JSON data. This changes a lot of metadata field names, but
'gallery_id', 'title', and 'user' are still provided for backwards
compatibility.
The internal API endpoint for user galleries also changed its data
structure, but nothing too major.
2019-01-31 14:33:23 +01:00
Mike Fährmann
2d2953a5bf
add 'text.parse_float()' + cleanup in text.py
2019-01-29 16:46:21 +01:00
Mike Fährmann
9b8ac12eed
[behance] enable 'categorytransfer' for collections (#157)
2019-01-19 20:02:20 +01:00
Mike Fährmann
217a0687ef
[behance] add 'collection' extractor (closes #157)
2019-01-19 18:11:20 +01:00
Mike Fährmann
14ee6bf611
[behance] handle external URLs with youtube-dl
2018-11-13 15:10:23 +01:00
Mike Fährmann
c00dce2adc
[behance] enable 'categorytransfer'
2018-10-09 23:40:49 +02:00
Mike Fährmann
75862715ac
[behance] add user extractor
2018-08-31 17:42:09 +02:00
Mike Fährmann
a2eeef1f5e
[behance] replace test
...
The "UVMW Studio" account and their galleries are gone.
2018-08-19 21:17:21 +02:00
Mike Fährmann
2a9f3341a2
[behance] fix title extraction
2018-08-08 10:48:58 +02:00
Mike Fährmann
3fc2f269fa
[behance] filter 'fields' list
2018-08-07 12:14:41 +02:00
Mike Fährmann
f3793660ef
update tests
2018-08-02 14:57:28 +02:00
Mike Fährmann
df082e923c
[behance] add gallery extractor (#95)
2018-08-01 21:46:55 +02:00