mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 03:02:50 +01:00
Commit Graph

1977 Commits

Author SHA1 Message Date
Mike Fährmann
004812258d
[hentaifox] fix extraction 2019-12-02 22:21:45 +01:00
Mike Fährmann
e2710702d4
fix Cloudflare bypass 2019-12-01 01:07:24 +01:00
Mike Fährmann
8759403f37
[plurk] add delay between comment requests 2019-12-01 01:03:31 +01:00
Mike Fährmann
a28552fd19
update test results
- hbrowse: one tag got removed
- mangoxo: gallery changed owner
- photobucket: ?, but photo still downloads
2019-11-30 23:59:32 +01:00
Mike Fährmann
dcaa3d01bd
[imagefap] adapt to new image URL format 2019-11-30 23:48:02 +01:00
Mike Fährmann
e62c209ca0
[nijie] fix 'date' parsing 2019-11-30 23:08:21 +01:00
Mike Fährmann
3bba763ab9
[twitter] improve
- update metadata structure
  - combine all user… entries into their own dict
  - let 'user' always specify the Timeline owner
  - add 'author' entry that specifies the original Tweet author
- create directories per post (closes #491)
- fix username issues with /i/web/ URLs
2019-11-30 22:30:37 +01:00
Mike Fährmann
26d2334550
[postprocessor:metadata] rename 'format' to 'content-format'
Just to be consistent with the other 'extension-format' option name.
The plain 'format' name is also still accepted.
2019-11-30 17:27:49 +01:00
Mike Fährmann
a412531451
[postprocessor:metadata] implement 'extension-format' option
closes #477
2019-11-30 17:26:17 +01:00
Mike Fährmann
0f1538af78
split filename formatting into its own function 2019-11-29 22:32:07 +01:00
Mike Fährmann
db35c3b581
[directlink] separate filenames from paths
With this, all default filename formats specify an '{extension}'
and PathFormat.set_extension() reliably works for all files.
2019-11-28 23:50:00 +01:00
Mike Fährmann
41a3169c67
[foolfuuka] use '{extension}' in default filename format 2019-11-28 23:12:48 +01:00
Mike Fährmann
e9aed62c91
[imgur] unescape image titles 2019-11-28 22:13:24 +01:00
Mike Fährmann
bca2222559
add '--exec-after' 2019-11-27 19:42:46 +01:00
Mike Fährmann
ed6592ea1a
remove '--abort-on-skip' 2019-11-27 19:41:24 +01:00
Mike Fährmann
2c332edaad
[plurk] fix comment pagination 2019-11-27 19:39:56 +01:00
Mike Fährmann
a3fa45bbb1
[behance] get images from 'media_collection' modules 2019-11-27 01:04:33 +01:00
Mike Fährmann
359c3bc1c5
[deviantart] revert to getting download URLs from OAuth API
This commit (partially) reverts 27b5b24, 94eb7c6, and a437e78.

Download URLs from the 'extended_fetch' endpoint are now only
usable for logged in users, while those from the respective
OAuth API endpoint are working again. Everything except
scraps and direct deviation links should be fixed, and those
two categories will work with exported cookies. (#488)

TODO:
- "native" login with --username and --password
- better handling of internally stored cookies
2019-11-26 23:29:46 +01:00
Mike Fährmann
42b9633c7e
update test results 2019-11-26 23:27:15 +01:00
Mike Fährmann
b28bd1c73e
[bobx] set generated session cookie (closes #482)
This reverts commit 490831f and also restores original image downloads
by setting a randomly generated session cookie. No login required.
2019-11-25 20:04:11 +01:00
Mike Fährmann
ae09f87602
improve SharedConfigMixin config lookups 2019-11-25 18:31:38 +01:00
Mike Fährmann
b5c964332b
improve config.py test coverage 2019-11-25 17:20:00 +01:00
Mike Fährmann
f5604492c3
update interface of config functions 2019-11-24 00:42:28 +01:00
Mike Fährmann
4ca883c66f
[smugmug] replace test for custom URLs
The old one (http://www.creativedogportraits.com/) is empty and/or
no longer handled by SmugMug.
2019-11-22 23:25:55 +01:00
Mike Fährmann
d45fabb79d
match user profile handling on deviantart and newgrounds 2019-11-22 23:20:21 +01:00
Mike Fährmann
ea80dadd09
[deviantart] restore archive keys
Commit 9fdc5e7 changed 'username' fields to have consistent
capitalization, but that invalidated the archive keys of several
extractors where 'username' was usually lowercase.
2019-11-21 17:00:08 +01:00
Mike Fährmann
3fc1e12949
[postprocessor:metadata] filter private entries
i.e. keys starting with an underscore
2019-11-21 16:58:44 +01:00
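The filtering described in this commit can be illustrated with a minimal sketch. It is not the code from the commit; the function name `filter_private` and the sample dictionary are hypothetical, and all keys are assumed to be strings.

```python
def filter_private(kwdict):
    """Return a copy of kwdict without 'private' entries.

    'Private' here means keys starting with an underscore, as the
    commit message describes. The function name is hypothetical.
    """
    return {
        key: value
        for key, value in kwdict.items()
        if not key.startswith("_")
    }


# Hypothetical metadata; only the public entries survive.
metadata = {"title": "example", "num": 1, "_internal": {"x": 1}}
public = filter_private(metadata)
print(public)
```

Returning a new dict rather than deleting keys in place leaves the original metadata intact for other consumers.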
Mike Fährmann
ea094692c8
[vsco] fix collection extraction (#480) 2019-11-20 22:06:23 +01:00
Mike Fährmann
490831f84a
[bobx] "fix" image download URLs
Access to original images got restricted to (paid) members only.
All that's publicly accessible now are essentially preview pictures.
2019-11-20 21:59:37 +01:00
Mike Fährmann
978cb03f81
update misc test results
- Livedoor now uses https:// for its image URLs
- Instagram image URLs got simplified
2019-11-20 21:45:48 +01:00
Mike Fährmann
fca87974fe
[sexcom] fix video downloads by sending specific Referer headers 2019-11-19 23:52:34 +01:00
Mike Fährmann
bbbeff4c41
[downloader.http] implement file-specific HTTP headers 2019-11-19 23:50:54 +01:00
Mike Fährmann
edc080468d
[instagram] make 'video_url' fields optional (fixes #479)
[ci skip]
2019-11-19 11:18:43 +01:00
Mike Fährmann
9fdc5e74cb
[deviantart] ensure consistent username capitalization (#455)
The 'username' field was capitalized in a very inconsistent manner:
Either all lowercase, or as given by the input URL, or with the
"original" capitalization, depending on the extractor used among
other things.

Now usernames use their original capitalization for all extractors.
('UserName' instead of 'username' or 'uSeRnAmE')
2019-11-18 22:09:58 +01:00
Mike Fährmann
b1f0609de5
[newgrounds] rewrite (#394)
- restructure extractor hierarchy
- extract more metadata
- extract videos without youtube-dl
- be more resilient to errors

TODO:
- favorites
- games, but that might be near impossible for non-flash titles
2019-11-18 21:13:33 +01:00
Mike Fährmann
3ece3976ae
[newgrounds] implement login support (#394) 2019-11-16 23:45:32 +01:00
Mike Fährmann
3a07c06865
[newgrounds] update
- create directory per post
- rename variables and methods
2019-11-14 23:17:14 +01:00
Mike Fährmann
5513b66eb0
[vsco] fix user profile extraction 2019-11-12 23:36:48 +01:00
Mike Fährmann
abfcb356fc
[flickr] support 3k, 4k, 5k, and 6k photo sizes (closes #472) 2019-11-10 17:52:51 +01:00
Mike Fährmann
521fcd2eb9
[imgbb] fix error in galleries without user info (closes #471) 2019-11-10 17:10:51 +01:00
Mike Fährmann
8061263d4c
[imgbb] improve pagination logic
- avoid unnecessary API calls for small or empty galleries
- combine duplicate code
2019-11-10 17:07:27 +01:00
Mike Fährmann
da6789b2b0
disable unique archive id checks for some tests
- same image twice in a livedoor blog post
- unreliable results for related pinterest items
2019-11-10 17:04:51 +01:00
Mike Fährmann
67e54ed8ea
release version 1.11.1 2019-11-09 00:52:06 +01:00
Mike Fährmann
ce98a86c0e
fix data file inclusion in source distributions 2019-11-09 00:47:13 +01:00
Mike Fährmann
6c86fbfe2a
release version 1.11.0 2019-11-08 22:34:56 +01:00
Mike Fährmann
94a94f3b86
miscellaneous stuff 2019-11-08 20:58:53 +01:00
Mike Fährmann
b0197098e6
[imgur] get title from webpage if missing in API response
(closes #467)
2019-11-07 21:10:04 +01:00
Mike Fährmann
dd5d2b2eac
[deviantart] add user profile extractor (#377, #419) 2019-11-07 18:29:49 +01:00
Mike Fährmann
a437e78620
[deviantart] minimize cookie usage during scraps extraction
(#445)
2019-11-05 21:55:13 +01:00
Mike Fährmann
1a197d2195
store the original cookiejar as Extractor._cookiejar 2019-11-05 21:53:22 +01:00
Mike Fährmann
de83ae4576
make 'method' argument of Extractor.request keyword-only 2019-11-05 17:28:09 +01:00
Mike Fährmann
a5be08a830
[downloader:ytdl] forward proxy settings 2019-11-05 16:16:26 +01:00
Mike Fährmann
4325695d74
[luscious] expand GraphQL queries 2019-11-04 21:17:22 +01:00
Mike Fährmann
94dbdbf506
[nijie] change default filename format
… to be consistent with Pixiv filenames
2019-11-04 20:47:38 +01:00
Mike Fährmann
9e88e7a344
[postprocessor:exec] improve (#421, #413)
- add 'final' option
- include job status in pp finalization
- improve and extend documentation
2019-11-03 21:45:45 +01:00
Mike Fährmann
c18fadc221
[instagram] extract videos without youtube-dl (#391) 2019-11-03 14:02:56 +01:00
Mike Fährmann
f15eedb634
[sexcom] set Referer header for file downloads (closes #464) 2019-11-03 13:27:58 +01:00
Mike Fährmann
2a3bd4e3c7
rename extractor classes starting with a digit 2019-11-02 20:42:09 +01:00
Mike Fährmann
b3b9da6d74
[photobucket] replace test URL
The other user deleted all of his images.
2019-11-02 20:17:08 +01:00
Mike Fährmann
64786363be
[4chan] simplify
- remove 'chan.py'
- slight adjustments to directory and filenames
2019-11-02 20:11:21 +01:00
Mike Fährmann
557e2c018b
[8chan] remove module 2019-11-02 20:06:47 +01:00
Mike Fährmann
e14782a948
[instagram] simplify graphql extraction for post pages 2019-11-01 22:08:25 +01:00
Mike Fährmann
c01ff78467
[twitter] extend 'videos' option to force extraction with ytdl
(closes #459)
2019-11-01 22:06:07 +01:00
Mike Fährmann
f8ac67ce50
[hitomi] extend URL pattern + follow redirects 2019-11-01 21:40:10 +01:00
Mike Fährmann
e877ca97c3
[naver] adjust directory names and metadata structure 2019-10-31 16:53:48 +01:00
Mike Fährmann
702f2fbd1f
[issuu] add publication and user extractors (#413) 2019-10-31 16:52:57 +01:00
Mike Fährmann
8361d874d7
[hitomi] fix extraction 2019-10-29 16:23:20 +01:00
Mike Fährmann
5fa6ff04dd
[instagram] extract '__additionalDataLoaded' (#391)
The '_sharedData' of Post pages is missing its 'graphql' part for
logged in users. This data is now included in the parameters of a
function call to '__additionalDataLoaded(...)'

And, of course, video extraction with youtube-dl broke because of
this change as well.
2019-10-29 16:00:31 +01:00
Mike Fährmann
5af291ba5c
include failed downloads and child extractors in exit status 2019-10-29 15:56:54 +01:00
Mike Fährmann
322c2e7ed4
renaming variables
mostly 'keyword(s)' to 'kwdict'
2019-10-29 15:46:35 +01:00
Mike Fährmann
87a87bff7e
[simplyhentai] fix image URLs 2019-10-28 21:11:06 +01:00
Mike Fährmann
4409d00141
embed error messages in StopExtraction exceptions 2019-10-28 16:39:49 +01:00
Mike Fährmann
d5e3910270
adjust 'util.raises()' 2019-10-28 15:06:17 +01:00
Mike Fährmann
d44f790e81
adjust output for HTTP status related errors 2019-10-27 23:55:02 +01:00
Mike Fährmann
03e0cec715
return with non-zero exit status on error 2019-10-27 23:54:18 +01:00
Mike Fährmann
c887493a80
overhaul exception stuff 2019-10-27 23:53:37 +01:00
Mike Fährmann
109718a5e3
[blogger] add blog and post extractors (closes #364) 2019-10-26 14:15:55 +02:00
Mike Fährmann
244d396b0b
add '--ugoira-conv-lossless' command-line option (#432)
and cleanup the arguments for the regular '--ugoira-conv':
- remove '-an'
- enable two-pass encoding
2019-10-26 00:32:19 +02:00
Mike Fährmann
49a6b1b6c0
[twitter] extract video stream info without youtube-dl (#452)
This should allow video downloads when logged in without having to
disable 'forward-cookies', and from protected tweets.

youtube-dl still gets used to download HLS playlists, but the data
extraction part, which doesn't work with youtube-dl at the moment,
now gets handled by gallery-dl itself.
2019-10-25 13:41:36 +02:00
Mike Fährmann
9f0dbf2a72
[twitter] raise proper exception for protected Tweets 2019-10-25 13:26:16 +02:00
Mike Fährmann
083e14ad9a
[downloader:ytdl] add data from '_ytdl_extra' to info_dicts 2019-10-25 13:17:13 +02:00
Mike Fährmann
6e08ada4fe
[luscious] simplify some metadata entries 2019-10-25 13:14:59 +02:00
Mike Fährmann
9e3a8607ee
[deviantart] update usernames (#455)
In the case that a user changed his username, requesting deviations
with an old name might cause problems (missing deviations, etc.)

The internal 'username' value therefore now gets updated to the
current username taken from the user profile.
2019-10-24 22:23:16 +02:00
Mike Fährmann
2eb38810c5
[twitter] fix image extraction when logged in (#452)
... for individual tweets.

To get a Tweet page with the old Twitter layout, an Internet
Explorer User-Agent (e.g. Mozilla/5.0 (Windows NT 6.1; WOW64;
Trident/7.0; rv:11.0) like Gecko) as well as a Referer header
pointing to the page itself is required. The "app_shell_visited"
cookie appears to be optional at the moment, but that is what
a regular web browser would send.
2019-10-23 22:18:29 +02:00
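The header setup described in this commit message might look roughly like the sketch below. It only builds the request object and does not send it. The helper name `build_tweet_request` and the example URL are hypothetical; the User-Agent string is the one quoted in the message. The optional "app_shell_visited" cookie is left out because its value is not given.

```python
import urllib.request

# User-Agent string quoted in the commit message.
IE_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
)


def build_tweet_request(tweet_url):
    """Build a request with an IE User-Agent and a self-referencing Referer.

    Hypothetical helper; it does not perform any network access.
    """
    headers = {
        "User-Agent": IE_USER_AGENT,
        "Referer": tweet_url,  # Referer points to the page itself
    }
    return urllib.request.Request(tweet_url, headers=headers)


# Hypothetical Tweet URL, for illustration only.
request = build_tweet_request("https://twitter.com/example/status/1")
```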
Mike Fährmann
8f38a35b91
[imgur] use API with "public" client_id (#446)
Using the API endpoints makes it possible to access NSFW content
without logging in.
2019-10-23 21:43:55 +02:00
Mike Fährmann
b23c822b23
[luscious] use GraphQL 2019-10-22 21:17:08 +02:00
Mike Fährmann
ef17d94469
update test results 2019-10-21 21:53:21 +02:00
Mike Fährmann
2057c6ba29
[naver] add blog and post extractors (closes #447) 2019-10-21 16:59:15 +02:00
Mike Fährmann
389d2d7e38
implement 'cookies-update' option (#445) 2019-10-19 15:23:55 +02:00
Mike Fährmann
fbc0a6a059
[nozomi] skip unavailable posts (#388) 2019-10-17 23:05:04 +02:00
Mike Fährmann
ae98dbcbb3
[nozomi] implement searching for negated terms (#388)
It's incredibly slow and resource intensive (> 1GB of memory),
but that is also how it is implemented on nozomi.la itself.
2019-10-17 22:53:37 +02:00
Mike Fährmann
1c03a389df
[twitter] small improvements to search extractor
- put search results in separate directories
- set 'max_position' to '-1' for first request
  -> prevent duplicate results
- add a test
- flake8
2019-10-17 19:50:59 +02:00
Mike Fährmann
c3042978b8
[deviantart] match "/gallery/all" (closes #449) 2019-10-17 17:54:44 +02:00
Alice
bcddcca6db
Add search downloading to twitter.py (#448)
Adds the functionality to download search results on twitter.com/search. Since Twitter only allows downloading of up to 3,200 of a user's most recent tweets, you will be unable to download old images from users with a lot of tweets. To bypass this, you can use the Twitter search to get the tweets from the sections in time you were stopped at. An example search would be "from:user since:2015-01-01 until:2016-01-01 filter:images". The URL you would use will look something like this: https://twitter.com/search?f=tweets&q=from%3Asupernaturepics%20since%3A2015-01-01%20until%3A2016-01-01%20filter%3Aimages&src=typd&lang=en

The _tweets_from_api function had to be changed because it would not get the next page of results using the last "data-tweet-id". It would return the same JSON but with a "min_position" string added. Using this string for the "max_position" param from the second page onwards correctly returned the next pages. This change does not interfere with how the other extractors work as far as I know. The 2 regex patterns in the extractors had to be changed to not match the search URL.
2019-10-16 18:23:10 +02:00
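The search URL and the pagination scheme described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the extractor's code. The names `search_url`, `paginate`, and `fetch_page` are hypothetical, as is the `"tweets"` key of the page dict. The initial position of "-1" is taken from the follow-up commit 1c03a389df listed above.

```python
from urllib.parse import quote


def search_url(query):
    """Build a twitter.com/search URL for the given search query."""
    # safe="" percent-encodes ':' as %3A and spaces as %20.
    return (
        "https://twitter.com/search?f=tweets&q="
        + quote(query, safe="")
        + "&src=typd&lang=en"
    )


def paginate(fetch_page):
    """Yield results page by page.

    fetch_page is a hypothetical callable that takes a 'max_position'
    value and returns a dict with 'tweets' and 'min_position' entries.
    """
    position = "-1"  # first request, per commit 1c03a389df
    while True:
        page = fetch_page(position)
        tweets = page.get("tweets") or []
        if not tweets:
            return
        yield from tweets
        # 'min_position' of this page is 'max_position' of the next one.
        position = page.get("min_position")
        if not position:
            return


url = search_url(
    "from:supernaturepics since:2015-01-01 until:2016-01-01 filter:images"
)
print(url)
```

Passing the page-fetching step in as a callable keeps the sketch free of network access, so the position handling can be checked with canned responses.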
Mike Fährmann
1693d97bd3
update extractor class hierarchies
- let the GalleryExtractor class inherit directly from Extractor
- make ChapterExtractor a subclass of GalleryExtractor
- change enumeration field names of GalleryExtractors to 'num'
2019-10-16 18:15:29 +02:00
Mike Fährmann
7ebd984e8d
[imgur] print error message if no JSON data is found (#446) 2019-10-16 17:45:14 +02:00
Mike Fährmann
5882b00f2f
[imgur] implement login support (#446) 2019-10-15 22:00:22 +02:00
Mike Fährmann
91643ca54b
[nozomi] add search extractor (#388) 2019-10-14 23:49:46 +02:00
Mike Fährmann
df2b3c6888
restore OAuth2 authentication error messages 2019-10-13 22:48:01 +02:00
Mike Fährmann
6779512fc7
[nozomi] add post and tag extractors (#388) 2019-10-13 22:16:03 +02:00