Nikolaj Olsson
|
14af6cba65
|
Add many corrections to eng_OCRFixReplaceList.xml - thx Ding-adong :)
Fix #3339
|
2019-02-06 18:41:35 +01:00 |
|
Ivandro Ismael
|
e3bf46ab98
|
update names
|
2019-02-03 14:29:48 +00:00 |
|
Ivandro Ismael
|
274dbd5205
|
update #2
|
2019-02-02 04:38:03 +00:00 |
|
nikolaj.olsson
|
8a6dbb3994
|
Some fixes for eng_OCRFixReplaceList.xml - thx Ding-adong :)
Fix #3319
|
2019-01-30 12:03:47 +01:00 |
|
Nikolaj Olsson
|
b81d97363b
|
Add two words to en ocr fix list
|
2019-01-17 20:07:27 +01:00 |
|
May Kittens Devour Your Soul
|
1a36e01e68
|
Update eng_OCRFixReplaceList.xml
|
2019-01-14 15:43:15 +01:00 |
|
Nikolaj Olsson
|
6a6f51e052
|
Work on #3289
|
2019-01-12 01:11:11 +01:00 |
|
Nikolaj Olsson
|
98c189a20f
|
Improve eng_OCRFixReplaceList - thx Ding-adong :)
Fix #3289
|
2019-01-12 00:22:43 +01:00 |
|
Nikolaj Olsson
|
64caff97fe
|
Fix a few minor issues in "Fix common errors" - thx darnn :)
Fix #3244
|
2019-01-04 14:20:38 +01:00 |
|
Nikolaj Olsson
|
047a8cdfc3
|
Improve eng_OCRFixReplaceList.xml - thx Ding-adong :)
Work on #3269
|
2019-01-02 17:02:25 +01:00 |
|
nikolaj.olsson
|
de2667da87
|
Improve OCR of comma / quote - thx Tuukka :)
|
2018-12-29 07:03:02 +01:00 |
|
Nikolaj Olsson
|
8ae0a6e89d
|
Work on OCR
|
2018-11-24 12:04:44 +01:00 |
|
Nikolaj Olsson
|
87af4f872c
|
Work on ocr
|
2018-11-22 16:12:12 +01:00 |
|
Nikolaj Olsson
|
d8535f5e05
|
Minor work on ocr
|
2018-11-20 20:14:32 +01:00 |
|
Nikolaj Olsson
|
490a8ff1c2
|
Work on Tesseract4/OCR (make images binary)
|
2018-11-14 22:55:20 +01:00 |
|
Nikolaj Olsson
|
36b3d9dea3
|
Work on dictionaries
|
2018-11-05 17:19:48 +01:00 |
|
Nikolaj Olsson
|
d8bc89564c
|
Work on dictionaries
|
2018-10-23 09:28:13 +02:00 |
|
Nikolaj Olsson
|
83a1fa6a9b
|
Fix issue with "I." in "Fix common OCR errors" - thx Zoltán :)
|
2018-10-22 23:43:05 +02:00 |
|
May Kittens Devour Your Soul
|
1fcade090d
|
Merge branch 'master' into patch-4
|
2018-10-03 10:50:36 +02:00 |
|
Nikolaj Olsson
|
3a00ada115
|
Fix common OCR errors change " L* to " I." - thx Araynilmar :)
Fix #3099
|
2018-09-24 06:04:26 +02:00 |
|
May Kittens Devour Your Soul
|
6f9b0917ba
|
Update eng_OCRFixReplaceList.xml
closes #3099
|
2018-09-13 14:28:03 +02:00 |
|
May Kittens Devour Your Soul
|
64a51bb5ff
|
Update eng_OCRFixReplaceList.xml
|
2018-06-12 23:21:56 +02:00 |
|
May Kittens Devour Your Soul
|
ae69e21858
|
Update eng_OCRFixReplaceList.xml
|
2018-04-30 21:07:01 +02:00 |
|
Nikolaj Olsson
|
179d1333d4
|
Fix issue in English ocr fix replace list - thx Paul :)
|
2018-03-30 18:47:56 +02:00 |
|
Nikolaj Olsson
|
ed514669f5
|
Improve OCR fix engine a little bit
Work on #2694
|
2018-01-02 17:41:49 +01:00 |
|
anewuser
|
5faf2cf54d
|
Add English OCR Fix Replace List rules - thx anewuser
Work on #2653
|
2017-11-29 19:30:06 +01:00 |
|
Nikolaj Olsson
|
8fd1ab2edf
|
Add words to English OCR fix replace list - thx anewuser :)
Fix #2653
|
2017-11-28 20:14:25 +01:00 |
|
Nikolaj Olsson
|
54afa358a1
|
Fix OCR issue with apos vs comma - thx Jamakmake :)
|
2017-10-28 10:51:59 +02:00 |
|
Nikolaj Olsson
|
aa6fab1ac3
|
Fix minor OCR UI issues
|
2017-09-14 15:12:24 +02:00 |
|
Nikolaj Olsson
|
d3eaa58f4f
|
Minor OCR improvements
|
2017-09-12 17:25:56 +02:00 |
|
Nikolaj Olsson
|
09b746b160
|
Update OCR data
|
2017-08-12 23:14:45 +02:00 |
|
Nikolaj Olsson
|
8dba75db6c
|
Update OCR data
|
2017-08-12 15:55:54 +02:00 |
|
Nikolaj Olsson
|
b05db70b06
|
A few minor improvements
|
2017-05-08 17:42:09 +02:00 |
|
Nikolaj Olsson
|
2b81d4af77
|
Updated a few words in ocr replace list - thx Boulder08 :)
|
2016-12-12 16:28:05 +01:00 |
|
Nikolaj Olsson
|
992aef4c82
|
Fixed crash in "Binary image compare" + minor dictionary update - thx Zoltan :)
|
2016-10-11 19:03:47 +02:00 |
|
Nikolaj Olsson
|
621643ad2a
|
Minor ocr additions
|
2016-10-09 10:51:51 +02:00 |
|
Waldi Ravens
|
1aa9400b1d
|
Updated eng_OCRFixReplaceList.xml
|
2016-09-27 21:42:31 +02:00 |
|
aaaxx
|
e5c3157767
|
Update eng_OCRFixReplaceList.xml
Closes #1978
Edits
========================================
Should be spaced instead of hyphenated (probably joined by OCR):
- `<Word from="airstrike" to="air-strike" />`
- `<Word from="wallplant" to="wall-plant" />`
Typo in replacement:
- `<Word from="lfeelonelung" to="l feel one lung" />`
- `<Word from="lneed" to="l need" />`
- `<Word from="lthink___" to="l think..." />`
- `<Word from="ltold" to="l told" />`
- `<Word from="lv\/asn't" to="l wasen't" />`
- `<Word from="Voilé" to="Voilá" />`
- `<Ending from="pshycol" to="pshyco!" />`
Capital "i" is a more likely replacement:
- `<Word from="lt" to="it" />`
- `<Word from="lt'II" to="it'll" />`
- `<Word from="lt'Il" to="it'll" />`
- `<Word from="lt'll" to="it'll" />`
- `<Word from="lt's" to="it's" />`
- `<Word from="lfstill" to="if still" />`
Vocative, always needs a comma:
- `<Word from="HeyJennifer" to="Hey Jennifer" />`
Removals
========================================
Spelling varies between dictionaries:
- `<Word from="kickflip" to="kick-flip" />`
- `<Word from="voicemail" to="voice-mail" />`
British vs. American spelling:
- `<Word from="judgement" to="judgment" />`
- `<Word from="fulfilment" to="fulfillment" />`
Typo, not an OCR error, so spellchecker should deal with it (it doesn't make sense to keep a list of all possible misspellings):
- `<Word from="Goddamit" to="Goddammit" />`
- `<Word from="mischevious" to="mischievous" />`
- `<Word from="perscribed" to="prescribed" />`
- `<Word from="perscription" to="prescription" />`
- `<Word from="pshyco" to="psycho" />`
- `<Word from="thoguht" to="thought" />`
Spelling changes meaning:
- `<Word from="ahold" to="a hold" />`
- `<Word from="google" to="Google" />`
Find and replace are the same:
- `<Word from="I thought" to="I thought" />`
- `<Word from="literally" to="literally" />`
Resulting punctuation seems unlikely:
- `<Word from="'Qkay_" to="- Okay!" />`
- `<Word from="_Qkay-" to="- Okay!" />`
- `<Word from="'Qkay" to="- Okay" />`
- `<Word from="JOEY-" to="Joey!" />`
- `<Word from="_NO__" to="No--" />`
Other reason:
Replacement rule | Comment
:---------------------------------------------|:-------------------
`<Word from="cp" to="op" />` | doesn't seem useful
`<Word from="lnte" to="inte" />` | doesn't seem useful
`<Word from="gothere" to="go there" />` | could also be "got here"
`<Word from="ridonculous" to="ridiculous" />` | intentional mispronunciation
`<Word from="I02" to="Pops" />` | seems really implausible, and it could mess up IDs, codes, etc.
|
2016-09-27 15:10:58 +02:00 |
|
Waldi Ravens
|
e26c5acdf5
|
dictionaries: automated XML upkeep
|
2016-09-21 12:40:08 +02:00 |
|
aaaxx
|
1443e279a6
|
Removed licence/license rule: it's not a typo
In British English "licence" is a noun and "license" a verb.
|
2016-09-16 06:40:36 +02:00 |
|
Nikolaj Olsson
|
f9a2e99d54
|
Added a few words to the English OCR fix replace list
|
2016-08-21 20:20:36 +02:00 |
|
Waldi Ravens
|
5b312d4a3a
|
dictionaries: automated XML upkeep
|
2016-08-20 19:29:34 +02:00 |
|
Nikolaj Olsson
|
eca8f0546a
|
Minor dictionary update
|
2016-08-07 15:25:26 +02:00 |
|
Waldi Ravens
|
125d2dceb6
|
dictionaries: automated XML upkeep
|
2016-05-29 13:02:16 +02:00 |
|
Nikolaj Olsson
|
7d09349e0b
|
Some minor improvements for OCR via "Binary image compare"
|
2016-05-06 15:38:42 +02:00 |
|
Waldi Ravens
|
b8ebe12640
|
dictionaries: automated XML upkeep
|
2016-04-13 13:32:50 +02:00 |
|
Kruno H
|
03f1f7d7c0
|
Update eng_OCRFixReplaceList.xml
|
2016-03-23 17:40:31 +01:00 |
|
Kruno H
|
00caef81ec
|
Update eng_OCRFixReplaceList.xml
|
2016-03-22 20:35:44 +01:00 |
|
Waldi Ravens
|
e9964d82f8
|
dictionaries: automated XML upkeep
|
2016-02-17 20:55:21 +01:00 |
|
niksedk
|
8f489fd611
|
Minor update of eng_OCRFixReplaceList.xml
|
2016-01-22 20:58:42 +01:00 |
|
niksedk
|
ae203e5e7b
|
Update of eng_OCRFixReplaceList.xml
|
2016-01-15 13:08:51 +01:00 |
|
niksedk
|
157ebe44c7
|
Minor update of word lists
|
2016-01-13 20:24:07 +01:00 |
|
Waldi Ravens
|
dedb933b9a
|
dictionaries: automated XML upkeep
|
2015-11-08 20:37:41 +01:00 |
|
niksedk
|
79394e8656
|
Added a few words to English ocr fix list
|
2015-10-26 20:07:11 +01:00 |
|
niksedk
|
108a0ae6a5
|
Minor fixes for beta ocr method (for a future version...)
|
2015-09-24 06:10:12 +02:00 |
|
Waldi Ravens
|
f070d913b7
|
dictionaries: automated XML upkeep
|
2015-07-28 19:49:19 +02:00 |
|
Kruno H
|
9e75dcc98c
|
Update eng_OCRFixReplaceList.xml
|
2015-07-19 17:42:26 +02:00 |
|
Waldi Ravens
|
fcd746fea2
|
dictionaries: automated XML upkeep
|
2015-06-25 11:40:47 +02:00 |
|
Kruno H
|
018b11a80c
|
Update eng_OCRFixReplaceList.xml
|
2015-06-24 21:41:46 +02:00 |
|
Waldi Ravens
|
5f682d1242
|
Removed duplicates (Dictionaries/eng_OCRFixReplaceList.xml)
|
2015-06-14 07:44:15 +02:00 |
|
Waldi Ravens
|
032dbcebab
|
dictionaries: automated XML upkeep
|
2015-05-23 04:45:21 +02:00 |
|
Nikolaj Olsson
|
8a6d6e21d1
|
Merge pull request #752 from diomed/patch-1
Update eng_OCRFixReplaceList.xml
|
2015-05-19 17:45:22 +02:00 |
|
Kruno H
|
11f936cd6e
|
Update eng_OCRFixReplaceList.xml
|
2015-05-13 11:23:16 +02:00 |
|
Kruno H
|
8e12d1806f
|
Update eng_OCRFixReplaceList.xml
|
2015-05-09 20:07:19 +02:00 |
|
Waldi Ravens
|
60a5f02d7e
|
Added missing dquote (eng_OCRFixReplaceList.xml)
|
2015-05-05 07:37:57 +02:00 |
|
Nikolaj Olsson
|
a168da372b
|
Merge pull request #724 from xylographe/ocrfrl
Updated eng_OCRFixReplaceList.xml
|
2015-05-04 18:34:24 +02:00 |
|
Kruno H
|
600f71b0fc
|
Update eng_OCRFixReplaceList
|
2015-05-04 18:04:38 +02:00 |
|
Waldi Ravens
|
126c4f94d4
|
Updated eng_OCRFixReplaceList.xml
|
2015-05-03 19:42:16 +02:00 |
|
XhmikosR
|
1aa302c1d3
|
Update eng_OCRFixReplaceList.xml.
[ci skip]
|
2014-12-06 13:06:15 +02:00 |
|
XhmikosR
|
9bcd72a8b8
|
Update eng_OCRFixReplaceList.xml.
[ci skip]
|
2014-12-05 14:13:57 +02:00 |
|
XhmikosR
|
0228ed4575
|
Update dictionaries.
[ci skip]
|
2014-12-01 15:07:50 +02:00 |
|
XhmikosR
|
85d37dcf79
|
Update eng_OCRFixReplaceList.xml.
|
2014-10-05 18:58:06 +03:00 |
|
XhmikosR
|
6e4decb15c
|
Update eng_OCRFixReplaceList.xml.
|
2014-10-05 18:47:44 +03:00 |
|
XhmikosR
|
9ecd7553ca
|
Update eng_OCRFixReplaceList.xml.
|
2014-09-24 09:33:24 +03:00 |
|
niksedk
|
7fd73b3deb
|
New fixes for English ocr fix list
|
2014-06-07 00:53:41 +02:00 |
|
niksedk
|
2788719e8c
|
Some improvements for "Split long lines" regarding dialogues - thx Joel :)
|
2014-05-25 20:14:10 +02:00 |
|
niksedk
|
f39ba84753
|
minor fix for eng ocr replace list
|
2014-05-20 22:14:42 +02:00 |
|
niksedk
|
1b7ccb4bba
|
fix invalid char in ocr fix replace list...
|
2014-05-14 09:08:57 +02:00 |
|
niksedk
|
6dc1c231ac
|
Fixed bug in English ocr fix replace list (GIVING to GMNG) - thx Joel :)
|
2014-05-14 08:49:49 +02:00 |
|
nikse.dk
|
c1aa8def3e
|
Version 3.3.14
|
2014-02-23 10:01:02 +01:00 |
|
niksedk
|
ab755ca16f
|
updated dictionary
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@2225 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2013-11-22 17:02:27 +00:00 |
|
niksedk
|
4312873efb
|
Updated change log + dictionaries
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@2109 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2013-09-22 13:48:17 +00:00 |
|
niksedk
|
fdcbcbc8bb
|
Updated dictionaries
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@2093 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2013-09-16 05:57:51 +00:00 |
|
niksedk
|
6f35094515
|
updated dictionaries
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@2047 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2013-09-01 15:36:55 +00:00 |
|
niksedk
|
e355202811
|
updated names for 3.3.3
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1746 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2013-03-19 12:15:42 +00:00 |
|
niksedk
|
832ad0f19e
|
3.3.2 - last ocr improvements
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1671 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2013-02-22 21:41:20 +00:00 |
|
niksedk
|
b5784734e6
|
updated eng ocr fix replace list
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1635 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2013-02-15 08:37:04 +00:00 |
|
niksedk
|
120ca6feed
|
Updated English ocr fix replace list + names
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1535 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2012-12-09 14:48:29 +00:00 |
|
niksedk
|
15d2187458
|
Update of dictionaries + tesseract
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1456 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2012-11-02 11:01:24 +00:00 |
|
niksedk
|
e9f35b4667
|
Minor update of eng_OCRFixReplaceList.xml
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1144 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2012-04-27 10:11:12 +00:00 |
|
niksedk
|
abffe03960
|
update dictionaries
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1077 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2012-03-28 08:59:53 +00:00 |
|
niksedk
|
08bd62547e
|
Updated ocr dictionary (removed "anymore" -> "any more")
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1050 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2012-03-17 09:47:40 +00:00 |
|
niksedk
|
a4481ede92
|
Updated dictionaries
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1031 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2012-03-11 08:21:58 +00:00 |
|
niksedk
|
1945da1111
|
Updated en-us dictionaries
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@1003 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2012-02-26 08:53:26 +00:00 |
|
niksedk
|
a9a51bab03
|
Updated xml dictionaries
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@787 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2011-11-06 11:08:22 +00:00 |
|
niksedk
|
5acd9747d7
|
Updated eng_OCRFixReplaceList.xml
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@703 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2011-10-07 17:44:56 +00:00 |
|
niksedk
|
adc638fa08
|
Updated dictionaries
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@630 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2011-09-13 06:05:03 +00:00 |
|
niksedk
|
6a77bb3f45
|
Updated eng_OCRFixReplaceList.xml
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@357 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2011-03-01 18:52:47 +00:00 |
|
niksedk
|
c8ff9ef006
|
Updated ocr fix replace lists
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@132 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2010-11-05 05:04:17 +00:00 |
|
niksedk
|
8a16c702cc
|
Initial check-in
git-svn-id: https://subtitleedit.googlecode.com/svn/trunk@22 99eadd0c-20b8-1223-b5c4-2a2b2df33de2
|
2010-10-12 13:08:35 +00:00 |
|