111 Commits

Author SHA1 Message Date
Nikolaj Olsson
b6da4414c5 Use display-friendly language name in "Fix common errors" - thx Zoltan :) 2019-11-16 14:47:09 +01:00
Ivandro Ismael
a5b62d939e Update English OCR xml
Remove consecutive words
2019-09-17 03:12:38 +01:00
Waldi Ravens
0469c7f59f dictionaries: automated XML upkeep 2019-08-10 21:50:57 +02:00
Waldi Ravens
a3629d4816 Update English OCRFixReplaceList 2019-05-10 11:16:51 +02:00
Waldi Ravens
575d52c61c dictionaries: automated XML upkeep 2019-05-10 00:22:06 +02:00
Nikolaj Olsson
3a213ece5d Update dictionaries (minor) 2019-02-28 19:33:27 +01:00
Nikolaj Olsson
ad3a391689 Improve change casing - add stuttering support 2019-02-23 21:20:33 +01:00
Nikolaj Olsson
1fd26db4d0 Remove "gotta" to "got to" from English OCR fix replace list 2019-02-13 05:10:51 +01:00
nikolaj.olsson
85c46c0d22 Spell check English OCR fix replace list - thx Ding-adong :)
Fix #3357
2019-02-12 17:49:39 +01:00
Ivandro Ismael
1e3a13da10 Normalize single quote 2019-02-08 00:52:36 +00:00
Nikolaj Olsson
ee27c440f2 Work on #3343 2019-02-07 19:26:30 +01:00
Nikolaj Olsson
14af6cba65 Add many corrections to eng_OCRFixReplaceList.xml - thx Ding-adong :)
Fix #3339
2019-02-06 18:41:35 +01:00
Ivandro Ismael
e3bf46ab98 update names 2019-02-03 14:29:48 +00:00
Ivandro Ismael
274dbd5205 update #2 2019-02-02 04:38:03 +00:00
nikolaj.olsson
8a6dbb3994 Some fixes for eng_OCRFixReplaceList.xml - thx Ding-adong :)
Fix #3319
2019-01-30 12:03:47 +01:00
Nikolaj Olsson
b81d97363b Add two words to en ocr fix list 2019-01-17 20:07:27 +01:00
May Kittens Devour Your Soul
1a36e01e68 Update eng_OCRFixReplaceList.xml 2019-01-14 15:43:15 +01:00
Nikolaj Olsson
6a6f51e052 Work on #3289 2019-01-12 01:11:11 +01:00
Nikolaj Olsson
98c189a20f Improve eng_OCRFixReplaceList - thx Ding-adong :)
Fix #3289
2019-01-12 00:22:43 +01:00
Nikolaj Olsson
64caff97fe Fix a few minor issues in "Fix common errors" - thx darnn :)
Fix #3244
2019-01-04 14:20:38 +01:00
Nikolaj Olsson
047a8cdfc3 Improve eng_OCRFixReplaceList.xml - thx Ding-adong :)
Work on #3269
2019-01-02 17:02:25 +01:00
nikolaj.olsson
de2667da87 Improve OCR of comma / quote - thx Tuukka :) 2018-12-29 07:03:02 +01:00
Nikolaj Olsson
8ae0a6e89d Work on OCR 2018-11-24 12:04:44 +01:00
Nikolaj Olsson
87af4f872c Work on ocr 2018-11-22 16:12:12 +01:00
Nikolaj Olsson
d8535f5e05 Minor work on ocr 2018-11-20 20:14:32 +01:00
Nikolaj Olsson
490a8ff1c2 Work on Tesseract4/OCR (make images binary) 2018-11-14 22:55:20 +01:00
Nikolaj Olsson
36b3d9dea3 Work on dictionaries 2018-11-05 17:19:48 +01:00
Nikolaj Olsson
d8bc89564c Work on dictionaries 2018-10-23 09:28:13 +02:00
Nikolaj Olsson
83a1fa6a9b Fix issue with "I." in "Fix common OCR errors" - thx Zoltán :) 2018-10-22 23:43:05 +02:00
May Kittens Devour Your Soul
1fcade090d Merge branch 'master' into patch-4 2018-10-03 10:50:36 +02:00
Nikolaj Olsson
3a00ada115 Fix common OCR errors change " L* to " I." - thx Araynilmar :)
Fix #3099
2018-09-24 06:04:26 +02:00
May Kittens Devour Your Soul
6f9b0917ba Update eng_OCRFixReplaceList.xml
closes #3099
2018-09-13 14:28:03 +02:00
May Kittens Devour Your Soul
64a51bb5ff Update eng_OCRFixReplaceList.xml 2018-06-12 23:21:56 +02:00
May Kittens Devour Your Soul
ae69e21858 Update eng_OCRFixReplaceList.xml 2018-04-30 21:07:01 +02:00
Nikolaj Olsson
179d1333d4 Fix issue in English ocr fix replace list - thx Paul :) 2018-03-30 18:47:56 +02:00
Nikolaj Olsson
ed514669f5 Improve OCR fix engine a little bit
Work on #2694
2018-01-02 17:41:49 +01:00
anewuser
5faf2cf54d Add English OCR Fix Replace List rules - thx anewuser
Work on #2653
2017-11-29 19:30:06 +01:00
Nikolaj Olsson
8fd1ab2edf Add words to English OCR fix replace list - thx anewuser :)
Fix #2653
2017-11-28 20:14:25 +01:00
Nikolaj Olsson
54afa358a1 Fix OCR issue with apos vs comma - thx Jamakmake :) 2017-10-28 10:51:59 +02:00
Nikolaj Olsson
aa6fab1ac3 Fix minor OCR UI issues 2017-09-14 15:12:24 +02:00
Nikolaj Olsson
d3eaa58f4f Minor OCR improvements 2017-09-12 17:25:56 +02:00
Nikolaj Olsson
09b746b160 Update OCR data 2017-08-12 23:14:45 +02:00
Nikolaj Olsson
8dba75db6c Update OCR data 2017-08-12 15:55:54 +02:00
Nikolaj Olsson
b05db70b06 A few minor improvements 2017-05-08 17:42:09 +02:00
Nikolaj Olsson
2b81d4af77 Updated a few words in ocr replace list - thx Boulder08 :) 2016-12-12 16:28:05 +01:00
Nikolaj Olsson
992aef4c82 Fixed crash in "Binary image compare" + minor dictionary update - thx Zoltan :) 2016-10-11 19:03:47 +02:00
Nikolaj Olsson
621643ad2a Minor ocr additions 2016-10-09 10:51:51 +02:00
Waldi Ravens
1aa9400b1d Updated eng_OCRFixReplaceList.xml 2016-09-27 21:42:31 +02:00
aaaxx
e5c3157767 Update eng_OCRFixReplaceList.xml
Closes #1978

Edits
========================================

Should be separated by a space instead of a hyphen (the space was probably dropped by OCR):

- `<Word from="airstrike" to="air-strike" />`
- `<Word from="wallplant" to="wall-plant" />`

Typo in replacement:

- `<Word from="lfeelonelung" to="l feel one lung" />`
- `<Word from="lneed"        to="l need" />`
- `<Word from="lthink___"    to="l think..." />`
- `<Word from="ltold"        to="l told" />`
- `<Word from="lv\/asn't"    to="l wasen't" />`
- `<Word from="Voilé"        to="Voilá" />`
- `<Ending from="pshycol"    to="pshyco!" />`

Capital "I" (not lowercase "i") is the more likely replacement for a leading "l":

- `<Word from="lt"      to="it" />`
- `<Word from="lt'II"   to="it'll" />`
- `<Word from="lt'Il"   to="it'll" />`
- `<Word from="lt'll"   to="it'll" />`
- `<Word from="lt's"    to="it's" />`
- `<Word from="lfstill" to="if still" />`
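A rough way to find other rules with the same problem, as a sketch only: it assumes the list can be parsed with Python's `xml.etree.ElementTree` and looks only at `<Word>` elements; the function name `suggest_capital_i` and the `<List>` root used in the example are hypothetical. It flags any rule whose `from` starts with "l" and whose `to` starts with "i", and proposes the capital-I form. The results still need a human eye, since a rule like `lnte` → `inte` would also be flagged.

```python
import xml.etree.ElementTree as ET


def suggest_capital_i(xml_text):
    """Return (from, current_to, suggested_to) for rules that turn a
    leading lowercase 'l' into lowercase 'i' instead of capital 'I'."""
    root = ET.fromstring(xml_text)
    suggestions = []
    # iter() walks the whole tree, so the grouping under the root does not matter.
    for word in root.iter("Word"):
        src = word.get("from", "")
        dst = word.get("to", "")
        # OCR tends to misread a capital "I" as a lowercase "l", so the fix
        # should usually restore "I" rather than "i".
        if src.startswith("l") and dst.startswith("i"):
            suggestions.append((src, dst, "I" + dst[1:]))
    return suggestions


sample = """<List>
  <Word from="lt" to="it" />
  <Word from="lneed" to="l need" />
</List>"""
print(suggest_capital_i(sample))
```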

Vocative, so it always needs a comma ("Hey, Jennifer"):

- `<Word from="HeyJennifer" to="Hey Jennifer" />`

Removals
========================================

Spelling varies between dictionaries:

- `<Word from="kickflip"  to="kick-flip" />`
- `<Word from="voicemail" to="voice-mail" />`

British vs. American spelling:

- `<Word from="judgement"  to="judgment" />`
- `<Word from="fulfilment" to="fulfillment" />`

Typos rather than OCR errors, so the spellchecker should deal with them (it doesn't make sense to keep a list of all possible misspellings):

- `<Word from="Goddamit"     to="Goddammit" />`
- `<Word from="mischevious"  to="mischievous" />`
- `<Word from="perscribed"   to="prescribed" />`
- `<Word from="perscription" to="prescription" />`
- `<Word from="pshyco"       to="psycho" />`
- `<Word from="thoguht"      to="thought" />`

Spelling changes meaning:

- `<Word from="ahold"  to="a hold" />`
- `<Word from="google" to="Google" />`

Find and replace are the same:

- `<Word from="I thought" to="I thought" />`
- `<Word from="literally" to="literally" />`
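No-op rules like these are easy to catch mechanically. A minimal sketch, assuming the list parses with `xml.etree.ElementTree` and that rules are stored as `from`/`to` attributes on `<Word>` and `<Ending>` elements as shown here; the helper name `find_noop_rules` and the `<List>` root in the example are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_noop_rules(xml_text, tags=("Word", "Ending")):
    """Return (tag, from) pairs for rules whose replacement equals the search text."""
    root = ET.fromstring(xml_text)
    noops = []
    for tag in tags:
        # iter() searches the whole tree, so nesting under group elements is fine.
        for elem in root.iter(tag):
            src = elem.get("from")
            # Skip elements without a "from" attribute instead of matching None == None.
            if src is not None and src == elem.get("to"):
                noops.append((tag, src))
    return noops


sample = """<List>
  <Word from="I thought" to="I thought" />
  <Word from="lt" to="It" />
</List>"""
print(find_noop_rules(sample))
```

Running this over the whole file before committing would flag entries that can simply be deleted.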

Resulting punctuation seems unlikely:

- `<Word from="'Qkay_"         to="- Okay!" />`
- `<Word from="_Qkay-"         to="- Okay!" />`
- `<Word from="'Qkay"          to="- Okay" />`
- `<Word from="JOEY-"          to="Joey!" />`
- `<Word from="_NO__"          to="No--" />`

Other reasons:

Replacement rule                              | Comment
:---------------------------------------------|:-------------------
`<Word from="cp"          to="op" />`         | doesn't seem useful
`<Word from="lnte"        to="inte" />`       | doesn't seem useful
`<Word from="gothere"     to="go there" />`   | could also be "got here"
`<Word from="ridonculous" to="ridiculous" />` | intentional mispronunciation
`<Word from="I02"         to="Pops" />`       | seems really implausible, and it could mess up IDs, codes, etc.
2016-09-27 15:10:58 +02:00
Waldi Ravens
e26c5acdf5 dictionaries: automated XML upkeep 2016-09-21 12:40:08 +02:00