Commit Graph

76 Commits

Author SHA1 Message Date
Nikolaj Olsson
ed514669f5 Improve OCR fix engine a little bit
Work on #2694
2018-01-02 17:41:49 +01:00
anewuser
5faf2cf54d Add English OCR Fix Replace List rules - thx anewuser
Work on #2653
2017-11-29 19:30:06 +01:00
Nikolaj Olsson
8fd1ab2edf Add words to English OCR fix replace list - thx anewuser :)
Fix #2653
2017-11-28 20:14:25 +01:00
Nikolaj Olsson
54afa358a1 Fix OCR issue with apos vs comma - thx Jamakmake :) 2017-10-28 10:51:59 +02:00
Nikolaj Olsson
aa6fab1ac3 Fix minor OCR UI issues 2017-09-14 15:12:24 +02:00
Nikolaj Olsson
d3eaa58f4f Minor OCR improvements 2017-09-12 17:25:56 +02:00
Nikolaj Olsson
09b746b160 Update OCR data 2017-08-12 23:14:45 +02:00
Nikolaj Olsson
8dba75db6c Update OCR data 2017-08-12 15:55:54 +02:00
Nikolaj Olsson
b05db70b06 A few minor improvements 2017-05-08 17:42:09 +02:00
Nikolaj Olsson
2b81d4af77 Updated a few words in ocr replace list - thx Boulder08 :) 2016-12-12 16:28:05 +01:00
Nikolaj Olsson
992aef4c82 Fixed crash in "Binary image compare" + minor dictionary update - thx Zoltan :) 2016-10-11 19:03:47 +02:00
Nikolaj Olsson
621643ad2a Minor ocr additions 2016-10-09 10:51:51 +02:00
Waldi Ravens
1aa9400b1d Updated eng_OCRFixReplaceList.xml 2016-09-27 21:42:31 +02:00
aaaxx
e5c3157767 Update eng_OCRFixReplaceList.xml
Closes #1978

Edits
========================================

Should be spaced instead of hyphenated (probably joined by OCR):

- `<Word from="airstrike" to="air-strike" />`
- `<Word from="wallplant" to="wall-plant" />`

Typo in replacement:

- `<Word from="lfeelonelung" to="l feel one lung" />`
- `<Word from="lneed"        to="l need" />`
- `<Word from="lthink___"    to="l think..." />`
- `<Word from="ltold"        to="l told" />`
- `<Word from="lv\/asn't"    to="l wasen't" />`
- `<Word from="Voilé"        to="Voilá" />`
- `<Ending from="pshycol"    to="pshyco!" />`

Capital "i" is a more likely replacement:

- `<Word from="lt"      to="it" />`
- `<Word from="lt'II"   to="it'll" />`
- `<Word from="lt'Il"   to="it'll" />`
- `<Word from="lt'll"   to="it'll" />`
- `<Word from="lt's"    to="it's" />`
- `<Word from="lfstill" to="if still" />`

Vocative, always needs a comma:

- `<Word from="HeyJennifer" to="Hey Jennifer" />`

Removals
========================================

Spelling varies between dictionaries:

- `<Word from="kickflip"  to="kick-flip" />`
- `<Word from="voicemail" to="voice-mail" />`

British vs. American spelling:

- `<Word from="judgement"  to="judgment" />`
- `<Word from="fulfilment" to="fulfillment" />`

Typos rather than OCR errors, so the spellchecker should handle them (it doesn't make sense to keep a list of every possible misspelling):

- `<Word from="Goddamit"     to="Goddammit" />`
- `<Word from="mischevious"  to="mischievous" />`
- `<Word from="perscribed"   to="prescribed" />`
- `<Word from="perscription" to="prescription" />`
- `<Word from="pshyco"       to="psycho" />`
- `<Word from="thoguht"      to="thought" />`

Spelling changes meaning:

- `<Word from="ahold"  to="a hold" />`
- `<Word from="google" to="Google" />`

Find and replace are the same:

- `<Word from="I thought" to="I thought" />`
- `<Word from="literally" to="literally" />`
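
No-op rules like these can be found mechanically. A minimal sketch, assuming the rules are XML elements with `from` and `to` attributes as quoted above; the `<OCRFixReplaceList>`/`<WholeWords>` wrapper in the sample and the helper name `find_noop_rules` are illustrative assumptions, not taken from the actual file or tooling:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the from/to attributes matter to the check,
# the wrapper elements are an assumed layout.
SAMPLE = """<OCRFixReplaceList>
  <WholeWords>
    <Word from="I thought" to="I thought" />
    <Word from="literally" to="literally" />
    <Word from="lt's" to="It's" />
  </WholeWords>
</OCRFixReplaceList>"""


def find_noop_rules(xml_text):
    """Return the 'from' value of every rule whose 'from' equals its 'to'."""
    root = ET.fromstring(xml_text)
    return [
        el.get("from")
        for el in root.iter()
        if el.get("from") is not None and el.get("from") == el.get("to")
    ]


print(find_noop_rules(SAMPLE))  # ['I thought', 'literally']
```

The check walks every element regardless of tag name, so it would also cover `<Ending>`-style rules if they carry the same two attributes.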

Resulting punctuation seems unlikely:

- `<Word from="'Qkay_"         to="- Okay!" />`
- `<Word from="_Qkay-"         to="- Okay!" />`
- `<Word from="'Qkay"          to="- Okay" />`
- `<Word from="JOEY-"          to="Joey!" />`
- `<Word from="_NO__"          to="No--" />`

Other reason:

Replacement rule                              | Comment
:---------------------------------------------|:-------------------
`<Word from="cp"          to="op" />`         | doesn't seem useful
`<Word from="lnte"        to="inte" />`       | doesn't seem useful
`<Word from="gothere"     to="go there" />`   | could also be "got here"
`<Word from="ridonculous" to="ridiculous" />` | intentional mispronunciation
`<Word from="I02"         to="Pops" />`       | seems really implausible, and it could mess up IDs, codes, etc.
2016-09-27 15:10:58 +02:00
Waldi Ravens
e26c5acdf5 dictionaries: automated XML upkeep 2016-09-21 12:40:08 +02:00
aaaxx
1443e279a6 Removed licence/license rule: it's not a typo
In British English "licence" is a noun and "license" a verb.
2016-09-16 06:40:36 +02:00
Nikolaj Olsson
f9a2e99d54 Added a few words to the English OCR fix replace list 2016-08-21 20:20:36 +02:00
Waldi Ravens
5b312d4a3a dictionaries: automated XML upkeep 2016-08-20 19:29:34 +02:00
Nikolaj Olsson
eca8f0546a Minor dictionary update 2016-08-07 15:25:26 +02:00
Waldi Ravens
125d2dceb6 dictionaries: automated XML upkeep 2016-05-29 13:02:16 +02:00
Nikolaj Olsson
7d09349e0b Some minor improvements for OCR via "Binary image compare" 2016-05-06 15:38:42 +02:00
Waldi Ravens
b8ebe12640 dictionaries: automated XML upkeep 2016-04-13 13:32:50 +02:00
Kruno H
03f1f7d7c0 Update eng_OCRFixReplaceList.xml 2016-03-23 17:40:31 +01:00
Kruno H
00caef81ec Update eng_OCRFixReplaceList.xml 2016-03-22 20:35:44 +01:00
Waldi Ravens
e9964d82f8 dictionaries: automated XML upkeep 2016-02-17 20:55:21 +01:00
niksedk
8f489fd611 Minor update of eng_OCRFixReplaceList.xml 2016-01-22 20:58:42 +01:00
niksedk
ae203e5e7b Updated of eng_OCRFixReplaceList.xml 2016-01-15 13:08:51 +01:00
niksedk
157ebe44c7 Minor update of word lists 2016-01-13 20:24:07 +01:00
Waldi Ravens
dedb933b9a dictionaries: automated XML upkeep 2015-11-08 20:37:41 +01:00
niksedk
79394e8656 Added a few words to English ocr fix list 2015-10-26 20:07:11 +01:00
niksedk
108a0ae6a5 Minor fixes for beta ocr method (for a future version...) 2015-09-24 06:10:12 +02:00
Waldi Ravens
f070d913b7 dictionaries: automated XML upkeep 2015-07-28 19:49:19 +02:00
Kruno H
9e75dcc98c Update eng_OCRFixReplaceList.xml 2015-07-19 17:42:26 +02:00
Waldi Ravens
fcd746fea2 dictionaries: automated XML upkeep 2015-06-25 11:40:47 +02:00
Kruno H
018b11a80c Update eng_OCRFixReplaceList.xml 2015-06-24 21:41:46 +02:00
Waldi Ravens
5f682d1242 Removed duplicates (Dictionaries/eng_OCRFixReplaceList.xml) 2015-06-14 07:44:15 +02:00
Waldi Ravens
032dbcebab dictionaries: automated XML upkeep 2015-05-23 04:45:21 +02:00
Nikolaj Olsson
8a6d6e21d1 Merge pull request #752 from diomed/patch-1
Update eng_OCRFixReplaceList.xml
2015-05-19 17:45:22 +02:00
Kruno H
11f936cd6e Update eng_OCRFixReplaceList.xml 2015-05-13 11:23:16 +02:00
Kruno H
8e12d1806f Update eng_OCRFixReplaceList.xml 2015-05-09 20:07:19 +02:00
Waldi Ravens
60a5f02d7e Added missing dquote (eng_OCRFixReplaceList.xml) 2015-05-05 07:37:57 +02:00
Nikolaj Olsson
a168da372b Merge pull request #724 from xylographe/ocrfrl
Updated eng_OCRFixReplaceList.xml
2015-05-04 18:34:24 +02:00
Kruno H
600f71b0fc Update eng_OCRFixReplaceList 2015-05-04 18:04:38 +02:00
Waldi Ravens
126c4f94d4 Updated eng_OCRFixReplaceList.xml 2015-05-03 19:42:16 +02:00
XhmikosR
1aa302c1d3 Update eng_OCRFixReplaceList.xml.
[ci skip]
2014-12-06 13:06:15 +02:00
XhmikosR
9bcd72a8b8 Update eng_OCRFixReplaceList.xml.
[ci skip]
2014-12-05 14:13:57 +02:00
XhmikosR
0228ed4575 Update dictionaries.
[ci skip]
2014-12-01 15:07:50 +02:00
XhmikosR
85d37dcf79 Update eng_OCRFixReplaceList.xml. 2014-10-05 18:58:06 +03:00
XhmikosR
6e4decb15c Update eng_OCRFixReplaceList.xml. 2014-10-05 18:47:44 +03:00
XhmikosR
9ecd7553ca Update eng_OCRFixReplaceList.xml. 2014-09-24 09:33:24 +03:00