1
0
mirror of https://github.com/imapsync/imapsync.git synced 2024-11-17 00:02:29 +01:00
imapsync/FAQ.d/FAQ.Duplicates.txt
Nick Bebout 275436c5a0 1.945
2019-07-02 18:25:47 -05:00

204 lines
8.0 KiB
Plaintext

#!/bin/cat
$Id: FAQ.Duplicates.txt,v 1.20 2019/05/17 10:10:30 gilles Exp gilles $
This documentation is also available online at
https://imapsync.lamiral.info/FAQ.d/
https://imapsync.lamiral.info/FAQ.d/FAQ.Duplicates.txt
=======================================================================
Imapsync tips about duplicated messages issues.
=======================================================================
=======================================================================
Q. How can I know if imapsync will generate duplicates on a second run?
R. To see if imapsync will generate duplicates on a second run, start
a second run with --dry option added. With --dry, imapsync will
show whether it would mistakenly copy messages again, but without
really copying them:
imapsync ... --dry
The final stats should also show a positive value for the line
"Messages skipped:" since most of the skipped messages are skipped
because they are already on host2. Example of final stats:
++++ Statistics
Transfer started on : Thu Aug 31 04:28:32 2017
Transfer ended on : Thu Aug 31 04:28:44 2017
Transfer time : 11.7 sec
Folders synced : 1/1 synced
Messages transferred : 0
Messages skipped : 1555
=======================================================================
Q: Multiple copies, duplicates, when I run imapsync twice ore more.
R0. First, some explanations to understand the issue.
Normally and by default, imapsync doesn't generate duplicates.
So, if it does generate duplicates it means a problem occurs
with message identification. It happens sometimes with IMAP
servers changing the "Message-Id" header line or one or more
of the "Received:" header lines in the header part of messages.
By default, Imapsync uses "Message-Id" header line and
"Received:" header lines to identify messages on both sides.
R1. This solution is R3 simplified.
A quick practical solution is to change the way imapsync
identify messages that works most of the time. But since
you're reading this because you encountered duplicates issue,
let's check this solution in a safe way.
First use the same commmand with additionnal options:
imapsync ... --useheader "Message-Id" --dry
The previous command does nothing real but it will show you
if imapsync handles duplicates in a better way.
The criterium is to search at the end of the sync for a line
like this one:
Messages skipped : 1555
where 1555 is an example but reflects mostly the number
of all messages already transferred.
If you end with:
Messages skipped : 0
don't go on, it means imapsync is still suffering to
identify messages.
If you end with many messages skipped then it's very
good and now you can safely resync the mailbox
and get rid of the dupplicates messages on host2 with:
imapsync ... --useheader "Message-Id" --delete2duplicates
End of the problem!
R2.
A second solution is to use option --useuid.
With option --useuid, imapsync doesn't use header lines
to identify and compare messages in folders.
Instead of some headers, --useuid tell imapsync to use
the imap UIDs given by imap servers on both sides.
To avoid duplicates on next runs, imapsync uses a local cache
where it keeps UIDs already transferred.
imapsync ... --useuid
There is an issue when --useuid is not used the first time.
A big issue with --useuid is that it doesn't generate duplicates if
used from the first time but it does generate duplicates after a previous
run without --useuid (because it then uses a different method to identify
the messages).
A solution? Two solutions.
The easiest is --delete2 if you are permitted to use it.
Option --delete2 removes messages on host2
that are not on host1. So, with --delete2 you go for resyncing all
messages again. All previously transferred messages are deleted,
but also messages previously there without imapsync.
So --useuid --delete2 is an easy way to remove duplicates but it
is not suitable in all contexts. The good context is that the host2
account must be considered as a strict replication of the host1
account, ie, host2 not active yet.
A second solution, better if R3 works (see R3 below), is to build
the cache before using --useuid
First sync:
imapsync ... --useheader "Message-Id" --addheader --usecache
Next syncs:
imapsync ... --useuid
imapsync ... --useuid
...
R3.
Best way if you can follow it.
Multiple copies of the emails on the destination server. Some IMAP
servers (Domino for example) change some headers for each message
transferred. All messages are transferred again and again each time you
run imapsync. This is bad of course. The explanation is that imapsync
considers messages are not the same on each side, default headers used
to identify the messages have changed.
You can look at the headers found by imapsync by using the --debug
option (and search for the message on both part), Header lines from
the source server begin with a "FH:" prefix, Header lines from the
destination server begin with a "TH:" prefix. Since --debug is very
verbose I suggest to isolate a email in a specific folder in case you
want to forward me the output.
A way to avoid this problem is by using option --useheader with
a different set than the default ones used by imapsync.
The default set is equivalent to:
imapsync ... --useheader "Message-Id" --useheader "Received"
The problem now is that what can be used instead of Message-Id
and Received lines? Often standalone Message-Id works:
imapsync ... --useheader "Message-Id"
Once imapsync does not generate duplicates, the previous duplicates
can be deleted with option --delete2duplicates
imapsync ... --useheader "Message-Id" --delete2duplicates
Another good way toward a solution is to isolate two or three messages
in a BUG folder and send me the --debug output by email to
gilles@lamiral.info
imapsync ... --debug --folder BUG
I will take a close look at the log and modify imapsync to fix
this faulty duplicate behavior.
Remark. (Trick found by Tomasz Kaczmarski)
Option --useheader "Message-Id" asks the server to send only header
lines beginning with "Message-Id". Some (buggy) servers send the whole
header (all lines) instead of the "Message-Id" line. In that case, a
trick to keep the --useheader filtering behavior is to use
--skipheader with a negative lookahead pattern:
imapsync ... --skipheader "^(?!Message-Id)"
Read it as "skip every header except Message-Id".
=======================================================================
Q. imapsync calculates 479 messages in a folder but only transfers 400
messages. What's happen?
R1. Unless --useuid is used, imapsync considers a header part
of a message to identify a message on both sides.
By default the header part used is lines "Message-Id:" "Message-ID:"
and "Received:" or specific lines depending on --useheader
--skipheader. Whole header can be set by --useheader ALL
Consequences:
1) Duplicate messages on host1 (identical header) are not transferred.
The result is that you can have more messages on host1 than on host2.
R2. With option --useuid imapsync doesn't use headers to identify
messages on both sides but it uses their imap uid identifier.
In that case duplicates on host1 are also transferred on host2.
=======================================================================
Q. How can I remove duplicates in a lone account?
R. In order to remove duplicates in a lone account, just run imapsync
on the same account as source and destination, plus the
option --delete2duplicates, ie, with
host1 == host2, user1 == user2, password1 == password2
=======================================================================
=======================================================================