#!/bin/cat
$Id: FAQ.Duplicates.txt,v 1.18 2019/02/14 17:22:10 gilles Exp gilles $

This documentation is also available online at
https://imapsync.lamiral.info/FAQ.d/
https://imapsync.lamiral.info/FAQ.d/FAQ.Duplicates.txt

=======================================================================
Imapsync tips about duplicate message issues.
=======================================================================

=======================================================================
Q. How can I know if imapsync will generate duplicates on a second run?

R. To see whether imapsync will generate duplicates on a second run,
start a second run with the --dry option added. Imapsync will then show
whether it would mistakenly copy messages again, without actually
copying them:

imapsync ... --dry

The final stats should also show a positive value on the line
"Messages skipped", since most skipped messages are skipped
because they are already on host2. Example of final stats:

++++ Statistics
Transfer started on   : Thu Aug 31 04:28:32 2017
Transfer ended on     : Thu Aug 31 04:28:44 2017
Transfer time         : 11.7 sec
Folders synced        : 1/1 synced
Messages transferred  : 0
Messages skipped      : 1555
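
If you keep the output of a --dry run in a file, the check on the
"Messages skipped" line can be scripted. The following is a minimal
sketch, assuming the final stats look like the example above; the
function name messages_skipped is made up for this illustration and
is not part of imapsync.

```python
import re

def messages_skipped(log_text):
    """Return the "Messages skipped" count from imapsync final stats, or None."""
    # Tolerate any amount of whitespace around the colon.
    match = re.search(r"^Messages skipped\s*:\s*(\d+)", log_text, re.MULTILINE)
    return int(match.group(1)) if match else None

# Sample text copied from the statistics example in this FAQ.
sample = """\
++++ Statistics
Transfer started on : Thu Aug 31 04:28:32 2017
Transfer ended on : Thu Aug 31 04:28:44 2017
Transfer time : 11.7 sec
Folders synced : 1/1 synced
Messages transferred : 0
Messages skipped : 1555
"""

skipped = messages_skipped(sample)
if skipped:
    print("%d messages skipped: identification looks fine" % skipped)
else:
    print("No message skipped: a real run would likely create duplicates")
```

A value of None means the stats line was not found at all, which this
sketch treats the same way as zero.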

=======================================================================
Q. I get multiple copies (duplicates) when I run imapsync twice or more.

R0. First, some explanations to understand the issue.
Normally and by default, imapsync doesn't generate duplicates.
So if it does generate duplicates, it means a problem occurs
with message identification. This sometimes happens with IMAP
servers that change the "Message-Id" header line or one or more
of the "Received:" header lines in the header part of messages.
By default, imapsync uses the "Message-Id" header line and the
"Received:" header lines to identify messages on both sides.
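
To make the identification issue concrete, here is a simplified model
of comparing messages by selected header lines. It is only a sketch
for illustration, not imapsync's actual code: the function
identity_key, the sample headers and the "rewritten" Received line
are all made up, and folded (multi-line) headers are ignored.

```python
def identity_key(header_lines, used=("message-id", "received")):
    """Build a comparison key from the selected header lines only.

    Simplified model for illustration; not imapsync's actual code.
    """
    kept = [line.strip() for line in header_lines
            if line.split(":", 1)[0].strip().lower() in used]
    return tuple(sorted(kept))

host1_msg = [
    "Received: from a.example by b.example; Thu, 31 Aug 2017 04:28:32 +0200",
    "Message-Id: <123@example.com>",
    "Subject: hello",
]
# Hypothetical host2 server that rewrote the Received line of the copy.
host2_msg = [
    "Received: from a.example by b.example (rewritten); Thu, 31 Aug 2017 04:28:32 +0200",
    "Message-Id: <123@example.com>",
    "Subject: hello",
]

# Message-Id plus Received: the keys differ, so the message looks new.
same_default = identity_key(host1_msg) == identity_key(host2_msg)
# Message-Id only: the keys match, so the message would be skipped.
same_msgid = (identity_key(host1_msg, used=("message-id",))
              == identity_key(host2_msg, used=("message-id",)))

print(same_default, same_msgid)
```

In this model a single rewritten header line is enough to make the
same message look different on both sides, which is the situation
the answers below try to work around.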

R1. This solution is R3 simplified.
A quick practical solution, which works most of the time, is to
change the way imapsync identifies messages. But since
you are reading this because you encountered a duplicates issue,
let's check this solution in a safe way.

First, use the same command with these additional options:

imapsync ... --useheader "Message-Id" --dry

The previous command changes nothing, but it will show you
whether imapsync handles duplicates in a better way.
The criterion is to look, at the end of the sync, for a line
like this one:
Messages skipped : 1555
where 1555 is only an example; the value should roughly match
the number of messages already transferred.

If you end with:
Messages skipped : 0
don't go on; it means imapsync is still failing to
identify messages.

If you end with many messages skipped, then it's very
good, and you can now safely resync the mailbox
and get rid of the duplicate messages on host2 with:

imapsync ... --useheader "Message-Id" --delete2duplicates

End of the problem!

R2.
A second solution is to use option --useuid.
With option --useuid, imapsync doesn't use header lines
to identify and compare messages in folders.
Instead of some headers, --useuid tells imapsync to use
the IMAP UIDs given by the IMAP servers on both sides.
To avoid duplicates on next runs, imapsync uses a local cache
where it keeps the UIDs already transferred.

imapsync ... --useuid

There is an issue when --useuid is not used from the first run.
With --useuid, imapsync doesn't generate duplicates if the option is
used from the first run, but it does generate duplicates after a previous
run without --useuid (because it then uses a different method to identify
the messages).
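
The behavior described above can be pictured with a toy model of a
UID cache. This is only a sketch under assumptions: the real cache
layout is not described in this FAQ, and messages_to_copy is a
hypothetical helper, not an imapsync function.

```python
def messages_to_copy(host1_uids, cache):
    """Return the host1 UIDs that are not yet recorded in the local cache."""
    return [uid for uid in host1_uids if uid not in cache]

host1_uids = [1, 2, 3]

# Case 1: --useuid from the first run. The cache fills as messages are copied.
cache = set()
first_run = messages_to_copy(host1_uids, cache)
cache.update(first_run)
second_run = messages_to_copy(host1_uids, cache)

# Case 2: a previous run copied everything by headers, so no UID was cached.
empty_cache = set()
after_header_run = messages_to_copy(host1_uids, empty_cache)

print(first_run, second_run, after_header_run)
```

In case 2 every message is copied again although it is already on
host2, which is how the duplicates appear.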

A solution? Two solutions.

The easiest is --delete2, if you are permitted to use it.
Option --delete2 removes messages on host2
that are not on host1. So, with --delete2, all messages are
synced again, but all previously transferred messages are deleted,
as well as messages that were already on host2 before imapsync ran.
So --useuid --delete2 is an easy way to remove duplicates, but it
does not suit all contexts: the host2 account must be considered a
strict replica of the host1 account, i.e., not active.

A second solution, better if R3 works (see R3 below), is to build
the cache before using --useuid.

First sync:

imapsync ... --useheader "Message-Id" --addheader --usecache

Next syncs:

imapsync ... --useuid
imapsync ... --useuid
...

R3.
This is the best way, if you can follow it.
The symptom is multiple copies of the emails on the destination
server. Some IMAP servers (Domino for example) change some headers
for each message transferred, so all messages are transferred again
and again each time you run imapsync. This is bad, of course. The
explanation is that imapsync considers that the messages are not the
same on each side, because the default headers used to identify the
messages have changed.

You can look at the headers found by imapsync by using the --debug
option (and search for the message on both sides). Header lines from
the source server begin with a "FH:" prefix; header lines from the
destination server begin with a "TH:" prefix. Since --debug is very
verbose, I suggest isolating an email in a specific folder in case you
want to forward me the output.

A way to avoid this problem is to use option --useheader with
a different set of headers than the default one used by imapsync.

The default set is equivalent to:

imapsync ... --useheader "Message-Id" --useheader "Received"

The question now is what can be used instead of the Message-Id
and Received lines. Often Message-Id alone works:

imapsync ... --useheader "Message-Id"

Once imapsync no longer generates duplicates, the previous duplicates
can be deleted with option --delete2duplicates:

imapsync ... --useheader "Message-Id" --delete2duplicates

Another good way toward a solution is to isolate two or three messages
in a BUG folder and send me the --debug output by email at
gilles@lamiral.info

imapsync ... --debug --folder BUG

I will take a close look at the log and modify imapsync to fix
this faulty duplicate behavior.

Remark. (Trick found by Tomasz Kaczmarski)

Option --useheader "Message-Id" asks the server to send only header
lines beginning with "Message-Id". Some (buggy) servers send the whole
header (all lines) instead of the "Message-Id" line. In that case, a
trick to keep the --useheader filtering behavior is to use
--skipheader with a negative lookahead pattern:

imapsync ... --skipheader "^(?!Message-Id)"

Read it as "skip every header except Message-Id".
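
To see what the negative lookahead does, here is a small sketch that
applies the same pattern to a few made-up header lines. It uses
Python's re module rather than imapsync itself, so it only
illustrates the pattern; the matching here is case-sensitive, and
how imapsync treats case is not covered by this example.

```python
import re

# Same pattern as in the --skipheader example of this FAQ.
skip = re.compile(r"^(?!Message-Id)")

# Hypothetical header lines returned by a server that sends everything.
header_lines = [
    "Received: from a.example by b.example",
    "Message-Id: <123@example.com>",
    "Subject: hello",
]

# A line is skipped when the pattern matches it, and kept otherwise.
kept = [line for line in header_lines if not skip.search(line)]
print(kept)
```

The pattern matches at the start of every line that does not begin
with "Message-Id", so only the Message-Id line survives the filter.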

=======================================================================
Q. imapsync counts 479 messages in a folder but only transfers 400
messages. What is happening?

R1. Unless --useuid is used, imapsync uses part of the header
of a message to identify it on both sides.
By default, the header part used is the lines "Message-Id:",
"Message-ID:" and "Received:", or specific lines depending on
--useheader and --skipheader. The whole header can be selected
with --useheader ALL.

Consequences:

1) Duplicate messages on host1 (identical header) are not transferred.

The result is that you can have more messages on host1 than on host2.

R2. With option --useuid, imapsync doesn't use headers to identify
messages on both sides; it uses their IMAP UID instead.
In that case duplicates on host1 are also transferred to host2.
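
The difference between the two counts can be shown with a toy
folder. This is only an illustration with made-up UIDs and
Message-Id values, not output from imapsync.

```python
# Hypothetical host1 folder: (IMAP UID, Message-Id header) pairs.
host1_folder = [
    (101, "<a@example.com>"),
    (102, "<b@example.com>"),
    (103, "<a@example.com>"),  # same header as UID 101: a duplicate on host1
]

# Header-based identification: one message per distinct header value.
by_header = {message_id for _uid, message_id in host1_folder}

# UID-based identification (--useuid): every UID is a distinct message.
by_uid = {uid for uid, _message_id in host1_folder}

print(len(host1_folder), len(by_header), len(by_uid))
```

Here three messages are counted in the folder, two would be
transferred with header-based identification, and three with UIDs.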

=======================================================================
Q. How can I remove duplicates in a single account?

R. Just run imapsync on the same account with option --delete2duplicates,
i.e., with host1 == host2, user1 == user2, password1 == password2.

=======================================================================
=======================================================================