mirror of
https://github.com/imapsync/imapsync.git
synced 2024-11-17 00:02:29 +01:00
204 lines
8.0 KiB
Plaintext
204 lines
8.0 KiB
Plaintext
#!/bin/cat
|
|
$Id: FAQ.Duplicates.txt,v 1.20 2019/05/17 10:10:30 gilles Exp gilles $
|
|
|
|
This documentation is also available online at
|
|
https://imapsync.lamiral.info/FAQ.d/
|
|
https://imapsync.lamiral.info/FAQ.d/FAQ.Duplicates.txt
|
|
|
|
=======================================================================
|
|
Imapsync tips about duplicated messages issues.
|
|
=======================================================================
|
|
|
|
=======================================================================
|
|
Q. How can I know if imapsync will generate duplicates on a second run?
|
|
|
|
R. To see if imapsync will generate duplicates on a second run, start
|
|
a second run with --dry option added. With --dry, imapsync will
|
|
show whether it would mistakenly copy messages again, but without
|
|
really copying them:
|
|
|
|
imapsync ... --dry
|
|
|
|
The final stats should also show a positive value for the line
|
|
"Messages skipped:" since most of the skipped messages are skipped
|
|
because they are already on host2. Example of final stats:
|
|
|
|
++++ Statistics
|
|
Transfer started on : Thu Aug 31 04:28:32 2017
|
|
Transfer ended on : Thu Aug 31 04:28:44 2017
|
|
Transfer time : 11.7 sec
|
|
Folders synced : 1/1 synced
|
|
Messages transferred : 0
|
|
Messages skipped : 1555
|
|
|
|
=======================================================================
|
|
Q: Multiple copies, duplicates, when I run imapsync twice ore more.
|
|
|
|
R0. First, some explanations to understand the issue.
|
|
Normally and by default, imapsync doesn't generate duplicates.
|
|
So, if it does generate duplicates it means a problem occurs
|
|
with message identification. It happens sometimes with IMAP
|
|
servers changing the "Message-Id" header line or one or more
|
|
of the "Received:" header lines in the header part of messages.
|
|
By default, Imapsync uses "Message-Id" header line and
|
|
"Received:" header lines to identify messages on both sides.
|
|
|
|
R1. This solution is R3 simplified.
|
|
A quick practical solution is to change the way imapsync
|
|
identify messages that works most of the time. But since
|
|
you're reading this because you encountered duplicates issue,
|
|
let's check this solution in a safe way.
|
|
|
|
First use the same commmand with additionnal options:
|
|
|
|
imapsync ... --useheader "Message-Id" --dry
|
|
|
|
The previous command does nothing real but it will show you
|
|
if imapsync handles duplicates in a better way.
|
|
The criterium is to search at the end of the sync for a line
|
|
like this one:
|
|
Messages skipped : 1555
|
|
where 1555 is an example but reflects mostly the number
|
|
of all messages already transferred.
|
|
|
|
If you end with:
|
|
Messages skipped : 0
|
|
don't go on, it means imapsync is still suffering to
|
|
identify messages.
|
|
|
|
If you end with many messages skipped then it's very
|
|
good and now you can safely resync the mailbox
|
|
and get rid of the dupplicates messages on host2 with:
|
|
|
|
imapsync ... --useheader "Message-Id" --delete2duplicates
|
|
|
|
End of the problem!
|
|
|
|
R2.
|
|
A second solution is to use option --useuid.
|
|
With option --useuid, imapsync doesn't use header lines
|
|
to identify and compare messages in folders.
|
|
Instead of some headers, --useuid tell imapsync to use
|
|
the imap UIDs given by imap servers on both sides.
|
|
To avoid duplicates on next runs, imapsync uses a local cache
|
|
where it keeps UIDs already transferred.
|
|
|
|
imapsync ... --useuid
|
|
|
|
There is an issue when --useuid is not used the first time.
|
|
A big issue with --useuid is that it doesn't generate duplicates if
|
|
used from the first time but it does generate duplicates after a previous
|
|
run without --useuid (because it then uses a different method to identify
|
|
the messages).
|
|
|
|
A solution? Two solutions.
|
|
|
|
The easiest is --delete2 if you are permitted to use it.
|
|
Option --delete2 removes messages on host2
|
|
that are not on host1. So, with --delete2 you go for resyncing all
|
|
messages again. All previously transferred messages are deleted,
|
|
but also messages previously there without imapsync.
|
|
So --useuid --delete2 is an easy way to remove duplicates but it
|
|
is not suitable in all contexts. The good context is that the host2
|
|
account must be considered as a strict replication of the host1
|
|
account, ie, host2 not active yet.
|
|
|
|
A second solution, better if R3 works (see R3 below), is to build
|
|
the cache before using --useuid
|
|
|
|
First sync:
|
|
|
|
imapsync ... --useheader "Message-Id" --addheader --usecache
|
|
|
|
Next syncs:
|
|
|
|
imapsync ... --useuid
|
|
imapsync ... --useuid
|
|
...
|
|
|
|
R3.
|
|
Best way if you can follow it.
|
|
Multiple copies of the emails on the destination server. Some IMAP
|
|
servers (Domino for example) change some headers for each message
|
|
transferred. All messages are transferred again and again each time you
|
|
run imapsync. This is bad of course. The explanation is that imapsync
|
|
considers messages are not the same on each side, default headers used
|
|
to identify the messages have changed.
|
|
|
|
You can look at the headers found by imapsync by using the --debug
|
|
option (and search for the message on both part), Header lines from
|
|
the source server begin with a "FH:" prefix, Header lines from the
|
|
destination server begin with a "TH:" prefix. Since --debug is very
|
|
verbose I suggest to isolate a email in a specific folder in case you
|
|
want to forward me the output.
|
|
|
|
A way to avoid this problem is by using option --useheader with
|
|
a different set than the default ones used by imapsync.
|
|
|
|
The default set is equivalent to:
|
|
|
|
imapsync ... --useheader "Message-Id" --useheader "Received"
|
|
|
|
The problem now is that what can be used instead of Message-Id
|
|
and Received lines? Often standalone Message-Id works:
|
|
|
|
imapsync ... --useheader "Message-Id"
|
|
|
|
Once imapsync does not generate duplicates, the previous duplicates
|
|
can be deleted with option --delete2duplicates
|
|
|
|
imapsync ... --useheader "Message-Id" --delete2duplicates
|
|
|
|
Another good way toward a solution is to isolate two or three messages
|
|
in a BUG folder and send me the --debug output by email to
|
|
gilles@lamiral.info
|
|
|
|
imapsync ... --debug --folder BUG
|
|
|
|
I will take a close look at the log and modify imapsync to fix
|
|
this faulty duplicate behavior.
|
|
|
|
Remark. (Trick found by Tomasz Kaczmarski)
|
|
|
|
Option --useheader "Message-Id" asks the server to send only header
|
|
lines beginning with "Message-Id". Some (buggy) servers send the whole
|
|
header (all lines) instead of the "Message-Id" line. In that case, a
|
|
trick to keep the --useheader filtering behavior is to use
|
|
--skipheader with a negative lookahead pattern:
|
|
|
|
imapsync ... --skipheader "^(?!Message-Id)"
|
|
|
|
Read it as "skip every header except Message-Id".
|
|
|
|
=======================================================================
|
|
Q. imapsync calculates 479 messages in a folder but only transfers 400
|
|
messages. What's happen?
|
|
|
|
R1. Unless --useuid is used, imapsync considers a header part
|
|
of a message to identify a message on both sides.
|
|
By default the header part used is lines "Message-Id:" "Message-ID:"
|
|
and "Received:" or specific lines depending on --useheader
|
|
--skipheader. Whole header can be set by --useheader ALL
|
|
|
|
Consequences:
|
|
|
|
1) Duplicate messages on host1 (identical header) are not transferred.
|
|
|
|
The result is that you can have more messages on host1 than on host2.
|
|
|
|
R2. With option --useuid imapsync doesn't use headers to identify
|
|
messages on both sides but it uses their imap uid identifier.
|
|
In that case duplicates on host1 are also transferred on host2.
|
|
|
|
=======================================================================
|
|
Q. How can I remove duplicates in a lone account?
|
|
|
|
R. In order to remove duplicates in a lone account, just run imapsync
|
|
on the same account as source and destination, plus the
|
|
option --delete2duplicates, ie, with
|
|
host1 == host2, user1 == user2, password1 == password2
|
|
|
|
|
|
=======================================================================
|
|
=======================================================================
|