mirror of
https://github.com/imapsync/imapsync.git
synced 2024-11-17 00:02:29 +01:00
262 lines
9.8 KiB
Plaintext
262 lines
9.8 KiB
Plaintext
#!/bin/cat
|
|
$Id: FAQ.Massive.txt,v 1.15 2017/06/17 13:45:15 gilles Exp gilles $
|
|
|
|
This documentation is also at http://imapsync.lamiral.info/#doc
|
|
|
|
=======================================================================
|
|
Imapsync tips for massive/bulk migrations.
|
|
=======================================================================
|
|
|
|
Questions answered here are:
|
|
|
|
Q. I need to migrate hundred accounts, how can I do?
|
|
|
|
Q. I have to migrate 500k users using 400 TB of disk space.
|
|
How do I proceed? How about speed?
|
|
|
|
Q. How to determine where is the bottleneck in an imapsync process?
|
|
|
|
Q. Can I run more instances of imapsync in parallel on a Windows host?
|
|
|
|
Q. I run multiple imapsync applications at the same time then get a
|
|
warning "imapsync.pid already exists, overwriting it".
|
|
Is this a potential problem when trying to sync multiple
|
|
IMAP account in parallel?
|
|
|
|
|
|
=======================================================================
|
|
Q. I need to migrate hundred accounts, how can I do?
|
|
|
|
R. If you have many mailboxes to migrate think about a little
|
|
script program. Write a file called file.txt (for example)
|
|
containing hosts, users and passwords on both sides.
|
|
The separator used in this example is ";"
|
|
|
|
The file.txt file contains for example:
|
|
|
|
host001_1;user001_1;password001_1;host001_2;user001_2;password001_2;
|
|
host002_1;user002_1;password002_1;host002_2;user002_2;password002_2;
|
|
host003_1;user003_1;password003_1;host003_2;user003_2;password003_2;
|
|
host004_1;user004_1;password004_1;host004_2;user004_2;password004_2;
|
|
etc.
|
|
|
|
Most of the times, the first column (host001_1, host002_1 ...) will
|
|
contains the same value, the value of --host1 parameter. Same
|
|
thing for the third column (host001_2, host002_2).
|
|
|
|
On Unix the shell script can be:
|
|
|
|
#!/bin/sh
|
|
{ while IFS=';' read h1 u1 p1 h2 u2 p2 fake
|
|
do
|
|
imapsync --host1 "$h1" --user1 "$u1" --password1 "$p1" \
|
|
--host2 "$h2" --user2 "$u2" --password2 "$p2" "$@"
|
|
done
|
|
} < file.txt
|
|
|
|
You can add extra options inside this script, just after the variable "$@".
|
|
You can also pass extra options via the parameters of this script
|
|
since they will go in "$@"
|
|
|
|
Here is a complete Unix example ready to use:
|
|
http://imapsync.lamiral.info/examples/sync_loop_unix.sh
|
|
|
|
|
|
On Windows the batch script can be:
|
|
|
|
CD /D %~dp0
|
|
SET csvfile=file.txt
|
|
FOR /F "tokens=1,2,3,4,5,6,7 delims=; eol=#" %%G IN (%csvfile%) DO (
|
|
imapsync ^
|
|
--host1 %%G --user1 %%H --password1 %%I ^
|
|
--host2 %%J --user2 %%K --password2 %%L %%M ...
|
|
)
|
|
|
|
You can add extra options inside this script, just after the variable %%M.
|
|
You can add extra options inside the file.txt, in the last column. Add
|
|
an extra semicolon at the end (optional)
|
|
Example:
|
|
host001_1;user001_1;password001_1;host001_2;user001_2;password001_2;
|
|
host002_1;user002_1;password002_1;host002_2;user002_2;password002_2;
|
|
becomes
|
|
host001_1;user001_1;password001_1;host001_2;user001_2;password001_2; --automap --addheader
|
|
host002_1;user002_1;password002_1;host002_2;user002_2;password002_2; --automap --addheader
|
|
|
|
With this solution, options can be added, changed or removed per account.
|
|
Technically those options will go in %%M in the loop body
|
|
|
|
Here is a complete Windows example ready to use:
|
|
http://imapsync.lamiral.info/examples/sync_loop_windows.bat
|
|
|
|
Another solution to add extra arguments is to write another .bat that
|
|
calls sync_loop_windows.bat with the extra arguments, like this
|
|
for example:
|
|
|
|
sync_loop_windows.bat --automap --addheader --maxmessagespersecond 4
|
|
|
|
Technically those options will go in %arguments% in the loop body
|
|
of sync_loop_windows.bat
|
|
|
|
|
|
=======================================================================
|
|
Q. I have to migrate 500k users using 400 TB of disk space.
|
|
How do I proceed? How about speed?
|
|
|
|
R. Solution to this issue is two words: parallelism and measurements.
|
|
|
|
Since all mailboxes are functionnaly independent, they can be processed
|
|
independently, here comes parallelism, lunching several imapsync
|
|
processes in parallel.
|
|
|
|
Meanwhile, mailboxes usually belong to the same server and syncs
|
|
share the same imapsync host via the same bandwidth, here come
|
|
some limitations and bottlenecks.
|
|
|
|
How many syncs can we run in parallel? here comes measurements.
|
|
|
|
1) Measure the total transfer rate by adding each one printed in each run.
|
|
Since adding this way is not so easy, just look at the overall
|
|
network rate of the imapsync host.
|
|
|
|
On Linux, nload is good candidate to measure this overall
|
|
network rate, every 6 seconds, on eth0 interface, values in Kbytes:
|
|
|
|
nload -t 6000 eth0 -u K
|
|
|
|
Another excellent network tool is dstat:
|
|
|
|
dstat -n -N eth0 6
|
|
|
|
On Windows, get the overall network rate with the classical
|
|
task manager (Ctrl-Alt-Sup), there is a network tab in it.
|
|
Don't hesitate to send me free good tools to measure the
|
|
overall transfer rate (the best would be one to sum up only
|
|
imap traffic but that's not mandatory at all).
|
|
|
|
2) Launch new parallel runs, one by one, as long as the total
|
|
transfer rate increase.
|
|
|
|
3) When the total transfer rate starts to diminish, stop new launches.
|
|
Note N as the number of parallel runs you got until then.
|
|
|
|
4) Only keep N-2 parallel runs for the future.
|
|
|
|
|
|
=======================================================================
|
|
Q. How to determine where is the bottleneck in an imapsync process?
|
|
|
|
R1. Divide and conquer.
|
|
|
|
In order to detect whether host1/link1 is the bottleneck or
|
|
host2/link2, we have several tests to explore:
|
|
|
|
1) run a sync from host1 to host1, with a host1 test account as destination.
|
|
This way, only host1+link1 are tested, host2 is not directly concerned.
|
|
If performances increase a lot then host2/link2 is the bottleneck.
|
|
|
|
2) run a sync from host2 to host2, with a host2 test account as destination.
|
|
This way, only host2+link2 are tested, host1 is not concerned.
|
|
If performances increase a lot then host1/link1 is the bottleneck.
|
|
|
|
If performances increase on both tests 1) and 2), I have no clue to explain that.
|
|
Same thing if they both decrease!
|
|
|
|
R2. Isolating and overcoming bottlenecks
|
|
|
|
On any process involving several mechanisms, among all elements taking
|
|
part on the process there is always a bottleneck. No one knows in
|
|
advance what is the first bottleneck. The first bottleneck has to be
|
|
determined, by measurements, not by guesses. Once this first
|
|
bottleneck is known and overcome then the next bottleneck has to be
|
|
determined and overcome too, if needed. Repeat the process of looking
|
|
for the next bottleneck and its elimination until you estimate the
|
|
transfer rates, money costs and final dates are good enough to proceed
|
|
the whole huge migration.
|
|
|
|
Possible bottlenecks:
|
|
|
|
- Throttles.
|
|
IMAP servers have artificial limits.
|
|
For example Gmail, Office365, Exchange have throttle limits.
|
|
|
|
- Bandwidth.
|
|
Usually available bandwidth is not a bottleneck.
|
|
Meanwhile, it can be a bottleneck on small Internet connexions.
|
|
Imapsync downloads messages from host1 and upload messages to host2,
|
|
consider this in case the connexion are asymetric.
|
|
|
|
- I/O on disks.
|
|
I/O are a classical bottleneck, almost always forgotten.
|
|
Unlike CPU and RAM, Input/Output performances don't improve
|
|
very much as time goes on so it's often a bottleneck.
|
|
|
|
- RAM memory.
|
|
On all sides, monitor that your systems don't swap on disk,
|
|
because swapping memory on disks decreases performance by
|
|
a factor of 20, at least.
|
|
|
|
- CPU.
|
|
100% CPU during a whole transfer means the system is busy.
|
|
Usually CPU is not a problem with imapsync but it can be a problem
|
|
with one of the imap servers.
|
|
Most often CPU is not the real bottleneck, I/O are.
|
|
|
|
Other possible bottlenecks:
|
|
- Number of hosts available to run imapsync processes.
|
|
- Imapsync itself.
|
|
- Errors management.
|
|
- MX domains, DNS.
|
|
- Money.
|
|
- Time.
|
|
- Bad luck.
|
|
- ...
|
|
|
|
|
|
=======================================================================
|
|
Q. Can I run more instances of imapsync in parallel on a Windows host?
|
|
|
|
R. Yes!
|
|
|
|
Q. Any performance issue?
|
|
|
|
You have to try and check the transfer rates, sum them up to
|
|
have a uniq numeric criteria.
|
|
There is always a limit, depending on remote imap servers
|
|
and the one running imapsync;
|
|
CPU, memory, Inputs/Outputs are the classical bottlenecks,
|
|
the worst bottleneck is the winner that sets the limit.
|
|
|
|
examples/sync_loop_windows.bat says
|
|
...
|
|
REM ==== Parallel executions ====
|
|
REM If you want to do parallel runs of imapsync then this current script is a good start.
|
|
REM Just copy it several times and replace, on each copy, the csvfile variable value.
|
|
REM Instead of SET csvfile=file.txt write for example
|
|
REM SET csvfile=file01.txt in the first copy
|
|
REM then also
|
|
REM SET csvfile=file02.txt in the second copy etc.
|
|
REM Of course you also have to split the data contained in file.txt
|
|
REM into file01.txt file02.txt etc.
|
|
REM After that, just double-click on each batch file to launch each process
|
|
|
|
|
|
=======================================================================
|
|
Q. I run multiple imapsync applications at the same time then get a
|
|
warning "imapsync.pid already exists, overwriting it".
|
|
Is this a potential problem when trying to sync multiple
|
|
IMAP account in parallel?
|
|
|
|
R1. No issue with the file imapsync.pid if you don't use its content
|
|
by yourself.
|
|
|
|
This file can help you to manage multiple runs by sending signals
|
|
to the processes (sigterm or sigkill) using their PID.
|
|
Each run can have its own pid file with --pidfile option.
|
|
The file imapsync.pid contains the PID of the current imapsync process.
|
|
This file is removed at the end of a normal run.
|
|
You can safely ignore the warning if you don't use imapsync.pid file
|
|
to manage imapsync processes.
|
|
|
|
=======================================================================
|
|
=======================================================================
|