#!/bin/cat
$Id: FAQ.Massive.txt,v 1.17 2017/12/20 03:38:07 gilles Exp gilles $
This documentation is also at http://imapsync.lamiral.info/#doc
=======================================================================
Imapsync tips for massive/bulk migrations.
=======================================================================
Questions answered here are:

Q. I need to migrate hundreds of accounts. How can I do that?

Q. I have to migrate 500k users using 400 TB of disk space.
How do I proceed? How about speed?

Q. How do I determine where the bottleneck is in an imapsync process?

Q. Can I run more instances of imapsync in parallel on a Windows host?

Q. I run multiple imapsync applications at the same time then get a
warning "imapsync.pid already exists, overwriting it".
Is this a potential problem when trying to sync multiple
IMAP accounts in parallel?

=======================================================================
Q. I need to migrate hundreds of accounts. How can I do that?

R. If you have many mailboxes to migrate, consider writing a small
script. Write a file called file.txt (for example)
containing hosts, users and passwords on both sides.
The separator used in this example is ";"

The file.txt file contains, for example:

host001_1;user001_1;password001_1;host001_2;user001_2;password001_2;
host002_1;user002_1;password002_1;host002_2;user002_2;password002_2;
host003_1;user003_1;password003_1;host003_2;user003_2;password003_2;
host004_1;user004_1;password004_1;host004_2;user004_2;password004_2;
etc.

Most of the time, the first column (host001_1, host002_1 ...) will
contain the same value, the value of the --host1 parameter. The same
goes for the fourth column (host001_2, host002_2 ...), the value of
the --host2 parameter.

On Unix the shell script can be:

#!/bin/sh
# Read file.txt line by line, splitting fields on ";".
# -r keeps backslashes in passwords intact.
# "fake" absorbs whatever follows the sixth field (the trailing ";").
{ while IFS=';' read -r h1 u1 p1 h2 u2 p2 fake
    do
        imapsync --host1 "$h1" --user1 "$u1" --password1 "$p1" \
                 --host2 "$h2" --user2 "$u2" --password2 "$p2" "$@"
    done
} < file.txt

You can add extra options inside this script, just after the variable "$@".
You can also pass extra options via the parameters of this script,
since they will go in "$@".

Here is a complete Unix example ready to use:
http://imapsync.lamiral.info/examples/sync_loop_unix.sh
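
Before starting a long loop, it can help to check that every line of
file.txt really has the six expected fields. The following is only a
minimal sketch and is not part of imapsync: the function name
check_accounts_file is hypothetical, and skipping lines that start
with "#" is an assumption borrowed from the eol=# setting of the
Windows loop.

```shell
#!/bin/sh
# check_accounts_file FILE   (hypothetical helper, not part of imapsync)
# Reports every line that does not provide six non-empty fields
# separated by ";". Lines starting with "#" and blank lines are
# ignored (an assumption, see above).
# Returns 0 when all lines look fine, 1 otherwise.
check_accounts_file() {
    awk -F';' '
        /^#/ || /^[[:space:]]*$/ { next }
        {
            bad = (NF < 6)
            for (i = 1; i <= 6 && !bad; i++)
                if ($i == "") bad = 1
            if (bad) {
                printf "line %d: expected 6 fields\n", NR
                status = 1
            }
        }
        END { exit status }
    ' "$1"
}
```

Usage: check_accounts_file file.txt && sh ./your_loop_script.sh
(where your_loop_script.sh stands for whatever loop script you use).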

On Windows the batch script can be:

REM Run from the directory that contains this script.
CD /D %~dp0
SET csvfile=file.txt
REM Split each line on ";" and skip lines starting with "#".
FOR /F "tokens=1,2,3,4,5,6,7 delims=; eol=#" %%G IN (%csvfile%) DO (
    imapsync ^
      --host1 %%G --user1 %%H --password1 %%I ^
      --host2 %%J --user2 %%K --password2 %%L %%M ...
)

You can add extra options inside this script, just after the variable %%M.
You can also add extra options inside file.txt, in the last column.
An extra semicolon at the end is optional.
Example:
host001_1;user001_1;password001_1;host001_2;user001_2;password001_2;
host002_1;user002_1;password002_1;host002_2;user002_2;password002_2;
becomes
host001_1;user001_1;password001_1;host001_2;user001_2;password001_2; --automap --addheader
host002_1;user002_1;password002_1;host002_2;user002_2;password002_2; --automap --addheader
With this solution, options can be added, changed or removed per account.
Technically those options will go in %%M in the loop body
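
On Unix, the loop shown earlier reads whatever follows the sixth field
into its last variable, so per-account options could be passed along
in the same way. The following is a minimal sketch under stated
assumptions, not the content of sync_loop_unix.sh: the function name
sync_accounts and the IMAPSYNC override variable are hypothetical,
and redirecting stdin from /dev/null is only a precaution so that the
command cannot consume the account list.

```shell
#!/bin/sh
# sync_accounts FILE [common options...]   (hypothetical helper name)
# Runs one sync per line of FILE. Anything after the sixth ";" on a
# line is passed to the command as extra, per-account options.
sync_accounts() {
    accounts_file=$1
    shift
    while IFS=';' read -r h1 u1 p1 h2 u2 p2 extra
    do
        # $extra is intentionally left unquoted so that a column such as
        # " --automap --addheader" is split into separate arguments.
        # Consequence: options containing spaces or wildcards are not
        # supported by this sketch.
        ${IMAPSYNC:-imapsync} \
            --host1 "$h1" --user1 "$u1" --password1 "$p1" \
            --host2 "$h2" --user2 "$u2" --password2 "$p2" \
            "$@" $extra < /dev/null
    done < "$accounts_file"
}
```

Usage: sync_accounts file.txt --maxmessagespersecond 4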
Here is a complete Windows example ready to use:
http://imapsync.lamiral.info/examples/sync_loop_windows.bat
Another solution to add extra arguments is to write another .bat that
calls sync_loop_windows.bat with the extra arguments, like this
for example:
sync_loop_windows.bat --automap --addheader --maxmessagespersecond 4
Technically those options will go in %arguments% in the loop body
of sync_loop_windows.bat
=======================================================================
Q. I have to migrate 500k users using 400 TB of disk space.
How do I proceed? How about speed?

R. The solution to this issue comes down to two words: parallelism
and measurements.

Since all mailboxes are functionally independent, they can be processed
independently. This is where parallelism comes in: launching several
imapsync processes in parallel.

However, mailboxes usually belong to the same server, and syncs
share the same imapsync host and the same bandwidth. This is where
limitations and bottlenecks come in.

How many syncs can we run in parallel? This is where measurements come in.

1) Measure the total transfer rate by adding up the rates printed by each run.
Since adding them up this way is not so easy, just look at the overall
network rate of the imapsync host.

On Linux, nload is a good candidate to measure this overall
network rate, every 6 seconds, on the eth0 interface, with values in Kbytes:
nload -t 6000 eth0 -u K

Another good network tool is dstat:
dstat -n -N eth0 6

An excellent tool for this purpose is iftop. The following
command monitors imap and imaps connections
on interface eth0, only them, and sums them up:
iftop -i eth0 -f 'port imap or port imaps' -B

On Windows, get the overall network rate with the classic
Task Manager (Ctrl-Alt-Del), there is a network tab in it.
Don't hesitate to send me good free tools to measure the
overall transfer rate (the best would be one that sums up only
imap traffic but that's not mandatory at all).

2) Launch new parallel runs, one by one, as long as the total
transfer rate increases.
3) When the total transfer rate starts to diminish, stop new launches.
Note N as the number of parallel runs you got until then.
4) Only keep N-2 parallel runs for the future.
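
Once the number of parallel runs is known, one way to keep that many
runs going on Unix is to split file.txt into that many parts and start
one loop per part. The following is a minimal sketch under stated
assumptions: the names split_accounts and run_parallel, the .partNN
file suffix, and a loop script that takes the accounts file as its
first argument are all hypothetical (the loop script shown earlier
reads file.txt directly, so it would need that small change).

```shell
#!/bin/sh
# split_accounts FILE N   (hypothetical helper)
# Distributes the lines of FILE round-robin into FILE.part01 ...
# FILE.partN (N up to 99). Parts left over from a previous call are
# removed first.
split_accounts() {
    rm -f "$1".part[0-9][0-9]
    awk -v n="$2" -v base="$1" '{
        out = sprintf("%s.part%02d", base, (NR - 1) % n + 1)
        print > out
    }' "$1"
}

# run_parallel FILE N LOOP_COMMAND   (hypothetical helper)
# Splits FILE into N parts, starts LOOP_COMMAND once per part in the
# background, then waits for all of them to finish.
run_parallel() {
    split_accounts "$1" "$2" || return 1
    for part in "$1".part[0-9][0-9]
    do
        [ -e "$part" ] || continue
        "$3" "$part" &
    done
    wait
}
```

Usage, with a hypothetical loop script: run_parallel file.txt 4 ./my_sync_loop.sh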
=======================================================================
Q. How do I determine where the bottleneck is in an imapsync process?

R1. Divide and conquer.
To detect whether host1/link1 or host2/link2 is the bottleneck,
there are several tests to run:
1) Run a sync from host1 to host1, with a host1 test account as destination.
This way, only host1+link1 are tested; host2 is not directly involved.
If performance increases a lot then host2/link2 is the bottleneck.
2) Run a sync from host2 to host2, with a host2 test account as destination.
This way, only host2+link2 are tested; host1 is not involved.
If performance increases a lot then host1/link1 is the bottleneck.
If performance increases on both tests 1) and 2), I have no clue to explain that.
Same thing if they both decrease!
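
Tests 1) and 2) are the same command with the same server on both
sides. The following is a minimal sketch, assuming a test account
already exists on the server under test; the function name self_sync
and the IMAPSYNC override variable are hypothetical, and only options
already used in this FAQ appear in it.

```shell
#!/bin/sh
# self_sync HOST USER PASSWORD TEST_USER TEST_PASSWORD [options...]
# (hypothetical helper) Syncs USER to TEST_USER on the same HOST, so
# that only this host and its link are exercised.
self_sync() {
    host=$1 user=$2 password=$3 test_user=$4 test_password=$5
    shift 5
    ${IMAPSYNC:-imapsync} \
        --host1 "$host" --user1 "$user" --password1 "$password" \
        --host2 "$host" --user2 "$test_user" --password2 "$test_password" \
        "$@"
}
```

Usage: run it once with host1 values and once with host2 values, then
compare the transfer rates printed by each run with those of a normal
host1 to host2 sync.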

R2. Isolating and overcoming bottlenecks.
In any process involving several mechanisms, among all the elements taking
part in the process, there is always a bottleneck. No one knows in
advance what the first bottleneck is. The first bottleneck has to be
determined by measurements, not by guesses. Once this first
bottleneck is known and overcome, the next bottleneck has to be
determined and overcome too, if needed. Repeat the process of looking
for the next bottleneck and eliminating it until you estimate that the
transfer rates, money costs, time spent on this, and final dates
are good enough to proceed with the whole huge migration.

Possible bottlenecks:
- Throttles.
  IMAP servers have artificial limits.
  For example Gmail, Office365, Exchange have throttle limits.
- Bandwidth.
  Usually available bandwidth is NOT a bottleneck.
  However, it can be a bottleneck on small Internet connections.
  Imapsync downloads messages from host1 and uploads messages to host2,
  consider this in case the connection is asymmetric.
- I/O on disks.
  I/O is a classic bottleneck, almost always forgotten.
  Unlike CPU and RAM, Input/Output performance doesn't improve
  very much as time goes on so it's often a bottleneck.
  To measure and overcome a disk I/O bottleneck, you usually need
  direct access to host1 and host2.
  An I/O bottleneck where imapsync runs is possible if
  --usecache or --useuid is used or with very big messages.
- RAM.
  On all sides, check that your systems don't swap to disk,
  because swapping memory to disk decreases performance by
  a factor of 20, at least.
- CPU.
  100% CPU during a whole transfer means the system is busy.
  Usually CPU is not a problem with imapsync but it can be a problem
  with one of the imap servers.
  Most often CPU is not the real bottleneck, I/O is.
Other possible bottlenecks:
- Number of hosts available to run imapsync processes.
- Imapsync itself.
- Error management.
- MX domains, DNS.
- Money.
- Time.
- Bad luck.
- ...
=======================================================================
Q. Can I run more instances of imapsync in parallel on a Windows host?
R. Yes!
Q. Any performance issue?
R. You have to try and check the transfer rates, sum them up to
have a single numeric criterion.
There is always a limit, depending on the remote imap servers
and the host running imapsync.
CPU, memory, Inputs/Outputs are the classic bottlenecks;
the worst bottleneck is the one that sets the limit.
examples/sync_loop_windows.bat says:
...
REM ==== Parallel executions ====
REM If you want to do parallel runs of imapsync then this current script is a good start.
REM Just copy it several times and replace, on each copy, the csvfile variable value.
REM Instead of SET csvfile=file.txt write for example
REM SET csvfile=file01.txt in the first copy
REM then also
REM SET csvfile=file02.txt in the second copy etc.
REM Of course you also have to split the data contained in file.txt
REM into file01.txt file02.txt etc.
REM After that, just double-click on each batch file to launch each process
=======================================================================
Q. I run multiple imapsync applications at the same time then get a
warning "imapsync.pid already exists, overwriting it".
Is this a potential problem when trying to sync multiple
IMAP accounts in parallel?
R1. No issue with the file imapsync.pid if you don't use its content
by yourself.
This file can help you to manage multiple runs by sending signals
to the processes (sigterm or sigkill) using their PID.
Each run can have its own pid file with --pidfile option.
The file imapsync.pid contains the PID of the current imapsync process.
This file is removed at the end of a normal run.
You can safely ignore the warning if you don't use imapsync.pid file
to manage imapsync processes.
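
If you do want to manage parallel runs through their PID, each run
can be given its own file with --pidfile. The following is a minimal
sketch under stated assumptions: the function names start_sync and
stop_sync and the IMAPSYNC override variable are hypothetical, and
only the first line of the pid file is read, on the assumption that
it holds the PID.

```shell
#!/bin/sh
# start_sync PIDFILE imapsync-arguments...   (hypothetical helper)
# Runs one sync that records its PID in its own PIDFILE.
start_sync() {
    pidfile=$1
    shift
    ${IMAPSYNC:-imapsync} --pidfile "$pidfile" "$@"
}

# stop_sync PIDFILE   (hypothetical helper)
# Sends SIGTERM to the process recorded in PIDFILE.
# Returns 1 when PIDFILE is missing or unreadable.
stop_sync() {
    [ -r "$1" ] || return 1
    # Assumption: the first line of the file is the PID.
    kill -TERM "$(head -n 1 "$1")"
}
```

Usage: start_sync user001.pid --host1 ... --user1 ... &
then later: stop_sync user001.pid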
=======================================================================
=======================================================================