#!/bin/cat
$Id: FAQ.Massive.txt,v 1.8 2016/02/07 17:21:40 gilles Exp gilles $

This documentation is also at http://imapsync.lamiral.info/#doc

==============================================
Imapsync tips for massive/bulk migrations.
==============================================

Questions answered here are:

Q. I need to migrate hundreds of accounts. How can I do it?

Q. I have to migrate 500k users using 400 TB of disk space.
How do I proceed?

Q. How do I determine the bottleneck in my current imapsync process?

Q. Can I run several instances of imapsync in parallel on a Windows host?

Q. I run multiple imapsync processes at the same time and get the
warning "imapsync.pid already exists, overwriting it".
Is this a potential problem when trying to sync multiple
IMAP accounts in parallel?


=======================================================================
Q. I need to migrate hundreds of accounts. How can I do it?

R. If you have many mailboxes to migrate, consider writing a small
script. Create a file called file.txt (for example)
containing the hosts, users and passwords of both sides.
The separator used in this example is ";".

The file.txt file contains, for example:

    host001_1;user001_1;password001_1;host001_2;user001_2;password001_2;
    host002_1;user002_1;password002_1;host002_2;user002_2;password002_2;
    host003_1;user003_1;password003_1;host003_2;user003_2;password003_2;
    host004_1;user004_1;password004_1;host004_2;user004_2;password004_2;
    etc.

On Unix the shell script can be:

    #!/bin/sh
    # read -r keeps backslashes in passwords intact;
    # "fake" absorbs whatever follows the last ";".
    { while IFS=';' read -r h1 u1 p1 h2 u2 p2 fake
        do
            imapsync --host1 "$h1" --user1 "$u1" --password1 "$p1" \
                     --host2 "$h2" --user2 "$u2" --password2 "$p2"
        done
    } < file.txt


Here is a complete Unix example ready to use:
http://imapsync.lamiral.info/examples/sync_loop_unix.sh


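The loop above can be extended to skip comment lines and to count
failed accounts. The following is only a sketch under assumptions, not
the content of sync_loop_unix.sh: the function name sync_accounts and
the IMAPSYNC variable are hypothetical, and it assumes that imapsync
returns a non-zero exit status when a sync fails. Only the imapsync
options already shown above are used.

```shell
#!/bin/sh
# Hypothetical helper: the command to run can be overridden with IMAPSYNC.
IMAPSYNC=${IMAPSYNC:-imapsync}

# usage: sync_accounts file.txt
sync_accounts() {
    csvfile=$1
    failed=0
    while IFS=';' read -r h1 u1 p1 h2 u2 p2 fake
    do
        # Skip blank lines and lines starting with "#".
        case $h1 in
            ''|'#'*) continue ;;
        esac
        # stdin comes from /dev/null so that the command cannot
        # consume the remaining lines of the account file.
        if ! "$IMAPSYNC" --host1 "$h1" --user1 "$u1" --password1 "$p1" \
                         --host2 "$h2" --user2 "$u2" --password2 "$p2" \
                         < /dev/null
        then
            echo "FAILED: $u1 -> $u2" >&2
            failed=$((failed + 1))
        fi
    done < "$csvfile"
    echo "accounts in failure: $failed"
    [ "$failed" -eq 0 ]
}
```

Calling "sync_accounts file.txt" processes every account in turn and
returns a non-zero status if at least one account failed, which makes
it easy to rerun only when needed.
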
On Windows the batch script can be:

    CD /D %~dp0
    SET csvfile=file.txt
    FOR /F "tokens=1,2,3,4,5,6 delims=; eol=#" %%G IN (%csvfile%) DO (
      imapsync ^
      --host1 %%G --user1 %%H --password1 %%I ^
      --host2 %%J --user2 %%K --password2 %%L ...
    )

The final ... can be removed or replaced by any additional imapsync options.

Here is a complete Windows example nearly ready to use:
http://imapsync.lamiral.info/examples/sync_loop_windows.bat


=======================================================================
Q. I have to migrate 500k users using 400 TB of disk space.
How do I proceed?

R. The solution comes down to two words: parallelism and measurements.
Since the 500k mailboxes are independent of each other,
they can be processed independently.

400 TB for 500k accounts is 800 MB per account on average.

In any process involving several mechanisms there is always a
bottleneck among the elements taking part in the process. No one knows
in advance what the first bottleneck is. The first bottleneck has to
be determined by measurements, not by guesses. Once this first
bottleneck is known and overcome, the next bottleneck has to be
determined and overcome too, if needed. Repeat this process of finding
and resolving the next bottleneck until you estimate that the
transfer rates, costs and completion dates are good enough to proceed
with the whole 500k/400TB migration.

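The arithmetic behind such estimates can be sketched in a few lines of
shell. The function names below are hypothetical, decimal units are
assumed (1 TB = 1,000,000 MB), and the rates used in the usage note are
made-up inputs, not measurements: only your own measured aggregate
rate gives a meaningful date.

```shell
#!/bin/sh
# Average mailbox size in MB.
# usage: average_mb TOTAL_TB ACCOUNTS
average_mb() {
    awk -v tb="$1" -v n="$2" 'BEGIN { printf "%.0f\n", tb * 1000000 / n }'
}

# Days needed to move TOTAL_TB at a sustained aggregate rate,
# summed over all parallel imapsync runs, given in MB/s.
# usage: transfer_days TOTAL_TB RATE_MB_PER_S
transfer_days() {
    awk -v tb="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", tb * 1000000 / rate / 86400 }'
}
```

For example "average_mb 400 500000" prints 800, and with an assumed
aggregate rate of 100 MB/s "transfer_days 400 100" prints 46.3, that
is about a month and a half of uninterrupted transfer.
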
Possible bottlenecks:

- IMAP servers have artificial limits. For example, Gmail and Office365
  enforce throttling limits.

- Bandwidth, on any side, especially on small Internet connections. But
  usually bandwidth is not the bottleneck.

- Memory, on any side. Monitor your system to make sure it doesn't
  swap to disk.

- CPU, on any side. If measurements show that the CPU is constantly at
  100% during a transfer, then adding more imapsync processes on that
  host is useless.

- I/O on disks. A classic one, always forgotten. Unlike CPU and RAM,
  input/output performance doesn't improve very much as time goes on.

- Number of hosts available to run imapsync processes.
- Imapsync itself.
- Error management.
- MX domains, DNS.
- Money.
- Time.
- Bad luck.
- ...

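For the memory item in the list above, one possible way to check that
a Linux host running imapsync is not swapping is to read
/proc/meminfo. This is a Linux-only sketch; the function name
swap_used_kb is hypothetical, and other systems need their own tools.

```shell
#!/bin/sh
# Print the amount of swap currently in use, in kB.
# Reads /proc/meminfo by default (Linux only); another file in the
# same format can be given as first argument.
# usage: swap_used_kb [MEMINFO_FILE]
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free  = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}
```

A value that keeps growing while imapsync processes are added suggests
that memory has become the bottleneck on that host.
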
=======================================================================
Q. How do I determine the bottleneck in my current imapsync process?

R. Divide and conquer.

To detect whether host1/link1 or host2/link2 is the bottleneck,
there are several tests to run:

1) Run a sync from host1 to host1, with a host1 test account as destination.
This way, only host1+link1 are tested; host2 is not involved.
If performance increases a lot then host2/link2 is the bottleneck.

2) Run a sync from host2 to host2, with a host2 test account as destination.
This way, only host2+link2 are tested; host1 is not involved.
If performance increases a lot then host1/link1 is the bottleneck.

If performance increases in both tests 1) and 2), I have no clue how to explain that.
Same thing if it decreases in both!


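Tests 1) and 2) are easier to compare with a number for each run. The
sketch below simply times a command in whole seconds. The function
name elapsed_seconds is hypothetical, and it relies on "date +%s",
which is available on GNU and BSD systems but is not guaranteed
everywhere.

```shell
#!/bin/sh
# Run a command and print how many whole seconds it took.
# The command's own output is discarded here; redirect it to a log
# file instead if you want to keep it.
# usage: elapsed_seconds COMMAND [ARGS...]
elapsed_seconds() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical usage for test 1), host1 to host1
# (all values are placeholders):
#   elapsed_seconds imapsync \
#       --host1 host1 --user1 user_src  --password1 secret1 \
#       --host2 host1 --user2 user_test --password2 secret2
```

To keep the comparison fair, sync the same source mailbox each time
and start with an empty test account as destination, so that every
run has the same amount of data to transfer.
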
=======================================================================
Q. Can I run several instances of imapsync in parallel on a Windows host?

R. Yes!

Q. Any performance issue?

R. You have to try and check the transfer rates, then sum them up to
have a single numeric criterion.
There is always a limit, depending on the remote IMAP servers
and on the host running imapsync.
CPU, memory and input/output are the classic bottlenecks;
the worst bottleneck is the one that sets the limit.

examples/sync_loop_windows.bat says:

    ...
    REM ==== Parallel executions ====
    REM If you want to do parallel runs of imapsync then this current script is a good start.
    REM Just copy it several times and replace, on each copy, the csvfile variable value.
    REM Instead of SET csvfile=file.txt write for example
    REM SET csvfile=file01.txt in the first copy
    REM then also
    REM SET csvfile=file02.txt in the second copy etc.
    REM Of course you also have to split the data contained in file.txt
    REM into file01.txt file02.txt etc.
    REM After that, just double-click on each batch file to launch each process


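The same split-then-run-in-parallel idea can be sketched for a Unix
host. This is an illustration under assumptions, not part of the
imapsync examples: the function names split_accounts and
run_in_parallel are hypothetical, and the command given to
run_in_parallel is assumed to be a script of yours that takes the
account file as its first argument.

```shell
#!/bin/sh
# Split an account list into N files named file01.txt, file02.txt, ...
# in the current directory, distributing the lines round-robin.
# Blank lines and lines starting with "#" are dropped.
# usage: split_accounts file.txt N
split_accounts() {
    awk -v n="$2" '
        /^[ \t]*(#|$)/ { next }
        { out = sprintf("file%02d.txt", (count++ % n) + 1); print > out }
    ' "$1"
}

# Run one command per account file in the background, then wait for
# all of them to finish. Each run logs to FILE.log.
# usage: run_in_parallel COMMAND file01.txt file02.txt ...
run_in_parallel() {
    cmd=$1
    shift
    for f in "$@"
    do
        "$cmd" "$f" > "$f.log" 2>&1 &
    done
    wait
}
```

For example "split_accounts file.txt 4" followed by
"run_in_parallel ./my_sync_loop.sh file0[1-4].txt" starts four loops
at once, where my_sync_loop.sh is a placeholder name. Increase the
number of files only as long as the summed transfer rate keeps
growing.
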
=======================================================================
Q. I run multiple imapsync processes at the same time and get the
warning "imapsync.pid already exists, overwriting it".
Is this a potential problem when trying to sync multiple
IMAP accounts in parallel?

R. There is no issue with the file imapsync.pid if you don't use its
content yourself.

This file can help you manage multiple runs by sending signals
(SIGTERM or SIGKILL) to the processes using their PID.
Each run can have its own PID file with the --pidfile option.
The file imapsync.pid contains the PID of the current imapsync process.
This file is removed at the end of a normal run.
You can safely ignore the warning if you don't use the imapsync.pid file
to manage imapsync processes.

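If you do want to manage parallel runs through their PID files, a
small helper can read the PID and send SIGTERM. This is a sketch: the
function name stop_imapsync and the file names in the usage note are
hypothetical, and it assumes the PID is on the first line of the file.

```shell
#!/bin/sh
# Send SIGTERM to the process whose PID is stored in a PID file.
# usage: stop_imapsync PIDFILE
stop_imapsync() {
    pidfile=$1
    if [ ! -r "$pidfile" ]
    then
        echo "no readable pid file: $pidfile" >&2
        return 1
    fi
    pid=$(head -n 1 "$pidfile")
    # Refuse anything that is not a plain number.
    case $pid in
        ''|*[!0-9]*)
            echo "unexpected content in $pidfile" >&2
            return 1
            ;;
    esac
    kill -TERM "$pid"
}
```

With one PID file per run, for example "--pidfile run01.pid" for the
first run and "--pidfile run02.pid" for the second, a call such as
"stop_imapsync run02.pid" stops the second run only.
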