#!/bin/cat $Id: FAQ.Massive.txt,v 1.8 2016/02/07 17:21:40 gilles Exp gilles $ This documentation is also at http://imapsync.lamiral.info/#doc ============================================== Imapsync tips for massive/bulk migrations. ============================================== Questions answered here are: Q. I need to migrate hundred accounts, how can I do? Q. I have to migrate 500k users using 400 TB of disk space. How do I proceed? Q. How to determine what is the bottleneck in my current imapsync process? Q. Can I run more instances of imapsync in parallel on a Windows host? Q. I run multiple imapsync applications at the same time then get a warning "imapsync.pid already exists, overwriting it". Is this a potential problem when trying to sync multiple IMAP account in parallel? ======================================================================= Q. I need to migrate hundred accounts, how can I do? R. If you have many mailboxes to migrate think about a little script program. Write a file called file.txt (for example) containing hosts users and passwords on both sides. The separator used in this example is ";" The file.txt file contains for example: host001_1;user001_1;password001_1;host001_2;user001_2;password001_2; host002_1;user002_1;password002_1;host002_2;user002_2;password002_2; host003_1;user003_1;password003_1;host003_2;user003_2;password003_2; host004_1;user004_1;password004_1;host004_2;user004_2;password004_2; etc. On Unix the shell script can be: #!/bin/sh { while IFS=';' read h1 u1 p1 h2 u2 p2 fake do imapsync --host1 "$h1" --user1 "$u1" --password1 "$p1" \ --host2 "$h2" --user2 "$u2" --password2 "$p2" done } < file.txt Here is a complete Unix example ready to use: http://imapsync.lamiral.info/examples/sync_loop_unix.sh On Windows the batch script can be: CD /D %~dp0 SET csvfile=file.txt FOR /F "tokens=1,2,3,4,5,6 delims=; eol=#" %%G IN (%csvfile%) DO ( imapsync ^ --host1 %%G --user1 %%H --password1 %%I ^ --host2 %%J --user2 %%K --password2 %%L ... ) The final ... can be replaced by nothing or any supplementary imapsync option. Here is a complete Windows example nearly ready to use: http://imapsync.lamiral.info/examples/sync_loop_windows.bat ======================================================================= Q. I have to migrate 500k users using 400 TB of disk space. How do I proceed? R. Solution to this issue is two words: parallelism and measurements. Since all 500k mailboxes are independent against each other, they can be processed independently. 500k on 400TB is 800 MB per account on average. On any process involving several mechanisms there is always a bottleneck among all elements taking part on the process. No one knows in advance what is the first bottleneck. The first bottleneck has to be determined, by measurements, not by guesses. Once this first bottleneck is known and overcome then the next bottleneck has to be determined and overcome too, if needed. Repeat the process of looking for the next bottleneck and its resolution until you estimate the transfer rates, money costs and final dates are good enough to proceed the whole 500k/400TB migration. Possible bottlenecks: - IMAP servers have artificial limits. For example Gmail and Office365 have throttle limits. - Bandwidth, on any side, especially on small Internet connexions. But usually bandwidth is not a bottleneck. - Memory, on any side. Monitor your system doesn't swap on disk. - CPU, on any side. When measuring that CPU is always 100% during a transfer then it's useless to add imapsync processus on that host. - I/O on disks. A classical one always forgotten. Unlike CPU and RAM Input/Output performances don't improve very much as time goes on. - Number of hosts available to run imapsync processes. - Imapsync itself. - Errors management. - MX domains, DNS. - Money. - Time. - Bad luck. - ... ======================================================================= Q. How to determine what is the bottleneck in my current imapsync process? R. Divide and conquer. In order to detect whether host1/link1 is the bottleneck or host2/link2, we have several tests to explore: 1) run a sync from host1 to host1, with a host1 test account as destination. This way, only host1+link1 are tested. host2 is not concerned. If performances increase a lot then host2/link2 is the bottleneck. 2) run a sync from host2 to host2, with a host2 test account as destination. This way, only host2+link2 are tested. host1 is not concerned. If performances increase a lot then host1/link1 is the bottleneck. If performances increase on both tests 1) and 2), I have no clue to explain that. Same thing if they both decrease! ======================================================================= Q. Can I run more instances of imapsync in parallel on a Windows host? R. Yes! Q. Any performance issue? You have to try and check the transfer rates, sum them up to have a uniq numeric criteria. There is always a limit, depending on remote imap servers and the one running imapsync; CPU, memory, Inputs/Outputs are the classical bottlenecks, the worst bottleneck is the winner that sets the limit. examples/sync_loop_windows.bat says ... REM ==== Parallel executions ==== REM If you want to do parallel runs of imapsync then this current script is a good start. REM Just copy it several times and replace, on each copy, the csvfile variable value. REM Instead of SET csvfile=file.txt write for example REM SET csvfile=file01.txt in the first copy REM then also REM SET csvfile=file02.txt in the second copy etc. REM Of course you also have to split the data contained in file.txt REM into file01.txt file02.txt etc. REM After that, just double-click on each batch file to launch each process ======================================================================= Q. I run multiple imapsync applications at the same time then get a warning "imapsync.pid already exists, overwriting it". Is this a potential problem when trying to sync multiple IMAP account in parallel? R1. No issue with the file imapsync.pid if you don't use its content by yourself. This file can help you to manage multiple runs by sending signals to the processes (sigterm or sigkill) using their PID. Each run can have its own pid file with --pidfile option. The file imapsync.pid contains the PID of the current imapsync process. This file is removed at the end of a normal run. You can safely ignore the warning if you don't use imapsync.pid file to manage imapsync processes.