小内存虚拟机中使用MongoDB数据库

目录

在1核1GB的阿里云云虚拟机中使用MongoDB数据库

问题

最近我在阿里云虚拟机上运行的MongoDB服务器经常崩溃。从 systemctl status 的输出看,mongod 进程被 kill 掉。

# systemctl status mongod.service
● mongod.service - SYSV: Mongo is a scalable, document-oriented database.
   Loaded: loaded (/etc/rc.d/init.d/mongod)
   Active: failed (Result: signal) since Thu 2016-07-28 15:56:58 CST; 29min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 2545 ExecStop=/etc/rc.d/init.d/mongod stop (code=exited, status=0/SUCCESS)
  Process: 2322 ExecStart=/etc/rc.d/init.d/mongod start (code=exited, status=0/SUCCESS)
 Main PID: 2335 (code=killed, signal=KILL)
Jul 28 13:59:45 iZ25qo7yf0oZ systemd[1]: Starting SYSV: Mongo is a scalable, document-oriented database....
Jul 28 13:59:45 iZ25qo7yf0oZ runuser[2331]: pam_unix(runuser:session): session opened for user mongod by (uid=0)
Jul 28 13:59:45 iZ25qo7yf0oZ mongod[2322]: Starting mongod: [  OK  ]
Jul 28 13:59:45 iZ25qo7yf0oZ systemd[1]: Started SYSV: Mongo is a scalable, document-oriented database..
Jul 28 15:56:57 iZ25qo7yf0oZ systemd[1]: mongod.service: main process exited, code=killed, status=9/KILL
Jul 28 15:56:58 iZ25qo7yf0oZ mongod[2545]: Stopping mongod: [  OK  ]
Jul 28 15:56:58 iZ25qo7yf0oZ systemd[1]: Unit mongod.service entered failed state.
Jul 28 15:56:58 iZ25qo7yf0oZ systemd[1]: mongod.service failed.

mongod.service 的日志显示进程的详细情况。

Jul 27 16:40:47 iZ25qo7yf0oZ systemd[1]: Starting SYSV: Mongo is a scalable, document-oriented database....
Jul 27 16:40:47 iZ25qo7yf0oZ runuser[31290]: pam_unix(runuser:session): session opened for user mongod by (uid=0)
Jul 27 16:40:49 iZ25qo7yf0oZ runuser[31290]: pam_unix(runuser:session): session closed for user mongod
Jul 27 16:40:49 iZ25qo7yf0oZ mongod[31283]: Starting mongod: [  OK  ]
Jul 27 16:40:49 iZ25qo7yf0oZ systemd[1]: Started SYSV: Mongo is a scalable, document-oriented database..
Jul 28 06:49:40 iZ25qo7yf0oZ systemd[1]: Started SYSV: Mongo is a scalable, document-oriented database..
Jul 28 06:50:21 iZ25qo7yf0oZ systemd[1]: mongod.service: main process exited, code=killed, status=9/KILL
Jul 28 06:50:21 iZ25qo7yf0oZ mongod[32606]: Stopping mongod: [  OK  ]
Jul 28 06:50:21 iZ25qo7yf0oZ systemd[1]: Unit mongod.service entered failed state.
Jul 28 06:50:21 iZ25qo7yf0oZ systemd[1]: mongod.service failed.

可以看到 mongod 运行不到一天就被 kill 掉了。
再查看 MongoDB 的日志 mongod.log,省略部分信息。

2016-07-27T16:40:48.373+0800 I CONTROL  [main] ***** SERVER RESTARTED *****
...
2016-07-27T16:40:49.733+0800 I NETWORK  [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2016-07-27T16:40:49.761+0800 I NETWORK  [initandlisten] waiting for connections on port 27017
...
2016-07-27T17:33:55.250+0800 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 2760, after asserts: 3520, after connections: 5380, after extra_info: 24790, after globalLock: 29120, after locks: 40880, after network: 43390, after opcounters: 45600, after opcountersRepl: 47070, after storageEngine: 55320, after tcmalloc: 117790, after wiredTiger: 170830, at end: 171030 }
...
2016-07-28T03:53:19.741+0800 I COMMAND  [ftdc] serverStatus was very slow: { after basic: 3260, after asserts: 5580, after connections: 9750, after extra_info: 35280, after globalLock: 41220, after locks: 50910, after network: 56970, after opcounters: 69950, after opcountersRepl: 75230, after storageEngine: 111160, after tcmalloc: 161620, after wiredTiger: 162220, at end: 162450 }
2016-07-28T03:53:23.442+0800 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:38308 #5 (1 connection now open)

上面日志没有除了服务响应慢外,没有出错信息。找不到mongod服务为啥会崩溃。
日志中都没记录,说明可能是系统将 MongDB 杀掉了。
查看 systemd 的日志 /var/log/message,找到下面的条目

Jul 28 06:50:20 iZ25qo7yf0oZ kernel: mongod invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
...
Jul 28 06:50:20 iZ25qo7yf0oZ kernel: Out of memory: Kill process 31294 (mongod) score 518 or sacrifice child
Jul 28 06:50:20 iZ25qo7yf0oZ kernel: Killed process 31294 (mongod) total-vm:838876kB, anon-rss:525268kB, file-rss:0kB
Jul 28 06:50:21 iZ25qo7yf0oZ systemd: mongod.service: main process exited, code=killed, status=9/KILL
Jul 28 06:50:21 iZ25qo7yf0oZ mongod: Stopping mongod: [  OK  ]
Jul 28 06:50:21 iZ25qo7yf0oZ systemd: Unit mongod.service entered failed state.
Jul 28 06:50:21 iZ25qo7yf0oZ systemd: mongod.service failed.

可以确定,因为占用内存过高,系统将 MongoDB 服务杀掉(oom-killer)。

原因探究

不知道为什么 MongoDB 会占用这么高的内存,mongostat 的输出

insert query update delete getmore command % dirty % used flushes  vsize    res qr|qw ar|aw netIn netOut conn                      time
    *0    *0     *0     *0       0     1|0     0.0   47.7       0 879.0M 509.0M   0|0   0|0   79b    18k    2 2016-07-29T16:40:22+08:00

top 中 mongod 的输出显示,mongod 占了近一半的内存。

PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 3127 mongod    20   0  901024 521996   2516 S  0.3 51.4   4:23.15 mongod 

需要增加内存容量。但阿里云虚拟机的内存太贵,可以尝试增加交换空间。

解决办法

阿里云虚拟机默认没有设置交换空间。

# free -h
              total        used        free      shared  buff/cache   available
Mem:           991M        701M         76M        6.9M        213M        134M

最简便的方式之一就是创建一个交换文件,将其设为swap。
创建512MB的空白文件

dd if=/dev/zero of=/swapfile1 bs=1024k count=512

设置该文件为交换文件

mkswap /swapfile1

启动交换分区

swapon /swapfile1

在系统启动时自动加载,在/etc/fstab中加入

/swapfile1 swap swap defaults 0 0

这样就加入一个交换分区。

total        used        free      shared  buff/cache   available
Mem:           991M        701M         76M        6.9M        213M        134M
Swap:          511M        179M        332M

后续

后面将持续观察 mongod.service 的运行情况,验证该方法是否有效。
2016.11.10:经过3个月实际运行测试,上述方法有效。添加交换分区后,mongod.service 运行正常,不再异常退出。