Tuesday, June 19, 2012

Application Server High CPU or Low Threads - Support AdminTeam Actions

Occasionally WebLogic/JEE application servers run into high CPU usage (above 70%, or whatever threshold your application defines). Below is a list of actions and investigations that the support team needs to carry out when that happens.

  1. Run prstat -L -p <pid> and capture the output into a file if required.
  2. Run kill -3 <pid> to take a thread dump (TD). The dump is written to the JVM's standard output, so save it from the server's stdout log into a file. On WebLogic 10 you can also take the thread dump from the Admin console.
  3. Run pstack <pid> and save the output into a file.
  4. Repeat steps 1 to 3 four more times at 5-second intervals. At the end, you will have 5 sets of prstat, TD and pstack files.
  5. Run the TDs through Samurai or TDA to identify deadlocks and stuck threads. These tools show a graphical view of what each thread was doing at the time each thread dump was taken.
    In the Samurai view, red blocks are stuck threads, and green blocks with < in them are threads that were at the same line as in the previous thread dump. Both need to be analyzed by clicking on the link in blue and seeing which line the thread is at. The green ones with < might not be stuck, but staying at the same line across dumps indicates a potential problem.

    [Image: Samurai thread dump view]

  6. Then analyze the prstat and pstack output for issues with individual threads.
  7. In the prstat output, the LWP number is the value after the slash in the rightmost column (PROCESS/LWPID). Look for LWPs with high CPU and a long accumulated TIME.
  8. Convert the LWP number to hexadecimal as follows:
    echo <lwp> | nawk '{printf("%d=0x%x\n",$1,$1)}'
  9. In the thread dump, search for "nid=" followed by that hexadecimal value. The matching threads could be the cause of the problem in the application.
  10. In the pstack output, find the same LWP numbers and confirm that the native stacks agree with what the Java traces show.
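Steps 1 to 4 can be scripted so that the five sets are taken consistently. The following is a minimal sketch, not a definitive tool: the function name capture_diagnostics, the output directory and the file names are assumptions made for illustration. It relies on prstat and pstack as found on Solaris. Because kill -3 makes the JVM print the thread dump on its own standard output, the dumps still have to be collected from the server's stdout log afterwards.

```shell
#!/bin/sh
# Sketch: take several sets of prstat, thread dump and pstack for one JVM.
# Function name and file layout are illustrative assumptions.

# capture_diagnostics PID OUTDIR [ROUNDS] [INTERVAL_SECONDS]
capture_diagnostics() {
    pid=$1
    outdir=$2
    rounds=${3:-5}      # five sets in total, as in step 4
    interval=${4:-5}    # seconds between sets

    mkdir -p "$outdir" || return 1

    i=1
    while [ "$i" -le "$rounds" ]; do
        # One per-LWP snapshot (interval 1 second, count 1).
        prstat -L -p "$pid" 1 1 > "$outdir/prstat.$i.txt" 2>&1

        # SIGQUIT asks the JVM for a thread dump. The dump goes to the
        # JVM's stdout (the server's stdout log), not to this script.
        kill -3 "$pid"

        # Native stacks for every LWP in the process.
        pstack "$pid" > "$outdir/pstack.$i.txt" 2>&1

        # Pause between sets, but not after the last one.
        if [ "$i" -lt "$rounds" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

A call such as capture_diagnostics 20677 /var/tmp/highcpu (the directory is a hypothetical choice) leaves prstat.1.txt to prstat.5.txt and pstack.1.txt to pstack.5.txt in that directory.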

Below is an example. Note that it is indicative only and not a real instance of a slow-performing server.

1. prstat -a shows the java process (PID 20677) consuming 42.6% CPU with 180 LWPs (lightweight processes) running.


  PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME   CPU PROCESS/NLWP
20677 wlsuser  1116M  526M sleep    1    0   2:20:54 42.6% java/180
11922 wlsuser   232M  110M sleep    1    0   3:41:59  0.1% java/157
29861 noaccess  198M   72M sleep    8    0  32:25:49  0.0% java/55
18133 root      630M   55M sleep   27    0  12:25:57  0.0% java/60
12107 wlsuser   927M  288M sleep    1    0   3:51:28  0.0% java/187
29654 wlsuser  1270M  206M sleep    1    0   0:54:26  0.0% java/158
 2633 wlsuser   899M  217M sleep    1    0   7:06:10  0.0% java/382
 7431 wlsuser   880M  253M sleep    5    0   1:44:24  0.0% java/186
 1058 wlsuser   881M  241M sleep    1    0   3:11:35  0.0% java/182
 9864 wlsuser  4336K 3952K cpu21   59    0   0:00:00  0.0% prstat/1
13789 wlsuser  1090M  208M sleep   17    0   0:30:44  0.0% java/210
13147 wlsuser   635M  168M sleep   15    0   7:17:05  0.0% java/180
 7724 root     5728K 4112K sleep   60    0  33:54:37  0.0% vasd/1
18131 root     2272K 1088K sleep   59    0   8:18:11  0.0% wrapper/1
15397 root     2272K 1296K sleep   59    0   3:33:59  0.0% procmon/1

2. prstat -Lp 20677 gives the breakdown of these 180 LWPs.
The top two (LWPs 120 and 123) are each consuming about 10% CPU and have each accumulated more than 15 minutes of CPU time.



  PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME   CPU PROCESS/LWPID
20677 wlsuser  1116M  526M sleep   58    0   0:16:03 10.3% java/120
20677 wlsuser  1116M  526M sleep   24    0   0:15:15 10.1% java/123
20677 wlsuser  1116M  526M sleep    3    0   0:03:36  4.0% java/23
20677 wlsuser  1116M  526M sleep    5    0   0:01:42  2.0% java/21
20677 wlsuser  1116M  526M sleep    9    0   0:03:38  2.0% java/20
20677 wlsuser  1116M  526M sleep    9    0   0:00:38  1.7% java/19
20677 wlsuser  1116M  526M sleep    4    0   0:00:42  0.0% java/18
20677 wlsuser  1116M  526M sleep    4    0   0:00:36  0.0% java/17
20677 wlsuser  1116M  526M sleep    4    0   0:00:34  0.0% java/16
20677 wlsuser  1116M  526M sleep    1    0   0:00:42  0.0% java/15
20677 wlsuser  1116M  526M sleep    4    0   0:00:42  0.0% java/10
20677 wlsuser  1116M  526M sleep    6    0   0:07:43  0.0% java/8
20677 wlsuser  1116M  526M sleep    3    0   0:00:39  0.0% java/7
20677 wlsuser  1116M  526M sleep    1    0   0:00:40  0.0% java/4
20677 wlsuser  1116M  526M sleep    8    0   0:00:39  0.0% java/2
20677 wlsuser  1116M  526M sleep   29    0   0:04:55  0.0% java/121
20677 wlsuser  1116M  526M sleep    1    0   0:11:52  0.0% java/126
20677 wlsuser  1116M  526M sleep    1    0   0:00:05  0.0% java/119
20677 wlsuser  1116M  526M sleep   43    0   0:00:27  0.0% java/122
20677 wlsuser  1116M  526M sleep   28    0   0:00:03  0.0% java/118
20677 wlsuser  1116M  526M sleep    1    0   0:12:43  0.0% java/125


3. Run the command to convert the LWP numbers from decimal to hex:
echo 120 | nawk '{printf("%d=0x%x\n",$1,$1)}'
120=0x78
echo 123 | nawk '{printf("%d=0x%x\n",$1,$1)}'
123=0x7b
Now we need to investigate what these two threads are doing.
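Picking the busy LWPs out of a saved prstat -L listing and converting them to hex can be done in one pass. This is a minimal sketch under stated assumptions: the function name hot_lwps and the default threshold of 5% are not from the original procedure, and on Solaris the awk call may need to be nawk.

```shell
# hot_lwps [MIN_CPU_PERCENT] < saved-prstat-output
# Prints "LWP=0xHEX CPU%" for every LWP at or above the threshold.
# On Solaris use nawk (or /usr/xpg4/bin/awk) instead of awk.
hot_lwps() {
    awk -v min="${1:-5}" '
        # Data rows start with a numeric PID and have 10 columns.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            cpu = $9
            sub(/%/, "", cpu)            # "10.3%" -> "10.3"
            n = split($10, part, "/")    # "java/120" -> "java", "120"
            if (n == 2 && cpu + 0 >= min + 0)
                printf("%d=0x%x %s%%\n", part[2], part[2], cpu)
        }
    '
}

# Example with three rows taken from the prstat -Lp listing in step 2.
hot_lwps 5 <<'EOF'
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID
 20677 wlsuser  1116M  526M sleep   58    0   0:16:03 10.3% java/120
 20677 wlsuser  1116M  526M sleep   24    0   0:15:15 10.1% java/123
 20677 wlsuser  1116M  526M sleep    3    0   0:03:36 4.0% java/23
EOF
# prints:
# 120=0x78 10.3%
# 123=0x7b 10.1%
```

The hex values are the ones to search for after "nid=" in the thread dump.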

4. Search the thread dump for nid=0x78. It shows:

"ExecuteThread: '83' for queue: 'wlr3Queue'" daemon prio=5 tid=0x00a886f0 nid=0x78 waiting for monitor entry [c3afe000..c3affc28]
    at org.apache.log4j.Category.callAppenders(Category.java:204)
    - waiting to lock <0xdc8e9080> (a org.apache.log4j.spi.RootLogger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.debug(Category.java:260)
    ...
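Rather than scrolling through a large dump, the thread with a given nid can be pulled out with awk. This is a minimal sketch: the function name thread_by_nid is hypothetical, and it assumes the usual dump layout in which thread entries are separated by blank lines.

```shell
# thread_by_nid HEXNID < threaddump.txt
# Prints the full entry of the thread whose header contains nid=HEXNID.
# On Solaris use nawk instead of awk.
thread_by_nid() {
    # The trailing space stops nid=0x78 from also matching nid=0x780.
    awk -v want="nid=$1 " '
        BEGIN { RS = ""; ORS = "\n\n" }   # paragraph mode: one thread per record
        index($0, want) > 0 { print }
    '
}
```

For the example above, thread_by_nid 0x78 < td1.txt would print the ExecuteThread '83' entry (td1.txt being whatever file the dump was saved to).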



5. Running the above TD through Samurai also shows that ExecuteThread 83 has been stuck for a while. It also shows that the same object <0xdc8e9080> is locked by another thread, ExecuteThread 86, which happens to be the one with nid=0x7b:

"ExecuteThread: '86' for queue: 'wlr3Queue'" daemon prio=5 tid=0x00a85730 nid=0x7b runnable [c3efd000..c3effc28]
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:106)
    - locked <0xd8b55a70> (a java.io.BufferedOutputStream)
    at java.io.PrintStream.write(PrintStream.java:258)
    - locked <0xd88512c8> (a java.io.PrintStream)
    at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
    at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(StreamEncoder.java:404)
    at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
    - locked <0xdce03ae8> (a java.io.OutputStreamWriter)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
    at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:58)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:316)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    - locked <0xdc8eeb60> (a org.apache.log4j.ConsoleAppender)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    - locked <0xdc8e9080> (a org.apache.log4j.spi.RootLogger)
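If Samurai or TDA is not at hand, the same waiter-to-holder match can be approximated from the raw dump. This is a rough sketch only: the function name lock_holders is hypothetical, and it simply pairs each "waiting to lock <addr>" line with the thread that shows "locked <addr>" in the same dump.

```shell
# lock_holders < threaddump.txt
# For each "waiting to lock <addr>" line, prints the waiting thread and
# the thread that shows "locked <addr>". On Solaris use nawk.
lock_holders() {
    awk '
        # Returns the text between "<" and ">" on a lock line.
        function addr_of(line,   s) {
            s = line
            sub(/^.*</, "", s)
            sub(/>.*$/, "", s)
            return s
        }
        /^"/ {                      # thread header: remember the quoted name
            name = $0
            sub(/^"/, "", name)
            sub(/".*$/, "", name)
        }
        /- waiting to lock </ { n++; wname[n] = name; waddr[n] = addr_of($0) }
        /- locked </          { holder[addr_of($0)] = name }
        END {
            for (i = 1; i <= n; i++) {
                h = holder[waddr[i]]
                if (h == "") h = "(no holder found in this dump)"
                printf("%s waits for <%s> held by %s\n", wname[i], waddr[i], h)
            }
        }
    '
}
```

Run against the two entries shown in steps 4 and 5, it would report ExecuteThread '83' waiting for <0xdc8e9080> held by ExecuteThread '86'.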









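6. Thus there is contention around log4j writing its output, which needs further investigation: ExecuteThread 86 holds the RootLogger lock while it writes through the ConsoleAppender, and ExecuteThread 83 is blocked waiting for that same lock.

In this case, one issue identified was that log4j was in DEBUG mode, which should not be the case on a production platform. Thus there was unnecessary logging taking place.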
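A quick way to spot verbose logging is to grep the log4j configuration for loggers set to DEBUG. This is a small sketch that assumes a log4j 1.x properties-style file; the function name debug_loggers is hypothetical, and an application configured through log4j.xml would need a different check.

```shell
# debug_loggers FILE
# Lists the lines of a log4j 1.x properties file that set a level to
# DEBUG, TRACE or ALL. On Solaris use /usr/xpg4/bin/grep for -E support.
debug_loggers() {
    grep -n -i -E \
        '^[[:space:]]*log4j\.[^=]*=[[:space:]]*(DEBUG|TRACE|ALL)[[:space:]]*(,|$)' \
        "$1"
}
```

Each reported line is a candidate for raising to INFO or higher on a production server.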



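7. Finally, a look at the pstack output for LWPs 120 and 123 confirms what they were doing: LWP 120 is in lwp_mutex_timedlock (waiting to acquire the monitor), and LWP 123 is in _write, called from Java_java_io_FileOutputStream_writeBytes.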




----------------- lwp# 120 / thread# 120 --------------------
ff341758 lwp_mutex_timedlock (f66e68, 0)
fedbbb60 __1cLOptoRuntimebAcomplete_monitor_locking_C6FpnHoopDesc_pnJBasicLock_pnKJavaThread__v_ (da5a6848, c3aff0b0, a886f0, da5b6980, d6d15e88, 0) + 84
f9c342dc ???????? (da5a6848, c3aff0b0, da5b6980, d6d15e70, 0, c3aff0f8)
fa643500 ???????? (da5c9fc0, 0, da5c9fc0, da5b6980, db3b3af8, 0)
fa5c0e2c ???????? (da5c9fc0, f1c8e7a0, da5b6980, d6d15e70, 0, c3aff0f8)
fa762524 ???????? (da5b65f8, d6d15e70, 0, f9c15ea0, 8, c3aff178)
f9c4eec4 ???????? (da5b65f8, f52b9bf0, f0ab1c94, f9c160d0, 8, c3aff0f8)
f9c05a8c ???????? (c3aff29c, b6, 0, f9c15ea0, 8, c3aff178)
f9c05804 ???????? (c3aff33c, b6, 0, f9c15e50, c, c3aff210)
f9c05804 ???????? (c3aff3cc, b6, 0, f9c15e50, c, c3aff2d8)
f9c05804 ???????? (c3aff464, b6, 0, f9c15e50, c, c3aff358)
f9c05804 ???????? (c3aff4f4, f1a8a5c8, 0, f9c15e50, c, c3aff400)
f9c05a8c ???????? (c3aff58c, f084f340, 0, f9c163d0, c, c3aff490)
f9c059d8 ???????? (c3aff674, b6, 0, f9c163d0, 4, c3aff518)
f9c05804 ???????? (c3aff6fc, b6, 0, f9c15e50, 10, c3aff608)
f9c05804 ???????? (c3aff760, d883f8c8, 0, f9c15e50, c, c3aff690)
f9c4f278 ???????? (d6d15bc0, d883f8c8, d884f5a0, f0ac6f08, 4, c3aff720)
fa0a23a4 ???????? (d884f5a0, d883f8c8, d6d15bc0, f9c15e50, c, c3aff7e8)
f9c30f48 ???????? (d884f5a0, b6, 82bb, f9c16250, 82b9, 0)
f9c05750 ???????? (c3aff8dc, b8, f373459c, f9c15e50, c, c3aff7e8)
f9c05750 ???????? (c3aff98c, b6, 0, f9c16250, c, c3aff868)
f9c05804 ???????? (c3aff9fc, d68129b0, 0, f9c15e50, c, c3aff920)
f9c4947c ???????? (d68129b0, d8808160, f81fc000, f0818618, 8, c3aff9b8)
fa114558 ???????? (d8808160, db3d24f0, f3745c98, f9c15e50, 8, c3affab0)
fa14b064 ???????? (d8808160, f096a2a0, f81fc000, f0818618, d8808160, 9)
f9cd2c38 ???????? (c3affb9c, c3affcf0, f3745c98, f9c15e50, 8, c3affab0)
f9c0010c ???????? (c3affc28, c3affe90, a, f112bc70, 4, c3affb40)
fed5bcf8 __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (c3affe88, c3affcf0, c3affda8, a886f0, a886f0, c3affd00) + 27c
fee4a3e4 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ (ff182000, a88c98, c3affd9c, c3affd98, c3affda8, a886f0) + 164
fee5d5a8 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ (c3affe88, c3affe84, c3affe7c, c3affe74, c3affe6c, a886f0) + 6c
fee6e8f8 __1cMthread_entry6FpnKJavaThread_pnGThread__v_ (a886f0, a886f0, c69ff8, a88c98, 31a08c, fee67ed8) + 128
fee67f00 __1cKJavaThreadDrun6M_v_ (a886f0, 78, 40, 0, 40, 0) + 284
fee643e0 _start (a886f0, c3b00000, 0, 0, fee642ac, 1) + 134
ff3404f4 _lwp_start (0, 0, 0, 0, 0, 0)



----------------- lwp# 123 / thread# 123 --------------------
ff341a08 _write (1, c3afab10, 6d, 0, 0, 22) + c
fed54288 JVM_Write (1, c3afab10, 6d, 6d, c3afab10, daa519f8) + 68
fe85e348 ???????? (c3afab10, c3afcbd0, c3afcbcc, 0, 6d, 22)
fe85e1c8 Java_java_io_FileOutputStream_writeBytes (a88784, c3afcbd0, c3afcbcc, 0, 6d, daa519f8) + 34
fa1e1608 ???????? (d8b55ad0, d8a5f6e8, 0, 6d, 0, c3afce50)
fa1e3dc0 ???????? (d8b55a70, daa726a8, 0, 6d, d351afb0, daa519f8)
f9dfae70 ???????? (d88512c8, daa726a8, 0, 6d, 0, c3afce50)
fa033244 ???????? (daa63ed8, d351afb0, da845a80, 0, d351afb0, daa519f8)
fa5fab54 ???????? (daa63ed8, d353dca0, da735c30, d351af98, 0, c3afce50)
fa599d40 ???????? (da845a80, d351afb0, da845a80, 0, d351afb0, daa519f8)
fa5977b8 ???????? (da845a80, d351afb0, da735c30, d351af98, 0, c3afce50)
fa4945b8 ???????? (2, 0, da845a80, 0, d351afb0, daa519f8)
fa5b98ec ???????? (db112a08, f1c8e7a0, da735c30, d351af98, 0, c3afce50)
fa1a906c ???????? (db146dd8, f42a9fd0, d351af98, 1, f4262f28, f81fc000)
f9c45dc8 ???????? (db146dd8, b7, 0, f9c152a0, 8, c3afcee0)
f9c05804 ???????? (c3afd05c, b6, 0, f9c16118, c, c3afcf60)
f9c05804 ???????? (c3afd0e4, b6, 0, f9c15e98, 10, c3afcff0)
f9c05804 ???????? (c3afd174, b6, 0, f9c15e98, 10, c3afd078)
f9c05804 ???????? (c3afd1dc, d34e93e0, 0, f9c15e98, 10, c3afd108)
f9c63804 ???????? (d34756e0, d34e83d8, 0, f9c, 10, c3afd190)
fa4bd36c ???????? (f9c, d34e93e0, f96, f95, d, c3afd268)
f9c63e4c ???????? (d34e83a8, b6, c3afd338, 1, 4, 0)
f9c05804 ???????? (c3afd338, f2ee27c0, 0, f9c15e98, 10, c3afd268)
f9c46bb8 ???????? (d34e9388, f0866a58, f2ee27c0, 1, 4, c3afd2f8)
fa6c56d8 ???????? (d34e9388, f2ee27c0, 2f, f0866a58, 0, c3afd478)
fa6c6c70 ???????? (d34e9388, f0866a58, f2ee27c0, 1, 10, c3afd540)
fa6ccbb8 ???????? (d34e9bc0, 0, 1, f0866a58, 8, c3afd478)
fa67bf28 ???????? (d34e9bc0, d3502ac0, d34e82d8, 1, 10, c3afd540)
fa67c168 ???????? (d34e82d8, e3c6b060, f0846b40, f0866a58, 8, c3afd478)
fa64773c ???????? (d3502ab0, d3502ac0, d34e82d8, 1, 10, c3afd540)
f9dc820c ???????? (d3502ab0, b6, c3afd628, f9c15e98, 10, 0)
f9c05804 ???????? (c3afd634, b6, 0, f9c15e98, 10, c3afd540)
f9c05804 ???????? (c3afd6b4, b6, 0, f9c15e98, 10, c3afd5c8)
f9c05804 ???????? (c3afd73c, b8, 0, f9c15e50, c, c3afd650)
f9c05804 ???????? (c3afd7c4, b6, 0, f9c16298, c, c3afd6d8)
f9c05804 ???????? (c3afd844, b6, 0, f9c15e98, c, c3afd760)
f9c05804 ???????? (c3afd8cc, b7, 0, f9c15e50, 8, c3afd7e0)
f9c05804 ???????? (c3afd95c, b6, 0, f9c160d0, c, c3afd868)
f9c05804 ???????? (c3afd9e4, b6, 0, f9c15e98, c, c3afd8f8)
f9c05804 ???????? (c3afda74, f4405ee8, 0, f9c15e50, 8, c3afd978)
f9c05a8c ???????? (c3afdb14, f45c37a0, 0, f9c163d0, c, c3afda08)
f9c05a8c ???????? (c3afdba4, f47f5e30, 0, f9c16420, 14, c3afdaa8)
f9c05a8c ???????? (c3afdc34, b6, 0, f9c163d0, 8, c3afdb38)
f9c056e4 ???????? (c3afdcd4, b6, 0, f9c15e50, c, c3afdbd0)
f9c05804 ???????? (c3afdd64, f4783158, 0, f9c15e50, c, c3afdc70)
f9c059fc ???????? (c3afddfc, b7, 0, f9c163d0, c, c3afdcf0)
f9c05804 ???????? (c3afdea4, b6, 0, f9c160d0, 8, c3afdd90)
f9c05804 ???????? (c3afdf3c, b6, 0, f9c15e50, 8, c3afde30)
f9c05774 ???????? (c3afdfc4, b6, 0, f9c15e50, 10, c3afded0)
f9c05774 ???????? (c3afe064, b6, 0, f9c15e50, c, c3afdf58)
f9c05804 ???????? (c3afe10c, b7, 0, f9c15e50, 14, c3afdfe8)
f9c05804 ???????? (c3afe19c, b6, 0, f9c160d0, 10, c3afe0a0)
f9c05804 ???????? (c3afe22c, b7, 0, f9c16250, 10, c3afe130)
f9c05804 ???????? (c3afe2cc, f4591318, 0, f9c160d0, 10, c3afe1c0)
f9c059d8 ???????? (c3afe374, b8, 0, f9c163d0, 8, c3afe260)
f9c05804 ???????? (c3afe40c, b6, 0, f9c16250, 4, c3afe308)
f9c05750 ???????? (c3afe4a4, b6, 0, f9c15e50, 8, c3afe398)
f9c05804 ???????? (c3afe53c, b6, 0, f9c15e98, 8, c3afe440)
f9c05804 ???????? (c3afe5e4, b6, 0, f9c15e98, 8, c3afe4c0)
f9c05750 ???????? (c3afe684, f43eee20, 0, f9c15e50, 8, c3afe580)
f9c059d8 ???????? (c3afe6e8, daa8e080, 0, f9c163d0, 8, c3afe618)
f9c30d40 ???????? (ddb9b518, daa8e080, d1afafe8, f9c15e50, c, c3afe6a0)
fa026230 ???????? (db146b88, daa8e080, d1afafe8, f9c16250, c, 0)
fa0182bc ???????? (daf02128, daa8e080, d1afafe8, f9c15e50, c, c3afe7b0)
f9c30f48 ???????? (daf02128, b6, c3afe89c, f9c16250, c, 0)
f9c05750 ???????? (c3afe8a4, b8, 0, f9c15e50, c, c3afe7b0)
f9c05750 ???????? (c3afe92c, b6, 0, f9c16250, c, c3afe840)
f9c05750 ???????? (c3afe990, d1afaff8, 0, f9c15e50, 4, c3afe8c0)
f9c4f278 ???????? (d1afaff8, daa8e5c0, d1b0b920, d1afb0f0, 4, c3afe950)
fa7400a0 ???????? (61a80, d1afaff8, 1, daa00d48, 2710, 34)
f9c30b04 ???????? (daf9f1a8, f441b8a8, c3afeb54, f9c163d0, f441b8a8, f441bd18)
f9c059d8 ???????? (c3afeb54, f45f12b0, 0, f9c163d0, 8, c3afea48)
f9c059d8 ???????? (c3afec1c, f0d5e7d8, 0, f9c163d0, 4, c3afead8)
f9c059d8 ???????? (c3afecac, f43eee20, 0, f9c16420, 10, c3afeb98)
f9c059d8 ???????? (c3afed54, f43eef20, 0, f9c163d0, 8, c3afec40)
f9c059d8 ???????? (c3afedf4, b6, 0, f9c163d0, 8, c3afece0)
f9c05750 ???????? (c3afee7c, b6, 0, f9c15e50, 8, c3afed88)
f9c05750 ???????? (c3afef04, b6, 0, f9c15e50, c, c3afee18)
f9c05750 ???????? (c3afef8c, b6, 0, f9c15e50, 14, c3afee98)
f9c05750 ???????? (c3aff01c, b6, 0, f9c15e50, 14, c3afef20)
f9c05750 ???????? (c3aff0ac, f43e7f08, 0, f9c15e50, c, c3afefb0)
f9c059d8 ???????? (c3aff13c, f429d368, 0, f9c163d0, c, c3aff048)
f9c059d8 ???????? (c3aff1e4, b6, 0, f9c15e50, 10, c3aff0c8)
f9c05804 ???????? (c3aff29c, b6, 0, f9c15e50, c, c3aff178)
f9c05804 ???????? (c3aff33c, b6, 0, f9c15e50, c, c3aff210)
f9c05804 ???????? (c3aff3cc, b6, 0, f9c15e50, c, c3aff2d8)
f9c05804 ???????? (c3aff464, b6, 0, f9c15e50, c, c3aff358)
f9c05804 ???????? (c3aff4f4, f1a8a5c8, 0, f9c15e50, c, c3aff400)
f9c05a8c ???????? (c3aff58c, f084f340, 0, f9c163d0, c, c3aff490)
f9c059d8 ???????? (c3aff674, b6, 0, f9c163d0, 4, c3aff518)
f9c05804 ???????? (c3aff6fc, b6, 0, f9c15e50, 10, c3aff608)
f9c05804 ???????? (c3aff760, d883f8c8, 0, f9c15e50, c, c3aff690)
f9c4f278 ???????? (d19577a8, d883f8c8, d884f5a0, f0ac6f08, 4, c3aff720)
fa0a23a4 ???????? (d884f5a0, d883f8c8, d19577a8, f9c15e50, c, c3aff7e8)
f9c30f48 ???????? (d884f5a0, b6, 786b, f9c16250, 7869, 0)
f9c05750 ???????? (c3aff8dc, b8, f375bd34, f9c15e50, c, c3aff7e8)
f9c05750 ???????? (c3aff98c, b6, 0, f9c16250, c, c3aff868)
f9c05804 ???????? (c3aff9fc, d1a10f80, 0, f9c15e50, c, c3aff920)
f9c4947c ???????? (d1a10f80, d8808160, f81fc000, f0818618, 8, c3aff9b8)
fa114558 ???????? (d8808160, d1a10f80, f3745c98, f9c15e50, 8, c3affab0)
fa14b064 ???????? (d8808160, f096a2a0, f81fc000, f0818618, d8808160, 9)
f9cd2c38 ???????? (c3affb9c, c3affcf0, f3745c98, f9c15e50, 8, c3affab0)
f9c0010c ???????? (c3affc28, c3affe90, a, f112bc70, 4, c3affb40)
fed5bcf8 __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (c3affe88, c3affcf0, c3affda8, a886f0, a886f0, c3affd00) + 27c
fee4a3e4 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ (ff182000, a88c98, c3affd9c, c3affd98, c3affda8, a886f0) + 164
fee5d5a8 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ (c3affe88, c3affe84, c3affe7c, c3affe74, c3affe6c, a886f0) + 6c
fee6e8f8 __1cMthread_entry6FpnKJavaThread_pnGThread__v_ (a886f0, a886f0, c69ff8, a88c98, 31a08c, fee67ed8) + 128
fee67f00 __1cKJavaThreadDrun6M_v_ (a886f0, 78, 40, 0, 40, 0) + 284
fee643e0 _start (a886f0, c3b00000, 0, 0, fee642ac, 1) + 134
ff3404f4 _lwp_start (0, 0, 0, 0, 0, 0)
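A full pstack file covers every LWP in the process, so it helps to cut out just the section for the LWP of interest. This is a minimal sketch; the function name lwp_stack is hypothetical, and it relies on the "lwp# N / thread# N" header lines shown above.

```shell
# lwp_stack LWP < pstack.txt
# Prints only the section of pstack output that belongs to the given LWP.
# On Solaris use nawk instead of awk.
lwp_stack() {
    awk -v want="$1" '
        # Header: ----------------- lwp# 120 / thread# 120 -----------------
        /^-+ +lwp# / { show = ($3 == want) }
        show { print }
    '
}
```

For example, lwp_stack 123 < pstack.1.txt (file name assumed) prints the stack that starts with the _write frame.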



Another thing to check is whether any other processes are running on the server at the same time.

We faced a situation in the past where backup processes ran at a fixed time on a particular directory and caused CPU spikes. This was due to a huge number of old files in that directory.


We ran du on the /data filesystem, and as soon as it hit /data/software/xyz/log it stopped dead and the CPU went to 0% idle. This filesystem is used for deployments, as the name implies. When the backup reaches this directory, it has to list the contents (an ls or something similar), and working through the 188994 log files that were there completely clogs up the CPU. This is what was causing the CPU spikes. Whenever anyone or anything accessed this filesystem, everything stopped.
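To check whether a directory has grown to this kind of size, and which files are candidates for cleanup, something like the following can be used. It is a sketch only: the function names count_files and old_files are hypothetical, and since find has to walk the same directory it is best run outside peak hours.

```shell
# count_files DIR
# Prints the number of regular files in DIR and its subdirectories.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# old_files DIR DAYS
# Lists regular files under DIR last modified more than DAYS days ago.
old_files() {
    find "$1" -type f -mtime +"$2"
}
```

For the case above, count_files /data/software/xyz/log would report the number of log files, and old_files /data/software/xyz/log 30 would list those untouched for more than 30 days (the 30-day cutoff is an arbitrary example).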
