使用 strace 分析 exp 时出现的奇怪问题(r3 注释第 41 天)
exp算是一个经典的数据导出工具了。对于小数量的表来说,个人还是比较钟爱exp。毕竟expdp还需要配置directory而且还在服务端。exp在数据量小的情况下速度还是很理想的。
关于exp导出的这个问题,已经拖了很久了,自己也排查了各种方法。通过查看wait event,查看exp的debug日志,都没有得出一些很有说服力的内容,今天下定决心来细细琢磨琢磨这个问题。有了一点收获。
之前在测试系统中碰到一个问题,导出一个比较大的分区表,分区数很多,其中有些分区里面没有数据,但是通过exp导出这些没有数据的分区时,平均每个分区都需要2秒左右的时间,如果表中有数据还可以理解,但是感觉cpu就在那空转一样。但是查看进程情况,耗费的cpu资源还不少。这个表刚好有好几百个分区,所以等待的时间很长,就在那看一行一行的慢慢输出0 rows exported.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17540 oraccbs1 25 0 127m 67m 8912 R 99.0 0.0 0:50.91 exp tables=xxxx file=a.dmp buffer=9102000
而且奇怪的是导出的时候强制退出也没反应,只能手工kill进程才可以。
当时因为需要做备份需要,全库备份太大,就使用了表导出模式,结果导出的时候速度很慢。无奈之下使用表导出模式,开了多个并行窗口同时导出,这个表就是有些特殊,还是慢。
最后在反复尝试之后,得出了一个折中办法,就是把参数buffer从9M降低,速度一下子就快了起来。
strace算是一个很有效率的工具,所以今天用strace先抓了两份日志来对比。
以下是通过exp导出这个分区表时,指定buffer为9M时的日志,其中分区A1_B1中没有数据。但是从日志里面看还进行了不少的操作。
write(2, ". . exporting partition "..., 54. . exporting partition A1_B1) = 54
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9991b9000
brk(0x122e3000) = 0x122e3000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9991f6000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af999233000
brk(0x12305000) = 0x12305000
brk(0x12327000) = 0x12327000
brk(0x12349000) = 0x12349000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af999270000
brk(0x1236b000) = 0x1236b000
brk(0x1238d000) = 0x1238d000
brk(0x123af000) = 0x123af000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9992ad000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9992ea000
brk(0x123d1000) = 0x123d1000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af999327000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af999364000
brk(0x123f3000) = 0x123f3000
brk(0x12416000) = 0x12416000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9993a1000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9993de000
brk(0x12438000) = 0x12438000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af99941b000
brk(0x1245a000) = 0x1245a000
brk(0x1247c000) = 0x1247c000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af999458000
brk(0x1249e000) = 0x1249e000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af999495000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9994d2000
brk(0x124c0000) = 0x124c0000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af99950f000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af99954c000
brk(0x124e2000) = 0x124e2000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af999589000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9995c6000
brk(0x12504000) = 0x12504000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af999603000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af999640000
brk(0x12526000) = 0x12526000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af99967d000
mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9996ba000
brk(0x12548000) = 0x12548000
brk(0x1256a000) = 0x1256a000
brk(0x1258c000) = 0x1258c000
brk(0x125ae000) = 0x125ae000
brk(0x125d0000) = 0x125d0000
brk(0x125f2000) = 0x125f2000
brk(0x12614000) = 0x12614000
mmap(NULL, , PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9996f7000
write(7, "7? 6 3^&q200 1 376377377377377377377377y "..., 1855) = 1855
read(8, "v342 6 2027 2353313713313267G332M375360_360x316jx"..., 8208) = 3042
brk(0x12636000) = 0x12636000
brk(0x12657000) = 0x12657000
brk(0x12678000) = 0x12678000
brk(0x12699000) = 0x12699000
brk(0x126bb000) = 0x126bb000
brk(0x126dc000) = 0x126dc000
brk(0x126fd000) = 0x126fd000
brk(0x1271e000) = 0x1271e000
brk(0x12740000) = 0x12740000
brk(0x12761000) = 0x12761000
brk(0x12782000) = 0x12782000
brk(0x127a3000) = 0x127a3000
brk(0x127c5000) = 0x127c5000
brk(0x127e6000) = 0x127e6000
brk(0x12807000) = 0x12807000
。。。。。略去几十行的brk内容。。。
下面的日志是降低buffer之后的,从直观感受来说,日志内容要少很多。
write(2, ". . exporting partition "..., 54. . exporting partition A1_B1) = 54
brk(0x16ac7000) = 0x16ac7000
brk(0x16ae9000) = 0x16ae9000
brk(0x16b0b000) = 0x16b0b000
brk(0x16b2d000) = 0x16b2d000
brk(0x16b4f000) = 0x16b4f000
brk(0x16b71000) = 0x16b71000
write(7, "7? 6 3^&q200 1 376377377377377377377377y "..., 1855) = 1855
read(8, "v342 6 2027 2353313713313267G332M375360_360x316jx"..., 8208) = 3042
brk(0x16b92000) = 0x16b92000
brk(0x16bb3000) = 0x16bb3000
brk(0x16bd4000) = 0x16bd4000
brk(0x16bf5000) = 0x16bf5000
brk(0x16c17000) = 0x16c17000
brk(0x16c38000) = 0x16c38000
brk(0x16c59000) = 0x16c59000
brk(0x16c7a000) = 0x16c7a000
brk(0x16c9c000) = 0x16c9c000
brk(0x16cbd000) = 0x16cbd000
brk(0x16cde000) = 0x16cde000
brk(0x16cff000) = 0x16cff000
brk(0x16d21000) = 0x16d21000
brk(0x16d42000) = 0x16d42000
brk(0x16d63000) = 0x16d63000
brk(0x16d84000) = 0x16d84000
brk(0x16da6000) = 0x16da6000
brk(0x16dc7000) = 0x16dc7000
brk(0x16de8000) = 0x16de8000
brk(0x16e09000) = 0x16e09000
brk(0x16e2b000) = 0x16e2b000
brk(0x16e4c000) = 0x16e4c000
brk(0x16e6d000) = 0x16e6d000
brk(0x16e8e000) = 0x16e8e000
brk(0x16eb0000) = 0x16eb0000
brk(0x16ed1000) = 0x16ed1000
brk(0x16ef2000) = 0x16ef2000
brk(0x16f13000) = 0x16f13000
brk(0x16f35000) = 0x16f35000
brk(0x16f56000) = 0x16f56000
brk(0x16f77000) = 0x16f77000
brk(0x16f98000) = 0x16f98000
brk(0x16fba000) = 0x16fba000
brk(0x16fdb000) = 0x16fdb000
brk(0x16ffc000) = 0x16ffc000
lseek(3, 8192, SEEK_SET) = 8192
read(3, "r 307 V 310 226 311 310 313 353 314 61"..., 512) = 512
write(2, " 0 rows exported", 25 0 rows exported) = 25
write(7, " V 6 21k'Nr + 1 3T(376377377377"..., 86) = 86
read(8, " 34 6 10Nr + t1 375t", 8208) = 28
write(2, "n", 1
) = 1
write(2, ". . exporting partition "..., 54. . exporting partition A1_B2) = 54
对日志琢磨了一番,发现了一个奇怪的情况,不知道是不是巧合。
表的字段信息如下,其中有number类型的字段23个。在第一个很慢的日志中。类似“mmap(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af9992ad000”的内容就出现了23次。
Name Null? Type
------------------------------------- -------- --------------
OD_KEY NOT NULL NUMBER(5)
ITION_ID NOT NULL NUMBER(5)
T_ID NOT NULL NUMBER(12)
CREATION_DATE NOT NULL DATE
UPDATE_DATE DATE
ATOR_ID NUMBER(9)
ICATION_ID CHAR(6)
ERVICE_CODE CHAR(5)
PDATE_STAMP NUMBER(4)
UNT_ID NOT NULL NUMBER(12)
ING_ARRANGEMENT_ID NOT NULL NUMBER(12)
ICE_ID NUMBER(12)
P_TYPE VARCHAR2(6)
GES_AMOUNT NUMBER(18,2)
AMOUNT NUMBER(18,2)
NCE NUMBER(18,2)
TION_DATE DATE
DATE DATE
SACTION_ID NUMBER(12)
_SEQ_NO NUMBER(12)
RITY NUMBER(2)
UTE_NET_AMOUNT NUMBER(18,2)
UTE_TAX_AMOUNT NUMBER(18,2)
ING_CREDIT_NET_AMOUNT NUMBER(18,2)
ING_CREDIT_TAX_AMOUNT NUMBER(18,2)
BILL_SEQ_NO NUMBER(12)
SUB_BILL_SEQ_NO NUMBER(12)
ILL_SEQ_NO NUMBER(12)
UB_BILL_SEQ_NO NUMBER(12)
IOUS_LPF_CUTOFF_DATE DATE
IOUS_LPF_START_DATE DATE
ULATE_LPF_IND CHAR(1)
G_CATEGORY CHAR(1)
_TYPE VARCHAR2(6)
AST_LPF_CALC_DATE DATE
PF_WAIVED_AMOUNT NUMBER(18,2)
之前自己做过一个测试,把这个表的number字段减少一些,速度就快了很多,和平时做的导出就没有任何区别了,可能问题的所在还是在number型字段上。 oracle官方所给的关于buffer的解释如下:
BUFFER
Default: operating system-dependent. See your Oracle operating system-specific documentation to determine the default value for this parameter.
Specifies the size, in bytes, of the buffer used to fetch rows. As a result, this parameter determines the maximum number of rows in an array fetched by Export. Use the following formula to calculate the buffer size:
buffer_size = rows_in_array * maximum_row_size
If you specify zero, then the Export utility fetches only one row at a time.
Tables with columns of type LOBs, LONG, BFILE, REF, ROWID, LOGICAL ROWID, or DATE are fetched one row at a time.
Note:
The BUFFER parameter applies only to conventional path Export. It has no effect on a direct path Export. For direct path Exports, use the RECORDLENGTH parameter to specify the size of the buffer that Export uses for writing to the export file.
Example: Calculating Buffer Size
This section shows an example of how to calculate buffer size.
The following table is created:
CREATE TABLE sample (name varchar(30), weight number);
The maximum size of the name column is 30, plus 2 bytes for the indicator. The maximum size of the weight column is 22 (the size of the internal representation for Oracle numbers), plus 2 bytes for the indicator.
Therefore, the maximum row size is 56 (30+2+22+2).
To perform array operations for 100 rows, a buffer size of 5600 should be specified.
如果说number的字段值有多大,相比varchar2就小很多了。
个人感觉可能是一个bug.
至于为什么buffer设置到9M导出这个表就慢,到底慢在哪儿了?还可以通过strace来做一个很有意义的监控。
我们可以先运行exp的操作,让它先慢慢运行,然后开启strace,假设我们得到的exp的进程号是strace -c -p 25805
就可以通过strace -c -p 25805 来监控这个进程的一些详细信息。运行一会之后就强制中断,得到的结果如下,97%以上的资源都耗在brk操作上了。
> strace -c -p 25805
Process 25805 attached - interrupt to quit
Process 25805 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
97.86 0.001054 6 186 brk
2.14 0.000023 1 30 write
0.00 0.000000 0 12 read
------ ----------- ----------- --------- --------- ----------------
100.00 0.001077 228 total
brk的操作室返回内存管理的起始地址。比如brk(0x122e3000) = 0x122e3000 就是从0x122e3000 开始分配内存地址。
而如果调小buffer的部分,得到的日志如下:
> strace -c -p 26025
Process 26025 attached - interrupt to quit
Process 26025 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
65.01 0.004096 3 1197 read
34.99 0.002205 1 3522 write
------ ----------- ----------- --------- --------- ----------------
100.00 0.006301 4719 total
可以看到都是读写操作,日志里面也确实导出了不少的数据。 . . exporting partition A11_B10 65222 rows exported . . exporting partition A12_B1 181825 rows exported . . exporting partition A12_B2 181783 rows exported . . exporting partition A12_B3 181582 rows exported 看来这个问题还是和number数据类型存在着一定的关系,大家在数据导出的时候如果碰到这个问题也可以注意一下。