欢迎您访问 最编程 本站为您分享编程语言代码,编程技术文章!
您现在的位置是: 首页

oracle-troubleshooting- 如何使用 v$active_session_history 分析数据库历史记录性能问题

最编程 2024-07-18 08:48:13
...

适用于

Oracle 10G或更高版本的单机,本文适用于任何平台。

详情

- Active Session: on the CPU or was waiting for an event that didn't belong to the Idle wait class
- 在Oracle 10G中,我们引入了AWR和ASH采样机制,有一个视图gv$active_session_history会每秒钟将数据库所有节点的Active Session采样一次,
- 基于这个特征,我们可以通过分析v$active_session_history的Session采样情况,来定位问题发生的准确时间范围,并且可以观察每个采样点的top event和top holder。下面通过一个例子来详细说明。

v$active_session_history版

1. Dump出问题期间的ASH数据:

    为了不影响生产系统,我们可以将问题大概期间的ASH数据export出来在测试机上分析。
    基于gv$active_session_history创建一个新表t_ash,然后将其通过exp/imp导入到测试机。在发生问题的数据库上执行exp:
    SQL> conn user/passwd
    SQL> create table m_ash_20220416 as select * from v$ACTIVE_SESSION_HISTORY
        where  sample_time >
                    to_timestamp('2022-04-16 15:30:00','yyyy-mm-dd hh24:mi:ss')
                and sample_time <
                    to_timestamp('2022-04-16 16:30:00','yyyy-mm-dd hh24:mi:ss');

    $ exp user/passwd file=m_ash_20220416.dmp tables=(t_ash) log=t_ash.exp.log

    然后导入到测试机:
    $ imp user/passwd file=m_ash_20220416.dmp log=m_ash_20220416.imp.log

2. 验证导出的ASH时间范围:

    建议采用Oracle SQL Developer来查询以防止输出结果折行不便于观察。

    set line 200 pages 1000
    col sample_time for a25
    col event for a40
    alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';

    select
    min(sample_time), max(sample_time), count(*) session_count
      from m_ash_20220416 t;
     
    MIN(SAMPLE_TIME)                                   MAX(SAMPLE_TIME)                                   SESSION_COUNT
    -------------------------------------------------- -------------------------------------------------- -------------
    2022/4/16 16:01:14.687                             2022/4/16 16:29:59.501                                    125301


    从以上输出可知该数据库采样时间共半小时。

3. 确认问题发生的精确时间范围:

    参考如下脚本:
    col event for a35
    set lines 222
    set pages 9999
    select
     sample_id, sample_time, count(*) session_count
      from m_ash_20220416 t
     group by   sample_id, sample_time
     order by   sample_time;
     SAMPLE_ID SAMPLE_TIME                                        SESSION_COUNT
    ---------- -------------------------------------------------- -------------
     200574254 2022/4/16 16:01:14.687                                      1087
     200574255 2022/4/16 16:01:15.714                                      1090
     200574256 2022/4/16 16:01:16.743                                      1076
     200574257 2022/4/16 16:01:17.769                                      1080
     200574258 2022/4/16 16:01:19.937                                      1087
     200574259 2022/4/16 16:01:20.961                                      1092
     200574260 2022/4/16 16:01:21.989                                      1093
     200574261 2022/4/16 16:01:23.018                                      1093
     200574262 2022/4/16 16:01:24.051                                      1097
     200574263 2022/4/16 16:01:25.076                                      1100
     200574264 2022/4/16 16:01:26.102                                      1101
     200574265 2022/4/16 16:01:27.129                                      1064
     200574266 2022/4/16 16:01:28.155                                      1067
     200574267 2022/4/16 16:01:29.189                                      1071
     200574268 2022/4/16 16:01:30.214                                      1075
     ...
     200574374 2022/4/16 16:03:21.307                                       255
     200574375 2022/4/16 16:03:22.319                                       287
     200574376 2022/4/16 16:03:23.332                                       318
     200574377 2022/4/16 16:03:24.352                                       198
     200574378 2022/4/16 16:03:25.364                                       216
     200574379 2022/4/16 16:03:26.378                                        40
     200574380 2022/4/16 16:03:27.389                                        22
     200574381 2022/4/16 16:03:28.402                                        25
     200574382 2022/4/16 16:03:29.432                                        27
     200574383 2022/4/16 16:03:30.445                                        25

    注意观察以上输出的每个采样点的active session的数量,数量突然变多往往意味着问题发生了。从以上输出可以确定问题发生的精确时间在2022/4/16 16:01:14 ~ 2022/4/16 16:03:25,问题持续了大约2分钟。
    注意: 观察以上的输出有无断档,比如某些时间没有采样。

4. 确定每个采样点的top n event:

    在这里我们指定的是top 2 event 并且注掉了采样时间以观察所有采样点的情况。如果数据量较多,您也可以通过开启sample_time的注释来观察某个时间段的情况。
    注意最后一列session_count指的是该采样点上的等待该event的session数量。
    col event for a35
    set lines 1000
    set pages 9999
select 
           t.sample_id,
           t.sample_time,
           t.event,
           t.session_state,
           t.c session_count
      from (select t.*,
                   rank() over(partition by  sample_time order by c desc) r
              from (select /*+ parallel 2 */
                     t.*,
                     count(*) over(partition by  sample_time, event) c,
                     row_number() over(partition by   sample_time, event order by 1) r1
                      from m_ash_20220416 t
                    /*where sample_time >
                        to_timestamp('2022-04-16 13:59:00',
                                     'yyyy-mm-dd hh24:mi:ss')
                    and sample_time <
                        to_timestamp('2022-04-16 14:10:00',
                                     'yyyy-mm-dd hh24:mi:ss')*/
                    ) t
             where r1 = 1) t
     where r < 3
     order by   sample_time, r;


     SAMPLE_ID SAMPLE_TIME                                        EVENT                               SESSION_STATE SESSION_COUNT
    ---------- -------------------------------------------------- ----------------------------------- ------------- -------------
     200574254 2022/4/16 16:01:14.687                             buffer busy waits                   WAITING                 870
     200574254 2022/4/16 16:01:14.687                             log file sync                       WAITING                 143
     200574255 2022/4/16 16:01:15.714                             buffer busy waits                   WAITING                 870
     200574255 2022/4/16 16:01:15.714                             log file sync                       WAITING                 146
     200574256 2022/4/16 16:01:16.743                             buffer busy waits                   WAITING                 869
     200574256 2022/4/16 16:01:16.743                             log file sync                       WAITING                 133
     200574257 2022/4/16 16:01:17.769                             buffer busy waits                   WAITING                 869
     200574257 2022/4/16 16:01:17.769                             log file sync                       WAITING                 136
     200574258 2022/4/16 16:01:19.937                             buffer busy waits                   WAITING                 869
     200574258 2022/4/16 16:01:19.937                             log file sync                       WAITING                 144
     200574259 2022/4/16 16:01:20.961                             buffer busy waits                   WAITING                 868
     200574259 2022/4/16 16:01:20.961                             log file sync                       WAITING                 148
     200574260 2022/4/16 16:01:21.989                             buffer busy waits                   WAITING                 868
     200574260 2022/4/16 16:01:21.989                             log file sync                       WAITING                 150
     200574261 2022/4/16 16:01:23.018                             buffer busy waits                   WAITING                 869
     200574261 2022/4/16 16:01:23.018                             log file sync                       WAITING                 152
     200574262 2022/4/16 16:01:24.051                             buffer busy waits                   WAITING                 869
     200574262 2022/4/16 16:01:24.051                             log file sync                       WAITING                 156
     200574263 2022/4/16 16:01:25.076                             buffer busy waits                   WAITING                 870
     200574263 2022/4/16 16:01:25.076                             log file sync                       WAITING                 158
     200574264 2022/4/16 16:01:26.102                             buffer busy waits                   WAITING                 869
     ...
     200574396 2022/4/16 16:03:43.716                             direct path read                    WAITING                   1
     200574396 2022/4/16 16:03:43.716                             log file sync                       WAITING                   1
     200574397 2022/4/16 16:03:44.730                                                                 ON CPU                   14
     200574397 2022/4/16 16:03:44.730                             db file parallel write              WAITING                   2
     200574398 2022/4/16 16:03:45.744                                                                 ON CPU                   11
     200574398 2022/4/16 16:03:45.744                             db file parallel write              WAITING                   3
     200574399 2022/4/16 16:03:46.754                                                                 ON CPU                   11
     200574399 2022/4/16 16:03:46.754                             Backup: MML create a backup piece   WAITING                   1
     200574399 2022/4/16 16:03:46.754                             Disk file operations I/O            WAITING                   1
     200574399 2022/4/16 16:03:46.754                             db file sequential read             WAITING                   1
     
     
    从以上输出我们可以发现问题期间最严重的等待为buffer busy waits ,高峰期等待该event的session数达到了870个,其次为log file sync,高峰期session为490个。

    ***注意:
        1) 再次确认以上输出有无断档,是否有某些时间没有采样。
        2) 注意那些session_state为ON CPU的输出,比较ON CPU的进程个数与您的OS物理CPU的个数,如果接近或者超过物理CPU个数,那么您还需要检查OS当时的CPU资源状况,
           比如OSWatcher/NMON等工具,高的CPU Run Queue可能引发该问题,当然也可能是问题的结果,需要结合OSWatcher和ASH的时间顺序来验证。

5. 观察每个采样点的等待链:(更详细显示参考:附1)

    其原理为通过 $active_session_history. blocking_session记录的holder来通过connect by级联查询,找出最终的holder. 
    该输出中isleaf=1的都是最终holder,而iscycle=1的代表死锁了(也就是在同一个采样点中a等待b,b等待c,而c又等待a,这种情况如果持续发生,那么尤其值得关注)。采用如下查询能观察到阻塞链。

    col event for a30
    col SAMPLE_TIME for a30
    set lines 322
    set pages 9999
    
    select /*+ parallel 8 */
     level                     lv,
     connect_by_isleaf         isleaf,
     connect_by_iscycle        iscycle,
     t.sample_id,
     t.sample_time,
     t.session_id,
     t.sql_id,
     t.session_type,
     t.event,
     t.session_state,
     t.blocking_session,
     t.blocking_session_status
      from m_ash_20220416 t 
     start with blocking_session is not null
    connect by nocycle
         prior sample_id = sample_id
             and prior blocking_session = session_id
           and prior blocking_session_serial# = session_serial#
     order siblings by  sample_id;


        LV     ISLEAF    ISCYCLE  SAMPLE_ID SAMPLE_TIME                    SESSION_ID SQL_ID        SESSION_TYPE EVENT                          SESSION_STATE BLOCKING_SESSION BLOCKING_SESSION_STATUS
---------- ---------- ---------- ---------- ------------------------------ ---------- ------------- ------------ ------------------------------ ------------- ---------------- -----------------------
         1          0          0  200574254 2022/4/16 16:01:14.687              14306 dxxxh350mfyvj FOREGROUND   enq: US - contention           WAITING                     66 VALID                  
         2          0          0  200574254 2022/4/16 16:01:14.687                 66 dxxxh350mfyvj FOREGROUND   buffer busy waits              WAITING                   4670 VALID                  
         3          0          0  200574254 2022/4/16 16:01:14.687               4670 dxxxh350mfyvj FOREGROUND   buffer busy waits              WAITING                   6082 VALID                  
         4          0          0  200574254 2022/4/16 16:01:14.687               6082 dxxxh350mfyvj FOREGROUND   log file sync                  WAITING                  10501 VALID                  
         5          1          0  200574254 2022/4/16 16:01:14.687              10501               BACKGROUND   log file parallel write        WAITING                        NO HOLDER              
         1          0          0  200574254 2022/4/16 16:01:14.687               3829 dxxxh350mfyvj FOREGROUND   enq: US - contention           WAITING                    772 VALID                  
         2          0          0  200574254 2022/4/16 16:01:14.687                772 dxxxh350mfyvj FOREGROUND   buffer busy waits              WAITING                   4670 VALID                  
         3          0          0  200574254 2022/4/16 16:01:14.687               4670 dxxxh350mfyvj FOREGROUND   buffer busy waits              WAITING                   6082 VALID                  
         4          0          0  200574254 2022/4/16 16:01:14.687               6082 dxxxh350mfyvj FOREGROUND   log file sync                  WAITING                  10501 VALID                  
         5          1          0  200574254 2022/4/16 16:01:14.687              10501               BACKGROUND   log file parallel write        WAITING                        NO HOLDER              
         1          0          0  200574254 2022/4/16 16:01:14.687               9159 dxxxh350mfyvj FOREGROUND   enq: US - contention           WAITING                    922 VALID                  
         2          0          0  200574254 2022/4/16 16:01:14.687                922 dxxxh350mfyvj FOREGROUND   buffer busy waits              WAITING                   4670 VALID                  
         3          0          0  200574254 2022/4/16 16:01:14.687               4670 dxxxh350mfyvj FOREGROUND   buffer busy waits              WAITING                   6082 VALID                  
         4          0          0  200574254 2022/4/16 16:01:14.687               6082 dxxxh350mfyvj FOREGROUND   log file sync                  WAITING                  10501 VALID                  
         5          1          0  200574254 2022/4/16 16:01:14.687              10501               BACKGROUND   log file parallel write        WAITING                        NO HOLDER              
         1          0          0  200574254 2022/4/16 16:01:14.687                100 dxxxh350mfyvj FOREGROUND   enq: US - contention           WAITING                   1558 VALID                  
         2          0          0  200574254 2022/4/16 16:01:14.687               1558 dxxxh350mfyvj FOREGROUND   buffer busy waits              WAITING                   4670 VALID                  
         3          0          0  200574254 2022/4/16 16:01:14.687               4670 dxxxh350mfyvj FOREGROUND   buffer busy waits              WAITING                   6082 VALID                  
         4          0          0  200574254 2022/4/16 16:01:14.687               6082 dxxxh350mfyvj FOREGROUND   log file sync                  WAITING                  10501 VALID                  
         5          1          0  200574254 2022/4/16 16:01:14.687              10501               BACKGROUND   log file parallel write        WAITING                        NO HOLDER              
         1          0          0  200574254 2022/4/16 16:01:14.687               2404 2pa3dfjgqarkg FOREGROUND   buffer busy waits              WAITING                   1624 VALID                  
         2          0          0  200574254 2022/4/16 16:01:14.687               1624 2pa3dfjgqarkg FOREGROUND   log file sync                  WAITING                  10501 VALID                  
         3          1          0  200574254 2022/4/16 16:01:14.687              10501               BACKGROUND   log file parallel write        WAITING                        NO HOLDER              

    ***注意:
         从上面的输出可见,在相同的采样点上(2022/4/16 16:01:14), 大部分session 在等待buffer busy waits和log file sync,其最终都被 session 10501 阻塞,
        而 session 10501 又在等待log file parallel write,因此这里buffer busy waits只是一个表面现象,
        问题的原因在于log file parallel write

6. 基于第5步的原理来找出每个采样点的最终top holder:

    比如如下SQL列出了每个采样点top 2的blocker session,并且计算了其最终阻塞的session数(参考blocking_session_count)

    col event for a30
    col SAMPLE_TIME for a30
    set lines 322
    set pages 9999
    select t.lv,
           t.iscycle,
           t.sample_id,
           t.sample_time,
           t.session_id,
           t.sql_id,
           t.session_type,
           t.event,
           t.seq#,
           t.session_state,
           t.blocking_inst_id,
           t.blocking_session,
           t.blocking_session_status,
           t.c blocking_session_count
      from (select t.*,
                   row_number() over(partition by   sample_time order by c desc) r
              from (select t.*,
                           count(*) over(partition by   sample_time, session_id) c,
                           row_number() over(partition by   sample_time, session_id order by 1) r1
                      from (select
                             level              lv,
                             connect_by_isleaf  isleaf,
                             connect_by_iscycle iscycle,
                             t.*
                              from m_ash_20220416 t
                            /*where sample_time >
                                to_timestamp('2013-11-17 13:55:00',
                                             'yyyy-mm-dd hh24:mi:ss')
                            and sample_time <
                                to_timestamp('2013-11-17 14:10:00',
                                             'yyyy-mm-dd hh24:mi:ss')*/
                             start with blocking_session is not null
                            connect by nocycle
                                    prior sample_time = sample_time
                                   and prior blocking_session = session_id
                                   and prior
                                        blocking_session_serial# = session_serial#) t
                     where t.isleaf = 1) t
             where r1 = 1) t
     where r < 3
     order by  sample_time, r;
     

        LV    ISCYCLE  SAMPLE_ID SAMPLE_TIME                    SESSION_ID SQL_ID        SESSION_TYPE EVENT                                SEQ# SESSION_STATE BLOCKING_INST_ID BLOCKING_SESSION BLOCKING_SESSION_STATUS BLOCKING_SESSION_COUNT
---------- ---------- ---------- ------------------------------ ---------- ------------- ------------ ------------------------------ ---------- ------------- ---------------- ---------------- ----------------------- ----------------------
         4          0  200574254 2022/4/16 16:01:14.687              10501               BACKGROUND   log file parallel write             43500 WAITING                                         NO HOLDER                                 1071
         4          0  200574255 2022/4/16 16:01:15.714              10501               BACKGROUND   log file parallel write             43500 WAITING                                         NO HOLDER                                 1074
         3          0  200574256 2022/4/16 16:01:16.743              10501               BACKGROUND   log file parallel write             43501 WAITING                                         NO HOLDER                                 1063
         4          0  200574257 2022/4/16 16:01:17.769              10501               BACKGROUND   log file parallel write             43501 WAITING                                         NO HOLDER                                 1066
         2          0  200574258 2022/4/16 16:01:19.937              10501               BACKGROUND   log file parallel write             43501 WAITING                                         NO HOLDER                                 1073
         4          0  200574259 2022/4/16 16:01:20.961              10501               BACKGROUND   log file parallel write             43501 WAITING                                         NO HOLDER                                 1075
         4          0  200574260 2022/4/16 16:01:21.989              10501               BACKGROUND   log file parallel write             43501 WAITING                                         NO HOLDER                                 1077
         2          0  200574261 2022/4/16 16:01:23.018              10501               BACKGROUND   log file parallel write             43501 WAITING                                         NO HOLDER                                 1079
         2          0  200574262 2022/4/16 16:01:24.051              10501               BACKGROUND   log file parallel write             43501 WAITING                                         NO HOLDER                                 1083
         2          0  200574263 2022/4/16 16:01:25.076              10501               BACKGROUND   log file parallel write             43501 WAITING                                         NO HOLDER                                 1086
         5          0  200574264 2022/4/16 16:01:26.102              10501               BACKGROUND   log file parallel write             43501 WAITING                                         NO HOLDER                                 1087
        ...
         2          0  200575893 2022/4/16 16:29:51.395              10501               BACKGROUND   log file parallel write             13558 WAITING                                         NO HOLDER                                    2
         2          0  200575894 2022/4/16 16:29:52.403              10501               BACKGROUND   log file parallel write             13777 WAITING                                         NO HOLDER                                    2
         2          0  200575896 2022/4/16 16:29:54.431              10501               BACKGROUND   log file parallel write             14150 WAITING                                         NO HOLDER                                    5
         1          0  200575898 2022/4/16 16:29:56.455              18135               FOREGROUND   log file sync                       32163 WAITING                      1            10501 VALID                                        1
         2          0  200575900 2022/4/16 16:29:58.476              10501               BACKGROUND   log file parallel write             15150 WAITING                                         NO HOLDER                                    1

已选择 923 行。

    从以上输出,比如第一行,代表在2022/4/16 16:01:14,节点1的session 10501 最终阻塞了1071个session.  顺着时间往下看,可见session 10501 是问题期间最严重的holder,
    它在每个采样点都阻塞了1000多个session,并且它持续等待log file parallel write,
    注意观察其seq#您会发现该event的seq#在16:01分有几秒没变化,表明session hang住了,过了会才不断变化,表明该session又不完全hang住,
    由于10501是redo写日志后台进程,并等待事件为具体的log file parallel write,这很可能是在问题时间段io有问题造成整个数据库无法响应。
    因此可以结合oswatch或系统层面的io监控输出进一步确定问题。
    **注意:
        1) blocking_session_count 指某一个holder最终阻塞的session数,比如 a <- b<- c (a被b阻塞,b又被c阻塞),只计算c阻塞了1个session,因为中间的b可能在不同的阻塞链中发生重复。
        2) 如果最终的holder没有被ash采样(一般因为该holder处于空闲),比如 a<- c 并且b<- c (a被c阻塞,并且b也被c阻塞),但是c没有采样,那么以上脚本无法将c统计到最终holder里,这可能会导致一些遗漏。
        3) 注意比较blocking_session_count的数量与第3步查询的每个采样点的总session_count数,如果每个采样点的blocking_session_count数远小于总session_count数,那表明大部分session并未记载holder,因此本查询的结果并不能代表什么。
        4) 在Oracle 10g中,ASH并没有blocking_inst_id列,在以上所有的脚本中,您只需要去掉该列即可,因此10g的ASH一般只能用于诊断单节点的问题。
        

附1:根据等待事件查看等待链

select /*+ parallel 8 */
level lv,
connect_by_isleaf isleaf,
connect_by_iscycle     iscycle,
t.sample_id,
t.sample_time,
t.session_id,
t.session_serial#,
t.session_type,
t.session_state,
t.SQL_OPNAME,
t.sql_id,
T.SQL_PLAN_HASH_VALUE,
t.SQL_PLAN_LINE_ID,
T.SQL_CHILD_NUMBER,
T.SQL_EXEC_ID,
T.SQL_EXEC_START,
T.IN_BIND,
t.IN_PARSE,
t.IN_HARD_PARSE,
t.IN_SQL_EXECUTION,
t.IN_PLSQL_COMPILATION,
t.IN_PLSQL_EXECUTION,
t.IN_SEQUENCE_LOAD,
T.IN_PLSQL_RPC,
t.PROGRAM,
t.MACHINE,
t.PGA_ALLOCATED,
t.TEMP_SPACE_ALLOCATED,
t.DELTA_READ_IO_BYTES,
t.DELTA_WRITE_IO_BYTES,
t.TM_DELTA_CPU_TIME,
t.event,
t.blocking_session,
t.blocking_session_serial#,
t.blocking_session_status,
t.WAIT_TIME,
t.TIME_WAITED,
t.CURRENT_OBJ#,
t.CURRENT_ROW#,
t.xid
from m_ash_20220416 t
start with event = 'buffer busy waits'
--start with blocking_session is not null
connect by nocycle  prior blocking_session = session_id and prior blocking_session_serial# = session_serial#
and prior sample_time = sample_time  order siblings by  sample_time;

附2: 查看某个具体session的采样等待信息

select /*+ parallel 4 */
t.sample_time,
t.session_id,
t.session_serial#,
t.session_type,
t.session_state,
t.SQL_OPNAME,
t.sql_id,
T.SQL_PLAN_HASH_VALUE,
t.SQL_PLAN_LINE_ID,
T.SQL_CHILD_NUMBER,
T.SQL_EXEC_ID,
T.SQL_EXEC_START,
T.IN_BIND,
t.IN_PARSE,
t.IN_HARD_PARSE,
t.IN_SQL_EXECUTION,
t.IN_PLSQL_COMPILATION,
t.IN_PLSQL_EXECUTION,
t.IN_SEQUENCE_LOAD,
T.IN_PLSQL_RPC,
t.PROGRAM,
t.MACHINE,
t.PGA_ALLO