A Peek Inside Oracle ASM Metadata
OVERVIEW
Oracle ASM (Automatic Storage Management) is a feature introduced in Oracle 10g to streamline storage management and provisioning. ASM provides volume and (cluster) filesystem management, with the I/O subsystem handled directly by the Oracle kernel [Ref 1,2]. With Oracle 10g and ASM it is possible to build a scalable and highly available storage infrastructure on low-cost hardware [Ref 3]; a typical example is SATA disks attached via fibre channel controllers and arranged in a SAN. A scalable architecture built on low-cost hardware, combining ASM and Oracle 10g RAC on Linux, is deployed at CERN [Ref 4].

Oracle documentation and whitepapers [Ref 1,2,3,5] provide the information needed to set up Oracle ASM instances and configure storage with them. Configuration details and performance metrics are exposed via a few V$ views; other options are the command-line interface asmcmd (10g R2) and the graphical interface of OEM. The metadata are, however, only partially exposed to the end user: the mapping between physical storage, ASM allocation units, and database files is not completely available via the V$ views. The author has found that it is nevertheless possible to query this information via undocumented X$ tables. For example, it is possible to determine the exact physical location on disk of each extent (or of the mirror copies of an extent) for each file allocated on ASM and, if needed, to access the data directly via the OS.

This kind of information can be put to use by Oracle practitioners who want to extend their knowledge of the inner workings of ASM, or who need to diagnose hotspots and ASM rebalancing issues [Ref 6]. Direct access to ASM files (data peeking), possibly automated in a small utility, for educational purposes or for emergency data rescue is a possible extension of the findings documented here.
V$ view           Based on   Description
---------------   --------   ----------------------------------------
V$ASM_ALIAS       X$KFALS    lists ASM aliases (files, directories)
V$ASM_CLIENT      X$KFNCL    lists DB instances connected to ASM
V$ASM_OPERATION   X$KFGMG    lists running rebalancing operations
(no V$ view)      X$KFFXP    extent mapping table for ASM files
From the table above we can see that the V$ASM_* views are based on X$KF* tables (i.e. X$ tables with the KF prefix). There are more such tables that are not used to build the V$ASM_* views: X$KFFXP, X$KFDAT, X$KFCBH, X$KFCCE, X$KFBH, X$KFDPARTNER, X$KFCLLE. Note: the findings reported here are based on querying the documented dictionary views V$FIXED_VIEW_DEFINITION and V$FIXED_TABLE. By querying the undocumented X$ tables listed above, the author has found that the extent mapping table for ASM is contained in X$KFFXP (see also Ref 7 and 8).
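The view-to-table mapping can be reproduced on any 10g instance by querying the dictionary views just mentioned. A minimal sketch (note that the V$ASM_* views are defined on top of their GV$ counterparts, so the X$ names appear in the GV$ definitions):

-- list the definitions of the GV$ASM_* views to see which X$KF* tables they are built on
select view_name, view_definition
from v$fixed_view_definition
where view_name like 'GV$ASM%';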
X$KFFXP DESCRIPTION
By querying X$KFFXP on a test database running ASM 10g R2 and RAC, the following description of X$KFFXP has been worked out (it is a hypothesis, since the table is undocumented):
Column Name      Description
--------------   -------------------------------------------------------------
ADDR             table address/identifier
INDX             row identifier
INST_ID          instance number (RAC)
NUMBER_KFFXP     ASM file number. Join with v$asm_file and v$asm_alias
COMPOUND_KFFXP   file identifier. Join with compound_index in v$asm_file
INCARN_KFFXP     file incarnation id. Join with incarnation in v$asm_file
PXN_KFFXP        (physical) extent number per file
XNUM_KFFXP       logical extent number per file (mirrored extents have the
                 same value)
GROUP_KFFXP      ASM disk group number. Join with v$asm_disk and
                 v$asm_diskgroup
DISK_KFFXP       disk number where the extent is allocated. Join with
                 v$asm_disk
AU_KFFXP         relative position of the allocation unit from the beginning
                 of the disk. The allocation unit size (1 MB in this setup) is
                 reported in v$asm_diskgroup
LXN_KFFXP        0,1 used to identify primary/mirror extent; 2 identifies the
                 file header allocation unit (hypothesis)
FLAGS_KFFXP      N.K. (not known)
CHK_KFFXP        N.K. (not known)
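In the example that follows, step 1 (whose query is not reproduced in this excerpt) identifies the ASM file number and size of the spfile of a test database. A minimal sketch of such a query, assuming the alias name contains the string 'spfile':

-- step 1 (sketch): find the ASM file number and size of the spfile
-- by joining the alias name in v$asm_alias with the file details in v$asm_file
select a.name, f.group_number, f.file_number, f.bytes
from v$asm_alias a, v$asm_file f
where a.group_number = f.group_number
  and a.file_number = f.file_number
  and lower(a.name) like '%spfile%';

This is consistent with steps 2 and 3 below, which use file number 267 in disk group 1.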
2. We find the number and location of the extents where the spfile is written:
sys@+ASM1> select DISK_KFFXP, AU_KFFXP, PXN_KFFXP, XNUM_KFFXP, LXN_KFFXP
           from x$kffxp
           where GROUP_KFFXP=1 and NUMBER_KFFXP=267;

DISK_KFFXP   AU_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP
---------- ---------- ---------- ---------- ----------
        24       3820          0          0          0
         0        176          1          0          1
3. From steps 1 and 2 above we know that the spfile is 3584 bytes long and is stored in 2 mirrored extents: one on disk 0 and the other on disk 24 of the disk group. We can find the OS path of the disks with the following query (note: the test system runs Linux and uses ASMLib):
sys@+ASM1> select failgroup, disk_number, path
           from v$asm_disk
           where GROUP_NUMBER=1 and DISK_NUMBER in (0,24);

FAILGROUP  DISK_NUMBER PATH
---------- ----------- --------------------
FG1                 24 ORCL:ITSTOR08_2_EXT
FG2                  0 ORCL:ITSTOR11_10_EXT
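As a side note, steps 2 and 3 can be combined into a single statement. A sketch, joining x$kffxp with v$asm_disk on the group and disk numbers:

-- map each extent of ASM file 267 directly to its disk path and AU offset
select x.pxn_kffxp, x.lxn_kffxp, d.failgroup, d.path, x.au_kffxp
from x$kffxp x, v$asm_disk d
where x.group_kffxp = d.group_number
  and x.disk_kffxp = d.disk_number
  and x.group_kffxp = 1
  and x.number_kffxp = 267;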
4. We can now confirm with OS commands that the mapping is correct. dd allows the sysadmin to read the disks directly (bs=1M sets the block size, while skip=176 makes the command start reading at an offset of 176 MB). Using the disk name found in step 3 (only disk 0 is demonstrated here) and the offsets found in step 2, we can confirm that the spfile data is at the expected physical location.
$ dd if=/dev/oracleasm/disks/ITSTOR11_10_EXT bs=1M count=1 skip=176 | strings | head -4
test12.__db_cache_size=1476395008
test11.__db_cache_size=1476395008
test12.__java_pool_size=16777216
test11.__java_pool_size=16777216
We can see the first 4 lines of the spfile are printed out, as expected.
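The mirror copy can be checked in the same way: from steps 2 and 3 it sits on disk 24 (ORCL:ITSTOR08_2_EXT) at allocation unit 3820. Assuming the same ASMLib device naming under /dev/oracleasm/disks, a command along these lines should print the same strings:

$ # read the mirrored extent of the spfile on disk 24 (sketch, device name per step 3)
$ dd if=/dev/oracleasm/disks/ITSTOR08_2_EXT bs=1M count=1 skip=3820 | strings | head -4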
In a second example, a test tablespace has been created with a datafile of 100 MB nominal size; the datafile received ASM file number 271 in disk group 1. The extent map of the datafile can be listed as above:

sys@+ASM1> select DISK_KFFXP, AU_KFFXP, LXN_KFFXP
           from x$kffxp
           where GROUP_KFFXP=1 and NUMBER_KFFXP=271;

DISK_KFFXP   AU_KFFXP  LXN_KFFXP
---------- ---------- ----------
        24      13678          0
. . . . . .
         8      14490          0
        24      13750          1
        20      13791          0
         7      14493          1
         7      14384          0
        24      13714          1
     65534 4294967294          2

205 rows selected.
We can see that 205 allocation units of 1 MB each are listed for the tablespace created with a datafile of 100 MB nominal size. The actual datafile size is 101 MB (the extra megabyte is allocated by Oracle for tablespace internal structures, independently of the use of ASM). The disk group used for this test has normal redundancy, i.e. 2-way mirroring, which accounts for 101x2=202 allocation units. The remaining 3 allocation units are listed in the last 3 rows of the x$kffxp output and are identifiable by having XNUM_KFFXP=2147483648; these are most likely file headers and/or metadata (a conclusion that still has to be investigated further).

One of the goals of ASM is to provide uniform allocation of datafiles across the available spindles. This is a key feature of ASM: it aims to maximize performance (throughput, IOPS), lower latency, and increase availability. The following query investigates how the extents are allocated across the four disks in the test disk group used for this example.
sys@+ASM1> select DISK_KFFXP, count(*)
           from x$kffxp
           where GROUP_KFFXP=1 and NUMBER_KFFXP=271 and XNUM_KFFXP!=2147483648
           group by DISK_KFFXP;

DISK_KFFXP   COUNT(*)
---------- ----------
         7         51
         8         50
        20         51
        24         50
We can see that space is allocated uniformly across the different disks (and fail groups). With the following query we can see that the primary and mirror extents (identified by the value of LXN_KFFXP) are also mixed uniformly on each disk. Note: Oracle only needs to read the primary extent of a mirrored pair of ASM-allocated extents for read operations, while both extents need to be accessed for a write operation. Therefore a uniform primary/mirror extent allocation provides better performance for read operations.
sys@+ASM1> select DISK_KFFXP, LXN_KFFXP, count(*)
           from x$kffxp
           where GROUP_KFFXP=1 and NUMBER_KFFXP=271 and XNUM_KFFXP!=2147483648
           group by DISK_KFFXP, LXN_KFFXP
           order by DISK_KFFXP, LXN_KFFXP;

DISK_KFFXP  LXN_KFFXP   COUNT(*)
---------- ---------- ----------
         7          0         25
         7          1         26
         8          0         25
         8          1         25
        20          0         26
        20          1         25
        24          0         25
        24          1         25
We can see that the datafile, after the online redefinition of the disk group, is spread uniformly across the disks and correctly mirrored across the two fail groups, as expected.
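While such an online disk group redefinition is in progress, the rebalancing operation can be monitored with v$asm_operation (one of the views in the mapping table above). A minimal sketch:

-- monitor a running rebalance: its state, power and estimated time to completion
select group_number, operation, state, power, sofar, est_work, est_minutes
from v$asm_operation;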
SUMMARY
Oracle ASM is a powerful and easy-to-use volume manager and filesystem for Oracle 10g and 10g RAC. Configuration details and performance metrics of the configured ASM disks and disk groups are exposed via the V$ASM_* views. However, the space allocation mapping (from ASM file extents to disk allocation units) is not fully documented. This paper details how queries on the X$KFFXP internal table can be used to work around this limitation.

A set of working examples has been discussed to demonstrate the findings and to directly explore some inner workings of ASM. As expected, it was found that datafiles are automatically spread over the available disks in a disk group; that mirroring is taken care of by ASM and is done at the extent level (as opposed to the volume-level mirroring found in many other volume managers); and that online disk additions to a disk group make it possible to spread datafiles uniformly over a large number of spindles in a transparent way, which can be used to improve performance and possibly reduce the impact of hot spots.

A few open points remain to be investigated, such as the role of the extra 3 allocation units allocated for each datafile, documented in example 2 (see the rows where XNUM_KFFXP=2147483648). Rebalancing operations have been demonstrated to (re)distribute datafile extents uniformly over the available disks. The ASM rebalancing algorithm apparently does not use workload metrics (from v$asm_disk_stat) to spread datafiles (for example, to spread apart hot parts of the datafiles) but seems to use a simpler round-robin algorithm. A few additional X$KF* tables have been identified (see above), but their purposes have not yet been documented.
REFERENCES
1. N. Vengurlekar, 2005, http://www.oracle.com/technology/products/database/asm/pdf/asm_10gr2_bptwp_sept05.pdf
2. A. Shakian, OOW 2005, Take the Guesswork out of DB Tuning, http://www.oracle.com/pls/wocprod/docs/page/ocom/technology/products/database/asm/pdf/take%20the%20guesswork%20out%20of%20db%20tuning%2001-06.pdf
3. J. Loaiza and S. Lee, OTN 2005, http://www.oracle.com/technology/deploy/availability/pdf/1262_Loaiza_WP.pdf
4. L. Canali, Scalable Oracle 10g Architecture, https://twiki.cern.ch/twiki/pub/PSSGroup/HAandPerf/Architecture_description_Feb05.pdf
5. Oracle 10g Administrator's Guide, "Using Automatic Storage Management"
6. Metalink, Bug N. 4306135
7. J. Loaiza, OTN 1999, Optimal Storage Configuration Made Easy, http://www.oracle.com/technology/deploy/availability/pdf/oow2000_same.pdf
8. S. Rognes, 2004, posting to the oracle-l mailing list