Database 12c: automatic generation of extended statistics
Why does the CBO get it wrong? Often because it has insufficient information. No matter how often you analyze your tables, if your queries use multi-column predicates, the CBO will mis-calculate the cardinalities. You have to understand your data, and create extended statistics to correlate the columns. How many people do this? Hardly any. Not a problem any more: release 12c can do this for you. If you configure it appropriately.
When discussing query performance, I always say two things: "you must understand your data" and "you must tell Oracle what you know". In this blog, I want to talk about just one aspect of this: correlation between columns.
The problem
Consider this query (working in the SCOTT demonstration schema):
select * from emp where job='SALESMAN' and deptno=30;
We know that there are six people in department 30, and that there are four salesmen. And so does Oracle, almost exactly:
orclz> set autot on exp orclz> select count(*) from emp where deptno=30; COUNT(*) ---------- 6 Execution Plan ---------------------------------------------------------- Plan hash value: 2083865914 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| EMP | 6 | 18 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("DEPTNO"=30) orclz> select count(*) from emp where job='SALESMAN'; COUNT(*) ---------- 4 Execution Plan ---------------------------------------------------------- Plan hash value: 2083865914 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 8 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 8 | | | |* 2 | TABLE ACCESS FULL| EMP | 3 | 24 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("JOB"='SALESMAN') orclz>
We also now that department 30 is SALES, and that is the department to which all the sales droids belong. But Oracle doesn't know this:
orclz> orclz> select count(*) from emp where job='SALESMAN' and deptno=30; COUNT(*) ---------- 4 Execution Plan ---------------------------------------------------------- Plan hash value: 2083865914 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 11 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 11 | | | |* 2 | TABLE ACCESS FULL| EMP | 1 | 11 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("JOB"='SALESMAN' AND "DEPTNO"=30) orclz>
The cardinality estimate is out by a factor of four. Why? Because Oracle has multiplied the estimated selectivity factors: deptno=30 is 6/14, job='SALESMAN' is 3/14, the result (rounded) is 1/14. Scale this up to the real world, and this is a disaster. In particular it will affect the join order of tables in queries. And we all know (I hope) that if there is any such thing as a silver bullet for SQL tuning, it is to get the join order right.
The 11g solution is to tell Oracle what we know, by creating extended statistics:
orclz> select dbms_stats.create_extended_stats(user,'emp','(deptno,job)') from dual; DBMS_STATS.CREATE_EXTENDED_STATS(USER,'EMP','(DEPTNO,JOB)') ------------------------------------------------------------------------------------------------------------------------ SYS_STU$7LJEWQEV#_NNT_P4FXAU5K orclz> set autot on exp orclz> select count(*) from emp where job='SALESMAN' and deptno=30; COUNT(*) ---------- 4 Execution Plan ---------------------------------------------------------- Plan hash value: 2083865914 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 19 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 19 | | | |* 2 | TABLE ACCESS FULL| EMP | 4 | 76 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("JOB"='SALESMAN' AND "DEPTNO"=30) orclz>
so now Oracle gets it right. But YOU have to understand your data, and YOU have to do the work. Look at your data - there will be any number of correlations: between the parts of an address; between job and salary; between product, season, and location; on how many of these have you created extended stats? In many databases, none.
A 12c solution
12c can automate the process. First, instruct Oracle to monitor the usage of columns in predicates for a period during which time you run your usual workload (for this example, I've re-initialized the SCOTT schema, and monitor for just one minute). Second, generate a report on how columns where used. Third, let Oracle create the extended stats it thinks necessary. Fourth, accept the praise for your phenomenal success at tuning:
orclz> orclz> exec dbms_stats.seed_col_usage(null,null,60) PL/SQL procedure successfully completed. Commit complete. orclz> select count(*) from emp where job='SALESMAN' and deptno=30; COUNT(*) ---------- 4 orclz> set long 100000 orclz> select dbms_stats.report_col_usage('scott','emp') from dual; DBMS_STATS.REPORT_COL_USAGE('SCOTT','EMP') -------------------------------------------------------------------------------- LEGEND: ....... EQ : Used in single table EQuality predicate RANGE : Used in single table RANGE predicate LIKE : Used in single table LIKE predicate NULL : Used in single table is (not) NULL predicate EQ_JOIN : Used in EQuality JOIN predicate NONEQ_JOIN : Used in NON EQuality JOIN predicate FILTER : Used in single table FILTER predicate JOIN : Used in JOIN predicate GROUP_BY : Used in GROUP BY expression ............................................................................... ############################################################################### COLUMN USAGE REPORT FOR SCOTT.EMP ................................. 1. DEPTNO : EQ 2. JOB : EQ 3. (JOB, DEPTNO) : FILTER ############################################################################### orclz> select dbms_stats.create_extended_stats('scott','emp') from dual; DBMS_STATS.CREATE_EXTENDED_STATS('SCOTT','EMP') -------------------------------------------------------------------------------- ############################################################################### EXTENSIONS FOR SCOTT.EMP ........................ 1. (JOB, DEPTNO) : SYS_STU3VG629OEYG6FN0EKTGV_HQ6 created ############################################################################### orclz> set autot on exp orclz> select count(*) from emp where job='SALESMAN' and deptno=30; COUNT(*) ---------- 4 Execution Plan ---------------------------------------------------------- Plan hash value: 2083865914 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 19 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 19 | | | |* 2 | TABLE ACCESS FULL| EMP | 4 | 76 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("JOB"='SALESMAN' AND "DEPTNO"=30) orclz>
So now Uncle Oracle gets it right - and you don't have to understand the data any more. This really is SQL statement tuning made easy.
All demonstrations done using release 12.1.0.1 on Windows.
--
John Watson
Oracle Certified Master DBA
http://skillbuilders.com
- John Watson's blog
- Log in to post comments
Comments
Random Package Invalidation
If you continued testing you would discover that the automatic creation also automatically invalidates any packages that reference the table, effectively creating random invalidations. This then results in ORA-4068 errors and failing applications. Only solution is to disable extended statitics. Hope you haven't made use of this anywhere!
Extended stats: ORA-4068
All I can say is that I've been using extended stats (or "column group stats" as some people refer to them) for many years and have not seen this issue. Adding the hidden column will of course invalidate any dependent objects, but there is no reason why they will not recompile on demand.
If you care to provide an example, I'm sure people will find it valuable.