Skip to main content

Oracle Optimizer Master Class w/ Tom Kyte: Part 1

Presentation Slides can be downloaded from the AskTom website.

[Updated : 8:58am PDT]

What happens when a SQL Statement is issued?

  1. Syntax Check -- is it SQL?
  2. Semantic Check -- What is being queried? What Objects? Dictionary check. 
  3. Shared Pool Check
(steps 1-3 == Soft parse)

ways to reduce soft parse:

  • Enable JDBC Statement cacheing
  • Bind vars
  • PL/SQL functions
[Updated : 9:13am PDT]


Cost Based Optimizer introduced in 1992 in Oracle 7.0 #retroITfacts
Not really considered the default optimizer until version 9.0

Understanding how Optimizer work
Query Transformation

  • Rewrite query text to allows it to be processed more efficiently Oracle does this
  • YOU write code for readabilty and 'syntactically pleasing' -- says Mr Kyte
  • translate statement into semantically equivalent SQL that can be processed


Did you know: There is an ANY and ALL sql set operator? Check it out!

[Updated: 9:27am PDT]

OR Expansion
'OR is the worst thing in the world. "  Mr Kyte


  • Transforms queries that contain OR predicates into the form of a UNION ALL query with two or more branches. 
  • Without the transformation Optimizer treats OR predicates as a single unit
  • Can't use index on either column. <- THIS SHOULD BE IN ALL CAPS. IT"S IMPORTANT. 

Star Query Transformation

  • Turns query inside out, with joins back to 
Optimizer Dump : 10053 trace file

All above is how Optimizer Query Transformation happens automatically! 

[Updated: 9:40am PDT]


Understanding Optimizer Statistics

Who is responsible for Statistics in the database?

  • Developers or DBA? 
    • Tom Says 'Developers'
    • Does DBA know what the application code? 
    • Does DBA know that unindexed column contains highly skewed data and is used by joins to other tables? 
    • Does DBA know that table is truncated at 3am but has more than a million rows by 3pm? 
  • Constraints impact optimizer immensely!
How to Gather Stats
  • use DBMS_STATS
  • never, ever EVER use 'analyze'
  • Cost Base optimizer expects stats from DBMS_STATS, analyze has not been maintained. 
  • Incremental statistics (new in 11g)
    • generate global stats from partition level stats
    • control by the parameter INCREMENTAL (default is false)
  • Concurrent stats gathering
    • ability to gather stats on multiple options concurrently under a GATHER_SCHEMA_STATS command
    • controlled by CONCURRENT (default is false)
(all this in the slides, but worth repeating, it's great foundational information) 

[Updated: 9:57am PDT]

How to Gather Stats

  • SET_SCHEMA_PREF updates default collection settings
  • SET_GLOABL_PREF same as schema, but for global 
    • DON'T TOUCH ESTIMATE PERCENT IN 11G and beyond
    • DON'T TOUCH METHOD OPT
  • PUBLISH
    • allows you to collect stats and validate if optimization is better before and after. 
    • need ot update session parameter to use new schema. 
  • STALE_PERCENT
    • DEFAULT 10% 
    • should be updated or altered based activity pattern
Hierarchy for Parameter values
  • parameter values from command-line
  • table preferences
  • global preferences
How to gather stats? 
  • 11g on words use AUTO_SAMPLE_SIZE
    • New hash based algorithm
    • speed and resources of 10% sample 
    • accuracy of 100% sample
  • specifying ESTIMATE_PERCENT other than AUTO_SAMPLE_SIZE will use 10g algorthms
  • NDV and NUM_ROWS are most imporant values when gathering stats
What information is gathered with DBMS_STATS?

  • Number of rows
  • number of blocks
  • average row length
  • number of distinct values
  • number of nulls in column
Histograms
Frequency histograms:

  • only if the number of distinct values in column (NDV) is less than 254. 
  • Cardinatly estimate will be VERY accurate
Hight balenaced histograms
  • if NDV for a column is greater than 254
  • "What if you were like me in High School? Almost popular"
Top Frequency (new in 12c)
  • creates a freqency histogram for 99% of values 
  • 1%  must not happen often, so we'll give them a small cardinality.
  • ONLY created with AUTO_SAMPLE_SIZE
Hybrid Histograms
  • similar to hight balanced histograms as created when NDV > 254
  • stores actual frequency of bucket endpoints in histograms
  • no values allow to spill over other buckets
  • more endpoint values can be squeezed into histogram
[Updated 10:25am PDT]

(skipping pre-11g histogram discussion, as not relevant to me)

Histograms in 11g and beyond

  • Adaptive Cursor Sharing, w/ Bind Peeking
    • you can have BOTH plans
    • plans are marked as bind sensitive
    • create NEW plan if binds are not equivalent
    • Look in v$sql to see info on adaptive cursor
  • Nearly Popular Values
    • the value is classifed as non-popular but density calculation is not accurate
    • could use dynamic sampling hint
      • /*+ dyanamic_sampling(table_name 2) */
      • will get more accurate cardinality
      • Good Hint (tells the optimizer how to better calculate information)
      • Bad Hint (tells the optimizer how to do this) 
Additional stats in 11g
  • Column group stats
    • useful when multiple column for the same table are used in WHERE clause
    • DBMS_STATS.GATHER_EXTENDED_STATS ()
    • 12c will do something like this automatically
    • DBMS_STATS.SEED_COL_USAGE() will monitor for x seconds to help identify candidates for column group stats
    • DBMS_STATS.REPORT_COL_USAGE() 
  • Expression Stats
    • useful when column is used as part of a complex expression in WHERE clause

  • Both are automatically maintained when stats are gathered on table
[Updated: 11:123am PST]

When to gather stats?

  • Oracle database automatically collects by default. 
  • After a large data load
  • if trickle loading into a partition table
    • use DBMS_STATS.COPY_TABLE_STATS
"A BIG plan against a small table, is usually okay.  Small plan against a big table is always a disaster."
- Tom Kyte

Stats Gathering Performance
  • Three Parallel options
    • Intra object using parallel execution
      • used with DEGREE parameter in GATHER_*_STATS
      • up to 2x + 1 for parallel server (reader & writer  + query coordinator)
    • inter object using concurrency
      • used with CONCURRENCY parameter in GATHER_*_STATS

    • combination of the two
  • Incremental stats gathering for partitioned tables
[Updated: 11:45am PST]

When NOT to gather Stats? 
  • when data volume changes dramatically over time
  • when is a good time to gather stats? 
    • when data has a representative data volumne
    • lock stats to to ensure stats gathering job does not overwrite 
  • When table seen as part of an ETL process
    • written once, read once, and truncated or deleted
    • Use dynamic sampling instead

Other Types of Stats
  • Dictionary Stats
    • for EM , quest tools, etc
  • Fixed Object Stats
    • used for objects that depend on v$sessions
    • collect stats after major changes, database upgrades, patches or refresh(?)
  • System Stats
    • "don't play around with them"
    • maybe tinker  them when using Oracle on Engineered systems
  • Automatic Dynamic Sampling
[Updated 11:58am PST]

2 types of execution Plans
  • Explain plan execution plan
    •  what MIGHT happen
  • Row Source Execution plan
    •  what actually DID happen
How to get execution plan: 
dbms_xplan.display for nicely formated explain plans
use display_cursor
Explain plans don't always tell the truth
  • AUTOTRACE TRUE
  • SQL Developer
What's a good plan for an Optimizer?
  • What is Cost? 
    • Represents units of work and resources
    • uses CPU and I/O as units of work  
    • Cost is really an estimate of CPU and I/O
  • What is performance? 
    • fastest possible response time to query
    • exploit what ever resources are available to make this happen fastest
    • Optimizer does not focus on resources necessary to execute plan
Cardinality is the most important calculation!

Cardinality
  • estimate of the number of rows that will be returned by each operation
  • check cardinatlity estimates
    • /*+ gather_plan_statistics */ 
    • select * from table(dbms_display_cursor(format=>'ALLSTATS LAST')
    • will show estiamted cardinality and actuall cardinality
  • check cardinality with SQL Monitor (Enterprise Manager)
    • will show estimated and actual rows
    • is row source plan
    • use /*+ monitor */
Look at SLIDE #25, for Solutions for incorrect cardinality estimates. 


LUNCH BREAK

[Updated: 12:31pm PST]


Comments