Friday, June 16, 2017

Global vs Local Indexes in Oracle

I recently came across a case where I needed a fundamental lesson in how partitioning really works in Oracle.  The part of my brain that tries to think as a computer scientist couldn't reconcile how a certain execution plan in our production Oracle instance was being returned.

We have a multi-terabyte table (20+ TB actually) that we recently re-partitioned with a range partition on a date/time (e.g., CREATED_DATE) but also with a sub-hash partition on a particular unique device identifier (e.g., DEVICE_ID).

When I asked Oracle's cost based optimizer (CBO) to return a plan it came back with an index that was built with the columns in the order with the device identifier preceding the creation date.   From an indexing perspective this makes sense.  I want to start with the specificity of a numeric ID first but then focus in on a particular range of data for that device (e.g., search for all data for device X but do so for only a certain range of data between dates Y and Z).

At the same time our re-partitioning strategy was driven by a desire to implement an Information Life-cycle Management (ILM) Architecture where we could query and move our older data from faster to slower disk storage.  This is why we chose to implement range partitioning first on the creation date thereby bucketizing data by year/month and then for added performance we created a sub-hash partition (i.e., additional buckets) on the device identifier.


At the end of the day my confusion was with how the CBO would pick an index that was not constructed in such a way that it aligned with our partitioning strategy.  Below is how in my mind I pictured our B-tree index being laid out with respect to leaf nodes and their traversal to retrieve data for the example I provided earlier (i.e., for Device X search for data between dates Y and Z).


Again from an indexing perspective this makes sense.  I'm starting with the most specific column using an equality in my predicate (i.e., DEVICE_ID = 103) and then performing a range scan for CREATED_DATE for a particular device identifier.   What didn't make sense to me was how Oracle reconciled this against my partitioning strategy where my first level of "buckets" were by CREATED_DATE on Year/Month.


At that point one of our contractors was able to connect the dots for me explaining the difference between Global and Local indexes.  Take first Global indexes.  They are built essentially irrespective of the underlying partitioning strategy.  They work well if you do not have the capability to include in your predicate or underlying index a qualifier that would allow you to take advantage of partition elimination (i.e., in our case CREATED_DATE).

As depicted below, each leaf node in the index can have references to any of the underlying table partitions.  This also means that especially on range scans (e.g., let's say I only knew DEVICE_ID) you might end up touching every partition to retrieve your result set.   Again can be slightly more efficient if one doesn't have a predicate that aligns with your partitioning strategy.


Now take the case of Local indexes.  In this case Oracle is a bit smarter.  It takes the index, breaks it down and manages it using the underlying partitions.  In my mind I pictured this much like the figure below where the prior index is split and managed independently.  In my example, all rows with January CREATED_DATEs would be separated from rows with February values.

I can also now look at the EXPLAIN PLAN output for my query and see that it is actually using partition elimination.  Notice in the case below it used a PARTITION RANGE ITERATOR because the predicate in my query included a CREATED_DATE that went across multiple partitions.

      
-----------------------------------------------
| Id  | Operation                             |
-----------------------------------------------
|   0 |  SELECT STATEMENT                     |
|   1 |   SORT AGGREGATE                      |
|*  2 |    FILTER                             |
|   3 |     PARTITION RANGE ITERATOR          |
|*  4 |      TABLE ACCESS BY LOCAL INDEX ROWID|
|*  5 |       INDEX RANGE SCAN                |
-----------------------------------------------    
 

When adjusting my predicate to specify a range within a particular partition the plan changed indicating that partition elimination further reduced my results to a single partition.

      
-----------------------------------------------
| Id  | Operation                             |
-----------------------------------------------
|   0 |  SELECT STATEMENT                     |
|   1 |   SORT AGGREGATE                      |
|*  2 |    FILTER                             |
|   3 |     PARTITION RANGE SINGLE            |
|*  4 |      TABLE ACCESS BY LOCAL INDEX ROWID|
|*  5 |       INDEX RANGE SCAN                |
-----------------------------------------------    
 

In a subsequent post I'll go into greater detail on simple points for reading the output of  EXPLAIN PLAN traces.   Going through this particular indexing exercise forced me to relearn what previously second nature to me.  Specifically how one can use output from EXPLAIN PLAN to further tune your queries.

Wednesday, April 26, 2017

Transitioning to ANSI Compliant SQL



For years my guys have been a proponent of ANSI compliant SQL.    Today I decided to take the plunge and force myself to abandon the only SQL I have ever known and that was firmly grounded in an Oracle world (some might say grounded in cement shoes).

I made this decision not only because it is something that works across platforms but also as I been reminded over and over again by those I manage that at the end ANSI compliant SQL is simpler to understand and maintain.

To help others that might follow in my footsteps, I wanted to document my learning process by posting here.  I thought I'd start with something very simple - illustrate how joins differ between a more traditional Oracle SQL query and a purely ANSI one.

Typical Inner Joins


Let's start with a very simple query that uses a traditional join.  In a Oracle world, the syntax for this uses the WHERE clause to join the two tables.

Take the example below where I want to join an employee's profile record to the agency that he or she belongs.  Both tables are referenced in my FROM clause and in my WHERE clause I include a hard join that allows me to bring back the name of the agency to which the employee belongs.

One might also consider the case where for some reason the agency referenced didn't exist or there wasn't an agency specified in the employee's profile record.  In this case the query would not bring back a row.  That can be solved with an outer join - but more in that in a minute.

Below is the query as I used to write it for Oracle:

   select first_name
        , last_name
        , agency_name
   from employee_profile ep   
      , agency a
   where ep.agency_id = a.agency_id

To convert this to a ANSI compliant query my FROM clause becomes much more prominent as I use that to perform my join.

   select first_name     
        , last_name     
        , agency_name
   from employee_profile ep
          inner join agency a
            on ep.agency_id = a.agency_id

In this particular case as illustrated above my WHERE clause disappears.   That was one of the first things that jumped out at me as I could immediately see how my query is simplified.

Outer Joins


It is pretty safe to say that outer joins are an area where Oracle has their own "unusual" syntax that they introduced to handle this type of join.  Oracle uses a plus sign surrounded by parenthesis (+) to indicate that side of the join is "optional" when attempting to link the two tables.

Let's build on the prior inner join example and if the selected employee's agency has been awarded a certification we want to be able to display the type of certificate.  In a strictly Oracle world, I'd add the certificate table to my FROM clause and then add an outer join to my WHERE clause to indicate I want to pull back the certificate name only if there is a certification specified for the agency.

   select first_name
        , last_name
        , agency_name
   from employee_profile ep   
      , agency a
      , certificate c
   where ep.agency_id = a.agency_id
    and a.certificate_id = c.certificate_id (+)

Now I want to convert this to use a left outer join on the certificate table to ensure I only pull back certificate information if one has been specified for the agency.  So in essence I want to pull back data regardless of whether there is a matching certificate or not.

   select first_name     
        , last_name     
        , agency_name
   from employee_profile ep
          inner join agency a
            on ep.agency_id = a.agency_id
          left outer join certificate c
            on a.certificate_id = c.certificate_id

Looking at the SQL as written above it again is striking how simple the resultant query becomes.  I still do not have a WHERE clause and have performed all of the appropriate joins in the FROM clause.  The WHERE clause at this point strictly becomes a way to add additional filter criteria.  That being said, one could also add additional filter criteria within the FROM clause when building your joins.

For example, let's say that we only want to show the certificate if it is one that is part of a federal compliance program.  We would simply add that filter criteria to the outer join we have already provided.  Any records matching the outer join on the certificate ID simply wouldn't be displayed unless they are marked for federal compliance (i.e., and c.federal_compliance = 'Y').

   select first_name     
        , last_name     
        , agency_name
   from employee_profile ep
          inner join agency a
            on ep.agency_id = a.agency_id
          left outer join certificate c
            on a.certificate_id = c.certificate_id
           and c.federal_compliance = 'Y'

Reviewing the resulting SQL it is again easy to see how simplified the query becomes when you make it ANSI compliant.  I still do not have a WHERE clause having performed all of the appropriate joining in the FROM clause to include my additional filter criteria with ensuring the certificate is part of a federal compliance program.

Cartesian Joins


I almost didn't include this example but it also illustrates a simplification.  Let's say I have two tables - one that tracks day in year (e.g., 1 - 365) and a table to track hours (i.e., 1-24).  If I want to combine these two tables to fall all permutations, previously I would have written a query as follows:

   select day_id, hour_id
   from days, hours

Now that query becomes:

   select day_id, hour_id
   from days 
   cross join hours

Although not much simpler to write it does read in a much more straightforward fashion.  In general that probably is the most significant thing I learned when converting over to using ANSI compliant SQL - that what you produce is much easier for others to read.   This is especially more true as the size of your query grows and the number of joins you are specifying becomes larger and larger.

So that is it for the time being.  

Saturday, April 22, 2017

A Long Time Away

It has been quite a while since my last post.  Although I had ever good intention of sitting down to write I haven't and quite a bit has happened during that time.  Mile high I've been continuing on a path to find out what success today equates to in my current role as an executive.

I'll be posting details on where my journey has taken me as of late including attending some executive education programs at UC Berkeley but before that I thought it would be worthwhile to share a couple of thoughts and observations I've had recently.
I have no clue what you are saying: So some of my engineers I'm realizing have a difficult time explaining complex topics to those that do not have an engineering or computer science background.  It's fascinating to watch as from their perspective what they are saying makes total sense to them but now that I'm observing more I am able to see the blank looks and stares on people's faces as my engineers dive into gory details that the other party has no hope of understanding.  These same dazed and confused individuals are also at the same time terrified to ask questions further compounding the problem.
It's all about solving problems: I came to the realization early on in my career that of my strengths was the ability to find innovative and creative solutions to technical problems.  The same part of me that never liked saying "no" also translated to software development and architecture where there was a place deep down that could not stand (actually detested) saying "no - it can't be done".
I'm finding that same mentality is what is driving me and making me successful in my new role.  As an example I have the same stubbornness when faced with a hard capital budgetary constraint that could kill a project.  It's my responsibility to find a way to succeed even when presented with something that to others would be a showstopping roadblock.   I've also found that usually there is a solution out there that although it might not be a path people want to follow it can be done in most cases.
Less detail in electronic communication: where as in prior days detailed e-mails helped describe more complex technical topics in a way that others could understand I'm finding now that less details is better.  Perhaps for now my blog will be my outlet for spewing details - I have to say I've found this consolidation of electronic communication undertaking extremely difficult - certainly another blog post once I have some learning behind me.
Employees and Work-life balance: this is another topic that has come up and continues to be something I deal with throughout my organization.  More on that in another post.   It's an interesting debate whether this is a timeless phenomenon or something that is more generational.   Right now I'm of the opinion that it's the former.   At the end of the day life is about choices - give and take.  I find those that are successful and I enjoy working with have found the right balance but at the same time do put an emphasis on their professions - the challenge is conveying this to those who may not have the same philosophy.
That's it for the time being.  My goal for the next twelve months is to pen at least once of these posts per month.   And who knows, perhaps some may even dive into a technical problem I have solved.