21 July, 2009

The difference between NOT IN and NOT EXISTS

There is a popular misconception that NOT IN and NOT EXISTS are two ways to filter out rows present in one table and not in another table, with both methods being usable interchangeably. Popular wisdom is that the difference between the two is only in terms of performance (being on the basis of whether the larger table is the "outer" or the "inner" in the query) but that the results are the same.

However, we must remember that such an operation being an Anti-Join fails when NULLs are involved. In Oracle a NULL cannot be compared to any other value, not even another NULL. Therefore, a NOT IN operation would fail if the result set being probed returns a NULL. In such a case, the results of a NOT IN query is 0 rows while a NOT EXISTS query would still show the rows present in the one table but not in the other table.

Here is a simple demonstration :

SQL> -- create the two test tables
SQL>
SQL> drop table results_table purge;

Table dropped.

SQL> drop table query_table purge;

Table dropped.

SQL>
SQL> create table results_table
2 as select owner,object_name,object_type from dba_objects where owner in ('HEMANT','DBSNMP','OUTLN');

Table created.

SQL> create table query_table
2 as select owner,object_name,object_type from dba_objects where owner in ('DBSNMP','OUTLN');

Table created.

SQL>
SQL> -- a NOT IN query
SQL> select r.owner, r.object_name from results_table r
2 where r.owner not in (select q.owner from query_table q)
3 order by 1,2;

OWNER OBJECT_NAME
------------------------------ ------------------------------
HEMANT DUPDB
HEMANT MY_T_A
HEMANT RESULTS_TABLE
HEMANT SOURCE_TABLE
HEMANT TEST_APPEND
HEMANT TRACE_USER_SESSIONS
HEMANT TRACE_USER_SESSIONS_BEGIN
HEMANT TRACE_USER_SESSIONS_END

8 rows selected.

SQL>
SQL> -- a NOT EXISTS query
SQL> select r.owner, r.object_name from results_table r
2 where not exists (select '1' from query_table q where r.owner=q.owner)
3 order by 1,2;

OWNER OBJECT_NAME
------------------------------ ------------------------------
HEMANT DUPDB
HEMANT MY_T_A
HEMANT RESULTS_TABLE
HEMANT SOURCE_TABLE
HEMANT TEST_APPEND
HEMANT TRACE_USER_SESSIONS
HEMANT TRACE_USER_SESSIONS_BEGIN
HEMANT TRACE_USER_SESSIONS_END

8 rows selected.

SQL>
SQL> ---
SQL> REM So far, NOT IN and NOT EXISTS have presented the same results
SQL>
SQL> REM What happens if there is a row with a NULL value ?
SQL>
SQL> insert into query_table values (NULL,'ABCDEFGH','TABLE');

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> -- retry the NOT IN query
SQL> select r.owner, r.object_name from results_table r
2 where r.owner not in (select q.owner from query_table q)
3 order by 1,2;

no rows selected

SQL>
SQL> -- retry the NOT EXISTS query
SQL> select r.owner, r.object_name from results_table r
2 where not exists (select '1' from query_table q where r.owner=q.owner)
3 order by 1,2;

OWNER OBJECT_NAME
------------------------------ ------------------------------
HEMANT DUPDB
HEMANT MY_T_A
HEMANT RESULTS_TABLE
HEMANT SOURCE_TABLE
HEMANT TEST_APPEND
HEMANT TRACE_USER_SESSIONS
HEMANT TRACE_USER_SESSIONS_BEGIN
HEMANT TRACE_USER_SESSIONS_END

8 rows selected.

SQL>
SQL> ---
SQL> REM Surprise ?! The NOT IN returned 0 rows !
SQL> REM Why ? Because of the presence of a NULL in the query_table !
SQL> REM
SQL> REM REMEMBER : A "NOT IN" anti-join fails because a NULL returned cannot be compared !
SQL>
SQL> --
SQL> REM One "workaround" is to filter out rows which contain NULLs
SQL> REM .... but think carefully before you do so. Are you sure you want to exclude them ?
SQL>
SQL> REM In the ideal world, such columns should be defined as NOT NULL columns !
SQL> REM That would be the right schema design !
SQL>
SQL> -- test the suggested workaround
SQL> select r.owner, r.object_name from results_table r
2 where r.owner not in (select q.owner from query_table q WHERE OWNER IS NOT NULL)
3 order by 1,2;

OWNER OBJECT_NAME
------------------------------ ------------------------------
HEMANT DUPDB
HEMANT MY_T_A
HEMANT RESULTS_TABLE
HEMANT SOURCE_TABLE
HEMANT TEST_APPEND
HEMANT TRACE_USER_SESSIONS
HEMANT TRACE_USER_SESSIONS_BEGIN
HEMANT TRACE_USER_SESSIONS_END

8 rows selected.

SQL>


You can see that if there is a single row with a NULL value on the query predicate (the OWNER column in QUERY_TABLE), the NOT IN query actually fails.

Tom Kyte has explained this in the section on Anti-Joins in his "Effective Oracle By Design" book. However, I prefer the much better explanation by Roger Schrag
.
.
.

9 comments:

Brian Tkatch said...

Excellent Back to Basics posts.

Thanx Hemant!

Randolf said...

Hemant,

I know I had once from a discussion an excellent explanation for the NOT IN behaviour regarding NULLs purely based on relational theory, but I don't remember it right now (which is a shame).

Another (more technical) way to explain the NULL treatment of NOT IN:

NOT IN can be re-written as the "!= ALL()" SQL operator.

I like this because it makes very clear that in order to make the operation true, ALL values (a logical AND of all values encountered) need to pass the unequal operation and obviously a NULL value doesn't fulfill this and therefor makes the expression false.

Whereas IN (or EXISTS in this case) can be rewritten as the "= ANY" SQL operator.

Again this makes obvious that it's sufficient to have only a single value pass the equal operator (and logical OR of the values encountered), which means that a NULL value doesn't make the whole expression false if any of the other values pass the test.

By the way, I think in pre-11g versions a NOT IN with a nullable column referenced in either the outer or the nested subquery actually prevents an ANTI-JOIN operation. That's why 11g added the "Null-aware" ANTI-JOIN operation. See e.g. Greg Rahn's blog note about this.

Regards,
Randolf

Hemant K Chitale said...

Yes, "IN" and "EXISTS" operations are similar but "NOT IN" and "NOT EXISTS" are not necessarily interchangeable because a NULL can invalidate a comparison.

I'll read Greg Rahn's note.

Anonymous said...

I came straight from Tom's explanation and this is MUCH clearer.
thanks,
Paul

Anonymous said...

Hi Hemant,
I just want to get information on patching types.I googled it but not found clear information.Could you please write blog on Patching that elaborate.

Hemant K Chitale said...

Anonymous,
You could access Oracle Support notes like
(1) Frequently Asked Questions (FAQ): Patching Oracle Database Server [ID 1446582.1]

(2) Oracle Database Server Patchset Information, Versions: 8.1.7 to 11.2.0 [ID 268895.1]

(3) Roadmap of Oracle Database Patchset Releases [ID 1360790.1]

(4) How to confirm that a Critical Patch Update (CPU) has been installed in Linux / UNIX [ID 821263.1]


Hemant

Anonymous said...

Awesome job hemant.

Anonymous said...

Good explanation hemant...it really helpful.

Unknown said...

SQL – IN, NOT IN Operatörü ve Kullanımı – Ders 14
Operatörler ve SQL’deki kullanımları ile ilgili anlatımlara IN, NOT IN operatörlerini anlatarak devam edelim. Efektif bir şekilde kullandığımızda işimize yarayacak bu operatörlerin syntaxı ve ne gibi problemlerde kullanılacağını yazıda detaylı olarak anlatmaya çalıştım.
https://www.dogushan.com/sql-in-not-in-operatoru-ve-kullanimi-ders-14/