In-Memory expressions in Oracle

Starting with Oracle version 12.2, the In-Memory column store can materialize expressions that resolve to a value. This includes deterministic PL/SQL functions. In-memory virtual expressions use the same underlying mechanism as in-memory virtual columns, adding some automation to the process. Virtual expressions are stored as a hidden virtual column. A notable addition is Expression Statistics Store, which tracks frequently evaluated expressions. It is exposed through {all,dba,user}_expression_statistics view.

Mind the data types

As of writing of this, not all the data types are supported. For example, NUMBERs are fine, but as we discovered the hard way, TIMESTAMPs are not. While virtual columns for expressions returning TIMESTAMPs are created, expressions are not loaded into the In-Memory Expression Units. On a more positive note, there are rumours about a backport to 19c being in the works, and there’s a documentation bug as well.

A little experiment

The rest of the blog post uses setup from the in-memory virtual columns blog. For in-memory virtual expressions to work, the parameter INMEMORY_EXPRESSIONS_USAGE must be set to ENABLE:

INMTEST@localhost/orclpdb1 > show parameter INMEMORY_EXPRESSIONS_USAGE

NAME                                 TYPE        VALUE                                                                                 
------------------------------------ ----------- ------------------------------                                                        
inmemory_expressions_usage           string      ENABLE

Let’s generate some frequently used expressions. My expectation here is that slow_round(amount, 2) should end up in the Expression Statistics Store, and after repopulation, the anonymous PL/SQL block becomes much faster.

INMTEST@localhost/orclpdb1 28-04-2023 07:31:19> DECLARE
  2      l_ts     TIMESTAMP;
  3      l_amount NUMBER;
  4      l_day TIMESTAMP;
  5  BEGIN
  6      FOR i IN 0..200 LOOP
  7          l_day := trunc(systimestamp - dbms_random.value(0, 366));
  8          SELECT
  9              trunc(created_ts),
 10              SUM(slow_round(amount, 2))
 11          INTO l_ts, l_amount
 12          FROM
 13              transactions
 14          WHERE
 15              trunc(created_ts) =  l_day
 16          GROUP BY
 17              trunc(created_ts);
 18
 19      END LOOP;
 20  END;
 21  /

PL/SQL procedure successfully completed.

Elapsed: 00:42:30.03

This indeed generated some expressions:

INMTEST@localhost/orclpdb1 28-04-2023 08:13:49> SELECT
  2      expression_text,
  3      evaluation_count,
  4      fixed_cost,
  5      dynamic_cost,
  6      created,
  7      last_modified
  8  FROM
  9      dba_expression_statistics
 10  WHERE
 11      table_name = 'TRANSACTIONS';

EXPRESSION_TEXT                          EVALUATION_COUNT FIXED_COST DYNAMIC_COST CREATED             LAST_MODIFIED                    
---------------------------------------- ---------------- ---------- ------------ ------------------- -------------------              
TRUNC("CREATED_TS")                              13726895 .000055108            0 28-04-2023 07:31:19 28-04-2023 08:13:49              
"CREATED_TS"                                     13726895 5.5108E-08            0 28-04-2023 07:30:38 28-04-2023 08:13:49              
"AMOUNT"                                         13726895 5.5108E-08            0 28-04-2023 07:30:38 28-04-2023 08:13:49              
"INMTEST"."SLOW_ROUND"("AMOUNT",2)               13726895 .000165325            0 28-04-2023 07:31:19 28-04-2023 08:13:49              
TRUNC("CREATED_TS")                              13726895 .000041331            0 28-04-2023 07:31:19 28-04-2023 08:13:49 

Apart from our "SLOW_ROUND"("AMOUNT",2), it has captured the relatively useless TIMESTAMP returning expressions as well. Now we can add the virtual columns with DBMS_INMEMORY_ADMIN.IME_CAPTURE_EXPRESSIONS. Regardless of what the procedure name suggests, it queries the Expression Statistics Store, adds the 20 hottest expressions as virtual columns and marks them as INMEMORY. Twenty expressions is a database-wide limit; there does not seem to be a way to embiggen it.

In-Memory expression units are populated as their parent IMCUs are populated. To force the population, I will later call DBMS_INMEMORY_ADMIN.POPULATE_WAIT.

SYS@ORCLCDB > exec DBMS_INMEMORY_ADMIN.IME_CAPTURE_EXPRESSIONS('CURRENT');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.06
INMTEST@localhost/orclpdb1 28-04-2023 08:13:49> declare
  2      l_out NUMBER;
  3  begin
  4      l_out := DBMS_INMEMORY_ADMIN.POPULATE_WAIT('NONE', force => true);
  5      dbms_output.put_line('populate_wait -> ' || l_out);
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:19:30.98

Check if virtual columns/expressions have been created:

INMTEST@localhost/orclpdb1 28-04-2023 08:33:20> SELECT
  2      column_name,
  3      sql_expression
  4  FROM
  5      dba_im_expressions
  6  WHERE
  7      table_name = 'TRANSACTIONS';

COLUMN_NAME                    SQL_EXPRESSION                                                                                          
------------------------------ ----------------------------------------                                                                
SYS_IME0001000002843DB0        TRUNC("CREATED_TS")                                                                                     
SYS_IME0001000002843DB1        "INMTEST"."SLOW_ROUND"("AMOUNT",2)         

And now anonymous block is much faster:

INMTEST@localhost/orclpdb1 28-04-2023 08:33:20> DECLARE
  2      l_ts     TIMESTAMP;
  3      l_amount NUMBER;
  4      l_day TIMESTAMP;
  5  BEGIN
  6      FOR i IN 0..200 LOOP
  7          l_day := trunc(systimestamp - dbms_random.value(0, 366));
  8          SELECT
  9              trunc(created_ts),
 10              SUM(slow_round(amount, 2))
 11          INTO l_ts, l_amount
 12          FROM
 13              transactions
 14          WHERE
 15              trunc(created_ts) =  l_day
 16          GROUP BY
 17              trunc(created_ts);
 18
 19      END LOOP;
 20  END;
 21  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:14.20

Cleaning up the virtual expressions

When DBMS_INMEMORY_ADMIN.IME_CAPTURE_EXPRESSIONS discovers virtual expressions are no longer part of the “20 hottest expressions” list, and it marks those virtual columns as NO INMEMORY. So it is up to the DBA to clean up these virtual columns. There are three ways to do that:

  • Call DBMS_INMEMORY.IME_DROP_EXPRESSIONS. This procedure drops the specified virtual expression or all expressions on the table if the name of the virtual column is not specified.
  •  Call DBMS_INMEMORY_ADMIN.DROP_ALL_EXPRESSIONSThis removes all virtual expressions in the database.
  •  Drop the virtual columns manually. This seems to do the same thing as DBMS_INMEMORY.IME_DROP_EXPRESSIONS: removes IMEUs and drops the column.

Oracle In-Memory for virtual columns

If you want to materialize the results of a computation in Oracle, function-based indexes and materialized views are the standard answer. However, in-Memory provides the third option: in-memory virtual columns and expressions. While virtual expressions somewhat supersede in-memory virtual columns, they provide a relatively simple introduction to how virtual expression infrastructure works.

As per documentation, every IMCU has an associated In-memory Expression Unit (IMEU). IMEUs are populated with IMCUs; for a virtual column or expression to be loaded to in-memory, the table must be (re)populated.

Setup

Let’s have a look at how this thing works. First, some test data:

INMTEST@localhost/orclpdb1> CREATE TABLE transactions (
  2      id                 INTEGER PRIMARY KEY,
  3      customer_id        INTEGER NOT NULL,
  4      account_id         INTEGER NOT NULL,
  5      transaction_type   NUMBER(2) NOT NULL,
  6      external_reference VARCHAR2(256),
  7      amount             NUMBER(36, 2) NOT NULL,
  8      created_ts         TIMESTAMP NOT NULL,
  9      system_ts          TIMESTAMP
 10  )
 11      PARTITION BY RANGE (
 12          created_ts
 13      ) INTERVAL ( numtodsinterval(1, 'DAY') ) ( PARTITION p_first
 14          VALUES LESS THAN ( TIMESTAMP '2020-03-19 00:00:00.00' )
 15      );

Table created.

Elapsed: 00:00:00.03
INMTEST@localhost/orclpdb1> INSERT /*+ parallel enable_parallel_dml APPEND */ INTO transactions
  2      WITH gen AS (
  3          SELECT
  4              level id
  5          FROM
  6              dual
  7          CONNECT BY
  8              level < 5000
  9      )
 10      SELECT
 11          ROWNUM,
 12          mod(1 + g1.id * g2.id, 36000),
 13          mod(1 + g1.id * g2.id, 36000),
 14          mod(g1.id, 8),
 15          sys_guid(),
 16          mod(1 + g1.id * g2.id, 36000) + round(g1.id /(1 + g2.id), 4),
 17          systimestamp AT TIME ZONE 'CET' - dbms_random.value(0, 366),
 18          systimestamp
 19      FROM
 20               gen g1
 21          CROSS JOIN gen g2;

24990001 rows created.

Elapsed: 00:11:27.71
INMTEST@localhost/orclpdb1 03-04-2023 09:50:46> commit;

Commit complete.

Elapsed: 00:00:00.38

Virtual column disabled for in-memory

To test if in-memory virtual columns have any effect, I’m using a relatively slow, deterministic function, and I will create a virtual column based on that function. My expectation is that once the virtual column is loaded to the in-memory, queries will become faster since function calls are avoided. The second thing that should happen is once the virtual column is enabled for in-memory, repopulating the in-memory area will take longer.

INMTEST@localhost/orclpdb1 03-04-2023 09:39:18> CREATE OR REPLACE FUNCTION slow_round (
  2              n NUMBER, m NUMBER
  3  ) RETURN NUMBER
  4      AUTHID DEFINER
  5      DETERMINISTIC
  6      PARALLEL_ENABLE
  7  IS
  8      PRAGMA udf;
  9      l_dummy PLS_INTEGER;
 10  BEGIN
 11      for i in 0 .. 400 loop
 12          l_dummy := mod(i, 255);
 13      end loop;
 14      RETURN round(n, m);
 15  END slow_round;
 16  /

Function created.

Elapsed: 00:00:00.01
INMTEST@localhost/orclpdb1 03-04-2023 09:50:46> alter table transactions add amount_2      AS (slow_round(amount, 2));

Table altered.

Elapsed: 00:00:00.02
INMTEST@localhost/orclpdb1 03-04-2023 09:50:46> alter table transactions inmemory;

Table altered.

Elapsed: 00:00:00.22
INMTEST@localhost/orclpdb1 03-04-2023 09:50:46> alter table transactions no inmemory (amount_2);

Table altered.

Elapsed: 00:00:00.01

My test environment has only on table marked for in-memory, so I can just reload all the segments:

INMTEST@localhost/orclpdb1 03-04-2023 09:50:47> declare
  2      l_out NUMBER;
  3  begin
  4      l_out := DBMS_INMEMORY_ADMIN.POPULATE_WAIT('NONE', force => true);
  5      dbms_output.put_line('populate_wait -> ' || l_out);
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:56.53

A little check if everything was populated:

INMTEST@localhost/orclpdb1 03-04-2023 09:51:43> SELECT
  2      populate_status,
  3      COUNT(*)
  4  FROM
  5      gv$im_user_segments
  6  WHERE
  7      segment_name = 'TRANSACTIONS'
  8  GROUP BY
  9      populate_status;

POPULATE_STAT   COUNT(*)
------------- ----------
COMPLETED            367

Elapsed: 00:00:00.01
INMTEST@localhost/orclpdb1 03-04-2023 09:51:43>
INMTEST@localhost/orclpdb1 03-04-2023 09:51:43> SELECT
  2      SUM(inmemory_size),
  3      SUM(bytes),
  4      SUM(bytes_not_populated)
  5  FROM
  6      gv$im_user_segments
  7  WHERE
  8      segment_name = 'TRANSACTIONS'
  9  ;

SUM(INMEMORY_SIZE) SUM(BYTES) SUM(BYTES_NOT_POPULATED)
------------------ ---------- ------------------------
        2783707136 3078619136                        0

Elapsed: 00:00:00.01
INMTEST@localhost/orclpdb1 03-04-2023 09:51:43> SELECT
  2      column_name,
  3      obj_num,
  4      segment_column_id,
  5      inmemory_compression
  6  FROM
  7      v$im_column_level
  8  WHERE
  9      table_name = 'TRANSACTIONS';

COLUMN_NAME                                                                                                                          OBJ_NUM SEGMENT_COLUMN_ID INMEMORY_COMPRESSION
--------------------------------------------------------------------------------------------------------------------------------- ---------- ----------------- --------------------------
AMOUNT_2                                                                                                                              100347                0 NO INMEMORY
<snip>

Query to check how slow is our slow rounding function:

INMTEST@localhost/orclpdb1 03-04-2023 09:51:43> set autotrace on
INMTEST@localhost/orclpdb1 03-04-2023 09:51:43> SELECT
  2      SUM(amount_2) amount
  3  FROM
  4      transactions
  5  WHERE
  6      trunc(created_ts) = trunc(systimestamp - 128)
  7  ;

    AMOUNT
----------
1219916975

Elapsed: 00:00:19.07

Execution Plan
----------------------------------------------------------
Plan hash value: 4195887397

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     1 |    24 |  7331  (58)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE              |              |     1 |    24 |            |          |       |       |
|   2 |   PARTITION RANGE ALL        |              |   249K|  5857K|  7331  (58)| 00:00:01 |     1 |1048575|
|*  3 |    TABLE ACCESS INMEMORY FULL| TRANSACTIONS |   249K|  5857K|  7331  (58)| 00:00:01 |     1 |1048575|
-------------------------------------------------------------------------------------------------------------

Virtual column enabled for in-memory

Next I will add column AMOUNT_2 to the in-memory and repopulate the segments.

INMTEST@localhost/orclpdb1 03-04-2023 09:52:21> alter table transactions inmemory (amount_2);

Table altered.

Elapsed: 00:00:00.01
INMTEST@localhost/orclpdb1 03-04-2023 09:52:21>
INMTEST@localhost/orclpdb1 03-04-2023 09:52:21> set serverout off
INMTEST@localhost/orclpdb1 03-04-2023 09:52:21> declare
  2      l_out NUMBER;
  3  begin
  4      l_out := DBMS_INMEMORY_ADMIN.POPULATE_WAIT('NONE', force => true);
  5      dbms_output.put_line('populate_wait -> ' || l_out);
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:39:09.75
INMTEST@localhost/orclpdb1 03-04-2023 10:31:31> SELECT
  2      populate_status,
  3      COUNT(*)
  4  FROM
  5      gv$im_user_segments
  6  WHERE
  7      segment_name = 'TRANSACTIONS'
  8  GROUP BY
  9      populate_status;

POPULATE_STAT   COUNT(*)
------------- ----------
COMPLETED            365
STARTED                2

Elapsed: 00:00:00.01

Reloading the segments took whopping 39 minutes, and somehow POPULATE_WAIT gave up before all segments got repopulated. But for our test case this should be ok:

INMTEST@localhost/orclpdb1 03-04-2023 10:31:31> SELECT
  2      SUM(amount_2) amount
  3  FROM
  4      transactions
  5  WHERE
  6      trunc(created_ts) = trunc(systimestamp - 128)
  7  ;

    AMOUNT
----------
1219916975

Elapsed: 00:00:06.85

Execution Plan
----------------------------------------------------------
Plan hash value: 4195887397

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     1 |    24 |  7331  (58)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE              |              |     1 |    24 |            |          |       |       |
|   2 |   PARTITION RANGE ALL        |              |   249K|  5857K|  7331  (58)| 00:00:01 |     1 |1048575|
|*  3 |    TABLE ACCESS INMEMORY FULL| TRANSACTIONS |   249K|  5857K|  7331  (58)| 00:00:01 |     1 |1048575|
-------------------------------------------------------------------------------------------------------------

Amazingly it works; with the in-memory virtual column, our query took around 7 seconds vs 19 seconds without.