Birth of the Oracle session: the network view

Twitter is a constant source of inspiration. Today there was a thread about database connections and network latencies involved. Since I did some testing some time ago for a talk, I could elaborate. Following is the simplest case of how the Oracle database session is created. There are no SCAN listeners, TLS, or external authentication involved. If my counting is correct, it takes 16 messages to set up an Oracle database session (Counting roundtrips is somewhat tedious with TCP, since ACK can come as a separate message or as a part of the next.)


The high-level view is relatively straightforward:


First, the TCP session is established, and then the client sends a connection string to the listener. If the connection string is fine, the listener will create the foreground process, and the client will keep talking to the foreground process.

Let’s check how this looks as a packet dump. Following is tcpdump output; timestamps are with microsecond precision, difference is from the previous message. In some cases, I did cut out the payload for brevity. Server is 21c XE running in docker, client is 19c JDBC and UCP.

TCP connection setup

Nothing special here, other than all the latencies are in low microseconds

00:00:00.000000 IP (tos 0x0, ttl 64, id 37057, offset 0, flags [DF], proto TCP (6), length 60)
192.168.0.1.56054 > 192.168.0.20.ncube-lm: Flags [S], cksum 0x8194 (incorrect -> 0xb9bd), seq 1118603497, win 65495, options [mss 6
5495,sackOK,TS val 3789784191 ecr 0,nop,wscale 7], length 0

 00:00:00.000030 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.0.20.ncube-lm > 192.168.0.1.56054: Flags [S.], cksum 0x8194 (incorrect -> 0x1f97), seq 4178598298, ack 1118603498, win 6516
0, options [mss 1460,sackOK,TS val 1418194517 ecr 3789784191,nop,wscale 7], length 0

 00:00:00.000021 IP (tos 0x0, ttl 64, id 37058, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.0.1.56054 > 192.168.0.20.ncube-lm: Flags [.], cksum 0x818c (incorrect -> 0x4aec), seq 1, ack 1, win 512, options [nop,nop,T
S val 3789784191 ecr 1418194517], length 0


After sending ACK ( [.] TCP flag) client sends connect string to the listener. The listener sends back ACK, and then 18ms later client receives another packet. Considering the latency that could be the foreground process talking:

00:00:00.014951 IP (tos 0x0, ttl 64, id 37059, offset 0, flags [DF], proto TCP (6), length 329)
192.168.0.1.56054 > 192.168.0.20.ncube-lm: Flags [P.], cksum 0x82a1 (incorrect -> 0x5c65), seq 1:278, ack 1, win 512, options [nop,
nop,TS val 3789784206 ecr 1418194517], length 277
0x0000: 4500 0149 90c3 4000 4006 2786 c0a8 0001 E..I..@.@.'…..
0x0010: c0a8 0014 daf6 05f1 42ac 88ea f910 599b ……..B…..Y.
0x0020: 8018 0200 82a1 0000 0101 080a e1e3 848e …………….
0x0030: 5487 ee55 0115 0000 0100 0000 013e 012c T..U………>.,
0x0040: 0c41 2000 ffff 4f98 0000 0001 00cb 004a .A….O……..J
0x0050: 0000 0000 8181 0000 0000 0000 0000 0000 …………….
0x0060: 0000 0000 0000 0000 0000 0000 0000 0000 …………….
0x0070: 2000 0020 0000 0000 0000 0000 0001 2844 …………..(D
0x0080: 4553 4352 4950 5449 4f4e 3d28 4144 4452 ESCRIPTION=(ADDR
0x0090: 4553 533d 2850 524f 544f 434f 4c3d 5443 ESS=(PROTOCOL=TC
0x00a0: 5029 2848 4f53 543d 3132 372e 302e 302e P)(HOST=127.0.0.
0x00b0: 3129 2850 4f52 543d 3135 3231 2929 2843 1)(PORT=1521))(C
0x00c0: 4f4e 4e45 4354 5f44 4154 413d 2843 4944 ONNECT_DATA=(CID
0x00d0: 3d28 5052 4f47 5241 4d3d 4a44 4243 2054 =(PROGRAM=JDBC.T
0x00e0: 6869 6e20 436c 6965 6e74 2928 484f 5354 hin.Client)(HOST
0x00f0: 3d5f 5f6a 6462 635f 5f29 2855 5345 523d =<strong>jdbc</strong>)(USER=
0x0100: 7072 6970 6969 2929 2853 4552 5649 4345 pripii))(SERVICE
0x0110: 5f4e 414d 453d 6661 6e5f 7465 7374 2928 _NAME=fan_test)(
0x0120: 434f 4e4e 4543 5449 4f4e 5f49 443d 736f CONNECTION_ID=so
0x0130: 7a37 6e6b 414c 5269 7942 5a37 6e6a 3369 z7nkALRiyBZ7nj3i
0x0140: 5944 3741 3d3d 2929 29 YD7A==)))

 00:00:00.000040 IP (tos 0x0, ttl 64, id 43797, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.0.20.ncube-lm > 192.168.0.1.56054: Flags [.], cksum 0x818c (incorrect -> 0x49be), seq 1, ack 278, win 507, options [nop,nop,TS val 1418194532 ecr 3789784206], length 0


 00:00:00.018069 IP (tos 0x0, ttl 64, id 43798, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.0.20.ncube-lm > 192.168.0.1.56054: Flags [P.], cksum 0x8194 (incorrect -> 0x3e94), seq 1:9, ack 278, win 507, options [nop,nop,TS val 1418194550 ecr 3789784206], length 8



Client acknowledges the packet and sends connect string again:

00:00:00.000039 IP (tos 0x0, ttl 64, id 37060, offset 0, flags [DF], proto TCP (6), length 52)
192.168.0.1.56054 > 192.168.0.20.ncube-lm: Flags [.], cksum 0x818c (incorrect -> 0x498d), seq 278, ack 9, win 512, options [nop,nop,TS val 3789784224 ecr 1418194550], length 0

 00:00:00.001076 IP (tos 0x0, ttl 64, id 37061, offset 0, flags [DF], proto TCP (6), length 329)
    192.168.0.1.56054 > 192.168.0.20.ncube-lm: Flags [P.], cksum 0x82a1 (incorrect -> 0x5b14), seq 278:555, ack 9, win 512, options [nop,nop,TS val 3789784225 ecr 1418194550], length 277
        0x0000:  4500 0149 90c5 4000 4006 2784 c0a8 0001  E..I..@.@.'.....
        0x0010:  c0a8 0014 daf6 05f1 42ac 89ff f910 59a3  ........B.....Y.
        0x0020:  8018 0200 82a1 0000 0101 080a e1e3 84a1  ................
        0x0030:  5487 ee76 0115 0000 0100 0000 013e 012c  T..v.........>.,
        0x0040:  0c41 2000 ffff 4f98 0000 0001 00cb 004a  .A....O........J
        0x0050:  0000 0000 8181 0000 0000 0000 0000 0000  ................
        0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0070:  2000 0020 0000 0000 0000 0000 0001 2844  ..............(D
        0x0080:  4553 4352 4950 5449 4f4e 3d28 4144 4452  ESCRIPTION=(ADDR
        0x0090:  4553 533d 2850 524f 544f 434f 4c3d 5443  ESS=(PROTOCOL=TC
        0x00a0:  5029 2848 4f53 543d 3132 372e 302e 302e  P)(HOST=127.0.0.
        0x00b0:  3129 2850 4f52 543d 3135 3231 2929 2843  1)(PORT=1521))(C
        0x00c0:  4f4e 4e45 4354 5f44 4154 413d 2843 4944  ONNECT_DATA=(CID
        0x00d0:  3d28 5052 4f47 5241 4d3d 4a44 4243 2054  =(PROGRAM=JDBC.T
        0x00e0:  6869 6e20 436c 6965 6e74 2928 484f 5354  hin.Client)(HOST
        0x00f0:  3d5f 5f6a 6462 635f 5f29 2855 5345 523d  =__jdbc__)(USER=
        0x0100:  7072 6970 6969 2929 2853 4552 5649 4345  pripii))(SERVICE
        0x0110:  5f4e 414d 453d 6661 6e5f 7465 7374 2928  _NAME=fan_test)(
        0x0120:  434f 4e4e 4543 5449 4f4e 5f49 443d 736f  CONNECTION_ID=so
        0x0130:  7a37 6e6b 414c 5269 7942 5a37 6e6a 3369  z7nkALRiyBZ7nj3i
        0x0140:  5944 3741 3d3d 2929 29                   YD7A==)))

Then follows the back-and-forth part, which I’m going to skip since, without decoding Oracle Net protocol, it’s not that interesting. All in all, this takes 9 roundtrips, the client sends 5 messages, and the server 4. In terms of latency, this part is relatively fast.

Authentication

The interesting thing for me is that client sends password hash, client session parameters and haiku in the same packet. Another thing to note, the server response contains ONS subscription pattern. The server is XE running in docker; there is no Clusterware around. Could it be because of FAN inbound messages?

00:00:00.053595 IP (tos 0x0, ttl 64, id 37072, offset 0, flags [DF], proto TCP (6), length 214)
192.168.0.1.56054 > 192.168.0.20.ncube-lm: Flags [P.], cksum 0x822e (incorrect -> 0xde9f), seq 3490:3652, ack 3126, win 512, option
s [nop,nop,TS val 3789784323 ecr 1418194595], length 162
0x0000: 4500 00d6 90d0 4000 4006 27ec c0a8 0001 E…..@.@.'…..
0x0010: c0a8 0014 daf6 05f1 42ac 968b f910 65d0 ……..B…..e.
0x0020: 8018 0200 822e 0000 0101 080a e1e3 8503 …………….
0x0030: 5487 eea3 0000 00a2 0600 0000 0000 0376 T…………..v
0x0040: 0101 0105 0101 0101 0501 0154 4553 5430 ………..TEST0
0x0050: 010d 0d41 5554 485f 5445 524d 494e 414c …AUTH_TERMINAL
0x0060: 0107 0775 6e6b 6e6f 776e 0001 0f0f 4155 …unknown….AU
0x0070: 5448 5f50 524f 4752 414d 5f4e 4d01 1010 TH_PROGRAM_NM…
0x0080: 4a44 4243 2054 6869 6e20 436c 6965 6e74 JDBC.Thin.Client
0x0090: 0001 0c0c 4155 5448 5f4d 4143 4849 4e45 ….AUTH_MACHINE
0x00a0: 010a 0a6a 6462 6363 6c69 656e 7400 0108 …jdbcclient…
0x00b0: 0841 5554 485f 5049 4401 0404 3132 3334 .AUTH_PID…1234
0x00c0: 0001 0808 4155 5448 5f53 4944 0106 0670 ….AUTH_SID…p
0x00d0: 7269 7069 6900 ripii.

 00:00:00.010897 IP (tos 0x0, ttl 64, id 43809, offset 0, flags [DF], proto TCP (6), length 425)
    192.168.0.20.ncube-lm > 192.168.0.1.56054: Flags [P.], cksum 0x8301 (incorrect -> 0x03e3), seq 3126:3499, ack 3652, win 501, option
s [nop,nop,TS val 1418194660 ecr 3789784323], length 373
        0x0000:  4500 01a9 ab21 4000 4006 0cc8 c0a8 0014  E....!@.@.......
        0x0010:  c0a8 0001 05f1 daf6 f910 65d0 42ac 972d  ..........e.B..-
        0x0020:  8018 01f5 8301 0000 0101 080a 5487 eee4  ............T...
        0x0030:  e1e3 8503 0000 0175 0600 0000 0000 0801  .......u........
        0x0040:  0601 0c0c 4155 5448 5f53 4553 534b 4559  ....AUTH_SESSKEY
        0x0050:  0140 4042 4545 4535 3430 4644 3932 3044  .@@BEEE540FD920D
        0x0060:  4444 4641 3942 4344 3745 4332 3738 4141  DDFA9BCD7EC278AA
        0x0070:  3743 4331 3736 4130 3336 3535 3641 3639  7CC176A036556A69
        0x0080:  4130 4144 3739 4239 3038 3745 3641 3137  A0AD79B9087E6A17
        0x0090:  3639 4300 010d 0d41 5554 485f 5646 525f  69C....AUTH_VFR_
        0x00a0:  4441 5441 0120 2046 3737 4639 4336 4346  DATA...F77F9C6CF
        0x00b0:  3742 4135 3538 3133 3134 3632 4439 3744  7BA558131462D97D
        0x00c0:  3434 4334 3442 3502 4815 0114 1441 5554  44C44B5.H....AUT
        0x00d0:  485f 5042 4b44 4632 5f43 534b 5f53 414c  H_PBKDF2_CSK_SAL
        0x00e0:  5401 2020 3930 3431 4134 4634 4546 4331  T...9041A4F4EFC1
        0x00f0:  4336 4441 3331 4433 3739 3345 4530 3638  C6DA31D3793EE068
        0x0100:  3042 3839 0001 1616 4155 5448 5f50 424b  0B89....AUTH_PBK
        0x0110:  4446 325f 5647 454e 5f43 4f55 4e54 0104  DF2_VGEN_COUNT..
        0x0120:  0434 3039 3600 0116 1641 5554 485f 5042  .4096....AUTH_PB
        0x0130:  4b44 4632 5f53 4445 525f 434f 554e 5401  KDF2_SDER_COUNT.
        0x0140:  0101 3300 011a 1a41 5554 485f 474c 4f42  ..3....AUTH_GLOB
        0x0150:  414c 4c59 5f55 4e49 5155 455f 4442 4944  ALLY_UNIQUE_DBID
        0x0160:  0001 2020 3442 4235 4345 4634 4431 3136  ....4BB5CEF4D116
        0x0170:  3032 3239 3037 3241 3343 4636 3746 4139  0229072A3CF67FA9
        0x0180:  3243 3144 0004 0101 0208 8400 0000 0000  2C1D............
        0x0190:  0000 0000 0000 0000 0000 0000 0000 0100  ................
        0x01a0:  0000 0000 0000 0000 00                   .........

 00:00:00.000041 IP (tos 0x0, ttl 64, id 37073, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.0.1.56054 > 192.168.0.20.ncube-lm: Flags [.], cksum 0x818c (incorrect -> 0x2de3), seq 3652, ack 3499, win 510, options [nop,nop,TS val 3789784334 ecr 1418194660], length 0
        0x0000:  4500 0034 90d1 4000 4006 288d c0a8 0001  E..4..@.@.(.....
        0x0010:  c0a8 0014 daf6 05f1 42ac 972d f910 6745  ........B..-..gE
        0x0020:  8010 01fe 818c 0000 0101 080a e1e3 850e  ................
        0x0030:  5487 eee4                                T...

 00:00:00.113919 IP (tos 0x0, ttl 64, id 37074, offset 0, flags [DF], proto TCP (6), length 1192)
    192.168.0.1.56054 > 192.168.0.20.ncube-lm: Flags [P.], cksum 0x8600 (incorrect -> 0x7672), seq 3652:4792, ack 3499, win 512, options [nop,nop,TS val 3789784447 ecr 1418194660], length 1140
        0x0000:  4500 04a8 90d2 4000 4006 2418 c0a8 0001  E.....@.@.$.....
        0x0010:  c0a8 0014 daf6 05f1 42ac 972d f910 6745  ........B..-..gE
        0x0020:  8018 0200 8600 0000 0101 080a e1e3 857f  ................
        0x0030:  5487 eee4 0000 0474 0600 0000 0000 0373  T......t.......s
        0x0040:  0201 0105 0201 0101 010e 0101 5445 5354  ............TEST
        0x0050:  3001 0d0d 4155 5448 5f50 4153 5357 4f52  0...AUTH_PASSWOR
        0x0060:  4401 4040 4639 3138 3242 3237 3141 3238  D.@@F9182B271A28
        0x0070:  4142 3343 4231 3844 4439 3843 4230 3636  AB3CB18DD98CB066
        0x0080:  4641 4646 3341 4544 3941 3031 3432 4637  FAFF3AED9A0142F7
        0x0090:  3332 4332 3739 3633 3032 3444 3543 3131  32C27963024D5C11
        0x00a0:  3539 4138 0001 1616 4155 5448 5f50 424b  59A8....AUTH_PBK
        0x00b0:  4446 325f 5350 4545 4459 5f4b 4559 01a0  DF2_SPEEDY_KEY..
        0x00c0:  a041 3045 3042 3145 3638 4530 4337 3842  .A0E0B1E68E0C78B
        0x00d0:  3437 4231 4141 4235 3032 4443 4230 3030  47B1AAB502DCB000
        0x00e0:  3846 3342 3344 3642 4141 4335 4639 3435  8F3B3D6BAAC5F945
        0x00f0:  3245 3634 3442 3042 3134 4535 4136 4132  2E644B0B14E5A6A2
        0x0100:  3746 3135 3733 3436 4137 3843 3336 4143  7F157346A78C36AC
        0x0110:  4235 3835 4133 3035 3333 3633 3139 3436  B585A30533631946
        0x0120:  3933 4344 3736 3432 3431 3830 4441 3042  93CD76424180DA0B
        0x0130:  4344 3830 4645 4345 4537 4446 4645 3132  CD80FECEE7DFFE12
        0x0140:  4545 4146 4536 4444 4539 4530 4642 4234  EEAFE6DDE9E0FBB4
        0x0150:  4133 3045 3235 3443 4339 3343 3736 3732  A30E254CC93C7672
        0x0160:  4100 010c 0c41 5554 485f 5345 5353 4b45  A....AUTH_SESSKE
        0x0170:  5901 4040 4236 4330 3037 3537 3335 3043  Y.@@B6C00757350C
        0x0180:  4132 3137 3538 4230 3937 4646 3837 3532  A21758B097FF8752
        0x0190:  4630 3330 3444 3831 3333 4242 4538 4343  F0304D8133BBE8CC
        0x01a0:  3434 3442 4438 3231 3437 4634 3641 3237  444BD82147F46A27
        0x01b0:  3446 3730 0101 010d 0d41 5554 485f 5445  4F70.....AUTH_TE
        0x01c0:  524d 494e 414c 0107 0775 6e6b 6e6f 776e  RMINAL...unknown
        0x01d0:  0001 0f0f 4155 5448 5f50 524f 4752 414d  ....AUTH_PROGRAM
        0x01e0:  5f4e 4d01 1010 4a44 4243 2054 6869 6e20  _NM...JDBC.Thin.
        0x01f0:  436c 6965 6e74 0001 0c0c 4155 5448 5f4d  Client....AUTH_M
        0x0200:  4143 4849 4e45 010a 0a6a 6462 6363 6c69  ACHINE...jdbccli
        0x0210:  656e 7400 0108 0841 5554 485f 5049 4401  ent....AUTH_PID.
        0x0220:  0404 3132 3334 0001 1212 4155 5448 5f41  ..1234....AUTH_A
        0x0230:  4c54 4552 5f53 4553 5349 4f4e 0153 5341  LTER_SESSION.SSA
        0x0240:  4c54 4552 2053 4553 5349 4f4e 2053 4554  LTER.SESSION.SET
        0x0250:  2054 494d 455f 5a4f 4e45 3d27 5554 4327  .TIME_ZONE='UTC'
        0x0260:  204e 4c53 5f4c 414e 4755 4147 453d 2741  .NLS_LANGUAGE='A
        0x0270:  4d45 5249 4341 4e27 2020 4e4c 535f 5445  MERICAN'..NLS_TE

        0x0280:  5252 4954 4f52 593d 2741 4d45 5249 4341  RRITORY='AMERICA
        0x0290:  2700 0101 011a 1a53 4553 5349 4f4e 5f43  '......SESSION_C
        0x02a0:  4c49 454e 545f 4452 4956 4552 5f4e 414d  LIENT_DRIVER_NAM
        0x02b0:  4501 1515 6a64 6263 7468 696e 203a 2032  E...jdbcthin.:.2
        0x02c0:  312e 332e 302e 302e 3000 0116 1653 4553  1.3.0.0.0....SES
        0x02d0:  5349 4f4e 5f43 4c49 454e 545f 5645 5253  SION_CLIENT_VERS
        0x02e0:  494f 4e01 0909 3335 3235 3138 3134 3400  ION...352518144.
        0x02f0:  0116 1653 4553 5349 4f4e 5f43 4c49 454e  ...SESSION_CLIEN
        0x0300:  545f 4c4f 4241 5454 5201 0101 3100 0113  T_LOBATTR...1...
        0x0310:  1341 5554 485f 434f 4e4e 4543 545f 5354  .AUTH_CONNECT_ST
        0x0320:  5249 4e47 0167 6728 4445 5343 5249 5054  RING.gg(DESCRIPT
        0x0330:  494f 4e3d 2841 4444 5245 5353 3d28 5052  ION=(ADDRESS=(PR
        0x0340:  4f54 4f43 4f4c 3d54 4350 2928 484f 5354  OTOCOL=TCP)(HOST
        0x0350:  3d6c 6f63 616c 686f 7374 2928 504f 5254  =localhost)(PORT
        0x0360:  3d31 3532 3129 2928 434f 4e4e 4543 545f  =1521))(CONNECT_
        0x0370:  4441 5441 3d28 5345 5256 4943 455f 4e41  DATA=(SERVICE_NA
        0x0380:  4d45 3d66 616e 5f74 6573 7429 2929 0001  ME=fan_test)))..
        0x0390:  0e0e 4155 5448 5f43 4f50 5952 4947 4854  ..AUTH_COPYRIGHT
        0x03a0:  01f1 f122 4f72 6163 6c65 0a45 7665 7279  ..."Oracle.Every
        0x03b0:  626f 6479 2066 6f6c 6c6f 7773 0a53 7065  body.follows.Spe
        0x03c0:  6564 7920 6269 7473 2065 7863 6861 6e67  edy.bits.exchang
        0x03d0:  650a 5374 6172 7320 6177 6169 7420 746f  e.Stars.await.to
        0x03e0:  2067 6c6f 7722 0a54 6865 2070 7265 6365  .glow".The.prece
        0x03f0:  6469 6e67 206b 6579 2069 7320 636f 7079  ding.key.is.copy
        0x0400:  7269 6768 7465 6420 6279 204f 7261 636c  righted.by.Oracl
        0x0410:  6520 436f 7270 6f72 6174 696f 6e2e 0a44  e.Corporation..D
        0x0420:  7570 6c69 6361 7469 6f6e 206f 6620 7468  uplication.of.th
        0x0430:  6973 206b 6579 2069 7320 6e6f 7420 616c  is.key.is.not.al
        0x0440:  6c6f 7765 6420 7769 7468 6f75 7420 7065  lowed.without.pe
        0x0450:  726d 6973 7369 6f6e 0a66 726f 6d20 4f72  rmission.from.Or
        0x0460:  6163 6c65 2043 6f72 706f 7261 7469 6f6e  acle.Corporation
        0x0470:  2e20 436f 7079 7269 6768 7420 3230 3033  ..Copyright.2003
        0x0480:  204f 7261 636c 6520 436f 7270 6f72 6174  .Oracle.Corporat
        0x0490:  696f 6e2e 0001 0808 4155 5448 5f41 434c  ion.....AUTH_ACL
        0x04a0:  0104 0434 3430 3000                      ...4400.


 00:00:00.008834 IP (tos 0x0, ttl 64, id 43810, offset 0, flags [DF], proto TCP (6), length 2259)
    192.168.0.20.ncube-lm > 192.168.0.1.56054: Flags [P.], cksum 0x8a2b (incorrect -> 0xaefb), seq 3499:5706, ack 4792, win 501, option
s [nop,nop,TS val 1418194782 ecr 3789784447], length 2207
        0x0000:  4500 08d3 ab22 4000 4006 059d c0a8 0014  E...."@.@.......
        0x0010:  c0a8 0001 05f1 daf6 f910 6745 42ac 9ba1  ..........gEB...
        0x0020:  8018 01f5 8a2b 0000 0101 080a 5487 ef5e  .....+......T..^
        0x0030:  e1e3 857f 0000 089f 0600 0000 0000 0801  ................
        0x0040:  3301 1313 4155 5448 5f56 4552 5349 4f4e  3...AUTH_VERSION
        0x0050:  5f53 5452 494e 4701 0c0c 2d20 5072 6f64  _STRING...-.Prod
        0x0060:  7563 7469 6f6e 0001 1010 4155 5448 5f56  uction....AUTH_V
        0x0070:  4552 5349 4f4e 5f53 514c 0102 0232 3500  ERSION_SQL...25.
        0x0080:  0113 1341 5554 485f 5841 4354 494f 4e5f  ...AUTH_XACTION_
        0x0090:  5452 4149 5453 0101 0133 0001 0f0f 4155  TRAITS...3....AU
        0x00a0:  5448 5f56 4552 5349 4f4e 5f4e 4f01 0909  TH_VERSION_NO...
        0x00b0:  3335 3235 3138 3134 3400 0113 1341 5554  352518144....AUT
        0x00c0:  485f 5645 5253 494f 4e5f 5354 4154 5553  H_VERSION_STATUS
        0x00d0:  0101 0130 0001 1515 4155 5448 5f43 4150  ...0....AUTH_CAP
        0x00e0:  4142 494c 4954 595f 5441 424c 4500 0001  ABILITY_TABLE...
        0x00f0:  0f0f 4155 5448 5f4c 4153 545f 4c4f 4749  ..AUTH_LAST_LOGI
        0x0100:  4e01 1a1a 3738 3741 3033 3132 3046 3044  N...787A03120F0D
        0x0110:  3044 3030 3030 3030 3030 3030 3030 0001  0D000000000000..
        0x0120:  0b0b 4155 5448 5f44 424e 414d 4501 0606  ..AUTH_DBNAME...
        0x0130:  5845 5044 4231 0001 1111 4155 5448 5f44  XEPDB1....AUTH_D
        0x0140:  425f 4d4f 554e 545f 4944 0001 0a0a 3239  B_MOUNT_ID....29
        0x0150:  3738 3431 3738 3831 0001 0b0b 4155 5448  78417881....AUTH
        0x0160:  5f44 425f 4944 0001 0a0a 3130 3833 3537  _DB_ID....108357
        0x0170:  3634 3633 0001 0c0c 4155 5448 5f55 5345  6463....AUTH_USE
        0x0180:  525f 4944 0103 0331 3131 0001 0f0f 4155  R_ID...111....AU
        0x0190:  5448 5f53 4553 5349 4f4e 5f49 4401 0303  TH_SESSION_ID...
        0x01a0:  3337 3400 010f 0f41 5554 485f 5345 5249  374....AUTH_SERI
        0x01b0:  414c 5f4e 554d 0105 0536 3136 3236 0001  AL_NUM...61626..
        0x01c0:  1010 4155 5448 5f49 4e53 5441 4e43 455f  ..AUTH_INSTANCE_
        0x01d0:  4e4f 0101 0131 0001 1010 4155 5448 5f46  NO...1....AUTH_F
        0x01e0:  4149 4c4f 5645 525f 4944 0101 0131 0001  AILOVER_ID...1..
        0x01f0:  0f0f 4155 5448 5f53 4552 5645 525f 5049  ..AUTH_SERVER_PI
        0x0200:  4401 0505 3239 3635 3100 0113 1341 5554  D...29651....AUT
        0x0210:  485f 5343 5f53 4552 5645 525f 484f 5354  H_SC_SERVER_HOST
        0x0220:  010c 0c32 3138 6233 3563 3766 3164 3400  ...218b35c7f1d4.
        0x0230:  0115 1541 5554 485f 5343 5f44 4255 4e49  ...AUTH_SC_DBUNI
        0x0240:  5155 455f 4e41 4d45 0102 0258 4500 0115  QUE_NAME...XE...
        0x0250:  1541 5554 485f 5343 5f49 4e53 5441 4e43  .AUTH_SC_INSTANC
        0x0260:  455f 4e41 4d45 0102 0258 4500 0113 1341  E_NAME...XE....A
        0x0270:  5554 485f 5343 5f49 4e53 5441 4e43 455f  UTH_SC_INSTANCE_
        0x0280:  4944 0101 0131 0001 1b1b 4155 5448 5f53  ID...1....AUTH_S
        0x0290:  435f 494e 5354 414e 4345 5f53 5441 5254  C_INSTANCE_START
        0x02a0:  5f54 494d 4501 2424 3230 3232 2d30 322d  _TIME.$$2022-02-
        0x02b0:  3232 2030 393a 3537 3a33 362e 3030 3030  22.09:57:36.0000
        0x02c0:  3030 3030 3020 2b30 303a 3030 0001 1111  00000.+00:00....
        0x02d0:  4155 5448 5f53 435f 4442 5f44 4f4d 4149  AUTH_SC_DB_DOMAI
        0x02e0:  4e00 0001 1414 4155 5448 5f53 435f 5345  N.....AUTH_SC_SE
        0x02f0:  5256 4943 455f 4e41 4d45 0108 0866 616e  RVICE_NAME...fan
        0x0300:  5f74 6573 7400 011b 1b41 5554 485f 4f4e  _test....AUTH_ON
        0x0310:  535f 524c 425f 5355 4253 4352 5f50 4154  S_RLB_SUBSCR_PAT
        0x0320:  5445 524e 0134 3425 2265 7665 6e74 5479  TERN.44%"eventTy
        0x0330:  7065 3d64 6174 6162 6173 652f 6576 656e  pe=database/even
        0x0340:  742f 7365 7276 6963 656d 6574 7269 6373  t/servicemetrics
        0x0350:  2f66 616e 5f74 6573 7422 0000 011a 1a41  /fan_test".....A
        0x0360:  5554 485f 4f4e 535f 4841 5f53 5542 5343  UTH_ONS_HA_SUBSC
        0x0370:  525f 5041 5454 4552 4e01 4949 2822 6576  R_PATTERN.II("ev
        0x0380:  656e 7454 7970 653d 6461 7461 6261 7365  entType=database
        0x0390:  2f65 7665 6e74 2f73 6572 7669 6365 2229  /event/service")
        0x03a0:  207c 2028 2265 7665 6e74 5479 7065 3d64  .|.("eventType=d
        0x03b0:  6174 6162 6173 652f 6576 656e 742f 686f  atabase/event/ho
        0x03c0:  7374 2229 0000 011a 1a41 5554 485f 5343  st").....AUTH_SC
        0x03d0:  5f52 4541 4c5f 4442 554e 4951 5545 5f4e  _REAL_DBUNIQUE_N
        0x03e0:  414d 4501 0202 5845 0001 1111 4155 5448  AME...XE....AUTH
        0x03f0:  5f53 435f 5356 435f 464c 4147 5301 0101  _SC_SVC_FLAGS...
        0x0400:  3800 0115 1541 5554 485f 4641 494c 4f56  8....AUTH_FAILOV
        0x0410:  4552 5f52 4553 544f 5245 0101 0130 0001  ER_RESTORE...0..
        0x0420:  1111 4155 5448 5f49 4e53 5441 4e43 454e  ..AUTH_INSTANCEN
        0x0430:  414d 4501 0202 5845 0001 0f0f 4155 5448  AME...XE....AUTH
        0x0440:  5f4e 4c53 5f4c 584c 414e 0001 0808 414d  _NLS_LXLAN....AM
        0x0450:  4552 4943 414e 0001 1616 4155 5448 5f4e  ERICAN....AUTH_N
        0x0460:  4c53 5f4c 5843 5445 5252 4954 4f52 5900  LS_LXCTERRITORY.
        0x0470:  0107 0741 4d45 5249 4341 0001 1515 4155  ...AMERICA....AU
        0x0480:  5448 5f4e 4c53 5f4c 5843 4355 5252 454e  TH_NLS_LXCCURREN
        0x0490:  4359 0001 0101 2400 0114 1441 5554 485f  CY....$....AUTH_
        0x04a0:  4e4c 535f 4c58 4349 534f 4355 5252 0001  NLS_LXCISOCURR..
        0x04b0:  0707 414d 4552 4943 4100 0115 1541 5554  ..AMERICA....AUT
        0x04c0:  485f 4e4c 535f 4c58 434e 554d 4552 4943  H_NLS_LXCNUMERIC
        0x04d0:  5300 0102 022e 2c00 0113 1341 5554 485f  S.....,....AUTH_
        0x04e0:  4e4c 535f 4c58 4344 4154 4546 4d00 0109  NLS_LXCDATEFM...
        0x04f0:  0944 442d 4d4f 4e2d 5252 0001 1515 4155  .DD-MON-RR....AU
        0x0500:  5448 5f4e 4c53 5f4c 5843 4441 5445 4c41  TH_NLS_LXCDATELA
        0x0510:  4e47 0001 0808 414d 4552 4943 414e 0001  NG....AMERICAN..
        0x0520:  1111 4155 5448 5f4e 4c53 5f4c 5843 534f  ..AUTH_NLS_LXCSO
        0x0530:  5254 0001 0606 4249 4e41 5259 0001 1515  RT....BINARY....
        0x0540:  4155 5448 5f4e 4c53 5f4c 5843 4341 4c45  AUTH_NLS_LXCCALE
        0x0550:  4e44 4152 0001 0909 4752 4547 4f52 4941  NDAR....GREGORIA
        0x0560:  4e00 0115 1541 5554 485f 4e4c 535f 4c58  N....AUTH_NLS_LX
        0x0570:  4355 4e49 4f4e 4355 5200 0101 0124 0001  CUNIONCUR....$..
        0x0580:  1313 4155 5448 5f4e 4c53 5f4c 5843 5449  ..AUTH_NLS_LXCTI
        0x0590:  4d45 464d 0001 0e0e 4848 2e4d 492e 5353  MEFM....HH.MI.SS
        0x05a0:  5846 4620 414d 0001 1313 4155 5448 5f4e  XFF.AM....AUTH_N
        0x05b0:  4c53 5f4c 5843 5354 4d50 464d 0001 1818  LS_LXCSTMPFM....
        0x05c0:  4444 2d4d 4f4e 2d52 5220 4848 2e4d 492e  DD-MON-RR.HH.MI.
        0x05d0:  5353 5846 4620 414d 0001 1313 4155 5448  SSXFF.AM....AUTH
        0x05e0:  5f4e 4c53 5f4c 5843 5454 5a4e 464d 0001  _NLS_LXCTTZNFM..
        0x05f0:  1212 4848 2e4d 492e 5353 5846 4620 414d  ..HH.MI.SSXFF.AM
        0x0600:  2054 5a52 0001 1313 4155 5448 5f4e 4c53  .TZR....AUTH_NLS
        0x0610:  5f4c 5843 5354 5a4e 464d 0001 1c1c 4444  _LXCSTZNFM....DD
        0x0620:  2d4d 4f4e 2d52 5220 4848 2e4d 492e 5353  -MON-RR.HH.MI.SS
        0x0630:  5846 4620 414d 2054 5a52 0001 1818 4155  XFF.AM.TZR....AU
        0x0640:  5448 5f4e 4c53 5f4c 584c 454e 5345 4d41  TH_NLS_LXLENSEMA
        0x0650:  4e54 4943 5300 0104 0442 5954 4500 0119  NTICS....BYTE...
        0x0660:  1941 5554 485f 4e4c 535f 4c58 4e43 4841  .AUTH_NLS_LXNCHA
        0x0670:  5243 4f4e 5645 5843 5000 0105 0546 414c  RCONVEXCP....FAL
        0x0680:  5345 0001 1010 4155 5448 5f4e 4c53 5f4c  SE....AUTH_NLS_L
        0x0690:  5843 4f4d 5000 0106 0642 494e 4152 5900  XCOMP....BINARY.
        0x06a0:  0111 1141 5554 485f 5356 525f 5245 5350  ...AUTH_SVR_RESP
        0x06b0:  4f4e 5345 0160 6030 4643 4539 3336 3642  ONSE.``0FCE9366B
        0x06c0:  4544 4639 4434 4432 3341 4431 3041 3630  EDF9D4D23AD10A60
        0x06d0:  3246 3631 3242 3934 3944 3442 3434 4130  2F612B949D4B44A0
        0x06e0:  3044 4231 3945 4635 4433 3031 3946 4346  0DB19EF5D3019FCF
        0x06f0:  4130 4242 4530 3146 3343 3834 4142 4235  A0BBE01F3C84ABB5
        0x0700:  3430 3232 4131 3839 4232 3732 3545 3032  4022A189B2725E02
        0x0710:  3230 4235 3146 3400 0115 1541 5554 485f  20B51F4....AUTH_
        0x0720:  4d41 585f 4f50 454e 5f43 5552 534f 5253  MAX_OPEN_CURSORS
        0x0730:  0103 0333 3030 0001 0d0d 4155 5448 5f50  ...300....AUTH_P
        0x0740:  4442 5f55 4944 0001 0a0a 3130 3833 3537  DB_UID....108357
        0x0750:  3634 3633 0001 1414 4155 5448 5f4d 4158  6463....AUTH_MAX
        0x0760:  5f49 4445 4e5f 4c45 4e47 5448 0103 0331  _IDEN_LENGTH...1
        0x0770:  3238 0001 0a0a 4155 5448 5f46 4c41 4753  28....AUTH_FLAGS
        0x0780:  0101 0131 0017 0501 0110 0113 1600 0108  ...1............
        0x0790:  0841 4d45 5249 4341 4e01 1000 0107 0741  .AMERICAN......A
        0x07a0:  4d45 5249 4341 0109 0001 0101 2400 0001  MERICA......$...
        0x07b0:  0707 414d 4552 4943 4101 0100 0102 022e  ..AMERICA.......
        0x07c0:  2c01 0200 0108 0841 4c33 3255 5446 3801  ,......AL32UTF8.
        0x07d0:  0a00 0109 0947 5245 474f 5249 414e 010c  .....GREGORIAN..
        0x07e0:  0001 0909 4444 2d4d 4f4e 2d52 5201 0700  ....DD-MON-RR...
        0x07f0:  0108 0841 4d45 5249 4341 4e01 0800 0106  ...AMERICAN.....
        0x0800:  0642 494e 4152 5901 0b00 010e 0e48 482e  .BINARY......HH.
        0x0810:  4d49 2e53 5358 4646 2041 4d01 3900 0118  MI.SSXFF.AM.9...
        0x0820:  1844 442d 4d4f 4e2d 5252 2048 482e 4d49  .DD-MON-RR.HH.MI
        0x0830:  2e53 5358 4646 2041 4d01 3a00 0112 1248  .SSXFF.AM.:....H
        0x0840:  482e 4d49 2e53 5358 4646 2041 4d20 545a  H.MI.SSXFF.AM.TZ
        0x0850:  5201 3b00 011c 1c44 442d 4d4f 4e2d 5252  R.;....DD-MON-RR
        0x0860:  2048 482e 4d49 2e53 5358 4646 2041 4d20  .HH.MI.SSXFF.AM.
        0x0870:  545a 5201 3c00 0101 0124 0134 0001 0606  TZR.<....$.4....
        0x0880:  4249 4e41 5259 0132 0001 0404 4259 5445  BINARY.2....BYTE
        0x0890:  013d 0001 0505 4641 4c53 4501 3e00 010b  .=....FALSE.>...
        0x08a0:  0b80 00d0 04b5 3c3c 8000 0000 01a3 0004  ......<<........
        0x08b0:  0101 0208 8500 0000 0000 0000 0000 0000  ................
        0x08c0:  0000 0000 0000 0000 0200 0000 0000 0000  ................
        0x08d0:  0000 00                                  ...

After receiving that yuuge packet from the server, client asks for database version, sharding mode, container name. From the rest of the dump it seems database version and sharding mode is asked from the first connection only. Possible explanation could be my test code, when initializing the connection pool it checks driver metadata and logs more interesting bits.

Birth of the Oracle session: the network view

Why Oracle does not use my index, time zones edition

Handling time zones in the Oracle database is like a minefield. One interesting tidbit is that Oracle internally uses UTC for timestamp comparisons. This has some weird side effects, especially when it comes to indexes.

First, let’s have some test data. In table TRANSACTIONS created_utc is, as it’s name hints, in UTC time zone.

CREATE TABLE transactions (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL,
    jurisdiction VARCHAR2(5) NOT NULL,
    amount       NUMBER(8, 2) NOT NULL,
    created_utc  TIMESTAMP NOT NULL
);

CREATE INDEX ix_transactions_j_created ON
    transactions (
        jurisdiction,
        created_utc
    );
INSERT INTO transactions
    with gen as (select level as id from dual connect by level < 3500)
    SELECT
        rownum,
        mod(
            rownum, 2000
        ),
        CASE mod(
            rownum, 6
        )
            WHEN 0 THEN 'UK'
            WHEN 1 THEN 'SE'
            WHEN 2 THEN 'US_NJ'
            WHEN 3 THEN 'NL'
            ELSE 'COM'
        END,
        dbms_random.value(
            0, 1000
        ),
        systimestamp - dbms_random.value(
            0, 730
        )
    FROM
        gen gen1 cross join gen gen2
    ;
exec dbms_stats.gather_table_stats(USER, 'TRANSACTIONS', cascade => true);

I want to see how much money came in during the last day. This is a somewhat simplified example, but for our test case, it’s just fine.

SELECT
    SUM(amount)
FROM
    transactions
WHERE
    jurisdiction = 'SE'
    AND created_utc > systimestamp - 1;

And execution plan is quite reasonable

-------------------------------------------------------------------------------------------------------------------                    
| Id  | Operation                            | Name                      | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |                    
-------------------------------------------------------------------------------------------------------------------                    
|   0 | SELECT STATEMENT                     |                           |      1 |    20 |  2487   (0)| 00:00:01 |                    
|   1 |  SORT AGGREGATE                      |                           |      1 |    20 |            |          |                    
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| TRANSACTIONS              |   2474 | 49480 |  2487   (0)| 00:00:01 |                    
|*  3 |    INDEX RANGE SCAN                  | IX_TRANSACTIONS_J_CREATED |   2474 |       |    13   (0)| 00:00:01 |                    
------------------------------------------------------------------------------------------------------------------- 

Some of the jurisdictions do not use UTC as a time zone, for example, if I want to produce a report for the Swedish authorities I would need to use CET. No problem, it’s just a matter of typing. I’m going to use current_timestamp and set the session time zone to ‘+02:00’, but anything that produces TIMESTAMP WITH TIME ZONE is good here.

ALTER SESSION SET time_zone = 'Europe/Stockholm';

SELECT
    SUM(amount)
FROM
    transactions
WHERE
    jurisdiction = 'SE'
    AND from_tz(
        created_utc, 'UTC'
    ) > current_timestamp - INTERVAL '1' DAY;

But it feels slow, so let’s check the execution plan. Oops:

------------------------------------------------------------------------------------                                                   
| Id  | Operation          | Name         | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |                                                   
------------------------------------------------------------------------------------                                                   
|   0 | SELECT STATEMENT   |              |      1 |    20 | 14756   (2)| 00:00:01 |                                                   
|   1 |  SORT AGGREGATE    |              |      1 |    20 |            |          |                                                   
|*  2 |   TABLE ACCESS FULL| TRANSACTIONS |    102K|  1992K| 14756   (2)| 00:00:01 |                                                   
------------------------------------------------------------------------------------     

Discussion

Answer to our mystery lies in the Predicate Information section:

2 - filter("JURISDICTION"='SE' AND
SYS_EXTRACT_UTC(FROM_TZ(INTERNAL_FUNCTION("CREATED_UTC"),'UTC'))>SYS_EXTR
ACT_UTC(CURRENT_TIMESTAMP(6)-INTERVAL'+01 00:00:00' DAY(2) TO SECOND(0)))

We could create an index on the expression SYS_EXTRACT_UTC(FROM_TZ(CREATED_UTC,'+02:00')), or even create a virtual column and index that one. Fortunately, there’s a better solution available. We’re fine as long as the right-hand side produces TIMESTAMP in UTC. (Alternatively, since we have jurisdiction, another TIMESTAMP column containing timestamp from the end user, in the end user time zone, would also be ok.)

Let’s test our theory

SELECT
    SUM(amount)
FROM
    transactions
WHERE
    jurisdiction = 'SE'
    AND created_utc > sys_extract_utc(current_timestamp - INTERVAL '1' DAY);


And query plan is just fine

-------------------------------------------------------------------------------------------------------------------

| Id  | Operation                            | Name                      | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |

-------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                     |                           |      1 |    20 |  2388   (0)| 00:00:01 |

|   1 |  SORT AGGREGATE                      |                           |      1 |    20 |            |          |

|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| TRANSACTIONS              |   2376 | 47520 |  2388   (0)| 00:00:01 |

|*  3 |    INDEX RANGE SCAN                  | IX_TRANSACTIONS_J_CREATED |   2376 |       |    12   (0)| 00:00:01 |

-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):                                                                                    
---------------------------------------------------                                                                                    
                                                                                                                                       
   3 - access("JURISDICTION"='SE' AND "CREATED_UTC">SYS_EXTRACT_UTC(CURRENT_TIMESTAMP(6)-INTERVAL'+01                                  
              00:00:00' DAY(2) TO SECOND(0)))

EDIT (6/6/2022 12:30 CEST): @ilmarkerm noticed a couple of typos in time zone names, and a misleading sentence. Which I all dutifully fixed.

But as a result following predicate appeared in the query plan:

SYS_EXTRACT_UTC(FROM_TZ(INTERNAL_FUNCTION("CREATED_UTC"),'UTC'))

There is no escape from SYS_EXTRACT_UTC, even if your TIMESTAMP WITH TIME ZONE is in UTC.

Why Oracle does not use my index, time zones edition

How to add application info in the Java app without DBMS_APPLICATION_INFO

DBMS_APPLICATION_INFO is cool, but calling it from the client means extra roundtrip to the server. Fortunately, starting with JDBC 4.1 and Oracle JDBC driver version 12.1 it is possible to use Connection.setConnectionInfo procedure to set some of the parameters. These parameters are sent to the server in the same network packet with the query. So no extra roundtrips. Small example with a little extra Spring magic:

@Autowired
DataSource ds;

Connection conn = ds.getConnection();

// GV$SESSION.MODULE
conn.setClientInfo("OCSID.MODULE", "Your service");

// GV$SESSION.ACTION
conn.setClientInfo("OCSID.ACTION", "What it is doing");

// GV$SESSION.CLIENT_IDENTIFIER
conn.setClientInfo("OCSID.CLIENTID", "Additional id");

// GV$SESSION.ECID
conn.setClientInfo("OCSID.ECID", "DMS id");

conn.setClientInfo("OCSID.SEQUENCE_NUMBER", "1");
conn.setClientInfo("OCSID.DBOP", "WUT?");

OCSID.MODULE, OCSID.ACTION and OCSID.CLIENTID correspond to stuff set by DBMS_APPLICATION_INFO procedures SET_MODULE, SET_ACTION and SET_CLIENT_INFO. Key OCSID.ECID sets DMS context id.

Keys OCSID.SEQUENCE_NUMBER and OCSID.DBOP seem to be specific to DMS as well, but these do not seem to be exposed anywhere, not even in E2E_CONTEXT.

setClientInfo and namespaces

setClientInfo accepts keys in the form of <namespace>.<keyname>. According to the documentation, JDBC driver supports any <namespace>.<keyname> pairs.

There are two namespaces available in the database side where setClientInfo can set keys, already mentioned E2E_CONTEXT, and CLIENTCONTEXT. As of Oracle 21.3, attempts to set keys in any other externally initialized contexts will result ORA-28267: Invalid NameSpace Value. Namespace OCSID does not seem to be available through SYS_CONTEXT function.

And a little test

Let’s check with tcpdump if parameters really travel to to the database in the same package with the query. First, our little code snippet:

1647164216.240543 IP (tos 0x0, ttl 64, id 31708, offset 0, flags [DF], proto TCP (6), length 247)
    192.168.0.1.44286 > 192.168.0.20.ncube-lm: Flags [P.], cksum 0x824f (incorrect -> 0xcb3a), seq 5077:5272, ack 6178, win 512, option
s [nop,nop,TS val 3338843715 ecr 967253760], length 195
        0x0000:  4500 00f7 7bdc 4000 4006 3cbf c0a8 0001  E...{.@.@.<.....
        0x0010:  c0a8 0014 acfe 05f1 f1d2 db03 cdb4 3a87  ..............:.
        0x0020:  8018 0200 824f 0000 0101 080a c702 b643  .....O.........C
        0x0030:  39a7 1f00 0000 00c3 0600 0000 0000 1169  9..............i
        0x0040:  0701 0101 0101 1187 0800 0002 0279 0101  .............y..
        0x0050:  0901 010c 0101 1001 0106 0001 0100 0000  ................
        0x0060:  0000 0001 0104 436c 6965 6e74 2049 4459  ......Client.IDY
        0x0070:  6f75 7220 7365 7276 6963 6557 6861 7420  our.serviceWhat.
        0x0080:  6974 2069 7320 646f 696e 6744 4d53 2069  it.is.doingDMS.i
        0x0090:  6457 5554 3f11 b009 014c 035e 0a02 8021  dWUT?....L.^...!
        0x00a0:  0001 0116 0101 0d00 0004 ffff ffff 010a  ................
        0x00b0:  047f ffff ff00 0000 0000 0000 0000 0000  ................
        0x00c0:  0100 0000 0000 0000 0000 0000 0000 0000  ................
        0x00d0:  7365 6c65 6374 2064 756d 6d79 2066 726f  select.dummy.fro
        0x00e0:  6d20 6475 616c 0101 0000 0000 0000 0101  m.dual..........
        0x00f0:  0002 8000 0000 00                        .......

Same with CLIENTCONTEXT, only this time namespace is actually mentioned in the packet:

1647164216.274874 IP (tos 0x0, ttl 64, id 31710, offset 0, flags [DF], proto TCP (6), length 235)
    192.168.0.1.44286 > 192.168.0.20.ncube-lm: Flags [P.], cksum 0x8243 (incorrect -> 0x219c), seq 5272:5455, ack 6435, win 512, options [nop,nop,TS val 3338843750 ecr 967254045], length 183
        0x0000:  4500 00eb 7bde 4000 4006 3cc9 c0a8 0001  E...{.@.@.<.....
        0x0010:  c0a8 0014 acfe 05f1 f1d2 dbc6 cdb4 3b88  ..............;.
        0x0020:  8018 0200 8243 0000 0101 080a c702 b666  .....C.........f
        0x0030:  39a7 201d 0000 00b7 0600 0000 0000 1169  9..............i
        0x0040:  0b01 0101 0101 119a 0c01 010d 0101 0101  ................
        0x0050:  0100 434c 4945 4e54 434f 4e54 4558 5401  ..CLIENTCONTEXT.
        0x0060:  0505 4442 4f50 3201 0606 4576 696c 4f50  ..DBOP2...EvilOP
        0x0070:  0003 5e0d 0280 2100 0101 3301 010d 0000  ..^...!...3.....
        0x0080:  04ff ffff ff01 0a04 7fff ffff 0000 0000  ................
        0x0090:  0000 0000 0000 0001 0000 0000 0000 0000  ................
        0x00a0:  0000 0000 0000 0073 656c 6563 7420 5359  .......select.SY
        0x00b0:  535f 434f 4e54 4558 5428 2745 3245 5f43  S_CONTEXT('E2E_C
        0x00c0:  4f4e 5445 5854 272c 2027 4442 4f50 2729  ONTEXT',.'DBOP')
        0x00d0:  2066 726f 6d20 6475 616c 0101 0000 0000  .from.dual......
        0x00e0:  0000 0101 0002 8000 0000 00              ...........
How to add application info in the Java app without DBMS_APPLICATION_INFO

Reading Oracle hanganalyze dumps for fun and profit

On Monday afternoon out of the sudden people started complaining about database being slow and Important Things being delayed. Database in question is fairly busy RAC cluster with fair amount of data and instances for processing. Indeed, on checking there were massive waits on SS enqueue. While queries involved were either big on sorting, or abusing cursor duration temporary tables, there wasn’t obvious reason why those queries would wait on sort segment enqueue.

So let’s try ASH wait chains. From the output some of the chains stuck out. (Following output is generated later from DBA_HIST_ACTIVE_SESS_HISTORY, didn’t think at the time whole incident merits a blog post.)

 -> DW_USER_REPORT:(oracle) enq: SS - contention [mode=6]  -> [idle blocker 3,295,65220]
 -> DW_USER_REPORT:(Jnnn) enq: SS - contention [mode=6]  -> DW_USER_REPORT:(oracle) DFS lock handle
 -> DW_USER_REPORT:(oracle) enq: SS - contention [mode=6]  -> [idle blocker 10,486,63050] 
 -> DW_USER_REPORT:(oracle) enq: SS - contention [mode=6]  -> DW_USER_REPORT:(Pnnn) DFS lock handle

First, there are sessions blocked by idle session. And second weird thing, there are cross-instance calls (DFS lock handle) going nowhere: there should be background process handling the call listed. So immediate suspicion is final blocker being idle and not present in the ASH snapshot(s).

On way to check idle blockers is GV$SESSION, but fortunately there’s more comprehensive way with less typing involved: hanganalyze dump and it’s little cousin V$WAIT_CHAINS.

V$WAIT_CHAINS

It is populated by DIA0, on every 3 seconds for local chains, and every 10 seconds for global chains (check MOS document 1428210.1 for details). As it is local view, it works even if global views are not responsive enough. In our case it yielded slightly longer chain:

'rdbms ipc message'<='DFS lock handle'<='enq: SS - contention'<='PX Deq Credit: send blkd'

Here we have final blocker, which happens to be idle. Unfortunately, V$WAIT_CHAINS contains only small subset of stuff that’s present in hanganalyze dump, so let’s do some dumping.

Hanganalyze dumps

By default, hanganalyze seems to include only local instance, which isn’t very useful in RAC. So trick is to get the chains from all instances:

SYS @ dwh21:>oradebug setmypid
Statement processed.
SYS @ dwh21:>oradebug setinst all
Statement processed.
SYS @ dwh21:>oradebug -g all hanganalyze 3
Hang Analysis in /u01/app/oracle/diag/rdbms/dwh2/dwh21/trace/dwh21_diag_1877.trc

Heavily edited snippet from the hang analysis dump is in the end of this post.

While Oracle thinks naughty chains could be 1 .. 3, interesting chain happens to be 5th. Reading from right to left

  • First and second session are from instance #1, executing the same parallel query. First is the consumer waiting for the rows from the second.
  • Second session is waiting on SS enqueue (Sort Segment), which is held by the session on instance #3. That session is executing INSERT … SELECT statement, but somehow ended up doing cross instance call (DFS lock handle). According to dump, third session is blocking 37 other sessions, I would expect there will be intersecting wait chains with sessions waiting in CI enqueue, but this is not the case.

    Another fun fact, stack dump of the second session seems to point to the hash join trying to spill something to the disk (ktstgrowseg()+4094 <-ktstadr_addextent()+853 <-ktrsexec()+454 <-ktataddextent()+124 <-kxhfmGetExtent()+87 <-kxhfWrite()+755 <-kxhfNewBuffer()+756 <-qerhnGetNewBuffer()+70<-ksxb1bqb()+170)
  • Fourth and final session is DBW0 in instance #2. This is idle session waiting for the work (rdbms ipc message).

Why would idle database writer block something, is a good question indeed. In our case blocker moved around between the instances, so it wasn’t one wedged instance screwing up the whole cluster. There is similar bug for LGW0 (2699312.1), but nothing about DBW0. Another mystery, why is database writer involved in sort segment management in the first place.

Back to our original problem

Sort segments are maintained per instance basis in RAC, and space allocated to one instance isn’t visible to the others. This means ORA-01652 errors are handled differently in RAC: instead of returning error to the caller cross instance call is initiated first, to see if any other instance can free allocated but not used temp space. (GV$TEMP_EXTENT_POOL gives per instance overview on what is allocated and what is used.)

In our case, subset of the instances exhausted available temp space and somehow cross instance calls got stuck. And as a result, queries using temp segments became really slow.

Further reading

Snippet from hanganalyze dump

*** 2020-11-02T21:05:01.992759+01:00
PORADEBUG BEGIN ORIGINATING INST:7 SERIAL:0 PID:29440
PORADATA COMMAND:hanganalyze 3
2020-11-02 21:05:02.000 :kjzdregsw(): Session slot[0] (kjzddbx) switches to a new action, cbctx:0x7f33e41db160 clnup:0xb02c200 flag:3 sessid:0
PORADATA TRACEFILE /u01/app/oracle/diag/rdbms/<redacted>
PORADEBUG END ORIGINATING INST:7 SERIAL:0 PID:29440
****************************************************************
*** 2020-11-02T21:05:05.768950+01:00
===============================================================================
HANG ANALYSIS:
  instances (db_name.oracle_sid): <ten instances, redacted>
  oradebug_node_dump_level: 3
  analysis initiated by oradebug
  os thread scheduling delay history: (sampling every 1.000000 secs)
    0.000000 secs at [ 21:05:05 ]
      NOTE: scheduling delay has not been sampled for 0.445928 secs
    0.000000 secs from [ 21:05:01 - 21:05:06 ], 5 sec avg
    0.000000 secs from [ 21:04:05 - 21:05:06 ], 1 min avg
    0.000000 secs from [ 21:00:06 - 21:05:06 ], 5 min avg
  vktm time drift history
===============================================================================
 
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'PX Deq: Table Q Normal'<='PX Deq Credit: send blkd'
     Chain 1 Signature Hash: 0x507d111
 [b] Chain 2 Signature: 'gc cr multi block request'<='PX Deq Credit: send blkd'
     Chain 2 Signature Hash: 0xf7d43820
 [c] Chain 3 Signature: 'gc cr multi block request'<='enq: WF - contention'
     Chain 3 Signature Hash: 0x806234ed
 
===============================================================================
Non-intersecting chains:
<snip>
 
-------------------------------------------------------------------------------
Chain 5:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (<redacted>)
                   os id: 23221
              process id: 65, oracle@oc1n11.example.com (P00D)
              session id: 16
        session serial #: 16422
             module name: 0 (DBMS_SCHEDULER)
    }
    is waiting for 'PX Deq Credit: send blkd' with wait info:
    {
                      p1: 'sleeptime/senderid'=0x10010012
                      p2: 'passes'=0x66
                      p3: 'qref'=0x19d2f26a98
            time in wait: 0.924122 sec (last interval)
            time in wait: 51.343573 sec (total)
           timeout after: never
                 wait id: 4014518
                blocking: 0 sessions
          current sql_id: 3430640901
             current sql: With player_info as (
                 <snip>
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-epoll_wait()+19<-ipcor_wseti_wait()+94<-ipclw_wait()+520<-ksxpwait_ipclw()+919<-ksxpwait_int()+155<-ksxpwait()+319<-ksliwat()+2324<-kslwaitctx()+200<-ksxprcv_int()+5361<-ksxprcvimdwctx()+95<-kxfpqidqr()+13080<-kxfprienq()+7059<-kxfpqrenq()+252<-kxfqeqb()+127<-kxfqbroadcastall()+1238<-kxfqenq()+971<-qertqoRop()+197<-qerhnProbeInner()+1143<-kdstf110010100000000km()+910<-kdsttgr()+2208<-qertbFetch()+1089<-qergiFetch()+602<-rwsfcd()+113<-qerhnF
            wait history:
              * time between current wait and wait #1: 0.000000 sec
              1.       event: 'PX Deq: reap credit'
                 time waited: 0.000006 sec
                     wait id: 4014621         
              * time between wait #1 and #2: 0.000000 sec
              2.       event: 'PX Deq Credit: send blkd'
                 time waited: 0.573351 sec (last interval)
                 time waited: 50.419445 sec (total)
                     wait id: 4014518          p1: 'sleeptime/senderid'=0x10010012
                                               p2: 'passes'=0x65
                                               p3: 'qref'=0x19d2f26a98
              * time between wait #2 and #3: 0.000000 sec
              3.       event: 'PX Deq: reap credit'
                 time waited: 0.000009 sec
                     wait id: 4014620         
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (<redacted>)
                   os id: 23282
              process id: 120, oracle@oc1n11.example.com (P00I)
              session id: 293
        session serial #: 31198
             module name: 0 (DBMS_SCHEDULER)
    }
    which is waiting for 'enq: SS - contention' with wait info:
    {
                      p1: 'name|mode'=0x53530006
                      p2: 'tablespace #'=0x5b
                      p3: 'dba'=0x2
            time in wait: 51.222013 sec
           timeout after: never
                 wait id: 864202
                blocking: 5 sessions
          current sql_id: 3430640901
             current sql: With player_info as (
                          <snip>
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-epoll_wait()+19<-ipcor_wseti_wait()+94<-ipclw_wait()+520<-ksxpwait_ipclw()+919<-ksxpwait_int()+155<-ksxpwait()+319<-ksliwat()+2324<-kslwaitctx()+200<-kjuscv_wait()+1650<-kjuscv()+1433<-ksipcon()+561<-ksqcmi()+10488<-ksqcnv()+809<-ktstspacerls()+538<-ktstgrowseg()+4094<-ktstadr_addextent()+853<-ktrsexec()+454<-ktataddextent()+124<-kxhfmGetExtent()+87<-kxhfWrite()+610<-kxhfNewBuffer()+756<-qerhnGetNewBuffer()+70<-ksxb1bqb()+170<-qerhn_kxhrPac
            wait history:
              * time between current wait and wait #1: 0.000039 sec
              1.       event: 'enq: SS - contention'
                 time waited: 0.000574 sec
                     wait id: 864201           p1: 'name|mode'=0x53530001
                                               p2: 'tablespace #'=0x5b
                                               p3: 'dba'=0x2
              * time between wait #1 and #2: 0.000677 sec
              2.       event: 'buffer busy waits'
                 time waited: 0.002828 sec
                     wait id: 864200           p1: 'file#'=0xf7
                                               p2: 'block#'=0x2
                                               p3: 'class#'=0xd
              * time between wait #2 and #3: 0.000159 sec
              3.       event: 'buffer busy waits'
                 time waited: 0.001735 sec
                     wait id: 864199           p1: 'file#'=0xf2
                                               p2: 'block#'=0x2
                                               p3: 'class#'=0xd
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 3 (<redacted>)
                   os id: 22495
              process id: 103, oracle@oc1n13.example.com (P008)
              session id: 1148
        session serial #: 9960
             module name: 0 (DBMS_SCHEDULERc8-447d-99a0-a003dd18ae783))
    }
    which is waiting for 'DFS lock handle' with wait info:
    {
                      p1: 'type|mode'=0x43490005
                      p2: 'id1'=0xe
                      p3: 'id2'=0x2
            time in wait: 53.325184 sec
           timeout after: never
                 wait id: 66290
                blocking: 37 sessions
          current sql_id: 1159177187
             current sql: INSERT 
                   INTO <snip>
                        SELECT ...
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-epoll_wait()+19<-ipcor_wseti_wait()+94<-ipclw_wait()+520<-ksxpwait_ipclw()+919<-ksxpwait_int()+155<-ksxpwait()+319<-ksliwat()+2324<-kslwaitctx()+200<-kjusuc()+4645<-ksigeti()+1478<-ksbcic_int()+4785<-ktstspacerls()+1227<-ktstgrowseg()+4094<-ktstadr_addextent()+853<-ktrsexec()+454<-ktataddextent()+124<-kxhfmGetExtent()+87<-kxhfWrite()+755<-kxhfNewBuffer()+756<-qerhnGetNewBuffer()+70<-ksxb1bqb()+170<-qerhn_kxhrPack()+1585<-qerhnSplitBuild()+3
            wait history:
              * time between current wait and wait #1: 0.000170 sec
              1.       event: 'DFS lock handle'
                 time waited: 0.001276 sec
                     wait id: 66289            p1: 'type|mode'=0x43490005
                                               p2: 'id1'=0xe
                                               p3: 'id2'=0x3
              * time between wait #1 and #2: 0.000032 sec
              2.       event: 'DFS lock handle'
                 time waited: 0.001159 sec
                     wait id: 66288            p1: 'type|mode'=0x43490005
                                               p2: 'id1'=0xe
                                               p3: 'id2'=0x1
              * time between wait #2 and #3: 0.000039 sec
              3.       event: 'DFS lock handle'
                 time waited: 0.000611 sec
                     wait id: 66287            p1: 'type|mode'=0x43490005
                                               p2: 'id1'=0xe
                                               p3: 'id2'=0x5
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 2 (<snip>)
                   os id: 16370
              process id: 32, oracle@oc1n12.example.com (DBW0)
              session id: 568
        session serial #: 8812
    }
    which is waiting for 'rdbms ipc message' with wait info:
    {
                      p1: 'timeout'=0x79
            time in wait: 0.876014 sec
      heur. time in wait: 0.876268 sec
           timeout after: 0.333986 sec
                 wait id: 4994326
                blocking: 38 sessions
          current sql_id: 0
             current sql: <none>
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2224<-kslwaitctx()+200<-ksarcv()+320<-ksbabs()+602<-ksbrdp()+1123<-opirip()+541<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
            wait history:
              * time between current wait and wait #1: 0.000003 sec
              1.       event: 'rdbms ipc message'
                 time waited: 0.000082 sec
                     wait id: 4994325          p1: 'timeout'=0x79
              * time between wait #1 and #2: 0.000004 sec
              2.       event: 'rdbms ipc message'
                 time waited: 0.000165 sec
                     wait id: 4994324          p1: 'timeout'=0x79
              * time between wait #2 and #3: 0.000012 sec
              3.       event: 'rdbms ipc message'
                 time waited: 0.036154 sec
                     wait id: 4994323          p1: 'timeout'=0x7c
    }
 
Chain 5 Signature: 'rdbms ipc message'<='DFS lock handle'<='enq: SS - contention'<='PX Deq Credit: send blkd'
Chain 5 Signature Hash: 0xb4402dff
-------------------------------------------------------------------------------
<snip>
 
===============================================================================
Intersecting chains:
 -------------------------------------------------------------------------------
Chain 13:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (<redacted>)
                   os id: 23258
              process id: 118, oracle@oc1n11.example.com (P00G)
              session id: 107
        session serial #: 17672
             module name: 0 (DBMS_SCHEDULER)
    }
    is waiting for 'PX Deq Credit: send blkd' with wait info:
    {
                      p1: 'sleeptime/senderid'=0x10010012
                      p2: 'passes'=0x66
                      p3: 'qref'=0x19d2f27260
            time in wait: 0.924493 sec (last interval)
            time in wait: 51.344392 sec (total)
           timeout after: never
                 wait id: 3580786
                blocking: 0 sessions
          current sql_id: 3430640901
             current sql: With player_info as (
     select <snip>
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-epoll_wait()+19<-ipcor_wseti_wait()+94<-ipclw_wait()+520<-ksxpwait_ipclw()+919<-ksxpwait_int()+155<-ksxpwait()+319<-ksliwat()+2324<-kslwaitctx()+200<-ksxprcv_int()+5361<-ksxprcvimdwctx()+95<-kxfpqidqr()+13080<-kxfprienq()+7059<-kxfpqrenq()+252<-kxfqeqb()+127<-kxfqbroadcastall()+1238<-kxfqenq()+971<-qertqoRop()+197<-qerhnProbeInner()+1143<-kdstf110010100000000km()+910<-kdsttgr()+2208<-qertbFetch()+1089<-qergiFetch()+602<-rwsfcd()+113<-qerhnF
            wait history:
              * time between current wait and wait #1: 0.000000 sec
              1.       event: 'PX Deq: reap credit'
                 time waited: 0.000007 sec
                     wait id: 3580889         
              * time between wait #1 and #2: 0.000000 sec
              2.       event: 'PX Deq Credit: send blkd'
                 time waited: 0.572950 sec (last interval)
                 time waited: 50.419892 sec (total)
                     wait id: 3580786          p1: 'sleeptime/senderid'=0x10010012
                                               p2: 'passes'=0x65
                                               p3: 'qref'=0x19d2f27260
              * time between wait #2 and #3: 0.000000 sec
              3.       event: 'PX Deq: reap credit'
                 time waited: 0.000098 sec
                     wait id: 3580888         
    }
    and is blocked by 'instance: 1, os id: 23282, session id: 293',
    which is a member of 'Chain 5'.
 
Chain 13 Signature: 'rdbms ipc message'<='DFS lock handle'<='enq: SS - contention'<='PX Deq Credit: send blkd'
Chain 13 Signature Hash: 0xb4402dff
-------------------------------------------------------------------------------

Reading Oracle hanganalyze dumps for fun and profit

Setting network timeouts with Oracle JDBC

With Statement and its subinterfaces it is possible to set query timeouts with setQueryTimeout method. For the queries it is fine, but this does not help with the rest of the database operations, for example with commits and metadata operations. One way to address the issue is to set timeout for the socket read through Connection.setNetworkTimeout method. Documentation talks about rare network conditions, but if you’re blessed with RAC there are additional error scenarios that look like long reads from the socket. Another positive thing about setNetworkTimeout is that it allows millisecond resolution for the timeout, while setQueryTimeout does only seconds.

Let’s have an example:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;


import org.junit.runner.RunWith;
import org.junit.Test;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest
public class JDBCStatement {

        protected final Logger logger = LoggerFactory.getLogger(getClass());

        @Autowired
        DataSource dataSource;

        @Value("${spring.datasource.use_autocommit}")
        private boolean autocommit;

        static final String DEL_STATEMENT = "BEGIN DELETE FROM TX; DELETE FROM T; END;";
        static final String INS_STATEMENT = "INSERT INTO T VALUES(?, ?, ?)";
        static final String CLIENT_INFO = "BEGIN DBMS_SESSION.SET_IDENTIFIER(?); END;";
        static final String CLIENT_NAME = "C_RK";
        static final int BATCH_SIZE = 1000000;

        @Autowired
        ThreadPoolTaskExecutor ex;

        @Test
        public void singleStatement() {
                try (
                        Connection conn = dataSource.getConnection();
                        PreparedStatement ds = conn.prepareStatement(DEL_STATEMENT);
                        PreparedStatement is = conn.prepareStatement(INS_STATEMENT);
                ) {
                        ds.execute();

//                      Just for testing
//                      conn.setNetworkTimeout(Executors.newFixedThreadPool(1), 10000);

//                      Sets 10s timeout
                        conn.setNetworkTimeout(ex.getThreadPoolExecutor(), 10000);
                        for(int i = 0; i <= BATCH_SIZE; i++){
                                is.setInt(1, i);
                                is.setInt(2, 1);
                                is.setString(3, "BOOYA!." + i);
                                is.execute();
                                // Commit in a loop is to help out with testing. Don't try this at home!
                                conn.commit();
                        }
                }
                catch (SQLException ex) {
                        throw new RuntimeException("", ex);
                }

        }
}

Testing

Easiest for me is to stop the Oracle log writer during the test run. This makes commit hang with log file sync wait event and our timeout to fire. So in the database while test case is running

SYS@roadkill 12-03-2021 13:45:23> !ps -u oracle|grep lgwr
30317 ?        00:03:36 ora_lgwr_roadki

SYS@roadkill 12-03-2021 13:45:36> oradebug setospid 30317
Oracle pid: 13, Unix process pid: 30317, image: oracle@roadkill (LGWR)
SYS@roadkill 12-03-2021 13:46:38> oradebug suspend
Statement processed.
SYS@roadkill 12-03-2021 13:46:49> select state, seq#, event from v$session where username ='C_RK';

STATE                     SEQ# EVENT
------------------- ---------- ----------------------------------------------------------------
WAITING                  27879 log file sync
Elapsed: 00:00:00.01
SYS@roadkill 12-03-2021 13:47:03>

(There’s a way to run oradebug from event handler, but I haven’t get that one working ;))

After a few seconds our timeout fired:

java.lang.RuntimeException:
        at <redacted>.JDBCStatement.singleStatement(JDBCStatement.java:77)
Caused by: java.sql.SQLRecoverableException: IO Error: Socket read timed out
        at <redacted>.JDBCStatement.singleStatement(JDBCStatement.java:71)
Caused by: oracle.net.nt.TimeoutInterruptHandler$IOReadTimeoutException: Socket read timed out
        at <redacted>.JDBCStatement.singleStatement(JDBCStatement.java:71)

Setting network timeouts with Oracle JDBC

Reducing chattiness of JDBC applications: batching be good!

When reusing PreparedStatement with the new set of bind variables, SQL statement and first row is sent to the database with first PreparedStatement.execute*() call. This involves one network roundtrip, and each consecutive call triggers another roundtrip. These network roundtrips can add up fast when operating on high latency networks. (Say hi! to the public cloud.) Let’s consider following snippet. (Autocommit is everywhere set to false unless otherwise noted.)

        static final String DEL_STATEMENT = "DELETE FROM T;";
        static final String INS_STATEMENT = "INSERT INTO T VALUES(?, ?)";

        static final int BATCH_SIZE = 1000;
        @Test
        public void rowByRow() {
                try (
                        Connection conn = dataSource.getConnection();
                        PreparedStatement ds = conn.prepareStatement(DEL_STATEMENT);
                        PreparedStatement is = conn.prepareStatement(INS_STATEMENT);
                ) {

                        ds.execute();
                        for(int i = 0; i <= BATCH_SIZE; i++){
                                is.setInt(1, i);
                                is.setString(2, "BOOYA!." + i);
                                is.execute();
                        }
                        conn.commit();
                }
                catch (SQLException ex) {
                        throw new RuntimeException("", ex);
                }

        }

As expected, inserting one row will involve one network roundtrip:

    localhost.44672 > localhost.ncube-lm: Flags [P.], cksum 0xfeb8 (incorrect -> 0xe4c3), seq 5010:5154, ack 5775, win 2478, options [nop,nop,TS val 104535079
3 ecr 1045350771], length 144
E...k.@.@..P...............2...\..      ........
>N..>N.s...........^...)......................................INSERT INTO T VALUES(:1 , :2 ).............................i....... .......i......BOOYA!.0
12:42:35.064825 IP (tos 0x0, ttl 64, id 3758, offset 0, flags [DF], proto TCP (6), length 52)
    localhost.ncube-lm > localhost.44672: Flags [.], cksum 0xfe28 (incorrect -> 0x516e), seq 5775, ack 5154, win 2497, options [nop,nop,TS val 1045350793 ecr 1045350793], length 0
E..4..@.@..................\......      ..(.....
>N..>N..
12:42:35.065721 IP (tos 0x0, ttl 64, id 3759, offset 0, flags [DF], proto TCP (6), length 145)
    localhost.ncube-lm > localhost.44672: Flags [P.], cksum 0xfe85 (incorrect -> 0xe9f0), seq 5775:5868, ack 5154, win 2497, options [nop,nop,TS val 1045350794 ecr 1045350793], length 93
E.....@.@.-................\......      ........
>N..>N.....].............0.........................................%.}..u.................%.}.u......
12:42:35.066227 IP (tos 0x0, ttl 64, id 27618, offset 0, flags [DF], proto TCP (6), length 134)
    localhost.44672 > localhost.ncube-lm: Flags [P.], cksum 0xfe7a (incorrect -> 0x671b), seq 5154:5236, ack 5868, win 2478, options [nop,nop,TS val 1045350794 ecr 1045350794], length 82
E...k.@.@.........................      ..z.....
>N..>N.....R.......^... ..........................................................BOOYA!.1
12:42:35.066666 IP (tos 0x0, ttl 64, id 3760, offset 0, flags [DF], proto TCP (6), length 145)
    localhost.ncube-lm > localhost.44672: Flags [P.], cksum 0xfe85 (incorrect -> 0xe63e), seq 5868:5961, ack 5236, win 2497, options [nop,nop,TS val 1045350795 ecr 1045350794], length 93
E.....@.@.-.......................      ........
>N..>N.....].............0.........................................%.}..v.................%.}.v......
<and so on...>

Great way to reduce the network roundtrips is executing DML in a single batch:

        static final String DEL_STATEMENT = "BEGIN DELETE FROM TX; DELETE FROM T; END;";
        static final String INS_STATEMENT = "INSERT INTO T VALUES(?, ?)";
        static final int BATCH_SIZE = 1000;
        @Test
        public void batchStatement() {
                try (
                        Connection conn = dataSource.getConnection();
                        PreparedStatement ds = conn.prepareStatement(DEL_STATEMENT);
                        PreparedStatement is = conn.prepareStatement(INS_STATEMENT);
                ) {
                        ds.execute();

                        for(int i = 0; i <= BATCH_SIZE; i++){
                                is.clearParameters();
                                is.setInt(1, i);
                                is.setString(2, "BOOYA!." + i);
                                is.addBatch();
                        }
                        int[] r = is.executeBatch();
                        conn.commit();
                }
                catch (SQLException ex) {
                        throw new RuntimeException("", ex);
                }
                                                   

And indeed, all the 1k rows are sent with initial SQL statement.(Due to the packet size limits batch could be split into multiple packets introducing additional network roundtrips. 1k rows here were actually sent in two packets/roundtrips.)

    localhost.45610 > localhost.ncube-lm: Flags [P.], cksum 0x1e29 (incorrect -> 0x2ccc), seq 5010:13202, ack 5774, win 2478, options [nop,nop,TS val 1049319975 ecr 1049319910], length 8192
E. 4.|@.@.8E.........*.. ...q+3...      ..).....
>.Z'>.Y... ........^...).......................................INSERT INTO T VALUES(:1 , :2 )..............................i.......,.......i......BOOYA!.0.....BOOYA!.1.....BOOYA!.2.....BOOYA!.3.....BOOYA!.4.....BOOYA!.5.....BOOYA!.6.....BOOYA!.7...        .BOOYA!.8...
.BOOYA!.9....   BOOYA!.10....   BOOYA!.11....   BOOYA!.12....   BOOYA!.13....   BOOYA!.14....   BOOYA!.15....   BOOYA!.16....   BOOYA!.17....   BOOYA!.18....
   BOOYA!.19....   BOOYA!.20....   BOOYA!.21....   BOOYA!.22....   BOOYA!.23....   BOOYA!.24....   BOOYA!.25....   BOOYA!.26....   BOOYA!.27....   BOOYA!.28....   BOOYA!.29....   BOOYA!.30...    BOOYA!.31...!   BOOYA!.32..."   BOOYA!.33...#   BOOYA!.34...$   BOOYA!.35...%   BOOYA!.36...&   BOOYA!.37...'   BOOYA!.38...(   BOOYA!.39...)   BOOYA!.40...*   BOOYA!.41...+   BOOYA!.42...,   BOOYA!.43...-   BOOYA!.44....   BOOYA!.45.../   BOOYA!.46...0   BOOYA!.47...1   BOOYA!.48...2   BOOYA!.49...3   BOOYA!.50...4   BOOYA!.51...5   BOOYA!.52...6   BOOYA!.53...7   BOOYA!.54...8   BOOYA!.55...9   BOOYA!.56...:   BOOYA!.57...;   BOOYA!.58...<   BOOYA!.59...=   BOOYA!.60...>   BOOYA!.61...?   BOOYA!.62...@   BOOYA!.63...A   BOOYA!.64...B   BOOYA!.65...C   BOOYA!.66...D   BOOYA!.67...E   BOOYA!.68...F   BOOYA!.69...G   BOOYA!.70...H   BOOYA!.71...I   BOOYA!.72...J   BOOYA!.73...K   BOOYA!.74...L   BOOYA!.75...M   BOOYA!.76...N   BOOYA!.77...O   BOOYA!.78...P   BOOYA!.79...Q   BOOYA!.80...R   BOOYA!.81...S   BOOYA!.82...T   BOOYA!.83...U   BOOYA!.84...V   BOOYA!.85...W   BOOYA!.86...X   BOOYA!.87...Y
   BOOYA!.88...Z   BOOYA!.89...[   BOOYA!.90...\   BOOYA!.91...]   BOOYA!.92...^   BOOYA!.93..._   BOOYA!.94...`   BOOYA!.95...a   BOOYA!.96...b   BOOYA!.97...c   BOOYA!.98...d   BOOYA!.99....
<and so on...>

In most of the real world situations transactions can contain multiple DML statements executed in specific order. As of Oracle JDBC 19.3 or newer, PreparedStatement and CallableStatement do not support more than one SQL statements in the batch. Attempt to add another will throw java.sql.SQLFeatureNotSupportedException: Unsupported feature: addBatch exception.

Fortunately there’s a way to combine multiple DML statements into a single one. Consider this:

        static final String DEL_STATEMENT = "BEGIN DELETE FROM TX; DELETE FROM  T; END;";
        static final String INS_STATEMENT = "BEGIN INSERT INTO T VALUES(?, ?); INSERT INTO TX VALUES(?, ?, ?); INSERT INTO TX VALUES(?, ?, ?); END;";
        static final int BATCH_SIZE = 10000;
        @Test
        public void compoundStatement() {
                try (
                        Connection conn = dataSource.getConnection();
                        CallableStatement csDelete = conn.prepareCall(DEL_STATEMENT);
                        CallableStatement csInsert = conn.prepareCall(INS_STATEMENT);
                ) {

                        csDelete.execute();

                        csInsert.setInt(1, i);
                        csInsert.setString(2, "BOOYA!." + i);
                        csInsert.setInt(3, 2*i);
                        csInsert.setInt(4, i);
                        csInsert.setString(5, "FUBAR." + i);
                        csInsert.setInt(6, 2*i + 1);
                        csInsert.setInt(7, i);
                        csInsert.setString(8, "FUBAZ." + i);

                        csInsert.execute();
                        conn.commit();
                }
                catch (SQLException ex) {
                        throw new RuntimeException("", ex);
                }

        }

Magic incantation in INS_STATEMENT is actually anonymous PL/SQL block:

BEGIN
    INSERT INTO t VALUES (?,?);
    INSERT INTO tx VALUES (?,?,?);
    INSERT INTO tx VALUES (?,?,?);
END;

As expected, statement and data is sent to the database in single network packet

14:20:57.714505 IP (tos 0x0, ttl 64, id 47883, offset 0, flags [DF], proto TCP (6), length 401)
    localhost.46060 > localhost.ncube-lm: Flags [P.], cksum 0xff85 (incorrect -> 0x0031), seq 5024:5373, ack 5789, win 2478, options [nop,nop,TS val 105125344
3 ecr 1051253279], length 349
E.....@.@..Y................J]G...      ........
>...>......].......^...)...v...............................BEGIN INSERT INTO T VALUES(:1 , :2 ); INSERT INTO TX VALUES(:3 , :4 , :5 ); INSERT INTO TX VALUES(:
6 , :7 , :8 ); END;.............................i.......>........i..............i..............i.......>........i..............i..............i.......>.......
.i......BOOYA!.0.....FUBAR.0......FUBAZ.0

Batching is good, and SmartDB is cool, what’s about combining those two things? Can we send anonymous PL/SQL block and batch of data in one network packet? Let’s try.

        static final String DEL_STATEMENT = "BEGIN DELETE FROM TX; DELETE FROM  T; END;";
        static final String INS_STATEMENT = "BEGIN INSERT INTO T VALUES(?, ?); INSERT INTO TX VALUES(?, ?, ?); INSERT INTO TX VALUES(?, ?, ?); END;";       
        @Test
        public void batchedCompoundStatement() {
                try (
                        Connection conn = dataSource.getConnection();
                        CallableStatement csDelete = conn.prepareCall(DEL_STATEMENT);
                        CallableStatement csInsert = conn.prepareCall(INS_STATEMENT);
                ) {
                        csDelete.execute();
                        for(int i = 0; i <= BATCH_SIZE; i++){
                                csInsert.clearParameters();
                                csInsert.setInt(1, i);
                                csInsert.setString(2, "BOOYA!." + i);
                                csInsert.setInt(3, 2*i);
                                csInsert.setInt(4, i);
                                csInsert.setString(5, "FUBAR." + i);
                                csInsert.setInt(6, 2*i + 1);
                                csInsert.setInt(7, i);
                                csInsert.setString(8, "FUBAZ." + i);
                                csInsert.addBatch();
                        }
                        int[] r = csInsert.executeBatch();
                        conn.commit();
                }

Unfortunately, computer says no. Rows are sent to the database slow-by-slow:

14:20:57.714505 IP (tos 0x0, ttl 64, id 47883, offset 0, flags [DF], proto TCP (6), length 401)
    localhost.46060 > localhost.ncube-lm: Flags [P.], cksum 0xff85 (incorrect -> 0x0031), seq 5024:5373, ack 5789, win 2478, options [nop,nop,TS val 105125344
3 ecr 1051253279], length 349
E.....@.@..Y................J]G...      ........
>...>......].......^...)...v...............................BEGIN INSERT INTO T VALUES(:1 , :2 ); INSERT INTO TX VALUES(:3 , :4 , :5 ); INSERT INTO TX VALUES(:
6 , :7 , :8 ); END;.............................i.......>........i..............i..............i.......>........i..............i..............i.......>.......
.i......BOOYA!.0.....FUBAR.0......FUBAZ.0
14:20:57.716987 IP (tos 0x0, ttl 64, id 59405, offset 0, flags [DF], proto TCP (6), length 135)
    localhost.ncube-lm > localhost.46060: Flags [P.], cksum 0xfe7b (incorrect -> 0xeeba), seq 5789:5872, ack 5373, win 2497, options [nop,nop,TS val 105125344
5 ecr 1051253443], length 83
E.....@.@.Ta............J]G....d..      ..{.....
>...>......S................        .............................../.......................
14:20:57.717010 IP (tos 0x0, ttl 64, id 47884, offset 0, flags [DF], proto TCP (6), length 52)
    localhost.46060 > localhost.ncube-lm: Flags [.], cksum 0xfe28 (incorrect -> 0x747e), seq 5373, ack 5872, win 2478, options [nop,nop,TS val 1051253445 ecr
1051253445], length 0
E..4..@.@..................dJ]HO..      ..(.....
>...>...
14:20:57.717640 IP (tos 0x0, ttl 64, id 47885, offset 0, flags [DF], proto TCP (6), length 159)
    localhost.46060 > localhost.ncube-lm: Flags [P.], cksum 0xfe93 (incorrect -> 0x5471), seq 5373:5480, ack 5872, win 2478, options [nop,nop,TS val 1051253446 ecr 1051253445], length 107
E.....@.@..I...............dJ]HO..      ........
>...>......k.......^    .. .......................................................BOOYA!.1.......FUBAR.1.......FUBAZ.1
14:20:57.719178 IP (tos 0x0, ttl 64, id 59406, offset 0, flags [DF], proto TCP (6), length 135)
    localhost.ncube-lm > localhost.46060: Flags [P.], cksum 0xfe7b (incorrect -> 0xebf7), seq 5872:5955, ack 5480, win 2497, options [nop,nop,TS val 1051253447 ecr 1051253446], length 83
E.....@.@.T`............J]HO......      ..{.....
>...>......S................        ......................      ......../............   ..........
14:20:57.719560 IP (tos 0x0, ttl 64, id 47886, offset 0, flags [DF], proto TCP (6), length 159)
    localhost.46060 > localhost.ncube-lm: Flags [P.], cksum 0xfe93 (incorrect -> 0x4dab), seq 5480:5587, ack 5955, win 2478, options [nop,nop,TS val 1051253447 ecr 1051253447], length 107
E.....@.@..H................J]H...      ........
>...>......k.......^
.. .......................................................BOOYA!.2.......FUBAR.2.......FUBAZ.2

It does not matter if we use stored procedure instead of anonymous PL/SQL block, or PreparedStatement instead of CallableStatement. Behavior remains exactly the same, batching does not work as expected.

Reducing chattiness of JDBC applications: batching be good!

Mixing TIMESTAMPs with TIMESTAMPs WITH TIME ZONE considered harmful

TIMESTAMP and TIMESTAMP WITH TIME ZONE are different data types: they have different internal representations and their behavior has subtle differences. But somehow Oracle still feels like providing implicit data type conversion for these two. So let’s check how implicit data type conversion works in real life.

First some test data:

PRIITP@testpdb > CREATE TABLE t (
    id           INTEGER PRIMARY KEY,
    created      TIMESTAMP NOT NULL,
    created_tz   TIMESTAMP WITH TIME ZONE NOT NULL
);
Table created.

Elapsed: 00:00:00.03
PRIITP@testpdb > insert into t values (1, systimestamp, systimestamp);

1 row created.

Elapsed: 00:00:00.01
PRIITP@testpdb > select * from t;

        ID CREATED                                                                     CREATED_TZ
---------- --------------------------------------------------------------------------- ---------------------------------------------------------------------------
         1 21-SEP-20 11.27.35.281367 AM                                                21-SEP-20 11.27.35.281367 AM +00:00

Now some comparison between TIMESTAMP column and TIMESTAMP WITH TIME ZONE literal.

PRIITP@testpdb > select * from t where created > TIMESTAMP '2020-09-21 10:27:35 +03:00';

        ID CREATED                                                                     CREATED_TZ
---------- --------------------------------------------------------------------------- ---------------------------------------------------------------------------
         1 21-SEP-20 11.27.35.281367 AM                                                21-SEP-20 11.27.35.281367 AM +00:00

Still fabulous. But there’s a way to bring in some fun, let’s try to adjust session time zone.

PRIITP@testpdb > alter session set time_zone = '06:00';

Session altered.

Elapsed: 00:00:00.01
PRIITP@testpdb > select * from t where created > TIMESTAMP '2020-09-21 10:27:35 +03:00';

no rows selected

Elapsed: 00:00:00.00

Oops. Somehow our data disappeared. Hint why such a mysterious behavior comes from the execution plan, specifically from the ‘filter’ section

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    41 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    41 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(SYS_EXTRACT_UTC(INTERNAL_FUNCTION("CREATED"))>TIMESTAMP'
              2020-09-21 07:27:35.000000000')

First Oracle converts TIMESTAMP WITH TIME ZONE literal into TIMESTAMP literal in UTC. Which is all good and well since internally operations on different time zones are done in UTC only, so conversion into UTC is expected.

Much more interesting is this part: SYS_EXTRACT_UTC(INTERNAL_FUNCTION("CREATED")) . There is no way to know in which time zone those TIMESTAMPs are, so Oracle takes a leap of faith and assumes it should be in the session time zone. This is exactly how SYS_EXTRACT_UTC operates on TIMESTAMPS: since time zone information isn’t available it assumes it should be in session time zone. Unfortunately this does not explain nested INTERNAL_FUNCTION, it looks like a data type conversion, but it’s not actually needed.

While doing the data type conversion Oracle probably ends up with construct like sys_extract_utc(from_tz(created, '06:00')), and with expression like this our query, of course, does not return anything:

PRIITP@testpdb > select created, sys_extract_utc(from_tz(created, '06:00')) from t;

CREATED                                                                     SYS_EXTRACT_UTC(FROM_TZ(CREATED,'06:00'))
--------------------------------------------------------------------------- ---------------------------------------------------------------------------
21-SEP-20 11.27.35.281367 AM                                                21-SEP-20 05.27.35.281367 AM

For me it seems to be a violation of POLA. Since lossless data type conversion isn’t always possible between TIMESTAMPs and TIMESTAMPs WITH TIME ZONE, Oracle should throw an error instead of assuming things and making queries nondetermnistic. And speaking of assumptions, TIMESTAMPs in UTC are making much more sense compared to session time zone, at least for me.

Mixing TIMESTAMPs with TIMESTAMPs WITH TIME ZONE considered harmful

Measuring latency in Python: implementation details

Getting correct time and measuring things in sub-millisecond range is hairy and complicated topic. Most of a time it boils down to the low level details in hardware, but sometimes tools you have to use add their own weird twist. Let’s consider Python 3 as as an example. According to the documentation things couldn’t be more simple. What we have is package time and five functions to return desired reading:

  • time.monotonic() — Return the value (in fractional seconds) of a monotonic clock, i.e. a clock that cannot go backwards. The clock is not affected by system clock updates. (Well, not exactly, see later.)
  • time.perf_counter() — Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration.
  • time.process_time() — Return the value (in fractional seconds) of the sum of the system and user CPU time of the current process.
  • time.thread_time()  — Return the value (in fractional seconds) of the sum of the system and user CPU time of the current thread. 
  • time.time() — Return the time in seconds since the epoch as a floating point number. 

Now let’s check what the underlying implementation is across the platforms. First on FreeBSD 12.1:

priitp@roadkill2:~ % python3.7
Python 3.7.8 (default, Jul 19 2020, 23:46:27)
[Clang 8.0.1 (tags/RELEASE_801/final 366581)] on freebsd12
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> [print("{}: {}".format(ci, time.get_clock_info(ci))) for ci in ('monotonic', 'perf_counter', 'process_time', 'thread_time', 'time')]
monotonic: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=2.8e-07)
perf_counter: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=2.8e-07)
process_time: namespace(adjustable=False, implementation='clock_gettime(CLOCK_PROF)', monotonic=True, resolution=0.01)
thread_time: namespace(adjustable=False, implementation='clock_gettime(CLOCK_THREAD_CPUTIME_ID)', monotonic=True, resolution=1.0000000000000002e-06)
time: namespace(adjustable=True, implementation='clock_gettime(CLOCK_REALTIME)', monotonic=False, resolution=2.8e-07)

In Linux I have slightly older Python, so no thread_time here:

[pripii@priitp-roadkill ]$ python3
Python 3.6.8 (default, Apr  2 2020, 13:34:55)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> [print("{}: {}".format(ci, time.get_clock_info(ci))) for ci in ('monotonic', 'perf_counter', 'process_time', 'time')]
monotonic: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
perf_counter: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
process_time: namespace(adjustable=False, implementation='clock_gettime(CLOCK_PROCESS_CPUTIME_ID)', monotonic=True, resolution=1e-09)
time: namespace(adjustable=True, implementation='clock_gettime(CLOCK_REALTIME)', monotonic=False, resolution=1e-09)

And since code I’m involved with can theoretically run on Windows as well, let’s include it for the sake of lulz:

Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:38:48) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> [print("{}: {}".format(ci, time.get_clock_info(ci))) for ci in ('monotonic', 'perf_counter', 'process_time', 'time')]
monotonic: namespace(adjustable=False, implementation='GetTickCount64()', monotonic=True, resolution=0.015625)
perf_counter: namespace(adjustable=False, implementation='QueryPerformanceCounter()', monotonic=True, resolution=1e-07)
process_time: namespace(adjustable=False, implementation='GetProcessTimes()', monotonic=True, resolution=1e-07)
time: namespace(adjustable=True, implementation='GetSystemTimeAsFileTime()', monotonic=False, resolution=0.015625)

In both FreeBSD and Linux time.monotonic() and time.perf()_counter are using clock_gettime(CLOCK_MONOTONIC) with precision 1ns. But on Windows you can get precision of whopping 16us from time.monotonic(). So when measuring latencies in microseconds, time.perf_counter() is a way to go, with additional loss of precision of 100ns in Windows.

While CLOCK_MONOTONIC doesn’t suffer from jumps backward in time, it can be incrementally adjusted by adjtime(3) and NTP. While monotonic, consecutive calls can return same value, and it does not count time while system was suspended. Value returned by CLOCK_MONOTONIC represents seconds either from the system boot (Linux) or from epoch (FreeBSD).

CLOCK_REALTIME represents second since epoch and has all the problems a broken timer can have: it can be adjusted by privileged user, or by NTP/PTP/whatnot, it’s readings can jump backward and forward in time, and has limited accuracy. So unless you’re interested in slightly wrong wall clock time, time.time() isn’t that interesting.

Does precision matter?

Since Python isn’t exactly the fastest language humanity has invented, why such a panic about the milli- and nanoseconds? Let’s check how long calling various time.* functions will take. Following is from the Linux system with 1900MHZ CPU and kvm-clock as system clock source.

#!/usr/bin/python3

from hdrh.histogram import HdrHistogram
import time

loops = 1000

def measure(timefunc):
        h = HdrHistogram(1, 10000, 4)
        for l in range(0, loops):
                start = time.perf_counter()
                dummy = timefunc()
                h.record_value(1000000000*(time.perf_counter() - start))
        return h

if __name__ == '__main__':
        for func in (time.monotonic, time.perf_counter, time.process_time, time.time):
                print("Histogram for {}".format(func))
                hist = measure(func)
                [print("\t{}th = {}ns".format(ct, hist.get_value_at_percentile(ct))) for ct in (50, 98, 99, 99.9, 99.99)]

And the winner is…

Histogram for <built-in function monotonic>
        50th = 324ns
        98th = 356ns
        99th = 413ns
        99.9th = 5288ns
        99.99th = 7132ns
Histogram for <built-in function perf_counter>
        50th = 350ns
        98th = 389ns
        99th = 396ns
        99.9th = 1569ns
        99.99th = 14392ns
Histogram for <built-in function process_time>
        50th = 1289ns
        98th = 1599ns
        99th = 2730ns
        99.9th = 10459ns
        99.99th = 21132ns
Histogram for <built-in function time>
        50th = 315ns
        98th = 346ns
        99th = 354ns
        99.9th = 570ns
        99.99th = 1700ns

Median call times are around 300ns, except for time.process_time(), and high values are into microseconds. So for events taking hundreds of microseconds, calling time.perf_counter()/time.monotonic() is just fine.

See also

For the general background: Measuring Latency in Linux.

Measuring latency in Python: implementation details

How to add Oracle specific SQL extensions with Liquibase

Now that sqlcl has acquired support for Liquibase, it might feel unnecessary to use vanilla LB with Oracle. But still, bundling sqlcl with applications takes extra effort, and sometimes you’re suck with vanilla Liquibase on unrelated reasons. And what’s more, adding SQL extensions like global temporary tables, index oriented tables and partitioning is actually quite straightforward with LB.

Let’s start with a simple CREATE TABLE statement:

CREATE TABLE partitioning_test (
    id           INTEGER
        PRIMARY KEY USING INDEX local,
    username     VARCHAR2(128) NOT NULL,
    created      TIMESTAMP WITH TIME ZONE NOT NULL,
    created_by   VARCHAR2(32) NOT NULL,
    updated      TIMESTAMP WITH TIME ZONE,
    updated_by   VARCHAR2(32)
)
    PARTITION BY HASH ( id ) PARTITIONS 16;

To turn this statement into Liquibase changeSet I will

  • Create barebones changeSet without any funny extensions.
  • Check with updateSQL command what kind of statement Liquibase generates from that changeSet.
  • Modify the generated statement with modifySQL sub-tag to the exact liking.

Our initial changeSet looks like this

        <createTable tableName="PARTITIONING_TEST">
            <column name="ID" type="INTEGER" >
                <constraints primaryKey="true"/>
            </column>
            <column name="USERNAME" type="VARCHAR2(128)">
                <constraints nullable="false"/>
            </column>
            <column name="CREATED" type="TIMESTAMP WITH TIME ZONE">
                <constraints nullable="false"/>
            </column>
            <column name="CREATED_BY" type="VARCHAR2(32)">
                <constraints nullable="false"/>
            </column>
            <column name="UPDATED" type="TIMESTAMP WITH TIME ZONE"/>
            <column name="UPDATED_BY" type="VARCHAR2(32)"/>
        </createTable>
    </changeSet>

From which Liquibase generates following SQL statement (formatted for better readability):

CREATE TABLE priitp.partitioning_test (
    id           INTEGER NOT NULL,
    username     VARCHAR2(128) NOT NULL,
    created      TIMESTAMP WITH TIME ZONE NOT NULL,
    created_by   VARCHAR2(32) NOT NULL,
    updated      TIMESTAMP WITH TIME ZONE,
    updated_by   VARCHAR2(32),
    CONSTRAINT pk_partitioning_test PRIMARY KEY ( id )
);

Partitioning the table is really simple, just add the partitioning clause to the end of the statement:

<modifySql dbms="oracle">
                <append value=" PARTITION BY HASH (id) PARTITIONS 16 "/>
        </modifySql>

Changing the primary key clause is trickier, we have to actually modify it:

<replace replace="PRIMARY KEY (ID)" with="PRIMARY KEY (ID) USING INDEX LOCAL">

With those additions final changeset looks like this:

<changeSet author="priitp" id="part-test">
        <createTable tableName="PARTITIONING_TEST">
            <column name="ID" type="INTEGER" >
                <constraints primaryKey="true"/>
            </column>
            <column name="USERNAME" type="VARCHAR2(128)">
                <constraints nullable="false"/>
            </column>
            <column name="CREATED" type="TIMESTAMP WITH TIME ZONE">
                <constraints nullable="false"/>
            </column>
            <column name="CREATED_BY" type="VARCHAR2(32)">
                <constraints nullable="false"/>
            </column>
            <column name="UPDATED" type="TIMESTAMP WITH TIME ZONE"/>
            <column name="UPDATED_BY" type="VARCHAR2(32)"/>
        </createTable>
        <modifySql dbms="oracle">
                <append value=" PARTITION BY HASH (id) PARTITIONS 16 "/>
                <replace replace="PRIMARY KEY (ID)" with="PRIMARY KEY (ID) USING INDEX LOCAL"/>
        </modifySql>
    </changeSet>

And generated SQL statement (once again, formatted for readability) contains all the things we want:

CREATE TABLE partitioning_test (
    id           INTEGER NOT NULL,
    username     VARCHAR2(128) NOT NULL,
    created      TIMESTAMP WITH TIME ZONE NOT NULL,
    created_by   VARCHAR2(32) NOT NULL,
    updated      TIMESTAMP WITH TIME ZONE,
    updated_by   VARCHAR2(32),
    CONSTRAINT pk_partitioning_test PRIMARY KEY ( id )
        USING INDEX local
)
    PARTITION BY HASH ( id ) PARTITIONS 16;

Caveats and Features

In addition to appending and modifying the statement modifySql sub-tag supports prepending and replace using the regular expressions.

modifySql operates on all SQL statements generated by changeSet, including statements coming from sqlFile. For example, if I add createSequence tag to the changeSet, result is something funny:

CREATE SEQUENCE PART_TEST_SEQ
PARTITION BY HASH (id) PARTITIONS 16 ;

How to add Oracle specific SQL extensions with Liquibase

Fast Connection Failover in Action

Oracle Fast Application Notification makes possible for database client to be aware of what is happening in the database cluster. Main use case for FAN is Fast Connection Failover, which can help database client a great deal when it comes to acting on the problems in the database layer. Following is a short explanation how FCF works, using Oracle Universal Connection Pool as an example. Some of the discussion should be relevant to other Oracle clients as well, since FCF works in similar way whether the client is thick, thin, or ODP.NET.

Connection Pooling 101

Oracle database is optimized to work with long running sessions, creating new connection is expensive in terms of latency and resources needed. On the other hand, web service operating in request-response model can have expected response times in single milliseconds. So creating a new session for each and every request is time consuming and not especially smart.

Solution is to create a pool of database connections in the application server. Initially connection pool creates a bunch of available connections, which application code can borrow. When application code is done with the database, it returns borrowed connection to the pool, and returned connection is marked as available.

Another nice thing with the connection pool is that if properly configured, it creates a circuit breaker between the application and database. Maximum number of connections that pool can create effectively restricts the number of sessions that can be created in the database. It helps to limit the damage in case of sudden and unexpected rush hour.

Unfortunately these nice properties introduce some side effects. From high availability standpoint connection pool creates two big problems. It is impossible to tell without testing if connection to the database has died, and in case of networking troubles testing those connections can take ages. Second thing, there is no way to load balance the connections, connection pool does not know which nodes in the cluster are overloaded and which nodes could provide better response time for the application.

UCP: handling the outages

When UCP receives message from the ONS, it inspects the connections available in the connection pool and takes action if needed. Whole process is rather simple:

  1. UCP receives FCF message from ONS.
  2. UCP inspects connections in the pool and if connection does not have service in the cluster available, UCP kills the physical connection.

Main difference between planned and unplanned outages is how borrowed connections are handled. In case of unplanned downtime UCP kills all the affected connections, including connections borrowed by the application code. If application tries to use the reaped connection, it receives the javax.util.SQLException.

In case of planned downtime UCP acts only on available connections. Borrowed connections will stay intact, these are closed when application returns them to the pool. To avoid logon storms draining period can be set with oracle.ucp.PlannedDrainingPeriod​ parameter.

UCP: load balancing and connection affinity

With FAN, load in the database cluster becomes observable for the client. This, in turn, enables connection time load balancing on the client side. Work can be distributed either by throughput or response time. Load balancing messages are configured on the server side, on the service involved (and whole thing uses RAC Load Balancer Advisory in the background). If UCP receives relevant messages, load balancing is done automatically.

As mentioned, load balancing happens when the new connection is created. Which means applications with steady number of connections does not benefit from load balancing capabilities. To avoid this issue one can set timeouts on the connections:

  • InactiveConnectionTimeout — how long available connection can be idle before it is removed from the connection pool.
  • MaxConnectionReuseTime — how long connection can be used before being gracefully removed from the pool.

Unfortunately there are applications which borrow connections and never return them back to the pool. For these greedy applications there are two more timeouts that operate on borrowed connections:

  • AbandonConnectionTimeout — Connection is considered abandoned when borrowed connection hasn’t done any database calls within the set timeout. When the timeout expires, tangling local transactions are either cancelled or rolled back, and connection is closed.
  • Time-to-Live — sets upper limit on how long connection can be borrowed by the application. When timeout expires borrowed connection is forcibly closed.

Setting those two timeouts can cause SQLExeptionS be thrown in the application when application code tries to use borrowed but already aborted connection. So, beware!

NB! There’s quite a lot more to optimizing UCP behavior. This was just a quick expose into parameters relevant to FCF behavior.

And where is the cool demo?

Test client we used for testing and debugging the Fast Connection Failover unfortunately contains proprietary bits. Making it shareable isn’t probably worth the pain. Another problem is, when it comes to FAN events and FCF, UCP is a black box and does not log anything, even on a DEBUG level. All you can divine of actions taken by UCP is your application behavior. In that sense testing your application under failure conditions is essential.

Maximum that is possible to coax out of UCP is a few lines if FCF is enabled

2020-05-19 13:32:09,295 DEBUG [main] o.ucp.jdbc oracle.ucp.jdbc.PoolDataSourceImpl:createPool:oracle.ucp.jdbc.PoolDataSourceImpl@7098b907:Connection pool instance is created
2020-05-19 13:32:10,514 DEBUG [main] oracle.ucp :::Heuristically determine whether to enable FAN
2020-05-19 13:32:10,515 DEBUG [main] oracle.ucp :::RAC/GDS 12.x, FAN is heuristically enabled

Why UCP talks here about RAG/GDS 12.x is another mystery, since JDBC, UCP and database cluster have all version 19.3.

Fast Connection Failover in Action