Andrew Clarke at Radio Free Tooting pointed out the shortcomings of using TRANSLATE() to detect numbers.
As I said earlier, all I needed to do was detect if the characters in a string were all digits or not, and I wanted it to be very fast.
But Andrew's remarks got me thinking - could translate be used to detect more complex numbers?
Here's the short list of requirements:
* Detect integers
* Detect numbers with decimal point ( 4.6, 0.2, .7)
* Detect negative and positive ( leading + or - )
* Reject text with more than 1 '.', such as an IP address ( 127.0.0.1 )
* Reject anything with alpha text
And comma's are considered as text. 99,324.1 would be alpha.
If you need to do this on 10g, no problem, as a regular expression can handle it.
Fist create some test data:
drop table number_test;
create table number_test( alphacol varchar2(20));
insert into number_test values('.5');
insert into number_test values('1');
insert into number_test values('2');
insert into number_test values(' 3');
insert into number_test values('4 ');
insert into number_test values('3.14159');
insert into number_test values('127.0.0.1');
insert into number_test values('+34.45');
insert into number_test values('-54.43');
insert into number_test values('this is a test');
insert into number_test values('th1s is 4 t3st');
insert into number_test values('.');
commit;
Now select only columns where the value is a number:
select alphacol
from number_test
where regexp_instr(trim(alphacol),'^[-+]?[0-9]*(\.?[0-9]+)?$') > 0
order by 1
SQL> /
ALPHACOL
--------------------
3
+34.45
-54.43
1
2
3.14159
4
7 rows selected.
That seems to work.
But what if you're stuck doing this on 9i? REGEXP_INSTR is not available.
You can use the user defined function IS_NUMBER(), which works well, but is very slow if used on large amounts of data.
Might we be able to use and abuse the TRANSLATE() function to speed this up? Here's a bit of convoluted SQL that works well on the limited test data:
select alphacol, alpha2
from
(
select alphacol,
-- is there a sign +- ? - remove it
decode(substr(alpha2,1,1),
'-',substr(alpha2,2),
'+',substr(alpha2,2),
alpha2
) alpha2
from (
select
alphacol,
-- remove a single '.' if it/they exists
replace(substr(alphacol,1,instr(alphacol,'.')),'.') || substr(alphacol,instr(alphacol,'.')+1) alpha2
from (
select trim(alphacol) alphacol
from number_test
)
)
)
where substr('||||||||||||||||||||||||||||||||',1,length(alpha2)) = translate(alpha2,'0123456789','||||||||||')
/
(Sorry about formatting - I seem to lose all formatting when I paste SQL)
Output from this nasty bit of SQL is identical to that when using REGEXP_INSTR:
ALPHACOL ALPHA2
-------------------- ----------------------------------------
.5 5
1 1
2 2
3 3
4 4
3.14159 314159
+34.45 3445
-54.43 5443
8 rows selected.
Runstats results:
SQL> @th5
.047739 secs
.037447 secs
PL/SQL procedure successfully completed.
No comments:
Post a Comment