Implicit Pass-through support for any SAS function in Teradata/Oracle

In SAS solutions where data is stored in a third-party database such as Oracle, Teradata, or DB2, network I/O latency becomes a bottleneck in performance. This post explains a method to replicate SAS native functions in a third-party database. The method includes writing user-defined functions (UDFs) and running those functions via SAS implicit pass-through (IP).

Introduction

When using conventional processing to access data inside a database management system (DBMS), SAS asks the SAS/ACCESS engine for all rows of the table being processed. The SAS/ACCESS engine generates an SQL SELECT * statement that is passed to the DBMS. That SELECT statement fetches all the rows in the table, and the SAS/ACCESS engine returns them to SAS. As the number of rows in the table grows larger, network latency grows because the amount of data that is fetched from the DBMS and passed on to SAS increases.

There are various conventional approaches to convert existing code to support in-DB processing:

  • SAS explicit pass-through
  • SAS implicit pass-through with PROC SQL (PROC SQL(IP))
  • Various SAS PROCS that support in-DB processing. Examples include PROC FREQ and PROC MEANS.

However, in certain cases, these conventional approaches are not feasible because of the following reasons:

  • Converting existing code to support in-DB processing limits the types of analyses that can be performed.
  • Database SQL constraints prevent you from converting the code. For example, PROC SQL does not support implicit pass-through for certain SAS functions such as intck and intnx.
  • The converted code is slower.

This post explains the alternative approach to convert existing code to support in-DB processing.

  • Creating user-defined functions and running them through SAS implicit pass-through.

 

User Defined Functions (UDF) and SAS Implicit Pass-through (IP)

If PROC SQL contains certain functions (for example, intnx and intck) that do not support implicit pass-through, then during the processing, the data in the third-party database are internally copied into the SAS environment. This data movement significantly increases the data processing time. To overcome this issue, one approach might be to write the complete SQL code in the third-party database-specific syntax and then execute the code through explicit pass-through. However, this process must be repeated for each instance of the function. Moreover, writing SQL code for explicit pass-through has an overhead of specifying connection details for each PROC SQL. The connection details might vary across databases.

This post explains how creating user-defined functions (UDFs) and running them through implicit pass-through can prevent data movement between two database management systems. The paper uses an example of SAS INTNX function and Teradata as the third-party database to explain the process.

Step 1: CREATE DATABASE-specific FUNCTIONS

This step involves creating a function in database-specific syntax that performs the same tasks that the corresponding SAS function does.

Here is the sample code for creating a UDF in Teradata that is equivalent to SAS INTNX function:

CREATE FUNCTION dabt_func_intnx_dttm (intervl VARCHAR(5), datetimecolumn timestamp, increment int, align VARCHAR(4))

RETURNS TIMESTAMP(0)

LANGUAGE SQL CONTAINS SQL DETERMINISTIC SQL SECURITY DEFINER

COLLATION INVOKER

INLINE TYPE 1

RETURN

case when TRIM(intervl) = ‘DTDAY’ then

(

case

when align = ‘END’ then

(cast((cast(datetimecolumn as date)+ increment + 1) as timestamp(0))-INTERVAL ‘1’ second)

when align = ‘SAME’ then

(datetimecolumn + CAST(increment AS interval day) )

else NULL

end

)

when TRIM(intervl) = ‘DTMONTH’ then

(

case

when align = ‘END’ then

(cast((cast(datetimecolumn as date) – (extract (day from cast(datetimecolumn as date)) – 1) + cast(increment +

1 as interval month)) as timestamp(0)) – INTERVAL ‘1’ second)

when align = ‘SAME’ then

add_months(datetimecolumn,increment)

else NULL

end

)

when TRIM(intervl) = ‘DTWEEK’ then

(

case

when align = ‘END’ then

(( cast( ( ( cast(datetimecolumn as date) – day_of_week(cast(datetimecolumn as date)) + 1)+ cast

((increment+1)*7 as interval day )) as timestamp(0) ) ) -INTERVAL ‘1’ second )

when align = ‘SAME’ then

(datetimecolumn + cast(increment*7 as interval day))

else NULL

end

)

end;

Step 2: Create corresponding SAS function using PROC FCMP

Use PROC FCMP to define the corresponding SAS function. This function will have the parameters and the return types identical to the UDF created in Step 1.

PROC FCMP OUTLIB=work.funcs.dabt_func_intnx_dttm;

FUNCTION dabt_func_intnx_dttm(intervl $,datetimecolumn,increment,align $);

RETURN(intnx(interval, datetimecolumn, increment, align)); ENDSUB;

RUN;

step 3: Map the SAS function to the Teradata UDF

The SQL Dictionary in the SAS/ACCESS engines controls which functions can be implicitly passed to the database. This dictionary maps SAS functions to the database functions. In order to implicitly call the database function, we need to map our new SAS function to it. To do that, we will need to create a table that we can append to the SAS/ACCESS SQL Dictionary. The following code creates the table and inserts a record to map our new SAS function to the Teradata UDF.

Note: Update the values of the following parameters according to your requirement.

SASFUNCNAME – Name of the SAS function.

SASFUNCNAMELEN – Length of the SAS function name.

DBMSFUNCNAME – Name of the Teradata function. It is better to write the Oracle/Teradata function as <database name>.<function name> so that your function is available to other users as well. Adjust the value of DBMSFUNCNAMELEN accordingly.

DBMSFUNCNAMELEN – Length of the Teradata function name.

PROC SQL;

CREATE TABLE work.dbfuncext

(

SASFUNCNAME CHAR(32) LABEL =’SASFUNCNAME’,

SASFUNCNAMELEN NUM LABEL =’SASFUNCNAMELEN’,

DBMSFUNCNAME CHAR(50) LABEL =’DBMSFUNCNAME’,

DBMSFUNCNAMELEN NUM LABEL =’DBMSFUNCNAMELEN’,

FUNCTION_CATEGORY CHAR(20) LABEL =’FUNCTION_CATEGORY’,

FUNC_USAGE_CONTEXT CHAR(20) LABEL =’FUNC_USAGE_CONTEXT’,

CONVERT_ARGS NUM LABEL =’CONVERT_ARGS’,

ENGINEINDEX NUM LABEL =’ENGINEINDEX’,

FUNCTION_RETURNTYP NUM(20) LABEL =’FUNCTION_RETURNTYP’,

FUNCTION_NUM_ARGS NUM LABEL =’FUNCTION_NUM_ARGS’

);

quit;

proc sql;

INSERT INTO work.dbfuncext

VALUES(“dabt_func_intnx_dttm”, 20, “dabt_func_intnx_dttm”, 20, ” “, ” “, 0, 0, . , 4);

quit;

step 4: Append the SAS function definition to the SAS/ACCESS SQL Dictionary

This can be accomplished using either of the two methods:

  • Use a LIBNAME statement to append the dabt_func_intnx_dttm definition to the SAS/ACCESS SQL Dictionary.

LIBNAME test_ip TERADATA USER=”sas_usr” PW=”sas_usr” SERVER=tdserv SQL_FUNCTIONS=”EXTERNAL_APPEND=work.dbfuncext”;

  • Use the OPTIONS statement. In this case, it is applicable to all the libraries for the given database.

OPTIONS SQL_FUNCTIONS= (TERADATA EXTERNAL_APPEND=work.dbfuncext);

How to use the UDF

Use the following code to run the UDF.

proc sql;

drop table test.table1;

create table test.table1 AS

(select id, orig_datetime1

,dabt_func_intnx_dttm(&dttm_interval.,orig_datetime1,&increment.,

&align.) as dttm_intnx,orig_date1

From test.date_dttm);

quit;

To see whether the function is run by the database, we need to see the SQL that SAS is passing to the Teradata Server. The following OPTIONS statement shows that:

OPTIONS SASTRACE=’,,,d’ SASTRACELOC=saslog NOSTSUFFIX sql_ip_trace=(ALL) DBIDIRECTEXEC;

Limitations

  • You need to write different UDFs for different databases. Hence, extra coding and maintenance is required.
  • Because UDFs are database-specific, you have to live with the database constraints.

Published by Chandan Singh

I am a Big Data Engineer working on helping Organizations develop robust analytical models. I am an expert SAS and Python programmer. I like to follow recent developments in technology and try to keep myself updated with the advancements.

Leave a comment