Writing MySQL functions in C

In the previous post Zdenek explained how to use MySQL stored functions to calculate the distance between two points on the lat/lng plane. Here I take his function as a starting point to describe how to implement user defined functions written in C, and import them into MySQL.

User defined functions (UDF) are compiled C functions that you can use as normal functions within MySQL. UDFs have a number of advantages over stored functions written using SQL, the most important being improved performance and the fact that you can use them as aggregate functions (like you can with SUM(), AVG(), etc.).

Writing UDFs is easy. So is compiling and importing UDFs into MySQL. First you need to do a little bit of setting up. This is how you do it (in Ubuntu and Red Hat).

Get the required packages

In order to compile UDFs you need the MySQL developer libraries. On Red Hat you have to install the mysql-devel package, in Ubuntu the package is (currently) called libmysqlclient16-dev.

Find your plugin directory

UDF plugins need to be placed in a specific directory for MySQL to find them. In MySQL, type,

SHOW VARIABLES LIKE 'plugin_dir';

to find out if you have a plugin directory defined. If not, /usr/lib/mysql/plugin seems like a good choice (you may need to create the directory). On our 64 bit Red Hat system the lib folder is called lib64. To set the plugin directory add the line,

plugin_dir = /usr/lib/mysql/plugin

to my.cnf (usually in /etc).

Get coding!

Let’s write our UDF. As always, the best way is to rip as much code from other people as possible. If you download the MySQL source code from http://dev.mysql.com/downloads/mysql/ you will get a useful example file called udf_example.c (in the sql directory). This file shows which header files you need to include, as well as giving some helpful example functions.

The following can be used as a general template for UDF source files:

#ifdef STANDARD
/* STANDARD is defined, don't use any mysql functions */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#ifdef __WIN__
typedef unsigned __int64 ulonglong;/* Microsofts 64 bit types */
typedef __int64 longlong;
#else
typedef unsigned long long ulonglong;
typedef long long longlong;
#endif /*__WIN__*/
#else
#include <my_global.h>
#include <my_sys.h>
#if defined(MYSQL_SERVER)
#include <m_string.h>/* To get strmov() */
#else
/* when compiled as standalone */
#include <string.h>
#define strmov(a,b) stpcpy(a,b)
#define bzero(a,b) memset(a,0,b)
#define memcpy_fixed(a,b,c) memcpy(a,b,c)
#endif
#endif
#include <mysql.h>
#include <ctype.h>

#ifdef HAVE_DLOPEN

#if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
static pthread_mutex_t LOCK_hostname;
#endif

//                                                                        
// User #includes go here                                                 
//                                                                        

//                                                                        
// init, deinit and actual function prototypes here                       
//                                                                        

//                                                                        
// init function                                                          
//                                                                        

//                                                                        
// deinit function                                                        
//                                                                        

//                                                                        
// Actual function                                                        
//                                                                        

#endif /* HAVE_DLOPEN */

Each function has an init function, a deinit function and an “actual” function. In the init function you allocate dynamic memory, check if the given arguments provided are valid, and optionally you can apply some transformation to the arguments. In the deinit function you free dynamic memory. There’s a bit more to it, as explained in http://dev.mysql.com/doc/refman/5.1/en/adding-udf.html, but here I try to keep things simple.

This exercise is about trying to implement Zdenek’s calc_distance function in C, so here are the functions that you need to do that:

my_bool calc_distance_udf_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
    if(!(args->arg_count == 4)) {
    strcpy(message, "Expected four arguments (lat, lon, locate_lat, locate_lon)");
    return 1;
    }

    args->arg_type[0] = REAL_RESULT;
    args->arg_type[1] = REAL_RESULT;
    args->arg_type[2] = REAL_RESULT;
    args->arg_type[3] = REAL_RESULT;

    return 0;
}

void calc_distance_udf_deinit(UDF_INIT *initid __attribute__((unused)))
{

}

double calc_distance_udf(UDF_INIT* initid, UDF_ARGS* args __attribute__((unused)),
                    char* is_null __attribute__((unused)), char* error __attribute__((unused)))
{
    double lat = *((double *)(args->args[0]));
    double lon = *((double *)(args->args[1]));
    double locate_lat = *((double *)(args->args[2]));
    double locate_lon = *((double *)(args->args[3]));

    lat = lat * M_PI / 180;
    lon = lon * M_PI / 180;
    locate_lat = locate_lat * M_PI / 180;
    locate_lon = locate_lon * M_PI / 180;

    return 3959 * acos(cos(lat) * cos(locate_lat) * cos(locate_lon - lon) + sin(lat) * sin(locate_lat));
}

In our init function we check that we have four arguments, after which we cast all arguments to REAL. We leave the deinit function empty. The main calc_distance_udf function is more or less a copy of the original calc_distance function. In order to treat the supplied arguments as REALs, we have to cast the arguments to double, as described in http://dev.mysql.com/doc/refman/5.1/en/udf-arguments.html.

All we need now is function prototypes. If you’re not that familiar with C, function prototypes are basically the function headers copied to the beginning of the file (okay, there’s more to this as well, but for our purposes here, this is probably explanation enough). So our final calc_distance_udf.c looks like this:

#ifdef STANDARD
/* STANDARD is defined, don't use any mysql functions */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#ifdef __WIN__
typedef unsigned __int64 ulonglong;/* Microsofts 64 bit types */
typedef __int64 longlong;
#else
typedef unsigned long long ulonglong;
typedef long long longlong;
#endif /*__WIN__*/
#else
#include <my_global.h>
#include <my_sys.h>
#if defined(MYSQL_SERVER)
#include <m_string.h>/* To get strmov() */
#else
/* when compiled as standalone */
#include <string.h>
#define strmov(a,b) stpcpy(a,b)
#define bzero(a,b) memset(a,0,b)
#define memcpy_fixed(a,b,c) memcpy(a,b,c)
#endif
#endif
#include <mysql.h>
#include <ctype.h>

#ifdef HAVE_DLOPEN

#if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
static pthread_mutex_t LOCK_hostname;
#endif

#include <math.h>

my_bool calc_distance_udf_init(UDF_INIT *initid, UDF_ARGS *args, char *message);
void calc_distance_udf_deinit(UDF_INIT *initid __attribute__((unused)));
double calc_distance_udf(UDF_INIT* initid, UDF_ARGS* args __attribute__((unused)),
            char* is_null __attribute__((unused)), char* error __attribute__((unused)));

my_bool calc_distance_udf_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
    if(!(args->arg_count == 4)) {
    strcpy(message, "Expected four arguments (lat, lon, locate_lat, locate_lon");
    return 1;
    }

    args->arg_type[0] = REAL_RESULT;
    args->arg_type[1] = REAL_RESULT;
    args->arg_type[2] = REAL_RESULT;
    args->arg_type[3] = REAL_RESULT;

    return 0;
}

void calc_distance_udf_deinit(UDF_INIT *initid __attribute__((unused)))
{

}

double calc_distance_udf(UDF_INIT* initid, UDF_ARGS* args __attribute__((unused)),
            char* is_null __attribute__((unused)), char* error __attribute__((unused)))
{
    double lat = *((double *)(args->args[0]));
    double lon = *((double *)(args->args[1]));
    double locate_lat = *((double *)(args->args[2]));
    double locate_lon = *((double *)(args->args[3]));

    lat = lat * M_PI / 180;
    lon = lon * M_PI / 180;
    locate_lat = locate_lat * M_PI / 180;
    locate_lon = locate_lon * M_PI / 180;

    return 3959 * acos(cos(lat) * cos(locate_lat) * cos(locate_lon - lon) + sin(lat) * sin(locate_lat));
}

#endif /* HAVE_DLOPEN */

Compile

The command,

mysql_config --cflags

gives you a number of system-specific compilation flags. A feasible compilation command is,

gcc $(mysql_config --cflags) -shared -fPIC -o calc_distance_udf.so calc_distance_udf.c

which will create a shared object file.

If you don’t have the mysql_config command, you can in most cases just replace it with -I/usr/include/mysql, or whichever directory contains mysql.h.

Copy the .so file to the plugin directory

This is just a matter of copying the compiled shared object file from the previous step to the directory defined in the MySQL variable plugin_dir. In this example, this would be,

cp calc_distance_udf.so /usr/lib/mysql/plugin/

Import the function into MySQL

In MySQL, execute the following statement:

CREATE FUNCTION calc_distance_udf 
  RETURNS REAL
  SONAME "calc_distance_udf.so";

For functions returning other types of data, you need to modify the statement accordingly. This is explained in http://dev.mysql.com/doc/refman/5.1/en/adding-udf.html and examples are given in udf_example.c.

Note that the name of the function in MySQL must match the name of the function in your UDF file.

Use the function

Now you can use the calc_distance_udf function just like you would use Zdenek’s original calc_distance function. Here is a contrived example:

mysql> select calc_distance_udf(1.0, 2.0, 3.0, 4.0);
  +---------------------------------------+
  | calc_distance_udf(1.0, 2.0, 3.0, 4.0) |
  +---------------------------------------+
  |                                 195.4 | 
  +---------------------------------------+
  1 row in set (0.00 sec)

Performance comparison

These are the results from running 1,000,000 iterations of Zdenek’s original calc_distance function and the UDF version I presented here:

calc_distance time taken: 160.87 seconds
calc_distance_udf time taken: 119.35 seconds

On average, the UDF version of the procedure is 25% faster. For a procedure as relatively simple as calc_distance, this is a good result. For more complex functions I would expect the difference to be even greater.

Further reading

This tutorial is not close to giving a complete picture of what you can (and cannot) do with UDFs. I’d recommend having a look at the example UDFs that come with the MySQL source distribution, especially /src/udf_example.c. You might also want to have a look at section 22.3 of the MySQL reference manual, which is dedicated to user-defined functions.

Latest News & Insights

Say connected – get Loft updates straight to your inbox.