Creating A Python Extension

It seems there has been a fair bit of interest regarding running algos in C++ and I want to talk about how I have done this, in a tutorial style since I think its something people could learn from. Now, for some disclaimers. I am quite new to this process and nothing I post should be taken as the “best” or even “correct” way to do things, this is merely what I have found and implemented. Second, @Thorn0906 continually reminds people that Rust is a fast and memory safe option for those willing to learn it (I would add Java to this recommendation). I personally haven’t done this since I completed significant portions of my code prior to Rust and Java support and I am already familiar with C++. If you aren’t far into your algo creation it may be worth giving those a shot. Again, I have not since it would take a long time to convert my code over to those versions.

Alright, I have noticed that some other players use C++ by calling their binary executable through a subprocess. This is fine, but not how I implemented mine so I won’t be going into that. Additionally, I find there is quite a bit of support online for subprocesses, and not so much for what I’ve done. Also, there is the advantage that creating an extension is created and directly part of Python. For this example, I’ll be showing the creation for an extension I wrote for my ML algos. It is a simple matrix/vector multiplication extension, used in my algo for forward-propagation passes. It is highly unoptimized yet is up to 100x faster than a basic python example. If you want to see a (very basic) pure python implementation you can see one here, although I recommend you give it a shot yourself since it’s pretty simple. I recommend running some tests with larger matrices, it gets slow very quickly and shows why a C++ extension is useful.

Final notes, I wrote my extension in C (I didn’t realize I could do it in C++ at the time), but I will also show how to compile for C++ and it is essentially the same. Lastly, much better documentation and examples are on the Python docs and I will provide links frequently to show where you can find this information. The starting docs can be found here. This is not meant to be a perfect comprehensive tutorial, but rather a launching pad for some of the concepts.


Compiling:

Okay, enough not code stuff :). A python extension is built directly through python. This means you are compiling it using python, not gcc or g++. In order to do this, you must create a setup.py to tell python how to compile it. Here is my example one, but you may need to add arguments for libaries. You can find all the arguments you may need at the official documentation and a simple example like the one below here.

from distutils.core import setup, Extension

MOD = "example_extension"
dirs = ['../Desktop/py_src_builds']

setup(  name = MOD,
        ext_modules = [
                        Extension(
                            MOD,
                            sources=['example.c'])
                      ],
        description = "My C Extension For Matrix Operations",
        include_dirs=dirs
     )

Here MOD is the name of the extension you are creating, and dirs is a list of any directories that must be included, such as where your source files (example.c) are located. sources is a list of files to compile. The file extension tells distutils how to compile. So here you can see because it is .c, it will compile for C, if you want to compile C++ like I do for my simulator, you simple need to make it .cpp, so it would be example.cpp.

I believe it is possible to compile directly using gcc and g++, but you need to give it a bunch of libaries and linker options to get it to work with the Python.h libraries, distutils and setup handles all this complex annoying stuff for us.

At this point, we have everything we will need to compile our extension. We run (compile) simply by using the command

python3 setup.py build

Using setup.py, there are a lot of other capabilities for distributing and installing your module, but build will create a local build folder with the compiled extension.


Creating The File:

Now we can create our example.c script (in the same directory as setup.py). At the top of the file, you must include Python.h BEFORE anything else (including other includes) since it does important setup things I don’t understand :). Thus, the top of my module is:

#include <Python.h>
#include <stdio.h>
// Any other fancy shmancy imports you might want

Next are 3 components are are required for compiling your extensions:

// 1:
static PyMethodDef Methods[] = {
	{"matmult",  matmult_example, METH_VARARGS},
	{NULL, NULL}  /* Sentinel */
};

// 2:
static struct PyModuleDef example_extension_module = {
	PyModuleDef_HEAD_INIT,
	"example_extension",
	NULL,
	-1,
	Methods
};

// 3:
PyMODINIT_FUNC PyInit_example_extension(void) {
	return PyModule_Create(&example_extension_module);
}

It should be noted that strictly speaking, only #3 is necessary but #1 and #2 are used in it.

#1:
This is an array containing the functions you will be exposing to your python extension. In this example we are exposing the function (not written yet) matmult_example as matmult (how it will be called in python). The third argument, METH_VARARGS indicates there are arguments. Other possible options are METH_KEYWORDS (allows keyword arguments) and METH_NOARGS (allows no arguments). The final entry should always be a sentinel value, {NULL, NULL}.

#2:
This is a struct containing information about the module for building. As you can see the 2nd elment is the name of the extension ("example_extension") and the 5th is the array of methods defined in #1. The arguments should remain the same (you can modify if you know what they do :)).

#3:
This is the important one, it must follow the syntax shown where it is PyInit_<name of module>(void). Then you can see you return a PyModule_Create with a reference to #2.

These is the bare minimum to get your module able to compile (it won’t right now since we never created the matmult function).


Processing Arguments:

Everything in python is an object, and the C/C++ behind the scenes is no different. There are many different objects, but all inherit from PyObject. Thus, the objects we will recieve will be PyObject's and we will use the python libraries to interpret them. Following our example from before, we will set up the function matmult to accept two python objects (lists of lists, or matricies) which it will multiply using matrix multiplication and return the result in a new python object. We start with the header:

static PyObject *matmult(PyObject *self, PyObject *args) {

which is a function that returns a PyObject and accepts two as parameters. There will always be the self parameter, and args will be an object of the arguments we pass. Thus, we could pass an integer, string, list, dictionary, etc, it doesn’t matter to this since every one of those is a PyObject. Thus, error handling is extremely important to make sure you recieved the arguments you want.

In this case, we want args to be an object containing two lists of lists, or two matricies to multiply. Thus, args must have more than one object. To check for this we use PyTuple_Size to get the size of args and then check to see whether or not it contains the general format we expect:

if(!TupleSize || TupleSize != 2) {
	if(!PyErr_Occurred()) 
		PyErr_SetString(PyExc_TypeError,"You must supply two matricies.");
	return NULL;
}

Here you can also see how we can return an error, using PyErr_SetString(<type>,<string>) and returning NULL.

Now that we know we at least have two objects (we don’t know anything about them though) we can start to unpack them.

PyObject *obj1;
PyObject *obj2;

if (!PyArg_ParseTuple(args, "OO", &obj1, &obj2)) {
	PyErr_SetString(PyExc_TypeError,"Error parsing arguments");
	return NULL;
}

We create two objects to contain the arguments and get the arguments using PyArg_ParseTuple(). The "OO" shows we expect to get two objects and then we supply the two references to point to the two new objects. There are many different keys you can pass to specify parsing, and you can read about all of them on the official documentation.

Now we have obj1 and obj2 which are really matrix1 and matrix2, but they are still python objects. I created a basic C struct to represent a matrix (outside the matmult function):

struct Matrix {
	Py_ssize_t rows;
	Py_ssize_t cols;
	double** vals;
};

and then created a seperate function to convert from a PyObject to a Matrix struct.


Converting From Objects to Other Data Types:

static struct Matrix *ObjectToMatrix(PyObject *obj) {
	PyObject* seq;
	Py_ssize_t i, j, rws, cls;

	PyObject* row;
	PyObject* val;

	seq = PySequence_Fast(obj, "expected a sequence");
	rws = PySequence_Size(obj);

	double **values = PyMem_RawMalloc(rws*sizeof(double));

	if (PyList_Check(seq)) {
		for (i = 0; i < rws; i++) {
			row = PySequence_Fast(PyList_GET_ITEM(seq, i), "here in rows");

			if (PyList_Check(row)) {
				cls = PySequence_Size(row);

				values[i] = PyMem_RawMalloc(cls*sizeof(double));

				for (j = 0; j < cls; j++) {
					val = PyList_GET_ITEM(row, j);
					values[i][j] = PyFloat_AsDouble(val);
				}
			}
		}
	}

	struct Matrix *mat = PyMem_RawMalloc(sizeof(struct Matrix));

	mat->rows = rws;
	mat->cols = cls;
	mat->vals = values;

	return mat;
}

It looks like a lot is going on, but it can really be broken down into a couple of key steps.

  1. Get the object’s elements (PySequence_Fast(obj)) and size (PySequence_Size(obj))
  2. For every element get it’s elements (PySequence_Fast(PyList_GET_ITEM(seq, i), "here in rows"))
  3. Get each subelement, get it as a double (PyFloat_AsDouble(val)) and store it in an array (values)

The function then creates a matrix to return, assigns the appropriate variables, and returns the matrix.

It should be noted that I left out much of the error checking for clarity, but you should continue to check for all of these so if there is an error, it is shown in python. Otherwise, the module will fail without saying anything. This includes things like making sure the matricies are the correct dimensions for multiplication.

Another extremely important note: You may notice I am using PyMem_RawMalloc() for memory allocation in C. Memory management between Python and the heap should NEVER be mixed. It is safe to use them independent of each other, but never mix and match them. You can read about memory management here. Also, you can see I have just provided these functions for checking python lists, etc. These can all be found well documented on the Python API Reference Manual and most of the ones above can be found here for sequences. Just google their names and the documentation will come right up.


Back To matmult:

Now that we have a way to convert from the python object to a data structure we can define and use (matrix in this case), we can call the function we created and get the two matrices. Back in matmult:

struct Matrix *m1 = ObjectToMatrix(obj1);
struct Matrix *m2 = ObjectToMatrix(obj2);

Now we have two matrix structures that we can multiply together. I personally did this in a seperate function, simply doing:

struct Matrix *product = MatrixMultiply(m1, m2);

I won’t post it since it is essentially just a triple nested for loop of multiplying rows and columns from each (just look at the raw python example) and I don’t think it is necessary to post, but if people are having trouble with it I don’t mind posting it, I just figure this post is long enough as it is :slight_smile: and this function is not python specific, just regular C/C++.


Returning Data Back to PyObject

So at this point, we have the product we with to return, but in struct Matrix form. We need to convert it back to a PyObject to return it. So in a new function:

static PyObject* MatrixToObject(struct Matrix *mat) {
	PyObject* obj;
	PyObject* row;
	int i, j;

	obj = PyList_New(mat->rows);
	for (i = 0; i < mat->rows; i++) {
		row = PyList_New(mat->cols);
		for (j = 0; j < mat->cols; j++) {
			PyList_SetItem(row, j, PyFloat_FromDouble(mat->vals[i][j]));
		}
		PyList_SetItem(obj, i, row);
	}

	return obj;
}

This function is pretty simple, it:

  1. Creates a PyObject obj to return
  2. Sets obj to a new Python list object of length matrix rows
  3. Sets each element of obj to another Python list of length matrix columns
  4. Sets each element[row][column] to the value at that point
  5. Returns obj

Then, back in matmult we call:

PyObject *rtn = MatrixToObject(product);

and then:

	return rtn;
} // end of matmult

Congrats!! At his point, our module should be done. We have created a module that converts a PyObject to a data structure we could manipulate, did some processes, and then made it remake the object again to return to python, all that is left is compile (which should already be set up) and then test it out!

Again, to compile run python setup.py build. There should be a build directory, and inside that is another directory lib.<os unique identifier>-<python version> (eg lib.win-amd64-3.6). Inside there should be a .pyd (windows) or .so (Linux) file with your extension name with a bunch other other stuff in the name. Go ahead and copy that file into whereever your test python script is and you can name it what you like. So in my case, I renamed it example_extension.pyd.

Then, create a python file, which I just called test.py to test out your module. You simply call the function like you would any other python module, for example:

import example_extension as ex
print ( ex.matmult([[1,2],[2,3],[3,4]], [[1,2,3],[3,4,5]]) )

which should output:

[[7.0, 10.0, 13.0], [11.0, 16.0, 21.0], [15.0, 22.0, 29.0]]

Here is an example of what I wrote to compare the times for a default python matmult vs the compiled module:

import example_extension as ex
import time

def test_mult():
	tests = [3,10,100,200,300,400,500,600,700]

	for n in tests:
		X = [[x%50 for x in range(n)] for y in range(n)]
		Y = [[x%50 for x in range(n)] for y in range(n)]


		start_time = time.time()
		result = [[sum(a*b for a,b in zip(X_row,Y_col)) for Y_col in zip(*Y)] for X_row in X]
		default_time = time.time() - start_time

		start_time = time.time()
		c = ex.matmult(X,Y)
		elapsed_time = time.time() - start_time

		print ('{: <30}Python Time: {: <30}Compiled Time: {}'.format('{0}x{0} Matrix'.format(n), default_time, elapsed_time))

test_mult()

I hope this was interesting for some people, and that it wasn’t too long :). Please let me know if I missed something or if I have a bug somewhere (more than likely :smile:) that needs to be fixed. Also, feel free to ask questions either here or message me personally if you’d prefer.

Happy coding :P!

8 Likes

How did you go about uploading this to the servers? I’m currently in the process of trying to upload a sample module, but I’m met with “Error uploading algo” each time, with my .so file being the only thing out of ordinary. Removing it from my zip file resolves the error, which is a bit concerning.

I did rename it to my_module.so instead of the original my_module.<os unique identifier>-<python version>.so, which seems to be what you did in your tutorial.

I put my module just aside algo_strategy.py and also rename it.
Two things I can think of:

  • your module makes your algo folder breach the size limit. (50Mo if I recall)
  • your algo folder is under the size limit but still heavy, so your internet connection is not fast enough to upload it before timeout which result in an “Error uploading algo”. This happens to me quite often when I’m not at home and use the connection of my phone.

Both of these points could explain why removing your module from the folder seems to resolve the error.

1 Like

Hmmmm, interesting. I can’t think of anything right now that would cause this problem. I think @arnby’s suggestions are excellent.

I personally renamed my extension module as well, so that shouldn’t cause any issues unless maybe there’s some keyword that is blocked? But I doubt this. As long as you tried actually running your algo on linux and it worked I can’t think of anything right now. Full disclaimer though, I haven’t uploaded an algo in well over a month so maybe they changed how submissions work, but again I doubt it.

@arnby’s answer was correct. I was able to verify it was a slow internet connection. I do find it interesting how gating a slow connection can be… the algo was 1.4 mB in total. Thanks for the help!