Wednesday, April 19, 2017

Accelerate a Python program using the Python C API and POSIX threads?

I wrote a program in Python that runs steps similar to, but more complicated than, the following:

STEP 1: Given a BATCH of lists of the same length, where each element of a list represents the number of states that position may have, I need to DFS all the possible states (represented by 0, 1, 2, ...) of each list and collect them into one list. E.g. for input [[1,2,1], [2,2,2]], the output of this step should be [[0,0,0],[0,1,0],[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]].
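For reference, here is a minimal sketch of what STEP 1 might look like in pure Python, assuming each number is the count of states allowed at that position and that the results for the whole batch are concatenated (itertools.product is implemented in C, so it is usually much faster than a hand-rolled DFS):

    from itertools import product

    def enumerate_states(batch):
        # For each list of per-position state counts, enumerate every
        # combination of states and concatenate the results for the batch.
        out = []
        for counts in batch:                     # e.g. counts = [1, 2, 1]
            ranges = [range(n) for n in counts]  # position i can be 0..n-1
            out.extend(list(combo) for combo in product(*ranges))
        return out

    print(enumerate_states([[1, 2, 1], [2, 2, 2]]))
    # -> [[0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1], ...]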

STEP 2: Calculate some value related to the output of STEP 1, and return a dict of the form: {"0,0,0": 0.1, "0,1,0":0.2, "0,0,1":0.56, "0,1,0":0.68, "0,1,1":0.3242, "1,0,0":0.8987, "1,0,1":0.214, "1,1,0":0.2, "1,1,1":0.9}
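Just to fix notation for the sketches below, the STEP 2 table could be assembled like this, with a placeholder score() standing in for whatever Python-only calculation is actually used (note that a real dict keeps only one value per key, so duplicate states collapse):

    def build_table(states, score):
        # Key each state list by its comma-joined form, value by score().
        return {",".join(map(str, s)): score(s) for s in states}

    # Placeholder score function; the real one is whatever STEP 2 computes.
    table = build_table(enumerate_states([[1, 2, 1], [2, 2, 2]]),
                        score=lambda s: sum(s) / 10.0)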

STEP 3: In this step I need to process a BATCH of lists. While processing, I need to look up (read-only) the dict returned by STEP 2 very frequently, and generate one tuple for each list in the batch.
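Since the real per-list work isn't shown, the sketches below use this hypothetical stand-in for STEP 3: a function that performs many read-only lookups in the shared table and returns one tuple per list:

    def process_one(lst, table):
        # Hypothetical per-list routine: derive some keys from the list,
        # look each one up in the shared table, and reduce to one tuple.
        keys = (",".join(map(str, lst[:i] + [0] * (len(lst) - i)))
                for i in range(len(lst) + 1))
        total = sum(table.get(k, 0.0) for k in keys)   # frequent reads
        return (tuple(lst), total)

    def process_batch(batch, table):
        return [process_one(lst, table) for lst in batch]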

I found my program really slow in STEP 1 and STEP 3. STEP 2 can only be done in Python for some reason. What's more, the lists in a batch are independent of each other; they only share the same dict in STEP 3. So I want to process these lists in parallel with multiple threads.

Since Python has the GIL, the threading module doesn't help. Then I tried multiprocessing, which was even slower (I guess because of context switching and data transfer). Then I used C++11 to write a .so module containing functions that receive PyObject arguments and return PyObject results. I used POSIX threads, but it always raised a segmentation fault as soon as I used more than one thread. I carefully read the Python C API documentation and found that the GIL is still needed (see here), so this doesn't help at all. Then I tried Cython, declaring the types of all variables with cdef; it did accelerate things, but not by much.
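For what it's worth, one thing that sometimes helps when multiprocessing is slow because of data transfer: on a POSIX system, multiprocessing's default fork start method lets worker processes inherit a module-level dict without pickling it, so the table is not re-sent for every task; only the individual lists and the result tuples cross the process boundary. A hedged sketch (Python 3), reusing the hypothetical process_one above:

    import multiprocessing as mp

    _TABLE = None                      # set once in each worker process

    def _init_worker(table):
        global _TABLE
        _TABLE = table

    def _worker(lst):
        # Only lst and the returned tuple are pickled per task;
        # _TABLE is already available in the worker.
        return process_one(lst, _TABLE)

    def process_batch_parallel(batch, table, workers=4):
        with mp.Pool(processes=workers, initializer=_init_worker,
                     initargs=(table,)) as pool:
            # A larger chunksize keeps per-task IPC overhead low.
            return pool.map(_worker, batch, chunksize=64)

Whether this beats the serial loop depends on how heavy the real per-list work is. For the C++/Cython route, the usual pattern is to copy the Python data into plain C structures first, release the GIL, run the threads on the C data only, and reacquire the GIL before building the Python result; touching PyObjects from multiple threads without holding the GIL is what causes the segmentation faults.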

I'm lost in this problem. Can anyone help me? I'd be really grateful.
