I am trying to understand a research paper, "EBIC: an evolutionary-based parallel biclustering algorithm for pattern discovery" by P. Orzechowski, M. Sipper, X. Huang and J. H. Moore, published in the journal Bioinformatics. The authors have also released the code (Copyright (c) Patryk Orzechowski). I want to understand one piece of that code so that I can use it in my own algorithm. I have been in contact with one of the authors, and he has been very kind in offering explanations and advice, but I don't think it is a good idea to ask him to explain C++11 and CUDA to me.
Here is a little background that I think is essential for understanding the problem. Imagine a matrix of dimension rows x columns; for now let's call this matrix a bicluster. The bicluster tries to extract a pattern from a dataset, and it does so by checking monotonicity rules. First it selects a random series of columns. Then it adds rows that follow the pattern specified by that series of columns. It keeps adding rows as long as they follow the pattern, and it also keeps adding columns to the series as long as the newly selected columns follow the pattern established by the columns selected so far. In the end we have a matrix that represents the pattern extracted from the dataset.
To repeat: a row is added to the bicluster if it follows the monotonicity rules of the underlying bicluster. What I want to know is where this happens in the code provided below. The author of the paper also gave me the following explanation, which I am pasting here.
The monotonicity is checked on the GPU side, more specifically in kernels/evaluate_trends.cu. Two variables are used (each of size 1 x num_rows): trendvalue, for storing the values of a given column of a bicluster, and trendcheck, for counting in each row the number of times the values of a bicluster are monotonic. You could imagine this as a sliding window consisting of 1 column moving through the column set of each of the biclusters. The exact check is performed in line 54. Both fitness calculation and bicluster extraction use this kernel. Please e-mail me if you have any other questions.
The reference post is here
#ifndef _EVALUATE_TRENDS_CU_
#define _EVALUATE_TRENDS_CU_

template<typename T>
__device__ void evaluate_trends(int *bicl_indices,
                                int *compressed_biclusters,
                                int num_rows,
                                int num_cols,
                                T *data,
                                int *trendcheck,
                                float *trendvalue,
                                const float EPSILON,
                                float MISSING_VALUE,
                                int increasing=1) {
  long long int index_x = blockIdx.x * blockDim.x + threadIdx.x; // row index (compared against num_rows)
  long long int index_y = blockIdx.y * blockDim.y + threadIdx.y; // bicluster index (used to index bicl_indices)
  trendcheck[threadIdx.x]=0;
  trendvalue[threadIdx.x]=0;
  if (index_x<num_rows) {
    // Start the count at 1 and remember the value in the bicluster's first column.
    trendcheck[threadIdx.x] = 1;
    trendvalue[threadIdx.x] = data[compressed_biclusters[bicl_indices[index_y]]+num_cols*index_x];
  }
  __syncthreads();
  if (index_x<num_rows) {
    // Walk through the remaining columns of this bicluster.
    for(int compressedId=bicl_indices[index_y]+1; compressedId<bicl_indices[index_y+1]; ++compressedId) {
      int pos=compressed_biclusters[compressedId];
      // "line 54" in the author's explanation: add 1 if the row stays monotonic from the previous column to this one.
      trendcheck[threadIdx.x] += (increasing*(data[pos+num_cols*index_x]+EPSILON-trendvalue[threadIdx.x])>= 0 && data[pos+num_cols*index_x]!=MISSING_VALUE);
      trendvalue[threadIdx.x] = data[pos+num_cols*index_x];
      __syncthreads();
    }
  }
}

#endif
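Read for a single row, the kernel above reduces to a short loop. The following is a minimal CPU-side sketch in plain C++, not code from the EBIC repository: the function name trend_count, the row-major std::vector layout and the cols argument (standing in for one bicluster's slice of compressed_biclusters) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential counterpart of what one GPU thread computes
// for one row of the data matrix and one bicluster.
// data          : row-major matrix with num_cols columns (assumed layout)
// row           : index of the row to examine
// cols          : ordered column indices of the bicluster (assumed input)
// Returns the value trendcheck would hold for that row.
int trend_count(const std::vector<float>& data, std::size_t num_cols,
                std::size_t row, const std::vector<int>& cols,
                float epsilon, float missing_value, int increasing = 1) {
    assert(!cols.empty());
    const float* r = data.data() + num_cols * row;

    int check = 1;            // the kernel starts the count at 1
    float prev = r[cols[0]];  // value in the first column (trendvalue)

    for (std::size_t k = 1; k < cols.size(); ++k) {
        float cur = r[cols[k]];
        // Same comparison as "line 54": the step from prev to cur must not
        // break the trend, and cur must not be the missing-value marker.
        check += (increasing * (cur + epsilon - prev) >= 0 && cur != missing_value);
        prev = cur;           // slide the one-column window forward
    }
    return check;
}
```

A row that follows the trend across every column ends with a count equal to cols.size(); each broken step leaves the count one lower.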
Now, what I want to know is this: I have a row vector (called a test sample in our problem) and a bicluster (a matrix), and I want to check whether the test sample belongs to the bicluster. The code above is complicated for me, so how do I find out where that check happens? I also want to translate that logic into R. To be clear, I don't want to write GPU code in R; I just want to know exactly where in the code the check happens that decides whether a sample belongs to a bicluster, so that I can translate it into R.
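For a single test sample, the same comparison can be turned into a yes/no answer. The sketch below is plain C++ and only illustrative: the name follows_trend and its arguments are hypothetical, and it assumes that a sample "belongs" when every consecutive pair of the bicluster's columns passes the check (that is, when trendcheck would equal the number of columns). How EBIC itself thresholds the count is not shown in the excerpt, so that rule is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical membership test for one test sample.
// sample : the values of the test sample, one per data column
// cols   : ordered column indices of the bicluster (assumed to be known)
// Returns true if the sample never breaks the trend along cols.
bool follows_trend(const std::vector<float>& sample, const std::vector<int>& cols,
                   float epsilon, float missing_value, int increasing = 1) {
    assert(!cols.empty());
    for (std::size_t k = 1; k < cols.size(); ++k) {
        float prev = sample[cols[k - 1]];
        float cur = sample[cols[k]];
        // Same expression as the kernel's check; like the kernel, the value
        // in the first column is not compared against missing_value.
        bool ok = increasing * (cur + epsilon - prev) >= 0 && cur != missing_value;
        if (!ok) {
            return false;  // one broken step is enough to reject the sample
        }
    }
    return true;
}
```

Because the loop uses only indexing and one comparison, it carries over to R almost line by line. For the increasing case without missing values, an expression such as all(diff(x[cols]) + eps >= 0) performs the same comparison, keeping in mind that R indices start at 1.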