I implemented the Actor-Critic reinforcement learning algorithm with softmax action selection. My state space is a grid of size xmax x ymax with the goal in the middle. I store it as a vector of structs:
struct stateAction {
    double up, down, right, left, sv;
};
so that each grid point holds a value for every move (up, down, right, left) and a state value (sv) for Actor-Critic. To access a grid point, I use:
stateAction &Environment::access(int x, int y) {
    return this->matrix.at(y * this->xmax + x);
}
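One property of this row-major layout is worth spelling out: `at()` only checks the combined index, so an x that runs past xmax can still produce a valid index and silently address a cell in the next row. The following is a minimal sketch, not the original Environment class; `Grid`, `cellCount` and the member names are hypothetical, and it checks each coordinate separately.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Same fields as the stateAction struct in the question, zero-initialised.
struct stateAction {
    double up = 0.0, down = 0.0, right = 0.0, left = 0.0, sv = 0.0;
};

// Hypothetical stand-in for Environment, reduced to the grid storage.
class Grid {
public:
    Grid(int xmax, int ymax)
        : xmax_(xmax), ymax_(ymax), cells_(cellCount(xmax, ymax)) {}

    // Row-major lookup: cell (x, y) lives at index y * xmax + x.
    stateAction& access(int x, int y) {
        // Check each coordinate on its own; the combined index alone
        // cannot tell (xmax, 0) apart from (0, 1).
        if (x < 0 || x >= xmax_ || y < 0 || y >= ymax_) {
            throw std::out_of_range("grid coordinate outside xmax x ymax");
        }
        return cells_[static_cast<std::size_t>(y) * xmax_ + x];
    }

private:
    static std::size_t cellCount(int xmax, int ymax) {
        assert(xmax > 0 && ymax > 0);  // grid dimensions must be positive
        return static_cast<std::size_t>(xmax) * static_cast<std::size_t>(ymax);
    }

    int xmax_;
    int ymax_;
    std::vector<stateAction> cells_;
};
```

With a 3 x 2 grid, `access(3, 0)` throws here, whereas the formula `y * xmax + x` on its own would yield index 3, which is the cell (0, 1).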
The loop, which follows the same pattern I used for the other learning algorithms, looks like this:
while (e.position != e.goal) {
    double r = distribution(generator); // create random number to choose move
    std::string move = e.softmax(0.2, r);
    int tmpx = e.position[0];
    int tmpy = e.position[1];
    e.performAction(move);
    int newx = e.position[0];
    int newy = e.position[1];
    if (move == "up") {
        e.access(tmpx, tmpy).up += alpha * (e.getReward(newx, newy) + gamma * e.access(newx, newy).sv - e.access(tmpx, tmpy).sv);
    }
    else if (move == "right") {
        e.access(tmpx, tmpy).right += alpha * (e.getReward(newx, newy) + gamma * e.access(newx, newy).sv - e.access(tmpx, tmpy).sv);
    }
    else if (move == "down") {
        e.access(tmpx, tmpy).down += alpha * (e.getReward(newx, newy) + gamma * e.access(newx, newy).sv - e.access(tmpx, tmpy).sv);
    }
    else if (move == "left") {
        e.access(tmpx, tmpy).left += alpha * (e.getReward(newx, newy) + gamma * e.access(newx, newy).sv - e.access(tmpx, tmpy).sv);
    }
    e.access(tmpx, tmpy).sv += beta * (e.getReward(newx, newy) + gamma * e.access(newx, newy).sv - e.access(tmpx, tmpy).sv);
    //std::cout << "( " << e.position[0] << "," << e.position[1] << " )" << std::endl;
}
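Every branch of the loop uses the same temporal-difference error, delta = reward + gamma * V(next) - V(current). As a sketch only, the update can be written so that delta is computed once and then applied to the chosen move's value with alpha and to the state value with beta. The helpers `preference` and `tdUpdate` are hypothetical names, not part of the original code.

```cpp
#include <cassert>
#include <string>

// Same fields as the stateAction struct in the question, zero-initialised.
struct stateAction {
    double up = 0.0, down = 0.0, right = 0.0, left = 0.0, sv = 0.0;
};

// Hypothetical helper: maps a move name to the matching field of a cell.
double& preference(stateAction& s, const std::string& move) {
    if (move == "up") return s.up;
    if (move == "down") return s.down;
    if (move == "right") return s.right;
    assert(move == "left");  // only the four move names are expected
    return s.left;
}

// One actor-critic step for the cell that was just left.
// delta is computed before either field changes, so the actor and the
// critic both see the old state value, as in the loop above.
double tdUpdate(stateAction& current, const stateAction& next,
                const std::string& move, double reward,
                double alpha, double beta, double gamma) {
    const double delta = reward + gamma * next.sv - current.sv;
    preference(current, move) += alpha * delta;  // actor update
    current.sv += beta * delta;                  // critic update
    return delta;
}
```

Assuming the Environment interface shown above, the body of the loop would then reduce to a single call along the lines of `tdUpdate(e.access(tmpx, tmpy), e.access(newx, newy), move, e.getReward(newx, newy), alpha, beta, gamma)`.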
This code runs for exactly 8 iterations (the generated random numbers are always the same) and then crashes when the agent reaches the middle point and the values are updated, with the error:
Error in `./a.out': double free or corruption (out): 0x0000000000d47030 ***
Aborted (core dumped)
I don't know why this fails; I can't find any rogue pointers in the code. Moreover, the same approach worked for all the other algorithms, and the only change is one additional value (sv) in the struct.
I ran the code under valgrind to debug it, but I can't pin down the problem from its output:
==10260== Invalid read of size 4
==10260== at 0x4030C0: main (in /home/alex/ClionProjects/Blatt3/a.out)
==10260== Address 0x5a1d044 is 0 bytes after a block of size 4 alloc'd
==10260== at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10260== by 0x4081DF: __gnu_cxx::new_allocator<float>::allocate(unsigned long, void const*) (in /home/alex/ClionProjects/Blatt3/a.out)
==10260== by 0x407886: std::_Vector_base<float, std::allocator<float> >::_M_allocate(unsigned long) (in /home/alex/ClionProjects/Blatt3/a.out)
==10260== by 0x4069C2: std::_Vector_base<float, std::allocator<float> >::_M_create_storage(unsigned long) (in /home/alex/ClionProjects/Blatt3/a.out)
==10260== by 0x40539C: std::_Vector_base<float, std::allocator<float> >::_Vector_base(unsigned long, std::allocator<float> const&) (in /home/alex/ClionProjects/Blatt3/a.out)
==10260== by 0x404045: std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&) (in /home/alex/ClionProjects/Blatt3/a.out)
==10260== by 0x4028AB: main (in /home/alex/ClionProjects/Blatt3/a.out)
==10260==
==10260==
==10260== HEAP SUMMARY:
==10260== in use at exit: 0 bytes in 0 blocks
==10260== total heap usage: 1,518 allocs, 1,518 frees, 41,468 bytes allocated
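Read literally, the trace reports a 4-byte read that starts exactly at the end of a 4-byte heap block, and the block was allocated by a std::vector<float> constructed with a size argument in main. Four bytes is one float on typical platforms, so this is consistent with reading index 1 of a one-element vector. The trace does not say which vector that is; building with -g would let Valgrind report file and line numbers. The following sketch shows the pattern in isolation, with hypothetical helper names:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Size in bytes of the heap block that holds the vector's elements.
std::size_t blockBytes(const std::vector<float>& v) {
    return v.size() * sizeof(float);
}

// Hypothetical helper: reads element i with a bounds check.
// operator[] would read past the block without any diagnostic;
// at() throws std::out_of_range when i >= v.size().
float readChecked(const std::vector<float>& v, std::size_t i) {
    return v.at(i);
}
```

For a `std::vector<float> one(1)`, `readChecked(one, 0)` returns the only element, while `readChecked(one, 1)` throws instead of performing the kind of read Valgrind flags above.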
I am thankful for any help.