lundi 2 novembre 2015

Why would a boost::asio::ip::tcp::socket::async_connect block whole thread?

Actually this will be now almost two weeks I've been figuring out this problem, and I finally at least know possible cause. What happens is, that a call to socket::async_connect blocks the thread and seemingly acts as synchronous. The call looks like bellow and happens in :

void Layer104::connect(const std::string& ip, const std::string& port)
{
  try
  {
    tcp::resolver resolver(io);
    tcp::resolver::query query(ip.c_str(), port.c_str());
    tcp::resolver::iterator it = resolver.resolve(query);  

    LOGMTRTTIDEBUG("StartProtocol()"<<endl);
    activeSocket->async_connect(*it, boost::bind(&LayerXXX::connectHandler, getPtr(), boost::asio::placeholders::error));
    LOGMTRTTIDEBUG("StartProtocol() started using activeSocket->async_connect."<<endl);
  }
  catch (exception&)
  {
    LOGMTRTTIERR("ERR_CONNECTION_FAILED"<<endl)
    handleError(ERR_CONNECTION_FAILED);
  }  
}

As you can see I have two debug messages there to see if the function returned. 90% the function returns immediatelly and I get this in the log:

  Date                     Thread name                 Thread id  Method name                    Message
D 2015-11-02 10:35:39:787 [Client 192.168.8.23:1242        #6576] MyProtocol::LayerXXX::connect StartProtocol()
D 2015-11-02 10:35:39:788 [Client 192.168.8.23:1242        #6576] MyProtocol::LayerXXX::connect StartProtocol() started using activeSocket->async_connect.

But sometimes, and this only happens when network connection is very bad, I just see this, last message from that thread at all:

  Date                     Thread name   ID     Method name                   Message
D 2015-11-02 10:35:39:787 [Client        #6576] MyProtocol::LayerXXX::connect StartProtocol()

And since then the thread is stuck here:

image description

The thread I'm talking about is using the boost::asio::io_service event loop and also the connect method is called from this event loop. This is how the thread is started:

void MyClient::threadMain() {
    LOGMTRTTIDEBUG("Starting boost::asio::io_service main loop."<<endl);
    // IP and port is added to name after connecting succesfully
    ThreadNameMap::setName( "Client" );
    io_.run();
    LOGMTRTTIDEBUG("boost::asio::io_service main OVER! Thread DEAD!"<<endl);
}

The relevant stacktrace when the thread is stuck:

ntdll.dll!NtWaitForSingleObject()  + 0xa bytes  
mswsock.dll!__GSHandlerCheck_SEH()  + 0x2c95 bytes  
mswsock.dll!__GSHandlerCheck_SEH()  + 0x5e0c bytes  
ws2_32.dll!WSAAccept()  + 0xd4 bytes    
ws2_32.dll!accept()  + 0x15 bytes   
myapp64d.exe!boost::asio::detail::socket_ops::call_accept<int>(int * __formal, unsigned __int64 s, sockaddr * addr, unsigned __int64 * addrlen)  Line 96 + 0x32 bytes   C++
myapp64d.exe!boost::asio::detail::socket_ops::accept(unsigned __int64 s, sockaddr * addr, unsigned __int64 * addrlen, boost::system::error_code & ec)  Line 114 + 0x19 bytes    C++
myapp64d.exe!boost::asio::detail::socket_select_interrupter::open_descriptors()  Line 90 + 0x1c bytes   C++
myapp64d.exe!boost::asio::detail::socket_select_interrupter::socket_select_interrupter()  Line 42   C++
myapp64d.exe!boost::asio::detail::select_reactor::select_reactor(boost::asio::io_service & io_service)  Line 48 + 0x86 bytes    C++
myapp64d.exe!boost::asio::detail::service_registry::create<boost::asio::detail::select_reactor>(boost::asio::io_service & owner)  Line 81 + 0x26 bytes  C++
myapp64d.exe!boost::asio::detail::service_registry::do_use_service(const boost::asio::io_service::service::key & key, boost::asio::io_service::service * (boost::asio::io_service &)* factory)  Line 123 + 0x13 bytes   C++
myapp64d.exe!boost::asio::detail::service_registry::use_service<boost::asio::detail::select_reactor>()  Line 49 C++
myapp64d.exe!boost::asio::use_service<boost::asio::detail::select_reactor>(boost::asio::io_service & ios)  Line 34  C++
myapp64d.exe!boost::asio::detail::win_iocp_socket_service_base::get_reactor()  Line 620 + 0xd bytes C++
myapp64d.exe!boost::asio::detail::win_iocp_socket_service_base::start_connect_op(boost::asio::detail::win_iocp_socket_service_base::base_implementation_type & impl, boost::asio::detail::reactor_op * op, const sockaddr * addr, unsigned __int64 addrlen)  Line 550 + 0xd bytes   C++
myapp64d.exe!boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::async_connect<boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const & __ptr64>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX * __ptr64>,boost::arg<1> > > >(boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::implementation_type & impl, const boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> & peer_endpoint, boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const &>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX *>,boost::arg<1> > > & handler)  Line 515  C++
myapp64d.exe!boost::asio::stream_socket_service<boost::asio::ip::tcp>::async_connect<boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const & __ptr64>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX * __ptr64>,boost::arg<1> > > >(boost::asio::detail::win_iocp_socket_service<boost::asio::ip::tcp>::implementation_type & impl, const boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> & peer_endpoint, const boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const &>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX *>,boost::arg<1> > > & handler)  Line 234  C++
myapp64d.exe!boost::asio::basic_socket<boost::asio::ip::tcp,boost::asio::stream_socket_service<boost::asio::ip::tcp> >::async_connect<boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const & __ptr64>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX * __ptr64>,boost::arg<1> > > >(const boost::asio::ip::basic_endpoint<boost::asio::ip::tcp> & peer_endpoint, const boost::_bi::bind_t<void,boost::_mfi::mf1<void,MyProtocol::LayerXXX,boost::system::error_code const &>,boost::_bi::list2<boost::_bi::value<MyProtocol::LayerXXX *>,boost::arg<1> > > & handler)  Line 779 C++
myapp64d.exe!MyProtocol::LayerXXX::connect(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & ip, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & port)  Line 225    C++     ntdll.dll!NtWaitForSingleObject()  + 0xa bytes  

What I know so far:

  1. This only happens when network is dropping packets, so it must be timing related. I simulated this using clumsy 0.2 (recommended).
  2. boost::asio::ip::tcp::socket::async_connect is called from boost::asio::io_service event loop, but it works most of the time
  3. According to bug reports we've got, this thread doesn't go un-stuck in any reasonable amount of time. So it's not like some timeout... and anyway, the connect should be asynchronous.

I really need some help already, it's been too long without any progress on this issue.

Aucun commentaire:

Enregistrer un commentaire