Monday, 24 July 2017

Basic concepts with std::string references, std::regex and boost::filesystem

Below I show both a broken program and a fixed version of it. The problem is that I cannot fully explain to myself why the former fails while the latter works. I obviously need to review some very basic concept of the C++ language: could you point me to what I should review, and possibly also explain why the broken code produces the results it does?

In the '../docs/' directory referred to in the code, I simply used the 'touch' command on Linux to create a number of doc......html files with names of various lengths.

#include <iostream>
#include <regex>
#include <boost/filesystem.hpp>

namespace fs = boost::filesystem;

int main() {
        fs::path p("../docs/");

        for (auto& dir_it : fs::directory_iterator(p)) {
                std::regex re = std::regex("^(doc[a-z]+)\\.html$");
                std::smatch matches;
                // BROKEN HERE:
                if (std::regex_match(dir_it.path().filename().string(), matches, re)) {
                        std::cout << "\t\t" <<dir_it.path().filename().string();
                        std::cout << "\t\t\t" << matches[1] << std::endl;
                }
        }

        return 0;
}

Produces:

        documentati.html                        ati
        documentationt.html                     �}:ationt
        document.html                   document
        documenta.html                  documenta
        docume.html                     docume
        documentat.html                 documentat
        docum.html                      docum
        documentatio.html                       ��:atio
        documen.html                    documen
        docu.html                       docu
        documentation.html                      ��:ation
        documaeuaoeu.html                       ��:aoeu

Note 1: The bug above is triggered by filenames longer than a certain length. All I understand so far is that this is because the std::string object is resizing itself.

Note 2: The above code is very similar to the code used in the following question, but with boost::regex_match instead of std::regex_match: Can I use a mask to iterate files in a directory with Boost?
It used to work for me as well, but I now use GCC 5.4 instead of GCC 4.6, std::regex instead of boost::regex, C++11, and a much newer version of boost::filesystem. Which of these changes is the relevant one that caused working code to break?

Fixed:

#include <iostream>
#include <regex>
#include <boost/filesystem.hpp>

namespace fs = boost::filesystem;

int main() {
        fs::path p("../docs/");

        for (auto& dir_it : fs::directory_iterator(p)) {
                std::regex re = std::regex("^(doc[a-z]+)\\.html$");
                std::smatch matches;
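                // Named copy that outlives the match; renamed so it no longer shadows the fs::path p above.
                std::string filename = dir_it.path().filename().string();
                if (std::regex_match(filename, matches, re)) {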
                        std::cout << "\t\t" <<dir_it.path().filename().string();
                        std::cout << "\t\t\t" << matches[1] << std::endl;
                }
        }

        return 0;
}

Produces:

        documentati.html                        documentati
        documentationt.html                     documentationt
        document.html                   document
        documenta.html                  documenta
        docume.html                     docume
        documentat.html                 documentat
        docum.html                      docum
        documentatio.html                       documentatio
        documen.html                    documen
        docu.html                       docu
        documentation.html                      documentation
        documaeuaoeu.html                       documaeuaoeu

I am using Boost 1.62.0-r1 and gcc (Gentoo 5.4.0-r3). The boost::filesystem documentation does not appear to give any clear indication of what path().filename().string() returns: http://ift.tt/2upPFE1
It appears that it depends on the platform:
Why does boost::filesystem::path::string() return by value on Windows and by reference on POSIX?
