Tuesday, 29 September 2015

How to find all occurrences of a pattern using std::regex_search?

I have the following code to parse a bunch of part numbers (basically serial numbers of components of a product) from an arbitrarily formatted file.

auto buildPartNumberRegexString( bool preFlash ) -> std::string
{
    std::ostringstream patternBuilder;

    // The original, raw literal as tested on https://regex101.com/ is:
    //
    // @@PART_NUMBER_POST_FLASH\<\s*(\S+)\s*\,\s*(\d+)\s*\>@@
    //
    // In C++, each backslash needs to be doubled. Alternatively, we could use a raw string literal, e.g. R"(\w)".

    patternBuilder << "@@PART_NUMBER_" << ( preFlash ? "PRE" : "POST" )
        << "_FLASH\\<\\s*(\\S+)\\s*\\,\\s*(\\d+)\\s*\\>@@";

    return patternBuilder.str();
}

auto parsePartNumberAddresses( const std::string& templateFileContent, bool preFlash ) -> ParamAddressContainer
{
    const std::regex regEx( buildPartNumberRegexString( preFlash ) );
    std::smatch match;

    if ( std::regex_search( templateFileContent, match, regEx ) )
    {
        assert( match.size() > 1 );
        const std::size_t capturedGroups = match.size() - 1;

        assert( capturedGroups % 2 == 0 );
        const std::size_t partNumberAddressesFound = capturedGroups / 2;

        ParamAddressContainer results;
        results.reserve( partNumberAddressesFound );

        std::cerr << "DEBUG: capturedGroups = " << capturedGroups << ", partNumberAddressesFound = " << partNumberAddressesFound
            << "\n";

        for ( std::size_t i = 0; i < partNumberAddressesFound; ++i )
        {
            const std::size_t paramIdMatchIndex = i * 2 + 1;
            const std::string paramIdString = match.str( paramIdMatchIndex );
            const std::string paramIndexString = match.str( paramIdMatchIndex + 1 );

            results.emplace_back( util::string_funcs::fromString< ParamId_t > ( paramIdString ),
                util::string_funcs::fromString< ParamIndex_t > ( paramIndexString ) );
        }

        std::cerr << "DEBUG: Going to read the following part numbers (" << ( preFlash ? "pre" : "post" ) << "-flash):\n\n";

        for ( const auto& paramAddress : results )
        {
            std::cerr << "\t" << std::hex << std::noshowbase << paramAddress.paramId << std::dec << "<" << paramAddress.paramIndex
                << ">\n";
        }

        return results;
    }

    return ParamAddressContainer();
}

I've written the "beautified" regex (i.e. without the double backslashes required to escape the actual backslashes) in the comment in the buildPartNumberRegexString function.
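For comparison, here is a minimal sketch of the raw string literal alternative mentioned in that comment. Both function names are hypothetical and exist only for this illustration; the pattern is the POST variant from the comment, unchanged.

```cpp
#include <string>

// Hypothetical helper: the pattern as an ordinary string literal,
// with every backslash doubled.
std::string escapedPostFlashPattern()
{
    return "@@PART_NUMBER_POST_FLASH\\<\\s*(\\S+)\\s*\\,\\s*(\\d+)\\s*\\>@@";
}

// Hypothetical helper: the same pattern as a raw string literal.
// Everything between R"( and )" is taken verbatim, so the text can be
// pasted from the regex tester without extra escaping.
std::string rawPostFlashPattern()
{
    return R"(@@PART_NUMBER_POST_FLASH\<\s*(\S+)\s*\,\s*(\d+)\s*\>@@)";
}
```

Both functions return the same string, so either form can be passed to the std::regex constructor.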

A sample file that I'm using this regex on might look like this:

Component alpha;@@PART_NUMBER_POST_FLASH<F12C,0>@@
Component beta;@@PART_NUMBER_POST_FLASH<F12C,1>@@

I've tested my regex against that same sample file on https://regex101.com/, and it works exactly as it should: it matches both occurrences and extracts the desired capture groups. The problem is that when I try the same thing with std::regex, it only finds the first match.

On https://regex101.com/ I had to enable the g modifier (global: all matches, don't return on first match) for the regex to find every match. I'm assuming (hoping) that a similar flag is available for std::regex_search, but the description of the available flags (http://ift.tt/1MXKGyq) doesn't seem to list any that meets my requirements. Surely there has to be a way to find more than one occurrence of a pattern, right? Does anyone have an idea?
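One possible approach, sketched here under some assumptions: std::regex_search stops at the first match, and match.size() counts the sub-expressions of that single match (three here: the whole match plus two groups), not the number of occurrences. The standard library's std::sregex_iterator walks over every non-overlapping match instead. In the sketch below, findAllPartNumbers and PartNumberList are hypothetical names; plain string pairs stand in for ParamAddressContainer, whose definition is not shown in the question, and the redundant escapes before <, the comma and > are dropped.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ParamAddressContainer:
// (parameter id, parameter index) pairs, kept as strings.
using PartNumberList = std::vector<std::pair<std::string, std::string>>;

// Hypothetical function: collects every occurrence of the pattern in content.
PartNumberList findAllPartNumbers(const std::string& content, bool preFlash)
{
    const std::string pattern = std::string("@@PART_NUMBER_")
        + (preFlash ? "PRE" : "POST")
        + R"(_FLASH<\s*(\S+)\s*,\s*(\d+)\s*>@@)";
    const std::regex regEx(pattern);

    PartNumberList results;

    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(content.begin(), content.end(), regEx), end;
         it != end; ++it)
    {
        const std::smatch& match = *it;
        // Group 1 is the parameter id, group 2 the parameter index.
        results.emplace_back(match.str(1), match.str(2));
    }

    return results;
}
```

For the sample file above, calling findAllPartNumbers(content, false) should yield two entries, one per line. The loop body is where the conversion via util::string_funcs::fromString from the original code would go.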
