Saturday, 26 March 2016

Reading a file in chunks and appending the incomplete line to the next read

I am trying to read in from the following file:

abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
12345abcdefghijklmnopqrstu
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz

The code is below:

#include <iostream>
#include <fstream>
#include <sstream>
#include <thread>
#include <mutex>
#include <vector>
#include <array>
#include <algorithm>
#include <iterator>

#define CHUNK_SIZE 55

std::mutex queueDumpMutex;

void getLinesFromChunk(std::vector<char>& chunk, std::vector<std::string>& container)
{
    static std::string str;
    unsigned int i = 0;
    while(i < chunk.size())
    {   
        str.clear();
        size_t chunk_sz = chunk.size();

        while(chunk[i] != '\n' && i < chunk_sz )
        {
            str.push_back(chunk[i++]);
        }
        std::cout<<"\nStr = "<<str;

        if (i < chunk_sz)
        {
            std::lock_guard<std::mutex> lock(queueDumpMutex);
            container.push_back(str);
        }
        ++i;
    }
    chunk.clear();
    std::copy(str.begin(), str.end(), std::back_inserter(chunk));
    std::cout << "\nPrinting the chunk out ....." << std::endl;
    std::copy(chunk.begin(), chunk.end(), std::ostream_iterator<char>(std::cout, " "));
}

void ReadFileAndPopulateDump(std::ifstream& in)
{
    std::vector<char> chunk;
    chunk.reserve(CHUNK_SIZE*2);
    std::vector<std::string> queueDump; 
    in.unsetf(std::ios::skipws);
    std::cout << "Chunk capacity: " << chunk.capacity() << std::endl;

    do{
        in.read(&chunk[chunk.size()], CHUNK_SIZE);
        std::cout << "Chunk size before getLines: " << chunk.size() << std::endl;
        getLinesFromChunk(chunk, queueDump);
        std::cout << "Chunk size after getLines: " << chunk.size() << std::endl;
    }while(!in.eof());
}

int main()
{
    std::ifstream in("/home/ankit/codes/more_practice/sample.txt", std::ifstream::binary);
    ReadFileAndPopulateDump(in);
    return 0;
}

What I want is for the container to hold only complete lines.

By this I mean: suppose a read of CHUNK_SIZE bytes returns only:

abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
12

The container should look like:

abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz

instead of:

abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz12

Now I understand that chunk.reserve(CHUNK_SIZE) only reserves memory and does not actually change the vector's size. Because of this, I am not able to read into it with in.read().

I could use chunk.resize(CHUNK_SIZE) instead, but I need each read to be appended to the end of the chunk, because I want the remaining characters '12' to be joined with the rest of their line.

Now the issue is that the loop is repeating more times than it should. As far as I can tell, the conditions are fine.

Any help will be much appreciated.
