Tuesday, January 31, 2017

What is the best way to randomly access data in a file?

I have been searching for an answer to this question, but found none. I'm currently using fstream to open a file on the hard disk with the flags std::ios_base::in | std::ios_base::out | std::ios_base::binary for random access.

The data in the file consists of records, each 16 bytes in size. Writing 336.357 KB of records to the file in parallel (two threads) took about 0.479604 seconds. I didn't use any specific strategy to perform the read/write operations optimally. Without threads, the same operation took 0.352716 seconds (the threaded version being slower is expected, since both threads contend for the same file).
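
For reference, a 16-byte record could look like the following; the field names and types are invented for illustration, only the total size of 16 bytes is taken from the description above:

#include <cstdint>

// Hypothetical 16-byte record layout; only the total size matches the
// description in the question, the fields themselves are made up.
struct Record {
    int64_t key;     // 8 bytes
    int32_t valueA;  // 4 bytes
    int32_t valueB;  // 4 bytes
};
static_assert(sizeof(Record) == 16, "records are expected to be 16 bytes");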

These are the methods I used to perform the read/write operations:

// Create (or truncate) the file so that later opens with in | out succeed.
void FileBascIO::createFile(const std::string& fName) {
    std::fstream fcreate(fName.c_str(), std::ios_base::out | std::ios_base::binary);
}


// Open the file, seek to pos relative to relativeInitial, and read size bytes.
// Note that the stream is reopened on every call, which adds per-call overhead.
FileBascIO::returnTypeRead FileBascIO::readFromFile(const std::string& fName, int64_t pos,
                        FileFlagType relativeInitial, char* data, uint64_t size) {

    std::fstream fio (fName.c_str(), std::ios_base::in | std::ios_base::out | std::ios_base::binary);
    fio.seekg(pos, relativeInitial);
    return fio.read(data, size);
}


// Open the file, seek to pos relative to relativeInitial, and write size bytes.
FileBascIO::returnTypeWrite FileBascIO::writeToFile(const std::string& fName, int64_t pos,
                        FileFlagType relativeInitial, char* data, uint64_t size) {

    std::fstream fio (fName.c_str(), std::ios_base::in | std::ios_base::out | std::ios_base::binary);

    // Opening with in | out fails when the file does not exist yet,
    // so create it and reopen.
    if (!fio) {
        fio.close();
        fio.clear();
        createFile (fName);
        fio.open(fName.c_str(), std::ios_base::in | std::ios_base::out | std::ios_base::binary);
    }
    fio.seekp(pos, relativeInitial);
    return fio.write(data, size);
}
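
For illustration, a call site for these methods might look like this; the helper function and its names are hypothetical, and it assumes that FileFlagType is std::ios_base::seekdir and that the return types convert to bool:

// Hypothetical call site: read, modify, and write back one 16-byte record.
void updateRecord(FileBascIO& io, const std::string& fName,
                  int64_t recordIndex, char* buffer) {
    const int64_t kRecordSize = 16;
    int64_t pos = recordIndex * kRecordSize;   // byte offset of the record
    io.readFromFile(fName, pos, std::ios_base::beg, buffer, kRecordSize);
    // ... modify the 16 bytes in buffer ...
    io.writeToFile(fName, pos, std::ios_base::beg, buffer, kRecordSize);
}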

The above code is not efficient, and I don't know how to make it efficient. From a bit of searching, I came to understand that the entire file, or a large block of it, should be read into RAM before performing the operations, because the physical position of the data on the hard disk matters. Once the records are updated, the file should be written back to the hard disk in large blocks.
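
A minimal sketch of that block-caching idea, assuming 16-byte records, an arbitrarily chosen 64 KiB block size, and a file that already exists (the class and all names are illustrative, not part of the original code):

#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Sketch of a single-block cache: one large block of the file is held
// in RAM, records are read and updated in memory, and the whole block
// is written back to disk in one large write. No error handling.
class BlockCache {
public:
    static const int64_t kRecordSize = 16;
    static const int64_t kBlockSize  = 64 * 1024;  // 64 KiB, arbitrary

    explicit BlockCache(const std::string& fName)
        : file_(fName.c_str(), std::ios_base::in | std::ios_base::out
                                   | std::ios_base::binary),
          block_(kBlockSize), blockStart_(-1), dirty_(false) {}

    ~BlockCache() { flush(); }

    // Copy the record at recordIndex into out, loading its block if needed.
    void readRecord(int64_t recordIndex, char* out) {
        std::memcpy(out, locate(recordIndex), kRecordSize);
    }

    // Overwrite the record at recordIndex in the cached block.
    void writeRecord(int64_t recordIndex, const char* in) {
        std::memcpy(locate(recordIndex), in, kRecordSize);
        dirty_ = true;
    }

    // Write the cached block back to disk in a single large write.
    void flush() {
        if (dirty_ && blockStart_ >= 0) {
            file_.seekp(blockStart_);
            file_.write(block_.data(), block_.size());
            dirty_ = false;
        }
    }

private:
    // Return a pointer to the record inside the cached block,
    // reading the containing block from disk on a cache miss.
    char* locate(int64_t recordIndex) {
        int64_t offset = recordIndex * kRecordSize;
        int64_t start  = (offset / kBlockSize) * kBlockSize;
        if (start != blockStart_) {
            flush();                       // write back the old block first
            file_.seekg(start);
            file_.read(block_.data(), block_.size());
            file_.clear();                 // a short read at EOF is fine here
            blockStart_ = start;
        }
        return block_.data() + (offset - blockStart_);
    }

    std::fstream file_;
    std::vector<char> block_;
    int64_t blockStart_;
    bool dirty_;
};

With this approach, all the 16-byte reads and writes that fall in the same 64 KiB block cost only one disk read and one disk write; a block size that is a multiple of the filesystem page size (commonly 4 KiB) is a typical starting point.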

In the example of writing 16-byte records to the file sequentially, wouldn't it be better to accumulate the data in RAM first and write it to the file in large blocks, as in the sketch above? If I were to do this manually, what block size should I choose? What is the best alternative way to implement random read/write access more efficiently?
