For the following problem please suggest better solution. My approach I have explained at the last.
There is a file that has records in following format:- RecordType;Symbol;price;id;parentId
Sample file looks like -
RecordType;Symbol;price;id;parentId
A;BANK_X;20;2345;0 A;BANK_Y;30;2346;0 A;BANK_Z;40;2347;0 M;BANK_X;50;2348;2345 M;BANK_Y;10;2349;2346 A;BANK_X;20;2350;0 A;BANK_E;40;2351;0 M;BANK_X;45;2352;2345 M;BANK_X;20;2353;2350
Such a file contains millions of records. Objective is to write a efficient C++ program to split file into multiple files such that each smaller file contains Y number of records, where Y is an integer number provided as input.
Points to remember: 1. Every record has unique id (i.e. second last field in the record) 2. For a symbol matching A and M records should be in the same smaller file.
For example, if example file is splited into files containing minimum 2 rows, then following records should be in one file:-
A;BANK_X;20;2345;0 M;BANK_X;50;2348;2345 M;BANK_X;45;2352;2345
My approach to solve problem:
-
Data structure used:
- Queue: This will have objects in which key will be id (those are parents) and value in the object will be a vector that will have list of children.
- Unordered_map 1: Key: id (i.e. ids whose record has value 0 in the last field), value: string (i.e. record of that id read from file)
- Unordered_map 2: Key: id (i.e. ids whose record has NON 0 value in the last field), value: string (i.e. record of that id read from file)
-
Algorithm:
- Read file line by line
- Parse last 2 fields of record
- Check if id is parent (i.e. if last field of record is 0) If YES: create object{id, vactor} put in the queue Add id and string record to unordered_map 1 If NO: Search parent id in the queue and add child id in the vector (This can be made constant time search) Add id and string record to unordered_map 2
- Perform above steps till end of file.
- Now start poping the queue and for each id (that is parent) get the record string from Unordered_map 1 write in a new file, Also for it's children (which are available in vector) get the record string from Unordered_map 2 write in the file. Here I will check for minimum rows.
- Based on the value of Y, get the record for ids (parent) and children from unsorted_map and write to new files.
If I consider sample file mentioned in the statement, after applying my algo data structures will have following values:-
Queue>: [ {2345, <2348, 2352>}, {2346, <2349>}, {2347, }, {2350, <2353>}, {2351, }]
Unordered_map 1 : [{2345, "A;BANK_X;20;2345;0"}, {2346, "A;BANK_Y;30;2346;0"}, {2347, "A;BANK_Z;40;2347;0"}, {2350, "A;BANK_X;20;2350;0"}, {2351, "A;BANK_E;40;2351;0"}]
Unordered_map 2 : [{2348, "M;BANK_X;50;2348;2345"}, {2349, "M;BANK_Y;10;2349;2346"}, {2352, "M;BANK_X;45;2352;2345"}, {2353, "M;BANK_X;20;2353;2350"}]
Aucun commentaire:
Enregistrer un commentaire