Tuesday, 10 March 2015

Why do extended ASCII (special) characters take 2 bytes to be stored?

ASCII codes 32 to 126 are printable characters, 127 is DEL, and the codes after that are considered the extended characters.


To check how they are stored in a std::string, I wrote a test program:



#include <iostream>
#include <string>
using namespace std;

int main()
{
    string s;   // ASCII
    s += "!";   // 33
    s += "A";   // 65
    s += "a";   // 97
    s += "â";   // 131
    s += "ä";   // 132
    s += "à";   // 133

    cout << s << endl;           // Print directly
    for (auto i : s)             // Print after iteration
        cout << i;

    cout << "\ns.size() = " << s.size() << endl; // outputs 9!
}


The special characters in the code above actually look different; they can be seen in this online example (and also in vi).


In the string s, the first 3 normal characters occupy 1 byte each, as expected. The next 3 extended characters surprisingly take 2 bytes each.


Questions:



  1. Despite being ASCII (within the range 0 to 255), why do those 3 extended characters take 2 bytes of space?

  2. When we iterate through s using a range-based for loop, how is it figured out that it has to advance 1 time for normal characters and 2 times for extended characters?


[Note: This may also apply to C and other languages.]

