<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META NAME="AUTHOR" CONTENT="pme@sources.redhat.com (Phil Edwards)">
   <META NAME="KEYWORDS" CONTENT="HOWTO, libstdc++, GCC, g++, libg++, STL">
   <META NAME="DESCRIPTION" CONTENT="HOWTO for the libstdc++ chapter 27.">
   <META NAME="GENERATOR" CONTENT="vi and eight fingers">
   <TITLE>libstdc++-v3 HOWTO:  Chapter 27</TITLE>
<LINK REL=StyleSheet HREF="../lib3styles.css">
<!-- $Id: howto.html,v 1.4 2000/11/29 20:37:02 pme Exp $ -->
</HEAD>
<BODY>

<H1 CLASS="centered"><A NAME="top">Chapter 27:  Input/Output</A></H1>

<P>Chapter 27 deals with iostreams and all their subcomponents
   and extensions.  All <EM>kinds</EM> of fun stuff.
</P>


<!-- ####################################################### -->
<HR>
<H1>Contents</H1>
<UL>
   <LI><A HREF="#1">Copying a file</A>
   <LI><A HREF="#2">The buffering is screwing up my program!</A>
   <LI><A HREF="#3">Binary I/O</A>
   <LI><A HREF="#4">Iostreams class hierarchy diagram</A>
   <LI><A HREF="#5">What is this &lt;sstream&gt;/stringstreams thing?</A>
</UL>

<HR>

<!-- ####################################################### -->

<H2><A NAME="1">Copying a file</A></H2>
   <P>So you want to copy a file quickly and easily, and most important,
      completely portably.  And since this is C++, you have an open
      ifstream (call it IN) and an open ofstream (call it OUT):
      <PRE>
   #include &lt;fstream&gt;

   std::ifstream  IN ("input_file");
   std::ofstream  OUT ("output_file"); </PRE>
   </P>
   <P>Here's the easiest way to get it completely wrong:
      <PRE>
   OUT &lt;&lt; IN;</PRE>
      For those of you who don't already know why this doesn't work
      (probably from having done it before), I invite you to quickly
      create a simple text file called &quot;input_file&quot; containing
      the sentence
      <PRE>
   The quick brown fox jumped over the lazy dog.</PRE>
      surrounded by blank lines.  Code it up and try it.  The contents
      of &quot;output_file&quot; may surprise you.
   </P>
   <P>Seriously, go do it.  Get surprised, then come back.  It's worth it.
   </P>
   <HR WIDTH="60%">
   <P>The thing to remember is that the <TT>basic_[io]stream</TT> classes
      handle formatting, nothing else.  In particular, they break up on
      whitespace.  The actual reading, writing, and storing of data is
      handled by the <TT>basic_streambuf</TT> family.  Fortunately, the
      <TT>operator&lt;&lt;</TT> is overloaded to take an ostream and
      a pointer-to-streambuf, in order to help with just this kind of
      &quot;dump the data verbatim&quot; situation.
   </P>
   <P>Why a <EM>pointer</EM> to streambuf and not just a streambuf?  Well,
      the [io]streams hold pointers (or references, depending on the
      implementation) to their buffers, not the actual
      buffers.  This allows polymorphic behavior on the part of the buffers
      as well as the streams themselves.  The pointer is easily retrieved
      using the <TT>rdbuf()</TT> member function.  Therefore, the easiest
      way to copy the file is:
      <PRE>
   OUT &lt;&lt; IN.rdbuf();</PRE>
   </P>
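   <P>Put together, the whole copy might look like the sketch below.  The
      function name <TT>copy_file</TT> and the decision to open both files
      in binary mode are my own assumptions for illustration, not anything
      the library requires.
   </P>

```cpp
#include <fstream>
#include <string>

// Copy src to dst by handing the source's streambuf to operator<<.
// Returns true when both files opened and the copy succeeded.
// (copy_file is a hypothetical helper name.)
bool copy_file(const std::string& src, const std::string& dst)
{
    std::ifstream in(src.c_str(), std::ios::in | std::ios::binary);
    std::ofstream out(dst.c_str(), std::ios::out | std::ios::binary);
    if (!in || !out)
        return false;

    // No formatting is applied here, so whitespace and blank lines survive.
    // Note: if the source is empty, no characters are inserted and the
    // Standard has operator<< set failbit on out, so this returns false.
    out << in.rdbuf();
    return out.good();
}
```

   <P>Calling <TT>copy_file("input_file", "output_file")</TT> on the fox
      example should leave the blank lines exactly where they were.
   </P>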
   <P>So what <EM>was</EM> happening with OUT&lt;&lt;IN?  Undefined
      behavior, since that particular &lt;&lt; isn't defined by the Standard.
      I have seen instances where it is implemented, but the character
      extraction process removes all the whitespace, leaving you with no
      blank lines and only &quot;Thequickbrownfox...&quot;.  With
      libraries that do not define that operator, IN (or one of IN's
      member pointers) sometimes gets converted to a void*, and the output
      file then contains a perfect text representation of a hexadecimal
      address (quite a big surprise).  Others don't compile at all.
   </P>
   <P>Also note that none of this is specific to o<B>f</B>streams.
      The operators shown above are all defined in the parent
      basic_ostream class and are therefore available with all possible
      descendants.
   </P>
   <P>Return <A HREF="#top">to top of page</A> or
      <A HREF="../faq/index.html">to the FAQ</A>.
   </P>

<HR>
<H2><A NAME="2">The buffering is screwing up my program!</A></H2>
<!--
  This is not written very well.  I need to redo this section.
-->
   <P>First, are you sure that you understand buffering?  Particularly
      the fact that C++ may not, in fact, have anything to do with it?
   </P>
   <P>The rules for buffering can be a little odd, but they aren't any
      different from those of C.  (Maybe that's why they can be a bit
      odd.)  Many people think that writing a newline to an output
      stream automatically flushes the output buffer.  This is true only
      when the output stream is, in fact, a terminal and not a file
      or some other device -- and <EM>that</EM> may not even be true
      since C++ says nothing about files or terminals.  All of that is
      system-dependent.  (The &quot;newline-buffer-flushing only occurring
      on terminals&quot; thing is mostly true on Unix systems, though.)
   </P>
   <P>Some people also believe that sending <TT>endl</TT> down an
      output stream only writes a newline.  This is incorrect; after a
      newline is written, the buffer is also flushed.  Perhaps this
      is the effect you want when writing to a screen -- get the text
      out as soon as possible, etc -- but the buffering is largely
      wasted when doing this to a file:
      <PRE>
   output &lt;&lt; &quot;a line of text&quot; &lt;&lt; endl;
   output &lt;&lt; some_data_variable &lt;&lt; endl;
   output &lt;&lt; &quot;another line of text&quot; &lt;&lt; endl; </PRE>
      The proper thing to do in this case is to just write the data out
      and let the libraries and the system worry about the buffering.
      If you need a newline, just write a newline:
      <PRE>
   output &lt;&lt; &quot;a line of text\n&quot;
          &lt;&lt; some_data_variable &lt;&lt; '\n'
          &lt;&lt; &quot;another line of text\n&quot;; </PRE>
      I have also joined the output statements into a single statement.
      You could make the code prettier by moving the single newline to
      the start of the quoted text on the third line, for example.
   </P>
   <P>If you do need to flush the buffer above, you can send an
      <TT>endl</TT> if you also need a newline, or just flush the buffer
      yourself:
      <PRE>
   output &lt;&lt; ...... &lt;&lt; flush;    // can use std::flush manipulator
   output.flush();               // or call a member fn </PRE>
   </P>
   <P>On the other hand, there are times when writing to a file should
      be like writing to standard error; no buffering should be done 
      because the data needs to appear quickly (a prime example is a
      log file for security-related information).  The way to do this is
      just to turn off the buffering <EM>before any I/O operations at
      all</EM> have been done, i.e., as soon as possible after opening:
      <PRE>
   std::ofstream    os (&quot;/foo/bar/baz&quot;);
   std::ifstream    is (&quot;/qux/quux/quuux&quot;);
   int   i;

   os.rdbuf()-&gt;pubsetbuf(0,0);
   is.rdbuf()-&gt;pubsetbuf(0,0);
   ...
   os &lt;&lt; &quot;this data is written immediately\n&quot;;
   is &gt;&gt; i;   // and this will probably cause a disk read </PRE>
   </P>
   <P>Since all aspects of buffering are handled by a streambuf-derived
      member, it is necessary to get at that member with <TT>rdbuf()</TT>.
      Then the public version of <TT>setbuf</TT> can be called.  The 
      arguments are the same as those for the Standard C I/O Library
      function (a buffer area followed by its size).
   </P>
   <P>A great deal of this is implementation-dependent.  For example,
      <TT>streambuf</TT> does not specify any actions for its own 
      <TT>setbuf()</TT>-ish functions; the classes derived from
      <TT>streambuf</TT> each define behavior that &quot;makes 
      sense&quot; for that class:  an argument of (0,0) turns off
      buffering for <TT>filebuf</TT> but has undefined behavior for
      its sibling <TT>stringbuf</TT>, and specifying anything other
      than (0,0) has varying effects.  Other user-defined classes derived
      from streambuf can do whatever they want.
   </P>
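   <P>Given how much of this is left to the implementation, a cautious
      sketch makes the <TT>pubsetbuf(0,0)</TT> request before the file is
      even opened, since at least some implementations appear to honor the
      request only at that point.  The helper name
      <TT>log_unbuffered</TT> and the append mode are assumptions made for
      the example.
   </P>

```cpp
#include <fstream>
#include <string>

// Append one line to a log file after asking for no library-level buffering.
// Returns true if the line was written.
// (log_unbuffered is a hypothetical helper name.)
bool log_unbuffered(const std::string& path, const std::string& message)
{
    std::ofstream log;
    log.rdbuf()->pubsetbuf(0, 0);     // request made before open(), to be safe
    log.open(path.c_str(), std::ios::out | std::ios::app);
    if (!log)
        return false;

    // If the request was honored, this goes to the system without
    // waiting in a library buffer; kernel and disk buffers still apply.
    log << message << '\n';
    return log.good();
}
```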
   <P>A last reminder:  there are usually more buffers involved than
      just those at the language/library level.  Kernel buffers, disk
      buffers, and the like will also have an effect.  Inspecting and
      changing those is system-dependent.
   </P>
   <P>Return <A HREF="#top">to top of page</A> or
      <A HREF="../faq/index.html">to the FAQ</A>.
   </P>

<HR>
<H2><A NAME="3">Binary I/O</A></H2>
   <P>The first and most important thing to remember about binary I/O is
      that opening a file with <TT>ios::binary</TT> is not, repeat
      <EM>not</EM>, the only thing you have to do.  It is not a silver
      bullet, and will not allow you to use the <TT>&lt;&lt;/&gt;&gt;</TT>
      operators of the normal fstreams to do binary I/O.
   </P>
   <P>Sorry.  Them's the breaks.
   </P>
   <P>This isn't going to try and be a complete tutorial on reading and
      writing binary files (because &quot;binary&quot; covers a lot of
      ground), but we will try and clear up a couple of misconceptions
      and common errors.
   </P>
   <P>First, <TT>ios::binary</TT> has exactly one defined effect, no more
      and no less.  Normal text mode has to be concerned with the newline
      characters, and the runtime system will translate between (for
      example) '\n' and the appropriate end-of-line sequence (LF on Unix,
      CRLF on DOS, CR on Macintosh, etc).  (There are other things that
      normal mode does, but that's the most obvious.)  Opening a file in
      binary mode disables this conversion, so reading a CRLF sequence
      under Windows won't accidentally get mapped to a '\n' character, etc.
      Binary mode is not supposed to suddenly give you a bitstream, and
      if it is doing so in your program then you've discovered a bug in
      your vendor's compiler (or some other part of the C++ implementation,
      possibly the runtime system).
   </P>
   <P>Second, using <TT>&lt;&lt;</TT> to write and <TT>&gt;&gt;</TT> to
      read isn't going to work with the standard file stream classes, even
      if you use <TT>noskipws</TT> during reading.  Why not?  Because 
      ifstream and ofstream exist for the purpose of <EM>formatting</EM>,
      not reading and writing.  Their job is to interpret the data into
      text characters, and that's exactly what you don't want to happen
      during binary I/O.
   </P>
   <P>Third, using the <TT>get()</TT> and <TT>put()/write()</TT> member
      functions still isn't guaranteed to help you.  These are
      &quot;unformatted&quot; I/O functions, but still character-based.
      (This may or may not be what you want.)
   </P>
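   <P>If character-based really is what you want, the sketch below shows
      the unformatted functions moving raw bytes through a file opened with
      <TT>ios::binary</TT>.  The helper names <TT>write_bytes</TT> and
      <TT>read_bytes</TT> are hypothetical, and the code says nothing about
      what the bytes mean; that part is still up to you.
   </P>

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write raw bytes to a file opened in binary mode, with no formatting
// and no end-of-line translation.  (Hypothetical helper name.)
bool write_bytes(const std::string& path, const std::vector<char>& bytes)
{
    std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
    if (!out)
        return false;
    if (!bytes.empty())
        out.write(&bytes[0], static_cast<std::streamsize>(bytes.size()));
    return out.good();
}

// Read a whole file back, one character at a time, with unformatted get().
// An unreadable file simply yields an empty vector.  (Hypothetical helper name.)
std::vector<char> read_bytes(const std::string& path)
{
    std::vector<char> bytes;
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    char c;
    while (in.get(c))      // get() does not skip whitespace
        bytes.push_back(c);
    return bytes;
}
```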
   <P>Notice how all the problems here are due to the inappropriate use
      of <EM>formatting</EM> functions and classes to perform something
      which <EM>requires</EM> that formatting not be done?  There are a
      seemingly infinite number of solutions, and a few are listed here:
      <UL>
        <LI>&quot;Derive your own fstream-type classes and write your own
            &lt;&lt;/&gt;&gt; operators to do binary I/O on whatever data
            types you're using.&quot;  This is a Bad Thing, because while
            the compiler would probably be just fine with it, other humans
            are going to be confused.  The overloaded bitshift operators
            have a well-defined meaning (formatting), and this breaks it.
        <LI>&quot;Build the file structure in memory, then <TT>mmap()</TT>
            the file and copy the structure.&quot;  Well, this is easy to
            make work, and easy to break, and is pretty equivalent to
            using <TT>::read()</TT> and <TT>::write()</TT> directly, and
            makes no use of the iostream library at all...
        <LI>&quot;Use streambufs, that's what they're there for.&quot;
            While not trivial for the beginner, this is the best of all
            solutions.  The streambuf/filebuf layer is the layer that is
            responsible for actual I/O.  If you want to use the C++
            library for binary I/O, this is where you start.
      </UL>
   </P>
   <P>How to go about using streambufs is a bit beyond the scope of this
      document (at least for now), but while streambufs go a long way,
      they still leave a couple of things up to you, the programmer.
      As an example, byte ordering is completely between you and the
      operating system, and you have to handle it yourself.
   </P>
   <P>Deriving a streambuf or filebuf
      class from the standard ones, one that is specific to your data
      types (or an abstraction thereof) is probably a good idea, and
      lots of examples exist in journals and on Usenet.  Using the
      standard filebufs directly (either by declaring your own or by
      using the pointer returned from an fstream's <TT>rdbuf()</TT>)
      is certainly feasible as well.
   </P>
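   <P>To make the &quot;use the standard filebufs directly&quot; idea a
      little more concrete, here is a minimal sketch that stores a 32-bit
      value through a <TT>std::filebuf</TT> one byte at a time, so that the
      byte order on disk is chosen by the program rather than by the
      machine.  The function names <TT>put_u32</TT> and <TT>get_u32</TT>
      and the most-significant-byte-first layout are assumptions made for
      the example.
   </P>

```cpp
#include <fstream>
#include <ios>

// Store the low 32 bits of value as four bytes, most significant first.
// Returns false if the buffer reports a write failure.
bool put_u32(std::filebuf& buf, unsigned long value)
{
    typedef std::filebuf::traits_type traits;
    for (int shift = 24; shift >= 0; shift -= 8) {
        const char byte = static_cast<char>((value >> shift) & 0xFFul);
        if (traits::eq_int_type(buf.sputc(byte), traits::eof()))
            return false;
    }
    return true;
}

// Read four bytes laid out by put_u32.  Returns false on a short read,
// in which case value is left untouched.
bool get_u32(std::filebuf& buf, unsigned long& value)
{
    typedef std::filebuf::traits_type traits;
    unsigned long result = 0;
    for (int i = 0; i < 4; ++i) {
        const traits::int_type c = buf.sbumpc();   // int_type, so EOF is distinguishable
        if (traits::eq_int_type(c, traits::eof()))
            return false;
        result = (result << 8)
               | static_cast<unsigned char>(traits::to_char_type(c));
    }
    value = result;
    return true;
}
```

   <P>The filebuf would be opened with
      <TT>std::ios::binary</TT> before either function is called; an
      fstream's <TT>rdbuf()</TT> pointer could be used in the same way.
   </P>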
   <P>One area that causes problems is trying to do bit-by-bit operations
      with filebufs.  C++ is no different from C in this respect:  I/O
      must be done at the byte level.  If you're trying to read or write
      a few bits at a time, you're going about it the wrong way.  You
      must read/write an integral number of bytes and then process the
      bytes.  (For example, the streambuf functions take and return
      variables of type <TT>int_type</TT>.)
   </P>
   <P>Another area of problems is opening text files in binary mode.
      Generally, binary mode is intended for binary files, and opening
      text files in binary mode means that you now have to deal with all of 
      those end-of-line and end-of-file problems that we mentioned before.
      An instructive thread from comp.lang.c++.moderated delved off into
      this topic starting more or less at
      <A HREF="http://www.deja.com/getdoc.xp?AN=436187505">this</A>
      article and continuing to the end of the thread.  (You'll have to
      sort through some flames every couple of paragraphs, but the points
      made are good ones.)
   </P>
 
<HR>
<H2><A NAME="4">Iostreams class hierarchy diagram</A></H2>
   <P>The <A HREF="iostreams_hierarchy.pdf">diagram</A> is in PDF.  Rumor
      has it that once Benjamin Kosnik has been dead for a few decades,
      this work of his will be hung next to the Mona Lisa in the
      <A HREF="http://www.louvre.fr/">Musee du Louvre</A>.
   </P>
 
<HR>
<H2><A NAME="5">What is this &lt;sstream&gt;/stringstreams thing?</A></H2>
   <P>Stringstreams (defined in the header <TT>&lt;sstream&gt;</TT>)
      are in this author's opinion one of the coolest things since
      sliced time.  An example of their use is in the Received Wisdom
      section for Chapter 21 (Strings),
      <A HREF="../21_strings/howto.html#1.1internal"> describing how to
      format strings</A>.
   </P>
   <P>The quick definition is:  they are siblings of ifstream and ofstream,
      and they do for <TT>std::string</TT> what their siblings do for
      files.  All that work you put into writing <TT>&lt;&lt;</TT> and
      <TT>&gt;&gt;</TT> functions for your classes now pays off
      <EM>again!</EM>  Need to format a string before passing the string
      to a function?  Send your stuff via <TT>&lt;&lt;</TT> to an
      ostringstream.  You've read a string as input and need to parse it?
      Initialize an istringstream with that string, and then pull pieces
      out of it with <TT>&gt;&gt;</TT>.  Have a stringstream and need to
      get a copy of the string inside?  Just call the <TT>str()</TT>
      member function.
   </P>
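   <P>A small sketch of both directions follows.  The function names
      <TT>describe</TT> and <TT>parse_pair</TT> are made up for the
      example; only <TT>ostringstream</TT>, <TT>istringstream</TT>, and
      <TT>str()</TT> come from the library.
   </P>

```cpp
#include <sstream>
#include <string>

// Format mixed data into a string by writing to an ostringstream.
// (describe is a hypothetical helper name.)
std::string describe(const std::string& name, int count)
{
    std::ostringstream out;
    out << name << " has " << count << " items";
    return out.str();               // a copy of the string held inside
}

// Pull two whitespace-separated integers out of a string.
// Returns false if either extraction fails.
// (parse_pair is a hypothetical helper name.)
bool parse_pair(const std::string& text, int& first, int& second)
{
    std::istringstream in(text);
    return !(in >> first >> second).fail();
}
```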
   <P>This only works if you've written your
      <TT>&lt;&lt;</TT>/<TT>&gt;&gt;</TT> functions correctly, though,
      and correctly means that they take istreams and ostreams as
      parameters, not i<B>f</B>streams and o<B>f</B>streams.  If they
      take the latter, then your I/O operators will work fine with
      file streams, but with nothing else -- including stringstreams.
   </P>
   <P>If you are a user of the strstream classes, you need to update
      your code.  You don't have to explicitly append <TT>ends</TT> to
      terminate the C-style character array, you don't have to mess with
      &quot;freezing&quot; functions, and you don't have to manage the
      memory yourself.  The strstreams have been officially deprecated,
      which means that 1) future revisions of the C++ Standard won't
      support them, and 2) if you use them, people will laugh at you.
   </P>


<!-- ####################################################### -->

<HR>
<P CLASS="fineprint"><EM>
Comments and suggestions are welcome, and may be sent to
<A HREF="mailto:pme@sources.redhat.com">Phil Edwards</A> or
<A HREF="mailto:gdr@gcc.gnu.org">Gabriel Dos Reis</A>.
<BR> $Id: howto.html,v 1.4 2000/11/29 20:37:02 pme Exp $
</EM></P>


</BODY>
</HTML>