1 files changed, 332 insertions, 0 deletions
diff --git a/libstdc++-v3/docs/html/21_strings/howto.html b/libstdc++-v3/docs/html/21_strings/howto.html
new file mode 100644
index 00000000000..7318084f3ad
--- /dev/null
+++ b/libstdc++-v3/docs/html/21_strings/howto.html
@@ -0,0 +1,332 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
+<HTML>
+<HEAD>
+   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
+   <META NAME="AUTHOR" CONTENT="pme@sources.redhat.com (Phil Edwards)">
+   <META NAME="KEYWORDS" CONTENT="HOWTO, libstdc++, GCC, g++, libg++, STL">
+   <META NAME="DESCRIPTION" CONTENT="HOWTO for the libstdc++ chapter 21.">
+   <META NAME="GENERATOR" CONTENT="vi and eight fingers">
+   <TITLE>libstdc++-v3 HOWTO:  Chapter 21</TITLE>
+<LINK REL=StyleSheet HREF="../lib3styles.css">
+<!-- $Id: howto.html,v 1.7 2000/12/03 23:47:47 jsm28 Exp $ -->
+</HEAD>
+<BODY>
+
+<H1 CLASS="centered"><A NAME="top">Chapter 21:  Strings</A></H1>
+
+<P>Chapter 21 deals with the C++ strings library (a welcome relief).
+</P>
+
+
+<!-- ####################################################### -->
+<HR>
+<H1>Contents</H1>
+<UL>
+   <LI><A HREF="#1">MFC's CString</A>
+   <LI><A HREF="#2">A case-insensitive string class</A>
+   <LI><A HREF="#3">Breaking a C++ string into tokens</A>
+   <LI><A HREF="#4">Simple transformations</A>
+</UL>
+
+<HR>
+
+<!-- ####################################################### -->
+
+<H2><A NAME="1">MFC's CString</A></H2>
+   <P>A common lament seen in various newsgroups deals with the Standard
+      string class as opposed to the Microsoft Foundation Class called
+      CString.  Often programmers realize that a standard portable
+      answer is better than a proprietary nonportable one, but in porting
+      their application from a Win32 platform, they discover that they
+      are relying on special functons offered by the CString class.
+   </P>
+   <P>Things are not as bad as they seem.  In
+      <A HREF="http://gcc.gnu.org/ml/gcc/1999-04n/msg00236.html">this
+      message</A>, Joe Buck points out a few very important things:
+      <UL>
+         <LI>The Standard <TT>string</TT> supports all the operations
+             that CString does, with three exceptions.
+         <LI>Two of those exceptions (whitespace trimming and case 
+             conversion) are trivial to implement.  In fact, we do so
+             on this page.
+         <LI>The third is <TT>CString::Format</TT>, which allows formatting
+             in the style of <TT>sprintf</TT>.  This deserves some mention:
+      </UL>
+   </P>
+   <A NAME="1.1internal"> <!-- Coming from Chapter 27 -->
+   <P>The old libg++ library had a function called form(), which did much
+      the same thing.  But for a Standard solution, you should use the
+      stringstream classes.  These are the bridge between the iostream
+      hierarchy and the string class, and they operate with regular
+      streams seamlessly because they inherit from the iostream
+      heirarchy.  An quick example:
+      <PRE>
+   #include &lt;iostream&gt;
+   #include &lt;string&gt;
+   #include &lt;sstream&gt;
+
+   string f (string&amp; incoming)     // incoming is "foo  N"
+   {
+       istringstream   incoming_stream(incoming);
+       string          the_word;
+       int             the_number;
+
+       incoming_stream &gt;&gt; the_word        // extract "foo"
+                       &gt;&gt; the_number;     // extract N
+
+       ostringstream   output_stream;
+       output_stream &lt;&lt; "The word was " &lt;&lt; the_word
+                     &lt;&lt; " and 3*N was " &lt;&lt; (3*the_number);
+
+       return output_stream.str();
+   } </PRE>
+   </P></A>
+   <P>A serious problem with CString is a design bug in its memory
+      allocation.  Specifically, quoting from that same message:
+      <PRE>
+   CString suffers from a common programming error that results in
+   poor performance.  Consider the following code:
+   
+   CString n_copies_of (const CString&amp; foo, unsigned n)
+   {
+           CString tmp;
+           for (unsigned i = 0; i &lt; n; i++)
+                   tmp += foo;
+           return tmp;
+   }
+   
+   This function is O(n^2), not O(n).  The reason is that each +=
+   causes a reallocation and copy of the existing string.  Microsoft
+   applications are full of this kind of thing (quadratic performance
+   on tasks that can be done in linear time) -- on the other hand,
+   we should be thankful, as it's created such a big market for high-end
+   ix86 hardware. :-)
+   
+   If you replace CString with string in the above function, the
+   performance is O(n).
+      </PRE>
+   </P>
+   <P>Joe Buck also pointed out some other things to keep in mind when
+      comparing CString and the Standard string class:
+      <UL>
+         <LI>CString permits access to its internal representation; coders
+             who exploited that may have problems moving to <TT>string</TT>.
+         <LI>Microsoft ships the source to CString (in the files
+             MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation
+             bug and rebuild your MFC libraries.
+             <EM><B>Note:</B>  It looks like the the CString shipped with
+             VC++6.0 has fixed this, although it may in fact have been one
+             of the VC++ SPs that did it.</EM>
+         <LI><TT>string</TT> operations like this have O(n) complexity
+             <EM>if the implementors do it correctly</EM>.  The libstdc++
+             implementors did it correctly.  Other vendors might not.
+         <LI>While parts of the SGI STL are used in libstdc++-v3, their
+             string class is not.  The SGI <TT>string</TT> is essentially
+             <TT>vector&lt;char&gt;</TT> and does not do any reference
+             counting like libstdc++-v3's does.  (It is O(n), though.)
+             So if you're thinking about SGI's string or rope classes,
+             you're now looking at four possibilities:  CString, the
+             libstdc++ string, the SGI string, and the SGI rope, and this
+             is all before any allocator or traits customizations!  (More
+             choices than you can shake a stick at -- want fries with that?)
+      </UL>
+   </P>
+   <P>Return <A HREF="#top">to top of page</A> or
+      <A HREF="../faq/index.html">to the FAQ</A>.
+   </P>
+
+<HR>
+<H2><A NAME="2">A case-insensitive string class</A></H2>
+   <P>The well-known-and-if-it-isn't-well-known-it-ought-to-be
+      <A HREF="http://www.peerdirect.com/resources/">Guru of the Week</A>
+      discussions held on Usenet covered this topic in January of 1998.
+      Briefly, the challenge was, &quot;write a 'ci_string' class which
+      is identical to the standard 'string' class, but is
+      case-insensitive in the same way as the (common but nonstandard)
+      C function stricmp():&quot;
+      <PRE>
+   ci_string s( "AbCdE" );
+
+   // case insensitive
+   assert( s == "abcde" );
+   assert( s == "ABCDE" );
+
+   // still case-preserving, of course
+   assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
+   assert( strcmp( s.c_str(), "abcde" ) != 0 ); </PRE>
+   </P>
+
+   <P>The solution is surprisingly easy.  The original answer pages
+      on the GotW website were removed into cold storage, in
+      preparation for
+      <A HREF="http://cseng.aw.com/bookpage.taf?ISBN=0-201-61562-2">a
+      published book of GotW notes</A>.  Before being
+      put on the web, of course, it was posted on Usenet, and that
+      posting containing the answer is <A HREF="gotw29a.txt">available
+      here</A>.
+   </P>
+   <P>See?  Told you it was easy!</P>
+   <P><B>Added June 2000:</B>  The May issue of <U>C++ Report</U> contains
+      a fascinating article by Matt Austern (yes, <EM>the</EM> Matt Austern)
+      on why case-insensitive comparisons are not as easy as they seem,
+      and why creating a class is the <EM>wrong</EM> way to go about it in
+      production code.  (The GotW answer mentions one of the principle
+      difficulties; his article mentions more.)
+   </P>
+   <P>Basically, this is &quot;easy&quot; only if you ignore some things,
+      things which may be too important to your program to ignore.  (I chose
+      to ignore them when originally writing this entry, and am surprised
+      that nobody ever called me on it...)  The GotW question and answer
+      remain useful instructional tools, however.
+   </P>
+   <P><B>Added September 2000:</B>  James Kanze provided a link to a
+      <A HREF="http://www.unicode.org/unicode/reports/tr21/">Unicode
+      Technical Report discussing case handling</A>, which provides some
+      very good information.
+   </P>
+   <P>Return <A HREF="#top">to top of page</A> or
+      <A HREF="../faq/index.html">to the FAQ</A>.
+   </P>
+
+<HR>
+<H2><A NAME="3">Breaking a C++ string into tokens</A></H2>
+   <P>The Standard C (and C++) function <TT>strtok()</TT> leaves a lot to
+      be desired in terms of user-friendliness.  It's unintuitive, it
+      destroys the character string on which it operates, and it requires
+      you to handle all the memory problems.  But it does let the client
+      code decide what to use to break the string into pieces; it allows
+      you to choose the &quot;whitespace,&quot; so to speak.
+   </P>
+   <P>A C++ implementation lets us keep the good things and fix those
+      annoyances.  The implementation here is more intuitive (you only
+      call it once, not in a loop with varying argument), it does not
+      affect the original string at all, and all the memory allocation
+      is handled for you.
+   </P>
+   <P>It's called stringtok, and it's a template function.  It's given
+      <A HREF="stringtok_h.txt">in this file</A> in a less-portable form than
+      it could be, to keep this example simple (for example, see the
+      comments on what kind of string it will accept).  The author uses
+      a more general (but less readable) form of it for parsing command
+      strings and the like.  If you compiled and ran this code using it:
+      <PRE>
+   std::list&lt;string&gt;  ls;
+   stringtok (ls, " this  \t is\t\n  a test  ");
+   for (std::list&lt;string&gt;const_iterator i = ls.begin();
+        i != ls.end(); ++i)
+   {
+       std::cerr &lt;&lt; ':' &lt;&lt; (*i) &lt;&lt; ":\n";
+   }</PRE>
+      You would see this as output:
+      <PRE>
+   :this:
+   :is:
+   :a:
+   :test:</PRE>
+      with all the whitespace removed.  The original <TT>s</TT> is still
+      available for use, <TT>ls</TT> will clean up after itself, and
+      <TT>ls.size()</TT> will return how many tokens there were.
+   </P>
+   <P>As always, there is a price paid here, in that stringtok is not
+      as fast as strtok.  The other benefits usually outweight that, however.
+      <A HREF="stringtok_std_h.txt">Another version of stringtok is given
+      here</A>, suggested by Chris King and tweaked by Petr Prikryl,
+      and this one uses the
+      transformation functions mentioned below.  If you are comfortable
+      with reading the new function names, this version is recommended
+      as an example.
+   </P>
+   <P>Return <A HREF="#top">to top of page</A> or
+      <A HREF="../faq/index.html">to the FAQ</A>.
+   </P>
+
+<HR>
+<H2><A NAME="4">Simple transformations</A></H2>
+   <P>Here are Standard, simple, and portable ways to perform common
+      transformations on a <TT>string</TT> instance, such as &quot;convert
+      to all upper case.&quot;  The word transformations is especially
+      apt, because the standard template function
+      <TT>transform&lt;&gt;</TT> is used.
+   </P>
+   <P>This code will go through some iterations (no pun).  Here's the
+      simplistic version usually seen on Usenet:
+      <PRE>
+   #include &lt;string&gt;
+   #include &lt;algorithm&gt;
+   #include &lt;cctype&gt;      // old &lt;ctype.h&gt;
+
+   std::string  s ("Some Kind Of Initial Input Goes Here");
+
+   // Change everything into upper case
+   std::transform (s.begin(), s.end(), s.begin(), toupper);
+
+   // Change everything into lower case
+   std::transform (s.begin(), s.end(), s.begin(), tolower);
+
+   // Change everything back into upper case, but store the
+   // result in a different string
+   std::string  capital_s;
+   capital_s.reserve(s.size());
+   std::transform (s.begin(), s.end(), capital_s.begin(), tolower); </PRE>
+      <SPAN CLASS="larger"><B>Note</B></SPAN> that these calls all involve
+      the global C locale through the use of the C functions
+      <TT>toupper/tolower</TT>.  This is absolutely guaranteed to work --
+      but <EM>only</EM> if the string contains <EM>only</EM> characters
+      from the basic source character set, and there are <EM>only</EM>
+      96 of those.  Which means that not even all English text can be
+      represented (certain British spellings, proper names, and so forth).
+      So, if all your input forevermore consists of only those 96
+      characters (hahahahahaha), then you're done.
+   </P>
+   <P>At minimum, you can write short wrappers like
+      <PRE>
+   char toLower (char c)
+   {
+      return tolower(static_cast&lt;unsigned char&gt;(c));
+   }</PRE>
+   </P>
+   <P>The correct method is to use a facet for a particular locale
+      and call its conversion functions.  These are discussed more in
+      Chapter 22; the specific part is
+      <A HREF="../22_locale/howto.html#5">here</A>, which shows the
+      final version of this code.  (Thanks to James Kanze for assistance
+      and suggestions on all of this.)
+   </P>
+   <P>Another common operation is trimming off excess whitespace.  Much
+      like transformations, this task is trivial with the use of string's
+      <TT>find</TT> family.  These examples are broken into multiple
+      statements for readability:
+      <PRE>
+   std::string  str (" \t blah blah blah    \n ");
+
+   // trim leading whitespace
+   string::size_type  notwhite = str.find_first_not_of(" \t\n");
+   str.erase(0,notwhite);
+
+   // trim trailing whitespace
+   notwhite = str.find_last_not_of(" \t\n"); 
+   str.erase(notwhite+1); </PRE>
+      Obviously, the calls to <TT>find</TT> could be inserted directly
+      into the calls to <TT>erase</TT>, in case your compiler does not
+      optimize named temporaries out of existance.
+   </P>
+   <P>Return <A HREF="#top">to top of page</A> or
+      <A HREF="../faq/index.html">to the FAQ</A>.
+   </P>
+
+
+
+
+<!-- ####################################################### -->
+
+<HR>
+<P CLASS="fineprint"><EM>
+Comments and suggestions are welcome, and may be sent to
+<A HREF="mailto:pme@sources.redhat.com">Phil Edwards</A> or
+<A HREF="mailto:gdr@gcc.gnu.org">Gabriel Dos Reis</A>.
+<BR> $Id: howto.html,v 1.7 2000/12/03 23:47:47 jsm28 Exp $
+</EM></P>
+
+
+</BODY>
+</HTML>