Diffstat (limited to 'libstdc++-v3/docs/html/21_strings/howto.html')
-rw-r--r-- | libstdc++-v3/docs/html/21_strings/howto.html | 332 |
1 files changed, 332 insertions, 0 deletions
diff --git a/libstdc++-v3/docs/html/21_strings/howto.html b/libstdc++-v3/docs/html/21_strings/howto.html
new file mode 100644
index 00000000000..7318084f3ad
--- /dev/null
+++ b/libstdc++-v3/docs/html/21_strings/howto.html
@@ -0,0 +1,332 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
+<HTML>
+<HEAD>
+   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
+   <META NAME="AUTHOR" CONTENT="pme@sources.redhat.com (Phil Edwards)">
+   <META NAME="KEYWORDS" CONTENT="HOWTO, libstdc++, GCC, g++, libg++, STL">
+   <META NAME="DESCRIPTION" CONTENT="HOWTO for the libstdc++ chapter 21.">
+   <META NAME="GENERATOR" CONTENT="vi and eight fingers">
+   <TITLE>libstdc++-v3 HOWTO:  Chapter 21</TITLE>
+<LINK REL=StyleSheet HREF="../lib3styles.css">
+<!-- $Id: howto.html,v 1.7 2000/12/03 23:47:47 jsm28 Exp $ -->
+</HEAD>
+<BODY>
+
+<H1 CLASS="centered"><A NAME="top">Chapter 21:  Strings</A></H1>
+
+<P>Chapter 21 deals with the C++ strings library (a welcome relief).
+</P>
+
+
+<!-- ####################################################### -->
+<HR>
+<H1>Contents</H1>
+<UL>
+   <LI><A HREF="#1">MFC's CString</A>
+   <LI><A HREF="#2">A case-insensitive string class</A>
+   <LI><A HREF="#3">Breaking a C++ string into tokens</A>
+   <LI><A HREF="#4">Simple transformations</A>
+</UL>
+
+<HR>
+
+<!-- ####################################################### -->
+
+<H2><A NAME="1">MFC's CString</A></H2>
+   <P>A common lament seen in various newsgroups deals with the Standard
+      string class as opposed to the Microsoft Foundation Class called
+      CString.  Often programmers realize that a standard portable
+      answer is better than a proprietary nonportable one, but in porting
+      their application from a Win32 platform, they discover that they
+      are relying on special functions offered by the CString class.
+   </P>
+   <P>Things are not as bad as they seem.  In
+      <A HREF="http://gcc.gnu.org/ml/gcc/1999-04n/msg00236.html">this
+      message</A>, Joe Buck points out a few very important things:
+      <UL>
+         <LI>The Standard <TT>string</TT> supports all the operations
+             that CString does, with three exceptions.
+         <LI>Two of those exceptions (whitespace trimming and case
+             conversion) are trivial to implement.  In fact, we do so
+             on this page.
+         <LI>The third is <TT>CString::Format</TT>, which allows formatting
+             in the style of <TT>sprintf</TT>.  This deserves some mention:
+      </UL>
+   </P>
+   <A NAME="1.1internal"> <!-- Coming from Chapter 27 -->
+   <P>The old libg++ library had a function called form(), which did much
+      the same thing.  But for a Standard solution, you should use the
+      stringstream classes.  These are the bridge between the iostream
+      hierarchy and the string class, and they operate with regular
+      streams seamlessly because they inherit from the iostream
+      hierarchy.  A quick example:
+      <PRE>
+   #include <iostream>
+   #include <string>
+   #include <sstream>
+
+   std::string f (std::string& incoming)     // incoming is "foo  N"
+   {
+       std::istringstream   incoming_stream(incoming);
+       std::string          the_word;
+       int                  the_number;
+
+       incoming_stream >> the_word        // extract "foo"
+                       >> the_number;     // extract N
+
+       std::ostringstream   output_stream;
+       output_stream << "The word was " << the_word
+                     << " and 3*N was " << (3*the_number);
+
+       return output_stream.str();
+   } </PRE>
+   </P></A>
+   <P>A serious problem with CString is a design bug in its memory
+      allocation.  Specifically, quoting from that same message:
+      <PRE>
+   CString suffers from a common programming error that results in
+   poor performance.  Consider the following code:
+
+   CString n_copies_of (const CString& foo, unsigned n)
+   {
+           CString tmp;
+           for (unsigned i = 0; i < n; i++)
+                   tmp += foo;
+           return tmp;
+   }
+
+   This function is O(n^2), not O(n).  The reason is that each +=
+   causes a reallocation and copy of the existing string.  Microsoft
+   applications are full of this kind of thing (quadratic performance
+   on tasks that can be done in linear time) -- on the other hand,
+   we should be thankful, as it's created such a big market for high-end
+   ix86 hardware.  :-)
+
+   If you replace CString with string in the above function, the
+   performance is O(n).
+      </PRE>
+   </P>
+   <P>Joe Buck also pointed out some other things to keep in mind when
+      comparing CString and the Standard string class:
+      <UL>
+         <LI>CString permits access to its internal representation; coders
+             who exploited that may have problems moving to <TT>string</TT>.
+         <LI>Microsoft ships the source to CString (in the files
+             MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation
+             bug and rebuild your MFC libraries.
+             <EM><B>Note:</B> It looks like the CString shipped with
+             VC++6.0 has fixed this, although it may in fact have been one
+             of the VC++ service packs that did it.</EM>
+         <LI><TT>string</TT> operations like this have O(n) complexity
+             <EM>if the implementors do it correctly</EM>.  The libstdc++
+             implementors did it correctly.  Other vendors might not.
+         <LI>While parts of the SGI STL are used in libstdc++-v3, their
+             string class is not.  The SGI <TT>string</TT> is essentially
+             <TT>vector<char></TT> and does not do any reference
+             counting like libstdc++-v3's does.  (It is O(n), though.)
+             So if you're thinking about SGI's string or rope classes,
+             you're now looking at four possibilities:  CString, the
+             libstdc++ string, the SGI string, and the SGI rope, and this
+             is all before any allocator or traits customizations!  (More
+             choices than you can shake a stick at -- want fries with that?)
+      </UL>
+   </P>
+   <P>Return <A HREF="#top">to top of page</A> or
+      <A HREF="../faq/index.html">to the FAQ</A>.
+   </P>
+
+<HR>
+<H2><A NAME="2">A case-insensitive string class</A></H2>
+   <P>The well-known-and-if-it-isn't-well-known-it-ought-to-be
+      <A HREF="http://www.peerdirect.com/resources/">Guru of the Week</A>
+      discussions held on Usenet covered this topic in January of 1998.
+      Briefly, the challenge was, "write a 'ci_string' class which
+      is identical to the standard 'string' class, but is
+      case-insensitive in the same way as the (common but nonstandard)
+      C function stricmp():"
+      <PRE>
+   ci_string s( "AbCdE" );
+
+   // case insensitive
+   assert( s == "abcde" );
+   assert( s == "ABCDE" );
+
+   // still case-preserving, of course
+   assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
+   assert( strcmp( s.c_str(), "abcde" ) != 0 ); </PRE>
+   </P>
+
+   <P>The solution is surprisingly easy.  The original answer pages
+      on the GotW website were moved into cold storage, in
+      preparation for
+      <A HREF="http://cseng.aw.com/bookpage.taf?ISBN=0-201-61562-2">a
+      published book of GotW notes</A>.  Before being
+      put on the web, of course, it was posted on Usenet, and that
+      posting containing the answer is <A HREF="gotw29a.txt">available
+      here</A>.
+   </P>
+   <P>See?  Told you it was easy!</P>
+   <P><B>Added June 2000:</B> The May issue of <U>C++ Report</U> contains
+      a fascinating article by Matt Austern (yes, <EM>the</EM> Matt Austern)
+      on why case-insensitive comparisons are not as easy as they seem,
+      and why creating a class is the <EM>wrong</EM> way to go about it in
+      production code.  (The GotW answer mentions one of the principal
+      difficulties; his article mentions more.)
+   </P>
+   <P>Basically, this is "easy" only if you ignore some things,
+      things which may be too important to your program to ignore.  (I chose
+      to ignore them when originally writing this entry, and am surprised
+      that nobody ever called me on it...)  The GotW question and answer
+      remain useful instructional tools, however.
+   </P>
+   <P><B>Added September 2000:</B> James Kanze provided a link to a
+      <A HREF="http://www.unicode.org/unicode/reports/tr21/">Unicode
+      Technical Report discussing case handling</A>, which provides some
+      very good information.
+   </P>
+   <P>Return <A HREF="#top">to top of page</A> or
+      <A HREF="../faq/index.html">to the FAQ</A>.
+   </P>
+
+<HR>
+<H2><A NAME="3">Breaking a C++ string into tokens</A></H2>
+   <P>The Standard C (and C++) function <TT>strtok()</TT> leaves a lot to
+      be desired in terms of user-friendliness.  It's unintuitive, it
+      destroys the character string on which it operates, and it requires
+      you to handle all the memory problems.  But it does let the client
+      code decide what to use to break the string into pieces; it allows
+      you to choose the "whitespace," so to speak.
+   </P>
+   <P>A C++ implementation lets us keep the good things and fix those
+      annoyances.  The implementation here is more intuitive (you only
+      call it once, not in a loop with varying arguments), it does not
+      affect the original string at all, and all the memory allocation
+      is handled for you.
+   </P>
+   <P>It's called stringtok, and it's a template function.  It's given
+      <A HREF="stringtok_h.txt">in this file</A> in a less-portable form than
+      it could be, to keep this example simple (for example, see the
+      comments on what kind of string it will accept).  The author uses
+      a more general (but less readable) form of it for parsing command
+      strings and the like.  If you compiled and ran this code using it:
+      <PRE>
+   std::list<std::string>  ls;
+   stringtok (ls, " this  \t is\t\n  a test  ");
+   for (std::list<std::string>::const_iterator i = ls.begin();
+        i != ls.end(); ++i)
+   {
+       std::cerr << ':' << (*i) << ":\n";
+   }</PRE>
+      you would see this as output:
+      <PRE>
+   :this:
+   :is:
+   :a:
+   :test:</PRE>
+      with all the whitespace removed.  The original input string is
+      left untouched, <TT>ls</TT> will clean up after itself, and
+      <TT>ls.size()</TT> will return how many tokens there were.
+   </P>
+   <P>As always, there is a price paid here, in that stringtok is not
+      as fast as strtok.  The other benefits usually outweigh that, however.
+      <A HREF="stringtok_std_h.txt">Another version of stringtok is given
+      here</A>, suggested by Chris King and tweaked by Petr Prikryl,
+      and this one uses the
+      transformation functions mentioned below.  If you are comfortable
+      with reading the new function names, this version is recommended
+      as an example.
+   </P>
+   <P>Return <A HREF="#top">to top of page</A> or
+      <A HREF="../faq/index.html">to the FAQ</A>.
+   </P>
+
+<HR>
+<H2><A NAME="4">Simple transformations</A></H2>
+   <P>Here are Standard, simple, and portable ways to perform common
+      transformations on a <TT>string</TT> instance, such as "convert
+      to all upper case."  The word transformations is especially
+      apt, because the standard template function
+      <TT>transform<></TT> is used.
+   </P>
+   <P>This code will go through some iterations (no pun).  Here's the
+      simplistic version usually seen on Usenet:
+      <PRE>
+   #include <string>
+   #include <algorithm>
+   #include <cctype>      // old <ctype.h>
+
+   std::string  s ("Some Kind Of Initial Input Goes Here");
+
+   // Change everything into upper case
+   std::transform (s.begin(), s.end(), s.begin(), toupper);
+
+   // Change everything into lower case
+   std::transform (s.begin(), s.end(), s.begin(), tolower);
+
+   // Change everything back into upper case, but store the
+   // result in a different string
+   std::string  capital_s;
+   capital_s.resize(s.size());
+   std::transform (s.begin(), s.end(), capital_s.begin(), toupper); </PRE>
+      <SPAN CLASS="larger"><B>Note</B></SPAN> that these calls all involve
+      the global C locale through the use of the C functions
+      <TT>toupper/tolower</TT>.  This is absolutely guaranteed to work --
+      but <EM>only</EM> if the string contains <EM>only</EM> characters
+      from the basic source character set, and there are <EM>only</EM>
+      96 of those.  Which means that not even all English text can be
+      represented (certain British spellings, proper names, and so forth).
+      So, if all your input forevermore consists of only those 96
+      characters (hahahahahaha), then you're done.
+   </P>
+   <P>At minimum, you can write short wrappers like
+      <PRE>
+   char toLower (char c)
+   {
+      return tolower(static_cast<unsigned char>(c));
+   }</PRE>
+   </P>
+   <P>The correct method is to use a facet for a particular locale
+      and call its conversion functions.  These are discussed more in
+      Chapter 22; the specific part is
+      <A HREF="../22_locale/howto.html#5">here</A>, which shows the
+      final version of this code.  (Thanks to James Kanze for assistance
+      and suggestions on all of this.)
+   </P>
+   <P>Another common operation is trimming off excess whitespace.  Much
+      like transformations, this task is trivial with the use of string's
+      <TT>find</TT> family.  These examples are broken into multiple
+      statements for readability:
+      <PRE>
+   std::string  str (" \t blah blah blah    \n ");
+
+   // trim leading whitespace
+   std::string::size_type  notwhite = str.find_first_not_of(" \t\n");
+   str.erase(0,notwhite);
+
+   // trim trailing whitespace
+   notwhite = str.find_last_not_of(" \t\n");
+   str.erase(notwhite+1); </PRE>
+      Obviously, the calls to <TT>find</TT> could be inserted directly
+      into the calls to <TT>erase</TT>, in case your compiler does not
+      optimize named temporaries out of existence.
+   </P>
+   <P>Return <A HREF="#top">to top of page</A> or
+      <A HREF="../faq/index.html">to the FAQ</A>.
+   </P>
+
+
+
+
+<!-- ####################################################### -->
+
+<HR>
+<P CLASS="fineprint"><EM>
+Comments and suggestions are welcome, and may be sent to
+<A HREF="mailto:pme@sources.redhat.com">Phil Edwards</A> or
+<A HREF="mailto:gdr@gcc.gnu.org">Gabriel Dos Reis</A>.
+<BR> $Id: howto.html,v 1.7 2000/12/03 23:47:47 jsm28 Exp $
+</EM></P>
+
+
+</BODY>
+</HTML>