C++ Tutorial (1)
Hello, World!
Let’s start with the traditional Hello, World! program, so that we can test our C++ compiler suite. The file tut01_01.cpp contains:
// tut01_01.cpp -- a simple C++ hello, world program.
#include <iostream>
int
main(int argc, char *argv[])
{
    std::cout << "Hello, World!" << std::endl;
}
|
With GCC, we can compile and run it manually like this:
% c++ -Wall -o tut01_01 tut01_01.cpp
% ./tut01_01
Hello, World!
|
Exercise: Try to compile and run tut01_01.cpp on your platform. Unless you’re using a Unix-like operating system, you may have to install a C++ compiler, and figure out how to call it. You can’t proceed with this tutorial unless you’re able to compile C++ programs on your own.
Interactive Hello, World!
Now, let’s get some user input. Copy tut01_01.cpp into tut01_02.cpp and change it like this:
// tut01_02.cpp -- a simple C++ hello, world program, with user input
#include <string>
#include <iostream>
int
main(int argc, char *argv[])
{
    std::string theName;
    std::cout << "Hi! What's your name? ";
    std::cin >> theName;
    std::cout << "Welcome, " << theName << "!" << std::endl;
}
|
Again,
compile and run:
% c++ -Wall -o tut01_02 tut01_02.cpp
% ./tut01_02
Hi! What's your name? John
Welcome, John!
|
But
what if John entered his whole name?
% ./tut01_02
Hi! What's your name? John Doe
Welcome, John!
|
As we can see, std::cin >> theName truncated the input, stopping at the first whitespace. Or, to be more precise, the input operator, applied to a std::string, skips leading whitespace and then reads characters up to the next whitespace. This is counter-intuitive and the first gotcha. For Perl or Python programmers, reading a whole line is a lot more natural. To do this, we’ll use the std::getline() function:
// tut01_03.cpp -- a simple C++ hello, world program, with user input
#include <string>
#include <iostream>
int
main(int argc, char *argv[])
{
    std::string theName;
    std::cout << "Hi! What's your name? ";
    std::getline(std::cin, theName); // read a whole line
    std::cout << "Welcome, " << theName << "!" << std::endl;
}
|
Let’s
test it:
% c++ -Wall -o tut01_03 tut01_03.cpp
% ./tut01_03
Hi! What's your name? John Doe
Welcome, John Doe!
|
Getting more data from the user
Now we care how old the user is (or pretends to be):
// tut01_04.cpp -- a simple C++ hello, world program, with user input
#include <string>
#include <iostream>
const unsigned short ADULT_AGE = 18;
int
main(int argc, char *argv[])
{
    std::string theName;
    unsigned short theAge;
    std::cout << "Hi! What's your name? ";
    std::getline(std::cin, theName); // read a whole line
    std::cout << "How old are you? ";
    std::cin >> theAge;
    std::cout << "Hey, " << theName << ", welcome to ";
    if (theAge < ADULT_AGE)
        std::cout << "Disneyland!\n";
    else
        std::cout << "XXX-land!\n";
}
|
Trying
it out:
% c++ -Wall -o tut01_04 tut01_04.cpp
% ./tut01_04
Hi! What's your name? John Doe
How old are you? 31
Hey, John Doe, welcome to XXX-land!
% ./tut01_04
Hi! What's your name? Mickey TooYoung
How old are you? 3
Hey, Mickey TooYoung, welcome to Disneyland!
|
An object-oriented version
Encapsulating theName and theAge into a single object would be neat. The file tut01_05.cpp starts with a definition of our new class TheUser:
// tut01_05.cpp -- a simple C++ hello, world program, class oriented.
#include <string>
#include <iostream>
const unsigned short ADULT_AGE = 18;
class TheUser
{
public:
    std::string theName_;
    unsigned short theAge_;
};
|
To use that class, we append the following main() function to the end of the source file tut01_05.cpp:
int
main(int argc, char *argv[])
{
    TheUser theUser;
    std::cout << "Hi! What's your name? ";
    std::getline(std::cin, theUser.theName_);
    std::cout << "How old are you? ";
    std::cin >> theUser.theAge_;
    std::cout << "Hey, " << theUser.theName_ << ", welcome to ";
    if (theUser.theAge_ < ADULT_AGE)
        std::cout << "Disneyland!\n";
    else
        std::cout << "XXX-land!\n";
}
|
We
instantiate an object theUser of type TheUser, and start using its
(public) members theName_ and theAge_.
Compiling
and running tut01_05.cpp yields:
% c++ -Wall -o tut01_05 tut01_05.cpp
% ./tut01_05
Hi! What's your name? John Doe
How old are you? 32
Hey, John Doe, welcome to XXX-land!
|
Instead of using a class and making its members public, we could also have used a struct in this particular case (the members of a struct are public by default), as in:
struct TheUser
{
std::string theName_;
unsigned short theAge_;
};
|
But the members shouldn’t be publicly accessible from the outside if we want to uphold data encapsulation principles. We can hide the members by making them private. Then we relax the hiding by making them indirectly accessible through accessor functions (which could perform additional checks, but won’t in our simple example):
// tut01_06.cpp -- a simple C++ hello, world program, class oriented
#include <string>
#include <iostream>
typedef unsigned short age_t;
const age_t ADULT_AGE = 18;
class TheUser
{
public:
    TheUser(std::string theName = "N/A", age_t theAge = 0) :
        theName_(theName), theAge_(theAge) { }
    const std::string getName() const { return theName_; }
    const age_t getAge() const { return theAge_; }
    void setName(const std::string &newName) { theName_ = newName; }
    void setAge(const age_t newAge) { theAge_ = newAge; }
private:
    std::string theName_;
    age_t theAge_;
};
|
Note that we’ve introduced a custom type age_t.
We can use this slightly modified class in the following main(). The code is similar to tut01_05.cpp, with the exception that we now access the data members indirectly through the accessors getName() and getAge().
int
main(int argc, char *argv[])
{
    std::string someName;
    age_t someAge;
    std::cout << "Hi! What's your name? ";
    std::getline(std::cin, someName);
    std::cout << "How old are you? ";
    std::cin >> someAge;
    TheUser theUser(someName, someAge);
    std::cout << "Hey, " << theUser.getName() << ", welcome to ";
    if (theUser.getAge() < ADULT_AGE)
        std::cout << "Disneyland!\n";
    else
        std::cout << "XXX-land!\n";
}
|
Compiling
and interacting with tut01_06 is the same as with tut01_05.
Input and output operators
It would be nice if we could input and output TheUser instances as if they were POD (plain old data) types (like int, short, etc.). That’s easy if we overload the operators like this:
// tut01_07.cpp -- a simple C++ hello, world program, iostream operators
#include <string>
#include <iostream>
#include <istream>
#include <ostream>
#include <iomanip>
typedef unsigned short age_t;
const age_t ADULT_AGE = 18;
class TheUser
{
public:
    TheUser(std::string theName = "N/A", age_t theAge = 0) :
        theName_(theName), theAge_(theAge) { }
    friend std::istream& operator>> (std::istream &in, TheUser &user);
    friend std::ostream& operator<< (std::ostream &out, const TheUser &user);
private:
    std::string theName_;
    age_t theAge_;
};
std::istream& operator>> (std::istream &in, TheUser &user)
{
    in >> user.theAge_ >> std::ws;
    std::getline(in, user.theName_);
    return in;
}
std::ostream& operator<< (std::ostream &out, const TheUser &user)
{
    return out << user.theAge_ << " " << user.theName_ << std::endl;
}
|
Operator operator>> operates on a std::istream (defined in <istream>), which it should return after use. Operator operator<< operates on a std::ostream (defined in <ostream>), which it should also return after use. The second argument to both operators is the instance to be read into, or written out. The best way to get used to those operators is to learn their signatures by heart.
Note that the manipulator std::ws (declared in <istream>) swallows whitespace.
Both operators are declared friends of TheUser, so that they can access TheUser’s private (and protected) data members theName_ and theAge_.
With those operators in place, we can now write an object with std::cout << theUser, and read an object with std::cin >> theUser, as if theUser was a POD type like int or age_t:
int
main(int argc, char *argv[])
{
    TheUser theUser;
    std::cout << "Uninitialized: " << theUser << std::endl;
    std::cout << "Please enter age [whitespace(s)] name: ";
    std::cin >> theUser;
    std::cout << "The user is: " << theUser;
}
|
See how
readable the code has become?
Compiling and interacting with this program:
% c++ -Wall -o tut01_07 tut01_07.cpp
% ./tut01_07
Uninitialized: 0 N/A

Please enter age [whitespace(s)] name: 32
John Doe
The user is: 32 John Doe
|
File I/O
What’s a program worth if it can’t persist its data somewhere? We’d like to save and load instances of TheUser to and from files, respectively, one instance per line. We start where we left off in tut01_07.cpp. Apart from the addition of the headers <fstream> and <cstdlib> and the member function bump_age(), everything remains the same:
// tut01_08.cpp -- a simple C++ hello, world program, file streams
#include <string>
#include <iostream>
#include <istream>
#include <ostream>
#include <iomanip>
#include <fstream>
#include <cstdlib>
typedef unsigned short age_t;
const age_t ADULT_AGE = 18;
class TheUser
{
public:
    TheUser(std::string theName = "N/A", age_t theAge = 0) :
        theName_(theName), theAge_(theAge) { }
    void bump_age(void) { ++theAge_; }
    friend std::istream& operator>> (std::istream &in, TheUser &user);
    friend std::ostream& operator<< (std::ostream &out, const TheUser &user);
private:
    std::string theName_;
    age_t theAge_;
};
std::istream& operator>> (std::istream &in, TheUser &user)
{
    in >> user.theAge_ >> std::ws;
    std::getline(in, user.theName_);
    return in;
}
std::ostream& operator<< (std::ostream &out, const TheUser &user)
{
    return out << user.theAge_ << " " << user.theName_ << std::endl;
}
|
Please note that the input and output operators still operate on std::istream and std::ostream. As such, they don’t know anything about file streams, and that will come in handy now. In the function main(), we make use of std::ifstream and std::ofstream (defined in <fstream>):
int
main(int argc, char *argv[])
{
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0]
                  << " infile.dat outfile.dat"
                  << std::endl;
        return EXIT_FAILURE;
    }
    std::ifstream ifs(argv[1]);
    std::ofstream ofs(argv[2]);
    TheUser aUser;
    while (ifs >> aUser) {
        aUser.bump_age();
        ofs << aUser;
    }
    ifs.close();
    ofs.close();
    return EXIT_SUCCESS;
}
|
Assuming we started with the file ifile.dat:
32 John Doe
8  Mickey TooYoung
|
compiling,
then running the program tut01_08.cpp yields:
% c++ -Wall -o tut01_08 tut01_08.cpp
% ./tut01_08 ifile.dat ofile.dat
% cat ofile.dat
33 John Doe
9 Mickey TooYoung
|
As you can see, a std::ifstream is a std::istream, and a std::ofstream is a std::ostream, so the input and output operators could be used unchanged.
From a practical point of view, what’s important here is that the output stream must be readable 1:1 back into an input stream. Should we decide to add additional fields (members) to TheUser, those fields should be placed in such a way as to preserve the property of reading back what we’ve written out. If necessary, some fields will need to be enclosed in some well-defined delimiters, as in:
32 "John Doe" "15. Yellow Drive"
|
Obviously, the delimiter must be escaped if necessary:
32 "John \"the weasel\" Doe" "15. Yellow Drive"
|
Writing
this out is easy. Modify the output operator accordingly. Reading it back in is
a lot more complicated. This is where other techniques can be leveraged, like
boost::serialization, XML parsers etc… (in another tutorial).
Containers
Creating a couple of TheUser instances is easy. Storing them in an appropriate data structure is easy too.
First, let’s modularize the data. The header tut01_09.h defines the class TheUser:
// tut01_09.h -- the class TheUser
#ifndef TUT01_09_H_INCLUDED
#define TUT01_09_H_INCLUDED
#include <string>
#include <istream>
#include <ostream>
typedef unsigned short age_t;
const age_t ADULT_AGE = 18;
class TheUser
{
public:
    TheUser(std::string theName = "N/A", age_t theAge = 0) :
        theName_(theName), theAge_(theAge) { }
    void bump_age(void) { ++theAge_; }
    friend std::istream& operator>> (std::istream &in, TheUser &user);
    friend std::ostream& operator<< (std::ostream &out, const TheUser &user);
private:
    std::string theName_;
    age_t theAge_;
};
#endif // TUT01_09_H_INCLUDED
|
The implementation of the input and output operators (and later possibly additional functions) goes into tut01_09.cpp:
// tut01_09.cpp -- the class TheUser
#include "tut01_09.h"
#include <iomanip>
std::istream& operator>> (std::istream &in, TheUser &user)
{
    in >> user.theAge_ >> std::ws;
    std::getline(in, user.theName_);
    return in;
}
std::ostream& operator<< (std::ostream &out, const TheUser &user)
{
    return out << user.theAge_ << " " << user.theName_ << std::endl;
}
|
tut01_09.cpp could be compiled separately into an object file tut01_09.o, but we won’t do that in this example.
Finally, main() goes into the driver program tut01_09a.cpp:
// tut01_09a.cpp -- driver for tut01_09.cpp
#include "tut01_09.h"
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <vector>
typedef std::vector<TheUser> theuser_vec_t;
int
main(int argc, char *argv[])
{
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0]
                  << " infile.dat outfile.dat"
                  << std::endl;
        return EXIT_FAILURE;
    }
    theuser_vec_t aVec;
    TheUser aUser;
    // Read TheUser instances into aVec
    std::ifstream ifs(argv[1]);
    while (ifs >> aUser) {
        aUser.bump_age();
        aVec.push_back(aUser);
    }
    ifs.close();
    // Write TheUser instances into file
    std::ofstream ofs(argv[2]);
    if (!aVec.empty())
    {
        typedef theuser_vec_t::const_iterator vec_iter_t;
        for (vec_iter_t i = aVec.begin(); i != aVec.end(); ++i)
            ofs << *i;
    }
    ofs.close();
    return EXIT_SUCCESS;
}
|
We used a std::vector container from the STL (Standard Template Library), instantiated with the data type TheUser. We read all instances of TheUser from a file (see previous example) into that container, incrementing the age of each instance as it is read. Then we write every object of this container back out into another file.
We compile this program like this:
% c++ -Wall -o tut01_09a tut01_09.cpp tut01_09a.cpp
% ./tut01_09a
Usage: ./tut01_09a infile.dat outfile.dat
|
Starting again with ifile.dat, we get the same output file ofile.dat as shown above:
% cat ifile.dat
32 John Doe
8  Mickey TooYoung
% ./tut01_09a ifile.dat ofile.dat
% cat ofile.dat
33 John Doe
9 Mickey TooYoung
|
Exercise: Notice how the two whitespace characters between 8 and Mickey TooYoung in ifile.dat collapsed into a single space between 9 and Mickey TooYoung in ofile.dat? Explain.
C++ Tutorial (2)
Introducing std::map
Associative arrays a.k.a. dictionaries are one of the most useful data structures. That’s why they are so popular among very high level programming languages like Python or Perl. In C++, the STL data type std::map is one possible implementation of dictionaries, though it’s not the only one available.
DECLARING A STD::MAP VARIABLE
We must first determine the data type of the keys and values, since std::map is templated. Let’s suppose that keys are std::string, and values are int. To declare a map variable aMap, we start by including the necessary headers:
#include <string>
#include <map>
|
Declaring
the map variable aMap is as simple as:
std::map<std::string, int> aMap;
|
While this is enough for most uses, it can become quite cumbersome to repeat it, e.g. in function declarations:
std::map<std::string, int>
merge_maps(const std::map<std::string, int> &someMap,
           const std::map<std::string, int> &someOtherMap);
|
To simplify such code, typedefs are quite useful:
typedef std::map<std::string, int> map_t;
map_t aMap;
map_t
merge_maps(const map_t &someMap, const map_t &someOtherMap);
|
Of course, we can also allocate a map variable dynamically. With the previously typedef’d map_t:
map_t *map_ptr = new map_t;
// ... do something with *map_ptr, then:
delete map_ptr;
|
ADDING, CHANGING, AND ERASING KEY/VALUE PAIRS
The main idea of maps is to add key/value pairs to them:
aMap["C++"] = 10;
aMap["Python"] = 8;
aMap["Perl"] = 6;
aMap["Scheme"] = 7;
aMap["Java"] = 3;
|
Changing a value is easy:
++aMap["Python"];              // is now 9
aMap["C++"] = aMap["C++"] + 10; // is now 20
|
We may want to remove a key/value pair, knowing its key:
aMap.erase("Scheme"); // remove "Scheme"/7 from map.
|
QUERYING A VALUE, KNOWING A KEY
Querying a map is easiest if we already know the key:
#include <iostream>
std::cout << "Ranking of C++: " << aMap["C++"] << std::endl;
std::cout << "Ranking of Python: " << aMap["Python"] << std::endl;
|
Gotcha! However, we can’t query if a key is in the map using the subscript notation, because writing something like aMap["Prolog"] automatically instantiates a new key/value pair with the key “Prolog” and the default value for the mapped type (here 0) if it doesn’t already exist:
int value = aMap["Prolog"]; // WRONG!
// now aMap["Prolog"] == 0, while it didn't exist before
|
The equivalent of Python’s dict.has_key function, which doesn’t gratuitously add a new key/value pair to the map, can be implemented like this in C++:
bool
has_key(const map_t &the_map, const map_t::key_type &the_key)
{
    map_t::const_iterator iter = the_map.find(the_key);
    return iter != the_map.end();
}
|
This brings us to std::map iterators, and how to traverse a map.
TRAVERSING A STD::MAP WITH ITERATORS
So what’s in aMap? If we don’t know which keys are in it, we can’t query it directly (think about it, it’s quite obvious). Unlike Python, we don’t have a function that returns a list of keys, so that we can iterate over it. But fortunately, C++’s std::map provides iterators that can be used like this:
typedef map_t::const_iterator iter_t;
for (iter_t i = aMap.begin(); i != aMap.end(); ++i)
{
    // i->first is the current key
    // i->second is the current value
    // *i is a std::pair<const std::string, int>
    std::cout << "key: " << i->first << std::endl;
    std::cout << "value: " << i->second << std::endl;
}
|
The difference between a map_t::const_iterator and a map_t::iterator is that we can’t change the mapped value with a const_iterator. Bumping all values requires an iterator:
for (map_t::iterator i = aMap.begin(); i != aMap.end(); ++i)
    ++(i->second);
|
So, the equivalent of the Python dict.keys method, which returns a list of the keys of a dictionary, can be implemented in C++ like this:
#include <list>
std::list<map_t::key_type>
keys(const map_t &the_map)
{
    std::list<map_t::key_type> theKeys;
    map_t::const_iterator iter;
    for (iter = the_map.begin(); iter != the_map.end(); ++iter)
        theKeys.push_back(iter->first);
    return theKeys;
}
|
This can be used like this:
#include <algorithm>
void
printme(const map_t::key_type &aKey)
{
    std::cout << "Key=" << aKey << std::endl;
}
int main(int argc, char *argv[])
{
    // ...
    std::list<map_t::key_type> allKeys;
    allKeys = keys(aMap);
    std::for_each(allKeys.begin(), allKeys.end(), printme);
    // ...
}
|
A practical example
With what we’ve learned so far, we can write a simple C++ program that counts how often words occur in a text:
// tut_02_01.cpp -- word count using std::map
#include <cstdlib>
#include <iostream>
#include <string>
#include <map>
int
main (int argc, char *argv[])
{
    typedef std::map<std::string, unsigned int> wc_t;
    wc_t wordCount;
    std::string aWord;
    while (std::cin >> aWord) {
        ++wordCount[aWord];
    }
    typedef wc_t::const_iterator wc_iter_t;
    for (wc_iter_t i = wordCount.begin(); i != wordCount.end(); ++i)
        std::cout << i->first << ": " << i->second << std::endl;
    return EXIT_SUCCESS;
}
|
If we
compile and run this program on the following file tut_02_01.txt:
a horse is a horse
of course of course
a test is a test
and is successful if passed
and crap if failed
|
we get
this output:
% c++ -Wall -o tut_02_01 tut_02_01.cpp
% ./tut_02_01 < tut_02_01.txt
a: 4
and: 2
course: 2
crap: 1
failed: 1
horse: 2
if: 2
is: 3
of: 2
passed: 1
successful: 1
test: 2
|
As you
can see, the keys are sorted alphabetically.
Not
yet covered so far
In
subsequent parts of this tutorial:
C++ Tutorial (3)
This is part 3 of a fast paced C++ tutorial for programmers familiar with high level languages like Perl and Python.
User-defined classes in maps
The maps in the previous tutorial contained only pairs of basic types like int or std::string. In this section, we’ll see how to store arbitrary objects of user-defined classes in maps (and other STL containers).
LIFETIME ISSUES
Before we dive right in, we need to talk a little about lifetime issues of C++ objects. In most interpreted languages, you don’t have to care about the lifetime of objects, because memory allocation and deallocation happen automagically. Let’s look at a typical example in Python:
#!/usr/bin/env python
# tut03_01.py -- objects in dictionaries
class Employee(object):
    "A typical employee."
    def __init__(self, name="N/A", salary=1000.0):
        self.name = name
        self.salary = salary
if __name__ == '__main__':
    e1 = Employee("e1", 2000.0)
    e2 = Employee("e2", 1800.0)
    print "e1:", e1
    print "e2:", e2
    aDict = {}
    aDict["e1"] = e1
    aDict["e2"] = e2
    print aDict
|
Running this program, we get:
% python tut03_01.py
e1: <__main__.Employee object at 0x800ec7750>
e2: <__main__.Employee object at 0x800ec79d0>
{'e1': <__main__.Employee object at 0x800ec7750>, 'e2': <__main__.Employee object at 0x800ec79d0>}
|
If you pay close attention to the addresses of e1 and e2, you’ll notice that the values in the dictionary point to the very same objects (which is fine). This way, you can change e1 via the dictionary like this:
aDict["e1"].salary += 100
|
If we now print e1.salary, we get 2100.0: we’ve just modified e1.
What happens under the hood is that Python stores Employee objects in an internal data structure. Each time we assign such an object to a variable (like e1), a reference to that internal data structure (in effect, a smart pointer) is stored into that variable. Ditto for dictionaries and other containers. A reference count is maintained for each object, so that the internal data structure will only be destroyed when all references that point to it have been destroyed. In Python (and most other high level languages like Ruby, Perl, PHP, Java, …), variables never contain the objects themselves, but pointers to them!
Not so in C++! If we store an object in a variable, or in an STL container like std::map, std::vector, std::list etc., a copy of that object is made, and that copy will ultimately be stored.
To see this in action, let’s reimplement Employee in C++:
// tut03_01.h -- a sample Employee class
#ifndef TUT03_01_H
#define TUT03_01_H
#include <string>
#include <istream>
#include <ostream>
#include <iostream>
const float DEFAULT_SALARY = 1000.0;
class Employee
{
public:
    Employee(): name_("N/A"), salary_(DEFAULT_SALARY) {
        std::cerr << "Employee() called"
                  << " -- this=" << this << ". [default ctor]" << std::endl;
    }
    Employee(const std::string name, const float salary)
        : name_(name), salary_(salary) {
        std::cerr << "Employee(" << name << "," << salary << ") called"
                  << " -- this=" << this << ". [normal ctor]" << std::endl;
    }
    Employee(const Employee &copy) {
        name_ = copy.name_;
        salary_ = copy.salary_;
        std::cerr << "Employee(" << &copy << ") called"
                  << " -- this=" << this << ". [copy ctor]" << std::endl;
    }
    ~Employee() {
        std::cerr << "~Employee([" << this << "]) called" << std::endl;
    }
    Employee& operator=(const Employee &rhs) {
        name_ = rhs.name_;
        salary_ = rhs.salary_;
        std::cerr << "operator=(" << &rhs << ") called"
                  << " -- this=" << this << ". [assign operator]" << std::endl;
        return *this;
    }
    std::string getName() const { return name_; }
    float getSalary() const { return salary_; }
    void setName(const std::string newName) { name_ = newName; }
    void setSalary(const float newSalary) { salary_ = newSalary; }
    friend std::ostream& operator<<(std::ostream &ostr, const Employee &emp);
    friend std::istream& operator>>(std::istream &istr, Employee &emp);
private:
    std::string name_;
    float salary_;
};
std::ostream &operator<<(std::ostream &ostr, const Employee &emp)
{
    return ostr << "Employee(" << emp.name_ << ","
                << emp.salary_ << "), this=" << &emp;
}
std::istream &operator>>(std::istream &istr, Employee &emp)
{
    // We first read the salary, then the name!
    istr >> emp.salary_;
    std::getline(istr, emp.name_);
    return istr;
}
#endif // TUT03_01_H
|
We’ll see shortly what all those constructors, operator= and member functions are for.
A simple test program? Okay, here we go:
// tut03_01.cpp -- testing the Employee class
#include "tut03_01.h"
const char *separator =
    "---------------------------------------------------";
int
main (int argc, char *argv[])
{
    // Calling the default constructor Employee()
    std::cout << "CALLING Employee e0;" << std::endl;
    Employee e0;
    std::cout << "e0 is " << e0 << std::endl << separator << std::endl;
    // Calling the constructor Employee(std::string, float)
    std::cout << "CALLING Employee e1(\"e1\", 1200.0);" << std::endl;
    Employee e1("e1", 1200.0);
    std::cout << "e1 is " << e1 << std::endl << separator << std::endl;
    // Calling the copy constructor
    std::cout << "CALLING Employee e2(e1); // and changing name to e2"
              << std::endl;
    Employee e2(e1);
    e2.setName("e2");
    std::cout << "e2 is now " << e2 << std::endl << separator << std::endl;
    // Testing the assignment operator by calling e2 = e0;
    std::cout << "CALLING e2 = e0;" << std::endl;
    e2 = e0;
    std::cout << "e2 is now " << e2 << std::endl << separator << std::endl;
    // Testing the dynamic case (new)
    std::cout << "CALLING Employee *empPtr = new Employee(\"e_new\", 8000.0);"
              << std::endl;
    Employee *empPtr = new Employee("e_new", 8000);
    std::cout << "*empPtr is " << *empPtr << std::endl << separator << std::endl;
    // Explicitly deleting empPtr:
    std::cout << "CALLING delete empPtr;" << std::endl;
    delete empPtr;
    // Here, e2, e1, e0 will be destroyed.
}
|
So let’s see what happens when we run this program:
% c++ -o tut03_01 tut03_01.cpp
% ./tut03_01
CALLING Employee e0;
Employee() called -- this=0x7fffffffea00. [default ctor]
e0 is Employee(N/A,1000), this=0x7fffffffea00
---------------------------------------------------
CALLING Employee e1("e1", 1200.0);
Employee(e1,1200) called -- this=0x7fffffffe9f0. [normal ctor]
e1 is Employee(e1,1200), this=0x7fffffffe9f0
---------------------------------------------------
CALLING Employee e2(e1); // and changing name to e2
Employee(0x7fffffffe9f0) called -- this=0x7fffffffe9e0. [copy ctor]
e2 is now Employee(e2,1200), this=0x7fffffffe9e0
---------------------------------------------------
CALLING e2 = e0;
operator=(0x7fffffffea00) called -- this=0x7fffffffe9e0. [assign operator]
e2 is now Employee(N/A,1000), this=0x7fffffffe9e0
---------------------------------------------------
CALLING Employee *empPtr = new Employee("e_new", 8000.0);
Employee(e_new,8000) called -- this=0x800d04040. [normal ctor]
*empPtr is Employee(e_new,8000), this=0x800d04040
---------------------------------------------------
CALLING delete empPtr;
~Employee([0x800d04040]) called
~Employee([0x7fffffffe9e0]) called
~Employee([0x7fffffffe9f0]) called
~Employee([0x7fffffffea00]) called
|
Now pay close attention to the addresses of the objects.
So what do we see here?
- The first Employee has been created by the default constructor, and stored in the variable e0.
- Providing explicit parameters for the name and the salary invokes a different constructor, i.e. the one with the signature Employee(const std::string name, const float salary). Again, an Employee object is created, and stored in the variable e1.
- When we call Employee e2(e1), we are in fact requesting that a copy of the object stored in e1 be made, and that copy be stored into the variable e2. C++ calls the copy constructor Employee(const Employee &copy), which creates a new object and stores it into e2.
- Finally, the expression e2 = e0 means that we effectively want the state of e0 to be copied into the object stored in e2. Important: Note that e2 must already exist! This calls the assignment operator operator=, which in turn performs the necessary state-copying.
Had we not explicitly defined the copy constructor and assignment operator, the compiler would have defined them for us. In this case, the state of the source object would have been copied over to the target object member by member (each member being copied with its own copy constructor or assignment operator) by the compiler-generated code (which we don’t see in the source code files). In our particular case, this would have been fine. However, it is not always a good idea to rely on compiler-generated functions: what if the state contained pointers to memory allocated with new? Only the pointers themselves would have been copied, so we would have gotten shallow copy semantics, and a lot of head scratching as to who would be responsible for delete-ing that memory! That’s why we sometimes need explicit copy constructors and assignment operators.
There’s nothing interesting to say about the last test case with new and delete. It behaves as expected, right?
STORING EMPLOYEE OBJECTS IN STL CONTAINERS
The most important lesson to learn is that STL containers store copies of objects. This is different from Python, whose containers always store references (smart pointers) to objects.
To illustrate this STL behavior, the following test program:
// tut03_01a.cpp -- store an Employee object in a std::list
#include "tut03_01.h"
#include <list>
#include <cstdlib>
typedef std::list<Employee> list_t;
int
main (int argc, char *argv[])
{
    Employee e1("e1", 1000);
    list_t aList;
    aList.push_back(e1);
    return EXIT_SUCCESS;
}
|
would print:
% c++ -o tut03_01a tut03_01a.cpp
% ./tut03_01a
Employee(e1,1000) called -- this=0x7fffffffea30. [normal ctor]
Employee(0x7fffffffea30) called -- this=0x800d02070. [copy ctor]
~Employee([0x800d02070]) called
~Employee([0x7fffffffea30]) called
|
So, what’s going on here? Look at the addresses:
- The first object created at the address 0x7fffffffea30 was stored in e1.
- The STL implementation of std::list::push_back() effectively called Employee’s copy constructor to generate a copy of e1. This new copy with the (different) address 0x800d02070 has been stored in the list.
- Because aList was created after e1, it will be destroyed before e1 when the program is about to end. When the destructor of aList is called, it will walk through all objects stored in the list, calling their destructors in turn. That’s why the destructor of the copied object at 0x800d02070 will be called next.
- Even though we can’t see it, aList will be destroyed at this point, after it has destroyed all its elements.
- Last but not least, e1 will be destroyed, i.e. the destructor for the original object at 0x7fffffffea30 will get called.
As we can see, std::list stores copies of objects, and assumes responsibility for calling their destructors when necessary.
Let’s repeat this test, using a std::map instead of a std::list:
// tut03_01b.cpp -- store an Employee object in a std::map
#include "tut03_01.h"
#include <string>
#include <map>
#include <cstdlib>
typedef std::map<std::string, Employee> map_t;
int
main (int argc, char *argv[])
{
    Employee e1("e1", 1000);
    map_t aMap;
    aMap["e1"] = e1;
    return EXIT_SUCCESS;
}
|
What would happen here? We already expect std::map to create a copy of e1:
% c++ -o tut03_01b tut03_01b.cpp
% ./tut03_01b
Employee(e1,1000) called -- this=0x7fffffffea20. [normal ctor]
Employee() called -- this=0x7fffffffe9a0. [default ctor]
Employee(0x7fffffffe9a0) called -- this=0x7fffffffe988. [copy ctor]
Employee(0x7fffffffe988) called -- this=0x800d03068. [copy ctor]
~Employee([0x7fffffffe988]) called
~Employee([0x7fffffffe9a0]) called
operator=(0x7fffffffea20) called -- this=0x800d03068. [assign operator]
~Employee([0x800d03068]) called
~Employee([0x7fffffffea20]) called
|
Whoa, hold on! What’s going on here? That’s a lot more complicated than with std::list!
If we follow the addresses, this is what we gather from all this (in pseudo code):
// This is what really happens (Pseudo-Code):
Employee e1("e1", 1000); // 0x7fffffffea20
Employee e2;             // 0x7fffffffe9a0
Employee e3(e2);         // 0x7fffffffe988
Employee e4(e3);         // 0x800d03068
~Employee(e3);
~Employee(e2);
e4 = e1;
~Employee(e4);
~Employee(e1);
|
So,
basically, std::map creates a copy of our original object e1 and stores it (in our pseudo-code it
is marked as e4) in the
map.
However,
this copy isn’t made directly. Instead, this particular implementation of std::map creates a few intermediary ephemeral
objects (e2 and e3), which
get destroyed almost immediately again. Why it does this can’t be guessed
without looking at the source code of std::map.operator[].
What’s
important for us here right now is not that ephemeral objects get created, but
the relative order in which the copy (e4) and the original (e1)
are being destroyed: just like in the previous std::list example, since aMap is being destroyed before e1, the destructor of aMap takes care of calling the destructors
of all stored objects (copies), therefore destroying e4 first. Then, after aMap has been destroyed, e1 is being destroyed in turn.
By the way, had we not implemented the default constructor Employee() in the class Employee, this particular implementation of std::map would have thrown a compile error at us, looking somewhat like this:
% c++ -o tut03_01b tut03_01b.cpp
/usr/include/c++/4.2/bits/stl_map.h: In member function
  '_Tp& std::map<_Key, _Tp, _Compare, _Alloc>::operator[](const _Key&)
  [with _Key = std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
        _Tp = Employee,
        _Compare = std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >,
        _Alloc = std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, Employee> >]':
tut03_01b.cpp:16:   instantiated from here
/usr/include/c++/4.2/bits/stl_map.h:350: error: no matching function for call to 'Employee::Employee()'
tut03_01.h:24: note: candidates are: Employee::Employee(const Employee&)
tut03_01.h:14: note:                 Employee::Employee(std::string, float)
|
What does this cryptic message tell us? std::map::operator[] needs a default constructor Employee(), and neither our constructor Employee(const std::string name, const float salary) nor our copy constructor Employee(const Employee &copy) is able to fulfill the role of a default constructor.
So how can we solve the problem? We can:
- either add an Employee() default constructor to the class Employee, as we already did,
- or change Employee(const std::string, const float) to a constructor that accepts default parameters, as in Employee(const std::string name = "N/A", const float salary = 1000.0).
STORING POINTERS
TO EMPLOYEES IN STL CONTAINERS
We’ve
just seen that storing Employee objects in a std::map container incurs not only additional
and gratuitous copies in the form of ephemeral and target objects, it also
means per-value / copy semantics: changing an object in the STL container
doesn’t affect the original object, but only the copy in the container.
If we
wanted to mimic Python’s dictionary (and variable) semantics, we could store
pointers to Employee objects right into the std::map, instead of storing
the whole Employee objects.
A naive
first try:
// tut03_01c.cpp -- store pointers to Employee objects in a std::map
#include "tut03_01.h"
#include <string>
#include <map>
#include <cstdlib>

typedef std::map<std::string, Employee *> map_t;

int
main (int argc, char *argv[])
{
    Employee e1("e1", 1000);
    map_t aMap;
    aMap["e1"] = &e1;
    return EXIT_SUCCESS;
}
|
seems
to be running just fine:
% c++ -o tut03_01c tut03_01c.cpp
% ./tut03_01c
Employee(e1,1000) called -- this=0x7fffffffea20. [normal ctor]
~Employee([0x7fffffffea20]) called
|
Looks
good: no more gratuitous copies nor ephemeral Employee objects. But beware, this isn’t as
safe as we might imagine! How about the following code?
// tut03_01d.cpp -- store pointers to Employee objects in a std::map
#include "tut03_01.h"
#include <string>
#include <map>
#include <iostream>
#include <cstdlib>

typedef std::map<std::string, Employee *> map_t;

void
hire_at_minimum_wages(const std::string name, map_t &payroll)
{
    Employee aSlave(name, 400.0);
    payroll[name] = &aSlave;    // stores the address of a local variable!
}

int
main (int argc, char *argv[])
{
    map_t aMap;
    hire_at_minimum_wages("Hungry Programmer", aMap);
    hire_at_minimum_wages("Uncle Tom", aMap);
    for (map_t::const_iterator it = aMap.begin(); it != aMap.end(); ++it)
        std::cout << "Employee(" << it->first << "), "
                  << it->second->getName() << ", $"
                  << it->second->getSalary() << std::endl;
    return EXIT_SUCCESS;
}
|
Running it crashes:
% c++ -o tut03_01d tut03_01d.cpp
% ./tut03_01d
Employee(Hungry Programmer,400) called -- this=0x7fffffffe940. [normal ctor]
~Employee([0x7fffffffe940]) called
Employee(Uncle Tom,400) called -- this=0x7fffffffe940. [normal ctor]
~Employee([0x7fffffffe940]) called
Bus error (core dumped)
|
If you’re a C or C++ programmer, you’ve probably already spotted the obvious error: aMap stores stale pointers to Employee objects that have already been destroyed (when the local variable aSlave in hire_at_minimum_wages() went out of scope)! The moment we tried to access this memory, even in a read-only manner, we entered the realm of scary undefined behavior.
Obviously, saving a pointer to an automatic (stack) object like aSlave into a long-lived container isn’t such a bright idea!
An alternative is to modify hire_at_minimum_wages() in such a way that it instantiates dynamic (heap) objects with new:
// tut03_01e.cpp -- store pointers to Employee objects in a std::map
#include "tut03_01.h"
#include <string>
#include <map>
#include <iostream>
#include <cstdlib>

typedef std::map<std::string, Employee *> map_t;

void
hire_at_minimum_wages(const std::string name, map_t &payroll)
{
    Employee *aSlavePtr = new Employee(name, 400.0);
    payroll[name] = aSlavePtr;
}

int
main (int argc, char *argv[])
{
    map_t aMap;
    hire_at_minimum_wages("Hungry Programmer", aMap);
    hire_at_minimum_wages("Uncle Tom", aMap);
    std::cout << "Our slaves:" << std::endl;
    for (map_t::const_iterator it = aMap.begin(); it != aMap.end(); ++it)
        std::cout << " Employee(" << it->first << "), "
                  << it->second->getName() << ", $"
                  << it->second->getSalary() << std::endl;
    return EXIT_SUCCESS;
}
|
So
let’s try it:
% c++ -o tut03_01e tut03_01e.cpp
% ./tut03_01e
Employee(Hungry Programmer,400) called -- this=0x800d03040. [normal ctor]
Employee(Uncle Tom,400) called -- this=0x800d03050. [normal ctor]
Our slaves:
 Employee(Hungry Programmer), Hungry Programmer, $400
 Employee(Uncle Tom), Uncle Tom, $400
|
No more core dumps, but this time, we’ve got another problem. Can you guess which one? It’s right there to see, even if it is invisible! Hint: where have the calls to ~Employee() gone?
The problem here is that we’ve got a big fat memory leak: now that aMap doesn’t store Employee objects anymore, but pointers to Employee, aMap’s destructor will not call ~Employee(). In fact, aMap’s destructor destroys the stored pointers, but since raw pointers don’t have destructors, nothing happens to the objects they point to. And since delete never gets called on the new-ed objects, they leak.
SMART POINTERS TO
THE RESCUE
The
previous issue was caused by the fact that raw pointers don’t have destructors,
and therefore, the destructor of the container wasn’t able to clean up
dynamically allocated memory.
Instead of storing raw pointers in an STL container, we could have stored a custom object that mimics a raw pointer. This object would have to keep track of the number of references pointing to some data, and its destructor would deallocate the object (by calling delete on it) when the last reference pointing to it is about to disappear. Do you recognize Python’s memory allocation scheme here?
Fortunately, we don’t have to implement such a beast, because it already exists. It was not part of the original C++ standard, but of its TR1 addendum (since C++11, it is also available as std::shared_ptr in <memory>):
// tut03_01f.cpp -- store pointers to Employee objects in a std::map
#include "tut03_01.h"
#include <string>
#include <map>
#include <iostream>
#include <cstdlib>
#include <tr1/memory>
// If <tr1/memory> is not available for your compiler,
//   #include <boost/shared_ptr.hpp>
// and use boost::shared_ptr instead of std::tr1::shared_ptr.
// Don't forget to add -I/path/to/boost/headers when compiling.
// Boost headers and library are available at http://www.boost.org/

typedef std::tr1::shared_ptr<Employee> emp_ptr_t;
typedef std::map<std::string, emp_ptr_t> map_t;

void
hire_at_minimum_wages(const std::string name, map_t &payroll)
{
    emp_ptr_t aSlavePtr(new Employee(name, 400.0));
    payroll[name] = aSlavePtr;
}

int
main (int argc, char *argv[])
{
    map_t aMap;
    hire_at_minimum_wages("Hungry Programmer", aMap);
    hire_at_minimum_wages("Uncle Tom", aMap);
    std::cout << "Our slaves:" << std::endl;
    for (map_t::const_iterator it = aMap.begin(); it != aMap.end(); ++it)
        std::cout << " Employee(" << it->first << "), "
                  << it->second->getName() << ", $"
                  << it->second->getSalary() << std::endl;
    return EXIT_SUCCESS;
}
|
As shown in the comments, if your compiler doesn’t support TR1 headers yet, use Boost’s boost::shared_ptr template instead, and add something like -I/usr/local/include when compiling.
So
let’s try it out, one last time:
% c++ -o tut03_01f tut03_01f.cpp
% ./tut03_01f
Employee(Hungry Programmer,400) called -- this=0x800d03040. [normal ctor]
Employee(Uncle Tom,400) called -- this=0x800d03050. [normal ctor]
Our slaves:
 Employee(Hungry Programmer), Hungry Programmer, $400
 Employee(Uncle Tom), Uncle Tom, $400
~Employee([0x800d03050]) called
~Employee([0x800d03040]) called
|
Ain’t
that sweet?
Summary
STL
containers store copies of objects, and assume ownership of
them; i.e. their destructor takes care of calling the destructors of all stored
objects when necessary.
This
copying behavior can incur an additional overhead w.r.t. CPU cycles and memory
used to create ephemeral temporary objects. Moreover, those copy semantics are
not the same as Python’s, Perl’s and other interpreted languages’ you may be
used to.
Instead of storing objects into STL containers, one could also store just raw pointers to objects. But then, the container is no longer responsible for those objects. Extra care must be taken not to store pointers to automatic (stack based) objects, and pointers to objects allocated with new will leak memory unless they are deleted explicitly.
When storing pointers to objects in STL containers, it is better to avoid raw pointers, and use a std::tr1::shared_ptr instead. If your compiler doesn’t implement the TR1 headers yet, you can use the corresponding boost::shared_ptr template from the Boost Libraries (the Boost documentation describes best practices).
C++
Tutorial (4)
This is
part 4 of a fast paced C++ tutorial for programmers familiar with high level
languages like Perl and Python.
Copying
files with Standard I/O Streams
After having familiarized ourselves
with std::map
in the previous
tutorial, it’s time to take a closer look at the I/O Streams
Library. So, in this tutorial, we’ll be copying text and binary files “the C++
way.”
Copying files isn’t exactly an interesting task, especially since we could run external utilities like cp(1) with the system(3) library call. To avoid the overhead of spawning an external process, we could also copy files the plain old C way, e.g. chunkwise using fread(3) and fwrite(3) from <cstdio>. However, the purpose of this tutorial is to learn C++, so let’s look at how to copy files using I/O streams from the STL.
COPYING
LINE-ORIENTED TEXT FILES
If the
file is a collection of lines (i.e. not a binary file), we could copy the file
line-wise:
// copy1.cpp -- copying of files, line structure.
#include <string>
#include <fstream>
#include <cstdlib>

int
main (int argc, char *argv[])
{
    std::ifstream ifs(argv[1]);
    std::ofstream ofs(argv[2]);
    std::string aLine;
    while (std::getline(ifs, aLine))
        ofs << aLine << std::endl;
    ifs.close();
    ofs.close();
    return EXIT_SUCCESS;
}
|
An input file is represented by the input stream ifs of type std::ifstream, and an output file is represented by the output stream ofs of type std::ofstream:
std::ifstream ifs(argv[1]);
std::ofstream ofs(argv[2]);
|
Those input and output streams can be used just like std::cin and std::cout.
Lines are read in from the input stream with the std::getline function. The destination of the read is aLine, a std::string:
std::string aLine;
while (std::getline(ifs, aLine))
    ofs << aLine << std::endl;
|
We use a std::string instead of an old style buffer, because std::string automatically adapts its length to the size of the input, so we don’t have to worry about buffer overflows.
Since std::getline strips the end-of-line character from its input, we need to add it again in the output (we use the std::endl manipulator for that, though it would have been more efficient to simply append "\n" and not flush the output stream after every line).
After we’re done with the files, we can close them explicitly:
ifs.close();
ofs.close();
|
Needless
to say: this program is only for line-oriented files.
COPYING TEXT FILES
USING A BUFFER
The previous program had an important property: it used a streamlined data flow. As soon as a line (or a chunk) was read in, it was written to the output. That program’s memory footprint was very small.
Alternatively, we could have slurped the whole file into memory (e.g. into a std::vector of std::strings), and then written the output:
// copy2.cpp -- copying of files, lines structures, via container
#include <string>
#include <vector>
#include <fstream>
#include <cstdlib>

int
main (int argc, char *argv[])
{
    typedef std::vector<std::string> vec_t;
    vec_t theLines;
    std::string aLine;
    std::ifstream ifs(argv[1]);
    while (std::getline(ifs, aLine))
        theLines.push_back(aLine);
    ifs.close();
    std::ofstream ofs(argv[2]);
    typedef vec_t::const_iterator iter_t;
    for (iter_t i = theLines.begin(); i != theLines.end(); ++i)
        ofs << *i << std::endl;
    ofs.close();
    return EXIT_SUCCESS;
}
|
As before, we used a std::ifstream and std::ofstream to represent input and output files; and of course, we’re again reading in the data line-wise with the std::getline function.
What’s new here is the data structure theLines. This is our vector of strings. Note that we defined the data type vec_t like this:
typedef std::vector<std::string> vec_t;
|
so that we can later on define a constant iterator out of it:
typedef vec_t::const_iterator iter_t;
|
We used the vec_t::push_back method of theLines to append the (stripped) lines to the end of the vector in the while loop. In the output for loop, we let an iterator i traverse the vector from begin to end. Of course, we don’t want to output the iterator i but what i points to, i.e. we dereference i as in *i.
This program isn’t as good as the
previous one, because it needs to store the whole file into memory (i.e. into theLines). This is okay for small files, but copying
very large files (e.g. many GBs large) is sure to exhaust the virtual memory of
the process.
The lesson to remember here: always
use a streamlined data flow if you can!
LATHER, RINSE,
REPEAT… BUT WITH ALGORITHMS
The
code of the previous program wasn’t very elegant. Some idioms could have been
written in a more concise way. Look at this variation:
// copy3.cpp -- copying of files, lines structures, via containers and algs.
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include <fstream>
#include <cstdlib>

int
main (int argc, char *argv[])
{
    typedef std::vector<std::string> vec_t;
    vec_t theLines;
    std::string aLine;
    std::ifstream ifs(argv[1]);
    std::copy(std::istream_iterator<std::string>(ifs),
              std::istream_iterator<std::string>(),
              std::back_inserter(theLines));
    ifs.close();
    std::ofstream ofs(argv[2]);
    std::copy(theLines.begin(),
              theLines.end(),
              std::ostream_iterator<std::string>(ofs, "\n"));
    ofs.close();
    return EXIT_SUCCESS;
}
|
The while and for loops are gone and have been replaced by calls to the generic std::copy algorithm (from <algorithm>). This algorithm has the following signature:
std::copy(input_iterator_begin, input_iterator_end, output_iterator);
|
and can be applied to nearly every structure that provides the necessary iterator semantics. For example, to replace the while loop with an idiomatic std::copy call, we need:
- an input iterator pointing to the beginning of the input sequence. Since we want to read from an input stream, we need a std::istream_iterator adaptor, parameterized for std::string and using ifs as input stream: std::istream_iterator<std::string>(ifs). Beware: this adaptor extracts with operator>>, so it delivers whitespace-separated words rather than whole lines, and copy3.cpp consequently writes one word per output line,
- an input iterator pointing one past the end of the input sequence. Here we need a special notation / convention: the iterator adaptor std::istream_iterator<std::string>() without parameters represents such an end iterator,
- an output iterator that, when assigned to, will automatically call the function push_back on the data structure passed to it (so it will fill the vector). We could write such an iterator manually, but why bother, if we can use a prefabricated iterator from <iterator>: std::back_inserter, parameterized with the name of the needed target data structure? std::back_inserter(theLines)
This
results in the following idiomatic code:
std::copy(std::istream_iterator<std::string>(ifs),
std::istream_iterator<std::string>(),
std::back_inserter(theLines));
|
To output the vector theLines with std::copy we need:
- an input iterator pointing to the beginning of the vector: theLines.begin()
- an input iterator pointing one past the end of the vector: theLines.end()
- an output iterator that, when assigned to, will send the data it gets to an output stream. Again, we could write such an output iterator by hand, but it’s much more convenient to use an iterator adaptor from <iterator>. More precisely, we get such an output iterator with std::ostream_iterator, passing the data type std::string as template parameter, and the desired output stream and separator string as parameters: std::ostream_iterator<std::string>(ofs, "\n").
This
results in the following very idiomatic code:
std::copy(theLines.begin(),
          theLines.end(),
          std::ostream_iterator<std::string>(ofs, "\n"));
|
Obviously,
even though it is more readable than the previous example, this program isn’t
streamlined, because it buffers its whole input.
COPYING BINARY
FILES WITH STREAMBUF ITERATORS
All previous examples were about copying lines. To copy a binary file, we need to read and write bytes or chunks (buffers of bytes) directly. One way to do this is to call istream::get or istream::read to fetch data, and ostream::put or ostream::write to save it. You may want to try it. Have a look at the headers <istream> and <ostream> for the signatures.
A different, much more idiomatic
approach is to use std::copy
again, like this:
// copy4.cpp -- copying of files, via streambufs, iters and algs.
#include <algorithm>
#include <iterator>
#include <fstream>
#include <cstdlib>

int
main (int argc, char *argv[])
{
    // binary mode matters on platforms that translate line endings
    std::ifstream ifs(argv[1], std::ios::binary);
    std::ofstream ofs(argv[2], std::ios::binary);
    std::copy(std::istreambuf_iterator<char>(ifs),
              std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(ofs));
    ifs.close();
    ofs.close();
    return EXIT_SUCCESS;
}
|
istreambuf_iterator and ostreambuf_iterator are iterator adaptors that operate directly on the underlying streambuf of the respective streams. You may find details about them in <iterator> or in a header that is #included by that (e.g. with gcc-4.2, it is in /usr/include/c++/4.2/bits/streambuf_iterator.h on my system).
Summary
There
are many ways to copy files using the C++ I/O Streams library. Text files can
be copied line-by-line, while binary files need to be copied byte- or
chunkwise.
When copying
files, we should strive to streamline the data flow — i.e. not to buffer the
whole input file in memory — because large files can easily overflow the
available amount of virtual memory.
Instead of using loops, you can use the more idiomatic std::copy algorithm with appropriate iterators. Iterators that operate on streams can be obtained with std::istream_iterator and std::ostream_iterator.
Bypassing the formatting that the stream imposes is possible too with the std::istreambuf_iterator and std::ostreambuf_iterator iterator adaptors, which operate directly on the underlying streambuf, and are thus more efficient.
In the next
tutorial, we’ll use an external library (POCO) to Base64 encode and
-decode files and strings.
C++
Tutorial (5)
This is
part 5 of a fast paced C++ tutorial for programmers familiar with high level
languages like Perl and Python.
Beyond
the C++ STL
In the previous C++ tutorials, all examples were restricted to the C++ Standard Template Library (STL), which is part of every ANSI C++ compliant compiler environment, and which we can take for granted.
Unfortunately,
the STL doesn’t include classes for many popular areas, like:
- Networking
- Crypto
- Databases
- XML
- GUI Frameworks
This is intentional: because C++ is largely a superset of C, C++ programmers can just as easily call external C or C++ libraries (e.g. the Berkeley Sockets API for networking, OpenSSL for Crypto, C bindings for SQLite3, PostgreSQL, MySQL, … for database connectivity, SAX and DOM for XML parsing, and various C++ GUI frameworks like Qt, wxWidgets, and so on).
C++ designers
didn’t want to impose a default standard for all those areas of application:
C++ and the STL’s philosophy is distinctly different from Java’s which includes
and therefore standardizes a lot of different APIs.
So, as
C++ programmers, we’re confronted with a series of choices regarding external
libraries. Which library is best suited for networking? For database
connectivity?…
As firm
believers in Open Source Software (OSS), we eliminate closed-source and
proprietary libraries right from the start (feel free to use one, if need be).
Furthermore, we eliminate libraries that are not portable across platforms: it
just doesn’t make sense to develop against a Windows-only API if you want to
port your application to Linux later, or vice-versa, right?
There
are many cross-platform OSS C++ libraries out there, some of them highly
specialized, others broad in scope and size. The following “generalist”
libraries are interesting from the point of view of a general application
developer:
- Boost: a
collection of C++ libraries designed by many members of the C++ standards
committee with the intent to include the best of them in revised versions
of the C++ Standard.
- POCO: a set of
portable C++ components that aims to close many gaps left open by the STL.
- Qt: A powerful,
cross-platform framework of C++ classes for GUI development.
In all
cases, before using an external library, it is necessary to download, compile
and install it both on the development and on the target machine. In this
tutorial, we’ll explore a couple of classes from the POCO library, so if it
isn’t already installed on your system, you’ll need to fetch it from its web
site, and install it.
BASE64 ENCODING
AND DECODING FILES WITH POCO
To transmit files over a channel that is not 8-bit clean (e.g. UUCP, old SMTP, NNTP etc…), it is necessary to encode binary files in such a way that only a limited set of characters is used. A long time ago, people used to uuencode(1) and uudecode(1) such files, but today, we would Base64-encode and -decode them.
On some systems (like FreeBSD), we
can use the utilities b64encode and b64decode, that are
already part of the system, to achieve the job. But on most other systems, we
need to roll our own Base64 encoders and decoders.
Fortunately, POCO provides the
classes Poco::Base64Encoder
and Poco::Base64Decoder
to do the job.
Base-64 encoding a file: b64encode.cpp
// b64encode.cpp -- Base64 encode a file with Poco::Base64Encoder
#include <algorithm>
#include <iterator>
#include <fstream>
#include <cstdlib>
#include "Poco/Base64Encoder.h"

int
main (int argc, char *argv[])
{
    std::ifstream ifs(argv[1], std::ios::binary); // the input may be binary
    std::ofstream ofs(argv[2]);
    Poco::Base64Encoder b64out(ofs);
    std::copy(std::istreambuf_iterator<char>(ifs),
              std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(b64out));
    b64out.close(); // always call this at the end!
    return EXIT_SUCCESS;
}
|
This is
how to compile this program:
% c++ -O2 -I/usr/local/include -Wall -c -o b64encode.o b64encode.cpp
% c++ -L/usr/local/lib b64encode.o -lPocoFoundation -o b64encode
|
The program is not really that much
different from copy4.cpp of the previous
tutorial, which was, in a nutshell:
std::ifstream ifs(argv[1]);
std::ofstream ofs(argv[2]);
std::copy(std::istreambuf_iterator<char>(ifs),
std::istreambuf_iterator<char>(),
std::ostreambuf_iterator<char>(ofs));
ifs.close();
ofs.close();
|
The only difference is that we copy the output to b64out, which wraps the std::ofstream ofs in a Poco::Base64Encoder, and use b64out as the destination of the std::copy operation.
This is
the result of Base64-encoding some big file:
% ./b64encode /boot/kernel/kernel /var/tmp/kernel.b64
% ls -l /boot/kernel/kernel /var/tmp/kernel.b64
-r-xr-xr-x  1 root   wheel  12161158 Feb 24 12:47 /boot/kernel/kernel
-rw-r--r--  1 farid  wheel  16665292 Mar 21 16:09 /var/tmp/kernel.b64
|
As you can see, the base-64 encoded file is, as expected, larger. We can also peek into the beginning of both files:
% hd /boot/kernel/kernel | head -5
00000000  7f 45 4c 46 02 01 01 09  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 3e 00 01 00 00 00  10 b1 18 80 ff ff ff ff  |..>.............|
00000020  40 00 00 00 00 00 00 00  08 4f 9f 00 00 00 00 00  |@........O......|
00000030  00 00 00 00 40 00 38 00  05 00 40 00 25 00 22 00  |....@.8...@.%.".|
00000040  06 00 00 00 05 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
% hd /var/tmp/kernel.b64 | head -5
00000000  66 30 56 4d 52 67 49 42  41 51 6b 41 41 41 41 41  |f0VMRgIBAQkAAAAA|
00000010  41 41 41 41 41 41 49 41  50 67 41 42 41 41 41 41  |AAAAAAIAPgABAAAA|
00000020  45 4c 45 59 67 50 2f 2f  2f 2f 39 41 41 41 41 41  |ELEYgP////9AAAAA|
00000030  41 41 41 41 41 41 68 50  6e 77 41 41 41 41 41 41  |AAAAAAhPnwAAAAAA|
00000040  41 41 41 41 41 45 41 41  0d 0a 4f 41 41 46 41 45  |AAAAAEAA..OAAFAE|
|
We see
that the second file contains only printable characters.
b64decode.cpp
The reverse operation is Base-64 decoding a file. Instead of Poco::Base64Encoder, we simply use a Poco::Base64Decoder, like this:
// b64decode.cpp -- Base64 decode a file with Poco::Base64Decoder
#include <algorithm>
#include <iterator>
#include <fstream>
#include <cstdlib>
#include "Poco/Base64Decoder.h"

int
main (int argc, char *argv[])
{
    std::ifstream ifs(argv[1]);
    Poco::Base64Decoder b64in(ifs);
    std::ofstream ofs(argv[2], std::ios::binary); // the output may be binary
    std::copy(std::istreambuf_iterator<char>(b64in),
              std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(ofs));
    return EXIT_SUCCESS;
}
|
This is, again, our file copy program, idiomatic version with std::copy and streambuf_iterators. In b64encode we wrapped ofs with Poco::Base64Encoder. Here, we wrapped ifs with Poco::Base64Decoder, resulting in an input stream b64in.
Compiling:
% c++ -O2 -I/usr/local/include -Wall -c -o b64decode.o b64decode.cpp
% c++ -L/usr/local/lib b64decode.o -lPocoFoundation -o b64decode
|
Now,
let’s Base64-decode the file we’ve previously Base64-encoded:
% ./b64decode /var/tmp/kernel.b64 /var/tmp/kernel.decoded
% ls -l /boot/kernel/kernel /var/tmp/kernel.decoded
-r-xr-xr-x  1 root   wheel  12161158 Feb 24 12:47 /boot/kernel/kernel
-rw-r--r--  1 farid  wheel  12161158 Mar 21 16:19 /var/tmp/kernel.decoded
% diff /boot/kernel/kernel /var/tmp/kernel.decoded
% rm /var/tmp/kernel.b64 /var/tmp/kernel.decoded
|
Of
course, we’ve got the very same file that we’ve encoded previously.
BASE64-ENCODING
AND -DECODING STRINGS: B64STRINGS.CPP
Suppose we don’t want to Base-64 encode whole files, but only std::strings. One example could be that we want to compose Base64-encoded e-mail messages from some data that the user entered in a GUI element.
We could re-use Poco::Base64Encoder and Poco::Base64Decoder to transform strings, but there’s a little problem here: both classes need output and input streams, respectively, and not strings! However, the signatures of the functions we need are:
std::string toBase64 (const std::string &source);
std::string fromBase64 (const std::string &source);
|
Fortunately, we can easily transform a string to an input or output stream with std::istringstream and std::ostringstream from <sstream>. toBase64 could look like this:
std::string
toBase64 (const std::string &source)
{
    std::istringstream in(source);
    std::ostringstream out;
    Poco::Base64Encoder b64out(out);
    std::copy(std::istreambuf_iterator<char>(in),
              std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(b64out));
    b64out.close(); // always call this at the end!
    return out.str();
}
|
and fromBase64 would be:
std::string
fromBase64 (const std::string &source)
{
    std::istringstream in(source);
    std::ostringstream out;
    Poco::Base64Decoder b64in(in);
    std::copy(std::istreambuf_iterator<char>(b64in),
              std::istreambuf_iterator<char>(),
              std::ostreambuf_iterator<char>(out));
    return out.str();
}
|
Here’s
one possible main program:
// b64strings.cpp -- functions to Base64 encode and decode strings.
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <cstdlib>
#include "Poco/Base64Encoder.h"
#include "Poco/Base64Decoder.h"

std::string toBase64 (const std::string &source);   // As shown above
std::string fromBase64 (const std::string &source); // As shown above

int
main (int argc, char *argv[])
{
    std::string clearText("hello, world!");
    std::string b64Text(toBase64(clearText));
    std::string clearAgain(fromBase64(b64Text));
    std::cout << "Clear1: " << clearText << std::endl;
    std::cout << "Base64: " << b64Text << std::endl;
    std::cout << "Clear2: " << clearAgain << std::endl;
    return EXIT_SUCCESS;
}
|
Compiling and running it:
% c++ -O2 -I/usr/local/include -Wall -c -o b64strings.o b64strings.cpp
% c++ -L/usr/local/lib b64strings.o -lPocoFoundation -o b64strings
% ./b64strings
Clear1: hello, world!
Base64: aGVsbG8sIHdvcmxkIQ==
Clear2: hello, world!
|
Summary
To
overcome the (intentional) limitations of the C++ STL, it is necessary to use
external libraries. We distinguish between closed-source and open-source
libraries, between highly specialized and broad scope libraries, and between
platform-specific and cross-platform libraries.
Good
libraries include Boost, Poco, and Qt, but they are by no means the only ones.
C++ isn’t Java: the standard doesn’t define what external libraries are best
suited for your needs. The choice is yours to make.
As an example, we’ve used the output stream adapter Poco::Base64Encoder from the POCO library to Base64-encode files (or streams, more generally), and the input stream adapter Poco::Base64Decoder to Base64-decode files (or streams). We’ve seen how to make use of std::istringstream and std::ostringstream in combination with the above mentioned POCO classes, to Base64-encode and Base64-decode std::strings.
Basically,
we’re simply plumbing well-tested code components together and don’t reinvent
the wheel.