C++ range I/O reference

Contents

  1. Motivation
  2. Range I/O
    1. Formatting
    2. Error handling
  3. Reference
    1. Input
    2. Output

Motivation

C++11 introduced the ‘range-for’ construct, which was the first step toward making ranges first-class citizens in the language. The concept is still being developed, but for now we can say that a range is anything that has begin() and end() member functions, or that works with begin() and end() free functions (either std::begin()/std::end(), or begin()/end() as found by ADL). In either case the functions must return a type that satisfies the Iterator concept, where the return value from begin() references the beginning of a sequence and the return value from end() references one past the end of the same sequence. That’s a mouthful, but in brief you can say:

If it works with range-for, it’s a range.
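To make the definition concrete, here is a minimal sketch of a user-defined range. The names int_view_sketch, sum() and range_for_example() are hypothetical, invented for illustration. The type exposes begin() and end() members that return pointers, which satisfy the iterator requirements, so range-for accepts it.

```cpp
#include <cassert>

// Hypothetical type for illustration: a non-owning view of a run of ints.
struct int_view_sketch
{
  const int* first;
  const int* last;

  // Member begin()/end() are all that range-for needs.
  const int* begin() const { return first; }
  const int* end() const { return last; }  // one past the last element
};

// Sums the elements of an int_view_sketch using range-for.
inline int sum(int_view_sketch v)
{
  int total = 0;
  for (int x : v)
    total += x;
  return total;
}

// Usage: 1 + 2 + 3 == 6.
inline void range_for_example()
{
  const int data[] = { 1, 2, 3 };
  assert(sum(int_view_sketch{ data, data + 3 }) == 6);
}
```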

It is possible that future standards will include a whole new algorithms library that works with ranges rather than iterator pairs. That would be neat, because it would mean that verbose function calls like these:

using std::begin;
using std::end;

std::sort(begin(r), end(r));
std::set_difference(begin(a), end(a), begin(b), end(b), std::back_inserter(c));

can be simplified to this:

std::sort(r);
std::set_difference(a, b, std::back_inserter(c));

which has the additional benefit of eliminating the possibility of mixing up iterators to different ranges.

Whether or not that happens, the range concept is still a very nice one to work with, and one that we already tend to use.

Unfortunately, C++ has never had a natural way to use ranges – even ranges defined by a pair of iterators – with IOStreams. The best the standard library has had to offer is stream iterators, most commonly used with the std::copy() algorithm, which looks like this:

// Output
std::copy(begin(r), end(r), std::ostream_iterator<double>{out, "\n"});

// Input - note: if you don't use braces, watch out for the most vexing parse!
std::copy(std::istream_iterator<double>{in}, std::istream_iterator<double>{}, std::back_inserter(c));

Stream iterators have the neat ability to be plugged into any algorithm that takes input and output iterators, so you can do some really cool things like read values from two different files, add each pair, and output the results to a third file in one statement:

std::transform(std::istream_iterator<int>{in1}, std::istream_iterator<int>{},
               std::istream_iterator<int>{in2}, std::ostream_iterator<int>{out, "\n"},
               std::plus<int>{});

The flexibility that stream iterators offer can’t easily be beat. Unfortunately, they’re a little clunky for the simplest and most common I/O tasks. Look again at what reading values from a stream into a range looks like:

std::copy(std::istream_iterator<double>{in}, std::istream_iterator<double>{}, std::back_inserter(r));

And this is writing a range to a stream:

std::copy(begin(r), end(r), std::ostream_iterator<double>{out, "\n"});

Those statements are not just clunky, verbose, and unintuitive. The semantics are also wrong. I/O is not a copy operation. No one calls operator<< or operator>> copy operators.

Even worse, they don’t really work all that well. Suppose you have a file with a known number of double values – say 30. How would you read them into a vector? Well, you might write something like this:

auto r = std::vector<double>(30);
std::copy_n(std::istream_iterator<double>{in}, r.size(), begin(r));

If all goes well, that will work fine. However, if there is a problem reading, say, the fifth double value – either because the source data is corrupted or because there was an I/O error with the stream – what do you think will happen? What will happen is probably that the first four values of the vector will be legitimate, then the remaining twenty-six will simply be repetitions of the fourth value. Maybe. Here’s the really tricky question, though: how can you detect where the error occurred?

The stream state will tell you that there was a read error at some point... but where? Are all the values in the vector okay but the last? Are only half okay? Are they all bad? You see, the algorithm will merrily chug away until 30 values have been “read” and “copied” – regardless of whether 30 values actually get read and copied or not. It’s just not designed for this kind of use.

However, using std::copy() won’t work for this problem either, because that algorithm will copy values until it can’t anymore – it won’t stop at 30. Which means you’re stuck – you basically have to write your own algorithm that copies up to n values, stopping early if the “end” iterator is reached (which, with input stream iterators, is triggered when no more values can be read).
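One way such a hand-written algorithm might look is sketched below. The name copy_up_to_n() and its signature are assumptions made for illustration; nothing like it is part of the standard library or of this proposal. It returns the number of values actually copied, which is the piece of information std::copy_n() cannot give you.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical helper: copies at most n values from [first, last) to out
// and returns how many values were copied.
template <typename InputIt, typename OutputIt>
std::size_t copy_up_to_n(InputIt first, InputIt last, std::size_t n, OutputIt out)
{
  std::size_t copied = 0;
  if (n == 0)
    return copied;

  while (first != last)
  {
    *out++ = *first;
    if (++copied == n)
      break;    // stop before advancing: ++ on an istream_iterator reads again
    ++first;
  }
  return copied;
}

// Usage: the third token is not a number, so only two values arrive.
inline void copy_up_to_n_example()
{
  std::istringstream in{ "1.5 2.5 oops 4.5" };
  std::vector<double> r;

  const std::size_t copied = copy_up_to_n(std::istream_iterator<double>{ in },
                                          std::istream_iterator<double>{},
                                          30, std::back_inserter(r));

  assert(copied == 2);
  assert(r.size() == 2 && r[0] == 1.5 && r[1] == 2.5);
  assert(in.fail());  // the stream still records that a read failed
}
```

The count tells you how many values are valid, but the caller still has to manage formatting by hand, which is the next problem.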

You may think that is not so bad, because “copy up to n” is probably a useful algorithm to have in your toolkit anyway. However, there’s yet another thorny issue to deal with: formatting.

Algorithms and stream iterators have no awareness of stream formatting. You can get surprising and unpredictable results. Consider the following code, where v is a vector with N elements { 0, 1, 2, ..., N-1 }:

out << "{ ";
out.fill('0');
out.width(5);
std::copy(begin(v), end(v), std::ostream_iterator<int>{out, ", "});
out << " }\n";

Clearly what is desired is to print the vector like this: “{ 00000, 00001, 00002, ... }”. Unfortunately, that’s not what you’re going to get – and even worse, you can’t predict exactly what’s going to happen without knowing the size of the vector. Here are some possible results:

// v = { 0 }
{ 00000,  }

// v = { 0, 1, 2 }
{ 00000, 1, 2,  }

// v = { }
{ 00 }

Reading a bunch of values from a stream into a range, and writing the values in a range to a stream, are very basic and very common operations. They should be easy, especially for beginners. Making them easy is the goal of this proposal.

Range I/O

This is what input of singular values looks like in C++:

auto v = type{};
in >> v;

This is what output of singular values looks like:

out << v;

One of the goals of this proposal was to make input and output of ranges as natural as possible, and to integrate it well with the existing I/O mechanisms. So this is what input of ranges looks like:

auto r = vector<type>{};
in >> back_insert(r);

And this is what output of ranges looks like:

out << write_all(r);

These operations integrate right into the existing input and output paradigms. For example, you can chain them:

auto r = vector<int>{ 1, 1, 2, 3, 5 };
out << "{ " << write_all(r, ", ") << " }";
// output: "{ 1, 1, 2, 3, 5 }"

Output comes in two flavours – simple output that just prints each element of the range:

auto r = vector<char>{ 'a', 'b', 'c', 'd', 'e' };
out << write_all(r);
// output: "abcde"

And delimited output, that prints each element of the range with a delimiter between each element:

auto r = vector<char>{ 'a', 'b', 'c', 'd', 'e' };
out << write_all(r, " and ");
// output: "a and b and c and d and e"

Note that the delimiter is much more flexible than the ‘delimiter’ in ostream_iterator. It does not necessarily need to be a string – if the delimiter is a single character like a newline, you can just use it as a char – but you can even use a ‘smart’ delimiter:

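struct oxford_comma
{
  // n is the number of elements, so n - 1 delimiters will be written.
  explicit oxford_comma(size_t n) : n_{--n} {}
  
  size_t n_; // delimiters still to be written
};

// Takes the delimiter by non-const reference because each write counts down.
auto operator<<(ostream& o, oxford_comma& c) -> ostream&
{
  if (--c.n_)
    o << ", ";     // more delimiters follow
  else
    o << ", and "; // this is the last delimiter
  return o;
}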

auto benefits = vector<string>{
  "clearer",
  "classier",
  "more logical"
};

cout << "The Oxford comma is " <<
        write_all(benefits, oxford_comma(benefits.size())) <<
        " than the barbaric alternatives.";

// The Oxford comma is clearer, classier, and more logical than the barbaric alternatives.

With output, you can do any filtering you need on the range itself. For example, using Boost.Range’s range adaptors:

struct is_odd { auto operator()(int x) const { return x % 2 == 1; } };

auto const r = vector<int>{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

using namespace boost::adaptors;

cout << write_all(r | reversed | filtered(is_odd{}), ", ");

// output: "9, 7, 5, 3, 1"

Input, however, is more complicated, partly because you need to do the filtering while the values are being read in, and partly because there are multiple ways you can add values to a range. For that reason, several input functions are included in the proposal, covering the most common ways values are added to a range:

auto a = array<double, 10>{};
in >> overwrite(a); // reads 10 doubles into a

auto v = vector<int>{};
in >> back_insert(v); // reads ints until a read failure, adding them via v.push_back()
in >> back_insert_n(v, 5); // reads ints until 5 are read or there is a read failure,
                           // whichever comes first, adding them via v.push_back()

auto l = list<complex<float>>{};
in >> front_insert(l); // similar to back_insert(), except this uses l.push_front()
in >> front_insert_n(l, 5);

auto s = vector<int>{ 0, 1, 2, 3, 4, 5 };
in >> insert(s, next(begin(s), 3)); // reads ints until a read failure, adding them
                                    // via s.insert() sequentially starting after the 2
in >> insert_n(s, next(begin(s), 3), 5);
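To show how a function like back_insert_n() can slot into an ordinary extraction chain, here is a minimal sketch. It is not the proposal's implementation: the names back_insert_n_sketch and back_insert_n_sketch_op are hypothetical, and the sketch ignores formatting state and the next/count members described later. The function returns a small operation object, and an operator>> overload for that object does the reading.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical operation object: remembers the container and the limit.
template <typename Container>
struct back_insert_n_sketch_op
{
  Container& c;
  std::size_t n;
};

// Hypothetical factory, standing in for back_insert_n().
template <typename Container>
back_insert_n_sketch_op<Container> back_insert_n_sketch(Container& c, std::size_t n)
{
  return back_insert_n_sketch_op<Container>{ c, n };
}

// Reads up to op.n values, appending each successfully read value.
template <typename Container>
std::istream& operator>>(std::istream& in, back_insert_n_sketch_op<Container> op)
{
  for (std::size_t i = 0; i < op.n; ++i)
  {
    typename Container::value_type v{};
    if (!(in >> v))
      break;           // read failure: the stream keeps its failbit
    op.c.push_back(v);
  }
  return in;
}

// Usage: five values go into v, and the chain carries on with the sixth.
inline void back_insert_n_example()
{
  std::istringstream in{ "1 2 3 4 5 6 7" };
  std::vector<int> v;
  int following = 0;

  in >> back_insert_n_sketch(v, 5) >> following;

  assert(v.size() == 5 && v.front() == 1 && v.back() == 5);
  assert(following == 6);
}
```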

There is also a generalized input function that you can customize using a traits type. You define a class with two functions – prepare() and read() – then pass an instance of that class to the generalized input function. Using that, you can customize any kind of input operation you like. For example, reading only odd numbers and inserting them into a range in order:

struct insert_only_if_odd
{
  using iterator_type = vector<int>::iterator;
  
  auto prepare(vector<int>& r, iterator_type /* not used here */)
  {
    return make_tuple(true, end(r));
  }
  
  auto read(istream& in, vector<int>& r, iterator_type /* not used here */)
  {
    auto v = int{};
    
    auto store_ok = false;
    auto read_ok = bool(in >> v);
    
    if (read_ok && (v % 2))
    {
      r.insert(upper_bound(begin(r), end(r), v), v);
      store_ok = true;
    }
    
    return make_tuple(bool(in), end(r), read_ok, store_ok);
  }
};

auto iss = istringstream{"6 2 8 3 1 8 5 3 0 7 1 7 9"};

auto r = vector<int>{};

iss >> input(r, insert_only_if_odd{});

// r = { 1, 1, 3, 3, 5, 7, 7, 9 }

Formatting

Perhaps the biggest improvement of range I/O functions over stream iterators and algorithms is awareness of stream formatting.

The philosophy of formatting in range I/O functions can be summed up as:

  • Whatever formatting is applied to the first value read/written in a range I/O operation gets applied to every value in that operation.
  • The stream formatting state after a range I/O operation must be the same whether zero, one, or many values were read/written in that operation.

The first point allows you to set up the stream’s formatting state the way you desire, and have every element of the range be formatted according to that state:

auto const r = array<int, 5>{ 1, 1, 2, 3, 5 };
out << "{ " << hex << showbase << setfill('-') << setw(5) << write_all(r, ", ") << " }";
// output: "{ --0x1, --0x1, --0x2, --0x3, --0x5 }"

The second point means that you don’t need to worry about how many elements are in the range – if any. You will have a consistent, predictable formatting state after the operation (as you can see above – the closing brace is added quite naturally, and would be printed as you would expect even if the range was empty).

Note that in the case of delimited output, the formatting only applies to the actual range elements. The formatting applied to the delimiter is determined by whatever state the stream would be left in after writing a single value of the range’s type.
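The two rules can be approximated by hand, which may help clarify what they mean. The sketch below is an assumption about one possible approach, not the proposal's implementation, and write_range_sketch() is a hypothetical name. It relies on the fact that the field width is the only piece of standard formatting state that resets after each formatted output, so it saves the width on entry, re-applies it before every element, and clears it at the end.

```cpp
#include <cassert>
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper: writes r to out with delimiter d, giving every
// element the field width that was in effect on entry.
template <typename Range, typename Delim>
std::ostream& write_range_sketch(std::ostream& out, const Range& r, const Delim& d)
{
  const std::streamsize w = out.width();  // width meant for the first element
  bool first = true;

  for (const auto& e : r)
  {
    if (!first)
    {
      out.width(0);   // the delimiter is written as if after one element
      out << d;
    }
    first = false;

    out.width(w);     // rule 1: same formatting for every element
    out << e;
  }

  out.width(0);       // rule 2: same state after zero, one, or many elements
  return out;
}

// Usage: reproduces the output of the example above.
inline void write_range_example()
{
  const std::vector<int> r{ 1, 1, 2, 3, 5 };
  std::ostringstream out;

  out << "{ " << std::hex << std::showbase << std::setfill('-') << std::setw(5);
  write_range_sketch(out, r, ", ");
  out << " }";

  assert(out.str() == "{ --0x1, --0x1, --0x2, --0x3, --0x5 }");
}
```

Flags such as hex, showbase and the fill character persist on their own, so only the width needs special handling in this sketch.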

The formatting rules also apply to input:

auto iss = istringstream{"abcdefghi"};
auto r = array<string, 3>{};
auto s = string{};
iss >> setw(2) >> overwrite(r) >> s;
// r = { "ab", "cd", "ef" }
// s = "ghi"
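The input side can be approximated the same way. This is a sketch under the same assumption as before (the field width is the only state that needs re-applying), and overwrite_with_width_sketch() is a hypothetical name rather than part of the proposal.

```cpp
#include <array>
#include <cassert>
#include <iomanip>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: overwrites each element of r, giving every read the
// field width that was in effect on entry.
template <typename Range>
std::istream& overwrite_with_width_sketch(std::istream& in, Range& r)
{
  const std::streamsize w = in.width();  // width meant for the first element

  for (auto& e : r)
  {
    in.width(w);      // same formatting for every element
    if (!(in >> e))
      break;          // stop at the first read failure
  }

  in.width(0);        // same state however many elements were read
  return in;
}

// Usage: reproduces the result of the example above.
inline void overwrite_with_width_example()
{
  std::istringstream iss{ "abcdefghi" };
  std::array<std::string, 3> r;
  std::string s;

  iss >> std::setw(2);
  overwrite_with_width_sketch(iss, r);
  iss >> s;

  assert(r[0] == "ab" && r[1] == "cd" && r[2] == "ef");
  assert(s == "ghi");
}
```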

Error handling

The range I/O functions also give you much more information about exactly what went down in an I/O operation – including whether any errors occurred.

To get this information, you have to capture the range I/O operation object returned by the range I/O function. You use this object in an I/O expression as usual, but after the operation the range I/O object can be queried.

For example, the next member holds the iterator to the next element in the range where a value would be written (equivalent to the return value you would get if you used std::copy() with istream iterators):

auto r = array<int, 10>{};

auto p = overwrite(r);
in >> p;

if (p.next != end(r))
{
  cerr << "An error occurred during input.\n";
  cerr << "Number of successfully read values = " << distance(begin(r), p.next) << " of " << r.size() << '\n';
  cerr << "Successfully read = { " << write_all(boost::make_iterator_range(begin(r), p.next), ", ") << " }\n";
}

(The previous example uses iterator_range from Boost.Range.)
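How an operation object could carry this information is sketched below. This is an illustration under assumptions, not the proposal's implementation: overwrite_op_sketch and make_overwrite_sketch() are hypothetical names, and the sketch leaves out formatting. The object keeps a reference to the range plus the next and count members, and operator>> updates them as it reads.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <utility>

// Hypothetical operation object for an overwrite-style read.
template <typename Range>
struct overwrite_op_sketch
{
  using iterator = decltype(std::begin(std::declval<Range&>()));

  explicit overwrite_op_sketch(Range& r)
    : range(r), next(std::begin(r)), count(0) {}

  Range& range;
  iterator next;      // first element that has not been overwritten
  std::size_t count;  // number of values read successfully
};

template <typename Range>
overwrite_op_sketch<Range> make_overwrite_sketch(Range& r)
{
  return overwrite_op_sketch<Range>(r);
}

// Reads into successive elements until the range is full or a read fails.
template <typename Range>
std::istream& operator>>(std::istream& in, overwrite_op_sketch<Range>& op)
{
  using iterator = typename overwrite_op_sketch<Range>::iterator;
  using value_type = typename std::iterator_traits<iterator>::value_type;

  while (op.next != std::end(op.range))
  {
    value_type v{};
    if (!(in >> v))
      break;          // failed read: *op.next is left untouched
    *op.next = v;
    ++op.next;
    ++op.count;
  }
  return in;
}

// Usage: the third token is bad, so next stops at the third element.
inline void overwrite_op_example()
{
  std::istringstream in{ "10 20 oops 40" };
  std::array<int, 4> r{};

  auto p = make_overwrite_sketch(r);
  in >> p;

  assert(p.next == r.begin() + 2);
  assert(p.count == 2);
  assert(r[0] == 10 && r[1] == 20 && r[2] == 0 && r[3] == 0);
}
```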

Range input operation objects also have two members that count the number of elements read from the stream, and the number actually stored in the range. For the standard input operation objects, those counts will always be the same, but if you use a filtering input operation they may be different:

struct insert_only_if_odd
{
  using iterator_type = vector<int>::iterator;
  
  auto prepare(vector<int>& r, iterator_type /* not used here */)
  {
    return make_tuple(true, end(r));
  }
  
  auto read(istream& in, vector<int>& r, iterator_type /* not used here */)
  {
    auto v = int{};
    
    auto store_ok = false;
    auto read_ok = bool(in >> v);
    
    if (read_ok && (v % 2))
    {
      r.insert(upper_bound(begin(r), end(r), v), v);
      store_ok = true;
    }
    
    return make_tuple(bool(in), end(r), read_ok, store_ok);
  }
};

auto iss = istringstream{"6 2 8 3 1 8 5 3 0 7 1 7 9"};

auto r = vector<int>{};

auto p = input(r, insert_only_if_odd{});
iss >> p;

// r = { 1, 1, 3, 3, 5, 7, 7, 9 }
// p.next = end(r)
// p.count = 13
// p.stored = 8

As you can see, 13 values were read from the stream, but only 8 were placed in the range.
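The source does not spell out how input() drives a behaviour object, so the following is a guess made for illustration. It assumes that prepare() returns (continue?, next) and that read() returns (continue?, next, value was read, value was stored), which is what the example's return statements suggest. The names input_sketch and input_op_sketch are hypothetical, and the behaviour is repeated with explicit return types so the block stands alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <tuple>
#include <vector>

// The behaviour from the example, with explicit return types.
struct insert_only_if_odd
{
  using iterator_type = std::vector<int>::iterator;

  std::tuple<bool, iterator_type> prepare(std::vector<int>& r, iterator_type)
  {
    return std::make_tuple(true, r.end());
  }

  std::tuple<bool, iterator_type, bool, bool>
  read(std::istream& in, std::vector<int>& r, iterator_type)
  {
    int v = 0;
    bool store_ok = false;
    const bool read_ok = bool(in >> v);
    if (read_ok && (v % 2))
    {
      r.insert(std::upper_bound(r.begin(), r.end(), v), v);
      store_ok = true;
    }
    return std::make_tuple(bool(in), r.end(), read_ok, store_ok);
  }
};

// Hypothetical operation object holding the behaviour and the counters.
template <typename Range, typename Behaviour>
struct input_op_sketch
{
  Range& range;
  Behaviour behaviour;
  typename Behaviour::iterator_type next;
  std::size_t count;   // values read from the stream
  std::size_t stored;  // values placed in the range
};

template <typename Range, typename Behaviour>
input_op_sketch<Range, Behaviour> input_sketch(Range& r, Behaviour b)
{
  return input_op_sketch<Range, Behaviour>{ r, b, r.begin(), 0, 0 };
}

// Assumed driver loop: call prepare() once, then read() until it says stop.
template <typename Range, typename Behaviour>
std::istream& operator>>(std::istream& in, input_op_sketch<Range, Behaviour>& op)
{
  op.count = 0;
  op.stored = 0;

  auto prepared = op.behaviour.prepare(op.range, op.next);
  bool keep_going = std::get<0>(prepared);
  op.next = std::get<1>(prepared);

  while (keep_going)
  {
    auto result = op.behaviour.read(in, op.range, op.next);
    keep_going = std::get<0>(result);
    op.next = std::get<1>(result);   // refreshed after every insertion
    if (std::get<2>(result)) ++op.count;
    if (std::get<3>(result)) ++op.stored;
  }
  return in;
}

// Usage: reproduces the counts from the example above.
inline void input_sketch_example()
{
  std::istringstream iss{ "6 2 8 3 1 8 5 3 0 7 1 7 9" };
  std::vector<int> r;

  auto p = input_sketch(r, insert_only_if_odd{});
  iss >> p;

  assert((r == std::vector<int>{ 1, 1, 3, 3, 5, 7, 7, 9 }));
  assert(p.next == r.end());
  assert(p.count == 13);
  assert(p.stored == 8);
}
```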

Output operation objects also have the next member – which is an iterator to the next item in the range that would be written – and a count member that counts the number of values that were written in the last output operation:

auto r = vector<int>{ 1, 1, 2, 3, 5 };
auto p = write_all(r, ", ");
out << p;
// output: "1, 1, 2, 3, 5"
// p.next = end(r)
// p.count = 5

Reference

Input

Each entry below lists a function, its arguments, its effects, and the range types it supports.

overwrite(r)
  Arguments:
    • r: Non-const range lvalue
  Effects:
    • Overwrites each element in r with successive values.
    • Stops when r is completely overwritten, or after a read failure.
  Supports:
    • C arrays of known bound
    • std::array
    • std::vector
    • std::deque
    • std::forward_list
    • std::list
    • std::basic_string
    • std::valarray

back_insert(r)
  Arguments:
    • r: Non-const range lvalue
  Effects:
    • Adds successive values to the end of r using push_back().
    • Stops after a read failure.
  Supports:
    • std::vector
    • std::deque
    • std::list
    • std::basic_string

back_insert_n(r, n)
  Arguments:
    • r: Non-const range lvalue
    • n: size_t
  Effects:
    • Adds successive values to the end of r using push_back().
    • Stops after n values are read, or after a read failure.
  Supports:
    • Same as back_insert(r).

front_insert(r)
  Arguments:
    • r: Non-const range lvalue
  Effects:
    • Adds successive values to the start of r using push_front().
    • Stops after a read failure.
  Supports:
    • std::deque
    • std::forward_list
    • std::list

front_insert_n(r, n)
  Arguments:
    • r: Non-const range lvalue
    • n: size_t
  Effects:
    • Adds successive values to the start of r using push_front().
    • Stops after n values are read, or after a read failure.
  Supports:
    • Same as front_insert(r).

insert(r, p)
  Arguments:
    • r: Non-const range lvalue
    • p: Iterator for r
  Effects:
    • Adds successive values to r before the element referenced by p using insert().
    • Stops after a read failure.
  Supports:
    • std::vector
    • std::deque
    • std::list
    • std::basic_string

insert_n(r, p, n)
  Arguments:
    • r: Non-const range lvalue
    • p: Iterator for r
    • n: size_t
  Effects:
    • Adds successive values to r before the element referenced by p using insert().
    • Stops after n values are read, or after a read failure.
  Supports:
    • Same as insert(r, p).

input(r, b)
  Arguments:
    • r: Non-const range lvalue
    • b: Instance of an object that describes an input behaviour
  Effects:
    • Depends on behaviour.
  Supports:
    • Depends on behaviour.

All range input functions return a range input object that references the range. The object cannot be default-constructed (it must be constructed by a range input function), and it satisfies the following concepts:

The range input object has the following members:

Member   Type       Description
next     Iterator   An iterator to the range that is being read into, pointing to the next location that would be read into if this input operation were continued.
count    size_t     The number of values that were successfully read from the stream in the last input operation. Initially zero.
stored   size_t     The number of values that were added to the range in the last input operation. Initially zero.

Output

Each entry below lists a function, its arguments, its effects, and the range types it supports.

write_all(r)
  Arguments:
    • r: const lvalue reference or non-const rvalue reference to a range.
  Effects:
    • Writes each element in r to the output stream successively.
    • Stops when r is completely written, or after a write failure.
  Supports:
    • C arrays of known bound
    • std::array
    • std::vector
    • std::deque
    • std::forward_list
    • std::list
    • std::basic_string
    • std::valarray

write_all(r, d)
  Arguments:
    • r: const lvalue reference or non-const rvalue reference to a range.
    • d: An instance of a type that can be written to an output stream.
  Effects:
    • Writes each element in r to the output stream successively.
    • The delimiter d is written after each value, except the last.
    • Stops when r is completely written, or after a write failure.
  Supports:
    • Same as write_all(r).

All range output functions return a range output object that references the range, or – in the case of rvalues – takes ownership of the range (by moving). The object cannot be default-constructed (it must be constructed by a range output function), and it satisfies the following concepts:

The range output object has the following members:

Member   Type       Description
next     Iterator   An iterator to the range that is being read from, pointing to the next location that would be read from if this output operation were continued.
count    size_t     The number of values that were successfully written to the stream in the last output operation. Initially zero.