Recently when plugging two components of a high-throughput web service together, I ran into a snag. One component (a data repository) exposes an Iterator for pulling XML-formatted records out of it one by one. The other (for serving SOAP response documents) needed, ideally, something that could be wrapped in a StreamSource — i.e. an InputStream or Reader. But although these are both pull-based ways of providing (in this case) character data, they’re not compatible.
One easy option is to iterate over the whole Iterator, buffer the results in a String, and then use a StringReader. But that’s not terribly efficient when you might well be dealing with XML documents in the 10-20MB range. So I wrote an IteratorReader class, which is a Reader that can be wrapped around any Iterator. Each time it’s read from, it pulls enough elements from the Iterator to fulfil the request, and buffers any remainder for the next read. This keeps its memory usage down, although how far down depends on (a) the number of characters requested at once from its read method, and (b) the size of the elements coming off the Iterator. (Each element is simply converted into a String via its toString method before being stored in a character buffer.)
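For comparison, the buffer-everything option mentioned above might look something like the sketch below. The class and method names (NaiveIteratorBuffering, concatenate, bufferAll) are made up for illustration and are not part of IteratorReader; the point is only that the whole output has to sit in memory before the first character can be read.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper, not part of IteratorReader: the "buffer everything" approach.
class NaiveIteratorBuffering {

    // Drains the whole iterator into one String, appending each element's string form.
    static String concatenate(Iterator<?> iterator) {
        StringBuilder sb = new StringBuilder();
        while (iterator.hasNext()) {
            sb.append(iterator.next());
        }
        return sb.toString();
    }

    // Wraps the fully buffered text in a Reader; memory use grows with the total output size.
    static Reader bufferAll(Iterator<?> iterator) {
        return new StringReader(concatenate(iterator));
    }

    public static void main(String[] args) {
        // Two tiny made-up records standing in for the repository's XML output.
        Iterator<String> records = Arrays.asList("<rec id='1'/>", "<rec id='2'/>").iterator();
        System.out.println(concatenate(records));
    }
}
```

With 10-20MB documents, the StringBuilder and the resulting String each hold the full text at some point, which is the cost IteratorReader is meant to avoid.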
Surprisingly, given the vast amount of Java source out there, I couldn’t find an existing solution for this — not even in the usually comprehensive Apache Commons. The code is below, and you are free to do what you like with it, but a credit would be nice if you use it, and if you come up with any improvements I’d be interested to hear about them. In particular, I’m sure it could be optimized more, as it spends a lot of time garbage collecting in its current form. It’s pretty thoroughly tested, with an ArrayList of two million random strings as the source of the Iterator, and seems to work fine both with single-character reads and a BufferedReader wrapped round it. Actually, testing taught me some very interesting lessons, but that’s another post.
Implementation note: As well as providing the Iterator to read from, you can optionally provide an object that implements the Closeable interface. This is because in the scenario I developed this for, the Iterator in question represented a stream of objects that was being generated on-the-fly from a live database connection, and implemented Closeable as well as Iterator so the connection could be closed when necessary. I needed a way of doing this automatically from the Reader’s point of view, so when the Iterator runs out of data (hasNext returns false) the close method of the attached Closeable, if present, is called.
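To make the Closeable arrangement concrete, here is a rough sketch of an object that is both an Iterator and a Closeable. RecordStream, isClosed and drainAndClose are hypothetical names invented for this example, and an in-memory list stands in for the live database connection (it assumes Java 8 or later, where Iterator.remove has a default implementation). The drainAndClose helper only mimics the behaviour described above: read until hasNext returns false, then close.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a database-backed record stream; all names are illustrative.
class RecordStream implements Iterator<String>, Closeable {

    private final Iterator<String> rows;
    private boolean closed = false;

    RecordStream(List<String> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public boolean hasNext() {
        // Once closed, report no further data
        return !closed && rows.hasNext();
    }

    @Override
    public String next() {
        return rows.next();
    }

    // A real implementation would release the database connection here.
    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    // Mimics the described behaviour: consume until hasNext() is false, then close.
    static String drainAndClose(RecordStream stream) {
        StringBuilder sb = new StringBuilder();
        while (stream.hasNext()) {
            sb.append(stream.next());
        }
        stream.close();
        return sb.toString();
    }

    public static void main(String[] args) {
        RecordStream stream = new RecordStream(Arrays.asList("<a/>", "<b/>"));
        // With the class from this post, the same object would be passed twice:
        //   Reader reader = new IteratorReader( stream, stream );
        System.out.println(drainAndClose(stream));
        System.out.println(stream.isClosed());
    }
}
```

Passing the same object as both arguments is what lets the Reader clean up the connection without knowing anything about databases.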
Download the file: IteratorReader.java.v0_1
All comments very gratefully received.
/**
 * IteratorReader v. 0.1
 * Andrew B. Clegg
 */

import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;

public class IteratorReader extends Reader {

    // The iterator from which we'll read
    private final Iterator<? extends Object> _iterator;

    // Optionally, an object to close when we're done
    private Closeable _closeable;

    // Buffer to hold characters pulled from iterator before they're read
    private char[] _leftoverCharsFromLastRead = new char[ 0 ];

    // Flag to indicate when iterator is out of elements
    private boolean _iteratorExhausted = false;

    /**
     * Creates a new IteratorReader.
     * @param iterator the Iterator to read from
     */
    public IteratorReader( Iterator<? extends Object> iterator ) {
        _iterator = iterator;
        _closeable = null;
    }

    /**
     * Creates a new IteratorReader whose Iterator is backed by a Closeable
     * object that must be cleanly closed when no longer needed.
     * @param iterator the Iterator to read from
     * @param closeable the Closeable object backing the Iterator
     */
    public IteratorReader( Iterator<? extends Object> iterator, Closeable closeable ) {
        _iterator = iterator;
        _closeable = closeable;
    }

    /**
     * Closes the Closeable object on which this reader's Iterator depends.
     * If there is no such Closeable, or it has already been closed, this
     * method does nothing. This method is automatically called when Iterator's
     * hasNext method returns false, but can be called earlier.
     * @throws IOException if the Closeable encounters a problem when closing
     */
    @Override
    public void close() throws IOException {
        if( _closeable != null ) {
            _closeable.close();
            _closeable = null;
        }
    }

    /**
     * Reads characters into a portion of an array. See Reader.
     * @param outBuf array to copy the characters into
     * @param outBufOffset offset at which to start storing characters
     * @param charsRequested maximum number of characters to read
     * @return the number of characters read, or -1 if the end of the iterator has been reached
     * @throws IOException if the Closeable encounters a problem when closing
     */
    @Override
    public synchronized int read( char[] outBuf, int outBufOffset, int charsRequested ) throws IOException {

        // System.out.format( "read called: outBufOffset=%d, charsRequested=%d\n", outBufOffset, charsRequested );
        // System.out.format( "current state: _leftoverCharsFromLastRead has %d characters, _iteratorExhausted=%b\n",
        //     _leftoverCharsFromLastRead.length, _iteratorExhausted );

        // Have we already read enough characters from the iterator to feed this request?
        if( charsRequested <= _leftoverCharsFromLastRead.length ) {

            // Yes, we already have enough characters, copy them into output buffer
            System.arraycopy( _leftoverCharsFromLastRead, 0, outBuf, outBufOffset, charsRequested );

            // Are there any left over?
            int remainder = _leftoverCharsFromLastRead.length - charsRequested;
            assert( remainder >= 0 );
            if( remainder > 0 ) {
                // Copy remaining characters to new buffer (i.e. shrink buffer)
                char[] tempBuf = new char[ remainder ];
                System.arraycopy( _leftoverCharsFromLastRead, charsRequested, tempBuf, 0, remainder );
                _leftoverCharsFromLastRead = tempBuf;
            }
            else {
                // None left over, so reset buffer to zero-length
                _leftoverCharsFromLastRead = new char[ 0 ];
            }

            // Return the number of characters read
            // (in this case, all the characters requested)
            return charsRequested;
        }
        else {
            // We have been asked for more characters than we currently have, so we
            // can return what we have (if there are no more in the iterator) or
            // try to acquire more from the iterator

            // If iterator is exhausted and read has been called again, clean up and
            // return straight away, after copying as many characters as we have left
            if( _iteratorExhausted ) {
                int charsAvailable = _leftoverCharsFromLastRead.length;
                if( charsAvailable == 0 ) {
                    // Nothing in the iterator or the buffer, we're done
                    return -1;
                }
                else {
                    // Copy what we have into output buffer
                    System.arraycopy( _leftoverCharsFromLastRead, 0, outBuf, outBufOffset, charsAvailable );
                    // Clean up our own buffer and return number of characters copied
                    _leftoverCharsFromLastRead = new char[ 0 ];
                    return charsAvailable;
                }
            }
            else {
                // There's still data in the iterator, so we can attempt to satisfy the whole request
                // by doing another read -- open a StringBuilder of the desired length
                StringBuilder sb = new StringBuilder( charsRequested );

                // Insert however many characters we do have and reset our buffer to zero-length
                if( _leftoverCharsFromLastRead.length > 0 ) {
                    sb.append( _leftoverCharsFromLastRead );
                    _leftoverCharsFromLastRead = new char[ 0 ];
                }

                // Leftover characters already in sb count towards the request
                int charsStillRequired = charsRequested - sb.length();

                // Iteratively add new strings until no more characters are required
                while( charsStillRequired > 0 && !_iteratorExhausted ) {
                    // Read another string from the underlying iterator
                    String string = nextString();
                    // Add it to the StringBuilder
                    sb.append( string );
                    // Adjust number still required
                    charsStillRequired = charsStillRequired - string.length();
                }

                // Did we read to the end of the iterator?
                if( _iteratorExhausted ) {
                    // We have read all the strings from the iterator, but can only return
                    // as many characters as we managed to read, or as many as were requested,
                    // whichever is lower
                    int charsObtained = sb.length();
                    char[] tempBuf = sb.toString().toCharArray();

                    // charsToReturn is the number of chars requested, or obtained, whichever is lower
                    int charsToReturn = Math.min( charsRequested, charsObtained );

                    // Copy this many characters into output buffer
                    System.arraycopy( tempBuf, 0, outBuf, outBufOffset, charsToReturn );

                    // Do we have any left over in our buffer?
                    if( charsObtained > charsRequested ) {
                        // Yes -- more obtained than requested -- save them for next request
                        int charsToSave = charsObtained - charsRequested;
                        assert( charsToSave + charsToReturn == tempBuf.length );
                        _leftoverCharsFromLastRead = new char[ charsToSave ];
                        System.arraycopy( tempBuf, charsToReturn, _leftoverCharsFromLastRead, 0, charsToSave );
                    }

                    if( charsObtained == 0 ) {
                        // No characters left in buffer or iterator; return -1 immediately
                        _leftoverCharsFromLastRead = new char[ 0 ];
                        return -1;
                    }
                    else {
                        // Any surplus has been saved for next time, so just return
                        // the number we acquired this time
                        return charsToReturn;
                    }
                }
                else {
                    // sb now contains text to return, and there are more strings to iterate through.
                    // We can save a bit of effort by putting the entire contents of sb into
                    // our 'leftover' characters buffer, and calling this method again to copy it over
                    _leftoverCharsFromLastRead = sb.toString().toCharArray();
                    return read( outBuf, outBufOffset, charsRequested );
                }
            }
        }
    }

    private String nextString() throws IOException {
        // This should never get called after _iteratorExhausted has been set
        assert( !_iteratorExhausted );
        if( _iterator.hasNext() ) {
            return _iterator.next().toString();
        }
        else {
            _iteratorExhausted = true;
            close();
            return "";
        }
    }
}