Channel: biotext.org.uk » Research

IteratorReader — streaming character data from an iterator

Recently, while plugging two components of a high-throughput web service together, I ran into a snag. One component (a data repository) exposes an Iterator for pulling XML-formatted records out of it one by one. The other (which serves SOAP response documents) ideally needed something that could be wrapped in a StreamSource, i.e. an InputStream or a Reader. Although Iterator and Reader are both pull-based ways of providing (in this case) character data, they’re not compatible: one hands out whole objects, while the other fills a caller-supplied char array.

One easy option is to iterate over the whole Iterator, buffer the results in a String, and then use a StringReader. But that’s not terribly efficient when you might well be dealing with XML documents in the 10–20 MB range. So I wrote an IteratorReader class: a Reader that can be wrapped around any Iterator. Each time it’s read from, it pulls just enough elements from the Iterator to fulfil the request, and buffers any remainder. This keeps its memory usage down, although how far down depends on (a) the number of characters requested at once from its read method, and (b) the size of the elements coming off the Iterator. (Each element is simply converted into a String via its toString method before being stored in a character buffer.)
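For comparison, the buffer-everything option might look roughly like the sketch below. The names BufferAllExample, bufferAll and asReader are made up for illustration; the point is only that the whole document sits in memory before the first character can be read.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;

class BufferAllExample {

    // Drains the entire iterator into a single String.
    // Memory use grows with the total size of all elements.
    static String bufferAll(Iterator<?> records) {
        StringBuilder sb = new StringBuilder();
        while (records.hasNext()) {
            // append(Object) uses String.valueOf, so a null element becomes "null"
            sb.append(records.next());
        }
        return sb.toString();
    }

    // Wraps the fully buffered text in a StringReader.
    static Reader asReader(Iterator<?> records) {
        return new StringReader(bufferAll(records));
    }
}
```

A Reader obtained this way can be handed to a StreamSource just like any other, but nothing is available to the consumer until the Iterator has been read to the end.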

Surprisingly, given the vast amount of Java source out there, I couldn’t find an existing solution for this, not even in the usually comprehensive Apache Commons. The code is below, and you are free to do what you like with it, but a credit would be nice if you use it, and if you come up with any improvements I’d be interested to hear about them. In particular, I’m sure it could be optimized further, as it spends a lot of time garbage collecting in its current form (nearly every read allocates at least one new char array). It’s pretty thoroughly tested, with an ArrayList of two million random strings as the source of the Iterator, and seems to work fine both with single-character reads and with a BufferedReader wrapped round it. Actually, testing taught me some very interesting lessons, but that’s another post.
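One possible direction for cutting down the garbage, offered as an untested-at-scale sketch rather than a drop-in replacement: keep a reference to the current element's String plus a read position, and copy straight into the caller's array with String.getChars, so no intermediate char arrays are created. The class name CursorIteratorReader and the drain helper are hypothetical, and the sketch leaves out the Closeable handling and the synchronization of the version in this post.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;

class CursorIteratorReader extends Reader {

    private final Iterator<?> iterator;
    private String current = ""; // element currently being served
    private int pos = 0;         // index of the next unread char in current

    CursorIteratorReader(Iterator<?> iterator) {
        this.iterator = iterator;
    }

    @Override
    public int read(char[] out, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int copied = 0;
        while (copied < len) {
            if (pos == current.length()) {
                // Current element is used up: fetch the next one, or stop
                if (!iterator.hasNext()) {
                    break;
                }
                // String.valueOf turns a null element into "null" instead of throwing
                current = String.valueOf(iterator.next());
                pos = 0;
                continue;
            }
            // Copy as much of the current element as the caller still wants
            int n = Math.min(len - copied, current.length() - pos);
            current.getChars(pos, pos + n, out, off + copied);
            pos += n;
            copied += n;
        }
        // -1 only when nothing at all could be copied
        return copied == 0 ? -1 : copied;
    }

    @Override
    public void close() {
        // Nothing to release in this sketch
    }

    // Illustrative helper: reads a Reader to the end in chunks of the given size
    static String drain(Reader reader, int chunk) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[chunk];
        try {
            int n;
            while ((n = reader.read(buf, 0, chunk)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The trade-off is that the sketch holds on to one whole element at a time, so its memory use is bounded by the largest element rather than by the size of the read request.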

Implementation note: As well as providing the Iterator to read from, you can optionally provide an object that implements the Closeable interface. This is because, in the scenario I developed this for, the Iterator in question represented a stream of objects that was being generated on the fly from a live database connection, and it implemented Closeable as well as Iterator so that the connection could be closed when necessary. I needed a way of doing this automatically from the Reader’s point of view, so when the Iterator runs out of data (hasNext returns false), the close method of the attached Closeable, if present, is called.
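To make the scenario concrete, here is a small sketch of such an object, together with a helper that mirrors the close-on-exhaustion step. RecordCursor, CloseOnExhaustion and nextOrClose are hypothetical names invented for this illustration, and the "connection" is just a flag standing in for a real database resource.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a database-backed stream of records
class RecordCursor implements Iterator<String>, Closeable {

    private final Iterator<String> rows;
    private boolean closed = false;

    RecordCursor(List<String> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public boolean hasNext() {
        return !closed && rows.hasNext();
    }

    @Override
    public String next() {
        return rows.next();
    }

    @Override
    public void close() {
        // A real implementation would release its connection here
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class CloseOnExhaustion {

    // Returns the next element as a String, or closes the resource
    // and returns "" once the iterator has run out
    static String nextOrClose(Iterator<?> it, Closeable resource) {
        if (it.hasNext()) {
            return it.next().toString();
        }
        try {
            if (resource != null) {
                resource.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "";
    }
}
```

With the class in this post, an object like this would be passed as both constructor arguments, i.e. new IteratorReader( cursor, cursor ).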

Download the file: IteratorReader.java.v0_1

All comments very gratefully received.

/**
 * IteratorReader v. 0.1
 * Andrew B. Clegg
 */
 
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
 
 
public class IteratorReader extends Reader
{
 
    // The iterator from which we'll read
    private final Iterator<? extends Object> _iterator;
 
    // Optionally, an object to close when we're done
    private Closeable _closeable;
 
    // Buffer to hold characters pulled from the iterator before they're read
    private char[] _leftoverCharsFromLastRead = new char[ 0 ];
 
    // Flag to indicate when iterator is out of elements
    private boolean _iteratorExhausted = false;
 
    /**
     * Creates a new IteratorReader.
     * @param iterator the Iterator to read from
     */
    public IteratorReader( Iterator<? extends Object> iterator )
    {
        _iterator = iterator;
        _closeable = null;
    }
 
    /**
     * Creates a new IteratorReader whose Iterator is backed by a Closeable
     * object that must be cleanly closed when no longer needed.
     * @param iterator the Iterator to read from
     * @param closeable the Closeable object backing the Iterator
     */
    public IteratorReader( Iterator<? extends Object> iterator, Closeable closeable )
    {
        _iterator = iterator;
        _closeable = closeable;
    }
 
    /**
     * Closes the Closeable object on which this reader's Iterator depends.
     * If there is no such Closeable, or it has already been closed, this
     * method does nothing. This method is automatically called when Iterator's
     * hasNext method returns false, but can be called earlier.
     * @throws IOException if the Closeable encounters a problem when closing
     */
    @Override
    public void close() throws IOException
    {
        if( _closeable != null )
        {
            _closeable.close();
            _closeable = null;
        }
    }
 
    /**
     * Reads characters into a portion of an array. See Reader.
     * @param outBuf array to copy the characters into
     * @param outBufOffset offset at which to start storing characters
     * @param charsRequested maximum number of characters to read
     * @return the number of characters read, or -1 if the end of the iterator has been reached
     * @throws IOException if the Closeable encounters a problem when closing
     */
    @Override
    public synchronized int read( char[] outBuf, int outBufOffset, int charsRequested ) throws IOException
    {
//        System.out.format( "read called: outBufOffset=%d, charsRequested=%d\n", outBufOffset, charsRequested );
//        System.out.format( "current state: _leftoverCharsFromLastRead has %d characters, _iteratorExhausted=%b\n",
//                _leftoverCharsFromLastRead.length, _iteratorExhausted );
 
        // Have we already read enough characters from the iterator to feed this request?
        if( charsRequested <= _leftoverCharsFromLastRead.length )
        {
            // Yes, we already have enough characters, copy them into output buffer
            System.arraycopy( _leftoverCharsFromLastRead, 0, outBuf, outBufOffset, charsRequested );
            // Are there any left over?
            int remainder = _leftoverCharsFromLastRead.length - charsRequested;
            assert( remainder >= 0 );
            if( remainder > 0 )
            {
                // Copy remaining characters to new buffer (i.e. shrink buffer)
                char[] tempBuf = new char[ remainder ];
                System.arraycopy( _leftoverCharsFromLastRead, charsRequested, tempBuf, 0, remainder );
                _leftoverCharsFromLastRead = tempBuf;
            }
            else
            {
                // None left over, so reset buffer to zero-length
                _leftoverCharsFromLastRead = new char[ 0 ];
            }
            // Return the number of characters read
            // (in this case, all the characters requested)
            return charsRequested;
        }
        else
        {
            // We have been asked for more characters than we currently have, so we
            // can return what we have (if there are no more in the iterator) or
            // try to acquire more from the iterator
 
            // If iterator is exhausted and read has been called again, clean up and
            // return straight away, after copying as many characters as we have left
            if( _iteratorExhausted )
            {
                int charsAvailable = _leftoverCharsFromLastRead.length;
                if( charsAvailable == 0 )
                {
                    // Nothing in the iterator or the buffer, we're done
                    return -1;
                }
                else
                {
                    // Copy what we have into output buffer
                    System.arraycopy( _leftoverCharsFromLastRead, 0, outBuf, outBufOffset, charsAvailable );
                    // Clean up our own buffer and return number of characters copied
                    _leftoverCharsFromLastRead = new char[ 0 ];
                    return charsAvailable;
                }
            }
            else
            {
                // There may still be data in the iterator, so we can attempt to satisfy the whole
                // request by pulling more elements -- open a StringBuilder of the desired length
                StringBuilder sb = new StringBuilder( charsRequested );
                // Insert however many characters we do have and reset our buffer to zero-length
                if( _leftoverCharsFromLastRead.length > 0 )
                {
                    sb.append( _leftoverCharsFromLastRead );
                    _leftoverCharsFromLastRead = new char[ 0 ];
                }
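                // The leftover buffer has just been emptied into sb, so measure sb instead;
                // otherwise more elements than necessary are pulled from the iterator
                int charsStillRequired = charsRequested - sb.length();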
                // Iteratively add new strings until no more characters are required
                while( charsStillRequired > 0 && !_iteratorExhausted )
                {
                    // Read another string from the underlying iterator
                    String string = nextString();
                    // Add it to the StringBuilder
                    sb.append( string );
                    // Adjust number still required
                    charsStillRequired = charsStillRequired - string.length();
                }
                // Did we read to the end of the iterator?
                if( _iteratorExhausted )
                {
                    // We have read all the strings from the iterator, but can only return
                    // as many characters as we managed to read, or as many as were requested,
                    // whichever is lower
                    int charsObtained = sb.length();
                    char[] tempBuf = sb.toString().toCharArray();
                    // charsToReturn is the number of chars requested, or obtained, whichever is lower
                    int charsToReturn = Math.min( charsRequested, charsObtained );
                    // Copy this many characters into output buffer
                    System.arraycopy( tempBuf, 0, outBuf, outBufOffset, charsToReturn );
                    // Do we have any left over in our buffer?
                    if( charsObtained > charsRequested )
                    {
                        // Yes -- more obtained than requested -- save them for next request
                        int charsToSave = charsObtained - charsRequested;
                        assert( charsToSave + charsToReturn == tempBuf.length );
                        _leftoverCharsFromLastRead = new char[ charsToSave ];
                        System.arraycopy( tempBuf, charsToReturn, _leftoverCharsFromLastRead, 0, charsToSave );
                    }
                    if( charsObtained == 0 )
                    {
                        // No characters left in buffer or iterator; return -1 immediately
                        _leftoverCharsFromLastRead = new char[ 0 ];
                        return -1;
                    }
                    else
                    {
                        // Return the number of characters copied this time; any surplus
                        // has already been saved in the buffer for the next call
                        return charsToReturn;
                    }
                }
                else
                {
                    // sb now contains text to return, and there are more strings to iterate through.
                    // We can save a bit of effort by putting the entire contents of sb into
                    // our 'leftover' characters buffer, and calling this method again to copy it over
                    _leftoverCharsFromLastRead = sb.toString().toCharArray();
                    return read( outBuf, outBufOffset, charsRequested );
                }
            }
        }
    }
 
    private String nextString() throws IOException
    {
        // This should never get called after _iteratorExhausted has been set
        assert( !_iteratorExhausted );
 
        if( _iterator.hasNext() )
        {
            return _iterator.next().toString();
        }
        else
        {
            _iteratorExhausted = true;
            close();
            return "";
        }
    }
 
}
