.Net Nugget #3: Streams

Even people very new to .Net will have run into streams (classes that derive from System.IO.Stream) in the framework.  The most common use of streams is probably file I/O, but a stream is really just a way to handle a sequence of bytes.  You can read from streams, write to streams, and even seek around in them to find what you want (if the backing store supports that).

You’ll find a ton of examples of stream usage throughout the framework, but why are they so useful?  In my opinion it’s for two reasons:

  1. They provide a standard way to deal with I/O of data.
  2. They use the Decorator pattern.

Providing a standard way to deal with I/O means that there is a consistent way to deal with all sorts of backing stores that you may need to read from or write to.  Just look at some of the stores offered out of the box: files (FileStream), memory (MemoryStream), network sockets (NetworkStream), and more.

If I need to write data to a file, or send it across the network, there is a consistent API to do that.  Consistency makes for easier coding and is the sign of a good framework.  If the way you communicate with various stores is the same across the board, it is easier for you to focus on the information you are dealing with rather than how that communication works.

The Decorator pattern also makes streams incredibly useful.  DoFactory.com defines the decorator pattern as:

“Attach additional responsibilities to an object dynamically. Decorators provide a flexible alternative to subclassing for extending functionality.”

The way this works is that you can basically “nest” multiple objects that all share the same interface into a kind of object chain.  When you call a method on the outermost object, it executes some code and then calls the same method on a nested instance, which in turn executes some code and calls its nested instance.  Because this nesting lets you run code both before and after the call to the nested instance, you can affect the values being passed down the chain as well as the result coming back up.
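To make the chaining concrete, here is a minimal sketch of the pattern using made-up types (the IWriter interface and its implementations are purely illustrative, not framework classes).  Each decorator holds an inner instance of the same interface and transforms the value on its way down the chain before delegating:

```csharp
using System;

// Hypothetical component interface shared by every link in the chain.
public interface IWriter
{
    void Write(string text);
}

// The "real" component at the end of the chain.
public class ConsoleWriter : IWriter
{
    public void Write(string text) => Console.WriteLine(text);
}

// A decorator: same interface, wraps an inner IWriter,
// transforms the value before delegating down the chain.
public class UpperCaseWriter : IWriter
{
    private readonly IWriter _inner;
    public UpperCaseWriter(IWriter inner) => _inner = inner;

    public void Write(string text) => _inner.Write(text.ToUpperInvariant());
}

// Another decorator, prefixing a timestamp before delegating.
public class TimestampWriter : IWriter
{
    private readonly IWriter _inner;
    public TimestampWriter(IWriter inner) => _inner = inner;

    public void Write(string text) => _inner.Write($"[{DateTime.Now:HH:mm:ss}] {text}");
}

public static class DecoratorDemo
{
    public static void Main()
    {
        // Nest the decorators exactly the way you nest streams:
        IWriter writer = new TimestampWriter(new UpperCaseWriter(new ConsoleWriter()));
        writer.Write("hello");  // prints something like "[12:34:56] HELLO"
    }
}
```

Note that the caller only ever sees IWriter, so decorators can be added, removed, or reordered without touching the calling code, which is the same property that makes nested streams interchangeable.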

Streams are an excellent example of the decorator pattern, because by nesting streams you can come up with all sorts of interesting ways to affect the data.  The Framework provides a couple of very useful stream derivatives, such as the System.IO.BufferedStream class and the System.IO.Compression.GZipStream class.  For example, you could nest a FileStream within a BufferedStream to read from a file with buffering, and then push the data through a GZipStream to compress it before, say, saving it to another file.
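That nesting might look something like the sketch below, which copies a file into a compressed file by chaining FileStream, BufferedStream, and GZipStream.  The method name, buffer size, and the manual read/write loop are my own choices for illustration:

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressExample
{
    // Reads sourcePath through a BufferedStream and writes the bytes
    // through a compressing GZipStream into destinationPath.
    public static void CompressFile(string sourcePath, string destinationPath)
    {
        using (FileStream source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        using (BufferedStream buffered = new BufferedStream(source, 65536))
        using (FileStream destination = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
        using (GZipStream gzip = new GZipStream(destination, CompressionMode.Compress))
        {
            byte[] chunk = new byte[4096];
            int bytesRead;
            // Each Read is served from the BufferedStream's buffer; each Write
            // passes through the compressing decorator into the destination file.
            while ((bytesRead = buffered.Read(chunk, 0, chunk.Length)) > 0)
            {
                gzip.Write(chunk, 0, bytesRead);
            }
        }
    }
}
```

Because every link in the chain is just a Stream, you could swap the GZipStream for a CryptoStream, or drop the BufferedStream entirely, without changing the loop in the middle.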

Here’s an example of reading a file from a stream:

 // Requires: using System; using System.IO; using System.Diagnostics;
 public static void ReadFromFileStreamAndOutputTimingToConsole(string sourceFile)
 {
     if (string.IsNullOrEmpty(sourceFile) || !File.Exists(sourceFile))
     {
         throw new ArgumentException("You must supply a valid file to read from.");
     }

     Stopwatch readerTimer = new Stopwatch();
     long fileSize = new FileInfo(sourceFile).Length;

     readerTimer.Start();
     using (StreamReader reader = new StreamReader(sourceFile))
     {
         while (reader.Peek() >= 0)
         {
             reader.ReadLine();
         }
     }
     readerTimer.Stop();

     Console.WriteLine(
         "Simply reading from a FileStream took {0} ms for a {1} byte file.",
         readerTimer.ElapsedMilliseconds,
         fileSize);
 }

I use the System.Diagnostics.Stopwatch class to time how long it takes to read the file.  I used an overload of the System.IO.StreamReader constructor that simply takes the string path to the file; under the hood this creates a FileStream object to read from.  I ran this against a text file of 34 MB (the text of Bram Stoker’s Dracula from Project Gutenberg repeated several times) and it read it in about 575 ms.  Note that in the while loop I’m reading the whole file line by line.  Let’s assume I’m doing something with each line I read, like looking for a specific value or writing it to the console; otherwise I could have just used ReadToEnd, which would have read the entire stream, filling the internal buffer each time.  ReadLine doesn’t completely fill the internal buffer; it simply reads until it finds a line terminator.

Now, let’s compare that with the following code:

 public static void ReadFromBufferedFileStreamAndOutputTimingToConsole(string sourceFile)
 {
     if (string.IsNullOrEmpty(sourceFile) || !File.Exists(sourceFile))
     {
         throw new ArgumentException("You must supply a valid file to read from.");
     }

     Stopwatch readerTimer = new Stopwatch();
     long fileSize = new FileInfo(sourceFile).Length;

     readerTimer.Start();
     using (FileStream sourceFileStream = new FileStream(sourceFile, FileMode.Open, FileAccess.Read))
     using (BufferedStream bufferedStream = new BufferedStream(sourceFileStream, 5242880))
     using (StreamReader reader = new StreamReader(bufferedStream))
     {
         while (reader.Peek() >= 0)
         {
             reader.ReadLine();
         }
     }
     readerTimer.Stop();

     Console.WriteLine(
         "Reading from a buffered stream took {0} ms for a {1} byte file.",
         readerTimer.ElapsedMilliseconds,
         fileSize);
 }

In this example I’ve created a FileStream explicitly, passed it into the constructor of a BufferedStream (with a buffer size of 5,242,880 bytes, or 5 MB), then passed that to a StreamReader and again read the stream line by line.  The timing for this method on the same file was 296 ms.  The built-in buffering from the BufferedStream kept the StreamReader from having to go to the file system for every read.  Instead, the first read call down the chain caused the BufferedStream to read in 5 MB worth of data and then pass only a line at a time out to its caller when ReadLine was called.  When the buffer in the BufferedStream is completely read, it goes back to the underlying stream, in this case the file, and reads out another 5 MB chunk.  This is more efficient (see notes below).

The point I’m making isn’t about using BufferedStreams over simple StreamReaders, but rather, that the power of streams comes from their use of the Decorator pattern. 

Note, please bear in mind the following:

  • The timings were not scientifically gathered.  I simply ran the code a few times and took the best times from each method.
  • The times gathered for these methods would be affected by the speed of the machine, the speed of the hard drive (mostly this one) and whatever else happened to be going on.  I have a dual-core 2.16 GHz machine with a 7200 RPM hard drive.  Your mileage may vary.
  • Note that if you are very familiar with the type of data you are reading you could probably tailor the buffer size much better than I did.
  • Also note that if you attempt to read more bytes from a BufferedStream than it usually keeps in its buffer, the class is smart enough to bypass some of its logic and read the full amount you want from the underlying stream in one go.  Nice touch.
  • I was using ReadLine, which reads an arbitrary number of bytes until it finds a line terminator (usually a carriage return or line feed).  For pure data files, or other backing stores, you may need to read in more specific buffer sizes, and then the BufferedStream can be tailored to provide much better throughput.
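To illustrate that last point, here is a sketch of reading a binary file in fixed-size records through a BufferedStream.  The method name, the 1 MB buffer, and the record size are all arbitrary choices for illustration; you would tailor both sizes to your data:

```csharp
using System;
using System.IO;

public static class RecordReader
{
    // Reads the file at path in fixed-size records and returns how many
    // records were read (a trailing partial record counts as one).
    public static long CountRecords(string path, int recordSize)
    {
        long records = 0;
        byte[] record = new byte[recordSize];

        using (FileStream file = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (BufferedStream buffered = new BufferedStream(file, 1048576))  // 1 MB buffer
        {
            int bytesRead;
            // Each Read is served out of the 1 MB buffer; the file system is
            // only touched when the buffer runs dry.
            while ((bytesRead = buffered.Read(record, 0, recordSize)) > 0)
            {
                records++;  // process the record bytes here
            }
        }

        return records;
    }
}
```

If the record size divides evenly into the buffer size, as it does here, each buffered refill serves a whole number of records, which keeps reads off the file system as regular as possible.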

As always, please leave comments if you find the .Net Nugget informative, interesting, or just plain wrong.