A lazily-evaluated stream wrapper for IEnumerable

Recently I found myself working with a storage API in .NET that expected a Stream parameter for the content to be persisted. In contrast to other SDKs I've worked with, like the AWS and Azure blob SDKs, this one did not give you the option of writing your output directly to a Stream if you preferred. Even without that option, in most cases representing your content as a stream is not too difficult:

  • If your content is coming from a file, or any other source that already gives you a Stream-derived type (network, HTTP response, etc.), just pass in the stream you have
  • If your content is coming from a non-stream source, or is generated in memory, serialise it and pass along a MemoryStream (a quick sketch follows this list)
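For the in-memory case, that amounts to something like this (storageApi.Save is a hypothetical stand-in for whichever API you're feeding):

```csharp
using System.IO;
using System.Text;

// Serialise everything up front and hand over a MemoryStream. Fine for small
// payloads, but the whole serialised output sits in memory at once.
var payload = "{\"greeting\":\"hello\"}";    // content generated in memory
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(payload));
// storageApi.Save(stream);                  // hypothetical persistence call
```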

But what if we need to handle content that is the result of processing some large set of inputs where the full output does not fit into memory? Or perhaps we're happy that the job does fit into memory, but aspire to be Good Memory Citizens and keep the working set down anyway?

Today, fortunately, if this was your problem, you no longer have a problem 🎉

This gist contains a simple wrapper around IEnumerable that derives from Stream, suitable for APIs like the one mentioned above that insist on managing the stream themselves. We provide the IEnumerable (which can of course be composed of an arbitrarily complex sequence of operations, over an arbitrarily large set of inputs, yielding an arbitrarily large set of outputs, none of it iterated yet) and a serialisation function to convert each output element into a byte[]. With these, the stream wrapper ensures the IEnumerable is only enumerated as the downstream consumer (outside of your control) reads from the stream.
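To make that contract concrete, here is a minimal sketch of what such a wrapper can look like (illustrative, not necessarily the gist's exact code):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// A read-only Stream over an IEnumerable<T>. Elements are pulled and
// serialised one at a time, only as the consumer calls Read.
public sealed class EnumerableStream<T> : Stream
{
    private readonly IEnumerator<T> _source;
    private readonly Func<T, byte[]> _serialise;
    private byte[] _current = Array.Empty<byte>();
    private int _offset;

    public EnumerableStream(IEnumerable<T> source, Func<T, byte[]> serialise)
    {
        _source = source.GetEnumerator();
        _serialise = serialise;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var written = 0;
        while (written < count)
        {
            if (_offset == _current.Length)      // current element exhausted
            {
                if (!_source.MoveNext()) break;  // lazy enumeration happens here
                _current = _serialise(_source.Current);
                _offset = 0;
            }
            var n = Math.Min(count - written, _current.Length - _offset);
            Array.Copy(_current, _offset, buffer, offset + written, n);
            _offset += n;
            written += n;
        }
        return written;                          // 0 signals end of stream
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _source.Dispose();
        base.Dispose(disposing);
    }
}
```

The important bit is that MoveNext, and therefore all the upstream work, only runs inside Read, at the pace the consumer sets.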

So awesome (!?), show me how!

Actually, the code for the wrapper is pretty basic, but when I scoured the web for something like this I didn't come up with much. Anyway, here I'll just show you how you can use it.

A simple example

Here we set up an enumerable that will yield a small number of strings when enumerated. I added a filter clause to help hold the reader's interest till the end.
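Something along these lines, using the EnumerableStream sketch from above (the names are illustrative):

```csharp
using System;
using System.Linq;
using System.Text;

// A small lazy sequence of strings; nothing runs until the stream is read.
var lines = Enumerable.Range(1, 10)
    .Select(i => $"line {i}\n")
    .Where(line => !line.Contains("4"));     // the promised filter clause

using var stream = new EnumerableStream<string>(lines, s => Encoding.UTF8.GetBytes(s));

// Pretend this loop is the storage API consuming the stream at its own pace:
var buffer = new byte[16];
int read;
while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
    Console.Write(Encoding.UTF8.GetString(buffer, 0, read));
```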

A more complicated example

Here we consolidate the contents of some unknown number of blob inputs into a single item, performing filtering and transformation based on inspection of the input contents.
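In sketch form it might look like this, with ListBlobs and the other helpers standing in for your blob SDK and domain logic:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Placeholder stubs; substitute your real blob listing, download,
// filtering and transformation logic.
IEnumerable<string> ListBlobs() { yield return "blob-1"; yield return "blob-2"; }
string DownloadContent(string blobName) => $"contents of {blobName}";
bool IsInteresting(string content) => content.Length > 0;
string Transform(string content) => content.ToUpperInvariant();

var consolidated = ListBlobs()           // unknown, possibly huge, number of inputs
    .Select(DownloadContent)             // pull one blob's content at a time
    .Where(IsInteresting)                // filter by inspecting the contents
    .Select(Transform);                  // per-input transformation

// One output item whose content is produced only as the consumer reads.
using var stream = new EnumerableStream<string>(consolidated, s => Encoding.UTF8.GetBytes(s));
```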

If you squint a little at the above and imagine that ListBlobs yields a very large number of inputs, and that the inner filtering and transformation are relatively expensive in memory, it should be clear how this can help in memory-constrained situations ⛳️.

Thinking about Performance

Making the decision to move to per-input processing (in the above example, per blob file) can reduce the memory requirement of your task significantly. Instead of having to hold the entire output content in memory, you need just the greater of {memory required to hold a single serialised output element} and {memory required to generate a single output element}. In most cases there will be a performance penalty to pay, so it is a matter of balancing execution time against memory use. However, for memory-constrained devices, like mobiles or that t1.micro instance on AWS that falls under the free tier, this may be the trick you are looking for.

◾️ ◾️ ◾️