How to Download and Extract Gzipped Files with C#

Downloading a gzipped file and unpacking it is a common data-processing task, but it can be tricky in C# if you don’t know which classes to use. This guide walks through the steps to download and extract a gzipped file programmatically using C#.

The Problem: Downloading Gzipped Files

Many websites host data in compressed formats like gzip to save bandwidth. For example, the URL http://data.dot.state.mn.us/dds/det_sample.xml.gz points to a gzipped XML file. To get the XML onto your disk, you need to download the file, decompress it, and save the result.

The Solution: Step-by-Step Guide

1. Download the Gzipped File

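To download a gzipped file, you can use either the WebRequest or WebClient classes available in .NET. Below is an example using the WebClient class for its simplicity. Note that both classes are marked obsolete as of .NET 6 in favor of HttpClient, although they still work.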

using System.Net;

string url = "http://data.dot.state.mn.us/dds/det_sample.xml.gz";
string savePath = @"C:\path\to\your\det_sample.xml.gz";

// WebClient is disposable, so wrap it in a using block.
using (WebClient webClient = new WebClient())
{
    // Blocks until the whole file has been written to savePath.
    webClient.DownloadFile(url, savePath);
}

  • Explanation:
    • WebClient is used to download files from the web.
    • The DownloadFile method takes the URL of the gzipped file and the local path where you want to save it.
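Because WebClient is marked obsolete in .NET 6 and later, here is a minimal sketch of the same download using HttpClient. It assumes .NET 6 or later with C# top-level statements; DownloadFileAsync is a hypothetical helper name introduced for illustration, not something from the example above.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper (an assumption, not part of the original example):
// downloads the resource at url and saves it to savePath.
static async Task DownloadFileAsync(HttpClient client, string url, string savePath)
{
    // ResponseHeadersRead lets the body be streamed instead of buffered in memory.
    using HttpResponseMessage response =
        await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);

    // Throw on a non-success status code rather than saving an error page.
    response.EnsureSuccessStatusCode();

    await using Stream body = await response.Content.ReadAsStreamAsync();
    await using FileStream file = File.Create(savePath);
    await body.CopyToAsync(file);
}
```

To use it, create one HttpClient, reuse it for all requests, and call await DownloadFileAsync(client, url, savePath) with the same url and savePath values as in the WebClient example.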

2. Extracting the Gzipped Content

After you’ve downloaded the gzipped file, the next step is to extract its contents. For this, we will use the GZipStream class from the System.IO.Compression namespace.

Decompression Code Example

using System.IO;
using System.IO.Compression;

string gzippedFilePath = @"C:\path\to\your\det_sample.xml.gz";
string outputFilePath = @"C:\path\to\your\det_sample.xml";

// Open the compressed file for reading.
using (FileStream fInStream = new FileStream(gzippedFilePath, FileMode.Open, FileAccess.Read))
{
    // Reads from zipStream return decompressed bytes.
    using (GZipStream zipStream = new GZipStream(fInStream, CompressionMode.Decompress))
    {
        // FileMode.Create overwrites the output file if it already exists.
        using (FileStream fOutStream = new FileStream(outputFilePath, FileMode.Create, FileAccess.Write))
        {
            byte[] tempBytes = new byte[4096];
            int bytesRead;
            // Read returns 0 once the end of the stream is reached.
            while ((bytesRead = zipStream.Read(tempBytes, 0, tempBytes.Length)) != 0)
            {
                fOutStream.Write(tempBytes, 0, bytesRead);
            }
        }
    }
}

  • Explanation:
    • First, create a FileStream to read the gzipped file.
    • Then, wrap that stream in a GZipStream to handle the decompression.
    • Finally, create another FileStream and copy the decompressed bytes into it in 4 KB chunks until Read returns 0, which signals the end of the stream.
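The manual read/write loop can also be replaced with Stream.CopyTo, which performs the same chunked copy internally. The following is a minimal sketch under the assumption of C# 8 or later (for using declarations) and top-level statements; DecompressFile is a hypothetical helper name chosen for illustration.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper (an assumption, not part of the original example):
// decompresses the gzip file at gzippedFilePath into outputFilePath.
static void DecompressFile(string gzippedFilePath, string outputFilePath)
{
    // using declarations dispose in reverse order: output, then gzip, then input.
    using FileStream input = File.OpenRead(gzippedFilePath);
    using var gzip = new GZipStream(input, CompressionMode.Decompress);
    using FileStream output = File.Create(outputFilePath);

    // Copies decompressed bytes until the gzip stream reports end of data.
    gzip.CopyTo(output);
}
```

Calling DecompressFile(gzippedFilePath, outputFilePath) with the paths from the example above should produce the same output file as the explicit loop.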

Summary of Steps

  1. Download the file using WebClient.
  2. Open the gzipped file with a FileStream.
  3. Wrap it in a GZipStream for decompression.
  4. Write the output to a new file using another FileStream.

By following these steps, you can easily automate the process of downloading and extracting gzipped files using C#.
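If you do not need to keep the .gz file, the steps can be collapsed into a single pass that decompresses the response while it downloads. This is a sketch rather than a definitive implementation: it assumes .NET 6 or later, uses HttpClient instead of WebClient, and assumes the server returns the raw .gz bytes. GunzipToFileAsync and DownloadAndGunzipAsync are hypothetical helper names.

```csharp
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: decompresses any gzip stream into a file.
static async Task GunzipToFileAsync(Stream gzipped, string outputPath)
{
    await using var gzip = new GZipStream(gzipped, CompressionMode.Decompress);
    await using FileStream output = File.Create(outputPath);
    await gzip.CopyToAsync(output);
}

// Hypothetical helper: streams the download straight into the decompressor,
// so no intermediate .gz file is written to disk.
static async Task DownloadAndGunzipAsync(HttpClient client, string url, string outputPath)
{
    await using Stream body = await client.GetStreamAsync(url);
    await GunzipToFileAsync(body, outputPath);
}
```

The trade-off is that a failed or interrupted download can leave a partial output file behind, whereas the two-step approach keeps the downloaded .gz file available for a retry.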

Conclusion

Handling gzipped files in C# is straightforward once you understand the necessary classes and methods. With just a few lines of code, you can efficiently download and extract any gzipped content you need.

For further reading on compression and file handling in C#, see Microsoft’s official documentation for the System.IO.Compression and System.Net namespaces.