Using .NET HttpClient to capture partial Responses

January 29, 2014 • from Maui, HI • 22 comments

Over the last few days I’ve been struggling with an issue: capturing HTTP content from arbitrary URLs while reading only a specified number of bytes from the connection. Seems easy enough, but it turns out that if you want to control bandwidth and read only a small amount of partial data from the TCP/IP connection, that's not easy to accomplish with the new HttpClient introduced in .NET 4.5, or even with HttpWebRequest/HttpWebResponse (on which the new HttpClient is based), because the .NET stack automatically reads a fairly large chunk of data on the first read, presumably to capture the HTTP headers.

I’ll start this post by saying I didn’t find a full solution to this problem, but I’ll lay out some of the discoveries I made in my quest for small byte counts on the wire, some of which partially address the issue.

Why partial Requests? Why does this matter?

Here’s some background: I’m building a monitoring application that might monitor a huge number of URLs that get checked frequently for uptime. I’m talking about maybe 100,000 URLs that get checked on average once every minute. As you might expect, hitting that many URLs and retrieving the entire HTTP response when all you need is a few bytes to verify the content would incur a tremendous amount of network traffic. Assuming each URL returned an average of 10k bytes of data, that would be 1 gig of data a minute. Yikes!
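To put numbers on that: the estimate is just URLs × bytes per check. A trivial sketch, assuming every URL is checked once a minute (the class and method names are made up for illustration):

```csharp
// Back-of-the-envelope traffic estimate for the monitoring scenario.
// Assumes every URL is checked once per minute; names are illustrative only.
static class TrafficEstimate
{
    public static long BytesPerMinute(long urlCount, long bytesPerCheck) =>
        urlCount * bytesPerCheck;

    public static long BytesPerDay(long urlCount, long bytesPerCheck) =>
        BytesPerMinute(urlCount, bytesPerCheck) * 60 * 24;
}
```

`BytesPerMinute(100_000, 10_000)` gives 1,000,000,000 bytes (about 1 GB) per minute, or 1.44 TB per day. Capping each check at 1,000 bytes would bring that down to 100,000,000 bytes (about 100 MB) per minute.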

Using HttpClient with Partial Responses

So my goal was to read only a small chunk of data, say the first 1000 or 2000 bytes, within which the user can search for matching content.

Using HttpClient you might do something like this:

[TestMethod]
public async Task HttpGetPartialDownloadTest()
{
    //ServicePointManager.CertificatePolicy = delegate { return true; };

    string text = null;

    // ResponseHeadersRead returns as soon as the headers arrive instead of buffering the whole body
    using (var httpclient = new HttpClient())
    using (var response = await httpclient.GetAsync("http://weblog.west-wind.com/posts/2012/Aug/21/An-Introduction-to-ASPNET-Web-API",
                                                    HttpCompletionOption.ResponseHeadersRead))
    using (var stream = await response.Content.ReadAsStreamAsync())
    {
        var bytes = new byte[1000];
        var bytesread = await stream.ReadAsync(bytes, 0, bytes.Length);

        // decode only the bytes actually read, not the whole (possibly partly empty) buffer
        text = Encoding.UTF8.GetString(bytes, 0, bytesread);
    }

    Assert.IsFalse(string.IsNullOrEmpty(text), "Text shouldn't be empty");
    Assert.IsTrue(text.Length == 1000, "Text should hold 1000 characters");

    Console.WriteLine(text);
}

This looks like it should do the trick, and indeed you get a result in this code that is 1000 characters long.
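One detail worth knowing about the test above: a single Stream.Read/ReadAsync call is allowed to return fewer bytes than requested, so on a slow connection one read can come back short. If you really need the first N bytes, loop until you have them. A minimal sketch (the helper name is made up); note that this only controls what the application sees, not how much the HTTP stack pulls off the wire:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class StreamHelpers
{
    // Reads until maxBytes have been collected or the stream ends, and returns
    // only the bytes actually read. A single Read/ReadAsync may return fewer
    // bytes than requested, so one call doesn't guarantee a full buffer.
    public static async Task<byte[]> ReadAtMostAsync(Stream stream, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes)
        {
            int read = await stream.ReadAsync(buffer, total, maxBytes - total);
            if (read == 0)
                break; // end of stream: the response was shorter than maxBytes
            total += read;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }
}
```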

But not all is as it seems: while the .NET app gets its 1000 bytes, the data on the wire is actually much larger. If I use this code with a file that’s say 10k in size, I find that the entire response travels over the wire. If the file gets bigger (like the URL above, which is a 110k article), the transfer gets cut off at around 20k or so, depending on how fast the connection is and how quickly the connection is closed.

I’m using WireShark to look at the TCP/IP trace to see the actual data captured and it’s definitely way bigger than my 1000 bytes of data. So what’s happening here?

TCP/IP Buffering

After discussions with a few people more knowledgeable in networking, I found out that the .NET HTTP client stack buffers TCP/IP traffic as it comes in. Normally this is exactly what you want: have the network connection read as much data as it can, as quickly as possible. The more data that is read at once, the more efficient the data retrieval is in general.

But for my use case this unfortunately doesn’t work. I want just 1000 bytes (or as close as possible to that anyway) and then immediately close the connection. No matter how I tried, with either HttpClient or HttpWebRequest, I was unable to make the buffering go away.

Even the new .NET 4.5 feature that supposedly turns off read buffering on HttpWebRequest, AllowReadStreamBuffering = false, didn’t help:

[TestMethod]
public async Task HttpWebRequestTest()
{
    var request =
        HttpWebRequest.Create("http://weblog.west-wind.com/posts/2012/Aug/21/An-Introduction-to-ASPNET-Web-API")
            as HttpWebRequest;

    request.AllowReadStreamBuffering = false;
    request.AllowWriteStreamBuffering = false;

    byte[] buffer = new byte[1000];
    int byteCount;
    using (var response = await request.GetResponseAsync() as HttpWebResponse)
    using (var stream = response.GetResponseStream())
    {
        byteCount = await stream.ReadAsync(buffer, 0, buffer.Length);
        request.Abort();  // call ASAP to kill connection
    }

    // decode only the bytes that were actually read
    string text = Encoding.UTF8.GetString(buffer, 0, byteCount);

    Console.WriteLine(text);
}

Running this code I still get exactly 19,934 bytes of text on the wire according to the Wireshark trace, which is not what I was hoping for.

I also tried an older application that uses WinInet to do a non-buffered read. There I also got buffering, although the buffer was roughly 8k bytes, which is the size of the HTTP buffer I specify in the WinInet calls. Better, but also not an option because WinInet is not reliable for many simultaneous connections.

TcpClient works better, but…

Several people suggested using TcpClient directly and it turns out that using raw TcpClient connections does give me a lot more control over the data travelling over the wire.

Using the following code I get a much more reasonable 3k data footprint:

[TestMethod]
public void TcpClientTest()
{
    var server = "weblog.west-wind.com";
    var pageName = "/posts/2012/Aug/21/An-Introduction-to-ASPNET-Web-API";
    int byteCount = 1000;
    const int port = 80;

    using (var client = new TcpClient(server, port))
    using (NetworkStream stream = client.GetStream())
    {
        // HTTP requires CRLF line endings and a blank line to terminate the headers
        string fullRequest = "GET " + pageName + " HTTP/1.1\r\nHost: " + server + "\r\n\r\n";
        byte[] outputData = System.Text.Encoding.ASCII.GetBytes(fullRequest);
        stream.Write(outputData, 0, outputData.Length);

        // Read one chunk of at most byteCount bytes (status line and headers included)
        byte[] inputData = new byte[byteCount];
        int actualByteCountReceived = stream.Read(inputData, 0, byteCount);

        string responseData = System.Text.Encoding.ASCII.GetString(inputData, 0, actualByteCountReceived);
        Console.WriteLine(responseData);
    } // disposing the stream and the client closes the connection
}

It’s still bigger than the 1,000 bytes I’m requesting, but significantly smaller than anything I was able to get with any of the Windows HTTP clients.
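Keep in mind that the bytes returned by the raw socket read include the status line and the response headers, so only part of those 1000 bytes is actual content, and a single read isn't guaranteed to contain all of the headers. A small sketch (hypothetical helper names) for pulling the status code out of the raw bytes and finding where the body starts:

```csharp
using System;
using System.Text;

static class RawResponse
{
    // Returns the index just past the blank line ("\r\n\r\n") that ends the header
    // block, or -1 if the headers weren't fully received within `count` bytes.
    public static int FindBodyStart(byte[] data, int count)
    {
        for (int i = 0; i + 3 < count; i++)
        {
            if (data[i] == '\r' && data[i + 1] == '\n' && data[i + 2] == '\r' && data[i + 3] == '\n')
                return i + 4;
        }
        return -1;
    }

    // Parses the status code from a status line such as "HTTP/1.1 200 OK".
    // Returns -1 if the first line doesn't look like an HTTP status line.
    public static int ParseStatusCode(byte[] data, int count)
    {
        string text = Encoding.ASCII.GetString(data, 0, count);
        int lineEnd = text.IndexOf("\r\n", StringComparison.Ordinal);
        string statusLine = lineEnd >= 0 ? text.Substring(0, lineEnd) : text;
        string[] parts = statusLine.Split(' ');
        return parts.Length >= 2 && parts[0].StartsWith("HTTP/", StringComparison.Ordinal)
            && int.TryParse(parts[1], out int code) ? code : -1;
    }
}
```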

Unfortunately, using TcpClient generically is not a good option for my use case. I need to hit generic URLs of all kinds and I really don’t want to re-implement a full HTTP client stack on top of TcpClient… Implementing SSL, all sorts of authentication, redirects, 100-Continue handling etc. is not a trivial matter, especially SSL.

Why not use HEAD requests?

HTTP also supports HEAD requests, which retrieve only the HTTP headers. This is often ideal for monitoring since it doesn’t bring back any content at all.

Unfortunately in my scenario this is not going to work, at least not for everything. First, I need to look at the content to determine that the content, not just the headers, is valid. The other problem is that the target URL’s server has to support HEAD requests, which is not a given either. ASP.NET’s and IIS’s default handler entries in web.config in the past didn’t include the HEAD verb, which would make HEAD requests fail immediately.

So again, for generic URL access this isn’t going to work, although it might be useful as an option.
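For URLs where a HEAD check is acceptable, HttpClient can send one by building an HttpRequestMessage with HttpMethod.Head. The sketch below also encodes an assumed fallback rule: a 405 Method Not Allowed or 501 Not Implemented is treated as "HEAD not supported here", and the monitor would retry with a GET. The class and method names are hypothetical:

```csharp
using System.Net;
using System.Net.Http;

static class HeadCheck
{
    // Builds a HEAD request; send it with httpClient.SendAsync(...).
    public static HttpRequestMessage CreateHeadRequest(string url) =>
        new HttpRequestMessage(HttpMethod.Head, url);

    // Assumed policy: these status codes suggest the server or handler doesn't
    // accept HEAD, so fall back to a (possibly ranged) GET.
    public static bool ShouldFallBackToGet(HttpStatusCode status) =>
        status == HttpStatusCode.MethodNotAllowed || status == HttpStatusCode.NotImplemented;
}
```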

What about Range Headers?

HTTP 1.1 supports the concept of Range headers, which allow for retrieving partial responses. They’re meant for large files, so that the files can be sent in chunks and individual chunks can be re-loaded if a transmission is aborted. Getting a range from the server is as easy as asking for one.

A range request can look as simple as this:

GET http://west-wind.com/presentations/DotnetWebRequest/DotNetWebREquest.htm HTTP/1.1
Range: bytes=0-1000
Host: west-wind.com
Connection: Keep-Alive

Here I’m simply asking for bytes 0 through 1000 (byte ranges are inclusive, so that’s actually 1001 bytes). Normally you’re also supposed to send an ETag. The normal flow goes: call the page with a HEAD request, get the size and the ETag, then use Range requests to chunk the data from the server. The server responds with a 206 Partial Content status and physically pushes down only the requested number of bytes.
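Because ranges are inclusive at both ends, "the first N bytes" is bytes=0-(N-1). A small sketch using HttpClient's RangeHeaderValue (the helper names are made up):

```csharp
using System.Net.Http.Headers;

static class RangeHelper
{
    // HTTP byte ranges are inclusive at both ends: bytes=0-1000 covers 1001 bytes.
    public static long RangeLength(long first, long last) => last - first + 1;

    // Range header asking for the first `count` bytes (count must be at least 1).
    public static RangeHeaderValue FirstBytes(long count) => new RangeHeaderValue(0, count - 1);
}
```

`RangeHelper.FirstBytes(1000)` produces the header value `bytes=0-999`, which asks for exactly 1000 bytes.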

Using HttpClient this looks like this:

[TestMethod]
public async Task HttpClientGetStreamTest()
{
    string url = "http://west-wind.com/presentations/DotnetWebRequest/DotNetWebREquest.htm";
    int size = 1000;

    using (var httpclient = new HttpClient())
    {
        // ranges are inclusive: bytes=0-999 asks for exactly `size` bytes
        httpclient.DefaultRequestHeaders.Range = new RangeHeaderValue(0, size - 1);

        using (var response = await httpclient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            // 206 PartialContent means the range was honored, 200 OK means the full body is coming
            Console.WriteLine(response.StatusCode);

            using (var stream = await response.Content.ReadAsStreamAsync())
            {
                var bytes = new byte[size];
                var bytesread = await stream.ReadAsync(bytes, 0, bytes.Length);
            }
        }
    }
}

This works great, if the server supports it. Both the server and the specific resource being requested have to support ranges. Most modern Web servers support range requests natively, so this works out of the box on static content. However, if the content is dynamic it doesn’t work unless the code generating the response supports ranges itself. It works on the static HTML page referenced above, but it doesn’t work on the dynamic ASP.NET Web Log request I used in the earlier examples.

For my scenario I’m going to always add the Range header in hopes that the server and resource I’m hitting support it, but chances are they don’t, and the response will be a full response.
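Since you can't know up front whether a server honors the Range header, the status code tells you after the fact: 206 Partial Content means only the requested slice is coming, while 200 OK means the header was ignored and the client has to stop reading on its own. A sketch with hypothetical helper names; when present, the Content-Range header also reports the resource's full size:

```csharp
using System.Net;
using System.Net.Http;

static class RangeResponse
{
    // 206 Partial Content: the server honored the Range header and sends only that slice.
    // 200 OK: the Range header was ignored and the full body follows, so the client
    // has to enforce its own byte limit (and the stack may still buffer more than that).
    public static bool ServerHonoredRange(HttpResponseMessage response) =>
        response.StatusCode == HttpStatusCode.PartialContent;

    // Content-Range (e.g. "bytes 0-999/110061") carries the full resource size, if sent.
    public static long? TotalLength(HttpResponseMessage response) =>
        response.Content?.Headers.ContentRange?.Length;
}
```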

How to check Wire Traffic

Turns out checking what’s happening on the wire is not as trivial as you might think.

Fiddler – not a good idea

I love Fiddler and use it daily for all sorts of HTTP monitoring and testing. It’s an awesome tool, but unfortunately it’s not well suited for monitoring the size of wire traffic (I think; Eric Lawrence keeps making me realize with his nudges how little of Fiddler’s features I actually use or know about).

So initially, when I wanted to see how much data was actually captured, I went to Fiddler since it’s my go-to tool. But I quickly found out that no matter what I sent, Fiddler would always retrieve the entire HTTP response. At first I just assumed that meant the HTTP client was reading the entire response, but that’s not actually the case. Fiddler is a proxy and as such retrieves requests on behalf of the client: you send an HTTP request, and Fiddler retrieves it for you and feeds it back to your application. This means the entire response is retrieved (unless HTTP headers specify otherwise).

So, Fiddler doesn’t really help in tracking actual wire traffic.

.NET System.Net Tracing

.NET’s tracing system actually provides a ton of information about network operations. It tells you when it connects, reads, writes and closes connections, and shows byte counts etc. Unfortunately, it also reports some misleading numbers when it comes to how much data actually travels over the wire and is read through the interface.

To turn on tracing with a ConsoleTraceListener, add the following to the application’s config file (app.config for a test project):

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.diagnostics>
    <trace autoflush="true" />
    <sources>
      <source name="System.Net" maxdatasize="1000000">
        <listeners>
          <add name="MyConsole"/>
        </listeners>
      </source>
    </sources>

    <sharedListeners>
      <add
        name="MyTraceFile"
        type="System.Diagnostics.TextWriterTraceListener"
        initializeData="System.Net.trace.log"
                />
      <add name="MyConsole" type="System.Diagnostics.ConsoleTraceListener" />
    </sharedListeners>

    <switches>
      <add name="System.Net" value="Verbose" />
      <add name="System.Net.Sockets" value="Verbose" />
    </switches>
  </system.diagnostics>
</configuration>

This works great for tests, since the console output shows up directly in the test output.

One line in this trace in particular is a problem:

System.Net Information: 0 : [6708] ConnectStream#45653674::ConnectStream(Buffered 110109 bytes.)

Notice that it seems to indicate that the request buffered the entire content! It turns out that this line is actually bullshit: the connect stream is buffering, but it’s not buffering whatever that byte value is. The actual data on the wire ends up being only 19,934 bytes, so this line is definitely wrong.

Between this line and the lines that show the data read from the connection and the final count, the values that come from the System.Net trace are not reliable for telling how much network traffic was actually incurred.

WireShark

So, that led me back to Wireshark. Wireshark is a great packet-level network sniffer and it works great for these sorts of things. However, I use Wireshark once a year or less, so every time I fire it up I forget how to set up the filters to get only what I’m interested in. Basically you’ll want to filter down to HTTP traffic and then look through all the captured packets that carry data, which is tedious. But I can get the data that I need. From this I could tell that on the long 110k request I was not reading the entire response, but on smaller responses I was in fact getting the entire response.

On the 110k request (using HttpWebRequest), the Wireshark trace shows only ~19k of text being read.

BTW, here’s a cool tip: did you know that you can take a Wireshark pcap trace export and view it in Fiddler? It’s a much nicer way to look at HTTP requests than inside of Wireshark.

To do this:

In Wireshark, select all captured packets
Go to File | Export | Export as .pcap file
Go into Fiddler
Go to File | Import Sessions | Packet Capture
Pick the .pcap file and see the requests in the browser

This may seem silly since you could capture directly in Fiddler, but remember that Fiddler is a proxy, so it pulls data from the server and then forwards it. By capturing with Wireshark at the protocol level you can see what’s really happening on the wire, and by importing into Fiddler you can see the truncated requests.

Once the Wireshark trace is imported into Fiddler, I can see much more easily what’s happening. In the reconstructed trace from my test, the response header shows the full Content-Length:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Vary: Content-Encoding
Server: Microsoft-IIS/7.0
Date: Sat, 11 Jan 2014 00:30:41 GMT
Content-Length: 110061

but the actual content captured (up to the point where the reconstructed response turns into nulls) is exactly 19,934 bytes. Repeatedly. So this tells us the response is indeed getting truncated, but not immediately: there’s buffering of the HTTP stream.

In other words, the data actually sent over the wire is much larger than what the application reads. I chose this specific URL because it’s about 110k of text (yeah, a long article 😃). If you choose a smaller file that’s say 10-20k in size, you’ll find that the entire file is sent. With the 110k file, the actual data that came over the wire is about 20k. While 20k is a lot better than 110k, it’s still too much data on the wire when I’m only interested in the first 1000 bytes.

Where are we?

As I mentioned at the outset of this post, I haven’t found a complete solution to my problem at this point. There are a number of ways to reduce the traffic in some situations, but none of them work for all cases.

Moving forward, I think the best option for this particular application will likely be to create a TCP/IP client that handles the ‘simple’ requests and reads a limited byte count, with some extra padding for the expected header size. Plain URL access without HTTPS I can handle with the TCP/IP client. For HTTPS requests, authentication, redirects etc. I’ll have to live with the HttpClient/HttpWebRequest behavior and apply Range headers to everything to limit the data sent by the server, if it happens to be supported.
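That plan boils down to a routing decision per URL plus a read budget. A sketch under stated assumptions: the enum and helpers are hypothetical, the 1024-byte header allowance is a guess you'd tune against real responses, and since redirects and authentication challenges only show up at runtime, a 3xx or 401 from the raw TCP path is treated as a signal to retry through HttpClient:

```csharp
using System;

// Hypothetical routing for the plan above; names are illustrative, not an existing API.
enum FetchStrategy { RawTcp, HttpClientWithRange }

static class FetchPlanner
{
    // Plain http:// URLs without credentials go through the raw TCP client;
    // https and authenticated URLs go through HttpClient with a Range header.
    public static FetchStrategy Choose(Uri url, bool requiresAuthentication) =>
        url.Scheme == Uri.UriSchemeHttp && !requiresAuthentication
            ? FetchStrategy.RawTcp
            : FetchStrategy.HttpClientWithRange;

    // A redirect or auth challenge on the raw TCP path means: retry via HttpClient.
    public static bool NeedsHttpClientRetry(int statusCode) =>
        (statusCode >= 300 && statusCode < 400) || statusCode == 401;

    // Bytes to read from the socket: the content to search plus an assumed
    // allowance for the status line and headers (1024 is a guess; tune it).
    public static int ReadBudget(int contentBytes, int headerAllowance = 1024) =>
        contentBytes + headerAllowance;
}
```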

I’m hoping that by posting here, somebody might have some additional ideas about how to limit the initial HTTP read buffer size for HttpWebRequest/HttpClient.

Resources

WireShark
Fiddler
My original StackOverflow Post from which this was compiled (thanks to Shawty and Darrel Miller for their help)

The Voices of Reason

Harry Athey
January 30, 2014

# re: Using .NET HttpClient to capture partial Responses

If all you are trying to do is verify that the site is functional, why not try something like getting the full request once in a while (depending on what frequency is important for the individual site, or policy or whatever). Then for the remainder of the minute by minute checks, grab one of the smaller includes from the original request and store the link, use that over and over to see if the item is available.

1) Get the whole page, parse for a typically small include (css?) and store the link.
2) every minute check the link file to see if you get a response
3) after a timeout period repeat step 1.

I know it doesn't solve the original problem of trying to only get the first 1K, but it may be a work around.

Harry
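Harry's schedule above could be sketched as a simple check on when the last full fetch happened; the names and the interval are hypothetical:

```csharp
using System;

static class ProbeSchedule
{
    // Do a full page fetch (which also re-discovers a small include such as a CSS
    // file) once the interval has elapsed; otherwise just request the stored include.
    public static bool NeedsFullCheck(DateTime lastFullCheckUtc, DateTime nowUtc, TimeSpan fullCheckInterval) =>
        nowUtc - lastFullCheckUtc >= fullCheckInterval;
}
```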

Stephen Cleary
January 30, 2014

HttpWebRequest.AllowReadStreamBuffering is not currently implemented (it's always false and throws an exception if you set it to true).

I believe the buffering you're seeing is the read buffer on the TCP/IP socket connection (8KB by default). I haven't tried this, but the first thing I'd attempt is to limit ServicePoint.ReceiveBufferSize.

ficedula
January 30, 2014

For non-SSL connections a (hacky!) solution that comes to mind is to write a class that listens on a socket, and when an incoming connection is received, makes an outgoing TCP connection to a predefined endpoint and forwards data between the two - automatically terminating the connection after X bytes. It only has to listen on localhost, and you can create a number of them listening at once easily enough. Kind of like a dumb, protocol agnostic proxy where the only purpose is that it cuts you off after X bytes ;) Then instead of connecting your web requests directly to the target url, fire up one of these proxies pointing at the target url, and point the web request at localhost:whateverport.

I imagine SSL connections would scream if you tried that, of course, since you're connecting to localhost and getting a cert back for the actual remote server ;) Might sort out all your non-SSL cases, though.

There'd probably be some overhead in creating one of these proxies and getting it to listen - although I suppose you could start a number of them listening and reassign the 'target' of an existing listener on the fly, effectively maintaining a pool of them already listening ready to go.

Since it /is/ a hack, you'd need to /really/ want to limit bandwidth more than your existing workarounds do for it to be worthwhile though!

MarcelDevG
January 30, 2014

Isn't the problem caused by the sending party?
When you do a GET of a dynamic page, the sender (webserver) makes up the page, and at one point starts sending.
You have little control over how much you will get. The only thing that will help is the use of the range parameter. That's all.

Just my first thought, could be wrong....

Marcel

Rick Strahl
January 31, 2014

@Harry - clever idea and yes that might work. Read for success, then just do HEAD requests for a bunch of requests, then check again on an interval. Might get into trouble though if the request returns variable data. There's no way to tell if something isn't right. Like the idea though and might do something along those lines.

@Stephen - I thought HttpWebRequest.AllowReadStreamBuffering was enabled in .NET 4.5. I'm using it in one of the samples and it definitely doesn't throw when assigning to it.

@Marcel - yes the sending party sends the data - all of it, but the client has to actually read the incoming stream in order for it to travel over the wire - the client pulls the data down - nothing happens until the client connects and reads the TCP/IP stream. If the client decides to quit it can terminate the streaming at any time. The problem is that we can't control the stream very well as the TCP/IP stack seems to aggressively read the first batch. Normally this is what you'd want, but in this case not so much. It'd be really nice if there were some way to limit the initial read as an option.

Remco
January 31, 2014

What about calling native APIs (winhttp.dll or wininet.dll)? Did you consider/try this?

Rick Strahl
January 31, 2014

@Remco - yup - tried WinInet and it pulls a chunk (around 8-10k) on the first read. Better, but still potentially too much.

SteveC
February 02, 2014

It generally makes me sad when Rick ends an article or post without a "win".

Rick Strahl
February 03, 2014

@Steve - Me too, me too! But I thought that this info here might still be useful and maybe, just maybe end up at some point with a solution from readers or Microsoft as they address this type of issue.

Paulo Morgado
February 03, 2014

I haven't used it myself yet, but Message Analyzer seems to be great on Windows as it can aggregate multiple sources of information:

http://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-71-Message-Analyzer-Part-1

http://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-72-Message-Analyzer-Part-2

http://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-73-Message-Analyzer-Part-3

Snixtor
March 07, 2014

Sounds like you need something *like* the HttpRequest.GetBufferlessInputStream method. http://msdn.microsoft.com/en-us/library/ff406798(v=vs.110).aspx

"This method provides an alternative to using the InputStream property. The InputStream property waits until the whole request has been received before it returns a Stream object. In contrast, the GetBufferlessInputStream method returns the Stream object immediately."

Of course, it's still not a solution for your scenario, which probably just adds to the frustration to know there's something appropriate in a different context.

Rick Strahl
March 07, 2014

@Snixtor - Looks like that's for the Web Server, not the client. This lives in System.Web, meaning it deals with the ASP.NET InputStream rather than the HttpClient/HttpWebRequest response stream. Still interesting though - I didn't know about this actually; it looks like a newer API specifically put in place to support OWIN scenarios.

Nikolay
May 26, 2014

Hi Rick, I also like the idea with 1 full request every hour and then just simple checks for HTTP 200 OK. Example for the checks:

HttpClient httpClient = new HttpClient();
httpClient.MaxResponseContentBufferSize = 1000;
httpClient.Timeout = new TimeSpan(0, 0, 10);
HttpRequestMessage message = new HttpRequestMessage(
    HttpMethod.Get, itemUri);
// Without ResponseHeadersRead, SendAsync buffers the whole body and throws
// once it exceeds MaxResponseContentBufferSize
var response = await httpClient.SendAsync(
    message, HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false);

item = response.IsSuccessStatusCode;

steveC
June 23, 2015

A year later, what's the final word on this issue?

Thanks.

Rick Strahl
June 23, 2015

@Stevec - unfortunately none the wiser. In my particular application I just use HttpWebRequest and set the buffer size as small as possible. If you really want something small the only way to do it is with raw sockets, and even then you get a fair bit of buffering. And personally I don't want to rewrite all HTTP logic for generic requests. If you're dealing with simple scenarios of known requests then sockets are probably the best choice.

Erx
August 05, 2015

Rick, I've been doing this for some years now, successfully. I asked a question on the net around 2006 and they pointed me to something, and I got some code, changed it around and had it working, it still works. In fact I was playing around with it today still working on the project. Here is the code, if you want, I'll send you my version of it as this is the code I was given as an example, but then I put together an async version of it.

Dim webRequest as HTTPWebRequest = webRequest.Create(docurl)
webRequest.Method = "GET"
webRequest.ContentType = "application/x-www-form-urlencoded"
webRequest.CookieContainer = cookies

Dim bytesProcessed As Integer = 0
Dim remoteStream As Stream
Dim localStream As Stream
Dim response As WebResponse

response = webRequest.GetResponse()
If Not response Is Nothing Then
    remoteStream = response.GetResponseStream()
    localStream = File.Create(targetfile)

    'Declare buffer as byte array
    Dim myBuffer As Byte()
    'Byte array initialization
    ReDim myBuffer(1024)

    Dim bytesRead As Integer
    bytesRead = remoteStream.Read(myBuffer, 0, 1024)
    Do While (bytesRead > 0)
        localStream.Write(myBuffer, 0, bytesRead)
        bytesProcessed += bytesRead
        bytesRead = remoteStream.Read(myBuffer, 0, 1024)

        ' HERE YOU NEED TO PUT IN THE RULES FOR WHEN YOU WANT TO STOP THE DOWNLOADING.
        If bytesRead => 4096 Then
            webRequest.Abort() ' off the top of my head
            response.Close() ' off the top of my head
            localStream.Close() ' and any other things that need closing
            Exit Do ' Or exit sub/function etc... or do something to return your collected bytesRead here...

        End If

    Loop
    ' localStream.Close()
End If

OR, you can go to the original C# version of the above over here:
http://www.codeguru.com/columns/dotnettips/article.php/c7005/Downloading-Files-with-the-WebRequest-and-WebResponse-Classes.htm

Of course, not sure if this particular code works, but if it doesn't, then I can certainly give you the async version of the code that I am using (which works 100%) as I am using it to download the start/first portion, whatever amount of bytes I want, to download binary files (ie: images or other binaries).

Let me know what you'd like me to do. My gmail username is bm3racer so just shoot me off an email if you want to talk privately.

Erx

Erx
August 05, 2015

CORRECTION: THE LAST VB.NET CODE I INCLUDED WAS RUBBISH, PLEASE IGNORE IT (THE VB INCLUDED VB.NET CODE ONLY). FIXED ONE INCLUDED BELOW.

I tried to test the code after I posted it, was rubbish, so I worked on it for a few hours and now it is working perfectly, I tested it with a variety of numbers/configurations.

All you need to do is give it a remote (internet) URL, and a local file name which doesn't exist, and you're done.

Just start a vb.net console application, include this code below in a Class called Test, and have a Module with it so that you can call the class and print out its results on your console screen.

Here is the fixed vb.net console app code.

Imports System.Net
Imports System.IO

Public Class Test
    Public Sub New(sRFile As String, sLFile As String)
        ' sRFile is the URL address you want to partially download.
        ' sLFile is a local file name, don't use directories as it will throw an error, just use a simple file name
        ' such as testweb.txt and the testweb.txt file will be saved inside your project directory, typically
        ' under the \bin\Debug folder where it is run from (that's its Current Directory according to VS GUI).
        Dim wRequest As WebRequest = WebRequest.Create(sRFile)
        wRequest.Method = "GET"
        wRequest.ContentType = "application/x-www-form-urlencoded"

        Dim bytesProcessed As Integer = 0
        Dim remoteStream As Stream
        Dim localStream As Stream
        Dim wResponse As WebResponse
        Dim BUFFER_SIZE As Integer = 128 ' These are the bytes collected per Do Loop increment. 128 since your needs are small.
        Dim DESIRED_TOTAL_SIZE As Integer = BUFFER_SIZE * 8 ' How many of the above increments do you want? 8? since you want < 1K.

        wResponse = wRequest.GetResponse()
        If Not wResponse Is Nothing Then
            remoteStream = wResponse.GetResponseStream()
            localStream = File.Create(sLFile)

            'Declare and initialize buffer as byte array
            Dim myBuffer As Byte()
            ReDim myBuffer(BUFFER_SIZE - 1)

            Dim bytesRead As Integer
            Dim tsDiff As TimeSpan
            Dim dtStart As DateTime = DateTime.Now
            Dim dtEnd As DateTime

            Do
                bytesRead = remoteStream.Read(myBuffer, 0, BUFFER_SIZE)
                bytesProcessed += bytesRead
                localStream.Write(myBuffer, 0, bytesRead)
                Console.WriteLine("Bytes Read: " & bytesRead & ", Sub Total: " & bytesProcessed & ", Buffer Length: " & myBuffer.Length & ".")

                If bytesProcessed >= DESIRED_TOTAL_SIZE Then
                    wRequest.Abort()
                    wResponse.Close()
                    remoteStream.Close()
                    localStream.Close()

                    dtEnd = DateTime.Now
                    tsDiff = dtEnd - dtStart

                    Console.WriteLine("Transferred " & bytesProcessed & " bytes into " & sLFile & " in " & tsDiff.TotalSeconds & " Seconds.")
                    Exit Do
                End If
            Loop While bytesRead > 0

            ' If the response ended before DESIRED_TOTAL_SIZE was reached, close (and flush) everything here
            If bytesProcessed < DESIRED_TOTAL_SIZE Then
                wResponse.Close()
                localStream.Close()
            End If
        End If
    End Sub
End Class

NOW PASTE THE FOLLOWING IN THE DEFAULT [ SUB MAIN() ] SECTION OF YOUR MODULE THAT SHOULD BE INCLUDED WHEN YOU CREATE YOUR FIRST EMPTY VB.NET CONSOLE APP:

Dim sURL As String = "http://www.codeproject.com/script/common/Images/404.jpg" ' Good 170KB image file for test.
Dim dtS, dtE As DateTime, tsD As TimeSpan
dtS = DateTime.Now
Dim t = New Test(sURL, "testweb.txt") ' THIS IS THE LINE THAT RUNS THE CODE, NOTHING ELSE IS NEEDED!
' THE REST OF THE STUFF ARE JUST LUXURY TIMERS TO SEE HOW LONG THINGS TAKE, THAT'S ALL, COPY PASTE IT!
dtE = DateTime.Now
tsD = dtE - dtS
Console.WriteLine("Time Taken by Wrapper: " & tsD.TotalSeconds & " Seconds.")
Console.ReadLine()
End

FANTASTIC IT'S DONE.

ALSO... Be reminded that the above VB.NET code is the synchronous version of doing a partial file download.
I also have an asynchronous version, if there is something that you don't like about this, I will be happy to send you a copy of the async class as well, even if you just want it as a backup, another way to do something or to read.

Let me know how it all goes.

Rick Strahl
August 05, 2015

@Erx - as described in my post I don't think this actually works the way you think it does. While you get the right amount of data in your code, what's on the wire does not match that. I've tried this approach and it does not work for me - I still get a close to 20k initial hit against the network when I check the low level network traffic with Wireshark.

Erx
August 06, 2015

Did you try the async version of the above? So, as data comes in, it gets posted to a function, the function keeps the data it wants and terminates the request. Using HttpWebRequest, I will do wireshark checks on the async version before I post if you want? If it doesn't work by limiting the bandwidth, then I'm in trouble as well, because there are things I need to do and limiting bandwidth is critical in my application.

Rick Strahl
August 06, 2015

@Erx - What you see in the actual HTTP response is not what's on the wire necessarily. It doesn't make a difference whether you run sync, or async etc. The buffering is an HttpWebRequest internal issue I suspect. You can only tell what's happening by looking at the low level TCP/IP packets using Wireshark and the like.

S
August 22, 2016

I would imagine this behaviour will be due to the TCP window all the way down at the network layer. Part of the TCP networking code involves having a "window" of packets that get sent by the server before the client acknowledges the packets. The server will send these packets before you read any data from the socket, but will stop if you close the connection.

Using the range requests is the correct solution, as this will stop the server from sending any data that you don't want. If the server does not support range requests, you would need to look at ways to reduce the TCP window if you want to stop the server from sending too many packets before you have a chance to close the socket. I'm not sure if this is even possible through any of the .Net HTTP client libraries.

Also note that based on the wireshark traces, you will receive data 1514 bytes at a time, of which 1460 bytes will be payload data (there are 14 bytes ethernet header + 20 bytes IP + 20 bytes TCP). This means that for larger web pages you will receive data in 1514 byte chunks, and cannot receive only 1000 bytes for example.

If you are waiting to read any part of the HTML, you will need to receive enough packets for all the HTTP headers plus any HTML content that you want to read.
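The packet arithmetic in the comment above, as a quick sketch: with a 1514-byte Ethernet frame and 54 bytes of Ethernet, IPv4 and TCP headers (assuming no IP or TCP options), each full segment carries 1460 bytes of payload, and data arrives in whole segments:

```csharp
static class TcpMath
{
    // Frame size seen in the Wireshark traces and the headers inside each frame
    // (assumes IPv4 and TCP headers without options).
    public const int FrameSize = 1514;
    public const int EthernetHeader = 14;
    public const int IpHeader = 20;
    public const int TcpHeader = 20;
    public const int PayloadPerSegment = FrameSize - EthernetHeader - IpHeader - TcpHeader; // 1460

    // Full-size segments needed to carry `bytes` of HTTP data (headers plus body), rounded up.
    public static int SegmentsFor(int bytes) =>
        (bytes + PayloadPerSegment - 1) / PayloadPerSegment;
}
```

By that math, the ~19,934 bytes seen in the traces in the post is roughly 14 full-size segments, while 1000 bytes of HTTP data fits in a single one.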

Komron Nouri
May 18, 2017

So, looking at this, we could either worry about the packet sizes, or possibly include some type of additional parameter in the request to tell the server to only send back a small sample size that is within the bounds of the packet size you are looking for.

This will not work on static pages of course, however dynamic pages could take a query string or body parameter that tells the server that you are merely testing uptime, at which point the server can send back a handshake that confirms that it is alive..

That way, you minimize the traffic on the wire. This could be done in some form of base page that all of your other pages inherit from or reference in some way that listens for this parameter and automatically writes a simple response and closes the response at the application layer.

Realistically, if you are writing a generic monitoring tool, you might not be able to enforce this, however, if you are writing one just for your own purposes, this might be an easy way to accomplish your goal.

Rick Strahl's Weblog