For some time now I have been trying to understand a rather strange phenomenon in my server logs. Sometimes, when a new user comes in, there are lots (and I mean lots) of requests for certain image files. This leads to the following stats:
| File type | Hits | Bandwidth |
| Image (.gif) | 1210734 (39.2%) | 185.99 MB (4%) |
| Image (.png) | 1167648 (37.8%) | 1.85 GB (40.9%) |
So please note: so far in November I have served about 2 GB of traffic in image files alone!
I did some research. It is a bug in IE/Win that was introduced in version 5.5 SP2 and persists in 6.0 and later; it does not exist in IE 5.0 or in 5.5 before SP2. The bug is simple and painful, and there is no known workaround (or at least I don't know of any). And it affects all of us.
It seems to be a browser cache bug. Again, I can't explain how the cache worked perfectly fine in IE 5.x and stopped working in such a strange way for IE 6.0. Anyway, I presume it is related to a bug described in [1] (IE flickers when using rollover images for links “:hover” state). What I discovered is unfortunately more complex and that workaround doesn't seem to apply.
So, let's assume that we dynamically create 50 elements having a certain “background-image”, defined through CSS. So we will have the following style:
.element { background-image: url("bg.png"); }
The following code will create 50 elements that have the class “element”:
function test() {
    for (var i = 0; i < 50; i++) {
        var el = document.createElement("div");
        el.className = "element";
        el.innerHTML = " ";
        document.body.appendChild(el);
    }
}
There's nothing strange about it; it's the DOM way to create elements and append them to the document. Visit the test page to see it happening. With any browser other than IE6, the page will display almost instantly as the image is loaded only once. Yes, even with IE5. But with IE6, you have a chance to see how each and every one of those 50 images (which are actually the same image) is loaded over and over from the server.
The effect shows only the first time you access the page; after that, you will see it again only once you clear the browser cache. This is because our server uses a special cache configuration (we will talk about it later). If you are using IE5.x and IE6 on the same machine [2], please be aware that they share the cache, so if you first tried the page with IE5.x you must clear the cache before you can see the effect with IE6.
First, clear your IE cache. ;-)
Now let's look at the same page, but this time without any <script>-s. It contains the markup that our script generates, but as “inline” HTML with those 50 DIV-s, coming directly from the server. Visit the second test with IE6. It loads instantly. Then go to “Tools/Internet Options”, click “Delete files” and make sure “Delete all offline content” is checked. Then click OK, OK and watch how IE requests “bg.png” several times from the server.
So it appears that IE also has this problem when the elements are already in the document. But how come? Our page initially has all those elements in the document and the image is requested only once; then, when you clear the cache, the image is requested multiple times (anything from 3-4 requests to 30 requests). This random nature drives me crazy.
[ If you want to check server logs, you'll have to download our files and put them on your own server; then just watch the access_log. The files you need are test.html, test2.html and bg.png. Right-click and select “Save as...”. ]
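If you don't feel like eyeballing the access_log, a few lines of script can do the counting. What follows is only a rough sketch: it assumes the log is in Apache's "combined" format, the function names (parseLogLine, countRequests) are made up for this example, and the sample log lines are invented purely to show the shape of the result.

```javascript
// Parse one line in Apache "combined" log format; returns null if it does not match.
function parseLogLine(line) {
  var m = /^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/.exec(line);
  if (!m) return null;
  return { ip: m[1], method: m[2], path: m[3], status: Number(m[4]), agent: m[5] };
}

// Count GET requests for one path, grouped by client (IP + User-Agent),
// and break each client's total down by HTTP status code.
function countRequests(lines, path) {
  var counts = {};
  lines.forEach(function (line) {
    var hit = parseLogLine(line);
    if (!hit || hit.method !== "GET" || hit.path !== path) return;
    var key = hit.ip + " | " + hit.agent;
    if (!counts[key]) counts[key] = { total: 0, byStatus: {} };
    counts[key].total += 1;
    counts[key].byStatus[hit.status] = (counts[key].byStatus[hit.status] || 0) + 1;
  });
  return counts;
}

// Invented sample lines, only to illustrate usage (not real log data).
var sampleLines = [
  '10.0.0.1 - - [12/Nov/2005:10:00:00 +0000] "GET /bg.png HTTP/1.1" 200 1532 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
  '10.0.0.1 - - [12/Nov/2005:10:00:00 +0000] "GET /bg.png HTTP/1.1" 200 1532 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
  '10.0.0.2 - - [12/Nov/2005:10:00:05 +0000] "GET /bg.png HTTP/1.1" 200 1532 "-" "Mozilla/5.0 Firefox/1.5"'
];
console.log(JSON.stringify(countRequests(sampleLines, "/bg.png"), null, 2));
```

To run it against a real log, read the file, split it on newlines and pass the resulting array instead of sampleLines. A client whose total for “/bg.png” is well above 1 on a first visit is showing the behaviour described here.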
I found a technical note at Microsoft.com about a bug that seems closely related (in fact I believe it's the same bug). A small excerpt below:
When you run Internet Explorer, the Internet Explorer cache is not used as you expect when you run innerHTML code to insert the same image multiple times. Notice the following code samples:
myDiv1.innerHTML = "<IMG SRC='image.gif'>"
myDiv2.innerHTML = "<IMG SRC='image.gif'>"
The preceding code results in sending two GET requests to retrieve the Image.gif file in Internet Explorer 6.0. Three GET requests are sent to retrieve the Image.gif file in Internet Explorer 5.5. Your expectation is that the Web server would be hit only one time.
2-3 GET requests! I can understand 2 given the broken cache, but why 3? Anyway, later in the Microsoft page you can find the following statement:
STATUS
This behavior is by design.
Thank you, dear Microsoft! You're making buggy products “by design”. You introduce bugs by design! This bug wasn't there in IE5.0 and 5.5 up to SP2! Thank you so much for including it, it was in high demand!
Let's take this as true: “this behavior is by design”. Why would Microsoft add this bug? Well, believe it or not, they have valid reasons! Internet Explorer has been dropping, fading away in browser statistics worldwide, in recent years. People got tired of popups, spyware, adware, viruses, reboots, crashes. People don't want trash anymore!
Microsoft could be worried about it. Microsoft is very much interested both in the stats showing that IE is used a lot and in IE actually being used a lot.
This new bug makes a fresh IE6 user (that is, one who hasn't visited the page before) generate many more server hits than a user of any other browser would. Microsoft is not interested in the fact that users are wasting bandwidth, nor in the fact that Web servers are wasting bandwidth; it is not their money. However, Microsoft is interested in the fact that Web servers get 10 times more hits from Internet Explorer than they get from other browsers. This definitely raises IE in server logs. All log analysing tools I'm aware of make browser statistics based on the number of hits. It is true that after the first request my server responds with a 304 (it's like telling IE6 “go away you fool, you already have that image”); however, on the first visit the browser GET-s that same image over and over and the server really sends it (probably because the time between requests is too short).
I wish this weren't true, but it seems a very plausible reason: IE needed this bug to keep showing up in the browser wars. At Dynarch.com, Internet Explorer usage has dropped from around 75% to around 61% in the last month! Cool! I'm glad to see that. However, it can't drop much more because each new IE visitor counts as 20 visitors with other browsers, so my stats will never reflect reality.
I am planning to work on a simple browser statistic tool based on a really simple idea: don't count hits; count browsers. Probably this is what needs to be done.
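To make the idea concrete, here is a minimal sketch of what “count browsers, not hits” could look like. It is not the tool itself, just an illustration under a few assumptions: the log is in Apache's "combined" format (User-Agent is the last quoted field), a “browser” is approximated as a distinct IP + User-Agent pair (which is wrong for proxies and shared connections), and browserFamily is a deliberately crude, made-up classifier.

```javascript
// Crude User-Agent classifier, for illustration only.
// Opera is checked first because it can include "MSIE" in its User-Agent.
function browserFamily(agent) {
  if (/Opera/.test(agent)) return "Opera";
  if (/MSIE/.test(agent)) return "IE";
  if (/Firefox/.test(agent)) return "Firefox";
  return "other";
}

// For each browser family, count both raw hits and distinct clients.
// A client is approximated as one IP + User-Agent pair (an assumption).
function browserStats(lines) {
  var stats = {};
  var seen = {};
  lines.forEach(function (line) {
    // First field is the client IP, last quoted field is the User-Agent.
    var m = /^(\S+) .* "([^"]*)"$/.exec(line.trim());
    if (!m) return;
    var family = browserFamily(m[2]);
    if (!stats[family]) stats[family] = { hits: 0, browsers: 0 };
    stats[family].hits += 1;
    var client = m[1] + " | " + m[2];
    if (!seen[client]) {
      seen[client] = true;
      stats[family].browsers += 1;
    }
  });
  return stats;
}
```

With a log where one IE6 visitor requests the same image many times, the “hits” column inflates IE's share while the “browsers” column still counts that visitor once, which is exactly the difference I care about.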
Something that really helps at Dynarch.com is to return an expiration time in the HTTP header when serving images. This is easily accomplished in Apache with “mod_expires”, using something like the following configuration (thanks altblue!):
LoadModule expires_module libexec/mod_expires.so
AddModule mod_expires.c
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
</IfModule>
In this example I configure the expiration date for all images to be the access time plus one month. Then, when IE asks for an image that it already has, the server replies with a 304 (“not modified”) instead of returning the full image again. Note that a server hit is still involved (therefore, given IE's broken cache, multiple hits are still involved) but the request round-trip is much shorter.
This explains why on subsequent requests the effect is not visible anymore; however for the first request, given a clean cache, the browser will ask for the image multiple times, as I proved.
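For readers who want to see the mechanics, here is a small sketch of the 200-versus-304 decision. It is not Apache's code, only an illustration with made-up names (serveImage, and an in-memory image object standing in for a file). One assumption worth stating: as far as I understand HTTP, the Expires header only tells the browser how long it may reuse its copy, while the 304 itself is the answer to a conditional request carrying If-Modified-Since. “One month” is approximated here as 30 days.

```javascript
var ONE_MONTH_MS = 30 * 24 * 60 * 60 * 1000; // "1 month" approximated as 30 days

// requestHeaders: plain object with lowercase header names
// image: { lastModified: Date, body: string } (hypothetical stand-in for a file)
// now: Date at which the request arrives
function serveImage(requestHeaders, image, now) {
  var headers = {
    "Expires": new Date(now.getTime() + ONE_MONTH_MS).toUTCString(),
    "Last-Modified": image.lastModified.toUTCString()
  };
  // Date.parse returns NaN when the header is missing or unparsable.
  var since = Date.parse(requestHeaders["if-modified-since"] || "");
  // HTTP dates have one-second resolution, so compare whole seconds.
  var modified = Math.floor(image.lastModified.getTime() / 1000) * 1000;
  if (!isNaN(since) && modified <= since) {
    // The browser's copy is still current: headers only, no image data.
    return { status: 304, headers: headers, body: "" };
  }
  // No usable validator, or the image changed: send the full image.
  return { status: 200, headers: headers, body: image.body };
}
```

A request with an empty cache carries no If-Modified-Since header and always gets the full 200 response, which matches what the logs show: the repeated first-visit requests each receive the whole image, and only later requests get the cheap 304.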
Is this a good enough reason to ban Internet Explorer? Probably not; but then, things will take years to change. :-( It's analogous to election fraud: IE's popularity is lower than we think! This is both good and bad news. The bad news is that Web developers are still required to spend huge amounts of time making things work with Internet Explorer.