Intent to Deprecate and Remove: Document.prototype.defaultCharset


Philip Jägenstedt

Dec 8, 2015, 10:14:23 AM
to blink-dev, Henri Sivonen, Anne van Kesteren

Primary eng (and PM) emails

[email protected]


Summary

Remove the readonly attribute defaultCharset from Document, which returns the fallback encoding used in some cases when the encoding could not be determined by any other means.


Motivation

This is non-standard, and my suggestion to standardize it was not met with enthusiasm. To quote Henri Sivonen of Mozilla:


I don't see legitimate use case for this. As a Web author, you should always use UTF-8 and declare that you are using UTF-8. At that point, what's the use case for knowing what the default would have been?
If someone tries to use it for something, what non-harmful uses it could have? I can think of a harmful use case: trying to guess what encoding to use for downloadable text files. (Correct way: Use UTF-8 BOM for downloadable text/plain and use Python, etc. -specific encoding declarations for downloadable scripts with #!.)
So I see this as useless and harmful if used, so I think we shouldn't spec this.

I agree with this characterization, and think that the only reason to keep it would be compatibility and/or interoperability.

Interoperability and Compatibility Risk

Standardizing would probably be a faster way to at least surface-level interop, but there is no interest in adding this in Gecko, for good reasons. Removing it will lead to decreased interoperability on the WebKit-dominated mobile web in the short term.


The use counter (see below) does show that this attribute is hit fairly often, 0.5% of page views. This is much higher than what we would usually consider acceptable, so I have searched the httparchive data from 20150101 for occurrences of "defaultCharset" and categorized the results.


About 48k of the 1 million resources (not pages) contained a match, which is ~5%.

Looking for patterns and searching with regexes I broke it down as follows:
  • 19830 AddThis, unreachable: document.charset || document.characterSet || document.inputEncoding || document.defaultCharset
  • 12632 beacon.js, unreachable: document.characterSet || document.defaultCharset || ""
  • 5455 ebOneTag.js, where document.defaultCharset is used together with navigator.userAgent, navigator.userLanguage (not supported) and many other inputs, seemingly for fingerprinting purposes
  • 2991 unreachable of the form *.characterSet || *.defaultCharset
  • 1652 like above served from voicefive.com specifically
  • 1542 of a script called createCookie, seemingly for fingerprinting
  • 1484 WebTrends, which falls back to characterSet
  • 432 eluminate.js, which also falls back to characterSet
  • 91 AdPlayer, unreachable: document.characterSet || document.defaultCharset
  • 88 Ecommerce, unreachable: d.characterSet ? d.characterSet : d.defaultCharset ? d.defaultCharset : ""
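
To illustrate why the chains marked "unreachable" never consult defaultCharset, here is a minimal sketch that uses a mock object in place of a real Document. The names fakeDocument, afterRemoval and reads are hypothetical, and the sketch assumes that characterSet always holds a non-empty encoding name, so the || chain short-circuits before reaching the fallback.

```javascript
// Mock standing in for a real Document; `fakeDocument` and `reads` are
// illustrative names, not part of any browser API.
let reads = 0;
const fakeDocument = {
  // Assumption: characterSet always holds a non-empty encoding name.
  characterSet: "UTF-8",
  get defaultCharset() {
    reads += 1; // count how often the fallback is actually consulted
    return "windows-1252";
  },
};

// Same shape as the beacon.js pattern in the list above.
const withAttribute =
  fakeDocument.characterSet || fakeDocument.defaultCharset || "";

// Simulate removal: the attribute no longer exists on the object.
const afterRemoval = { characterSet: "UTF-8" };
const withoutAttribute =
  afterRemoval.characterSet || afterRemoval.defaultCharset || "";

console.log(withAttribute, withoutAttribute, reads);
```

Under these assumptions the getter is never invoked, and the expression evaluates to the same value whether or not the attribute exists.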

Excluding the unreachable, this adds up to 5455 + 1542 + 1484 + 432 = 8913, making it very likely that cases like these make up the majority of the uses.

There were 1911 resources left uncategorized, from 1686 unique pages. I sorted them randomly and looked at them in turn. After going through 30 I had only found minimized duplicates of the above and other cases where removing defaultCharset would not be observable.
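
As a sanity check on the numbers, the tally can be reproduced with a short script. This is only a sketch: the counts are copied from the breakdown above, while the property names are labels made up for illustration.

```javascript
// Resource counts copied from the breakdown above; keys are illustrative labels.
const counts = {
  addThis: 19830,
  beacon: 12632,
  ebOneTag: 5455,
  genericFallback: 2991,
  voicefive: 1652,
  createCookie: 1542,
  webTrends: 1484,
  eluminate: 432,
  adPlayer: 91,
  ecommerce: 88,
};
const uncategorized = 1911;

// Sum of all categorized resources.
const categorized = Object.values(counts).reduce((a, b) => a + b, 0);

// Everything that matched "defaultCharset", categorized or not.
const total = categorized + uncategorized;

// Categories where defaultCharset can actually be evaluated.
const reachable =
  counts.ebOneTag + counts.createCookie + counts.webTrends + counts.eluminate;

// Share of the 1 million resources searched.
const share = total / 1000000;

console.log({ categorized, total, reachable, share });
```

The total comes to 48108, which matches the "about 48k" figure, and the reachable subtotal is 8913.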

I did not find a single worrying case, which makes me think that removing this will be smooth sailing.

Alternative implementation suggestion for web developers

Assume no fallback encoding and instead declare the encoding of all resources.
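
For scripts that need to report the encoding of the current document, one possible approach is to read the encoding the document actually ended up with and never ask for the fallback. The sketch below is a hypothetical helper, not code from any of the libraries discussed here; documentEncoding is an invented name, and charset and inputEncoding are treated as legacy aliases that may be missing in some environments.

```javascript
// Hypothetical helper: report a document's encoding without defaultCharset.
function documentEncoding(doc) {
  // Prefer characterSet; fall back to the older aliases, then to "".
  return doc.characterSet || doc.charset || doc.inputEncoding || "";
}

// Plain objects stand in for Document so the sketch runs outside a browser.
console.log(documentEncoding({ characterSet: "UTF-8" }));
console.log(documentEncoding({ charset: "windows-1252" }));
```

In a browser the helper would be called as documentEncoding(document).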


Usage information from UseCounter

https://www.chromestatus.com/metrics/feature/timeline/popularity/428


Usage is around 0.5%. See above for why removal is still likely to be safe.


OWP launch tracking bug

https://crbug.com/567738


Entry on the feature dashboard

https://www.chromestatus.com/features/6217124578066432


The entry assumes that deprecation will happen in M49; I will update it if not.


Requesting approval to remove too?

Yes, I'd like to include removal in this intent too. However, I'd like to do the removal after the next branch point, so that there's one release cycle with a deprecation message.


P.S. It's not obvious that we should deprecate at all when the vast majority of code hitting the use counter is harmless, but I think we should give priority to those developers who use the console to find problems, who might otherwise ask why they got no warning.

Rick Byers

Dec 9, 2015, 7:13:42 PM
to Philip Jägenstedt, blink-dev, Henri Sivonen, Anne van Kesteren
Does IE or Edge implement this API?

Either way LGTM1 to give this a shot. Thanks for the thorough httparchive analysis, that makes it seem pretty unlikely that any of the uses are really going to care if this becomes undefined.

Rick

Philip Jägenstedt

Dec 10, 2015, 2:38:32 AM
to Rick Byers, blink-dev, Henri Sivonen, Anne van Kesteren
Yes, I have confirmed that IE11 and Edge have document.defaultCharset ("windows-1252"). I haven't checked when it was added, but I suspect that this originated in IE. It was added to WebKit in 2005.

I see that it was even in the spec for a short while between June and December 2011. (Note that charset was later added back.)

Chris Harrelson

Dec 10, 2015, 11:31:42 AM
to Philip Jägenstedt, Rick Byers, blink-dev, Henri Sivonen, Anne van Kesteren
LGTM2


TAMURA, Kent

Dec 14, 2015, 10:21:38 PM
to Chris Harrelson, Philip Jägenstedt, Rick Byers, blink-dev, Henri Sivonen, Anne van Kesteren
LGTM3.  I trust your risk evaluation.
--
TAMURA Kent
Software Engineer, Google

