Encoding Issues Fixed


I think that I’ve fixed most of the encoding issues that I am aware of. I had to edit a few lines of MT code and add some new logic to my MT-Dispatcher.

Parts of the database are still “corrupted” because of the bug. I can fix most of it, but I won’t do it right away.

37 Comments

Checking

Am I the only one bothered by the unnecessary apostrophe in “Panda’s Only?”

David Fickett-Wilbar:

Am I the only one bothered by the unnecessary apostrophe in “Panda’s Only?”

Talk to the Alaska DOT.

Personally, I think it refers to “Panda’s Only Ole-Fashioned Malt Liqueur”

in my browser (latest firefox), the accented o in Torbjorn looks like a smeary diamond shaped road sign.

Am I the only one bothered by the unnecessary apostrophe in “Panda’s Only?”

that’s the way the sign was printed; it’s not an encoding issue, it’s an actual photograph of a sign, IIRC.

still seeing the same “diamond shape” on my end, for whatever that’s worth.

…copying and pasting shows it to actually be a black diamond with a question mark in the middle.

is there perhaps something that needs to be enabled within the browser to see foreign characters?

In IE, it shows up as a blank white box.

No. For some reason, on some pages, the blog software is converting some characters from UTF-8 encoding to ISO-8859-1 encoding. This is a new error.

hmm. ISO-8859-1 is the default character encoding used by firefox when the webpage itself does not define which to use.

if that helps any, it sounds like the script is not putting in a definition for which character encoding package to use when it builds the page?

no. that can’t be, since if i force the browser to use UTF-8, it still looks the same

interestingly, it appears correctly over on the “recent comments” section.

The problem is that they are ISO-8859-1 characters on a UTF-8 page. They show up correctly if in firefox you do “View:Character Encoding:Western”.
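To illustrate the mismatch described here, a minimal Python sketch follows. It is not the blog software's code, and the sample name is only an assumed example. It encodes the text as ISO-8859-1 and then decodes the bytes as UTF-8, the way a browser would on a page declared as UTF-8. The stray byte turns into U+FFFD, the black diamond with a question mark.

```python
# Hypothetical sample text; any non-ASCII ISO-8859-1 character behaves similarly.
name = "Torbj\u00f6rn"

# Bytes as they would be written if the page body were emitted in ISO-8859-1.
latin1_bytes = name.encode("iso-8859-1")

# A browser told "charset=utf-8" decodes those bytes as UTF-8.
# The byte 0xF6 is not valid UTF-8, so it is shown as U+FFFD.
seen_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Switching the view to "Western" decodes the same bytes as ISO-8859-1.
seen_as_western = latin1_bytes.decode("iso-8859-1")

print(repr(seen_as_utf8))
print(seen_as_western)
```

This also suggests why forcing the browser to UTF-8 changes nothing: the page is already being read as UTF-8, and the bytes themselves are the wrong ones.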

sure enough.

that being the case, why is the script for the page defining the character set to use to be UTF-8?

I see that the script builds the UTF definition into the header, can’t you just change that to the standard western ISO?

no, better yet, looking at the form code (I assume it needs to have UTF-8 characters defined for it?), it should work to simply strip the reference to the code definition from the portion of the page script that generates the header.

leave the header definition blank, and let the specific references take care of themselves.

IOW, where the script is set to generate this tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

simply strip out the reference to the charset entirely.

that shouldn’t fubar any references to using a specific character set in later instances, yes?

hmm, I just tried that on a local copy of this page, and it correctly defaults to the ISO standard, but then the foreign characters show up as just plain question marks.

so if the charset isn’t defined in the primary header, it does seem to break the resulting form data.

but then, I’m not actually accessing the database to generate the page, so it might still work on your end?

I’m not going to do that.

good luck to you then. you do need to somehow force the main body content NOT to use the UTF8 encoding, however.

I suspect these two issues (encoding forcing and recoding UTF as ISO) are general problems for many default blog scripts.

Some ScienceBlogs have these issues (more often the former), and I specifically remember GMBM having the current problem fixed (as he is a CS person he couldn’t leave it alone :-). For some reason it is the name input box, so it may be some industry-wide error that has been inherited from one script to the next.

In any case, the current situation is similar to what happens on quite a few blogs.

Forcing of change in view (in Firefox) often happens when I comment, I think. Dunno how it happens, and I have the habit of reading the next blog while waiting for the comment update, so it may be unrelated.

As we have reached the point of diminishing returns, being a small problem for a few individuals, I can live with the current situation.

don’t you have a backup version of the site to play with?

The current issue has to do with how pages are written from the database. Everything is being entered properly, but is being output improperly.
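A rough Python sketch of this kind of output-side problem is below. The function names `render_body` and `matches_declared_charset` are hypothetical and have nothing to do with Movable Type's actual code; the point is only that text can be stored correctly and still be written out in an encoding that does not match the one the page declares.

```python
def render_body(text: str, output_encoding: str) -> bytes:
    """Encode stored text for output (hypothetical stand-in for a page writer)."""
    return text.encode(output_encoding)


def matches_declared_charset(body: bytes, declared: str = "utf-8") -> bool:
    """Return True if the bytes decode cleanly in the charset the page declares."""
    try:
        body.decode(declared)
        return True
    except UnicodeDecodeError:
        return False


stored = "Torbj\u00f6rn"  # assumed sample: entered and stored correctly

good = render_body(stored, "utf-8")      # output matches the declared charset
bad = render_body(stored, "iso-8859-1")  # output-side mistake of the kind described

print(matches_declared_charset(good))  # True
print(matches_declared_charset(bad))   # False
```

A check like this is only useful when the declared charset is UTF-8, since ISO-8859-1 accepts any byte sequence and would never report a mismatch.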

MT has some code dealing with encoding issues and I am certain that the encoding fixes that I put together earlier today are now conflicting with one of those encoding checks.

I made another patch to the MT code and it appears to fix the final encoding issues.

Am I the only one bothered by the unnecessary apostrophe in “Panda’s Only?”

that’s the way the sign was printed; it’s not an encoding issue, it’s an actual photograph of a sign, IIRC.

I’m sure it is. I was just being crotchety, and this seemed as good a thread as any to be it on.

why is the script for the page defining the character set to use to be UTF-8?

Presumably so it can display characters from a wide range of languages, not just the European ones covered by ISO-8859-1.
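As a small illustration of that coverage difference, the Python sketch below tries to encode a few assumed sample strings in both encodings. The samples and the helper name `fits` are made up for the example.

```python
# Assumed samples: a Western European name, a Greek word, and a Chinese word.
samples = [
    "Torbj\u00f6rn",
    "\u03a0\u03b1\u03bd\u03b4\u03b1",
    "\u718a\u732b",
]


def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Map each sample to (fits in ISO-8859-1, fits in UTF-8).
results = {s: (fits(s, "iso-8859-1"), fits(s, "utf-8")) for s in samples}

for text, (latin1_ok, utf8_ok) in results.items():
    print(text, latin1_ok, utf8_ok)
```

Only the first sample fits in ISO-8859-1, while all three fit in UTF-8.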

appears fixed today.

whee!

nice job, Reed.

Yes, even the crotchety home page shows UTF-8 OK.

Thanks for all the hard work, Reed!

An internationalized web site is all the better. (Well, it needs content posters from at least 3 time zones distributed across the globe to be really editorially international, but you know what I mean.)

And now I know that MT may handle database issues somewhat poorly. (That doesn’t seem the likeliest explanation for ScienceBlogs’ varying issues, though.)

Speaking of encoding issues, how is one supposed to create a carriage return without leaving a blank line? HTML-style “br between angle brackets” doesn’t work.

The CR handling does seem to be busted; not even <code> or <pre> protects them.

Bill Gascoyne:

Speaking of encoding issues, how is one supposed to create a carriage return without leaving a blank line? HTML-style “br between angle brackets” doesn’t work.

Then try XHTML-style <br/>.

Popper’s Ghost:

The CR handling does seem to be busted; not even <code> or <pre> protects them.

<pre> is not recognized. <code> now does the same thing as XHTML’s code tag. <blockcode> is what you are looking for.

Thanks Reed, for the response and all your hard work.

OK, I’ll try
that and see if
it works.

Testing
and
learning.

So,
it
is
evidently
time
to
learn
some
XHTML!

So, it is evidently time
to learn some XHTML!

Just as long as there’s not a quiz on the stuff… :p

About this Entry

This page contains a single entry by Reed A. Cartwright published on October 9, 2007 5:40 PM.

