UTF-8, PHP headers, and BOMs

A BOM (byte order mark) in a PHP file just killed 5 hours of my time. Ugh.

The gist of it is, if you call PHP’s header() and there’s a BOM at the top of any file PHP loaded (in my case, a language file; the BOM showed up as <feff>), the BOM sits outside the <?php tag, so PHP treats it as page output and sends the response headers right then, before *your* header call. Your own header() then fails with a “headers already sent” warning. It’s a tough bug to catch because you almost never know if, when, or why headers were sent!
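To make the invisible bytes visible, something like the following rough sketch can help. It assumes a POSIX-ish shell with head, od and tr available; leading_hex is a made-up helper name, not anything from PHP or vim. A UTF-8 BOM is the three bytes EF BB BF, so a file that starts with one shows those bytes ahead of the 3c 3f of "<?".

```shell
# Hypothetical helper (not part of any tool): print the first N bytes
# of a file as lowercase hex, with no spaces.
leading_hex() {
  # $1 = file, $2 = number of bytes (defaults to 3, the length of a UTF-8 BOM)
  head -c "${2:-3}" -- "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

# Example: build a throwaway file that starts with a BOM, then inspect it.
sample=$(mktemp)
printf '\357\273\277<?php\n' > "$sample"   # octal escapes for EF BB BF
leading_hex "$sample" 5                    # BOM bytes first, then 3c 3f for "<?"
rm -f -- "$sample"
```

Anything other than 3c3f at the very start of a PHP file is output that PHP will send before your code runs.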

Tip 1: If you find yourself chasing a header() related bug, call headers_sent($filename, $line). It returns true once headers have gone out and fills $filename and $line with the file and line where output started, which pinpoints the offender.

Tip 2: To find and remove those invisible BOMs, here’s the grep call and the vim commands:

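  • grep -rl $'\xEF\xBB\xBF' * //lists files in this folder and subfolders that contain the BOM bytes (the $'' quoting needs bash or a similar shell)
  • vim filename //opens the file in vim
  • :set nobomb //tells vim to drop the BOM on the next write
  • :w //saves the file, now without the BOM
  • :q //quits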
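If there are a lot of files, opening each one in vim gets old fast. Here is a minimal sketch of a bulk version, assuming bash plus the usual head, od, tail and mktemp tools. The names has_bom and strip_bom are hypothetical helpers I made up for illustration. Note that the grep above matches the BOM bytes anywhere in a file, so the sketch re-checks that they really sit at the very start before cutting anything.

```shell
#!/usr/bin/env bash
# Sketch only: has_bom and strip_bom are hypothetical helper names.

# Succeeds when the file starts with the UTF-8 BOM bytes EF BB BF.
has_bom() {
  [ "$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Rewrites the file without its first three bytes, but only if they are a BOM.
strip_bom() {
  local file=$1 tmp rc
  has_bom "$file" || return 0          # no BOM at the start: leave the file alone
  tmp=$(mktemp) || return 1
  # tail -c +4 starts output at byte 4, i.e. just past the 3-byte BOM.
  tail -c +4 -- "$file" > "$tmp" && cat -- "$tmp" > "$file"
  rc=$?
  rm -f -- "$tmp"
  return "$rc"
}

# Usage (assumes filenames without newlines):
#   grep -rl $'\xEF\xBB\xBF' . | while IFS= read -r f; do strip_bom "$f"; done
```

Writing through a temp file and then copying back with cat keeps the original file's permissions. Try it on a copy first.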

Interestingly, diff can spot BOMs and actually shows them nicely too. Cute.

Posted on Friday, March 14th, 2008 at 4:02 am and filed under Internet, geekery.

2 Responses to “UTF-8, PHP headers, and BOMs”

  1. caos30

    Woooowwww… thanks, man, you saved my life and my mental health!!! ;)))

    I’ve been going crazy over this bug for a week!! In my case, I have a PHP application (a CMS) that I’ve been developing for the last two years, and last week I made a big change to the whole CMS structure. Imagine my surprise when, after hours of work, I saw a white horizontal space at the top of my pages… all the pages of the CMS. The most curious thing was that it showed in IE7 and FF3 (beta) but not in FF2. In my experience, when one browser doesn’t render some code properly, it means something is wrong (usually something small).

    Well, I soon discovered that the page source shown in the browsers had some “invisible” characters at the beginning of the document. I could detect their presence by moving the text cursor “between these invisible characters”. So they were there.

    The next question was where and how these “invisible codes” were generated in my PHP code. And at this point I was stuck until I found your page, man!!!! ufff… thanks a lot!!!!

    I want to share the solution for my case here:

    I opened each PHP file of my application (in my case with the Bluefish editor for Linux) and put the text cursor right at the beginning. Attention: it is very important to do this with the keyboard “Home” key (“tecla Inicio” in Spanish) so the text cursor really lands before the “invisible codes” that come before the “<?php” characters. Then select to the end of the line with Shift+End and press the “Delete” key. And finally, retype “<?php”.

    In my case this was the problem: there were some invisible codes right at the beginning of the PHP files. And the solution, obviously, was deleting them.

    I don’t know where this character comes from or how it gets there. I suspect that Notepad++ inserted it when I converted the original PHP files (in ISO-8859-1 Latin encoding) to UTF-8, because that was one of the jobs I did to improve my CMS (I’m from a Spanish-speaking country). But I’m not sure whether Notepad++ (for Windows) was the cause :|

    I hope my explanation helps someone else (and gets indexed by search engines ;)

    Thanks!

  2. Ian

    Heh. I’ve run into this before too. Lucky for me I use jEdit, which differentiates between BOM and non-BOM UTF, so it took a lot less time to figure out that there was a BOM included in the file.
