Strip Whitespace from Between HTML Tags


tomcatexodus
Apr 7th, 2010, 05:36 PM
I've found that whitespace between HTML tags can bloat the file considerably. After reading around, I used my newfound knowledge and came up with this one-liner you can use at the top of any page with HTML output to strip out the whitespace from between tags:

ob_start(function($buffer){ return preg_replace("/(>\s+<)/", "><", $buffer); });

I'm posting this for those who need it, and also to ask: does anyone know how to optimize this further (in accuracy, efficiency, etc.)?

SambaNeko
Apr 8th, 2010, 11:14 AM
Don't mean to be a jerk, but I'd say the best (most accurate and efficient) way to deal with excessive whitespace is to not create it in the first place. If you're writing HTML with big white gaps in it, stop.

kows
Apr 8th, 2010, 11:30 AM
I like clean, properly formatted mark-up. Whitespace can bloat a file, but I'd say that if it is then that's the developer's fault.

I'd have to generally agree with Samba.

tomcatexodus
Apr 8th, 2010, 08:27 PM
Undoubtedly, I agree, but we're not always working with our own layout templates.

In any case, would either of you have any input regarding the regex?

Perhaps someone can share some insight on speed: is running a preg_replace to strip out the whitespace more time-consuming than just sending the extra data in the first place?
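
One way to get a rough answer is to time the replacement and compare byte counts on a sample page. This is only a sketch: the table markup is made up for illustration, gzencode() assumes the zlib extension is loaded, and the numbers will vary by machine and by page.

```php
<?php
// Hypothetical page: 2000 table rows indented with newlines and spaces.
$row  = "    <tr>\n        <td>cell</td>\n        <td>cell</td>\n    </tr>\n";
$html = "<table>\n" . str_repeat($row, 2000) . "</table>\n";

// Time the same kind of replacement the one-liner performs.
$start    = microtime(true);
$stripped = preg_replace('/>\s+</', '><', $html);
$elapsed  = microtime(true) - $start;

// Compare raw and gzipped sizes, since gzip already shrinks repeated whitespace.
printf("original: %d bytes, gzipped: %d bytes\n", strlen($html), strlen(gzencode($html)));
printf("stripped: %d bytes, gzipped: %d bytes\n", strlen($stripped), strlen(gzencode($stripped)));
printf("preg_replace took %.3f ms\n", $elapsed * 1000);
```

Comparing the two gzipped figures is probably the fairer test if the server compresses its output anyway.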

Just sharing :)

kows
Apr 9th, 2010, 01:54 PM
I've made similar callback functions to strip out all possible whitespace inside of a CSS file. I didn't notice any issues doing it, and it reduced the file sizes of the stylesheets being used considerably (which was the point) -- all on top of using gzip to compress the files as well.
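
A callback along those lines might look something like the sketch below. The function name minify_css and the sample stylesheet are hypothetical, and this is a naive version: it would also collapse whitespace inside quoted strings such as content values.

```php
<?php
// Hypothetical helper for illustration only, not the code described above.
function minify_css(string $css): string
{
    // Collapse every run of spaces, tabs, and line breaks into one space.
    $css = preg_replace('/\s+/', ' ', $css);
    // Drop the whitespace around braces, semicolons, and commas.
    $css = preg_replace('/\s*([{};,])\s*/', '$1', $css);
    return trim($css);
}

// Made-up sample stylesheet.
$css = "body {\n    margin: 0;\n    padding: 0;\n}\n\nh1,\nh2 {\n    color: #333;\n}\n";
echo minify_css($css), PHP_EOL;
```

In a script that outputs a stylesheet, the same function could be passed to ob_start() the way the HTML one-liner is, with gzip handled separately by the server or by ob_gzhandler.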

SambaNeko
Apr 9th, 2010, 02:48 PM
Here's Yahoo Developer Network's Best Practices for Speeding up your pages (http://developer.yahoo.com/performance/rules.html). They recommend minifying JS and CSS, but don't explicitly mention HTML in the same category. In their source files, I don't see them practicing all of what they preach though; I can point out very few big sites that go to such rigorous lengths, in fact.

That's the thing to me: unless you're targeting a low-bandwidth audience, how much does this stuff really matter? There are optimization practices I always endeavor to follow, but some seem to take away value that a few milliseconds' gain doesn't make up for. Like if I need to troubleshoot the output HTML and I view source to find a giant text blob, that's not helpful.

In any case, you may want to add newline and tab characters to your regex, if you really want to go full-bore on this.

kows
Apr 9th, 2010, 05:49 PM
The whitespace character class he's using (\s) targets spaces, tabs, and line breaks.
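
A quick way to check this is to run the pattern from the first post over a string that mixes all of them. The sample markup here is made up for the test.

```php
<?php
// Hypothetical sample with a newline and tab, a CRLF plus spaces, and a space-tab-newline run between tags.
$html = "<ul>\n\t<li>one</li>\r\n    <li>two</li> \t\n</ul>";

// Same pattern as the one-liner in the first post.
$stripped = preg_replace("/(>\s+<)/", "><", $html);

echo $stripped, PHP_EOL;
// prints: <ul><li>one</li><li>two</li></ul>
```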