Softpanorama


HTML Pretty Printing and Beautifying


Since most documents in the world are being converted to HTML, and in many cases are transformed from a different format by some kind of automated tool, an HTML beautifier is immensely important.

FrontPage provides a good HTML pretty printer. The same pretty printer is also available in its free version, SharePoint Designer 2007.

The other well-established package of known (pretty high) quality is tidy. It is written in C and as such is difficult to maintain, but it does its job well.

tidy is available as a package and can be installed in Cygwin, so it can be used in Windows too. Cygwin for some reason wants to install Gnome with it :-)

 



[Mar 04, 2020] A command-line HTML pretty-printer Making messy HTML readable - Stack Overflow

Jan 01, 2019 | stackoverflow.com



knorv ,


jonjbar ,

Have a look at the HTML Tidy Project: http://www.html-tidy.org/

The granddaddy of HTML tools, with support for modern standards.

There used to be a fork called tidy-html5 which since became the official thing. Here is its GitHub repository .

Tidy is a console application for Mac OS X, Linux, Windows, UNIX, and more. It corrects and cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to modern standards.

For your needs, here is the command line to call Tidy:

tidy inputfile.html
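
A minimal sketch of driving Tidy from a script rather than by hand, assuming the `tidy` binary is on the PATH. The helper names `tidy_command` and `run_tidy` are made up for illustration; `-indent`, `-quiet`, `-wrap` and `-o` are standard Tidy command-line options.

```python
import shutil
import subprocess


def tidy_command(infile, outfile, wrap=120):
    """Build a Tidy command line that indents the markup and writes it to outfile."""
    return ["tidy", "-indent", "-quiet", "-wrap", str(wrap), "-o", outfile, infile]


def run_tidy(infile, outfile, wrap=120):
    """Run Tidy if it is installed; return its exit code, or None if it is missing."""
    if shutil.which("tidy") is None:
        return None
    # Tidy exits with 0 on success, 1 if it issued warnings, 2 if it found errors.
    result = subprocess.run(tidy_command(infile, outfile, wrap))
    return result.returncode
```

Writing to a separate output file keeps the original untouched, which matters because Tidy repairs markup as well as re-indenting it.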

Paul Brit ,

Update 2018: homebrew/dupes is now deprecated; tidy-html5 can be installed directly:
brew install tidy-html5

Original reply:

The Tidy that ships with OS X doesn't support HTML5, but there is an experimental branch on GitHub which does.

To get it:

 brew tap homebrew/dupes
 brew install tidy --HEAD
 brew untap homebrew/dupes

That's it! Have fun!

Boris , 2019-11-16 01:27:35

Error: No available formula with the name "tidy" . brew install tidy-html5 works. – Pysis Apr 4 '17 at 13:34

Perl: HTML::PrettyPrinter - Handling self-closing tags

Stack Overflow


I am a newcomer to Perl (Strawberry Perl v5.12.3 on Windows 7), trying to write a script to aid me with a repetitive HTML formatting task. The files will need to be hand-edited in future and I want them to be human-friendly, so after processing with the HTML packages (HTML::TreeBuilder etc.), I write the result to a file using HTML::PrettyPrinter. All of this works well, and the output from PrettyPrinter is very nice and human-readable. However, PrettyPrinter does not handle self-closing tags well; basically, it seems to treat the slash as an HTML attribute. With input like:
<img />

PrettyPrinter returns:
<img /="/" >

Is there anything I can do to avoid this other than preprocessing with a regex to remove the slash?

Not sure it will be helpful, but here is my setup for the pretty printing:
use HTML::PrettyPrinter;
use FileHandle;

my $hpp = HTML::PrettyPrinter->new('linelength' => 120, 'quote_attr' => 1);
$hpp->allow_forced_nl(1);

my $output = FileHandle->new(">output.html");
if (defined $output) {
    $hpp->select($output);                        # print directly to the file handle
    my $linearray_ref = $hpp->format($internal);  # $internal: the parsed HTML tree
    undef $output;                                # closes the file
    $hpp->select(undef);
}
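
The regex preprocessing the question mentions can be sketched as follows. This is a hedged Python illustration rather than the asker's Perl, and the function name `strip_self_closing_slashes` is invented for the example. It rewrites XHTML-style `<br />` as `<br>` before the markup reaches a parser that would otherwise read the slash as an attribute.

```python
import re

# Optional whitespace plus the slash that closes a tag such as <br /> or <img src="a.png"/>.
_SELF_CLOSING = re.compile(r"\s*/>")


def strip_self_closing_slashes(html):
    """Turn XHTML-style <br /> into <br> so the slash is not parsed as an attribute."""
    return _SELF_CLOSING.sub(">", html)
```

The pattern is deliberately naive: it would also rewrite a literal `/>` inside a script, a comment, or an unquoted attribute value, so it is only suitable for markup where that cannot occur.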

Tags: perl, html-formatting

Asked Jan 18 '12 at 15:18 by SenatorForLife

Accepted answer:

You can print formatted, human-readable HTML with the TreeBuilder as_HTML method:
my $h = HTML::TreeBuilder->new_from_content($html);
print $h->as_HTML('',"\t");

But if you still prefer the buggy PrettyPrinter, try removing the problem tags; no idea why someone would need ...
my $h = HTML::TreeBuilder->new_from_content($html);
# delete every img element that has no src attribute
while(my $n = $h->look_down(_tag=>'img','src'=>undef)) { $n->delete }

Update:

Well... then we can fix PrettyPrinter itself. It's a pure-Perl module, so let's see... I have no idea where Perl modules live on Windows; for me it's /usr/local/share/perl/5.10.1/HTML/PrettyPrinter.pm

Maybe not an elegant solution, but it will work, I hope. This sub parses attribute/value pairs; with a small fix it adds a single '/' at the end.

Around line 756 in PrettyPrinter.pm. I've marked the lines that I added with ###<<<<<< at the end:
#
# format the attributes
#
sub _attributes {
    my ($self, $e) = @_;
    my @result = (); # list of ATTR="value" strings to return

    my $self_closing = 0; ###<<<<<<
    my @attrs = $e->all_external_attr(); # list (name0, val0, name1, val1, ...)

    while (@attrs) {
        my ($a,$v) = (shift @attrs,shift @attrs); # get current name, value pair
        if($a eq '/') { ###<<<<<<
            $self_closing=1; ###<<<<<<
            next; ###<<<<<<
        } ###<<<<<<

        # string for output: 1. attribute name
        my $s = $self->uppercase? "\U$a" : $a;

        # value part, skip for boolean attributes if desired
        unless ($a eq lc($v) &&
                $self->min_bool_attr &&
                exists($HTML::Tagset::boolean_attr{$e->tag}) &&
                (ref($HTML::Tagset::boolean_attr{$e->tag})
                 ? $HTML::Tagset::boolean_attr{$e->tag}{$a}
                 : $HTML::Tagset::boolean_attr{$e->tag} eq $a)) {
            my $q = '';
            # quote value?
            if ($self->quote_attr || $v =~ tr/a-zA-Z0-9.-//c) {
                # use single quote if value contains double quotes but no single quotes
                $q = ($v =~ tr/"// && $v !~ tr/'//) ? "'" : '"'; # catch emacs ");
            }
            # add value part
            $s .= '='.$q.(encode_entities($v,$q.$self->entities)).$q;
        }
        # add string to resulting list
        push @result, $s;
    }

    push @result,'/' if $self_closing; ###<<<<<<
    return @result; # return list ('attr="val"','attr="val"',...);
}
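
The idea behind the patch can be shown in isolation. The following Python sketch is not the module's code: `format_attributes` is a hypothetical name, and it leaves out the uppercase and boolean-attribute handling. It only illustrates the two steps that matter here: skip the pseudo-attribute named `/` while formatting, then append a bare `/` after the real attributes.

```python
from html import escape


def format_attributes(pairs):
    """Format (name, value) pairs as name="value" strings.

    A pseudo-attribute named "/" is not emitted as /="/"; it is remembered,
    and a bare "/" is appended after the real attributes instead.
    """
    result = []
    self_closing = False
    for name, value in pairs:
        if name == "/":
            self_closing = True
            continue
        # Use single quotes only when the value contains " but no '.
        q = "'" if '"' in value and "'" not in value else '"'
        text = escape(value, quote=False)  # encodes &, < and >
        if q == '"':
            text = text.replace('"', "&quot;")
        result.append(f"{name}={q}{text}{q}")
    if self_closing:
        result.append("/")
    return result
```

For example, `format_attributes([("src", "a.png"), ("/", "/")])` yields the two strings `src="a.png"` and `/`, which a caller can join with spaces to get `<img src="a.png" />`.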

Thanks for the answer. However, neither of these is a solution. Even with the tab indentation option you suggested, print $h->as_HTML still runs things together in odd ways that a human never would (for example, all of the h2's are run together on the same line with the preceding p tag). Hence the use of PrettyPrinter. I think you misunderstood my minimal example with regard to PrettyPrinter. There is nothing wrong with my img tags: PrettyPrinter prints all self-closing tags as standard tags with a / property set to "/", e.g. <br /> becomes <br /="/"> SenatorForLife Jan 19 '12 at 15:05

I updated the post with how to fix the module. Hope this will help. Dimanoid Jan 19 '12 at 19:20

This seems to work very nicely, thank you! I'm definitely too much of a Perl newbie to have figured out the hack on my own. For other Strawberry Perl users, here's where HTML::PrettyPrinter showed up on my machine after being installed with cpanm: C:\strawberry\perl\site\lib\HTML\PrettyPrinter.pm SenatorForLife Jan 19 '12 at 20:20

Thanks. Your fix worked for me. AJ Dhaliwal Jun 19 '12 at 7:05

[Jan 18, 2012] Prettify - Beautify HTML in Perl

Stack Overflow

I have some HTML output sitting in a variable that I would like to prettify / beautify, but I am struggling to make sense of the results of my web searches.

The example in the HTML::Tidy documentation shows it uses the source parameter as a variable (which is what was requested). Thomas Dickey Mar 28 '15 at 10:55

HTML::Tidy does exactly that:

#!/usr/bin/perl
use strict;
use warnings;

use HTML::Tidy;

my $str = '<div><p>Text<h2>Heading</h2>';
my $tidy = 'HTML::Tidy'->new;
print $tidy->clean($str);

output

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="generator" content="tidyp for Linux (v1.04), see www.w3.org">
<title></title>
</head>
<body>
<div>
<p>Text</p>
<h2>Heading</h2>
</div>
</body>
</html>

Given that the HTML is not under your control, you might want to also consider HTML::PrettyPrinter if HTML::TreeBuilder generates the correct syntax tree for your HTML.

[Dec 12, 2005] Programming - HTML and CSS Code Beautifier The Tabifier - Arantius.com

I've been working on this tool for a while. It's not quite done yet, but it's definitely at a useful stage. I call it the tabifier, but in truth it's a code beautifier.

The overarching design goal for this tool was to beautify HTML code without breaking it. This is of course not totally possible, but I've strived to get as close as possible. There are great HTML beautifiers out there like HTML Tidy but they generally do too much.

When I take an ugly HTML page that someone else has written and I don't know, I want to pass it through a beautifier to make it easier to work with. Things like tidy, though, will drastically alter the code, often making it more work to turn the result of the beautification back into something that displays like the original than it would be to just plain fix it.
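
The design goal described above, re-indenting without rewriting the markup, can be illustrated with a small Python sketch. This is not the Tabifier's code; `reindent` and the `VOID` set are names chosen for the example. It assumes well-nested input and makes no attempt to protect whitespace-sensitive content such as `<pre>` or inline scripts.

```python
import re

# Elements that never have a closing tag, so they must not increase the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

_TOKEN = re.compile(r"(<[^>]+>)")
_TAG_NAME = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)")


def reindent(html, indent="\t"):
    """Put every tag and text run on its own line, indented by nesting depth.

    Tokens are copied as they are, apart from surrounding whitespace;
    nothing is added, removed, or repaired.
    """
    depth = 0
    lines = []
    for token in _TOKEN.split(html):
        token = token.strip()
        if not token:
            continue
        match = _TAG_NAME.match(token)  # None for text, comments and doctypes
        name = match.group(1).lower() if match else ""
        closing = match is not None and token.startswith("</")
        if closing:
            depth = max(depth - 1, 0)
        lines.append(indent * depth + token)
        opens = (match is not None and not closing
                 and name not in VOID and not token.endswith("/>"))
        if opens:
            depth += 1
    return "\n".join(lines)
```

Unlike Tidy, a tool built this way never inserts missing end tags, so unclosed elements simply make the indentation drift deeper instead of being fixed.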

htmlpp A Simple HTML Pretty Printer by Len Budney.

htmlpp is a simple HTML pretty printer, based on nsgmls and SGMLS.pm. The code is pretty alpha, but gives attractive results for many HTML docs. Some things, like nested tables, are rendered only passably. Other deeply-nested structures may render badly as well.

Note that this pretty-printer is oldish, and alpha, and unlikely to be developed any further. It's not a bad illustration of some of the possibilities for SGML technology in web authoring. Perhaps someone will take up the challenge, and build the "right" tool!

Since htmlpp gets its input from nsgmls, invalid documents should not be expected to work. However, a side effect of this approach is that minor errors and inconsistencies are actually fixed. Attribute values are always quoted in the pretty-printed version. Characters like "<", ">" and "&" are converted into the appropriate SGML entities in attribute values and in document text. End tags are inserted automatically, which will surprise you if you thought it was legal to embed <pre> elements inside <p> elements, for example.
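
The quoting and entity conversion described here can be sketched in a few lines of Python. This illustrates the normalization only, not htmlpp itself, which is built on nsgmls and SGMLS.pm; the function names are invented for the example.

```python
from html import escape


def quote_attribute(name, value):
    """Return name="value" with &, <, > and quote characters replaced by entities."""
    return f'{name}="{escape(value, quote=True)}"'


def escape_text(text):
    """Replace &, < and > in document text with &amp;, &lt; and &gt;."""
    return escape(text, quote=False)
```

Always quoting and escaping this way means the output stays parseable even when the input relied on unquoted attribute values or bare ampersands.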

HTML::PrettyPrinter - generate nice HTML files from HTML syntax trees

[June 7, 2002] A prettyprinter for HTML documents

From the book The Web Architect's Handbook; makes heavy use of modules:

use LWP::Simple;
use HTML::Parse;
use HTML::Entities;
use Text::Wrap;
use Getopt::Long;

[July 14, 2001] Clean up your Web pages with HTML TIDY

A free utility to fix mistakes made while editing HTML and to automatically tidy up sloppy editing into nicely laid out markup.

It also works great on the atrociously hard to read markup generated by specialized HTML editors and conversion tools, and can help you identify where you need to pay further attention on making your pages more accessible to people with disabilities.

[July 14, 1999] hindent

HTML indentation (pretty printing) utility. Mar 28th 1999, 19:16. stable: 1.0.1 - devel: none. license: GPL
Download: http://www.domtools.com/pub/hindent1.1.0.tar.gz
Homepage: http://www.domtools.com/unix/hindent.shtml
Changelog: http://www.domtools.com/pub/hindent1.1.0-changes.txt

FHTML.PL by John Watson

(Perl) Formats and indents HTML code and writes a new file with the results.

ZDNet Software Library - Pretty HTML

Pretty HTML is an easy-to-use program that formats your HTML Web pages. After processing, your HTML code is neatly arranged, commented, spaced, and indented, making it much easier to read and maintain. You can also use Pretty HTML to compress your Web pages by eliminating unnecessary spaces and carriage returns. Process your Web pages one at a time or batch-format entire folders in a single operation. Pretty HTML offers a number of options to ensure that the HTML formatting is done to your liking. To play it extra safe, you can have the program make backup copies of your originals. Excellent online help is included.


The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D


Copyright 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only, in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...

You can use PayPal to make a contribution to support development of this site and speed up access. If softpanorama.org is down, you can use softpanorama.info

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. The site is perfectly usable without JavaScript.

Last modified: March 12, 2019