Softpanorama


Perl Prettyprinting


Prettyprinting is a weak spot of Perl because its syntax is so complex. There are several approaches to the problem:

***** DzSoft Perl Editor. A very decent shareware editor with elements of an integrated development environment. It contains a built-in beautifier. By Sergey Dzyubenko, Alexander Dzyubenko, DzSoft Ltd.

pb -- far from perfect, but better than nothing

#!/usr/bin/perl

# 'pb' Perl Beautifier

# Written by P. Lutus Ashland, Oregon lutusp@arachnoid.com 5/20/96

# This script processes Perl scripts, cleans up and indents, like cb does for C

# Be careful with this script - it accepts wildcards and processes every text file
# that meets the wildcard criteria. This could be a catastrophe in the hands of the unwary.

use strict;
use warnings;

my $tabstring = "  "; # You may place any tab-stop characters you want here

if (!@ARGV) {
  print "usage: pb file1 file2 etc. or wildcards (replaces originals in place)\n";
}
else {
  foreach my $filename (@ARGV) {
    if (-T $filename) { # process text files only
      process($filename);
    }
  }
}

sub process {
  my ($fn) = @_;
  print STDERR $fn;

  # grab the entire file at once; local restores the default $/ afterwards
  my @infa = do {
    local $/;
    open(my $in, '<', $fn) or die "\nCannot read $fn: $!\n";
    split /\n/, <$in>;
  };

  open(my $out, '>', $fn) or die "\nCannot write $fn: $!\n";
  my $tabtotal = 0;
  for (@infa) {

    s/^\s*(.*?)\s*$/$1/; # strip leading and trailing spaces

    my $line = $_; # copy original string
    my $q = $line; # modify this copy for testing
    $q =~ s/\\\#//g; # remove escaped comment tokens
    $q =~ s/\#.*?$//g; # remove Perl-style comments
    $q =~ s{/\*.*?\*/}{}g; # remove C-style comments
    $q =~ s/\\[{}()]//g; # remove escaped braces and parentheses
    $q =~ s/'.*?'//g; # remove single-quoted strings

    # now the remaining braces/parentheses should be structural

    my $delta = ($q =~ tr/{(//) - ($q =~ tr/})//); # openers minus closers

    $tabtotal += $delta if $delta < 0; # closers outdent the current line

    my $i = ($tabtotal > 0) ? $tabtotal : 0; # create tab index

    $tabtotal += $delta if $delta > 0; # openers indent the following lines

    if (substr($line, 0, 1) ne "#") { # don't tab comments
      print $out $tabstring x $i; # "tab" out to position
    }

    print $out "$line\n"; # print original line
  } # -- for (@infa)

  close $out or die "\nCannot write $fn: $!\n";

  if ($tabtotal != 0) {
    print STDERR " Indentation error: $tabtotal\n";
  }
  else {
    print STDERR "\n";
  }
} # sub process
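pb's whole indentation strategy is a running count of unmatched braces and parentheses. The sketch below isolates that rule in a hypothetical helper, `reindent`, which is not part of pb. It assumes the input contains no comments, quoted strings, or escaped delimiters (pb strips those before counting) and skips pb's special case for comment lines. It keeps pb's key ordering: a line's closers outdent that same line, while its openers only affect the lines that follow.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pb): re-indent a list of lines using pb's
# net-count rule. Assumes no comments, strings, or escaped braces/parens.
sub reindent {
    my ($tab, @lines) = @_;
    my $depth = 0;
    my @out;
    for my $line (@lines) {
        (my $text = $line) =~ s/^\s+|\s+$//g;   # trim, as pb does
        my $delta = ($text =~ tr/{(//) - ($text =~ tr/})//);
        $depth += $delta if $delta < 0;         # closers outdent this line
        push @out, ($tab x ($depth > 0 ? $depth : 0)) . $text;
        $depth += $delta if $delta > 0;         # openers indent later lines
    }
    return @out;
}
```

Calling `reindent('  ', 'sub f {', 'if ($x) {', 'print 1;', '}', '}')` returns the lines indented by zero, one, two, one, and zero levels. Because only the net count matters, a line such as `} else {` (net zero) stays at the inner level, which is one reason pb is "far from perfect".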

A Beautifier for Perl

But I figured it could not be as difficult as this makes it sound. For one thing, Perl knows how to parse Perl programs, and the Perl source code is freely available, so Perl could conceivably be reworked into a Perl beautifier.

Alternatively, the GNU source code for the popular C language beautifier "indent" is also available, so another approach would be to rework its 7k lines of C to handle the Perl language.

But I knew there was an easier approach, one that would not require reworking anybody else's code or using a language other than Perl. I knew this because my quest for beauty had already led me to write a rudimentary C++ beautifier as a three-command (sed | indent | sed) shell script (UNIX/World, August 1991, p. 134), and later a more robust C++ beautifier in 140 lines of C and shell code (Dr. Dobb's Journal, Dec. 1992, pp. 23s-27s).

These beautifiers certainly don't qualify as "standalone parsers" for C++, because they don't classify the program elements into meaningful units. But that doesn't prevent them from doing for C++ everything that indent and cb do for C! The trick is realizing that programs written in Language B can be successfully processed by beautifiers for Language A, if Language B bears a syntactic similarity to Language A, and if the Language B program can be temporarily disguised as Language A.

So, with this C++ beautification experience under my belt and a stubborn determination to prove that "Perl Beautification" could be accomplished if sufficient Hubris, Impatience, and Laziness could be mustered, in April 1998 I began writing pbeaut, the first fully functional "Perl Beautifier", in Perl [1].

Beautification Strategy

As with its C++ predecessors, I approached the problem of writing pbeaut by capitalizing on the mature beautification utilities available for C, a language with some fortunate syntactic similarities to Perl, and by milking the UNIX filter model for all its worth.

The basic approach, borrowed from my C++ beautifiers, is to use a pre-processor to disguise Perl code as C code, effect the beautification using standard C tools, and then convert the disguised Perl back to its original form using a post-processor.

The basic model is therefore:

Here is a listing of the first pbeaut program:

The encoder, pencode, examines every character of the Perl program, from first to last, and rewrites certain character sequences as necessary to disguise the Perl code as C.

The C beautifier, in this case GNU indent, inserts tabs to properly represent nesting levels, aligns parentheses and braces, inserts newline characters to split long lines into shorter ones, and generally fools around with the layout of the code to make it look more orderly and to emphasize the program's structure.

The decoder, pdecode, undoes the disguises crafted by pencode to reveal the hidden Perl program elements in their newly beautified context.

The current "production" version of pbeaut (version .62; 412 lines of code) communicates various types of information to and from the encoder and decoder and does extensive error checking, but its basic function is the same as the simple version shown above.
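The encode/beautify/decode approach described above can be sketched in a few lines of Perl. The article does not show how pencode actually disguises each construct, so everything here is an assumption for illustration: the helper names `encode_punct_vars`, `decode_punct_vars`, and `beautify_disguised`, and the `PBV_xx_` placeholder scheme, are hypothetical; only a handful of punctuation variables are handled; and the placeholder is assumed never to occur in real code. The beautifier stage is passed in as a code reference, so a test can use a stand-in that mimics indent's habit of putting a space after `$` instead of running GNU indent.

```perl
use strict;
use warnings;

# Hypothetical encoder: hide a few punctuation variables ($-, $@, $!, ...)
# inside C-legal identifiers, e.g. "$-" becomes "PBV_2d_" (2d = hex of '-').
sub encode_punct_vars {
    my ($src) = @_;
    $src =~ s/\$([-\@!,;.\/\\])/sprintf('PBV_%02x_', ord $1)/ge;
    return $src;
}

# Hypothetical decoder: turn every placeholder back into the original variable.
sub decode_punct_vars {
    my ($src) = @_;
    $src =~ s/PBV_([0-9a-f]{2})_/'$' . chr(hex $1)/ge;
    return $src;
}

# Disguise the code, run the C-side beautifier stage (a code ref standing in
# for GNU indent), then undo the disguise.
sub beautify_disguised {
    my ($beautifier, $src) = @_;
    return decode_punct_vars($beautifier->(encode_punct_vars($src)));
}

# Stand-in for indent's side effect on Perl: a space after every "$".
my $mock_indent = sub {
    my ($s) = @_;
    $s =~ s/\$(?=\S)/\$ /g;
    return $s;
};
```

Applied directly to `print "$-\n" if $@;`, the stand-in produces `print "$ -\n" if $ @;`, the kind of damage the indent run below inflicts on `$-` and `$@`. Routed through `beautify_disguised`, the line comes back unchanged, because the C-side stage only ever sees ordinary identifiers.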

De-Obfuscation Testing

Where should one look to find the ugliest Perl code on the planet? Why, in the archives of past Obfuscated Perl Contests, of course, where contestants are rewarded for making their programs as inscrutable as possible (http://www.tpj.com/tpj/contest).

In this section, we'll examine the effects on Perl programs of C-style beautification using indent, as well as Perl-style beautification using pbeaut.

Here's a prize-winning entry from the 1996 contest:

After beautification with indent, it looks like this:

$ indent -npro -br -nce -npcs  < caton  # -br: brace on line with keyword
#F. First place: Russell Caton
$ -= 100;
while ((($ @) = (getpwent())[2])) {
        push(@@, $ @);
}
foreach(sort
        {
        $a <=> $b
        }
        @@) {
        (($_  <= $ -) || ($_  == ($ - +++1))) ? next : die "$-\n";
}

Perhaps surprisingly, the program layout looks pretty good, owing to the fact that Perl inherited many of its basic features from C (brace-delimited blocks, &&/|| conjunctions, semicolon line-termination, operator syntax, etc.).

On the other hand, the representations of two variables ($- and $@) were altered by the insertion of a space between their symbols. Does this bother Perl?

It doesn't bother Perl a bit! The program still produces the next available user ID number from the /etc/passwd file (or NIS database). However, seeing those variables mangled is likely to annoy most (non-obfuscatory) Perl programmers!

After beautification with pbeaut, the program looks like this:

As you can see, the preservation of variable names has been achieved, along with a much more Perlish representation of the foreach loop.

Here's another unattractive winning entry from the 1996 contest:

Let's try beautifying this one with indent:





Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: September 12, 2017