Neatperl can be called a "fuzzy" pretty printer. It does not perform full lexical analysis, which is difficult for Perl because of its complex lexical structure, further complicated by such innovations as custom delimiters in q, qq and qr strings. Instead, it relies on analysis of a limited context of each line (its prefix and suffix) to "guess" the correct nesting level. It does not reorganize the text in any way other than re-indentation.
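To make the idea concrete, here is a minimal sketch of prefix/suffix-based re-indentation. It is written in Python for brevity and is not Neatperl's actual code: the function name `reindent`, the `tab` parameter, and the exact rules (a line starting with `}` closes a block, a line ending with `{` opens one) are assumptions made for illustration.

```python
def reindent(lines, tab=4):
    """Re-indent code by looking only at the start and end of each line."""
    level = 0
    out = []
    for raw in lines:
        text = raw.strip()
        if not text:
            out.append("")          # keep blank lines blank
            continue
        if text.startswith("#"):
            # Comment lines are indented but never change the nesting level.
            out.append(" " * (tab * level) + text)
            continue
        # Prefix check: a leading '}' closes a block before the line is printed.
        if text.startswith("}"):
            level = max(level - 1, 0)
        out.append(" " * (tab * level) + text)
        # Suffix check: a trailing '{' opens a block for the following lines.
        if text.endswith("{"):
            level += 1
    return out


source = [
    "sub max {",
    "my ($a, $b) = @_;",
    "if ($a > $b) {",
    "return $a;",
    "} else {",
    "return $b;",
    "}",
    "}",
]
print("\n".join(reindent(source)))
```

A one-liner such as `if ($x) { print 1; }` neither starts with `}` nor ends with `{`, so in this sketch it leaves the nesting level unchanged.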
Neatperl also computes some basic statistics about the code, such as the number of lines without comments, the number of subroutines, the number of blocks, etc.
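Statistics of this kind can be gathered in the same line-by-line pass. The Python sketch below is only an illustration: the name `code_stats`, the counter names, and the counting rules (blank lines are not counted as code, a block is any line ending with `{`) are assumptions, not a description of what Neatperl actually reports.

```python
import re


def code_stats(lines):
    """Count a few simple properties of a script, one line at a time."""
    stats = {"total": 0, "non_comment": 0, "subs": 0, "blocks": 0}
    for raw in lines:
        text = raw.strip()
        stats["total"] += 1
        # Assumption: blank lines and full-line comments are not code lines.
        if not text or text.startswith("#"):
            continue
        stats["non_comment"] += 1
        if re.match(r"sub\s+\w+", text):
            stats["subs"] += 1      # named subroutine definition
        if text.endswith("{"):
            stats["blocks"] += 1    # line that opens a block
    return stats


sample = [
    "# find the larger value",
    "sub max {",
    "my ($a, $b) = @_;",
    "",
    "if ($a > $b) {",
    "return $a;",
    "} else {",
    "return $b;",
    "}",
    "}",
]
print(code_stats(sample))
```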
For a reasonable Perl style, typically found in production scripts, the results are quite satisfactory. Of course, it will not work for compressed or obfuscated code.
This is a relatively non-traditional approach, as a pretty printer typically attempts to implement full lexical analysis of the language with some elements of syntax analysis. See, for example, my (very old) NEATPL pretty printer ( http://www.softpanorama.org/Articles/Oldies/neatpl.pdf ), one of the first programs that I have written. It implements a full lexical analyzer for PL/1. The Perltidy architecture is closer to that traditional approach too.
The main advantage is that such an approach allows implementing a useful pretty printer in less than 500 lines of Perl source code, with around 200 lines implementing the formatting algorithm. Such small scripts are more maintainable and have less chance to "drop dead" and become abandonware after the initial author loses interest and no longer supports the script.
Neatperl does not depend on any non-standard Perl modules, and its distribution consists of just two items: the script itself and the readme file. This can be an important advantage, as installing Perl modules in a corporate environment is often not that simple and you can run into a bureaucratic nightmare. Also, with many modules used you always risk compatibility hell. It is sad that Perl does not have a zipped format like jar files for Java, which allow packaging the program with its dependencies as a single file.
Another important advantage is that this is a very safe approach, which normally does not introduce any errors in Perl code, with the exception of indented HERE lines, which might be incorrectly "re-indented" based on the current nesting. As Perl has a very complex lexical structure, which is not described by a context-free grammar, its parsing represents a daunting programming task and can never be guaranteed to be correct. This means that the "fuzzy" pretty printer approach is safer for such a language.
But even Neatperl can mangle some parts of the script, such as HERE strings, if some "too inventive" delimiters are used. In Perl the delimiter can be defined via a single-quoted or double-quoted string; Neatperl does not recognize perverted HERE strings defined via q or qq notation.
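The following Python sketch illustrates how HERE strings with plain or quoted delimiters can be found and protected from re-indentation. The names `HEREDOC_START` and `mark_heredoc_lines` and the regular expression are assumptions made for this illustration, not Neatperl's actual code; the indented `<<~` form is deliberately not handled.

```python
import re

# Matches <<"TAG", <<'TAG' or <<TAG. A shift such as "1 << 3" does not match,
# because the operator must be followed directly by a quote or an identifier.
HEREDOC_START = re.compile(r"""<<(?:"([^"]+)"|'([^']+)'|([A-Za-z_]\w*))""")


def mark_heredoc_lines(lines):
    """Return (line, protected) pairs; protected lines must not be re-indented."""
    marked = []
    terminator = None               # delimiter of the HERE string being skipped
    for line in lines:
        if terminator is not None:
            marked.append((line, True))
            if line.rstrip("\n") == terminator:
                terminator = None   # the terminator line ends the HERE string
            continue
        marked.append((line, False))
        match = HEREDOC_START.search(line)
        if match:
            terminator = match.group(1) or match.group(2) or match.group(3)
    return marked


script = [
    'print <<"EOT";',
    "  keep   this   layout",
    "EOT",
    "print 1 << 3;",
    'print "done";',
]
for text, protected in mark_heredoc_lines(script):
    print("keep  " if protected else "indent", text)
```

A delimiter written in some other notation simply fails to match the pattern in this sketch, so the body of such a HERE string would be treated as ordinary code, which is the failure mode described above.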
Neatperl does not try to re-indent strings that start in the first position of the line. Such strings always remain intact.
There is no free lunch, and such a limited-context approach means that sometimes (rarely) the nesting level can be determined incorrectly. There might also be problems with determining the correct end of HERE literals or q and qq literals. A missed HERE string that has a non-zero fixed indent can be shifted left or right, which might not be a good thing (HERE strings with zero indent are safe). So a fuzzy pretty printer is best for your own scripts, in which you can maintain a safe Perl style that is pretty-printer friendly. For scripts written by other people your mileage may vary, but even in this case it is a great diagnostic tool. It also greatly helps in understanding scripts written by other people.
To correct this situation, three pseudo-comments (pragmas) were introduced, with which you can control the formatting and correct formatting errors. All pseudo-comments should start at the beginning of the line; no leading spaces are allowed.
Currently Neatperl allows three types of pseudo-comments:
For example, if Neatperl did not correctly recognize the point where a particular control structure closes, you can close it yourself with the directive
You can also arbitrarily increase and decrease the indent with this directive
As Neatperl maintains a stack of the control keywords it recognizes, it also produces some useful diagnostic messages.
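A stack of open control structures makes diagnostics of this kind almost free. The Python sketch below shows the general technique only; the function `check_nesting`, the keyword list, and the message wording are hypothetical and are not taken from Neatperl.

```python
import re

KEYWORD = re.compile(r"^(sub|if|elsif|else|unless|while|until|for|foreach|do)\b")


def check_nesting(lines):
    """Report closing braces with no open block and blocks left open at the end."""
    stack = []                      # (keyword, line number) of each open block
    messages = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        if text.startswith("}"):
            if stack:
                stack.pop()
            else:
                messages.append(f"line {number}: closing brace without an open block")
        if text.endswith("{"):
            # Drop a leading "} " so that "} else {" is recorded as "else".
            head = text.lstrip("} ")
            match = KEYWORD.match(head)
            stack.append((match.group(1) if match else "block", number))
    for keyword, number in stack:
        messages.append(f"line {number}: '{keyword}' block is never closed")
    return messages


broken = [
    "sub f {",
    "if ($x) {",
    "print 1;",
    "}",
]
for message in check_nesting(broken):
    print(message)
```

Because each stack entry remembers the line where the block was opened, the message can point at the start of the unclosed structure rather than at the end of the file.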
For most scripts Neatperl is able to determine the correct nesting level and proper indentation. Of course, to be successful, this approach requires a certain (very reasonable) layout of the script. The main requirement is that multiline control statements should start and end on a separate line. One-liners (control statements which start and end on the same line) are acceptable.
While all of us have seen pretty perverted formatting styles in some scripts, this is typically an anomaly in production-quality scripts. Most production-quality scripts display a very reasonable layout of control statements, the one that is expected by this pretty printer. But again, that is why I called this pretty printer "fuzzy." For example, for any script compressed to eliminate whitespace this approach to pretty printing is not successful.
neatperl [options] [file_to_process]

or

neatperl -f [other_options] [file_to_process]   # in this case the text will be replaced with formatted text,
                                                # backup will be saved in the same directory

or

cat file | neatperl -p [other_options] > formatted_text   # invocation as pipe
1st -- name of the file to be formatted
This is still a raw version (0.4), so it is still beta, but usable. It works for all my scripts and for the scripts by other authors that I tested.