Softpanorama
May the source be with you, but remember the KISS principle ;-)
A Web Site is a Harsh Mistress
Early versions of the WWW developed a reputation as a versatile and convenient tool for accessing mission-critical data at the European Laboratory for Particle Physics (CERN). Paradoxically, the most successful use of the Web tools Tim Berners-Lee developed was access to the CERN phone directory, for which they were widely regarded as the best tool. Note that the first successful WWW application was not the distribution of published papers; it was a gateway to an existing and important application. Of course, the versatility of the WWW became clearer as the technology spread among high-energy physics institutions and then to the outside world. But is it really an accident that the Web took off as a gateway to existing information systems? I think it is not, and that is why the WWW served as a launch pad for several scripting languages, including Perl, JavaScript, PHP and Python.
As Oleg Kiselyov noted in his Login - Speaking HTTP paper:
...HTTP is useful in its own right, for example, as a good file-distribution protocol with a number of important advantages over ftp. This article gives an example how to speak HTTP and get understood.
... By definition[1], HTTP is a request/response protocol that exchanges messages in a format similar to that used by Internet mail (MIME). An HTTP transaction is essentially a remote procedure call. It is usually a blocking call, although HTTP/1.1 provides for asynchronous and batch modes. HTTP allows intermediaries (caches, proxies) to cut into the request-reply chain.
An operation to execute remotely is expressed in HTTP as an application of a request method to a resource. Additional parameters, if needed, are communicated via request headers or a request body. The request body may be an arbitrary octet-stream. The HTTP/1.1 standard defines methods GET, HEAD, POST, PUT, DELETE, OPTIONS, TRACE, and CONNECT. A particular server may accept many others. This extensibility is a rather notable feature of HTTP. The parties can use not only custom methods but custom request and reply headers as well. In addition, a client and a server may exchange meta-information via "name=value" attribute pairs of the standard "Content-Type:" header.
Most of the HTTP transactions performed every day are done behind the scenes by browsers, proxies, robots, and servers. Yet the protocol is so simple that one can easily speak it oneself. The only requirement is a language or tool that is able to manipulate text strings and establish TCP connections. Even a simple telnet application may do in a pinch, which is often useful for debugging. Server-side programming is less demanding: a servlet or a scriptlet does not need to bother with the network connectivity, authentication, access restrictions, SSL, and other similar chores. Server modules or FastCGI give a server-side programmer even more tools: load-balancing, persistence, database connectivity, etc. This article demonstrates how to use Perl scripts to speak and respond HTTP directly.
Nikolai Bezroukov
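The quoted article uses Perl; as an illustration of the same point (that any language able to manipulate strings and open a TCP connection can speak HTTP), here is a minimal Python sketch. The function names `build_request`, `parse_response` and `fetch` are made up for this example, and the parser is deliberately naive: it assumes the server closes the connection after the response and does not decode chunked transfer encoding.

```python
import socket

CRLF = "\r\n"


def build_request(method, host, path, headers=None, body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body


def parse_response(raw):
    """Split a raw response into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split(CRLF)
    _version, code, *_reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


def fetch(host, path="/", port=80):
    """Send one GET over a plain TCP socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request("GET", host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

Calling `fetch("example.org")` would return the status code, a dictionary of response headers and the raw body. Custom methods or headers, which the article singles out as a notable feature of HTTP, need nothing more than different arguments to `build_request`.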
"According to an article on New York Times, Microsoft researchers have discovered tens of thousands of junk Web pages, created only to lure search-engine users to advertisements. While most of us have run across them from time to time, the company researchers have found the pages are deliberately generated in vast numbers by a small group of shadowy operators. By following the money trail, Microsoft researchers were able to track the flow from big-name advertisers to search engine spammers. Many use Google's blogspot.com to set up spam doorway pages. 'The practice has proved to be a vexing problem for the major search companies, which struggle to prevent both spammers and companies specializing in improving legitimate clients' Web traffic -- a field known as search-engine optimization -- from undermining their page-ranking systems. Surprisingly, the researchers noted that the vast bulk of the junk listings was created from just two Web hosting companies and that as many as 68 percent of the advertisements sampled were placed by just three advertising syndicators.' The report is available at Microsoft Strider Search Ranger project page."
Tutorial Documentation -- tutorial gateway -- Perl-based, very good idea
WDVL CGI The Common Gateway Interface for Server-side Processing
CGI Script Tutorial and CGI Resources
Common Gateway Interface (CGI) Specifications
CGI-Resources Page
CGI Tutorials and scripts
The Idiot's Guide to Solving Perl CGI Problems
Perl Tutorial Start
CGI Scripts from NCSA
ENMPC: Tutorial on CGI
Perl and CGI Tutorial
CGI Tutorial - Frames version
Matt's Perl Tutorial
Danny Aldham's Perl CGI Tutorial Page version 1.07
Perl and CGI Tutorial
CGI Tutorial && Link
CGI Tutorial: Start
CGI Manual
CGI & Perl links on the WWW
Perl-Related Links
CGI Tutorial: A simple CGI script
CGI Tutorial: What CGI scripts are
htmlpp A Simple HTML Pretty Printer by Len Budney.
htmlpp is a simple HTML pretty printer, based on nsgmls and SGMLS.pm. The code is pretty alpha, but gives attractive results for many HTML docs. Some things, like nested tables, are rendered only passably. Other deeply-nested structures may render badly as well.
Note that this pretty-printer is oldish, and alpha, and unlikely to be developed any further. It's not a bad illustration of some of the possibilities for SGML technology in web authoring. Perhaps someone will take up the challenge, and build the "right" tool!
Since htmlpp gets its input from nsgmls, invalid documents should not be expected to work. However, a side effect of this approach is that minor errors and inconsistencies are actually fixed. Attribute values are always quoted in the pretty printed version. Characters like "<", ">" and "&" are converted into the appropriate SGML entities in attribute values and in document text. End tags are inserted automatically -- which will surprise you if you thought it was legal to imbed <pre> elements inside <p> elements, for example.
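To make the normalisation described above concrete (quoted attribute values, escaped "<", ">" and "&"), here is a rough Python sketch built on the standard `html.parser` module. It is not how htmlpp works: htmlpp relies on the validating nsgmls parser, while `HTMLParser` is lenient and does not insert missing end tags. The class name `PrettyPrinter`, the `pretty` helper and the short `VOID` list are assumptions made for this example. It also collapses whitespace, so it would damage <pre> content.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take an end tag (a deliberately short, illustrative list).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class PrettyPrinter(HTMLParser):
    """Re-emit HTML one token per line, indented by nesting depth."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0
        self.lines = []

    def _emit(self, text):
        self.lines.append("  " * self.depth + text)

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)  # bare attribute such as "checked"
            else:
                # Always quote, and escape &, <, > and quotes in the value.
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self._emit("<" + " ".join(parts) + ">")
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self._emit(escape(text, quote=False))


def pretty(html_text):
    printer = PrettyPrinter()
    printer.feed(html_text)
    printer.close()
    return "\n".join(printer.lines)
```

For instance, `pretty('<p class=note>a & b</p>')` quotes the attribute value and turns the bare ampersand into `&amp;`.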
HTMLPrettyPrinter - generate nice HTML files from HTML syntax trees
[June 7, 2002] A prettyprinter for HTML documents -- From the author of the book The Web Architect's Handbook; interesting in that it makes heavy use of modules:
use LWP::Simple;
use HTML::Parse;
use HTML::Entities;
use Text::Wrap;
use Getopt::Long;
[July 14, 2001] Clean up your Web pages with HTML TIDY -- HTML Tidy is a free utility to fix mistakes made while editing HTML and to automatically tidy up sloppy editing into nicely laid out markup.
It also works great on the atrociously hard to read markup generated by specialized HTML editors and conversion tools, and can help you identify where you need to pay further attention to making your pages more accessible to people with disabilities.
[July 14, 1999] hindent -- HTML indentation (pretty printing) utility Mar 28th 1999, 19:16 stable: 1.0.1 - devel: none license: GPL
Download: http://www.domtools.com/pub/hindent1.1.0.tar.gz
Homepage: http://www.domtools.com/unix/hindent.shtml
Changelog: http://www.domtools.com/pub/hindent1.1.0-changes.txt
FHTML.PL (Perl) Formats and indents HTML code and writes a new file with the results.
ZDNet Software Library - Pretty HTML
Pretty HTML is an easy-to-use program that formats your HTML Web pages. After processing, your HTML code is neatly arranged, commented, spaced, and indented, making it much easier to read and maintain. You can also use Pretty HTML to compress your Web pages by eliminating unnecessary spaces and carriage returns. Process your Web pages one at a time or batch-format entire folders in a single operation. Pretty HTML offers a number of options to ensure that the HTML formatting is done to your liking. To play it extra safe, you can have the program make backup copies of your originals. Excellent online help is included.
Perl scripts
sarep
(Console/Editors)
Command-line search and replace tool written in Perl.
Sep 16th 1998, 21:51 stable: 0.32 - devel: none - license: freely distributable
rpl is a UNIX text replacement utility. It will replace
strings with new strings in multiple text files. It can scan
directories recursively and replace strings in all files
found. Includes source, build script, and man page. Should
work on most flavors of Unix.
ftp://ftp2.laffeycomputer.com/pub/current_builds/rpl.tar.gz
ftp://ftp.laffeycomputer.com/ftp/pub/current_builds/rpl.tar.gz
replacer.pl (Perl) A utility to replace all instances of a given text string with a new text string in all the files in a single directory.
Treesed -- Freeware
Treesed, a Perl program, is a search/replace tool for lists of files. It can
search for patterns in a list of files, or even a tree of directories with
files.
Usage:
treesed pattern1 <pattern2> -files <file1 file2 ...>
treesed pattern1 <pattern2> -tree
Treesed searches for pattern1. If pattern2 is supplied, pattern1 is replaced by pattern2. If pattern2 is not supplied, treesed just searches. A list of files can be supplied with the -files parameter. Treesed can also search and replace in subdirectories if you supply the -tree parameter; all files in the current directory and its subdirectories are then processed. A backup of the original file is always made, with a random numeric suffix.
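The behaviour described for the -tree mode (search only when no replacement is given, replace otherwise, and keep a backup with a random numeric suffix) can be sketched in a few lines of Python. This is an illustration of the idea, not Treesed's actual code; the function name `tree_replace`, the five-digit suffix and the choice to skip files that are not valid UTF-8 are all assumptions.

```python
import os
import random
import re
import shutil


def tree_replace(root, pattern, replacement=None):
    """Search every file under root for a regex; replace it if asked to.

    Returns a dict mapping each matching file path to its match count.
    """
    regex = re.compile(pattern)
    # Collect paths up front so backups created below are never revisited.
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    ]
    hits = {}
    for path in paths:
        try:
            # newline="" keeps the file's own line endings untouched.
            with open(path, encoding="utf-8", newline="") as handle:
                text = handle.read()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        count = sum(1 for _ in regex.finditer(text))
        if count == 0:
            continue
        hits[path] = count
        if replacement is not None:
            # Back up the original under a random numeric suffix first.
            backup = f"{path}.{random.randint(10000, 99999)}"
            while os.path.exists(backup):
                backup = f"{path}.{random.randint(10000, 99999)}"
            shutil.copy2(path, backup)
            with open(path, "w", encoding="utf-8", newline="") as handle:
                handle.write(regex.sub(replacement, text))
    return hits
```

Running the search-only form first (`tree_replace(".", "pattern1")`) shows which files would be touched before any of them is rewritten.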
Non-Perl
[July 14, 1999] Search & Replace 98 Download -- html search and replace
Search and Replace 98 From: Andromeda HTML Workshop Version: 2.21 Date: July 19, 1998 File size: 263.2K Downloads: 835 License: Free. Search and Replace 98 is a text search-and-replace tool that can work on single files or an entire directory of HTML pages. Search and Replace 98 can read files of up to 512K in size.
BK ReplaceEm From: BK Computer Programming Version: 1.7 Date: January 5, 1998 File size: 457K Downloads: 8,538. Freeware string-replacing utility. At its core, BK ReplaceEm is a text search and replace program. However, unlike the search and replace functionality of a standard text editor, BK ReplaceEm is designed to operate on multiple text files at once. And you need not only perform one search and replace operation per file--you can set up a list of operations to perform. You can perform different operations on multiple file groups. You can also specify a backup file for each file processed, just in case the replace operation doesn't meet your expectations. This latest version adds whole-word search support and other enhancements. The file-processing engine has been completely rewritten in this version.
[June 7, 1999] CVS Version Control for Web Site Projects
Whatcha' gonna make - SunWorld - October 1998 -- make can be used to build a book or a Web site
Web Page Generator (Perl) This program allows the user to create a generic web page.
The problem with /usr/ucb/mail shell escapes is going to stay with us for quite a while: I have found that many web sites run CGI helper scripts that send data from the network into /usr/ucb/mail, without censoring of, for example, newline characters embedded in the data.
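The danger comes from tilde escapes: an input line beginning with "~!" asks the mail program to run a shell command, so an embedded newline lets remote data start such a line. As a hedged illustration of the "censoring" the quote calls for, a helper script could filter its input along these lines. The function names are hypothetical, and this sketch only shows the filtering idea; handing the message to a mailer that does not interpret escapes at all would be the more robust fix.

```python
def clean_header(value):
    """Reject any single-line field (subject, address) containing a line break."""
    if "\r" in value or "\n" in value:
        raise ValueError("line break in single-line field")
    return value.strip()


def clean_body(text):
    """Neutralise tilde escapes: no body line may begin with '~'."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # A leading space keeps the text readable but stops the line from
    # being read as an escape.
    return "\n".join(" " + line if line.startswith("~") else line for line in lines)
```

With this filter, a submitted subject such as "hi" followed by a newline and "~!id" is refused outright, and the same sequence in a message body is passed on with the escape line indented.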
WebMaker
Download: http://www.services.ru/linux/webmaker/WebMaker-0.8.0.tar.gz
Homepage: http://www.services.ru/linux/webmaker/
WebMaker is a GUI HTML Editor for Unix. Main features include a nice GUI interface, menus, toolbar and dialogs for tag editing, multiple windows support, HTML 4.0 support, color syntax highlighting, preview with external browser, ability to filter editor content through any external program that supports stdin/stdout and KDE integration.
Web Tools -- Web Authoring Tools -- good
The Web Developers Virtual Library is an award-winning webmaster's encyclopedia.
Web Development guidelines for creating an effective Web site.
Web Site Development Primer is a beginners guide to Web site design and publishing.
The Complete Intranet Resource provides resources to research and implement Intranets.
WWW Robots, Wanderers, and Spiders are programs that traverse the Web automatically.
RadView Software Inc. - Developers of WebLoad -- stress testing
Copyright © 1996-2007 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright in the original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: May 06, 2008