(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix
Often Perl claims to be efficient in the scarce resource of the programmer's time. It isn't often that people tune scripts for optimum performance. Are there a few tips you can give to new Perl programmers on how to squeak out a little better runtime performance?
LR: 'Script' ne 'program' again. When dealing with upward-scalable data sets, performance becomes important. I proposed a tutorial to Perl Conference 4.0 this year on this subject, but unfortunately it wasn't accepted.
An Interview with HP's Larry Rosler
Many people know your extensive work with Perl's regular expressions. What is the most common misunderstanding new programmers have about this pattern-matching language?
IZ: I do not remember. For me, the beginner stage was so long ago, and I try to avoid questions on c.l.p.misc which many posters have enough expertise to answer. Let me guess.
Perl's regular expressions are modeled (eventually) after command-line parameters to grep and other similar utilities. In the command-line world, everything is a string. Bingo: Perl regular expressions look like strings. (Let us forget for a moment that operators qq() etc. were introduced to make strings look like regular expressions ... .)
We have a language with binary operators (for example, `|', `{4}', or `' - this was concatenation), unary operators (`[]', `[^]', `(?!)', '+' - both postfix and aroundfix), grouping (`(?:)'), keywords (`\w', `^'), ternary ('{3,7}'), naming (`()') etc. All of this is packed into a string. No wonder that even inherently unreadable languages like Tcl or Lisp start looking like Dr. Seuss compared to regular expressions.
Additionally, newcomers do not understand that one needs to break a regular expression into tokens (not mentioning how to do it!), all these rules about what is special when backslashed, what is special when not backslashed and so on. To add insult to injury, m in m// is optional, but s in s/// is not, //x would require you to go into "gory details," some switches in //ioxmsg apply to regular expressions, some to the operator in which the regular expression appears, print /foo/, 'bar' is applied to $_, but split /foo/, 'bar' is not etc., etc., etc.
//x was introduced as a clever hack around the problem of "packing a language into a string," but it went only a small part of the way to make things more maintainable. Languages like SNOBOL introduced COBOL-style patterns, which swing into the opposite end of the scale: things become less readable due to the sheer size of patterns for any nontrivial task.
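To make the point about //x concrete, here is a minimal sketch of one pattern written twice: once packed into a string, and once spread out with /x so that each token is visible. The "key = value" sample line and the variable names are made up for illustration and do not come from the interview.

```perl
use strict;
use warnings;

# Hypothetical sample data: a "key = value" configuration line.
my $line = 'timeout = 30';

# Packed form: every token runs straight into the next one.
my $packed = qr/^\s*(\w+)\s*=\s*(\d+)\s*$/;

# The same pattern under /x: whitespace and #-comments inside the
# pattern are ignored, so each token can stand on its own.
my $spaced = qr/
    ^ \s*
    (\w+)        # key
    \s* = \s*
    (\d+)        # numeric value
    \s* $
/x;

# In list context a successful match returns the captured groups.
my @from_packed = $line =~ $packed;
my @from_spaced = $line =~ $spaced;

print "packed: @from_packed\n";
print "spaced: @from_spaced\n";
```

Both forms capture the same two fields; only the second one shows where one token ends and the next begins.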
Regular expressions are extremely powerful tools, they are the functional-programming oasis inside a procedural language, and, when well understood, they map wonderfully into the problem domain they address. Making them into eye candy is not impossible, but requires a lot of work (and probably significant changes in the current mindsets).
Interview with Dr. Ilya Zakharevich
... data out of this data structure, and it might never run ... the stream computes the data just as they're needed ... is more like a linked list, which means that it ...
www.plover.com/~mjd/perl/Stream/article
The OutRider Computing Journal
"One recurrent theme in my job as a database administrator/assistant systems administrator/systems analyst is the need to keep track of what happened on the systems while I wasn't watching. What did the cron job do last night? What did all those spooler daemons do while I was at lunch? In other words, logging. It bothered me that there was a lack of simple tools for doing such a simple, redundant job. So, I set out to build some myself. My systems programming tool of choice is Perl, so that is the language I chose for the project. This journey took me out of my normal routine of straight-line Perl programming and dumped me in the land of Modules and Object Oriented Perl. I'm glad to say it didn't overwhelm me and in fact I found it rather easy to write."
"My first order of business was to take my old standby logging routines and objectify them. I had several concise routines that I would either import into the main package through a use statement or just simply copy/paste depending on my mood and what I was doing. They consisted of four routines: start_logging, stop_logging, restart_logging and log."
"This was quick and dirty code that, while it did the job, was not very simple to use. For example, if I needed to redirect the output of a sub-process to the log file I would have to say: stop_logging(), then run that process and redirect its output to the log, then restart_logging() again. It was rather clumsy and difficult to document. So, I set about to rewrite the routines in an object oriented manner. I followed the 'Three little rules' as formulated by Larry Wall in the perlobj(1) man page and restated by Damian Conway in his book "Object Oriented Perl"..."
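The article's own module is not reproduced here. As a rough sketch of what "objectifying" the four routines could look like under the three rules from perlobj, consider the following. The package name SimpleLog, its method names, and the demo file name are hypothetical.

```perl
use strict;
use warnings;

# A class is simply a package that provides methods.
package SimpleLog;    # hypothetical name, not the article's module

# An object is simply a reference that knows which class it belongs to.
sub new {
    my ($class, $file) = @_;
    my $self = bless { file => $file, fh => undef }, $class;
    return $self->start;
}

# A method is simply a subroutine that expects the object first.
sub start {
    my ($self) = @_;
    open($self->{fh}, '>>', $self->{file})
        or die "Cannot open $self->{file}: $!\n";
    return $self;
}

sub stop {
    my ($self) = @_;
    close($self->{fh}) if $self->{fh};
    $self->{fh} = undef;
    return $self;
}

sub restart { my ($self) = @_; return $self->stop->start; }

sub log {
    my ($self, $message) = @_;
    die "Log is stopped\n" unless $self->{fh};
    my $stamp = localtime;            # scalar context: readable timestamp
    print { $self->{fh} } "[$stamp] $message\n";
    return $self;
}

package main;

my $logfile = "simplelog-demo-$$.log";
my $log     = SimpleLog->new($logfile);
$log->log('nightly job started');
$log->stop;        # a sub-process could append to the file here
$log->start;
$log->log('nightly job finished');
$log->stop;

# Read the demo file back, show it, and clean up.
open(my $in, '<', $logfile) or die "Cannot read $logfile: $!\n";
my $content = do { local $/; <$in> };
close($in);
unlink $logfile;
print $content;
```

Keeping the file handle inside the object is what removes the clumsiness described above: stopping and restarting become method calls on one value instead of calls that share a global handle.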
A few weeks ago I wrote an article on how to automate FTP via the .netrc file. I received lots of excellent feedback concerning the article, most of it telling me about other ways to automate FTP. I had aimed that article at the non-programming audience. I wanted to demonstrate a very simple way to automate your FTP logins and to automatically perform certain FTP tasks. Because the volume of mail was so high, I decided to write a second part to the article to detail two other ways to automate FTP. For very simple tasks the .netrc file is probably the easiest way to go. However, for more complex tasks, or tasks needing greater error checking and flexibility, you will probably want to use one of the methods outlined below. This article is not meant to be a definitive reference on any of the material covered; its primary goal is to familiarize you with the methods and give you further references to learn more. Remember, doing is the best way to learn.
Using Perl's Net::FTP
I have to admit, I am a Perl bigot. I love Perl and do much of my programming in Perl. Now, notice I said bigot and not god: just because I like it does not mean I am good at it. I would say I write the world's worst Perl code. If you want to do FTP with Perl then the Net::FTP module should be your first choice. Net::FTP is part of Perl's libnet package and is probably already installed on your system. In the really off chance that it is not, you can head over to CPAN and get it along with easy to understand installation instructions. Net::FTP allows us to use familiar FTP commands via Perl's object oriented syntax. In order to use Net::FTP we simply have to place a use Net::FTP statement at the start of our program and make a Net::FTP object. Let us say we wanted to upload the file dailyreport.txt to someserver.com into the directory /reports every day under the username of someuser with a password of foo. This is how we would do it with Net::FTP:
#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP;

# Connect; on failure Net::FTP leaves the reason in $@
my $ftp = Net::FTP->new("ftp.someserver.com")
    or die "Could not connect: $@\n";
$ftp->login("someuser", "foo") or die "Login failed: ", $ftp->message;
$ftp->cwd("/reports")          or die "cwd failed: ", $ftp->message;
$ftp->type("ascii");
$ftp->put("dailyreport.txt")   or die "Upload failed: ", $ftp->message;
$ftp->quit;

Could it get any easier than this? The general syntax is ftpobject->ftpcommand(parameters). You will notice that all of the statements that do the actual work are done through standard FTP commands, and that is the beauty of Net::FTP: there is nothing new to learn. If you know rudimentary Perl and FTP you can use Net::FTP. Here is a short table with some of the Net::FTP methods.
Method                          Use
login($username,$password)      Will log you into the ftp server. Use anonymous without a password to login anonymously
cwd($dir)                       Changes the current directory on the server
cdup()                          Changes the current directory on the server to one directory level above the current directory
pwd()                           Returns the current working directory
ls()                            Gets the current directory contents
get($remotename, $localname)    Retrieves file $remotename and stores it in $localname
put($localname, $remotename)    Stores file $localname to the ftp server as $remotename
delete($filename)               Removes the file $filename from the server
rmdir($dirname)                 Removes the directory $dirname from the server
type($type)                     Switch to either binary or ascii transfer mode
ascii()                         Switch to ascii transfer mode
binary()                        Switch to binary transfer mode
quit()                          Terminate the FTP connection

Pretty simple stuff. This is not all there is to Net::FTP, you can find more information by reading the Net::FTP man page - man Net::FTP
Ncftp
Another simple choice we have is to use ncftp which once again is probably already installed on your system. Ncftp gives us a simple command line based interface to ftp. Note that ncftp can also be used as a shell just as regular ftp can. The two commands we are mainly concerned with are ncftpget and ncftpput. As you can probably guess these two commands are used to send and retrieve files from a specified ftp server. We will look at ncftpget first.
Ncftpget allows us to retrieve files from a remote server with a single command line. Try it out, get the Sim City 3000 demo from Loki with this command:
ncftpget ftp://ftp.lokigames.com/pub/demos/sc3u/sc3u-demo-x86.run
If you run the above command you should connect to the ftp server and begin downloading the Sim City 3000 demo, with a status line telling you how much you have downloaded and an estimated time left. Be forewarned though, this demo is quite large (170+MB) so you may not want to execute this unless you have good bandwidth (or a lot of time!). This is all well and good, but what if we want to log in as a specific user? Well, that is easy with command line switches. For example the command:
ncftpget -u someuser -p yourpassword ftp.someserver.net . '/pub/README.txt'
Would login to someserver.net with a username of someuser and a password of yourpassword and retrieve the file /pub/README.txt from that server. You can get a complete list of command line options by typing ncftpget by itself on the command line (and then pressing enter of course). You can also read the ncftpget man page for even more detailed information.
Well, after having briefly covered ncftpget we can move on to ncftpput which (big surprise) works exactly the same way. So, to put a file called README.txt to someserver.net with our username and password we could use:
ncftpput -a -u someuser -p yourpassword ftp.someserver.net '/pub' 'README.txt'
Notice the -a switch to tell ftp that this is an ascii file. Also note that the remote directory (/pub) comes first, followed by the local file to upload. The interface into ncftpput is almost identical and the command line switches are pretty much the same. Once again, type ncftpput by itself to get a list of switches and you can consult the ncftpput man page with man ncftpput.
The End
I hope this helps someone out. As always in Linux, there are many other ways to perform this same task. You can check out Scurry, a neat package written by Doug Muth. To quote the README file that comes with it: This is a Perl script designed to automate FTP transfers, and transfers using Secure Copy (scp) which comes with ssh versions 1 and 2. It looks to be a pretty slick package although I have not had time to fully try it out. You may also want to check out Expect, which is a tool to automate nearly any task. Again, to quote the Expect website: Expect is a tool for automating interactive applications such as telnet, ftp, passwd, fsck, rlogin, tip, etc. Expect really makes this stuff trivial. Expect is also useful for testing these same applications. And by adding Tk, you can also wrap interactive applications in X11 GUIs.
Apache Today
Perl is ideally suited to reading in entire files, doing a bit of processing, and then printing them out again. I have for years now used the same basic function of reproducing templates in Perl. The function looks like this:

sub parse_template {
    my ($template, %subs) = @_;

    # Report a missing template in the page itself and give up
    unless (open(TEMPLATE, '<', $template)) {
        print "I tried to load $template<br>\n";
        return '';
    }

    # Slurp the whole file by undefining the record separator
    my $text;
    { local $/; $text = <TEMPLATE>; }
    close(TEMPLATE);

    # Replace every %%key%% with its value from the hash
    foreach my $sub (sort keys %subs) {
        $text =~ s/%%\Q$sub\E%%/$subs{$sub}/g;
    }
    return $text;
}

There are a few points to note about this function before we look at how best to use it. First and foremost, you'll notice that we load the entire template file into memory. This is because we want to process the file in its entirety. The second point is that we don't actually print the template from within the function; instead, we return the translated text to the caller. This is just in case we want to use the template for something other than an active HTML page generated by a CGI script. We could use the same function to introduce templates into a static HTML file, whilst still allowing us to reproduce and parse the template in the process.
The third point is just a small nicety. If the file that's been selected doesn't exist, we print a little message to say that there's been an error. We could equally return nothing, but I prefer to be able to spot the problem. In production systems, I've actually used an SSI type error message, and also taken the time to mail an error message to the webmaster to highlight a possible problem.
Now for the important part. The second half of the function actually processes the template so that we can embed elements into the templates that can be replaced on the fly. We replace strings of the form %%string%% by using a hash which we supply to the function. The key of the hash is the string, and the value is the replacement text. For example, take the simple template:
<title>%%title%%</title>
Using the function above we can print out the template using:
print parse_template('template','title' => 'This is the title text');
This will produce the desired:
<title>This is the title text</title>
You can create as many templates as you like, and have as many different replacement strings as you like. It'll also replace the same string a number of times, useful if you want the page title, and the title displayed within the page to be the same.
There is of course a little problem with this, in that in order for this to work, you need to have a different set of templates that support the %%text%% construct. So, the final trick is to change the way in which we search for the matching string that we want to replace. Instead of using %%text%%, you use a standard SSI construct, using a comment to encapsulate the text to be replaced. For example, we could have a template with:
<font size=+2><b><!--#include perltext=title --></b></font>
Now if you use the template as an SSI include in another document, the 'replacement' text will be ignored, because the SSI system will treat it as a comment. But when parsed by an updated version of our function, the 'title' gets replaced with the desired text.
All you have to do is modify the function to replace the quoted string. I've included the full version of that function below:
sub parse_template {
    my ($template, %subs) = @_;

    # Report a missing template in the page itself and give up
    unless (open(TEMPLATE, '<', $template)) {
        print "I tried to load $template<br>\n";
        return '';
    }

    my $text;
    { local $/; $text = <TEMPLATE>; }
    close(TEMPLATE);

    foreach my $sub (sort keys %subs) {
        # Quote the whole SSI-style marker so it matches literally
        my $search = quotemeta('<!--#include perltext=' . $sub . ' -->');
        $text =~ s/$search/$subs{$sub}/g;
    }
    return $text;
}

I've used quotemeta here to make sure that the whole string is suitable for use as a search string in the regular expression.
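To see why the quoting matters, here is a small sketch that applies the same substitution loop to a string instead of a file. The helper name fill_template and the key names are assumptions made for this illustration. The key contains a dot, which would match any character if it were not quoted.

```perl
use strict;
use warnings;

# Hypothetical string-based variant of the substitution loop above.
sub fill_template {
    my ($text, %subs) = @_;
    foreach my $name (sort keys %subs) {
        # quotemeta turns "page.title" into a pattern that matches only
        # a literal dot, not "any character".
        my $search = quotemeta('<!--#include perltext=' . $name . ' -->');
        $text =~ s/$search/$subs{$name}/g;
    }
    return $text;
}

my $template = '<b><!--#include perltext=page.title --></b> '
             . '<i><!--#include perltext=pageXtitle --></i>';

my $page = fill_template($template, 'page.title' => 'Home');
print "$page\n";
```

Only the marker whose name is exactly page.title is replaced; the pageXtitle marker is left alone, which would not be the case with an unquoted pattern.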
www.perl.com
For those of us who use the Beast Emacs, you have provided the outstanding cperl-mode syntax highlighter and indenting system. What was so bad about the traditional perl-mode that made you want an alternative?
IZ: Again, I do not remember the details. But I did not invent the alternative, I just adopted an existing branch. Here's my attempt to reconstruct how it did happen (but it may be a false memory): At the time I grabbed cperl-mode.el v1.3 (by Bob Olson) from gnu.emacs.source, perl-mode.el was handling about 30% of constructs, while cperl-mode.el was handling 60%. Additionally, electric constructs were decreasing the irritation factor a lot. This was what I started with. Bob named and coded cperl-mode.el similarly to the difference between c-mode.el and cc-mode.el.
Being locked into Emacs, being used to (extremely high) standards of good DOS programmers editors, and having a very low irritation threshold for bookkeeping-related repetitive tasks got me some minimal experience with Emacs Lisp (I needed several years to make my Emacs config tolerable). So when facing a problem with the existing cperl-mode.el, I would try to fix it instead of working around it.
While not time-efficient, this was bringing this warm fuzzy feeling of improving the universe instead of just increasing its entropy. So it went and went, with additional fuel supplied by annoyed/pleased/patchy users around the world.
What first attracted you to Perl?
IZ: Oh, this is easy to answer: command.com. If all you have are scissors, everything starts looking like a nail. So you learn to deal with everything using your scissors. I remember my impression when I printed out the documentation for 4DOS/4OS2: Wow, these guys thought of everything! I may replace half of the tiny utilities I need with this one program!
Then I saw a "go" script for running LaTeX/BibTeX until successful completion. It required an additional program, perl.exe, which was not exactly tiny (around 200K), but obviously demonstrated quite enough bang for a K. The manpage for this program had a few kilolines, was very well written, so was easy to grasp. Browsing "Programming Perl" did not hurt, either. (It took a lot of time to understand that the title is a false advertisement). Using this program for intelligent format conversion between bibliographical databases and BibTeX proved to be a success, including a chain of regular expressions like:
elsif (/^\s*No\.\s+([-\d\/]+(\s*\([-\/\d]+\))?)\s+(pp\.\s*)?([-\d]+)(\s*pp\.?)? (\s*\((\d{4})\))\s*$/i) {
elsif (/^\s*(pp\.\s*)?(([ivxlcd]+\+)?([-\d]+)|((\w)\d+-+\6\d+))(\s*pp\.?)?\s*$/) {
Then there was the year I was trying to make a math-editing widget based on a beefed up TK's text widget. With all the work for "typesetting" components of formula delegated by the widget to TCL callbacks, TCL turned out to be not an answer.
Could you describe in more detail what additional text-handling primitives you would like to see included with Perl? What string munging operations are absent that really ought to be included in Perl's core?
The problem: Perl's text-handling abilities do not scale well. This has two faces, both invisible as far as you confine yourselves to simple tasks only. The first face is not that Perl lacks some "operations;" it is not that some "words" are missing, whole "word classes" are not present. Imagine expressive power of a language without adjectives.
In Perl text-handling equals string-handling. But there is more in a text than the sequence of characters. You see a text of a program - you can see boundaries of blocks, etc.; you see an English text, you can see word boundaries and sentence boundaries, etc. With the exception of the word boundaries, all these "distinctive features" become very hard to recognize by a "local inspection of a sequence of characters near an offset" - unless you agree to use a heuristic which works only time to time. But a lot of problems require recognition of the relative position of a substring w.r.t. these "distinctive features".
Remember those "abstract algorithms" books and lessons? You can solve the problems "straightforwardly," or you can do it "smartly." Typically, "straightforward" algorithms are easy to code, but they do not scale well. Smart algorithms start by an appropriate preprocessing step. You organize your data first. The particular ways to do this may be quite different: you sort the data, or keep an "index" of some kind "into your data," you hash things appropriately, you balance some trees, and so on. The algorithms use the initial data together with such an "index."
Perl provides a few primitives to work with strings, which are quite enough to code any "straightforward" algorithm. What about "smart" ones? You need preprocessing. Typically, digging out the info is easy with Perl, but how would you store what you dug? The information should be kept "off band," for example, in an array or hash of offsets into the string.
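A minimal sketch of such "off band" bookkeeping, and of how fragile it is once the string changes. The sample sentence, the variable names, and the repair step are illustrative assumptions, not code from the interview.

```perl
use strict;
use warnings;

my $text = 'the cat sat on the mat';

# Preprocessing: record, off band, the offset where each word starts.
my @word_start;
while ($text =~ /\b\w+/g) {
    push @word_start, $-[0];      # $-[0] is the start of the last match
}

# The third word, looked up through the stored offset.
my $before = substr($text, $word_start[2], 3);

# Edit the string: replace "cat" with a longer word.
my $at = index($text, 'cat');
substr($text, $at, length('cat'), 'leopard');

# The stored offsets know nothing about the edit and are now stale.
my $stale = substr($text, $word_start[2], 3);

# Manual repair: shift every offset that lies after the edit point.
my $delta    = length('leopard') - length('cat');
my @repaired = map { $_ > $at ? $_ + $delta : $_ } @word_start;
my $fixed    = substr($text, $repaired[2], 3);

print "before edit: $before\n";
print "stale index: $stale\n";
print "repaired:    $fixed\n";
```

The repair is easy for a single, known edit; the trouble starts when many edits happen and the code doing them does not report where and by how much the string changed.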
Now modify the string a little bit, say, perform some s()() substitutions, or cut-and-paste with substr(). What happens with your "off band" information? It went out of sync. You need to update your annotating structures. Do not even think about doing s()()g, since you do not have enough info about the changes after the fact. You need to do your s()() one-by-one - but while s()()g is quite optimized, a series of s()() is not - and you get stuck again into the land of badly scaling algorithms.
(Strictly speaking, for this particular example s()()eg could save you - as well as code-embedded-into-a-regular-expression, but this was only a simple illustration of why off-band data is not appropriate for many algorithms. Please be lenient with this example!)
Even if no modification is done, using off-band data is very awkward: how to check what are the attributes of the character at offset 2001 when there are many different attributes, each marking a large subset of the string?
That was the problem, and the solution supported by many text-processing systems is to have "in-band annotations", which is recognized by the editing primitives, and easily queryable. Perl allows exactly one item of in-band data for strings: pos(), which is respected by regular expressions. But it is not preserved by string-editing operations, or even by $s1 = $s2!
"In-band" data comes in several "kinds". A particular "kind" describes:
- how it behaves with respect to insertion or deletion of characters nearby;
- can the "same" markup appear "several times";
- can the markup "nest" (like nested comments in some languages); and
- is there an internal structure of the markup (as in a loop, which may be [[LABEL DELIM0] KEYWORD [DELIM1 VAR1 SEP VAR2 ... DELIM2] [DELIM4 EXPR DELIM4] [DELIM5 BODY DELIM6]] - with some parts possibly missing, so the internal structure is a tree).
Different answers lead to a zoo of intuitively different kinds of markup, each kind useful for some categories of problems. You can mark "gaps between" characters, or you can mark characters themselves. The markup may "name" a position ("the first __END__ in a Perl program"), or cover a subset of the string ("show in red", "is a link to this URL", or "inside comment"). Since the kind of the markup defines what happens when the string is modified, the system can support self-consistency of the markup "automatically" (in exceptionally complicated cases one may need to register a callback or two).
The second face of problem is not with the expressive power of Perl, but with the implementation. Perl has a very rigid rule: a string must be stored in a consecutive sequence of bytes. Remove a character in the middle of the string, and all the chars after it (or before it) should be moved. As I said, s()()g has some optimizations which allow doing such movements "in one pass", but what if your problem cannot be reduced to one pass of s()()g? Then each of the tiny modification you do one-at-a-time may require a huge relocation - or maybe even copying of the whole string. This is why a lot of algorithms for text manipulation require a "split buffer" implementation, when several chunks of the string may be stored (transparently!) at unrelated addresses.
Such "split-buffer" strings may look incredibly hard to implement, as in "all the innards of Perl should be changed", but it is not. Just store "split strings" similarly to tie()d data. The FETCH (actually, the low-level MAGIC-read method) would "glue" all the chunks into one - and would remove the MAGIC - before the actual read is performed; and now no part of Perl requires any change. Now four or five primitives for text-handling may be changed to recognize the associated tie()d structures - and act without gluing chunks together. We may even do it in arbitrarily small steps, one opcode at a time.
Another important performance improvement needed for many algorithms would be the copy-on-write, when several variables may refer to the same buffer in memory, or different parts of the same buffer - with suitable semantic what to do when one of these variables is modified. (In fact the core of this is already implemented in one of my patches!) Together with other benefits, this would solve the performance problems of $& and friends, as well as would make m/foo/; $& = 'bar'; equivalent to s/foo/bar/. Having copy-on-write substrings may be slightly more patch-intensive than copy-on-write strings, though. The complication: currently the buffers are required to be 0-terminated (so that they may be used with the system APIs). It is hard to make 'b' as in substr('abc',1,1) refer to the same buffer (containing "abc\0") as 'abc'. The solution may be to remove this requirement, and have two low-level string access API, SvPV() and SvPVz(), so that SvPVz() may perform the actual copying (as in copy-on-write) and the appending of \0 - but only when needed!
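Two of the points above can be observed from plain Perl code. The sketch below uses made-up strings. It first shows that the pos() annotation stays with the original scalar and is not carried over by assignment. It then emulates the wished-for "$& = 'bar'" by writing through the match offsets in @- and @+ with four-argument substr(). This illustrates the semantics only and says nothing about how the interpreter handles the buffers internally.

```perl
use strict;
use warnings;

# pos() is attached to one scalar; assignment copies only the characters.
my $s2 = 'aaa bbb ccc';
$s2 =~ /bbb/g;                       # leaves pos($s2) just after "bbb"
my $s1 = $s2;
my $pos_source = pos($s2);
my $pos_copy   = pos($s1);

printf "pos of source: %s\n", defined $pos_source ? $pos_source : 'undef';
printf "pos of copy:   %s\n", defined $pos_copy   ? $pos_copy   : 'undef';

# $& itself is read-only, but the offsets of the last match are
# available in @- (start) and @+ (end), and substr() can write there.
my $str = 'one foo two';
if ($str =~ /foo/) {
    substr($str, $-[0], $+[0] - $-[0], 'bar');
}
print "$str\n";
```

The second half does by hand what s/foo/bar/ does in one step.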
**** http://www.oreilly.com/catalog/perlsysadm/chapter/ch09.html Chapter 9 of "Perl for System Administration" online. Chapter 9 is an in-depth look at one of the more common system administrator's duties: sifting through log files.
The chapter covers everything from basic syslog, text only log files to Microsoft NT, binary log files and how to interpret them using Perl. David N. Blank-Edelman does more than just explain how to grok the files, he addresses several other problems, such as log file rotation and stateful vs stateless data. There is also a very detailed section on log file analysis. He covers several different algorithms for analyzing the logs and turning them into useful data. Also, he addresses the use of databases in the logfile analysis process.
July 28th 2000 | engelschall.com
Bit::Vector is a (stand-alone) C library and an object-oriented Perl module (with overloaded operators) which allows you to handle bit vectors, sets (of integers), "big integer arithmetic", and boolean matrices (all of arbitrary size) very efficiently.
Changes: This release works with Perl 5.6.0. It adds a method and an overloaded operator for exponentiation, and Copy() now accepts vectors of any size and truncates or fills up (according to sign) as necessary.
July 1, 2000 | O'Reilly
Logs come in different flavors, so we need several approaches for dealing with them. The most common type of log file is one composed entirely of lines of text. Popular server packages like Apache (web), INN (Usenet news), and Sendmail (email) spew log text in voluminous quantities. Most logs on Unix machines look similar because they are created by a centralized logging facility known as syslog. For our purposes, we can treat files created by syslog like any other text file.
Here's a simple Perl program to scan for the word "error" in a text-based log file:
open(LOG,"logfile") or die "Unable to open logfile:$!\n";
while(<LOG>){
    print if /\berror\b/i;
}
close(LOG);
Perl-savvy readers are probably itching to turn it into a one-liner. For those folks:
perl -ne 'print if /\berror\b/i' logfile
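Going one small step beyond printing matching lines, the same test can feed a tally. The sketch below counts "error" lines per program name. The sample lines are invented, and the pattern assumes the traditional "month day time host program[pid]:" syslog layout, which real logs may not follow exactly.

```perl
use strict;
use warnings;

# Made-up sample lines in the traditional syslog layout.
my @lines = (
    'Jul  1 02:00:01 alpha cron[311]: job finished',
    'Jul  1 02:00:07 alpha sendmail[402]: Error: queue file locked',
    'Jul  1 02:03:19 beta  httpd[77]: error reading request',
    'Jul  1 02:04:00 alpha sendmail[409]: delivery error, will retry',
    'Jul  1 02:05:42 beta  httpd[77]: terrors are not errors',
);

my %errors_by_program;
for my $line (@lines) {
    # Same test as the one-liner: the whole word "error", any case.
    next unless $line =~ /\berror\b/i;

    # month, day, time, host, then the program name up to "[" or ":".
    if ($line =~ /^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+([^\s\[:]+)/) {
        $errors_by_program{$1}++;
    }
}

for my $program (sort keys %errors_by_program) {
    print "$program: $errors_by_program{$program}\n";
}
```

The last sample line is skipped on purpose: \b requires a word boundary on both sides, so neither "terrors" nor "errors" counts as the word "error".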
It's been said that if you work on any program long enough, that program will eventually be able to send electronic mail. It doesn't matter what the original purpose of the program was (if you can still remember)--if you develop it long enough, some day that program will send its first piece of email.
From the vantage point of a systems or network administrator, this means there are lots and lots of programs out there generating mail daily. Mail filters like procmail can help us with this deluge by sorting through the mail stream. But sometimes it is more effective to write sophisticated programs to actually read the mail for us. For example, we might write a program to analyze unsolicited commercial email (spam) or one that keeps long-term statistics based on daily diagnostic email from a server.
IBM developerWorks
"Parsing an XML document into tree structures makes it possible to operate on the tree structure of the data. Find out how to use the functions for accessing and manipulating the document tree, and follow a sample stock-trading application that uses Perl, DOM, XML, and a database to evaluate trading rules."
Linux Magazine
"Objects provide encapsulation (to control access to data), abstract data types (to let the data more closely model the real world), and inheritance (to reuse operations that are similar but have some variation)."
IBM DeveloperWorks
Getting the job done in Perl is easy. The language was designed to make simple tasks easy, and hard tasks possible. But the built-in simplicity of the language can become a trap. Programmers are by nature averse to documenting or designing the architecture of their programs. The excitement of writing pure code lies in the direct connection to the machine, telling it exactly what to do. Teodor Zlatanov presents techniques to improve the reliability and maintainability of Perl programs through increasing clarity of the code. His tips are intended for the beginner or intermediate Perl programmer, with a stronger emphasis on establishing good standards rather than on changing particular coding styles.
From program description:
DBGUI is a complete X graphical database interface that can -
- perform any SQL command
- save the SQL results to a file
- perform incremental or standard searches on the SQL results
- keep a configurable history of _all_ SQL commands and parameters
- run on any UNIX or UNIX-like machine (Linux)
- sort (normal, numerical and reverse) on any column of the SQL results
- print the SQL results to a printer
- quick command line clear and restore for easy command line generation/pasting
- "clone" the results to a new display window for comparisons etc.
- utilize the DBI/DBD (or SybPerl) libraries or isql/sqsh binaries for the queries
- maintain four complete configuration "snapshots" for easy retrieval
- reload the last set of parameters on startup
- interactively enable and disable three different command lines for execution
- display the column data type in each column header
- display the column width in each column header
- solicit and quickly pop up a list of the system datatypes. (SybPerl Version)
- indicate a busy/idle condition with a colored indicator (red/green)
- display the date/time of the last command execution in title bar
- load a specified checkpoint file on startup for pre-defined menu histories
- display the checkpoint file path in the title bar
- color code header info in multiple result-set data.
- quickly sum up any numerical data column.
- more probably....
I was taught by several "hardware" guys who got into programming somewhat reluctantly. Do you feel that novice programmers lose an important perspective without knowing what goes on under the hood?
LR: I taught the first course on C at Bell Labs, using a draft of K&R, which helped vet the exercises. The students were hardware engineers who were being induced to learn programming. They found C (which is 'portable assembly language') much to their liking. Essentials such as pointers are very clear if you have a machine model in mind.
Perl is at a higher level of abstraction, so the machine model isn't as necessary at first. But when you get to complex data structures, which require references (which are like pointers in C, but much safer), a grounding in addressing becomes useful.
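The remark about references can be made concrete with a minimal sketch. The hash and host names below are hypothetical and not from the interview; the point is only that a reference lets two names share one array, much as a pointer would in C.

```perl
use strict;
use warnings;

# Hypothetical data: a hash whose values are references to arrays.
my %hosts_by_site = (
    east => [ 'alpha', 'beta' ],
    west => [ 'gamma' ],
);

# Copying the reference copies the "address", not the array itself.
my $east = $hosts_by_site{east};

# Pushing through the copy changes the array the hash also points to.
push @$east, 'delta';

print scalar @{ $hosts_by_site{east} }, "\n";   # number of hosts at 'east'
print $hosts_by_site{east}[2], "\n";            # the element just added
```

Unlike a C pointer, a Perl reference cannot be incremented or cast to reach arbitrary memory, which is what makes it the safer of the two.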
In an ideal world, a student would first learn an abstract assembly language such as MIX (see Knuth, Vol. 1), do some useful exercises, then take on a higher-level language with the machine model in the back of the head.
When did you run into Perl? What did you think of the language the first time you saw it?
LR: I am a relative latecomer to Perl. I was experimenting with CGI programming using shell scripts (!) because they were better for rapid prototyping than C. Soon I discovered that Perl offered advantages similar to the shell, but was much more expressive (particularly in the manipulation of data structures) and much faster to execute.
Because of my familiarity with Unix commands such as 'sed', which made heavy use of regular expressions, and because of my C experience, I was quite comfortable with Perl syntax. The hardest adjustment was to learn to write code with as few Perl operations as possible, because of the costs of dispatching each instruction. The Benchmark module became the most important tool I used to learn how to write efficient (and hence, sometimes elegant) Perl. I also learned a lot from the newsgroup comp.lang.perl.misc, to which I eventually became able to contribute.
Often Perl claims to be efficient in the scarce resource of the programmer's time. It isn't often that people tune scripts for optimum performance. Are there a few tips you can give to new Perl programmers on how to squeak out a little better runtime performance?
LR: 'Script' ne 'program' again. When dealing with upward-scalable data sets, performance becomes important. I proposed a tutorial to Perl Conference 4.0 this year on this subject, but unfortunately it wasn't accepted.
- 0. Don't optimize prematurely!
- 1. Don't use an external command where Perl can do a task internally.
- 2. Refine the data structures and algorithms. References: The Practice of Programming (Kernighan and Pike), Programming Pearls (Bentley), Algorithms with Perl (Orwant et al).
- 3. After these are optimal, identify the remaining hot spots. (Judicious use of the time() and times() functions, or the Benchmark module.)
- 4. Try to improve the hot spots by using Perl functions such as map() and grep() instead of explicit loops; making regexes more explicit to minimize backtracking; caching intermediate results to avoid unnecessary recalculation, ...
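Tips 3 and 4 can be sketched with the core Benchmark module. The workload (squaring a list of integers) and the subroutine names are assumptions chosen for illustration, not taken from the interview; the actual timings depend on the machine and the Perl version, so none are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical workload: squares of the integers 1..1000.
my @input = (1 .. 1000);

# Explicit loop that builds the result with push.
sub squares_loop {
    my @out;
    for my $n (@input) {
        push @out, $n * $n;
    }
    return \@out;
}

# The same result built with map().
sub squares_map {
    return [ map { $_ * $_ } @input ];
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    loop => \&squares_loop,
    map  => \&squares_map,
});
```

Measuring before rewriting is the point: tip 0 still applies, and a variant that looks faster on paper is worth keeping only if the table says so.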
There's certainly no ANSI Perl. Does Perl need the same kind of official standardization that C got?
LR: I believe that it does, in order to increase its acceptability. Many organizations either cannot or will not endorse the use of unstandardized languages in their business-critical activities.
The current situation with Perl is better than it was with the other two languages I mentioned. Perl has one official open source for its implementation, whereas the others had multiple proprietary implementations, leading to different semantics for many language features. But this single 'official' Perl semantics has never been adequately characterized independent of the implementation, so is subject to arbitrary change.
Building on quicksand is acceptable for 'scripts' of limited longevity and applicability. It is not acceptable for 'programs' of significant commercial value. I think the lack of a firm, stable, well-defined foundation is the major inhibitor for the continuing commercial evolution of Perl. Of the major contributors to Perl, Ilya Zakharevich is most outspoken in his view that Perl is not (yet?) a 'programming' language!
I'm curious how one would standardize Perl when the language changes so quickly and committees move so slowly. Consider that three years ago, Perl was not threaded. Now, threads are standard, but their interface may change in the near future. Mr. Zakharevich continues to pull new regex constructs from the head of Zeus. Even more striking, Perl supports Unicode. Is there some way to stage the standardization so that it isn't painfully out-of-date? Would standardization necessarily slow down Perl development?
LR: Sometimes standardization speeds up development, by forcing evaluation and convergence on a specified way of doing things. Sometimes features are characterized and implemented during standardization (wide-character types for C, for example; the Standard Template Library and many other features for C++).
One way to view it is re Samuel Johnson's famous mot: "When a man knows he is to be hanged in a fortnight, it concentrates his mind wonderfully."
Back to Java for a moment. Because they are both "web technologies", Perl and Java are often seen as competitors. There have been attempts, like Larry Wall's JPL, to provide better integration between these two beasties. What sort of utility do you see in such a marriage?
LR: Hard for me to say. Java forces OO programming from the beginning, and I have never needed to write an object in any program. This may be for me a conceptual hurdle that I cannot (and need not) overcome.
C++ provides sweetened C syntax and semantics (particularly toward bits and bytes), and OO if you want it. This can all be done with the efficiency of C.
Perl provides higher-level syntax and semantics (particularly toward strings), and OO if you want it. I know how to write efficient Perl when I have to.
Java, in my opinion, fills a much-needed gap between those two approaches. :-)
Speaking of competitors, Python is making a big splash this year. Although it seems to satisfy many of the same itches Perl does, its proponents point to its cleaner syntax and more traditional OO implementation as making it a "better Perl". What are your thoughts on Python?
LR: Whatever improvements Python may offer are not sufficient to give it a critical mass of programmers and programming support relative to Perl. There is an order-of-magnitude difference in this metric (programmers, modules, books, ...), and I don't think significant inroads are being made. And, as I said, to me OO is a big yawner.
Have you had the chance to read Mr. Conway's _Object Oriented Perl_? I found I learned more about general Perl from it than OO techniques (which by no means is to say the book is inadequate in the latter department).
LR: Yes, I have gotten to about the middle of it. It is a fine book. But it still didn't convince me about the necessity of objects. All I see is the performance-damaging complexity of the interfaces.
Sorting can be a major bottleneck in Perl programs. Performance can vary by orders of magnitude, depending on how the sort is written. In this paper, we examine Perl's sort function in depth and describe how to use it with simple and complex data. Next we analyze and compare several well-known Perl sorting optimizations (including the Orcish Maneuver and the Schwartzian Transform). We then show how to improve their performance significantly, by packing multiple sortkeys into a single string. Finally, we present a fresh approach, using the sort function with packed sortkeys and without a sortsub. This provides much better performance than any of the other methods, and is easy to implement directly or by using a new module we created, Sort::Records.
NOTE: Sort::Records died during development but five years later, Sort::Maker was released and does all that was promised and more. Find it on CPAN.
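For readers who have not met the two optimizations named in the abstract, here is a minimal sketch of each next to a naive sort. The records and the sortkey() helper are hypothetical; sortkey() stands in for whatever costly extraction a real program performs.

```perl
use strict;
use warnings;

# Hypothetical records; the sortkey is the number after the colon.
my @records = ('carol:30', 'alice:5', 'bob:17');

# Hypothetical key extractor, standing in for any costly computation.
sub sortkey {
    my ($rec) = @_;
    return (split /:/, $rec)[1];
}

# Naive sort: sortkey() runs twice for every comparison.
my @naive = sort { sortkey($a) <=> sortkey($b) } @records;

# Orcish Maneuver: cache each key in a hash the first time it is needed.
# Note that ||= recomputes keys that are false (0 or the empty string).
my %key;
my @orcish = sort {
    ($key{$a} ||= sortkey($a)) <=> ($key{$b} ||= sortkey($b))
} @records;

# Schwartzian Transform: pair each record with its key, sort the pairs
# by key, then strip the keys off again.
my @schwartzian =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ sortkey($_), $_ ] } @records;

print "$_\n" for @schwartzian;
```

All three produce the same order; they differ only in how often the key is extracted, which is where the run-time differences come from.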
What is sorting and why do we use it?
Sorting is the rearrangement of a list into an order defined by a monotonically increasing or decreasing sequence of sortkeys, where each sortkey is a single-valued function of the corresponding element of the list. (We will use the term sortkeys to avoid confusion with the keys of a hash.)
Sorting is used to reorder a list into a sequence suitable for further processing or searching. In many cases the sorted output is intended for people to read; sorting makes it much easier to understand the data and to find a desired datum.
Sorting is used in many types of programs and on all kinds of data. It is such a common, resource-consuming operation that sorting algorithms and the creation of optimal implementations comprise an important branch of computer science.
This paper is about creating optimal sorts using Perl. We start with a brief overview of sorting, including basic algorithm theory and notation, some well-known sorting algorithms and their efficiencies, sortkey processing, and sorting outside of Perl. Next we will describe Perl's sort function [1] and basic ways to use it. Then we cover handling complex sortkeys, which raises the question of how to optimize their processing. Finally we introduce a relatively new method, which moves all the sortkey processing out of the sort function, and which produces the most efficient Perl sort. A new module is also described, which implements this sorting technique and which has powerful support for sortkey extraction (the processing of the input data to produce the sortkeys).
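The method that moves sortkey processing out of the sort function can be sketched as follows. This is an illustration under assumptions, not the paper's own code: the records are hypothetical, and the key is assumed to be a non-negative integer that fits in 32 bits, so that pack 'N' (big-endian unsigned) gives a fixed-width string whose byte order matches numeric order.

```perl
use strict;
use warnings;

# Hypothetical records; the sortkey is the number after the colon.
my @records = ('carol:30', 'alice:5', 'bob:17', 'dave:1000');

my @sorted =
    # 3. Strip the 4-byte packed key to recover the original record.
    map  { substr($_, 4) }
    # 2. Default string sort: no sortsub, so no Perl code runs per comparison.
    sort
    # 1. Prefix each record with its key packed as 4 big-endian bytes.
    map  { pack('N', (split /:/)[1]) . $_ } @records;

print "$_\n" for @sorted;
```

Because the comparison is done by the built-in string compare, the per-comparison cost is lower than with any sortsub, while the overall order of the algorithm is unchanged.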
Algorithm and sorting theory
A complete discussion of algorithm and sorting theory is beyond the scope of this paper. This section will cover just enough theory and terminology to explain the methods that we use to compare sort techniques.
The complexity of an algorithm is a measure of the resources needed to execute the algorithm -- typically there is a critical operation that needs to be executed many times. Part of algorithm theory is figuring out which operation is the limiting factor, and then formulating a function that describes the number of times the operation is executed. This complexity function is commonly written with the big-O notation -- O(f(N)) -- where 'O' is read as 'order of' and 'f(N)' is some function of N, the size of the input data set.
O(f(N)) comparisons have some unusual properties. The actual size of N is usually irrelevant to the correct execution of an algorithm, but its influence on the behavior of f(N) is critical. If an algorithm's order is O(N*logN + N), when N is large enough the effect of the N on the function's value is negligible compared to the N*logN expression. So that algorithm's order is just O(N*logN). In many cases the calculated order function for an algorithm is a polynomial of N, but you see only the term with the highest power, and no coefficient is shown. Similarly, if two algorithms have the same order but one does more work for each operation, they are still equivalent in order space, even though there may be a substantial difference in real-world speeds. That last point is crucial in the techniques we will show to optimize Perl sorts, all of which have the same big-O function, O(N*logN).
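For sorting, the critical operation is the comparison, and it can be counted directly. The sketch below is an illustration with hypothetical random data: it increments a counter inside the sortsub and prints the total next to N*log2(N). The exact count depends on the data and on the Perl version's sort implementation, so no particular figure is claimed.

```perl
use strict;
use warnings;

my $n = 1000;
srand(42);    # fixed seed so repeated runs on one Perl see the same data
my @data = map { int rand 1_000_000 } 1 .. $n;

# Count how many times the comparison (the critical operation) runs.
my $comparisons = 0;
my @sorted = sort { $comparisons++; $a <=> $b } @data;

my $n_log_n = $n * log($n) / log(2);
printf "N = %d, comparisons = %d, N*log2(N) = %.0f\n",
    $n, $comparisons, $n_log_n;
```

Any work placed inside the sortsub is multiplied by that count, which is why the optimizations discussed in this paper concentrate on making each comparison cheaper.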
Seems like everyone is writing a Perl book. The most disturbing part is that they're being written by people who have nothing to do with Perl. How to decide what's crap and what's not?
Worry no more! After many mirth-filled hours of flipping through many an awful Perl book, I have come up with a simple one-minute litmus test to determine if the book you're holding is worth the tree it's printed on.
The Perl Book Litmus Test
Remember, the point of this test is to find bad books and there can only be negative results with this test. A book which passes all the tests put forth here CAN STILL SUCK.
Flip to the index. Look up the following tidbits and answer the questions.
- localtime
- Does it state that it returns the number of years since 1900? Does it mention that when used in scalar context it returns a nicely formatted date?
- srand
- Check how it uses srand(). Does it warn you to call it only once in a given program? (If srand is never mentioned, that's okay)
- Number of elements in an array
- Does it say that an array will return its number of elements in scalar context, or does it use $num = $#array + 1;
- flock
- Does it discuss and use flock instead of lockfiles? (i.e. setting some .lock file instead of using flock()) It's okay if file locking is never discussed at all.
- Portable Constants
- When performing flocking, socket operations or sysopens, does it use the constants defined by Perl, or does it define its own unportable constants? (It's okay if the book never has to use these constants at all)
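As a point of comparison for the checks above, here is a short sketch of the idioms a good book would show. It uses only core modules; the temporary file is an assumption made so the example can run anywhere, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);            # portable LOCK_EX and LOCK_UN constants
use File::Temp qw(tempfile);

# localtime in list context: the year field counts from 1900
# and the month field is zero-based.
my ($mday, $mon, $year) = (localtime)[3, 4, 5];
my $full_year = $year + 1900;
my $month     = $mon + 1;

# localtime in scalar context: a ready-made, human-readable date string.
my $stamp = localtime;

# Number of elements: an array in scalar context, not $#array + 1.
my @array = qw(a b c);
my $count = @array;

# File locking with flock() and Perl's own constants, not a .lock file
# and not hard-coded numbers.
my ($fh, $path) = tempfile(UNLINK => 1);
flock($fh, LOCK_EX) or die "Cannot lock $path: $!";
print {$fh} "written under an exclusive lock\n";
flock($fh, LOCK_UN) or die "Cannot unlock $path: $!";
close $fh or die "Cannot close $path: $!";

print "$full_year-$month-$mday, $count elements, $stamp\n";
```

A book whose examples look noticeably different from these on the same topics fails the test; one that matches has merely not failed it.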
Linux Today
"For the third installation in our four-week series on open source professionals (see Apache server professionals and Linux professionals), we take a look at Perl jobs. Though it doesn't command the highest salaries, knowledge of the Perl scripting language--the mortar of countless Web sites--is by far the most sought after open source skill. Our sister site, dice.com, an Internet-based job board for IT professionals, listed 3,365 Perl positions in January 2000. That's more than three times the number of listings for Apache, Linux, and Sendmail pros added together."
"Demand for Perl has done nothing but expand, and the outlook for future job growth is very good. Dice.com stats show a 200% increase in Perl jobs nationwide since September 1999. Perl pros in New York are cashing in more than other cities, garnering an average annual salary of $84,000 and a contract wage of $80 an hour. However, Silicon Valley has by far the greatest number of opportunities with 1,115 job postings, almost a third of the Perl listings nationwide. And if you're wondering about which job titles are especially hot, Web developers and Webmasters are America's most wanted Perl pros, with 793 availabilities. Software engineers and applications programmers are sizzling, too."
A Perl package that makes it easy to create, update and inspect FAQs. The features of FAQ Manager include:
- automatic creation of an outline structure that can be two headings deep;
- an interface with separate modes for creating new items, editing items, and moving or deleting items;
- commit rollback that allows undoing several stages of editing;
- use of HTML codes in documents; and
- an online help utility.
A source code distribution is available.
[http://www.eprotect.com/stas/TULARC/works/faq_manager/index.html]
[Nov 1, 2000] www.perl.com - Critique of the Perl 6 RFC Process
Low cost alternative Perl conference for the East Coast (Pittsburgh)
[March 26, 2000] Linux Gazette GIMP-Perl GIMP Scripting for the Rest of Us
[Feb 26, 2000] Linux Today Perl.com Beginning Perl Ten Perl Myths
[Feb 20, 2000] pftp pFtp is a ftp client written in perl. It uses the Perl/Tk and Libnet libraries, both available from the CPAN FTP site. Download: pFtp 0.05
[Feb 20, 2000] The Perl Archive File Downloading Page 1
[Feb 20, 2000] Script Profile - Perl Utilities File management
"Perl uses many different methods to interact with databases, and below, I am outlining how to use just some of the methods."
Netscape's PerLDAP is an important tool for both programmers and administrators because it provides a mechanism for accessing directory information from Perl. Troy presents a high-level overview of PerLDAP, along with details of how you can use it. Additional resources include perldap.txt (listings) and perldap.zip (source code).
Full-text search engines are popular these days, and not just on web sites. Tim shows how you can build a fast full-text search capability using Perl's built-in database support. Additional resources include perlsrch.txt (listings) and perlsrch.zip (source code).
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. If you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.
Last modified: March, 12, 2019