Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Comparison of Perl and Python string operations


Correspondence table

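Perl function                  Python equivalent
substr(text, offset, len)      text[offset:offset+len], or text[offset:] when len is omitted. A slice cannot be used on the left side of an assignment.
index(text, substr, start)     text.find(substr, start)
rindex(text, substr, pos)      text.rfind(substr, 0, pos + len(substr)); without pos, text.rfind(substr). In Perl pos is the rightmost allowed starting position.
split(expr, text, max)         text.split(sep, max-1) for a literal separator; re.split(expr, text, maxsplit=max-1) for a regular expression
join(separator, list)          separator.join(list)
lc(text)                       text.lower()
lcfirst(text)                  None; use text[:1].lower() + text[1:]
uc(text)                       text.upper()
ucfirst(text)                  text.capitalize(), which also lowercases the rest of the string
chop(text), chomp(text)        text[:-1] for chop; text.rstrip("\n") for chomp. text.lstrip(), text.rstrip() and text.strip() remove whitespace.
tr/from/to/                    text.translate(str.maketrans(from, to)); you first need to create the translation table via maketrans

For examples of join, see "Python String join() - Python Standard Library".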
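
The correspondences in the table can be tried out with a short sketch. The variable names and sample strings below are illustrative assumptions, not part of the original table; only standard str methods are used.

```python
# Sketch of the Perl-to-Python correspondences; all names are illustrative.
text = "hello world\n"

# substr(text, offset, len): a slice takes start and end, not start and length
offset, length = 6, 5
sub = text[offset:offset + length]          # 'world'

# index / rindex: find and rfind return -1 when nothing is found
first_o = text.find("o")                    # 4
last_o = text.rfind("o")                    # 7

# split and join are string methods
fields = "a:b:c".split(":")                 # ['a', 'b', 'c']
joined = "-".join(fields)                   # 'a-b-c'

# lcfirst has no built-in equivalent; ucfirst is not quite capitalize(),
# because capitalize() also lowercases the rest of the string
lcfirst = "Hello World"[:1].lower() + "Hello World"[1:]   # 'hello World'
ucfirst = "hello World"[:1].upper() + "hello World"[1:]   # 'Hello World'

# chomp removes one trailing newline; rstrip("\n") removes all of them
chomped = text.rstrip("\n")                 # 'hello world'
# chop removes the last character, whatever it is
chopped = chomped[:-1]                      # 'hello worl'

# tr/lo/01/ needs a translation table built with str.maketrans
translated = chomped.translate(str.maketrans("lo", "01"))  # 'he001 w1r0d'

print(sub, first_o, last_o, fields, joined)
print(lcfirst, ucfirst, chomped, chopped, translated)
```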

And if anyone thinks that Python is a "regular" language, I can tell you that it is not. For example, simple variables in Python behave in C/Perl style: numbers and strings are immutable, so assignment acts as if it creates a copy of the variable.

a=3
b=a
a=5 # at this point b is still equal to 3, like in Perl

But for arrays and other "compound objects" this is not the case:

alist = [25,50,75,100]
blist = alist # here Python copies the reference, not the array, so changing alist[0] also changes blist[0]
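
The difference between sharing a reference and making a copy can be checked directly. This is a minimal sketch; the names clist and dlist are assumptions added for illustration.

```python
import copy

alist = [25, 50, 75, 100]
blist = alist                  # second name for the same list object
clist = alist[:]               # shallow copy: a new list with the same elements
dlist = copy.deepcopy(alist)   # full copy, needed when the list holds nested lists

alist[0] = 0
print(blist[0])                # 0, because blist and alist are the same object
print(clist[0])                # 25, the copy is unaffected
print(dlist[0])                # 25
print(blist is alist, clist is alist)   # True False
```

To get Perl-like copy semantics for a list, use a slice, list(alist) or the copy module explicitly.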

The same is true of the overall complexity of the language. The complexity of Python was just pushed into modules; it did not disappear. For string processing, for example, Python is a more complex and less expressive language than Perl, in which most text processing is done via the regex engine. Python had nothing close in convenience to double-quoted literals with interpolation until Python 3.6.

Only in Python 3.6+ do you have something similar to Perl double-quoted literals, in the form of f-strings:

#!/usr/bin/env python3

job = 'foo'
cpus = 3
print(f"job {job}")
print(f"cpus {cpus}")

In older versions of Python you need to use C-style format strings with % placeholders. And the best way to imitate a Perl/shell double-quoted string changes with each major version of Python (see String interpolation - Wikipedia), which tells you something about consistency:

# in all versions
apples = 4
print("I have %d fruits" % apples)                       # % operator; no longer recommended
print("I have %(apples)d fruits" % {"apples": apples})   # a named placeholder takes a dictionary; no longer recommended

# with Python 2.6+
print("I have {0} fruits".format(apples))    # formatting is now a method
print("I have {a} fruits".format(a=apples))  # names instead of positional numbers

# with Python 2.7+
print("I have {} fruits".format(apples))     # the positional index can now be omitted

# with Python 2.4+ (string.Template)
from string import Template
s = Template('I have $frutno fruits')        # the template object
print(s.substitute(frutno=apples))           # actual substitution

# or with Python 3.6+
print(f"I have {apples} apples")             # radically new implementation based on f-strings

If you want interpolation in here-documents in Perl, you do not need to do anything special: it is automatic. In Python, only version 3.6+ has an analog, the triple-quoted f-string:

cpus = 3
job  = 'foo'
print(f'''\
job {job}
cpus {cpus}''')

And if you think that Python is a logical, original language superior to Perl, I have a Brooklyn Bridge to sell you. For example, in Python the string search function was renamed to find from index, the name used in C and PL/1 and the de facto standard name for this function. Python's own index method raises an exception if the substring is not found and as such is not equivalent to the Perl or PL/1 index function (C programmers and Perl users be damned; but Larry Wall also likes similar tricks, so here Perl users can't complain much):

message = "hello world"
pos = message.find("lo")
print(pos)

If the substring is not present, find returns -1, like the index function in Perl. But the find() method should be used only if you need to know the position of the substring. To check whether a substring is present in a string in a conditional expression, you are better off using the in operator.
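
The three ways of searching behave differently on failure, which a short sketch makes visible. The variable names pos and at_start are illustrative assumptions.

```python
message = "hello world"

# find returns -1 on failure, like Perl's index
missing = message.find("xyz")
print(missing)                    # -1

# index raises ValueError instead of returning -1
try:
    pos = message.index("xyz")
except ValueError:
    pos = -1                      # emulate the Perl convention by hand
print(pos)                        # -1

# A match at the start gives 0, which is false in a condition,
# so "if message.find(...)" is a trap; compare with -1 or use "in"
at_start = message.find("hello")
print(at_start)                   # 0

# in is the idiomatic membership test
if "lo" in message:
    print("found")
```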

Again, the index method in Python behaves differently, just to make C programmers crazy ;-) It throws an exception (ValueError) if the substring is not found. This incompatibility suggests that Python's designers had very little knowledge of, or respect for, Unix and C when they started their project.

Moreover, if you want to calculate the length of a string in Python, you need to use the len function, not a length method as one would expect.

message = "hello world"
mlen = len(message)

And such "non-uniformities" and special cases are all over the Python language. Also, the sheer number of methods provided for each type is overwhelming. For example, there are 37 (thirty-seven) string methods, while Perl has just a dozen string functions; everything else is done via regular expressions. Strings in Python are immutable, which creates a performance penalty. This is connected with the fact that a string object contains its length (and some other information) as a prefix to the byte representation. But I think the Python interpreter optimizes some operations by treating immutable strings as mutable in some cases to improve performance. For example, the operation a = a[:-3] (truncating a string) should not require a new representation of the string to be created.

The use of "+" for concatenation is also unfortunate. It might have been better to use a less commonly used symbol such as "*", as in Julia (although logically that should mean replicating the string multiple times, so not a good idea ;-), or a distinct character, as in Perl. It creates some idiosyncrasies, for example with the += assignment.
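
Two practical consequences of + concatenation are sketched below. They are offered as illustrations, not necessarily the specific idiosyncrasies meant above: + does not convert numbers to strings the way Perl's dot operator does, and pieces collected in a loop are usually joined once with str.join instead of being appended one by one.

```python
count = 3

# Mixing str and int with + raises TypeError; the conversion must be explicit
line = "cpus: " + str(count)
print(line)                        # cpus: 3

try:
    "cpus: " + count
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                    # False

# Collect the pieces in a list and join them once at the end
parts = []
for word in ("alpha", "beta", "gamma"):
    parts.append(word)
result = " ".join(parts)
print(result)                      # alpha beta gamma
```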

The content that follows was adapted from "Manipulating Strings in Python" (Programming Historian).

Concatenate

This term means to join strings together. The process is known as concatenating strings and it is done using the plus (+) operator. Note that you must be explicit about where you want blank spaces to occur by placing them between quotation marks as well.

In this example, the string “message1” is given the content “hello world”.

message1 = 'hello' + ' ' + 'world'
print(message1)
-> hello world

Multiply

If you want multiple copies of a string, use the multiplication (*) operator. In this example, string message2a is given the content “hello ” times three; string message2b is given the content “world”; then we print both strings.

message2a = 'hello ' * 3
message2b = 'world'
print(message2a + message2b)
-> hello hello hello world

Append

What if you want to add material to the end of a string successively? There is a special operator for that (+=).

message3 = 'howdy'
message3 += ' '
message3 += 'world'
print(message3)
-> howdy world

In addition to operators, Python comes pre-installed with dozens of string methods that allow you to do things to strings. Used alone or in combination, these methods can do just about anything you can imagine to strings. The good news is that you can reference a list of String Methods on the Python website, including information on how to use each properly. To make sure that you’ve got a basic grasp of string methods, what follows is a brief overview of some of the more commonly used ones:

Length

You can determine the number of characters in a string using len. Note that the blank space counts as a separate character.

message4 = 'hello' + ' ' + 'world'
print(len(message4))
-> 11

Find

You can search a string for a substring and your program will return the starting index position of that substring. This is helpful for further processing. Note that indexes are numbered from left to right and that the count starts with position 0, not 1.

message5 = "hello world"
message5a = message5.find("worl")
print(message5a)
-> 6

If the substring is not present, the program will return a value of -1.

message6 = "Hello World"
message6b = message6.find("squirrel")
print(message6b)
-> -1

You can also use rfind to search for a substring starting from the end of the string.

NOTE: As in Perl, you can start the search from a given position by passing it as the second argument to find.
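
A minimal sketch of both variants follows: find with a starting position, and rfind searching from the end. The sample string and variable names are assumptions for illustration.

```python
message = "hello world, hello again"

first = message.find("hello")               # 0
second = message.find("hello", first + 1)   # 13, search resumes after the first hit
last = message.rfind("hello")               # 13, rightmost occurrence

# Collect every position of a substring by restarting find after each hit
positions = []
start = 0
while True:
    pos = message.find("l", start)
    if pos == -1:
        break
    positions.append(pos)
    start = pos + 1

print(first, second, last)
print(positions)                            # [2, 3, 9, 15, 16]
```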

Lower Case

Sometimes it is useful to convert a string to lower case. For example, if we standardize case it makes it easier for the computer to recognize that “Sometimes” and “sometimes” are the same word.

message7 = "HELLO WORLD"
message7a = message7.lower()
print(message7a)
-> hello world

The opposite effect, raising characters to upper case, can be achieved by changing .lower() to .upper().

Replace

If you need to replace a substring throughout a string you can do so with the replace method.

message8 = "HELLO WORLD"
message8a = message8.replace("L", "pizza")
print(message8a)
-> HEpizzapizzaO WORpizzaD
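
The replace method also accepts an optional count, and pattern-based substitution in the spirit of Perl's s/// lives in the re module. A short sketch, with variable names chosen for illustration:

```python
import re

message = "HELLO WORLD"

# Third argument limits the number of replacements
first_only = message.replace("L", "pizza", 1)
print(first_only)                 # HEpizzaLO WORLD

# re.sub replaces every match of a regular expression
collapsed = re.sub(r"[LO]+", "-", message)
print(collapsed)                  # HE- W-R-D

# Strings are immutable: both calls return new strings
print(message)                    # HELLO WORLD
```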

Slice

If you want to slice off unwanted parts of a string from the beginning or end you can do so by creating a substring. The same kind of technique also allows you to break a long string into more manageable components.

message9 = "Hello World"
message9a = message9[1:8]
print(message9a)
-> ello Wo

You can substitute variables for the integers used in this example.

startLoc = 2
endLoc = 8
message9b = message9[startLoc: endLoc]
print(message9b)
-> llo Wo

This makes it much easier to use this method in conjunction with the find method as in the next example, which checks for the letter “d” in the first five characters of “Hello World” and correctly tells us it is not there (-1). This technique is much more useful in longer strings, such as entire documents. Note that the absence of an integer before the colon signifies we want to start at the beginning of the string. We could use the same technique to tell the program to go all the way to the end by putting no integer after the colon. And remember, index positions start counting from 0 rather than 1.

message9 = "Hello World"
print(message9[:5].find("d"))
-> -1
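
Combining find with slicing is a common way to cut a field out of a longer line. The sketch below assumes a made-up key=value;key=value format, and the helper name extract is hypothetical.

```python
def extract(line, key, terminator=";"):
    """Return the text between key and the next terminator, or None if key is absent."""
    start = line.find(key)
    if start == -1:
        return None
    start += len(key)                  # skip past the key itself
    end = line.find(terminator, start)
    if end == -1:                      # no terminator: take the rest of the line
        end = len(line)
    return line[start:end]

line = "user=alice;shell=/bin/bash"
print(extract(line, "user="))          # alice
print(extract(line, "shell="))         # /bin/bash
print(extract(line, "home="))          # None
```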

There are lots more, but the string methods above are a good start. Note that in this last example, we are using square brackets instead of parentheses. This difference in syntax signals an important distinction. In Python, parentheses are usually used to pass an argument to a function. So when we see something like

print(len(message7))

it means pass the string message7 to the function len, then send the returned value of that function to the print function to be printed. If a function can be called without an argument, you often have to include a pair of empty parentheses after the function name anyway. We saw an example of that, too:

message7 = "HELLO WORLD"
message7a = message7.lower()
print(message7a)
-> hello world

This statement tells Python to apply the lower method to the string message7 and store the returned value in message7a.

The square brackets serve a different purpose. If you think of a string as a sequence of characters, and you want to be able to access the contents of the string by their location within the sequence, then you need some way of giving Python a location within a sequence. That is what the square brackets do: indicate a beginning and ending location within a sequence, as we saw when slicing.
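
A few more bracket forms, as a quick sketch: a single index, negative indexes counted from the end, and a failed attempt to assign through an index (strings are immutable). Variable names are illustrative.

```python
message = "Hello World"

first_char = message[0]       # 'H'
last_char = message[-1]       # 'd', negative indexes count from the end
head = message[:5]            # 'Hello'
tail = message[-5:]           # 'World'
backwards = message[::-1]     # 'dlroW olleH', a step of -1 reverses

# Brackets cannot appear on the left side of an assignment for strings
try:
    message[0] = "J"
    assignable = True
except TypeError:
    assignable = False

# Build a new string instead
changed = "J" + message[1:]   # 'Jello World'
print(first_char, last_char, head, tail, backwards, assignable, changed)
```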

Escape Sequences

What do you do when you need to include quotation marks within a string? You don’t want the Python interpreter to get the wrong idea and end the string when it comes across one of these characters. In Python, you can put a backslash (\) in front of a quotation mark so that it doesn’t terminate the string. These are known as escape sequences.

print('\"')
-> "
print('The program printed \"hello world\"')
-> The program printed "hello world"

Two other escape sequences allow you to print tabs and newlines:

print('hello\thello\thello\nworld')
-> hello   hello   hello
world
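
Escaping is not the only option. As a brief sketch: a string delimited by one kind of quote can contain the other kind unescaped, and a raw string (prefix r) switches off backslash processing. The path below is a made-up example.

```python
# Double quotes inside single quotes need no backslash
quoted = 'The program printed "hello world"'
escaped = 'The program printed \"hello world\"'
print(quoted == escaped)          # True

# In a raw string the backslash is an ordinary character
raw_path = r"C:\temp\new"         # hypothetical path, no tab or newline inside
print(raw_path)                   # C:\temp\new
print(len("\n"), len(r"\n"))      # 1 2
```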
 

Python string formatting

Python includes a special formatting operator that allows you to insert one string into another one. The operator is a percent sign (%); the placeholder %s in the format string marks where the other string goes. Open a Python shell and try the following examples.

frame = 'This fruit is a %s'
print(frame)
-> This fruit is a %s

print(frame % 'banana')
-> This fruit is a banana

print(frame % 'pear')
-> This fruit is a pear

There is also a form which allows you to interpolate a tuple of strings into another one.

frame2 = 'These are %s, those are %s'
print(frame2)
-> These are %s, those are %s

print(frame2 % ('bananas', 'pears'))
-> These are bananas, those are pears

In these examples, a %s in one string indicates that another string is going to be embedded at that point. There is a range of other string formatting codes, most of which allow you to embed numbers in strings in various formats, like %i for integer (e.g. 1, 2, 3), %f for floating-point decimal (e.g. 3.023, 4.59, 1.0), and so on. Using this method we can insert information that is unique to the file.
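
The numeric codes can be tried the same way. A small sketch with made-up values; the last line shows the f-string form of the same output for comparison.

```python
count = 3
price = 4.59

as_int = "I have %i apples" % count
as_float = "Each costs %f" % price              # %f prints six decimals by default
two_places = "Each costs %.2f" % price          # precision can be limited
both = "%i apples cost %.2f in total" % (count, count * price)
same = f"{count} apples cost {count * price:.2f} in total"

print(as_int)        # I have 3 apples
print(as_float)      # Each costs 4.590000
print(two_places)    # Each costs 4.59
print(both)          # 3 apples cost 13.77 in total
print(same == both)  # True
```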

