Softpanorama
May the source be with you, but remember the KISS principle ;-)

dos2unix - DOS to UNIX text converter


Introduction

dos2unix is a program that converts plain text files in DOS/Mac format to Unix format. By default the file is converted in place (-o). The option -k (--keepdate) preserves the date stamp of the file.

The Dos2unix package includes utilities "dos2unix" and "unix2dos" to convert plain text files in DOS or Mac format to Unix format and vice versa.

In DOS/Windows text files a line break, also known as newline, is a combination of two characters: a Carriage Return (CR) followed by a Line Feed (LF). In Unix text files a line break is a single character: the Line Feed (LF). In Mac text files, prior to Mac OS X, a line break was a single Carriage Return (CR) character. Nowadays Mac OS uses Unix style (LF) line breaks.
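To see which of these conventions a given file uses, one can count its CR and LF bytes. The following is a minimal sketch using only tr and wc. The function name eol_kind is hypothetical (it is not part of the dos2unix package), and the "dos" verdict is a heuristic that assumes every CR belongs to a CR LF pair.

```shell
#!/bin/sh
# eol_kind FILE: guess the line-break convention of FILE.
# Hypothetical helper, not part of the dos2unix package.
eol_kind() {
    # Count carriage returns and line feeds; $(( )) strips wc's padding.
    cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))
    lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then
        echo none      # no line breaks at all
    elif [ "$cr" -eq 0 ]; then
        echo unix      # LF only
    elif [ "$lf" -eq 0 ]; then
        echo mac       # CR only (Mac before OS X)
    elif [ "$cr" -eq "$lf" ]; then
        echo dos       # heuristic: assumes each CR is part of a CR LF pair
    else
        echo mixed
    fi
}
```

For example, "eol_kind a.txt" prints one word describing a.txt.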

Binary files are automatically skipped, unless conversion is forced.

Non-regular files, such as directories and FIFOs, are automatically skipped.

Symbolic links and their targets are by default kept untouched. Symbolic links can optionally be replaced, or the output can be written to the symbolic link target. Symbolic links on Windows are not supported. Windows symbolic links are always replaced, keeping the targets unchanged.

Dos2unix was modeled after dos2unix under SunOS/Solaris and has similar conversion modes.

Important options are:

-k, --keepdate
Keep the date stamp of the output file the same as that of the input file.
-f, --force
Force conversion of binary files (dos2unix considers files with non-printable characters to be binary).

Syntax and options

dos2unix [options] [-c convmode] [-o file ...] [-n infile outfile ...] 

Options:

[-hkqV] [--help] [--keepdate] [--quiet] [--version]

If a single file argument is given, that file is converted "in place" under the same name.

The following options are available:

Conversion modes

MAC MODE
In normal mode line breaks are converted from DOS to Unix and vice
versa. Mac line breaks are not converted.

In Mac mode line breaks are converted from Mac to Unix and vice versa.
DOS line breaks are not changed.

To run in Mac mode use the command-line option "-c mac" or use the
commands "mac2unix" or "unix2mac".
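Where mac2unix and unix2mac are not installed, a rough stand-in for files that contain only bare CR (or only bare LF) line breaks can be built from tr. This is a sketch under that assumption, and the function names are hypothetical. Unlike Mac mode, tr would turn a DOS CR LF pair into two line feeds, so it should not be used on files with DOS line breaks.

```shell
#!/bin/sh
# Rough filters for files with bare-CR or bare-LF line breaks only.
# Hypothetical names; these are not the mac2unix/unix2mac programs.
mac_to_unix() { tr '\r' '\n'; }   # every CR becomes an LF
unix_to_mac() { tr '\n' '\r'; }   # every LF becomes a CR
```

Both functions read stdin and write stdout, for example "mac_to_unix < c.txt > c.unix.txt".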

Conversion modes ascii, 7bit, and iso are similar to those of dos2unix/unix2dos under SunOS/Solaris.

ascii
In mode "ascii" only line breaks are converted. This is the default
conversion mode.

Although the name of this mode is ASCII, which is a 7 bit standard,
the actual mode is 8 bit. Always use this mode when converting
Unicode UTF-8 files.

7bit
In this mode all 8 bit non-ASCII characters (with values from 128
to 255) are converted to a 7 bit space.
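The effect of this mode can be approximated with tr, which may help when checking what a 7bit conversion would do to a file. This is only a sketch: the function name to_7bit is hypothetical, it removes every CR (not just those in CR LF pairs), and it relies on tr padding the second set with its last character, as GNU and BSD tr do.

```shell
#!/bin/sh
# to_7bit: strip CRs, then replace every byte 128..255 with a space.
# Hypothetical stand-in for "dos2unix -c 7bit", not the real program.
# LC_ALL=C makes tr operate on single bytes.
to_7bit() {
    LC_ALL=C tr -d '\r' | LC_ALL=C tr '\200-\377' ' '
}
```

It works as a filter: "to_7bit < in.txt > out.txt".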

iso
Characters are converted between a DOS character set (code page)
and ISO character set ISO-8859-1 (Latin-1) on Unix. DOS characters
without ISO-8859-1 equivalent, for which conversion is not
possible, are converted to a dot. The same counts for ISO-8859-1
characters without DOS counterpart.

When only option "-iso" is used dos2unix will try to determine the
active code page. When this is not possible dos2unix will use
default code page CP437, which is mainly used in the USA. To force
a specific code page use options "-437" (US), "-850" (Western
European), "-860" (Portuguese), "-863" (French Canadian), or "-865"
(Nordic). Windows code page CP1252 (Western European) is also
supported with option "-1252". For other code pages use dos2unix in
combination with iconv(1). Iconv can convert between a long list
of character encodings.

Never use ISO conversion on Unicode text files. It will corrupt
UTF-8 encoded files.

Examples

Get input from stdin and write output to stdout.

dos2unix

Convert and replace a.txt. Convert and replace b.txt.

dos2unix a.txt b.txt
dos2unix -o a.txt b.txt

Convert and replace a.txt in ASCII conversion mode. Convert and replace b.txt in ISO conversion mode.

dos2unix a.txt -c iso b.txt
dos2unix -c ascii a.txt -c iso b.txt

Convert and replace c.txt and b.txt from Mac to Unix format.

dos2unix -c mac c.txt b.txt

Convert and replace a.txt while keeping original date stamp.

dos2unix -k a.txt
dos2unix -k -o a.txt

Convert a.txt and write to e.txt.

dos2unix -n a.txt e.txt

Convert a.txt and write to e.txt, keep date stamp of e.txt same as a.txt.

dos2unix -k -n a.txt e.txt

Convert and replace a.txt. Convert b.txt and write to e.txt.

dos2unix a.txt -n b.txt e.txt
dos2unix -o a.txt -n b.txt e.txt

Convert c.txt and write to e.txt. Convert and replace a.txt. Convert and replace b.txt. Convert d.txt and write to f.txt.

dos2unix -n c.txt e.txt -o a.txt b.txt -n d.txt f.txt

Convert all files in a tree

find /srv/www/Public_html -type f -name "*.shtml" ! -path "*/_*" -print0 | xargs -0 dos2unix -k -f

Convert from DOS default code page to Unix Latin-1

dos2unix -iso -n in.txt out.txt

Convert from DOS CP850 to Unix Latin-1

dos2unix -850 -n in.txt out.txt

Convert from Windows CP1252 to Unix Latin-1

dos2unix -1252 -n in.txt out.txt

Convert from Windows CP1252 to Unix UTF-8 (Unicode)

iconv -f CP1252 -t UTF-8 in.txt | dos2unix > out.txt

Convert from Unix Latin-1 to DOS default code page.

unix2dos -iso -n in.txt out.txt

Convert from Unix Latin-1 to DOS CP850

unix2dos -850 -n in.txt out.txt

Convert from Unix Latin-1 to Windows CP1252

unix2dos -1252 -n in.txt out.txt

Convert from Unix UTF-8 (Unicode) to Windows CP1252

unix2dos < in.txt | iconv -f UTF-8 -t CP1252 > out.txt

See also <http://czyborra.com/charsets/codepages.html> and
<http://czyborra.com/charsets/iso8859.html>.

UNICODE
Encodings
There exist different Unicode encodings. On Unix and Linux Unicode
files are typically encoded in UTF-8 encoding. On Windows Unicode text
files can be encoded in UTF-8, UTF-16, or UTF-16 big endian, but are
mostly encoded in UTF-16 format.

Conversion
Unicode text files can have DOS, Unix or Mac line breaks, like regular
text files.

All versions of dos2unix and unix2dos can convert UTF-8 encoded files,
because UTF-8 was designed for backward compatibility with ASCII.

Dos2unix and unix2dos with Unicode UTF-16 support can read little and
big endian UTF-16 encoded text files. To see if dos2unix was built with
UTF-16 support type "dos2unix -V".

The Windows versions of dos2unix and unix2dos convert UTF-16 encoded
files always to UTF-8 encoded files. Unix versions of dos2unix/unix2dos
convert UTF-16 encoded files to the locale character encoding when it
is set to UTF-8. Use the locale(1) command to find out what the locale
character encoding is.

Because UTF-8 formatted text files are well supported on both Windows
and Unix, dos2unix and unix2dos have no option to write UTF-16 files.
All UTF-16 characters can be encoded in UTF-8. Conversion from UTF-16
to UTF-8 is without loss. UTF-16 files will be skipped on Unix when the
locale character encoding is not UTF-8, to prevent accidental loss of
text. When a UTF-16 to UTF-8 conversion error occurs, for instance
when the UTF-16 input file contains an error, the file will be skipped.

ISO and 7-bit mode conversion do not work on UTF-16 files.

Byte Order Mark
On Windows Unicode text files typically have a Byte Order Mark (BOM),
because many Windows programs (including Notepad) add BOMs by default.
See also <http://en.wikipedia.org/wiki/Byte_order_mark>.

On Unix Unicode files typically don't have a BOM. It is assumed that
text files are encoded in the locale character encoding.

Dos2unix can only detect if a file is in UTF-16 format if the file has
a BOM. When a UTF-16 file doesn't have a BOM, dos2unix will see the
file as a binary file.

Use dos2unix in combination with iconv(1) to convert a UTF-16 file
without BOM.

Dos2unix never writes a BOM in the output file, unless you use option
"-m".

Unix2dos writes a BOM in the output file when the input file has a BOM,
or when option "-m" is used.
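Since the presence of a BOM decides how a file is treated, it can be useful to check for one before converting. The sketch below inspects the first three bytes with od. The function name bom_kind is hypothetical, and it does not distinguish a UTF-32 little endian BOM, which also starts with FF FE.

```shell
#!/bin/sh
# bom_kind FILE: report which Byte Order Mark, if any, FILE starts with.
# Hypothetical helper; it only looks at the first three bytes.
bom_kind() {
    # od prints the first 3 bytes as hex; tr removes spaces and newlines.
    head3=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case $head3 in
        efbbbf) echo utf-8 ;;      # EF BB BF
        fffe*)  echo utf-16le ;;   # FF FE
        feff*)  echo utf-16be ;;   # FE FF
        *)      echo none ;;
    esac
}
```

A result of "none" for a file known to be UTF-16 suggests that the iconv route is needed.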

Unicode examples
Convert from Windows UTF-16 (with BOM) to Unix UTF-8

dos2unix -n in.txt out.txt

Convert from Windows UTF-16 (without BOM) to Unix UTF-8

iconv -f UTF-16 -t UTF-8 in.txt | dos2unix > out.txt

Convert from Unix UTF-8 to Windows UTF-8 with BOM

unix2dos -m -n in.txt out.txt

Convert from Unix UTF-8 to Windows UTF-16

unix2dos < in.txt | iconv -f UTF-8 -t UTF-16 > out.txt

EXAMPLES
Read input from 'stdin' and write output to 'stdout'.

dos2unix
dos2unix -l -c mac

Convert and replace a.txt in ascii conversion mode.

dos2unix a.txt

Convert and replace a.txt in ascii conversion mode. Convert and
replace b.txt in 7bit conversion mode.

dos2unix a.txt -c 7bit b.txt
dos2unix -c ascii a.txt -c 7bit b.txt
dos2unix -ascii a.txt -7 b.txt

Convert a.txt from Mac to Unix format.

dos2unix -c mac a.txt
mac2unix a.txt

Convert a.txt from Unix to Mac format.

unix2dos -c mac a.txt
unix2mac a.txt

RECURSIVE CONVERSION
Use dos2unix in combination with the find(1) and xargs(1) commands to
recursively convert text files in a directory tree structure. For
instance to convert all .txt files in the directory tree under the
current directory type:

find . -name '*.txt' -print0 | xargs -0 dos2unix
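Before converting a whole tree it can help to list which files actually contain carriage returns. The following is a small sketch using find and grep. The function name list_cr_files is hypothetical, and it reports any file containing a CR byte, including binary files that match the pattern.

```shell
#!/bin/sh
# list_cr_files DIR GLOB: print regular files under DIR whose names match
# GLOB and that contain at least one carriage return. Hypothetical helper.
list_cr_files() {
    cr=$(printf '\r')   # a literal CR character to search for
    find "$1" -type f -name "$2" -exec grep -l "$cr" {} +
}
```

For example, "list_cr_files . '*.txt'" lists the .txt files under the current directory that still have DOS or Mac line breaks.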

LOCALIZATION
LANG
The primary language is selected with the environment variable
LANG. The LANG variable consists of several parts. The first
part is the language code, in lowercase letters. The second is optional
and is the country code in capital letters, preceded by an
underscore. There is also an optional third part: the character
encoding, preceded by a dot. A few examples for POSIX standard
type shells:

export LANG=nl                  Dutch
export LANG=nl_NL               Dutch, The Netherlands
export LANG=nl_BE               Dutch, Belgium
export LANG=es_ES               Spanish, Spain
export LANG=es_MX               Spanish, Mexico
export LANG=en_US.iso88591      English, USA, Latin-1 encoding
export LANG=en_GB.UTF-8         English, UK, UTF-8 encoding

For a complete list of language and country codes see the gettext
manual:
<http://www.gnu.org/software/gettext/manual/gettext.html#Language-Codes>

On Unix systems you can use the command locale(1) to get locale
specific information.
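The three parts of a LANG value described above can be pulled apart with plain shell parameter expansion. This is an illustrative sketch only: the function name split_lang is hypothetical, and it ignores the "@modifier" suffix that some locale names carry.

```shell
#!/bin/sh
# split_lang VALUE: split a LANG-style value into language, country code
# (territory) and character encoding. Hypothetical helper.
split_lang() {
    value=$1
    territory=
    encoding=
    # The optional encoding follows the first dot.
    case $value in *.*) encoding=${value#*.}; value=${value%%.*} ;; esac
    # The optional country code follows the underscore.
    case $value in *_*) territory=${value#*_}; value=${value%%_*} ;; esac
    printf 'language=%s territory=%s encoding=%s\n' "$value" "$territory" "$encoding"
}
```

For example, "split_lang en_GB.UTF-8" prints "language=en territory=GB encoding=UTF-8".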

LANGUAGE
With the LANGUAGE environment variable you can specify a priority
list of languages, separated by colons. Dos2unix gives preference
to LANGUAGE over LANG. For instance, first Dutch and then German:
"LANGUAGE=nl:de". You have to first enable localization, by setting
LANG (or LC_ALL) to a value other than "C", before you can use a
language priority list through the LANGUAGE variable. See also the
gettext manual:
<http://www.gnu.org/software/gettext/manual/gettext.html#The-LANGUAGE-variable>

If you select a language which is not available you will get the
standard English messages.

DOS2UNIX_LOCALEDIR
With the environment variable DOS2UNIX_LOCALEDIR the LOCALEDIR set
during compilation can be overruled. LOCALEDIR is used to find the
language files. The GNU default value is "/usr/local/share/locale".
Option --version will display the LOCALEDIR that is used.

Example (POSIX shell):

export DOS2UNIX_LOCALEDIR=$HOME/share/locale

RETURN VALUE
On success, zero is returned. When a system error occurs the last
system error will be returned. For other errors 1 is returned.

The return value is always zero in quiet mode, except when wrong
command-line options are used.

STANDARDS
<http://en.wikipedia.org/wiki/Text_file>

<http://en.wikipedia.org/wiki/Carriage_return>

<http://en.wikipedia.org/wiki/Newline>

<http://en.wikipedia.org/wiki/Unicode>

AUTHORS
Benjamin Lin - <blin@socs.uts.edu.au>, Bernd Johannes Wuebben (mac2unix
mode) - <wuebben@kde.org>, Christian Wurll (add extra newline) -
<wurll@ira.uka.de>, Erwin Waterlander - <waterlan@xs4all.nl>
(Maintainer)

Project page: <http://waterlan.home.xs4all.nl/dos2unix.html>

SourceForge page: <http://sourceforge.net/projects/dos2unix/>

Freecode: <http://freecode.com/projects/dos2unix>

SEE ALSO
file(1) find(1) iconv(1) locale(1) xargs(1)

dos2unix 2012-09-15 dos2unix(1)


Old News ;-)

How do I convert between Unix and DOS text files?

The Unix and DOS operating systems (the latter including Microsoft Windows) differ in the format in which they store text files. DOS places both a carriage return and a line feed character at the end of each line of a text file, but Unix uses only a line feed character. Some DOS applications need to see carriage return characters at the ends of lines, and may treat Unix-format files as giant single lines. Some Unix applications won't recognize the carriage returns added by DOS, and will display Ctrl-m ( ^M ) characters at the end of each line.

There are many ways to solve this problem. This document provides instructions for using FTP, screen capture, unix2dos and dos2unix, tr, awk, Perl, and Emacs to do the conversion. Before you use these utilities, the files you are converting must first be on a Unix computer.

FTP

When using an FTP program to move a text file between Unix and DOS, be sure the file is transferred in ASCII format. This will ensure that the document is transformed into a text format appropriate for the host. Some FTP programs, especially graphical applications like Rapid Filer, do this automatically. If you are using FTP from the Unix or DOS prompt, however, before you begin the file transfer, be sure to enter at the FTP prompt:

  ascii

Screen Capture

You can also convert files from Unix to DOS format when transferring them to a PC with a communications program by selecting ASCII text download. Select this option with your communications program to capture all the text subsequently displayed to your screen, and then enter at the Unix prompt:

  cat unixfile.txt

Replace unixfile.txt with the name of the Unix text file you are transferring. Most communications programs will add carriage returns to the stream of text as they save it to your computer's hard drive. Once the file has finished displaying, abort the text download.

Note: This method may be slow for large text files. Also, no error checking is performed on the file as it is transferred. Line noise may corrupt its contents, especially if you are using a terminal connect program such as HyperTerminal.

dos2unix and unix2dos

On systems using SunOS, the utilities dos2unix and unix2dos are available. These utilities provide a straightforward method for converting files from the Unix command line.

To use either command, simply type the command followed by the name of the file you wish to convert, and the name of a file which will contain the converted results. Thus, to convert a DOS file to a Unix file, at the Unix prompt, enter:

  dos2unix dosfile.txt unixfile.txt

To convert a Unix file to DOS, enter:

  unix2dos unixfile.txt dosfile.txt

Note: These utilities are only available on SunOS systems. To determine what variety of Unix is running on your computer, see the Knowledge Base document In Unix, how can I display information about the operating system?

tr

You can use tr to remove all carriage returns and Ctrl-z ( ^Z ) characters from a DOS file by entering:

  tr -d '\15\32' < dosfile.txt > unixfile.txt

You cannot use tr to convert a document from Unix format to DOS.

awk

To use awk to convert a DOS file to Unix, at the Unix prompt, enter:

  awk '{ sub("\r$", ""); print }' dosfile.txt > unixfile.txt

To convert a Unix file to DOS using awk, at the command line, enter:

  awk 'sub("$", "\r")' unixfile.txt > dosfile.txt

On some systems, the version of awk may be old and not include the function sub. If so, try the same command, but with gawk or nawk replacing awk.
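Note that the Unix-to-DOS awk command above appends a carriage return to every line, even if the line already ends in one. The sketch below wraps both directions in shell functions (the names to_unix and to_dos are hypothetical) and strips any existing CR first, so that running the conversion twice gives the same result as running it once.

```shell
#!/bin/sh
# to_unix [FILE...]: remove one trailing CR from each line.
to_unix() { awk '{ sub(/\r$/, ""); print }' "$@"; }

# to_dos [FILE...]: end every line with CR LF, without doubling an
# existing CR. Hypothetical helpers, not the dos2unix/unix2dos programs.
to_dos() { awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"; }
```

Both read the named files, or stdin when none are given, for example "to_dos unixfile.txt > dosfile.txt".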

Perl

To convert a DOS text file to a Unix text file using Perl, at the Unix shell prompt, enter:

  perl -p -e 's/\r$//' < dosfile.txt > unixfile.txt

To convert from a Unix text file to a DOS text file with Perl, at the Unix shell prompt, enter:

  perl -p -e 's/$/\r/' < unixfile.txt > dosfile.txt

You must use single quotation marks in either command line. This prevents your shell from trying to evaluate anything inside.

Emacs

You can also convert a DOS file named dosfile.txt to a Unix text file using Emacs, a Unix text editor. Enter at the Unix shell prompt:

  emacs dosfile.txt

This will open the file in the Emacs text editor. To remove all the ^M characters, enter:

  M-% C-q C-m RET RET !

Note: For Emacs, keystrokes are presented in a special format. See the Knowledge Base document In Emacs, how are keystrokes denoted?

It may also be necessary to remove a Ctrl-z at the end of the document. To quickly get to the end of the document in Emacs, type:

  M->

If you see a Ctrl-z at the end of the document, delete it.

To convert a Unix file named unixfile.txt to a DOS text file, first open it in Emacs. At the Unix shell prompt, enter:

  emacs unixfile.txt

This will open the file in the Emacs text editor. To add carriage returns, which will appear as ^M in Emacs, type:

  M-% C-q C-j RET C-q C-m C-q C-j RET !

It may also be necessary to add a Ctrl-z at the end of the document. At the very end of the document, type:

  C-q C-z

This is document acux in domain all from the Knowledge Base.
Last updated on October 16, 2000



Etc

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.


Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.

Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: October, 20, 2015