1. Perl
  2. here

Perl file input/output basics

Let's master the basics of Perl's file I/O. The goal is to be able to read and write text files.

File open

You must first open the file in order to read it. Files are managed by the OS, so you must first tell the OS that you want to work with the files programmatically. The OS returns the identifier of the target file to the program.

Use open function: title = open function] to open the file.

my $file = 'data.txt';
open my $fh, '<', $file
  or die qq/Can't open file "$file":$!/;

I will explain the process of opening files one by one.

open function

The open function takes three arguments. Specify the lexical variable in the first argument, the open mode in the second argument, and the file name in the third argument. When the file is opened, file handle is stored in the lexical variable specified by the first argument. Input from a file and output to a file are done through this file handle. The second argument is open mode, and if you want to read the contents from the file, specify "<".

If the open function fails because the file does not exist, it returns undef. If it fails, use die function to throw an exception. By using or operator, if executable statement on the left side of or is false, you can describe the process of executing executable statement on the right side.

Open mode

Open mode specifies the mode in which the file is opened. Some of the most commonly used open modes are:

Open mode meaning
< Read
> write in
>> >> Additional writing

Read mode "<" is used when you want to read the contents of a file. Write mode">" is used when you want to write to a file. If the file does not exist, it will be created automatically. If the file exists, all the contents of the file will be erased when it is opened. The additional write mode">>" is used when you want to add a new line to the end of the file. If the file does not exist, it will be created automatically.

There are literate and readable modes, but you'll rarely have a chance to use them. As I'll explain later, if you want to read a file and then write to the same file, it's better to take a different approach.

File name

The file name can be specified as an absolute path or a relative path. The absolute path is the full name of the file, such as "c:\foo\bar\data.txt" (Windows) or "/foo/ba/data.txt" (Unix). ..

The relative path is the path from the current directory, and if the current directory is "c:\foo", the relative path will be something like "bar\data.txt" (Windows). Relative paths do not start with "\" or "/".

It is best to specify the file name with an absolute path. This is because if you move the current directory in your program, you may unintentionally point to a different file if it is a relative path. Also, if your program doesn't start from the command line, you may not know where the current directory is.

Regarding the file name delimiter, you can use the "/" delimiter even in Windows. Even if you write "c:/foo/bar/data.txt", the open function works correctly. Also, functions other than the open function that receive a file name will work correctly even if the delimiter is "/" (for example, unlink or glob).

How to use Japanese file names

You can use Japanese file names in Perl, but you must always specify the file name with the character code of each OS.

Save the source code in UTF-8 and add "use utf8 " at the beginning. Use the encode function to convert to the character code of each OS.

use utf8;
use Encode 'encode';

my $file = "aiueo";

# For Windows
open my $fh, '<', encode('cp932', $file);

# For Linux
open my $fh, '<', encode('UTF-8', $file);

# For Mac OS X
use Encode::UTF8Mac;
open my $fh, '<', encode('' utf-8-mac'', $file);

Note that Mac OS X is stored in a special UTF-8, not a simple UTF-8. Use Encode::UTF8Mac module.

The following explanation is detailed about the handling of Japanese.

Exception handling

If the file open fails, it means that the migration cannot do the right thing. Therefore, in general, it is necessary to notify that the file open has failed and terminate the program. In such a case, use the die function to raise an exception. Specify the message you want to notify in the first argument of the die function.

die message;

If the file open fails, the following message will be displayed. A message is displayed that includes the line number where the exception occurred.

Can't open file "data.txt": No such file or directory at a.pl line 6.

What to include in the exception message

The exception message should include the following:

  1. What you want to tell the user
  2. The content of the message notified by the OS when the system call fails (set to $!)
die qq/Can't open file "$file":$!/;

"Can't open file" $file "" tells the user that the file open failed. Also, predefined variable called $! Is set to the content of the message notified by the OS when the system call fails. Please include the exception message.

For a detailed explanation of exception handling, see the following article.

Read file

If you open the file in read mode, you can read the contents from the file.

# Open in read mode
my $file = 'data.txt';
open my $fh, '<', $file
  or die qq/Can't open file "$file":$!/;

Use line input operator to read the file line by line.

# Line input operator
<File handle>

Use while statement to read the lines line by line. The following is an example of outputting all lines to standard output (screen).

while (my $line = <$fh>) {
  print $line;
}

The line input operator reads one line and assigns it to a variable called $line. This is done to the end of the file using while statement.

The line input operator returns an undef when it reaches the end of the file, so it escapes while statement at the end of the file.

File close

Close the file after reading or writing to it.

Use close function to close the file.

close $fh;

Especially when writing to a file, the contents written to the file are completely reflected (flash) during the close process, so close is an important process.

Write to file

Next, let's write to the file. To write to a file, first open the file in write mode">".

# Open in additional write mode
my $file = 'data.txt';
open my $fh, '>', $file
  or die qq/Can't open file "$file":$!/;

Use print function to write a string to the file.

print file handle string

Note that there is no comma between the filehandle and the string (one of the confusing and error-prone parts of Perl).

my $str = "Hello\n";
print $fh $str;

After writing, close the file. In the case of writing, writing to the file is completed when close is performed, so it is safe to add exception handling.

close $fh or die qw/Can't close file "$file":$!/;

You can see that the string "Hello" is written in data.txt.

Additional writing to file

Let's do additional writing to the file. To write to a file, first open the file in the additional write mode">>".

# Open in write mode
my $file = 'data.txt';
open my $fh, '>>', $file
  or die qq/Can't open file "$file":$!/;

Use the print function to write a string to a file.

my $str = "Hello\n";
print $fh $str;

After writing, close the file. In the case of writing, writing to the file is completed when close is performed, so it is safe to add exception handling.

close $fh or die qw/Can't close file "$file":$!/;

When you run the program repeatedly, you can see that the string "Hello" is added to data.txt.

Hello
Hello
Hello

Read the contents of the file at once

To read the entire contents of a file at once:

my $content = do {local $/; <$fh>};

This is confusing, but it is often described this way.

An input record separator is set for the predefined variable "$/". By default, the standard line feed code of the OS is set as the input record separator. You can use the line input operator to read the contents of a file at once by setting the input record separator to undef. So if you set $/ to undef, you can get all the contents of the file when you run <$fh>. However, $/ is global, so you should always restore it to its original value if you change it.

local is for temporarily changing the contents of a variable. local can be used for variable other than a lexical variable.

a local variable;

You can temporarily undef the contents of a variable until the end of scope. When the scope ends, the original value is restored.

Next is do block, but you can create a block that returns the return value by writing "do {...}". .. The last evaluated value in the block is the return value.

The following processing is performed throughout.

my $content = do {# create block
  # Temporarily undef the input record separator
  local $/;

  # Read all contents
  <$fh>
};

# The last evaluated value of "do {}" is assigned to $content
# Input record separator value is restored

Line feed code

A line feed code is a special string that represents the end of a line. The line feed code differs depending on the OS. In Windows, it is a sequence of characters "0D 0A" in hexadecimal, and in Unix it is "0A".

It is as follows when it is described as a string of Perl. You can write characters in hexadecimal with the notation "\x".

# Windows line feed code
my $ln_win = "\x0D\x0A";

# Unix line feed code
my $ln_unix = "\x0A";

Behavior of line input operator

When the line input operator "" finds an input record separator, it retrieves the position immediately after it as one line. The input record separator is set to a variable called "$/", which is "0D 0A" on Windows and "0A" on Unix.

# Input record separator on Windows
$/ = "\x0D\x0A";

# Unix input record separator
$/ = "\x0A";

behavior of chomp function

You can use chomp function to remove the trailing line break, but it's actually removing the trailing input record separator.

chomp string;

That is, you can change the line break that you remove by changing the input record separator.

Characters output by

\n

\n represents a line break as a code, but in Windows it is a sequence of characters "0D 0A" in hexadecimal, and in Unix it is "0A". This is a constant and cannot be changed.

Edit Windows files on Unix

Consider editing a Windwos file on Unix. Since it is on Unix, "0D 0A" is assigned to the default input record separator "$/", and "0D 0A" is output for the escape sequence\n. The line feed code of the file edited in Windows is "0D 0A", and it is necessary to correspond to this.

# File open
my $file = 'data.txt';
open my $fh, '<', $file
  or die qq/Can't open file "$file":$!/;

# Read file
{
  # Line feed code
  my $ln = "\x0D0A"

  # Temporarily change the input record separator
  local $/ = $ln;

  # Lines read correctly
  while (my $line = <$fh>) {

    # Line breaks are removed correctly
    chomp $line;
    
    # Something to do
    
    print "$line $ln"; # Assuming printing to another file by redirect etc.
  }
}

If you don't mind changing the line feed code that is output, it is better to output\n.

File read/write approach

If you want to read or write a file, you may think that you should open it in read/write mode in open mode, but when you actually read/write, it generally does not open in read/write mode. This is because if the system crashes during writing, the original contents will be destroyed. If you keep the original contents until the writing is completed, the original contents will remain even if it crashes during writing.

So, to read and write to a file, take the following approach.

  1. Read the file contents
  2. Content update
  3. Export to temporary file
  4. Rename temporary file to original file

Write to a temporary file, then rename the temporary file to the original file name after the write is complete. Temporary file names should not overlap with other files.

The actual script looks like this:

use strict;
use warnings;

use File::Copy 'move';

# (1) Read the contents of the file
my $file = 'date.txt';
open my $fh, '<', $file
  or die qq/Can't open file "$file":$!/;
my $content = do {local $/; <$fh>};
close $fh;

# (2) Update of contents
$content = 'Hello!';

# (3) Export to temporary file
my $temp_file = "$file. $$.". Int (rand 10000);
open my $temp_fh, '>', $temp_file
  or die qq/Can't open file "$file":$!/;
print $temp_fh $content;
close $temp_fh or die qq/Can't open file "$file":$!/;

# (4) Change the temporary file name to the original file name
move $temp_file, $file
  or die qq/Can't move "$temp_file" to "$file":$!/;

(Reference) File::Copy

File I/O Techniques

Please refer to the following for various file input/output techniques.

Handling CSV files

Write to file

File read/write

OS input/output mechanism

Unix - like filter programming

Random access

Buffering

File lock

Related Informatrion