Read a CSV file and convert it to an array of arrays

This example reads a CSV file, converts it to a Perl data structure (an array of arrays), and writes it to standard output. The conversion from CSV to an array of arrays is done in a subroutine. It can also serve as a basic template for reading and editing other kinds of files.

The programs below cannot correctly process CSV data that contains, for example, a line break inside a field. If you need to process CSV correctly, use Text::CSV_XS or Text::CSV::Encoded. Text::CSV::Encoded is a wrapper module around Text::CSV or Text::CSV_XS that makes it easy to handle Japanese text.
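
To make the limitation concrete, here is a minimal sketch of what happens when a simple comma split meets a quoted field that itself contains a comma. The sample line is made up for illustration and is not part of the article's data.

```perl
use strict;
use warnings;

# Made-up record: the first field is quoted and contains a comma.
my $line = '"Tokyo, Japan",10,ok';

# The simple approach used in this article: split on every comma.
my @naive = split(/,/, $line);

# Three fields were intended, but four pieces come back,
# and the quotation marks are still attached to the pieces.
print scalar(@naive), " pieces\n";
print "[$_]\n" for @naive;
```

A dedicated CSV module understands the quoting rules and would treat the quoted part as a single field.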

use strict;
use warnings;

# Argument processing
my $file = shift;

unless ($file) {
  # If there is no argument, indicate the usage and end.
  die "Usage: $0 file";
}

# Parse the file and convert the CSV data to an array of arrays.
# parse_file returns an array reference, so dereference it here.
my @recs = @{ parse_file($file) };

# Output (the data is a two-dimensional array, so loop over the rows with for)
for my $items (@recs) {
  # Join the fields with commas and print them.
  print join(',', @$items) . "\n";
}

# Subroutine that parses the file (this time the data is only written back out ...)
sub parse_file {
  my $file = shift;
    
  open(my $fh, "<", $file)
    or die "Cannot open $file for read:$!";
  
  # Prepare a reference to an array that stores multiple records
  my $recs = [];
  while (my $line = <$fh>) {
    chomp $line; # Remove a line break
    
    # Prepare a reference to an array that stores the fields of one record.
    # Split the line on commas and store the fields in that array.
    # split returns a list, so dereference with @$items to receive it.

    my $items = [];
    @$items = split(/,/, $line);

    # The first argument of push must be an array,
    # so dereference $recs with @$recs.
    push @$recs, $items;
  }
  close $fh;
  return $recs;
}

Below is sample CSV data. Save it to a file, pass the file name as the first argument of the script, and run it.

masao,10,Japan
taro,20,USA
rika,38,France

Code explanation

(1) Processing when the file name is not given in the first argument

unless ($file) {
  # If there is no argument, indicate the usage and end.
  die "Usage: $0 file";
}

Check arguments wherever you can. If no file name is given as the first argument, the script prints its usage and exits. $0 is a predefined variable that holds the name of the running script.
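
One caveat: unless ($file) also treats a file literally named 0 as missing, because the string '0' is false in Perl. A minimal sketch of a stricter check follows. The helper name has_file_arg is hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script):
# true when an argument was given and is not the empty string.
sub has_file_arg {
    my ($arg) = @_;
    return (defined($arg) && length($arg)) ? 1 : 0;
}

# Try a few candidate arguments, including the tricky '0'.
for my $arg ('data.csv', '0', '', undef) {
    my $shown = defined($arg) ? "'$arg'" : 'undef';
    print "$shown => ", has_file_arg($arg), "\n";
}
```

In the script, the check could then be written as die "Usage: $0 file" unless has_file_arg($file);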

(2) Read the file and convert the csv format text to an array of arrays

my @recs = @{ parse_file($file) };

parse_file is a subroutine defined in this script. It reads the file, converts the CSV text to an array of arrays, and returns a reference to it, which is dereferenced here with @{ ... }. The input and the resulting data look like this.

masao,10,Japan
taro,20,USA
rika,38,France

↓

@recs = (
    ['masao', 10, 'Japan'],
    ['taro', 20, 'USA'],
    ['rika', 38, 'France'],
)
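
Once the data is in this shape, individual values can be reached with two indexes. The following is a small sketch using the same sample records. The variable names $country, $rows and $cols are only for illustration.

```perl
use strict;
use warnings;

my @recs = (
    ['masao', 10, 'Japan'],
    ['taro',  20, 'USA'],
    ['rika',  38, 'France'],
);

# Row index first, then column index (both start at 0).
my $country = $recs[1][2];            # third field of the second record
my $rows    = scalar @recs;           # number of records
my $cols    = scalar @{ $recs[0] };   # number of fields in the first record

print "$country $rows $cols\n";
```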

(3) Output in csv format

for my $items (@recs) {
  # Output by connecting with comma.
  print join(',', @$items) . "\n";
}

To output, the for loop runs once for each record in @recs. The fields of each record are joined with commas using the join function and printed, with a line break added at the end.
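
The same output step can also be sketched with map, which builds all the lines first and prints them afterwards. This is only an alternative style, not the article's method, and the sample records are typed in directly so the snippet runs on its own.

```perl
use strict;
use warnings;

my @recs = (
    ['masao', 10, 'Japan'],
    ['taro',  20, 'USA'],
);

# Build one comma-joined line per record.
# Inside map, $_ is the array reference for the current record.
my @lines = map { join(',', @$_) } @recs;

# Print each line followed by a line break.
print "$_\n" for @lines;
```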

(4) Explanation of the function that reads a file and converts csv format text into an array of arrays

sub parse_file {
  my $file = shift;
  
  open(my $fh, "<", $file)
    or die "Cannot open $file for read:$!";
  
  # Prepare a reference to an array that stores multiple records
  my $recs = [];
  while (my $line = <$fh>) {
    # Remove a line break
    chomp $line;
    
    # Prepare a reference to an array that stores the fields of one record.
    # Split the line on commas and store the fields in that array.
    # split returns a list, so dereference with @$items to receive it.
    my $items = [];
    @$items = split(/,/, $line);

    # The first argument of push must be an array, so dereference $recs with @$recs.
    push @$recs, $items;
  }
  close $fh;
  return $recs;
}

(4) - 1 Receive file name as an argument

my $file = shift;

The file name is passed in as an argument so that the file can be opened and closed inside the subroutine.

(4) - 2 Creating an array of arrays to return as a return value

my $recs = []; # Prepare a reference to an array that stores multiple records

The final result will look like the following, but at this point only the outermost container (an empty array reference) is created.

$recs = [
  ['masao', 10, 'Japan'],
  ['taro', 20, 'USA'],
  ['rika', 38, 'France'],
]

(4) - 3 Processing in a while loop

A while loop reads the file one line at a time and builds the Perl data structure.

while (my $line = <$fh>) {
  # Remove a line break
  chomp $line;

  # Prepare a reference to an array that stores the fields of one record.
  my $items = [];

  # Split the line on commas and store the fields in that array.
  # split returns a list, so dereference with @$items to receive it.
  @$items = split(/,/, $line);

  # The first argument of push must be an array, so dereference $recs with @$recs.
  push @$recs, $items;
}

(4) - 3 - a Creating a reference to an empty array

my $items = [];

By the end of the first iteration of the while loop it will hold ['masao', 10, 'Japan'], but at this point only the empty container is created.

(4) - 3 - b Splitting the comma-separated string

@$items = split(/,/, $line);

The split function returns a list, so to receive it, $items is dereferenced as @$items and the list is assigned to that array.
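
Two details of split matter when fields may have spaces around the commas or when a line ends with empty fields. The sketch below uses made-up input lines to show both. The pattern /\s*,\s*/ and the negative limit are standard split features, but whether they suit your data is an assumption to check.

```perl
use strict;
use warnings;

# Spaces around the commas: match them as part of the separator.
my @trimmed = split(/\s*,\s*/, 'masao, 10 , Japan');

# By default split drops trailing empty fields ...
my @short = split(/,/, 'taro,20,,');

# ... while a negative limit keeps them.
my @full = split(/,/, 'taro,20,,', -1);

print join('|', @trimmed), "\n";
print scalar(@short), " fields without a limit\n";
print scalar(@full), " fields with limit -1\n";
```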

(4) - 3 - c Add the created record to the reference to the array

push @$recs, $items;

This adds $items to $recs. After the first iteration of the while loop, $recs looks like this.

$recs = [
  ['masao', 10, 'Japan'],
]

The first argument of push must be an array, so $recs is dereferenced as @$recs.
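
Steps (4) - 3 - a to (4) - 3 - c can also be folded into one statement. The sketch below is a shorter variation rather than the article's code: it loops over lines held in an array (an assumption, so that it runs without a file) and wraps the list from split directly in an anonymous array.

```perl
use strict;
use warnings;

# Stand-in for the lines read from the file.
my @lines = ('masao,10,Japan', 'taro,20,USA');

my $recs = [];
for my $line (@lines) {
    # [ ... ] creates a new array reference holding the list that
    # split returns, so no separate $items variable is needed.
    push @$recs, [ split(/,/, $line) ];
}

print scalar(@$recs), " records, first name: $recs->[0][0]\n";
```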

Subroutine rewritten following a suggestion from kits

Store the data in my @recs instead of using push @$recs,
and return @recs or \@recs at the end.
(The effort spent on dereferencing seems wasted.)

The following is the rewritten subroutine. As kits said, this one is much cleaner. Thank you very much.

sub parse_file {
  my $file = shift;

  open(my $fh, "<", $file)
    or die "Cannot open $file for read:$!";

  # Use a plain array instead of an array reference
  my @recs;
  while (my $line = <$fh>) {
    # Remove a line break
    chomp $line;

    # Use a plain array here as well
    my @items;
    @items = split(/,/, $line);

    # Create a reference with [] when pushing.
    push @recs, [@items];
  }
  close $fh;
  return \@recs;
}
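
Because this version returns \@recs, the caller receives an array reference and has to dereference it before looping. The following sketch shows one way to call it. It writes sample data to a temporary file with the core File::Temp module so that it runs on its own. The temporary file and the variable names $tmp_fh and $tmp_name are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The rewritten subroutine from above, in compact form.
sub parse_file {
    my $file = shift;

    open(my $fh, "<", $file)
        or die "Cannot open $file for read:$!";

    my @recs;
    while (my $line = <$fh>) {
        chomp $line;
        my @items = split(/,/, $line);
        push @recs, [@items];
    }
    close $fh;
    return \@recs;
}

# Write sample data to a temporary file (removed when the program exits).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "masao,10,Japan\ntaro,20,USA\n";
close $tmp_fh;

# parse_file returns a reference, so keep it in a scalar ...
my $recs = parse_file($tmp_name);

# ... and dereference it with @$recs when looping.
for my $items (@$recs) {
    print join(',', @$items), "\n";
}
```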

Related Information