Read a CSV file and convert it to an array of arrays
This example reads a CSV file, converts it to a Perl data structure (an array of arrays), and writes it to standard output. The conversion from CSV to an array of arrays is done in a subroutine. It can also serve as a basic template for reading and editing other kinds of files.
The program below cannot correctly handle CSV whose fields contain line breaks and the like. If you want to process CSV correctly, use Text::CSV_XS or Text::CSV::Encoded. Text::CSV::Encoded is a wrapper module around Text::CSV or Text::CSV_XS and makes it easy to handle Japanese.
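To illustrate why a plain split is not enough for general CSV, here is a minimal sketch using only core Perl. The sample line is made up for this illustration; its second field is quoted and contains a comma, which a real CSV parser would treat as a single field.

```perl
use strict;
use warnings;

# Hypothetical sample line: the second field is quoted and contains a comma.
my $line = 'masao,"Tokyo, Japan",10';

# A naive split cuts at every comma, including the one inside the quotes.
my @items = split(/,/, $line);

# A CSV parser would see 3 fields; split produces 4 pieces.
print scalar(@items), " pieces\n";
print "[$_]\n" for @items;
```

The quote characters also remain in the pieces, which is another reason to reach for one of the modules named above when the input is not strictly simple.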
use strict;
use warnings;

# Argument processing
my $file = shift;
unless ($file) {
    # If there is no argument, show the usage and exit.
    die "Usage: $0 file";
}

# Parse the file and convert the CSV data to an array of arrays.
# parse_file returns an array reference, so dereference it here.
my @recs = @{ parse_file($file) };

# Output (it is a two-dimensional array, so walk through it with for)
for my $items (@recs) {
    # Join the items with commas and print them.
    print join(',', @$items) . "\n";
}

# Subroutine for parsing the file (this time the data is simply handed back)
sub parse_file {
    my $file = shift;

    open(my $fh, "<", $file)
        or die "Cannot open $file for read:$!";

    # Prepare a reference to an array that stores multiple records
    my $recs = [];
    while (my $line = <$fh>) {
        chomp $line;    # Remove the line break

        # Prepare a reference to the array that stores the items.
        # Split the line on commas and store the result in that array.
        # split returns a list, so dereference with @$items.
        my $items = [];
        @$items = split(/,/, $line);

        # The first argument of push must be an array,
        # so dereference with @$recs.
        push @$recs, $items;
    }
    close $fh;

    return $recs;
}
Below is sample CSV data. Save it as a CSV file, pass the file name as the first argument to the script, and run it.
masao,10,Japan
taro,20,USA
rika,38,France
Code explanation
(1) Processing when the file name is not given in the first argument
unless ($file) {
    # If there is no argument, show the usage and exit.
    die "Usage: $0 file";
}
Check arguments wherever possible. If no file name is given as the first argument, print the usage and exit. $0 is a predefined variable that holds the name of the running script.
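One caveat with a plain truth test on $file is that a file literally named 0 is false in Perl and would be rejected. The following is a minimal sketch of a stricter check; the helper name has_file_arg is hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when an argument was actually supplied.
sub has_file_arg {
    my ($file) = @_;
    return (defined $file && length $file) ? 1 : 0;
}

# The string '0' is false in boolean context, but it is a valid file name.
for my $arg ('data.csv', '0', '', undef) {
    my $label = defined $arg ? "'$arg'" : 'undef';
    printf "%-10s => %s\n", $label,
        has_file_arg($arg) ? 'accepted' : 'rejected';
}
```

In the script, such a helper could stand in for the truth test on $file before calling die with the usage message.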
(2) Read the file and convert the csv format text to an array of arrays
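my @recs = @{ parse_file($file) };
parse_file is a subroutine written for this script. It reads the file, converts the CSV text to an array of arrays, and returns a reference to it, so the caller dereferences the result with @{ ... } before storing it in @recs. The input and output look like this.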
masao,10,Japan
taro,20,USA
rika,38,France

↓

@recs = (
    ['masao', 10, 'Japan'],
    ['taro', 20, 'USA'],
    ['rika', 38, 'France'],
)
(3) Output in csv format
for my $items (@recs) {
    # Join the items with commas and print them.
    print join(',', @$items) . "\n";
}
To output, loop with for over the records contained in @recs. The items of each record are joined with commas by the join function and printed, with a line break added at the end.
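As a small self-contained sketch of the output step, the data below is written out by hand in the same shape that the script builds from the file. The map form is only an alternative way of writing the same loop, not something the original script uses.

```perl
use strict;
use warnings;

# Same shape as the data in this article, written out by hand.
my @recs = (
    ['masao', 10, 'Japan'],
    ['taro',  20, 'USA'],
    ['rika',  38, 'France'],
);

# Build each output line by dereferencing the inner array and joining it.
my @lines = map { join(',', @$_) } @recs;
print "$_\n" for @lines;

# Individual cells can be reached with two subscripts.
print "Second record, third column: $recs[1][2]\n";
```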
(4) Explanation of the function that reads a file and converts csv format text into an array of arrays
sub parse_file {
    my $file = shift;

    open(my $fh, "<", $file)
        or die "Cannot open $file for read:$!";

    # Prepare a reference to an array that stores multiple records
    my $recs = [];
    while (my $line = <$fh>) {
        # Remove the line break
        chomp $line;

        # Prepare a reference to the array that stores the items.
        # Split the line on commas and store the result in that array.
        # split returns a list, so dereference with @$items.
        my $items = [];
        @$items = split(/,/, $line);

        # The first argument of push must be an array,
        # so dereference with @$recs.
        push @$recs, $items;
    }
    close $fh;

    return $recs;
}
(4) - 1 Receive file name as an argument
my $file = shift;
The file name is passed as an argument so that the subroutine itself can open and close the file.
(4) - 2 Creating an array of arrays to return as a return value
my $recs = []; # Prepare a reference to an array that stores multiple records
The final result will be as follows, but first create only the outermost frame.
$recs = [
    ['masao', 10, 'Japan'],
    ['taro', 20, 'USA'],
    ['rika', 38, 'France'],
]
(4) - 3 Processing in a while loop
A while loop builds the Perl data structure one line at a time.
while (my $line = <$fh>) {
    # Remove the line break
    chomp $line;

    # Prepare a reference to the array that stores the items
    my $items = [];

    # Split the line on commas and store the result in that array.
    # split returns a list, so dereference with @$items.
    @$items = split(/,/, $line);

    # The first argument of push must be an array,
    # so dereference with @$recs.
    push @$recs, $items;
}
(4) - 3 - a Creating a reference to an empty array
my $items = [];
By the end of the first pass through the while loop it will be ['masao', 10, 'Japan'], but first only the empty frame is created.
(4) - 3 - b Splitting the comma-separated string
@$items = split(/,/, $line);
The split function returns a list, so to receive it, dereference $items as @$items.
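One behavior of split worth knowing when using it on CSV-like lines is that trailing empty fields are dropped by default. The sketch below uses a made-up line whose last two fields are empty; passing a negative limit as the third argument keeps them.

```perl
use strict;
use warnings;

# Hypothetical line whose last two fields are empty.
my $line = 'taro,20,,';

# By default split drops trailing empty fields.
my @default = split(/,/, $line);

# A negative limit keeps them.
my @keep_all = split(/,/, $line, -1);

print scalar(@default),  " fields by default\n";
print scalar(@keep_all), " fields with limit -1\n";
```

If every record must have the same number of columns, the form with -1 is the safer choice.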
(4) - 3 - c Add the created record to the reference to the array
push @$recs, $items;
Add $items to the array that $recs refers to. After the first pass through the while loop, it looks like this.
$recs = [
    ['masao', 10, 'Japan'],
]
The first argument of push must be an array, so $recs is dereferenced as @$recs.
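The following minimal sketch shows the same push on an array reference in isolation, with the records written by hand rather than read from a file, and then reads values back out with the arrow notation.

```perl
use strict;
use warnings;

my $recs = [];    # reference to an empty array

# push needs an array, so the reference is dereferenced with @$recs.
push @$recs, ['masao', 10, 'Japan'];
push @$recs, ['taro',  20, 'USA'];

# scalar(@$recs) gives the number of records; -> reaches into a record.
print "records: ", scalar(@$recs), "\n";
print "first name: $recs->[0][0]\n";
print "last country: $recs->[-1][2]\n";
```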
Subroutine rewritten following a suggestion from kits
Rather than push @$recs, how about putting the data into my @recs and returning @recs or \@recs at the end? (The dereferencing feels like wasted effort.)
Below is the rewritten subroutine. As kits said, this one is much cleaner. Thank you very much.
sub parse_file {
    my $file = shift;

    open(my $fh, "<", $file)
        or die "Cannot open $file for read:$!";

    # Changed to an array
    my @recs;
    while (my $line = <$fh>) {
        # Remove the line break
        chomp $line;

        # Changed to an array
        my @items;
        @items = split(/,/, $line);

        # Create a reference with [] when pushing.
        push @recs, [@items];
    }
    close $fh;

    return \@recs;
}
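To try the rewritten subroutine end to end without preparing a file by hand, here is a sketch that writes the sample data to a temporary file with the core File::Temp module. The subroutine is condensed from the one above (split is pushed directly inside []); the use of a temporary file is an assumption made for this illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The rewritten subroutine, condensed.
sub parse_file {
    my $file = shift;
    open(my $fh, "<", $file)
        or die "Cannot open $file for read:$!";
    my @recs;
    while (my $line = <$fh>) {
        chomp $line;
        push @recs, [split(/,/, $line)];
    }
    close $fh;
    return \@recs;
}

# Write the sample data to a temporary file (removed at exit).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "masao,10,Japan\ntaro,20,USA\nrika,38,France\n";
close $out or die "Cannot close $path: $!";

# parse_file returns a reference, so dereference it before looping.
my $recs = parse_file($path);
for my $items (@$recs) {
    print join(',', @$items), "\n";
}
```

Because the subroutine returns \@recs, the caller keeps the result in a scalar and dereferences it with @$recs when looping.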