
References and multidimensional data structures

A reference is a scalar value that points to data. It is easiest to understand if you think of it as something like a pointer in C.

Array reference

This section explains array references.

Array

First, create an array.

my @nums = (1, 2, 3);

Creating an array reference

Create an array reference. References are created with the "\" operator.

my @nums = (1, 2, 3);
my $nums = \@nums;

"Reference" means "something that points to something else." $nums points to @nums.

$nums --> @nums

Creating an anonymous array reference

Create an anonymous array reference. "[]" is called the anonymous array generator.

my $nums = [1, 2, 3];

$nums points to an array containing (1, 2, 3) that has no name.

$nums --> (1, 2, 3)

An anonymous array reference is simply a shorter way to create an array reference. With the anonymous array generator, you don't need to create a named array first.

# Creating an array reference
my @nums = (1, 2, 3);
my $nums = \@nums;

# Shorthand for the above
my $nums = [1, 2, 3];

In Perl, both arrays and array references appear in code, which is one reason the code can be hard to read. It is important to read code with a clear distinction between an "array" and an "array reference".

Array reference dereference

To retrieve the array from an array reference, you need to dereference it. Use "@{}" to dereference an array reference.

my @nums = @{$nums};

You can also omit the braces as follows:

my @nums = @$nums;

You can dereference and then get the elements of the array, but that's a bit inconvenient. Perl provides a way to get elements directly from an array reference. Use the arrow operator "->" to retrieve an element from an array reference.

my $first = $nums->[0];
my $second = $nums->[1];

Compare this with getting elements from an array, shown below. The only difference is the arrow operator.

my $first = $nums[0];
my $second = $nums[1];

When programming in Perl, always be aware of whether it's an array or an array reference.
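The difference between an array and an array reference can be made concrete with a minimal sketch. The variable name $nums_ref is an assumption chosen here so that the array and the reference are easy to tell apart; it does not appear in the examples above.

```perl
use strict;
use warnings;

my @nums     = (1, 2, 3);   # an array
my $nums_ref = \@nums;      # a reference to that array ($nums_ref is a name assumed for this sketch)

# Element access: no arrow for the array, arrow for the reference
my $from_array = $nums[0];
my $from_ref   = $nums_ref->[0];

# ref() returns 'ARRAY' for an array reference
my $kind = ref $nums_ref;

# The reference points to the same data, so a change made through
# the reference is visible through the array as well
$nums_ref->[0] = 10;

# Number of elements and last index, both through the reference
my $count      = scalar @$nums_ref;
my $last_index = $#{$nums_ref};

print "kind: $kind, first: $nums[0], count: $count, last index: $last_index\n";
```

When it is unclear which of the two a variable holds, checking it with ref is one simple way to find out.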


Two-dimensional array

Let's create a two-dimensional array in Perl. A characteristic of Perl arrays is that they can only have scalar values as elements. So you can't have an array as an element of an array like this:

# Wrong example of a two-dimensional array
my @person1 = ('Ken', 'Japan', 19);
my @person2 = ('Taro', 'USA', 45);

my @persons = (@person1, @person2);

@persons becomes a one-dimensional array of ('Ken', 'Japan', 19, 'Taro', 'USA', 45).

Array elements must be scalar values, and an array reference is a scalar value. So you can store an array reference as an element of an array. Use "\" to create the array reference.

my @person1 = ('Ken', 'Japan', 19);
my @person2 = ('Taro', 'USA', 45);

my @persons = (\@person1, \@person2);

In the simplest and most common notation, the outer array is also written as a reference, using anonymous array references throughout:

my $persons = [
  ['Ken', 'Japan', 19],
  ['Taro', 'USA', 45]
];

The above notation is the most common representation of a two-dimensional array in Perl.

Accessing elements of a two-dimensional array

Let's access the elements of the two-dimensional array we created. It is an array reference whose elements are array references, so you can access it as follows:

my $name1 = $persons->[0]->[0];
my $country1 = $persons->[0]->[1];
my $age1 = $persons->[0]->[2];

my $name2 = $persons->[1]->[0];
my $country2 = $persons->[1]->[1];
my $age2 = $persons->[1]->[2];

The point is to use the arrow operator twice. In Perl, you can omit the second and subsequent arrow operators, so you can also write:

my $name1 = $persons->[0][0];
my $country1 = $persons->[0][1];
my $age1 = $persons->[0][2];

my $name2 = $persons->[1][0];
my $country2 = $persons->[1][1];
my $age2 = $persons->[1][2];

Looping over a two-dimensional array

Let's loop through a two-dimensional array.

Output of all elements

First, let's output all the elements.

for my $person (@$persons) {
  for my $column (@$person) {
    print "$column\n";
  }
}

The output is as follows.

Ken
Japan
19
Taro
USA
45

Perl's two-dimensional array is essentially an array reference that has an array reference as an element. So if you want to loop, you need to dereference and retrieve the array. This means that you need to dereference the outer loop as "@$persons" and the inner loop as "@$person".
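When the row and column positions are needed as well as the values, looping over indexes is one alternative. The following is a minimal sketch under that assumption; the @cells array and the "row,column: value" format are illustrative choices, not part of the original example.

```perl
use strict;
use warnings;

my $persons = [
    ['Ken',  'Japan', 19],
    ['Taro', 'USA',   45],
];

my @cells;
for my $i (0 .. $#{$persons}) {             # row indexes
    for my $j (0 .. $#{$persons->[$i]}) {   # column indexes of row $i
        push @cells, sprintf('%d,%d: %s', $i, $j, $persons->[$i][$j]);
    }
}

print "$_\n" for @cells;
```

Here "$#{$persons}" gives the last index of the outer array, and "$#{$persons->[$i]}" gives the last index of each inner array.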

Output all records separated by commas

Next, let's output all records separated by commas as shown below.

Ken,Japan,19
Taro,USA,45

You can use the join function to join the fields of each record with the specified string. You don't need to write the inner loop.

for my $person (@$persons) {
  print join(',', @$person) . "\n";
}

Create a two-dimensional array from a comma-separated file

Now consider creating a two-dimensional array from a comma-separated file. Suppose you have the following file, named "persons.txt".

Ken,Japan,19
Taro,USA,45

You would normally use the open function to read a file, but this time let's use the diamond operator "<>" to keep the reading simple. Combined with a while statement, it reads, line by line, the files named in the command-line arguments.

while (my $line = <>) {
  ...
}

When executing, pass the file name as a command line argument.

perl script.pl persons.txt

Now let's create a two-dimensional array from a comma-separated file.

my $persons = [];
while (my $line = <>) {
  chomp $line;
  my @person = split(',', $line);
  push @$persons, \@person;
}

chomp is a function that removes a line break. Use the split function to decompose a string with the specified delimiter to create an array. Note that the first argument of the push function is an array.

You need to dereference and pass it like "@$persons". Also, the elements added by push must be an array reference. Generate an array reference like "\@person" and specify it as the second argument of the push function.
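The same loop can also be written with an anonymous array reference instead of a named array. The following is a sketch only: parse_records is a hypothetical helper name, and the input is read from an in-memory string rather than from persons.txt so that the example is self-contained.

```perl
use strict;
use warnings;

# parse_records is a hypothetical helper used only in this sketch.
# It takes a filehandle and returns a reference to an array of array references.
sub parse_records {
    my ($fh) = @_;

    my $persons = [];
    while (my $line = <$fh>) {
        chomp $line;

        # [ ... ] creates a new anonymous array for every line
        push @$persons, [split /,/, $line];
    }
    return $persons;
}

# Read from an in-memory string instead of persons.txt
my $text = "Ken,Japan,19\nTaro,USA,45\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $persons = parse_records($fh);
close $fh;

print "$persons->[1][0]\n";   # name in the second record
```

Either way, each pushed element refers to a separate array. In the original loop this holds because "my @person" is declared inside the loop body, so a new array is created on every iteration.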

The created two-dimensional array is as follows.

my $persons = [
  ['Ken', 'Japan', '19'],
  ['Taro', 'USA', '45']
];

Tab delimiters are better than commas

This time the output was separated by commas, but for practical work tab-separated output is often easier to handle. With commas, you have to account for values that themselves contain commas. Tabs rarely appear inside values. Tab-separated text can also be pasted directly into Excel.

print join("\t", @$person) . "\n";
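As a rough illustration of why tabs are safer, the sketch below writes records as tab-separated lines and reads them back. The value 'Tokyo, Japan' is made-up sample data containing a comma, and the names @lines and $restored are assumptions for this sketch.

```perl
use strict;
use warnings;

my $persons = [
    ['Ken',  'Tokyo, Japan', 19],   # made-up value that contains a comma
    ['Taro', 'USA',          45],
];

# Write each record as one tab-separated line
my @lines = map { join("\t", @$_) } @$persons;

# Read the lines back; the limit -1 keeps trailing empty fields
my $restored = [ map { [ split /\t/, $_, -1 ] } @lines ];

print "$restored->[0][1]\n";
```

The value with the comma survives the round trip intact, which would not be the case if the lines were joined and split on commas.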

Hash reference

This section explains hash references.

Hash

First, create a hash.

my %person = (name =>'Ken', age => 19);

Creating a hash reference

Create a hash reference. References are created with the "\" operator.

my %person = (name =>'Ken', age => 19);
my $person = \%person;

"Reference" means "something that points to something else." $person points to %person.

$person --> %person

Creating an anonymous hash reference

Create an anonymous hash reference. "{}" is called the anonymous hash generator.

my $person = {name =>'Ken', age => 19};

$person points to a hash containing (name => 'Ken', age => 19) that has no name.

$person --> (name => 'Ken', age => 19)

An anonymous hash reference is simply a shorter way to create a hash reference. With the anonymous hash generator, you don't need to create a named hash first.

# Creating a hash reference
my %person = (name =>'Ken', age => 19);
my $person = \%person;

# Shorthand for the above
my $person = {name =>'Ken', age => 19};

In Perl, both hashes and hash references appear in code, which is one reason the code can be hard to read. It is important to read code with a clear distinction between a "hash" and a "hash reference".

Hash reference dereference

To retrieve the hash from a hash reference, you need to dereference it. Use "%{}" to dereference a hash reference.

my %person = %{$person};

You can also omit the braces as follows:

my %person = %$person;

You can dereference and then get the hash element, but that's a bit inconvenient. Perl provides a way to get an element from a hash reference. Use the arrow operator "->" to retrieve an element from a hash reference.

my $name = $person->{name};
my $age = $person->{age};

Compare this with getting elements from a hash, shown below. The only difference is the arrow operator.

my $name = $person{name};
my $age = $person{age};

When programming in Perl, always be aware of whether it's a hash or a hash reference.
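The difference between a hash and a hash reference can be made concrete in the same way as for arrays. In this minimal sketch, $person_ref is a name assumed here to keep the hash and the reference apart, and the "country" key is used only to show the exists check.

```perl
use strict;
use warnings;

my %person     = (name => 'Ken', age => 19);   # a hash
my $person_ref = \%person;                      # a reference to that hash ($person_ref is a name assumed for this sketch)

# Element access: no arrow for the hash, arrow for the reference
my $name_from_hash = $person{name};
my $name_from_ref  = $person_ref->{name};

# ref() returns 'HASH' for a hash reference
my $kind = ref $person_ref;

# A change made through the reference is visible through the hash
$person_ref->{age} = 20;

# exists checks whether a key is present without creating it
my $has_country = exists $person_ref->{country} ? 'yes' : 'no';

print "kind: $kind, age: $person{age}, has country: $has_country\n";
```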


Array of hashes

Let's create an array of hashes in Perl. As explained in the section on two-dimensional arrays, array elements must be scalar values. So you need to use hash references as the elements of the array.

my %person1 = (name =>'Ken', country =>'Japan', age => 19);
my %person2 = (name =>'Taro', country =>'USA', age => 45);

my @persons = (\%person1, \%person2);

In the simplest and most common notation, the outer array is also written as a reference:

my $persons = [
  {name =>'Ken', country =>'Japan', age => 19},
  {name =>'Taro', country =>'USA', age => 45}
];

The above notation is the most common representation of an array of hashes in Perl.

Let's loop through an array of hashes.

Output of all elements

First, let's output all the elements.

for my $person (@$persons) {
  for my $key (keys %$person) {
    my $value = $person->{$key};
    print "$key: $value\n";
  }
}

The output will look something like this (the order of the keys within each record may differ).

country: Japan
name: Ken
age: 19
country: USA
name: Taro
age: 45

An array of hashes in Perl is essentially an "array reference" whose elements are "hash references". The outer loop dereferences the array with "@$persons". The inner loop gets the hash keys with the keys function and outputs the value for each key. Because the keys function takes a hash, not a hash reference, you need to dereference it like this:

keys %$person

One caveat is that the order of the keys returned by the keys function is not defined. If you want a fixed output order, sort the keys with the sort function.

for my $person (@$persons) {
  for my $key (sort keys %$person) {
    my $value = $person->{$key};
    print "$key: $value\n";
  }
}

Output all records separated by commas

Next, let's output all records separated by commas as shown below.

Ken,Japan,19
Taro,USA,45

You can use the join function to join records with a specified string. Set the output value to an array called @rec and then join with the join function. You don't need to write the inner loop.

for my $person (@$persons) {
  my @rec = (
    $person->{name},
    $person->{country},
    $person->{age}
  );
    
  print join(',', @rec) . "\n";
}
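A hash slice is another way to collect the values in a fixed order. The following is a sketch under the assumption that the column order is kept in an array named @columns, which is a name introduced here for illustration.

```perl
use strict;
use warnings;

my $persons = [
    {name => 'Ken',  country => 'Japan', age => 19},
    {name => 'Taro', country => 'USA',   age => 45},
];

# Column order for the output (an assumption of this sketch)
my @columns = qw(name country age);

my @lines;
for my $person (@$persons) {
    # @{$person}{...} is a hash slice through the reference:
    # it returns the values for the listed keys, in that order
    my @rec = @{$person}{@columns};
    push @lines, join(',', @rec);
}

print "$_\n" for @lines;
```

Keeping the column names in one place means the output order can be changed by editing a single list.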

Create an array of hashes from a comma-separated file

Now consider the reverse: creating an array of hashes from a comma-separated file. Suppose you have the following file, named "persons.txt".

Ken,Japan,19
Taro,USA,45

Now let's create an array of hashes from the comma-separated file. The file is read with the diamond operator, as described in the section on two-dimensional arrays.

my $persons = [];
while (my $line = <>) {

  # (1) Remove the trailing line break
  chomp $line;

  # (2) Split the comma-separated string into an array
  my @rec = split(',', $line);

  # (3) Create a hash reference
  my $person = {};
  $person->{name} = $rec[0];
  $person->{country} = $rec[1];
  $person->{age} = $rec[2];

  # (4) Add it to the array reference
  push @$persons, $person;
}

(1) chomp is a function that removes a line break. (2) Use the split function to decompose the string with the specified delimiter to create an array. (3) Create a hash reference and set the key and value. (4) Add to the array reference. The first argument of the push function receives an array, so you need to dereference it like "@$persons".
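Step (3) can also be written as a single hash slice assignment. This is only a sketch: the @columns array is an assumed name for the column order, and the input comes from an in-memory string instead of persons.txt so that the example runs on its own.

```perl
use strict;
use warnings;

my @columns = qw(name country age);   # assumed column order of the file

# Read from an in-memory string instead of persons.txt
my $text = "Ken,Japan,19\nTaro,USA,45\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $persons = [];
while (my $line = <$fh>) {
    chomp $line;

    # Assign all columns at once with a hash slice
    my %person;
    @person{@columns} = split /,/, $line;

    # Add a reference to the new hash to the array reference
    push @$persons, \%person;
}
close $fh;

print "$persons->[1]{country}\n";
```

Because "my %person" is declared inside the loop, each iteration creates a new hash, so every element of the array refers to a different record.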

The resulting array of hashes looks like this:

my $persons = [
  {name =>'Ken', country =>'Japan', age => '19'},
  {name =>'Taro', country =>'USA', age => '45'}
];

Hash of arrays

Next, let's look at a hash of arrays.

Creating a hash of arrays

Let's create a hash of arrays. Assume we are aggregating access statistics by time of day.

# Time => [number of accesses, average response time, maximum response time]
my $infos = {
  '01:01' => [3, 2.1, 4.6],
  '01:02' => [5, 4.1, 7.4],
  '01:03' => [6, 3.5, 5.7]
};

This data means the following. There were 3 accesses at 1:01, with an average response time of 2.1 seconds and a maximum response time of 4.6 seconds. There were 5 accesses at 1:02, with an average response time of 4.1 seconds and a maximum response time of 7.4 seconds. There were 6 accesses at 1:03, with an average response time of 3.5 seconds and a maximum response time of 5.7 seconds.

Output all elements

Let's output all the elements.

# (1) Outer loop
for my $time (sort keys %$infos) {
  print "$time\n";
  
  # (2) Inner loop
  for my $column (@{$infos->{$time}}) {
    print "$column\n";
  }
}

(1) A hash of arrays is a hash whose values are array references, so the outer loop iterates over the hash. The times are retrieved with the keys function.

sort keys %$infos

The reason for sorting like this is because we want to output in the order of time.

(2) The inner loop iterates over an array. You can retrieve the inner array like this:

@{$infos->{$time}}

The output result is as follows.

01:01
3
2.1
4.6
01:02
5
4.1
7.4
01:03
6
3.5
5.7

Output all records separated by commas

Next, let's output all records separated by commas as shown below.

01:01,3,2.1,4.6
01:02,5,4.1,7.4
01:03,6,3.5,5.7

for my $time (sort keys %$infos) {
  my @rec = ($time, @{$infos->{$time}});
  print join(',', @rec) . "\n";
}

Just join them with the join function, as explained in "Array of arrays" and "Array of hashes".

Create a hash of arrays from a comma-separated file

Now consider creating a hash of arrays from a comma-separated file. Suppose you have the following file, named "access.log". The first column is the time of access (hours:minutes) and the second column is the response time.

01:01,3
01:01,2
01:02,5
01:02,3
01:02,2
01:03,9
01:03,4
01:03,6
01:03,1

From this file, let's create a hash of arrays holding the number of cases, the total response time, and the maximum response time for each time. The total response time is stored because the average response time can be calculated later as "total response time / number of cases".

my $infos = {};
while (my $line = <>) {
  chomp $line;
  
  # (1) Get time and response time
  my ($time, $res_time) = split(',', $line);
  
  # (2) Array reference for storing data at each time
  $infos->{$time} ||= [];
  
  # (3) Total number of cases
  $infos->{$time}[0]++;
  
  # (4) Total response time
  $infos->{$time}[1] += $res_time;

  # (5) Maximum response time
  $infos->{$time}[2] = $res_time
    if !defined $infos->{$time}[2] || $res_time > $infos->{$time}[2];
}

(1) Since each line is comma-separated, the time and response time are obtained with the split function. (2) The information for each time is stored in an array reference, so it is initialized with an empty array reference if it has not been initialized yet. (3) Increment the number of cases (index 0). (4) Add the response time to the total at index 1. (5) Store the maximum response time at index 2. It is assigned if no maximum has been recorded yet, or if the current response time exceeds the maximum so far.

This produces the following hash of arrays:

my $infos = {
    '01:03' => [
        4,
        20,
        9
    ],
    '01:01' => [
        2,
        5,
        3
    ],
    '01:02' => [
        3,
        10,
        5
    ]
};
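As noted earlier, the average can be derived from the stored total and count. The sketch below shows one way to do that for the aggregated data; the output format and the one-decimal rounding with sprintf are assumptions made for this example.

```perl
use strict;
use warnings;

# Time => [number of cases, total response time, maximum response time]
my $infos = {
    '01:01' => [2, 5, 3],
    '01:02' => [3, 10, 5],
    '01:03' => [4, 20, 9],
};

my @lines;
for my $time (sort keys %$infos) {
    # Unpack the inner array into named variables
    my ($count, $total, $max) = @{$infos->{$time}};

    # Average = total response time / number of cases
    my $average = $total / $count;

    push @lines, join(',', $time, $count, sprintf('%.1f', $average), $max);
}

print "$_\n" for @lines;
```

Unpacking the inner array into named variables avoids having to remember which index holds which value.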

Hash of hashes

Next, let's look at a hash of hashes.

Creating a hash of hashes

Let's create a hash of hashes. Again, assume we are aggregating access statistics by time of day.

my $infos = {
  '01:01' => {count => 3, ave_time => 2.1, max_time => 4.6},
  '01:02' => {count => 5, ave_time => 4.1, max_time => 7.4},
  '01:03' => {count => 6, ave_time => 3.5, max_time => 5.7}
};

This data means the following. There were 3 accesses at 1:01, with an average response time of 2.1 seconds and a maximum response time of 4.6 seconds. There were 5 accesses at 1:02, with an average response time of 4.1 seconds and a maximum response time of 7.4 seconds. There were 6 accesses at 1:03, with an average response time of 3.5 seconds and a maximum response time of 5.7 seconds.

Output all elements

Let's output all the elements.

# (1) Outer loop
for my $time (sort keys %$infos) {
  print "$time\n";
  
  # (2) Inner loop
  for my $name (sort keys %{$infos->{$time}}) {
    my $value = $infos->{$time}{$name};
    print "$name: $value\n";
  }
}

(1) A hash of hashes is a hash whose values are hash references, so the outer loop iterates over the hash. The times are retrieved with the keys function.

(2) The inner loop also iterates over a hash. Its keys are retrieved like this:

keys %{$infos->{$time}}

The output result is as follows.

01:01
ave_time: 2.1
count: 3
max_time: 4.6
01:02
ave_time: 4.1
count: 5
max_time: 7.4
01:03
ave_time: 3.5
count: 6
max_time: 5.7

Output all records separated by commas

Next, let's output the data created above by separating all records with commas as shown below.

01:01,3,2.1,4.6
01:02,5,4.1,7.4
01:03,6,3.5,5.7

for my $time (sort keys %$infos) {
  my @rec = (
    $time,
    $infos->{$time}{count},
    $infos->{$time}{ave_time},
    $infos->{$time}{max_time}
  );
  print join(',', @rec) . "\n";
}

Just join them with the join function, as explained in "Array of arrays" and "Array of hashes".

Create a hash of hashes from a comma-separated file

Now consider creating a hash of hashes from a comma-separated file. Suppose you have the following file, named "access.log". The first column is the time of access (hours:minutes) and the second column is the response time.

01:01,3
01:01,2
01:02,5
01:02,3
01:02,2
01:03,9
01:03,4
01:03,6
01:03,1

From this file, let's create a hash of hashes holding the number of cases, the total response time, and the maximum response time for each time. The total response time is stored because the average response time can be calculated as "total response time / number of cases".

my $infos = {};
while (my $line = <>) {
  chomp $line;
  
  # (1) Get time and response time
  my ($time, $res_time) = split(',', $line);
  
  # (2) Hash reference for storing data at each time
  $infos->{$time} ||= {};
  
  # (3) Total number of cases
  $infos->{$time}{count}++;
  
  # (4) Total response time
  $infos->{$time}{total_time} += $res_time;

  # (5) Maximum response time
  $infos->{$time}{max_time} = $res_time
    if !defined $infos->{$time}{max_time}
      || $res_time > $infos->{$time}{max_time};
}

(1) Since each line is comma-separated, the time and response time are obtained with the split function. (2) The information for each time is stored in a hash reference, so it is initialized with an empty hash reference if it has not been initialized yet. (3) Increment the number of cases. (4) Add the response time to the total. (5) Store the maximum response time. It is assigned if no maximum has been recorded yet, or if the current response time exceeds the maximum so far.

This produces the following hash of hashes:

my $infos = {
    '01:03'=> {
        'max_time' => 9,
        'count' => 4,
        'total_time' => 20
    },
    '01:01'=> {
        'max_time' => 3,
        'count' => 2,
        'total_time' => 5
    },
    '01:02'=> {
        'max_time' => 5,
        'count' => 3,
        'total_time' => 10
    }
};


Conclusion

Now you can work freely with arrays and hashes. Once you can handle data freely, you can do a great deal with your programs. As a next step, it is a good idea to practice sorting these data structures. An explanation of the sort function will be helpful.
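As one possible starting point for that practice, the sketch below sorts the times of a hash of hashes by maximum response time. The choice of max_time as the sort key and the tie-breaking by time are assumptions made for this example.

```perl
use strict;
use warnings;

my $infos = {
    '01:01' => {count => 2, total_time => 5,  max_time => 3},
    '01:02' => {count => 3, total_time => 10, max_time => 5},
    '01:03' => {count => 4, total_time => 20, max_time => 9},
};

# Sort the times by max_time, largest first; ties fall back to time order
my @times = sort {
    $infos->{$b}{max_time} <=> $infos->{$a}{max_time}
        or $a cmp $b
} keys %$infos;

print join(',', $_, $infos->{$_}{max_time}), "\n" for @times;
```

The sort block compares the nested values through the references, while the list being sorted is still just the hash keys.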
