Integrated Computational Materials Engineering (ICME)

MATLAB Import Data

Abstract

This example shows how to import data from a text file into MATLAB. A file containing the positions of atoms is given, and the objective is to extract the useful information from it and omit the rest. The tutorial shows how to use MATLAB functions such as fgetl, textscan, fopen, fprintf and others.

You can download the text file for this tutorial using this link: Out.cu001_1.f

Author(s): Dmitry I. Zhuk, Mark A. Tschopp

Input File

A file with a fixed format is given, and the objective is to extract certain data from it, discarding all other information. The first 20 lines of the example file are shown below.

 ITEM: TIMESTEP
           1
 ITEM: NUMBER OF ATOMS
        5828
 ITEM: BOX BOUNDS
  0.000000000000000E+000   109.424979441872     
  0.000000000000000E+000   218.533081144200     
  0.000000000000000E+000   4.05000019073486     
 ITEM: ATOMS
       1   1     53.069549      5.989693      2.024924     -3.319741      0.153287
       2   1     52.569822      2.273330      2.024912     -3.141638     13.948742
       3   1      1.577682     42.407913      4.050134     -3.359999      0.000006
       4   1      3.599188     42.331114      2.025135     -3.360004      0.000006
       5   1      1.432760     38.355783      4.050126     -3.359985      0.000016
       6   1      3.525962     40.304560      4.050129     -3.360000      0.000010
       7   1      1.504881     40.382048      2.025129     -3.359994      0.000010
       8   1      3.453345     38.277394      2.025125     -3.359991      0.000016
       9   1      1.291089     34.301725      4.050119     -3.359945      0.000038
      10   1      3.381448     36.249498      4.050122     -3.359977      0.000024
      11   1      1.361451     36.329039      2.025123     -3.359969      0.000024
      ...

The useful information in this file is the number of atoms, the box bounds, the atom number (first column of the table), and the atom's x, y, and z coordinates (third, fourth, and fifth columns, respectively). MATLAB is used here to extract this information for future use.

MATLAB Script

The file was processed using the following MATLAB program. [Note: The working directory must be set to the folder that contains the text file being imported. Use the "cd {location}" command to change to that directory.]

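clear all, close all;
% Name of the data file (placeholder; replace with the actual file name)
filename = 'some.file.to.convert.f';
fid = fopen(filename);
% Skip "ITEM: TIMESTEP", the timestep value, and "ITEM: NUMBER OF ATOMS"
fgetl(fid);
fgetl(fid);
fgetl(fid);
% Read the number of atoms
tline = fgetl(fid);
a = textscan(tline, '%d');
atoms = a{1};
% Skip "ITEM: BOX BOUNDS"
fgetl(fid);
% Read the lower and upper box bounds in x, y, and z
tline = fgetl(fid);
a = textscan(tline,'%f %f');
xlo  = a{1}; xhi = a{2};
tline = fgetl(fid);
a = textscan(tline,'%f %f');
ylo  = a{1}; yhi = a{2};
tline = fgetl(fid);
a = textscan(tline,'%f %f');
zlo  = a{1}; zhi = a{2};
% Skip "ITEM: ATOMS"
fgetl(fid);
% Read the atom ID and x, y, z for each atom; columns 2, 6, and 7 are skipped
B = textscan(fid, '%d %*d %f %f %f %*f %*f', atoms);
atomID = B{1};
Xpos = B{2};
Ypos = B{3};
Zpos = B{4};
fclose(fid);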

The script and the commands it uses are explained below.

clear all, close all;
filename = 'some.file.to.convert.f';
fid = fopen(filename);

The first line clears all variables from the MATLAB workspace (clear all) and closes all open figure windows (close all). This is done to make sure that none of the previous work interferes with the processing. The second line assigns the name of the data file to the variable "filename", and the third opens the file and returns the file identifier "fid". This identifier is used later to tell each command which file to act on.
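If the file cannot be found, fopen does not raise an error; it returns -1 instead. The following is a minimal sketch of checking for that case before reading. The file name is hypothetical and chosen so that the open fails.

```matlab
% Hypothetical file name, used only to illustrate a failed open.
filename = 'file_that_does_not_exist.f';

% fopen returns -1 when the file cannot be opened; the second output
% holds a system-dependent error message.
[fid, msg] = fopen(filename, 'r');
if fid == -1
    fprintf('Could not open %s: %s\n', filename, msg);
else
    % ... read the file here ...
    fclose(fid);
end
```

Checking fid right after fopen makes a wrong working directory or a misspelled file name visible immediately, rather than as an error in a later fgetl call.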

fgetl(fid);
fgetl(fid);
fgetl(fid);
tline = fgetl(fid);

The following commands process the input file line by line. The first three fgetl calls skip the first three lines of the input file, because their contents are not needed. The fourth call stores the contents of the line in a temporary variable tline for later scanning.
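When more than a few lines have to be skipped, the repeated fgetl calls can be replaced by a loop. The sketch below assumes a temporary sample file containing only the first four lines of the input file; the names tmpfile and nskip are introduced here for illustration.

```matlab
% Write a small sample file so that the sketch is self-contained.
tmpfile = [tempname '.txt'];
fid = fopen(tmpfile, 'w');
fprintf(fid, ' ITEM: TIMESTEP\n');
fprintf(fid, '           1\n');
fprintf(fid, ' ITEM: NUMBER OF ATOMS\n');
fprintf(fid, '        5828\n');
fclose(fid);

% Skip the first nskip lines, then keep the next one.
nskip = 3;
fid = fopen(tmpfile, 'r');
for k = 1:nskip
    fgetl(fid);          % read one line and discard it
end
tline = fgetl(fid);      % fourth line: the number of atoms, still as text
fclose(fid);
delete(tmpfile);

disp(tline)
```

fgetl returns the line without its newline character, so tline holds only the padded number, ready to be passed to textscan.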

a = textscan(tline, '%d');
atoms = a{1};
fgetl(fid);
tline = fgetl(fid);
a = textscan(tline,'%f %f');
xlo  = a{1}; xhi = a{2};
tline = fgetl(fid);
a = textscan(tline,'%f %f');
ylo  = a{1}; yhi = a{2};
tline = fgetl(fid);
a = textscan(tline,'%f %f');
zlo  = a{1}; zhi = a{2};

The textscan command looks for data matching the given format in the saved line tline and returns it in a cell array; the contents of each cell are then assigned to a variable (here atoms, xlo, xhi, etc.). The format "%d" means that the line should contain a single number of type "d", which is a signed integer. Other commonly used formats include "%f" for floating-point numbers and "%s" or "%q" for character strings.
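As a small illustration of what textscan returns, the sketch below scans two lines copied from the sample file as character vectors. The result is always a cell array with one cell per field that is kept, which is why the script indexes it with curly braces.

```matlab
% Line holding the number of atoms, as returned by fgetl.
tline = '        5828';
a = textscan(tline, '%d');
atoms = a{1};            % contents of the first (and only) cell

% Line holding the box bounds in x.
tline = '  0.000000000000000E+000   109.424979441872     ';
a = textscan(tline, '%f %f');
xlo = a{1}; xhi = a{2};

class(a)                 % cell
class(atoms)             % int32, the type produced by '%d'
class(xhi)               % double, the type produced by '%f'
```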

A format can combine several fields of the same or different types. For example, the textscan command with the format "%d %d %f" looks for two integers followed by a floating-point number.

Some fields can also be excluded from the output by adding a "*" sign between % and the field type. The format "%d %*d %f" makes textscan look for three numbers in the line but leave the second one out of the results.
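The difference between the two formats can be seen on a short line. The sketch below uses a hypothetical three-field line (the ID, type, and x coordinate of atom 7 from the sample file); the variable names are chosen for illustration.

```matlab
% Hypothetical short line: atom ID, atom type, x coordinate.
row = '7 1 1.504881';

% Keep all three fields.
allFields = textscan(row, '%d %d %f');

% Read three fields, but leave the second one out of the result.
skipSecond = textscan(row, '%d %*d %f');

numel(allFields)     % 3 cells: ID, type, x
numel(skipSecond)    % 2 cells: ID, x
```

A skipped field still has to be present in the line and match its type; it is only omitted from the returned cell array.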

More information about field types and format syntax is available in the MATLAB help.

B = textscan(fid, '%d %*d %f %f %f %*f %*f', atoms);
atomID = B{1};
Xpos = B{2};
Ypos = B{3};
Zpos = B{4};
fclose(fid);

The textscan command can also process a fixed number of lines that share the same formatting. Since the number of remaining lines in this file equals the number of atoms, the previously stored variable "atoms" can be used to tell textscan how many lines to process.

Finally, the information in the temporary cell array B is copied to atomID, Xpos, Ypos, and Zpos to make it easier to use later. The last command closes the data file.
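The whole procedure can be tried without the original data file. The sketch below is one possible end-to-end variant, not the script above: it first writes a three-atom sample file with fprintf (values copied from the first rows of the input file, with the atom count reduced to 3 for illustration), then reads it back with the same fgetl and textscan calls. The names tmpfile, bounds, positions, and boxLength are assumptions introduced here, and the atom count is converted to double before being passed to textscan as a precaution.

```matlab
% Write a three-atom sample file in the same layout as the input file.
tmpfile = [tempname '.f'];
fid = fopen(tmpfile, 'w');
fprintf(fid, ' ITEM: TIMESTEP\n');
fprintf(fid, '           1\n');
fprintf(fid, ' ITEM: NUMBER OF ATOMS\n');
fprintf(fid, '           3\n');
fprintf(fid, ' ITEM: BOX BOUNDS\n');
fprintf(fid, '  0.000000000000000E+000   109.424979441872\n');
fprintf(fid, '  0.000000000000000E+000   218.533081144200\n');
fprintf(fid, '  0.000000000000000E+000   4.05000019073486\n');
fprintf(fid, ' ITEM: ATOMS\n');
% fprintf walks through the matrix column by column, hence the transpose.
fprintf(fid, '%8d %3d %13.6f %13.6f %13.6f %13.6f %13.6f\n', ...
    [1 1 53.069549  5.989693 2.024924 -3.319741  0.153287; ...
     2 1 52.569822  2.273330 2.024912 -3.141638 13.948742; ...
     3 1  1.577682 42.407913 4.050134 -3.359999  0.000006]');
fclose(fid);

% Read the file back.
fid = fopen(tmpfile, 'r');
for k = 1:3
    fgetl(fid);                          % skip three header lines
end
a = textscan(fgetl(fid), '%d');
atoms = double(a{1});                    % number of atoms
fgetl(fid);                              % skip "ITEM: BOX BOUNDS"
bounds = zeros(3, 2);                    % rows: x, y, z; columns: lo, hi
for k = 1:3
    a = textscan(fgetl(fid), '%f %f');
    bounds(k, :) = [a{1} a{2}];
end
fgetl(fid);                              % skip "ITEM: ATOMS"
B = textscan(fid, '%d %*d %f %f %f %*f %*f', atoms);
fclose(fid);
delete(tmpfile);

% Collect the results in a form that is convenient for later use.
atomID = B{1};
positions = [B{2} B{3} B{4}];            % one row per atom: x, y, z
boxLength = bounds(:, 2) - bounds(:, 1);
fprintf('%d atoms read; box lengths: %g %g %g\n', atoms, boxLength);
```

Storing the coordinates as one N-by-3 matrix instead of three separate vectors is a design choice of this sketch; it keeps each atom's coordinates together in a single row.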