
Software For Engineer Design

This document provides an overview of data acquisition, manipulation, and importing in Matlab. It discusses different file formats for datasets, including CSV and XLS files. It describes preprocessing data outside of Matlab using spreadsheet programs. The document reviews importing data into Matlab using the import wizard and discusses considerations for numerical versus string data types.

Uploaded by

Bayar Jargal
Copyright
© Attribution Non-Commercial (BY-NC)

www.zaluu.com

Software for Engineer Design

Lecture 08


Dataset

Subjects:
Tables of data
Importing data
Numerical data versus string data

Keywords:
Cell, header, non-numerical data, import, format.

Abstract
This lecture focuses on data acquisition, manipulation, plotting, and meshing.

8.1 Datasets
Finding data sets is not easy, and creating your own is just as difficult, especially when the data is geographical and/or social in nature. The U.S. government comprises hundreds of agencies that produce a vast amount of statistical data related to all sectors of society. A few selected sources follow:

U.S. Department of Education
U.S. Department of Health
U.S. Census Bureau
U.S. Department of Commerce
U.S. Department of Labor
Gateway to statistics from over 100 U.S. Federal Agencies

Some of these sites publish ready-made PDF files with tabulated entries. Others have custom-designed interfaces that allow access to data. Unfortunately, neither the sites, nor the ready-made material, nor the query systems follow any common guidelines.

Another source to consider is hard-copy publications, for example: Statistical Abstract of the United States: 2006, The National Data Book, 125th Edition, U.S. Census Bureau. This book contains statistics from all sectors of society, including consumption, production, education, disabilities, etc. It is available in the library.

8.2 Data Files


8.2.1 Tables of Data

Statistical data tends to come in a tabular format, e.g.:

Date        Location       Item      Price in cents
7/8/1997    New York       Apple     59
8/5/2000    Los Angeles    Banana    69
...         ...            ...       ...

The first row in this example is considered a "header", and the other rows are observations. Some more complicated tables may have several lines of headers, and may include subtables. For the purpose of using this data in Matlab, it is recommended that tables be reformatted as in this example. If necessary, remove undesired entries, move data around, and possibly merge or split data. Much of this can be done in a spreadsheet program (e.g. Microsoft Excel), if the data set is small enough.

8.2.2 File Format: CSV

Comma Separated Value (CSV) files are text files that contain human-readable data. Special delimiters (commas, tabs, carriage returns, quotes) are put in place for separating columns and rows. Glancing over such a data file may not reveal a sensible structure, but once loaded into a spreadsheet application, columns and rows can be identified more easily. A typical comma-separated file may look as follows:
Date,Location,Item,Price in cents
7/8/1997,New York,Apple,59
8/5/2000,Los Angeles,Banana,69
...

Note that every field in this example is separated by a comma, hence Comma Separated Value. Sometimes, individual values are enclosed in double quotes:

"Date","Location","Item","Price in cents"
"7/8/1997","New York","Apple","59"
"8/5/2000","Los Angeles","Banana","69"
...

Matlab is able to import from CSV files, as this is the most universally portable file format.

8.2.3 File Format: XLS

Proprietary spreadsheet file formats that Matlab can import include MS Excel XLS files. Unless errors occur when importing an XLS file, no pre-processing is required. Should errors occur, it is recommended to inspect the file in MS Excel and re-save it. If this does not help, the XLS data should be exported to CSV format within Excel and then imported into Matlab.

8.2.4 File Format: MAT

MAT is Matlab's own format for storing data. It is possible to save an entire Workspace of data, or selected matrices. It is unlikely that statistics are distributed in this format.

8.2.5 Size issues

MS Excel imposes a size limit on a single data set table. This limit is 65536 rows and 256 columns, i.e. 2^16 * 2^8 = 2^24 (16,777,216) cells. Matlab does not impose a pro forma limit on the dimensions and size of matrices. Keep in mind that hardware memory and hard disk size (for swapping memory) are the ultimate deciding factors of how much data can be loaded. For comparison, the amount of data that can reasonably be loaded into Matlab exceeds the capabilities of Excel by far. It is thus possible to read in numerical values for a million data rows of five columns.

8.2.6 Pre-processing data with Matlab

If the size of a data set exceeds the limitations of Excel or other applications, it must be pre-processed in Matlab. This may include one or more of the following steps:

Cleaning of data by removing rows.
Splitting data: if a data set contains data from several types of observations, filtering and splitting of the data may be necessary. This can be done by iterating over the data set and selectively moving data rows to other matrices. For example, if a dataset includes observations for U.S. states, U.S. regions, and U.S. cities, the three types may have to be moved to 3 different matrices.
Sorting data: data sets can be sorted using the command sortrows.

An example of processing data with Matlab can be found in the collection of M-files in the beginning of this lecture.
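The three steps can be sketched on a small made-up matrix. This is only an illustration, not the M-file referred to above: the type codes, the values, and all variable names are assumptions, and logical indexing is used in place of an explicit loop over the rows.

```matlab
% Hypothetical data set: column 1 is a type code (1 = state, 2 = region,
% 3 = city), column 2 is an observed value. NaN marks a missing value.
raw = [1 59; 3 NaN; 2 71; 1 42; 3 65; 2 68];

% Cleaning: remove every row that contains a missing value.
clean = raw(~any(isnan(raw), 2), :);

% Splitting: move each type of observation into its own matrix.
% Logical indexing stands in for a loop that copies rows one by one.
usStates  = clean(clean(:,1) == 1, :);
usRegions = clean(clean(:,1) == 2, :);
usCities  = clean(clean(:,1) == 3, :);

% Sorting: order the state rows by the value in column 2.
statesSorted = sortrows(usStates, 2);
disp(statesSorted);
```

Passing a column number as the second argument of sortrows selects the column to sort by; without it, rows are compared column by column starting at the first.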

8.2.7 Pre-processing data outside of Matlab

When possible, it is recommended to prepare datasets in a spreadsheet program before importing them into Matlab. While Matlab does have a spreadsheet-like editor, it is not meant to replace a spreadsheet program. To prepare data in a spreadsheet, keep in mind that each column should maintain the same data type (double, int, string, ...). Column (or row) headers should be distinguishable, and preferably one per column (or row). Try to refrain from merging cells.

8.3 Importing Data


Matlab has several command-line functions that can be used to import many data types, including CSV files. However, for simplicity we will use the graphical interface. In the "Current Directory" file listing, highlight a data file. At this point, we can either use "File->Import", or open the context menu and choose "Import Data".

Figure 8.1

Depending on the file type, the Import Wizard may start at different points in the import process. When importing from an XLS file, only the last of the Import Wizard screens appears. When importing from a CSV file, the process is slightly longer.

The first page of the wizard displays a portion of the text file, as well as a preview of the matrix version of the data.

Figure 8.2

The preview is broken into 2 spreadsheets: one for "data" and one for "textdata". "data" refers to numerical data, while "textdata" refers to anything that is not unambiguously numerical. That is, the strings "my house" and "60m" are considered "textdata", while "4" and "624.92746" are considered "data". Matlab distinguishes between the two and does not allow mixing of these data types in matrices. When importing a CSV file with numerical and textual data, Matlab thus splits the data and creates two matrices, one for each data type. More on the differences is discussed below.
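The distinction can be imitated with str2double, which returns NaN for any text that is not a single number. The sketch below only illustrates the idea. It is not how the Import Wizard decides internally, and the variable names are made up.

```matlab
% Entries as they might appear in a text file (examples from the text).
entries = {'my house', '60m', '4', '624.92746'};

% str2double returns NaN for anything it cannot read as one number.
values = str2double(entries);

looksNumeric = ~isnan(values);       % these would end up in "data"
numbers = values(looksNumeric);
texts   = entries(~looksNumeric);    % these would end up in "textdata"

disp(numbers);
disp(texts);
```

One limitation of this shortcut: an entry that literally reads NaN would be counted as text.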

Figure 8.3

Several data-specific decisions have to be made on the first page. Under "Select Column Separator(s)", select the character that delimits columns. In many cases, this is the comma. Under "Number of text header lines", select the number of non-data rows that appear at the beginning of your data file. The text header lines will then not appear in the numerical data matrix. Note that the preview on the first page may not accurately depict the final matrices. The second screen of the Import Wizard shows a more realistic version of the final matrices.

The second page of the Import Wizard displays a preview of the parsed matrices from the data file. From here, matrices can be renamed and excluded from the final import. It is recommended to inspect all matrices and their sizes before proceeding with the import. When ready, hit the "Finish" button.

Figure 8.4

Figure 8.5

After the import process is complete, the imported matrices will appear in the workspace. It is sometimes desirable to create sub-matrices out of the imported ones, especially if the import process did not successfully interpret all of the data. For example, the file regions.csv clearly contains column headers (region name), row headers (dates), and numerical values (price in cents). During the import process row and column headers were not identified, and instead were cast into one large text matrix. The following expressions dissect the text matrix for easy processing later:

The first row of the textdata matrix contains column headers, including the field "Date" and region names. We extract the region names:

regions=textdata(1,2:size(textdata,2));

The first column of the textdata matrix contains row headers, including the field "Date" and individual dates. We extract the dates:

dates=textdata(2:size(textdata,1),1);

There is no need to further process the data matrix.

Figure 8.6
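For readers who prefer the command-line route mentioned at the start of this section, a comparable result can be sketched with fgetl and textscan. The file written here is a tiny invented stand-in for regions.csv (two made-up regions, two dates), and the format string assumes exactly one text column followed by two numeric columns.

```matlab
% Write a tiny stand-in for regions.csv to a temporary file.
% The region names and prices are invented for this example.
fname = [tempname, '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Date,East,West\n');
fprintf(fid, '7/8/1997,59,61\n');
fprintf(fid, '8/5/2000,69,72\n');
fclose(fid);

fid = fopen(fname, 'r');
% First line: column headers. Split at the commas and skip "Date".
header  = strsplit(fgetl(fid), ',');
regions = header(2:end);
% Remaining lines: one text column (dates), then two numeric columns.
cols = textscan(fid, '%s %f %f', 'Delimiter', ',');
fclose(fid);
delete(fname);

dates = cols{1};              % cell array of date strings (row headers)
data  = [cols{2}, cols{3}];   % numeric matrix, one column per region
disp(data);
```

The resulting variables regions, dates, and data play the same roles as the ones extracted from textdata above. A file with more regions would need one more %f per column in the format string.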

8.4 Numerical data versus String data


Matlab uses several data types for differently typed data, and depending on the type, certain operations are allowed and others are not. The predominant type is "Double", which can be used for any numerical data: real, rational, and natural. "Uint8" is another numerical type, which is constrained to natural numbers in the range of 0..255. "Char" is a data type used for non-numerical data, i.e. textual data. Most imported data sets contain textual data, such as header lines. Manipulation of "Cell" data is somewhat different from numerical data. Below is a comparison of numerical versus textual data:

Operation                Numerical                               Textual
Scalar type              Double                                  Char
Assignment               a=5                                     a='hello'
Multidimensional type    Double                                  Cell
Vector                   b=[1,5,2]                               b={'abc','def','ghi'}
Indexing                 b(2)                                    b{2}
Matrix                   c=[1,2,3;4,5,6]                         c={'abc','def','ghi';'jkl','mno','pqr'}
Addition                 d=3+4                                   d=strcat('ab','cd')
Conversion in between    num2str(6)                              str2num('52.23')
(Example)                i=6; strcat('Hello Nr.', num2str(i))    s='52.23'; 5+str2num(s)

Table 8.1

Figure 8.7: Doubles, Chars, and Cells

Figure 8.8: Addition and Concatenation

There are many other functions by which char and cell can be manipulated. The main application for text manipulation for our purposes is plotting and meshing, especially for assigning x,y,z labels, titles, etc. For example, when column headers have been imported as textual data (cells), we are now able to manipulate and use them for the purpose of building bar graphs, plots, etc.

Figure 8.9: Indexing
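When header text is taken out of a cell array to build a label, two details are easy to miss. The short sketch below shows them. The header names are made up for illustration.

```matlab
% Column headers as they might arrive in a cell array (made-up names).
headers = {'Date', 'East Coast', 'West Coast'};

asCell = headers(2);   % parentheses: a 1x1 cell that still wraps the text
asChar = headers{2};   % curly braces: the char array itself

% strcat drops trailing whitespace of char inputs, so the space is lost.
t1 = strcat('Region: ', asChar);
% Square brackets concatenate the characters exactly as given.
t2 = ['Region: ', asChar];

disp(t1);
disp(t2);
```

The curly-brace form is the one to pass to functions such as title or xlabel when a single string is wanted. Square brackets are the safer choice whenever a separating space has to survive.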

8.5 Plotting
In Matlab, plotting refers to producing 2-dimensional graphs, while meshing refers to 3-dimensional graphs. Since a 2-dimensional graph is merely a collection of points, the command plot takes as input a vector and simply plots the numbers. Given vector Y, the command plot(Y) plots the points in the vector. Without passing a separate vector with x-values, each point in vector Y is mapped linearly to a point on the x-axis. For example, if Y = [10, 7, -9, 0, 1], then the corresponding X values are [1, 2, 3, 4, 5], respectively. If this scale is not desirable, an X vector with a different scale can be passed as an argument to the function plot.

Figure 8.10
Given vectors X and Y, where X contains regularly or irregularly spaced sample points on the X axis, and Y contains the corresponding values in the Y direction, the command plot(X,Y) plots a graph of Y with the scale of X.

Figure 8.11

For example, consider the following data points:

y=[16,50,70,104,106,104,95,80,67,59,87,124,153,157,144,127,...
109,90,71,100,134,163,178,179,174,161,141,117,93,76,89,105,...
123,140,153,156,144,128,106,86,65,48,30,17,24,29,25,21,16,7];

Plotted in the regular (default) scale, each data point is assigned an x value that increases by 1:

plot(y);

Figure 8.12

Given a different data range corresponding to the same y-data, the graph exhibits distinct differences:

x=[10,5,3,2,9,14,17,20,25,27,28,29,30,38,45,49,52,54,58,...
59,60,62,66,72,78,81,82,84,87,90,97,102,106,109,112,119,...
125,128,126,122,118,117,121,134,154,174,190,194,194,185];
plot(x,y);

Figure 8.13

String modifiers can be used to change color, data point, and line styles. A summary is given in Table 8.2.

Colors       Point style           Line style
b blue       . point               - solid
g green      o circle              : dotted
r red        x x-mark              -. dashdot
c cyan       + plus                -- dashed
m magenta    * star                (none) no line
y yellow     s square
k black      d diamond
             v triangle (down)
             ^ triangle (up)
             < triangle (left)
             > triangle (right)
             p pentagram
             h hexagram

Table 8.2

At most one modifier can be taken from each column and concatenated to result in a unique line/point/color style. For example:

plot(x,y,'r');

Figure 8.14

plot(x,y,'g:');

Figure 8.15

plot(x,y,'md:');

Figure 8.16

A figure's background color can be changed using the command whitebg. For example:

whitebg('k')

Figure 8.17

whitebg('y')

Figure 8.18

8.5.1 Bar Charts

2D bar graphs plot data points in terms of their area. Given some random data:

xBar=[1:10];
yBar=rand(1,10) * 100;

a bar chart is produced using:

bar(xBar,yBar);

Figure 8.19

2D bar charts can also be created for matrices, in which case each row in the matrix is considered as one group of bars. The resulting graph distinguishes matrix columns with different colors.

yBar=rand(7,3);
bar(yBar);

Figure 8.20

3D bar graphs are easily obtained from matrices as well, using the bar3 command:

bar3(yBar)

Figure 8.21

8.5.2 Labels

Common properties of all figures, whether 2D or 3D plots, bar graphs, meshes, etc., are axes labels and titles. Every figure should be properly labeled for clarity. Axes labels can be assigned using the commands xlabel, ylabel, or zlabel, selectively or in combination:

plot(x, y, 'r*'), xlabel('South'), ylabel('West')

Figure 8.22

A title is added by using the command title:

plot(x, y, 'r*'), xlabel('South'), ylabel('West'), title('Mysterious Constellation of a Waving Hand');

Figure 8.23

By default, the data range of x is used for labeling individual tick marks on the x-axis. Alternatively, named values can be used as replacements.

x=1:8;
y=rand(1,8) * 100;
plot(x,y);
set(gca, 'XTickLabel', {'Earth', 'Mercury', 'Saturn', 'Venus', 'Pluto', 'Neptune', 'Mars', 'Jupiter'})

The command set in this case changes the property XTickLabel of the axes handle gca (the current axes) to a vector of strings.

Figure 8.24

Using the data imported from regions.csv, we plot the data matrix and assign labels from the previously created vector dates:

plot(data(:,1));
set(gca, 'XTickLabel', dates);

This plot does not, however, exhibit the correct labels. Because of the large number of labels, Matlab decides to space the tick marks apart, seemingly irrationally.

Figure 8.25

To display all tick marks on the x-axis, the following series of commands is necessary:

plot(data(:,1));
set(gca, 'XTick', 1:length(dates));

However, this x-axis is not readable, which is why Matlab spread out the tick marks in the first place.

Figure 8.26

The following expressions help in spacing out the tick marks, while maintaining the correct index into the label vector:

plot(data(:,1));
set(gca, 'XTick', 20:50:length(dates));

Essentially, starting at label index 20, every 50th label index is used for tick marks.

Figure 8.27

To show the actual dates for these tick marks, we replace them using the XTickLabel feature:

plot(data(:,1));
set(gca, 'XTick', 20:50:length(dates));
set(gca, 'XTickLabel', dates(20:50:length(dates)));

Figure 8.28
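The index arithmetic behind these two set calls can be checked without drawing anything. In the sketch below, demoDates is a made-up stand-in for the imported dates vector, and its length of 200 is an assumption chosen only to keep the output short.

```matlab
% Stand-in for the imported date labels: 200 made-up strings.
n = 200;
demoDates = cell(n, 1);
for k = 1:n
    demoDates{k} = sprintf('day %d', k);
end

% Start at index 20 and take every 50th index up to n.
tickIdx = 20:50:n;
% Use the same index vector for the labels so positions and text agree.
tickLabels = demoDates(tickIdx);

disp(tickIdx);
disp(tickLabels);
```

Storing the index vector once and reusing it for both XTick and XTickLabel avoids the two expressions drifting apart when the spacing is changed later.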

In cases where the x-axis is labeled with long strings per tick mark, it is desirable to use slanted labels. While Matlab's plot function does not allow for rotation of labels, there exist functions that replace the mechanism by which labels are placed on the x-axis. One such function, available for download, is xticklabel_rotate.m.

Given an existing plot with numerical labels on the x-axis, the function xticklabel_rotate rotates all labels by 90°.

a=rand(1,30);
plot(a);
xticklabel_rotate;

Figure 8.29

To rotate the labels by a different amount, the angle in degrees can be passed as a second parameter. The first parameter in this example remains empty (empty set []). This signifies that the x-ticks or labels should not be changed, but merely rotated.

plot(a);
xticklabel_rotate([], 45);

Figure 8.30

Note: Once the function xticklabel_rotate has been applied to a given graph, it cannot be applied again. The plot command needs to be re-executed, and xticklabel_rotate needs to be called again.

If it is desirable to use different x-tick spacings, as discussed above (see Figures 8.26, 8.27, and 8.28), the function xticklabel_rotate can be used instead of the function set(gca, ...). xticklabel_rotate can set the vector of x-ticks as well as the labels, whether numerical or text. Using xticklabel_rotate on the dataset of gasoline prices, it makes sense to space the x-tick marks farther apart, because there are too many to fit on the x-axis. We pass an indexed vector as a first parameter to space out the x-tick marks:

plot(data(:,1));
xticklabel_rotate(20:15:size(data,1), 45);

This displays and rotates every 15th tick label on the x-axis, starting at index 20.

Figure 8.31

Finally, to display dates (textual data) as opposed to numerical labels, we pass the cell vector as a third parameter. The cell vector is indexed to match the x-tick vector (first parameter):

plot(data(:,1));
xticklabel_rotate(20:15:size(data,1), 45, dates(20:15:size(dates,1)));

Figure 8.32

8.5.3 Overlaying plots

Several plots can be placed in the same figure by overlaying them. The simplest approach is to plot a matrix of values, in which each column is interpreted as one vector.

y=rand(10, 3);
plot(y);

Figure 8.33

For the example of gasoline prices in regions.csv:

plot(data);

Figure 8.34

Alternatively, vectors can also be placed in the same graph individually by using the hold on and hold off functions:

hold on;
plot(data(:,2),'r');
plot(data(:,4),'g');
plot(data(:,6),'b');
plot(data(:,8),'y');
hold off;

Figure 8.35

For multi-line graphs, legends can be added for descriptive purposes. The function legend takes as many string arguments as there are plots, and assigns each string to a plot, in the order in which they were placed in the graph:

legend('the red graph', 'the green graph', 'the blue graph', 'the yellow graph');

Figure 8.36

Using the actual labels from textdata:

legend(regions(1,2:2:8));

Figure 8.37

8.5.4 Meshing (3D graphs)

3D graphs are generated using the function mesh or surf. Given a 2D matrix of values, each value is used as a z-value (elevation), and placed in a 3D view. Given a function of sine and cosine:

z=[];
for i=1:100
  for j=1:100
    z(i,j) = sin(i/10) + cos(j/10);
  end
end

mesh(z); creates a mesh (with holes)

Figure 8.38

surf(z); creates a mesh with filled patches (a surface).

Figure 8.39
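The same matrix can also be built without loops. The sketch below is an alternative formulation rather than the lecture's own code: it repeats the loop with a preallocated matrix, builds a vectorized version with ndgrid, and compares the two. The names zLoop, zVec, and maxDiff are chosen for illustration.

```matlab
% Loop version, as in the text, but with the matrix preallocated.
zLoop = zeros(100, 100);
for i = 1:100
    for j = 1:100
        zLoop(i,j) = sin(i/10) + cos(j/10);
    end
end

% Vectorized version: I(i,j) = i and J(i,j) = j at every grid position.
[I, J] = ndgrid(1:100, 1:100);
zVec = sin(I/10) + cos(J/10);

% Largest absolute difference between the two results.
maxDiff = max(abs(zLoop(:) - zVec(:)));
disp(maxDiff);
```

Either matrix can be passed to mesh or surf. ndgrid is used here because its first output varies along the rows, which matches the role of the index i in the loop.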

For the data set of gasoline prices per region, a viable mesh would be:

mesh(data);

Figure 8.40

Labels and titles can be added as appropriate:

mesh(data);
title('Gasoline prices in U.S. regions');
xlabel('Region');
ylabel('Date');
zlabel('Price in cents');
set(gca, 'XTick', 1:length(regions));
set(gca, 'XTickLabel', regions);
set(gca, 'YTick', 20:50:length(dates));
set(gca, 'YTickLabel', dates(20:50:length(dates)));

Figure 8.41

8.5.5 Multiple plots

Using the subplot function, it is possible to generate separate plots in a grid of figures. SUBPLOT(M, N, P) creates a grid of figures with M rows and N columns, and fills the Pth cell with the next figure. Example:

subplot(3, 2, 1), plot(rand(1, 10));
subplot(3, 2, 2), bar(rand(1, 10));
subplot(3, 2, 3), surf(rand(20));
subplot(3, 2, 4), hist(rand(50));
subplot(3, 2, 5), plot(sin([0:0.1:10]));
subplot(3, 2, 6), plot(rand(1,100),'gd:');

Figure 8.42
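The cell number P is counted row by row, starting at the top left of the grid. The following lines work out the row and column for each P in the 3-by-2 grid of the example. The variable names are chosen for illustration.

```matlab
M = 3;  N = 2;          % grid size used in the example above
P = 1:(M*N);            % all cell numbers

% subplot counts cells row by row, starting at the top left.
row = ceil(P / N);
col = mod(P - 1, N) + 1;

% One column per cell: cell number, row, column.
disp([P; row; col]);
```

For instance, subplot(3, 2, 4) addresses row 2, column 2, which is where the histogram of the example appears.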

Each figure can be assigned its own labels and titles:

subplot(3, 2, 1), plot(rand(1, 10)), xlabel('x-axis 1'), ylabel('y-axis 1'), title('random line');
subplot(3, 2, 2), bar(rand(1, 10)), xlabel('x-axis 2'), ylabel('y-axis 2'), title('random bars');
subplot(3, 2, 3), surf(rand(20)), xlabel('x-axis 3'), ylabel('y-axis 3'), zlabel('z-axis 3'), title('random surface');
subplot(3, 2, 4), hist(rand(50)), xlabel('x-axis 4'), ylabel('y-axis 4'), title('random histogram');
subplot(3, 2, 5), plot(sin([0:0.1:10])), xlabel('x-axis 5'), ylabel('y-axis 5'), title('sine wave');
subplot(3, 2, 6), plot(rand(1,100),'gd:'), xlabel('x-axis 6'), ylabel('y-axis 6'), title('Matrix, the Movie');

Figure 8.43


Links
http://202.5.195.17/emust/web/
http://uranchimeg.com/Education/?page_id=1534
http://www.aquaphoenix.com/
