2.1 Data Collection Techniques: Section Data Tools
TRADE
In the SDF format, each row contains one record and each field is a predefined size. No punctuation is used in the records, and each record ends with a carriage return and line feed. This format is particularly useful for working with columnar data. An example of a file in SDF format is shown below.

Smith     Tom     25     234-43-5547
Jones     John    41     442-78-4531
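As a rough illustration of how fixed-size fields can be read and written, the Python sketch below slices each SDF line at predefined column positions. The field names and widths in SDF_LAYOUT are assumptions made for this example; an actual file defines its own layout.

```python
# Hypothetical layout for illustration: (field name, width in characters).
SDF_LAYOUT = [("last_name", 10), ("first_name", 10), ("age", 3), ("ssn", 11)]


def format_sdf_line(record, layout=SDF_LAYOUT):
    """Pad (or truncate) each field to its predefined width, with no separators."""
    return "".join(str(record[name]).ljust(width)[:width] for name, width in layout)


def parse_sdf_line(line, layout=SDF_LAYOUT):
    """Slice one fixed-width line into a dict of field values, trimming padding."""
    record = {}
    start = 0
    for name, width in layout:
        record[name] = line[start:start + width].strip()
        start += width
    return record


rows = [
    {"last_name": "Smith", "first_name": "Tom", "age": "25", "ssn": "234-43-5547"},
    {"last_name": "Jones", "first_name": "John", "age": "41", "ssn": "442-78-4531"},
]

# Each record ends with a carriage return and line feed.
sdf_text = "".join(format_sdf_line(row) + "\r\n" for row in rows)

# splitlines() removes the CR/LF record terminators before slicing.
parsed = [parse_sdf_line(line) for line in sdf_text.splitlines()]
```

Because every field occupies a fixed span of columns, the reader needs the layout in advance; nothing in the file itself marks where one field ends and the next begins.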
The CDF format is widely used in spreadsheets and databases. In this format, each row again contains one record. However, fields are separated by a character, usually a comma, and character data are enclosed by punctuation marks, usually double quotes. Leading or trailing blanks in the data are trimmed off (i.e., fields may be of varying lengths). Again, each record ends with a carriage return and line feed. The above file in CDF format, with comma separators and double quote delimiters, would be captured as:

"Smith", "Tom", 25, "234-43-5547"
"Jones", "John", 41, "442-78-4531"

Although commas are most commonly used as separators and double quotes as delimiters, many software packages allow the user to specify the characters that are used. In some cases, use of an alternate character may be preferable. For example, FoxPro can output a tab-delimited file, which is particularly useful for data that will be imported into a word processor.
In some software packages, the file extension is important when creating a text file for import. For example, dBASE requires a .TXT extension for files that are being imported.
Some general hints on downloading data from one system to another:
1. Make a backup copy of the downloaded file and the file it will be imported into BEFORE you do anything else.
2. Many database systems include some type of unique identifying field for each record, e.g., an index number. If this field is accessible, including it in the download may make it easier to perform later downloads that update information or add fields from the host database.
3. When data are moved between software packages using formatted ASCII files, each field will be imported into a different column in a spreadsheet or into a different field in a database. If you are importing data into an existing database or spreadsheet, the order of the fields in the downloaded file needs to match the order of the fields in the existing database. The field sizes in the existing database must be at least as large as the corresponding fields in the downloaded data.
4. "Layered" spreadsheets (e.g., multidimensional .WK3 worksheets) do not import well into other programs. Frequently the layers are lost. It is better to save the spreadsheet in a different format and then try to import it.
5. Print a sample of the downloaded file (use a small font). Even though you can view the file on screen, some problems in data continuity are more apparent on the printed page.
6. If the file to be imported contains a large number of fields or extensive narrative data, keep in mind that each line in the file will be treated as a separate record. Depending on the system or the software package being used, there may be a limit to the line length. If this is the case, the data for a single record may wrap to more than one line, and subsequent lines need to be associated with the main record line during the import process.
Downloading Data from PIDS
The current design of DOE's Performance Indicator Data System (PIDS) provides the capability of downloading data in a formatted ASCII data file (CDF format). This data file is compatible with most spreadsheet and database programs. Selecting the delimited ASCII file download from the PIDS report option automatically creates a file in the same format as the upload file that is used for submitting data to PIDS. Each line is a separate record. Fields are separated by commas, and all data are delimited with double quotes. All data in PIDS are treated as character data.
The format for a performance indicator (PI) data record in PIDS is as follows:

"year-quarter", "facility or contractor", "PI", "PI value", "change flag", "PI narrative"

The characteristics of the fields are as follows:

Year-Quarter     Character (4)
Facility         Character (20)
PI number        Character (8)
PI value         Character (14)
Change flag      Character (2)
Narrative        Character (Unlimited)
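The PI record layout above can be read with a general-purpose CSV parser. The Python sketch below is one possible approach, not a PIDS utility: it uses the standard csv module, keeps every field as character data, and checks each field against the listed widths. The field names in PI_FIELDS and the values in the sample record are made up for illustration.

```python
import csv
import io

# Field names (hypothetical) and maximum widths from the PI record description.
# None marks the unlimited-length narrative field.
PI_FIELDS = [
    ("year_quarter", 4),
    ("facility", 20),
    ("pi_number", 8),
    ("pi_value", 14),
    ("change_flag", 2),
    ("narrative", None),
]


def parse_pi_records(text):
    """Parse comma-delimited, double-quoted PI records into dicts of strings."""
    records = []
    # skipinitialspace lets the parser accept a blank after each comma.
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(PI_FIELDS):
            raise ValueError(f"line {line_no}: expected {len(PI_FIELDS)} fields, got {len(row)}")
        record = {}
        for (name, width), value in zip(PI_FIELDS, row):
            if width is not None and len(value) > width:
                raise ValueError(f"line {line_no}: {name} is longer than {width} characters")
            record[name] = value
        records.append(record)
    return records


# Made-up sample record; real year-quarter and change flag codes may differ.
sample_text = '"9502", "Example Facility", "12", "3.5", "", "Made-up narrative, with a comma"\r\n'
pi_records = parse_pi_records(sample_text)
```

The width check mirrors the earlier hint that fields in the receiving database must be at least as large as the downloaded fields, and the quoted narrative shows why delimiters matter: a comma inside the quotes does not start a new field.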
PI numbers or identifiers are stored without decimal points, e.g., PI 1.2 is stored as 12. All PI values are stored in PIDS as character data. In some cases, data may not be available. In the records where values are not available, the value is replaced by a code as follows:

-1     Currently unavailable (CU)
-2     Not available - security concerns (NAS)
-3     Not applicable (NA)

The format for a PIDS root cause (RC) data record is as follows:

"year-quarter", "facility or contractor", "RPI", "root cause", "RC value", "change flag", "PI narrative"

The characteristics of the fields are as follows:

Year-Quarter     Character (4)
Facility         Character (20)
RPI number       Character (8)
Root cause       Character (4)
RC value         Character (14)
Change flag      Character (2)
Narrative        Character (Unlimited)
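Because PI values arrive as character data and may carry one of the unavailable-value codes listed above, an import routine typically has to separate real values from codes before any analysis. The Python sketch below shows one way to do that. It assumes that reported values are otherwise numeric, which the text above does not state, and the function names are hypothetical.

```python
# Codes that replace a PI value when no value is available.
UNAVAILABLE_CODES = {
    "-1": "Currently unavailable (CU)",
    "-2": "Not available - security concerns (NAS)",
    "-3": "Not applicable (NA)",
}


def decode_pi_value(raw):
    """Return (value, None) for a reported value or (None, reason) for a code.

    Assumption: any value that is not a code can be parsed as a number.
    """
    text = raw.strip()
    if text in UNAVAILABLE_CODES:
        return None, UNAVAILABLE_CODES[text]
    return float(text), None


def stored_pi_number(pi):
    """Drop the decimal point from a PI identifier, e.g. '1.2' becomes '12'."""
    return pi.replace(".", "")


value, reason = decode_pi_value("-2")
reported, no_reason = decode_pi_value(" 3.5 ")
```

Keeping the reason alongside the value lets later calculations skip unavailable entries instead of treating -1, -2, or -3 as measurements.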
The RPI number is the PI number or identifier, preceded by the letter "R", e.g., R12.
When an error is detected after the data submission deadline, an errata form must be approved and submitted in order for data to be changed in PIDS. The format for an errata record is as follows:

"year-quarter", "facility or contractor", "PI", "old PI value", "new PI value", "change flag", "PI narrative", "errata basis"

The characteristics of the fields are as follows:

Year-Quarter     Character (4)
Facility         Character (20)
PI number        Character (8)
Old PI value     Character (14)
New PI value     Character (14)
Change flag      Character (2)
Narrative        Character (Unlimited)
Errata basis     Character (Unlimited)
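As an illustration of the errata layout and the RPI naming rule, the Python sketch below builds one errata line with every field enclosed in double quotes and terminated by a carriage return and line feed. It is a sketch only: the field names and sample values are made up, and the output has no blank after each comma, which is an assumption about what the receiving system accepts.

```python
import csv
import io

# Field names (hypothetical) and maximum widths from the errata description.
# None marks an unlimited-length field.
ERRATA_FIELDS = [
    ("year_quarter", 4),
    ("facility", 20),
    ("pi_number", 8),
    ("old_pi_value", 14),
    ("new_pi_value", 14),
    ("change_flag", 2),
    ("narrative", None),
    ("errata_basis", None),
]


def rpi_number(pi):
    """Form the root cause identifier: 'R' plus the PI number without its decimal point."""
    return "R" + pi.replace(".", "")


def format_errata_record(record):
    """Return one errata line with all fields double-quoted and a CR/LF ending."""
    for name, width in ERRATA_FIELDS:
        if width is not None and len(record[name]) > width:
            raise ValueError(f"{name} is longer than {width} characters")
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerow([record[name] for name, _ in ERRATA_FIELDS])
    return buffer.getvalue()


# Made-up sample values for illustration only.
errata = {
    "year_quarter": "9502",
    "facility": "Example Facility",
    "pi_number": "12",
    "old_pi_value": "3.5",
    "new_pi_value": "4.0",
    "change_flag": "",
    "narrative": "Made-up narrative",
    "errata_basis": "Made-up basis",
}
errata_line = format_errata_record(errata)
```

Quoting every field, including empty ones, keeps the record consistent with the rule that all PIDS data are delimited with double quotes.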
The Internet has also become a valuable tool for collecting, capturing, and sharing information from different sources. The Internet provides many capabilities, including the capability to transfer data files electronically. Large amounts of data can be transferred from one location to another in a matter of seconds. This capability can improve the timeliness of obtaining the information necessary to support organizational performance measurement analyses. Many books and manuals are available that provide information on use of the Internet.