loader

Month: #!31Fri, 23 Aug 2019 10:15:07 -0500p0731#31Fri, 23 Aug 2019 10:15:07 -0500p-10America/Guayaquil3131America/Guayaquilx31 23am31am-31Fri, 23 Aug 2019 10:15:07 -0500p10America/Guayaquil3131America/Guayaquilx312019Fri, 23 Aug 2019 10:15:07 -05001510158amFriday=345#!31Fri, 23 Aug 2019 10:15:07 -0500pAmerica/Guayaquil8#August 2019#!31Fri, 23 Aug 2019 10:15:07 -0500p0731#/31Fri, 23 Aug 2019 10:15:07 -0500p-10America/Guayaquil3131America/Guayaquilx31#!31Fri, 23 Aug 2019 10:15:07 -0500pAmerica/Guayaquil8#

Issues with delimiters within flat files

When working with historical information from one or more older systems, which I will call as Legacys, there is a good chance that the information stored in these systems has some kind of error.

Occasionally, the user can save information with formatting errors, special characters, or any value that the system should validate before saving but is allowed to pass. On the other hand, it can happen that the stored information wants to be extracted to use it as reports, but, at the moment of extracting it, it can be noticed how the existence of certain characters within the information cause reading, loading and reporting errors.

When exporting a given range of columns from Legacy for use, it may be the case that the number of columns in the file does not match the number of items in the file header. For example:

DateAmountDescriptionInvoiceID
2019/09/23$933.02Sale of supplies4253423531
2019/09/24$375.84Wiring Service2342432
2019/09/26$-583.73Payment of: Water, Electricity, Telephone3225253
2019/09/28$739.98, Auto Consumption2352344

In this table we can see how line number 3 and 4 present the comma symbol ( ,) within their information. If this file is exported for reporting or auditing purposes to a .csv format or a .txt format, a delimiter would be used, this is known as a plain text file.

In the case where the delimiter is a comma to separate each field. The original file would be read.

 Date , Amount , Description, Invoice ,ID
2019/09/23 , $933.02 , Sale of supplies , 425342353 , 1
2019/09/24 , $375.84 , Cabling Service , 234243 , 2
2019/09/26 , $-583.73 , Payment of: Water, Electricity, Telephone , 322525 , 3
2019/09/28 , $739.98 , , Auto consumption , 235234 , 4

Let's take a look at the following information line:
2019/09/26 , $-583.73 , Payment of: Water, Electricity, Telephone , 322525 , 3

As we can see, the comma is inside the description field, this field can be typed by a user manually and the comma is present to separate elements. The problem within this scenario arises when you want to read this information.

The programs that read and use this type of files ask to define the delimiter with which they must separate this long list of characters. If it is configured with a comma, it would read as follows:

DateAmountDescriptionInvoiceID
2019/09/23$933.02Sale of supplies4253423531
2019/09/24$375.84Wiring Service2342432
2019/09/26$-583.73Payment of: Water Electricity Phone 322525 3
2019/09/28$739.98 Auto Consumption 235234 4

The comma found within the information is read as a delimiter, causing the description column to run to the right and causing what is known as a Inconsistency of columns this causes the readers to show errors or when loading the information, its integrity is affected.

To solve this type of problems 2 possible solutions can apply. In the first instance you can use a delimiter that is very unlikely to be found in the information. For example a vertical bar ( | ). The data would be read:

Date| Amount| Description | Invoice |ID
2019/09/23 | $933.02 |  Sale of supplies  | 425342353 | 1 
2019/09/24 | $375.84 | Wiring Service | 234243 | 2 
2019/09/26 | $-583.73 | P Payment of: Water, Electricity, Telephone | 32252525 | 3 
2019/09/28 | $739.98 | | ,Auto consumption | 235234 | 4 

In this way we would avoid separation within the description. The second recommendation is based on enclosing each column in single quotes (' ') or double quotes.

"Date" | "Amount" | "Description"| "Invoice" | "ID"
"2019/09/23" | "$933.02" | "Sales of supplies" | "425342353" | "1" 
 "2019/09/24" | "$375.84" | "Cabling Service" | "234243" | "2" 
 "2019/09/26" | "$-583.73" | "Payment of: Water, Electricity, Telephone" | "322525" | "3" 
 "2019/09/28" | "$739.98" | ",Auto consumption" | "235234" | "4" 

In this way the delimiters within the quotation marks are omitted to keep the information consistent and can be read correctly.

In the last line:
2019/09/28′ | '$739.98' | ',Auto consumption'| '235234' | '4'.
We note that there is a comma within ',Auto consumption'.. For these cases it is necessary to apply a cleaning of information so that these characters that have no reason to be, disappear.

en_USEN
es_ESES en_USEN