Problemas con delimitadores dentro de archivos planos
When working with historical information from one or several old systems, which I will call Legacy, there is a great possibility that the information stored in these systems may have some kind of error.
Sometimes, the user may save information with errors, whether they are formatting errors, special characters or any value that the system should validate before saving but is allowed to pass. On the other hand, it may happen that the stored information wants to be extracted to be used as reports, but, at the moment of extracting it, it can be noticed how the existence of certain characters within the information causes reading, loading, and reporting errors.
When exporting a given range of columns from a Legacy for use, it may be the case that the number of columns in the file does not match the number of elements in the file header. For example:
Fecha | Monto | Descripción | Factura | ID |
2019/09/23 | $933.02 | Venta de insumos | 425342353 | 1 |
2019/09/24 | $375.84 | Servicio de Cableado | 234243 | 2 |
2019/09/26 | $-583.73 | Pago de: Agua, Luz, Teléfono | 322525 | 3 |
2019/09/28 | $739.98 | , Auto consumo | 235234 | 4 |
In this table, we can see how lines number 3 and 4 present the comma symbol ( ,) within their information. If this file were to be exported for reporting or auditing purposes to a .csv or .txt format, a delimiter would be used, this is known as a plain text file.
In the case where the delimiter is a comma to separate each field. The original file would read.
Date , Amount , Description , Invoice ,ID
2019/09/23 , $933.02 , Sale of supplies , 425342353 , 1
2019/09/24 , $375.84 , Wiring Service , 234243 , 2
2019/09/26 , $-583.73 , Payment of: Water, Electricity, Telephone , 322525 , 3
2019/09/28 , $739.98 , , , Self consumption , 235234 , 4
Let’s take main attention on the following line of information:
2019/09/26 , $-583.73 , Payment of: Water, Electricity, Phone , 322525 , 3
As we can see, the comma is inside the description field, this field can be typed by a user manually and the comma is present to separate elements. The problem in this scenario arises when trying to read this information.
The programs that read and use this type of file ask to define the delimiter with which they must separate this long list of characters. If it is configured with a comma, it would be read as follows: