I am excited to announce my third course on the data transformation and analytics tool, Alteryx#Designer.
With this course I close the series of three levels, Beginner, Intermediate and Advanced.
Do not forget to download your at the end of each level.
El curso avanzado abarca la utilización de 3 tipos de #macros to further automate the data cleaning and data quality processes. In addition, direct connections to databases will be used for total control over the information in real time. #realtimedata .
The course starts with a summary of the basic concepts exposed in past courses as a reminder, and then we will go into the formulas to see how they work at a deep level but with clear examples to understand the syntax of the tool.
Later, we will build workflows based on real situations and past experiences. We will take the best practices to solve common problems that the nature of information usually presents to obtain clean, orderly and high value information to work with.
Finally, when we obtain clean information, we will proceed to validate it to ensure that its degree of purity is the highest and identify those cases that need to be adjusted to make our workflow more intelligent.
I share with you my first course in Udemy.com which covers the initial bases necessary to be able to use the Alteryx Designer tool.
This course was created to instruct Spanish speakers in the use of this useful big data management tool. Many courses of different levels are available to the public, but at least on the Udemy platform this is the first to be taught natively in Spanish. In this way I seek to break down barriers for all those people whose language is a limitation.
From personal experience, this software is quite useful, and you can visually automate very complex processes. This is an added value for any project that involves an exorbitant amount of data, that has diverse sources of information, that requires trial and error to be able to execute the necessary actions and many more scenarios in which the common methods require complex processes.
This course covers the basic use of Alteryx Desginer oriented to beginners who wish to be introduced to the tool. The use of Alteryx optimizes the working time when transforming data.
It is an excellent tool when the data sources are distributed in several sources such as
-Multiple Excel files -Varias tablas de una base de datos -Manual input of information -Combinations of sources mentioned
It also facilitates the development of solutions by being a visual tool, and its fast processing optimizes the processes.
In this course, it is explained the characterization and operation of different components of the tool in order to understand the operation of the software and to be able to do multiple actions simultaneously. In addition to its use, different configurations will be explained in detail to obtain accurate results and valuable and refined information.
The generated reports can be exported in different formats to generate value and the processed information can be consulted on demand once the workflow has been saved, i.e. it can be executed again and again regardless of whether the source is updated, i.e. in real time.
Every day, more and more people sign up for it, and I'm glad they have a good experience. I will continue to update this course as needed and hope it will be the first of many.
REgex is an abbreviation for Regular Expression. REgex is a character pattern that is used to search for a specific text or a portion of text.
The purpose of the regex is usually to search for text matches and/or make replacements on this text. When working with a large text document or an extensive log file. REgex allows you to quickly search for the pattern you need.
The first thing we need to understand is that every letter, number and symbol is a character. A string is made up of several characters to form words or keys.
In REgex different symbols are used to identify the pattern we are looking for.
\d allows us to search for any number from 0 to 9.
\D will Match with any character other than a number.
In the image on the left the Regex matched the first number it found. In the middle, with 3 numbers in a row. And on the right, with the first character that was not numerical.
The tool allows to make tests in real time, selecting the text that coincides with the regex expression that we put as input.
Regex has several text identification tools. The important thing is to understand their syntax. There are several ways to search for the same character set. And it is up to each one to find the easiest way to do it.
Since in Regex the symbols on the keyboard have their own meaning, it is imperative to use an 'escape character' to be able to search for a symbol. If our expression looks for a point (.) we need to use the following syntax \. Otherwise the dot (.) itself is used in regex to search for 'any character'.
Another important example is inclusive and exclusive groups. This is done by using straight keys [ ].
Likewise the symbol (^) to deny a data set. Taking into consideration the previous example:
Below is a list of the syntax along with a link to a page that provides REgex exercises for practice.
When working with historical information from one or more older systems, which I will call as Legacys, there is a good chance that the information stored in these systems has some kind of error.
Occasionally, the user can save information with formatting errors, special characters, or any value that the system should validate before saving but is allowed to pass. On the other hand, it can happen that the stored information wants to be extracted to use it as reports, but, at the moment of extracting it, it can be noticed how the existence of certain characters within the information cause reading, loading and reporting errors.
When exporting a given range of columns from Legacy for use, it may be the case that the number of columns in the file does not match the number of items in the file header. For example:
Sale of supplies
Payment of: Water, Electricity, Telephone
, Auto Consumption
In this table we can see how line number 3 and 4 present the comma symbol ( ,) within their information. If this file is exported for reporting or auditing purposes to a .csv format or a .txt format, a delimiter would be used, this is known as a plain text file.
In the case where the delimiter is a comma to separate each field. The original file would be read.
Let's take a look at the following information line: 2019/09/26 , $-583.73 , Payment of: Water, Electricity, Telephone , 322525 , 3
As we can see, the comma is inside the description field, this field can be typed by a user manually and the comma is present to separate elements. The problem within this scenario arises when you want to read this information.
The programs that read and use this type of files ask to define the delimiter with which they must separate this long list of characters. If it is configured with a comma, it would read as follows:
Sale of supplies
Payment of: Water
The comma found within the information is read as a delimiter, causing the description column to run to the right and causing what is known as a Inconsistency of columns this causes the readers to show errors or when loading the information, its integrity is affected.
To solve this type of problems 2 possible solutions can apply. In the first instance you can use a delimiter that is very unlikely to be found in the information. For example a vertical bar ( | ). The data would be read:
In this way the delimiters within the quotation marks are omitted to keep the information consistent and can be read correctly.
In the last line: ‘2019/09/28′ | '$739.98' | ',Auto consumption'| '235234' | '4'. We note that there is a comma within ',Auto consumption'.. For these cases it is necessary to apply a cleaning of information so that these characters that have no reason to be, disappear.