Data transformation (computing): Difference between revisions

No edit summary
Tags: Mobile edit Mobile web edit
Common nouns are not capitalised in English
Line 12:
When the data mapping is indirect via a mediating [[data model]], the process is also called '''data mediation'''.
 
==Data transformation process==
Data transformation can be divided into the following steps, each applicable as needed based on the complexity of the transformation required.<br>
 
Line 35:
'''Data review''' is the final step in the process, which focuses on ensuring the output data meets the transformation requirements. It is typically the business user or final end-user of the data that performs this step. Any anomalies or errors found in the data are communicated back to the developer or data analyst as new requirements to be implemented in the transformation process.<ref name="cio.com"/>
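
The review step can be illustrated with a minimal sketch in which output records are checked against a set of requirements and every violation is collected for reporting back. The function name <code>review_output</code>, the rule names, and the sample records are hypothetical and chosen only for illustration; real requirements depend on the transformation in question.

```python
from typing import Callable

# Hypothetical type: a requirement is a predicate over one output record.
Requirement = Callable[[dict], bool]


def review_output(records: list[dict],
                  requirements: dict[str, Requirement]) -> list[tuple[int, str]]:
    """Return (record index, failed rule name) pairs for every violation found."""
    anomalies = []
    for index, record in enumerate(records):
        for name, check in requirements.items():
            if not check(record):
                anomalies.append((index, name))
    return anomalies


# Illustrative requirements, not taken from any real specification.
requirements = {
    "amount_is_non_negative": lambda r: r["amount"] >= 0,
    "currency_is_three_letters": lambda r: len(r["currency"]) == 3,
}

# Illustrative transformation output.
output = [
    {"amount": 10.0, "currency": "USD"},
    {"amount": -5.0, "currency": "EUR"},
    {"amount": 3.5, "currency": "EURO"},
]

# Each entry identifies a record and the rule it failed, which could then be
# passed back to the developer or data analyst as a new requirement.
anomalies = review_output(output, requirements)
```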
 
==Types of data transformation==
 
===Batch data transformation===
Traditionally, data transformation has been a bulk or batch process,<ref name="tdwi.org">TDWI. 10 Rules for Real-Time Data Integration. Retrieved from: https://tdwi.org/Articles/2012/12/11/10-Rules-Real-Time-Data-Integration.aspx?Page=1</ref> whereby developers write code or implement transformation rules in a data integration tool, and then execute that code or those rules on large volumes of data.<ref name="andrefreitas.org">Tope Omitola, André Freitas, Edward Curry, Sean O'Riain, Nicholas Gibbins, and Nigel Shadbolt. Capturing Interactive Data Transformation Operations using Provenance Workflows. Retrieved from: http://andrefreitas.org/papers/preprint_capturing%20interactive_data_transformation_eswc_highlights.pdf</ref> This process can follow the linear set of steps as described in the data transformation process above.
 
Line 44:
When data must be transformed and delivered with low latency, the term “microbatch” is often used.<ref name="tdwi.org"/> This refers to small batches of data (e.g. a small number of rows or small set of data objects) that can be processed very quickly and delivered to the target system when needed.
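
A minimal sketch of the microbatch idea is given here, assuming rows arrive as an iterable of dictionaries. The helper <code>microbatches</code>, the <code>transform</code> rule, and the sample rows are hypothetical; an actual data integration tool would supply its own batching and delivery mechanisms.

```python
from itertools import islice
from typing import Iterable, Iterator


def microbatches(rows: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most batch_size rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def transform(row: dict) -> dict:
    # Hypothetical rule: trim whitespace and normalise the name to title case.
    return {**row, "name": row["name"].strip().title()}


source = [
    {"name": " ada "},
    {"name": "GRACE"},
    {"name": "alan"},
    {"name": "edsger "},
    {"name": "barbara"},
]

# Each small batch is transformed and "delivered" as soon as it is ready,
# rather than waiting for the whole data set to be processed.
delivered = []
for batch in microbatches(source, batch_size=2):
    delivered.append([transform(row) for row in batch])
```

The batch size is the main design choice: smaller batches tend to reduce latency at the cost of more per-batch overhead.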
 
===Benefits of batch data transformation===
Traditional data transformation processes have served companies well for decades. The various tools and technologies (data profiling, data visualization, data cleansing, data integration, etc.) have matured, and most (if not all) enterprises transform enormous volumes of data that feed internal and external applications, data warehouses and other data stores.<ref name="The Value of Data Transformation">The Value of Data Transformation</ref>
 
===Limitations of traditional data transformation===
This traditional process also has limitations that hamper its overall efficiency and effectiveness.<ref name="cio.com"/><ref name="livinglab.mit.edu"/><ref name="andrefreitas.org"/>
 
Line 58:
Some companies provide self-service data transformation tools that aim to analyze, map and transform large volumes of data efficiently, without the technical and process complexity that currently exists. While these companies use traditional batch transformation, their tools enable more interactivity for users through visual platforms and easily repeated scripts.<ref>{{Cite news|url=https://www.datanami.com/2016/05/31/self-service-prep-killer-app-big-data/|title=Why Self-Service Prep Is a Killer App for Big Data|date=2016-05-31|work=Datanami|access-date=2017-09-20|language=en-US}}</ref>
 
===Interactive data transformation===
Interactive data transformation (IDT)<ref>Tope Omitola, André Freitas, Edward Curry, Sean O’Riain, Nicholas Gibbins, and Nigel Shadbolt. Capturing Interactive Data Transformation Operations using Provenance Workflows. Retrieved from: http://andrefreitas.org/papers/preprint_capturing%20interactive_data_transformation_eswc_highlights.pdf</ref> is an emerging capability that allows business analysts and business users to interact directly with large datasets through a visual interface,<ref name="digital.lib.washington.edu"/> understand the characteristics of the data (via automated data profiling or visualization), and change or correct the data through simple interactions such as clicking or selecting certain elements of the data.<ref name="livinglab.mit.edu"/>
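
The automated data profiling mentioned above can be sketched, under simplifying assumptions, as a per-column summary of missing and distinct values. The function <code>profile_column</code> and the sample rows are hypothetical, and the sketch does not represent how any particular IDT product computes its profiles.

```python
from collections import Counter


def profile_column(rows: list[dict], column: str) -> dict:
    """Summarise one column: row count, missing values, distinct values, top value."""
    values = [row.get(column) for row in rows]
    # Treat both absent keys and empty strings as missing (an assumption).
    present = [v for v in values if v is not None and v != ""]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }


# Illustrative data with inconsistent casing and missing entries.
rows = [
    {"country": "US"},
    {"country": "us"},
    {"country": ""},
    {"country": "US"},
    {},
]

profile = profile_column(rows, "country")
```

A summary of this kind would flag the inconsistent casing ("US" and "us" counted as two distinct values) and the missing entries, which a user could then correct through the visual interface.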
 
Line 99:
 
==See also==
 
* [https://en.wikiversity.org/wiki/Digital_Libraries/File_formats,_transformation,_migration File Formats, Transformation, and Migration] (related Wikiversity article)
* [[Data cleansing]]