No edit summary
m →Databases: HTTP to HTTPS for Brown University
(10 intermediate revisions by 5 users not shown)
Line 2:
{{Short description|Software engineering approach to designing and developing information systems}}
{{Use mdy dates|date=August 2021}}
'''Data engineering'''
== History ==
Around the 1970s/1980s the term ''
In the early 2000s, data and data tooling were generally held by the [[information technology]] (IT) teams in most companies.<ref name="hist2">{{cite web |last1=Dodds |first1=Eric |title=The History of the Data Engineering and the Megatrends |url=https://www.rudderstack.com/blog/the-data-engineering-megatrend-a-brief-history |website=Rudderstack |access-date=31 July 2022}}</ref> Other teams then used the data for their work (e.g. reporting), and there was usually little overlap in data skills between these parts of the business.
Line 21:
==== Databases ====
If the data is structured and some form of [[online transaction processing]] is required, then [[database]]s are generally used.<ref name="mit">{{cite web |title=Lecture Notes {{!}} Database Systems {{!}} Electrical Engineering and Computer Science {{!}} MIT OpenCourseWare |url=https://ocw.mit.edu/courses/6-830-database-systems-fall-2010/pages/lecture-notes/ |website=ocw.mit.edu |access-date=31 July 2022}}</ref> Originally, mostly [[relational database]]s were used, with strong [[ACID]] transaction correctness guarantees; most relational databases use [[SQL]] for their queries. However, with the growth of data in the 2010s, [[NoSQL]] databases also became popular, since they [[Horizontal scaling#Horizontal and vertical scaling|scale horizontally]] more easily than relational databases by giving up the ACID transaction guarantees, and they also reduce the [[object-relational impedance mismatch]].<ref name="leavitt">{{cite journal |
</ref><ref name="sigmodrecord">{{cite conference |first1=Andrew |last1=Pavlo |first2=Matthew |last2=Aslett |title=What's Really New with NewSQL? |book-title=SIGMOD Record |year=2016 |url=https://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf |access-date=February 22, 2020}}
</ref><ref>{{cite web |url=https://cacm.acm.org/blogs/blog-cacm/109710-new-sql-an-alternative-to-nosql-and-old-sql-for-new-oltp-apps/fulltext |title=NewSQL: An Alternative to NoSQL and Old SQL for New OLTP Apps |first=Michael |last=Stonebraker |publisher=Communications of the ACM Blog |date=June 16, 2011 |access-date=February 22, 2020}}</ref><ref name="high scalability">{{cite web |url=http://highscalability.com/blog/2012/9/24/google-spanners-most-surprising-revelation-nosql-is-out-and.html |title=Google Spanner's Most Surprising Revelation: NoSQL is Out and NewSQL is In |first=Todd |last=Hoff |date=September 24, 2012 |access-date=February 22, 2020}}</ref>
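The atomicity part of the ACID guarantees mentioned above can be illustrated with a minimal sketch. It assumes Python's built-in <code>sqlite3</code> module and a hypothetical <code>account</code> table; the table, the <code>transfer</code> function, and the amounts are illustrative only and are not taken from the cited sources.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to account `dst` as one transaction."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception is raised inside it.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )


# Hypothetical schema: balances may never become negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)  # both updates are committed together

try:
    # The credit succeeds, but the debit violates the CHECK constraint,
    # so the whole transaction (including the credit) is rolled back.
    transfer(conn, 1, 2, 500)
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM account"))
```

After the failed transfer, <code>balances</code> reflects only the first, successful transfer: neither account shows a partial update.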
Line 50:
=== Data modeling ===
{{
Data modeling is the analysis and representation of data requirements for an organization. It produces a data model, an abstract representation that organizes business concepts and the relationships and constraints between them. The resulting artifacts guide communication between business and technical stakeholders and inform database design.<ref name="simsionwitt">{{cite book |last1=Simsion |first1=Graeme |last2=Witt |first2=Graham |title=Data Modeling Essentials |edition=4th |publisher=Morgan Kaufmann |year=2015 |isbn=9780128002025}}</ref><ref name="date">{{cite book |last=Date |first=C. J. |title=An Introduction to Database Systems |edition=8th |publisher=Addison-Wesley |year=2004 |isbn=9780321197849}}</ref>
A common convention distinguishes three levels of models:<ref name="simsionwitt" />
* '''Conceptual model''' – a technology-independent view of the key business concepts and rules.
* '''Logical model''' – a detailed representation in a chosen paradigm (most commonly the relational model) specifying entities, attributes, keys, and integrity constraints.<ref name="date" />
* '''Physical model''' – an implementation-oriented design describing tables, indexes, partitioning, and other operational considerations.<ref name="date" />
Approaches include entity–relationship (ER) modeling for operational systems,<ref name="chen1976">{{cite journal |last=Chen |first=Peter P. |title=The Entity–Relationship Model—Toward a Unified View of Data |journal=ACM Transactions on Database Systems |volume=1 |issue=1 |year=1976 |pages=9–36 |doi=10.1145/320434.320440}}</ref> dimensional modeling for analytics and data warehousing,<ref name="kimball">{{cite book |last1=Kimball |first1=Ralph |last2=Ross |first2=Margy |title=The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling |edition=3rd |publisher=Wiley |year=2013 |isbn=9781118530801}}</ref> and the use of UML class diagrams to express conceptual or logical models in general-purpose modeling tools.<ref name="uml">{{cite report |title=Unified Modeling Language (UML) Version 2.5.1 |publisher=Object Management Group |date=2017 |url=https://www.omg.org/spec/UML/2.5.1/}}</ref>
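As an illustration of dimensional modeling, the following minimal sketch builds a small star schema: one fact table of measurements that references two dimension tables of descriptive attributes. It assumes Python's built-in <code>sqlite3</code> module; the table names, column names, and sample rows are hypothetical and are not taken from the cited sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive attributes used to slice the facts.
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        year     INTEGER NOT NULL,
        month    INTEGER NOT NULL
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        category    TEXT NOT NULL
    );
    -- The fact table holds numeric measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        quantity    INTEGER NOT NULL,
        revenue     REAL NOT NULL
    );

    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO dim_product VALUES
        (1, 'Widget', 'Hardware'),
        (2, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES
        (20240101, 1, 2, 20.0),
        (20240101, 2, 1, 5.0),
        (20240201, 1, 3, 30.0);
    """
)

# A typical analytical query joins the fact table to a dimension
# and aggregates a measure by one of the dimension's attributes.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
```

Keeping measures in the fact table and descriptive attributes in the dimensions means that a new way of slicing the data usually needs only another join and <code>GROUP BY</code> column, not a schema change.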
Well-formed data models aim to improve data quality and interoperability by applying clear naming standards, normalization, and integrity constraints.<ref name="date" /><ref name="simsionwitt" />
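How integrity constraints protect data quality can be sketched with a small normalized schema in which the database itself rejects invalid rows. The example assumes Python's built-in <code>sqlite3</code> module; the <code>customer</code> and <code>customer_order</code> tables and the <code>try_insert</code> helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Customer details are stored once and referenced by key (normalization).
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
    );
    INSERT INTO customer VALUES (1, 'a@example.com');
    INSERT INTO customer_order VALUES (10, 1, 2500);
    """
)


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    # Foreign key: customer 99 does not exist.
    try_insert("INSERT INTO customer_order VALUES (?, ?, ?)", (11, 99, 100)),
    # UNIQUE: the email address is already registered.
    try_insert("INSERT INTO customer VALUES (?, ?)", (2, "a@example.com")),
    # CHECK: an order total cannot be negative.
    try_insert("INSERT INTO customer_order VALUES (?, ?, ?)", (12, 1, -5)),
    # A valid row is accepted.
    try_insert("INSERT INTO customer_order VALUES (?, ?, ?)", (13, 1, 700)),
]
```

Declaring the rules in the schema means every application that writes to the database is subject to the same checks.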
== Roles ==
=== Data engineer ===
A '''data engineer''' is a type of software engineer who creates [[big data]] [[Extract, transform, load|ETL]] pipelines to manage the flow of data through the organization. This makes it possible to take huge amounts of data and translate them into [[business intelligence|insights]].<ref>{{
=== Data scientist ===
Line 72 ⟶ 81:
==Further reading==
{{refbegin|2}}
* Ian Macdonald (1986). "Information engineering". In: ''Information Systems Design Methodologies''. T.W. Olle et al. (eds.). North-Holland.
* Ian Macdonald (1988). "Automating the Information engineering methodology with the Information Engineering Facility". In: ''Computerized Assistance during the Information Systems Life Cycle''. [[T.W. Olle]] et al. (eds.). North-Holland.
* [[James Martin (author)|James Martin]] and [[Clive Finkelstein]] (1981). ''Information engineering''. Technical Report (2 volumes). Savant Institute, Carnforth, Lancs, UK.
* James Martin (1989). ''Information engineering'' (3 volumes). Prentice-Hall Inc.
* {{cite book |last1=Reis |first1=Joe |last2=Housley |first2=Matt |title=Fundamentals of Data Engineering |date=2022 |publisher=O'Reilly Media |isbn=978-1-0981-0827-4 }}
{{refend}}