Talk:Sixth normal form

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia
WikiProject iconDatabases Unassessed (inactive)
WikiProject iconThis article is within the scope of WikiProject Databases, a project which is currently considered to be inactive.
???This article has not yet received a rating on Wikipedia's content assessment scale.

Difference between Sixth normal form and Domain-key normal form unclear[edit]

Let me start by saying that I'm not a database expert, but I am a software developer. I think all of these pages could be more clear, but I am able to understand what DKNF is. Reading this 6NF page, however, I do not understand what it is at all.

The page says "Not to be confused with Domain-key normal form." but does not say why. Futhermore, it says "The term 6NF has historically also been used to refer to another normalization degree, which today is more commonly known as domain-key normal form (DKNF) (see Other meanings)."

In that section (other meanings), Sixth normal form (6NF) is sometimes instead used as a synonym for domain-key normal form (DKNF). This usage predates Date et al.'s work. [12]

I think it is not right to say things like this, that 6NF is very similar (implied by "sometimes instead used as a synonym" but not elaborate on what Date et al.'s work did to draw a distinction.

I also would like to know why the normal forms start with UNF, 1NF, 2NF, 3NF, but then some are interjected by normal forms that don't have names. Is it because historically there were 6, and rather than rename the forms and shift them up to 10, which would be confusing and be a "breaking change" for the literature", they inserted them between? Does it also imply that e.g. between 3NF and 4NF is a more important "jump" than EKNF and BCNF provide?


I'm not too happy with the relvar concept, because this is the onliest place where it is used, in the normalization articles. I'm going to add a small statement that relvar should be read as table.

Relvar is used in a direct quote. It is explained in Relation (database), Relational model, and Relvar. It is also used in other articles, such as Relational database, Candidate key, Superkey, and Tuple relational calculus.
I don't think it is really correct to use "table" instead of "relvar" or "relation variable", even if this is commonly done (sometimes with an apology). I don't think one could object to the use of "table" as part of an explanation, provided it is stated or implied that this is inexact and is not the term that Date etc. used. --Boson (talk) 19:53, 22 December 2009 (UTC)[reply]
I've taken the liberty to reorder your reply because the above unsigned remark isn't mine. Personally I'm not hung up on a particular terminology, but consistency across Wikipedia articles would be nice, and relvar appears to be specific to a particular author. Rp (talk) 10:11, 29 December 2009 (UTC)[reply]

I'm not too happy with the inclusion of this notion in the first place, because it is from one particular book and its notability is disputable. Can at least someone with access to the book supply a definition of U_projection? The Google Books preview doesn't include it. Rp (talk) 09:25, 22 December 2009 (UTC)[reply]

As regards "U_projection", this is a little difficult, and I don't think I can easily summarize what Date et al. explain in about a hundred pages.
For the purpose of dealing with temporal data, Date et al. describe generalizations of the normal relational operators, including "restrict" and "project". The essence, of the generalized ("U_") operators is that they permit (implicitly complex) operations on interval data. Interval data are (for instance) data representing an interval of time. An interval is treated, in a certain sense, as atomic, but there is an "unpack" operator which notionally separates the interval into its non-interval "quanta" (my term). For instance, if a day is treated as the "quantum" of time, the interval "d363:d365" can be implicitly "unpacked" into "d363", "d364", "d365". U_operations performed on such interval data will treat the interval of three days as representing three individual days, thus permitting this interval to "match" an interval of, say, one day.
The "definition" (page 148) is:
USING ( ACL ) < {BCL} >
is shorthand for
PACK ( ( UNPACK R ON ( ACL ) ) {BCL} ) ON (ACL )
(The "<" and ">" signs should actually be different symbols that I cannot easily reproduce here:
By the way, I'm not sure which notion you doubt the notability of. Surely not "sixth normal form"? --Boson (talk) 22:10, 22 December 2009 (UTC)[reply]
Thank you for your explanation. Having read some other books by Date, I'm confident that these operators have straightforward mathematical definitions that can be reproduced here, in whatever notation seems best. And yes I do doubt the notability of Date's notion of 6th normal form, which I don't mean to deny, but some evidence for it would be appropriate. Let's not forget that Wikipedia pages can do a lot to improve book sales. Rp (talk) 09:55, 29 December 2009 (UTC)[reply]
Oh, OK. I can imagine people querying its usefulness but I didn't think anyone disputed its notability. Here are a few independent references:
  • Harrington, Jan L. (2009). Relational Database Design and Implementation: Clearly Explained. Morgan Kaufmann Series in Data Management System. Morgan Kaufmann. ISBN 9780123747303.
  • Simsion, Graeme C. (2005). Data Modeling Essentials (3rd ed.). Morgan Kaufmann. ISBN 9780126445510. {{cite book}}: Unknown parameter |coauthors= ignored (|author= suggested) (help)
  • Oppel, Andrew J. (2009). Relational Databases A Beginner's Guide. McGraw Hill Professional. p. 209. ISBN 9780071608466.
  • Hughes, Ralph (2008). Agile Data Warehousing: Delivering World-Class Business Intelligence Systems Using Scrum and XP. iUniverse. ISBN 9780595471676.
  • Boyd, Michael (2005). "Comparing and Transforming Between Data Models via an Intermediate Hypergraph Data Model". Journal on data semantics. doi:10.1007/11603412. {{cite journal}}: Unknown parameter |coauthors= ignored (|author= suggested) (help)
--Boson (talk) 16:00, 29 December 2009 (UTC)[reply]

For a non-relational approach, see TSQL2.[edit]

How is this approach non-relational? Either explain it or replace the adjective. Rp (talk) 10:02, 29 December 2009 (UTC)[reply]

two years without response, I have removed it. RBarryYoung (talk) 21:06, 14 January 2012 (UTC)[reply]

DKNF Edits[edit]

I have commented out the "citation needed" flag as I read the cited footnote and it actually does support the statement. RBarryYoung (talk) 21:40, 14 January 2012 (UTC)[reply]

Usage edits[edit]

The Usage section was edited in July 2009 to clarify this section. However the attempted clarification actually significantly changed the meaning of one of the statements away from what was originally written, and in a way that was different enough from the cited source that the altered statement was no longer supportable by the citation. I assume that this was a misunderstanding on the part of the editor, so I have commented out those additions and have reverted that statement to its previous form. RBarryYoung (talk) 22:37, 14 January 2012 (UTC)[reply]

6NF Examples[edit]

Like many developers, I learn concepts easier when examples are provided as a complement to the explanation. The first five normal form articles all have good examples included. The sixth normal form article is a borderline stub article in comparison though and I'm having trouble understanding both the concept and the significance. Could someone familiar with the subject matter add an example to the article to help clear things up? Thanks in advance! CurtBennett (talk) 21:53, 20 June 2012 (UTC) yes. — Preceding unsigned comment added by 220.225.96.90 (talk) 07:04, 24 March 2014 (UTC)[reply]

The example in place is not internally consistent now.... the professional level as described (an exam) is not directly dependent on years in practice unless (counter to the real world) failing an advancement exam means you can no longer practice at all (even at your current level). — Preceding unsigned comment added by 70.186.132.174 (talk) 22:59, 3 November 2015 (UTC)[reply]
I also have problems with the present example regarding medical practitioners:
  • I second that, as expressed in the surrounding text, medic type (resident, probationer, specialist) is NOT intended to be functionally dependent on years practiced, but the "corrected" schema makes it so.
  • Further, even assuming the functional dependency is intended, this example demonstrates converting a 2NF database into 3NF, not 6NF: the original table contained a transitive functional dependency, which is then removed. From what I gather, to be 6NF, the tables would have to be further decomposed into <Medic Name, Occupation>, <Medic Name, Practice in years>, <Type, Min practice>, and <Type, Max practice>.
  • Lastly, the discussion of space requirements is off the mark: while space efficiency may be one motivation for converting to lesser normal forms (in particular BCNF or 4NF), 6NF essentially always requires more space than 4NF or 5NF because it must reproduce the keys for each attribute (assuming it is implemented literally; a 6NF DBMS can of course store data in 5NF or however else it wants so long as it can reproduce the behavior of 6NF for its users). Rather, the chief motivation for normalization at any level is to prevent anomalies and enforce constraints, or in the case of 6NF, to better track updates to individual attributes. (I would also argue that 6NF provides an elegant solution for nullity in databases: rather than storing a Null for some non-key attribute in a multi-attribute table, a 6NF single-non-key-attribute table can just omit the row entirely and let the joins figure out what to do.) Rriegs (talk) 22:16, 28 May 2016 (UTC)[reply]
It seems like in some cases 6NF can use less space. This would be apparent with a better example. The text mentions that " we may also want to add temporal data, such as the time during which these values are, or were, valid (e.g., for historical data) but the three values may vary independently of each other and at different rates"
Here's an example: We have a relation on doctor, speciality, office, accepts-insurance, date-range
Doctor is a name
Speciality is the doctor's speciality
Office is the location where the doctor primarily practices
Accepts-insurance is a boolean
Date-range is the range for which this row of the relation applies.
So we might have
Smith,Cardiology,Boston,true, 2000-2010
Smith,Cardiology,Brookline,true, 2011-2012
Smith,Cardiology,Brookline,false, 2013-2015
Smith,Pulmonology,Brookline,false,2016-2017
Smith,Pulmonology,Brookline,true,2018-2019
In 6NF we would get
Smith,Cardiology,2000-2015
Smith,Pulmonology,2016-2019
Smith,Boston,2000-2010
Smith,Brookline,2010-2019
Smith,true,2000-2012
Smith,false,2013-2017
Smith,true,2018-2019
The 6NF representation has more rows, but the rows are smaller. And the string "Cardiology" appears only once instead of 3 times. (And "Pulmonology" appears only once instead of twice, and "Brookline" appears only once instead of 4 times.)
If you don't have any time-range type columns, I don't see how 6NF is really different from 5NF: You are just breaking each column off into its own table.
But if you have time ranges (or other "range" values), then 6NF can save space and improve the constraint enforcement. Bradleykuszmaul (talk) 12:42, 7 June 2023 (UTC)[reply]
Great example, much better than that of the article (which is useless imho). JCH77 (talk) 10:49, 3 August 2023 (UTC)[reply]