Talk:Q (number format)

Article needs more work

It needs more work; the introduction assumes too much knowledge. The Q number does not carry the field size. Q numbers get used even if the processor has a floating point unit. Another day. Charles Esson 23:04, 8 April 2007 (UTC)

Original research?

I think this was invented by TI, but we need a reference to say it. Charles Esson 11:52, 10 April 2007 (UTC)

Actually we need a reference to prove that "Q number format" is an established name for binary fixed-point. In any case, I wholly support the merge into binary scaling. I would even merge both articles into fixed-point arithmetic, according to the proposal in Talk:fixed-point arithmetic. --Jorge Stolfi (talk) 18:34, 24 June 2009 (UTC)

Notation description is misleading

The article states for the Qm.n format:

m is the number of bits set aside to designate the two's complement integer portion of the number, exclusive of the sign bit (therefore if m is not specified it is taken as zero).

This might be true for Texas Instruments (among others), but they are not the only ones using this notation. ARM, in their CMSIS (vendor-independent hardware-abstraction layer for ARM-Cortex devices, see http://www.arm.com/products/processors/cortex-m/cortex-microcontroller-software-interface-standard.php), have e.g. Q31 numbers ("q31_t" in C) and describe them as "32-bit fractional data type in 1.31 format" (actually a mix between "Qf" and "s.f" notation). IMHO the article implies that there are several common notations for fixed-point numbers, but in reality there is basically a custom notation for every other vendor, and those notations often conflict or are ambiguous. The article should reflect this by explicitly mentioning *which* vendors use a described notation, and should make clear that there are other notations in use which are not described (discuss! I'll change this if no one is opposed). 2001:4CA0:0:F221:D194:90F0:4D26:25AE (talk) 17:45, 14 January 2013 (UTC)

  • Well noted. There is a section in fixed-point arithmetic that already points this out. There is no such thing as a "Q number format". That is just binary fixed-point (binary scaling) format, a much older concept than TI. The "Q" stuff is just a succinct notation used by some programmers to specify the parameters of the format. So the question is, how widely used is that notation? --Jorge Stolfi (talk) 17:27, 5 July 2021 (UTC)

Range Error in Article

I dispute the following statement in the article:

For a given Q format, using an N-bit signed integer container with Q fractional bits:

  • Its range is $[-2^{N-1-Q},\ 2^{N-1-Q}-1]$

A common usage is storing a Q15 value in a 16-bit signed integer object, so we have N=16 and Q=15. According to the article, the range would be $[-2^{16-1-15},\ 2^{16-1-15}-1]$ or $[-2^{0},\ 2^{0}-1]$ or $[-1, 0]$ which, of course, is incorrect. The correct range is [-1, 0.99996948]. —Ksn 21:05, 16 April 2007 (UTC)

  • It is my view (not worth much) that you're both right in a way. I think the article (or at least the formula) is referring to the integer range. You're referring to the real range. I wonder what to do about it, because you're right that it is not clear. Charles Esson 08:47, 18 April 2007 (UTC)
  • I added "integer" for now, but I think you're right that it has to be fixed. Charles Esson 08:54, 18 April 2007 (UTC)
  • Fixed it. Charles Esson 09:24, 18 April 2007 (UTC)

Thanks for the fix. It didn't make much sense to present the range in terms of integers when the point of the Q format is to express real numbers. —Ksn 00:02, 20 April 2007 (UTC)
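
To make the difference between the integer range and the real range concrete, here is a minimal C sketch. It assumes C99 `<stdint.h>` and a Q15 value stored in an `int16_t`; the names `Q15_FRACTION_BITS` and `q15_to_double` are hypothetical and not taken from the article.

```c
#include <stdint.h>

/* Number of fractional bits in the Q15 format discussed above. */
enum { Q15_FRACTION_BITS = 15 };

/* Convert a raw Q15 container value to the real number it represents:
   the raw integer divided by the scale factor 2^15 = 32768. */
static double q15_to_double(int16_t raw)
{
    return (double)raw / (double)(1L << Q15_FRACTION_BITS);
}
```

Under these assumptions, `q15_to_double(INT16_MIN)` is -1 and `q15_to_double(INT16_MAX)` is 32767/32768, roughly 0.99996948, which matches the real range quoted above, while the raw integer range stays [-32768, 32767].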

Why does 2's complement refer to a "sign bit"? It is a misnomer. Sign-and-magnitude numbers have a "sign bit", and the sign bit has no magnitude...only sign. The most-significant bit in 2's complement with bit indices [n:m] has place value $-2^n$. The n in Qn.m notation refers to the number of integer bits exclusive of the sign bit. Radix complement numbers [1] do not have a sign bit. The MSB connotes both the sign information AND MAGNITUDE, namely $-2^n$. Therefore, it is misleading to say that a Q0.15 number has no integer bits. It has one integer bit that represents either 0 or negative 1, and should, therefore, be denoted as Q1.15, despite how (I believe) it is currently used. Chelmite (User:Steve_Kelem) 21:20, 2 November 2011 (UTC)

  • The term "sign bit" is appropriate because it indicates the sign of the number (1 = negative, 0 = non-negative). The difference between two's complement signed format and sign-magnitude format is in the way the magnitude is encoded. --Jorge Stolfi (talk) 17:20, 5 July 2021 (UTC)
    The article on two's complement in Wikipedia states correctly that the most significant bit has a weight of $-2^{N-1}$, so this "sign bit" is part of the magnitude. So calling the most significant bit in two's complement the "sign bit" is misleading at best. 209.145.84.194 (talk) 13:21, 17 May 2022 (UTC)
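
Both descriptions in this thread give the same numbers, which a short C sketch can illustrate. It decodes a 16-bit pattern by summing bit weights, with bit 15 weighted $-2^{15}$; the function name `decode_twos_complement16` is hypothetical and the 16-bit width is an assumption chosen to match Q15.

```c
#include <stdint.h>

/* Decode a 16-bit two's complement pattern by summing bit weights.
   Bits 0..14 carry weights +2^0 .. +2^14; bit 15 carries weight -2^15. */
static int32_t decode_twos_complement16(uint16_t bits)
{
    int32_t value = 0;
    for (int i = 0; i < 15; i++) {
        if (bits & (1u << i)) {
            value += (int32_t)1 << i;
        }
    }
    if (bits & 0x8000u) {
        value -= (int32_t)1 << 15;   /* the MSB contributes -32768 */
    }
    return value;
}
```

Read as Q15 (divide by $2^{15}$), the pattern 0x8000 decodes to -1 and 0xFFFF to -1/32768, so the top bit contributes either 0 or -1 to the value, as Chelmite describes, and it is also set exactly when the value is negative, as Jorge Stolfi describes.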

References

  1. ^ Digital Systems and Hardware/Firmware Algorithms, Ercegovac and Lang 1985

Proposal to Dereference Referenced Paper

I propose that the external link to the paper Fixed Point Representation And Fractional Math be removed as it is not authoritative and contains errors, for example:

  • p. 2, after talking about using two's complement format it gives an example of an 8-bit value being specified as Q3.5, which does not accommodate the sign bit.
  • p. 2, it asserts that a C "int" is 16 bits, contrary to its actual definition; furthermore, ints are increasingly implemented as 32 bits.
  • p. 3, it gives the incorrect inequality "$0 \le a \le 2^{QI}$", where the second "≤" should be "<".
  • p. 3, the equation "$QI = \lceil \log_2(|a|) \rceil$" is incorrect for the same edge condition.
  • p. 3, it then presents different formulae when discussing the impact of signed numbers, but no change is necessary because QI is defined as being the number of integer bits, which implies not including the sign bit.

I didn't see the need to review the document further. —Ksn 13:56, 20 April 2007 (UTC)

  • I haven't read the document but I agree with your points. Charles Esson 06:08, 25 April 2007 (UTC)

These issues appear to have been addressed in a new release of the referenced paper. User:anonymous —Preceding unsigned comment added by 128.104.188.119 (talk) 21:48, 30 August 2007 (UTC)

Using m and n instead of Q in the Math section; mixed fractional sizes in multiplication and division

I've changed the first half of the article to uniformly use m and n (for a Qm.n format) instead of introducing Q and N variables. I think the math section that follows should follow suit, but I disagree with the statement that multiplication and division require the number of fractional digits in the operands to be the same. Multiplying a Qa.b format number by a Qc.d format number can be done naturally by treating the Q format numbers as signed integers, and it will generally give a Q(a+c).(b+d) format number. Division I'm not as sure about; I expect that Knuth's The Art of Computer Programming Volume 2 would have all the answers. Wdfarmer 04:06, 25 April 2007 (UTC)

  • If you don't keep the fractional digits the same then you change the type of Q number you are dealing with (as you have pointed out). Let's say you multiply Q14.17 by Q14.17: you end up with Q28.34, which is a bit hard to fit in a 32-bit register. If you don't fix up your base, the division result is a different Q number, Q(a-c).(b-d); not the same Q number once again. Now if you are going to change Q number type with multiplication and division, then the statement 'resolution is constant' is wrong; resolution is only constant if you maintain the Q number type. What I have provided is a consistent set of operators that maintain the Q number type you start with, and I have highlighted the need for care with rounding. I think that is a lot more helpful than saying something along the lines of "If you don't fix your base the number will underflow or overflow". Charles Esson 06:04, 25 April 2007 (UTC)
  • Altered the article to reflect your concern; there probably should be another section on treating m and n as indexes. Charles Esson 06:38, 25 April 2007 (UTC)
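
The two positions in this thread can be sketched together in C: the raw integer product carries the sum of the operands' fractional bits, and a separate rescaling step brings it back to a chosen format. The function names `fx_mul_raw` and `fx_rescale_down`, and the 8-bit and 12-bit fractional widths in the usage note, are hypothetical choices for illustration. The sketch assumes C99, where integer division truncates toward zero.

```c
#include <stdint.h>

/* Multiply two 16-bit fixed-point values as plain integers.
   If the operands have b and d fractional bits, the 32-bit result
   carries b + d fractional bits. */
static int32_t fx_mul_raw(int16_t x, int16_t y)
{
    return (int32_t)x * (int32_t)y;
}

/* Reduce a value from from_frac to to_frac fractional bits
   (requires 0 <= from_frac - to_frac < 31). Division by a power of two
   is used instead of a right shift so negative values are handled
   portably; the result is truncated toward zero. */
static int32_t fx_rescale_down(int32_t value, int from_frac, int to_frac)
{
    return value / ((int32_t)1 << (from_frac - to_frac));
}
```

For example, 1.5 with 8 fractional bits is 384 and 0.25 with 12 fractional bits is 1024. `fx_mul_raw(384, 1024)` gives 393216, which is 0.375 with 20 fractional bits, and `fx_rescale_down(393216, 20, 8)` gives 96, which is 0.375 again with 8 fractional bits.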

C sample code

There are more problems here than there are lines of code. I suspect it was written by someone who doesn't actually write C. Here are the issues, in descending severity:

  • Q is not defined anywhere in the code, so this won't compile.
  • The code is not valid C89, because it uses C++-style (//) comments. It may be valid C99, or C++, or C89 plus Microsoft or gcc extensions, but when people say "C" they generally mean C89.
  • If the code is meant to be C99, C++, VC, or gcc code, it's not idiomatic; variables should be declared at initialization rather than all together at the top.
  • The comment "2**(Q-1)" isn't going to help a C programmer, as there is no ** exponentiation operator in C.
  • K is clearly meant to be a constant. So, why not define it as such? Or, even better, give it a name?
  • You're allowed to give variables names longer than one letter.

Here's a suggestion:

enum { Q15_N = 15 }; /* fractional bits, n in a Q15 or Qm.15 format; an enum so it is a constant expression */

short Q15_add(short a, short b) {
  return a+b;
}

short Q15_subtract(short a, short b) {
  return a-b;
}

short Q15_multiply(short a, short b) {
  /* Rounding: mid values are rounded up */
  static const short Q15_ONE_HALF = 1 << (Q15_N - 1);
  long resultTimes2N = (long)a * b + Q15_ONE_HALF;
  /* Correct by dividing by base */
  return (short)(resultTimes2N >> Q15_N);
}

short Q15_divide(short a, short b) {
  /* pre-multiply by the base; multiplying avoids left-shifting a negative value */
  long aTimes2N = (long)a * (1L << Q15_N);
  /* Adding b/2 rounds to nearest only when a and b have the same sign */
  long roundedATimes2N = aTimes2N + b/2;
  return (short)(roundedATimes2N / b);
}

This still isn't really safe, as short isn't guaranteed to be smaller than long--but I can't think of any platforms where it isn't. (If you're worried, note that sizeof cannot be evaluated in #if; include <limits.h> and use #if SHRT_MAX == LONG_MAX #error "Wow!" #endif instead...) --75.36.132.72 11:28, 28 July 2007 (UTC)

It wouldn't be compliant with C89, but you could use the C99 types defined in stdint.h: int16_t, int32_t, et cetera, to make sure your variables had the desired length. --208.124.166.154 (talk) 14:23, 15 January 2009 (UTC)

Still not compliant, because:

  • Right-shifting a negative signed value is implementation-defined (and left-shifting one is undefined), so it might produce wrong results depending on the compiler and optimization options. Division by a power of two is portable and automatically causes truncation. Multiplication by a power of two is also portable as long as signed overflow cannot happen.
  • Rounding up (ceil) tends to break algorithms defined on real numbers - they are designed with round-to-nearest in mind. It will especially break statistics by introducing bias. See the page on Rounding. Truncation is better though still not ideal.

--AstralStorm 217.67.201.162 (talk) 12:48, 27 October 2014 (UTC)
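
The two points above, together with the earlier stdint.h suggestion, can be combined in a short sketch. It assumes C99 (fixed-width types, and integer division that truncates toward zero), uses no signed shifts, and rounds to nearest with ties away from zero so that positive and negative products are treated symmetrically. The names `Q15_SCALE` and `q15_mul_rn` are hypothetical, and saturation on overflow is a design choice of this sketch rather than something the thread prescribes.

```c
#include <stdint.h>

#define Q15_SCALE INT32_C(32768)   /* scale factor 2^15 */

/* Multiply two Q15 values, rounding to nearest (ties away from zero)
   and saturating to the int16_t range. No signed shifts are used. */
static int16_t q15_mul_rn(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* 30 fractional bits */
    int32_t half = Q15_SCALE / 2;
    int32_t rounded = (product >= 0) ? (product + half) / Q15_SCALE
                                     : (product - half) / Q15_SCALE;
    if (rounded > INT16_MAX) {
        rounded = INT16_MAX;       /* reached only by (-1) * (-1) */
    }
    if (rounded < INT16_MIN) {
        rounded = INT16_MIN;
    }
    return (int16_t)rounded;
}
```

With these assumptions, 0.5 times 0.5 (`q15_mul_rn(16384, 16384)`) gives 8192, that is 0.25, and the mirrored call with one negative operand gives -8192, so the rounding introduces no sign-dependent bias.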

Only one operand may be in q1.15 format. In int16_t q_mul(int16_t a, int16_t b), what follows is a single shift by Q, which compensates for the fractional scaling of a single q1.15 value. It should probably be noted that only one of a or b can be q1.15. — Preceding unsigned comment added by 68.101.98.50 (talk) 20:28, 27 December 2020 (UTC)

Ad Math operations

I suggest the following formulas explaining math operations on the general Q numbers and :

or:

— Preceding unsigned comment added by 148.81.172.134 (talk) 07:07, 9 January 2019 (UTC)

Proposal to merge into Fixed-point arithmetic

I am proposing to merge this article into Fixed-point arithmetic; see Talk:Fixed-point arithmetic#Merging and restructuring proposal for more information. Solomon Ucko (talk) 03:26, 25 June 2021 (UTC)

  • @Sollyucko: Indeed this article is badly conceived. There is no such thing as a "Q number format". That is just binary fixed-point format, a concept that is much older than Texas Instruments.
    The "Q" stuff is just one of many possible notations to succinctly specify the parameters of the format. It is not clear how widely used that notation was. For it to deserve its own article, there would have to be evidence of widespread use outside Texas Instruments and its customers and contractors. Anyway, the paragraph in the fixed point article already seems to say all there is to say about it.
    Meanwhile, all the examples and code should be reworded to use more explicit format descriptions, and merged into the fixed-point article.
    --Jorge Stolfi (talk) 17:13, 5 July 2021 (UTC)
  • Don't merge. Q number format is a different topic and there's more than enough to tell about it to keep it in a separate article. --Matthiaspaul (talk) 21:51, 22 December 2022 (UTC)
  • Merge. The Q format is solely a subset of Fixed-point arithmetic, and it's a small enough topic that it is best included as part of that article. In addition, the C code shown to perform addition, subtraction, multiplication, and division should be included in the fixed-point article. 69.5.112.154 (talk) 20:35, 26 December 2022 (UTC)

Texas Instruments

I removed from the intro the claim that it was introduced by Texas Instruments, because of lack of proof (reference). I don't know if it is right or wrong, but I think it should stay out until someone can provide proof. • SbmeirowTalk • 18:20, 6 July 2021 (UTC)

Texas Instruments

Fixed-point arithmetic links a source for Texas Instruments having a definition of the "Q format": https://www.ti.com/lit/ug/spru565b/spru565b.pdf

However, I don't think it proves TI introduced it.

I would also support merging this page into whatever page would be relevant, probably Fixed-point arithmetic.

Only the history of Q notation seems like it would belong in its own page, but since there are no actual sources on that, content related to history should probably be ditched?

Dragorn421 (talk) 12:40, 8 September 2021 (UTC)

Both m and n may be negative

The introduction states that both m and n may be negative, although I fail to see how this is possible. Is this an error, or is further elaboration in order? — Preceding unsigned comment added by Dsabatta (talk • contribs) 08:16, 11 March 2022 (UTC)

intermediate multiplication and division results must be double precision

I don't know if there's a better way to phrase this, but "double precision" is often interpreted to mean IEEE double precision. What's meant here is that double the number of bits is required to accurately represent the full range of the product, and I wish that could be made clearer.

rsaxvc (talk) 2 September 2022 — Preceding undated comment added 03:16, 3 September 2022 (UTC)

Division

int16_t q_div(int16_t a, int16_t b)
{
    /* pre-multiply by the base (Upscale to Q16 so that the result will be in Q8 format) */
    int32_t temp = (int32_t)a << Q;
    /* Rounding to nearest */
    temp += b>>1;
    return (int16_t)(temp / b);
}

That is the correct C code whether temp < 0 or not and whether b < 0 or not. You can go through all four cases: if the 32-bit division for a negative quotient yields the two's complement of the result for the positive quotient with the same absolute value, then you never want to subtract half of b. Check every one of the four sign combinations. 69.5.112.154 (talk) 23:06, 19 December 2022 (UTC)
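
The four sign combinations can also be checked mechanically. The sketch below is one hedged way to do that, not the article's code. It assumes Q15 operands (the snippet above leaves `Q` undefined) and C99 division, which truncates toward zero. It picks the rounding direction from the signs, so its output can be compared with the unconditional `temp += b>>1` version. The names `Q_FRAC_BITS` and `q15_div_rn` are hypothetical.

```c
#include <stdint.h>

#define Q_FRAC_BITS 15   /* assumption: Q15 operands */

/* Divide two Q15 values, rounding to nearest (ties away from zero).
   Half of the divisor is added when dividend and divisor have the same
   sign and subtracted when they differ, because C99 integer division
   truncates toward zero. The caller must ensure that b != 0 and that
   the quotient fits in int16_t. */
static int16_t q15_div_rn(int16_t a, int16_t b)
{
    /* Multiplying instead of shifting avoids left-shifting a negative value. */
    int32_t scaled = (int32_t)a * ((int32_t)1 << Q_FRAC_BITS);
    int32_t half = b / 2;
    if ((scaled < 0) != (b < 0)) {
        scaled -= half;
    } else {
        scaled += half;
    }
    return (int16_t)(scaled / b);
}
```

With raw inputs 1 and 3, the exact quotient has magnitude 32768/3, about 10922.67, so the nearest results are 10923 and -10923, and this sketch returns those for all four sign combinations. Worked by hand under the same assumptions, and on a compiler where `>>` on a negative value is an arithmetic shift, the unconditional version gives 32766 / -3 = -10922 for a = 1, b = -3. The mixed-sign cases may therefore differ by one least significant bit.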

Encoding dynamic range

I suggest adding the dynamic range in dB for some usual formats: Q0.7 42 dB, Q0.15 90 dB, Q0.23 138 dB, Q0.31 186 dB

https://source.android.com/docs/core/audio/data_formats

Encoding dynamic range

The dynamic range of an encoding is determined by how many distinct values are used to encode a range of numbers. For instance, in the fixed-point encoding without sign, the range is $[0,\ (2^N-1)\,Q]$ and the resolution is $Q = 2^{-b}$.

The dynamic range is the ratio between the maximum encoding difference (for example, the span of $[-2^{N-1},\ 2^{N-1}-1]$ in the signed case) and the encoding resolution: $DR = (N_{max} - N_{min})/Q = 2^N - 1$.

In the case of the 2's complement encoding, the range is $[-2^{N-1},\ 2^{N-1}-1]$ (in units of $Q$) and the resolution is again $Q = 2^{-b}$. The dynamic range is also $2^N - 1$.

If this is computed in dB, defining the dynamic range as $20 \log_{10}(DR)$, we obtain a dynamic range of approximately $6.02N$ dB. Therefore, every additional bit increases the encoding dynamic range by 6.02 dB.

http://rubensm.com/fixed-point-representation/ 92.120.5.12 (talk) 07:16, 8 May 2024 (UTC)
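
The 6.02 dB per bit rule above can be sketched in a few lines of C. The sketch uses the approximation $2^N - 1 \approx 2^N$, so that $20 \log_{10}(2^N) = N \cdot 20 \log_{10} 2$. The constant is written out to avoid depending on the math library, and the names `DB_PER_BIT` and `dynamic_range_db` are hypothetical.

```c
/* 20 * log10(2): decibels contributed by each additional bit. */
#define DB_PER_BIT 6.020599913

/* Approximate dynamic range in dB for the given number of bits,
   using DR ~ 2^bits, so 20 * log10(DR) = bits * 20 * log10(2). */
static double dynamic_range_db(int bits)
{
    return bits * DB_PER_BIT;
}
```

For 7, 15, 23 and 31 bits this gives about 42.1, 90.3, 138.5 and 186.6 dB, consistent with the 42, 90, 138 and 186 dB figures suggested above if those are computed from the fractional bit count and rounded down.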