Template talk:Unichar/Archive 1

From Wikipedia, the free encyclopedia

Avoid hard coded gc sub-template

Why not just have a parameter to manually switch between displaying printable characters, displaying nothing, or some form of <control-n> type display, rather than relying on the automatic unichar/gc sub-template? Showing U+0007 <control-0007> seems redundant to me and I wonder if this feature is used much. I think it would be simpler and more flexible doing it with a manual parameter. Vadmium (talk, contribs) 05:07, 8 October 2011 (UTC).

Some more background (essentially: it is hardcoded by Unicode). Maybe you are confusing the graphic positions in this template. Here are two examples, one a regular character and one a control character (or code point):
  • {{unichar|00A9|Copyright sign|nlink=Copyright symbol}} → U+00A9 © COPYRIGHT SIGN
  • {{unichar|0007|zero-seven|nlink=Main page}} → U+0007 <control-0007> Main page
Unicode has defined what a control character is: a code point with general category (gc) Cc ('Other, Control'). Unicode has defined 65 of these, fixed forever. (Confusingly, Unicode sometimes uses 'control character' for an invisible formatting character -- forget about these for now). Such a "Control" character has these properties:
  1. it has no name in Unicode,
  2. a "label" can be used to reference that character,
  3. by nature, it has no graphical visible glyph.
What the template does is: when the code point is a Control character (by gc), it shows no Unicode name, it does show the Unicode-defined label like <control-0007>, and it can show a wikilink as shown above.
For now I conclude: if the code point is a Control (by gc), then we are OK when not showing any Unicode name, forcibly showing the Unicode alternative (which looks the same for the uninitiated reader btw), and allowing for an editor-entered wikilink as in the 'Main page' example above. Also, there is the option to enter note=This code point is known as "BEL".
So, returning to my opening statement about 'confusing positions': the text "<control-0007>" does not replace the graphic position (which is now empty), but hides any entered (illegal) name and still allows for a wikilink. There are options to add extra text. But hey, this is background. Is it an answer? -DePiep (talk) 20:51, 8 October 2011 (UTC)
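For readers who want to check the count of Control code points mentioned above, here is a minimal Python sketch. It relies only on the standard-library unicodedata module; the helper name is_control is made up for this illustration and is not part of the template.

```python
import sys
import unicodedata


def is_control(cp: int) -> bool:
    """Return True when the code point has general category Cc."""
    return unicodedata.category(chr(cp)) == "Cc"


# Scan the whole Unicode code space and keep the Cc code points.
controls = [cp for cp in range(sys.maxunicode + 1) if is_control(cp)]

print(len(controls))        # 65
print(is_control(0x0007))   # True  (BEL)
print(is_control(0x00A9))   # False (COPYRIGHT SIGN)
```

The 65 code points are U+0000..U+001F and U+007F..U+009F.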

Thanks for your explanation. My understanding of what you wrote: The Cc general category returned by gc is defined by Unicode and is unlikely to change significantly. My point of view is the <control> text does replace the character graphic, for instance {{unichar/glyph | hval=00A9 | gc=}} → © but {{unichar/glyph | hval=00A9 | gc=Cc}} → <control-00A9>. However I see your point that parameter 2 for the name is also replaced: {{unichar|0007|bell}} → U+0007 <control-0007>.

I guess my main point is that there are situations where I don’t want to display the stuff from the glyph sub-template, but the Cc general category does not apply. In particular the ZWNJ: technically there is a ZWNJ in the HTML of that page, but it has zero width and putting it there is distracting. I thought replacing the gc stuff could simplify the template at the same time as addressing my ZWNJ problem, but the complexity is a side issue. Vadmium (talk, contribs) 06:32, 9 October 2011 (UTC).

About <control> label.
Your rephrasing of my post is correct. First some minor points:
  • Unicode has promised that Cc category is fixed for ever, with 65 code points (all in the old C1 and C0 controls group, from pre-Unicode days when we chiseled our ASCII's in a clay tablet). This is stronger than "unlikely to change". It is a stability policy, to guarantee stability.
  • You are right, I was wrong: in the template a Cc character is replaced at the glyph-position, not the name-position. Since there is no Unicode Name, the visual position is at the same place (right after the U+0007 text). A text entered at position 2 is treated as a regular text, with a wikilink (using nlink=...; see example above with Main page).
  • The same thing with <...> happens with other Gc categories: Cs (surrogate cp) and Co (private use cp), Cn (non-character cp) along the same Unicode lines and rules:
U+D800 <surrogate-D800> (surrogate cp)
U+E000 <private-use-E000> (private use cp)
U+FFFF <noncharacter-FFFF> (not a character cp)
Now the main stuff:
Me concluding so far about <control> characters: you suggest the option to overwrite a <control>-output. Can you give an example, e.g. for U+0007 (aka BEL), of which text an editor would put there instead of the <...> label? (My opinion is: it replaces the Unicode Name, which is correct. It also maintains the standard format for every usage of the template, which is a main aim. I do not see a reason yet to omit or overwrite it). -DePiep (talk) 12:41, 9 October 2011 (UTC)
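The label rules discussed in this comment can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names name_or_label and is_noncharacter and the table LABEL_BY_GC are hypothetical, the data comes from whatever Unicode version ships with the Python interpreter, and it is not how {{unichar/gc}} is implemented.

```python
import unicodedata

# Label prefixes for categories whose code points have no Unicode name.
LABEL_BY_GC = {"Cc": "control", "Cs": "surrogate", "Co": "private-use"}


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def name_or_label(cp: int) -> str:
    """Return the Unicode name, or a <type-XXXX> label when there is none."""
    ch = chr(cp)
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    gc = unicodedata.category(ch)
    if gc in LABEL_BY_GC:
        kind = LABEL_BY_GC[gc]
    elif gc == "Cn":
        # Cn covers both noncharacters and unassigned (reserved) code points.
        kind = "noncharacter" if is_noncharacter(cp) else "reserved"
    else:
        raise LookupError(f"no name or label known for U+{cp:04X}")
    return f"<{kind}-{cp:04X}>"


print(name_or_label(0x00A9))  # COPYRIGHT SIGN
print(name_or_label(0x0007))  # <control-0007>
print(name_or_label(0xD800))  # <surrogate-D800>
print(name_or_label(0xE000))  # <private-use-E000>
print(name_or_label(0xFFFF))  # <noncharacter-FFFF>
```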

I do not have any good examples with characters using this <control> label, since I have not seen the template used with such characters, other than the hypothetical examples given in the documentation. That’s why I thought it would be a good chance to reduce the complexity of the template. Anyway I’ll leave this issue alone and just live with it :). Vadmium (talk, contribs) 14:53, 9 October 2011 (UTC).

About ZWNJ: U+200C ZERO WIDTH NON-JOINER (&zwnj;). Minor:
  • ZWNJ is a formatting character (general category: Cf). In the template they are treated as a regular character: just publish it and the font will show the glyph (which is an invisible nothing in this case).
  • These characters have nothing to do with <control> replacement. We can treat these two topics separately, which we do gladly for overview reasons.
Main stuff:
Do you want the factual appearance in HTML to be different? Does it render a bad page in your browser maybe (because the character keeps doing its job of formatting, even if it is there for "showing" only; I had such a problem with a RTL directionality character too, RLO U+202E, hence this issue)?
We can overrule such formatting code points (there are 140 cps with Cf now). A bit like the Zs space cps are handled (creating the light-blue background visual effect). Another route could be: all formatting characters get the glyph in a mnemonic: U+0007 → ␇. But this could also be done in the Notes just like the HTML-code. -DePiep (talk) 12:41, 9 October 2011 (UTC)

Yes I do want the HTML and appearance to be slightly different. Perhaps you can’t see any difference in the following, but for me the first example shows twice as much space as the second:

  • U+200C ZERO WIDTH NON-JOINER (current template)
  • U+200C zero width non-joiner (“invisible nothing” <span> removed)

Another option would be to add a new parameter, say called noglyph, and then it would be up to each template instance to use it, like {{unichar|200C|zero width non-joiner|noglyph=}}. Or extending the gc sub-template to detect them with hard-coded code point values might also work; I think this is what you mean by “a bit like the Zs space cps”. Vadmium (talk, contribs) 14:53, 9 October 2011 (UTC).

Yes, I see the whitespace effect (it is different).
Yes, hardcoding can be done by checking for gc=Cf (formatting character). That should cover the whole list of 140 Cf cps. This is what I prefer, because it is standard (no manual checking needed for forgetting a parameterized input, more standard formatting in Unicode characters). Is it really OK to prevent a glyph in all Cf cases? SHY will be no-glyph-at-all, this way?
re noglyph: We could also create the parameter used like glyph=ZWNJ-in-a-box.svg (which should overrule the glyph-by-font). But adding multiple switches is hard to document & illustrate, I experienced.
New: we could also create an output that shows the ZWNJ-abbreviation as a boxed mnemonic, like ␇ for U+0007. Tbd: this output could replace the glyph, or be added as an extra note. This should work automatically for all accepted abbreviations (SHY, RTL, NULL, etc.)
Adding: I'll take a look into the Unicode definitions for Cf characters. If Unicode is clear about their glyphs, that's a good base to work from.

-DePiep (talk) 16:37, 9 October 2011 (UTC)
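As a side note on the boxed-mnemonic idea: Unicode's Control Pictures block already provides such symbols for the C0 controls and DEL, at a fixed offset. The Python sketch below shows that mapping; the function name control_picture is hypothetical. There are no such ready-made symbols for formatting characters like SHY or ZWNJ, so those would need an image or other custom rendering.

```python
from typing import Optional


def control_picture(cp: int) -> Optional[str]:
    """Return the Control Pictures symbol for a C0 control or DEL, else None."""
    if 0x00 <= cp <= 0x1F:
        # U+2400..U+241F mirror U+0000..U+001F one to one.
        return chr(0x2400 + cp)
    if cp == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    return None


print(control_picture(0x0007))  # ␇
print(control_picture(0x200C))  # None (ZWNJ has no control picture)
```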

Thanks for looking into it. I am okay with automatically checking for specific formatting characters, as long as I don’t have to research and add 140 individual references to a template. But I’m willing to try just adding the SHY and ZWNJ. In fact the soft hyphen (SHY) was another instance where this template bothered me. I don’t see much point having the template output a SHY character on its own. I’d be happy to re-introduce the template to that article if this was also fixed.

I don’t think it would be too hard to illustrate my noglyph switch. Just say it inhibits any glyph, blue box etc, suggest using it for formatting characters like the ZWNJ, and illustrate it by comparing say the copyright sign example with and without a glyph.

Also I don't see the need for abbreviated mnemonic characters or other graphics; I don’t think they would benefit the leads of Soft hyphen or Zero-width non-joiner without a bit of explanation. And doesn’t the existing image parameter already allow you to do your ZWNJ-in-a-box.svg thing? Vadmium (talk, contribs) 07:00, 10 October 2011 (UTC).

On abbreviations/mnemonics like SHY, NBSP, ZWNJ: that is a separate topic indeed, so it should be kept apart. Even if we would include them somehow, they should not replace the glyph(position).
No glyph situation (like ZWNJ). All right. So I will put the 140 cps in the subtemplate for Cf, and then adjust the glyph template (no glyph at all for these; that means no HTML character at all; so also no disturbances in layout). You can do SHY and ZWNJ as you like, I'll do all shortly. I have taken a quick look at the list (linked above), I have not found any one that does have a glyph.
In general, you are right in this: no glyph should mean no extra spacing (I recall, I created the old situation to show these spaces explicitly, but we are on an improvement here). Even the dynamic effect of SHY should be explained in the text, not in this template.
-DePiep (talk) 10:10, 10 October 2011 (UTC)
Done step 1: adjust {{unichar/glyph}} for gc=Cf, Zl, Zp [1]. Zl and Zp are equally invisible formatting chars (actually, together they contain only the two LSEP and PSEP characters). No effect until subtemplate /gc is changed to return these three gc-ids. -DePiep (talk) 10:26, 10 October 2011 (UTC)
Done step 2: All Cf, Zp, Zl codes are in the /gc/sandbox subtemplate. So gc=Cf (or Zl, Zp) is returned. See sandbox test testpage, section. Interestingly: note the correct effect of omitting the RLE and RLO (U+202B and U+202E) characters themselves (currently, they are reproduced and so have a formatting effect).
Todo: some Arabic Cf cps do show a glyph:
  • U+0600 ؀ ARABIC NUMBER SIGN
  • U+0603 ؃ ARABIC SIGN SAFHA
Need to check if and how they are to be treated. (So please do not promote the /gc/sandbox into production, we have to solve this first). -DePiep (talk) 12:48, 10 October 2011 (UTC)
Adding: 0600..0603 show a glyph in Firefox, but not in Safari (dunno about IE). The Unicode glossary says: "Format Character. A character that is inherently invisible but that has an effect on the surrounding characters." So presumably a Firefox mistake. -DePiep (talk) 13:26, 10 October 2011 (UTC)
This is the situation. Unicode says about U+0600..U+0603 and U+06DD (a sort of Arabic combining number markers), in chapter section 8.2, pp 246-247: "Their General Category value is Cf (format control character). Unlike most other format control characters, however, they should be rendered with a visible glyph, even in circumstances where no suitable digit or sequence of digits follows them in logical order."
So they are visible. It looks like their gc was decided a bit too fast, because they do not alter the layout (do not format) at all. They are more like a combining mark, as Unicode itself notes. Same for U+070F with Cf. I'll make a solution in the /gc/sandbox. -DePiep (talk) 13:49, 10 October 2011 (UTC)
Done step 4: in {{unichar/gc/sandbox}}; see testcases. The five Arabic number markings (U+0600..) now return General Category Cf (visual), which is the correct gc and also keeps them from being hidden. Any remarks re this sandbox proposal? -DePiep (talk) 14:15, 10 October 2011 (UTC)

No problems that I can foresee, I think it is good to push to the live version. Vadmium (talk, contribs) 13:51, 11 October 2011 (UTC).

Done. Sandbox is into live. Adjusted /doc. No errors seen, so consider it done. -DePiep (talk) 14:52, 11 October 2011 (UTC)
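The outcome of this thread can be summarised in a short Python sketch: hide the glyph for Cf, Zl and Zp code points, except for the Cf code points that Unicode says should be rendered visibly. The names HIDDEN_GC, VISIBLE_CF and glyph_for are assumptions made for this illustration; the real logic lives in the {{unichar/gc}} and {{unichar/glyph}} sub-templates and may differ in detail.

```python
import unicodedata

# General categories whose glyph is suppressed entirely.
HIDDEN_GC = {"Cf", "Zl", "Zp"}

# Cf code points named in the discussion as having a visible glyph.
VISIBLE_CF = {0x0600, 0x0601, 0x0602, 0x0603, 0x06DD, 0x070F}


def glyph_for(cp: int) -> str:
    """Return the character to display, or '' when it should be omitted."""
    gc = unicodedata.category(chr(cp))
    if gc in HIDDEN_GC and cp not in VISIBLE_CF:
        return ""
    return chr(cp)


print(repr(glyph_for(0x200C)))  # ''  (ZWNJ is omitted from the output)
print(repr(glyph_for(0x202E)))  # ''  (RLO no longer affects the layout)
print(repr(glyph_for(0x00A9)))  # '©'
```

Returning an empty string rather than the invisible character means no stray character reaches the HTML, which is what removes both the extra spacing and the unwanted formatting effect.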

HTML

The HTML entity indication is actually only &#DEC;. I've seen so many people losing time converting the hexadecimal to decimal, since the hexadecimal was easier to find out (for example in Windows' charmap)... Why isn't the use of &#xHEX; shown?

a = &#97; or &#x61;

Lacrymocéphale 16:57, 25 June 2012 (UTC)

As it is, both hex and decimal values are shown. No recalculation needed. By showing both values, less-experienced users can easily recognise and use the decimal one (spelled out), and for experienced users the hex value is present (no recalc), though it takes the knowledge of how to form it into HTML.
So changing away from decimal is no improvement at all. Adding the full hex notation &#xHEX; can be proposed, but the gain is limited I'd say. -DePiep (talk) 17:18, 25 June 2012 (UTC)
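To illustrate that the two notations are interchangeable, here is a small Python sketch that builds both numeric character references from a code point and decodes them with the standard-library html module. The helper name html_references is hypothetical.

```python
import html
from typing import Tuple


def html_references(cp: int) -> Tuple[str, str]:
    """Return the decimal and hexadecimal numeric character references."""
    return f"&#{cp};", f"&#x{cp:X};"


dec_ref, hex_ref = html_references(0x00D7)
print(dec_ref)  # &#215;
print(hex_ref)  # &#xD7;

# Both forms decode to the same character.
print(html.unescape(dec_ref) == html.unescape(hex_ref) == "\u00d7")  # True
```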

dot for separator?

In a recent edit, the list separator was turned into a dot:

U+00D7 × MULTIPLICATION SIGN (&times; · not to be confused with *)

I think that is not an improvement. The template surely is meant to be used in-line, and so should support reading as in a sentence. The middot is not a sentence punctuation (it is in structured lists); or in other words, try reading it out loud. I understand the comma had drawbacks too, but it is inline readable and there is a formatting distinction with list elements. I propose to use the comma, until a better option comes along. -DePiep (talk) 12:23, 13 December 2014 (UTC)

Wrongly displayed

See the text U+20DD .. at Copyleft. Couldn't find a way to hack around it.. Might be a bug in the template? comp.arch (talk) 22:32, 19 July 2015 (UTC)

The character was repeated after the template (outside of it). Done this. Is this as you expect? -DePiep (talk) 22:43, 19 July 2015 (UTC)
...but of course, 'ↄ⃝' should show the combined character. Using {{unicode}} gives:⃝ɔ -- ɔ⃝. I don't know why this fails (on my browser). Surely it is not {{Unichar}} faulting, which does only one. -DePiep (talk) 23:49, 19 July 2015 (UTC)
  • Aha. In my Firefox it shows nicely, so I didn't see the issue.
U+20DD COMBINING ENCLOSING CIRCLE -- current form, may show bad
U+20DD ​⃝ COMBINING ENCLOSING CIRCLE
The second form uses |cwith=&#x200B;, that is the ZWSP. {{Unichar}} documentation mentions this to prevent a combining character from mixing with uninvolved neighbouring characters. It should show separated now.
However, when using |cwith=c, I do not see a combining effect: U+20DD c⃝ COMBINING ENCLOSING CIRCLE.
-DePiep (talk) 09:35, 20 July 2015 (UTC)

Automatic retrieval of character names and labels

I've created Module:Unicode data, copied from Wiktionary's Module:Unicode data. It has a function for automatically retrieving the names or labels of code points (including reserved code points). I'm not familiar with how {{unichar}} is used, but maybe this function would be useful. — Eru·tuon 02:51, 23 June 2018 (UTC)

Sounds good. Any doc or examples for the module? {{Unichar}} is quite old and could use an update. DePiep (talk) 03:36, 23 June 2018 (UTC)
There's some documentation on Wiktionary. I've added examples of the name-lookup functions on the module documentation page here (and further examples in the testcases). It looks like Module:Unicode data/control contains the data necessary to write a function to replace {{unichar/gc}} as well. — Eru·tuon 06:27, 23 June 2018 (UTC)

Special handling for invisible marks?

Should invisible characters with general category Mn (Mark, nonspacing) be added to the special handling from the {{Unichar/gc}} sub-template? I ask because I found myself wanting to use this template for variation selectors (as in "U+FE0E VARIATION SELECTOR-15"), but found the excess space to be distracting. (At first I thought the whole general category could be handled like whitespace characters, but I'd forgotten that there are plenty of graphical-but-nonspacing marks in that category…) -- Perey (talk) 19:41, 5 July 2020 (UTC)

(EC) Basically, you can use {{Unichar|FE0E|VARIATION SELECTOR-15|dec=}}U+FE0E VARIATION SELECTOR-15.
General Category (like 'Mn') does have an effect. What do you expect/propose? -DePiep (talk) 19:49, 5 July 2020 (UTC)
When using that, there is (at least in my browser) a noticeably wide space between the "FE0E" and the "VARIATION". This is made up of U+0020   SPACE, the (invisible, non-spacing) variation selector, and another U+0020   SPACE. I would propose treating invisible characters like this one in the same way as U+200C ZERO WIDTH NON-JOINER—that is, not displayed at all, with only a single SPACE between the code point and the name. -- Perey (talk) 20:02, 5 July 2020 (UTC)
Reduce to {{Unichar|FE0E|VARIATION SELECTOR-15}}U+FE0E VARIATION SELECTOR-15 (not the dec=)
Using Special:ExpandTemplates, the example gives:
<span class="nowrap"><templatestyles src="Mono/styles.css" /><span class="monospaced">U+FE0E</span></span> <span style="font-size:125%;line-height:1em">︎</span> <templatestyles src="smallcaps/styles.css"/><span class="smallcaps smallcaps-smaller">VARIATION SELECTOR-15</span>
There might be a (regular) space too much indeed. But is this disrupting? -DePiep (talk) 20:21, 5 July 2020 (UTC)
It's a nit to pick. A proud nail. A minor irritation. Something I thought I might be able to fix, but I don't dare touch a template like this without at least airing the matter first. WP:BEBOLD has its limits. -- Perey (talk) 02:43, 6 July 2020 (UTC)
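The proposal in this thread (omit the invisible character and keep a single space) could look like the following Python sketch. It is only an illustration: the names VARIATION_SELECTORS, is_variation_selector and render are hypothetical, and it covers just the two main variation selector ranges rather than every invisible Mn character.

```python
import unicodedata

# VS1..VS16 and VS17..VS256.
VARIATION_SELECTORS = (range(0xFE00, 0xFE10), range(0xE0100, 0xE01F0))


def is_variation_selector(cp: int) -> bool:
    return any(cp in block for block in VARIATION_SELECTORS)


def render(cp: int) -> str:
    """Format 'U+XXXX glyph NAME', leaving out the glyph when it is invisible."""
    code = f"U+{cp:04X}"
    name = unicodedata.name(chr(cp))
    if is_variation_selector(cp):
        # One single space between code point and name, no hidden character.
        return f"{code} {name}"
    return f"{code} {chr(cp)} {name}"


print(render(0xFE0E))  # U+FE0E VARIATION SELECTOR-15
print(render(0x00A9))  # U+00A9 © COPYRIGHT SIGN
```

Listing explicit ranges avoids the problem noted above: testing for general category Mn alone would also hide the many nonspacing marks that do have a visible glyph.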

Deviation of task

Psiĥedelisto, this template is designed to describe the Unicode definition (in various ways), possibly inline. How do the new options |uplus= and |sans= add to this job? The first one returns a Unicode character plainly (i.e., without any defining or describing context), and the sans option partially undoes the format, adding confusion and introducing another font style inline (ouch). How do your changes improve the template's job? -DePiep (talk) 10:46, 8 July 2020 (UTC)

@DePiep: Hello, thank you for waiting for me to save Special:PermaLink/966670515. The article that edit is for, Mojikyō, shows how I am using this template. Well, I agree with you on second thought: |uplus= is unnecessary now that I've added |br= after your comment, so let's remove it. I was using it as U+2B679 ({{Unichar|2b679|size=100%|uplus=n}}) to get U+2B679 (U+2B679 𫙹 (<#salted#>)) in a previous version of Mojikyō. Now I just use {{Unichar|2b679|size=100%|sans=y|br=()}}: U+2B679 𫙹 CJK UNIFIED IDEOGRAPH-2B679.
The problem of |sans= is trickier: I actually don't think this template should ever be mono. MOS:FONTFAMILY says nothing about it, but perhaps it should. MOS:CLI only discusses command line elements.
My reasoning is simple: {{code}} places unnecessary emphasis. I would have simply WP:BOLDly removed it, but I knew this could be very controversial as it will change many articles, and there are likely tables somewhere that expect U+ to be monospace and so have other encodings in monospace. I also did not feel like I had time to fight this battle, I'm already in the midst of a contentious template debate elsewhere. Being called a vandal over good-faith template edits ought to be enough for any editor!
The standard itself does not use it for U+, despite using monospace elsewhere. The official annexes to the standard only use monospace for U+ in the context of discussing the contents of Unicode's text files, such as Confusables.txt. Thinking of the reader, we should not use monospace if other authorities do not.
If that argumentum ab auctoritate does not satisfy, let's try a logical one instead: when would the {{code}} actually reduce reader confusion? I say never—U+ format is hexadecimal. Even if a font is somehow in use (no major operating system uses one) that does not differentiate 1 from I, an I can never occur so the point is moot. Neither can an O. Simply, in no font worth considering, can any of the characters which match regex [0-9A-FU\+] be confused for one another. Psiĥedelisto (talkcontribs) please always ping! 13:50, 8 July 2020 (UTC)
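The point about the restricted character set can be made concrete with a small validator. The sketch below is an illustration only: the names U_PLUS and parse_u_plus are hypothetical, and it assumes the usual convention of four to six uppercase hexadecimal digits after "U+".

```python
import re
from typing import Optional

# "U+" followed by four to six uppercase hexadecimal digits.
U_PLUS = re.compile(r"U\+([0-9A-F]{4,6})")


def parse_u_plus(text: str) -> Optional[int]:
    """Return the code point for a well-formed U+ notation, else None."""
    match = U_PLUS.fullmatch(text)
    if match is None:
        return None
    cp = int(match.group(1), 16)
    return cp if cp <= 0x10FFFF else None


print(parse_u_plus("U+2B679") == 0x2B679)  # True
print(parse_u_plus("U+O0A9"))              # None (letter O is not a hex digit)
print(parse_u_plus("U+1I23"))              # None (letter I is not a hex digit)
```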

Capitalization of 'HTML'

@DePiep: I've always found the uppercase 'HTML' to be a bit jarring and over emphasized in juxtaposition with the small-cap Unicode description. Is there any reason why 'HTML' can't or shouldn't also be rendered in small caps (html rather than HTML)? Thanks! —jameslucas (" " / +) 20:01, 15 December 2016 (UTC)

"HTML" in capitals is regular. But the Unicode name in caps is by Unicode habit (and smallcaps is a formatting choice here). The problem is in your 'juxtaposition' wording: it is not. The design choice was & is: format Unicode U+... character description in recognisable form. To me, that still happens. I'd conclude to keep it as it is. -DePiep (talk) 20:13, 15 December 2016 (UTC)
I don't disagree with anything you're saying except for the juxtaposition part. I think my point is that the current formatting generates a contrast that is not helpful. The template treats one typically capitalized thing (the Unicode description) differently from another typically capitalized thing (the acronym for Hypertext Markup Language). This in itself is not a problem, but my concern is that 'HTML' is not the important information here; it's merely a label for the information, and when I see several instances of the template (eg. at Decimal mark#Unicode characters), my eye catches on each 'HTML' because it's larger than the text around it (not the font size, but the perceived size). Its height draws additional attention to it even though it's the least important word in each line. Cheers! —jameslucas (" " / +) 20:37, 15 December 2016 (UTC)
Yes, keep HTML as a proper capitalisation (see MOS:CAPSACRS). The exception in Wikipedia is that we use the Unicode capitalisation for the Unicode name, but we do not want it to WP:SHOUT. So that is why the U-name is in smallcaps. The juxtaposition you see & point to, is just a consequence. -DePiep (talk) 20:44, 15 December 2016 (UTC)
Yeah, I respect your reasoning there, and I probably end up on the adhere-to-standards-and-accept-the-results side of the argument more often than not. This time though…I hadn't articulated this to myself before we started talking, but I think MOS:CAPSACRS is getting in the way of utility (as minor as this all seems) and should defer to the format that makes the hierarchy of information clearest. —jameslucas (" " / +) 20:59, 15 December 2016 (UTC)
If you pursue the 'hierarchy of information' you might be right (after a longer discussion). However, in that case I'd say we should drop the smallcaps format -- not the "HTML" uc format. So that U+name would be turned into regular font writing, maybe "title formal" (corrected: "title format"). -DePiep (talk) 22:25, 15 December 2016 (UTC)
I'm not familiar with "title formal". What is that? Thanks —jameslucas (" " / +) 23:30, 15 December 2016 (UTC)
meant "title format": uppercase first letter, then lowercase. Of course nobody wants the character name in those regular (big) uppercase letters.
My recap: this template is for inline use. So it uses the regular font, and also the straight uppercase writing of the common abbreviation HTML. Only one exception is imposed: by Unicode convention, character names are spelled in all uppercase, but to prevent SHOUT we use smallcaps. There is no reason to have this smallcaps reason expand (export) formatting to outside the character name. IOW, writing in smallcaps can not dictate how to write regular text in its environment. -DePiep (talk) 13:10, 16 December 2016 (UTC)
Gotcha. I think Title Casing would be fine conceptually and would (of course) make the uppercase 'HTML' far less obtrusive. I suppose doing so, though, would force edits at hundreds of instances of the template which have (presumably) been written with little or no regard to capitalization. I also wonder if there are any corner-cases that would then need to be discussed. Does one capitalize the 'j' when describing U+01CB Nj LATIN CAPITAL LETTER N WITH SMALL LETTER J? I don't know. —jameslucas (" " / +) 20:25, 16 December 2016 (UTC)
No we don't. A Unicode name only has uppercase, and so do we. The "lowercase J" is in the character, and in the name: "Small Letter j". That's how Unicode naming works.
To change the casing, as you mention, we can easily change the template. (Through another door, but this still reads like you relate it to the HTML writing. A bit strange). To change the case, you'd have to propose it and find consensus for it. For background reasons I described, I would oppose that. -DePiep (talk) 20:43, 16 December 2016 (UTC)
I'm not sure I follow what you're saying. My points are these: (1) The Title Case version fixes my 'HTML' concern not because it changes the 'HTML' casing but because the uppercase acronym would be less conspicuous when other capital letters are in the same line of text. The hierarchy would be flattened, which is good. (2) Since Unicode descriptions use only upper case (eg. "SMALL LETTER J"), there isn't (as far as I know) a definitive, authoritative verdict on whether "Small Letter J" or "Small Letter j" is correct Title Case formatting of the same phrase. I grant you that this example may be easy to resolve, but I suspect (and this is just a suspicion) that there exist descriptions of non-Latin characters that would be hard to correctly change to Title Case either by an algorithm or by a non-expert editor. The current small-caps casing of the official all-caps descriptions keeps that can of worms closed.
I'd probably oppose too, unless someone convinced me that my casing concerns were groundless. —jameslucas (" " / +) 21:26, 16 December 2016 (UTC)
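The algorithmic difficulty raised above is easy to demonstrate. The Python sketch below applies the built-in str.title() to two Unicode names; the helper name title_case_name is hypothetical. It shows what a naive conversion produces, not a recommended format.

```python
import unicodedata


def title_case_name(cp: int) -> str:
    """Naively convert the all-caps Unicode name to Title Case."""
    return unicodedata.name(chr(cp)).title()


# The algorithm cannot know whether the final letter should be "J" or "j".
print(title_case_name(0x01CB))   # Latin Capital Letter N With Small Letter J

# Acronyms are mangled as well.
print(title_case_name(0x2B679))  # Cjk Unified Ideograph-2B679
```

Any real conversion would need a hand-maintained list of exceptions, which is the can of worms the small-caps rendering avoids.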
Done my best to explain it, can't do any better. If you want to change anything, write a proposal and build consensus. -DePiep (talk) 21:34, 16 December 2016 (UTC)
Thanks for taking the time to discuss this with me! —jameslucas (" " / +) 02:34, 19 December 2016 (UTC)
I concur with DePiep on all of this.  — SMcCandlish ¢ 😼  14:08, 14 December 2020 (UTC)

Emoji/Variant Selectors

Some characters (such as U+261D WHITE UP POINTING INDEX) have both emoji and non-emoji presentation forms. I think a new option should be added to this template to show how both look.

Ex: {{unichar|261D|White up pointing index|emoji-vs=1}} should generate U+261D ☝ WHITE UP POINTING INDEX (Emoji presentation (VS16): ☝️ Textual presentation (VS15): ☝︎)

Ex: {{unichar|00A9|Copyright sign|emoji-vs=1}} should generate U+00A9 © COPYRIGHT SIGN (Emoji presentation (VS16): ©️ Textual presentation (VS15): ©︎)

See: http://unicode.org/emoji/charts/emoji-variants.html

--Gjvnq (talk) 04:59, 24 December 2018 (UTC)
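For reference, the two presentation forms in the proposal are just the base character followed by a variation selector. A minimal Python sketch follows; the function name presentations is hypothetical, and the sketch does not check whether the base character actually has standardized variation sequences (the chart linked above lists those).

```python
from typing import Dict

VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def presentations(cp: int) -> Dict[str, str]:
    """Return the text and emoji variation sequences for a base character."""
    base = chr(cp)
    return {"text": base + VS15, "emoji": base + VS16}


forms = presentations(0x261D)
print(forms["text"])   # ☝ followed by VS15
print(forms["emoji"])  # ☝ followed by VS16
print([f"U+{ord(c):04X}" for c in forms["emoji"]])  # ['U+261D', 'U+FE0F']
```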

I could see that, except we should suppress the first rendering (between "U+261D" and "WHITE UP POINTING INDEX") as redundant.  — SMcCandlish ¢ 😼  14:10, 14 December 2020 (UTC)

Right to left script

The template seems to struggle with RtL, as in this example: U+07F7 ߷ NKO SYMBOL GBAKURUNEN (it should display the glyph between the code point and the descriptor, like: U+0059 Y LATIN CAPITAL LETTER Y). If I use the cwith= parameter it sort of works: U+07F7  ߷  NKO SYMBOL GBAKURUNEN, but it is still not right: it tramples on the N of NKO. Suggestions? (dotted circle is irrelevant). (See also N'Ko script).--John Maynard Friedman (talk) 22:10, 24 January 2020 (UTC)

Someone seems to have fixed this already, since what I'm seeing is "as in this example: U+07F7 ߷ NKO SYMBOL GBAKURUNEN", which is what was intended. If this wasn't due to a recent-ish template change fixing it, then it would seem to be a problem with a specific browser.  — SMcCandlish ¢ 😼  14:21, 14 December 2020 (UTC)

Remove hex value from Code Point Labels

For characters without name, {{Unichar}} produces a Code Point Label:

  • {{unichar|E123}}U+E123 <private-use-E123>

As you can see, the hex value is repeated in the label, which is unwanted in many cases. I suggest removing the hex value from the label to show just "U+E123 <private-use>". If changing the template's default behavior is unwanted, then a parameter like label=nocode could be added. Petr Matas 07:14, 12 September 2016 (UTC)

Worth rethinking -DePiep (talk) 20:56, 15 December 2020 (UTC)
Probably number-related. For example, Han characters usually do not have a name and are always referred to, this way, by hex number. -DePiep (talk) 23:43, 15 December 2020 (UTC)

Corrected smallcaps output

This thing was falsifying the Unicode names by first converting them to lower case, then running them through {{Smallcaps}}. This had two negative effects: It made any copy-paste or other re-use of the rendered output incorrect (e.g. giving a Unicode character name as "canadian syllabics hyphen", which is two kinds of error at once), and it rendered the output weirdly, distractingly tiny.

I've repaired this by instead forcing the case to upper (this catches input error, e.g. giving the Unicode name in mixed case), and then running it through {{smallcaps2}}, which produces conventional reduced-smallcaps output, the desired result, and what you'll see in any properly typeset book that mentions Unicode character names.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  19:13, 16 February 2016 (UTC)

A fine and well-described improvement. -DePiep (talk) 11:34, 17 February 2016 (UTC)

Or maybe not?

@SMcCandlish:, @DePiep:, would you have a look at Signature mark, please? Can you see why the descriptors are rendering in large caps? (It is not the use of {{sc}} earlier in the line, it happens with or without that.) I've seen this problem in the past and thought it had been fixed - in fact if you look at the test case in my sandbox from back then, it still seems to be! "It is not logical, Captain". --John Maynard Friedman (talk) 13:05, 14 December 2020 (UTC)

Definitely something weird going on here, probably some markup that isn't properly closed under some circumstance. When I take a chunk of that text:
More examples
and then strip parameters from the first instance of the four:
  • 0x32 REFERENCE MARK was re-encoded with U+203B REFERENCE MARK
  • 0x34 MALTESE CROSS, with U+2720 MALTESE CROSS (&malt;, &maltese;)
  • 0x36 RIGHTWARDS LEAF ARROW, with U+2767 ROTATED FLORAL HEART BULLET (also known as "hedera" and "ivy leaf")
  • 0x37 LATIN CAPITAL LETTER SIDEWAYS Q, U+213A ROTATED CAPITAL Q
then that actually "fixes" all of them, as in all of them suddenly show up in the intended smaller-all-caps font size ... at that article. But when I do it here on this talk page, that does not happen, presumably due to markup higher up on the page from other instances of the template.

Not sure what the issue is yet. It's a bit past my bed-time, and I'm stuffed full of gumbo, so it's kind of a double-sleepiness attack. LOL. Maybe DePiep will have an insight before I sleep on it and try to suss it out.

PS, a side issue: Even at the intended smaller-all-caps size in {{unichar}}, that size does not match the (too-small) results of {{sc}} in the first parts of these lines. The exact behavior of {{sc}} may need some re-examination. I'm not sure what it's doing is compliant with the 85% minimum size established in MOS:ACCESS. Looks more like 60% to my strained eyeballs, though I would have to go dig around in its code to be sure what it's getting up to.
 — SMcCandlish ¢ 😼  14:03, 14 December 2020 (UTC)

Yes, that was how I saw it many moons ago, as I recall the behaviour seemed to depend on an infobox and I thought it was the infobox that had caused it. I discussed with DePiep at the time, I don't think we got to the bottom of it. I'll try to find it again and add a diff, don't hold your breath.
wrt to your PS, {{sc}} output looks odd in context but it seems to match the x-height correctly. But I agree, it looks odd! Compare and contrast

TEXT and TEXT and x-height

— <small>TEXT</small> and {{sc|text}}
--John Maynard Friedman (talk) 14:44, 14 December 2020 (UTC)
I have found the previous discussion, see this diff to DePiep's talk page. It doesn't seem that I pursued it, as it looked rather intractable. @DePiep:, I don't want to put you through the same 'rinse, repeat' cycle if the result is going to be the same.
Could it be solved using explicit span style= ? --John Maynard Friedman (talk) 17:58, 14 December 2020 (UTC)

Overview:

{{Unichar}}, {{Unichar/name}} uses {{Smallcaps2}} (not {{sc}} = {{Smallcaps all}}).
{{Smallcaps2}} uses Wikipedia:TEMPLATESTYLES: {{Smallcaps/styles.css}}.
Setting UC/lc has effect.
See /testcases#Smallcaps

-DePiep (talk) 15:39, 14 December 2020 (UTC)

Todo: check my statements ;-)
{{smallcaps all}} is not used
To choose, implement after testing:
We prefer: font regular, all uppercase, size:85% OR true smallcaps (need enforced all lc)?
Pick right template (sc or sc2) to use; sc2 might need a change (is on 188 pages).
-DePiep (talk) 15:44, 14 December 2020 (UTC)
Smallcaps has been bugging this for years. Maybe we can improve it this time. (However, demo option 1 behaves strangely). -DePiep (talk) 18:04, 14 December 2020 (UTC)

Recap

We are to decide (D)

D1: do we want the Name to be in Smallcaps, <small>UPPERCASE</small>, or something else?
D2: is Smallcaps an acceptable form, given WP:ACCESS and other webdesign considerations?
D3: Which sc template to use: {{Smallcaps all}} or {{{1}}}? (that's {{sc}}, {{sc2}} by R). Maybe sc2 must be adjusted?
D4: Why does #Option 1 not show correctly on this page, when saved? (other pages, and here in Preview: OK ?!?)

User:SMcCandlish, am I correct and complete? Some are high-level wm I guess. We could prepare a question for VPT. How to continue? -DePiep (talk) 18:55, 14 December 2020 (UTC)

(I hope you don't mind my changing your informal sub-sub-sections to formal ones? It just makes editing more convenient. Feel free to reinstate if you disagree.)
I hate to say this but I think we would have to do a formal RFC to get a consensus on which size to use if we want to change anything: it is less of a problem if we get it to behave consistently as documented.
D1. Personally I prefer THIS to THIS or THIS or even THIS. (<small>, {{sc}}, {{sc2}} [misbehaving!], vanilla), but I take fright at the bit in MOS:SMALL (in MOS:ACCESS) which says "Note that the HTML <small>...</small> tag has a semantic meaning of fine print; it is not used for stylistic changes." which to me suggests that at some future date it will be rendered in an illegible-without-a-magnifying-glass 4 point font and if you complain well you didn't RTFM so tough.
D2. Smallcaps gives majuscule letter-forms at the same x-height as the adjoining text, there is no reduction in size whatever, let alone anything like 0.85. So I can't see any issue with MOS:ACCESS here.
D3. As per D1., I prefer a size midway between small caps and full caps, purely for aesthetic reasons.
D4. If we knew that, we wouldn't be having this conversation! I have seen this behaviour on multiple pages without any obvious pattern. My guess is that we are inheriting an uncleared state from somewhere, but it is insane that it varies by page and not by browser or platform. (I think!).
--John Maynard Friedman (talk) 00:21, 15 December 2020 (UTC)

I agree on D1, except (and I think some of this might have even changed, or maybe I needed more coffee) when I look at your examples, and here I will give their code and number them: "1. <small>THIS</small>: THIS; to 2. {{sc|THIS}} AKA {{smallcaps all|THIS}}: THIS; or 3. {{sc2|THIS}} AKA {{smallcaps2|THIS}}: THIS; or even 4. THIS: THIS", I am seeing 1 and 3 as identical (about 85% size), 2 as very small (x-height of lower-case of running text), and 4 as (of course) just plain all-caps. I'm wondering what purpose 2 ({{sc}}, {{smallcaps all}}) serves, since it may be an accessibility issue as covered next. (Aside: {{smallcaps}} was not tested in this, but seems to look the same as {{smallcaps all}}; I think it just doesn't do forced normalization of the input first.) D2: Sounds right, but {{sc|THIS}} THIS produces excessively small text. That is, I think it is apt to be an actual accessibility issue for some people. Reducing capital letters to the x-height of lower-case ones is excessive; most fonts are not designed with that in mind, so readability is hindered. Maybe not "fatally" for a normal use-case, but it is not ideal at all. D3: Agreed, too, though as I say I think there's arguably an accessibility reason not just a subjective aesthetic one. D4: That is indeed the mystery. It might elucidate to examine source when it renders properly and when it does not. I s'pect that it really is some tiny error somewhere, like a tag not closing when it should, or a missing quote character, or something else trivial but hard to locate. To go back to the top of the thread, I think it's important that we produce copy-pasteable output that is not mangled; we're not really in a position to wreck the content and its reusability with bad output like "canadian syllabics hyphen" just to get a visual effect.  — SMcCandlish ¢ 😼  03:01, 15 December 2020 (UTC); revised 23:18, 15 December 2020 (UTC)

PS: Given that we only noticed this now, over 3 years after the start of this thread, it suggests that something broke in the interim, either in this template or one of those that it depends on. That might help in "walking" back through code; we have a probable known-good date to work forward from.  — SMcCandlish ¢ 😼  03:03, 15 December 2020 (UTC)

As I recall now, we narrowed it down to this: if the first invocation of {{unichar}} does NOT have an nlink= parameter, then (a) it renders correctly and (b) each subsequent invocation on the page, with or without nlink, will render correctly. But if it DOES use nlink, then neither it nor any subsequent invocations will render correctly. Further, if you have a series of invocations (as at signature mark), then to remove an nlink= from a middle one does not stop the 'rot'. --John Maynard Friedman (talk) 11:02, 15 December 2020 (UTC)
You put me on the right track: this looks like a bug fix! (If you see any article broken: please revert this edit). After this, the problems might be reduced? There is also infobox Euro sign, which shows acceptably (but possibly by unintended effects, like stacking font reductions ...). All in all, the template needs a total redesign & rebuild. Like, to be added: "when used in infobox, do not repeat font reduction" (eg, add |child=yes). -DePiep (talk) 13:13, 15 December 2020 (UTC)
Thanks for tracking that down!  — SMcCandlish ¢ 😼  23:32, 15 December 2020 (UTC)
Let me recap the recap
@John Maynard Friedman and SMcCandlish: By now, the {{Unichar}} template situation has become too complicated to solve things in a simple manner. Many issues are involved and interacting, including other-template effects (like templates style, incoherent smallcaps templates), and wikimedia effects. That may be good for WP, but it is not good for this template.
We had better consider this thread closed (for this subthread).
I propose to abandon smallcaps altogether here, and research the {{small}} format (<small>X</small>, is 85% font-size btw).
Later on we can improve the template (from 2011, so pre-Lua!) more. See my #Remove_smallcaps next step. -DePiep (talk) 23:20, 15 December 2020 (UTC)
Generally agreed. We have no need to invoke some other smallcaps template (of any kind) instead of just locally applying <small> (which produces the same output as {{small}}, i.e. 85%-height smallcaps, which is what has been intended). The "magic" this template does to this particular string is normalize the input (to Unicode's official if strange all-caps naming) so that it copy-pastes properly, then apply smallcaps, and there has been no need for it to try to get at this via any other template call. Aside from adding complexity for no gain, it increases parser load, and pushes every page that uses it a step closer to the parserfunction limits. That said, I'm not keen on doing this as a Lua/Scribunto module if not really necessary, as that adds a new layer of complexity. Adding complexity to avoid complexity isn't really a solution. :-) If there's something that this template really needs to do and it can't do it without using Lua, then okay. That could conceivably be the case, in dealing with other issues reported here, like strangeness in displaying combining characters, etc., etc., but we should try to fix it in usual template code first.  — SMcCandlish ¢ 😼  23:32, 15 December 2020 (UTC)

Option 1

Sandbox version not yet feasible (15 Dec 2020)

I have prepared the sandbox: use {{Smallcaps}}, and set input text into lowercase:

  • {{unichar/sandbox|203B|REFERENCE mark}} → U+203B REFERENCE MARK

If OK, we can publish it. We do need a check on smallcaps usage for WP:ACCESS though (should not produce bad fonts in certain situations). -DePiep (talk) 17:58, 14 December 2020 (UTC)

Trouble in paradise: the demo right above shows smallcaps OK in Preview, but when saved shows regular font. Very strange.
Yep, regular font, and all-lowercase.  — SMcCandlish ¢ 😼  03:02, 15 December 2020 (UTC)
Update: I'm now seeing it as smallcaps at about 85% size, which I believe is the intended result.  — SMcCandlish ¢ 😼  23:22, 15 December 2020 (UTC)

Remove smallcaps

I propose to remove the use of smallcaps in {{Unichar}} completely. -DePiep (talk) 22:28, 15 December 2020 (UTC)

Useful links:

Considerations

(later more) -DePiep (talk) 22:40, 15 December 2020 (UTC) Current {{Unichar}} rendering shows unexpected and unintended results. This includes: unpredictable results from {{sc2}} when wrapped; different output on this talk page only when saved, not when previewed (btw, after-save only = safesubst effect?); changing and unpredictable results from {{sc}} and {{sc2}} themselves (<templatestyles src="smallcaps/styles.css"/> introduced 2018). In infoboxes, like Euro sign, the effect changes again (as expected, but not controlled nor correct, probably b/c nesting size-settings). In general, the template currently does not perform as expected. Probably there are code errors (like unclosed tags). Requirements & test situations to be redefined. -DePiep (talk) 19:46, 19 December 2020 (UTC)

See my 23:32, 15 December 2020 (UTC) note above. We need to abandon using pre-templated smallcaps templates. What this template does (for this data) in essence is convert input to Unicode's ALL-UPPERCASE naming convention, then reduce the size of it to be less annoying. That's not wrong, we've just been going about it in a clumsy, inefficient, and easily-broken way. The way to do it is to use the uc magicword, then just apply <small>. That results in smallcaps (in a MOS:ALLCAPS-sanctioned use of them) without adding unnecessary parser calls or relying on other templates which can change out from under us. If we wanted instead to change to ignoring Unicode conventions and presenting the names in a format like "hyphen-minus" or "Hyphen-Minus", I think a) we'd need an RfC to decide to do that significant change (and I would expect considerable pushback from it, since it diverges from established off-site style with regard to Unicode characters' names), and b) we would have a choice: b1) use Lua to lower-case it all and then uppercase just first characters of words to produce title-case ("Hyphen-Minus"), or b2) entirely depend on editors to supply correct input. B2 is something no one has really been doing (it's why we imposed case normalization in the first place, because random editors were doing "hyphen-minus", "Hyphen-Minus", and "HYPHEN-MINUS", sometimes in the same article, and even including incorrect lower-casing of proper names like "canadian"). If we wanted the B2 result, it would require massive site-wide cleanup of already deployed instances, in addition to regular post-change cleanup maintenance, and I don't think that's practical. So, we're left with my "clean" version of smallcaps, or with b1 (lower-case it all then capitalize words, in an automated fashion).  — SMcCandlish ¢ 😼  23:43, 15 December 2020 (UTC)
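For illustration only, the two approaches described above can be sketched outside wikitext. This is a rough Python stand-in, not the template code: `.upper()` plays the role of the uc magic word, the exact `<span>` markup for the 85% size is an assumption (the thread only says {{small}} equals 85% font-size), and both function names are hypothetical.

```python
import html


def format_unicode_name(raw_name: str) -> str:
    """'Clean' smallcaps: normalize to Unicode's all-caps naming, then shrink.

    The span markup is an assumed stand-in for whatever {{small}} emits.
    """
    # Collapse stray whitespace, then force the all-uppercase convention
    normalized = " ".join(raw_name.split()).upper()
    return f'<span style="font-size:85%;">{html.escape(normalized)}</span>'


def title_case_name(raw_name: str) -> str:
    """Option b1: lower-case everything, then capitalize each word."""
    # str.title() also capitalizes the letter after a hyphen
    return " ".join(raw_name.split()).title()


print(format_unicode_name("Hyphen-minus"))
print(title_case_name("HYPHEN-MINUS"))
```

Either way the editor's capitalization is discarded, so "hyphen-minus", "Hyphen-Minus" and "HYPHEN-MINUS" all produce the same copy-pasteable output.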
All of this, 100%. Will /sandbox this. (I trusted the *outside* SC templates too much too long). -DePiep (talk) 23:57, 15 December 2020 (UTC)
btw, the {{small}} template documents that small equals 85% font-size. Even better, stable mw. -DePiep (talk) 23:59, 15 December 2020 (UTC)
  •  Done [2]. @SMcCandlish and John Maynard Friedman: using {{small}} (=85% font-size per mediawiki). Also looks OK in infobox: Euro sign. -DePiep (talk) 21:19, 21 December 2020 (UTC)
    Yes, that looks better, also pound sign. Both were getting compound reductions, not now. --John Maynard Friedman (talk) 22:18, 21 December 2020 (UTC)
    Agreed, though I think it would be better to just apply the sizing span that {{small}} uses instead of invoking {{small}} itself, since doing the latter doubles the parserfunction usage for no practical reason. We do have pages that keep hitting the parserfunction limit.  — SMcCandlish ¢ 😼  02:42, 22 December 2020 (UTC)
    Did {{{1}}}. -DePiep (talk) 06:56, 22 December 2020 (UTC)

Auto parameter values, maintainability

It would be convenient if the template could automatically determine the character name based on its codepoint or determine the codepoint given the literal character itself. I noticed this was brought up previously but bot-archived. Module:Unicode data has matured since that discussion, and this use case looks quite straightforward: {{#invoke:Unicode data|lookup|name|2D}} → HYPHEN-MINUS

Secondly, the current system of subtemplates is challenging to understand and difficult to modify owing to the distribution of logic over multiple templates and the inability to even use a value twice without recalculating it; now that we have Lua, I think we could gain in maintainability and performance by re-implementing the template as a module. Looks like User:DePiep did most of the work on this; thoughts?  —wqnvlz (talk·contribs);  08:30, 6 April 2022 (UTC)
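As a rough illustration of the automatic lookup suggested above, here is a minimal Python sketch. It uses Python's standard `unicodedata` module as a stand-in for Module:Unicode data; the function names are hypothetical and this is not how the template or module is implemented.

```python
import unicodedata


def name_from_codepoint(hex_code: str) -> str:
    """Return the Unicode name for a hex code point, or '' if it has none."""
    # Control characters have no name, so fall back to an empty string
    return unicodedata.name(chr(int(hex_code, 16)), "")


def codepoint_from_char(char: str) -> str:
    """Return the U+hhhh notation for a literal character."""
    return f"U+{ord(char):04X}"


print(name_from_codepoint("2D"))   # HYPHEN-MINUS
print(codepoint_from_char("©"))    # U+00A9
```

With a lookup like this the editor would supply only one of the two values, and the name could no longer be mistyped.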

You are right. (I was working on a replacement already). Minor issue: today, some instances do have |name=blank. Boldly supplying the /data name could disorder those pages (eg tables). Other issue: I was researching the "script to language" automation, so as to use appropriate scripts (when non-Latin); don't know good form yet. Anyway, it's on my todo list, and I'd like to use Lua for this. -DePiep (talk) 10:53, 6 April 2022 (UTC)

Old comment, saved

From Template talk:Unichar/doc (2013), saved:

Why does the html entity output use decimal? It seems like hex would make more sense, to make it clearer what character it's referencing.

-DePiep (talk) 08:14, 17 April 2022 (UTC)

{{Unichar}} returns the U+ hex value (fit to use as &#xhhhh;). The &#nnn; decimal value is shown when |html= is set (blank or any value). As proposed (in 2013) I am projecting to remove this decimal value from the output, per WP:NOTHOWTO (we are not to provide the entering help; especially not inline). DePiep (talk) 08:26, 17 April 2022 (UTC)

Aww

I can't see the code of the template.

RuWP (talk) 15:08, 2 January 2022 (UTC)

The code is present in the template and its subpages (subtemplates). The code is complicated. Currently, the code is being redesigned. DePiep (talk) 08:32, 17 April 2022 (UTC)

Option to only show HTML mnemonic

Generally only the text form of the HTML shortcut is interesting. Anybody capable of using the decimal shortcut is probably able to also type &#xN; using the Unicode code point. I would like two things:

  • A way to show the numeric entry only if no mnemonic
  • A way to show nothing (including no "HTML" and parenthesis) if there is no mnemonic, for convenience when building tables.

It would also be nice to show the #x version of the HTML, at least for any numbers larger than 999. Spitzak (talk) 20:40, 21 December 2021 (UTC)

Your second request already exists, if I understand you correctly? Just omit the html=. For example, {{unichar|00A7|Section sign}} produces U+00A7 § SECTION SIGN. --John Maynard Friedman (talk) 00:23, 22 December 2021 (UTC)
@Spitzak and John Maynard Friedman: Sure, these are couldbe-&-shouldbe options to add. IMO most detail thoughts & ideas are about the inline presentation, can't have too much clutter in there. In a table, OTOH, much more is possible in adding & formatting, so options "|format=table1, table2" is on the list to be added (bit like {{Convert#Table options}} do).
The good news is: I was working on a new version, exploring options (see /testcases). Bad news: I ran aground when looking for a 'language-to-script' function or template (editors usually know & enter a language to get a good font for the script). Then spring broke and summer and other templates needed attention ...
Have a nice edit, -DePiep (talk) 03:17, 22 December 2021 (UTC)
The second item was an idea so that the html can be shown only if there is a mnemonic, but without having to edit the template call depending on whether or not the mnemonic exists. What I meant by "table" is a table where the source text looks pretty much the same for every row, but the html only appears for the characters that have a mnemonic.
For actual tables it would be useful to have access to the "pieces" of this template. For instance it appears there is a translator from unicode code point numbers to names. There is also a template designed to show the glyph though I think it is mostly obsolete attempts to work around Windows font problems that have been fixed, and a template to correctly format the small caps. A template to return the mnemonic, with options as to whether more than one is wanted, whether the decimal or hex is wanted and if they are wanted if there is a mnemonic, etc. This would not include the letters "HTML" or any parenthesis or markup (unless a reliable way to make it "code" that does not put a nested box inside tables but does in the main text is found).Spitzak (talk) 18:42, 22 December 2021 (UTC)
@Spitzak: a very good report, everything you write is to the point & a feature request worth adding. (show mnemonic-only could be default behaviour even).
FYI: the mnemonics are in Module:Numcr2namecr. Formally called named character reference; as opposed to numeric character reference (which could be dec or hex).
Could you add some inspiring example article links, where a table might be enhanced with such options? (just to get the thinking going)
As said, features worth adding, but I don't see time in the short future for me to work on this. I'd start in Lua btw. -DePiep (talk) 05:31, 23 December 2021 (UTC)
  • @Spitzak:: I have changed current code quickly to achieve: |html=<is present> → will show mnemonic when exists, otherwise no suffix is added. The decimal numeric option is removed altogether. |note= will appear when entered.
A9
{unichar|00A9|COPYRIGHT SIGN}} → U+00A9 © COPYRIGHT SIGN
{unichar|00A9|COPYRIGHT SIGN|note=Some note}} → U+00A9 © COPYRIGHT SIGN (Some note)
{unichar|00A9|COPYRIGHT SIGN|html=yes}} → U+00A9 © COPYRIGHT SIGN (&copy;, &COPY;)
{unichar|00A9|COPYRIGHT SIGN|html=}} → U+00A9 © COPYRIGHT SIGN (&copy;, &COPY;)
{unichar|00A9|COPYRIGHT SIGN|html= |note=Some note}} → U+00A9 © COPYRIGHT SIGN (&copy;, &COPY; · Some note)
{unichar|00A9|COPYRIGHT SIGN|html=}} → U+00A9 © COPYRIGHT SIGN (&copy;, &COPY;)
U+62
{unichar|0062|LATIN small LETTER B}} → U+0062 b LATIN SMALL LETTER B
{unichar|0062|LATIN small LETTER B|html=}} → U+0062 b LATIN SMALL LETTER B
{unichar|0062|LATIN small LETTER B|html= |note=Some note}} → U+0062 b LATIN SMALL LETTER B (Some note)
{unichar|0062|LATIN small LETTER B|html=}} → U+0062 b LATIN SMALL LETTER B
{unichar|0062|LATIN small LETTER B|note=Some note}} → U+0062 b LATIN SMALL LETTER B (Some note)
For now this should do; more options to be built in later. -DePiep (talk) 11:11, 17 April 2022 (UTC)
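The suffix behaviour shown in the examples above can be sketched roughly as follows. This is a hedged Python approximation, not the template's code: it uses Python's `html.entities.html5` table as a stand-in for Module:Numcr2namecr, and the function and parameter names (`unichar`, `show_html`, `note`) are hypothetical.

```python
import html.entities


def named_entities(char: str) -> list:
    """List the named character references (with ';') that map to char."""
    names = [n for n, v in html.entities.html5.items()
             if v == char and n.endswith(";")]
    # Lower-case spelling first, e.g. copy; before COPY;
    return sorted(names, key=lambda n: (n.lower(), n != n.lower()))


def unichar(hex_code: str, name: str, show_html: bool = False, note: str = None) -> str:
    """Build 'U+hhhh <char> NAME (suffix)' with mnemonic-only HTML output."""
    cp = int(hex_code, 16)
    char = chr(cp)
    parts = []
    if show_html:
        entities = named_entities(char)
        if entities:  # no mnemonic: add nothing, and no decimal fallback
            parts.append(", ".join("&" + n for n in entities))
    if note:
        parts.append(note)
    suffix = f" ({' · '.join(parts)})" if parts else ""
    return f"U+{cp:04X} {char} {name.upper()}{suffix}"


print(unichar("00A9", "COPYRIGHT SIGN", show_html=True, note="Some note"))
print(unichar("0062", "LATIN small LETTER B", show_html=True))
```

The point of the design is that the same call shape works for every table row: characters without a mnemonic simply get no HTML suffix.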

Redesign: parameter evaluation

I am working on a redesign. To keep in mind: the first aim for this template is: use inline (in running sentences). It is also used in tables & lists (changes should not break these).

Step 1 reconsider existing parameters. Proposed changes:
  • |sans=y (→ present the "U+00A9" part in sans-serif) Red XN: not desired in inline usage (=main template intention). Already ineffective, superseded by {{mono}} usage.
Deprecate & remove usages. Green tickY Done. Removed 6 instances; no effect. -DePiep (talk) 08:52, 17 April 2022 (UTC)
  • |dec=<anytext> (→ adds decimal codepoint value) Red XN: undesired, especially inline. As WP:NOTHOWTO explains, wiki is not to provide help for How to Input (&#x00A9 nor &#169)
Deprecate & remove usages. Green tickY Done. Removed 1 instance. -DePiep (talk) 08:52, 17 April 2022 (UTC)
DePiep (talk) 08:49, 17 April 2022 (UTC)
  • |br=<?> (→ unknown; something with {{Str sub old}}): unknown effect, unused. Red XN. -DePiep (talk) 09:09, 17 April 2022 (UTC)
  • |html=<present> (→ adds decimal entity like &#169 for U+A9) Red XN: remove the decimal per WP:NOTHOWTO (no need to show entering-ways; esp not inline).
Remove. Keep named entity alias showing when |html=, like &copy;. Green tickY done. So currently, no decimal value can be shown at all. -DePiep (talk) 10:24, 17 April 2022 (UTC)
  • |nlink=<blank> (→ will wikilink the name to the article as given), |Copyright symbol|nlink=| would link to COPYRIGHT SYMBOL (article title lower case).
Deprecate option Red XN. Some 2 usages, resolved. Needs different mechanism. -DePiep (talk) 06:15, 26 April 2022 (UTC)
This one needs more thought. The current arrangement is used and useful, as in most cases there is no need to spell out the target article name. Compare and contrast {{unichar|25CA|Lozenge|nlink=Lozenge (shape)}} and {{unichar|0026|ampersand|nlink=}}. In the first case, the long name is needed and has to be typed out, in the second it doesn't. Compare with the pipe trick: [[Lozenge (shape)|]]. --John Maynard Friedman (talk) 08:07, 26 April 2022 (UTC)
Backgrounds: first of all, I dislike the construct of |nlink=<present but blank> as a meaningful parameter use. Because for the editor, it has an opaque meaning & effect, very hard to document etc. (tbh, I built this one myself, years ago ;-).
Also, it is used rarely. Out of 3200+ {{Unichar}} instances, 227 use the nlink parameter, of which some 5 (five) used the <blank> option. Says that it is not very popular. Note that it still required the exact article-title spelling (uc/lc) for |2=.
And importantly, re good usage, is: the character can use a better target linking system. It can be either the character description page (like Copyright sign), or a more semantic target (like Greater than), right away. To be considered and tested, is the idea to link to the bare character article (like %), which then can be a redirect as appropriate. -DePiep (talk) 13:03, 26 April 2022 (UTC)
I can refine: the template could (should) easily be linked to the character article itself (like =), which in turn can redirect to either the true character description or to a more semantic page. cf.; U+003D = EQUALS SIGN (&equals;). This principle leaves it to the character article (-redirecting) to sort out the target (character description or something semantic). I am working on this in the sandbox; for example add option |link=# to link to the char page. -DePiep (talk) 18:22, 26 April 2022 (UTC)
I bet that I am the Prime Suspect responsible for those five instances. Yes, I agree that Parameter= eh? equals what? is ugly as well as being a pig to document, so I am broadly in favour of losing it permanently. Mainly I have used nlink to redirect to a section, which is poor practice: I really should have created a proper redirect article and nlinked to that.
I like the idea of linking automatically to an article named for the symbol itself. No need to type the name out twice. Programming around the awkward cases (# and | etc) should prove entertaining ... maybe in those cases use the hex code-point number as the name for the redirect article? This would have handled the reorganisation of the various lozenges almost automatically, for example. To be clear, I'm now advocating complete deprecation of any nlink (or link etc) syntax. Existing usage gets ignored apart from an entry in an error category for clean-up action to follow.
"You've got to ask yourself a question: 'do I feel lucky?' Well, do ya, punk?" I think we have a solution but a lot of articles will get messed up if we've missed an important detail somewhere. BEBOLD!--John Maynard Friedman (talk) 19:00, 26 April 2022 (UTC)
Yes, all of this. (Though no blaming for editing by documentation :-). More such details at hand, eg find right font for exotic scripts; auto-find character name. Improvements only, also wrt parameters; {{Unichar}} is a bit old. -DePiep (talk) 19:28, 26 April 2022 (UTC)