Wikipedia:Geographical names

Wikipedia has over 700,000 articles about geographical entities such as villages, districts, lakes, rivers, mountains and protected areas. Their infoboxes vary considerably in layout and the information they support. The article title holds the common English form but the article may also give the common names used in the local language(s), official names, former names, other names and nicknames. Non-Latin script may be followed by a romanized or phonetic form.

All non-English forms of a name should be marked up so they are rendered correctly by a screen reader. This essay proposes standard ways to gather, validate and format the different names in the article text and in infoboxes, and outlines a migration approach. The core proposal is to adapt all the geographical entity infoboxes to use a standard child template, {{infobox geonames}}, which will undertake validation and formatting of the names.

Current situation

There are several hundred geo-infoboxes used in over 700,000 articles about geographical entities. As of February 2022 {{Infobox settlement}} was used in over 543,000 articles, {{Infobox river}} in 28,870, {{Infobox mountain}} in 26,448, {{Infobox building}} in 24,502, and so on down to a long tail of infoboxes like {{Infobox Tibetan Buddhist monastery}} (286 articles) or {{Infobox dive site}} (18 articles). As shown in #Sample infobox templates (below) the infoboxes are very inconsistent in the name-related parameters they accept, and as shown in #Current usage examples (below) they are also very inconsistent in the format they render.

Non-English names are common even in countries where English is the national language. A place in California might have former names in Spanish and indigenous languages. A place in England may have former names in Common Brittonic or Old English. In France, there may be variants of local names in Breton, Occitan or Corsican. India has a wealth of languages and scripts. Because support for non-English names is inconsistent, editors may struggle with the default formatting, as in these examples:

  • |native_name = {{nobold|四国}}
  • |native_name = {{lang|tr|Anadolu Selçuklu Devleti}} {{lang|fa|سلجوقیان روم}} Saljūqiyān-i Rūm

Introducing standard validation and formatting for names in all geo-infoboxes will give a more consistent reader experience, reduce accessibility problems with screen readers, and make life easier for editors.

Proposed guidelines

1. Articles about geographical entities may provide extensive information about names, including the different types of name, etymology, pronunciation, non-Latin script, romanization and so on. However, the information does not have to all be crammed into the infobox and the lead sentence. As illustrated in the article on the Nile, it may be relegated to a section on naming.
2. Any non-English name in Latin script should be rendered in italics with proper HTML mark-up for a screen reader, and the language should be shown before the name, if either of the following applies:
  • The name is to be read out in the native language by a screen reader
  • Readers will want to know what language the name is in

Example: German: München · Bavarian: Minga

3. If a non-English name in Latin script may be rendered in English pronunciation, and readers will not be particularly interested in the language, the language need not be identified.

Example: Eboracum · Eoforwic · Jorvik · Everwic

These former names for York are from obsolete languages with uncertain pronunciation.

4. Names in non-Latin script may be followed by an italicized romanized or phonetic form if relevant, and the language should be identified.

Example: Russian: Москва [Moskva]

5. A list of names of the same type in an infobox should be formatted as a horizontal list if it will fit on one line. Otherwise it should be formatted as a simple vertical list. Thus:
    French: Bruxelles · Dutch: Brussel
But
    Brussels-Capital Region
    French: Région de Bruxelles-Capitale
    Dutch: Brussels Hoofdstedelijk Gewest
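The rule in guideline 5 can be sketched in a few lines of Python. This is only an illustration: the essay does not say how "fits on one line" would be measured, so the `max_chars` budget and the " · " separator are assumptions, and `format_name_list` is a hypothetical helper rather than part of any existing template.

```python
def format_name_list(entries, max_chars=40):
    """Join names horizontally if they fit the width budget, else one per line.

    max_chars is a hypothetical stand-in for "fits on one line".
    """
    horizontal = " · ".join(entries)
    if len(horizontal) <= max_chars:
        return horizontal
    return "\n".join(entries)


short = format_name_list(["French: Bruxelles", "Dutch: Brussel"])
long = format_name_list([
    "Brussels-Capital Region",
    "French: Région de Bruxelles-Capitale",
    "Dutch: Brussels Hoofdstedelijk Gewest",
])
print(short)
print(long)
```

With these inputs the two short names stay on one line, while the three official names fall back to a vertical list.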

Identifying languages

Non-English names are often formatted using {{lang}} or {{native name}}. However, both these templates require a 2- or 3-letter ISO code. Many editors do not know what these codes are, and many former place names are in languages that do not have an ISO code. For example, the River Derwent (Tasmania) was originally called timtumili minanya in the Mouheneener language. Sometimes the language is unknown. An explorer may have recorded what the "natives" called the place, but failed to record the natives' ethnic group.

The solution is to enhance the {{lang}} and {{native name}} templates, or create a new {{lang2}} template to allow the full names of languages as an alternative to the ISO code. Thus {{lang2|German|München}} and {{lang2|de|München}} should both be accepted and render the same result. {{infobox geonames}} would implement the same logic.

  • If a language is not found in the list of ISO codes that gives corresponding language names, check for it in a list of language names that gives corresponding ISO codes
  • The second list may include languages such as Chirr, Phuthi or Erzgebirgisch with the ISO code "mis" (uncoded languages), meaning they have no ISO code of their own
  • Both lists will also include the name of the Wikipedia article for the language, for use as a link
  • If the language is not known, use the language code "und"
  • Use the ISO code for HTML tagging and the corresponding language name for display purposes
  • Flag articles with unrecognized languages for manual follow-up
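The lookup steps above can be sketched as follows. This is a rough Python illustration, not the template's actual logic: the two dictionaries are tiny excerpts standing in for full ISO tables, `resolve_language` is a hypothetical name, and tagging an unrecognized language as "und" while flagging it is an assumption (the essay only says such articles should be flagged). The article-link column is left out for brevity.

```python
# Illustrative excerpts only; a real template would load full ISO 639 tables.
CODE_TO_NAME = {"de": "German", "fr": "French", "ar": "Arabic"}
NAME_TO_CODE = {
    "german": "de", "french": "fr", "arabic": "ar",
    # Languages the essay lists as having no ISO code of their own.
    "chirr": "mis", "phuthi": "mis", "erzgebirgisch": "mis",
}


def resolve_language(value):
    """Return (html_code, display_name, needs_follow_up) for a code or a name."""
    text = (value or "").strip()
    if not text:
        return "und", "", False            # language not known
    key = text.lower()
    if key in CODE_TO_NAME:                # first list: ISO code -> name
        return key, CODE_TO_NAME[key], False
    if key in NAME_TO_CODE:                # second list: name -> ISO code
        return NAME_TO_CODE[key], text, False
    return "und", text, True               # unrecognized: flag the article


print(resolve_language("de"))
print(resolve_language("German"))
print(resolve_language("Phuthi"))
```

Both "de" and "German" resolve to the same code and display name, which is the behaviour the essay asks of {{lang2}}.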

The enhanced or new template should also accept and display a romanised or phonetic version of the name. E.g.

{{lang2|ar|بَغْدَاد|baɣˈdaːd}} or {{lang2|Arabic|بَغْدَاد|baɣˈdaːd}}

would render

Arabic: بَغْدَاد [baɣˈdaːd] with the non-Latin name tagged with the HTML attribute lang="ar".
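As a rough sketch of that output, the Python below builds the HTML for one name. The `render_lang2` function and its argument order are hypothetical, and the bracketed romanized form simply follows the example above; a real template might also italicize it.

```python
from html import escape


def render_lang2(code, display_name, name, roman=None):
    """Build 'Language: <span lang=code>name</span> [roman]' as an HTML string."""
    html = f'{escape(display_name)}: <span lang="{escape(code)}">{escape(name)}</span>'
    if roman:
        html += f" [{escape(roman)}]"
    return html


rendered = render_lang2("ar", "Arabic", "بَغْدَاد", "baɣˈdaːd")
print(rendered)
```

Only the name itself is wrapped in the `lang` span, so a screen reader would switch language for the name but not for the label or the romanized form.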

Standard infobox parameters

See #Sample infobox templates (below) for parameters used in different infoboxes. Assuming the parameter names used in {{infobox settlement}} will prevail, and that official names, native names and other names can all have languages and may all have Romanized forms, the parameters could be

Alternative 1: Explicit

|name                =
|official_name       =
|official_name_lang  =     
|official_name_roman =     
<!--           Use |official_name2 = |official_name_lang2 = |official_name_roman2 = etc. for additional names, up to five -->
|native_name         =     
|native_name_lang    =     
|native_name_roman   =     
<!--           Use |native_name2 = |native_name_lang2 = |native_name_roman2 = etc. for additional names, up to five -->
|former_name         =
|former_name_lang    =     
|former_name_roman   =     
<!--           Use |former_name2 = |former_name_lang2 = |former_name_roman2 = etc. for additional names, up to five -->
|other_name          =
|other_name_lang     =     
|other_name_roman    =     
<!--           Use |other_name2 = |other_name_lang2 = |other_name_roman2 = etc. for additional names, up to five -->
|nickname            =

Alternative 2: Templated

|name                =    
|official_name       =    <!-- {{lang2|<language>|<name>|<roman form>}} or 
                               {{lang2 list |lang1=<language>|name1=<name> |roman1=<roman form> |lang2=<language>|name2=<name> |roman2=<roman form> ... }} -->
|native_name         =    <!-- {{lang2|<language>|<name>|<roman form>}} or 
                               {{lang2 list |lang1=<language>|name1=<name> |roman1=<roman form> |lang2=<language>|name2=<name> |roman2=<roman form> ... }} -->
|former_name         =    <!-- {{lang2|<language>|<name>|<roman form>}} or 
                               {{lang2 list |lang1=<language>|name1=<name> |roman1=<roman form> |lang2=<language>|name2=<name> |roman2=<roman form> ... }} -->
|other_name          =    <!-- {{lang2|<language>|<name>|<roman form>}} or  
                               {{lang2 list |lang1=<language>|name1=<name> |roman1=<roman form> |lang2=<language>|name2=<name> |roman2=<roman form> ... }} -->
|nickname            =

Comparison of alternatives

In both alternatives the editor must enter the same information:

|official_name = name
|official_name_lang = language
|official_name_roman = roman form

or

|official_name = {{lang2|language | name | roman form}}

The first format is probably slightly easier for novice editors, who may be put off by the curly brackets and vertical bars in the second form. Articles about major geographical entities like Cairo, Brahmaputra River or Mount Everest attract seasoned editors who can deal with formatting issues. But the majority of geographical articles are stubs like Orto, Corse-du-Sud, Maquan River or Klinkit Creek Peak, where the editors may find even a simple infobox a bit of a challenge.

The first form also makes it easier to ensure that languages are rendered correctly, since the {{infobox geonames}} template can see and validate all the parameters, for example checking for unusual characters in a name such as ":" or "(" that may indicate attempts to pre-format them. With the second approach {{infobox geonames}} can only see the result rendered by {{lang2}}, and cannot be sure that only the correct formatting template has been used. This essay therefore recommends the first, explicit alternative.
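The kind of check that the explicit alternative makes possible might look like the sketch below. It is written in Python purely for illustration: `check_names` is a hypothetical helper, the numbered-suffix convention follows the comments in Alternative 1, and apart from ":" and "(" the list of suspect characters is an assumption.

```python
import re

# ":" and "(" are the essay's examples; the other characters are assumptions.
SUSPECT_CHARS = re.compile(r"[:(){}<>\[\]]")


def check_names(params, kind="official_name", max_names=5):
    """Return a list of problem descriptions for one kind of name."""
    problems = []
    for i in range(1, max_names + 1):
        suffix = "" if i == 1 else str(i)
        name = params.get(f"{kind}{suffix}", "").strip()
        lang = params.get(f"{kind}_lang{suffix}", "").strip()
        roman = params.get(f"{kind}_roman{suffix}", "").strip()
        if not name:
            if lang or roman:
                problems.append(f"{kind}{suffix}: language or roman form without a name")
            continue
        if SUSPECT_CHARS.search(name):
            problems.append(f"{kind}{suffix}: possible pre-formatting in {name!r}")
    return problems


print(check_names({"official_name": "München", "official_name_lang": "German"}))
print(check_names({"official_name": "Bruxelles (French)"}))
```

Because every value arrives as a separate parameter, the check sees the raw name before any formatting is applied, which is the advantage claimed for the first alternative.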

Rendered layout

See #Current usage examples for the various ways in which geographical infoboxes render name information. There is no reason why they should be so inconsistent. The obvious way to standardize collection, validation and rendering of name data is to use a child infobox that can be shared by all the geographical entity infoboxes. To demonstrate, {{Infobox geonames parent}} embeds child {{infobox geonames}}, which formats the names. This is just a crude mock-up of the alternative 2 format, with no real validation and formatting, but illustrates the concept. The code at the left (or below on a phone) renders the result at the right.

Article name
Native name or names
Official: List of official names
Formerly: Former names
Variants: Other names
Nickname: Nicknames
Other data: Specialized information about the geographical entity
{{Infobox geonames parent
  |name=Article name
  |native_name = Native name or names
  |official_name = List of official names
  |former_name= Former names
  |other_name= Other names
  |nickname= Nicknames  
  |image=File:Przełęcz Karkonoska - panorama.jpg
  |otherdata=Specialized information about the geographical entity
}}

This is a rough first cut. The format rendered by {{infobox geonames}} should be carefully reviewed and adjusted. Logic must be added to validate the languages and ensure that names, languages, non-Latin scripts and lists of names are formatted correctly, and titles must be pluralized as needed. But once this is done, the standard validations and formatting will then be picked up automatically by all geo-infoboxes that embed {{infobox geonames}}.

General migration approach

{{lang}}, {{native name}} etc. should be enhanced to support language names as an alternative to language codes, and to support romanized or phonetic forms. This can be done at any time, and will have no impact on existing articles.

Migration to a more standard way of collecting, validating and formatting names can be done infobox by infobox.

  • Every effort should be made to minimize disruption.
  • A geo-infobox change that introduces red error messages in the text of many articles where there were no error messages before is unacceptable.
  • The preferred approach is to flag issues using a hidden tracking category, and allow gnomes to work through the flagged formatting, replacing it with the new standard. Once almost all the non-standard formatting has been eliminated, the geo-infobox may start to render red error messages.

Two types of change may be introduced independently:

  1. The geo-infobox is changed to use the new {{infobox geonames}}
  2. The geo-infobox is changed to eliminate non-standard parameter names

Converting to {{infobox geonames}}

  • The first step for each geo-infobox is to obtain agreement on its talk page and associated project talk page to migrate to the standard {{infobox geonames}}
  • A version of the geo-infobox using {{infobox geonames}} is prepared and carefully tested
  • This version will use the standard parameter names, but will also accept variants to provide backward compatibility
  • Assuming no problems, the standardized geo-infobox template will be cut into production, passing "mode=transition" to {{infobox geonames}}. In this mode, {{infobox geonames}} will populate tracking categories with error messages, but will attempt to format the data provided, and will not generate red error messages.
  • Once the tracking categories have mostly been cleared, the geo-infobox will start passing "mode=strict" to {{infobox geonames}}. In this mode, {{infobox geonames}} will generate red error messages
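The difference between the two modes can be sketched as below. This Python is illustrative only: the `report` function, the category naming and the error mark-up are all hypothetical, and keeping the tracking categories populated in strict mode is an assumption the essay does not state.

```python
def report(problems, mode="transition"):
    """Return (visible_errors, tracking_categories) for a list of problems."""
    if mode not in ("transition", "strict"):
        raise ValueError(f"unknown mode: {mode}")
    # Hypothetical category names; both modes populate them in this sketch.
    categories = [f"Category:Geonames tracking ({p})" for p in problems]
    if mode == "strict":
        visible = [f'<span class="error">{p}</span>' for p in problems]
    else:
        visible = []  # transition: format what was given, show no red errors
    return visible, categories


print(report(["unrecognized language"], mode="transition"))
print(report(["unrecognized language"], mode="strict"))
```

Switching a geo-infobox from one mode to the other would then be a one-parameter change, which fits the staged rollout described above.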

Standardizing parameter names

In the long run, it will be easier for editors if all geo-infoboxes use the same names for the same parameters.

  • The geo-infobox passes {{infobox geonames}} parameters with the standard names, but also passes the old parameter names:
    |other_name={{{other_name|{{{name_other|}}} }}}
  • The documentation is changed to show both parameter names:
    |other_name=      <!-- or |name_other = -->
  • At some point, the old name is deprecated, with articles that use it put into maintenance categories
  • Gnomes work through changing to the standard parameter names
  • Eventually the old parameter names are dropped, and flagged as errors when the article is in edit mode
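The fallback and deprecation steps above can be sketched as follows. This is a Python illustration under stated assumptions: `normalize` and the `ALIASES` table are hypothetical, and the only alias pair taken from the essay is other_name / name_other.

```python
# Standard name -> older variants still accepted (example pair from the essay).
ALIASES = {"other_name": ["name_other"]}


def normalize(params, aliases=ALIASES):
    """Return (params under standard names, deprecated names that were used)."""
    result = dict(params)
    deprecated = []
    for standard, olds in aliases.items():
        for old in olds:
            if old in result:
                deprecated.append(old)
                value = result.pop(old)
                # Like {{{other_name|{{{name_other|}}} }}}: the standard name wins.
                result.setdefault(standard, value)
    return result, deprecated


print(normalize({"name": "Example", "name_other": "Old example"}))
```

The second return value is what a maintenance category would be driven by: any article for which it is non-empty still uses an old parameter name.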

Providing support for the standard parameter names is important. Removing variant usage is less important, and should not be allowed to get in the way of the main thrust to standardize name validation and formatting.

Appendices

Sample infobox templates

See Category:Place infobox templates for the complete set.

Type | Template | Example | Count[a] | Parameters
Divisions
Continent | {{Infobox continent}} | Africa | 56 | title
Island | {{Infobox islands}} | Borneo | 8,317 | name, native_name (or local_name), native_name_link[b], native_name_lang, sobriquet (or nickname), etymology
Country | {{Infobox country}} | Albania | 5,769 | name, conventional_long_name, common_name, native_name, linking_name
Settlement | {{Infobox settlement}} | Brussels | 543,470 | name, official_name, other_name, native_name, native_name_lang, etymology, nickname
Structures
Airport | {{Infobox airport}} | Frankfurt Airport | 15,543 | name, nativename, nativename-a (non-western characters), nativename-r (Romanized)
Amusement park | {{Infobox amusement park}} | Epcot | 1,027 | name, previous_names
Ancient site | {{Infobox ancient site}} | Nineveh | 4,653 | name, native_name, native_name_lang, alternate_name
Bridge | {{Infobox bridge}} | Band-e Kaisar | 5,684 | name, native_name, native_name_lang, official_name, other_name, named_for
Building | {{Infobox building}} | Palace of Versailles | 24,502 | name, native_name, native_name_lang, former_names, alternate_names, etymology
Cemetery | {{Infobox cemetery}} | Glasnevin Cemetery | 1,416 | name, native_name, native_name_lang
Church | {{Infobox church}} | Durham Cathedral | 13,394 | name, fullname, other name, native_name, native_name_lang, former name
Dam | {{Infobox dam}} | Red Bluff Diversion Dam | 4,159 | name, name_official
Dzong | {{Infobox Tibetan Buddhist monastery}} | Potala Palace | 286 | name + language specifics[c]
Hindu temple | {{Infobox Hindu temple}} | Meenakshi Temple, Madurai | 2,274 | name, native_name, native_name_lang
Historic site | {{Infobox historic site}} | Diocletian's Palace | 10,063 | name, native_name, native_language, native_name2, native_language2, native_name3, native_language3, other_name, etymology
Power station | {{Infobox power station}} | Ekibastuz GRES-2 Power Station | 2,852 | name, name_official
Natural geography
Mountain | {{Infobox mountain}} | Central Eastern Alps | 26,448 | name, other_name, etymology, nickname, native_name, native_name_lang, translation, pronunciation, authority
Body of water | {{Infobox body of water}} | Lake Sevan | 17,050 | name, native_name, other_name
River | {{Infobox river}} | Nile | 28,870 | name, native_name, name_other, name_etymology, nickname
Canal | {{Infobox canal}} | Royal Canal | 584 | name
Glacier | {{Infobox glacier}} | Vatnajökull | 1,622 | name, other_name
Landform | {{Infobox landform}} | Pongo de Manseriche | 1,147 | name, other_name
Mountain pass | {{Infobox mountain pass}} | Khunjerab Pass | 1,303 | name, other_name
Stratigraphic unit | {{Infobox rockunit}} | Burgess Shale | 6,326 | name
Valley | {{Infobox valley}} | Alay Valley | 737 | name, other_name, native_name, translation
Waterfall | {{Infobox waterfall}} | Angel Falls | 1,345 | name
Ecology, parks etc.
Ecoregion | {{Infobox ecoregion}} | Alto Paraná Atlantic forests | 919 | name
Park | {{Infobox park}} | Park Güell | 6,693 | name, alt_name, native_name, native_name_lang
Protected area | {{Infobox protected area}} | Gran Paradiso National Park | 13,312 | name, alt_name
Site of Special Scientific Interest | {{Infobox Site of Special Scientific Interest}} | Lundy | 2,052 | name
Trail | {{Infobox hiking trail}} | The Ridgeway | 1,164 | name
World Heritage Site | {{Infobox UNESCO World Heritage Site}} | Park Güell | 1,587 | WHS, Official_name
Zoo | {{Infobox zoo}} | Baghdad Zoo | 1,229 | name

Miscellaneous not reviewed:

Not checked:

Current usage examples

The examples below are taken from articles as of February 2022, with the infoboxes edited to remove information other than names, and to show a standard image. They illustrate the varied visual styles and approaches to presenting names, partly imposed by the infobox templates, and partly chosen by the editors.

Island

Borneo
Kalimantan

Borneo (/ˈbɔːrnioʊ/; Indonesian: Kalimantan) is the third-largest island in the world and the largest in Asia. At the geographic centre of Maritime Southeast Asia, in relation to major Indonesian islands, it is located north of Java, west of Sulawesi, and east of Sumatra.

Country

Republic of Albania
Republika e Shqipërisë (Albanian)
Location of Albania

Albania (/ælˈbeɪniə, ɔːl-/ a(w)l-BAY-nee-ə; Albanian: Shqipëri or Shqipëria), officially the Republic of Albania (Albanian: Republika e Shqipërisë), is a country in Southeastern Europe. It is located on the Adriatic and Ionian Sea within the Mediterranean Sea and shares land borders with Montenegro to the northwest, Kosovo to the northeast, North Macedonia to the east and Greece to the south. Tirana is its capital and largest city, followed by Durrës, Vlorë and Shkodër.

Settlement

Brussels
  • Brussels-Capital Region
  • Région de Bruxelles-Capitale (French)
  • Brussels Hoofdstedelijk Gewest (Dutch)
Nicknames: 
Capital of Europe, Comic City

Brussels (French: Bruxelles [bʁysɛl] or [bʁyksɛl]; Dutch: Brussel [ˈbrʏsəl]), officially the Brussels-Capital Region (French: Région de Bruxelles-Capitale; Dutch: Brussels Hoofdstedelijk Gewest), is a region of Belgium comprising 19 municipalities, including the City of Brussels, which is the capital of Belgium. The Brussels-Capital Region is located in the central portion of the country and is a part of both the French Community of Belgium and the Flemish Community, but is separate from the Flemish Region (within which it forms an enclave) and the Walloon Region. Brussels is the most densely populated and the richest region in Belgium in terms of GDP per capita. The five times larger metropolitan area of Brussels comprises over 2.5 million people, which makes it the largest in Belgium. It is also part of a large conurbation extending towards Ghent, Antwerp, Leuven and Walloon Brabant, home to over 5 million people.

Airport

Frankfurt Airport

Flughafen Frankfurt Main
Summary

Frankfurt Airport (IATA: FRA, ICAO: EDDF; German: Flughafen Frankfurt Main [ˈfluːkhaːfn̩ ˈfʁaŋkfʊʁt ˈmaɪn], also known as Rhein-Main-Flughafen), is a major international airport located in Frankfurt, the fifth-largest city of Germany and one of the world's leading financial centres. It is operated by Fraport and serves as the main hub for Lufthansa, including Lufthansa CityLine and Lufthansa Cargo as well as Condor and AeroLogic. The airport covers an area of 2,300 hectares (5,683 acres) of land and features two passenger terminals with capacity for approximately 65 million passengers per year; four runways; and extensive logistics and maintenance facilities.

Ancient site

Nineveh
نَيْنَوَىٰ

Nineveh (/ˈnɪnɪvə/; Arabic: نَيْنَوَىٰ Naynawā; Syriac: ܢܝܼܢܘܹܐ, romanized: Nīnwē; Akkadian: 𒌷𒉌𒉡𒀀 URUNI.NU.A Ninua) was an ancient Assyrian city of Upper Mesopotamia, located on the outskirts of Mosul in modern-day northern Iraq. It is located on the eastern bank of the Tigris River and was the capital and largest city of the Neo-Assyrian Empire, as well as the largest city in the world for several decades. Today, it is a common name for the half of Mosul that lies on the eastern bank of the Tigris, and the country's Nineveh Governorate takes its name from it.

Bridge

Band-e Kaisar

بند قیصر,
Other name(s): Pol-e Kaisar, Bridge of Valerian, Shadirwan

The Band-e Kaisar (Persian: بند قیصر, "Caesar's dam"), Pol-e Kaisar ("Caesar's bridge"), Bridge of Valerian or Shadirwan was an ancient arch bridge in Shushtar, Iran, and the first in the country to combine it with a dam. Built by the Sassanids, using Roman prisoners of war as workforce, in the 3rd century AD on Sassanid order, it was also the most eastern example of Roman bridge design and Roman dam, lying deep in Persian territory. Its dual-purpose design exerted a profound influence on Iranian civil engineering and was instrumental in developing Sassanid water management techniques.

Building

Palace of Versailles
Château de Versailles (French)

The Palace of Versailles (/vɛərˈsaɪ, vɜːrˈsaɪ/ vair-SY, vur-SY; French: Château de Versailles [ʃɑto d(ə) vɛʁsɑj]) is a former royal residence located in Versailles, about 12 miles (19 km) west of Paris, France. The palace is owned by the French Republic and has since 1995 been managed, under the direction of the French Ministry of Culture, by the Public Establishment of the Palace, Museum and National Estate of Versailles. 15,000,000 people visit the Palace, Park, or Gardens of Versailles every year, making it one of the most popular tourist attractions in the world. However, due to the COVID-19 pandemic, the number of paying visitors to the Chateau dropped by 75 percent from eight million in 2019 to two million in 2020. The drop was particularly sharp among foreign visitors, who account for eighty percent of paying visitors.

Historic site

Historical Complex of Split with the Palace of Diocletian
Native name
Croatian: Povijesna jezgra grada Splita s Dioklecijanovom palačom

Diocletian's Palace (Croatian: Dioklecijanova palača, pronounced [diɔklɛt͡sijǎːnɔʋa pǎlat͡ʃa]) is an ancient palace built for the Roman emperor Diocletian at the turn of the fourth century AD, which today forms about half the old town of Split, Croatia. While it is referred to as a "palace" because of its intended use as the retirement residence of Diocletian, the term can be misleading as the structure is massive and more resembles a large fortress: about half of it was for Diocletian's personal use, and the rest housed the military garrison.

Mountain

Central Eastern Alps

The Central Eastern Alps (German: Zentralalpen or Zentrale Ostalpen), also referred to as Austrian Central Alps (German: Österreichische Zentralalpen) or just Central Alps, comprise the main chain of the Eastern Alps in Austria and the adjacent regions of Switzerland, Liechtenstein, Italy and Slovenia. South of them are the Southern Limestone Alps.

Body of water

Lake Sevan
Սևանա լիճ (Armenian)

Lake Sevan (Armenian: Սևանա լիճ, romanized: Sevana lich) is the largest body of water in both Armenia and the Caucasus region. It is one of the largest freshwater high-altitude (alpine) lakes in Eurasia. The lake is situated in Gegharkunik Province, at an altitude of 1,900.44 m (6,235 ft) above sea level. The total surface area of its basin is about 5,000 km² (1,900 sq mi), which makes up 1/6 of Armenia's territory. The lake itself is 1,264 km² (488 sq mi), and the volume is 32.8 km³ (7.9 cu mi). It is fed by 28 rivers and streams. Only 10% of the incoming water is drained by the Hrazdan River, while the remaining 90% evaporates.

River

Nile

The Nile is a major north-flowing river in northeastern Africa. It flows into the Mediterranean Sea. The longest river in Africa, it has historically been considered the longest river in the world, though this has been contested by research suggesting that the Amazon River is slightly longer. The Nile is amongst the smallest of the major world rivers by measure of cubic metres flowing annually. About 6,650 km (4,130 mi) long, its drainage basin covers eleven countries: Tanzania, Uganda, Rwanda, Burundi, the Democratic Republic of the Congo, Kenya, Ethiopia, Eritrea, South Sudan, Republic of the Sudan, and Egypt. In particular, the Nile is the primary water source of Egypt, Sudan and South Sudan. Additionally, the Nile is an important economic river, supporting agriculture and fishing.

Valley

Alay Valley
Naming
Native name: Алай өрөөнү (Kyrgyz)

The Alay Valley (Kyrgyz: Алай өрөөнү, Kyrgyz pronunciation: [ɑlɑj ørø:ny]) is a broad, dry valley running east–west across most of southern Osh Region, Kyrgyzstan. It spreads over a length of 174 km east–west. The valley extends in north–south direction with varying width of 27 km in the west, 40 km - in the central part, and 3–7 km - in the east. The altitude of the valley ranges from 2,440 m near Karamyk to 3536 m at Toomurun Pass with an average altitude of about 3000 m. The area of the valley is 8400 km². The north side is the Alay Mountains which slope down to the Ferghana Valley. The south side is the Trans-Alay Range along the Tajikistan border, with Lenin Peak, (7134 m). The western 40 km or so is more hills than valley. On the east there is the low Tongmurun pass and then more valley leading to the Irkestam border crossing to China.

Notes

  1. ^ Transclusion count as of February 2022
  2. ^ link to the article about the language used for the native name
  3. ^ Infobox Tibetan Buddhist monastery collects the following parameters for native name: |t=ཇོ་ཁང་ |w=Jo-khang |to = {{{to}}} |ipa={{IPA|{{{ipa}}}}} |z={{{z}}} |thdl=thdl |e={{{e}}} |tc=大昭寺 |s={{{s}}} |p=Dàzhāosì