Line 16:
| {{official website|1=https://www.unicode.org/main.html|name=Technical website}}}}
}}
{{Contains special characters|special=uncommon Unicode characters}}
Unicode has largely supplanted the previous environment of myriad incompatible [[character sets]] used within different locales and on different computer architectures. The entire repertoires of these sets, plus many additional characters, were merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most [[web pages]], and relevant Unicode support has become a common consideration in contemporary software development. Unicode is ultimately capable of encoding more than 1.1 million characters.
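The "more than 1.1 million" figure follows from the size of the code space. A minimal arithmetic sketch, assuming the standard code point range U+0000 to U+10FFFF and the surrogate range U+D800 to U+DFFF reserved for UTF-16 (the function names here are illustrative only):

```python
import sys

# Assumed ranges: code points U+0000..U+10FFFF, surrogates U+D800..U+DFFF.
FIRST_CODE_POINT = 0x0000
LAST_CODE_POINT = 0x10FFFF
FIRST_SURROGATE = 0xD800
LAST_SURROGATE = 0xDFFF


def code_space_size():
    """Number of code points in the whole Unicode code space."""
    return LAST_CODE_POINT - FIRST_CODE_POINT + 1


def surrogate_count():
    """Code points set aside for UTF-16 surrogates; they never encode characters."""
    return LAST_SURROGATE - FIRST_SURROGATE + 1


def scalar_value_count():
    """Code points that can, in principle, be assigned to characters."""
    return code_space_size() - surrogate_count()


if __name__ == "__main__":
    print(code_space_size())      # 1114112
    print(surrogate_count())      # 2048
    print(scalar_value_count())   # 1112064
    # Python's own string type uses the same upper bound.
    print(sys.maxunicode == LAST_CODE_POINT)
```

The result is an upper bound on encodable characters, not the number currently assigned; some of these code points are also permanently reserved as noncharacters.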
Line 74 ⟶ 75:
[[File:Unicode sample.png|class=skin-invert-image|thumb|right|200px|Many modern applications can render a substantial subset of the many [[scripts in Unicode]], as demonstrated by this screenshot from the [[OpenOffice.org]] application.]]<!-- screenshot fair use rationale: this screenshot is used specifically to illustrate the Unicode-related capabilities of modern desktop applications and the breadth of supported Unicode scripts -->
{{As of|September 2024}}, a total of 168<ref>{{Cite web |title=Supported Scripts |url=https://www.unicode.org/standard/supported.html |access-date=16 September 2022 |website=Unicode}}</ref>
=== Proposals for adding scripts ===
The Unicode Roadmap Committee ([[Michael Everson]], Rick McGowan, Ken Whistler, V.S. Umamaheswaran)<ref>{{Cite web |title=Roadmap to the BMP |url=https://www.unicode.org/roadmaps/bmp/ |access-date=30 July 2018 |publisher=[[Unicode Consortium]]}}</ref> maintain the list of scripts that are candidates or potential candidates for encoding and their tentative code block assignments on the Unicode Roadmap<ref>{{Cite web|url=https://www.unicode.org/roadmaps/|title=Roadmaps to Unicode|website=Unicode |url-status=live |archive-url= https://web.archive.org/web/20231208091250/http://www.unicode.org/roadmaps/ |archive-date= Dec 8, 2023 }}</ref> page of the [[Unicode Consortium]] website. For some scripts on the Roadmap, such as [[Jurchen script|Jurchen]] and [[Khitan large script]], encoding proposals have been made and they are working their way through the approval process. For other scripts, such as [[Numidian language|Numidian]] and [[Rongorongo]], no proposal has yet been made, and they await agreement on character repertoire and other details from the user communities involved.
Line 84 ⟶ 85:
There is also a [[Medieval Unicode Font Initiative]] focused on special Latin medieval characters. Some of these proposals have already been included in Unicode.
The Script Encoding Initiative (SEI),<ref>{{Cite web|url=https://sei.berkeley.edu/ |title=Script Encoding Initiative |website=Script Encoding Initiative |url-status=live |archive-url=https://web.archive.org/web/20230325131114/https://linguistics.berkeley.edu/sei/ |archive-date= Mar 25, 2023 }}</ref> a project created by Deborah Anderson at the [[University of California, Berkeley]], was founded in 2002 with the goal of funding proposals for scripts not yet encoded in the standard. Now run by Anushah Hossain, SEI has become a major source of proposed additions to the standard in recent years.<ref>{{Cite web |title=About The Script Encoding Initiative |url=https://www.unicode.org/pending/about-sei.html |access-date=4 June 2012 |publisher=The Unicode Consortium}}</ref> Although SEI collaborates with the Unicode Consortium and the ISO/IEC 10646 standards process, it operates independently, supporting the technical, linguistic, and historical research needed to prepare formal proposals. SEI maintains a database of scripts that have yet to be encoded in the Unicode Standard on the project's website.<ref>{{Cite web |title=Scripts to Encode |url=https://sei.berkeley.edu/scripts-to-encode/ }}</ref>
Line 845 ⟶ 846:
=== Security<span class="anchor" id="Security issues"></span> ===
Unicode has a large number of [[homoglyphs]], many of which look very similar or identical to ASCII letters. Substituting these can produce an identifier or URL that looks correct but directs to a different location than expected.<ref>{{Cite web |title=UTR #36: Unicode Security Considerations |url=https://unicode.org/reports/tr36/ |website=Unicode}}</ref> Homoglyphs can also be used to manipulate the output of [[NLP (computer science)|natural language processing (NLP)]] systems.<ref>{{Cite book |last1=Boucher |first1=Nicholas |last2=Shumailov |first2=Ilia |last3=Anderson |first3=Ross |last4=Papernot |first4=Nicolas |title=2022 IEEE Symposium on Security and Privacy (SP) |chapter=Bad Characters: Imperceptible NLP Attacks |year=2022
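As a rough illustration of the homoglyph problem, the following sketch flags strings that mix letters from more than one script. It is a heuristic only: Python's standard library does not expose the Unicode Script property, so the first word of each character name is used as a stand-in, and the helper names are hypothetical. Production systems would more likely follow the mixed-script rules in UTS #39.

```python
import unicodedata


def approximate_scripts(text):
    """Return a set of rough script labels for the letters in text.

    Assumption: the first word of a character's Unicode name
    (e.g. "LATIN", "CYRILLIC") approximates its script.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def looks_mixed_script(text):
    """True if the letters in text appear to come from more than one script."""
    return len(approximate_scripts(text)) > 1


if __name__ == "__main__":
    genuine = "paypal"
    # Same appearance in many fonts, but U+0430 CYRILLIC SMALL LETTER A
    # replaces both Latin "a" letters.
    spoofed = "p\u0430yp\u0430l"
    print(genuine == spoofed)           # False
    print(looks_mixed_script(genuine))  # False
    print(looks_mixed_script(spoofed))  # True
```

A check like this catches only mixed-script spoofs; a string written entirely in look-alike characters from a single script would pass it.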
A security advisory was released in 2021 by two researchers, one from the [[University of Cambridge]] and the other from the [[University of Edinburgh]], in which they assert that the [[Bidirectional text#Explicit formatting|BiDi marks]] can be used to make large sections of code do something different from what they appear to do. The problem was named "[[Trojan Source]]".<ref>{{Cite web |first1=Nicholas |last1=Boucher |first2=Ross |last2=Anderson |title=Trojan Source: Invisible Vulnerabilities |url=https://www.trojansource.codes/trojan-source.pdf |access-date=2 November 2021}}</ref> In response, code editors started highlighting marks to indicate forced text-direction changes.<ref>{{Cite web |title=Visual Studio Code October 2021 |url=https://code.visualstudio.com/updates/v1_62#_unicode-directional-formatting-characters |access-date=11 November 2021 |website=code.visualstudio.com |language=en}}</ref>
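The kind of check an editor or linter might perform can be sketched as a scan for explicit directional formatting characters. This is a minimal sketch, not the method used by any particular editor; the function name and the choice to report every occurrence are assumptions, and the character list covers the embedding, override, and isolate controls (U+202A to U+202E and U+2066 to U+2069).

```python
import unicodedata

# Explicit directional formatting characters:
# LRE, RLE, PDF, LRO, RLO (U+202A..U+202E) and LRI, RLI, FSI, PDI (U+2066..U+2069).
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control in source.

    Line and column numbers are 1-based; columns count code points.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    # Hypothetical snippet with an override hidden inside a string literal.
    sample = 'ok = 1\nlevel = "user\u202e"\n'
    for lineno, col, name in find_bidi_controls(sample):
        print(f"line {lineno}, column {col}: {name}")
```

Legitimate right-to-left text can contain these characters, so a real tool would probably highlight them for review instead of rejecting the file outright.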