<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://zakuarbor.codeberg.page/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://zakuarbor.codeberg.page/blog/" rel="alternate" type="text/html" /><updated>2025-08-09T02:40:54-04:00</updated><id>https://zakuarbor.codeberg.page/blog/feed.xml</id><title type="html">zakuarbor</title><subtitle>A random blog by a random human</subtitle><author><name>Ju Hong Kim</name></author><entry><title type="html">The Issue With Default in Switch Statements with Enums</title><link href="https://zakuarbor.codeberg.page/blog/switch-default-enum/" rel="alternate" type="text/html" title="The Issue With Default in Switch Statements with Enums" /><published>2025-07-05T00:00:00-04:00</published><updated>2025-07-05T00:00:00-04:00</updated><id>https://zakuarbor.codeberg.page/blog/switch-default-enum</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/switch-default-enum/"><![CDATA[<p>Reading the coding standards at a company I recently joined revealed to me the issue with the <code class="language-plaintext highlighter-rouge">default</code> label in a switch statement and why it’s prohibited when the switch is being 
used to enumerate through an <code class="language-plaintext highlighter-rouge">enum</code>. The <code class="language-plaintext highlighter-rouge">default</code> label is a convenient way to handle edge cases and is often used to handle errors. However, when working with enums, 
the programmer usually intends to handle every possible value in the enum, and it is easy to miss one. To catch this mishap, programmers enable <code class="language-plaintext highlighter-rouge">-Wswitch</code> or <code class="language-plaintext highlighter-rouge">-Werror=switch</code> in their compiler.
For instance, let’s suppose I have an enum named <strong>Suit</strong> to represent the different suits in a deck of cards.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="n">Suit</span> <span class="p">{</span>
  <span class="n">Diamonds</span><span class="p">,</span>
  <span class="n">Hearts</span><span class="p">,</span>
  <span class="n">Clubs</span><span class="p">,</span>
  <span class="n">Spades</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Let’s suppose I forget to enumerate through <code class="language-plaintext highlighter-rouge">Spades</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">switch</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">case</span> <span class="n">Diamonds</span><span class="p">:</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Diamonds</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">break</span><span class="p">;</span>
  <span class="k">case</span> <span class="n">Hearts</span><span class="p">:</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Hearts</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">break</span><span class="p">;</span>
  <span class="k">case</span> <span class="n">Clubs</span><span class="p">:</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"Clubs</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
    <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Then I’ll get the following warning:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LC_MESSAGES=C gcc -Wswitch /tmp/test.c
/tmp/test.c: In function ‘main’:
/tmp/test.c:12:3: warning: enumeration value ‘Spades’ not handled in switch [-Wswitch]
   12 |   switch(suit) {
      |   ^~~~~~
</code></pre></div></div>

<p><strong>Note:</strong> <code class="language-plaintext highlighter-rouge">LC_MESSAGES=C</code> just instructs GCC to print its messages in the default C locale (English) since my system is in French.</p>

<p>Based on GCC Documentation on <a href="https://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Warning-Options.html">Warning Options</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-Wswitch
  Warn whenever a switch statement has an index of enumerated type and lacks a 
  case for one or more of the named codes of that enumeration. 
  (The presence of a default label prevents this warning.) 
  case labels outside the enumeration range also provoke warnings when this 
  option is used. This warning is enabled by -Wall. 
</code></pre></div></div>

<p>Based on the documentation, we should no longer see the warning if we add a <code class="language-plaintext highlighter-rouge">default</code> label:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">switch</span><span class="p">(</span><span class="n">suit</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="n">Diamonds</span><span class="p">:</span>
      <span class="n">printf</span><span class="p">(</span><span class="s">"Diamonds</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
      <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="n">Hearts</span><span class="p">:</span>
      <span class="n">printf</span><span class="p">(</span><span class="s">"Hearts</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
      <span class="k">break</span><span class="p">;</span>
    <span class="k">case</span> <span class="n">Clubs</span><span class="p">:</span>
      <span class="n">printf</span><span class="p">(</span><span class="s">"Clubs</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
      <span class="k">break</span><span class="p">;</span>
    <span class="nl">default:</span>
      <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And as expected, we see no warnings:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LC_MESSAGES=C gcc  -Wswitch /tmp/test.c
$
</code></pre></div></div>

<p>However, I noticed a similar warning option in the documentation which catches the missing case even when a <code class="language-plaintext highlighter-rouge">default</code> label is present:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-Wswitch-enum
    Warn whenever a switch statement has an index of enumerated type and lacks 
    a case for one or more of the named codes of that enumeration. case labels 
    outside the enumeration range also provoke warnings when this option is used. 
</code></pre></div></div>

<p>So regardless of whether we have a <code class="language-plaintext highlighter-rouge">default</code> label, the warning appears:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LC_MESSAGES=C gcc  -Wswitch-enum /tmp/test.c
/tmp/test.c: In function ‘main’:
/tmp/test.c:12:3: warning: enumeration value ‘Spades’ not handled in switch [-Wswitch-enum]
   12 |   switch(suit) {
      |   ^~~~~~
</code></pre></div></div>
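
<p>One way to keep the compiler check and still deal with unexpected values is to leave <code class="language-plaintext highlighter-rouge">default</code> out entirely and handle out-of-range values after the switch. The sketch below is only an illustration: the helper name <code class="language-plaintext highlighter-rouge">suit_name</code> and the "Unknown" fallback string are assumptions of mine and not part of the coding standard.</p>

```c
enum Suit {
  Diamonds,
  Hearts,
  Clubs,
  Spades
};

/* Hypothetical helper: maps a Suit to a printable name.
 * There is deliberately no default label, so -Wswitch (and -Wall)
 * can still warn if a case is removed or a new enumerator is added. */
const char *suit_name(enum Suit suit) {
  switch (suit) {
    case Diamonds:
      return "Diamonds";
    case Hearts:
      return "Hearts";
    case Clubs:
      return "Clubs";
    case Spades:
      return "Spades";
  }
  /* Reached only when suit holds a value outside the enum,
   * e.g. after a bad cast. "Unknown" is an arbitrary fallback. */
  return "Unknown";
}
```

<p>Since every enumerator has its own case, deleting one of them brings the <code class="language-plaintext highlighter-rouge">-Wswitch</code> warning back, while the <code class="language-plaintext highlighter-rouge">return</code> after the switch covers the error handling that <code class="language-plaintext highlighter-rouge">default</code> is often used for.</p>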

<p>On a side note, <code class="language-plaintext highlighter-rouge">-Wall</code> will not catch this misbehavior if a <code class="language-plaintext highlighter-rouge">default</code> is present:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LC_MESSAGES=C gcc  -Wall /tmp/test.c
$ 
</code></pre></div></div>

<p>This is because <code class="language-plaintext highlighter-rouge">-Wall</code> enables most warnings but not all warnings. Based on the documentation, we see that <code class="language-plaintext highlighter-rouge">-Wall</code> enables <code class="language-plaintext highlighter-rouge">-Wswitch</code> instead of <code class="language-plaintext highlighter-rouge">-Wswitch-enum</code>.</p>]]></content><author><name>Ju Hong Kim</name></author><category term="programming" /><category term="c/c++" /><summary type="html"><![CDATA[Default suppresses warnings that may not be desirable]]></summary></entry><entry><title type="html">MicroBlog 2024 Edition</title><link href="https://zakuarbor.codeberg.page/blog/2025-edition/" rel="alternate" type="text/html" title="MicroBlog 2024 Edition" /><published>2025-02-23T00:00:00-05:00</published><updated>2025-02-23T00:00:00-05:00</updated><id>https://zakuarbor.codeberg.page/blog/2025-edition</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/2025-edition/"><![CDATA[<p>In 2023, I was fascinated to learn about a revival of the old internet, where chaos and nostalgia ensue, on neocities, an attempt to recreate the community 
that geocities provided in the past. However, I never progressed much beyond creating an initial introduction page. Almost a year later, in August 2024, I decided to 
take advantage of the free time I had from taking a break from school 
to start posting shorter content as a way to write something quick and potentially more personal, as this blog site has become 
more of a technical blog than a personal one. That doesn’t mean I won’t post random odd blog posts here that aren’t technical, nor do I guarantee any professionalism (it is 
the internet, and a blog site that I maintain without any sponsorship or monetary earnings). Still, any potential chaos or out-of-context content I may post in the future will be 
contained in my neocities site while it lasts, which should spare this blog site from any oddities for the time being.</p>

<p>While my microblog isn’t exactly short, it is definitely shorter than my typical technical blog posts. You will probably see some parallels between my microblog and the blogs I post 
here. This is because some of the technical micro posts are quick scratch notes that give an overview of a topic I wish to write about on this blog site.</p>

<p>You can visit my microblog if you are interested: <a href="https://randombits.neocities.org/">Random Bits</a></p>

<h1>Complete List</h1>
<ul>
        <li><a href="#framework">[2024-12-29] New Laptop: Framework 16</a></li>
        <li><a href="#alias-interactive">[2024-12-29] Utilizing Aliases and Interactive Mode to Force Users to Think Twice Before Deleting Files</a></li>
        <li><a href="#small-stack">[2024-12-20] Stack Overflow: The Case of a Small Stack</a></li>
        <li><a href="#jekyll-cache">[2024-12-17] Jekyll Cache Saving the Day</a></li>
        <li><a href="#qnx-community">[2024-11-09] QNX is 'Free' to Use</a></li>
        <li><a href="#email-gpg-signature">[2024-10-08] [Preview] Manually Verifying an Email Signature</a></li>
        <li><a href="#halfwidth-fullwidth">[2024-10-06] [Preview] Half-Width and Full-Width Characters</a></li>
        <li><a href="#scientific-notation">[2024-09-18] Mixing Number and String</a></li>
        <li><a href="#dot-dns">[2024-08-30] `.` At The End of a URL</a></li>
        <li><a href="#split-pdf-even-odd">[2024-08-28] Splitting Pdfs into Even and Odd Pages</a></li>
        <li><a href="#exec-script-loophole">[2024-08-28] Executing Script Loophole</a></li>
        <li><a href="#replace-main">[2024-08-24] Replacing main()</a></li>
        <li><a href="#edit-gifs">[2024-08-18] Editing GIFS and Creating 88x31 Buttons</a></li>
        <li><a href="#multiple-def">[2024-08-10] multiple definition of `variable` ... first defined here</a></li>
        <li><a href="#framework-power">[2024-08-04] Delusional Dream of a OpenPower Framework Laptop</a></li>
        <li><a href="#2024-update">[2024-08-04] 2024 Update</a></li>
</ul>

<hr class="bits-hr" />

<p><a name="framework"></a></p>
<div class="bits">
<h1 class="title">New Laptop: Framework 16</h1>
<div>
<p class="date">December 29, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>

</div>
</div>
<p>Ever since <a href="https://www.youtube.com/watch?v=0rkTgPt3M4k">Linus Tech Tips (LTT)</a> introduced Framework, a repairable and modular laptop, back in 2021, I have wanted one for myself. I have loved the idea of modular electronics 
ever since <a href="https://www.onearmy.earth/project/phonebloks">PhoneBloks</a> introduced their idea of modular phones. Modular electronics are usually highly repairable because one can easily swap a faulty 
component for a new one instead of going to a repair shop or dumping the device into the garbage. Bringing the desktop experience of upgrading parts such as the CPU, RAM, and storage to the laptop 
was very appealing. Electronics of the past were much easier to repair and upgrade, but these days laptops are designed in ways that make upgrades hard, such as soldering the RAM. 
Laptops are also less repairable than they once were because more components are integrated into the SoC, which allows manufacturers to design a significantly more 
compact and sleeker device. An SoC offers more than compactness: it can also help with power efficiency and speed, as it can be optimized for fast access to both the 
CPU and memory. While there could be engineering reasons for soldered RAM, it likely also serves to encourage consumers to purchase a new laptop instead.</p>

<p><img src="../assets/micro/products/framework-parts.png" alt="An image of the Framework laptop" /></p>

<p class="caption">A Framework laptop and its various parts. Source: <a href="https://frame.work/ca/en">Framework</a></p>

<p>The Framework laptop is great, but every criticism you have heard about it holds true. Cost is the biggest issue. As Framework is a small company, it cannot build at scale like the other OEMs, so you will be paying an extremely hefty price to obtain a modular laptop. You could get a laptop 
from other OEMs with better specs for way less than what Framework offers. The laptop is not suitable for regular consumers and is way more expensive than a luxury laptop (aka MacBooks). There are other issues with the Framework 
laptop, but I consider these to be the cost of modularity rather than the cost of Framework. As I mentioned earlier, there are tradeoffs between modularity and integrating everything into 
an SoC. When you get a Framework laptop, you are buying it for its modularity and repairability. For instance, on a Framework 16 you can see the 
outlines of the various sliders around the keyboard and touchpad. In addition, you can clearly see the outlines of each expansion card on the laptop.</p>

<p>On a very positive note, you can swap the expansion cards to fit your needs. For those who care about looks, you can easily swap the color of the screen bezel and change the panels surrounding 
the keyboard, such as adding a numpad, swapping the keyboard for an RGB keyboard, or getting an LED matrix panel. The flexibility to change the expansion cards was the biggest appeal 
of the laptop for me, as you get to choose which IO ports will be HDMI, USB-A, or USB-C (with some restrictions).</p>

<p>I should keep this brief as this is a microblog … Anyhow, now that I have access to my first dedicated GPU, I can play video games that aren’t Minesweeper, Solitaire, 
StarCraft (Brood War), or PC ports of old games like Final Fantasy 7. Ever since players were forced to move to Counter-Strike 2, I was no longer able to play Counter-Strike on my old Lenovo Gen 7 X1 Carbon 
laptop. I was surprised by how noisy the laptop can be when playing Counter-Strike 2, though that is likely due to my inexperience with video games that require a dedicated GPU (and 
I am playing on a laptop, which is probably not the best idea if you want to play video games). 
Here are the specs:</p>

<pre class="highlight" style="background-color: #1b1b1b; padding: .5rem; line-height: 1.25em">$ neofetch
<font color="#2A7BDE"><b>             .&apos;,;::::;,&apos;.</b></font>                <font color="#2A7BDE"><b>zaku</b></font>@<font color="#2A7BDE"><b>fedora</b></font> 
<font color="#2A7BDE"><b>         .&apos;;:cccccccccccc:;,.</b></font>            ----------- 
<font color="#2A7BDE"><b>      .;cccccccccccccccccccccc;.</b></font>         <font color="#2A7BDE"><b>OS</b></font>: Fedora Linux 40 (Workstation Edition) x86_64 
<font color="#2A7BDE"><b>    .:cccccccccccccccccccccccccc:.</b></font>       <font color="#2A7BDE"><b>Host</b></font>: Laptop 16 (AMD Ryzen 7040 Series) AJ 
<font color="#2A7BDE"><b>  .;ccccccccccccc;</b></font><b>.:dddl:.</b><font color="#2A7BDE"><b>;ccccccc;.</b></font>     <font color="#2A7BDE"><b>Kernel</b></font>: 6.11.4-201.fc40.x86_64 
<font color="#2A7BDE"><b> .:ccccccccccccc;</b></font><b>OWMKOOXMWd</b><font color="#2A7BDE"><b>;ccccccc:.</b></font>    <font color="#2A7BDE"><b>Uptime</b></font>: 5 hours, 46 mins 
<font color="#2A7BDE"><b>.:ccccccccccccc;</b></font><b>KMMc</b><font color="#2A7BDE"><b>;cc;</b></font><b>xMMc</b><font color="#2A7BDE"><b>:ccccccc:.</b></font>   <font color="#2A7BDE"><b>Packages</b></font>: 2254 (rpm), 12 (flatpak) 
<font color="#2A7BDE"><b>,cccccccccccccc;</b></font><b>MMM.</b><font color="#2A7BDE"><b>;cc;</b></font><b>;WW:</b><font color="#2A7BDE"><b>:cccccccc,</b></font>   <font color="#2A7BDE"><b>Shell</b></font>: bash 5.2.26 
<font color="#2A7BDE"><b>:cccccccccccccc;</b></font><b>MMM.</b><font color="#2A7BDE"><b>;cccccccccccccccc:</b></font>   <font color="#2A7BDE"><b>Resolution</b></font>: 1920x1080 
<font color="#2A7BDE"><b>:ccccccc;</b></font><b>oxOOOo</b><font color="#2A7BDE"><b>;</b></font><b>MMM0OOk.</b><font color="#2A7BDE"><b>;cccccccccccc:</b></font>   <font color="#2A7BDE"><b>DE</b></font>: GNOME 46.6 
<font color="#2A7BDE"><b>cccccc:</b></font><b>0MMKxdd:</b><font color="#2A7BDE"><b>;</b></font><b>MMMkddc.</b><font color="#2A7BDE"><b>;cccccccccccc;</b></font>   <font color="#2A7BDE"><b>WM</b></font>: Mutter 
<font color="#2A7BDE"><b>ccccc:</b></font><b>XM0&apos;</b><font color="#2A7BDE"><b>;cccc;</b></font><b>MMM.</b><font color="#2A7BDE"><b>;cccccccccccccccc&apos;</b></font>   <font color="#2A7BDE"><b>WM Theme</b></font>: Adwaita 
<font color="#2A7BDE"><b>ccccc;</b></font><b>MMo</b><font color="#2A7BDE"><b>;ccccc;</b></font><b>MMW.</b><font color="#2A7BDE"><b>;ccccccccccccccc;</b></font>    <font color="#2A7BDE"><b>Theme</b></font>: Adwaita [GTK2/3] 
<font color="#2A7BDE"><b>ccccc;</b></font><b>0MNc.</b><font color="#2A7BDE"><b>ccc</b></font><b>.xMMd</b><font color="#2A7BDE"><b>:ccccccccccccccc;</b></font>     <font color="#2A7BDE"><b>Icons</b></font>: Adwaita [GTK2/3] 
<font color="#2A7BDE"><b>cccccc;</b></font><b>dNMWXXXWM0:</b><font color="#2A7BDE"><b>:cccccccccccccc:,</b></font>      <font color="#2A7BDE"><b>Terminal</b></font>: gnome-terminal 
<font color="#2A7BDE"><b>cccccccc;</b></font><b>.:odl:.</b><font color="#2A7BDE"><b>;cccccccccccccc:,.</b></font>       <font color="#2A7BDE"><b>CPU</b></font>: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics (16) @ 5.263GHz 
<font color="#2A7BDE"><b>:cccccccccccccccccccccccccccc:&apos;.</b></font>         <font color="#2A7BDE"><b>GPU</b></font>: AMD ATI c4:00.0 Phoenix1 
<font color="#2A7BDE"><b>.:cccccccccccccccccccccc:;,..</b></font>            <font color="#2A7BDE"><b>GPU</b></font>: AMD ATI Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600 
<font color="#2A7BDE"><b>  &apos;::cccccccccccccc::;,.</b></font>                 <font color="#2A7BDE"><b>Memory</b></font>: 7192MiB / 31386MiB 
</pre>

<p>On OpenBlender Benchmark:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>monster: 130.805407
junkshop: 85.742239
classroom: 64.374681
</code></pre></div></div>

<p>These scores are significantly better than what my X1 Carbon achieved (higher numbers are better).</p>

</div>

<hr class="bits-hr" />

<p><a name="alias-interactive"></a></p>
<div class="bits">
<h1 class="title">Utilizing Aliases and Interactive Mode to Force Users to Think Twice Before Deleting Files</h1>
<div>
<p class="date">December 29, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#linux">linux</a>
</div>
</div>
<p>I previously mentioned that <a href="./jekyll-cache">I lost my file</a> by accidentally overwriting it with the <code class="language-plaintext highlighter-rouge">cp</code> command. This got me thinking about why this would have been impossible on 
my work laptop, where I am constantly bombarded with a prompt to confirm that I intend to overwrite the file.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cp 2024-12-01-template.md 2024-12-30-alias-interactive.md
cp: overwrite '2024-12-30-alias-interactive.md'?
</code></pre></div></div>

<p>Commands like <code class="language-plaintext highlighter-rouge">mv</code> and <code class="language-plaintext highlighter-rouge">cp</code> have an <strong>interactive</strong> flag <code class="language-plaintext highlighter-rouge">-i</code> to prompt before overwriting a file, as seen in <code class="language-plaintext highlighter-rouge">man 1 cp</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-i, --interactive
              prompt before overwrite (overrides a previous -n option)
</code></pre></div></div>

<p>To force everyone at work to have this flag enabled, they made <code class="language-plaintext highlighter-rouge">cp</code> and <code class="language-plaintext highlighter-rouge">mv</code> an alias in our default shell configs:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">alias cp</span><span class="o">=</span><span class="s2">"cp -i"</span>
<span class="nb">alias mv</span><span class="o">=</span><span class="s2">"mv -i"</span>
</code></pre></div></div>

<p>You can verify this using the <code class="language-plaintext highlighter-rouge">type</code> command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ type cp
cp is aliased to `cp -i'
$ type mv
mv is aliased to `mv -i'
</code></pre></div></div>

</div>

<hr class="bits-hr" />

<p><a name="small-stack"></a></p>
<div class="bits">
<h1 class="title">Stack Overflow: The Case of a Small Stack</h1>
<div>
<p class="date">December 20, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#stack">stack</a>
&nbsp;

<a href="/categories/#qnx">qnx</a>
&nbsp;

<a href="/categories/#C/C++">C/C++</a>


</div>
</div>
<p>Years ago I was once asked by an intern to debug a mysterious crash that seemed so innocent. While I no longer recall what the code was about, we stripped the program to a single line in 
<code class="language-plaintext highlighter-rouge">main</code>. Yet the program still continued to crash.</p>

<p><strong>Source:</strong></p>

<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="p">];</span>
<span class="p">}</span></code></pre></figure>

<p><strong>Result:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ./prog-arm64 

Process 630803 (prog-arm64) terminated SIGSEGV code=1 fltno=11 ip=00000025333267f0 mapaddr=00000000000007f0 ref=000000443dd5dc50
Memory fault (core dumped) 
</code></pre></div></div>

<p>This bewildered all of the interns as it made absolutely no sense. Through our investigation, there were two things we noticed:</p>
<ol>
  <li>The program worked on our local machines but not on our target virtual machine</li>
  <li>We were allocating an extremely large buffer in the stack which was unusual</li>
</ol>

<p>It turns out the intern wanted to allocate a 1 MiB buffer for some networking or driver related ticket, not the 1 GiB that the code above asks for. If I recall correctly, our target 
only had 512 MB of RAM, so that alone could have explained the mysterious crash. But even a 1 MiB buffer on the stack was too large for our target:</p>

<p><strong>Source:</strong></p>

<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="p">];</span>
<span class="p">}</span></code></pre></figure>

<p><strong>Result:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ./prog-arm64 

Process 696339 (prog-arm64) terminated SIGSEGV code=1 fltno=11 ip=0000004de7e7a7ec mapaddr=00000000000007ec ref=000000383b19fbe0
Memory fault (core dumped) 
</code></pre></div></div>

<p>One thing I purposely omitted was that our target was running QNX, a realtime operating system. If we were to take a look at the documentation:</p>
<blockquote>
  <p>A process’s main thread starts with an automatically allocated 512 KB stack
– <a href="https://www.qnx.com/developers/docs/8.0/com.qnx.doc.neutrino.prog/topic/process_stack.html">QNX SDP 8.0 - Stack Allocation</a></p>
</blockquote>

<p>This shocked all of us since 1 MiB was not a large buffer in 2021, when we had plenty of memory on our personal systems at home.</p>

<p><strong>Note 1:</strong> The target used in the example was an aarch64le. The same crash can be reproduced on amd64 (x86_64), but it requires adding something such as a print statement.</p>

<p><strong>Note 2:</strong> QNX 8.0 was released to the general public in late 2023 or early 2024 so the actual target at the time when the question was asked was running either QNX 7.0 or QNX 7.1 (I do not recall which version)</p>

<p>As noted, AMD64 (x86_64) requires more fiddling to trigger a crash, which came as a surprise to me. A slightly more detailed version will be released shortly on my <a href="https://zakuarbor.codeberg.page/blog/small-stack/">blog</a>, 
which will include a very brief explanation of why AMD64 doesn’t crash unless something extra is added, like a call to <code class="language-plaintext highlighter-rouge">puts</code>.</p>
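
<p>One way to sidestep a small default stack, sketched here as an assumption rather than what we actually did at the time, is to give the buffer static storage duration so that it is not placed on the stack at all. The names <code class="language-plaintext highlighter-rouge">BUF_SIZE</code>, <code class="language-plaintext highlighter-rouge">buf</code>, and <code class="language-plaintext highlighter-rouge">fill_buffer</code> are hypothetical:</p>

```c
/* Hypothetical names for illustration only. */
enum { BUF_SIZE = 1024 * 1024 }; /* 1 MiB */

/* Static storage duration: the buffer is reserved outside the stack
 * (typically in the program's BSS segment), so it does not count
 * against the main thread's stack size. */
static char buf[BUF_SIZE];

/* Fills the whole buffer with one byte value and
 * returns the number of bytes written. */
unsigned long fill_buffer(char value) {
  unsigned long i;
  for (i = 0; i != BUF_SIZE; i++) {
    buf[i] = value;
  }
  return BUF_SIZE;
}
```

<p>The tradeoff is that a static buffer is shared by every caller, so this sketch is neither reentrant nor thread-safe. Allocating the buffer on the heap is another option when that matters.</p>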

</div>

<hr class="bits-hr" />

<p><a name="jekyll-cache"></a></p>
<div class="bits">
<h1 class="title">Jekyll Cache Saving the Day</h1>
<div>
<p class="date">December 17, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#jekyll">jekyll</a>
&nbsp;

<a href="/categories/#cache">cache</a>


</div>
</div>
<p>I was in the midst of publishing a post announcing that QNX released a non-commercial license, which allows hobbyists to fiddle around, 
but I accidentally overwrote my file using the <code class="language-plaintext highlighter-rouge">cp</code> command. This effectively killed my mood as I did not want to rewrite everything 
from scratch. I then recalled that Jekyll creates a cache to speed up the build process when converting markdown to HTML.</p>

<pre class="highlight" style="background-color: #1b1b1b; padding: .5rem; line-height: 1.25em">$ ls -ld .?* 
drwxr-xr-x. 1 zaku zaku 204 Dec 16 23:47 <font color="#268BD2"><b>.git</b></font>
-rw-r--r--. 1 zaku zaku   0 Oct 20 19:55 .gitignore
drwxr-xr-x. 1 zaku zaku  32 Oct 20 19:56 <font color="#268BD2"><b>.jekyll-cache</b></font>
</pre>

<p>If we traverse into the cache and into <code class="language-plaintext highlighter-rouge">Jekyll--Converters--Markdown</code>, we see a lot of directories labelled with what appear to be hex values:</p>
<pre class="highlight" style="background-color: #1b1b1b; padding: .5rem; line-height: 1.25em"><font color="#859900"><b>.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown</b></font>$ ls
<font color="#268BD2"><b>0e</b></font>  <font color="#268BD2"><b>1c</b></font>  <font color="#268BD2"><b>22</b></font>  <font color="#268BD2"><b>24</b></font>  <font color="#268BD2"><b>2e</b></font>  <font color="#268BD2"><b>37</b></font>  <font color="#268BD2"><b>3f</b></font>  <font color="#268BD2"><b>44</b></font>  <font color="#268BD2"><b>47</b></font>  <font color="#268BD2"><b>53</b></font>  <font color="#268BD2"><b>57</b></font>  <font color="#268BD2"><b>5d</b></font>  <font color="#268BD2"><b>62</b></font>  <font color="#268BD2"><b>66</b></font>  <font color="#268BD2"><b>6e</b></font>  <font color="#268BD2"><b>74</b></font>  <font color="#268BD2"><b>7b</b></font>  <font color="#268BD2"><b>84</b></font>  <font color="#268BD2"><b>8d</b></font>  <font color="#268BD2"><b>90</b></font>  <font color="#268BD2"><b>91</b></font>  <font color="#268BD2"><b>9c</b></font>  <font color="#268BD2"><b>a7</b></font>  <font color="#268BD2"><b>a9</b></font>  <font color="#268BD2"><b>aa</b></font>  <font color="#268BD2"><b>ab</b></font>  <font color="#268BD2"><b>b1</b></font>  <font color="#268BD2"><b>b3</b></font>  <font color="#268BD2"><b>b6</b></font>  <font color="#268BD2"><b>c1</b></font>  <font color="#268BD2"><b>c6</b></font>  <font color="#268BD2"><b>cb</b></font>  <font color="#268BD2"><b>d4</b></font>  <font color="#268BD2"><b>d5</b></font>  <font color="#268BD2"><b>e1</b></font>  <font color="#268BD2"><b>e2</b></font>  <font color="#268BD2"><b>ea</b></font>  <font color="#268BD2"><b>f9</b></font>  <font color="#268BD2"><b>fc</b></font></pre>

<p>Using my trusty tool <code class="language-plaintext highlighter-rouge">grep</code>, I was able to patch together pieces of my work. However, as the purpose of <code class="language-plaintext highlighter-rouge">Jekyll--Converters--Markdown</code> is to 
cache markdown files that have been converted to HTML, I had to clean the result up a bit. Regardless, it was much faster than 
rewriting the entire article.</p>

</div>

<hr class="bits-hr" />

<p><a name="qnx-community"></a></p>
<div class="bits">
<h1 class="title">QNX is 'Free' to Use</h1>
<div>
<p class="date">November  9, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#qnx">qnx</a>


</div>
</div>
<p>Recently on Hacker News, a developer relations representative from QNX announced that <a href="https://news.ycombinator.com/item?id=42079460">QNX is now free for anything non-commercial</a>. QNX also made an announcement
to the LinkedIn community, which is where I learned about it.
For those who are not familiar with QNX, it is a proprietary realtime operating system targeted at embedded systems and is installed in over 255 million vehicles.
QNX has a great reputation as a reliable and safe embedded platform to build software on, thanks to its microkernel architecture and compliance with many industrial and engineering design processes,
which makes it easier for customers to certify their software for safety-critical systems. What makes QNX appealing is a discussion for another time, but for me this is a good
opportunity to fiddle around with the system. I was <a href="https://zakuarbor.codeberg.page/blog/carletonu-qnx-license/">previously denied a license</a> by my university, which had an agreement with QNX, and
my attempts to get an educational license years ago did not go far.</p>

<p><img src="../assets/micro/products/qnx/announcement-linkedin.png" alt="LinkedIn Post announcing QNX 8.0 has a non-commercial license" /></p>

<p>Previously to gain access to QNX, one would have to either purchase a commercial license from QNX or have an academic license. This kept hobbyists from having access to the operating system.
With the non-commercial license, QNX is now open to those who are interested in running an RTOS in their hobby projects and to open source developers who want to port their software to QNX. QNX is a
POSIX compliant system but as QNX was not open for public use, companies had to port open source projects such as ROS (Robot Operating System, which isn’t an actual OS) to QNX themselves. QNX
also mentions the non-commercial license allows one to develop training materials and books on utilizing QNX, which are frankly scarce outside of QNX authorized materials (i.e. QNX training, Foundry27, and
QNX Documentation).</p>

<p><img src="../assets/micro/products/qnx/non-commercial-lic.png" alt="A sample of what is allowed with a non-commercial license" /></p>

<p>While the announcement is welcome news for me as someone who would love to tinker around, this is yet another product entering the hobbyist community late. The reason for the success of UNIX, Linux, RISC-V, and ARM is the ease and
availability of the product to hobbyists and students who later bring it to their workplace or make the product better. Closing access to technology is a recipe for disaster in the long term in terms of
gaining market advantage. This is exactly the reason why we see cloud corporations enticing students and hobbyists with free (limited) access to their products and even at times
sponsoring events targeted towards them. Linux, BSD, and FreeRTOS being open source makes them the dominant OSes among the tinkering community and gives them wide adoption in the market. Over the years, we have seen a
shift from customers using commercial and custom grade hardware and software towards more open source or off the shelf solutions, including in safety critical applications such as SpaceX using Linux and
non radiation hardened CPUs. IBM for instance has been late to developing an ecosystem of developers for their Cloud, Database and Power Architecture. IBM over the recent years has done a good job in creating free
developer focused training that tries to make use of their own technologies. However, it is plainly obvious that IBM has failed to capture mainstream interest from hobbyists, who much prefer other cloud providers such as
AWS, Google Cloud, Linode, and DigitalOcean. The SPARC and POWER architectures were open-sourced far too late by their respective owners, so developers have shifted towards RISC-V and ARM as those architectures
are either more open or easier to obtain (such as through the Raspberry Pi Foundation).</p>

<p>While I have not done any sentiment analysis of this announcement, I think overall this move is a good first step to develop an ecosystem of developers who appreciate and understand the QNX architecture, but it is also
met with skepticism. For reference, QNX has messed with the community twice before, which explains the big mistrust from experienced developers. The top comment on <a href="https://news.ycombinator.com/item?id=42079460">Hacker News</a>
does a great job summarizing the skepticism. QNX used to have a bigger hobbyist community in the past where open source projects such as Firefox would have a build for QNX, but that all died when QNX closed their doors
to the community. Years later, QNX source code was available for the public to read (though probably with restrictions), but source code availability was later shut down after QNX was acquired by BlackBerry, who does not have the
best reputation with the developer community (hence why BlackBerry phones failed to capture the market from my understanding despite once being a market leader).</p>

<p>Regardless, I have plans to create a few materials on QNX in the coming months and perhaps create a follow up to <a href="https://zakuarbor.codeberg.page/blog/qnx-aps/">QNX Adaptive Partitioning System</a> as it
has been ranked top 5 on Google search results (though I doubt it had many readers due to the small population of QNX developers):</p>

<p><img src="../assets/micro/products/qnx/aps-search-results.png" alt="Google Search Result Ranking for my QNX APS webpage" /></p>
<p class="caption">Google Search Console from July 9 2023 - Nov 8 2024 which had 308 clicks</p>


</div>

<hr class="bits-hr" />

<p><a name="email-gpg-signature"></a></p>
<div class="bits">
<h1 class="title">[Preview] Manually Verifying an Email Signature</h1>
<div>
<p class="date">October  8, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#gpg">gpg</a>
&nbsp;

<a href="/categories/#signing">signing</a>


</div>
</div>
<p>I noticed that the neocities community loves using protonmail and some even share their public key to enable fully encrypted communication. 
While I care about cyber security more than the average human, I do not care enough to start requiring others to encrypt their 
emails and sign their messages so that I can verify the authenticity of the messages I receive.</p>

<p>Out of curiosity, I decided to see how one would manually verify the signature of an email to ensure that the email has not been tampered with 
and comes from the person who it claims to be. I won’t go into how digital signatures work as those details will be posted shortly after at 
my <a href="https://zakuarbor.codeberg.page/blog/signature-verification/">blog</a>.</p>

<ol>
  <li>Import Alice’s public key:
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --import publickey-alice@proton.me.asc 
gpg: key &lt;redacted&gt;: public key "alice@proton.me &lt;alice@proton.me&gt;" imported
gpg: Total number processed: 1
gpg:               imported: 1
</code></pre></div>    </div>
  </li>
  <li>Download the email <code class="language-plaintext highlighter-rouge">.eml</code> file and the signature
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ls signature.asc 'GPG Signing test.eml'
'GPG Signing test.eml'   signature.asc
</code></pre></div>    </div>
  </li>
  <li>
    <p>Extract the message to verify from <code class="language-plaintext highlighter-rouge">.eml</code> file</p>

    <p>This is where things get difficult. The downloaded email <code class="language-plaintext highlighter-rouge">*.eml</code> has a lot of unneeded information that needs to be discarded. 
 I highly recommend that you make a copy of the email file because the process does take a while to get used to.</p>

    <p>The content of the message starts after you see the following header (the hash will differ):</p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
 --------7005887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69bc70a
</code></pre></div>    </div>

    <p>So for instance, let’s look at the following file:</p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> MIME-Version: 1.0
 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha512; boundary="------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415"; charset=utf-8

 This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
 --------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415
 Content-Type: multipart/mixed;boundary=---------------------ff35159c3ebf11234dd954191b3141592
</code></pre></div>    </div>

    <p>Then the first line of the signed message is:</p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Content-Type: multipart/mixed;boundary=---------------------ff35159c3ebf11234dd954191b3141592
</code></pre></div>    </div>

    <p>Where the signed message ends is a source of confusion. Many sources on the internet tell you to put everything between the first boundary and the second boundary into 
 a new file. The boundary they are referring to is the line after <code class="language-plaintext highlighter-rouge">This is an OpenPGP/MIME signed message (RFC 4880 and 3156)</code> which has the form <code class="language-plaintext highlighter-rouge">----&lt;hash&gt;</code>.</p>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> --------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415

 //email content

 --------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415
</code></pre></div>    </div>

    <p>Despite my many attempts, I had no success till I realized you have to delete all trailing newlines. One thing I noticed is that the hash on the first line of the signed message
 also appears on the last line of the signed message.</p>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Content-Type: multipart/mixed;boundary=---------------------ff35159c3ebf11234dd954191b3141592
</code></pre></div>    </div>
    <p class="caption"> The first line of the signed file</p>

    <p>The hash on the first line of the signed message is: <code class="language-plaintext highlighter-rouge">ff35159c3ebf11234dd954191b3141592</code> so our file should also end with this hash.</p>

    <p>If our message looks something like this:</p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> MIME-Version: 1.0
 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha512; boundary="------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415"; charset=utf-8

 This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
 --------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415
 Content-Type: multipart/mixed;boundary=---------------------ff35159c3ebf11234dd954191b3141592

 ...

 -----------------------ff35159c3ebf11234dd954191b3141592
 Content-Type: application/pgp-keys; filename="publickey - alice@proton.me - &lt;redacted&gt;.asc"; name="publickey-alice@proton.me.asc"
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment; filename="publickey-alice@proton.me.asc"; name="publickey - alice@proton.me - &lt;redacted&gt;.asc"

 ABCDEF0x4ZjZkeGxSL0xUABCDEFmltotlUR0ABCDEFWaABCDEFE9PQP9ABCDEFAABCDEFtLUVORCBABCED
 ABCDEFEABCDEFFWSBCTE9DSy0tLABCDE==
 -----------------------ff35159c3ebf11234dd954191b3141592--

 --------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415
</code></pre></div>    </div>

    <p>Then the signed message should be</p>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Content-Type: multipart/mixed;boundary=---------------------ff35159c3ebf11234dd954191b3141592

 ...

 -----------------------ff35159c3ebf11234dd954191b3141592

 ...

 -----------------------ff35159c3ebf11234dd954191b3141592
 Content-Type: application/pgp-keys; filename="publickey - alice@proton.me - &lt;redacted&gt;.asc"; name="publickey-alice@proton.me.asc"
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment; filename="publickey-alice@proton.me.asc"; name="publickey - alice@proton.me - &lt;redacted&gt;.asc"

 ABCDEF0x4ZjZkeGxSL0xUABCDEFmltotlUR0ABCDEFWaABCDEFE9PQP9ABCDEFAABCDEFtLUVORCBABCED
 ABCDEFEABCDEFFWSBCTE9DSy0tLABCDE==
 -----------------------ff35159c3ebf11234dd954191b3141592--
</code></pre></div>    </div>
  </li>
  <li>
    <p>Verify the signature: <code class="language-plaintext highlighter-rouge">gpg --verify signature.asc message.txt</code></p>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> $ gpg --verify signature.asc message.txt 
 gpg: Signature made Mon 07 Oct 2024 11:29:48 PM EDT
 gpg:                using EDDSA key &lt;redacted&gt;
 gpg: Good signature from "alice@proton.me &lt;alice@proton.me&gt;" [unknown]
 gpg: WARNING: This key is not certified with a trusted signature!
 gpg:          There is no indication that the signature belongs to the owner.
 Primary key fingerprint: &lt;redacted&gt;
</code></pre></div>    </div>
  </li>
</ol>
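<p>The extraction in step 3 can be sketched in Python. This is only a minimal sketch under a few assumptions: the function name <code class="language-plaintext highlighter-rouge">extract_signed_part</code> is my own, the email is a single <code class="language-plaintext highlighter-rouge">multipart/signed</code> message laid out like the sample above, and all trailing newlines are removed because that is what worked in my attempts. It has not been tested against other mail providers.</p>

```python
import re

# Matches the boundary parameter of the outer multipart/signed header.
# DOTALL lets the match continue across a folded (multi-line) header.
BOUNDARY_RE = re.compile(
    rb'multipart/signed;.*?boundary="?([^";\r\n]+)',
    re.IGNORECASE | re.DOTALL,
)


def extract_signed_part(raw: bytes) -> bytes:
    """Return the bytes between the first two outer boundary lines.

    Hypothetical helper: it mirrors the manual steps from the post and
    strips every trailing newline, as the post found necessary.
    """
    match = BOUNDARY_RE.search(raw)
    if match is None:
        raise ValueError("no multipart/signed boundary found")

    # A boundary line in the body is the boundary value prefixed with "--".
    delimiter = b"--" + match.group(1)

    lines = raw.splitlines(keepends=True)
    hits = [i for i, line in enumerate(lines)
            if line.rstrip(b"\r\n") == delimiter]
    if len(hits) < 2:
        raise ValueError("expected at least two outer boundary lines")

    # Everything after the first boundary line, up to the second one.
    signed = b"".join(lines[hits[0] + 1:hits[1]])
    return signed.rstrip(b"\r\n")
```

<p>The returned bytes could then be written to <code class="language-plaintext highlighter-rouge">message.txt</code> and checked with the same <code class="language-plaintext highlighter-rouge">gpg --verify signature.asc message.txt</code> command from step 4.</p>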

<p>In practice, no one verifies the digital signatures of emails manually. Any sane individual will use an email client that automates the verification process for 
them. This was a quick preview of a <a href="https://zakuarbor.codeberg.page/blog/signature-verification/">blog post</a> I will be writing in the next few days that will go into email signatures in more detail 
with better explanations and diagrams.</p>

</div>

<hr class="bits-hr" />

<p><a name="halfwidth-fullwidth"></a></p>
<div class="bits">
<h1 class="title">[Preview] Half-Width and Full-Width Characters</h1>
<div>
<p class="date">October  6, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#encoding">encoding</a>


</div>
</div>
<p>Those of us who live and speak English will probably never think about how characters are encoded, that is, how characters such as the 
very letters you see on the screen are represented by a number, like 65 for ‘A’ in ASCII, which fits in 1 byte 
such as a <code class="language-plaintext highlighter-rouge">char</code> in C.</p>

<p>I was not aware of the existence of full-width and half-width characters till a friend asked me to briefly explain, at a high level, 
the difference in how the characters are represented. For those like me who weren’t aware that Japanese text mixes 
zenkaku (full-width) and hankaku (half-width) characters, look at the image below or visit this webpage: <a href="https://mailmate.jp/blog/half-width-full-width-hankaku-zenkaku-explained">https://mailmate.jp/blog/half-width-full-width-hankaku-zenkaku-explained</a></p>

<p><img src="https://images.ctfassets.net/rrofptqvevic/3276rMt8nR8HEVYYAhhZvV/633c276e889c8dd101c4ea89cc07f82d/image_-_2023-07-21T105935.292.webp" alt="An image displaying the difference between full and half-width characters" /></p>

<p>Based on the article I shared, half-width characters take up 1 byte while full-width characters take up 2 bytes (which is why they are also called double-byte characters). 
I do believe this depends on 
the encoding used. For me, the most obvious distinction between half and full width characters is how much graphical space they consume, as is 
evident from both the image above and below:</p>

<p><img src="https://zakuarbor.codeberg.page/blog../assets/micro/programming/encoding/full-half-width.png" alt="Full and Half Width Characters encoded on UTF-8" /></p>
<p class="caption">Full and Half Width encoded on UTF-8 as seen through Vim</p>
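<p>The graphical width is also recorded as a property in the Unicode database, which can be queried from Python's standard <code class="language-plaintext highlighter-rouge">unicodedata</code> module. The snippet below is a small illustration I put together rather than anything from the article; the helper name <code class="language-plaintext highlighter-rouge">width_class</code> and the sample characters are my own choices.</p>

```python
import unicodedata


def width_class(ch: str) -> str:
    """Return the East Asian Width class of a single character.

    "Na" = narrow, "H" = half-width, "F" = full-width, "W" = wide.
    """
    return unicodedata.east_asian_width(ch)


# Escapes are used so the source stays ASCII-only.
samples = {
    "half-width digit 1": "1",
    "full-width digit 1": "\uff11",
    "hangul syllable han": "\ud55c",
}

for label, ch in samples.items():
    # NFKC normalization folds a full-width digit back to its ASCII form.
    folded = unicodedata.normalize("NFKC", ch)
    print(f"{label}: class={width_class(ch)} folded={folded!r}")
```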

<p>While I have read and typed Korean during my younger years when I was forced to learn Korean, it never clicked for me how much space Korean 
takes up graphically. It is obvious in hindsight but it was nonetheless interesting. Taking a look at the size and byte encoding, we can 
see that the number <code class="language-plaintext highlighter-rouge">1</code> in UTF-8 encoding takes 1 and 3 bytes for the half-width and full-width character respectively:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ stat -c "%n,%s" -- halfwidth-utf8.txt fullwidth-utf8.txt 
halfwidth-utf8.txt,1
fullwidth-utf8.txt,3
</code></pre></div></div>

<p>One confusion I had was understanding the difference between UTF-8 and UTF-16, and the following exercise helped me understand it:</p>
<ul>
  <li>UTF-8 encodes each character in 1-4 bytes</li>
  <li>UTF-16 encodes each character in 2 or 4 bytes</li>
</ul>

<p>UTF-8 and UTF-16, as you can tell, are variable length, meaning they take up more or fewer bytes depending on the character being encoded. We can 
see this by comparing the Arabic numeral <code class="language-plaintext highlighter-rouge">1</code> vs. the Chinese numeral <code class="language-plaintext highlighter-rouge">一</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ stat -c "%n,%s" -- halfwidth-1.txt chinese-1.md 
halfwidth-1.txt,1
chinese-1.md,3
</code></pre></div></div>

<p>In UTF-8, <code class="language-plaintext highlighter-rouge">1</code> takes up 1 byte, which is unsurprising as ASCII characters have a great advantage in UTF-8 compared to characters from Asian languages.</p>
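<p>The variable-length behaviour can be reproduced without creating any files. The sketch below is my own illustration: the helper name <code class="language-plaintext highlighter-rouge">encoded_sizes</code> and the four sample characters are assumptions picked to cover each UTF-8 length, and <code class="language-plaintext highlighter-rouge">utf-16-be</code> is used so that no byte order mark is counted.</p>

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# One sample per UTF-8 length: ASCII digit, accented Latin letter,
# the CJK numeral one, and an emoji outside the Basic Multilingual Plane.
for ch in ["1", "\u00e9", "\u4e00", "\U0001f600"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: utf-8={utf8_len} bytes, utf-16={utf16_len} bytes")
```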

<p><strong>Note:</strong> Do not attempt to display UTF-16 encoded files on the terminal without changing your locale (or whatever it is called). It will not display nicely. Vim on my machine will automatically open the file as UTF-16LE.</p>

<p><img src="https://zakuarbor.codeberg.page/blog../assets/micro/programming/encoding/chinese-garbage.png" alt="My default terminal settings is unable to display the content in Chinese properly" /></p>

<p>Let’s inspect the contents of the files for the half-width character <code class="language-plaintext highlighter-rouge">1</code> and the full-width character <code class="language-plaintext highlighter-rouge">１</code> in hex:</p>
<pre class="highlight" style="background-color: #1b1b1b; padding: .5rem; line-height: 1.25em"><font color="#D0CFCC"><b>$ </b></font>cat halfwidth-1.txt; echo &quot;&quot;; xxd halfwidth-1.txt; cat fullwidth-1.txt ; echo &quot;&quot;; xxd fullwidth-1.txt 
1
00000000: <font color="#26A269"><b>31</b></font>                      <font color="#C01C28"><b>               </b></font>  <font color="#26A269"><b>1</b></font>
１
00000000: <font color="#C01C28"><b>efbc</b></font> <font color="#C01C28"><b>91</b></font>                   <font color="#C01C28"><b>             </b></font>  <font color="#C01C28"><b>...</b></font>
</pre>

<p>As we can see, the half-width character <code class="language-plaintext highlighter-rouge">1</code> in UTF-8 is represented as <code class="language-plaintext highlighter-rouge">0x31</code> meaning only one byte is required. However, a full-width 
digit <code class="language-plaintext highlighter-rouge">１</code> is represented as <code class="language-plaintext highlighter-rouge">0xEFBC91</code>. Now let’s compare this with UTF-16:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat halfwidth-utf16.txt; echo ; xxd halfwidth-utf16.txt; cat fullwidth-utf16.txt; echo; xxd fullwidth-utf16.txt 
1
00000000: 0031                                     .1
�
00000000: ff11                                     ..
</code></pre></div></div>

<p><strong>Note:</strong> To view UTF-16 in Vim, run the following in command mode (i.e. press <code class="language-plaintext highlighter-rouge">esc</code> to exit the current mode and press <code class="language-plaintext highlighter-rouge">:</code> to enter command mode): <code class="language-plaintext highlighter-rouge">e ++enc=utf-16be fullwidth-utf16.txt</code></p>

<p>As expected, UTF-16 represents code points in the upper range very well: <code class="language-plaintext highlighter-rouge">１</code> (full-width 1) is now represented with only 2 bytes, unlike the 3 that were required in UTF-8. 
The same cannot be said for code points in the lower range such as our half-width digit <code class="language-plaintext highlighter-rouge">1</code>, which now takes 2 bytes because <code class="language-plaintext highlighter-rouge">0x00</code> is prepended to its hex representation.</p>
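<p>The bytes shown by <code class="language-plaintext highlighter-rouge">xxd</code> can be cross-checked in Python, which also makes the byte order visible. This is a rough sketch and the helper name <code class="language-plaintext highlighter-rouge">hex_bytes</code> is my own; the files above were big-endian, while the little-endian variant simply swaps the two bytes.</p>

```python
def hex_bytes(ch: str, encoding: str) -> str:
    """Encode one character and return its bytes as a hex string."""
    return ch.encode(encoding).hex()


half = "1"        # half-width digit, U+0031
full = "\uff11"   # full-width digit, U+FF11

for name, ch in [("half-width", half), ("full-width", full)]:
    print(
        name,
        "utf-8:", hex_bytes(ch, "utf-8"),
        "utf-16-be:", hex_bytes(ch, "utf-16-be"),
        "utf-16-le:", hex_bytes(ch, "utf-16-le"),
    )
```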

<p>I will be writing a more detailed look into encoding at my <a href="https://zakuarbor.codeberg.page/blog/halfwidth-fullwidth-encoding/">blog</a> in the coming days. This is just a quick preview.</p>

</div>

<hr class="bits-hr" />

<p><a name="scientific-notation"></a></p>
<div class="bits">
<h1 class="title">Mixing Number and String</h1>
<div>
<p class="date">September 18, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#programming">programming</a>


</div>
</div>
<p>A <a href="https://tmendez.dev/posts/rng-git-hash-bug/">recent post</a> has gotten somewhat popular on the web and is something many of us can
somewhat relate to. In the case of many including the author, the issue stems from how YAML treats strings and numbers. As a rule of
thumb, I would always suggest avoiding any potential confusion by always adding quotes around a string to ensure the value is treated
as a string as intended. The crux of the post was how their Git commit hash inconveniently happened to be <code class="language-plaintext highlighter-rouge">556474e378</code>, which is very rare 
to obtain. Recall that scientific notation is in the form of <code class="language-plaintext highlighter-rouge">\d+(\.\d+)?E-?\d+</code> such as 8.5E-10 to refer to 8.5 x 10<sup>-10</sup>.
The issue that one may encounter when mixing numbers and strings is that things can behave very unexpectedly, as they did for the author, whereby
<code class="language-plaintext highlighter-rouge">556474e378</code> was treated as 556474 x 10<sup>378</sup>. While I do not have any specific examples in mind, 
I definitely have encountered this kind of issue before where I mixed up a string and a number and obtained undesired behavior. However, 
I do not think I ever encountered an issue where my values were interpreted as scientific notation.</p>
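<p>To get a feel for the failure mode, here is a small Python sketch. It does not use a YAML parser, and it is not what the author's tooling did; <code class="language-plaintext highlighter-rouge">naive_parse</code> is a hypothetical stand-in for any loader that tries to read an unquoted value as a number first, and the regular expression is the one quoted above.</p>

```python
import math
import re

# The scientific notation pattern from the post, matched case-insensitively.
SCIENTIFIC = re.compile(r"\d+(\.\d+)?E-?\d+", re.IGNORECASE)


def looks_like_scientific(text: str) -> bool:
    """True when the whole string reads as scientific notation."""
    return SCIENTIFIC.fullmatch(text) is not None


def naive_parse(value: str):
    """Hypothetical loader: try a number first, fall back to the string."""
    try:
        return float(value)
    except ValueError:
        return value


commit = "556474e378"
print(looks_like_scientific(commit))    # the hash matches the pattern
print(math.isinf(naive_parse(commit)))  # 556474 x 10^378 overflows a double
print(naive_parse("556474f378"))        # a different letter stays a string
```

<p>Quoting the value, or keeping it typed as a string from the start, avoids the guess entirely.</p>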

</div>

<hr class="bits-hr" />

<p><a name="dot-dns"></a></p>
<div class="bits">
<h1 class="title">`.` At The End of a URL</h1>
<div>
<p class="date">August 30, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#dns">dns</a>
&nbsp;

<a href="/categories/#network">network</a>


</div>
</div>
<p>I recently learned that website domain names can be terminated with a <code class="language-plaintext highlighter-rouge">.</code> such as <a href="www.google.com."><code class="language-plaintext highlighter-rouge">www.google.com.</code></a> or <a href="https://neocities.org."><code class="language-plaintext highlighter-rouge">https://neocities.org.</code></a>. 
However, this does not work <a href="https://jvns.ca/blog/2022/09/12/why-do-domain-names-end-with-a-dot-/">for all websites</a>. I was skimming through <em>Network for Dummies</em> 
during work and while it doesn’t cover anything useful for the work I am trying to do (if you have taken a network course before, don’t bother reading this book unless 
you are bored like I was<sup>1</sup>), terminating a website with a <code class="language-plaintext highlighter-rouge">.</code> was a surprise.</p>

<p>The book states that <code class="language-plaintext highlighter-rouge">If a domain name ends with a trailing dot, ..., and the domain name is said to be a fully qualified domain name (FQDN)</code>.
The difference between an absolute name (FQDN) and a relative name is important when working with DNS and can cause an “internet outage” if 
done incorrectly as <a href="https://news.ycombinator.com/item?id=32862913">one user on Hacker News</a> comments. Based on an <a href="http://www.dns-sd.org/trailingdotsindomainnames.html">article</a> 
(<a href="https://stackoverflow.com/questions/36931853/if-there-exists-a-dot-after-com-is-it-a-valid-url">linked by a Stack Overflow user</a>), websites that fail 
to handle <code class="language-plaintext highlighter-rouge">.</code> in their domain names are the ones who are in violation of <a href="http://www.ietf.org/rfc/rfc1738.txt">RFC 1738</a> or at least not heeding 
its recommendations.</p>
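<p>On the application side, one way a site could tolerate the trailing dot is to treat the absolute and relative spellings of a host as the same name when comparing them. The sketch below is only an illustration of that idea using Python's <code class="language-plaintext highlighter-rouge">urllib.parse</code>; the helper names are my own and it performs no DNS lookups.</p>

```python
from urllib.parse import urlsplit


def is_absolute(name: str) -> bool:
    """A domain name written with a trailing dot is absolute (an FQDN)."""
    return name.endswith(".")


def canonical_host(url: str) -> str:
    """Return the lowercased host of a URL without a single trailing dot."""
    host = urlsplit(url).hostname or ""
    return host[:-1] if host.endswith(".") else host


def same_host(url_a: str, url_b: str) -> bool:
    """Compare two URLs by host, ignoring the trailing dot."""
    return canonical_host(url_a) == canonical_host(url_b)


print(is_absolute("www.google.com."))
print(same_host("https://neocities.org./", "https://neocities.org/"))
```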

<p><strong>Notes:</strong></p>

<p><sup>1</sup> While Network for Dummies was surprisingly fun to read due to the author’s writing style, it lacks technical depth, which should come as no surprise.</p>


</div>

<hr class="bits-hr" />

<p><a name="split-pdf-even-odd"></a></p>
<div class="bits">
<h1 class="title">Splitting Pdfs into Even and Odd Pages</h1>
<div>
<p class="date">August 28, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#printer">printer</a>
&nbsp;

<a href="/categories/#pdf">pdf</a>
&nbsp;

<a href="/categories/#utilities">utilities</a>


</div>
</div>
<p>During the winter break I obtained an old Xerox XE88 Workstation Printer released in 2000, the year the media were 
worried about Y2K causing havoc to our digital infrastructure, which thankfully never came at the scale we all feared. Though of course 
<a href="https://en.wikipedia.org/wiki/2024_CrowdStrike_incident">a bug will eventually creep in and wreak havoc</a> (i.e. the CrowdStrike Falcon update). 
But I digress. Using this printer was filled with frustration as it is a relic from the past that is not meant to be used in 2024. 
Firstly, the printer requires a parallel port, which no modern computer comes equipped with. I had to drag out my last surviving desktop from my childhood that originally came with 
<a href="https://en.wikipedia.org/wiki/Windows_Me">Windows Me</a>, which we immediately switched to the glorious Windows XP that we all know, love and 
dearly miss. As it turns out, a few months after my first use of the printer, my PS/2 connected mouse stopped working. I do not know if 
the <a href="https://en.wikipedia.org/wiki/PS/2_port">PS/2 port</a> is broken or if my PS/2 mouse is broken. I did manage to find another PS/2 mouse but as it was water damaged from a basement 
leak a few years ago, there was little chance it would work. Not having a mouse made this task much harder, but I digress.</p>

<div class="dual-image-container">
<img style="width: 300px" src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/Mini-Centronics_36_pin_with_Micro-Centronics_36_pin.jpg/1920px-Mini-Centronics_36_pin_with_Micro-Centronics_36_pin.jpg" alt="Parallel Port and connector" />
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/54/PS2_keyboard_and_mouse_jacks.jpg/300px-PS2_keyboard_and_mouse_jacks.jpg" />
<p class="caption">Parallel Port</p>
<p class="caption">PS/2 Port typically found in desktops from the 90s</p>
</div>

<p>Placing aside the hardware struggles of operating such a printer in 2024, the printer does not have duplex printing so I had to run commands on my 
PDFs on my Linux machine before transferring the files to my Windows XP machine (thankfully there are USB ports on this desktop that work 
or else I would have to dust off my 3.5 inch floppy disks and CDs). Splitting PDFs into even and odd pages turns out to be a very simple 
command:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">pdftk <span class="nv">A</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nb">cat </span>Aodd output <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">-odd.pdf"</span>
pdftk <span class="nv">A</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nb">cat </span>Aeven output <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">-even.pdf"</span></code></pre></figure>

<p>As I am printing a bunch of papers on <a href="https://en.wikipedia.org/wiki/Trusted_Computing">Trusted Computing</a>, I needed to split a lot of PDFs, 
and as this task can get quite tedious, I wrote a simple shell script:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="k">for </span>file <span class="k">in</span> <span class="k">*</span>pdf<span class="p">;</span> <span class="k">do
  </span>pdftk <span class="nv">A</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nb">cat </span>Aodd output <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">-odd.pdf"</span>
  pdftk <span class="nv">A</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">"</span> <span class="nb">cat </span>Aeven output <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">-even.pdf"</span>
<span class="k">done</span></code></pre></figure>
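<p>For anyone who prefers Python over shell, the same loop could be sketched roughly as below. This is an assumption-laden variant rather than what I ran: the function names are made up, the output names drop the original <code class="language-plaintext highlighter-rouge">.pdf</code> suffix before adding <code class="language-plaintext highlighter-rouge">-odd.pdf</code> or <code class="language-plaintext highlighter-rouge">-even.pdf</code>, and files that were already split are skipped so the script can be run twice.</p>

```python
import subprocess
from pathlib import Path


def split_commands(pdf: Path) -> list[list[str]]:
    """Build the two pdftk invocations (odd pages, then even pages)."""
    stem = pdf.with_suffix("")  # "paper.pdf" -> "paper"
    return [
        ["pdftk", f"A={pdf}", "cat", "Aodd", "output", f"{stem}-odd.pdf"],
        ["pdftk", f"A={pdf}", "cat", "Aeven", "output", f"{stem}-even.pdf"],
    ]


def pending_pdfs(directory: str) -> list[Path]:
    """List PDFs in a directory, skipping earlier -odd/-even outputs."""
    return sorted(
        pdf for pdf in Path(directory).glob("*.pdf")
        if not pdf.stem.endswith(("-odd", "-even"))
    )


def split_all(directory: str = ".") -> None:
    """Run pdftk for every pending PDF; requires pdftk to be installed."""
    for pdf in pending_pdfs(directory):
        for command in split_commands(pdf):
            subprocess.run(command, check=True)
```

<p>Calling <code class="language-plaintext highlighter-rouge">split_all()</code> from the directory holding the papers would then produce the two files per PDF.</p>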


</div>

<hr class="bits-hr" />

<p><a name="exec-script-loophole"></a></p>
<div class="bits">
<h1 class="title">Executing Script Loophole</h1>
<div>
<p class="date">August 28, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#script">script</a>
&nbsp;

<a href="/categories/#linux">linux</a>


</div>
</div>
<p>I recently came across an <a href="https://lwn.net/Articles/982085/">article</a> discussing an attempt to close a loophole that bypasses the normal 
execution permission bit. Exploiting a program’s suid and euid to gain higher privileges is a commonly known technique called privilege 
escalation. The article does not cover this but it introduces a flaw in the current way Linux handles the execution of scripts. I 
do not know why privilege escalation came to my mind but as I usually write nonsensical things anyways, I shall keep it here for now. The article gives a neat 
example where a script does not have the execute bit set but is still executable by invoking the script via an interpreter.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ls -l evil-script.py 
-rw-r--r--. 1 zaku zaku 86 Aug 28 00:20 evil-script.py
$ ./evil-script.py
bash: ./evil-script.py: Permission denied
$ python3 evil-script.py 
Evil script has been invoked. Terror shall fill this land
</code></pre></div></div>

<p>As you can see, the script has no execute bit set. However, the script is still executable by feeding the script to the interpreter.
I had never considered this a security loophole but after reading the article, I realized there are some concerns with allowing scripts 
to be executed in a way that bypasses the file’s permissions. I have always made a habit of leaving many of my interpreted scripts non-executable 
and feeding them to the interpreter due to laziness (I know it’s a one time thing to set the execute bit but I am just too lazy to run <code class="language-plaintext highlighter-rouge">chmod</code>).</p>
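<p>The same behaviour can be reproduced from Python on a Linux machine. The sketch below is my own illustration rather than anything from the article: it writes a throwaway script with mode <code class="language-plaintext highlighter-rouge">0644</code>, confirms that no execute bit is set, and still runs it by handing the path to the interpreter. The function names are hypothetical.</p>

```python
import os
import stat
import subprocess
import sys
import tempfile


def has_execute_bit(path: str) -> bool:
    """True if any of the user/group/other execute bits is set."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def run_with_interpreter(path: str) -> str:
    """Run a script by passing it to the current Python interpreter."""
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, check=True
    )
    return result.stdout


def demo() -> tuple[bool, str]:
    """Create a non-executable script and run it anyway."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "evil-script.py")
        with open(script, "w") as handle:
            handle.write('print("Evil script has been invoked")\n')
        os.chmod(script, 0o644)  # rw-r--r--, no execute bit anywhere
        return has_execute_bit(script), run_with_interpreter(script)
```

<p>An interpreter that wanted to honour the permission could perform a check like <code class="language-plaintext highlighter-rouge">has_execute_bit</code> before running the file, though the approaches discussed in the article work at the kernel level instead.</p>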

<p>The article covers some promising approaches so I do expect a solution to be merged into the kernel sometime in the near future, which will 
force me to change my habits once the interpreters make the change. Though if interpreters do adopt this change, I do expect quite a few 
production and CI/CD servers to be impacted as there will always be people like me who are too lazy to set the execute bit on their scripts.</p>

<p>One benefit of closing this loophole is to force users to deliberately make the conscious choice to set the execute bit similar to how we have to 
set the flatpaks we download as executables (at least from my personal experience) before we can execute the flatpaks.</p>

</div>

<hr class="bits-hr" />

<p><a name="replace-main"></a></p>
<div class="bits">
<h1 class="title">Replacing main()</h1>
<div>
<p class="date">August 24, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#gcc">gcc</a>
&nbsp;

<a href="/categories/#C/C++">C/C++</a>


</div>
</div>
<p>Any beginner C programmer will know that the first function executed in any program is the <code class="language-plaintext highlighter-rouge">main()</code> function. However, that is not the entire 
truth. Just like the Bohr and Lewis diagrams we learned in high school chemistry, this is an oversimplification.  From my knowledge, 
the first function executed in a binary once the loader runs is <code class="language-plaintext highlighter-rouge">_start()</code>.</p>

<p>Without going into any details, we can replace <code class="language-plaintext highlighter-rouge">main()</code> with another function such as <code class="language-plaintext highlighter-rouge">foo()</code> (sorry for the lack of creativity).</p>

<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;stdlib.h&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">foo</span><span class="p">()</span> <span class="p">{</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"Called foo</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
  <span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"Called main</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>

<p>If we compile with <code class="language-plaintext highlighter-rouge">-e &lt;entry&gt;</code> where <code class="language-plaintext highlighter-rouge">&lt;entry&gt;</code> is the name of the function replacing <code class="language-plaintext highlighter-rouge">main()</code>, we can see the following results:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc foo.c -e foo
$ ./a.out 
Called foo
</code></pre></div></div>

<p>We can also use <code class="language-plaintext highlighter-rouge">objdump</code> and <code class="language-plaintext highlighter-rouge">nm</code> to see where the <code class="language-plaintext highlighter-rouge">start_address</code> of the C code is (here I am making a distinction between the 
first entry point of the C code and that of the binary):</p>

<pre class="highlight"><code><font color="#D0CFCC"><b>$ </b></font> objdump -f ./a.out | grep start
start address <font color="#C01C28"><b>0x0000000000401136</b></font>
<font color="#D0CFCC"><b>$ </b></font>nm ./a.out | grep foo
<b><font color="#C01C28">0000000000401136 T</font></b> foo</code></pre>

<h3 id="few-notes">Few Notes</h3>
<ol>
  <li>You must define <code class="language-plaintext highlighter-rouge">main()</code> even if it’s not going to be used. <a href="https://en.cppreference.com/w/c/language/main_function">CPP Reference</a> states 
this explicitly:
    <blockquote>
      <p>Every C program coded to run in a hosted execution environment contains the definition (not the prototype) of a function named main, which is the designated start of the program.</p>
    </blockquote>

    <p>Neglecting to define <code class="language-plaintext highlighter-rouge">main</code> results in an error like the following:</p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc foo.c
/usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/14/../../../../lib64/crt1.o: in function `_start':
(.text+0x1b): undefined reference to `main'
collect2: error: ld returned 1 exit status
</code></pre></div>    </div>
  </li>
  <li>The C program entry must call <code class="language-plaintext highlighter-rouge">exit()</code> to terminate if it is not <code class="language-plaintext highlighter-rouge">main()</code>, or else a segfault will occur:
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./a.out 
Called foo
Segmentation fault (core dumped)
</code></pre></div>    </div>

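    <p>A backtrace via gdb won’t give much information as to why. Probably best to consult glibc. It is most likely because 
<code class="language-plaintext highlighter-rouge">_start</code> is not called like a regular function, so there is no return address on the stack for our entry point to return to. Normally <code class="language-plaintext highlighter-rouge">_start</code> ends up calling <code class="language-plaintext highlighter-rouge">exit</code> to terminate the program, which probably does some cleanup via <code class="language-plaintext highlighter-rouge">atexit</code> 
and sets the exit status <code class="language-plaintext highlighter-rouge">$?</code> to some value. On x86-64 Linux the top of the stack at entry holds <code class="language-plaintext highlighter-rouge">argc</code> rather than a return address, which would be consistent with the backtrace below starting at <code class="language-plaintext highlighter-rouge">0x1</code>.</p>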
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) bt 
#0  0x0000000000000001 in ?? ()
#1  0x00007fffffffdd46 in ?? ()
#2  0x0000000000000000 in ?? ()
</code></pre></div>    </div>
  </li>
</ol>

<h3 id="random-links-for-later-research">Random Links for later Research</h3>
<ul>
  <li>https://vishalchovatiya.com/posts/crt-run-time-before-starting-main/</li>
  <li>https://www.gnu.org/software/hurd/glibc/startup.html</li>
  <li>https://stackoverflow.com/questions/63543127/return-values-in-main-vs-start</li>
</ul>

</div>

<hr class="bits-hr" />

<p><a name="edit-gifs"></a></p>
<div class="bits">
<h1 class="title">Editing GIFS and Creating 88x31 Buttons</h1>
<div>
<p class="date">August 18, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#gifs">gifs</a>
&nbsp;

<a href="/categories/#gimp">gimp</a>


</div>
</div>
<p>Lately I have been learning how to edit GIFs and it is painstakingly difficult to remove a background from a GIF without using an 
AI tool, especially when the image has over 70 frames. There is likely an easier way to edit GIFs but I had to manually edit over 50 
frames, erasing the clouds from a GIF using the eraser tool frame by frame, which took some time to finish. <br /></p>

<p><b>Original:</b></p>

<div class="tenor-gif-embed" data-postid="26494068" data-share-method="host" data-aspect-ratio="2.19178" data-width="100%"><a href="https://tenor.com/view/flying-pikachu-transparent-balloon-pikachu-pokemon-yellow-pikachu-pokemon-gif-26494068">Flying Pikachu Transparent Balloon Pikachu Sticker</a>from <a href="https://tenor.com/search/flying+pikachu+transparent-stickers">Flying Pikachu Transparent Stickers</a></div>
<script type="text/javascript" async="" src="https://tenor.com/embed.js"></script>

<p><b>Result:</b></p>

<p><img src="../assets/micro/gifs/flying-pikachu-3.gif" /></p>

<p>However, if you are not editing a GIF 
but rather trying to incorporate the GIF into your 88x31 buttons, it turns out to be quite simple. Following the instructions from 
<a href="https://www.youtube.com/watch?v=3XfrnY4mb5o">a video on YouTube</a>, I managed to create a few simple 88x31 buttons that 
feature cats, coffee, and the two programs I am studying or have finished studying (i.e. doing a 2nd degree):</p>

<div class="quick-badges">
<div><img src="../assets/micro/buttons/coffee-powered.gif" /></div>
<div><img src="../assets/micro/buttons/cs-cat.gif" /></div>
<div><img src="../assets/micro/buttons/math-major.gif" /></div>
</div>

<p>To resize the GIFs, I used the <a href="https://ezgif.com/resize">ezgif resize tool</a> to set the height to 31px since I didn’t know 
how to resize GIFs in GIMP as it would open a bunch of layers. I guess I could have used ffmpeg but using an online tool is just more 
convenient and easier. I do wonder if there are other standard anti-pixel button sizes aside from 80x15 pixels because a height of 
31 pixels is quite limiting. It’s amazing what the community has been able to do with such a limited number of pixels.</p>

<div class="quick-badges">
<div><img src="../assets/micro/buttons/c.png" /></div>
<div><img src="../assets/micro/buttons/perl.png" /></div>
<div><img src="../assets/micro/buttons/bash.png" /></div>
<div><img src="../assets/micro/buttons/latex.png" /></div>
</div>

<p>For instance, the Bash button I made has the subtitle “THE BOURNE-AGAIN SHELL”, which is very hard to make out. I am assuming the standard 
practice is to render the button as a GIF and display the text on the next frame. That way users would be able to see the full text.</p>


</div>

<hr class="bits-hr" />

<p><a name="multiple-def"></a></p>
<div class="bits">
<h1 class="title">multiple definition of `variable` ... first defined here</h1>
<div>
<p class="date">August 10, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#gcc">gcc</a>
&nbsp;

<a href="/categories/#C/C++">C/C++</a>


</div>
</div>
<p>On a whim I decided to compile some old projects I worked on and I was surprised to see a few 
compilation errors in an assembler I wrote years back. As it has been many years since I last touched most of the projects I looked at, I was 
pleased to see the compiler catching obvious mistakes I had made in the past, though it did surprise me that the compiler I used 
years ago never complained about such obvious mistakes. The cause of, and the fix for, the last compilation error were not immediately obvious to me:</p>

<pre><code>$ make
gcc -o assembler assembler.c symbol_table.c parser.c  -fsanitize=address -lasan
/usr/bin/ld: /tmp/cc1MoBol.o:(.bss+0x0): multiple definition of `table'; /tmp/cc0B4XxW.o:(.bss+0x0): first defined here
/usr/bin/ld: /tmp/cc1MoBol.o:(.bss+0x81): multiple definition of `__odr_asan.table'; /tmp/cc0B4XxW.o:(.bss+0x40): first defined here</code></pre>

<p>At first I thought I might have made a stupid mistake and defined the struct called <i>table</i> twice but all I could find was <code>symbol_table.h</code>, the file that declared the variable, 
being included by <code>assembler.c</code> and <code>parser.c</code>. This led to the conclusion that there must have been a change in compiler behavior between GCC 9 and 
GCC 14. After some quick googling and going through the <a href="https://gcc.gnu.org/gcc-10/changes.html">Release Notes</a>, it turns out that starting from 
GCC 10, <code>GCC now defaults to -fno-common</code>:</p>

<blockquote>GCC now defaults to -fno-common. As a result, global variable accesses are more efficient on various targets. In C, global variables with multiple tentative definitions now result in linker errors. With -fcommon such definitions are silently merged during linking.
</blockquote>

<p>In the <a href="https://gcc.gnu.org/gcc-10/porting_to.html">Porting to GCC 10</a> webpage, the developers of GCC note:</p>

<blockquote>
A common mistake in C is omitting extern when declaring a global variable in a header file. If the header is included by several files it results in multiple definitions of the same variable
</blockquote>

<p>To resolve this issue, one can either paper over the mistake by compiling with <code>-fcommon</code>, or correctly declare the global variable in the header with the <code>extern</code> keyword and define it in exactly one source file.</p>


</div>

<hr class="bits-hr" />

<p><a name="framework-power"></a></p>
<div class="bits">
<h1 class="title">Delusional Dream of an OpenPower Framework Laptop</h1>
<div>
<p class="date">August  4, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#framework">framework</a>
&nbsp;

<a href="/categories/#powerpc">powerpc</a>


</div>
</div>
<p>Framework is a company that makes modular and repairable laptops and has captured the interest of tech enthusiasts over the past 4 years. 
Currently Framework laptops are limited to the x86-64 architecture, supporting Intel CPUs and, since 2023, AMD CPUs. Although Framework laptops are not 
entirely open source, they have <a href="https://github.com/FrameworkComputer">open sourced a decent chunk of their work</a> from my understanding, 
which allows third party development of components and makes partnerships possible with other companies such as 
<a href="https://frame.work/ca/en/blog/introducing-a-new-risc-v-mainboard-from-deepcomputing" alt="Framework announcement of introducing a RISC-V mainboard">
DeepComputing to release a mainboard that runs a RISC-V CPU
</a>. While the new mainboard will not be usable for everyday applications, it is a step forward to a more open ecosystem and an exciting step for 
Framework, RISC-V and the broader open-advocate community. This announcement makes me wonder about the possibility of OpenPower running on a Framework laptop. 
Similar to RISC-V, there isn’t an easily accessible way to obtain a consumer product running on OpenPower (aside from Raptor Computing with their 
extremely expensive machines). There is the 
<a href="https://www.powerpc-notebook.org/en/" alt="PowerPC NoteBook Community Page">
PowerPC Notebook project
</a> run by a group of volunteers who are trying to get an open source PowerPC notebook into the hands of hobbyists. It would be interesting 
if the OpenPower community could also partner with Framework to develop a mainboard once the project is complete and the software is more mature. 
However, this would be a difficult step as there is no dedicated company like DeepComputing that will pour resources into making this happen. 
The interest in OpenPower is low and overshadowed by the wider industry interest in expanding the ARM and RISC-V architectures to consumers. 
IBM made a huge mistake in open sourcing the POWER architecture too late. But one could always dream (even if it’s delusional) :D</p>

</div>

<hr class="bits-hr" />

<p><a name="2024-update"></a></p>
<div class="bits">
<h1 class="title">2024 Update</h1>
<div>
<p class="date">August  4, 2024</p>
<div class="tags">

<a href="/categories/#micro">micro</a>
&nbsp;

<a href="/categories/#site">site</a>


</div>
</div>
<p><b>Website</b><br /></p>
<p>
In the past year I have been very lazy, as is evident from my <a href="https://zakuarbor.codeberg.page/blog/" alt="Personal Blog">lack of activity on my personal blog</a>.
I'm now trying to pick up blogging again. It's hard to believe that it's been almost an entire year since I created this Neocities site, which has had almost no updates since. 
I've been thinking about how to use this site since I already have a blog on GitHub Pages. Honestly, I forgot this corner existed, and until now it had been bothering me that I couldn’t write down my random, nonsensical thoughts because my main blog wouldn’t be a suitable medium.
So, I’ve decided that this corner will be a microblog where I can share random articles and thoughts. A microblog is different from a regular blog in that the content is much shorter. This space will allow me to quickly jot down something random. I hope that a collection of these random posts will evolve into a blog post or spark an idea for my final year thesis or project.
<p />

<b>How are my studies going?</b><br />
<p>
I’m still studying Mathematics, but I’ve lost much of my initial interest in the field after taking a few third-year courses. 
I ended up taking fewer Math courses, which puts my ability to graduate on time at risk. 
Listening to lectures and reading about abstract groups and rings made me really miss programming and computer science. 
<img src="../assets/micro/gifs/onion/study-confused.gif" />
Despite this, there were still some Math courses I enjoyed, such as Combinatorics and Real Analysis. 
However, I didn’t last long in the follow-up Real Analysis courses that covered Stone-Weierstrass and Commutative C* Algebra. 
Feeling tired of abstract Mathematics, I decided to take a break and pursue an internship at a telecommunications enterprise.</p>

<img src="../assets/micro/gifs/graph-retro-computer.webp" alt="retro computer fiddling with excel" />

<b>What am I doing Now?</b><br />
<p>As mentioned, I am currently doing a year-long internship with a telecommunications enterprise. Although the job isn't very exciting, it's a welcome break from Mathematics. This would typically be a great chance to catch up on my Computer Science studies by delving into textbooks and online resources, but I’ve been quite lazy. Instead, I've been focusing on learning French, a language I've always wanted to master. I started learning French in elementary school, as it’s a requirement in Canada. While it might make more sense to learn my mother tongue, I’m opting to learn French, which might seem confusing to some. For context, I don't have an English name and was born in some Asian country but I am unable to communicate with others in my mother tongue.</p>

</p>

</div>

<footer>
  <marquee id="buttons">
    <a href="index.html"><img src="../assets/micro/buttons/button.png" /></a>
    <a href="index.html"><img src="../assets/micro/buttons/button2.png" /></a>
    <a href="https://neocities.org/"><img src="../assets/micro/buttons/neocitiesorg.gif" /></a>
    <a href="https://www.vim.org/"><img src="../assets/micro/buttons/vim.gif" /></a>
    <a href="https://fedoraproject.org/"><img src="../assets/micro/buttons/gnu-linux.gif" /></a>
    <a href="https://fedoraproject.org/"><img src="../assets/micro/buttons/powered_by_fedora_alt.png" /></a>
  </marquee>
</footer>]]></content><author><name>Ju Hong Kim</name></author><category term="micro" /><summary type="html"><![CDATA[2024 Edition of All MicroPosts on RandomBit's Neocities webpage]]></summary></entry><entry><title type="html">this: the implicit parameter in OOP</title><link href="https://zakuarbor.codeberg.page/blog/this-asm/" rel="alternate" type="text/html" title="this: the implicit parameter in OOP" /><published>2025-02-11T00:00:00-05:00</published><updated>2025-02-11T00:00:00-05:00</updated><id>https://zakuarbor.codeberg.page/blog/this-asm</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/this-asm/"><![CDATA[<p>I was recently reminded that the variable <code class="language-plaintext highlighter-rouge">this</code> is an implicit parameter passed to all non-static methods in OOP languages such as C++. We can observe this by comparing a regular function with a method 
belonging to some class:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
</span>
<span class="kt">void</span> <span class="nf">greet</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Hello World</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">class</span> <span class="nc">Human</span> <span class="p">{</span>
<span class="nl">public:</span>
    <span class="kt">void</span> <span class="n">greet</span><span class="p">()</span> <span class="p">{</span>
        <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Hello World</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">};</span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">greet</span><span class="p">();</span>
    <span class="n">Human</span> <span class="n">human</span> <span class="o">=</span> <span class="n">Human</span><span class="p">();</span>
    <span class="n">human</span><span class="p">.</span><span class="n">greet</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ g++ test.C
$ ./a.out 
Hello World
Hello World
</code></pre></div></div>

<p>Furthermore, their resulting mangled names do not indicate that the function/method takes in any arguments:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nm a.out  | <span class="nb">grep </span>greet
0000000000401126 T _Z5greetv
000000000040115c W _ZN5Human5greetEv
</code></pre></div></div>

<p>C++ mangles symbol names so that the compiler can encode extra information, such as the enclosing class and the parameter types, for the linker. One obvious 
problem name mangling solves is function overloading, where the same function identifier can take a different number or different types of parameters.
The <code class="language-plaintext highlighter-rouge">v</code> suffix in the mangled names indicates that its only parameter is <code class="language-plaintext highlighter-rouge">void</code>, i.e. that it takes no parameters. This is true as far as the source code goes: as the title suggests, <code class="language-plaintext highlighter-rouge">this</code> is an <strong>implicit</strong> parameter, 
meaning it’s a “parameter” that the compiler passes into the method behind the scenes. However, this can only be observed by inspecting the assembly code. A language 
that explicitly passes a reference to the object itself is Python, where a typical constructor would look like the following:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Human</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">age</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span>  <span class="n">name</span>
        <span class="n">self</span><span class="p">.</span><span class="n">age</span> <span class="o">=</span> <span class="n">age</span>
</code></pre></div></div>

<p>Anyhow, let’s observe the assembly code. Note: I’ll be only showing the code of interest.</p>

<p>For <code class="language-plaintext highlighter-rouge">greet</code>:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">Dump</span> <span class="nv">of</span> <span class="nv">assembler</span> <span class="nv">code</span> <span class="nv">for</span> <span class="nv">function</span> <span class="nv">_Z5greetv</span><span class="p">:</span>
   <span class="err">0</span><span class="nf">x0000000000401126</span> <span class="o">&lt;+</span><span class="mi">0</span><span class="o">&gt;</span><span class="p">:</span>	<span class="nv">push</span>   <span class="o">%</span><span class="nb">rbp</span>
   <span class="err">0</span><span class="nf">x0000000000401127</span> <span class="o">&lt;+</span><span class="mi">1</span><span class="o">&gt;</span><span class="p">:</span>	<span class="nv">mov</span>    <span class="o">%</span><span class="nb">rsp</span><span class="p">,</span><span class="o">%</span><span class="nb">rbp</span>
   <span class="err">0</span><span class="nf">x000000000040112a</span> <span class="o">&lt;+</span><span class="mi">4</span><span class="o">&gt;</span><span class="p">:</span>	<span class="nv">mov</span>    <span class="kc">$</span><span class="mh">0x402280</span><span class="p">,</span><span class="o">%</span><span class="nb">esi</span>
</code></pre></div></div>

<p>For <code class="language-plaintext highlighter-rouge">Human::greet</code>:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">Dump</span> <span class="nv">of</span> <span class="nv">assembler</span> <span class="nv">code</span> <span class="nv">for</span> <span class="nv">function</span> <span class="nv">_ZN5Human5greetEv</span><span class="p">:</span>
   <span class="err">0</span><span class="nf">x000000000040115c</span> <span class="o">&lt;+</span><span class="mi">0</span><span class="o">&gt;</span><span class="p">:</span>     <span class="nv">push</span>   <span class="o">%</span><span class="nb">rbp</span>
   <span class="err">0</span><span class="nf">x000000000040115d</span> <span class="o">&lt;+</span><span class="mi">1</span><span class="o">&gt;</span><span class="p">:</span>     <span class="nv">mov</span>    <span class="o">%</span><span class="nb">rsp</span><span class="p">,</span><span class="o">%</span><span class="nb">rbp</span>
   <span class="err">0</span><span class="nf">x0000000000401160</span> <span class="o">&lt;+</span><span class="mi">4</span><span class="o">&gt;</span><span class="p">:</span>     <span class="nv">sub</span>    <span class="kc">$</span><span class="mh">0x10</span><span class="p">,</span><span class="o">%</span><span class="nb">rsp</span>
   <span class="err">0</span><span class="nf">x0000000000401164</span> <span class="o">&lt;+</span><span class="mi">8</span><span class="o">&gt;</span><span class="p">:</span>     <span class="nv">mov</span>    <span class="o">%</span><span class="nb">rdi</span><span class="p">,</span><span class="o">-</span><span class="mh">0x8</span><span class="p">(</span><span class="o">%</span><span class="nb">rbp</span><span class="p">)</span>
   <span class="err">0</span><span class="nf">x0000000000401168</span> <span class="o">&lt;+</span><span class="mi">12</span><span class="o">&gt;</span><span class="p">:</span>	<span class="nv">mov</span>    <span class="kc">$</span><span class="mh">0x402280</span><span class="p">,</span><span class="o">%</span><span class="nb">esi</span>
</code></pre></div></div>

<p>In the x86-64 System V calling convention (used on Linux), the first few integer and pointer arguments are passed in the registers rdi, rsi, rdx, etc. rather than on the stack (a function will often copy them onto its own stack frame afterwards, as we will see).
Since <code class="language-plaintext highlighter-rouge">greet</code> has no parameters, it goes straight to storing the address of our constant string “Hello World\n” into the esi register:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) x/1s 0x402280
0x402280:	"Hello World\n"
</code></pre></div></div>

<p>However, for our method <code class="language-plaintext highlighter-rouge">Human::greet</code>, the <code class="language-plaintext highlighter-rouge">rdi</code> register, which typically holds the first parameter of the function, is being used:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">mov</span>    <span class="o">%</span><span class="nb">rdi</span><span class="p">,</span><span class="o">-</span><span class="mh">0x8</span><span class="p">(</span><span class="o">%</span><span class="nb">rbp</span><span class="p">)</span>
</code></pre></div></div>

<p>Whatever <code class="language-plaintext highlighter-rouge">rdi</code> is holding, it is an 8B value, which also happens to be the size of a pointer on x86-64.
<strong>This</strong> is our implicit argument, <code class="language-plaintext highlighter-rouge">this</code>, which contains the address of the object itself. We can observe this via gdb:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) p &amp;human
$2 = (Human *) 0x7fffffffdc4f
...
(gdb) i r rdi
rdi            0x7fffffffdc4f      140737488346191
</code></pre></div></div>

<p>where we see that the <code class="language-plaintext highlighter-rouge">rdi</code> register contains the same address as our object <code class="language-plaintext highlighter-rouge">human</code>: <code class="language-plaintext highlighter-rouge">0x7fffffffdc4f</code>.</p>

<p>We can also replicate this on <code class="language-plaintext highlighter-rouge">arm</code>, where <code class="language-plaintext highlighter-rouge">x0</code> (or <code class="language-plaintext highlighter-rouge">w0</code>, its lower 32 bits) holds the first argument and will be set to the address of our object <code class="language-plaintext highlighter-rouge">human</code>. Using Compiler Explorer:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">Human:</span><span class="p">:</span><span class="nf">greet</span><span class="p">():</span>
 <span class="nf">stp</span>	<span class="nv">x29</span><span class="p">,</span> <span class="nv">x30</span><span class="p">,</span> <span class="p">[</span><span class="nb">sp</span><span class="p">,</span> <span class="err">#</span><span class="o">-</span><span class="mi">32</span><span class="p">]</span><span class="err">!</span>
 <span class="nf">mov</span>	<span class="nv">x29</span><span class="p">,</span> <span class="nb">sp</span>
 <span class="nf">str</span>	<span class="nv">x0</span><span class="p">,</span> <span class="p">[</span><span class="nb">sp</span><span class="p">,</span> <span class="err">#</span><span class="mi">24</span><span class="p">]</span>
<span class="nf">...</span>
</code></pre></div></div>

<p>As you can observe, <code class="language-plaintext highlighter-rouge">x0</code> is likewise saved as an 8B value onto the stack (i.e. 32 - 24 = 8). Running this on a QNX ARM image (I was too lazy to flash a new OS onto my Raspberry Pi),
we can observe that the <code class="language-plaintext highlighter-rouge">x0</code> register does indeed contain the same address as our object <code class="language-plaintext highlighter-rouge">human</code>, which represents <code class="language-plaintext highlighter-rouge">this</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) p &amp;human
$1 = (Human *) 0x81c60
...
Dump of assembler code for function _ZN5Human5greetEv:
test.C:
9	    void greet() {
&lt;+0&gt;:	stp	x29, x30, [sp, #-32]!
&lt;+4&gt;:	mov	x29, sp
&lt;+8&gt;:	str	x0, [sp, #24]
...

(gdb) i r x0                     
x0             0x81c60             531552
</code></pre></div></div>

<!--
_Z begins mangled symbols
or nested names (including both namespaces and classes), this is followed by N
E is to indicate an end of the scope
wikipedia::article::format becomes:

_ZN9wikipedia7article6formatE


-->]]></content><author><name>Ju Hong Kim</name></author><category term="programming" /><category term="asm" /><category term="C/C++" /><summary type="html"><![CDATA[A brief look into `this` parameter in OOP via assembly]]></summary></entry><entry><title type="html">view is just vim</title><link href="https://zakuarbor.codeberg.page/blog/view-vim/" rel="alternate" type="text/html" title="view is just vim" /><published>2025-01-24T00:00:00-05:00</published><updated>2025-01-24T00:00:00-05:00</updated><id>https://zakuarbor.codeberg.page/blog/view-vim</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/view-vim/"><![CDATA[<p>I recently found out accidentally at work that <code class="language-plaintext highlighter-rouge">vim</code> and <code class="language-plaintext highlighter-rouge">view</code> were the same thing when I happened to be editing a file on <code class="language-plaintext highlighter-rouge">view</code> instead of my beloved <code class="language-plaintext highlighter-rouge">vim</code> editor.</p>

<blockquote>
  <p><strong>Note:</strong> This is a follow up post from my <a href="https://randombits.neocities.org/micro/2025/01/vim-view">microblog</a></p>
</blockquote>

<p>For context, <code class="language-plaintext highlighter-rouge">view</code> is often used in lieu of <code class="language-plaintext highlighter-rouge">vi</code> to open a file read-only while retaining the same shortcuts as <code class="language-plaintext highlighter-rouge">vi</code>. This is why it surprised me 
to see that I could modify a file when <code class="language-plaintext highlighter-rouge">view</code> was supposed to be read-only. If we take a look at the documentation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ man -Leng view
VIM(1)                                                                  General Commands Manual                                                                  VIM(1)

NAME
       vim - Vi IMproved, a programmer's text editor

SYNOPSIS
       vim [options] [file ..]
       vim [options] -
       vim [options] -t tag
       vim [options] -q [errorfile]

       ex gex
       view
       gvim gview vimx evim eview
       rvim rview rgvim rgview
</code></pre></div></div>

<p>Interestingly, the man page for <code class="language-plaintext highlighter-rouge">view</code> points to <code class="language-plaintext highlighter-rouge">vim</code>, and we can see all sorts of other editor variants listed along with it, such as <code class="language-plaintext highlighter-rouge">gvim</code>.
The <a href="https://www.ibm.com/docs/en/aix/7.3?topic=v-view-command">AIX 7.3 documentation</a> states that <code class="language-plaintext highlighter-rouge">view</code> <code class="language-plaintext highlighter-rouge">Starts the vi editor in read-only mode.</code> This is indeed
evident when I take a look at how <code class="language-plaintext highlighter-rouge">view</code> is defined on my system (Fedora 41):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">cat</span> /usr/bin/view
<span class="c">#!/usr/bin/sh</span>

<span class="c"># run vim -R if available</span>
<span class="k">if </span><span class="nb">test</span> <span class="nt">-f</span> /usr/bin/vim
<span class="k">then
  </span><span class="nb">exec</span> /usr/bin/vim <span class="nt">-R</span> <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
<span class="k">fi</span>

<span class="c"># run vi otherwise</span>
<span class="nb">exec</span> /usr/libexec/vi <span class="nt">-R</span> <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
</code></pre></div></div>

<p>where <code class="language-plaintext highlighter-rouge">-R</code> is a flag for <code class="language-plaintext highlighter-rouge">Read-only mode</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>       -R          Read-only  mode.  The 'readonly' option will be set.  You can still edit the buffer, but will be prevented from accidentally overwriting a file.  If
                   you do want to overwrite a file, add an exclamation mark to the Ex command, as in ":w!".  The -R option also implies the -n option (see above).  The
                   'readonly' option can be reset with ":set noro".  See ":help 'readonly'".
</code></pre></div></div>

<h2 id="vim-oddities">Vim Oddities</h2>

<p>What I found particularly odd was that at work, on one system, <code class="language-plaintext highlighter-rouge">view</code> was simply a symlink to <code class="language-plaintext highlighter-rouge">vi</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ realpath view
/usr/bin/vi
</code></pre></div></div>

<p>while on another machine, the two had the same md5sum (the output below is for illustration purposes; I replicated the behavior on my local machine):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zaku@fedora:/usr/bin$ md5sum view
8fe562f5dd43b70c38f10ee2ec3310ca  view
zaku@fedora:/usr/bin$ md5sum vim
8fe562f5dd43b70c38f10ee2ec3310ca  vim
</code></pre></div></div>

<p>This confused me, so I ran an experiment. Seeing as the only difference between <code class="language-plaintext highlighter-rouge">view</code> and <code class="language-plaintext highlighter-rouge">vim</code> on both systems at work was the name, I created a symlink to <code class="language-plaintext highlighter-rouge">vim</code> whose name starts with <code class="language-plaintext highlighter-rouge">view</code>:</p>

<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ln -s /usr/bin/vim view-pika
$ ls -l view-pika
lrwxrwxrwx. 1 zaku zaku 12 22 janv. 22:52 <font color="#2AA198"><b>view-pika</b></font> -&gt; <font color="#859900"><b>/usr/bin/vim</b></font>
</code></pre></div></div>

<p>And it <strong>BEHAVED THE SAME</strong> as <code class="language-plaintext highlighter-rouge">view</code>. Thus I concluded that <code class="language-plaintext highlighter-rouge">vim</code> behaves differently depending on the name of the command being executed. More precisely, <code class="language-plaintext highlighter-rouge">vim</code> inspects <code class="language-plaintext highlighter-rouge">argv[0]</code>, and if the program
was started under a name beginning with <code class="language-plaintext highlighter-rouge">view</code>, it opens in read-only mode. Looking at the source code on <a href="https://github.com/vim/vim/blob/master/src/main.c#L1954">GitHub</a>
under <code class="language-plaintext highlighter-rouge">main.c::parse_command_name()</code> confirms this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">if</span> <span class="p">(</span><span class="n">STRNICMP</span><span class="p">(</span><span class="n">initstr</span><span class="p">,</span> <span class="s">"view"</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>

<p>where <code class="language-plaintext highlighter-rouge">initstr = gettail((char_u *)parmp-&gt;argv[0]);</code> as suspected. This explains why <code class="language-plaintext highlighter-rouge">pika-view</code> did not work but <code class="language-plaintext highlighter-rouge">view-pika</code> worked: <code class="language-plaintext highlighter-rouge">vim</code> only compares the first
4 characters of the file name part of <code class="language-plaintext highlighter-rouge">argv[0]</code> to see if it starts with <code class="language-plaintext highlighter-rouge">view</code>. If you inspect the code more, you’ll see that <code class="language-plaintext highlighter-rouge">vim</code> has many faces.</p>
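<p>To make the idea concrete, here is a minimal sketch of how a program can change its behavior based on <code class="language-plaintext highlighter-rouge">argv[0]</code>. This is not Vim’s actual code: <code class="language-plaintext highlighter-rouge">tail_of()</code> and <code class="language-plaintext highlighter-rouge">starts_read_only()</code> are hypothetical helpers I made up for illustration, and I am assuming <code class="language-plaintext highlighter-rouge">STRNICMP</code> is a case-insensitive comparison, which I approximate with the POSIX function <code class="language-plaintext highlighter-rouge">strncasecmp()</code>. The sketch also ignores the other name variants that Vim recognizes.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical helper: return the part of a path after the last '/',
 * similar in spirit to what gettail() appears to do in Vim. */
static const char *tail_of(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical helper: true when the command name starts with "view"
 * (compared case-insensitively), i.e. when a Vim-like program would
 * decide to start in read-only mode. */
bool starts_read_only(const char *argv0)
{
    assert(argv0 != NULL);
    /* Only the first 4 characters of the file name are compared. */
    return strncasecmp(tail_of(argv0), "view", 4) == 0;
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">starts_read_only(argv[0])</code> at the top of <code class="language-plaintext highlighter-rouge">main()</code> would then return true for <code class="language-plaintext highlighter-rouge">view</code> and <code class="language-plaintext highlighter-rouge">view-pika</code> but false for <code class="language-plaintext highlighter-rouge">pika-view</code>, matching what I observed.</p>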

<p>This behavior is entirely documented on the man pages which I never noticed:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Vim behaves differently, depending on the name of the command (the executable may still be the same file).

       vim       The "normal" way, everything is default.

       ex        Start in Ex mode.  Go to Normal mode with the ":vi" command.  Can also be done with the "-e" argument.

       view      Start in read-only mode.  You will be protected from writing the files.  Can also be done with the "-R" argument.

       gvim gview
                 The GUI version.  Starts a new window.

       gex       Starts a new gvim window in Ex mode. Can also be done with the "-e" argument to gvim

       vimx      Starts gvim in "Vi" mode similar to "vim", but with additional features like xterm clipboard support

       evim eview
                 The GUI version in easy mode.  Starts a new window.  Can also be done with the "-y" argument.

       rvim rview rgvim rgview
                 Like the above, but with restrictions.  It will not be possible to start shell commands, or suspend Vim.  Can also be done with the "-Z" argument.
</code></pre></div></div>

<h3 id="extra-random-information-on-vim-and-vi">Extra Random Information on VIM and VI</h3>

<p>1.) Viewing Compilation Flags</p>

<p>That was all I wanted to look at in regards to <code class="language-plaintext highlighter-rouge">view</code> and <code class="language-plaintext highlighter-rouge">vim</code>. One interesting timbit about <code class="language-plaintext highlighter-rouge">vim</code> is that you can see what appear to be its compilation flags by running <code class="language-plaintext highlighter-rouge">vim --version</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>         fichier vimrc système : "/etc/vimrc"
     fichier vimrc utilisateur : "$HOME/.vimrc"
  2e fichier vimrc utilisateur : "~/.vim/vimrc"
  3e fichier vimrc utilisateur : "~/.config/vim/vimrc"
      fichier exrc utilisateur : "$HOME/.exrc"
 fichier de valeurs par défaut : "$VIMRUNTIME/defaults.vim"
               $VIM par défaut : "/usr/share/vim"
Compilation : gcc -c -I. -Iproto -DHAVE_CONFIG_H -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -DSYS_VIMRC_FILE=/etc/vimrc -D_REENTRANT -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 
Édition de liens : gcc -Wl,--enable-new-dtags -Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,-z,now -Wl,--build-id=sha1 -Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -L/usr/local/lib -o vim -lm -lselinux -lncurses -lsodium -lacl -lattr -lgpm
</code></pre></div></div>

<p>2.) One unique thing about <code class="language-plaintext highlighter-rouge">vim</code> is that it is charityware. If you simply run <code class="language-plaintext highlighter-rouge">vim</code>, the start screen will ask you to help children in Uganda through ICCF Holland.</p>

<p><img src="../assets/programming/vim-fr.png" alt="vim menu telling users to support children in Uganda" /></p>

<p>3.) <a href="https://vimconf.org/">Vimconf</a> is held in Japan. This indicates that <code class="language-plaintext highlighter-rouge">vim</code> either has a strong presence in Japan or a very dedicated fanbase.</p>

<p>4.) Ubuntu 24.04 ships <code class="language-plaintext highlighter-rouge">vim.tiny</code>, likely a more stripped-down version of <code class="language-plaintext highlighter-rouge">vim</code>.</p>

<p>5.) The <code class="language-plaintext highlighter-rouge">vi</code> packaged on a QNX virtual target is actually <a href="https://en.wikipedia.org/wiki/Elvis_(text_editor)"><code class="language-plaintext highlighter-rouge">elvis</code></a>, an enhanced clone of <code class="language-plaintext highlighter-rouge">vi</code>. QNX probably ships <code class="language-plaintext highlighter-rouge">elvis</code>
as the default editor due to its small size compared to <code class="language-plaintext highlighter-rouge">vim</code> (though this also means fewer features compared to <code class="language-plaintext highlighter-rouge">vim</code>). The QNX Raspberry Pi 4 image, though, ships with
regular <code class="language-plaintext highlighter-rouge">vim</code>. Similarly to <code class="language-plaintext highlighter-rouge">vim</code>, renaming <code class="language-plaintext highlighter-rouge">elvis</code> to <code class="language-plaintext highlighter-rouge">view</code> will open the editor in read-only mode.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># vi --version
elvis 2.2.0
Copyright (c) 1995-2003 by Steve Kirkendall
Permission is granted to redistribute the source or binaries under the terms of
of the Perl `Clarified Artistic License', as described in the doc/license.html
file.  In particular, unmodified versions can be freely redistributed.
Elvis is offered with no warranty, to the extent permitted by law.
</code></pre></div></div>

<p>v.s.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   system vimrc file: "$VIM/vimrc"
     user vimrc file: "$HOME/.vimrc"
 2nd user vimrc file: "~/.vim/vimrc"
 3rd user vimrc file: "~/.config/vim/vimrc"
      user exrc file: "$HOME/.exrc"
       defaults file: "$VIMRUNTIME/defaults.vim"
  fall-back for $VIM: "/builds/workspace/build/stage/target/qnx/usr/share/vim
"
Compilation: aarch64-unknown-nto-qnx8.0.0-gcc -mlittle-endian -mlittle-endian -c -I. -Iproto -DHAVE_CONFIG_H -mlittle-endian -I/builds/workspace/build/stage/target/qnx/usr/include -I/builds/workspace/build/qnx_sdp/target/qnx/usr/include -mlittle-endian -O2 -Wall -fplugin=/builds/workspace/build/qnx_sdp/host/linux/x86_64/usr/lib/gcc/aarch64-unknown-nto-qnx8.0.0/12.2.0/plugin/cmdline_save.so -fplugin=srcversion -fplugin-arg-srcversion-path=/builds/workspace/build/qnx_sdp/target/qnx -fplugin-arg-srcversion-path=/builds/workspace/build/code -fplugin-arg-srcversion-path=/builds/workspace/build/stage/target/qnx -fplugin-arg-srcversion-path=/builds/workspace/build/qnx_sdp/host/linux/x86_64 -fplugin-arg-srcversion-buildid=vim_br-main_be-800-16 -g -D_REENTRANT -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 
Linking: aarch64-unknown-nto-qnx8.0.0-gcc -mlittle-endian -mlittle-endian -mlittle-endian -L/builds/workspace/build/stage/target/qnx/aarch64le/lib -L/builds/workspace/build/stage/target/qnx/aarch64le/usr/lib -L/builds/workspace/build/qnx_sdp/target/qnx/aarch64le/lib -L/builds/workspace/build/qnx_sdp/target/qnx/aarch64le/usr/lib -Wl,-Map,install.map -Wl,--build-id=md5 -Wl,--as-needed -o vim -lm -lsocket -lncurses -liconv -lintl 
</code></pre></div></div>]]></content><author><name>Ju Hong Kim</name></author><category term="programming" /><category term="vi" /><category term="utilities" /><category term="unix" /><summary type="html"><![CDATA[A Look into the many faces of vim]]></summary></entry><entry><title type="html">The Sign of Char</title><link href="https://zakuarbor.codeberg.page/blog/sign-of-char/" rel="alternate" type="text/html" title="The Sign of Char" /><published>2025-01-20T00:00:00-05:00</published><updated>2025-01-20T00:00:00-05:00</updated><id>https://zakuarbor.codeberg.page/blog/sign-of-char</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/sign-of-char/"><![CDATA[<blockquote>
  <p><strong>Note:</strong> This is a follow up post from my <a href="https://randombits.neocities.org/micro/2025/01/char-unsigned">microblog</a></p>

  <p><strong>WARNING:</strong> I am no expert in Assembly. The last and only time I ever wrote assembly was computing the Fibonacci sequence 8 years ago for the MIPS architecture.</p>
</blockquote>

<p>The value of the following declaration is ambiguous:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">char</span> <span class="n">i</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</code></pre></div></div>

<p>The issue with the above line is that the value of <code class="language-plaintext highlighter-rouge">i</code> is not immediately obvious, as compilers targeting different architectures could treat a plain <code class="language-plaintext highlighter-rouge">char</code> as <code class="language-plaintext highlighter-rouge">signed</code> or <code class="language-plaintext highlighter-rouge">unsigned</code>.
The signedness of a data type can simply be thought of as whether or not there is a sign bit that indicates if the number is positive or negative.</p>

<h2 id="a-quick-review-of-signedness">A Quick Review of Signedness</h2>

<p>The size of <code class="language-plaintext highlighter-rouge">char</code> is 1 byte as defined in the <a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf">C specification (C99 3.7.1)</a>, which corresponds to 8 bits on virtually all modern platforms (the standard only requires a byte to have at least 8 bits).
This effectively gives <code class="language-plaintext highlighter-rouge">char</code> the ability to represent 2<sup>8</sup> = 256 values, which is more than enough to represent all 128 characters of ASCII as well as
encodings such as <a href="https://en.wikipedia.org/wiki/JIS_X_0201">JIS X 0201</a> that slightly extend ASCII to utilize the remaining slots (ASCII only maps 128 values).</p>

<p>There are different ways to represent negative numbers, but the most common, at least from what I recall, is two’s complement.
From what I read online, the advantage of two’s complement is that arithmetic operations work the same way regardless of whether a number is negative or
positive. It also avoids having a concept of negative 0, which would be quite odd to deal with.</p>

<p>Two’s complement is quite simple, but it does require you to be familiar with binary since that is how computers represent any piece of data. The most significant bit
(the leftmost bit) indicates whether the number is negative or not. If set (i.e. set to 1 or true), then the number is negative and we must apply two’s complement
to recover its magnitude in decimal.</p>

<p>In our case, let’s look at how <code class="language-plaintext highlighter-rouge">-1</code> is represented using two’s complement: <code class="language-plaintext highlighter-rouge">1111 1111</code> or <code class="language-plaintext highlighter-rouge">0xFF</code></p>

<ol>
  <li>Invert all bits:
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1111 1111 =&gt; 0000 0000
</code></pre></div>    </div>
  </li>
  <li>Add 1:
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0000 0000
0000 0001
---------
0000 0001
</code></pre></div>    </div>
  </li>
</ol>

<p>Since the result is 1 and we know the most significant bit was set (or else we would not have to apply two’s complement), <code class="language-plaintext highlighter-rouge">1111 1111</code> represents <code class="language-plaintext highlighter-rouge">-1</code>.</p>
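<p>The two steps above can be sketched in C as follows. The function name <code class="language-plaintext highlighter-rouge">decode_twos_complement()</code> is my own invention for illustration, and the sketch only handles 8-bit patterns.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Interpret an 8-bit pattern as a two's complement number. */
int decode_twos_complement(uint8_t bits)
{
    if ((bits & 0x80u) == 0) {
        return bits; /* sign bit clear: the value is used as-is */
    }
    uint8_t inverted = (uint8_t)~bits; /* step 1: invert all bits */
    int magnitude = inverted + 1;      /* step 2: add 1 */
    assert(magnitude >= 1 && magnitude <= 128);
    return -magnitude;                 /* sign bit was set: negative */
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">decode_twos_complement(0xFF)</code> inverts <code class="language-plaintext highlighter-rouge">1111 1111</code> to <code class="language-plaintext highlighter-rouge">0000 0000</code>, adds 1, and returns <code class="language-plaintext highlighter-rouge">-1</code>.</p>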

<h2 id="signedness-of-char-in-arm">Signedness of Char in ARM</h2>

<p>In the “Signedness of Chars” section (Chapter 19 - Portability) of Robert Love’s book Linux Kernel Development, he notes that some systems, such as ARM,
treat <code class="language-plaintext highlighter-rouge">char</code> as <code class="language-plaintext highlighter-rouge">unsigned</code>, which goes against the intuition of us AMD64 (x86-64) programmers. Effectively, the value of <code class="language-plaintext highlighter-rouge">i</code> will be read back as 255 rather than -1.
The reason for this is apparently performance.</p>

<p>Let’s verify this on my Raspberry Pi 4 machine running Linux:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">char</span> <span class="n">i</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">255</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"char is unsigned</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"char is signed</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Result:</strong>  <code class="language-plaintext highlighter-rouge">char is unsigned</code></p>
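<p>The same check can be done without relying on a runtime comparison by looking at <code class="language-plaintext highlighter-rouge">CHAR_MIN</code> from <code class="language-plaintext highlighter-rouge">&lt;limits.h&gt;</code>, which is 0 when plain <code class="language-plaintext highlighter-rouge">char</code> is unsigned. Below is a small sketch; <code class="language-plaintext highlighter-rouge">char_is_signed()</code> is a hypothetical helper name, and <code class="language-plaintext highlighter-rouge">static_assert</code> assumes a C11 (or newer) compiler.</p>

```c
#include <assert.h> /* static_assert (C11) */
#include <limits.h>
#include <stdbool.h>

/* Plain char must have the same range as signed char or unsigned char. */
static_assert(CHAR_MAX == SCHAR_MAX || CHAR_MAX == UCHAR_MAX,
              "plain char must match signed char or unsigned char");

/* CHAR_MIN is 0 when plain char is unsigned, and negative when signed. */
bool char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

<p>On my Raspberry Pi I would expect this to return false, and true on x86-64 Linux. GCC and Clang also accept <code class="language-plaintext highlighter-rouge">-fsigned-char</code> and <code class="language-plaintext highlighter-rouge">-funsigned-char</code> to override the default, so the result depends on the build flags as well as the platform.</p>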

<p>Let’s examine under the hood (using Godbolt GCC 14.2 with no optimization enabled):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">i</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="kt">signed</span> <span class="kt">char</span> <span class="n">j</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
</code></pre></div></div>
<p>The corresponding assembly is:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">; unsigned char i = -1;</span>
<span class="nf">mov</span>	<span class="nv">w0</span><span class="p">,</span> <span class="err">#</span><span class="mh">0xffffffff</span>            	<span class="o">//</span> <span class="err">#</span><span class="o">-</span><span class="mi">1</span>
<span class="nf">strb</span>	<span class="nv">w0</span><span class="p">,</span> <span class="p">[</span><span class="nb">sp</span><span class="p">,</span> <span class="err">#</span><span class="mi">15</span><span class="p">]</span>

<span class="c1">; signed char j = -1</span>
<span class="nf">mov</span>	<span class="nv">w0</span><span class="p">,</span> <span class="err">#</span><span class="mh">0xffffffff</span>            	<span class="o">//</span> <span class="err">#</span><span class="o">-</span><span class="mi">1</span>
<span class="nf">strb</span>	<span class="nv">w0</span><span class="p">,</span> <span class="p">[</span><span class="nb">sp</span><span class="p">,</span> <span class="err">#</span><span class="mi">14</span><span class="p">]</span>
</code></pre></div></div>

<p>As you can observe, both the signed and the unsigned char result in the same set of instructions to store the value. The difference lies in how the compiler treats each
variable afterwards, such as utilizing the signed or unsigned variants of instructions.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">i</span><span class="o">++</span><span class="p">;</span>
<span class="n">j</span><span class="o">++</span><span class="p">;</span>
</code></pre></div></div>

<p>The corresponding assembly is:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">; i++</span>
<span class="nf">ldrb</span>	<span class="nv">w0</span><span class="p">,</span> <span class="p">[</span><span class="nb">sp</span><span class="p">,</span> <span class="err">#</span><span class="mi">15</span><span class="p">]</span> <span class="c1">; w0 = -1 = 255 (treated as unsigned)</span>
<span class="nf">add</span>	<span class="nv">w0</span><span class="p">,</span> <span class="nv">w0</span><span class="p">,</span> <span class="err">#</span><span class="mh">0x1</span>      <span class="c1">; let's ignore the fact it'll overflow</span>
<span class="nf">strb</span>	<span class="nv">w0</span><span class="p">,</span> <span class="p">[</span><span class="nb">sp</span><span class="p">,</span> <span class="err">#</span><span class="mi">15</span><span class="p">]</span>

<span class="c1">;j++</span>
<span class="nf">ldrsb</span>	<span class="nv">w0</span><span class="p">,</span> <span class="p">[</span><span class="nb">sp</span><span class="p">,</span> <span class="err">#</span><span class="mi">14</span><span class="p">]</span> <span class="c1">; w0 = -1 (treated as signed)</span>
<span class="nf">and</span>	<span class="nv">w0</span><span class="p">,</span> <span class="nv">w0</span><span class="p">,</span> <span class="err">#</span><span class="mh">0xff</span>
<span class="nf">add</span>	<span class="nv">w0</span><span class="p">,</span> <span class="nv">w0</span><span class="p">,</span> <span class="err">#</span><span class="mh">0x1</span>
<span class="nf">and</span>	<span class="nv">w0</span><span class="p">,</span> <span class="nv">w0</span><span class="p">,</span> <span class="err">#</span><span class="mh">0xff</span>
<span class="nf">strb</span>	<span class="nv">w0</span><span class="p">,</span> <span class="p">[</span><span class="nb">sp</span><span class="p">,</span> <span class="err">#</span><span class="mi">14</span><span class="p">]</span>
</code></pre></div></div>

<p>As we can see, the variable <code class="language-plaintext highlighter-rouge">signed j</code> utilizes <code class="language-plaintext highlighter-rouge">ldrsb</code> instead of <code class="language-plaintext highlighter-rouge">ldrb</code> to load a <strong>signed</strong> byte and generates significantly more instructions than incrementing
the <code class="language-plaintext highlighter-rouge">unsigned i</code>.</p>

<p>Let’s focus our attention on <code class="language-plaintext highlighter-rouge">ldr<b>s</b>b</code>, which loads the value (a byte) at sp + 14, which corresponds to
the value of <code class="language-plaintext highlighter-rouge">j</code>. <code class="language-plaintext highlighter-rouge">0xFF</code> is 255 if we treat it as unsigned, but we must be able to distinguish between the number <code class="language-plaintext highlighter-rouge">255</code> and <code class="language-plaintext highlighter-rouge">-1</code>.
Recall that <code class="language-plaintext highlighter-rouge">w0</code> is a 32-bit register but we are only loading a single byte, which is 8 bits long.
This is where sign extension comes into the story.</p>

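<p>After <code class="language-plaintext highlighter-rouge">ldrb</code> loads the byte <code class="language-plaintext highlighter-rouge">0xff</code>, <code class="language-plaintext highlighter-rouge">w0</code> will look like the following:</p>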

<table class="bit-table">
<tbody>
    <tr>
        <th>31</th><th>30</th><th>29</th><th>28</th>
        <th>27</th><th>26</th><th>25</th><th>24</th>
        <th>23</th><th>22</th><th>21</th><th>20</th>
        <th>19</th><th>18</th><th>17</th><th>16</th>
        <th>15</th><th>14</th><th>13</th><th>12</th>
        <th>11</th><th>10</th><th>9</th><th>8</th>
        <th>7</th><th>6</th><th>5</th><th>4</th>
        <th>3</th><th>2</th><th>1</th><th>0</th></tr>
    <tr class="bits-row">
        <td>0</td><td class="">0</td><td class="">0</td><td class="">0</td><td>0</td>
        <td class="">0</td><td class="">0</td><td class="">0</td><td class=" left-border">0</td><td class="">0</td><td class="">0</td><td class="">0</td><td class=" left-border">0</td><td class="">0</td><td class="">0</td><td class="">0</td><td class=" left-border">0</td><td class="">0</td><td class="">0</td><td class="">0</td><td class=" left-border">0</td><td class="">0</td><td class="">0</td><td class="">0</td><td class="highlight left-border">1</td><td class="highlight">1</td><td class="highlight">1</td><td class="highlight">1</td><td class="highlight left-border">1</td><td class="highlight">1</td><td class="highlight">1</td><td class="highlight right-border">1</td></tr>
    <tr class="hex-row"><td colspan="4">0</td><td colspan="4">0</td><td colspan="4">0</td><td colspan="4">0</td><td colspan="4">0</td><td colspan="4">0</td><td colspan="4">F</td><td colspan="4">F</td>
    </tr>
</tbody>
</table>

<p>Notice how bits 8-31 are set to 0. This is what we call <strong>zero extension</strong>, whereby the byte value is extended with 0s to obtain a 32-bit word.
Meanwhile, <code class="language-plaintext highlighter-rouge">ldrsb</code> loads the byte and then performs a <code class="language-plaintext highlighter-rouge">sign extend to 32 bits</code> by copying the sign bit (bit 7) into the upper bits 8-31. Since bit 7 is 1 here, bits 8-31 are all set to 1:</p>

<table class="bit-table">
<tbody>
    <tr>
        <th>31</th><th>30</th><th>29</th><th>28</th>
        <th>27</th><th>26</th><th>25</th><th>24</th>
        <th>23</th><th>22</th><th>21</th><th>20</th>
        <th>19</th><th>18</th><th>17</th><th>16</th>
        <th>15</th><th>14</th><th>13</th><th>12</th>
        <th>11</th><th>10</th><th>9</th><th>8</th>
        <th>7</th><th>6</th><th>5</th><th>4</th>
        <th>3</th><th>2</th><th>1</th><th>0</th></tr>
    <tr class="bits-row">
        <td class="highlight2">1</td><td class="highlight2">1</td><td class="highlight2">1</td><td class="highlight2">1</td>
        <td class="highlight2">1</td><td class="highlight2">1</td><td class="highlight2">1</td><td class="highlight2">1</td>
        <td class="highlight2 left-border">1</td><td class="highlight2">1</td><td class="highlight2">1</td><td class="highlight2">1</td>
        <td class="highlight2 left-border">1</td><td class="highlight2">1</td><td class="highlight2">1</td><td class="highlight2">1</td>
        <td class="highlight2 left-border">1</td><td class="highlight2">1</td><td class="highlight2">1</td><td class="highlight2">1</td>
        <td class="highlight2 left-border">1</td><td class="highlight2">1</td><td class="highlight2">1</td><td class="highlight2">1</td>
        <td class="highlight left-border">1</td><td class="highlight">1</td><td class="highlight">1</td><td class="highlight">1</td>
        <td class="highlight left-border">1</td><td class="highlight">1</td><td class="highlight">1</td><td class="highlight right-border">1</td></tr>
    <tr class="hex-row"><td colspan="4" class="highlight2">F</td><td colspan="4" class="highlight2">F</td><td colspan="4" class="highlight2">F</td><td colspan="4" class="highlight2">F</td><td colspan="4" class="highlight2">F</td><td colspan="4" class="highlight2">F</td><td colspan="4">F</td><td colspan="4">F</td>
    </tr>
</tbody>
</table>
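<p>The two tables above can be reproduced in C. The sketch below models what I understand <code class="language-plaintext highlighter-rouge">ldrb</code> and <code class="language-plaintext highlighter-rouge">ldrsb</code> to do conceptually; the function names are hypothetical and this is not how the hardware is implemented.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Conceptually what ldrb does: the upper 24 bits become 0. */
uint32_t zero_extend8(uint8_t byte)
{
    return byte;
}

/* Conceptually what ldrsb does: the upper 24 bits become copies of bit 7. */
uint32_t sign_extend8(uint8_t byte)
{
    uint32_t wide = byte;
    if (byte & 0x80u) {
        wide |= 0xFFFFFF00u; /* sign bit set: fill bits 8-31 with 1s */
    }
    assert((wide & 0xFFu) == byte); /* the low byte is never changed */
    return wide;
}
```

<p>With <code class="language-plaintext highlighter-rouge">0xFF</code> as input, <code class="language-plaintext highlighter-rouge">zero_extend8()</code> yields <code class="language-plaintext highlighter-rouge">0x000000FF</code> (255) while <code class="language-plaintext highlighter-rouge">sign_extend8()</code> yields <code class="language-plaintext highlighter-rouge">0xFFFFFFFF</code>, the 32-bit two’s complement pattern for -1.</p>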

<p>After loading the byte (as signed) into <code class="language-plaintext highlighter-rouge">w0</code>, the <strong>signed</strong> increment contains two extra <code class="language-plaintext highlighter-rouge">and</code> instructions that the <strong>unsigned</strong> increment does not have:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">and</span>	<span class="nv">w0</span><span class="p">,</span> <span class="nv">w0</span><span class="p">,</span> <span class="err">#</span><span class="mh">0xff</span>
<span class="nf">add</span>	<span class="nv">w0</span><span class="p">,</span> <span class="nv">w0</span><span class="p">,</span> <span class="err">#</span><span class="mh">0x1</span>
<span class="nf">and</span>	<span class="nv">w0</span><span class="p">,</span> <span class="nv">w0</span><span class="p">,</span> <span class="err">#</span><span class="mh">0xff</span>
</code></pre></div></div>

<p>Why these <code class="language-plaintext highlighter-rouge">and</code> instructions are necessary is still not clear to me. They “truncate” <code class="language-plaintext highlighter-rouge">w0</code> such that
all bits above the lowest 8 are set to 0 (if I understood this correctly). I know we are only interested in adding onto a single byte,
but I was under the impression that truncation wouldn’t be necessary as we are using <code class="language-plaintext highlighter-rouge">strb</code> to store only the lowest byte back to memory.
Of course, I expect these <code class="language-plaintext highlighter-rouge">and</code> instructions to disappear once optimization is enabled.
As this is a simple program, I do not think it’s worth the effort to look into this in further detail.</p>

<h2 id="unsigned-char-in-other-architectures">Unsigned Char in Other Architectures</h2>

<p>ARM is not the only architecture that treats <code class="language-plaintext highlighter-rouge">char</code> as unsigned. <a href="https://trofi.github.io/posts/203-signed-char-or-unsigned-char.html">trofi</a> also wrote a nice
overview of the signedness of <code class="language-plaintext highlighter-rouge">char</code> on other architectures after encountering a bug in <a href="https://bugs.gentoo.org/630698">SQLite</a> whereby SQLite would hang
(i.e. be stuck in an infinite loop) on the PowerPC architecture. After looking at various architectures, he concluded that ARM, PowerPC, and s390 have unsigned char.</p>

<h2 id="signedness-based-on-os">Signedness based on OS</h2>

<p>The size and range of each data type are not solely based on the architecture, as different operating systems can impose their own conventions as well. On the same architecture,
the size of <code class="language-plaintext highlighter-rouge">long</code> differs between 64-bit Linux and 64-bit Windows (i.e. LP64 vs. LLP64).</p>

<blockquote>
  <p>So amongst common 64 bit OSes, there are two different implementations of the sizes of int, long and long long. UNIX-based systems tend to use length of 4/8/8 (in bytes, as returned by sizeof()), whereas Windows uses 4/4/8. In a different terminology, 4/8/8 is called LP64 (long and pointers 64 bit) and 4/4/8 is LLP64 (long long and pointers 64 bit).</p>

  <p><a href="Portable C and long">Portable C and long</a></p>
</blockquote>

<p><img src="../assets/programming/builds/windows-linux-longint.gif" alt="A gif showing the difference between size of long on Windows and Linux on the same architecture" /></p>
<p class="caption">The differences between the size of <code class="language-plaintext highlighter-rouge">long int</code> on Linux and Windows</p>
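
<p>The 4/8/8 and 4/4/8 description quoted above can be turned into a small classifier. This is only a sketch: the function name <code class="language-plaintext highlighter-rouge">data_model</code> and its parameters are hypothetical, and it only knows about the two models mentioned here.</p>

```c
#include <stddef.h>

/* Hypothetical helper: names the data model from the sizes (in bytes)
 * of int, long, long long and a pointer. */
const char *data_model(size_t int_sz, size_t long_sz,
                       size_t llong_sz, size_t ptr_sz) {
    if (int_sz == 4 && long_sz == 8 && llong_sz == 8 && ptr_sz == 8)
        return "LP64";   /* long and pointers are 64-bit */
    if (int_sz == 4 && long_sz == 4 && llong_sz == 8 && ptr_sz == 8)
        return "LLP64";  /* only long long and pointers are 64-bit */
    return "other";      /* anything this sketch does not recognise */
}
```

<p>Calling it as <code class="language-plaintext highlighter-rouge">data_model(sizeof(int), sizeof(long), sizeof(long long), sizeof(void *))</code> reports the model of whichever platform the program was compiled for.</p>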

<p>I do not have a Windows machine running on an ARM processor to check the signedness of <code class="language-plaintext highlighter-rouge">char</code> there, but as for macOS, I did manage to ask a random stranger to confirm 
the signedness. Interestingly, macOS running on Apple’s ARM chips such as the M3 treats <code class="language-plaintext highlighter-rouge">char</code> as <code class="language-plaintext highlighter-rouge">signed</code>. 
In QNX on ARM, <code class="language-plaintext highlighter-rouge">char</code> is unsigned as I expected, so it’s just macOS being weird. I wonder if there is a technical or historical reason for this. Perhaps this was due 
to the desire to ease porting x86 code to ARM by avoiding portability differences between the two architectures, but that’s just speculation on my part.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Therefore, to make your code portable, explicitly state whether your <code class="language-plaintext highlighter-rouge">char</code> is signed or unsigned instead of making assumptions whenever
the values stored in that <code class="language-plaintext highlighter-rouge">char</code> may lie outside of 0 to 127. The C standard guarantees that its size is 1 byte but leaves its signedness up to the implementation.</p>
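
<p>As a small illustration of that advice, here is a sketch of a check that behaves the same whether plain <code class="language-plaintext highlighter-rouge">char</code> is signed or unsigned. The function name <code class="language-plaintext highlighter-rouge">is_non_ascii</code> is made up for this example:</p>

```c
/* Returns 1 when the byte lies outside the 7-bit range 0 to 127.
 *
 * Writing `c > 127` directly would be a portability trap: where plain
 * char is signed it can never exceed 127, so the test would always be
 * false there. Converting to unsigned char first makes the comparison
 * independent of the signedness of plain char. */
int is_non_ascii(char c) {
    return (unsigned char)c > 127;
}
```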

<hr />

<p><strong>Resources:</strong></p>
<ul>
  <li>Linux Kernel Development by Robert Love</li>
  <li><a href="http://computerscience.chemeketa.edu/armTutorial/Memory/LoadStoreBytes.html">http://computerscience.chemeketa.edu/armTutorial/Memory/LoadStoreBytes.html</a></li>
  <li><a href="https://developer.arm.com/documentation/102374/0102/Loads-and-stores---zero-and-sign-extension">https://developer.arm.com/documentation/102374/0102/Loads-and-stores---zero-and-sign-extension</a></li>
  <li><a href="https://learn.microsoft.com/en-us/cpp/cpp/data-type-ranges?view=msvc-170">https://learn.microsoft.com/en-us/cpp/cpp/data-type-ranges?view=msvc-170</a></li>
  <li><a href="https://abstractexpr.com/2023/04/30/the-anomaly-of-the-char-type-in-c/">https://abstractexpr.com/2023/04/30/the-anomaly-of-the-char-type-in-c/</a></li>
</ul>]]></content><author><name>Ju Hong Kim</name></author><category term="programming" /><category term="C/C++" /><category term="arm" /><summary type="html"><![CDATA[A look into the signedness of char on ARM architecture]]></summary></entry><entry><title type="html">Utilizing Aliases and Interactive Mode to Force Users to Think Twice Before Deleting Files</title><link href="https://zakuarbor.codeberg.page/blog/alias-interactive/" rel="alternate" type="text/html" title="Utilizing Aliases and Interactive Mode to Force Users to Think Twice Before Deleting Files" /><published>2024-12-29T00:00:00-05:00</published><updated>2024-12-29T00:00:00-05:00</updated><id>https://zakuarbor.codeberg.page/blog/alias-interactive</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/alias-interactive/"><![CDATA[<p>I previously mentioned in my microblog that <a href="https://randombits.neocities.org/micro/2024/12/jekyll-cache">I lost my file</a> by accidentally overwriting my file using the <code class="language-plaintext highlighter-rouge">cp</code> command. This got me thinking as to why this would be impossible on 
my work laptop since I would be constantly bombarded with a prompt to confirm my intention to overwrite the file.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cp 2024-12-01-template.md 2024-12-30-alias-interactive.md
cp: overwrite '2024-12-30-alias-interactive.md'?
</code></pre></div></div>

<p>Commands like <code class="language-plaintext highlighter-rouge">mv</code> and <code class="language-plaintext highlighter-rouge">cp</code> have an <strong>interactive</strong> flag <code class="language-plaintext highlighter-rouge">-i</code> to prompt before overwriting a file, as seen in <code class="language-plaintext highlighter-rouge">man 1 cp</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-i, --interactive
              prompt before overwrite (overrides a previous -n option)
</code></pre></div></div>

<p>To force everyone at work to have this flag enabled, <code class="language-plaintext highlighter-rouge">cp</code> and <code class="language-plaintext highlighter-rouge">mv</code> are aliased in our default shell configs:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">alias cp</span><span class="o">=</span><span class="s2">"cp -i"</span>
<span class="nb">alias mv</span><span class="o">=</span><span class="s2">"mv -i"</span>
</code></pre></div></div>

<p>You can verify this using the <code class="language-plaintext highlighter-rouge">type</code> command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ type cp
cp is aliased to `cp -i'
$ type mv
mv is aliased to `mv -i'
</code></pre></div></div>]]></content><author><name>Ju Hong Kim</name></author><category term="linux" /><summary type="html"><![CDATA[Using interactive mode and alias to force users to think twice before overwriting files]]></summary></entry><entry><title type="html">Stack Overflow: The Case of a Small Stack</title><link href="https://zakuarbor.codeberg.page/blog/small-stack/" rel="alternate" type="text/html" title="Stack Overflow: The Case of a Small Stack" /><published>2024-12-29T00:00:00-05:00</published><updated>2024-12-29T00:00:00-05:00</updated><id>https://zakuarbor.codeberg.page/blog/small-stack</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/small-stack/"><![CDATA[<p>Years ago I was once asked by an intern to debug a mysterious crash that seemed so innocent. While I no longer recall what the code was about, we stripped the program to a single line in 
<code class="language-plaintext highlighter-rouge">main</code>. Yet the program still continued to crash.</p>

<p><strong>Source:</strong></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Result:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ./prog-arm64 

Process 630803 (prog-arm64) terminated SIGSEGV code=1 fltno=11 ip=00000025333267f0 mapaddr=00000000000007f0 ref=000000443dd5dc50
Memory fault (core dumped) 
</code></pre></div></div>

<p>This bewildered all of the interns as it made absolutely no sense. Through our investigation, there were two things we noticed:</p>
<ol>
  <li>The program worked on our local machines but not on our target virtual machine</li>
  <li>We were allocating an extremely large buffer in the stack which was unusual</li>
</ol>

<p>It turns out the intern wanted to allocate a 1 MiB buffer for some networking or driver related ticket, not the 1 GiB that the code above actually requests. If I recall correctly, our target 
only had 512 MB of RAM, so this could have explained the mysterious crash. But even a 1 MiB buffer on the stack was too large for our target:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Result:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ./prog-arm64 

Process 696339 (prog-arm64) terminated SIGSEGV code=1 fltno=11 ip=0000004de7e7a7ec mapaddr=00000000000007ec ref=000000383b19fbe0
Memory fault (core dumped) 
</code></pre></div></div>

<p>One thing I purposely omitted was that our target was running QNX, a real-time operating system. If we take a look at the documentation:</p>
<blockquote>
  <p>A process’s main thread starts with an automatically allocated 512 KB stack
– <a href="https://www.qnx.com/developers/docs/8.0/com.qnx.doc.neutrino.prog/topic/process_stack.html">QNX SDP 8.0 - Stack Allocation</a></p>
</blockquote>

<p>This shocked all of us since 1 MiB is not a large buffer in 2021, when we all had plenty of memory on our personal systems at home.</p>
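
<p>One common way to sidestep a small stack (not necessarily what we ended up doing at the time) is to place large buffers on the heap instead. A minimal sketch, where the helper name <code class="language-plaintext highlighter-rouge">make_buffer</code> is made up for illustration:</p>

```c
#include <stdlib.h>

/* Hypothetical helper: allocates a zero-filled buffer of `size` bytes on
 * the heap rather than on the stack. Returns NULL if the allocation
 * fails; the caller is responsible for calling free() on the result. */
unsigned char *make_buffer(size_t size) {
    return calloc(size, 1);
}
```

<p>A call such as <code class="language-plaintext highlighter-rouge">make_buffer(1024 * 1024)</code> then only needs a pointer-sized slot on the stack, and the result should be checked against <code class="language-plaintext highlighter-rouge">NULL</code> before use.</p>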

<p><strong>Note 1:</strong> The target used in the example was an aarch64le. The crash can also be reproduced on amd64 (x86_64), but it requires you to add something such as a print statement.</p>

<p><strong>Note 2:</strong> QNX 8.0 was released to the general public in late 2023 or early 2024 so the actual target at the time when the question was asked was running either QNX 7.0 or QNX 7.1 (I do not recall which version)</p>

<h2 id="investigating-why-amd64-x86_64-seems-unaffected">Investigating why AMD64 (x86_64) seems unaffected</h2>

<p><strong>Note:</strong> Nothing below is shocking or interesting. I just felt like keeping it here.</p>

<p>As noted, triggering the crash on AMD64 (x86_64) requires more fiddling, which came as a surprise to me. From my understanding of the documentation, the stack size should still be 512 KB.
Suspecting there could be some optimization going on, I fiddled around with the compiler settings and added some code to see if I could trigger the crash. It turns out that if I make 
a call to <code class="language-plaintext highlighter-rouge">printf</code>, the program does indeed crash as desired.</p>

<p><strong>Source Code:</strong></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
  <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">1024</span><span class="o">*</span><span class="mi">1024</span><span class="p">];</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"Hello World</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Result:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ./prog-amd64  

Process 2977812 (prog-amd64) terminated SIGSEGV code=1 fltno=11 ip=0000002c51b107f6 mapaddr=00000000000007f6 ref=0000003f4ece4b58
Memory fault (core dumped) 
</code></pre></div></div>

<p>To test my hypothesis that there was optimization under the hood, I generated the assembly (i.e. pass <code class="language-plaintext highlighter-rouge">-S</code> to <code class="language-plaintext highlighter-rouge">qcc</code>):</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">main:</span>
<span class="nl">.LFB0:</span>
        <span class="nf">.file</span> <span class="mi">1</span> <span class="s">"prog.c"</span>
        <span class="nf">.loc</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">12</span>
        <span class="nf">.cfi_startproc</span>
        <span class="nf">pushq</span>   <span class="o">%</span><span class="nb">rbp</span>
        <span class="nf">.cfi_def_cfa_offset</span> <span class="mi">16</span>
        <span class="nf">.cfi_offset</span> <span class="mi">6</span><span class="p">,</span> <span class="o">-</span><span class="mi">16</span>
        <span class="nf">movq</span>    <span class="o">%</span><span class="nb">rsp</span><span class="p">,</span> <span class="o">%</span><span class="nb">rbp</span>
        <span class="nf">.cfi_def_cfa_register</span> <span class="mi">6</span>
        <span class="nf">subq</span>    <span class="kc">$</span><span class="mi">1048592</span><span class="p">,</span> <span class="o">%</span><span class="nb">rsp</span>
</code></pre></div></div>

<p>Much to my disappointment, my hypothesis was incorrect. We can see that the stack pointer does indeed move by at least 1 MiB (1024 x 1024 = 1048576 bytes). 
As this assembly listing still had to go through the assembler and linker to become an executable, I then proceeded to run the 
program under the debugger in hopes that I could save my hypothesis (spoiler: my initial hypothesis is false).</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">gdb</span><span class="p">)</span> <span class="nb">di</span><span class="nv">sassemble</span>
<span class="nf">Dump</span> <span class="nv">of</span> <span class="nv">assembler</span> <span class="nv">code</span> <span class="nv">for</span> <span class="nv">function</span> <span class="nv">main</span><span class="p">:</span>
   <span class="err">0</span><span class="nf">x0000000008048791</span> <span class="o">&lt;+</span><span class="mi">0</span><span class="o">&gt;</span><span class="p">:</span>     <span class="nv">push</span>   <span class="o">%</span><span class="nb">rbp</span>
   <span class="err">0</span><span class="nf">x0000000008048792</span> <span class="o">&lt;+</span><span class="mi">1</span><span class="o">&gt;</span><span class="p">:</span>     <span class="nv">mov</span>    <span class="o">%</span><span class="nb">rsp</span><span class="p">,</span><span class="o">%</span><span class="nb">rbp</span>
   <span class="err">0</span><span class="nf">x0000000008048795</span> <span class="o">&lt;+</span><span class="mi">4</span><span class="o">&gt;</span><span class="p">:</span>     <span class="nv">sub</span>    <span class="kc">$</span><span class="mh">0x100010</span><span class="p">,</span><span class="o">%</span><span class="nb">rsp</span>
   <span class="err">0</span><span class="nf">x000000000804879c</span> <span class="o">&lt;+</span><span class="mi">11</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">mov</span>    <span class="mh">0x182d</span><span class="p">(</span><span class="o">%</span><span class="nv">rip</span><span class="p">),</span><span class="o">%</span><span class="nb">rax</span>        <span class="err">#</span> <span class="mh">0x8049fd0</span>
   <span class="err">0</span><span class="nf">x00000000080487a3</span> <span class="o">&lt;+</span><span class="mi">18</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">mov</span>    <span class="p">(</span><span class="o">%</span><span class="nb">rax</span><span class="p">),</span><span class="o">%</span><span class="nb">rcx</span>
   <span class="err">0</span><span class="nf">x00000000080487a6</span> <span class="o">&lt;+</span><span class="mi">21</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">mov</span>    <span class="o">%</span><span class="nb">rcx</span><span class="p">,</span><span class="o">-</span><span class="mh">0x8</span><span class="p">(</span><span class="o">%</span><span class="nb">rbp</span><span class="p">)</span>
   <span class="err">0</span><span class="nf">x00000000080487aa</span> <span class="o">&lt;+</span><span class="mi">25</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">xor</span>    <span class="o">%</span><span class="nb">ecx</span><span class="p">,</span><span class="o">%</span><span class="nb">ecx</span>
   <span class="err">0</span><span class="nf">x00000000080487ac</span> <span class="o">&lt;+</span><span class="mi">27</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">mov</span>    <span class="kc">$</span><span class="mh">0x0</span><span class="p">,</span><span class="o">%</span><span class="nb">eax</span>
   <span class="err">0</span><span class="nf">x00000000080487b1</span> <span class="o">&lt;+</span><span class="mi">32</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">mov</span>    <span class="o">%</span><span class="nb">eax</span><span class="p">,</span><span class="o">%</span><span class="nb">edx</span>
   <span class="err">0</span><span class="nf">x00000000080487b3</span> <span class="o">&lt;+</span><span class="mi">34</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">mov</span>    <span class="mh">0x1816</span><span class="p">(</span><span class="o">%</span><span class="nv">rip</span><span class="p">),</span><span class="o">%</span><span class="nb">rax</span>        <span class="err">#</span> <span class="mh">0x8049fd0</span>
   <span class="err">0</span><span class="nf">x00000000080487ba</span> <span class="o">&lt;+</span><span class="mi">41</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">mov</span>    <span class="o">-</span><span class="mh">0x8</span><span class="p">(</span><span class="o">%</span><span class="nb">rbp</span><span class="p">),</span><span class="o">%</span><span class="nb">rsi</span>
   <span class="err">0</span><span class="nf">x00000000080487be</span> <span class="o">&lt;+</span><span class="mi">45</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">sub</span>    <span class="p">(</span><span class="o">%</span><span class="nb">rax</span><span class="p">),</span><span class="o">%</span><span class="nb">rsi</span>
   <span class="err">0</span><span class="nf">x00000000080487c1</span> <span class="o">&lt;+</span><span class="mi">48</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">je</span>     <span class="mh">0x80487c8</span> <span class="o">&lt;</span><span class="nv">main</span><span class="o">+</span><span class="mi">55</span><span class="o">&gt;</span>
   <span class="err">0</span><span class="nf">x00000000080487c3</span> <span class="o">&lt;+</span><span class="mi">50</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">call</span>   <span class="mh">0x8048620</span> <span class="o">&lt;</span><span class="nv">__stack_chk_fail@plt</span><span class="o">&gt;</span>
<span class="err">=</span><span class="o">&gt;</span> <span class="err">0</span><span class="nf">x00000000080487c8</span> <span class="o">&lt;+</span><span class="mi">55</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">mov</span>    <span class="o">%</span><span class="nb">edx</span><span class="p">,</span><span class="o">%</span><span class="nb">eax</span>
</code></pre></div></div>
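
<p>As a sanity check on the operand of the <code class="language-plaintext highlighter-rouge">sub</code> instruction: <code class="language-plaintext highlighter-rouge">0x100010</code> is 1048592, which is the 1 MiB buffer plus 16 bytes. The sketch below reproduces that number under the assumption that the extra bytes are an 8-byte canary slot rounded up to a 16-byte boundary. That layout is an assumption on my part, and the function names are made up:</p>

```c
#include <stddef.h>

/* Rounds n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Assumed layout: the buffer plus an 8-byte canary slot, rounded up to
 * a multiple of 16 bytes. */
size_t frame_size(size_t buf_size) {
    return align_up(buf_size + 8, 16);
}

/* Returns 1 if a frame of that size cannot fit in a stack of
 * stack_size bytes, 0 otherwise. */
int exceeds_stack(size_t frame, size_t stack_size) {
    return frame > stack_size;
}
```

<p>Under these assumptions <code class="language-plaintext highlighter-rouge">frame_size(1024 * 1024)</code> gives 1048592, roughly twice the documented 512 KB stack.</p>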

<p>As we can see from the disassembly above, the stack pointer does move by at least 1 MiB, so the theory of optimization is definitely ruled out.
Going through the program in the debugger using <code class="language-plaintext highlighter-rouge">stepi</code>, I noticed the following:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   <span class="err">0</span><span class="nf">x00000000080487be</span> <span class="o">&lt;+</span><span class="mi">45</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">sub</span>    <span class="p">(</span><span class="o">%</span><span class="nb">rax</span><span class="p">),</span><span class="o">%</span><span class="nb">rsi</span>
   <span class="err">0</span><span class="nf">x00000000080487c1</span> <span class="o">&lt;+</span><span class="mi">48</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">je</span>     <span class="mh">0x80487c8</span> <span class="o">&lt;</span><span class="nv">main</span><span class="o">+</span><span class="mi">55</span><span class="o">&gt;</span>
   <span class="err">0</span><span class="nf">x00000000080487c3</span> <span class="o">&lt;+</span><span class="mi">50</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">call</span>   <span class="mh">0x8048620</span> <span class="o">&lt;</span><span class="nv">__stack_chk_fail@plt</span><span class="o">&gt;</span>
<span class="err">=</span><span class="o">&gt;</span> <span class="err">0</span><span class="nf">x00000000080487c8</span> <span class="o">&lt;+</span><span class="mi">55</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nv">mov</span>    <span class="o">%</span><span class="nb">edx</span><span class="p">,</span><span class="o">%</span><span class="nb">eax</span>
</code></pre></div></div>

<p>The instruction pointer skipped the call to <code class="language-plaintext highlighter-rouge">&lt;__stack_chk_fail@plt&gt;</code>, the failure handler of the stack guard that is added to mitigate stack buffer overflows (whether intentional or not).
Essentially, a stack guard inserts a small value known as the canary between the stack variables and the return address. If the return address were overwritten by an overflow, then the 
canary value would be overwritten as well. Checking whether the canary has been overwritten can be done in either of two ways:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">(canary - original_canary) != 0</code></li>
  <li><code class="language-plaintext highlighter-rouge">(canary ^ original_canary) != 0</code></li>
</ol>

<p>If either of the two evaluates to true, then the program calls the fail function to terminate the program.
In our program, it would seem that the canary was not overwritten. Note that <code class="language-plaintext highlighter-rouge">0x8049fd0</code> does not appear to be the canary itself but rather the address from which the location of the original canary is loaded into register <code class="language-plaintext highlighter-rouge">rax</code>.
I will now attempt to walk through with you what exactly is going on with my limited knowledge in Assembly (I’m going to use the excuse that I am a Mathematics student to excuse 
my lack of assembly knowledge :D):</p>

<p>For simplicity, I am going to modify the assembly above to use friendlier notation when making references to addresses and write some pseudocode in C syntax (I’ll be 
omitting some details so it’s not a one-to-one replication). In the instructions from <code class="language-plaintext highlighter-rouge">&lt;+11&gt;</code> to <code class="language-plaintext highlighter-rouge">&lt;+21&gt;</code>, 
we are storing the canary value 8 bytes below the base pointer:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&lt;+</span><span class="err">11</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nf">mov</span>    <span class="mh">0x182d</span><span class="p">(</span><span class="o">%</span><span class="nv">rip</span><span class="p">),</span><span class="o">%</span><span class="nb">rax</span>        <span class="err">#</span> <span class="mh">0x8049fd0</span>
<span class="o">&lt;+</span><span class="err">18</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nf">mov</span>    <span class="p">(</span><span class="o">%</span><span class="nb">rax</span><span class="p">),</span><span class="o">%</span><span class="nb">rcx</span>
<span class="o">&lt;+</span><span class="err">21</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nf">mov</span>    <span class="o">%</span><span class="nb">rcx</span><span class="p">,</span><span class="o">-</span><span class="mh">0x8</span><span class="p">(</span><span class="o">%</span><span class="nb">rbp</span><span class="p">)</span>
</code></pre></div></div>

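<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rax</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="mh">0x8049fd0</span><span class="p">);</span> <span class="c1">//load the address of the original canary into rax</span>
<span class="n">rcx</span> <span class="o">=</span> <span class="o">*</span><span class="n">rax</span><span class="p">;</span> <span class="c1">//load the original canary value into rcx</span>
<span class="o">*</span><span class="p">(</span><span class="n">rbp</span><span class="o">-</span><span class="mi">8</span><span class="p">)</span> <span class="o">=</span> <span class="n">rcx</span><span class="p">;</span> <span class="c1">//store a copy of the canary 8 bytes below the base pointer</span>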
</code></pre></div></div>

<p>This value is then compared with the original canary, whose address is again loaded into the <code class="language-plaintext highlighter-rouge">rax</code> register by the instruction at <code class="language-plaintext highlighter-rouge">&lt;+34&gt;</code>. The generated assembly code utilises the 
1st method to check whether a canary value has been overwritten, by subtracting the two canary values:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&lt;+</span><span class="err">34</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nf">mov</span>    <span class="mh">0x1816</span><span class="p">(</span><span class="o">%</span><span class="nv">rip</span><span class="p">),</span><span class="o">%</span><span class="nb">rax</span>        <span class="err">#</span> <span class="mh">0x8049fd0</span>
<span class="o">&lt;+</span><span class="err">41</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nf">mov</span>    <span class="o">-</span><span class="mh">0x8</span><span class="p">(</span><span class="o">%</span><span class="nb">rbp</span><span class="p">),</span><span class="o">%</span><span class="nb">rsi</span>
<span class="o">&lt;+</span><span class="err">45</span><span class="o">&gt;</span><span class="p">:</span>    <span class="nf">sub</span>    <span class="p">(</span><span class="o">%</span><span class="nb">rax</span><span class="p">),</span><span class="o">%</span><span class="nb">rsi</span>
</code></pre></div></div>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rax</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="mh">0x8049fd0</span><span class="p">);</span> <span class="c1">//load the address of the original canary into rax (the original value will ideally be not modified)</span>
<span class="n">rsi</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="n">rbp</span><span class="o">-</span><span class="mi">8</span><span class="p">);</span> <span class="c1">//load our canary value into register rsi (this value could be modified if we have a buffer overflow)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">rsi</span> <span class="o">-</span> <span class="o">*</span><span class="n">rax</span><span class="p">;</span> <span class="c1">//zero if the canary is intact</span>
</code></pre></div></div>

<p>As the canary value was not modified, the result is <code class="language-plaintext highlighter-rouge">0</code>. The <code class="language-plaintext highlighter-rouge">je</code> at <code class="language-plaintext highlighter-rouge">&lt;+48&gt;</code> therefore jumps over the next instruction, the call to <code class="language-plaintext highlighter-rouge">__stack_chk_fail@plt</code> at <code class="language-plaintext highlighter-rouge">&lt;+50&gt;</code>.</p>

<p><strong>Note:</strong> I did not read into the function <code class="language-plaintext highlighter-rouge">__stack_chk_fail@plt</code>, so maybe it performs more checks on the canary, given that it has <code class="language-plaintext highlighter-rouge">chk</code> in its name.</p>

<p>As our program skipped <code class="language-plaintext highlighter-rouge">__stack_chk_fail@plt</code>, the program does not crash.</p>
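
<p>The canary check walked through above can be imitated in plain C. This is only a toy model: <code class="language-plaintext highlighter-rouge">struct fake_frame</code> and the function names are made up, and a real stack guard is emitted by the compiler rather than written by hand.</p>

```c
#include <stdint.h>
#include <string.h>

/* Made-up stand-in for a stack frame: a buffer followed by a canary,
 * mimicking a canary that sits between the locals and the return address. */
struct fake_frame {
    char buf[16];
    uintptr_t canary;
};

/* Method 1: subtraction, as in the generated assembly (sub followed by je). */
int canary_changed_sub(uintptr_t canary, uintptr_t original) {
    return (canary - original) != 0;
}

/* Method 2: XOR. */
int canary_changed_xor(uintptr_t canary, uintptr_t original) {
    return (canary ^ original) != 0;
}

/* Writes n bytes of 'A' starting at the beginning of the frame and reports
 * whether the canary was clobbered. n is clamped to the size of the
 * frame so the simulation itself never writes out of bounds. */
int write_and_check(size_t n, uintptr_t original) {
    struct fake_frame frame;
    memset(&frame, 0, sizeof frame);
    frame.canary = original;
    if (n > sizeof frame)
        n = sizeof frame;
    memset(&frame, 'A', n);  /* anything past 16 bytes spills beyond buf */
    return canary_changed_sub(frame.canary, original);
}
```

<p>Writing at most 16 bytes leaves the canary intact, while writing the whole frame overwrites it, which is the situation where the real code would call <code class="language-plaintext highlighter-rouge">__stack_chk_fail</code>.</p>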

<p>Now let’s take a quick look into why adding a print statement triggers the crash:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>=&gt; 0x00000000080487f6 &lt;+37&gt;:    call   0x8048650 &lt;puts@plt&gt;
   0x00000000080487fb &lt;+42&gt;:    mov    $0x0,%eax
   0x0000000008048800 &lt;+47&gt;:    mov    %eax,%edx
   0x0000000008048802 &lt;+49&gt;:    mov    0x17c7(%rip),%rax        # 0x8049fd0
   0x0000000008048809 &lt;+56&gt;:    mov    -0x8(%rbp),%rsi
   0x000000000804880d &lt;+60&gt;:    sub    (%rax),%rsi
   0x0000000008048810 &lt;+63&gt;:    je     0x8048817 &lt;main+70&gt;
   0x0000000008048812 &lt;+65&gt;:    call   0x8048660 &lt;__stack_chk_fail@plt&gt;
   0x0000000008048817 &lt;+70&gt;:    mov    %edx,%eax
   0x0000000008048819 &lt;+72&gt;:    leave
   0x000000000804881a &lt;+73&gt;:    ret
End of assembler dump.
(gdb) stepi

Program received signal SIGSEGV, Segmentation fault.
</code></pre></div></div>

<p>Immediately we can see that the stack guard is not the reason for the crash; rather, it is the call to <code class="language-plaintext highlighter-rouge">puts@plt</code> that triggered the crash. 
Let’s compare the registers of the two programs just before the crash is triggered, where the first is from a program with a valid buffer size:</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">gdb</span><span class="p">)</span> <span class="nv">i</span> <span class="nv">r</span>
<span class="nf">...</span>
<span class="nf">rbp</span>            <span class="mh">0x81ce0</span>             <span class="mh">0x81ce0</span>
<span class="nf">rsp</span>            <span class="mh">0x818d0</span>             <span class="mh">0x818d0</span>
<span class="nf">...</span>
</code></pre></div></div>

<p>vs.</p>

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">gdb</span><span class="p">)</span> <span class="nv">info</span> <span class="nv">r</span>
<span class="nf">...</span>
<span class="nf">rbp</span>            <span class="mh">0x81ce0</span>             <span class="mh">0x81ce0</span>
<span class="nf">rsp</span>            <span class="mh">0xfffffffffff81cd0</span>  <span class="mh">0xfffffffffff81cd0</span>
<span class="nf">...</span>
</code></pre></div></div>

<p>Only the stack pointer <code class="language-plaintext highlighter-rouge">rsp</code> differs, which is to be expected. To understand the crash, we first need to recall the fact that each function call gets its own stack frame.</p>
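
<p>The odd-looking <code class="language-plaintext highlighter-rouge">rsp</code> in the second dump is what you get when <code class="language-plaintext highlighter-rouge">0x100010</code> is subtracted from <code class="language-plaintext highlighter-rouge">0x81ce0</code> and the result wraps around below zero. The sketch below checks that arithmetic, assuming <code class="language-plaintext highlighter-rouge">rsp</code> equals <code class="language-plaintext highlighter-rouge">rbp</code> right before the <code class="language-plaintext highlighter-rouge">sub</code> as in the prologue shown earlier. The function name is made up, and the frame size of <code class="language-plaintext highlighter-rouge">0x410</code> for the first program is inferred from its register dump:</p>

```c
#include <stdint.h>

/* Models `sub $frame, %rsp` on a 64-bit machine: unsigned subtraction
 * wraps around modulo 2^64 instead of producing a negative number. */
uint64_t rsp_after_sub(uint64_t rsp, uint64_t frame) {
    return rsp - frame;
}
```

<p>With these inputs, <code class="language-plaintext highlighter-rouge">rsp_after_sub(0x81ce0, 0x410)</code> gives <code class="language-plaintext highlighter-rouge">0x818d0</code> and <code class="language-plaintext highlighter-rouge">rsp_after_sub(0x81ce0, 0x100010)</code> gives <code class="language-plaintext highlighter-rouge">0xfffffffffff81cd0</code>, matching the two dumps.</p>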

<details>
<summary>Side Note: Stacks</summary>

Feel free to skip this section. This section investigates how the stack grows. All you need to understand is that the new stack frame will be located "after" the 
caller's stack frame.

Let's observe the following simple program:

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">z</span><span class="p">[</span><span class="mi">16</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="sc">'W'</span><span class="p">,</span> <span class="sc">'o'</span><span class="p">,</span> <span class="sc">'r'</span><span class="p">,</span> <span class="sc">'l'</span><span class="p">,</span> <span class="sc">'d'</span><span class="p">};</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"%d: %s %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">z</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">char</span> <span class="n">x</span><span class="p">[</span><span class="mi">32</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="sc">'H'</span><span class="p">,</span> <span class="sc">'e'</span><span class="p">,</span> <span class="sc">'l'</span><span class="p">,</span> <span class="sc">'l'</span><span class="p">,</span> <span class="sc">'o'</span><span class="p">};</span>
    <span class="kt">int</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">21</span><span class="p">;</span>
    <span class="n">foo</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<b>Note:</b> When reading the stack, recall which way the stack grows and the endianness. In our case, the stack grows downwards, starting from a higher address and 
growing towards the lower addresses. Values are stored in little endian, meaning the least significant byte is placed at the lowest address.

Before <code class="language-plaintext highlighter-rouge">foo</code> is called, this is the state of our base and stack pointers:
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">gdb</span><span class="p">)</span> <span class="nv">i</span> <span class="nv">r</span> <span class="nb">rbp</span>
<span class="nf">rbp</span>            <span class="mh">0x81ce0</span>             <span class="mh">0x81ce0</span>
<span class="p">(</span><span class="nf">gdb</span><span class="p">)</span> <span class="nv">i</span> <span class="nv">r</span> <span class="nb">rsp</span>
<span class="nf">rsp</span>            <span class="mh">0x81ca0</span>             <span class="mh">0x81ca0</span>
</code></pre></div></div>

and the addresses of the stack variables:

<div class="highlighter-rouge"><pre class="highlight"><code>(gdb) p &amp;x[0]
$2 = <font color="ff5733"><b>0x81cb0</b></font> "Hello"
(gdb) p &amp;y
$3 = (int *) <font color="c898ff"><b>0x81cac</b></font>
</code></pre></div>

and the corresponding values in the stack (highlighted the same color as its corresponding addresses):
<div class="highlighter-rouge"><pre class="highlight"><code>(gdb) x/6x $sp
0x81ca0:        0x08049dd8      0x00000000      0x08049dd8      <font color="c898ff"><b>0x00000015</b></font>
0x81cb0:        <font color="ff5733"><b>0x6c6c6548      0x0000006f</b></font>
...
</code></pre></div>

<details>
<summary>Reading the Stack</summary>
<code class="language-plaintext highlighter-rouge">y = 21</code> corresponds to <code class="language-plaintext highlighter-rouge">0x15</code> stored in the address <font color="c898ff"><b>0x81cac</b></font>

<div class="highlighter-rouge"><pre class="highlight"><code>char x[32] = {'H', 'e', 'l', 'l', 'o'};
movabs $0x6f6c6c6548,%rax
</code></pre></div>

The string <code class="language-plaintext highlighter-rouge">x</code> starts from <font color="ff5733"><b>0x81cb0</b></font> where <code class="language-plaintext highlighter-rouge">H</code> is <code class="language-plaintext highlighter-rouge">0x48</code> (72 in decimal) and <code class="language-plaintext highlighter-rouge">o</code> is <code class="language-plaintext highlighter-rouge">0x6f</code> (111 in decimal).
</details>
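
To double check the reading above, here is a small Python sketch (not part of the original debugging session) that lays the two highlighted words out in memory order. It assumes a little-endian target and the 4-byte words that gdb printed; the helper name words_to_bytes is my own:

```python
# Two 32-bit words as printed by gdb at 0x81cb0 in the dump above.
words = [0x6C6C6548, 0x0000006F]

def words_to_bytes(words, width=4):
    """Lay each word out in memory order (lowest address first), little endian."""
    return b"".join(w.to_bytes(width, "little") for w in words)

raw = words_to_bytes(words)
# The char array is zero-padded, so cut at the first NUL byte.
text = raw.split(b"\x00", 1)[0].decode("ascii")

print(raw.hex(" "))  # 48 65 6c 6c 6f 00 00 00
print(text)          # Hello
print(0x15)          # 21, the value of y
```

The least significant byte of each word comes first, which is why 0x6c6c6548 reads as "Hell" and not "lleH".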

When the function <code class="language-plaintext highlighter-rouge">foo</code> is called, we can observe that the new base pointer is "above" (at a lower address than) that of the caller <code class="language-plaintext highlighter-rouge">main</code>:
<table>
<thead>
    <tr>
        <td>Register</td><td>main</td><td>foo</td>
    </tr>
</thead>
<tbody>
    <tr>
        <td>rbp</td><td>0x81ce0</td><td>0x81ca0</td>
    </tr>
    <tr>
        <td>rsp</td><td>0x81c90</td><td>0x81c60</td>
    </tr>
</tbody>
</table>

Therefore, we should observe the stack variables of <code class="language-plaintext highlighter-rouge">foo</code> "above" (at lower addresses than) those of the caller <code class="language-plaintext highlighter-rouge">main</code> as well:

<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) p &amp;(x[0])
$4 = 0x81cb0 "Hello"
(gdb) p &amp;y
$5 = (int *) 0x81c64
(gdb) p &amp;z[0]
$6 = 0x81c70 "World"
(gdb) x/80x rsp
No symbol "rsp" in current context.
(gdb) x/-21x 0x81cb0+8
0x81c64:        0x00000015      0x00081cb0      0x00000000      0x6c726f57
0x81c74:        0x00000064      0x00000000      0x00000000      0x000e74a0
0x81c84:        0x00000001      0x649d7900      0xd7224120      0x00081ce0
0x81c94:        0x00000000      0x08048897      0x00000000      0x08049dd8
0x81ca4:        0x00000000      0x08049dd8      0x00000015      0x6c6c6548
0x81cb4:        0x0000006f
</code></pre></div></div>
</details>
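
As a sanity check, the following Python sketch parses the first two rows of the dump above and reads back the locals of foo. The helper names parse_dump and read are my own, and treating the 8 bytes after y as foo's copy of the pointer x is an assumption based on the value matching the address of x[0] in main:

```python
import re

# First two rows of the gdb dump shown above.
DUMP = """\
0x81c64:        0x00000015      0x00081cb0      0x00000000      0x6c726f57
0x81c74:        0x00000064      0x00000000      0x00000000      0x000e74a0
"""

def parse_dump(dump, word_size=4):
    """Map every byte address to the byte stored there (little-endian words)."""
    memory = {}
    for line in dump.splitlines():
        addr_text, _, rest = line.partition(":")
        addr = int(addr_text, 16)
        for i, word in enumerate(re.findall(r"0x[0-9a-fA-F]+", rest)):
            data = int(word, 16).to_bytes(word_size, "little")
            for j, byte in enumerate(data):
                memory[addr + i * word_size + j] = byte
    return memory

def read(memory, addr, size):
    """Read size bytes starting at addr."""
    return bytes(memory[addr + k] for k in range(size))

mem = parse_dump(DUMP)
y = int.from_bytes(read(mem, 0x81C64, 4), "little")      # y inside foo
x_ptr = int.from_bytes(read(mem, 0x81C68, 8), "little")  # assumed: pointer argument x
z = read(mem, 0x81C70, 5).decode("ascii")                # first 5 bytes of z[16]

print(y, hex(x_ptr), z)  # 21 0x81cb0 World
print(x_ptr > 0x81C70)   # True: main's buffer sits at a higher address than foo's
```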

<p>Any new stack frames will come after <code class="language-plaintext highlighter-rouge">rsp</code> (lower addresses in our case), so we can simply try to modify <code class="language-plaintext highlighter-rouge">rsp</code> with any random value to trigger a segfault. Considering we cannot even 
access the variable <code class="language-plaintext highlighter-rouge">buf</code>, it will come as no surprise that <code class="language-plaintext highlighter-rouge">gdb</code> will prevent us from writing to the memory address:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="nf">gdb</span><span class="p">)</span> <span class="nv">p</span> <span class="nv">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="nf">Cannot</span> <span class="nv">access</span> <span class="nv">memory</span> <span class="nv">at</span> <span class="nv">address</span> <span class="mh">0xfffffffffff81cd0</span>
<span class="p">(</span><span class="nf">gdb</span><span class="p">)</span> <span class="nv">p</span> <span class="o">&amp;</span><span class="nv">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="kc">$</span><span class="err">4</span> <span class="err">=</span> <span class="err">0</span><span class="nf">xfffffffffff81cd0</span> <span class="o">&lt;</span><span class="nv">error</span><span class="p">:</span> <span class="nv">Cannot</span> <span class="nv">access</span> <span class="nv">memory</span> <span class="nv">at</span> <span class="nv">address</span> <span class="mh">0xfffffffffff81cd0</span><span class="o">&gt;</span>
<span class="p">(</span><span class="nf">gdb</span><span class="p">)</span> <span class="nv">set</span> <span class="err">{</span><span class="nv">int</span><span class="err">}</span><span class="kc">$</span><span class="nb">rsp</span><span class="err">=</span><span class="mi">8</span>
<span class="nf">Cannot</span> <span class="nv">access</span> <span class="nv">memory</span> <span class="nv">at</span> <span class="nv">address</span> <span class="mh">0xfffffffffff81cd0</span>
</code></pre></div></div>

<p>Honestly, this was very anticlimactic.</p>

<p><strong>Conclusions:</strong></p>
<ul>
  <li>The stack guard was never triggered since we never overwrote our canary value (I mean the program did nothing anyways)</li>
  <li>Adding a single print statement was enough to trigger a segmentation fault as the stack pointer will point into an unreachable address for writing</li>
</ul>

<hr />

<h3 id="some-random-notes-on-gdb">Some Random Notes on GDB</h3>

<p>A few notes on working with a remote target with gdb:</p>
<ul>
  <li><strong>Connecting to the target:</strong> <code class="language-plaintext highlighter-rouge">target qnx &lt;ip-address&gt;:8000</code></li>
  <li><strong>Load the file:</strong> <code class="language-plaintext highlighter-rouge">file &lt;prog-binary&gt;</code></li>
  <li><strong>Upload binary to the target:</strong> <code class="language-plaintext highlighter-rouge">upload &lt;file&gt; &lt;full_path_in_remote&gt;</code></li>
</ul>

<p><strong>Example:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) target qnx 192.168.124.207:8000
Remote debugging using 192.168.124.207:8000
MsgNak received - resending
Remote target is little-endian
Disabled 'set detach-on-fork' for remote targets
(gdb) file prog-amd64
Reading symbols from prog-amd64...
(gdb) upload prog-amd64 /tmp/prog-amd64
</code></pre></div></div>

<p>Some stuff I found out:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) x/1s 0x8048824
0x8048824:      "hello world"
</code></pre></div></div>]]></content><author><name>Ju Hong Kim</name></author><category term="micro" /><category term="stack" /><category term="qnx" /><category term="C/C++" /><summary type="html"><![CDATA[How a simple innocent looking one line code to allocate a buffer in main can crash the program and a look into why amd64 is yet isn't affected in our target platform]]></summary></entry><entry><title type="html">QNX is ‘Free’ to Use</title><link href="https://zakuarbor.codeberg.page/blog/qnx-non-commercial/" rel="alternate" type="text/html" title="QNX is ‘Free’ to Use" /><published>2024-11-09T00:00:00-05:00</published><updated>2024-11-09T00:00:00-05:00</updated><id>https://zakuarbor.codeberg.page/blog/qnx-non-commercial</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/qnx-non-commercial/"><![CDATA[<p>Recently on Hacker News, a relations developer from QNX announced that <a href="https://news.ycombinator.com/item?id=42079460">QNX is now free for anything non-commercial</a>. QNX also made an announcement
to the LinkedIn community as well, which was where I learned about it.
For those who are not familiar with QNX, QNX is a proprietary real-time operating system targeted at embedded systems and is installed in over 255 million vehicles.
QNX has a great reputation for being a reliable and safe embedded system to build software on top of due to its microkernel architecture and compliance with many industrial and engineering design processes,
which gives customers the ability to certify their software in safety-critical systems more easily. What makes QNX appealing is a discussion for another time but for me, this is a good
opportunity to fiddle around with the system. I was <a href="https://zakuarbor.codeberg.page/blog/carletonu-qnx-license/">previously denied a license</a> from my university who had an agreement with QNX and
my attempts to get an educational license did not go far years ago.</p>

<p><img src="../assets/products/qnx/announcement-linkedin.png" alt="LinkedIn Post announcing QNX 8.0 has a non-commercial license" /></p>

<p>Previously, to gain access to QNX, one would have to either purchase a commercial license from QNX or have an academic license. This kept hobbyists from having access to the operating system.
With the non-commercial license, QNX is now open to those who are interested in running an RTOS in their hobby projects and to open source developers who want to port their software to QNX. QNX is a
POSIX-compliant operating system, but as QNX was not open for public use, companies had to port open source projects such as ROS (Robot Operating System, which isn’t an actual OS) to QNX themselves. QNX
also mentions that the non-commercial license allows one to develop training materials and books on utilizing QNX, which are frankly scarce outside of QNX-authorized materials (i.e. QNX training, Foundry27, and
QNX Documentation).</p>

<p><img src="../assets/products/qnx/non-commercial-lic.png" alt="A sample of what is allowed with a non-commercial license" /></p>

<p>While the announcement is welcome news for me who would love to tinker around, this is yet another product entering the hobbyist community late. The reason for the success of UNIX, Linux, RISC-V, and ARM is the ease and
availability of the product to hobbyists and students who later bring it to their workplace or make the product better. Closing access to technology is a recipe for disaster in the long term in terms of
gaining market advantage. This is exactly the reason why we see cloud corporations enticing students and hobbyists with free (limited) access to their products and even at times
sponsoring events targeted towards them. Linux, BSD, and FreeRTOS being open source makes them the dominant operating systems among the tinkering community and gives them wide adoption in the market. Over the years, we have seen a
shift from customers using commercial and custom grade hardware and software towards more open source or off-the-shelf solutions, including in safety-critical applications such as SpaceX using Linux and
non-radiation-hardened CPUs. IBM for instance has been late to developing an ecosystem of developers for their Cloud, Database and Power Architecture. IBM over the recent years has done a good job in creating free
developer-focused training which tries to make use of their own technologies. However, it is plainly obvious that IBM has failed to capture the mainstream interest of hobbyists, who much prefer other cloud providers such as
AWS, Google Cloud, Linode, and DigitalOcean. The SPARC and POWER architectures were open-sourced so late by their respective owners that developers had already shifted towards RISC-V and ARM as those architectures
are either more open or easier to obtain (such as through the Raspberry Pi Foundation).</p>

<p>While I have not done any sentiment analysis of this announcement, I think overall this move is a good first step to develop an ecosystem of developers who appreciate and understand the QNX architecture, but it is also
met with skepticism. For reference, QNX has messed with the community twice before, which explains the big mistrust from experienced developers. The top comment on <a href="https://news.ycombinator.com/item?id=42079460">Hacker News</a>
does a great job summarizing the skepticism. QNX used to have a bigger hobbyist community in the past where open source projects such as Firefox would have a build for QNX, but that all died when QNX closed their doors
to the community. Years later, the QNX source code was made available for the public to read (though probably with restrictions), but the source code availability was later shut down after QNX was acquired by BlackBerry, who does not have the
best reputation in the developer community (hence why BlackBerry phones failed to capture the market, from my understanding, despite once being a market leader).</p>

<p>Regardless, I have plans to create a few materials on QNX in the coming months and perhaps create a follow-up to <a href="https://zakuarbor.codeberg.page/blog/qnx-aps/">QNX Adaptive Partitioning System</a> as it seemed to have gained enou
has been ranked top 5 on Google search results (though I doubt it had many readers due to the population of QNX developers):</p>

<p><img src="../assets/products/qnx/aps-search-results.png" alt="Google Search Result Ranking for my QNX APS webpage" /></p>
<p class="caption">Google Search Console from July 9 2023 - Nov 8 2024 which had 308 clicks</p>]]></content><author><name>Ju Hong Kim</name></author><category term="other" /><category term="qnx" /><summary type="html"><![CDATA[QNX now has a non-commercial license for hobbyists to fiddle around]]></summary></entry><entry><title type="html">Verifying Email Signature Manually</title><link href="https://zakuarbor.codeberg.page/blog/signature-verification/" rel="alternate" type="text/html" title="Verifying Email Signature Manually" /><published>2024-10-12T00:00:00-04:00</published><updated>2024-10-12T00:00:00-04:00</updated><id>https://zakuarbor.codeberg.page/blog/signature-verification</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/signature-verification/"><![CDATA[<p>I noticed that the neocities community loves using protonmail and some even share their public key to enable fully encrypted communication. 
What makes protonmail special is the focus on privacy and security. All emails sent between Proton Mail users are end-to-end encrypted, meaning not even Proton can 
access the messages. However, when communicating outside of the Proton ecosystem with non-Proton Mail users like those with Gmail and Outlook, communication between the two 
is not encrypted end to end by default. This does not mean the encryption utilized by Gmail and Outlook is inadequate. The vast majority of emails are encrypted in transit 
using TLS encryption, the very same encryption that protects your password when you log in to your bank or your credit card number when you buy something online.</p>

<p><strong>Aside:</strong> If you are curious about protonmail’s encryption scheme: <a href="https://proton.me/support/proton-mail-encryption-explained">https://proton.me/support/proton-mail-encryption-explained</a></p>

<h2 id="what-is-the-purpose-of-a-digital-signature">What is the Purpose of a Digital Signature</h2>

<p>Depending on your sense of security, TLS encryption may not be sufficient. There are a few issues with just relying on TLS encryption:</p>
<ol>
  <li><strong>Loss of Privacy:</strong> Companies like Google and Microsoft have access to your data. Depending on their policies, your emails could be used for training purposes, released to 
government authorities, or be leaked due to a security breach</li>
  <li><strong>Potential For Data to be Compromised:</strong> Even if you trust your company to respect your privacy, it does not mean the company has good security practices, and it could be attacked by 
a state-sponsored actor. With data potentially not encrypted at rest, or not encrypted properly, your data could be leaked to malicious actors</li>
</ol>

<p>Since communication outside of ProtonMail is not end-to-end encrypted, if one wants to maintain the security level of their communication, both parties would need to 
send emails encrypted with each other’s public key. Therefore, it is not uncommon to see people on the internet share their public key for others to communicate with them.</p>

<p>Personally, I am fine with using Gmail and Outlook for all my email communication but nonetheless, I thought it would be interesting to see how one would manually verify the signature 
of an email. One other use case of public key cryptography is signing. Encryption refers to obfuscating the original message to ensure confidentiality (to the best of one’s knowledge). 
Digital signing does not ensure confidentiality but <strong>authenticity</strong>. In other words, digital signing is a process to verify that the email has not been tampered with and comes from 
the person the sender claims to be. With man-in-the-middle attacks, it is possible for an attacker to intercept and modify the original message. Here are some purposes (and potential 
uses) of digital signatures:</p>

<ol>
  <li><strong>Authenticity:</strong> A verification that you are indeed talking to the person whom you think you are talking to
    <ul>
      <li>this assumes that the private key of the other party is kept secret and secured and you are given the public key somehow in a <strong>secured and trusted</strong> way</li>
    </ul>
  </li>
  <li><strong>Integrity:</strong> The ability to detect if the message has been tampered with (similar to a tamper tape/seal on very sensitive envelopes or products)</li>
  <li><strong>Attestation:</strong> I really should be careful what I mean by “attestation”. I am referring to the sender attesting, for legal purposes, that they are indeed the one communicating with the receiver. 
Similar to how we sign documents to attest that we agree to the accuracy of the documents and to the terms outlined in the contract, digital signatures 
can also be used for similar purposes. A better word for this process is notarization.</li>
</ol>

<p>While authenticity and “attestation” (from my definition) sound similar, there is a key difference between the two. Authenticity is for the receiver to verify they are indeed 
talking to the person they believe to be in contact with. “Attestation” is a way to legally bind the user to a contract. Therefore, if a digital signature is ever used for the purpose 
of entering a contract, one should ensure they use separate keys for signing and encryption. When you communicate with others using public key encryption, you are obviously not 
signing every message as if it was a legal contract. This is something I probably need to remind myself of as I delve more into security.</p>

<p>One interesting use of digital signatures is protecting software from supply chain attacks. If you ever download software from a big open source project like Fedora, they would 
often provide you either a hash or a signature. A hash can be used to verify that the file has not been tampered with. However, this does not provide authenticity. Authenticity can only be 
obtained through the usage of a digital signature. If an attacker manages to infiltrate a server, they could potentially replace the file and its associated hash with their own 
malicious file and hash. The client will not be able to protect themselves from this supply chain attack as both the file and the hash posted on the project’s website have been compromised. 
With a digital signature, one can verify the authenticity of the file and have assurance the file has not been tampered with. However, this does require one to already have the public 
key beforehand as the attacker could already have compromised the server that shares the project’s public key.</p>
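
<p>To make the limitation of a bare hash concrete, here is a minimal Python sketch. The byte strings are stand-ins I made up for a real download, and the helper names are my own; the point is only that a hash check passes as soon as the attacker replaces both the file and the published hash:</p>

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def matches_published_hash(data: bytes, published_hex: str) -> bool:
    """Integrity check only: does the download match the hash on the website?"""
    return hmac.compare_digest(sha256_hex(data), published_hex.strip().lower())

genuine = b"genuine release"      # stand-in for the real file contents
malicious = b"malicious release"  # stand-in for the attacker's file

published = sha256_hex(genuine)
print(matches_published_hash(genuine, published))    # True
print(matches_published_hash(malicious, published))  # False: tampering detected

# If the attacker controls the server, the hash gets replaced as well,
# and the same check now passes for the malicious file.
replaced = sha256_hex(malicious)
print(matches_published_hash(malicious, replaced))   # True
```

<p>A signature closes this gap only because the attacker cannot produce a valid signature without the project’s private key.</p>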

<p><img src="../assets/programming/security/github-gpg.png" alt="An example of GPG used to sign commits" /></p>
<p class="caption">Commits can be signed in Git. GitHub has a feature to mark the commit or tag as verified if the commit was both signed and verified by GitHub.</p>

<p>I mentioned that digital signatures can provide authenticity, but this is not entirely true. This is true if you have obtained the public key from a trusted source such as from the 
entity you are communicating with. This is where digital certificates can help.</p>

<p>Anyhow, that was enough rambling, time to go into the details of how to verify email signatures.</p>

<h2 id="how-to-verify-a-digital-signature">How to Verify a Digital Signature</h2>

<p>Digital signatures work by having the sender (Alice) <strong>sign</strong> the message with their private key. With this, the receiver (Bob) can use the sender’s (Alice’s) public key to verify 
the message. From my understanding, the signature is often appended to the email message so that the receiver can easily obtain the signature when they receive the email. This 
could differ when using digital signatures for different purposes such as downloading software from the publisher’s site. Wikipedia has a good diagram to visualize this process:</p>

<p><img class="transparent-background" alt="An image from Wikipedia to illustrate how digital signatures work" src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/78/Private_key_signing.svg/220px-Private_key_signing.svg.png" /></p>
<p class="caption">A diagram illustrating how the process of signing and verifying a digital signature works. Extracted from Wikipedia</p>

<p>I will not go into how to sign an email as my focus is on how to verify an email signature. More specifically, I will be using ProtonMail to automatically sign my email and send the email 
to my Gmail account.</p>

<h3 id="step-1-obtain-the-public-key">Step 1: Obtain the Public Key</h3>

<p>There are a few methods to obtain a public key, such as from the organization’s website or as an attachment to the email. This is likely the most vulnerable step in the entire process 
as an attacker could upload their own public key to a vulnerable website or masquerade as the person you expect to be communicating with, such as by having an email address that closely 
resembles that of a trusted identity or is spoofed to appear legitimate as seen with <a href="https://www.avanan.com/blog/how-outlook-unwittingly-helps-hackers">Outlook in 2021</a>. Protonmail 
offers an option to automatically send a public key to those outside of the Protonmail ecosystem. While this method isn’t flawed per se (I ain’t a cybersecurity expert), this does 
make me think twice about the validity of the public key that has been sent to me as using a compromised key could make this entire verification process go wrong. However, to 
initiate communication that is encrypted end to end, this is a necessary step. While I do not have a clear picture on certificates, certificates probably could alleviate this 
issue by having a trusted third party called the certificate authority verify the identity of the sender.</p>

<div class="multiple_img_div">
<img src="../assets/programming/security/email-gpg.png" class="img_33" />
<img src="../assets/programming/security/fedora-gpg.png" class="img_60" />
</div>
<p class="caption"><b>Left:</b> Alice sending an email to Bob with her publickey and signature. <b>Right:</b> Instructions to verify Fedora ISO</p>

<h3 id="step-2-import-alicessenders-public-key">Step 2: Import (Alice’s/Sender’s) Public Key</h3>

<p>Once Alice’s (i.e. the sender) public key has been obtained, the key needs to be imported to the public keyring. I do not understand why the keys always have to be imported 
rather than just being specified to be honest. Perhaps it’s because I am using the public key as an armoured ASCII <code class="language-plaintext highlighter-rouge">asc</code> rather than the GNU Privacy Guard <code class="language-plaintext highlighter-rouge">gpg</code> public keyring file. Though 
I am not going to bother verifying this.</p>

<p><strong>To import a public key:</strong> <code class="language-plaintext highlighter-rouge">gpg --import &lt;key.gpg&gt;</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --import publickey-alice@proton.me.asc 
gpg: key &lt;redacted&gt;: public key "alice@proton.me &lt;alice@proton.me&gt;" imported
gpg: Total number processed: 1
gpg:               imported: 1 
</code></pre></div></div>

<p>We can verify the import with: <code class="language-plaintext highlighter-rouge">gpg --list-public-keys &lt;uid&gt;</code></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --list-public-keys alice@proton.me
pub   ed25519 2024-01-10 [SC]
      &lt;redacted fingerprints&gt;
uid           [ unknown] alice@proton.me &lt;alice@proton.me&gt;
sub   &lt;redacted&gt; 2024-01-10 [E]
</code></pre></div></div>

<h3 id="step-3-download-the-email-message">Step 3: Download the Email Message</h3>

<p>This step does vary depending on your email client but on Gmail, one can simply download the email by clicking on the kebab menu (the three dots or ellipses) found on the right 
side of the email as shown below:</p>

<p><img src="../assets/programming/misc/gmail-menu-expanded.png" alt="Gmail Expanded Menu to Download the email" /></p>

<p>This will download the email in the electronic mail format <code class="language-plaintext highlighter-rouge">.eml</code> which is <strong>not</strong> the signed email. <code class="language-plaintext highlighter-rouge">.eml</code> files have a lot of extra information that is packaged over the 
signed email. We will need to extract the content that has been signed to verify the message.</p>

<h3 id="step-3-extract-the-content-containing-the-signed-email">Step 3: Extract the Content Containing the Signed Email</h3>

<p>The content of the email that needs to be extracted is the data that has been signed with Alice’s private key to create the signature. The file will look something like the following:</p>

<p><img src="../assets/programming/security/email-no-attachment.png" alt="An image of an edited eml file that does not contain attachments aside from the signature" /></p>
<p class="caption">An edited email file that does not contain attachments aside from the signature</p>

<h3 id="step-4-extract-signed-message">Step 4: Extract Signed Message</h3>

<p>As mentioned in the previous step, we need to remove all the extra data in the email file that isn’t part of the signed message. You should make a backup of the email file because 
this is easy to mess up if you do not know what you are doing, as the author found out:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cp 'GPG KEY no publickey attachments.eml' 'GPG KEY no publickey attachments.eml.bak'
$ ls 'GPG KEY no publickey attachments.eml'*
'GPG KEY no publickey attachments.eml'  'GPG KEY no publickey attachments.eml.bak'
</code></pre></div></div>

<p>The content of the message <b><u>starts after</u></b> you see the following header (the hash will differ):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--------AAAAAAAAAAAA
</code></pre></div></div>

<p>where <code class="language-plaintext highlighter-rouge">--------AAAAAAAAAAAA</code> is our boundary as clearly denoted earlier in the file.</p>

<p>This means the very <strong>first line</strong> of the signed file is:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Content-Type: multipart/mixed;boundary=---------------------BBBBBBBBBBBB
</code></pre></div></div>

<p>The contents of the signed message are enclosed within the boundary (the boundary itself is not included) as shown below (<strong>remove trailing newlines</strong>):</p>

<p><img src="../assets/programming/security/email-signed-msg.png" alt="An illustration of what is part of the signed message" /></p>
<p class="caption">The contents of the signed message</p>

<p>One thing I notice is that the hash on the first line of the signed message is also the last line in the signed message. For instance, in our example that would be:
<code class="language-plaintext highlighter-rouge">BBBBBBBBBBBB</code>. Therefore our file should also end with this hash.</p>

<p>For instance, if our message looked along the lines of:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MIME-Version: 1.0
 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha512; boundary="------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415"; charset=utf-8

 This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
 --------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415
 Content-Type: multipart/mixed;boundary=---------------------ff35159c3ebf11234dd954191b3141592

 ...

 -----------------------ff35159c3ebf11234dd954191b3141592
 Content-Type: application/pgp-keys; filename="publickey - alice@proton.me - &lt;redacted&gt;.asc"; name="publickey-alice@proton.me.asc"
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment; filename="publickey-alice@proton.me.asc"; name="publickey - alice@proton.me - &lt;redacted&gt;.asc"

 ABCDEF0x4ZjZkeGxSL0xUABCDEFmltotlUR0ABCDEFWaABCDEFE9PQP9ABCDEFAABCDEFtLUVORCBABCED
 ABCDEFEABCDEFFWSBCTE9DSy0tLABCDE==
 -----------------------ff35159c3ebf11234dd954191b3141592--

 --------3141887d7abcdefgbe09e18825fd164103abcdefgf8c40b59382649cd69b31415
</code></pre></div></div>

<p>Then the signed message would be:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Content-Type: multipart/mixed;boundary=---------------------ff35159c3ebf11234dd954191b3141592

 ...

 -----------------------ff35159c3ebf11234dd954191b3141592

 ...

 -----------------------ff35159c3ebf11234dd954191b3141592
 Content-Type: application/pgp-keys; filename="publickey - alice@proton.me - &lt;redacted&gt;.asc"; name="publickey-alice@proton.me.asc"
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment; filename="publickey-alice@proton.me.asc"; name="publickey - alice@proton.me - &lt;redacted&gt;.asc"

 ABCDEF0x4ZjZkeGxSL0xUABCDEFmltotlUR0ABCDEFWaABCDEFE9PQP9ABCDEFAABCDEFtLUVORCBABCED
 ABCDEFEABCDEFFWSBCTE9DSy0tLABCDE==
 -----------------------ff35159c3ebf11234dd954191b3141592--
</code></pre></div></div>
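
<p>The manual cut above can be sketched in Python. This is only an illustration of the steps I followed by hand: the function name and the shortened <code class="language-plaintext highlighter-rouge">OUTER</code>/<code class="language-plaintext highlighter-rouge">INNER</code> boundaries are made up, and stripping the trailing newlines is based on what worked for my email rather than on a careful reading of the RFCs:</p>

```python
def extract_signed_part(raw: bytes, boundary: bytes) -> bytes:
    """Return everything between the two opening delimiters of the outer boundary."""
    delimiter = b"--" + boundary
    lines = raw.splitlines(keepends=True)
    # The closing delimiter ends with an extra "--", so it never matches here.
    marks = [i for i, line in enumerate(lines) if line.strip() == delimiter]
    if len(marks) != 2:
        raise ValueError("expected exactly two opening delimiters")
    first, second = marks
    # Drop trailing newlines, as noted above.
    return b"".join(lines[first + 1:second]).rstrip(b"\r\n")

# Made-up miniature of a signed email with shortened boundaries.
sample = (
    b'Content-Type: multipart/signed; boundary="OUTER"\r\n'
    b"\r\n"
    b"This is an OpenPGP/MIME signed message (RFC 4880 and 3156)\r\n"
    b"--OUTER\r\n"
    b"Content-Type: multipart/mixed;boundary=INNER\r\n"
    b"\r\n"
    b"--INNER\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--INNER--\r\n"
    b"\r\n"
    b"--OUTER\r\n"
    b"Content-Type: application/pgp-signature\r\n"
    b"\r\n"
    b"(signature)\r\n"
    b"--OUTER--\r\n"
)

signed = extract_signed_part(sample, b"OUTER")
print(signed.decode("ascii"))
```

<p>The returned bytes start at the inner <code class="language-plaintext highlighter-rouge">Content-Type</code> line and end at the closing inner boundary, matching the hand-edited file. Writing them out in binary mode avoids the editor changing the line endings.</p>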

<h3 id="step-5-verify-the-email-signature">Step 5: Verify the Email Signature</h3>

<p><strong>Verify the signature:</strong> <code class="language-plaintext highlighter-rouge">gpg --verify signature.asc message.txt</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --verify signature.asc message.txt 
 gpg: Signature made Mon 07 Oct 2024 11:29:48 PM EDT
 gpg:                using EDDSA key &lt;redacted&gt;
 gpg: Good signature from "alice@proton.me &lt;alice@proton.me&gt;" [unknown]
 gpg: WARNING: This key is not certified with a trusted signature!
 gpg:          There is no indication that the signature belongs to the owner.
 Primary key fingerprint: &lt;redacted&gt;
</code></pre></div></div>

<p>The signature has been verified (<code class="language-plaintext highlighter-rouge">Good signature</code>), but we do see a warning that the key is not certified.</p>

<h3 id="optional-step-6-validate-imported-public-key">(Optional) Step 6: Validate Imported Public Key</h3>

<p>The <a href="https://www.gnupg.org/gph/en/manual/x56.html">GnuPG manual</a> has instructions to validate an imported public key by checking whether the key’s fingerprint matches the 
fingerprint you are expecting from Alice (the sender). This requires Alice to let Bob know her key’s fingerprint somehow, whether that be by email, text, voice call, or on a piece of paper 
delivered to Bob. Let’s pretend that the fingerprint of Alice’s public key, transmitted to you through a trusted source, is:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>768B 218A CCD7 AA34 9830  52D8 9BD4 1A08 9D98 BC02
</code></pre></div></div>

<p>We can verify whether the public key really came from Alice by checking the public key’s fingerprint and seeing if it matches:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --edit-key alice@proton.me
...
gpg&gt; fpr
pub   ed25519/[redacted] 2024-01-10 alice@proton.me &lt;alice@proton.me&gt;
 Primary key fingerprint: <font color="#C01C28"><b>768B 218A CCD7 AA34 9830  52D8 9BD4 1A08 9D98 BC02</b></font>
</code></pre></div></div>
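
<p>Comparing two 40-character fingerprints by eye is error prone, so the comparison can also be scripted. The sketch below assumes you paste both strings in yourself; <code class="language-plaintext highlighter-rouge">normalize_fingerprint</code> is a hypothetical helper name, and the lower-case, unspaced form of the second string is only there to show that formatting differences are ignored.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def normalize_fingerprint(fingerprint):
    """Strip whitespace and upper-case so that formatting differences are ignored."""
    return "".join(fingerprint.split()).upper()


# Fingerprint received from Alice through a trusted channel.
expected = "768B 218A CCD7 AA34 9830  52D8 9BD4 1A08 9D98 BC02"
# Fingerprint printed by gpg for the imported key, typed without spaces.
reported = "768b218accd7aa34983052d89bd41a089d98bc02"

fingerprints_match = normalize_fingerprint(expected) == normalize_fingerprint(reported)
print(fingerprints_match)
</code></pre></div></div>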

<p>To validate Alice’s public key (<strong>proceed with caution</strong>), we must sign the key with our own private key:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gpg&gt; sign

pub  ed25519/[redacted]
     created: 2024-01-10  expires: never       usage: SC  
     trust: unknown       validity: unknown
 Primary key fingerprint: 768B 218A CCD7 AA34 9830  52D8 9BD4 1A08 9D98 BC02

     alice@proton.me &lt;alice@proton.me&gt;

Are you sure that you want to sign this key with your
key "Bob &lt;bob@gmail.com&gt;" ([redacted])

Really sign? (y/N) yes

gpg&gt; quit
Save changes? (y/N) y
</code></pre></div></div>

<p>However, this was not sufficient to change the validity. On <a href="https://serverfault.com/questions/569911/how-to-verify-an-imported-gpg-key">serverfault</a>, Baker does a good job explaining that TRUST != VALIDITY. 
I am guessing that, due to differences in the default settings of <code class="language-plaintext highlighter-rouge">gpg</code>, I need to set my <code class="language-plaintext highlighter-rouge">trust</code> level to 5 <code class="language-plaintext highlighter-rouge">ultimate</code> (a level normally reserved for your own keys) to remove this warning:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gpg&gt; trust
pub  ed25519/[redacted]
     created: 2024-01-10  expires: never       usage: SC  
     trust: unknown       validity: unknown
sub  cv25519/[redacted]
     created: 2024-01-10 expires: never       usage: E   
[ unknown] (1). alice@proton.me &lt;alice@proton.me&gt;

Please decide how far you trust this user to correctly verify other users' keys
(by looking at passports, checking fingerprints from different sources, etc.)

  1 = I don't know or won't say
  2 = I do NOT trust
  3 = I trust marginally
  4 = I trust fully
  5 = I trust ultimately
  m = back to the main menu

Your decision? 5

...

Please note that the shown key validity is not necessarily correct
unless you restart the program.
</code></pre></div></div>

<p>Now if we take a look at the verification, we no longer see the warnings.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --verify signature.asc message.txt
gpg: Signature made Mon 07 Oct 2024 11:29:48 PM EDT
gpg:                using EDDSA key &lt;redacted&gt;
gpg: Good signature from "alice@proton.me &lt;alice@proton.me&gt;" [ultimate]
</code></pre></div></div>

<h2 id="conclusion">Conclusion</h2>

<p>In practice, no one verifies the digital signatures of emails manually. Any sane individual will use an email client that automates the verification process for them. 
As most individuals are not aware of digital signing and email encryption, I’ll probably not set up my email client for work, school, and personal email to automatically verify, sign, and encrypt emails 
unless I am required to. This does mean I am exposing myself to the prying eyes of my email providers, am susceptible to man-in-the-middle attacks, and could have my personal information leaked.</p>

<p>To summarize the steps:</p>
<ol>
  <li><strong>Import the keys:</strong> <code class="language-plaintext highlighter-rouge">gpg --import &lt;key.gpg&gt;</code></li>
  <li>Extract the signed message (this includes any attachments that are not the signature itself)</li>
  <li><strong>Verify the email:</strong> <code class="language-plaintext highlighter-rouge">gpg --verify signature.asc message.txt</code></li>
</ol>]]></content><author><name>Ju Hong Kim</name></author><category term="programming" /><category term="pgp" /><category term="encryption" /><summary type="html"><![CDATA[A look into how to manually verify email signatures]]></summary></entry><entry><title type="html">A Quick Look Into Half-Width and Full-Width Characters</title><link href="https://zakuarbor.codeberg.page/blog/halfwidth-fullwidth-encoding/" rel="alternate" type="text/html" title="A Quick Look Into Half-Width and Full-Width Characters" /><published>2024-10-07T00:00:00-04:00</published><updated>2024-10-07T00:00:00-04:00</updated><id>https://zakuarbor.codeberg.page/blog/halfwidth-fullwidth-encoding</id><content type="html" xml:base="https://zakuarbor.codeberg.page/blog/halfwidth-fullwidth-encoding/"><![CDATA[<p>A friend of mine has been asking me a few questions about encoding for a paper he is working on. 
While I don’t fully understand his research, what I do understand is that he is 
analyzing Japanese texts and that it involves understanding character encodings. 
Character encoding is not a topic that most native-English programmers are familiar with. 
The most that the average programmer will know is that ASCII and UTF-8 exist: 
ASCII is enough for the English alphabet 
and Arabic numerals (i.e. 1, 2, 3, 4, 5, 6, …), and anything beyond that calls for UTF-8.</p>

<p>I am sure most of us have encountered random garbage characters such as � or □ (U+25A1) when trying to read documents that 
have a mix of English and some foreign language, or have seen random garbage on media displays such as a car’s infotainment screen 
when trying to listen to music from Asia.</p>

<p><img src="https://users.ox.ac.uk/~martinw/dlc/images/Chapter%2004_img_1.jpg" alt="An example of Chinese not displaying correctly" /></p>
<p class="caption">Chinese characters not displaying correctly. Extracted from <a href="https://users.ox.ac.uk/~martinw/dlc/chapter4.htm">Developing Linguistic Corpora: a Guide to Good Practice</a></p>

<p>I was not aware of the existence of full-width and half-width characters until my friend asked me to briefly explain
the differences between the two from a technical aspect. 
For those like me who weren’t aware that Japanese text mixes 
zenkaku (full-width) and hankaku (half-width) characters, look at the image below or visit the following webpage for more explanation: <a href="https://mailmate.jp/blog/half-width-full-width-hankaku-zenkaku-explained">https://mailmate.jp/blog/half-width-full-width-hankaku-zenkaku-explained</a></p>

<p><img src="https://images.ctfassets.net/rrofptqvevic/3276rMt8nR8HEVYYAhhZvV/633c276e889c8dd101c4ea89cc07f82d/image_-_2023-07-21T105935.292.webp" alt="An image displaying the difference between full and half-width characters" /></p>

<p>As you can see, half-width characters unsurprisingly take up less space visually than full-width characters.</p>

<p><img src="https://zakuarbor.codeberg.page/blog/assets/programming/encoding/full-half-width.png" alt="Full and Half Width Characters encoded on UTF-8" /></p>
<p class="caption">Full and Half Width encoded on UTF-8 as seen through Vim</p>

<p>While I read and typed Korean during my younger years when I was forced to learn it, it never clicked for me how much space Korean
takes up graphically. It is obvious in hindsight but it was nonetheless interesting.</p>

<p>There is also an implication 
for the amount of data that half-width and full-width characters consume (though this does depend on the encoding). Those of us in a Western 
audience know that an <a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a> character takes up 1 byte and can be represented as a <code class="language-plaintext highlighter-rouge">char</code> in C.</p>

<h2 id="extending-ascii">Extending ASCII</h2>

<p>One interesting fact about ASCII is that it only defines 128 characters (of which only 95 are printable). Recall that an ASCII character is stored in 
1 byte, which is made up of 8 bits. Doing the math, 8 bits can represent 2^8 = 256 values, while ASCII only needs 7 bits (2^7 = 128). This leaves the remaining 
128 values unmapped to anything.</p>
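
<p>The arithmetic can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are my own.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ASCII is a 7-bit code, so it defines 2**7 code points.
ascii_code_points = 2 ** 7
# A byte has 8 bits, so it can hold 2**8 distinct values.
byte_values = 2 ** 8
# Values 0x80 to 0xFF are left over for extensions.
unused_values = byte_values - ascii_code_points

# Printable ASCII runs from space (0x20) to tilde (0x7E).
printable = [chr(code) for code in range(ascii_code_points) if chr(code).isprintable()]

print(ascii_code_points, byte_values, unused_values, len(printable))
</code></pre></div></div>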

<p><img src="../assets/programming/encoding/ascii-table.png" alt="ASCII Table From Wikipedia" /></p>
<p class="caption">ASCII Table. Extracted from Wikipedia</p>

<p>This allowed other languages to extend ASCII with extra characters, such as the accented letters of European 
languages (<code class="language-plaintext highlighter-rouge">é, è, ç, à</code>) in the <a href="https://en.wikipedia.org/wiki/ISO/IEC_8859">ISO 8859</a> family, which includes Latin-8, or Katakana characters in 
<a href="https://en.wikipedia.org/wiki/JIS_X_0201">JIS C 6220</a> (JIS X 0201) in 1969. 
JIS C 6220 does change a few characters, so it is not exactly an extension of ASCII. Ignoring those few differences, we can 
see that the Katakana characters are mapped in the upper half, from 0xA1 to 0xDF.</p>

<p><img src="../assets/programming/encoding/jis-c-6220.png" alt="JIS C 6220 chart" /></p>
<p class="caption">JIS C 6220 which is also known as JIS X 0201. Extracted from Wikipedia</p>

<p>The ISO 8859 encodings such as Latin-8, on the other hand, seem to be direct extensions of ASCII, where 0xA1 - 0xFF contain characters from several 
European languages such as French, Finnish and the Celtic languages.</p>

<p><img src="../assets/programming/encoding/iso-8859.png" alt="ISO 8859-14" /></p>
<p class="caption">ISO 8859-14 (Latin-8) Encoding. Extracted from Wikipedia</p>
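
<p>Both extensions can be poked at from Python. The sketch below assumes the standard library codecs <code class="language-plaintext highlighter-rouge">iso8859_14</code> (Latin-8) and <code class="language-plaintext highlighter-rouge">shift_jis</code>. Python has no separate JIS X 0201 codec, so Shift-JIS stands in for it here only because it keeps the single-byte Katakana in the same 0xA1 to 0xDF range.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Accented letters occupy a single byte above 0x7F in Latin-8.
for char in "éèçà":
    print(char, char.encode("iso8859_14").hex())

# Decode every byte from 0xA1 to 0xDF as a single-byte character.
# Assumption: shift_jis is used as a stand-in for JIS X 0201.
katakana_block = bytes(range(0xA1, 0xE0)).decode("shift_jis")
print(len(katakana_block), katakana_block)

# ASCII bytes keep their meaning in Latin-8.
same_lower_half = "A".encode("ascii") == "A".encode("iso8859_14")
print(same_lower_half)
</code></pre></div></div>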

<h2 id="aside-utf-8-vs-utf-16">Aside: UTF-8 vs. UTF-16</h2>

<p>Based on the <a href="https://mailmate.jp/blog/half-width-full-width-hankaku-zenkaku-explained">article</a> I shared, half-width characters take up 1 byte while full-width characters take up 2 bytes (which is why they can also be called double-byte characters). 
I do believe this depends on the encoding used. Looking at the file sizes, we can see that the digit <code class="language-plaintext highlighter-rouge">1</code> in UTF-8 takes 1 byte as a half-width character and 3 bytes as a full-width character:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ stat -c "%n,%s" -- halfwidth-utf8.txt fullwidth-utf8.txt 
halfwidth-utf8.txt,1
fullwidth-utf8.txt,3
</code></pre></div></div>

<p>One confusion I had was understanding the difference between UTF-8 and UTF-16, and the following exercise helped me understand it:</p>
<ul>
  <li>UTF-8 encodes each character in 1 to 4 bytes</li>
  <li>UTF-16 encodes each character in 2 or 4 bytes</li>
</ul>
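
<p>The two ranges can be seen by encoding a handful of characters and counting the bytes. The sample characters below are my own picks, ordered from low to high code points; <code class="language-plaintext highlighter-rouge">utf-16-be</code> is used so that no byte order mark is counted.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sample characters chosen for illustration, from low to high code points.
samples = ["1", "é", "一", "１", "😀"]

utf8_lengths = []
utf16_lengths = []
for char in samples:
    utf8_len = len(char.encode("utf-8"))
    # "utf-16-be" writes no byte order mark, so only the character is counted.
    utf16_len = len(char.encode("utf-16-be"))
    utf8_lengths.append(utf8_len)
    utf16_lengths.append(utf16_len)
    print(f"U+{ord(char):04X} utf-8={utf8_len} utf-16={utf16_len}")
</code></pre></div></div>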

<p>UTF-8 and UTF-16, as you can tell, are variable-length encodings, meaning they take up more or fewer bytes depending on the character being encoded. We can 
see this by comparing the Arabic numeral <code class="language-plaintext highlighter-rouge">1</code> with the Chinese character for the number 1, <code class="language-plaintext highlighter-rouge">一</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ stat -c "%n,%s" -- halfwidth-1.txt chinese-1.md 
halfwidth-1.txt,1
chinese-1.md,3
</code></pre></div></div>

<p>In UTF-8, <code class="language-plaintext highlighter-rouge">1</code> takes up 1 byte, which is unsurprising as ASCII characters have a great advantage in UTF-8 compared to characters from Asian languages such as Chinese, where the character for 1, <code class="language-plaintext highlighter-rouge">一</code>, consumes 3 bytes.</p>

<p><strong>Note:</strong> Do not attempt to display UTF-16 encoded files on the terminal without changing your locale (or whatever it is called). It will not display nicely. Vim on my machine will automatically open the file as UTF-16LE.</p>

<p><img src="https://zakuarbor.codeberg.page/blog/assets/programming/encoding/full-half-width.png" alt="My default terminal settings is unable to display the content in Chinese properly" /></p>

<p>Let’s inspect the contents of the files for the half-width character <code class="language-plaintext highlighter-rouge">1</code> and the full-width character <code class="language-plaintext highlighter-rouge">１</code> in hex:</p>
<pre class="highlight" style="background-color: #1b1b1b; padding: .5rem; line-height: 1.25em"><font color="#D0CFCC"><b>$ </b></font>cat halfwidth-1.txt; echo &quot;&quot;; xxd halfwidth-1.txt; cat fullwidth-1.txt ; echo &quot;&quot;; xxd fullwidth-1.txt 
1
00000000: <font color="#26A269"><b>31</b></font>                      <font color="#C01C28"><b>               </b></font>  <font color="#26A269"><b>1</b></font>
１
00000000: <font color="#C01C28"><b>efbc</b></font> <font color="#C01C28"><b>91</b></font>                   <font color="#C01C28"><b>             </b></font>  <font color="#C01C28"><b>...</b></font>
</pre>

<p>As we can see, the half-width character <code class="language-plaintext highlighter-rouge">1</code> in UTF-8 is represented as <code class="language-plaintext highlighter-rouge">0x31</code>, meaning only one byte is required. However, the full-width 
digit <code class="language-plaintext highlighter-rouge">１</code> is represented as the three bytes <code class="language-plaintext highlighter-rouge">0xEFBC91</code>. Now let’s compare this with UTF-16:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat halfwidth-utf16.txt; echo ; xxd halfwidth-utf16.txt; cat fullwidth-utf16.txt; echo; xxd fullwidth-utf16.txt 
1
00000000: 0031                                     .1
�
00000000: ff11                                     ..
</code></pre></div></div>

<p><strong>Note:</strong> To view a UTF-16 file in Vim, run the following in command mode (i.e. press <code class="language-plaintext highlighter-rouge">esc</code> to exit the current mode and press <code class="language-plaintext highlighter-rouge">:</code> to enter command mode): <code class="language-plaintext highlighter-rouge">e ++enc=utf-16be fullwidth-utf16.txt</code></p>

<p>As expected, UTF-16 represents code points in the upper range well: <code class="language-plaintext highlighter-rouge">１</code> (full-width 1) is now represented with only 2 bytes, unlike the 3 that were required in UTF-8. 
The same cannot be said for code points in the lower range such as our half-width digit <code class="language-plaintext highlighter-rouge">1</code>, which now takes 2 bytes because its hex representation is padded with a leading <code class="language-plaintext highlighter-rouge">0x00</code>.</p>
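
<p>The same bytes can be reproduced without any files by encoding the two digits directly. This sketch prints the hex for UTF-8 and for both byte orders of UTF-16. The last line is an extra observation of mine: Python’s plain <code class="language-plaintext highlighter-rouge">utf-16</code> codec also writes a 2-byte byte order mark, which is worth keeping in mind when comparing sizes.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>half = "1"   # U+0031, half-width digit
full = "１"  # U+FF11, full-width digit

for label, char in (("half", half), ("full", full)):
    print(label, "utf-8:", char.encode("utf-8").hex())
    print(label, "utf-16-be:", char.encode("utf-16-be").hex())
    print(label, "utf-16-le:", char.encode("utf-16-le").hex())

# The plain "utf-16" codec writes a 2-byte byte order mark before the character.
print(len(full.encode("utf-16")))
</code></pre></div></div>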

<h2 id="half-width-and-full-width-in-japanese-specific-encodings">Half-Width and Full-Width in Japanese Specific Encodings</h2>

<p>I mentioned earlier that <a href="https://en.wikipedia.org/wiki/JIS_X_0201">JIS C 6220</a> (JIS X 0201) took advantage of the upper 128 values of a byte that ASCII leaves unused, which allowed 
Katakana support to be added. It is not a direct extension, though, as the lower 128 characters were changed slightly to be localized for Japan, such as replacing 
the <code class="language-plaintext highlighter-rouge">\</code> with the Japanese Yen sign <code class="language-plaintext highlighter-rouge">¥</code>. Full-width Japanese characters apparently started to appear in 1978 with JIS C 6226, which made it possible to represent Kanji.</p>

<p>A more recent standard is <a href="https://en.wikipedia.org/wiki/Shift_JIS">Shift-JIS</a> from 1997, which is apparently the second most used encoding among <code class="language-plaintext highlighter-rouge">.jp</code> (Japanese) websites.
Based on a <a href="https://w3techs.com/technologies/segmentation/tld-jp-/character_encoding">survey</a> on October 7, 2024, Shift JIS is still used by 4.8% of <code class="language-plaintext highlighter-rouge">.jp</code> websites and EUC-JP by 2.3%, with 
the remainder going to UTF-8. As mentioned previously, in a Japanese encoding such as Shift JIS, half-width characters not only have a smaller 
width but also require half the number of bytes. Half-width characters do not take fewer bytes in general, but in Shift-JIS that would seem to 
be the case:</p>

<p><img src="../assets/programming/encoding/shift-jis-ah.png" alt="Using charset.7jp.net to view the HEX representation of ア and ｱ" /></p>
<p class="caption">Hex Representation of ア and ｱ. Credits to <a href="http://charset.7jp.net/dump.html" alt="charset.7jp.net">charset.7jp.net</a></p>

<p>As you may notice, I am using the same example from the <a href="https://mailmate.jp/blog/half-width-full-width-hankaku-zenkaku-explained">article</a> but I opted to generate my own 
image. The blog for some reason decided to add <code class="language-plaintext highlighter-rouge">0x0D0A</code>, which corresponds to <code class="language-plaintext highlighter-rouge">CRLF</code> i.e. <code class="language-plaintext highlighter-rouge">\r\n</code>, making it less obvious to readers that the full-width character takes 2 bytes 
and the half-width character only takes 1 byte. I don’t know Japanese, but according to the article both characters have the same phonetic sound, and I am pretty sure the 
two are the same in written (i.e. handwritten) language. The likely reason for this behavior is that Shift-JIS is an extension of JIS X 0201:1997, the very same encoding 
that first introduced Katakana (though the edition differs), and that it encodes the double-byte characters from JIS X 0208:1997, according to <a href="https://en.wikipedia.org/wiki/Shift_JIS">Wikipedia</a>.</p>

<p><strong>Note:</strong> A 1-byte character can also be referred to as a single-byte character, while a 2-byte character can be referred to as a double-byte character.</p>

<p>Based on the above image, we can make the following observations:</p>
<ol>
  <li>Full-Width characters take 2 bytes in Shift-JIS</li>
  <li>Half-Width characters take 1 byte in Shift-JIS</li>
  <li>UTF-8 needs 3 bytes for each of these characters, while UTF-16 needs 2 bytes per character (4 bytes if a 2-byte byte order mark is counted), so neither is more compact than Shift-JIS for Japanese text</li>
</ol>
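
<p>These observations can be cross-checked without the online tool by encoding both characters with Python’s standard codecs. This is a small sketch; <code class="language-plaintext highlighter-rouge">utf-16-be</code> is chosen so that no byte order mark is included in the count.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sizes = {}
for char in ("ア", "ｱ"):  # full-width and half-width katakana "a"
    for codec in ("shift_jis", "utf-8", "utf-16-be"):
        encoded = char.encode(codec)
        sizes[(char, codec)] = len(encoded)
        print(f"U+{ord(char):04X} {codec}: {encoded.hex()} ({len(encoded)} bytes)")
</code></pre></div></div>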

<p>Unsurprisingly, Shift-JIS was designed for Japanese and is therefore more space efficient for it than the more international/universal encodings like UTF-8. 
According to my friend and the article, Japan still requires users to switch between full-width and half-width characters. I have no clue as to why, but I have heard that Asian countries 
such as Japan and Korea can be slow to modernize their digital infrastructure despite being technology leaders and innovators. The article suggests it is due to bureaucracy and 
a work culture that does not encourage taking risks and sees no need to fix what isn’t broken.</p>

<p>The remaining content is not relevant to the title but is a refresher on hex.</p>

<h2 id="review-of-hex">Review of HEX</h2>

<p>Computers work in binary, which consists of only 0 and 1 (i.e. base 2). The decimal system we all use is base 10. Hexadecimal is base 16 and tends to be the favorite way to represent 
a series of bytes due to its more compact form (or at least that’s what it seems like to me). A hexadecimal digit has 16 possible values: 0-9 and A-F. In binary, a single bit can represent 
2 values, which can be expressed as 2^1. This means that 4 bits can represent 2^4 = 16 values, so a single hexadecimal digit can be represented using only 4 bits. Two 
hexadecimal digits therefore take 8 bits = 1 byte. That is why the half-width character <code class="language-plaintext highlighter-rouge">ｱ</code> takes up one byte, as it is <code class="language-plaintext highlighter-rouge">0xB1</code> in Shift-JIS: <code class="language-plaintext highlighter-rouge">B1</code> consists of two hexadecimal digits 
and hence only 8 bits, or 1 byte. The full-width character <code class="language-plaintext highlighter-rouge">ア</code> is <code class="language-plaintext highlighter-rouge">0x8341</code>, which consists of 4 hexadecimal digits and therefore 4 * 4 bits = 16 bits or 2 bytes.</p>
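
<p>The relationship between hex digits, bits, and bytes can be confirmed in Python. The sketch below only uses built-ins; the variable names are my own.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>value = int("B1", 16)          # parse two hex digits
bits = format(value, "08b")    # the same byte written as 8 bits
print(value, bits)

# Each hex digit is 4 bits, so 4 digits make 16 bits = 2 bytes.
full_width_bytes = bytes.fromhex("8341")
half_width_bytes = bytes.fromhex("B1")
print(len(full_width_bytes), full_width_bytes.decode("shift_jis"))
print(len(half_width_bytes), half_width_bytes.decode("shift_jis"))
</code></pre></div></div>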

<h2 id="good-resources">Good Resources</h2>
<ul>
  <li><a href="https://www.unicode.org/charts/">https://www.unicode.org/charts/</a></li>
  <li><a href="http://charset.7jp.net/dump.html">http://charset.7jp.net/dump.html</a></li>
  <li><a href="https://mailmate.jp/blog/half-width-full-width-hankaku-zenkaku-explained">https://mailmate.jp/blog/half-width-full-width-hankaku-zenkaku-explained</a></li>
</ul>]]></content><author><name>Ju Hong Kim</name></author><category term="programming" /><category term="corpus" /><category term="linguistic" /><category term="utf" /><category term="encoding" /><summary type="html"><![CDATA[A small visual look into half-width and full-width characters]]></summary></entry></feed>