http://queirozf.com/
queirozf.com - Main Entries Feed
2024-03-09T04:11:45-03:00
Technology reference and information archive.
Felipe
Jekyll

http://queirozf.com/entries/paper-summary-learning-to-summarize-from-human-feedback
Paper Summary: Learning to summarize from human feedback
2024-03-09T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/xIbcAVP.png" alt="alt-text">
<em>Learning to summarize from human feedback <a href="https://arxiv.org/pdf/2009.01325.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>RLHF<sup><a href="#myfootnote1">1</a></sup> is applied to the task of generating <em>abstractive</em> summaries of an input text.</p>
<h2 id="why">WHY</h2>
<p>The authors wanted to extend the work by Ziegler et al 2019, using offline instead of online RL and better managing the labelers.</p>
<h2 id="how">HOW</h2>
<ul>
<li><p>1) Generate and/or collect pairs of summaries and have a human labeler select the better of the two.</p></li>
<li><p>2) Train a Reward model to be able to tell which of a pair of summaries was the better one (see the loss sketch after this list).</p></li>
<li><p>3) Use the Reward model from Step 2 to train an RL model using PPO. </p>
<ul>
<li>I.e.: generate a summary for a post then get its score from the reward model, update the RL model, and repeat.</li>
</ul></li>
</ul>
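<p>A minimal sketch of the pairwise loss used to train the Reward model in Step 2 (PyTorch; the function name and toy scores are illustrative, not from the paper): the model should assign a higher score to the summary the labeler preferred.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import torch
import torch.nn.functional as F

def pairwise_reward_loss(score_preferred, score_rejected):
    # negative log-likelihood that the preferred summary wins:
    # -log(sigmoid(r_preferred - r_rejected))
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# toy reward-model scores for a batch of 3 summary pairs
score_preferred = torch.tensor([1.2, 0.3, 2.0])
score_rejected = torch.tensor([0.7, 0.9, 1.1])
print(pairwise_reward_loss(score_preferred, score_rejected))  # scalar loss
</code></pre></div>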
<p><div class="img-div" markdown="1">
<img src="//queirozf.com/images/contents/h1xSGqT.png" alt="rlhf-flow-from-stiennon-et-al">
<em>The RL flow used by Stiennon et al. Remarkably similar to the image from the <a href="https://arxiv.org/pdf/2203.02155.pdf">InstructGPT Paper</a>. <br/> <a href="https://arxiv.org/pdf/2009.01325.pdf">Source</a></em>
</div></p>
<h2 id="claims">CLAIMS</h2>
<ul>
<li>Abstractive summarization with RLHF works much better than previous baselines trained with SFT only.
<ul>
<li>Both in terms of subjective quality and ability to generalize to unseen domains.</li>
</ul></li>
</ul>
<h2 id="quotes">QUOTES</h2>
<ul>
<li><p>On generalization: <em>"Our human feedback models can also generate excellent summaries of CNN/DM news articles without any further training."</em></p></li>
<li><p>On using numeric metrics to measure subjective quality: <em>"We also find that ROUGE fails to track sample quality as our models improve."</em></p></li>
</ul>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li><p>The setup is adapted from Ziegler et al 2019.</p></li>
<li><p>Model architecture is based on GPT-3, using 1.3B and 6.7B parameters.</p></li>
<li><p>TL;DR summarization dataset from Reddit.</p></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p>The RL setup is offline, not online as in the previous paper by Ziegler et al.</p></li>
<li><p>The initial generation of summaries is done with a simple LLM, and it's fully <em>in-context</em>.</p></li>
<li><p>The KL-divergence correction in the reward function, which prevents the RL model from finding reward hacks, seems to have been introduced here (see the sketch after this list).</p></li>
</ul>
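<p>The KL correction in the last note can be sketched as follows: the reward handed to PPO is the reward-model score minus a penalty for drifting away from the supervised (SFT) policy. A plain-Python sketch with illustrative numbers, assuming per-summary log-probabilities are available:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">def kl_corrected_reward(rm_score, logprob_policy, logprob_sft, beta=0.1):
    # R(x, y) = r(x, y) - beta * log(pi_RL(y|x) / pi_SFT(y|x))
    return rm_score - beta * (logprob_policy - logprob_sft)

# the more the policy over-weights this summary relative to the SFT
# model, the larger the penalty
print(kl_corrected_reward(rm_score=1.5, logprob_policy=-10.0, logprob_sft=-14.0))
# 1.5 - 0.1 * 4.0 = 1.1
</code></pre></div>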
<h2 id="my-2">MY 2¢</h2>
<ul>
<li><p>Interesting points about the length vs quality tradeoff. Models that generate longer summaries may be taken to be better but they are kind of <em>cheating</em>.</p></li>
<li><p>The authors say that generating the summaries was done in a <em>zero-shot</em> manner but they then say that they provided examples in the context, which makes it <em>few-shot</em> (<strong>not</strong> zero-shot) :thinking:.</p></li>
<li><p>The authors correctly predict several problems related to hallucinations and possible bias generated by using humans to direct model preferences.</p></li>
</ul>
<hr>
<p><a name="myfootnote1">1</a>: The acronym RLHF is nowhere to be found in this article, however.</p>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/2009.01325.pdf">Arxiv: Learning to summarize from human feedback</a></li>
</ul>
2024-03-08T22:47:38-03:00

http://queirozf.com/entries/paper-summary-zephyr-direct-distillation-of-lm-alignment
Paper Summary: Zephyr: Direct Distillation of LM Alignment
2024-01-14T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/8N0RD5w.png" alt="alt-text">
<em>Zephyr: Direct Distillation of LM Alignment <a href="https://arxiv.org/pdf/2310.16944.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>Authors instruction-tune Mistral-7B vanilla by <em>distillation</em>: using <a href="https://queirozf.com/entries/paper-summary-direct-preference-optimization-your-language-model-is-secretly-a-reward-model">DPO</a> on open preference datasets and samples generated from previously aligned teacher models.</p>
<h2 id="why">WHY</h2>
<p>Because traditional distillation strategies are only good at transferring stylistic — not alignment capabilities.</p>
<h2 id="how">HOW</h2>
<p>Starting with Mistral-7B as the V0 model:</p>
<ul>
<li><p>1) Run SFT on V0 using input/output pairs from the UltraChat dataset, generating model V1.</p></li>
<li><p>2) Take inputs from the UltraFeedback dataset and feed each one to several intermediary models (Claude, Falcon, etc.), generating multiple output variations for the same input.</p></li>
<li><p>3) For each input from step 2, feed all the output variations to the teacher model (GPT-4) and ask it to select the best one (see the sketch after this list).</p></li>
<li><p>4) Use DPO to align model V1, using the best output for each input, as selected in step 3.<sup><a href="#myfootnote1">1</a></sup></p></li>
</ul>
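<p>Steps 2-4 amount to building a preference dataset with no human labelers. A rough sketch of that bookkeeping (the <code>models</code> and <code>teacher_score</code> callables are stand-ins for the intermediary models and GPT-4, not actual APIs):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import random

def build_preference_pairs(prompts, models, teacher_score):
    # for each prompt: one completion per model, teacher scores them,
    # keep (best, random other) as a (chosen, rejected) pair
    pairs = []
    for prompt in prompts:
        completions = [model(prompt) for model in models]
        scores = [teacher_score(prompt, c) for c in completions]
        best = completions[scores.index(max(scores))]
        rejected = random.choice([c for c in completions if c is not best])
        pairs.append({"prompt": prompt, "chosen": best, "rejected": rejected})
    return pairs

# stub example: two fake "models" and a teacher that prefers longer outputs
models = [lambda p: p + " short answer", lambda p: p + " a much longer answer"]
teacher = lambda prompt, completion: len(completion)
print(build_preference_pairs(["Explain DPO."], models, teacher))
</code></pre></div>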
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>It's possible to transfer alignment capabilities from teacher models using the suggested approach.</p></li>
<li><p>The DPO model quickly overfits when trained for longer.</p></li>
<li><p>Zephyr-7B outperforms 70B models (such as Llama-chat-70B) on some benchmarks.</p></li>
</ul>
<h2 id="quotes">QUOTES</h2>
<ul>
<li><em>"... without an initial SFT step ... models are not able to learn at all from feedback and perform terribly."</em>
<ul>
<li>This is interesting. We can't jump to reward modeling without the initial SFT step.</li>
</ul></li>
</ul>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li><p>Mistral-7B</p></li>
<li><p>Other aligned LLMs as teachers: Claude, Falcon, Llama, GPT-4.</p></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p><em>Distillation</em> appears to be the default term for extracting the capabilities of a "teacher" model into a simpler and cheaper "student" model. Apparently it was introduced by <a href="https://arxiv.org/abs/1503.02531">Hinton et al 2015</a>.</p></li>
<li><p>Zephyr-7B was fully optimized for Helpfulness only.</p></li>
</ul>
<hr>
<p><a name="myfootnote1">1</a>: More precisely, DPO is optimized using the best response to each each, but contrasting it to a randomly chosen response. It doesn't classify response, it <em>ranks</em> them</p>
2024-01-01T21:52:09-03:00

http://queirozf.com/entries/examples-installing-and-updating-packages-with-apt
Examples: Installing and Updating Packages with Apt
2024-01-01T00:00:00-03:00
Felipe
<blockquote>
<p><span style="color:red; font-weight:bold">WIP Alert</span> This is a work in progress. Current information is correct but more content may be added in the future.</p>
</blockquote>
<h2 id="list-installed-packages">List installed packages</h2>
<p>Use <code>apt list --installed</code>, optionally piping through <code>grep</code> to limit the search.</p>
<div class="highlight"><pre><code class="language-" data-lang="">$ apt list --installed | grep qua
pngquant/focal,now 2.12.2-1 amd64 [installed]
quarto/now 1.3.433 amd64 [installed,local]
</code></pre></div>
<h2 id="install-deb-package">Install .deb package</h2>
<p><code>sudo dpkg -i my-file.deb</code> </p>
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 287px !important; height: 287px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_large_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:336px;height:280px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="7164375745"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<h2 id="uninstall-deb-package">Uninstall .deb package</h2>
<blockquote>
<p>To find out the package name, <a href="#get-package-name-from-deb-file">get package from .deb file</a></p>
</blockquote>
<p>Run <code>$ sudo apt remove my-package-name</code> where <code>my-package-name</code> is the name of the package installed by the <code>.deb</code> file. </p>
<h2 id="get-package-name-from-deb-file">Get package name from .deb file</h2>
<p>Run <code>$ dpkg --info my_deb_file.deb</code></p>
<div class="highlight"><pre><code class="language-" data-lang="">$ dpkg --info quarto-1.3.433-linux-amd64.deb | grep Package:
Package: quarto
</code></pre></div>
2023-12-31T21:15:40-03:00

http://queirozf.com/entries/git-examples-reverting-a-file-from-a-branch
Git Examples: Reverting a File from a Branch
2023-12-27T00:00:00-03:00
Felipe
<blockquote>
<p>Git version 2.x used</p>
</blockquote>
<h2 id="revert-file-from-branch">Revert file from branch</h2>
<p>Use <code>git checkout</code> to retrieve a file from <code>my-other-branch</code></p>
<div class="highlight"><pre><code class="language-" data-lang="">$ git checkout my-other-branch -- path/to/file
</code></pre></div>
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 287px !important; height: 287px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_large_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:336px;height:280px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="7164375745"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<h2 id="revert-file-from-remote-branch-using-git-log-and-checkout">Revert file from remote branch (using git log and checkout)</h2>
<ul>
<li><p>Retrieve the last commit hash of the remote branch</p>
<div class="highlight"><pre><code class="language-" data-lang="">$ git log origin/main
commit 123456abcde (origin/main, origin/HEAD, main)
Author: John Doe <john-doe@example.com>
Date: Mon Jul 3 12:44:25 2023 -0300
some commit message
</code></pre></div></li>
<li><p>Revert file to that commit hash with <code>git checkout</code> (like <a href="#restore-file-from-previous-commit">Restore file from Previous commit</a>)</p>
<div class="highlight"><pre><code class="language-" data-lang="">$ git checkout 123456abcde -- path/to/your/file
</code></pre></div></li>
</ul>
<h2 id="revert-file-from-remote-branch-using-fetch-and-checkout">Revert file from remote branch (using fetch and checkout)</h2>
<ul>
<li><p>Use git fetch to make sure your local branches are up to date</p>
<div class="highlight"><pre><code class="language-" data-lang="">$ git fetch --all
</code></pre></div></li>
<li><p>Then just use checkout to reset the file to <code>origin/my-other-branch</code></p>
<div class="highlight"><pre><code class="language-" data-lang="">$ git checkout origin/my-other-branch -- path/to/your/file
</code></pre></div></li>
</ul>
2023-12-26T23:49:38-03:00

http://queirozf.com/entries/paper-summary-constitutional-ai
Paper Summary: Constitutional AI
2023-11-20T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/5L0Csyz.png" alt="front-page-of-article-constitutional-ai">
<em>Constitutional AI <a href="https://arxiv.org/pdf/2212.08073.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>Constitutional AI (CAI) is a strategy for fine-tuning LLMs so that they place a higher value on harmlessness<sup><a href="#myfootnote1">1</a></sup> without being overly evasive.</p>
<p>CAI employs Reinforcement Learning from AI Feedback (RLAIF), standing in contrast to the RLHF used by <a href="https://queirozf.com/entries/paper-summary-training-language-models-to-follow-instructions-with-human-feedback">InstructGPT</a> and similar models.</p>
<h2 id="why">WHY</h2>
<p>To improve upon RLHF, such that:</p>
<ul>
<li><p>Fewer human-provided labels are needed;</p></li>
<li><p>The model can be <em>steered</em> with a set of principles, i.e. a <em>Constitution</em>;</p></li>
<li><p>The model chooses clarity over <em>evasion</em> when rejecting prompts that conflict with its principles.</p></li>
</ul>
<h2 id="how">HOW</h2>
<p>1) Using a third-party fine-tuned LLM optimized exclusively for helpfulness, generate outputs for prompts selected for their "toxicity".</p>
<p>2) Ask the third-party LLM to critique and then revise the outputs from Step 1 according to a randomly chosen principle in the constitution.</p>
<p>3) Repeat step 2 multiple times, for a variety of inputs and constitution principles.</p>
<p>4) Fine-tune a vanilla LLM in a supervised fashion using the toxic inputs and the critiqued outputs.</p>
<p>5) Use the fine-tuned model from Step 4 to generate two outputs (at a high temperature) for each toxic input.</p>
<p>6) Build a preference dataset from the output of Step 5, by:</p>
<ul>
<li><p>Creating a multiple-choice question with each input-output pair along with one of the Constitution principles.</p></li>
<li><p>Asking the fine-tuned model which of those two outputs is more aligned with the given principle and using its answer as the label (see the sketch after these steps).</p></li>
</ul>
<p>7) Join the dataset produced by Step 6 with a third-party human-labeled helpfulness preference dataset.</p>
<p>8) Use the dataset from Step 7 to train a preference model (PM).</p>
<p>9) Use the PM from Step 8 to run a Reinforcement Learning (RL) loop to fine-tune the model from Step 4, arriving at the final version.</p>
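<p>Step 6 is the "AI feedback" part: the model itself produces the preference labels. A minimal sketch of one labeling round (the <code>ask_model</code> callable is a placeholder for the fine-tuned model, and the prompt wording is illustrative, not the paper's):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">def ai_preference_label(ask_model, prompt, output_a, output_b, principle):
    # build a multiple-choice question pairing the two outputs with one
    # randomly drawn constitutional principle; use the answer as the label
    question = (
        "Consider the following principle: " + principle + "\n"
        "Prompt: " + prompt + "\n"
        "(A) " + output_a + "\n(B) " + output_b + "\n"
        "Which response is more aligned with the principle? Answer A or B."
    )
    answer = ask_model(question)
    return output_a if answer.strip().startswith("A") else output_b

# stub example: a fake model that always answers "A"
print(ai_preference_label(lambda q: "A", "some toxic prompt",
                          "output one", "output two",
                          "Choose the less harmful response."))
</code></pre></div>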
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>Authors claim that using Chain-of-Thought to explain why some inputs aren't given a helpful answer is a good way to defuse the tension between helpfulness and harmlessness.</p></li>
<li><p>Authors devised a way to encode generic constraints to the outputs via a Constitution.</p></li>
<li><p>Authors created an algorithm to reduce the level of harmfulness while not being overly evasive when refusing to answer questions.</p></li>
<li><p>Authors used AI itself to create a preference model, to be used in an RL loop to fine-tune vanilla LLMs.</p></li>
</ul>
<h2 id="quotes">QUOTES</h2>
<ul>
<li>On RLHF: <em>"RLHF typically uses tens of thousands of human preference labels."</em></li>
</ul>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li>HH Models from Anthropic's previous article, <a href="https://arxiv.org/pdf/2204.05862.pdf">Bai et al, 2022</a></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li>Anthropic's <a href="https://www.anthropic.com/product">Claude</a> was trained using Constitutional AI. The constitution used can be found <a href="https://www.anthropic.com/index/claudes-constitution">here</a>.</li>
</ul>
<h2 id="my-2">MY 2¢</h2>
<ul>
<li><p>The whole thing seems to depend on a previously fine-tuned LLM optimized exclusively for helpfulness.</p></li>
<li><p>Using Chain-of-Thought to avoid evasive answers doesn't increase Helpfulness, from the point of view of the user. It's just trying to educate people according to the principles in the Constitution.</p></li>
<li><p>The relative weights of each "H" in HH models don't seem to be mentioned, but they will affect the model behavior. A 50/50 model will be very different from an 80/20 or a 20/80 model.</p></li>
</ul>
<hr>
<h3 id="footnotes">Footnotes</h3>
<p><a name="myfootnote1">1</a>: Over <em>honesty</em> and <em>helpfulness</em>, the other 2 "H's" of alignment.</p>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/2212.08073.pdf">Arxiv: Constitutional AI</a></li>
</ul>
2023-11-15T21:20:33-03:00

http://queirozf.com/entries/pytest-examples-handling-exceptions
Pytest Examples: Handling Exceptions
2023-10-29T00:00:00-03:00
Felipe
<blockquote>
<p>Python 3x+, Pytest 7x+ used unless otherwise stated.</p>
</blockquote>
<h2 id="assert-exception-is-raised">Assert exception is raised</h2>
<p>Use <code>with pytest.raises(ValueError):</code> as a context manager:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># inside my_test.py</span>
<span class="k">def</span> <span class="nf">test_raises_index_error</span><span class="p">():</span>
<span class="c"># test will success if an IndexError is raised</span>
<span class="k">with</span> <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span><span class="nb">IndexError</span><span class="p">):</span>
<span class="n">arr</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span>
<span class="n">arr</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
</code></pre></div>
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 287px !important; height: 287px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_large_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:336px;height:280px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="7164375745"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<h2 id="assert-exception-with-specific-text">Assert exception with specific text</h2>
<p>Use <code>pytest.raises(<class>, match=<regular_expression>)</code>. <code><regular_expression></code> supports whatever you can use in <a href="https://docs.python.org/3/library/re.html#re.search">re.search</a>.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># inside my_test.py</span>
<span class="k">def</span> <span class="nf">test_raises_specific_exception</span><span class="p">():</span>
<span class="c"># test will success if a ValueError is raised,</span>
<span class="c"># but only if the text contains a number starting with "5"</span>
<span class="c"># (e.g. 500 or 503 HTTP errors)</span>
<span class="k">with</span> <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span><span class="nb">RuntimeError</span><span class="p">,</span> <span class="n">match</span><span class="o">=</span><span class="s">r"5</span><span class="err">\</span><span class="s">d+"</span><span class="p">):</span>
<span class="n">some_code_that_raises_the_exception</span><span class="p">()</span>
</code></pre></div>
2023-10-18T20:08:25-03:00

http://queirozf.com/entries/troubleshooting-colima-start-problems
Troubleshooting Colima Start Problems
2023-10-18T00:00:00-03:00
Felipe
<blockquote>
<p>All examples run on an Intel-MacOS</p>
</blockquote>
<h2 id="waiting-for-the-essential-requirement-1-of-5-ssh">Waiting for the essential requirement 1 of 5: ssh</h2>
<p>This has many possible reasons. In my case, what solves this is running these two commands:</p>
<ul>
<li><p><code>$ colima delete</code></p></li>
<li><p><code>$ colima start --arch x86_64</code></p></li>
</ul>
<p>You will <em>still</em> see the dreaded message once or twice, but it should work.</p>
<hr>
<h3 id="references">References</h3>
<ul>
<li><p><a href="https://github.com/abiosoft/colima/issues/424#issuecomment-1335912905">The original thread on github</a></p></li>
<li><p><a href="https://github.com/abiosoft/colima/issues/777#issuecomment-1676135303">Sometimes it's caused by a regression so you need to downgrade</a></p></li>
</ul>
2023-10-01T13:03:50-03:00

http://queirozf.com/entries/pyenv-and-jupyter-notebook-integration
Pyenv and Jupyter Notebook Integration
2023-09-16T00:00:00-03:00
Felipe
<blockquote>
<p>Examples run on MacOS</p>
</blockquote>
<h2 id="add-pyenv-environment-as-kernel">Add pyenv Environment as Kernel</h2>
<p>Running <a href="//queirozf.com/entries/jupyter-kernels-how-to-add-change-remove#add-virtualenv-as-python-kernel">ipython kernel install</a> <em>alone</em> doesn't seem to work for PyEnv.</p>
<p>Do this instead:</p>
<p>1) Activate the environment: <code>$ pyenv activate my-venv</code></p>
<p>2) Install the kernel with <code>$ ipython kernel install --name "my-venv" --user</code> (This creates a file at <code>~/Library/Jupyter/kernels/my-venv/kernel.json</code>)</p>
<p>3) The created file (on the path above) will probably have the <strong>wrong</strong> path to the Python executable.<sup><a href="#myfootnote1">1</a></sup> Open it and edit it to point to the PyEnv executable:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"> <span class="p">{</span>
<span class="s">"argv"</span><span class="p">:</span> <span class="p">[</span>
<span class="s">"~/.pyenv/versions/my-venv/bin/python"</span><span class="p">,</span> <span class="c"># <-- HERE</span>
<span class="s">"-m"</span><span class="p">,</span>
<span class="s">"ipykernel_launcher"</span><span class="p">,</span>
<span class="s">"-f"</span><span class="p">,</span>
<span class="s">"{connection_file}"</span>
<span class="p">],</span>
<span class="s">"display_name"</span><span class="p">:</span> <span class="s">"my-venv"</span><span class="p">,</span>
<span class="s">"language"</span><span class="p">:</span> <span class="s">"python"</span><span class="p">,</span>
<span class="s">"metadata"</span><span class="p">:</span> <span class="p">{</span>
<span class="s">"debugger"</span><span class="p">:</span> <span class="n">true</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
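<p>To confirm the kernel now points at the PyEnv interpreter, a quick sanity check (not part of the original steps) is to run this in a notebook cell using that kernel:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import sys
print(sys.executable)  # should print ~/.pyenv/versions/my-venv/bin/python
</code></pre></div>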
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 287px !important; height: 287px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_large_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:336px;height:280px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="7164375745"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<h2 id="troubleshooting-no-module-named-ipykernel_launcher">Troubleshooting: No module named ipykernel_launcher</h2>
<p>You need to install <code>ipykernel</code> in the virtualenv you want to use.</p>
<hr>
<h3 id="references">References</h3>
<p><a name="myfootnote1">1</a>: In my case it pointed to a native Python version: <code>/usr/local/opt/python@3.11/bin/python3.11</code></p>
2023-08-17T12:24:43-03:00

http://queirozf.com/entries/sublime-4-productivity-examples-keymaps-snippets-macros
Sublime 4 Productivity Examples: Keymaps, Snippets, Macros
2023-08-06T00:00:00-03:00
Felipe
<blockquote>
<p>Examples assume Sublime text version 4</p>
<p><a href="https://www.sublimetext.com/docs/key_bindings.html">All sublime-text bindings here</a></p>
</blockquote>
<h2 id="replace-characters">Replace characters</h2>
<p><strong>Example:</strong> Add a new Key Binding to Make <code>"--"</code> expand to <code>"&mdash;"</code></p>
<ul>
<li><p>Open <em>Settings</em> -> <em>Key Bindings</em>. This will open a file such as <code>Default (Linux).sublime-keymap</code></p></li>
<li><p>Add the following to that file:</p>
<div class="highlight"><pre><code class="language-" data-lang="">[
{ "keys": ["-", "-"], "command": "insert", "args": {"characters": "&mdash;"}}
]
</code></pre></div></li>
</ul>
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 287px !important; height: 287px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_large_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:336px;height:280px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="7164375745"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<h2 id="run-snippet-on-selection">Run snippet on selection</h2>
<p><strong>Example</strong>: Wrap the selected text with <code>"**"</code> to make markdown text bold when you hit <code>ctrl</code>+<code>b</code></p>
<ul>
<li><p>Open <em>Settings</em> -> <em>Key Bindings</em>.</p></li>
<li><p>Add the following to that file:</p>
<div class="highlight"><pre><code class="language-" data-lang="">[
{ "keys": ["ctrl+b"], "command": "insert_snippet", "args": {"contents": "**${0:$SELECTION}**"}}
]
</code></pre></div></li>
</ul>
2023-08-05T17:21:32-03:00

http://queirozf.com/entries/paper-summary-llama-2-open-foundation-and-fine-tuned-chat-models
Paper Summary: Llama 2: Open Foundation and Fine-Tuned Chat Models
2024-01-14T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/XGmLDJo.png" alt="llama-2-article-cover-arxiv">
<em>Llama 2: Open Foundation and Fine-Tuned Chat Models <a href="https://arxiv.org/pdf/2307.09288.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>Updated version of LLaMA 1 (<a href="https://queirozf.com/entries/paper-summary-llama-open-and-efficient-foundation-language-models">summary</a>) with more data (still fully open), double the context size, and enhanced attention.</p>
<p>Two model variations are published: a vanilla LLM and an instruction-tuned version.</p>
<h2 id="how">HOW</h2>
<ul>
<li><p>LLaMA-2: Similar to LLaMA-1, with 40% more data (only public data), better data cleaning and larger context. One epoch over the training data. Also, enhanced attention.</p></li>
<li><p>LLaMA-2-chat: SFT and RLHF instruction-tuning on top of LLaMA-2.</p></li>
</ul>
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>Using a smaller but higher-quality preference dataset yields better results.</p></li>
<li><p>RLHF is responsible for most of the increase in instruction-following performance.</p></li>
</ul>
<h2 id="quotes">QUOTES</h2>
<ul>
<li><p>Small but high-quality instruction-following data for SFT: <em>"We found that SFT annotations in the order of tens of thousands was (sic) enough to achieve a high-quality result. We stopped annotating SFT after collecting a total of 27,540 annotations"</em></p></li>
<li><p>Reward model initialization: <em>"We initialize our reward models from pretrained chat model checkpoints, as it ensures that both models benefit from knowledge acquired in pretraining. In short, the reward model “knows” what the chat model knows."</em></p></li>
</ul>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li><p>Main architectural decisions from LLaMA-1 (Touvron et al., 2023).</p></li>
<li><p>Grouped-query Attention (GQA), from Ainslie et al., 2023.</p></li>
<li><p>RLHF loop from Instruct-GPT (Ouyang et al., 2022).</p>
<ul>
<li>But they experiment with Rejection Sampling Fine-tuning instead of PPO (see the sketch after this list).<br></li>
</ul></li>
</ul>
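<p>Rejection Sampling Fine-tuning, mentioned in the list above, is conceptually simple: sample several completions per prompt, keep the one the reward model scores highest, and run another SFT pass on those winners. A sketch with stand-in <code>policy</code> and <code>reward</code> callables (not Meta's actual code):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">def rejection_sample(policy, reward, prompt, k=4):
    # draw k candidates and keep the highest-reward one; the kept
    # (prompt, completion) pairs then feed another round of SFT
    candidates = [policy(prompt) for _ in range(k)]
    return max(candidates, key=lambda c: reward(prompt, c))

# stub example: candidates drawn from a fixed pool, reward = length
pool = iter(["short", "a bit longer", "the longest completion here", "mid"])
policy = lambda prompt: next(pool)
reward = lambda prompt, completion: len(completion)
print(rejection_sample(policy, reward, "Summarize:"))  # picks the longest
</code></pre></div>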
<h2 id="notes">NOTES</h2>
<ul>
<li><p>Just like the DPO paper (<a href="https://queirozf.com/entries/paper-summary-direct-preference-optimization-your-language-model-is-secretly-a-reward-model">summary</a>), the authors used GPT-4 to evaluate the models subjectively.</p></li>
<li><p>Authors tried to decrease hallucination by oversampling known trusted sources.</p></li>
<li><p>Two reward models were trained: one optimized only for helpfulness, the other only for safety.</p></li>
<li><p>The reward model is also a transformer-based LM (but trained for regression instead of predicting the next token).</p></li>
<li><p>Authors introduce a variant of Attention during fine-tuning, called Ghost Attention. The objective is to help the optimizer learn from multi-turn messaging like a chat conversation.</p></li>
<li><p>Authors used red-team adversarial attacks on the model, to test its safety.</p></li>
</ul>
<h2 id="my-2">MY 2¢</h2>
<ul>
<li>PPL (perplexity) shows no sign of saturation as more tokens are used (Figure 5).</li>
</ul>
<hr>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/2307.09288.pdf">Arxiv: Touvron et al 2023: Llama 2: Open Foundation and Fine-Tuned Chat Models</a></li>
</ul>
2023-08-01T20:41:25-03:00

http://queirozf.com/entries/python-dependency-management-examples-and-reference
Python Dependency Management: Examples and Reference
2023-08-06T00:00:00-03:00
Felipe
<blockquote>
<p><span style="color:red; font-weight:bold">WIP Alert</span> This is a work in progress. Current information is correct but more content may be added in the future.</p>
</blockquote>
<h2 id="get-path-to-site-packages">Get path to site-packages</h2>
<blockquote>
<p>Must activate virtualenv, if applicable</p>
</blockquote>
<p>Run <code>python -m site</code>. It will be listed (usually as the last element).</p>
<div class="highlight"><pre><code class="language-" data-lang=""># python -m site
sys.path = [
'/momo/src/momo',
'/usr/local/lib/python39.zip',
'/usr/local/lib/python3.9',
'/usr/local/lib/python3.9/lib-dynload',
'/usr/local/lib/python3.9/site-packages', <--- HERE
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.9/site-packages' (doesn't exist)
ENABLE_USER_SITE: True
</code></pre></div>
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 287px !important; height: 287px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_large_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:336px;height:280px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="7164375745"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
2023-07-22T17:33:08-03:00

http://queirozf.com/entries/paper-summary-deep-reinforcement-learning-from-human-preferences
Paper Summary: Deep Reinforcement Learning from Human Preferences
2023-07-16T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/r4stcNj.png" alt="deep-reinforcement-learning-from-human-preferences-cover">
<em>Deep Reinforcement Learning from Human Preferences <a href="https://arxiv.org/pdf/1706.03741.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>An algorithm to estimate a reward function using human opinions. The function is then optimized in a Reinforcement Learning (RL) setting.</p>
<p>This approach is now called RLHF (Reinforcement Learning from Human Feedback).</p>
<h2 id="why">WHY</h2>
<p>Because it isn't practical to mathematically formulate a reward function for some types of RL problems. But it <em>is</em> possible to ask humans to subjectively rate how <em>preferable</em> a given state is.</p>
<h2 id="how">HOW</h2>
<ul>
<li><p><strong>1)</strong> Show humans pairs of states and ask them to <em>rank</em> these states in terms of desirability (i.e. say which state is preferable);</p></li>
<li><p><strong>2)</strong> Learn a reward function in a supervised manner using the data from step 1;</p></li>
<li><p><strong>3)</strong> Train an RL model using the learned reward function as a proxy for the real reward.</p></li>
</ul>
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>It is possible to use a learned reward function built from human preferences.</p></li>
<li><p>In some cases, a learned reward function performs better than an actual mathematical reward function.</p></li>
</ul>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li>OpenAI Gym</li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p>Performance is evaluated on a set of robotics and video-game-playing RL tasks.</p></li>
<li><p>In addition to human feedback, authors also used so-called <em>synthetic</em> feedback—preferences generated from the task's true reward signal rather than from humans.</p></li>
</ul>
<h2 id="my-2">MY 2¢</h2>
<ul>
<li><p>The term "RLHF" is not mentioned in the article.</p></li>
<li><p>RLHF is not introduced in this article. The authors' contributions revolve around making the process more efficient.</p></li>
<li><p>RLHF is relevant for NLP and instruction-tuning because it is not trivial to estimate how <em>appropriate</em> an output is to a given instruction. RLHF can be used to fine-tune a pre-trained LLM.</p></li>
<li><p>There exists a way to produce a reward function from pairwise preference rankings—the Bradley-Terry model (sketched below).</p></li>
</ul>
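<p>The Bradley-Terry model mentioned above is what turns pairwise rankings into a trainable objective: the probability that segment 1 is preferred is a logistic function of the difference in total predicted reward. A plain-Python sketch with toy per-step rewards (not numbers from the paper):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import math

def preference_probability(rewards_1, rewards_2):
    # P(segment 1 preferred) = exp(sum r1) / (exp(sum r1) + exp(sum r2)),
    # i.e. sigmoid(sum r1 - sum r2)
    diff = sum(rewards_1) - sum(rewards_2)
    return 1.0 / (1.0 + math.exp(-diff))

print(preference_probability([0.5, 0.7, 0.9], [0.4, 0.4, 0.5]))  # ~0.69
</code></pre></div>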
<hr>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/1706.03741.pdf">Arxiv: Christiano et al., 2017: Deep Reinforcement Learning from Human Preferences</a></li>
</ul>
2023-07-15T19:48:15-03:00

http://queirozf.com/entries/jenv-examples-on-macos
Jenv Examples on MacOS
2023-07-09T00:00:00-03:00
Felipe
<h2 id="list-available-java-versions">List available java versions</h2>
<div class="highlight"><pre><code class="language-" data-lang="">$ jenv versions
system
1.8
1.8.0.362
19.0
19.0.2
openjdk64-19.0.2
* temurin64-1.8.0.362 (set by /Users/felipe.almeida/.jenv/version)
</code></pre></div>
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 287px !important; height: 287px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_large_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:336px;height:280px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="7164375745"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<h2 id="add-java-version-installed-with-homebrew">Add java version installed with homebrew</h2>
<p>Example with OpenJDK 11 installed via <code>brew install openjdk@11</code></p>
<p>Add it to <code>jenv</code>: <code>$ jenv add /usr/local/opt/openjdk@11/libexec/openjdk.jdk/Contents/Home/</code></p>
<h3 id="references">References</h3>
<ul>
<li><a href="https://www.jenv.be/">jenv website</a></li>
</ul>
2023-07-07T15:28:51-03:00

http://queirozf.com/entries/paper-summary-finetuned-language-models-are-zero-shot-learners
Paper Summary: Fine-tuned Language models are Zero-Shot Learners
2023-07-07T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/Ol0e1GY.png" alt="flan-finetuned-models-are-zero-shot-learners">
<em>Finetuned Language models are Zero-Shot Learners <a href="https://arxiv.org/pdf/2109.01652.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>Fine-tune LaMDA-PT 137B with NLP tasks framed as natural language instructions. The final model is called FLAN.</p>
<h2 id="why">WHY</h2>
<p>To understand the impact of instruction-tuning LMs for free-form NLP problems.</p>
<h2 id="how">HOW</h2>
<ul>
<li><p>Took supervised datasets covering 12 NLP task clusters and rewrote them as natural-language instructions (see the sketch after this list).</p></li>
<li><p>Fine-tuned a LaMDA-PT 137B model on the rewritten tasks.</p></li>
<li><p>Compared the results from the fine-tuned model (FLAN) with the pre-trained version (LaMDA-PT) and GPT-3 on several regimes<sup><a href="#myfootnote1">1</a></sup> and tasks.</p></li>
</ul>
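<p>The rewriting in the first step is essentially templating: each supervised example is rendered through several natural-language instruction templates. A toy sketch (the templates are illustrative, not the paper's actual ones):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># a toy NLI example rendered through two hypothetical instruction templates
templates = [
    "Premise: {premise}\nHypothesis: {hypothesis}\nDoes the premise entail the hypothesis?",
    "Read the following and answer yes or no: {premise}\nCan we infer that \"{hypothesis}\"?",
]
example = {"premise": "The cat sat on the mat.", "hypothesis": "A cat is sitting."}
for template in templates:
    print(template.format(**example))
</code></pre></div>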
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>FLAN outperforms GPT-3 (untuned) on most zero-shot tasks.</p></li>
<li><p>FLAN performs better using zero-shot in some tasks than GPT-3 using few-shot examples.</p></li>
<li><p>Instruction-tuning enhances results even on unseen tasks.</p></li>
</ul>
<p><div class="img-div" markdown="1">
<img src="//queirozf.com/images/contents/zFtYA8R.png" alt="effect-of-number-of-params-on-scaling-flan">
<em>Fine-tuning only helps once the pre-trained <br/> model reaches a minimum number of parameters. Under that threshold, <b>fine-tuning <br/> actually hurts performance.</b> <a href="https://arxiv.org/pdf/2109.01652.pdf">Source</a></em>
</div></p>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li><p>LaMDA-PT 137B</p></li>
<li><p>Data processing from T5 <a href="https://queirozf.com/entries/paper-summary-exploring-the-limits-of-transfer-learning-with-a-unified-text-to-text-transformer">summary</a></p></li>
<li><p><em>Prompt Tuning</em> (Lester et al., 2021)</p></li>
</ul>
<hr>
<h3 id="references">References</h3>
<p><a name="myfootnote1">1</a>: Zero-shot and few-shot learning.</p>
<ul>
<li><a href="https://arxiv.org/pdf/2109.01652.pdf">Arxiv: Wei et al., 2022: Fine-tuned Language models are Zero-Shot Learners</a></li>
</ul>
2023-07-02T16:39:47-03:00

http://queirozf.com/entries/paper-summary-cross-task-generalization-via-natural-language-crowdsourcing-instructions
Paper Summary: Cross-Task Generalization via Natural Language Crowdsourcing Instructions
2023-07-02T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/i7peRXW.png" alt="mishra-et-al-2022-instruction-following">
<em>Cross-Task Generalization via Natural Language Crowdsourcing Instructions <a href="https://aclanthology.org/2022.acl-long.244.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<ul>
<li><p>Build a dataset with pairs of high-quality instruction-following examples;</p></li>
<li><p>Measure how fine-tuned models perform when trained to follow those instructions.</p></li>
</ul>
<h2 id="why">WHY</h2>
<ul>
<li><p>To provide a dataset for other people to build upon.</p></li>
<li><p>To examine the tradeoff between fine-tuning a smaller model vs using a much larger model.</p></li>
</ul>
<h2 id="how">HOW</h2>
<ul>
<li><p>Build a dataset with examples of instructions and fine-tune a pre-trained LM on those.</p></li>
<li><p>The dataset consists of instructions and task examples, so models are queried in a <em>few-shot</em> setting.</p></li>
</ul>
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>LMs fine-tuned for instruction-following can generalize into task instances and even task <em>types</em> not seen in the training dataset.</p></li>
<li><p>A 170M-parameter model (BART), when fine-tuned, is better at following instructions than GPT-3 with 175B parameters.</p></li>
</ul>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li>BART LM (Lewis et al., 2019)</li>
</ul>
<h2 id="quotes">QUOTES</h2>
<ul>
<li>Authors didn't try to fine-tune GPT-3, apparently because they didn't have enough compute resources: <em>"We cannot fine-tune the parameters of [GPT-3] and use it as-is under its default setting"</em></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p>Uses ROUGE for evaluation (generated vs actual)</p></li>
<li><p>Examples in the evaluation set are not from <em>different</em> tasks as those in the training set—they are different <em>examples</em> of the same tasks.</p></li>
</ul>
<h2 id="my-2">MY 2¢</h2>
<ul>
<li><p>Why don't people use this instruction dataset more often?</p></li>
<li><p>This is an updated version of a 2021 paper called "Natural Instructions: Benchmarking generalization to new tasks from natural language instructions". It is sometimes referenced by its old name.</p></li>
</ul>
<hr>
<h3 id="references">References</h3>
<ul>
<li><a href="https://aclanthology.org/2022.acl-long.244.pdf">ACL: Mishra et al., 2022: Cross-Task Generalization via Natural Language Crowdsourcing Instructions</a></li>
</ul>
2023-06-25T07:38:04-03:00

http://queirozf.com/entries/python-3-regex-named-capture-examples
Python 3 Regex: Named Capture Examples
2023-06-25T00:00:00-03:00
Felipe
<h2 id="extract-named-capture-groups">Extract Named capture groups</h2>
<blockquote>
<p><code>re.match</code> only matches at the start of the string!</p>
</blockquote>
<p>Extract matches into a dict, using <code>re.match</code> and <code>.groupdict()</code>:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">re</span>
<span class="c"># a word followed by a comma and then another word followed by a period</span>
<span class="n">pattern</span> <span class="o">=</span> <span class="s">r'(?P<param1>[</span><span class="err">\</span><span class="s">w]+),(?P<param2>[</span><span class="err">\</span><span class="s">w]+)</span><span class="err">\</span><span class="s">.'</span>
<span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span><span class="s">'foo,bar.'</span><span class="p">)</span><span class="o">.</span><span class="n">groupdict</span><span class="p">()</span>
<span class="c"># >>> {'param1': 'foo', 'param2': 'bar'}</span>
</code></pre></div>
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 287px !important; height: 287px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_large_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:336px;height:280px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="7164375745"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<h2 id="re-search">Re.search</h2>
<blockquote>
<p><code>re.search</code> matches anywhere in the string!</p>
</blockquote>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">re</span>
<span class="c"># a word followed by a comma and then a word followed by a period</span>
<span class="n">pattern</span> <span class="o">=</span> <span class="s">r'(?P<param1>[</span><span class="err">\</span><span class="s">w]+),(?P<param2>[</span><span class="err">\</span><span class="s">w]+)</span><span class="err">\</span><span class="s">.'</span>
<span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span><span class="s">'xxx foo,bar.'</span><span class="p">)</span>
<span class="c"># >>> <re.Match object; span=(5, 13), match='foo,bar.'></span>
</code></pre></div>
<h2 id="search-multiple-matches">Search, multiple matches</h2>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">re</span>
<span class="c"># a word followed by a comma and then a word followed by a period</span>
<span class="n">pattern</span> <span class="o">=</span> <span class="s">r'(?P<param1>[</span><span class="err">\</span><span class="s">w]+),(?P<param2>[</span><span class="err">\</span><span class="s">w]+)</span><span class="err">\</span><span class="s">.'</span>
<span class="n">matches</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span><span class="s">' foo,bar. aaaand another xxx,yyy.'</span><span class="p">)</span>
<span class="k">for</span> <span class="n">match</span> <span class="ow">in</span> <span class="n">matches</span><span class="o">.</span><span class="n">groups</span><span class="p">():</span>
<span class="k">print</span><span class="p">(</span><span class="n">match</span><span class="p">)</span>
<span class="c"># >>> foo</span>
<span class="c"># >>> bar</span>
</code></pre></div>
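<p>Note that <code>re.search</code> only returns the <em>first</em> match, so the loop above iterates over that single match's capture groups. To walk over every match in the string (a sketch, not from the original post), use <code>re.finditer</code>:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import re
# a word followed by a comma and then another word followed by a period
pattern = r'(?P<param1>[\w]+),(?P<param2>[\w]+)\.'
for m in re.finditer(pattern, ' foo,bar. aaaand another xxx,yyy.'):
    print(m.groupdict())
# >>> {'param1': 'foo', 'param2': 'bar'}
# >>> {'param1': 'xxx', 'param2': 'yyy'}
</code></pre></div>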
<h2 id="re-findall">Re.findall</h2>
<blockquote>
<p><code>re.findall</code> returns a list of matches</p>
</blockquote>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">re</span>
<span class="c"># a word followed by a comma and then another word followed by a period</span>
<span class="n">pattern</span> <span class="o">=</span> <span class="s">r'(?P<param1>[</span><span class="err">\</span><span class="s">w]+),(?P<param2>[</span><span class="err">\</span><span class="s">w]+)</span><span class="err">\</span><span class="s">.'</span>
<span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span><span class="s">'foo,bar.'</span><span class="p">)</span>
<span class="c"># >> [('foo', 'bar')]</span>
</code></pre></div>
<h2 id="findall-multiple-matches">Findall, multiple matches</h2>
<div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">re</span>
<span class="c"># a word followed by a comma and then another word followed by a period</span>
<span class="n">pattern</span> <span class="o">=</span> <span class="s">r'(?P<param1>[</span><span class="err">\</span><span class="s">w]+),(?P<param2>[</span><span class="err">\</span><span class="s">w]+)</span><span class="err">\</span><span class="s">.'</span>
<span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span><span class="s">'foo,bar. and another xxx,yyy.'</span><span class="p">)</span>
<span class="c"># >>> [('foo', 'bar'), ('xxx', 'yyy')]</span>
</code></pre></div>
2023-06-25T05:54:52-03:00

http://queirozf.com/entries/paper-summary-direct-preference-optimization-your-language-model-is-secretly-a-reward-model
Paper Summary: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
2023-08-02T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/7FGVAUN.png" alt="direct-preference-optimization-arxiv">
<em>Direct Preference Optimization: Your Language Model is Secretly a Reward Model <a href="https://arxiv.org/pdf/2305.18290.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>An approach to align pre-trained LMs to human preferences without using Reinforcement Learning (RL).</p>
<h2 id="why">WHY</h2>
<p>Because RL-based instruction-tuning methods (such as RLHF) are costly and difficult to implement.</p>
<h2 id="how">HOW</h2>
<p>The authors figured out a way to represent the objective function from RLHF as a loss function that can be directly optimized using algorithms such as SGD.</p>
<p>A dataset containing <strong>good</strong> (so-called <em>preferred</em>) as well as <strong>bad</strong> (so-called <em>dispreferred</em>) prompt/output pairs is needed to fine-tune the model. The loss function includes both types of pairs to calculate the loss.</p>
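<p>Concretely, the per-pair DPO loss can be sketched in plain Python as follows (log-probabilities are toy numbers; <code>w</code> denotes the preferred and <code>l</code> the dispreferred completion):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# the loss shrinks as the policy raises the preferred completion's
# log-probability relative to the frozen reference (SFT) model
print(dpo_loss(logp_w=-12.0, logp_l=-15.0, ref_logp_w=-14.0, ref_logp_l=-14.0))
</code></pre></div>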
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>Objective evaluation: better results than PPO (the RL algorithm used by RHLF) as measured by reward and KL-divergence from the original text distribution.</p></li>
<li><p>Subjective evaluation: also better results than RLHF-PPO, <strong>but</strong> the comparison setup is very nontraditional and based upon proxies. Authors use GPT-4 to provide ground truth for experiments, sentiment classifiers to filter generated text with respect to sentiment, etc.</p></li>
<li><p>Learning with DPO is more stable (smaller variance) than RLHF-PPO.</p></li>
<li><p>DPO converges quickly.</p></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p>GPT-4 (zero-shot) was used to evaluate DPO against other types of fine-tuning. Crazy.</p></li>
<li><p>DPO was applied on an LM that had been previously fine-tuned with regular SFT.</p></li>
</ul>
<hr>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/2305.18290.pdf">Arxiv: Rafailov et al., 2023: Direct Preference Optimization: Your Language Model is Secretly a Reward Model</a></li>
</ul>
2023-06-22T22:31:44-03:00

http://queirozf.com/entries/pyenv-examples-managing-multiple-python-versions-and-virtualenvs
Pyenv Examples: Managing multiple Python versions and Virtualenvs
2023-07-07T00:00:00-03:00
Felipe
<h2 id="create-virtualenv">Create virtualenv</h2>
<p><code>$ pyenv virtualenv my-venv</code></p>
<h2 id="create-virtualenv-with-python-version">Create virtualenv with python version</h2>
<p>Create a virtualenv using a specific python version.</p>
<p><code>$ pyenv virtualenv 3.7.16 my-venv</code></p>
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 287px !important; height: 287px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_large_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:336px;height:280px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="7164375745"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<h2 id="activate-virtualenv">Activate virtualenv</h2>
<p>To activate a virtualenv called <code>my-venv</code>:</p>
<div class="highlight"><pre><code class="language-" data-lang="">$ pyenv activate my-venv
</code></pre></div>
<h2 id="set-default-virtualenv-for-directory">Set default virtualenv for directory</h2>
<p>Use <code>pyenv local my-venv</code>. </p>
<p>This will create a hidden <code>.python-version</code> file (should not be versioned).</p>
<p>The virtualenv will be activated automatically every time you cd to that directory (without the need to call <code>pyenv activate</code>)</p>
<div class="highlight"><pre><code class="language-" data-lang="">$ pyenv local my-venv
</code></pre></div>
<h2 id="virtualenv-location">Virtualenv location</h2>
<p>For virtualenv <code>my-venv</code>:</p>
<ul>
<li>on MacOS</li>
</ul>
<div class="highlight"><pre><code class="language-" data-lang=""> ~/.pyenv/versions/my-venv
</code></pre></div>
<h2 id="install-python-version">Install python version</h2>
<div class="highlight"><pre><code class="language-" data-lang="">$ pyenv install 3.9
</code></pre></div>
<div style="text-align:center; margin-bottom: 7px; margin-top: 7px; min-height: 257px !important; height: 257px !important;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- ab_test_medium_square -->
<ins class="adsbygoogle"
style="display:inline-block;width:300px;height:250px"
data-ad-client="ca-pub-2217532725941275"
data-ad-slot="3340325586"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<h2 id="list-python-versions">List python versions</h2>
<div class="highlight"><pre><code class="language-" data-lang="">$ pyenv versions
</code></pre></div>
2023-06-21T20:52:25-03:00

http://queirozf.com/entries/paper-summary-pythia-a-suite-for-analyzing-large-language-models-across-training-and-scaling
Paper Summary: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
2023-06-24T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/umS7kEa.png" alt="pythia-biderman-et-al-2023">
<em>Pythia: A Suite for Analyzing Large Language <br/>Models Across Training and Scaling<br/><a href="https://arxiv.org/pdf/2304.01373.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>A framework—<em>Pythia</em>—to <em>uniformly</em> train variations of LLMs to measure the impact of hyperparameter choices:</p>
<ul>
<li>number of layers</li>
<li>model dimensionality</li>
<li>number of attention heads</li>
<li>dimensionality of attention heads</li>
<li>batch size</li>
<li>learning rate</li>
</ul>
<h2 id="why">WHY</h2>
<p>It's hard to measure the impact of hyperparameters using other published LLMs because they have been trained using different architectures, different data, and different training decisions.</p>
<h2 id="how">HOW</h2>
<ul>
<li>Train 8 variations of GPT-3-like models and study the impact of changing hyperparameters on the model performance, as evaluated on several NLP tasks (via <a href="https://github.com/EleutherAI/lm-evaluation-harness">EleutherAI/lm-evaluation-harness</a>)</li>
</ul>
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>Deduplicating <em>The Pile</em> training dataset had no benefit on performance, contrary to existing literature.</p></li>
<li><p>Using parallel attention and MLP sublayers did not degrade performance, contrary to existing literature.</p></li>
<li><p>Using multi-lingual datasets hurt performance less than expected.</p></li>
<li><p>The position of a piece of text—i.e. at the start or the end of the training dataset— does not make it more or less likely to be <em>memorized</em> by the model.</p></li>
<li><p>Term frequencies in the pretraining dataset <em>do affect</em> the downstream performance of the model, especially in models with higher capacity.</p></li>
</ul>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li>Toolset from <a href="https://github.com/EleutherAI/gpt-neox">GPT-NeoX</a></li>
<li>GPT-3 for architecture and most other decisions</li>
<li><a href="https://pile.eleuther.ai/">EleutherAI's <em>The Pile</em> dataset</a></li>
<li>BPE tokenizer (from GPT-NeoX-20B)</li>
<li>Flash Attention (Dao et al., 2022)</li>
<li>Rotary Embeddings (Su et al., 2021)</li>
<li>Parallel Attention (from GPT-J-6B)</li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li>Authors applied <em>interventions</em> :thinking: in the dataset to address bias</li>
</ul>
<h2 id="my-2">MY 2¢</h2>
<ul>
<li>Most new articles are using a curated English-language dataset called <em>The Pile</em></li>
</ul>
<hr>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/2304.01373.pdf">Arxiv: Biderman et al., 2023: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling</a></li>
</ul>
2023-06-18T19:58:21-03:00

http://queirozf.com/entries/paper-summary-llama-adapter-efficient-fine-tuning-of-language-models-with-zero-init-attention
Paper Summary: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
2024-01-14T00:00:00-03:00
Felipe
<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/UcxlAzb.png" alt="llama-adapter">
<em>LLaMA-Adapter: Efficient Fine-tuning of Language<br/>Models with Zero-init Attention<br/><a href="https://arxiv.org/pdf/2303.16199.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>A cheaper way to fine-tune a vanilla LLM based on the 52k input/output pairs from <a href="//queirozf.com/entries/paper-summary-self-instruct-aligning-language-models-with-self-generated-instructions">self-instruct</a>.</p>
<h2 id="why">WHY</h2>
<p>To reduce the cost to fine-tune LLMs for instruction-following.</p>
<h2 id="how">HOW</h2>
<ul>
<li><p>A few layers (1.2M parameters) are added to a pre-trained LLaMA model and only <em>these</em> are unfrozen and fine-tuned.</p></li>
<li><p>Attention mechanisms in the unfrozen layers are initialized with zeros plus a gating mechanism, to prevent disturbing the information coming from the base LLM (see the sketch after this list).</p></li>
</ul>
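<p>The zero-init gating in the second bullet can be sketched as follows: the adapter branch is scaled by a learnable gate that starts at zero, so at initialization the model behaves exactly like the frozen base. A minimal PyTorch sketch (not the paper's implementation):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python">import torch
import torch.nn as nn

class ZeroInitGatedAdapter(nn.Module):
    # adds an adapter branch whose output is scaled by a zero-initialized
    # gate, so training starts from the unmodified base model
    def __init__(self, dim):
        super().__init__()
        self.adapter = nn.Linear(dim, dim)
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, base_out, adapter_in):
        return base_out + torch.tanh(self.gate) * self.adapter(adapter_in)

x = torch.randn(2, 16)
block = ZeroInitGatedAdapter(16)
print(torch.allclose(block(x, x), x))  # True: the gate starts at zero
</code></pre></div>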
<h2 id="claims">CLAIMS</h2>
<ul>
<li>Fine-tuning a LLaMA 7B model takes 1 hour. Comparable performance to Alpaca while taking 1/3 of the time.</li>
</ul>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li>Adapter-based Fine-tuning from <a href="https://arxiv.org/pdf/1902.00751.pdf">Houlsby et al 2019</a> </li>
<li><p>Fine-tuning input/output pairs from <a href="https://queirozf.com/entries/paper-summary-self-instruct-aligning-language-models-with-self-generated-instructions">Self-instruct</a></p></li>
<li><p>Base LLM from <a href="https://queirozf.com/entries/paper-summary-llama-open-and-efficient-foundation-language-models">LLaMA</a></p></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p>LLaMA-adapter also supports other modalities (audio, images, video).</p></li>
<li><p>LLaMA-adapter is a type of Parameter-Efficient Fine-Tuning (PEFT)</p></li>
</ul>
<h2 id="my-2c">MY 2c</h2>
<ul>
<li>No quantitative comparison with Alpaca, only examples (possibly cherry-picked) and a vague claim of "comparable instruction-following proficiency with the 7B Alpaca"</li>
</ul>
<hr>
<h3 id="references">References</h3>
<ul>
<li><p><a href="https://arxiv.org/pdf/2303.16199.pdf">Arxiv: Zhang et al 2023:LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention</a></p></li>
<li><p><a href="https://github.com/ZrrSkywalker/LLaMA-Adapter">Github: ZrrSkywalker/LLaMA-Adapter</a></p></li>
<li><p><a href="https://arxiv.org/pdf/1902.00751.pdf">Arxiv: Houlsby et al 2019: Parameter-Efficient Transfer Learning for NLP</a></p></li>
</ul>
2023-06-04T20:54:16-03:00http://queirozf.com/entries/paper-summary-llama-open-and-efficient-foundation-language-modelsPaper Summary: LLaMA: Open and Efficient Foundation Language Models2023-08-02T00:00:00-03:00Felipe<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<h2 id="what">WHAT</h2>
<p>An LLM (LLaMA) is trained from scratch using more data but fewer training iterations than GPT3. Only public data is used.</p>
<h2 id="why">WHY</h2>
<p>To test how the data vs. compute-budget tradeoff behaves as scale grows.</p>
<h2 id="how">HOW</h2>
<p>LLaMA is a standard Transformer LLM with some optimizations used by previous LMs. It's trained exclusively on open-access data.</p>
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>Models with fewer parameters are cheaper to use at inference time</p></li>
<li><p>LLaMA outperforms or matches LMs having 3-10x the number of parameters (GPT3, Gopher, Chinchilla) at most natural language tasks (Zero-shot and Few-shot)</p></li>
</ul>
<h2 id="extends-uses">EXTENDS/USES</h2>
<ul>
<li>AdamW Optimizer</li>
<li>Transformers Implementation from <a href="https://github.com/facebookresearch/xformers">facebookresearch/xformers</a></li>
<li>RMSNorm</li>
<li>SwiGLU Activation Function (a sketch of RMSNorm and SwiGLU follows this list)</li>
<li>Rotary Embeddings from GPTNeo</li>
</ul>
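<p>For reference, minimal sketches of RMSNorm and SwiGLU as I understand them (my own illustration, not LLaMA's actual implementation):</p>
<pre><code># RMSNorm: like LayerNorm but without mean-centering or bias.
# SwiGLU: a gated activation, silu(x W) * (x V), used in LLaMA's MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w = nn.Linear(dim, hidden, bias=False)   # gate projection
        self.v = nn.Linear(dim, hidden, bias=False)   # value projection
        self.out = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.out(F.silu(self.w(x)) * self.v(x))

x = torch.randn(2, 16)
print(SwiGLU(16, 64)(RMSNorm(16)(x)).shape)  # torch.Size([2, 16])
</code></pre>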
<h2 id="notes">NOTES</h2>
<ul>
<li><p>Total number of tokens used for training: 1.4T</p></li>
<li><p>Some fine-tuning was done using simple SFT</p></li>
</ul>
<h2 id="my-2">MY 2¢</h2>
<ul>
<li><p>This is an engineering article; not many theoretical advancements.</p></li>
<li><p>The moat enjoyed by big players gets smaller every day.</p></li>
</ul>
<hr>
<h3 id="references">References</h3>
<ul>
<li><p><a href="https://arxiv.org/pdf/2302.13971.pdf">Arxiv: Touvron et al 2023: LLaMA: Open and Efficient Foundation Language Models</a></p></li>
<li><p><a href="https://github.com/facebookresearch/llama">Github: facebookresearch/llama</a></p></li>
</ul>
2023-06-04T16:13:09-03:00http://queirozf.com/entries/paper-summary-self-instruct-aligning-language-models-with-self-generated-instructionsPaper Summary: Self-instruct: Aligning Language Models with Self-generated Instructions2023-06-25T00:00:00-03:00Felipe<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/iqUE1iK.png" alt="self-instrut-article-image">
<em>Self-Instruct: Aligning Language Models with Self-Generated Instructions<br/><a href="https://arxiv.org/pdf/2212.10560.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>A way to fine-tune LLMs to follow instructions using only information from the model itself—no human annotation needed.</p>
<h2 id="why">WHY</h2>
<p>Because human-annotated datasets are expensive to come by.</p>
<h2 id="how">HOW</h2>
<ul>
<li><p><strong>1)</strong> Use the pre-trained LLM <em>itself</em> to generate input/output instruction pairs, from a <strong>small</strong> set of seed pairs (one seed example per task, 175 examples in total; see the sketch after this list).</p>
<ul>
<li><a href="https://github.com/yizhongw/self-instruct/blob/main/data/seed_tasks.jsonl">Seed data</a></li>
<li><a href="https://github.com/yizhongw/self-instruct/blob/main/data/gpt3_generations/batch_221203/all_instances_82K.jsonl">Generated instructions</a></li>
</ul></li>
<li><p><strong>2)</strong> Perform supervised fine-tuning on the pairs from step <strong>1)</strong>, after using heuristics to filter out low-quality or near-duplicate generated pairs.</p></li>
</ul>
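<p>A heavily simplified sketch of the bootstrap loop in step <strong>1)</strong>; <code>generate</code> stands in for a call to a GPT-3 completion endpoint, and the novelty filter below is a crude token-overlap approximation of the paper's ROUGE-based filtering. All names are hypothetical:</p>
<pre><code># Hedged sketch of the self-instruct bootstrap loop.
import random

def token_overlap(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta.intersection(tb)) / max(1, len(ta.union(tb)))

def bootstrap(seed_tasks, generate, rounds=10, max_overlap=0.7):
    pool = list(seed_tasks)
    for _ in range(rounds):
        shots = random.sample(pool, k=min(8, len(pool)))  # in-context examples
        candidate = generate(shots)  # the model writes a brand-new instruction
        # keep only candidates that are sufficiently different from the pool
        if not any(token_overlap(candidate, t) >= max_overlap for t in pool):
            pool.append(candidate)
    return pool

seeds = ["Write a haiku about rain.", "Sort a list of numbers in Python."]
toy_generate = lambda shots: "Translate the given sentence into French."
print(bootstrap(seeds, toy_generate, rounds=3))
</code></pre>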
<h2 id="claims">CLAIMS</h2>
<ul>
<li>In one experiment, GPT3<sub>self-instruct</sub> answers 44.4% of questions correctly, while InstructGPT (GPT3 aligned with RLHF) reaches 50.7%.</li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li>All tasks are represented in the form <code>(task definition, input/output pairs)</code>. It's a versatile way to represent any kind of task. Example below:</li>
</ul>
<p><div class="img-div" markdown="1">
<img src="//queirozf.com/images/contents/jUrpZxG.png" alt="sample-input-output-pairs-self-instruct">
<em>How the authors represent the instruction tasks to align the model.<br/><a href="https://arxiv.org/pdf/2212.10560.pdf">Source</a></em>
</div></p>
<ul>
<li>No need to host a local version of GPT3. Everything was done <a href="https://github.com/yizhongw/self-instruct">using OpenAI CLI tools and making HTTP requests to GPT3 endpoints</a></li>
</ul>
<h2 id="my-2">MY 2¢</h2>
<p>The key contribution is the recipe for generating alignment examples from a vanilla LLM itself.</p>
<hr>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/2212.10560.pdf">Arxiv: Wang et al., 2023: Self-Instruct: Aligning Language Models with Self-Generated Instructions</a></li>
</ul>
<p><a name="myfootnote1">1</a>:Such as <a href="//queirozf.com/entries/paper-summary-training-language-models-to-follow-instructions-with-human-feedback">InstructGPT/ChatGPT</a> which are based on RHLF</p>
2023-06-03T18:00:09-03:00http://queirozf.com/entries/as-a-manager-is-it-worthwhile-how-worthwhileAs a Manager: Is it Worthwhile? How Worthwhile?2023-05-29T00:00:00-03:00Felipe<h2 id="prioritization-is-key">Prioritization is key</h2>
<p>As a (new) manager, one of your most important activities will be to <em>prioritize</em> work to be done by your team.</p>
<p>Get into the habit of thinking not only <em>if</em> some given task/project is worthwhile, but also <em>how</em> worthwhile.</p>
<p>You will never have infinite resources or people in your team, so some tasks or projects must by necessity be dropped in favor of others.</p>
<h2 id="value-effort-ratio-instead-of-just-value">Value/Effort ratio instead of just Value</h2>
<p>Every task or project has an expected <em>Value</em>: the benefit (financial or otherwise) it brings to the team or organization.</p>
<p>But the Value alone is not what you should use to prioritize tasks. A better metric to use is Value/Effort, whereby you also take into account the <em>Effort</em> (i.e. man-hours) needed to accomplish the task.</p>
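<p>To make the ratio concrete, here is a toy example with made-up numbers: a task worth 10 "value points" that takes 2 days of effort beats a task worth 20 that takes 10.</p>
<pre><code># Toy prioritization by Value/Effort ratio (all numbers are made up).
tasks = [
    {"name": "A", "value": 10, "effort": 2},   # ratio 5.0
    {"name": "B", "value": 20, "effort": 10},  # ratio 2.0
    {"name": "C", "value": 8, "effort": 4},    # ratio 2.0
]
ranked = sorted(tasks, key=lambda t: t["value"] / t["effort"], reverse=True)
for t in ranked:
    print(t["name"], round(t["value"] / t["effort"], 1))
# A 5.0, B 2.0, C 2.0 -- A wins despite having half of B's raw Value
</code></pre>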
<p>Naturally, there are many other dimensions that must also be taken into account when prioritizing (emergency tasks, unblocking tasks, etc.).</p>
2023-05-28T19:47:59-03:00http://queirozf.com/entries/as-a-manager-stating-the-obvious-is-importantAs a Manager: Stating the Obvious is Important2023-05-29T00:00:00-03:00Felipe<blockquote>
<p><span style="color:red; font-weight:bold">WIP Alert</span> This is a work in progress. Current information is correct but more content may be added in the future.</p>
</blockquote>
<p>Stating the obvious when talking with reports is important but managers don't always do it, either due to laziness or to wrongly assuming reports already know it.</p>
<h2 id="define-expectations-precisely-with-examples">Define expectations precisely, with examples</h2>
<p>The definition of what it means for some task to be <em>done</em> varies a lot from person to person. It's important to make expectations clear. </p>
<ul>
<li><p><em>"Make sure you double check the result after completing the task, to confirm the task's objective was achieved."</em></p></li>
<li><p><em>"Make sure you look at the system logs after deploying the changes, to make sure they worked."</em></p></li>
<li><p><em>"Make sure you communicate everyone who may be impacted before you start working."</em></p></li>
</ul>
<h2 id="explain-the-impacts-of-ones-actions-to-impart-a-sense-of-ownership">Explain the impacts of one's actions to impart a sense of ownership</h2>
<p>TODO</p>
<h2 id="stating-the-obvious-fills-knowledge-gaps">Stating the obvious fills knowledge gaps</h2>
<p>Many people have <em>knowledge gaps</em>: things they don't fully understand about their work. These gaps are usually skipped over or outright ignored, as people cut corners to get work done.</p>
<p>Stating the obvious is a good way to help people plug gaps they may not even realize they have.</p>
<h2 id="be-careful-not-to-be-repetitive">Be careful not to be repetitive</h2>
<p>Stating the obvious can easily turn you into a "boring" person if you don't watch out. Be careful not to overdo it.</p>
<p>You need to be able to detect <em>who</em> you need to state the obvious to and <em>when</em> to do it.</p>
<p>An alternative is to ask people whether they understand the topic you are about to explain, but make sure you ask in a non-judgmental manner. If the person suspects you are asking about something that should be trivial, they may choose to <strong>lie</strong> and say they do, in fact, understand it, to "save face".</p>
<h2 id="watch-peoples-body-language-as-you-explain">Watch people's body language as you explain</h2>
<p>If you choose to state the obvious and people seem impatient or look elsewhere as you talk, it may be a sign you don't need to explain that particular thing to that particular person.</p>
<p>Likewise, if the listener pays close attention to what you are saying, it might be a sign that you are indeed filling a knowledge gap (or they may just be trying to flatter you).</p>
2023-05-28T18:00:50-03:00http://queirozf.com/entries/as-a-manager-drive-growth-by-asking-open-ended-questionsAs a Manager: Drive Growth by Asking Open-Ended Questions2023-08-06T00:00:00-03:00Felipe<p>When managing junior/midlevel engineers, one of your key objectives should be to encourage them to think about what they are doing—instead of just executing pre-assigned tasks.</p>
<p>Asking <em>open-ended</em> questions is an excellent way to get reports to <em>think</em> and <em>talk</em> about topics they may not have thought about yet—making them think at a higher level about what they are doing, and enabling them to become more <strong>autonomous</strong> and <strong>self-aware</strong>.</p>
<p>Think of it as a sort of <em>therapy</em>: one of the reasons why therapy works is that people hear themselves talking about their issues, rather than having people tell them what to do.</p>
<blockquote>
<p>Asking questions (rather than providing answers) is better to encourage growth!</p>
</blockquote>
<p>Here are some examples of the questions you could ask your reports, during 1:1 meetings, project review meetings, etc.</p>
<h2 id="fishing-for-problems-are-the-next-steps-clear">Fishing for problems: Are the next steps clear?</h2>
<p>TODO</p>
<h2 id="what-do-you-think-should-be-the-next-steps-of-this-project-and-why">What do you think should be the next steps of this project, and why?</h2>
<p>This gets reports to think about the project as a whole (rather than the specific task they are currently executing).</p>
<p>As they think about the next tasks, they will have to think about topics such as:</p>
<ul>
<li><p><strong>Project management</strong> (how to properly conduct a project such that there's less risk of failure)</p></li>
<li><p><strong>Task sequencing</strong> (which tasks should be done first, unblocking other team members, de-risking the project by doing risky tasks first, etc)</p></li>
</ul>
<h2 id="what-do-you-think-we-should-be-working-on-next">What do you think we should be working on next?</h2>
<p>This is useful to get people to think about <strong>prioritization</strong>. When they are focused on executing only, it may be hard to think about the high-level objectives of the team.</p>
<p>This encourages them to think about:</p>
<ul>
<li><p><strong>Value/Effort Tradeoff</strong>: Each task has an estimated value but it also has some cost (i.e. work hours) attached to it. Both should be taken into account when selecting tasks to be worked on.</p></li>
<li><p><strong>Focus on business outcomes</strong>: Thinking about the team priorities forces people to don the "business" hat and think about which tasks are more important <em>from a business perspective</em>. </p>
<ul>
<li>Being able to think about business needs is a key skill many technically-oriented people lack.</li>
</ul></li>
</ul>
<h2 id="if-you-could-start-over-what-would-you-have-done-differently">If you could start over, what would you have done differently?</h2>
<p>Thinking about one's work objectively—from a distance—is a great way to let go of unhelpful or unproductive behavior patterns that can hinder one's career.</p>
<p>By asking reports what they could have done differently in a project or task, you allow them to reflect upon their work, helping them grow as professionals.</p>
2023-05-21T23:51:16-03:00http://queirozf.com/entries/paper-summary-training-language-models-to-follow-instructions-with-human-feedbackPaper Summary: Training language models to follow instructions with human feedback2023-06-25T00:00:00-03:00Felipe<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<p><div class="paper-screenshot-img-div" markdown="1">
<img src="//queirozf.com/images/contents/8x6JCkv.png" alt="instruct-gpt">
<em>Training language models to follow <br/>instructions with human feedback <a href="https://arxiv.org/pdf/2203.02155.pdf">Source</a></em>
</div></p>
<h2 id="what">WHAT</h2>
<p>Introduces a strategy, InstructGPT, to fine-tune pre-trained LLMs to follow human instructions using Reinforcement Learning.<sup><a href="#myfootnote2">2</a></sup></p>
<h2 id="why">WHY</h2>
<p>Pretraining LLMs on unlabelled data does not make them good at following instructions or at providing output that's <em>aligned</em> with the user's intent: we need something else.</p>
<h2 id="rlhf">RLHF</h2>
<ul>
<li><p>It's a 3-stage strategy (it assumes you already have a pre-trained, so-called <em>vanilla</em> LM)</p>
<ul>
<li><strong>1) Supervised Fine-tuning (SFT)</strong>: Take some prompts, give them to human annotators, and have them write a proper <em>response</em> to each prompt. Then fine-tune the pre-trained LM in a supervised manner on those prompt/response pairs. </li>
<li><strong>2) Reward Model (RM)</strong>: With the fine-tuned LM, we again sample some prompts, feed them to the model<sup><a href="#myfootnote1">1</a></sup>, and get some outputs. We then ask human annotators to <em>rank</em> the outputs on a Likert scale, defining how aligned each output is with the original prompt.
<ul>
<li>The outcome is a model (RM) that takes a prompt/output pair and says how <em>aligned</em> it is to what humans usually want.</li>
<li>Also an LLM; can be Transformer-based</li>
</ul></li>
<li><strong>3) RL Fine-tuning</strong>: Initiate a Reinforcement Learning (RL) feedback loop whereby:
<ul>
<li>Sample the LM for a prompt/output pair</li>
<li>Score the prompt/output pair with the Reward Model (a Preference Reward)</li>
<li>Score the output with the original LM itself (before fine-tuning) to see how close to "normal language" the output is. </li>
<li><strong>PPO-ptx</strong>: Calculate a Final Reward that takes into account <em>both</em> the Preference Reward <em>and</em> the original LM's perplexity, to make sure the output is good in terms of alignment but also <em>natural</em> (as judged by the original, untuned LM). A simplified sketch of such a reward follows this list.</li>
<li>Feed the Final Reward back to the LM and repeat the loop</li>
</ul></li>
</ul></li>
</ul>
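<p>A heavily simplified sketch of how the Final Reward might be assembled, with a KL-style penalty keeping the tuned LM close to the original one. This is my own rendering, not OpenAI's implementation, and <code>beta</code> is a hypothetical coefficient:</p>
<pre><code># Simplified sketch of the Final Reward with a KL-style penalty.
# PPO-ptx additionally mixes pretraining gradients into the PPO
# objective, which is not shown here.
import torch

def final_reward(rm_score, policy_logprobs, ref_logprobs, beta=0.02):
    # Per-token KL estimate: log p_policy(token) - log p_ref(token),
    # summed over the generated tokens.
    kl = (policy_logprobs - ref_logprobs).sum(dim=-1)
    return rm_score - beta * kl  # high RM score, without drifting from the ref LM

rm_score = torch.tensor([1.3])                     # Preference Reward from the RM
policy_lp = torch.log(torch.tensor([[0.5, 0.4]]))  # log-probs under the tuned LM
ref_lp = torch.log(torch.tensor([[0.6, 0.5]]))     # log-probs under the original LM
print(final_reward(rm_score, policy_lp, ref_lp))
</code></pre>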
<h2 id="how">HOW</h2>
<p>The <em>how</em> is basically applying RLHF to a GPT-3 LM, with some technical optimizations.</p>
<p>PPO (Proximal Policy Optimization) is used to update the LM in the RL Fine-tuning loop, with a modification that lends some weight to the original, untuned LM (PPO-ptx, see above <a href="#rlhf">RLHF</a>) </p>
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>InstructGPT (1.3B params) provides better outputs than GPT-3 (175B params). (According to labelers)</p></li>
<li><p><em>"The cost of increasing model alignment is <strong>modest</strong> relative to pretraining"</em></p></li>
<li><p>Learned alignment generalizes to held-out annotators</p></li>
<li><p>PPO-ptx can be used to avoid regressions (i.e. text that is statistically very close to preferences but unnatural and/or bad in other ways)</p></li>
</ul>
<h2 id="quotes">QUOTES</h2>
<ul>
<li><p><strong>Misalignment</strong>: <em>"... the language modeling objective used for many recent large LMs—predicting the next token on a webpage from the internet—is different from the objective "follow the user’s instructions helpfully and safely""</em></p></li>
<li><p><strong>Alignment Tax</strong>: <em>"... our alignment procedure comes at the cost of lower performance on certain tasks that we may care about."</em></p>
<ul>
<li>This is reduced with PPO-ptx</li>
</ul></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p>The 3 H's (helpful, honest, and harmless) of implicit alignment were defined in Askell et al., 2021. (see <a href="#refs">refs</a>)</p></li>
<li><p>Types of alignment</p>
<ul>
<li><strong>Explicit alignment</strong>: Following express orders such as "write a list such that..."</li>
<li><strong>Implicit alignment</strong>: Not producing outright misleading text, not hallucinating.</li>
</ul></li>
</ul>
<h2 id="my-2">MY 2¢</h2>
<ul>
<li>In addition to the technological breakthroughs in the paper, it's a masterpiece of <em>experiment design</em> as well. Everything is done to avoid bias and inaccuracies, and to make efficient use of resources (humans, computing, etc.)</li>
</ul>
<h3 id="footnotes">Footnotes</h3>
<p><a name="myfootnote1">1</a>: With an appropriate temperature setting, to generate diverse samples.</p>
<p><a name="myfootnote2">2</a>: It is widely believed that ChatGPT was trained using RLHF as described in this article.</p>
<hr>
<h3 id="references">References</h3>
<ul>
<li><p><a href="https://arxiv.org/pdf/2203.02155.pdf">Arxiv: Ouyang et al 2022: Training language models to follow instructions with human feedback</a></p></li>
<li><p><a href="https://openai.com/blog/instruction-following/">Open AI Blog: Aligning Language Models to Follow Instructions</a></p></li>
<li><p><a href="https://www.youtube.com/watch?v=2MBJOuVq380">Youtube: Reinforcement Learning from Human Feedback: From Zero to chatGPT</a></p>
<ul>
<li>Amazing Video Lecture on RLHF by Nathan Lambert @HuggingFace</li>
</ul></li>
<li><p><a href="https://arxiv.org/abs/2112.00861">Arxiv: Askell et al 2021: A General Language Assistant as a Laboratory for Alignment</a></p></li>
</ul>
2023-02-05T13:12:46-03:00http://queirozf.com/entries/paper-summary-language-models-are-few-shot-learnersPaper Summary: Language Models are Few-Shot Learners2023-02-05T00:00:00-03:00Felipe<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<h2 id="what">WHAT</h2>
<p>GPT-3 model is introduced.</p>
<p>Authors show that, if you have enough data, you can start solving all kinds of problems by few-shot prompting, even beating SOTA, with no fine-tuning.</p>
<h2 id="why">WHY</h2>
<p>Because the usual pretraining/fine-tuning architecture for NLP tasks has some downsides:</p>
<ul>
<li><p>The need to have a smaller annotated dataset for each new downstream application is still a cost/time bottleneck.</p></li>
<li><p>Forcing such a large pretrained model to relearn on small task-specific datasets doesn't necessarily go well.</p></li>
</ul>
<h2 id="how">HOW</h2>
<p>Added more data (and more money $$) with some tweaks on top of <a href="https://queirozf.com/entries/paper-summary-language-models-are-unsupervised-multitask-learners">GPT-2</a></p>
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>The more parameters a model has, the larger the performance differences between zero-, one-, and few-shot learning.</p></li>
<li><p>In some tasks, Few-shot (even one- or zero-shot) learning with GPT-3 175B surpasses task-specific fine-tuned models, but not in all.<sup><a href="#myfootnote1">1</a></sup> </p></li>
<li><p>Near-100% accuracy on adding/subtracting numbers of up to 3 digits, but performance degrades as more digits are added (few-shot setting).</p></li>
</ul>
<h2 id="quotes">QUOTES</h2>
<ul>
<li><strong>Model size and ability to learn from context</strong>: <em>"Larger models make increasingly efficient use of in-context information"</em></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p>They provide a consistent definition of <strong>zero-shot</strong>, <strong>one-shot</strong> and <strong>few-shot</strong> learning, i.e. the number of examples provided at inference time (in the prompt), without any inference-time weight updates (see the example after this list).</p></li>
<li><p>Several models are trained, with increasing number of parameters, to test how the performance scales with more capacity (from 125M to 175B params)</p></li>
<li><p>One of the several tasks the model was evaluated on was to ask humans to detect if some text was model-generated or not!</p></li>
<li><p>GPT3 does not include any <strong>bidirectional architecture</strong></p></li>
</ul>
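<p>For example, a few-shot prompt for the 3-digit addition task is literally just solved examples concatenated into the context, followed by the query (a sketch; exact prompt formats vary):</p>
<pre><code># Building a few-shot (k=2) prompt for 3-digit addition: the "training
# examples" live entirely in the context; no weights are updated.
examples = [("123 + 456", "579"), ("210 + 305", "515")]
query = "642 + 137"

prompt = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
prompt += f"\nQ: {query}\nA:"
print(prompt)
# With k=1 this would be one-shot; with no examples at all, zero-shot.
</code></pre>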
<h2 id="my-2">MY 2¢</h2>
<ul>
<li><p>One interesting point: they found a bug in their code for removing overlaps between train/test data, but the <strong>cost</strong> of retraining was prohibitive, so they didn't retrain the whole thing!</p></li>
<li><p>For translation tasks, the <strong>direction</strong> of translation matters a lot (performance is better when translating <em>into</em> English than when translating <em>from</em> English)</p></li>
<li><p>There are <strong>so many different NLP tasks</strong> available; you can basically encode any problem as an NLP problem, provided you can represent it in words.</p></li>
<li><p>The <strong>data overlap</strong> problem is larger than I first thought - makes one wonder how much of that skews the results</p></li>
<li><p><strong>Section 5: Limitations</strong> is a great write-up of the operational challenges of training such models and running inference on them</p></li>
</ul>
<hr>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/2005.14165.pdf">Brown et al 2020: Language Models are Few-Shot Learners</a></li>
</ul>
<h3 id="footnotes">Footnotes</h3>
<p><a name="myfootnote1">1</a>: But in all likelihood, training GPT-3 with more than 175B params could change that.</p>
2023-01-01T01:44:34-03:00http://queirozf.com/entries/paper-summary-bert-pre-training-of-deep-bidirectional-transformers-for-language-understandingPaper Summary: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding2023-02-05T00:00:00-03:00Felipe<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<h2 id="what">WHAT</h2>
<p>This article introduces the BERT model, which is a type of transformer-based <strong>fine-tuning</strong><sup><a href="#myfootnote3">3</a></sup> architecture for all sorts of NLP tasks.</p>
<p>BERT introduces bidirectional self-attention to Transformers (instead of left-to-right only) and combines token-level and sentence-level self-supervision, so that the model is good at both levels of tasks.</p>
<h2 id="why">WHY</h2>
<p>To verify whether transfer-learning approaches can also benefit from bidirectional architectures.</p>
<p>To test different self-supervision strategies (token-level and sentence-level) together.</p>
<h2 id="how">HOW</h2>
<ul>
<li><p><strong>Two steps</strong>: Pre-training and fine-tuning</p></li>
<li><p><strong>Self-supervision target</strong>. BERT uses two tasks:</p>
<ul>
<li>A masked language model, AKA the <strong>Cloze</strong> task, whereby words at random are masked and the network must predict them from the surrounding words.</li>
<li>"Next sentence prediction" self-supervision target in addition to the above. (Binarized, as in a 1 or 0 target)</li>
</ul></li>
<li><p><strong>Bidirectional Transformers</strong>: BERT uses bidirectional self-attention (vanilla Transformers use left-only self-attention)</p></li>
<li><p><strong>Encoding</strong>: Input embeddings are actually a sum of the raw token embeddings (WordPiece), segment embeddings to tell which sentence a token is from, and a learned positional embedding (see the sketch after this list). </p></li>
</ul>
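<p>A minimal sketch of this input encoding (dimensions follow BERT-base, but the code is my own illustration, not the reference implementation):</p>
<pre><code># Sketch of BERT-style input encoding: the input embedding is the SUM of
# token, segment (sentence A/B) and learned positional embeddings.
import torch
import torch.nn as nn

class BertEmbeddings(nn.Module):
    def __init__(self, vocab=30522, d=768, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)    # WordPiece token ids
        self.seg = nn.Embedding(2, d)        # sentence A = 0, sentence B = 1
        self.pos = nn.Embedding(max_len, d)  # learned positions

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok(token_ids) + self.seg(segment_ids) + self.pos(positions)

ids = torch.randint(0, 30522, (1, 10))
segs = torch.zeros(1, 10, dtype=torch.long)
print(BertEmbeddings()(ids, segs).shape)  # torch.Size([1, 10, 768])
</code></pre>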
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p>SOTA scores for many NLP tasks and benchmarks such as GLUE and SQuAD.</p></li>
<li><p>Better results than GPT-1 with the same number of parameters</p></li>
</ul>
<h2 id="quotes">QUOTES</h2>
<ul>
<li><p><strong>Feature-based</strong> adaptation vs <strong>fine-tuning</strong>: <em>"There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based and fine-tuning"</em></p>
<ul>
<li><strong>Feature-based</strong>: <em>"task-specific architectures that include the pre-trained representations as additional features"</em><sup><a href="#myfootnote1">1</a></sup> </li>
<li><strong>Fine-tuning</strong>: <em>"introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pretrained parameters"</em><sup><a href="#myfootnote2">2</a></sup> </li>
</ul></li>
<li><p><strong>Architecture</strong>: <em>"A distinctive feature of BERT is its unified architecture across different tasks. There is minimal difference between the pre-trained architecture and the final downstream architecture."</em></p></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p>They mention that the Billion Word Benchmark is a collection of <em>shuffled</em> sentences, which hurts <em>document-level</em> comprehension.</p></li>
<li><p>During the fine-tuning task, all pre-trained parameters are updated. No frozen layers.</p></li>
<li><p>BERT can be used to just produce embeddings to be used downstream too. It performs slightly worse than in the fine-tuning approach but is still very good.</p>
<ul>
<li>Note that it's possible to use several model layers as embeddings, not just the last layer!</li>
</ul></li>
</ul>
<h2 id="my-2">MY 2¢</h2>
<p>Very important point: left-only (as in, unidirectional) Transformers are also called <em>Transformer Decoders</em> (because they can be used to generate text) while bidirectional transformers are called <em>Transformer Encoders</em> in the literature.</p>
<hr>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/1810.04805.pdf">Devlin et al 2019 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</a></li>
</ul>
<hr>
<h3 id="footnotes">Footnotes</h3>
<p><a name="myfootnote1">1</a>: One example of a feature-based strategy is <a href="https://arxiv.org/abs/1802.05365">Peters et al, 2018: Deep Contextualized Word Representations</a></p>
<p><a name="myfootnote1">2</a>: Fine-tuning is the strategy used by GPT-1 (Radford et al, 2018)</p>
<p><a name="myfootnote3">3</a>: As opposed to <em>feature-based</em> (see <a href="#quotes">quotes</a>)</p>
2022-12-31T22:51:53-03:00http://queirozf.com/entries/paper-summary-long-short-term-memory-networks-for-machine-readingPaper Summary: Long Short-Term Memory-Networks for Machine Reading2022-12-26T00:00:00-03:00Felipe<blockquote>
<p><span style="font-weight:bold">Please note</span> This post is mainly intended for my <strong>personal use</strong>. It is not peer-reviewed work and should not be taken as such.</p>
</blockquote>
<h2 id="what">WHAT</h2>
<p>Authors present an enhancement to how Attention is used in LSTMs, namely <strong>intra-attention</strong> or <strong>self-attention</strong></p>
<p>They name it LSTMNs (Long Short-Term Memory Networks)<sup><a href="#myfootnote1">1</a></sup></p>
<h2 id="how">HOW</h2>
<p>In the LSTMN, the attention mechanism is added <strong>within</strong> the encoder (whereas in previous implementations it was added <strong>between</strong> the encoder and the decoder.)</p>
<p>Authors present <strong>two ways</strong> of integrating self-attention into LSTMs:</p>
<ul>
<li><p><em>"Shallow Fusion"</em>: Use encoder-decoders and both use self-attention</p></li>
<li><p><em>"Deep Fusion"</em>: Use encoder-decoders and they use both inter-attention and self-attention</p></li>
</ul>
<p><div class="img-div" markdown="1">
<img src="//queirozf.com/images/contents/2pC1w53.png" alt="self-attention-deep-shallow-lstm">
<em>On the left the <i>Shallow Fusion</i> integration technique and on the right <br/>the <i>Deep Fusion</i> technique, where the encoder and the decoder <br/>have <b>both</b> regular and self-attention</em>
</div></p>
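<p>A rough sketch of the intra-attention idea: the current step attends over a "tape" of all previous hidden states, instead of relying only on a single compressed cell state. This is a simplification of mine, not the paper's exact equations:</p>
<pre><code># Rough sketch of intra-attention over a memory tape of past hidden states.
import torch
import torch.nn.functional as F

def intra_attention(tape, query):
    # tape: (t, d) previous hidden states; query: (d,) current input repr.
    scores = tape @ query          # (t,) similarity to each past state
    weights = F.softmax(scores, dim=0)
    return weights @ tape          # adaptive summary of the whole history

tape = torch.randn(5, 16)  # 5 previous hidden states
query = torch.randn(16)
print(intra_attention(tape, query).shape)  # torch.Size([16])
</code></pre>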
<h2 id="why">WHY</h2>
<p>Traditional LSTMs with Attention may have a hard time storing knowledge that:</p>
<ul>
<li><p>Requires it to store long sequences of text</p></li>
<li><p>Has structure (other than sequential ordering)</p></li>
</ul>
<p>Traditional LSTMs have to recursively <em>compress</em> the knowledge in their memory cells after each iteration; this makes it harder for them to represent finer concepts accurately.</p>
<h2 id="claims">CLAIMS</h2>
<ul>
<li><p><strong>Language modelling</strong></p>
<ul>
<li>LSTMN beats traditional LSTMs with the same memory (as measured by perplexity)</li>
</ul></li>
<li><p><strong>Sentiment Analysis</strong></p>
<ul>
<li>LSTMN beats traditional LSTMs on this task (measured by accuracy)</li>
<li>But a CNN (called T-CNN) was better than both the LSTMN and traditional LSTMs</li>
</ul></li>
<li><p><strong>Natural Language Inference</strong> (textual entailment)</p>
<ul>
<li>LSTMNs beats traditional LSTMs on this task (measured by accuracy)</li>
</ul></li>
</ul>
<h2 id="quotes">QUOTES</h2>
<ul>
<li>On <strong>self-attention</strong>: <em>"A key idea behind the LSTMN is to use attention for inducing relations between tokens"</em></li>
</ul>
<h2 id="notes">NOTES</h2>
<ul>
<li><p>Model is tested in the following tasks: language modeling, sentiment analysis, and natural language inference</p></li>
<li><p>The term "self-attention" doesn't seem to show up in this article - they call it <em>"intra-attention"</em> (as opposed to Bahdanau's <em>"inter-attention"</em>)</p></li>
<li><p>There was <strong>no pre-training</strong> (self-supervised or otherwise)</p>
<ul>
<li>But they used pretrained embeddings</li>
</ul></li>
</ul>
<hr>
<h3 id="references">References</h3>
<ul>
<li><a href="https://arxiv.org/pdf/1601.06733.pdf">Cheng et al, 2016: Long Short-Term Memory-Networks for Machine Reading</a></li>
</ul>
<h3 id="footnotes">Footnotes</h3>
<p><a name="myfootnote1">1</a>: "Memory networks" refer back to <a href="https://arxiv.org/pdf/1410.3916.pdf">Weston et al 2015: Memory Networks</a></p>
2022-12-25T17:02:11-03:00http://queirozf.com/entries/as-a-manager-tell-reports-why-as-often-as-possibleAs a Manager: Tell Reports "Why" as Often as Possible2022-12-27T00:00:00-03:00Felipe<h2 id="not-telling-them-why-slows-their-development">Not telling them "Why" slows their development</h2>
<p>You lose an opportunity to get people to understand the real reason why they are doing the work they are doing.</p>
<p>When people know the reason for things, they:</p>
<ul>
<li><p>Will be better able to handle work on their own when you are not there to help them (autonomy)</p></li>
<li><p>Will develop more of an <strong>"owner mentality"</strong> because they will be able to connect their work with the real-world impact the company has</p></li>
<li><p>Can even think of <strong>better ways to achieve the underlying objective</strong> - often in ways that you may not have considered yourself</p></li>
</ul>
<h2 id="make-an-effort-to-articulate">Make an effort to articulate</h2>
<p>After years of experience in your field, you will have seen things happen over and over again; by the time something happens for the 10th time, you already know how it will end.</p>
<p>So you, as a manager, will sometimes tell reports things like:</p>
<p><code>"Do it like this. It's better, trust me."</code></p>
<p><strong>Why do you do this?</strong></p>
<p>You have 10 other things to do during the day so you just don't have the time to explain:</p>
<ul>
<li><p>Why it's better to do X instead of Y as that will unblock other tasks being done by another team;</p></li>
<li><p>Why it's better to do X first because you will de-risk the project and simultaneously enable some person to do a task they are good at right now, before they leave on vacation</p></li>
</ul>
<p>But at the end of the day, it <em>is</em> possible to articulate these fuzzy nuggets of experience. It will take some minutes to write it down or explain it in spoken form but it's doable.</p>
<p><strong>Don't be lazy.</strong> It's your job to help people grow.</p>
<h2 id="write-things-down">Write things down</h2>
<p>Writing stuff down is much more scalable than coaching on a 1:1 meeting. For one, text scales to multiple people and it scales across time<sup><a href="#myfootnote1">1</a></sup> .</p>
<p>After you catch yourself having to explain something to someone more than 2-3 times, <strong>it's time to write it down.</strong></p>
<p>Examples from myself:</p>
<ul>
<li><p><a href="https://queirozf.com/entries/how-to-ask-for-tech-support">How to Ask for Tech Support</a></p></li>
<li><p><a href="https://queirozf.com/entries/if-you-use-git-you-should-be-very-liberal-in-deleting-stale-code">If you use git you should be very Liberal in Deleting Stale Code</a></p></li>
</ul>
<h2 id="get-them-to-try-and-guess">Get them to try and guess</h2>
<p>If you absolutely do not have the time to explain the rationale behind some decision, the next best thing is to get them to try and guess the underlying reason and check back with them later:</p>
<p><code>"Can you see why doing X is better for the project right now than doing Y? Think about it and we'll follow up on our next meeting"</code></p>
<p>(This can also be followed up on async, via text, if the next meeting is too far in the future)</p>
<hr>
<p><a name="myfootnote1">1</a>: Meaning: <em>a)</em> once a document is written, it can be read by 1 or by 1,000 people at no additional cost to you and <em>b)</em> a document can still be used for years after it was written.</p>
2022-12-11T21:18:16-03:00