Advice to Juniors / Newbies regarding LLMs

    I feel like a bit of a broken record about this topic at this point.

    I started my programming journey as a young kid, doing the LOGO Turtle/Terrapin programming on the Apple LC ][ computers in my elementary school. I didn’t really get into programming in earnest until I was a teen, starting out rather ambitiously with the C and C++ languages. A high school class taught me some fundamentals in Pascal, and I was learning how to do some Bash scripting as well, as a hobbyist Linux user. I was a TA for my “Computer Programming & Methods” course in HS my senior year, because I had learned C++ well enough to help teach it.

    This isn’t to brag, but just to provide some background illustration.

    The point of learning is to learn. Reaching the solution is really only a small part of software development, maybe 20-25%. The real skills are in the research, hunting, and experimentation.

    I’ve compared this to using Google Translate while you’re trying to learn a new spoken language. Or using a calculator to get the answer to a practice set in a math class.

    The process of finding the answer is critical early on. You’re learning how to find answers; getting the correct answer itself is less important.

    If you’re using an LLM to tell you how to get to the answer, you’re missing out on the necessary exploration.

    Additionally, and this applies to experienced devs as well: if you use an LLM to fast track you to the answer, you’re denying yourself the time you would have spent reading through API documentation, potentially picking up additional details that are unrelated. Anecdotally, I’ve definitely learned about new methods / modalities by spelunking API docs to find an answer.

    Every “wrong” answer to your problem is still something new you can learn and may be useful later. The LLM response fast-tracks you directly to the actual answer, skipping all of that.

    In the following posts, I am going to reiterate these themes:

    Treat an LLM as a Junior Dev, not a Senior Dev

    One significant gripe I have with LLMs is that they are equally confident about their responses whether they are spot on or completely wrong. When you point out that they’re mistaken, they say “you’re correct, how about this?” and give you another answer that may or may not be correct. I’ve heard it called “mansplaining as a service” because it delivers a high-confidence answer that looks convincing but has no actual basis of expertise.

    Spot the error

    For example, if you were shown this after asking for some PHP code that would replace the word ‘fizz’ with the word ‘buzz’ in a POSTed param named ‘input’:

    <?php
      if (strpos('fizz', $_POST['input'])) {
        $result = str_replace('fizz', 'buzz', $_POST['input']);
      } else {
        $result = $_POST['input'];
      }
      echo $result;
    ?>
    

    Looks convincing, right? It’s even got semicolons in the right places!

    What’s the mistake here? hint 1, hint 2.

    Answer

    In my decade-plus of using PHP, the argument order of `str_replace` and `strpos` was perennially confusing to me, because the two functions are reversed relative to each other: the correct order is `str_replace($needle, $replace, $haystack)` versus `strpos($haystack, $needle)`. In the snippet above, the `strpos` arguments are swapped, so it _should_ be: `strpos($_POST['input'], 'fizz')`
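
    For reference, here is a corrected version of the snippet. Beyond the argument order, note one more PHP gotcha: `strpos` returns 0 when the needle sits at the very start of the string, so the return value should be compared strictly against false rather than used as a bare truthy check.

    <?php
      // strpos($haystack, $needle): haystack first, needle second.
      // Compare strictly against false, since a match at position 0 is falsy.
      if (strpos($_POST['input'], 'fizz') !== false) {
        // str_replace($search, $replace, $subject): needle first, subject last.
        $result = str_replace('fizz', 'buzz', $_POST['input']);
      } else {
        $result = $_POST['input'];
      }
      echo $result;
    ?>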

    Anecdotal example

    I recently had a coworker throw up a quick PR to fix some JavaScript that we had written for one of our intranet forms. There was a bug in the existing code, and this PR was meant to fix it and add some new behavior. My coworker generated the fix using an LLM service because they weren’t comfortable with their level of JavaScript skill.

    Upon review, it was pretty clear there were some issues. It added a new global-space class (appending to window), for one. The fix was self-contained and technically fixed the issue, but it did so by adding a bunch of extra cruft that essentially “tacked on” the solution instead of actually correcting the underlying buggy behavior.

    It took me a couple of hours, but I managed to untangle the generated JavaScript, refactor the behavior into modules, and then integrate it into the existing code to both resolve the bug and add the enhanced behavior.

    Trust, but verify

    For this reason, my policy, and advice, around the use of LLMs in development is to treat it as you would a junior dev: trust it to do what you ask, but verify its correctness.

    By contrast, don’t treat it as a senior dev (or at least, one that is “senior to you”) – that means don’t ask it to do things for you that you do not know how to do yourself.

    In other words, if you want to delegate it work that you could have done yourself but that is faster to have generated, so that your time is spent just on reviewing (and change requests / modifications), then fine. But avoid delegating it work that you don’t know how to do yourself. There be dragons.

    An LLM is not a Rubber Duck

    Harvard’s CS50 course is widely lauded as a great intro for nascent devs, and its team has created a companion LLM product for it called CS50.ai, which is presented as a tool for “rubber duck debugging.”

    This may seem superficially correct, but if you have ever done actual rubber duck debugging, you might be able to detect why it is NOT that. From the Wikipedia article, which I think gives a great summation:

    Rubber duck debugging (or rubberducking) is a debugging technique in software engineering. A programmer explains their code, step by step, in natural language—either aloud or in writing—to reveal mistakes and misunderstandings.

    More specifically:

    Programmers often discover solutions while explaining a problem to someone else, even to people with no programming knowledge. Describing the code, and comparing to what it actually does, exposes inconsistencies. Explaining a subject also forces the programmer to look at it from new perspectives and can provide a deeper understanding. The programmer explaining their solution to an inanimate object (such as a rubber duck) has the advantage of not requiring another human, but also works better than thinking aloud without an audience.

    I want to specifically call attention to a critical aspect of this: the duck does not speak back to you!

    Using an LLM as a coding-pair is another topic entirely, but for this case? Not a rubber duck.

    Why is Rubber Ducking helpful?

    The benefit of rubber ducking comes from three things:

    1. By presuming no knowledge of the listener, you are forced to distill your understanding down to real words that can be verbalized, forcing abstractions to become concrete.
    2. By receiving no feedback, you are also forced to consider how that information is received by the listener, acting as their proxy, which means you are now re-ingesting your understanding of the problem, creating a feedback loop.
    3. By re-ingesting the information in that feedback loop, you then do the comparison yourself, identifying gaps and inconsistencies in your understanding, driving you towards the solution.

    This is similar to Jungian psychological principles about mental pluralism or the cognitive “senate” in parts work. You gain the most benefit by embodying all the roles.

    When using an LLM, you are now only performing the first step (expressing the problem in plain language) but are not doing the latter parts (receiving and synthesizing that information).

    I completely understand why someone might think that using an LLM as a rubber duck is an improvement – again, superficially it seems like “wouldn’t it be better if the duck could give you feedback?” But you will become stronger and more competent by embodying that full feedback loop yourself.

    Beware XY Problems with LLMs

    An XY problem is

    where the question is about an end user’s attempted solution (X) rather than the root problem itself (Y or Why?).

    This is a common consequence of situations where the attempted solution is born of an incomplete understanding of the problem. These can usually be spotted as very specific inquiries that are devoid of context, and the correct way to address them is to first ask for the greater context, to understand why the asker chose to solve the problem this way.

    The Wikipedia article gives this example:

    Asking about how to grab the last three characters in a filename (X) instead of how to get the file extension (Y), which may not consist of three characters

    This becomes a problem with LLMs because LLMs generally do not push back to look for XY problems. They will give you their best guess at what you are asking for, whether it’s the correct approach or not.
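
    To make that example concrete, here is a minimal PHP sketch of the difference; the filename is purely hypothetical:

    <?php
      $filename = 'notes.html'; // hypothetical example filename

      // X (what was literally asked for): the last three characters.
      // Breaks whenever the extension is not exactly three characters long.
      $lastThree = substr($filename, -3); // "tml"

      // Y (the actual goal): the file extension, whatever its length.
      $extension = pathinfo($filename, PATHINFO_EXTENSION); // "html"

      echo $lastThree . "\n" . $extension . "\n";
    ?>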

    Anecdotal example

    On another occasion, a different coworker used an LLM to generate a script that performed a series of commands over SSH to a remote server. We reviewed the script together, focusing on what we felt were the riskiest parts (the SSH commands, predominantly) and found it to be acceptable.

    However, we ran into a bug that wasn’t immediately apparent. The vast majority of server IDs had 5 digits, but some had 4. In the generated script, one of the early steps was to parse that string to split off the last 3 digits, to be used separately from the leading digits.

    This happened because the sample data provided in the initial prompt had 5 digits, so the generated code only accounted for parsing metadata from 5-digit IDs.

    This approach broke for server IDs with 4 digits because it was peeling off the first 2 (always) rather than the last 3 (always) – right solution for the wrong problem. e.g.:

    12345 => 12, 345 (Correct)
    1234 => 12, 34 (Incorrect, should be: 1, 234)
    

    We overlooked it in the initial pass because we gave it too much leeway and focused on the wrong aspect of the solution. We found an existing script that had already solved this:

    part1 = serverID / 1000
    part2 = serverID % 1000
    

    This corrected the issue.
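
    For illustration, here is a minimal PHP sketch of that arithmetic split, assuming the server ID is a plain integer (the helper name is hypothetical):

    <?php
      // Split a server ID into its leading digits and its last 3 digits
      // using integer division and modulo, so the split does not depend
      // on how many digits the ID happens to have.
      function splitServerId(int $serverId): array {
        $part1 = intdiv($serverId, 1000); // leading digits: 12345 => 12, 1234 => 1
        $part2 = $serverId % 1000;        // last 3 digits:  12345 => 345, 1234 => 234
        return [$part1, $part2];
      }

      [$part1, $part2] = splitServerId(12345); // $part1 = 12, $part2 = 345
      [$part1, $part2] = splitServerId(1234);  // $part1 = 1,  $part2 = 234
    ?>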

    This is perhaps a soft XY issue, but highlights the point about LLMs giving you exactly what you ask for, monkey’s paw style.