
# [ $davids.sh ] · message #301

🤩 Finally, Useful Tips for Working with LLMs 🤩

I’ve been ranting about how "LLMs are dumb" (and that’s still true), but recently I stumbled upon a few approaches that completely changed my perspective on them. Ladies and gentlemen, I’m in love:

  • Shotgunning – an insane life hack for working with LLMs
  • Architecture as Code – and nothing else. Someone in the comments suggested an MCP-over-RAG approach for all kinds of documentation, which is even better (if possible)
  • Copilot – trash, Cursor – legend

More details, as always, in the comments.

#ai

  • @ [ $davids.sh ] · # 1979

    # Shotgunning

    Inspired by this post, I decided to try shotgunning:

    • Quickly, but in detail, try to describe the architecture of the current solution + the task you are facing
    • Glue it all together with the ENTIRE codebase and start sending it to the AI
    • Initially say: "you are an architect, ask questions to detail the Architecture and Technical Specification"
    • Answer questions and get the Architecture
    • Send it again: "you are a project manager, describe in detail the tasks that need to be done"
    • Send the result again: "you are a group of senior developers, change the code to implement the feature"
    • Again: "you are a QA team, write tests and check the code"

    I do the last two stages directly in the IDE, and the result is simply fantastic...

    When you give the LLM the entire context with a lot of details that it wrote itself, it starts to produce a really cool result.

    The only catch is that it takes a long time for small features/edits, so it's better to batch further edits into a larger list and run them through the same loop: "I want to fix this, help me describe it in more detail, write tasks on how to fix it, change the code, test it."
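
    To make the loop concrete, here's a minimal sketch of the stages as code. `complete` is a stand-in for whatever LLM client you use (its name and signature are my assumption, not a real API), and the interactive step where you answer the architect's questions is elided:

    ```typescript
    // Shotgunning sketch: re-send the ENTIRE accumulated context at every
    // stage under a new role. `complete` is hypothetical, not a real API.
    type Message = { role: "system" | "user" | "assistant"; content: string };

    declare function complete(messages: Message[]): Promise<string>;

    async function shotgun(codebase: string, task: string): Promise<string> {
      const history: Message[] = [
        { role: "user", content: `Task + current architecture:\n${task}\n\nFull codebase:\n${codebase}` },
      ];

      const stages = [
        "You are an architect, ask questions to detail the Architecture and Technical Specification.",
        "You are a project manager, describe in detail the tasks that need to be done.",
        "You are a group of senior developers, change the code to implement the feature.",
        "You are a QA team, write tests and check the code.",
      ];

      let result = "";
      for (const stage of stages) {
        result = await complete([{ role: "system", content: stage }, ...history]);
        history.push({ role: "assistant", content: result }); // context the model wrote itself
      }
      return result; // final code + tests
    }
    ```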

    # Architecture as Code – and nothing else

    The next important point: all the text artifacts generated at the previous stage should be stored in the repository along with the code. The same goes for all the other documentation you write.

    Because as soon as you describe your architecture in words and feed it to the AI, it immediately becomes 100x smarter.

    In short, forget Notion, Google Docs, and other crap, and now write all documentation only in repos.
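
    For example, a layout like this (the file names are just an illustration, not a standard):

    ```
    repo/
    ├── docs/
    │   ├── architecture.md   # system overview from the "architect" stage
    │   ├── decisions/        # one file per significant design decision
    │   └── tasks/            # task breakdowns from the "PM" stage
    ├── src/
    └── README.md             # entry point linking into docs/
    ```

    Any AI tool that sees the repo now sees the architecture too, with zero extra plumbing.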

    # Copilot is crap, Cursor is awesome

    As a VSCode fan, I kept trying to get something out of Copilot, but gods... as soon as I opened Cursor and spent an hour with it, I realized I had been struggling in vain for so long.

    It understands context, doesn't cut off mid-sentence, is faster, and lets you create rules without losing them.
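
    Since rules came up: Cursor picks up project rules from a file in the repo (a `.cursorrules` file in the root; newer versions also read `.cursor/rules/`). The contents below are invented, just to show the idea:

    ```
    # .cursorrules (example contents, adapt to your project)
    - TypeScript strict mode everywhere; never suggest `any`.
    - New modules go under src/ and are exported through an index.ts.
    - Read docs/architecture.md before proposing large changes.
    ```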

    In short, don't even try, just get Cursor and that's it.

    For those who switched to something else after Cursor, the question is: what and why?

  • @ Ivan ITK 🚫 · # 1981

    Actually, MCP with GraphRAG over all your docs, including the docs for your stack, is even better.

  • @ [ $davids.sh ] · # 1982

    Mmm, sounds really sexy.

    We collect GitHub, GitLab, Confluence, Notion, Swagger sites, and Google Docs, put them into a DB, build RAG/CAG and MCP on top, and then the company has an awesome system that knows everything and can write code across the entire codebase.

    If this doesn't exist yet, we need to make it urgently.
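
    A rough sketch of that pipeline's shape (every function here is hypothetical, named only to mark the stages, not a real library API):

    ```typescript
    // Hypothetical company-brain pipeline: pull docs, embed, serve via MCP.
    type Doc = { source: string; id: string; text: string };

    declare function pullDocs(sources: string[]): Promise<Doc[]>; // connectors to GitHub, Confluence, ...
    declare function embed(text: string): Promise<number[]>; // any embedding model
    declare function upsert(id: string, vec: number[], doc: Doc): Promise<void>; // any vector DB
    declare function search(vec: number[], k: number): Promise<Doc[]>;
    declare function serveMcpTool(name: string, handler: (q: string) => Promise<Doc[]>): void;

    async function buildCompanyBrain(): Promise<void> {
      const docs = await pullDocs(["github", "gitlab", "confluence", "notion", "swagger", "gdocs"]);
      for (const doc of docs) {
        await upsert(`${doc.source}:${doc.id}`, await embed(doc.text), doc);
      }
      // Expose retrieval as an MCP tool so any agent/IDE can query the whole company.
      serveMcpTool("search_company_docs", async (q) => search(await embed(q), 8));
    }
    ```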

  • @ [ $davids.sh ] · # 1984

    Do you mean this GraphRAG? What exactly is its unique selling point compared to other RAG libraries?

  • @ Ivan ITK 🚫 · # 1985

    Broadly, it's a chunking approach for embeddings where the connections between a document's content and relevant pieces are built not only in vector space (via tokens) but also through explicit links derived from document structure, categorization, tagging, and separation by technology.

    For RAG to be high quality, you need good clustering in vector space: unrelated topics should end up further apart than related ones, so that their scores differ significantly at retrieval time.
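
    In toy terms (the vectors below are invented for illustration), that score gap is what makes ranking reliable:

    ```typescript
    // Toy cosine similarity: good clustering = a wide score gap between
    // related and unrelated chunks, so the right pieces rank on top.
    function cosine(a: number[], b: number[]): number {
      const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
      const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
      return dot / (norm(a) * norm(b));
    }

    const query = [0.9, 0.1, 0.0];
    const related = [0.8, 0.2, 0.1]; // same topic
    const unrelated = [0.1, 0.2, 0.9]; // different topic

    console.log(cosine(query, related).toFixed(2)); // ~0.98, clearly related
    console.log(cosine(query, unrelated).toFixed(2)); // ~0.13, clearly not
    ```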

  • @ Ivan ITK 🚫 · # 1986

    This is what it looks like if you split purely by tokens in vectors. There will always be a lot of overlap, since essentially the same words/terms are used and everything revolves around IT, so clustering purely on tokens doesn't make much sense. That's why hybrid searches exist, where we filter which vectors to search in based on additional features.

    On top of that, a document longer than roughly 1536 tokens will be split into several parts, since the embedding model can't process more than that. And here the problem of connecting those parts arises: they all become a flat structure, and pieces from different documents come back in the results with the same score simply because the tokens (really, the words) coincide, ignoring the semantic connection between the pieces and their order.

    The example above covers relationships, family, sex, and psychology, topics that largely overlap for exactly the reason just described.
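
    A tiny sketch of how a graph fixes that flatness (the data shapes here are mine, not GraphRAG's actual schema): each chunk keeps explicit edges to its neighbors and its parent document, so retrieval can restore sequence and structure instead of relying on token overlap alone:

    ```typescript
    // Hypothetical chunk graph: vectors plus explicit structural edges.
    type ChunkNode = {
      id: string;
      docId: string; // parent document
      vector: number[]; // embedding of this chunk's text
      tags: string[]; // categorization / technology labels
      prev?: string; // sequence edges to neighboring chunks
      next?: string;
    };

    // After a plain vector search, expand each hit along its edges so the
    // result keeps the document's order, not just isolated token matches.
    function expandHits(hits: ChunkNode[], byId: Map<string, ChunkNode>): ChunkNode[] {
      const out = new Map<string, ChunkNode>();
      for (const hit of hits) {
        out.set(hit.id, hit);
        if (hit.prev && byId.has(hit.prev)) out.set(hit.prev, byId.get(hit.prev)!);
        if (hit.next && byId.has(hit.next)) out.set(hit.next, byId.get(hit.next)!);
      }
      return [...out.values()];
    }
    ```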

  • @ [ $davids.sh ] · # 1987

    So, roughly speaking, we don't just store vectors; we also create, store, and use "vector connections" (a graph), and that graph also participates in forming the answer?

  • @ Ivan ITK 🚫 · # 1988

    It's more accurate to say it's an additional graph, where we can link pieces of our documents based on extra logic. The vector space itself already represents a graph of sorts, but there the connections are based on vector proximity, which in turn is based on tokens (words).

  • @ Ivan ITK 🚫 · # 1989

    I'll be a bit of a lazy ass, but he explained it well)