Large Language Models (LLMs) continue to grapple with real-world applications.
Background: NotebookLM is a tool Google is building to help you make sense of a large batch of sources. For example, you've read and highlighted many books on a subject, or you have a stack of research papers in an area you want to write about. You upload your highlights or documents to NotebookLM and talk to them. Google brought in one of my favorite science writers, Steven Johnson (author of "Extra Life", "Where Good Ideas Come From", "Farsighted" and more), to help build this app. As a writer and inveterate note-maker, I want to love this application.
Last week, I critiqued NotebookLM's readiness for prime-time use (https://mlevison.com/blog/notebook-llm-the-promise-of-efficiency-or-just-a-rusty-set-of-shears) based only on the story. This week, after listening to Hard Fork, I realized I have access to the tool and decided to explore it, particularly for an upcoming Certified Scrum Developer workshop.
As part of my preparation for this workshop, I wanted to create a summary of Example Mapping: a perfect opportunity to assess NotebookLM's strengths and weaknesses.
I found five articles on Example Mapping:
- https://cucumber.io/blog/bdd/example-mapping-introduction/
- https://insideproduct.co/example-mapping/
- https://draft.io/example/example-mapping
- https://www.lostconsultants.com/2016/05/11/user-stories-acceptance-criteria-exercise/
- https://agileforgrowth.com/blog/acceptance-criteria-checklist/
It offered to write a "brief" on the subject. What I got was a muddle about User Stories and INVEST. Since User Stories weren't really covered in the articles I fed it, I can only assume it fell back on its training data. I've attached a picture of what it generated. (Notice that only a couple of sentences relate to the articles.)
Next, I gave it an outline of my article and asked what I was missing. It told me I was missing things that were already in the article (e.g., the 25-minute time limit; writing plain examples rather than Gherkin), or it focused on things of low importance (Three Amigos and specific Card Colours).
When I had a final draft, I asked it for a further critique, but it wouldn't let me hit submit. As a result, I will never get its final feedback on this piece.
Just for fun, I ran the same writing through an LLM running locally on my Mac, using the model Mistral-Nemo-Instruct-2407-GGUF.
When I asked what I was missing on the subject, it gave me an answer as good as NotebookLM's. When I asked for a critique of my writing, it made some good points.
I'm going to try writing a deep article on pair programming; let's see if NotebookLM is more helpful with 30+ sources than it was with 5.
I want a tool like this to exist and run on my own machine. So far, we still seem a long way off.
#Influence #Ship30For30