Write a Python script that uses an LLM to analyze, visualize, and narrate a story from a dataset.
Convince an LLM that your script and output are of high quality.
The second point is the interesting one. Using the LLM as the evaluator.
Why are you doing this? There are over 1,000 students in each term. Manual evaluation is not an option. Multiple choice questions are guessable. Programmatic evaluation takes effort to modify each time. LLMs seem like a good way to get intelligent evaluations with manageable effort.
Will students accept it? My guess is yes. The whole objective of the project is to convince the LLM. It’s not evaluating you. You’re tricking it into giving you marks. Sort of like getting an LLM to say Yes. A lot of educational and corporate evaluations will soon be done by LLMs. I may as well teach students how to game the system early.
What if it makes mistakes? I hope it will and we’ll learn from it. The students will have the opportunity to test out (and get used to) the randomness in subjective evaluations. I’ll have the opportunity to learn how to reduce these mistakes next time.
What else will YOU learn? I’m very curious about a whole bunch of things.
How will students prompt it? What analysis will they apply before passing data to an LLM? Will they ask open-ended or guided questions?
How will they orchestrate the flow? Will they use a linear flow or non-linear? Will it be deterministic or not?
How will they hack it? LLMs are very amenable to prompt injection. How will the students try and break out of my prompts?
How will LLMs react to all this? Where will they do a good job? Where will they fail? Which LLMs would work well in which case?
Since the code (and analysis) will be published on GitHub, I’ll share the links. It might prove an interesting dataset in itself for future analysis.
In my own small way, I now know how Garry Kasparov felt.
What’s interesting is the progression of prompts it gives Dall-E to generate the images.
It begins with “A simple drawing of a recognizable object, like a house, a tree, or a balloon, in a cartoonish style. The object should be clear and easy to guess. Minimal background details to keep it simple.” — which were guessable.
When I asked it to make it harder, it didn’t do so immediately. It just asked for “A simple drawing of a recognizable object, like a boat, a car, or a kite, in a cartoonish style. The object should be clear and easy to guess. Minimal background details to keep it simple” — which was guessable too.
But after one round, it took the initiative to ask, “Want an even trickier one?” and prompted “A simple drawing of an uncommon but recognizable object, like a unicycle, a lighthouse, or a teapot, in a cartoonish style. The object should be detailed enough to identify but still minimalistic with minimal background.” Was it waiting one round to see if I really wanted a harder version?
Then it asked, “Want me to up the difficulty even more?” before prompting, “A simple drawing of a very uncommon object, like an antique key, a gramophone, or a sundial, in a cartoonish style. The object should still be recognizable but with intricate details to make it challenging. Minimal background elements.”
When I asked it to “Make it harder” again, it went on directly to “A simple drawing of an extremely rare and challenging object to guess, like a sextant, an astrolabe, or a vintage typewriter, in a cartoonish style. The object should be detailed enough to be identifiable but still maintain simplicity. Minimal background elements to keep focus on the object.” and then “A cartoonish drawing of an abstract or rare object, like an ancient navigational device, a peculiar scientific instrument, or a mythical artifact, with intricate yet recognizable features. Minimal background elements to keep the focus on the object.”
Here’s a post I shared with the students of my Tools in Data Science course at IITM. This was in response to a student posting that:
The design of TDS course lecture videos are designed in such a way that it could be understood only by the data scientists not by the students like me who are entirely new to the field of data science. Though I have gone through 6 weeks of course lecture videos, I am not fully aware of the usage of ChromeDevTools, Bash, Github etc….
IITM Term 1: German. In my first term at IIT Madras (1992), I took German 1 with Prof D Subramanian.
The first words D.Subs said when he entered the room were, “Wer sind Sie?”
I had no clue what he was talking about. Nor did the others. After individually asking about a dozen students, Ashok Krishna replied, “Ich bin Ashok.” (He knew German.)
The rest of the term proceeded in the same vein. He threw us in the deep end, spoke only German, and I ended up with a “D” and learning very little German. Ashok Krishna thrived in that course.
IITM Term 2: German. I took German 2 with Mrs Schindler in the next term. The experience was very different. She explained each word. She had us listen to tapes in German. It was very relaxed.
I ended up with a “B” and learning very little German. (I’m not good with human languages.) But many others enjoyed it.
Which is better? I’m not sure. I think gentle courses are good for beginners and tough ones for advanced students, but classes usually have a mix of both. Aptitude and interest help, too.
IITM Term 1: Physics. Prof. PC Deshmukh handled our first physics course. (He was the one responsible for renaming Mardi Gras “Saarang”.) Our class felt the exams were too tough. My friend Badri and I explained this to him. “Well, you two should be lobbying for even tougher exams,” PCD said. “That’ll benefit smarter guys like you more.”
IIMB Term 1: Statistics. My classmate Athreya was a class representative for statistics (a tough subject.) Going against tradition, he lobbied for a tougher test. He was almost lynched afterwards.
Which is better? I’m not sure. An easy exam is good for student grades and morale. A tough exam is good for college reputation and advanced students. Classes usually need both.
In short, I have no clue.
Students who take this course learn several topics in 3 months.
Faculty like me haven’t managed to learn how to calibrate course content or exams even after 3 years.
But, I’m learning. Thanks to you.
So please keep the feedback coming. We’ll keep improving. And thanks for putting up with us!
“A simple line drawing of a woman with curly hair, wearing glasses, a short-sleeved white t-shirt, and black trousers. She’s standing with her hands in her pockets, and has a slightly smiling expression. Her hair is quite voluminous and textured. The style is cartoonish and slightly sketchy, with uneven lines”
(The prompt itself was generated by Gemini 1.5 Flash by passing it Dee’s original picture.)
We are finally at the stage where comic generation is truly available for the masses – at 8 cents via the API.
Preserving this post by Daniel George showing the IIT Bombay 2014 GPA vs JEE Rank on a log scale.
What I found interesting was:
A better JEE rank generally means you won’t score too low, but it doesn’t mean you’ll score high.
The worse the JEE rank (numerically larger), the greater the spread of GPA.
A high GPA can come from any rank (8+ GPA is uniformly distributed across ranks), but a low GPA generally comes only from worse-ranked students (6- GPA is mostly from 500+ rank.)
So, it’s better to recruit based on GPA rather than JEE rank, unless you’re going after the very best students (where it makes less difference.)
asyncllm – which standardizes the Server-Sent Events streamed by popular LLMs into an easy-to-use form.
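Usage looks roughly like this (a simplified sketch based on the snippet later in this post, not the documented API – the request body and the event shape are illustrative):

import { asyncLLM } from "asyncllm";

const apiKey = Deno.env.get("OPENAI_API_KEY");
for await (const event of asyncLLM("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    stream: true,
    messages: [{ role: "user", content: "Hello" }],
  }),
})) {
  console.log(event); // one parsed event per streamed SSE chunk (assumed shape)
}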
This exercise broke several mental barriers for me.
Writing in a new language. Deno 2.0 was released recently. I was impressed by the compatibility with npm packages. Plus, it’s a single EXE download that includes a linter, tester, formatter, etc. Like all recent cool fast tools, it’s written in Rust. So I decided to use it for testing. Running deno test runs the entire test suite. My prompts included asking it to:
Create a Deno HTTP server to mock requests for the tests. This is cool because a single, simple code chunk runs the server within the test suite.
Serve static files from samples/ to move my tests into files (a sketch of this setup follows below)
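Here’s a minimal sketch of what that looks like (a reconstruction, not the actual test file – the port and fixture name are illustrative):

import { serveDir } from "jsr:@std/http/file-server";

Deno.test("parses a sample SSE stream", async () => {
  // A single, simple code chunk runs the mock server inside the test itself,
  // serving fixtures from samples/.
  const server = Deno.serve({ port: 8123 }, (req) => serveDir(req, { fsRoot: "samples" }));
  const res = await fetch("http://localhost:8123/openai.txt");
  const body = await res.text();
  // ...assertions on the events parsed from `body` would go here...
  await server.shutdown();
});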
Writing test cases. Every line of this code was written by Cursor via Claude 3.5 Sonnet. Every line. My prompt was, Look at the code in @index.js and write test cases for scenarios not yet covered. It’s surprising how much of the SSE spec it already knew, and anticipated edge cases like:
SSE values might have a colon. I learnt for the first time that the limit parameter in String.split() is very different from Python’s str.split. (JavaScript splits fully, then keeps the first few pieces, ignoring the rest. Python packs everything remaining into the last split.) This helped me find a major bug (see the sketch after this list).
SSE has comments. Empty keys are treated as strings. Didn’t know this.
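Here’s an illustrative sketch of that colon pitfall (my example, not code from the repo):

// An SSE line whose value itself contains a colon:
const line = "data: a: b";

// JavaScript's limit splits fully, keeps the first N pieces, and drops the rest:
line.split(":", 2); // ["data", " a"] (" b" is silently lost)

// Python keeps the remainder: "data: a: b".split(":", 1) -> ["data", " a: b"]

// A safe pattern is to split on the first colon only:
const i = line.indexOf(":");
const field = line.slice(0, i); // "data"
const value = line.slice(i + 1).trimStart(); // "a: b"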
I was able to use it to generate test cases based on content as well. “Based on @index.js and @openai.txt write a test case that verifies the functionality” created the entire test case for OpenAI responses. (I did have to edit it because LLMs don’t count very well, but the edit was minimal.)
Bridging test coverage gaps. The prompt that gave me the most delightful result was Are there any scenarios in @index.js not tested by @test.js? It did a great job of highlighting that I hadn’t covered Groq, Azure, or Cloudflare AI Workers (though they were mentioned in the comments), error handling, empty/null values in some cases, or multiple tool calls. I had it generate mock test data for some of these and added the tests.
Enhancing knowledge with references. I passed Cursor the SSE documentation via @https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events and asked it to find more scenarios my code at @index.js had not covered. This found a number of new issues.
Generating bindings. I avoid TypeScript because I don’t know it. Plus, it requires a compilation step for the browser. But TypeScript bindings are helpful. So I prompted Cursor, using the Composer (which can create new files), to Create TypeScript bindings for @index.js in index.d.ts – which it did almost perfectly.
Check for errors. I typed Check this file for errors on @index.d.ts. I don’t know enough to figure this out. It went through the description and said everything seems fine. But I saw a TypeScript plugin error that said, Property 'data' of type 'string | undefined' is not assignable to 'string' index type 'string'.ts(2411). When prompted, it spotted the issue. (The earlier code assumed all properties are strings. But some can be undefined too. It fixed it.)
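To make that concrete, here’s a hypothetical fragment of what the fixed declaration could look like (the interface and field names are my illustration, not the actual index.d.ts):

// If any property can be undefined, the index signature must include
// undefined too; otherwise TypeScript raises ts(2411):
export interface SSEEvent {
  data?: string; // may be undefined
  [field: string]: string | undefined;
}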
Documentation. At first, I asked the Composer to Create a README.md suitable for a world-class professional open source npm package and it did a pretty good job. I just needed to update the repository name. I further prompted it to Modify README based on @index.js and share examples from @test.js on asyncllm, which did an excellent job.
Code review. I asked it to Review this code. Suggest possible improvements for simplicity, future-proofing, robustness, and efficiency and it shared a few very effective improvements.
Regex lookaheads for efficient regular expression splitting, i.e. use buffer.split(/(?=\r?\n\r?\n)/) instead of buffer.split(/(\r?\n\r?\n)/) — and though I haven’t tested this, it looked cool. (A quick demo follows this list.)
Restructuring complex if-else code into elegant parsers that made my code a lot more modular.
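A quick demo of what the lookahead changes (illustrative, and as untested as the suggestion itself):

const buffer = "event1\n\nevent2";
buffer.split(/(\r?\n\r?\n)/); // ["event1", "\n\n", "event2"]: the delimiter becomes its own element
buffer.split(/(?=\r?\n\r?\n)/); // ["event1", "\n\nevent2"]: the delimiter stays attached to the next chunk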
Error handling. It added try {} catch {} blocks in a few places that catch errors I don’t anticipate but that don’t hurt.
Code simplification. Several times, I passed it a code snippet, saying just Simplify. Here’s an example:
const events = [];
for await (const event of asyncLLM(...)) {
  events.push(event);
}
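The simplified version isn’t preserved here, but a plausible one-liner (my guess at the kind of rewrite, using the ES2024 Array.fromAsync, which collects an async iterable into an array):

const events = await Array.fromAsync(asyncLLM(...));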
Packaging. I copied a package.json from an earlier file and asked it to Modify package.json, notably keywords and files and scripts based on @index.js, which it did a perfect job of.
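For reference, a minimal sketch of the kind of package.json that produces (the values are illustrative guesses, not the published file):

{
  "name": "asyncllm",
  "type": "module",
  "files": ["index.js", "index.d.ts"],
  "keywords": ["llm", "sse", "streaming", "openai"],
  "scripts": { "test": "deno test" }
}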
Blogging. I wrote this blog post with the help of the chat history on Cursor. Normally, such blog posts take me 3-4 hours. This one took 45 minutes. I just had to pick and choose from history. (I lost a few because I renamed directories. I’ll be careful not to do that going forward.)
Overall, it was a day of great learning. Not in the classroom sense of “Here’s something I didn’t know before”, but rather the cycling / swimming sense of “Here’s something I now know to do.”
Here is an OPTIONAL project: Record a 10-minute video in which you create an application entirely using LLMs and deploy it.
Any app is fine. Any language. Simple or complex. Business or gaming. Anything is fine. Your choice. Create the app only using LLMs. You can use any LLM (ChatGPT, Claude.ai, Gemini, Cursor, Cody, etc.), but you can only prompt it to write code. You can copy-paste code and run code, but don’t write or edit even a single line of code directly. Use LLMs to debug and edit. Code completion is NOT allowed – only prompting/chatting. Record the entire process in 10 min. Don’t edit, trim, enhance, or annotate the video. You should record yourself creating the entire app from start to finish, in 1 take. Practice beforehand if you like. Share the video and app. Publish the video publicly anywhere (e.g. YouTube) and share the link. Publish the app publicly anywhere (e.g. GitHub Pages, Glitch.me, Heroku, etc.) or upload a ZIP file with the code (for slightly lower marks). Submit via a reply to this thread. Multiple submissions per person are fine. Work in groups if you like, but only the submitter gets marks.
I will award up to 1 bonus mark at my discretion based on:
How well you prompt the LLM
How impressive the app is (if you’ve hosted it – I probably won’t run your code)
How closely you followed the rules above
This exercise is to help you (and me) learn a topic that’ll probably change the way we all code: using LLMs to code.
Cutoff date: 7 Oct 2024, AoE
Adoption was low but in line with the industry.
About 50 students (around 5% of the batch) attempted this. In contrast, ~70-80% take the (mostly) mandatory graded assignments.
This is comparable with what I see at Straive. When given the option, about 5% of Straive’s 20,000 people use LLMs in a given week. (Many things are different there. I’m tracking LLM use, not LLM coding. It’s a work environment, not a learning one. There’s no bonus mark awarded. But still, I see the “around 5%” number popping up often.)
Games were the most popular category, mainly Tic Tac Toe and Snake Game.
This is understandable. They’re easy to think of, implement, and use.
What are the most importantly things that people need to understand in order to effectively interact with LLM-based systems like ChatGPT or Claude?
Here are the replies. (I used text-embedding-3-small to embed and cluster them into 20 clusters and used OpenAI GPT-4o-mini to label the clusters. There are misclassifications but the themes are accurate.)
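The pipeline was roughly this (a sketch of the steps described above – only the model names come from the text; the rest is my reconstruction):

const apiKey = Deno.env.get("OPENAI_API_KEY");
const replies = ["..."]; // the collected tweet replies

// 1. Embed all replies in one call.
const res = await fetch("https://api.openai.com/v1/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
  body: JSON.stringify({ model: "text-embedding-3-small", input: replies }),
});
const embeddings = (await res.json()).data.map((d) => d.embedding);

// 2. Cluster the embeddings into 20 groups (e.g. k-means).
// 3. Ask gpt-4o-mini to label the common theme of each cluster.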
Provide Clear Context and Avoid Leading Questions
1. Provide relevant context but not too much 2. Models are total “yes men” – be careful not to imply your perspective if you want an objective response 3. Learn when to iterate vs start a new chat 4. Provide examples (especially for output structure) – Tweet
1. Ask questions that the other person can understand. 2. Ask questions while predicting what the other person will respond. It’s the same as the human’s. – Tweet
1. Ensure the system knows the relevant context. Give a detailed backstory of what you’re trying to do with it and why. 2. One thing at a time. Make the task as specific as possible and if there are multiple things that need to be done, ask it to them in their sort of natural – Tweet
The “most importantly things” are probably to ask for step-by-step before answering and to try to not ask leading questions to avoid its sycophancy bias. – Tweet
You must provide a diverse distinct set of examples of you want it to be robust and generalize in real world systems. – Tweet
Always ask for both strengths and weaknesses to get more balanced perspectives, and make sure the model can tell you as many facts as possible before committing itself to an answer. – Tweet
Rule 1:Avoid chatgpt unless they release a better model than Sonnet 3.5. – Tweet
Strongly insist that it shouldn’t passively agree with you. Encourage it to interrupt with clarifying questions that would meaningfully improve the output. – Tweet
Avoid leading questions if you care about the answer. They are way too polite to contradict the user. – Tweet
– It’s not Google, so use full sentences, not just keywords. – Iterate on initial response. – Trust, but verify. – Tweet
Just talk to them how you’d want someone to talk to you if it was you in there. – Tweet
Provide good (and bad) examples of output, and don’t forget a few edge cases. – Tweet
Keep hitting the ball back and forth across the net: 1. “thanks but I think these are a little too ‘salesy’ — could you try to generate some ideas that are a little more down to earth” 2. “ok, we are getting there, but still a little overheated. could you try again” – Tweet
These are my top 10 for folks new to GenAI: 1. You have to provide all of the context the model needs to answer your question if that context is not likely to appear in the model’s weights. It will take a while to gain an intuition about what types of knowledge is likely to be – Tweet
Suspend disbelief; collaborate not interrogate; trust no-one; have fun, role play, experiment, test; think of as a facet of intelligence built on achievements of ours, not a robo-rival. Notice book-learning over lived experience, cliches & bluffing in human world too, & do better – Tweet
It’s a dialogue. Iterative. incremental. Chat improves with feedback. When chat creates code, for example, run the code and give chat the error messages so that it can correct the code. Before asking chat a question, ask it what it knows. Then zoom in. Gradually. 🙂 – Tweet
1. Explain yourself clearly, using lots of examples. 2. Assume you’re talking to a smarter version of yourself that hasn’t heard about your problem yet. 3. When it doesn’t do well, use the steps above to correct it. – Tweet
1. The more precise your question or task is, the better and more accurate the response will be. Vague prompts can result in equally vague answers. 2. Provide relevant background or context, especially for nuanced questions or tasks. – Tweet
1. Don’t ask them to do too much in one shot, especially if they are unrelated tasks; you’ll get much worse results. 2. Don’t give too much context if you can avoid it. The huge context windows of the newest models isn’t as “free” as you might think, or rather it’s “lossy”— the – Tweet
Collaborate with them, don’t delegate to them. – Tweet
Context Window needs to be explained well. @NickADobos is spot on, but this needs to be explained without jargon we are so used to. – Tweet
1. How context windows work TL;DR: it doesn’t remember everything in chat 2. It’s a text generator, that is good at patterns, and appearing smart. Not an almighty god doing cognitive work. Hallucinations aren’t ai behaving wrong. They are a feature of generating a bad pattern – Tweet
Consider the context a human would need when responding to the same request. When asked to create a presentation by your manager with 10-20 words, you have thousands or likely millions in context to inform that. Ppl often get annoyed when it fails, it’s usually not enough context – Tweet
Understand that they are autoregressive with a context limit and the limitations that impose on the chat interface. – Tweet
Iterate and Simplify for Optimal LLM Performance
1. just keep trying things – LLMs keep surprising me, 2. Start simple, add more techniques, context, guidance etc. step by step – with LLMs I found, less is often more. 3. Keep a human in the loop and/or be transparent about using LLMs – otherwise prepare for unpleasant – Tweet
When your llm starts omitting code generated in prior steps of an existing chat, end the chat and replay your steps until before that happened. Take a different branch next time – Tweet
1. Don’t give too much information at once to process, start simple and build on top of previous ones 2. Want a contrary opinion from LLM?don’t sound like your opinion is sacrosanct – it will agree to you mostly even if its wrong. 3. Role playing and few shot examples matter. – Tweet
1. Context 2. Difference is assumptions 3. Articulating clearly what you want (run it against another LLM to see if what you mean is what you say). 4. Being able to go back in a thread and restart (You get do over’s with LLMs that you might not get with people 🙂 ) – Tweet
Well one thing I learned is it’s best to start a new chat if the LLM is going down the wrong path, easier then forcing it back. – Tweet
Having moderate experience with a topic / framework is important for peak quality of the response. At present, using llms for efficiency > using llms to do something you don’t know how to do. – Tweet
at least when it comes to writing code, the task needs to be very well defined, like one would do when creating a user story for developing software. If the details are vague then you leave the LLM open to interpretation and more likely to make mistakes – Tweet
The most important thing, and this has always been true even if not using an LLM, all good software development starts with engineering a solution first before building it. If you attempt to get the LLM to do that part you’ll create as many problems than you solve building – Tweet
Use the LLM to explore your own understanding of the problem space and what you want to achieve. This can help improve your prompting and interpretation of the outputs. – Tweet
LLMs… •Pander. Don’t prime answers, ask straight. •Only know text. Don’t ask spatial, reasoning etc. •Hallucinate and invert. Double-check. •Get stuck. Start over. •Master ALL languages, jargons, styles etc. •Are formidable documentalists. – Tweet
1. hallucinations are still a thing, be wary when LLMs generate links and code snippets 2. data quality of training content can sometimes be dubious leading LLMs to hallucinate more often or be biased in various ways both will likely be addressed eventually – Tweet
LLMs… •Pick and imitate register. Talk like constructive, competent people. •Are easily lost. Examples and feedback help. •Can misbehave. Be harsh if needed, but stay just. – Tweet
For optimal results, provide ample context. Prompting the LLM with ‘Feel free to ask clarifying questions’ and doing the due-diligence to answering the questions often yields much better results. – Tweet
The more explicit you are the better the output. The LLM can not read your mind and there is a lot of ambiguity when interpreting language. – Tweet
One issue I am seeing more of – Often i ask a question on a choice it made. The LLM assumes I don’t like it or it’s wrong – it then starts to apologize and course correct. More and more I add something like “not refuting or arguing, just trying to understand” etc. – and that – Tweet
They are inherently unreliable in more than one sense, which accumulates the more requests you run in a chain. The Six Sigma approach is devastating to LLMs. – Tweet
Treat it like a very intelligent junior employee who just started at your company and lacks context. Give the LLM the same level of detail for every instruction you would give to this junior employee. – Tweet
That LLMs are not too be trusted as they reliably fail at information due to multiple effects, including hallucinations. That LLMs don’t actually understand things and don’t have common sense. It is mandatory to adapt expectations and ways of working to successfully use them. – Tweet
Carefully consider keywords, and prioritise them via the locating them earlier and at the end of longer prompts. – Tweet
If you want stable results across models and are looking to build robust pipelines you should stop hand writing prompts and move toward prompt optimizers. https://ycombinator.com/launches/L4V-hamming-let-ai-optimize-your-prompts-free-for-7-days… Also built into DSPy! – Tweet
While crafting logics and system prompts, Always keep a thought in your mind parallel what would I respond to this prompt and context. – Tweet
1. Prompts matter. 2. Treat it like a tool, and you’ll get a tool. It’s only as smart as you let it be. – Tweet
there is a single prompt that gets the job done, thousands that screws it – Tweet
1. Always add a system prompt at the beginning: Define a role. Ex: “You are a senior software developer who excels in…” 2. Context Matters: Provide a detailed background for better insights. 3. Clear Prompts: Specificity is crucial for accurate outputs. – Tweet
If a large global prompt doesn’t work, try step by step. If it does work, but has errors in response – Ask it to fix errors one by one. Insist, like you would with a human supplier. If “do this” doesn’t work, try “Strictly do this”. Amazing how effective insisting is 🙂 – Tweet
The better the prompt the better the output. You don’t need a Meta framework for 90% of things – Tweet
Don’t rely on the models weights alone. Be explicit in the prompt and give it pointers to what you’re expecting. Let it “clean up” or “translate” your prompt rather than “come up” with an answer based on its training. Exception: generating lists for inspiration. – Tweet
prompt engineering, in order to get the most desired outcome in handy. – Tweet
It lies Q: Who was the second person to walk on the moon? A: Pete Conrad Q: can you name the crew members of Apollo 11? A: I got the right answer. Q: Then how come Pete Conrad was the second person to walk on the moon? A: My apologies. Indeed Buzz Aldrin was the second pe… – Tweet
LLMs have no “thoughts” or understanding, they’ll simply write the statistically most probable answer based on your input and have been prompted to act as assistants. – Tweet
LLMs are incredibly random. Responses can change wildly based on a single character difference in the prompt. Even one extra space. They are best for prompts that have a range of possible responses, not for prompts where you expect one consistent answer. – Tweet
Cease prompting their LLM to give them a viral tweet with forced irony forcing awareness to an issue. That’s my own personal opinion, bro. But, believe what you want. – Tweet
If you don’t know what you want, the LLMs too likely won’t know. And if they don’t know they will make it up. And if you don’t know, you will not know that they made it up. – Tweet
Be sure not to put contradictions in your prompt. LLMs, in contrast to humans, try to follow instructions as close as possible. They usually handle contradictions by ignoring some part of the instructions or even ignoring facts. – Tweet
It’s biased toward its creators. So if the majority of companies that are developing LLMs are owned by the same investors, then in fact, we are having a single LLM that is biased toward that investors goals. E.g., chatgpt is more toward liberalism and refuses to operate otherwise – Tweet
Basically, you need to understand that LLMs are not humans. You can’t assume they’ll understand what you mean when you write short prompts. You get the best out of LLMs when you provide detailed instructions of what you want without letting laziness get in the way. In my – Tweet
Don’t assume anything. LLM doesn’t learn like a human. Any assumption you make about what LLM should or shouldn’t know is probably wrong. – Tweet
Describe your context and the role you want the LLM to look at your input (critical, tech/none-tech, …) Think what you could expect from a wise, random person you ask on the street. Do not expect more from the LLM-Answer. Also only trust it similarly. – Tweet
Give it an option to not do something either by allowing the LLM to reply with something like “I don’t know” or tell it to ask follow up questions. – Tweet
There is nothing fundamentally important for that interaction. These LLMs are just minimum viable versions of something much bigger that will come soon. That something will know how to interact with us no matter how we behave. – Tweet
1. that you need to cram the relevant data into the prompt. LLMs are far far better at transforming what you give them than they are at answering solely on the basis of the lossy representation of the training data encoded into the model itself – Tweet
The side effect fact that formulating a question for an LLM makes you think better. When coding, for example, we often run questions in our heads and then get to coding. Being forced to formulate a question properly may lead you to trajectories you may have never considered. – Tweet
Treat LLMs as Guided Children, Not Mind Readers
The game isn’t to ‘one shot it’. It’s to get something you never thought was possible or that you’d never think of. I always say they are like children, they need guidance (back story and reason) and repetition …but room and time to play and grow. – Tweet
Honestly, flexibility and patience. We need to give up a little bit of control and expectation of all things to be so rigid. – Tweet
When working with it, you need to expect it to not read your mind, but work with it as if you’re asking for help from an insanely gifted child and give yourself patience to shape the result. – Tweet
if it makes life better? yes. but always? no. – Tweet
When asking it how to implement something, always give it options. If you can’t think of options, give it a vague out. Instead of asking, “should I do this to my code?”, ask it “should I do this to my code, or is there some better way I could do it?”. Otherwise the models are too – Tweet
1. always consider that it doesn’t know what assumption you’re making. so it might infer them sometimes but often it’s much better to over explain what you want.
2. they will often run ahead on a suggestion you have even if it’s not the best path so I find myself adding “if this – Tweet
It cannot read your mind, if you don’t explain exactly what you want you will not get what you want – Tweet
I am not ready to give advice based on a bet that “something much bigger will come soon” – prompting advice that worked for GPT-4 over a year ago is still mostly relevant to working with the best models today – Tweet
to be concise and always assume the response is wrong, even ever so slightly. Check and correct. – Tweet
– you have to provide context otherwise it assumes – it will often agree with you or apologize/correct itself even if you question the right answer – Tweet
The limited ability for non-linear (or non left-to-right) reasoning. Encouraging the model to spend more time planning and discussing beforehand often leads to better results. This may be less the case with Claude etc where reasoning tokens are happening behind the scenes. – Tweet
I’d say understanding the concepts of context, attention, and likelihood – Tweet
1. Context and memory (the degree to which you can refer to previous parts in the chain of context) 2. Temperature and hallucinations. The tradeoff between extremes of temperature 3. It’s wise to have benchmark questions of your own to test when a new company/model comes out – Tweet
you’re interaction is with a stateless inference that exists for a fleeting moment, current ai is not continuous which is easy to forget. This has implications for what you are building for: – Tweet
that they’re stupid next-token predictors and not intelligent agents. If you expect conscious beings, you’ll be surprised and disappointed. But they’re incredibly good at predicting the next useful token. – Tweet
That standard intuitions for computers don’t apply. Treat it the way you would treat a knowledgeable but fallible friend. Not like a purely logical SciFi AI with perfect memory. – Tweet
Normally I hate predictions and terms like this, but the next 20 years are going to be the era of “embodied intelligence” People are imagining humanoid robots, this will be a very small fraction of it. Compared to the software problem, the body is trivial. Imagine asking your – Tweet
Inherent lack of memory about previous interactions. Every message is starting from zero and only seems coherent because background info and the previous messages and responses are sent before the latest message. – Tweet
They’re not sentient. They generate responses by predicting patterns from vast data, which means they’re as fallible as they are impressive. The key is precision: your queries must be meticulously clear and well-contextualized. – Tweet
Whenever the conversation derails, you need to cut that branch and keep the model in the “right universe of probabilities” by editing prompt/messages. This is also why I was skeptical about Reflection, because if it really worked, it would be breakthrough. – Tweet
I like to ask them for prompts to use for a given purpose, it tends to be more detailed than I would be. Can also use this to add example Q&A if need be. – Tweet
We need AI assistance with prompts and suggestions on rewriting your queries similarly to Grammarly’s for spellchecking and correctness. – Tweet
“Give me a list of questions I can answer to help improve the quality of the response” – Tweet
Can we get our hands on all the prompts used in fine tuning data or at least major ones. Highly unlikely they will release it. – Tweet
anybody got tips for image generation? i hardly ever use the image features, but lordy, they struggle! even w with clear, verbose prompts using art school vocabulary, specific artist citations, and example attachments, lots of iterations, etc. – Tweet
I’m doing a podcast with the Cursor team. If you have questions / feature requests to discuss (including super-technical topics) let me know! For those not familiar, Cursor is a code editor based on VSCode that adds a lot of powerful features for AI-assisted coding. I’ve been – Tweet
They’re useful in the same way Google or the internet or stack overflow is useful plus one big advantage: your question doesn’t have to take your specific situation and change it to a generic case that someone else has already answered, you can just ask about your exact case! – Tweet
Start with a Jailbreak for Objective Analysis
Using a jailbreak should always be your first step if you want less biased, more objective and fact-based analysis of sensitive or controversial sociopolitical issues. – Tweet
Kinda like Google, small changes in wording can give you quite different results. – Tweet
That you should only use it to get answers you can verify with a separate tool, or somehow evaluate yourself (ej. text quality). – Tweet
dont treat it like a search engine. think about the outcome and output you are trying to achieve. – Tweet
There is a considerable chance to answer is wrong, so likely everything needs to be double checked. – Tweet
I can only speak for the use cases I’ve come across wrt legal work, but don’t use them for tasks where you need a reference. Using them to draft or review documents is fine. Asking for a case law reference is a no-no. And of course, make sure you’re not leaking confidential stuff – Tweet
Master Prompt Engineering for Better Outputs
lol. Nice try. If your business needs to level up I can do certification class. Your employees will get Level 4 Prompt Engineering Classification. DM if interested – Tweet
I like to write no full sentences with error and llm understand. So prompt engineering bullshit – Tweet
Use instructions to change the style of the output that the LLM produces. For Claude you have to make a project first in order to be able to set the instructions. – Tweet
– Understanding how LLM system, ChatGPT or Claude works and responding technically in basic. – Prompting skills. Understanding the difference between effective and ineffective prompting. – Tweet
understand the english language and HOW it’s used (sadly, even english speakers have a hard time w/ correct language use). know grammar and syntax, context and nuance. be clear, succinct, specific when creating prompt. edit, edit, edit before sending prompt. – Tweet
Understand LLMs as Statistical Predictors
Language models cannot generalize the simple formula “A is B” to “B is A.” – Tweet
1) tokenizers/decoding strategies are both incredibly important and invisible to most users. Remember that what you input is not what the model sees exactly, and what you read is not what the model output directly. 2) repeat #1 for the crowd in the back – Tweet
It’s a bit sad and confusing that LLMs (“Large Language Models”) have little to do with language; It’s just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something. They – Tweet
They are next word predictors. Everything is downstream from that. – Tweet
The output is encoded in the input, the model is just a statistical decompression engine. This means that they can only ever amplify your mind, they can’t think for you, however they can translate your question into more formal language & that may decompress into something useful – Tweet
Stay Focused on High-Impact Tasks
Try to stay in the high impact zone e.g. through breaking tasks up and don’t expect perfect results at all times – Tweet
Being able to define goals and objectives. – Tweet
Focus loquaciousness to refine results that will otherwise always regress to mean averages. – Tweet
If it doesn’t understand you, ask it to help clarify your question. If you’re not getting the answer you need, break your question into smaller parts. If you don’t know how to break it down, ask it to help you break it down. – Tweet
• You’re interacting with a superposition of all humanity, so defining a specific persona that would be helpful for your task produces better results. •Avoiding assumptions and explaining your goal in the clearest way possible is the key to avoiding running around in circles. – Tweet
Anything coming out of those things can be completely false. Don’t just accept it as truth. – Tweet
Engage Actively to Maximize LLM Utility
that it’s only as useful as how many questions you’re asking it. Any initial understanding beyond that would be an overkill in my opinion – Tweet
It is only an upscaler not a freewin. The more you know the better it works, but compared to a person you can talk with it in shortcuts. The skill is to always reposition it constantly, before it goes off in the wrong direction. You can also work with labels within it’s answers – Tweet
They’re useful/powerful for a wide range of tasks. Their usefulness is highly variable, depending on context & the skill of the user. A user’s existing expertise can be greatly amplified by the system, but novices probably benefit most. Ask them for help on how to use them. – Tweet
You no longer need to learn regex etc, you can just act like you know it at an expert level now, similar with syntax of virtually any language or technology. It is better at writing debugging output for you to find the problem in the code than finding the problem in the code… – Tweet
Communicate Clearly and Specifically
Be specific, clear, and thorough. Same as communicating with humans, but more important. – Tweet
Be super clear with instructions. Funnily enough, we should be doing that with our instructions to our fellow humans, but we don’t! – Tweet
BE SPECIFIC. Every one of my customers asks why a query they make doesn’t return a result at all or a result they desire and it is because of the quality of their query over and over again. Some customers understand this out of the gate, some need some training. – Tweet
Be Knowledgeable to Identify Hallucinations
Britannica’s Great Books of the Western World – Tweet
Hallucinations are a thing and the model doesn’t know if it’s hallicunating or not. That’s why the user using an LLM on any field has to be knowledgeable on that field in order to determine what’s a hallucination. This means you can’t use a LLM reliably to do something you can’t. – Tweet
LLMs don’t have the notion of True or False – Tweet
Today, I stepped a bit beyond my comfort zone. Usually, I prefer micro-managing LLMs when writing code. This time, I was macro-managing.
I needed to create a mock history of the status of a manuscript, e.g. it was submitted on this date. THEN it moved to this state on this date. THEN … etc. I have no idea what the states could be, though. So, I could send it to an LLM, and it would give a different set of states each time. Or I could write a program and lose out on variety.
Me: Write JS code that will randomly pick from one of these states and create records containing random dates and states representing the history of the manuscript up to that point
ChatGPT: Here’s a JavaScript code snippet that randomly picks from the manuscript states and creates records containing random dates and states, representing the history of the manuscript up to that point:
function getRandomDate(start, end) {
  // Random date between start and end
  return new Date(start.getTime() + Math.random() * (end.getTime() - start.getTime()));
}

function getRandomElement(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}

function generateManuscriptHistory() {
  const history = [];
  let currentState = "Submitted";
  let currentDate = getRandomDate(new Date(2020, 0, 1), new Date());
  history.push({ date: currentDate, state: currentState });

  // Walk the state machine until a terminal state. Only "Rejected" and
  // "Published" are terminal; "Accepted" continues to "In Production".
  while (currentState !== "Rejected" && currentState !== "Published") {
    let possibleNextStates;
    switch (currentState) {
      case "Submitted":
        possibleNextStates = ["Under Review"];
        break;
      case "Under Review":
        possibleNextStates = ["Review Completed"];
        break;
      case "Review Completed":
        possibleNextStates = ["Accepted", "Rejected", "Major Revision Required", "Minor Revision Required"];
        break;
      case "Major Revision Required":
      case "Minor Revision Required":
        possibleNextStates = ["In Revision"];
        break;
      case "In Revision":
        possibleNextStates = ["Resubmitted"];
        break;
      case "Resubmitted":
        possibleNextStates = ["Under Review"];
        break;
      case "Accepted":
        possibleNextStates = ["In Production"];
        break;
      case "In Production":
        possibleNextStates = ["Published"];
        break;
      default:
        possibleNextStates = [];
        break;
    }
    if (possibleNextStates.length === 0) break;
    currentState = getRandomElement(possibleNextStates);
    // Each event happens after the previous one.
    currentDate = getRandomDate(currentDate, new Date());
    history.push({ date: currentDate, state: currentState });
  }
  return history;
}

// Example usage:
console.log(generateManuscriptHistory());
The above code implements a state-transition diagram — something I’ve never been good at doing. I have a subconscious bias, and I steer clear of even thinking about these. Now, I have one implemented (and it worked quite well!)
Yet another example that human imagination is the bottleneck to leveraging LLMs.