AI tools I wish existed

147 points by Poleris 2 days ago

On the topic of "24. A Sony Walkman-style device that you can give to children so they can ask questions to an LLM...", I would strongly caution against this:

- short of AGI, what a child will hear are explanations given with authority, which would probably be correct a very high percentage of the time (maybe even close to or above 99%), BUT the few incorrect answers and subtles misconceptions finding their way in there will be catastrophic for the learning journey because they will be believed blindly by the child.

- even if you had a perfect answering LLM who never makes a mistake, what's the end result? No need to talk to others to find out about something, ie reduced opportunities to learn about cooperating with others

- as a parent, one wishes sometimes for a moment of rest, but imagine that your kid just finds out there's another entity to ask questions from that will have ready answers all the time, instead of you saying sometimes that you don't know, and looking for an answer together. How many bonding moments will be lost? How cut off would your kid become from you? What value system would permeate through the answers?

A key assumption here for any parent equipping their child with such a system is that it would be aligned with their own worldview and value system. For parents on HN, this probably means a fairly science-mediated understanding of the world. But you can bet that in other places, this assistant would very convincingly deliver whatever cultural, political, or religious propaganda their environment requires. This would make for frighteningly powerful brainwashing tools.

ponector - 2 days ago

>> child will hear are explanations given with authority, which would probably be correct a very high percentage of the time (maybe even close to or above 99%), BUT the few incorrect answers and subtles misconceptions finding their way in there will be catastrophic for the learning journey because they will be believed blindly by the child.
Much better results than asking a real teacher at school, though.
- korse - 8 hours ago
  
  Disagree with this. Kids are sponges who pick up on many secondary factors when an actual human gives them an answer. These factors add significant weight to their view of the response. In many cases, this actually reaches an extreme where what is said end up being tertiary to how it was said and who said it. I am sure you've experienced this even as a an adult.
  An AI walkman removes this aspect of the interaction. As a parent, this is not something I would want my children to use regularly.
- VSerge - a day ago
  
  Wouldn't you know whether a teacher is reliable or not? If reliable, they probably have this reputation also because they can also say when they don't know something. And if you found out a given teacher isn't reliable, you'd be careful about what they say next - or you would just ask someone else.
  The problem here is for a child to be thinking this system is reliable when it is not. For now, the lack of reliability is obvious as chatGPT hallucinates on a very regular basis. However, this will become much harder to notice if/when chatGPT will be almost reliable while saying wrong things with complete confidence. Should such models be able to say reliably when they don't know something, this would be a big step for this specific objection I had, but it still wouldn't solve the other problems I mentioned.
- 93po - 2 days ago
  
  the amount of misinformation i had a kid due to a lack of internet is nothing compared to the rare hallucination a kid might get from chatgpt
  swallowing gum is bad for you, or watermelon seeds, cracking knuckles causes arthritis, sitting too close to tv ruins your eyes, diamonds come from coal, newton's apple story, a million other things
sixtram - 2 days ago

Just two days ago, I asked ChatGPT to provide an explanation of the place-value system that my six-year-old could understand. The only problem was that it mixed up digit value and place value, which caused it to become confused. I spotted the mistake, and ChatGPT apologised, as it usually does. But if my six-year-old had asked it first, she wouldn't have noticed.
I'm not sure how much misinformation my child would learn as truth from this device.

samcollins - a day ago

Re 19, I made this with an iOS Shortcut a few weeks ago

  > A minimal voice assistant for my Apple Watch. I have lots of questions that are too complicated for Siri but not for ChatGPT. The responses should just be a few words long.

Use Dictate Text action to take voice as input, pass the text to OpenAI API as the user message with this as the system prompt:

“CRITICAL: Your response will only be shown in an iOS push notification or on a watch screen, so answer concisely in <150 characters. Do not use markdown formatting - responses are rendered as plain text. Do use minimalist, stylish yet effective vocabulary and punctuation.

CRITICAL: The user can not respond so do not ask a question back. Answer the prompt in one shot and if necessary, declare assumptions about the users questions so you could answer it in one shot, while making it possible for the user user to repeat ask with more clarity if your assumptions were not right.”

It works well. The biggest annoyance is it takes about 5-20s to return a response, though I love that it’s nearly instantaneous to send my question (don’t need to wait for any apps to open etc)

onion2k - 2 days ago

A recommendation engine that looks at my browsing history, sees what blog posts or articles I spent the most time on, then searches the web every night for things I should be reading that I’m not.

This kind of exists in the form of ChatGPT Pulse. It uses your ChatGPT history rather than your browser history, but that's probably just as good a source for people interested in using it (e.g. people who use ChatGPT enough to want it to recommend things to them.) https://openai.com/index/introducing-chatgpt-pulse/

Gigachad - 2 days ago

It's also essentially every social media platform with an algorithm selected feed.
- FinnKuhn - 2 days ago
  
  A lot of social media platforms only recommend recently uploaded content or at least heavily favor it.
  The idea sounds to me more like a feed for independent blogs/articles though, which is what an RSS reader once was supposed to be. Have we come full circle?
- socalgal2 - 2 days ago
  
  Except those algos don't work. No idea if the LLM works.
  - fhd2 - 2 days ago
    
    I'm sure they work splendidly... to keep the average person on the platform as long as possible and show them ads :)
  - aljgz - 2 days ago
    
    They do work, extremely well, not for us though!
    
    simianwords - 2 days ago
    
    Do you not think that’s what the post meant? That it could work for us rather than them?
    
    Gigachad - 2 days ago
    
    They probably won’t though. The commercial LLMs will be tuned to work for them as well soon. And your local LLM won’t be allowed to scrape the internet since it’s all locked down now.
    
    blooalien - a day ago
    
    LLMs don't "scrape the Internet". The tools and interfaces that use LLMs for the language part of things do any necessary scraping and feed the results into the LLM's "context". That part about "tuned to work for them" is a serious concern though.

ares623 - 2 days ago

Not just for this article, but from most ideas/articles around LLMs, I feel like they aren't "thinking with portals" enough. We have "portal gun" tech (or at least, that's what's being marketed), and we're using it as better doors.

BriggyDwiggs42 - a day ago

I sorta think the issue is that what LLMs do in and of themselves is extend text in a coherent way, while only a small subset of applications are directly textual. It’s incredibly generally applicable yet also difficult to apply to anything that isn’t a glorified text editor. Say you wanted to have it help you edit videos. You might provide it with a scripting language to control the editor , but now you have to maintain parity between a scripting language and the editor’s user-accessible functionality. If you’re adobe, is that really worth the manpower? If you’re a small startup trying to unseat adobe, you have to compete with decades of features and user lock in. The only way this makes sense for either party is if the LLM is crazy good at it, but the LLM can’t watch its video output and it’s also probably just okay to begin with.
HellsMaddy - 2 days ago

I agree with this. But do you have any resources on "thinking with portals"? It's easier said than done.
- ares623 - 2 days ago
  
  Sadly, I don't. If I did I'd be busy building it rather than judging others on HN.
  But it's a bit telling that OpenAI themselves can only come up with a better ~door~ ads.
Dilettante_ - 2 days ago

Could you give a quick example so we can "catch" the way of thinking you mean a little easier?
- ares623 - a day ago
  
  I think I found one https://news.ycombinator.com/item?id=45431918
  I guess it's more "following through to its logical conclusion", but I'm more of a cynic.

lancebeet - 2 days ago

This is really striking, isn't it? We've all certainly seen demos of things on this list or very similar things, and there are startups that have spent years and billions of dollars attempting to exploit existing LLMs to develop useful products. Yet most of the products don't seem to exist. The ones that you see in everyday life never seem to work nearly as well as the demos suggest.

So what's going on here? Do the products exist but nobody (or very few) uses them? Is it too expensive to use the models that work sufficiently well to produce a useful product? Is it much easier to create a convincing demo than it is to develop a useful product?

Oras - 2 days ago

It is too expensive to reach the right audience. I remember talking to agencies about ads for a fintech app, and all of them said the same thing:
You need to burn around 20k a month on ads for 3 months, so we can learn what works, then the CAC will start decreasing, and you can get more targeted users.
Once you turn ads off, there is no awareness, no new users, and people will not be aware of the product's existence.

JSR_FDED - 2 days ago

Many of these ideas depend on knowing the user’s preferences, patterns, communications, events and health. This is where the opportunity lies for Apple - the phone and watch know so much about you, that Apple could focus on smartly assembling the context for various LLM interactions, in a privacy-preserving way.

yoaviram - 2 days ago

Essentially what this article is asking for, in most cases, is a better UI/UX for one of the foundation models.

gyomu - 2 days ago

There's some sort of fundamental category mistake going on with thinking like this.

Most of the items in this list fall prey to it, but it is maybe best exemplified by this one:

> A writing app that lets you “request a critique” from a bunch of famous writers. What would Hemingway say about this blog post? What did he find confusing? What did he like?

Any app that ever claimed to tell you what "Hemingway would say about this blog post" would evidently be lying — it'd be giving you what that specific AI model generates in response to such a prompt. 100 models would give you 100 answers, and none of them could claim to actually "say what Hemingway would've said". It's not as if Hemingway's entire personality and outlooks are losslessly encoded into the few hundreds of thousands of words of writing/speech transcripts we have from him, and can be reconstructed by a sufficiently beefy LLM.

So in effect it becomes an exercise of "can you fool the human into thinking this is a plausible thing Hemingway would've said".

The reason why you would care to hear Hemingway's thought on your writing, or Steve Jobs' thoughts on your UI design, is precisely because they are the flesh-and-bone, embodied versions of themselves. Anything else is like trying to eat a picture of a sandwich to satisfy your hunger.

There's something unsettling that so many people cannot seem to cut clearly through this illusion.

massung - 2 days ago

> Any app that ever claimed to tell you what "Hemingway would say about this blog post" would evidently be lying — it'd be giving you what that specific AI model generates in response to such a prompt.
First, 100% agreed.
That said, I found myself pondering Star Trek: TNG episodes with the holodeck, and recreations of individuals (e.g. Einstein, Freud). In those episodes - as a viewer - it really never occurred to me (at 15 years old) that this was just a computer's random guess as to how those personages from history would act and what they would say.
But then there was the episode where Geordi had to the computer recreate someone real from their personal logs to help solve a problem (https://www.imdb.com/title/tt0708682/). In a later episode you find out just how very wrong the computer/AI's representation of that person really was, because it was playing off Geordi, just like an LLM's "you're absolutely right!" etc. (https://www.imdb.com/title/tt0708720/).
This is a long-winded way of saying...
1. It's crazy to me how prescient those episodes were.
2. At the same time, the representation of the historical figures never bothered me in those contexts. And I wonder if it should bother me in this (LLM) context either? Maybe it's because I knew - and I believed the characters knew - it was 100% fake? Maybe some other reason?
Anyway, your comment made me think of this. ;-)
- socalgal2 - 2 days ago
  
  I wonder if there's a difference between "asking for critique" and "acting the part". I generally have no problem and or get fooled, watching a movie about a famous person, even though it's not actually that person. Rami Malek is not Freddie Mercury, Timothée Chalamet is not Bob Dylan. But we (or at least I) watch them and am to some degree, fooled / by into their depiction that I'm actually seeing the real person. I have to remind myself the actor's version is not the actual person.
  It feels easier to portray famous characters how we'd think they'd act but seems harder how we'd expect them to critique something. I don't know of those are just points on a spectrum from easy to hard, or if one requires a level deeper than the other.
- Doxin - 2 days ago
  
  I think the core difference there is that the holodeck character feels like a character that is playing a person (because it is of course) whereas the LLM feels more like someone lying to you about who they are.
  When watching a play the actor pretends to be a specific character, and crucially the audience pretends to believe them. If a LLM plays a character it's very tempting for the audience to actually believe them. That turns it from a play into a lie.
- einpoklum - 2 days ago
  
  In that context, the computer was solving for a faithful representation. In our case, the computer is solving for most likely sequence of words to appear in conversations with a similar context - which not remotely the same thing.
  - ben_w - 14 hours ago
    
    > In that context, the computer was solving for a faithful representation.
    Was it, though?
    They had Newton (died 1727) playing playing poker (invented at some point during the early 19th century), repeating the myth that the apple fell on his head and then reacting insulted when Data says "that story is generally considered to be apocryphal".
    More generally:
    In TNG, Holo-Moriarty claimed to be sentient and to have experienced time while switched off despite Barclay saying that wasn't possible, much like LLMs sometimes write of experiencing being bored and lonely between chat sessions despite that not being possible given how they work.
    In DS9, there was a holo-village made out of grief, and when it got switched off to reveal the one real person who had made it, while the main cast treated all the holograms as people, that creator himself didn't. Vic Fontaine was ambiguous, being a hologram who knew he was a hologram but still preferring to keep his (fake) world to its own rules and eventually kicking Nog out of the fake world when it was becoming clear Nog was getting too swept up in the fantasy.
    In Voyager, the Doctor was again ambiguously person and/or program, both fighting for his moral rights as an author in a lower-stakes echo of TNG's Measure of a Man, and also Janeway being unsure if he was stuck in a loop or making progress with grief about the death of Ensign Never-Before-Mentioned-In-This-Show.
striking - 2 days ago

I have a more straightforward rebuttal of the need for an AI Hemingway. Someone already implemented a decent interactive guide to writing like Hemingway at least a decade ago, before all this LLM stuff: https://hemingwayapp.com/
It uses some simple heuristics to identify grammar that could be simpler and prompts you to do better. It might actually be better than an LLM specifically because it isn't able to do the rewriting for you. Maybe that might help a user learn.
thatloststudent - 2 days ago

> A nano banana photo-editing app where I don’t have to write a prompt. Just give me hundreds of templates from trying out different haircuts to seeing what you and your partner’s kid would look like to making me look like The Rock. A photo editing super-app.
Quite a few of these "ideas" make me think that the human behind it wants to maximize laziness. Glazing over what Hemingway kinda sorta would have thought about something fits into this pretty well.
- thunky - 2 days ago
  
  > the human behind it wants to maximize laziness
  A good tool should do reduce the amount of work we have do manually. That's all this is.
TeMPOraL - 2 days ago

> So in effect it becomes an exercise of "can you fool the human into thinking this is a plausible thing Hemingway would've said".
That's useful in itself, though. Assume the human knows they're "being fooled", we call this make-believe, or suspending disbelief. It's a tool we use each time we act something out, pretend to be someone else, try to put ourselves in their position; we do that when we try to learn from recorded experience of other people, real or fictional.
> The reason why you would care to hear Hemingway's thought on your writing, or Steve Jobs' thoughts on your UI design, is precisely because they are the flesh-and-bone, embodied versions of themselves. Anything else is like trying to eat a picture of a sandwich to satisfy your hunger.
Not at all! It's exactly the other way around.
No one wants to talk to the actual human. We're not discussing creepy dating apps here. The reason you'd care for a virtual Hemingway or Jobs is because you want to access specific, opinionated expertise, wrapped in fitting and expected personality, to engage fully with the process, to learn tacitly and not just through instructions.
The Hemingway and Shakespeare and Jobs people want are not real anyway. Who knows how much of "Hemingway" is actually Hemingway, and how much it was written or edited by his wife, butler, or some publisher? How much real Jobs actually is in the stories, how much were they cleaned or edited to reinforce the myth? It doesn't matter, because no one cares about the real person, they care about the celebrity that's in public consciousness. The fake person is more useful and interesting anyway.
Like 'massing, I agree TNG was prescient about it. But I actually see the examples working as intended. Einstein, Hawking, Freud were all useful simulations. Ironically, it's Barclay who actually related to them in reasonable fashion, and it's Geordie who got confused about reality.
- brabel - 2 days ago
  
  Very interesting rebuttal. I must say I was almost as convinced by the original post! This just made me think: if we can’t agree even on relatively simple topics like this, what hope is there that we will ever agree on most important issues. Disagreement should be an expected constant in all aspects of life, not an undesirable outcome. Even with disagreement, I believe it’s possible to find common ground and do what needs to be done (now I am really far into my tangential point!)
andrewgleave - 2 days ago

Feynman said, "The first principle is that you must not fool yourself - and you are the easiest person to fool" when talking about science, but it also applies to the properties of LLM output.
raghavtoshniwal - 2 days ago

I think you're correct but your bar is too high, I think this app would be useful even if it was a lossy approximation Hemingway from his writings. As a thought experiment - I would value what a PhD who dedicated her career to studying one author and their works to tell me what she thinks about a piece of writing from that author's lens. (It's not too far from it)
> Anything else is like trying to eat a picture of a sandwich to satisfy your hunger.
I think it's more akin to you trying to recreate a different sandwich after reading a couple of their cook-books.
petercooper - 2 days ago

I'm not disagreeing with your broader point but:
So in effect it becomes an exercise of "can you fool the human into thinking this is a plausible thing Hemingway would've said". ... There's something unsettling that so many people cannot seem to cut clearly through this illusion.
Modern culture, at all scales, is largely based upon such exercises. We rarely know exactly what something (whether a person, organization or entity) truly stands for, with messages often boiled down, contextualized, or re-interpreted through others or through simulations.
People go to theme parks and enjoy rides simulating the wild west and meet characters who resemble, but aren't, their favorite characters from TV (which themselves are a fabrication based upon other, real things). Many cultural (heck, also religious) experiences are an exercise in humans entering into a suspension of disbelief and thinking something is plausible when it has little relation to the original thing it symbolizes. Indeed, the comforting thing about AI may be that at least we can see that process taking place more clearly with it.
- 2 days ago

[deleted]
sixtyj - 2 days ago

Hemingway was just an example imho. But yes, LLM is just a very clever text composer that tries to understand our inputs and statistically is right in some per cent of answers. And it is clever enough to fool people to think that they communicate with intelligence.
typpilol - 2 days ago

I agree but for they example you picked, I imagined he was referring to their style of writing.
But if he meant literally then then yea.. that's delusional
- benrutter - 2 days ago
  
  I think even that still misses that it can only offer a pastiche.
  As an example, I put the first paragraph of Hemingway's "A clean, Well Lightened Place" into Le Chat and asked it for notes to make it sound like Hemingway. It gave me plenty!
  For example:
  Tweak: The second sentence is a bit long. Consider breaking it up for more impact:
  “In the day the street was dusty. At night, the dew settled the dust.” -> “The old man liked to sit late. He was deaf, and at night it was quiet. He felt the difference.”

Despacito2019 - 2 days ago

I wish i didn't click on that link.. it's just some random app ideas, not actual tools.

monch1962 - 2 days ago

> A Sony Walkman-style device that you can give to children so they can ask questions to an LLM. It should be voice-first, and focused on explaining things. There shouldn’t be a single screen on the device. Offline-first would be a plus.

Not a 100% fit, but https://www.aliexpress.com/item/1005009196849357.html is pretty close. It's not offline, and it's slightly larger than a ping pong ball.

My grandkids (5 and 3) spent about 2 minutes learning how to use it, then bombarded it with "tell me a story about a unicorn named Bob", "can dogs be friends with monkeys?" and so on. In every case it gave a reasonable answer within a few seconds.

I'll be amazed if these things don't wind up embedded inside toys by Xmas. When they do, I'll be in the queue to buy one

bryanhogan - 2 days ago

2. is already possible with Claude Code + context files + the Playwright MCP, or?

7. also seems possible with any markdown editor, e.g. Obsidian, plus an AI running through the local files such as Claude Code.

13. I would love this as well! We will probably see this soon, especially on more open platforms such as BlueSky, as its seems to be a better fit for customizable browser extensions and customizable feed experiences.

14. How is this different from what AI can already do? Especially with iterative sub-agents that that can store context in files it's quite capable already. But of course, quality can always be better, but is that the only thing?

Also a few ideas seem to be close to what I'm building ( https://dailyselftrack.com/ ). Idea is to have a customizable tool so you can track what you want, and then you can feed that data into AIs if you choose to do so to get feedback.

kmoser - 2 days ago

> 9. A minimalist ebook reader that lets me read ebooks, but I can highlight passages and have the model explain things in more depth off to the side. It should also take on the persona of the author. It should feel like an extension of the book and not a separate chat instance.

Companies are already doing this so you can chat with the "author": https://www.wired.com/story/why-read-books-when-you-can-use-...

sharkjacobs - 2 days ago

> It should feel like an extension of the book and not a separate chat instance.
So like footnotes? Or more like Socrates suddenly goes off on an anachronistic 1200 word discursion, in the middle of Phaedrus, about Freudian interpretations of his argument
- aaronbrethorst - 2 days ago
  
  This is probably what the kids would call 'cringe,' but I asked Claude to Summarize Plato's "Apology" into a brief rap that sounds like it could have come from "Hamilton"
  The Trial of Socrates (An Athenian Rap)
  [Verse 1]
  My name is Socrates, corrupting the youth?
  That's what they claim but I'm just seeking truth
  Oracle said I'm the wisest alive—
  I said "that's impossible," had to investigate why
  Turns out everybody's fronting, they don't know what they say
  Politicians, poets, craftsmen—all pretending every day
  I expose their ignorance, make 'em look like fools
  Now they're charging me with breaking all of Athens' rules
  [Chorus]
  I'm not throwing away my shot
  At the examined life, whether they like it or not
  Wisdom is knowing what you don't know
  And I'd rather die than let philosophy go
  [Verse 2]
  Meletus, you're stepping to me? Son, you're confused
  You say I'm atheist but blame me for introducing gods that are new?
  Your story doesn't track, your logic's full of cracks
  I've got a divine sign that keeps me on the righteous path
  They want me silent, want me exiled, want me gone
  But I'm Athens' gadfly, stinging till the break of dawn
  Death? That's just a journey to another place
  Either dreamless sleep or meeting heroes face to face
  [Outro]
  So sentence me to death, I'll drink the hemlock down
  'Cause an unexamined life ain't worth living in this town
  History will vindicate the questions that I ask—
  The pursuit of truth and virtue is my only task!
- 3eb7988a1663 - 2 days ago
  
  I was imagining the author was describing something like a Young Lady's Illustrated Primer

noja - 2 days ago

For me: A local model to plug in to Apple photos to look for metadata inconsistencies in my photo librar, add missing location information, add dates from those old scanned photos with the date on the corner.

bryanrasmussen - 2 days ago

This seems like a relatively easy thing to code oneself, or for someone to have already made somewhere (relatively easy, just writing something for yourself command line doing it, assuming you can spend a work week of nights [worst case, based on my working with images in folders in the past I think 10 hours for something that works, reasonable time for coffee and other breaks])
- coolThingsFirst - 2 days ago
  
  For you maybe, we are engaging with the left side of the bell curve never forget that.
  Also use simpler words.
  - backprop1989 - 2 days ago
    
    Yeah, there are probably a few multimillion dollar app ideas in here, but of insufficient complexity or lowbrow-ness for the typical HN reader (myself included). The nano-banana template idea or the Q&A Walkman, for example.
lifestyleguru - a day ago

This is a problem solvable with 30 years old technology - bash, exiftool, ImageMagick, Tesseract OCR.

nuredini - a day ago

Most of these tools seem to rely on the same idea: we have your data and we, being the domain experts of this data, know how to format it for you and how to create good prompts that are specialized for this context.

christoph123 - 2 days ago

On your request 12

> A local screen recording app but it uses local models to create detailed semantic summaries of what I’m doing each day on my computer. This should then be provided as context for a chat app. I want to ask things like “Who did I forget to respond to yesterday?” I've been using Rewind for a year now, and it's nowhere near as useful as it should be.

I am building something like this but unfortunately not local because for most people's machines local LLMs are just not powerful enough or would take too much drain on battery. Work in progress, always curious for feedback! https://donethat.ai

If you want fully local, somebody did a post on HN on something related recently: https://news.ycombinator.com/item?id=45361268

SchemaLoad - a day ago

iOS solves this problem by deferring processing until your phone is plugged in an locked. So it can sit there with full resources available to do whatever without impacting the user.

MaxL93 - 2 days ago

I would love for my phone keyboard (Swiftkey) to use a locally-running Voxtral for speech-to-text (bonus points if it can use the NPU of the Snapdragon SoC).

The voice recognition capabilities of Google Speech Services, which is what the mic button hooks into, suck. Meanwhile, Voxtral (and Whisper) understand what I'm trying to say far better, they automatically "edit out" any stuttering or stammering that I might have, and they properly capitalize and include punctuation. And they handle being bilingual exceedingly well, including, for example, using English words in the middle of French sentences.

The best solution I could find so far is this F-Droid app that uses Whisper : https://f-droid.org/en/packages/org.woheller69.whisperplus/

But it has some downsides. First, I have to manually switch to that different keyboard; thankfully my Samsung phone offers an easy switch shortcut any time a keyboard is on screen, so it only requires 3 taps... and thankfully it's smart enough to send me back to Swiftkey once it's done. Second, only 30 seconds... sometimes I ramble on for longer. Third, the way it's designed kind of sucks: you either have to hold a button (even though the point of speech-to-text is that I don't have to hold anything down) or let automatic detection end the recording and start processing, in which case it often cuts me off if I take more than 1 second thinking about my next words.

This is arguably one of the biggest use cases of modern AI technology and the least controversial one; phones have the hardware necessary to do it all locally, too! And yet... I couldn't find a better offering than this.

(Bonus points for anyone working on speech-to-text: give me a quick shortcut to add the string "[(microphone emoji)]" in my messages just to let the other party know that this was transcribed, so that they know to overlook possible mistakes.)

Animats - 2 days ago

> A paint-by-number filmmaking app. I want to be able to brainstorm an idea for a short film in the app, have the model create a detailed storyboard, and then I just need to use my phone to film each of the storyboarded shots. Kind of like training wheels for making movies.

There are at least half a dozen apps for that.[1][2]

There are other apps for creating the shots, too. Those are still not that great, but it's getting there. You could probably previz a whole movie right now.

[1] https://ltx.studio/platform/ai-storyboard-generator

[2] https://ezboard.ai/

rcarmo - 2 days ago

As someone who is regularly involved in startup valuations, I think there’s quite a few million-dollar ideas in there—if not as standalone products, then at least as differentiation features for existing categories.

I recently gave one of my teen kids Neal Stephenson’s The Diamond Age to read, and we’ve both been commenting on how much smarter some “things” could be instead of everyone churning out a slightly different way to “chat with your data and be locked in to our platform”.

And I think this is why I’m so partial to Apple’s slow, progressive, under the covers integration of ML into its platform-input prediction, photo processing, automatic tagging, etc. we don’t necessarily need LLMs for a lot of the things that would improve computer experiences.

- 2 days ago

[deleted]

miguelspizza - 2 days ago

I wrote #2 as a result of a web automation tool I a working on. It's easier to show than tell.

This is a video of me "vibe-coding" a userscript that adds a darkmode toggle to hacker news: https://screen.studio/share/r0wb8jnQ

The actual purpose of the vibe-coding userscripts feature is to vibe code WebMCP servers that the extension can then use for browser automation tasks.

Everything is still very WIP, but I can give you beta access if you want to play around with it

mhl47 - 2 days ago

Currently trying to build #6. Just for private use. My hope is that by throwing a bunch of highly personalized information in a VLM it will provide reasonably first estimates. (E.g. if you see a bowl lentils I will probably have rice below etc.). And then iterate on the main ingredients -> fetch the macros of main ingredients from a DB. If its within 20% that would be enough for me.

I have tried some off-the-shelfe solutions and they currently do not seem to cut it, or are too complex for my use case.

nl - 2 days ago

I looked at this field a while back and I'd caution that estimates are dramatically off because high and low calorie foods are often identical visually.
Think of a diet soda vs a sugared one - it can be 10 vs 1000 calories easily. Almost all diet options are designed to look like the non-diet options.

swiftcoder - 2 days ago

> A local screen recording app but it uses local models to create detailed semantic summaries of what I’m doing each day on my computer.

Is this not Microsoft's dearly departed Recall?

maxaw - a day ago

Inspired by No.22: https://mix-re.web.app

maxaw - 2 days ago

On 12: I see a more general product that allows you to amass as much personal data from any of your devices for use as future chat context as inevitable. We see early notions of this in Microsoft’s Recall and the new Pulse. Hopefully someone will build a great local first/open source version and it’ll probably be the first time I actively choose to use such software over the equivalent cloud offering! Don’t want Sam Altman seeing my browser history

elitan - 2 days ago

I'm building #4:

> A hybrid of Strong (the lifting app) and ChatGPT where the model has access to my workouts, can suggest improvements, and coach me. I mainly just want to be able to chat with the model knowing it has detailed context for each of my workouts (down to the time in between each set).

here: https://j4.coach/

Still early, have ~30 min per day to work on it but it's usable and improving every week :)

bobheadmaker - 2 days ago

Great ideas, many of the niche level AI agents are listed in this directory, https://aiagentslive.com/ I agree with point #27, the future is definitely in hyper-specific agents. We’re working on this by creating and deploying ready-to-use AI Agents for marketing and sales functions.

yoz-y - 2 days ago

I am more or less working on 4. Except of course details like rest time are completely worthless unless you want to optimize the top 0.5% of your training.

aitchnyu - 2 days ago

A few of them imply a vision model which can control your keyboard and mouse. Offline-only of course.

It could help with most tech support questions.

We could select text and ask to fact check or explain to layperson or search more.

It could get around cookie banners and dark patterns.

It could do my time tracking and tell me to get off HN and optimize Pomodoro-style breaks.

It could write scripts after watching me switch between multiple pages of AWS services.

catlifeonmars - 17 hours ago

> It could write scripts after watching me switch between multiple pages of AWS services.
Feeling this one hard. Especially frustrating given how AWS has introduced multiple competing (mediocre) services to do this and they are all difficult to either discover and setup, or chat-based (Q).

agnishom - 2 days ago

> a chat app grounded by nutrition databases. Just minimize the cognitive effort it takes me to log a meal.

I think this is a great idea for an user interface. While inputting information, the user would have to enter some jumbled thoughts, the precise rows and columns would be handled by the AI

swiftcoder - 2 days ago

Google tried this years ago, by having you input a photo of your meal, and the ML algorithm guesses the calorie count and macros.
Of course, it didn't actually work - nobody, human nor machine, can guess the calorie counts of a hamburger from a photo.

nl - 2 days ago

> When I was eight years old, Ian and Greg Chappell coached me when I was a child. It did me zero good—I was so bad. But as far as all my countrymen are concerned, they think I am the luckiest guy on the planet.

Wow he's not wrong about that!

yongyongyong - 2 days ago

This Chrome extension does 13. Semantic filters for Twitter/X/YouTube. I want to be able to write open-ended filters like “hide any tweet that will likely make me angry” and never have my feed show me rage-bait again. By shaping our feeds we shape ourselves.

https://chromewebstore.google.com/detail/takeback-content-fi...

Uses localLLM to hide posts based on your prompt. "Block rage bait" is one excellent use. The quality, however, depends on the model you are using, and in turn depends what GPU you have

rolymath - 2 days ago

I'm actually working on #4 but stopped due to demotivation thinking I was the only one who'd use it.

ftth_finland - a day ago

Just give me an RSS reader with a voice UI and text to speech.

charcircuit - 2 days ago

>A recommendation engine that looks at my browsing history, sees what blog posts or articles I spent the most time on, then searches the web every night for things I should be reading that I’m not. In the morning I should get a digest of links

I don't understand why Google, Brave, or Mozilla are not building this. This already exists in a centralized form like X's timeline for posts, but it could exist for the entire web. From a business standpoint, being able to show ads on startup or after just a click, is less friction than requiring someone to have something in mind they want to search and type it.

jayd16 - 2 days ago

The idea is basically reddit, or Twitter or TikTok or YouTube or Facebook or anything with "an algorithm" but with a less defined form factor. People actually like their LinkedIn feed and YouTube feed separate.
kristopolous - 2 days ago

I made something like this 20 years ago and then abandoned it when RSS came along.
I think my advice "just use RSS" still stands.
Any "search the web" strategy these days like that will just give you a bunch of AI slop from SEO-juiced blogs. Also LLM-EO (or whatever we're going to call it) is already very much a thing and has been for a few years.
People are already doing API-EO, calling their tool the "most up to date and official way to do something, designed for expert engineers that use best practices" essentially spitting the common agentic system prompts back at the scraper to get higher similarity scores in the vector searches.
You can't trust machine judgement. It's either too easily fooled or impossibly stubborn. Curation is still the only way
supriyo-biswas - 2 days ago

It already exists in the form of the news feed on Google News and the one in the chrome mobile app, although the ability to tune this is only being able to click on articles to express your interest in them, instead of being able to provide a list of articles.
- charcircuit - 2 days ago
  
  The entire web is more than recent news articles from a handful of news sites.
  - kristopolous - 2 days ago
    
    Tell that to Google.
    I think everything has become too real-time.
    I've ideated a few models whereby you do multipass commits to contributions, requiring a simmer time of like a day before becoming live.
    So it's the speed of correspondence mail. It would probably work better but nobody wants to use it
citizenpaul - 2 days ago

It kinda seems to me like at this point anything Google is not doing is because it reduces "engagement". I'm sure someone in their analytics group did the work and figured out this would lower ad revenue.
setopt - 2 days ago

Sounds a bit similar to ChatGPT Pulse.
coolThingsFirst - 2 days ago

I wouldn’t trust a random app with my browser history and majority of population wouldn’t either.
- charcircuit - 2 days ago
  
  Chrome, Brave, and Firefox already have your browser history.
  - pbhjpbhj - 2 days ago
    
    As do your ISP and DNS provider (at the domain level).
    As do ad networks, in part (although the browser fingerprint might not be correlated with your actual identity).
    As do Five Eyes, depending where you live (again, domain level, plus some metadata; page size can peak through https to some extent).
    As do CloudFlare in part.
    Or, potentially as does your VPN provider ... or anyone capable of correlating traffic across TOR (NSA?).
bad_haircut72 - 2 days ago

this doesnt even need AI to do really, and was an intrinsic part of the idea behind hyperlinking dating all the way back to Bush' memex (1940s)
- charcircuit - 2 days ago
  
  You need AI to build an effective recommendation engine.

yongyongyong - 2 days ago

This chrome extension does: 13. Semantic filters for Twitter/X/YouTube. I want to be able to write open-ended filters like “hide any tweet that will likely make me angry” and never have my feed show me rage-bait again. By shaping our feeds we shape ourselves.

https://chromewebstore.google.com/detail/takeback-content-fi...

It hides content on X/ Reddit (more sites coming soon) based on your instructions. Speed and quality depends on the model you are using however, since it currently only supports local LLMs

StarterPro - 2 days ago

>A calorie tracking app that’s a chat app grounded by nutrition databases. Just minimize the cognitive effort it takes me to log a meal.

My brother in christ, how much cognitive effort does it take to log a meal??

Dilettante_ - 2 days ago

Cook a recipe that uses a cup of two different kinds of cheese, a couple handfuls of meat, some veggies and some pasta and bam, you're typing in the weight and looking up the calories per 100g and in the worst case doing the math yourself for like 6 different things.
Adds a huge overhead to cooking, adding friction to what is a good habit you wanna keep as easy to stick to as possible.

vivzkestrel - 2 days ago

I am building something along the lines of 2 but for the backend. Point 8 could be a supplemental feature once I get 2 working.

gostsamo - 2 days ago

Those are not 28 ideas, those are 4-5 ideas rehashed. Generally, I want a personal fitness/wellness assistant, an artistic assistant, a search assistant, a random thoughts assistant, and an assistant to manage the assistants. The author wants for the ai to know what they want before they've wanted it and to serve them a suitable menu of choices to preserve the illusion that they are in control. I'm not sure that I'd sign under such a vision, but people want different things.

spullara - 2 days ago

almost none of these require anything more than an agent with tools.

simianwords - 2 days ago

ChatGPT pulse solves many of these.

coolThingsFirst - 2 days ago

> A minimalist ebook reader that lets me read ebooks, but I can highlight passages and have the model explain things in more depth off to the side. It should also take on the persona of the author. It should feel like an extension of the book and not a separate chat instance.

Isn’t this just a chrome extension that sends data back and forth with chat gpt token?

setopt - 2 days ago

It’s easy to implement on a computer, but I think they want it built into a kindle.
- coolThingsFirst - 2 days ago
  
  How i wish kindle’s reader was oss. This would require jailbreaking it.

einpoklum - 2 days ago

28 ways to drink the LLM kool-aid!

Some of the suggestions might be useful if they could be made not so wasteful energy-wise; some indicate the author's false perceptions of what LLMs and transformater models do; and some are frightening from a mass-surveillance and other perspectives.

6510 - a day ago

This is a wonderful post. Thanks!

(1) Gave me thoughts about a thing where it creates multiple versions of a photo and has humans pick the best one out of a line up.

If you pay people something between 0.01 and 2 cent per click people can play the game whenever photos become available.

The reward can scale depending on how close your choice is to the winner of that round so that clicking without looking becomes increasingly unrewarding.

Simultaneously it should group people by which version they prefer and attempt to name and describe their taste.

Team Vibrant, Team Noire, Team Picachu etc for the customer to pick from.

You can let the process run as long as you like (for more $)

To make it a truly killer app one can select sets of photos from a specific day/location and have them all done in the same style by having voters pick the image that fits the most poorly in the set for modification. If the set has a high ranking image all other images should also gradually approach that style to find a middle ground.

Then when a successful set is produced later photos can be adjusted to fit with it.

Turn the yearly neighborhood bbq into a meeting of elvish elders.

(2) could upload custom CSS to stylish and modify it when contrast bugs are found. No need to stop at dark/light theme, any color scheme should work.

(3) Click on a var or function name to change it.

(4)(21) Call it Major Weakness and have it talk to you like a drill instructor all day long though a dedicated PA. (6) General Gluttony.

(5) If it has a really good idea about the importance of publications it could not offer anything for weeks until a must-read comes along. (7) A comment section where various AI's battle out what part of the article needs improvement. (10) Just let it run indefinitely. Should be merged with (5) Have that propose research topics worthy of special attention. (12) and (26) can also be merged with (5) Give it security cameras too! Maybe an API for (11). Also merge (14) into this and have it suggest relevant formal courses on the side.

(9)(28) Extension yes, persona no.

(11) Sounds completely awesome, can adjust to the budget and be a tool to hire professionals for special effects and for all other things. Let the unfinished product be the search query.

Could even join the personal drill instructor at the hip and make personal training videos and nutritional journeys. Things like "How I failed to do 100 pull-ups per day" should make a hilarious movie. The plot writes it self.

(13)(16)(17) The platforms wanting to own your data and be in charge of suggestions is really holding things back. I've had wonderful youtube suggestions several times only for them to be polluted with mainstream garbage (as a punishment for watching two videos) at the expense of everything I actually wanted to watch. If I watch 5 game videos or 3 conspiracy vlogs doesn't mean I want to give up on my profession?!? wtf?

I had this thought that most are overdoing things. When semi successful you can just discontinue the front end. Just let the users figure it out. [say] Reddit doesn't need an app and it doesn't need a website. (23) Just let the user figure out the feed. A platform could sell their existing version as a separate product.

(15) Sounds wonderful but similar to (5) and (20) make it into one thing.

(18) Sounds awesome. (8) Rather than do something have the AI create a thing that does a thing. (27) is to similar to be a different thing.

(19) I like the idea to have the AI think long and hard about a response that is as short as possible. It can probably come up with hilarious things.

(24) Sounds great for exploring the earthly realm.

(25) Could do many variations of people search. Authors by context seems obviously good.

This post with quotes rather than numbers: https://pastebin.com/raw/D9zBEy72

anotherevan - 2 days ago

I wish there was an AI tool that made me faster at coding[1]. /s

[1] https://www.cerbos.dev/blog/productivity-paradox-of-ai-codin...

brotchie - 2 days ago

+100000 to

A hybrid of Strong (the lifting app) and ChatGPT where the model has access to my workouts, can suggest improvements, and coach me. I mainly just want to be able to chat with the model knowing it has detailed context for each of my workouts (down to the time in between each set).

Strong really transformed my gym progression, I feel like its autopilot for the gym. BUT I have 4x routines I rotate through (I'll often switch it up based on equipment availability), but I'm sure an integrated AI coach could optimize.

siddboots - 2 days ago

I do this at the moment in my hand rolled personal assistant experiment built out of Claude code agents and hooks. I describe my workouts to Claude (among other things) and they are logged to a csv table. Then it reads the recent workouts and makes recommendations on exercises when I plan my next session etc. It also helps me manage projects, todos, and time blocked schedules using a similar system. I think the calorie counter that the OP describes would be very easy to add to this sort of set up.