AI tools I wish existed

sharif.io

147 points by Poleris 2 days ago


VSerge - 2 days ago

On the topic of "24. A Sony Walkman-style device that you can give to children so they can ask questions to an LLM...", I would strongly caution against this:

- short of AGI, what a child will hear are explanations given with authority, which would probably be correct a very high percentage of the time (maybe even close to or above 99%), BUT the few incorrect answers and subtles misconceptions finding their way in there will be catastrophic for the learning journey because they will be believed blindly by the child.

- even if you had a perfect answering LLM who never makes a mistake, what's the end result? No need to talk to others to find out about something, ie reduced opportunities to learn about cooperating with others

- as a parent, one wishes sometimes for a moment of rest, but imagine that your kid just finds out there's another entity to ask questions from that will have ready answers all the time, instead of you saying sometimes that you don't know, and looking for an answer together. How many bonding moments will be lost? How cut off would your kid become from you? What value system would permeate through the answers?

A key assumption here for any parent equipping their child with such a system is that it would be aligned with their own worldview and value system. For parents on HN, this probably means a fairly science-mediated understanding of the world. But you can bet that in other places, this assistant would very convincingly deliver whatever cultural, political, or religious propaganda their environment requires. This would make for frighteningly powerful brainwashing tools.

samcollins - a day ago

Re 19, I made this with an iOS Shortcut a few weeks ago

  > A minimal voice assistant for my Apple Watch. I have lots of questions that are too complicated for Siri but not for ChatGPT. The responses should just be a few words long.
Use Dictate Text action to take voice as input, pass the text to OpenAI API as the user message with this as the system prompt:

“CRITICAL: Your response will only be shown in an iOS push notification or on a watch screen, so answer concisely in <150 characters. Do not use markdown formatting - responses are rendered as plain text. Do use minimalist, stylish yet effective vocabulary and punctuation.

CRITICAL: The user can not respond so do not ask a question back. Answer the prompt in one shot and if necessary, declare assumptions about the users questions so you could answer it in one shot, while making it possible for the user user to repeat ask with more clarity if your assumptions were not right.”

It works well. The biggest annoyance is it takes about 5-20s to return a response, though I love that it’s nearly instantaneous to send my question (don’t need to wait for any apps to open etc)

onion2k - 2 days ago

A recommendation engine that looks at my browsing history, sees what blog posts or articles I spent the most time on, then searches the web every night for things I should be reading that I’m not.

This kind of exists in the form of ChatGPT Pulse. It uses your ChatGPT history rather than your browser history, but that's probably just as good a source for people interested in using it (e.g. people who use ChatGPT enough to want it to recommend things to them.) https://openai.com/index/introducing-chatgpt-pulse/

ares623 - 2 days ago

Not just for this article, but from most ideas/articles around LLMs, I feel like they aren't "thinking with portals" enough. We have "portal gun" tech (or at least, that's what's being marketed), and we're using it as better doors.

lancebeet - 2 days ago

This is really striking, isn't it? We've all certainly seen demos of things on this list or very similar things, and there are startups that have spent years and billions of dollars attempting to exploit existing LLMs to develop useful products. Yet most of the products don't seem to exist. The ones that you see in everyday life never seem to work nearly as well as the demos suggest.

So what's going on here? Do the products exist but nobody (or very few) uses them? Is it too expensive to use the models that work sufficiently well to produce a useful product? Is it much easier to create a convincing demo than it is to develop a useful product?

JSR_FDED - 2 days ago

Many of these ideas depend on knowing the user’s preferences, patterns, communications, events and health. This is where the opportunity lies for Apple - the phone and watch know so much about you, that Apple could focus on smartly assembling the context for various LLM interactions, in a privacy-preserving way.

yoaviram - 2 days ago

Essentially what this article is asking for, in most cases, is a better UI/UX for one of the foundation models.

gyomu - 2 days ago

There's some sort of fundamental category mistake going on with thinking like this.

Most of the items in this list fall prey to it, but it is maybe best exemplified by this one:

> A writing app that lets you “request a critique” from a bunch of famous writers. What would Hemingway say about this blog post? What did he find confusing? What did he like?

Any app that ever claimed to tell you what "Hemingway would say about this blog post" would evidently be lying — it'd be giving you what that specific AI model generates in response to such a prompt. 100 models would give you 100 answers, and none of them could claim to actually "say what Hemingway would've said". It's not as if Hemingway's entire personality and outlooks are losslessly encoded into the few hundreds of thousands of words of writing/speech transcripts we have from him, and can be reconstructed by a sufficiently beefy LLM.

So in effect it becomes an exercise of "can you fool the human into thinking this is a plausible thing Hemingway would've said".

The reason why you would care to hear Hemingway's thought on your writing, or Steve Jobs' thoughts on your UI design, is precisely because they are the flesh-and-bone, embodied versions of themselves. Anything else is like trying to eat a picture of a sandwich to satisfy your hunger.

There's something unsettling that so many people cannot seem to cut clearly through this illusion.

Despacito2019 - 2 days ago

I wish i didn't click on that link.. it's just some random app ideas, not actual tools.

monch1962 - 2 days ago

> A Sony Walkman-style device that you can give to children so they can ask questions to an LLM. It should be voice-first, and focused on explaining things. There shouldn’t be a single screen on the device. Offline-first would be a plus.

Not a 100% fit, but https://www.aliexpress.com/item/1005009196849357.html is pretty close. It's not offline, and it's slightly larger than a ping pong ball.

My grandkids (5 and 3) spent about 2 minutes learning how to use it, then bombarded it with "tell me a story about a unicorn named Bob", "can dogs be friends with monkeys?" and so on. In every case it gave a reasonable answer within a few seconds.

I'll be amazed if these things don't wind up embedded inside toys by Xmas. When they do, I'll be in the queue to buy one

bryanhogan - 2 days ago

2. is already possible with Claude Code + context files + the Playwright MCP, or?

7. also seems possible with any markdown editor, e.g. Obsidian, plus an AI running through the local files such as Claude Code.

13. I would love this as well! We will probably see this soon, especially on more open platforms such as BlueSky, as its seems to be a better fit for customizable browser extensions and customizable feed experiences.

14. How is this different from what AI can already do? Especially with iterative sub-agents that that can store context in files it's quite capable already. But of course, quality can always be better, but is that the only thing?

Also a few ideas seem to be close to what I'm building ( https://dailyselftrack.com/ ). Idea is to have a customizable tool so you can track what you want, and then you can feed that data into AIs if you choose to do so to get feedback.

kmoser - 2 days ago

> 9. A minimalist ebook reader that lets me read ebooks, but I can highlight passages and have the model explain things in more depth off to the side. It should also take on the persona of the author. It should feel like an extension of the book and not a separate chat instance.

Companies are already doing this so you can chat with the "author": https://www.wired.com/story/why-read-books-when-you-can-use-...

noja - 2 days ago

For me: A local model to plug in to Apple photos to look for metadata inconsistencies in my photo librar, add missing location information, add dates from those old scanned photos with the date on the corner.

nuredini - a day ago

Most of these tools seem to rely on the same idea: we have your data and we, being the domain experts of this data, know how to format it for you and how to create good prompts that are specialized for this context.

christoph123 - 2 days ago

On your request 12

> A local screen recording app but it uses local models to create detailed semantic summaries of what I’m doing each day on my computer. This should then be provided as context for a chat app. I want to ask things like “Who did I forget to respond to yesterday?” I've been using Rewind for a year now, and it's nowhere near as useful as it should be.

I am building something like this but unfortunately not local because for most people's machines local LLMs are just not powerful enough or would take too much drain on battery. Work in progress, always curious for feedback! https://donethat.ai

If you want fully local, somebody did a post on HN on something related recently: https://news.ycombinator.com/item?id=45361268

MaxL93 - 2 days ago

I would love for my phone keyboard (Swiftkey) to use a locally-running Voxtral for speech-to-text (bonus points if it can use the NPU of the Snapdragon SoC).

The voice recognition capabilities of Google Speech Services, which is what the mic button hooks into, suck. Meanwhile, Voxtral (and Whisper) understand what I'm trying to say far better, they automatically "edit out" any stuttering or stammering that I might have, and they properly capitalize and include punctuation. And they handle being bilingual exceedingly well, including, for example, using English words in the middle of French sentences.

The best solution I could find so far is this F-Droid app that uses Whisper : https://f-droid.org/en/packages/org.woheller69.whisperplus/

But it has some downsides. First, I have to manually switch to that different keyboard; thankfully my Samsung phone offers an easy switch shortcut any time a keyboard is on screen, so it only requires 3 taps... and thankfully it's smart enough to send me back to Swiftkey once it's done. Second, only 30 seconds... sometimes I ramble on for longer. Third, the way it's designed kind of sucks: you either have to hold a button (even though the point of speech-to-text is that I don't have to hold anything down) or let automatic detection end the recording and start processing, in which case it often cuts me off if I take more than 1 second thinking about my next words.

This is arguably one of the biggest use cases of modern AI technology and the least controversial one; phones have the hardware necessary to do it all locally, too! And yet... I couldn't find a better offering than this.

(Bonus points for anyone working on speech-to-text: give me a quick shortcut to add the string "[(microphone emoji)]" in my messages just to let the other party know that this was transcribed, so that they know to overlook possible mistakes.)

Animats - 2 days ago

> A paint-by-number filmmaking app. I want to be able to brainstorm an idea for a short film in the app, have the model create a detailed storyboard, and then I just need to use my phone to film each of the storyboarded shots. Kind of like training wheels for making movies.

There are at least half a dozen apps for that.[1][2]

There are other apps for creating the shots, too. Those are still not that great, but it's getting there. You could probably previz a whole movie right now.

[1] https://ltx.studio/platform/ai-storyboard-generator

[2] https://ezboard.ai/

rcarmo - 2 days ago

As someone who is regularly involved in startup valuations, I think there’s quite a few million-dollar ideas in there—if not as standalone products, then at least as differentiation features for existing categories.

I recently gave one of my teen kids Neal Stephenson’s The Diamond Age to read, and we’ve both been commenting on how much smarter some “things” could be instead of everyone churning out a slightly different way to “chat with your data and be locked in to our platform”.

And I think this is why I’m so partial to Apple’s slow, progressive, under the covers integration of ML into its platform-input prediction, photo processing, automatic tagging, etc. we don’t necessarily need LLMs for a lot of the things that would improve computer experiences.

- 2 days ago
[deleted]
miguelspizza - 2 days ago

I wrote #2 as a result of a web automation tool I a working on. It's easier to show than tell.

This is a video of me "vibe-coding" a userscript that adds a darkmode toggle to hacker news: https://screen.studio/share/r0wb8jnQ

The actual purpose of the vibe-coding userscripts feature is to vibe code WebMCP servers that the extension can then use for browser automation tasks.

Everything is still very WIP, but I can give you beta access if you want to play around with it

mhl47 - 2 days ago

Currently trying to build #6. Just for private use. My hope is that by throwing a bunch of highly personalized information in a VLM it will provide reasonably first estimates. (E.g. if you see a bowl lentils I will probably have rice below etc.). And then iterate on the main ingredients -> fetch the macros of main ingredients from a DB. If its within 20% that would be enough for me.

I have tried some off-the-shelfe solutions and they currently do not seem to cut it, or are too complex for my use case.

swiftcoder - 2 days ago

> A local screen recording app but it uses local models to create detailed semantic summaries of what I’m doing each day on my computer.

Is this not Microsoft's dearly departed Recall?

maxaw - a day ago

Inspired by No.22: https://mix-re.web.app

maxaw - 2 days ago

On 12: I see a more general product that allows you to amass as much personal data from any of your devices for use as future chat context as inevitable. We see early notions of this in Microsoft’s Recall and the new Pulse. Hopefully someone will build a great local first/open source version and it’ll probably be the first time I actively choose to use such software over the equivalent cloud offering! Don’t want Sam Altman seeing my browser history

elitan - 2 days ago

I'm building #4:

> A hybrid of Strong (the lifting app) and ChatGPT where the model has access to my workouts, can suggest improvements, and coach me. I mainly just want to be able to chat with the model knowing it has detailed context for each of my workouts (down to the time in between each set).

here: https://j4.coach/

Still early, have ~30 min per day to work on it but it's usable and improving every week :)

bobheadmaker - 2 days ago

Great ideas, many of the niche level AI agents are listed in this directory, https://aiagentslive.com/ I agree with point #27, the future is definitely in hyper-specific agents. We’re working on this by creating and deploying ready-to-use AI Agents for marketing and sales functions.

yoz-y - 2 days ago

I am more or less working on 4. Except of course details like rest time are completely worthless unless you want to optimize the top 0.5% of your training.

aitchnyu - 2 days ago

A few of them imply a vision model which can control your keyboard and mouse. Offline-only of course.

It could help with most tech support questions.

We could select text and ask to fact check or explain to layperson or search more.

It could get around cookie banners and dark patterns.

It could do my time tracking and tell me to get off HN and optimize Pomodoro-style breaks.

It could write scripts after watching me switch between multiple pages of AWS services.

agnishom - 2 days ago

> a chat app grounded by nutrition databases. Just minimize the cognitive effort it takes me to log a meal.

I think this is a great idea for an user interface. While inputting information, the user would have to enter some jumbled thoughts, the precise rows and columns would be handled by the AI

nl - 2 days ago

> When I was eight years old, Ian and Greg Chappell coached me when I was a child. It did me zero good—I was so bad. But as far as all my countrymen are concerned, they think I am the luckiest guy on the planet.

Wow he's not wrong about that!

yongyongyong - 2 days ago

This Chrome extension does 13. Semantic filters for Twitter/X/YouTube. I want to be able to write open-ended filters like “hide any tweet that will likely make me angry” and never have my feed show me rage-bait again. By shaping our feeds we shape ourselves.

https://chromewebstore.google.com/detail/takeback-content-fi...

Uses localLLM to hide posts based on your prompt. "Block rage bait" is one excellent use. The quality, however, depends on the model you are using, and in turn depends what GPU you have

rolymath - 2 days ago

I'm actually working on #4 but stopped due to demotivation thinking I was the only one who'd use it.

ftth_finland - a day ago

Just give me an RSS reader with a voice UI and text to speech.

charcircuit - 2 days ago

>A recommendation engine that looks at my browsing history, sees what blog posts or articles I spent the most time on, then searches the web every night for things I should be reading that I’m not. In the morning I should get a digest of links

I don't understand why Google, Brave, or Mozilla are not building this. This already exists in a centralized form like X's timeline for posts, but it could exist for the entire web. From a business standpoint, being able to show ads on startup or after just a click, is less friction than requiring someone to have something in mind they want to search and type it.

yongyongyong - 2 days ago

This chrome extension does: 13. Semantic filters for Twitter/X/YouTube. I want to be able to write open-ended filters like “hide any tweet that will likely make me angry” and never have my feed show me rage-bait again. By shaping our feeds we shape ourselves.

https://chromewebstore.google.com/detail/takeback-content-fi...

It hides content on X/ Reddit (more sites coming soon) based on your instructions. Speed and quality depends on the model you are using however, since it currently only supports local LLMs

StarterPro - 2 days ago

>A calorie tracking app that’s a chat app grounded by nutrition databases. Just minimize the cognitive effort it takes me to log a meal.

My brother in christ, how much cognitive effort does it take to log a meal??

vivzkestrel - 2 days ago

I am building something along the lines of 2 but for the backend. Point 8 could be a supplemental feature once I get 2 working.

gostsamo - 2 days ago

Those are not 28 ideas, those are 4-5 ideas rehashed. Generally, I want a personal fitness/wellness assistant, an artistic assistant, a search assistant, a random thoughts assistant, and an assistant to manage the assistants. The author wants for the ai to know what they want before they've wanted it and to serve them a suitable menu of choices to preserve the illusion that they are in control. I'm not sure that I'd sign under such a vision, but people want different things.

spullara - 2 days ago

almost none of these require anything more than an agent with tools.

simianwords - 2 days ago

ChatGPT pulse solves many of these.

coolThingsFirst - 2 days ago

> A minimalist ebook reader that lets me read ebooks, but I can highlight passages and have the model explain things in more depth off to the side. It should also take on the persona of the author. It should feel like an extension of the book and not a separate chat instance.

Isn’t this just a chrome extension that sends data back and forth with chat gpt token?

einpoklum - 2 days ago

28 ways to drink the LLM kool-aid!

Some of the suggestions might be useful if they could be made not so wasteful energy-wise; some indicate the author's false perceptions of what LLMs and transformater models do; and some are frightening from a mass-surveillance and other perspectives.

6510 - a day ago

This is a wonderful post. Thanks!

(1) Gave me thoughts about a thing where it creates multiple versions of a photo and has humans pick the best one out of a line up.

If you pay people something between 0.01 and 2 cent per click people can play the game whenever photos become available.

The reward can scale depending on how close your choice is to the winner of that round so that clicking without looking becomes increasingly unrewarding.

Simultaneously it should group people by which version they prefer and attempt to name and describe their taste.

Team Vibrant, Team Noire, Team Picachu etc for the customer to pick from.

You can let the process run as long as you like (for more $)

To make it a truly killer app one can select sets of photos from a specific day/location and have them all done in the same style by having voters pick the image that fits the most poorly in the set for modification. If the set has a high ranking image all other images should also gradually approach that style to find a middle ground.

Then when a successful set is produced later photos can be adjusted to fit with it.

Turn the yearly neighborhood bbq into a meeting of elvish elders.

(2) could upload custom CSS to stylish and modify it when contrast bugs are found. No need to stop at dark/light theme, any color scheme should work.

(3) Click on a var or function name to change it.

(4)(21) Call it Major Weakness and have it talk to you like a drill instructor all day long though a dedicated PA. (6) General Gluttony.

(5) If it has a really good idea about the importance of publications it could not offer anything for weeks until a must-read comes along. (7) A comment section where various AI's battle out what part of the article needs improvement. (10) Just let it run indefinitely. Should be merged with (5) Have that propose research topics worthy of special attention. (12) and (26) can also be merged with (5) Give it security cameras too! Maybe an API for (11). Also merge (14) into this and have it suggest relevant formal courses on the side.

(9)(28) Extension yes, persona no.

(11) Sounds completely awesome, can adjust to the budget and be a tool to hire professionals for special effects and for all other things. Let the unfinished product be the search query.

Could even join the personal drill instructor at the hip and make personal training videos and nutritional journeys. Things like "How I failed to do 100 pull-ups per day" should make a hilarious movie. The plot writes it self.

(13)(16)(17) The platforms wanting to own your data and be in charge of suggestions is really holding things back. I've had wonderful youtube suggestions several times only for them to be polluted with mainstream garbage (as a punishment for watching two videos) at the expense of everything I actually wanted to watch. If I watch 5 game videos or 3 conspiracy vlogs doesn't mean I want to give up on my profession?!? wtf?

I had this thought that most are overdoing things. When semi successful you can just discontinue the front end. Just let the users figure it out. [say] Reddit doesn't need an app and it doesn't need a website. (23) Just let the user figure out the feed. A platform could sell their existing version as a separate product.

(15) Sounds wonderful but similar to (5) and (20) make it into one thing.

(18) Sounds awesome. (8) Rather than do something have the AI create a thing that does a thing. (27) is to similar to be a different thing.

(19) I like the idea to have the AI think long and hard about a response that is as short as possible. It can probably come up with hilarious things.

(24) Sounds great for exploring the earthly realm.

(25) Could do many variations of people search. Authors by context seems obviously good.

This post with quotes rather than numbers: https://pastebin.com/raw/D9zBEy72

anotherevan - 2 days ago

I wish there was an AI tool that made me faster at coding[1]. /s

[1] https://www.cerbos.dev/blog/productivity-paradox-of-ai-codin...

brotchie - 2 days ago

+100000 to

A hybrid of Strong (the lifting app) and ChatGPT where the model has access to my workouts, can suggest improvements, and coach me. I mainly just want to be able to chat with the model knowing it has detailed context for each of my workouts (down to the time in between each set).

Strong really transformed my gym progression, I feel like its autopilot for the gym. BUT I have 4x routines I rotate through (I'll often switch it up based on equipment availability), but I'm sure an integrated AI coach could optimize.