Last week we shared a number of updates with our community of users, and now we want to share them here: At Mozilla, we work hard to make Firefox the best
AI-generated alt-text running locally is actually a fantastic accessibility feature. It’s reliable, it provides a service, and it can absolutely be deployed securely.
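To make “running locally” concrete: the point is that a small captioning model does the work on the user’s own machine, with no image ever leaving it. Here is a minimal sketch of that idea using an off-the-shelf open model via the Hugging Face transformers library; the model choice and API are illustrative assumptions, not the pipeline Firefox actually ships.

```python
# Sketch of fully local alt-text generation with an off-the-shelf captioning
# model. The model and API here are illustrative assumptions; Firefox's
# built-in feature uses its own (smaller) model and pipeline.
from transformers import pipeline

# Downloads the model once, then runs entirely on the local CPU/GPU.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def generate_alt_text(image_path: str) -> str:
    """Return a short machine-written caption for a local image file."""
    result = captioner(image_path)
    return result[0]["generated_text"]

print(generate_alt_text("photo.jpg"))  # e.g. "a dog sitting on a wooden bench"
```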
It’s fine to be critical of technology; it’s not fine to become as irrational about it as the tech bros trying to make a buck.
It’s not irrational to be concerned, for a number of reasons. Even if local and secure, AI image processing and LLMs add fairly significant processing cost to a simple task like this: higher requirements for the browser, higher energy use, and therefore higher emissions (noting here that AI has blown Microsoft’s climate mitigation plan out of the water, even with some accounting tricks).
Additionally, you have to think about the long-term changes in behaviour this will generate. A handy tool for when people forget to produce properly accessible documents suddenly becomes the default way of making accessible documents. Consider two situations. In one, the culture encourages and enforces content providers to consider different types of consumer and how they will experience the content; providers know that unless they spend the extra 1% of time making it accessible for all, it will exclude certain people. Compare that to a situation where AI is pitched as an easy way not to think about people’s experiences: the AI will sort it. Those two situations imply very different outcomes: in one there is care and thought about difference and diversity, and in the other there isn’t; disabled people are an afterthought. The two scenarios also have massively different energy and emissions requirements, because one makes every reader run AI to get some alt text rather than generating it once at the source.
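A crude back-of-envelope way to see that asymmetry; every number below is an illustrative assumption, not a measurement.

```python
# Rough comparison: alt text written or generated once at the source versus
# re-generated by every single reader's browser. Figures are assumptions
# chosen only to show how the two approaches scale, not measurements.
readers_per_image = 10_000        # assumed audience for one widely shared image
wh_per_local_inference = 0.05     # assumed ~30 W for a few seconds of inference

generated_at_source = 1 * wh_per_local_inference
generated_by_each_reader = readers_per_image * wh_per_local_inference

print(f"once, at the source:        {generated_at_source:.2f} Wh")
print(f"by every reader's browser:  {generated_by_each_reader:,.0f} Wh")
# The client-side path grows linearly with the audience; the source path doesn't.
```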
Finally, it’s worth explaining a bit about alt text and how people use it, because it’s not just a text description of an image (which AI could indeed likely produce). Alt text should concisely summarise the salient aspects of the image that the author wants a reader to take away from it, and sometimes that message is slightly different for alt-text users. AI can’t do this, because it should be about the message the content creator wants to send and about making that message accessible. As ever with these tech fixes for accessibility, the lived experience of people with those needs isn’t actually present; it’s an assumed need rather than what they are asking for.
Local and secure image recognition is fairly trivial in terms of power consumption, but hey, there’s likely going to be some option to turn it off, just like hardware acceleration for video and image rendering, which uses the same GPU in similar ways. The power consumption argument is not invalid, but the way people deploy it is baffling to me, and is often based on worst-case estimates that are not realistic by design.
To be clear, Apple is now building CPUs into iPads that can handle these queries in seconds while running at a few tens of watts. Each time I boot up Tekken on my 1000W gaming PC for five minutes, I’m burning more power than my share of AI queries for weeks, if not months.
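The arithmetic behind that comparison, with the per-query figures taken as assumptions from the “tens of watts for a few seconds” claim above:

```python
# Back-of-envelope check of the gaming-PC versus local-AI-query comparison.
# The per-query wattage and duration are assumptions, not measured values.
tekken_session_wh = 1000 * (5 / 60)   # 1000 W PC for five minutes ≈ 83 Wh
local_query_wh = 30 * (5 / 3600)      # assume ~30 W for ~5 s per query ≈ 0.04 Wh

print(f"five minutes of Tekken: {tekken_session_wh:.0f} Wh")
print(f"one local AI query:     {local_query_wh:.3f} Wh")
print(f"queries per session:    {tekken_session_wh / local_query_wh:.0f}")  # ≈ 2,000
```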
On the second point I absolutely disagree. There is no practical advantage to making accessibility annoying to implement. Accessibility should be structural, mandatory and automatic, not a nice thing people do for you. Eff that.
As for the third part, no alt text I’ve seen deployed adds much value beyond a description of the content. What is measurable and factual is that the coverage of alt-text, even in places where it’s disproportionately popular like Mastodon, is spotty at best and residual at worst. There is no question that automated alt-text is better than no alt-text, and most content has no alt-text.
That is only the tip of the iceberg for ML applied to accessibility, too. You could do active queries, you could let users ask for additional context or clarification, you could have much smoother, automated voice reading of text, including visual description on demand… This tech is powerful in many areas, and this is clearly one of them. In fact, this is a much better application than search, by a lot. It’s frustrating that search and factual queries, where this stuff is pretty bad at being reliable, are the things everybody is thinking about.
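As a hedged illustration of the “ask for additional context” idea: the same kind of local model can answer follow-up questions about an image, beyond a one-line caption. The model and question below are off-the-shelf stand-ins, not anything a browser actually ships today.

```python
# Sketch: letting a user interrogate an image locally for extra detail.
# Model choice and example question are illustrative assumptions.
from transformers import pipeline

vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

answer = vqa(image="photo.jpg", question="What color is the dog?")
print(answer[0]["answer"])  # e.g. "brown"
```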