Freelancer has accused Anthropic, the AI startup behind the Claude large language model, of scraping data from its website in disregard of its "no-crawl" robots.txt protocol. Meanwhile, iFixit CEO Kyle Wiens said Anthropic ignored the site's policies prohibiting the use of its content to train AI models. Freelancer CEO Matt Barrie told The Information that Anthropic's ClaudeBot is the "most aggressive scraper we've seen." His website reportedly received 3.5 million hits from the company's crawler within four hours, "probably about five times more than the next-largest AI crawler." Similarly, Wiens posted on X/Twitter that Anthropic's bot hit iFixit's servers a million times in 24 hours. "Not only are you taking our content without paying for it, you are tying up our development resources," he wrote.
In June, Wired accused another AI company, Perplexity, of crawling its website despite the presence of the Robots Exclusion Protocol (robots.txt). A robots.txt file typically contains instructions telling web crawlers which pages they can and cannot access. Compliance is voluntary, but bad bots often ignore it. After the Wired article was published, TollBit, a startup that connects AI companies with content publishers, reported that Perplexity is not the only one circumventing robots.txt signals. While TollBit didn't name the companies involved, Business Insider reported that OpenAI and Anthropic were also ignoring the protocol.
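To illustrate how voluntary compliance works in practice, here is a minimal sketch of how a well-behaved crawler can honor robots.txt using Python's standard-library `urllib.robotparser`. The user-agent name and rules below are made up for illustration; they are not Anthropic's or any real crawler's values, and a real crawler would fetch the live file with `set_url()` and `read()` rather than parsing an in-memory policy.

```python
from urllib import robotparser

# Parse an in-memory robots.txt policy for a hypothetical "ExampleBot".
# A real crawler would instead call rp.set_url("https://example.com/robots.txt")
# followed by rp.read() to fetch the site's actual rules.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ExampleBot",
    "Disallow: /private/",
])

# can_fetch() returns True only when the rules allow this agent on this path.
print(rp.can_fetch("ExampleBot", "https://example.com/guides/"))    # True
print(rp.can_fetch("ExampleBot", "https://example.com/private/x"))  # False
```

Nothing enforces this check: the parser only reports what the file asks for, which is exactly why compliance comes down to the crawler operator's choice.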
Barrie said Freelancer initially tried to refuse the bot's access requests, but eventually had to block Anthropic's crawlers altogether. "This is nasty scraping," he said. "It slows down the site for everyone who interacts with it, which ultimately impacts revenue." As for iFixit, the site sets alarms for high traffic, and Anthropic's activity woke staff up at 3 a.m. The company's crawlers stopped scraping iFixit after it added a line to its robots.txt file specifically banning Anthropic's bots.
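A robots.txt rule of the kind iFixit describes might look like the following. This is a hypothetical reconstruction: the article names Anthropic's crawler as ClaudeBot, but the exact contents of iFixit's file are not quoted in it.

```text
# Hypothetical robots.txt entry banning a specific crawler by user-agent.
# "Disallow: /" blocks the named agent from every path on the site.
User-agent: ClaudeBot
Disallow: /
```

Because the protocol is advisory, a rule like this only works when the crawler's operator chooses to respect it, which is what makes the reported violations notable.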
The AI startup told The Information that it respects robots.txt and that its crawlers "honored iFixit's signals when they implemented it." It also said it aims to be thoughtful about "how quickly [it crawls]" the same domains, and that it is currently investigating the incident.
AI companies use crawlers to collect content from websites and use it to train their generative AI models. As a result, they have been accused of copyright infringement by publishers and have been the target of multiple lawsuits. Companies such as OpenAI are signing deals with publishers and websites to avoid further lawsuits. So far, OpenAI's content partners include News Corp, Vox Media, the Financial Times, Reddit, and others. iFixit's Wiens also seems open to striking deals for articles on the how-to-repair website, saying in a tweet directed at Anthropic that he is willing to discuss licensing the content for commercial use.
If any of these requests had led you to our Terms of Use, you would have been informed that use of our content is expressly prohibited. But don't ask me, ask Claude.
If you would like to discuss licensing any of our content for commercial use, please contact us here.
— Kyle Wiens (@kwiens) July 24, 2024