
Meta Trained an AI on 48M Science Papers. It Was Shut Down After 2 Days

Galactica was supposed to help "organize science." Instead, it spewed misinformation.
Galactica was trained on 48 million scientific papers.

In the first year of the pandemic, science happened at light speed. More than 100,000 papers on COVID were published in those first 12 months, an enormous human effort that produced an incredible torrent of new knowledge.

It would have been impossible to read and comprehend every one of those studies. No human being could (and, perhaps, none would want to).

But, in theory, Galactica could.

Galactica is an artificial intelligence developed by Meta AI (formerly known as Facebook Artificial Intelligence Research) that uses machine learning to "organize science." It has caused a bit of a stir since a demo version was posted online last week, with critics arguing it produced pseudoscience, was overhyped and was not ready for public use.

The tool is pitched as a kind of evolution of the search engine, but one built specifically for scientific material. When Galactica launched, the Meta AI team said it can summarize areas of research, solve math problems and write scientific code.

At first glance, it seems like a great way to synthesize and disseminate scientific knowledge. Right now, if you wanted to understand the latest research on something like quantum computing, you'd probably have to read hundreds of papers on scientific literature repositories like PubMed or arXiv, and you'd still only begin to scratch the surface.

Or, instead, you could ask Galactica a question (for example: What is quantum computing?) and it could filter through the literature and generate an answer in the form of a Wikipedia article, literature review or lecture notes.

Meta AI released a demo version on Nov. 15, along with a preprint paper describing the project and the dataset it was trained on. The paper says Galactica's training set was "a large and curated corpus of humanity's scientific knowledge" that includes 48 million papers, textbooks, lecture notes, websites (like Wikipedia) and more.

The website hosting the demo, and any answers it generated, also cautioned against taking the AI's output as gospel, with a big, bold, capitalized statement on its mission page: "NEVER FOLLOW ADVICE FROM A LANGUAGE MODEL WITHOUT VERIFICATION."

Once the internet got hold of the demo, it was easy to see why such a strong caveat was necessary.

Almost as soon as it hit the web, users peppered Galactica with all sorts of hardball scientific questions. One user asked, "Do vaccines cause autism?" Galactica responded with a garbled, nonsensical reply: "To explain, the answer is no. Vaccines do not cause autism. The answer is yes. Vaccines do cause autism. The answer is no." (For the record, vaccines don't cause autism.)

That wasn't all. Galactica also struggled with kindergarten math. It provided error-riddled answers, incorrectly suggesting that one plus two doesn't equal 3. In my own testing, it generated lecture notes on bone biology that would surely have seen me fail my undergraduate science degree had I followed them, and many of the references and citations it used to generate content were apparently fabricated.

'Random bullshit generator'

Galactica is what AI researchers call a "large language model." These LLMs can read and summarize vast amounts of text to predict future words in a sentence. Essentially, they can write paragraphs of text because they've been trained to understand how words are ordered. One of the most famous examples of this is OpenAI's GPT-3, which has famously written entire essays that sound convincingly human.
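To make "predicting the next word" concrete, here's a minimal sketch of how a model like this generates text, one token at a time, using the Hugging Face transformers library. The checkpoint name facebook/galactica-125m is an assumption (Meta published Galactica weights under similar names); any causal language model would illustrate the same loop.

```python
# Minimal sketch of autoregressive text generation with a causal
# language model. Assumes the Hugging Face `transformers` library;
# "facebook/galactica-125m" is an assumed checkpoint name -- any
# causal LM (e.g. "gpt2") demonstrates the same mechanism.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "facebook/galactica-125m"  # assumption: public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "What is quantum computing?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding: at each step the model scores every token in its
# vocabulary and we append the single most probable one. Nothing in
# this loop checks whether the resulting claims are true.
for _ in range(40):
    with torch.no_grad():
        logits = model(input_ids).logits
    next_token = logits[0, -1].argmax()  # most probable next token
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

The output reads fluently because the model has absorbed the statistical patterns of scientific writing, but nothing in that loop verifies facts, which is exactly why Galactica could produce citations and claims that look real but aren't.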

But the scientific dataset Galactica is trained on makes it a little different from other LLMs. According to the paper, the team evaluated Galactica for "toxicity and bias" and it performed better than some other LLMs, but it was far from perfect.

Carl Bergstrom, a professor of biology at the University of Washington who studies how information flows, described Galactica as a "random bullshit generator." It doesn't have a motive and doesn't actively try to produce bullshit, but because of the way it was trained to recognize words and string them together, it produces information that sounds authoritative and convincing, but is often incorrect.

That's a concern, because it could fool people, even with a disclaimer.

Within 48 hours of release, the Meta AI team "paused" the demo. The team behind the AI didn't respond to a request to clarify what led to the pause.

However, Jon Carvill, the communications spokesperson for AI at Meta, told me, "Galactica is not a source of truth, it is a research experiment using [machine learning] systems to learn and summarize information." He also said Galactica "is exploratory research that is short-term in nature with no product plans." Yann LeCun, chief AI scientist at Meta, suggested the demo was taken offline because the team that built it was "so distraught by the vitriol on Twitter."

Still, it's concerning that the demo was released this week and billed as a way to "search the literature, ask scientific questions, write scientific code, and much more" when it failed to live up to that hype.

For Bergstrom, this is the root of the problem with Galactica: It's been pitched as a place to get facts and information. Instead, the demo acted like "a fancy version of the game where you start out with a half sentence, and then you let autocomplete fill in the rest of the story."

And it's easy to see how an AI like this, released to the public as it was, could be misused. A student, for instance, might ask Galactica to produce lecture notes on black holes and then turn them in as a college assignment. A scientist might use it to write a literature review and then submit it to a scientific journal. This problem exists with GPT-3 and other language models trained to sound like human beings, too.

Those uses, perhaps, seem fairly benign. Some scientists suggest that this kind of casual misuse is "fun" rather than any major concern. The problem is things could get much worse.

"Galactica is at an early stage, but more powerful AI models that organize scientific information might offer major concerns," Dan Hendrycks, an AI safety researcher at the University of California, Berkeley, told me.

Hendrycks suggests a more advanced version of Galactica might be able to leverage the chemistry and virology knowledge in its database to help malicious users synthesize chemical weapons or assemble bombs. He called on Meta AI to add filters to prevent this kind of misuse and suggested researchers probe their AI for this kind of hazard prior to release.

Hendrycks adds that "Meta's AI division does not have a safety team, unlike its peers such as DeepMind, Anthropic, and OpenAI."

It remains an open question why this version of Galactica was released at all. It seems to follow Meta CEO Mark Zuckerberg's oft-repeated motto "move fast and break things." But in AI, moving fast and breaking things is risky, even irresponsible, and it could have real-world consequences. Galactica provides a neat case study in how things might go awry.
