Curated Resource

Can LLMs Generate Novel Research Ideas?

llm, creativity, ideation, innovation, ai, ethan mollick

Curated: 10/09/2024 from arxiv.org/abs/2409.04109?utm_source=pocket_saves

my notes

An arXiv (i.e. not yet peer-reviewed) paper investigating whether LLMs "can take the very first step of producing novel, expert-level ideas". They evaluated research idea generation in a "head-to-head comparison between expert NLP researchers and an LLM ideation agent... recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas... we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility".

Problems include "failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult". What's needed: a study where "researchers execute these ideas into full projects".

The conversation on LinkedIn is illuminating. Mollick claims the paper means "AI generates better academic research ideas that are more novel and exciting (to other researchers!) than experts in the field, with no significant difference in the idea's feasibility", without mentioning that LLMs "lack idea diversity when we scale up idea generation, and they cannot currently serve as reliable evaluators".

"Research idea evaluation...: 1). the idea itself, generated in response to our instructions, 2). the writeup which communicates the idea, and 3). the evaluation of the writeup by experts."

"Our research ideation agent has three essential components: paper retrieval, idea generation, and idea ranking". After grabbing many papers from the Semantic Scholar API, the agent scores them.
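The three components can be sketched as a small pipeline. This is only an illustration of the shape described above, not the paper's implementation: every function name here is hypothetical, and the toy stand-ins replace what would really be a Semantic Scholar API call and LLM prompts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scored:
    text: str
    score: float


def top_k(items: List[str], score: Callable[[str], float], k: int) -> List[Scored]:
    """Score every item and keep the k highest-scoring ones."""
    scored = [Scored(text, score(text)) for text in items]
    return sorted(scored, key=lambda s: s.score, reverse=True)[:k]


def run_agent(topic, retrieve, score_paper, generate, score_idea,
              n_papers=2, n_ideas=2) -> List[Scored]:
    # 1. Paper retrieval: fetch candidates, then keep the best-scoring ones.
    papers = top_k(retrieve(topic), score_paper, n_papers)
    # 2. Idea generation: seed the generator with the retained papers.
    ideas = generate(topic, [p.text for p in papers])
    # 3. Idea ranking: score every idea and return the best.
    return top_k(ideas, score_idea, n_ideas)


# Toy stand-ins (assumptions for illustration only).
def toy_retrieve(topic):
    return [f"{topic}: survey", f"{topic}: benchmark", "unrelated: cooking"]


def toy_score_paper(paper):
    return 1.0 if paper.startswith("prompting") else 0.0


def toy_generate(topic, papers):
    return [f"extend '{p}'" for p in papers]


def toy_score_idea(idea):
    return float(len(idea))  # placeholder metric, not a real quality score


best = run_agent("prompting", toy_retrieve, toy_score_paper,
                 toy_generate, toy_score_idea)
for idea in best:
    print(idea.score, idea.text)
```

Passing the four steps in as functions keeps the sketch runnable offline; swapping in a real retriever or scorer would not change the pipeline's structure.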

LinkedIn Comments

There's a real conflation between ideation and innovation in the comments, although the paper is clear. Ideation is a very important step, and one that many projects and companies fail at, instead pursuing the first idea they find rather than generating as many ideas as possible before selecting the best. So I see the potential in ideation, best articulated by one comment: "Research links associative thinking to creativity, problem-solving, and richer communication. When AI easily and rapidly generates a broad range of ideas, it provides us with richer, more diverse opportunities to engage in associative thinking".

Also, while ideas are often killed in meetings by execs, "No one is vetoing an AI’s brainstorm" - possibly we'll declare it mature tech when that starts happening.

But, and maybe it's just me, the sceptics in the comments had more interesting things to say:

"a gpt for that can generate original ideas" for this sitcom podcast (lessons learnt),
the observation that the process - "RAG-based ideation agent first looked through a wide array of research articles... for ideas for future research in the best articles... used those as seeds to riff on new ideas in narrowly defined sub-fields... [which] were articulated with a structured output that includes key criteria necessary to get research funded... a human expert reviewed ideas and reranked" - was a lot more rigorous work than "most ideas (particularly in business)" require
"They generated 4,000 ideas to get 200 unique ideas"
"someone made of blood and bone probably did at least piece together several dots of genius for the AI to produce those alleged « novel » ideas"
"The amount of researchers in the test (N=49) and the amount of reviewers (N=79) is excessively small"
"The metrics used ... Novelty Score, Feasibility Score, Effectiveness Score, and Excitement Scores... very subjective ... not a representation of human-capabilities during a research task. A very low reviewer agreement <60% is a very worrying metric"
"'out of the 4000 generated seed ideas, there are only 200 non-duplicates.' - Hum"
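To make the "4,000 ideas to get 200 unique ideas" point concrete, here is a minimal sketch of similarity-based de-duplication. These notes do not record how duplicates were detected, so everything below is an assumption: the bag-of-words cosine similarity stands in for a real embedding model, the 0.8 threshold is illustrative, and the function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def deduplicate(ideas: List[str], threshold: float = 0.8) -> List[str]:
    """Greedily keep an idea only if it is not too similar to any kept idea."""
    kept, kept_vecs = [], []
    for idea in ideas:
        vec = bag_of_words(idea)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(idea)
            kept_vecs.append(vec)
    return kept


ideas = [
    "Use retrieval to reduce hallucination in question answering",
    "Use retrieval to reduce hallucination in question answering tasks",
    "Prompt models to critique their own translations",
]
unique = deduplicate(ideas)
print(f"{len(unique)} unique of {len(ideas)} ({len(unique) / len(ideas):.0%})")
```

For scale, the figures quoted in the comments (200 non-duplicates out of 4,000 seed ideas) work out to a 5% unique rate.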

Read the Full Post

The above notes were curated from the full post arxiv.org/abs/2409.04109?utm_source=pocket_saves.
