When we say “Artificial Intelligence that understands science”, what do we really mean?

Three years ago we started Iris.ai with the goal of making researchers’ lives easier. We’ve since launched a suite of tools based on artificial intelligence technology, which sometimes seems to leave people a tad confused. One of my goals in life is to take complicated technology and make it understandable to everyone, so I figured this would be the perfect place to expand on what we mean when we talk about “an AI that understands science”. Because I totally agree: that is quite a fluffy statement.


First, a brief description of our actual tools before we dig into the technology. Our tools are intended for researchers in the early stages of a new project, about to set out on a systematic research mapping or a simpler literature review. The tools are especially suitable when the researcher is mapping out a field that is new to them, or when the research is interdisciplinary, meaning the researcher is not a deep domain expert in every topic the work touches. In short: the moment you’re dealing with unknown unknowns, sitting down with Google Scholar to hunt for those unknown resources gets complicated, and that’s where our tools come in. We have found that a team of researchers using our AI-powered science tools gets a far broader and better overview of the field(s) and can draw superior conclusions compared to teams using traditional keyword-query-based tools such as Scholar and PubMed.


Then, on to the “how the heck are you doing this, and what is this AI thing?”. Let’s use the example of our first tool, freely available on our website at the.iris.ai. Here we train a machine to understand the context of your problem. You write it out in 300-500 words, the way you would explain it to a colleague or define it in your PhD introduction. Our machine then does the following:

  • Extract the most meaning-bearing words from your text. This uses a pretty well-known and straightforward method and is not that magical.
  • Enrich these with contextual synonyms. The machine has already read 18 million articles: we fed it 11 words in a row at a time, showing it the first 5 and the last 5 and asking it to guess the word in the middle. Initially the guesses were entirely random, but after working through every 11-word window in 18 million articles, the machine builds up a pretty good understanding of which words are interchangeable in which contexts. Now it knows that “car” and “vehicle” are synonyms, and it will expand your search to include a broad set of interdisciplinary synonyms.
  • Enrich with topic words, or hypernyms. If your text talks about sheep, cats and guinea pigs, you may be discussing the topic of animals, mammals, pets or potentially food. The machine identifies these topics through a method called ‘neural topic modeling’: it takes a data set of around five million research papers and clusters them by content. In practice this means that one paper can belong to multiple clusters; the machine then identifies which combination of 10 words from the previous step best describes each group. When you input a problem statement, the machine places it into all of the relevant topic clusters and grabs the words describing them.
  • We now combine all of the words from the three steps above – keywords, contextual synonyms and topic words – and turn them into a “fingerprint” of the document: a list of weighted words. We then use a technique we nicknamed ‘fingerprint matching’ to find other documents with similar fingerprints.
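The “guess the middle word” exercise in the second bullet resembles the continuous-bag-of-words (CBOW) setup popularized by word2vec. Iris.ai hasn’t published its exact pipeline, so purely as an illustrative sketch, the training pairs described above could be generated like this (the function name and example sentence are my own):

```python
def context_windows(tokens, half_window=5):
    """Yield (context, target) training pairs: the 5 words before and
    5 words after each position, with the hidden middle word as target."""
    span = 2 * half_window + 1  # 11-word windows, as described in the text
    for i in range(len(tokens) - span + 1):
        window = tokens[i:i + span]
        target = window[half_window]                           # the word to guess
        context = window[:half_window] + window[half_window + 1:]
        yield context, target

# Every 11-word run in the corpus becomes one guessing exercise:
tokens = "the electric vehicle uses a lithium ion battery pack for storage".split()
for context, target in context_windows(tokens):
    print(target, "<-", context)
```

Words that occur in similar contexts end up being predicted from similar evidence, which is what lets a trained model treat “car” and “vehicle” as contextual synonyms.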
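The clustering step in the third bullet can be caricatured in a few lines. This is a deliberately simplified stand-in for neural topic modeling (the overlap threshold and the toy cluster data are made up), but it shows the key property: a problem statement can fall into several clusters at once, and each matching cluster contributes its descriptive words.

```python
def topic_words_for(problem_words, clusters, min_overlap=2):
    """clusters maps a cluster name to its ~10 descriptive words.
    Collect the descriptive words of every cluster that shares at
    least `min_overlap` words with the problem statement."""
    problem = set(problem_words)
    enriched = []
    for name, words in clusters.items():
        # Soft membership: several clusters can match the same problem
        if len(problem & set(words)) >= min_overlap:
            enriched.extend(w for w in words if w not in problem)
    return enriched

# Toy clusters, echoing the sheep/cats/guinea-pigs example above:
clusters = {
    "mammals": ["sheep", "cat", "guinea pig", "mammal", "fur"],
    "pets":    ["cat", "guinea pig", "dog", "pet", "leash"],
    "physics": ["quark", "boson", "lepton", "spin", "charge"],
}
print(topic_words_for(["sheep", "cat", "guinea pig"], clusters))
```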
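A fingerprint, as described in the last bullet, is a list of weighted words, which maps naturally onto a word-to-weight dictionary. Iris.ai hasn’t published how ‘fingerprint matching’ scores similarity, but a common and plausible choice is cosine similarity between the weight vectors; here’s a minimal sketch with made-up weights:

```python
from math import sqrt

def cosine_similarity(fp_a, fp_b):
    """Score two fingerprints (dicts of word -> weight) between 0 and 1."""
    dot = sum(w * fp_b[word] for word, w in fp_a.items() if word in fp_b)
    norm_a = sqrt(sum(w * w for w in fp_a.values()))
    norm_b = sqrt(sum(w * w for w in fp_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

problem = {"battery": 0.9, "lithium": 0.7, "electrode": 0.5}
paper_a = {"battery": 0.8, "electrode": 0.6, "anode": 0.4}   # overlapping topic
paper_b = {"protein": 0.9, "folding": 0.7}                   # unrelated topic
print(cosine_similarity(problem, paper_a))  # well above zero
print(cosine_similarity(problem, paper_b))  # 0.0: no shared words
```

Ranking a corpus by this score against your problem statement’s fingerprint is all it takes to surface papers that talk about the same things in different words.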


All of the above happens in about 8-15 seconds, depending on the length of your problem statement and your internet connection. We’ve found that 15-18 maps built over 5-7 hours give you an excellent overview of the key topics and their relevant research papers.


Now, the methods described above can be put to use in a variety of use cases beyond the Explore tool. We have also combined them into a Focus tool, which lets you ‘fingerprint’ a group of up to 20,000 documents and swiftly narrow it down to a precise reading list without having to read them all. We’re also working on some exciting prototypes of the “show us 10 patents that are interesting to you and we’ll alert you every time something similar pops up” kind.


You can try the free version of the Iris.ai Exploration tool at the.iris.ai – and contact my colleague Karita at karita@iris.ai for more information about the Focus tool and other tools under development!


Written by Anita Schjøll Brede, co-founder of Iris.ai and keynote speaker at OEB 2018.
