

Book rating systems are so common. The 5-star types are everywhere. Simples, right?
Oh, if only! As a certified evidence fiend, once there’s the tantalizing possibility of systematization – with numbers, even! – I tumble down that rabbit hole for days, then tumble some more. Once I decided to review books and other resources for my Grief Collection blog, this got hellishly complicated fast. Please send help!
Even 5-star ratings are super complex. Take, for example, the ones with half-stars, like LibraryThing. That’s actually a 10-point rating scale, which is how you get the half-stars (0.5, 1, 1.5, 2, and so on up to 5). And when LibraryThing gives a book a combined rating with the 5-star symbols, there’s an algorithm calculating that – as happens at Amazon for everything, too.
This is serious business, given how much online ratings must be affecting book visibility and purchases. Amazon’s website says their algorithm incorporates factors such as how recent a review is, whether the person bought the book on Amazon, and what level of trustworthiness they calculate for that review. (Confession: I’ve worked on those kinds of secret-sauce algorithms in the past, though never commercially.)
LibraryThing’s blog tells you their algorithm is “intentionally obscure,” but also tells you to click at least a 6 if you want to drive a book’s rating up (or a 5 or lower to sink it). Hmmm… Doesn’t sound like the behavior you want to encourage, does it? All this is a reminder that online recommendation systems are riddled with biases that aren’t obvious, even when the number of accumulated ratings is very large.
Which brings us to the other Amazon book-rating behemoth: Goodreads. Here, it’s a straight-up 5 stars, and what they call averaging. Cue my cartoon about this word!
Yeah, well. I guess that joke is pretty average! (Here’s the source explainer post.)
Goodreads doesn’t say what they mean by averaging, so I did some calculations on an example. As you do. It’s a weighted mean: multiply each star level by the number of ratings at that level, add those up, and divide by the total number of ratings.
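If it helps to see that as code, here’s a minimal sketch of that weighted mean in Python – my reconstruction from one worked example, not Goodreads’ documented method, and the function name and data shape are mine:

```python
# A sketch of a weighted mean over star ratings (my reconstruction,
# not Goodreads' documented method).
def weighted_mean_rating(counts_by_star: dict[int, int]) -> float:
    """counts_by_star maps a star value (1-5) to how many ratings gave it."""
    total_ratings = sum(counts_by_star.values())
    total_stars = sum(stars * count for stars, count in counts_by_star.items())
    return total_stars / total_ratings

# Example: 100 five-star, 50 four-star, and 10 one-star ratings
print(round(weighted_mean_rating({5: 100, 4: 50, 1: 10}), 2))  # 4.44
```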
But what if you want to rate something 0 stars – or go even lower? I’m definitely not going to give everything a star. I’m not rating the literary quality of resources, I’m thinking about potential benefits and harms for readers who are grieving. When I get into guides to grief, I’ll be rating their approach to evidence, too. At some point, I’m pretty sure I’ll include grief resources that I will want to plant a red flag on.
By the time you consider things like readability and access as well, this process is exceedingly complex. When Richard Redding and colleagues set about a similar task, they ended up with 19 measures to evaluate, combining into 5 sub-scales. I don’t dare count up all the issues I have plonked in my notes to consider so far, because it must be well north of 19!
John Norcross had an interesting system for getting psychiatrists to rate self-help books. He gave them 5 stars and a dagger. Brutal! He aggregated their stars and daggers into an overall 5-point scale, going from extremely good to extremely bad. Only the top 2 levels reached “worth reading” status.
I am using my overall personal rating as a tagging system for the blog. And I need words to do that – another work in progress. I’m beginning with a 5-point rating as well, but mine runs from 0 to 4 stars.
Today, I’ve posted the first review, and I’m hoping I get feedback that will help me develop this system. I started with a memoir, because I’m too early in my journey into the evidence to tackle grief advice books. It’s a short, iconic one: C.S. Lewis’ A Grief Observed, published in 1961. And here’s my fledgling 5-point recommendation scale:
0 stars = “red flag”
★ = not recommended
★ ★ = recommended, with limits
★ ★ ★ = recommended, above average
★ ★ ★ ★ = highly recommended
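For the technically inclined, that scale boils down to a simple lookup. Here’s an illustrative sketch in Python – the names are mine, not the blog’s actual machinery:

```python
# Illustrative mapping from the 0-4 star rating above to its blog tag.
RATING_TAGS = {
    0: "red flag",
    1: "not recommended",
    2: "recommended, with limits",
    3: "recommended, above average",
    4: "highly recommended",
}

print(RATING_TAGS[2])  # recommended, with limits
```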
I’d expected to be highly recommending C.S. Lewis’ book. It didn’t work out that way, though. Even though my process isn’t firm enough yet to put in writing, I hope it will be clear why it lost some rating points: I gave it 2 stars. The review is here.
I’ve included the Goodreads ratings for the book, too, in a way that I hope is more useful than their average number and stars. Years ago, some colleagues and I developed a system for reader ratings of online evidence-based patient information in Germany. I’ve presented the Goodreads data grouped in the way I got used to analyzing the results back then. (There’s an abstract about that process, in German.)
I definitely want to post something that scores “highly recommended” next!
Meanwhile, evidence that caught my eye this week follows below.
I hope you have a good week!
Hilda
I’ve been tracking the development of Covid vaccines since early in the pandemic. I’ll keep doing intensive blog posts on next generation vaccines – like intranasal ones or “variant-proof” vaxes. I did my first Mastodon thread on major highlights in results from the last couple of months. There’s good and bad news on intranasal vax development, and lots more besides – including the first retraction of Covid vaccine study results I’ve seen.
As discussed above, new writing from me this week was the first book review at the Grief Collection.
Also on the subject of grief this week: Lucy Selman and colleagues published a mixed methods study on inequities in access to bereavement support in the UK. They summed up their findings with this quote: “Sadly I think we are sort of still quite white, middle-class really” – and if you’re from a sexual minority group, you could miss out too. It seems to have worsened somewhat during the pandemic.
Before we leave the subject of society’s biases, Jessica Spence and colleagues’ new meta-analysis tried to unpick whether the bias against hiring people who don’t have a standard accent could be explained by jobs with high communication demands. That’s not enough to account for it: “Moreover, the degree of accent bias was associated with perceptions of the candidates’ social status, and accent bias was particularly pronounced among female candidates and for candidates who spoke in foreign (as compared with regional) accents.”
Monica Logan and colleagues did a systematic review of priority-setting exercises internationally for dementia research. They found 10, across 4 continents – none from Africa or Latin America. Only one met all their best practice criteria. There were some priorities in common, but there were differences related to specific local contexts. For example, awareness and education were a higher priority in some parts of the world than others. I wonder how closely the research that’s actually done, or publicly funded, maps to those priorities – and whether the research that turns out to be the most helpful did, for that matter.
I’ll cap off this week with a meta-meta trio of excellent systematic reviews with results that feel simultaneously deflating and motivating to me. It’s a suite of systematic reviews of meta-research studies from the international Evidence-Based Research group, and a new one drew my attention to the previous couple. In the new one, Jane Andreasen included 21 meta-research studies on how often systematic reviews are used to justify new clinical trials. It’s still not often enough to make a serious dent in unnecessary trials in most clinical areas, but it’s hard to get more specific than that. The second had similarly discouraging conclusions on how often systematic reviews are used to inform the design of a new trial. The third one concluded that only about 30% of trials and other clinical studies contextualize their results using systematic reviews. Onwards and upwards, eh?!