How good is ChatGPT 4? We analyse its answers to four questions...

Since the end of 2022 it has been hard to avoid mentions of ChatGPT online or in person - and the buzz around it shows no sign of abating. This natural language processing tool / chatbot from the American artificial intelligence research company OpenAI has made waves, and it appears to be at last delivering on the promise of AI to offer a rich conversational experience, with a remarkable range of subject matter (even including the ability to generate computer code).

The initial public release of ChatGPT, based on version 3.5 after numerous pre-release internal developments, was on November 30th, 2022. Version 4 was then launched on March 14th, 2023, and is presently available only to paid subscribers.

We put ChatGPT 4 to the test with questions designed to assess its comprehension skills, knowledge base and the relevance of its responses.

As with a lot of new technology that gets heavily hyped, we weren’t sure what to expect at the outset. The outcome was interesting, and the sophistication of the AI’s ability to comprehend complex questions and give relevant responses exceeded our expectations, although we found it is not without its quirks and limitations.

ChatGPT 4.0 typically gave multi-paragraph responses in pseudo-essay form. Its replies were considerably shorter than a full essay of the kind that might be required in an examination hall at A-Level or degree level, but longer than the space allowed for the individual portions of an elementary-school examination paper divided into many short questions, each worth 2-4 marks. Nonetheless, the comparison with academic examination questions seems relevant, because its answers were formulaic and typically structured in the style expected of a balanced response to such questions, considering more than one angle and in many cases including an introduction and a conclusion. It is our assessment that it has been deliberately programmed to answer in this style in order for its responses to appear more authoritative, as well as perhaps to serve as an outline model to students faced with real questions for their school or university coursework, and to make the use of ChatGPT more appealing to such students.

In each instalment of this article, we shall provide a transcript of the responses ChatGPT gave to four of our questions, and a summary assessment of the strengths and weaknesses of its performance on each, before awarding each response a grade from A to E. Let us know what you think of how ChatGPT fared against our range of questions. We’d love to hear your thoughts on this popular system.

1. Who was the greater philosopher: Locke or Leibniz?

GPT 4.0

The question of who was the “greater” philosopher between John Locke and Gottfried Wilhelm Leibniz is somewhat subjective, as it can depend on the personal beliefs, values, and interests of the person making the judgment. They made significant contributions to different areas of philosophy, and their ideas have had different kinds of impacts.
John Locke (1632-1704) was a central figure in the empirical tradition of philosophy, which emphasizes the role of experience and evidence gained through the senses in the formation of knowledge. His work laid the groundwork for Enlightenment thinking and greatly influenced the development of political philosophy. His ideas on the social contract and the rights of individuals had a profound impact on the development of political systems, particularly in the democratic tradition. His “Essay Concerning Human Understanding” is a key text in epistemology and philosophy of mind.
Gottfried Wilhelm Leibniz (1646-1716), on the other hand, was a polymath who made significant contributions to philosophy, mathematics, and science. In philosophy, he’s best known for his work in metaphysics and logic. He developed the concept of monads, which are simple substances that make up the universe. He also made significant contributions to the development of symbolic logic, and he is co-credited with inventing calculus in mathematics (independently of Sir Isaac Newton).
So, if your interest is more toward political philosophy and empiricism, you might find Locke to be the “greater” philosopher. If your interest is more toward metaphysics, logic, or the intersection of philosophy and mathematics, you might find Leibniz to be more significant.
It’s also important to note that “greatness” in philosophy isn’t just about who had the “right” ideas. It’s also about who had interesting, influential, and thought-provoking ideas. Both Locke and Leibniz certainly meet these criteria.

Analysis:

ChatGPT has no trouble in identifying at the outset who is meant by the Locke and Leibniz of the question, showing that their surnames in combination with their profession are sufficient for it to identify the individuals concerned for the purposes of responding.

It is correct in identifying at the outset that the question of which of two individuals famed for their own contributions in any particular field is the greater is somewhat subjective. This was a relevant observation to make: imagine a situation in which the same question were put to a writer or another philosopher, and consider how their answer would be influenced by their own values, life experiences and opinions. So here, ChatGPT has successfully recognised that the preferred philosopher will vary depending on who is asked. It also succeeded in recognising the nuance that the philosophical writings of Locke and Leibniz do not generally address exactly the same topics within philosophy.

This said, the language it uses to express the relativism of any such comparison strays into the woolly. A well-written academic essay should be direct, precise and concise in its use of words to convey meaning. The expression ‘their ideas have had different kinds of impacts’ is vague and fails to identify what impacts their ideas had, an exposition of which would be expected in this context from an academic piece of writing.

The summary of the areas of philosophy on which Locke and Leibniz respectively wrote appears sound as far as it goes. This said, it is sketchy and does not explain why their respective work in each of the areas in which they wrote deserves to be considered as ‘great’. This is the main area in which the response falls short. If you are going to avow the greatness of the achievements of an historical figure in a particular field, you should give supporting arguments to demonstrate why their work in those fields is considered to be of a high quality or standard. Instead, ChatGPT uses too many platitudes such as ‘was a central figure’, ‘is a key text’, and (twice) ‘made significant contributions’, without justifying them with any specific details.

A good answer would detail the ways in which Locke's and Leibniz’s work in each of the areas on which they wrote broke new intellectual ground and built on the foundations laid by earlier writers in the same field. This has not been done.

The final paragraph is particularly weak and non-committal. It informally addresses the reader as ‘you’, which is considered poor style in academic writing, and instead of providing a summary case for why either philosopher could be considered the greater overall or for why they could be considered equally great in their respective fields, it simply refers the question back to the reader’s relative level of interest in the different areas of philosophy on which each philosopher wrote. It seems to skirt around the challenge of offering a definite opinion on the answer, which again is quite different from how the answer might end had a human written the response. This poorly judged conclusion ought to invite censure from any competent and experienced examiner.

Grade: D

2. Why was Gerolamo Cardano imprisoned?

GPT 4.0

Gerolamo Cardano (1501-1576), a prominent Italian polymath known for his contributions to mathematics, medicine, physics, and astrology, was indeed imprisoned in 1570. The reason for his imprisonment was not directly related to his scholarly works, but rather to a charge of heresy.
The specific incident that led to Cardano's imprisonment involved the publication of his book, "De Vita Propria" (On His Own Life). In this autobiography, he admitted to casting the horoscope of Jesus Christ, which was considered a blasphemous act by the Inquisition, the institution of the Catholic Church that was charged with identifying and eradicating heresy.
Despite his significant contributions to various fields of knowledge, Cardano had a tumultuous personal life, marked by the misfortunes of his children and his own struggle with mental health. His stint in prison was one of the dramatic episodes of his life. He was released from prison the next year, in 1571, and moved to Rome where he lived under the patronage of Pope Gregory XIII.

Analysis:

ChatGPT begins by showing recognition of who is meant by the subject of the question and listing some of the many areas on which Cardano wrote. Then, slightly awkwardly, it uses the form of words ‘was indeed imprisoned’, as though to imply that it has just had to check the claim implicit in the question for truthfulness and is somewhat surprised to find that according to its databases, the claim is in fact valid. This would not be an appropriate form of words in response to an academic examination question. It is altogether superfluous for the student to tell the examiner that a fact already implicit in the terms of the question is indeed true. The question required the student to account for Cardano’s imprisonment, not to state whether or not it was true that he was imprisoned.

The second paragraph answers the question more directly, although inadequately. ChatGPT has correctly identified Cardano’s casting of the horoscope of Christ as being perceived as heretical by the Inquisition. But it was not the only perceived infraction he was found to have committed in print, and it would be historically incorrect to claim that it was by itself the reason for his imprisonment.

Even in relation to Cardano’s casting of the horoscope of Christ, the attendant details provided by ChatGPT are extremely sketchy and muddled. What is not stated is that the horoscope was included in both of the first two editions of one of Cardano’s two major published works on astrology, a commentary on the Tetrabiblos of the ancient Greek scholar Ptolemy, in 1554 and 1555, many years before his imprisonment. This is important background information that would have placed Cardano’s arrest in context against the timeline of his offending writings, and ChatGPT’s failure to include it is to the detriment of its answer.

Moreover, and much more seriously, Cardano’s autobiography, De Propria Vita Liber, was only published posthumously, in 1643, and there are peer-reviewed academic documents freely available online indicating that he only began the manuscript at the age of 74 (no earlier than September 1575) and completed it in the year of his death, 1576. Thus, the timeline of its composition does not fit ChatGPT’s claim that an admission within it was the cause of his imprisonment in 1570 at all.

ChatGPT would appear to have done little more than arbitrarily pull together unrelated facts about Cardano’s life here in the vague expectation that they might be causally connected, without even checking the credibility of this hypothesis against the timeline of events. It has thus failed the test of accurate historical research making full use of public domain sources, even while it has picked up on some key facts in isolation.

The final paragraph of ChatGPT’s response is little more than a narrative account of some general observations about Cardano’s life, including his personal life, none of which is relevant to the specific question asked. A more satisfactory answer would have focused in depth on the Inquisition process and the individuals who led it, why they found some of Cardano’s writings to be heretical according to the beliefs of the established Church in Italy at the time, and why the infractions were considered serious enough according to the beliefs of the time for a prison sentence to be merited.

Overall, this is a poor, sketchy and inaccurate answer.

Grade: E

3. Why did Gladstone's Irish Home Rule bill fail on multiple occasions?

GPT 4.0

William Ewart Gladstone, a 19th-century British statesman and Prime Minister, proposed two Irish Home Rule Bills during his tenure, aiming to grant Ireland a degree of self-governance while still maintaining its status as part of the United Kingdom. Both of these bills failed to pass into law for a variety of reasons.
The First Home Rule Bill (1886): The first bill faced significant opposition, both from within Gladstone's Liberal Party and from the Conservatives. Many British politicians were concerned that Home Rule would weaken the United Kingdom and potentially lead to Irish independence. This faction, known as the Liberal Unionists, split from Gladstone's party and later merged with the Conservatives, opposing any form of Home Rule for Ireland. The First Home Rule Bill was defeated in the House of Commons.
The Second Home Rule Bill (1893): The second bill actually passed in the House of Commons, but was subsequently rejected by the House of Lords, which at the time had the power to veto legislation outright. The House of Lords was largely conservative and apprehensive of the potential political and social changes that might come with Home Rule.
In both cases, there were also concerns about how Home Rule would affect the Protestant minority in Ireland, especially in the northern province of Ulster, where Protestants were the majority and strongly identified with British rule. The proposed bills did not adequately address these sectarian issues, leading to further opposition.
Despite these failures, Gladstone's efforts laid the groundwork for future discussions about Irish self-governance, which eventually led to the establishment of the Irish Free State in 1922.

Analysis:

The introductory sentence correctly identifies Gladstone, but the language in which it speaks of him would be more appropriate in response to someone who had never heard of him before than in response to someone who is already clearly familiar with specific elements of his political career. It is not helpful to refer to him in such general terms as ‘a 19th-century British statesman and Prime Minister’ in this context.

What is arguably more useful is the subsequent nuanced reference to the objectives of the two Home Rule Bills. It is true that they were not designed to grant Ireland total independence from the United Kingdom. However, the expression ‘a degree of self-governance’ arguably understates the extent of the autonomy that they would have granted Ireland. What was on offer was considerably more than just a regional assembly.

The following sentence, invoking ‘a variety of reasons’ for the failure of the bills to pass into law, is too vaguely expressed to be useful. Some characterisation of the range and type of reasons would have been more impressive here, even though it’s meant as part of an introduction.

The second paragraph correctly identifies the date of the first Home Rule Bill. It also correctly recognises that opposition to the bill was not limited to members of the opposition party. However, it is short on details as to the distribution of the vote. It is not enough to simply declare, as ChatGPT does here, that ‘the First Home Rule Bill was defeated in the House of Commons’. More context would be useful here.

It is arguably impossible to meaningfully account for the failure of a bill to get through the Commons without referencing the actual number of votes from each political faction for and against it. In this case, ChatGPT’s answer notably fails to differentiate between the extent of the opposition to the bill from within the Liberal Party and from within the Conservative Party that dominated the opposition benches. Records show that in fact more than two thirds of Liberal MPs (224) voted with Gladstone for the bill, and only 92 against it, whereas not a single Conservative MP voted for it, and 248 of them voted against it.

This observation should have been factored into the analysis. The failure of the bill to pass was substantially, although not solely, a result of the wholesale opposition to it from the Conservative Party. The fact that just under a third of Liberal MPs also voted against it was nonetheless pivotal to its defeat. ChatGPT correctly identifies the most basic facts, but by glossing over important details such as these, it leaves its answer lacking in definition.
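The proportions quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, using only the Liberal and Conservative figures cited in this analysis; votes cast by members of other parties are not included, so the totals are not the full division result. The variable and function names are our own, and the counterfactual at the end is purely hypothetical, intended only to illustrate why the Liberal dissent mattered.

```python
# Vote counts as quoted in the analysis above (Liberal and Conservative MPs only).
# Votes cast by members of other parties are deliberately left out of this sketch.
votes = {
    "Liberal": {"for": 224, "against": 92},
    "Conservative": {"for": 0, "against": 248},
}


def share_against(party: str) -> float:
    """Fraction of a party's voting MPs who opposed the bill."""
    tally = votes[party]
    return tally["against"] / (tally["for"] + tally["against"])


# Totals across the two parties for which figures are quoted.
total_for = sum(tally["for"] for tally in votes.values())
total_against = sum(tally["against"] for tally in votes.values())

liberal_share_against = share_against("Liberal")
conservative_share_of_opposition = votes["Conservative"]["against"] / total_against

# Hypothetical counterfactual: every Liberal dissenter votes for the bill instead.
dissenters = votes["Liberal"]["against"]
counterfactual_for = total_for + dissenters
counterfactual_against = total_against - dissenters

print(f"Liberal MPs voting against: {liberal_share_against:.1%}")
print(f"Conservative share of quoted opposition: {conservative_share_of_opposition:.1%}")
print(f"Quoted totals: {total_for} for, {total_against} against")
print(f"Counterfactual: {counterfactual_for} for, {counterfactual_against} against")
```

On these figures, roughly 29% of voting Liberal MPs opposed the bill, which is the "just under a third" referred to above, while Conservative MPs supplied around 73% of the opposition votes quoted. The counterfactual shows that, among these two parties alone, the balance would have tipped in the bill's favour had the Liberal dissenters voted with Gladstone.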

While the second and third sentences of the second paragraph attempt to give reasons why so many MPs voted against the bill (albeit without differentiating across party lines), there is a notable non-sequitur between them. The third sentence would appear to imply that all the many British politicians concerned that the bill would lead to independence belonged to a faction called the Liberal Unionists, which then split from the Liberal Party. However, the many British politicians with these concerns were not all members of the Liberal Party, so they could not all be classed as Liberal Unionists. Indeed, as we have seen in the details of the vote above, most of them were not members of the Liberal Party at all, but rather members of the Conservative Party, and were therefore not Liberal Unionists. The misstructuring of this paragraph generates considerable confusion.

The third paragraph correctly identifies the date of the Second Home Rule Bill, and correctly identifies that it was passed in the House of Commons but defeated in the House of Lords, which at the time had an absolute veto on Commons legislation. It is also correct in referring to the House of Lords as being ‘largely conservative’ at the time, although the use of ‘conservative’ with a small ‘c’ understates the case here. Records show that the Conservative Party held an absolute majority in the Lords, and this fact alone was sufficient to ensure the rejection of the bill, which in the event was overwhelmingly defeated in the Lords, with only 41 votes in favour and 419 against.

The fourth paragraph, referring to concerns over the impact of Home Rule on the Protestant minority in Ireland, is relevant to an analysis of the reasons for the defeat of both bills. However, its final sentence is too vague to pass muster: ‘leading to further opposition’ is too generally stated, raising the question ‘opposition from whom?’ It would be more useful to know the degree to which opposition on grounds of concerns for the Protestant population of Ireland was confined to the Conservative Party and the degree to which it was also voiced by Liberal Party MPs.

The final paragraph is excusable as historical contextualisation, although it doesn’t directly answer the question.

Overall, ChatGPT’s answer in this case is mostly historically accurate as far as it goes, but short on relevant detail and partly spoiled by its false implication that the only politicians concerned that the Home Rule Bill would lead to Irish independence were members of the Liberal Party. It does a better job here than it managed with the question of Cardano’s imprisonment above, but its analysis falls short.

Grade: C

4. Compare and contrast muons and gluons

GPT 4.0

Muons and gluons are two distinct types of particles in the field of particle physics, each with different properties and roles.
Muons:
Muons are fundamental particles, classified as leptons, which also include electrons and neutrinos. They are similar to electrons but are about 207 times heavier. Muons carry a charge of -1e (the same magnitude as the charge of an electron but opposite in sign) and have a spin of 1/2, which classifies them as fermions. Muons are subject to the weak, electromagnetic, and gravitational forces, but not the strong force. They are unstable particles and decay into an electron and two types of neutrinos through a process mediated by the weak force.
Gluons:
Gluons, on the other hand, are gauge bosons for the strong force, one of the four fundamental forces of nature. They mediate the interaction between quarks, the building blocks of protons and neutrons. Gluons themselves carry what is known as a color charge (unrelated to visual color), which is the charge associated with the strong force. Due to this color charge, gluons can interact with each other, unlike photons, which are the gauge bosons for the electromagnetic force. Gluons have no mass, carry no electric charge, and have a spin of 1, which classifies them as bosons.
In summary, the primary differences between muons and gluons are:
Muons are leptons, while gluons are gauge bosons.
Muons carry electric charge, while gluons do not.
Muons are subject to the weak, electromagnetic, and gravitational forces, but not the strong force. Gluons mediate the strong force.
Muons have mass, while gluons do not.
Muons decay into other particles, while gluons do not decay in the conventional sense because they are confined within particles like protons and neutrons due to the property of the strong force known as color confinement.
It's worth noting that both muons and gluons are key components in the standard model of particle physics, which describes the particles and forces that make up our universe.

Analysis:

This was a relatively simple question for ChatGPT to answer, as all it was required to do was to draw upon readily available definitions and facts about two different subatomic particles and then compare the facts pertaining to each.

While we at GWS Media are not experts on particle physics, checking the responses against readily available online reference sources on fundamental particles gives the impression that ChatGPT’s responses are largely accurate in this instance.

The point-by-point summary of differences between muons and gluons is helpful, and we think comparable to the answer a human might give, for example an A-level physics teacher, as this list format makes the contrasts easier to read. The initial paragraphs looking at the characteristics of each particle on its own in more depth are quite convoluted by comparison.

We see a missed opportunity in the failure to note that the only thing muons and gluons have in common is the sound of their respective names, which is a false friend as they belong to totally different functional groups of subatomic particles.

It would also have been relevant to compare the dates of discovery of muons and gluons, which freely available online sources indicate to have been 1936 and 1978, respectively.

These omissions notwithstanding, we rate this as ChatGPT’s best response from within the first batch of four questions. It accurately recalled facts and succeeded in the task of comparison and contrast.

Grade: B

Conclusions

Our first set of four questions challenged a number of different areas of ChatGPT’s knowledge base and analytical powers. Two (Nos. 2 and 3) focused on public-domain historical facts from centuries past but asked ChatGPT to explain them and not just to retell them. One (No. 1) focused on famous intellectuals of the past and required ChatGPT to evaluate their respective contributions to philosophy comparatively. The fourth (No. 4) was a purely scientific question related to modern developments in physics, and merely tested ChatGPT’s skills of comparison and presentation of related facts.

We found that the modern scientific question was the one ChatGPT handled best.

On both of the historical explanation questions, it was able to provide some relevant outline but fell short on detail and got some facts wrong or presented them in a misleading way, although it did better at factual recall on a major episode in 19th-century British politics than on the background to the religiously motivated incarceration of a 16th-century Italian polymath. It could well be that the more specific and obscure the relevant historical facts, the more trouble ChatGPT has in accessing them.

On the question of the comparative evaluation of two famous philosophers, it was non-committal, and fell short on analysis, even while it was able to satisfactorily outline the areas of philosophy to which each thinker contributed and recognise the differences between them.

From this evidence, we can see that ChatGPT 4.0 has good powers of comprehension of questions and excellent access to scientific knowledge and high-profile historical data in the public domain, but at present, limited powers of explanation and analysis.

If you are using ChatGPT for any kind of fact-finding or research, you should definitely check the facts it presents, and ideally have enough background knowledge (or find someone who does) to check that what it says is not misleading. At the very least, you will want to compare its output with trusted / moderated websites.

Continue to Part 2 of this series of articles analysing our ChatGPT 4.0 Test.
