Chat GPT 4.0 Test Part 2: Questions 5-8

Back in the Spring, we ran the first part of our review of Chat GPT 4.0, analysing its responses to a selection of four moderately challenging questions on topics of famous philosophers, Italian cultural history, British political history and science.

We now bring you the long-awaited second part of our review of its capabilities, analysing its responses to four more questions of varying difficulty level. Read on to see how it fared on questions about the environment, herbalism, industrial design and a range of early home computers!

Then read our analysis and let us know if you agree with our assessment of Chat GPT’s responses!

5. If mean global temperatures rise to three degrees Celsius above pre-industrial levels but no further, how much sea level rise can be expected to result by the year 3000?

CATEGORY: ENVIRONMENT AND PUBLIC HEALTH

GPT 4.0

Predicting sea level rise is a complex process that involves multiple factors, including thermal expansion of the oceans, melting of glaciers and ice caps, and the behavior of the large ice sheets in Greenland and Antarctica. It's also important to note that climate change impacts are not linear, and tipping points could trigger rapid changes in ice sheet dynamics.
As of my knowledge cutoff in 2021, there's no precise consensus about how much sea level would rise with a global average temperature increase of 3 degrees Celsius by the year 3000. However, a study published in the Proceedings of the National Academy of Sciences in 2016 suggested that for every degree Celsius of global warming, sea levels are likely to rise approximately 2.3 meters over the following 2,000 years. Therefore, if global temperatures increase by 3 degrees Celsius and remain at that level, sea level could theoretically rise by approximately 6.9 meters (around 23 feet) over the following 2,000 years.
However, other studies suggest different amounts of sea level rise, and the actual outcome could be more or less severe. Factors such as the rate of greenhouse gas emissions, how quickly the major ice sheets respond to warming, and the extent of other contributing factors like ocean warming and glacial melt could all influence the eventual sea level rise. Furthermore, future technological advances and mitigation efforts could also influence these outcomes.
Remember that while these estimates provide a general sense of potential sea level rise, they are based on many assumptions and models that contain inherent uncertainties. For the most current and comprehensive understanding, it's always best to consult the latest research and reports from organizations such as the Intergovernmental Panel on Climate Change (IPCC).

Analysis:

This question was designed to test Chat GPT’s abilities not only to source relevant scientific data but also to mathematically reason and project future measurements on its basis.

The initial sentence of its response is apposite, acknowledging three different heat-sensitive geophysical processes that can induce sea level rise. The second sentence also shows awareness of the fact that sea level rise does not respond in a purely linear fashion to increased global temperatures and may rapidly increase when particular ice sheets are brought to the point of collapse. Essentially, Chat GPT has shown recognition of the complexity of the scientific considerations needed to project future sea level rise in response to global temperature increases.

The second paragraph attempts to give a more direct answer to the question, but does so within the framework of disclaimers and provisos. Its wording is somewhat unfortunate, however, on two different levels.

Firstly, its statement that ‘there’s no precise consensus about how much sea level would rise with a global average temperature increase of 3 degrees Celsius by the year 3000’ implies that it has found multiple divergent opinions on this specific question. It does not, however, present any specific projection that it has discovered, suggesting that the truth may more nearly be that it has failed to find any projection at all based on this specific warming level and timeframe. The issue therefore would appear not to be a lack of consensus but one of lack of directly relevant recorded projections in sources available to Chat GPT. The challenge implicitly set was, however, for Chat GPT itself to project based on available scientific data regarding the impact of temperature increase on sea level rise in general, factoring in likely ice melt volumes and the thermal expansion of ocean water. This it has ostensibly failed to even attempt to undertake.

Secondly, the requested projection was not of how much sea level would eventually rise if the temperature increase reached 3 degrees Celsius by the year 3000, as Chat GPT’s form of words here would suggest. It was of how much sea level would rise by the year 3000 if the temperature increase reached 3 degrees Celsius well in advance of that year but then stopped. There is a clear difference. Sea level rise is known to lag well behind the temperature increases that cause it. The world has not yet experienced all the sea level rise expected to result from global temperatures reaching 1 degree above pre-industrial levels, even though temperatures have already reached that level.

Chat GPT has identified and quoted one theoretical model linking each degree of temperature rise to a certain amount of sea level rise over the following 2,000 years, on a linear basis, and used this to project 6.9 meters of sea level rise within 2,000 years of a temperature increase of three degrees Celsius. This calculation, however, is unfinished for the purposes of responding to the original question, which related to sea level rise by the year 3000 consequent upon a temperature rise of 3 degrees over pre-industrial levels. If we take the burning of fossil fuels for industrial purposes as having commenced in approximately 1710, when coke began to be used for iron smelting, then 2,000 years from that date takes us to 3710, far beyond the requested projection date. And since even in 2023 global temperature is only about one degree above pre-industrial levels, it will be quite some time before temperatures reach three degrees above pre-industrial levels, as they are likely to do if insufficient measures are taken to halt their rise. If we tentatively project that this might happen in 2200, Chat GPT’s answer, which projects 2,000 years forward from that point, is looking as far ahead as around 4200. Even if three degrees were reached considerably sooner, perhaps a whole century sooner, its projection would still run more than a millennium beyond the year 3000 specified in the original question. It therefore does not usefully answer the question at all. What was needed here was for Chat GPT to estimate the fraction of the projected rise that would occur by the year 3000. This it has failed to do.
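The timeline arithmetic in the paragraph above can be laid out as a short Python sketch. It uses only the figures already quoted in this article (2.3 meters per degree over 2,000 years, three degrees of warming, and the tentative dates of 2100 and 2200 for reaching that level). The function names are our own, and the last function rests on a loudly labelled assumption: it pro-rates the 2,000-year rise evenly over time, which neither the quoted study nor this article claims is realistic, since sea level response is not linear. It is there only to show the kind of calculation the question called for, not to offer a projection.

```python
# Illustrative arithmetic only, using figures quoted in the article.
RISE_PER_DEGREE_M = 2.3       # meters per degree C over 2,000 years (as quoted by Chat GPT)
RESPONSE_WINDOW_YEARS = 2000  # period over which that rise is said to occur
WARMING_C = 3.0               # warming level specified in the question
TARGET_YEAR = 3000            # projection date specified in the question


def window_end(year_reached: int) -> int:
    """Year in which the 2,000-year response window closes."""
    return year_reached + RESPONSE_WINDOW_YEARS


def overshoot(year_reached: int) -> int:
    """How many years past the target date the window closes."""
    return window_end(year_reached) - TARGET_YEAR


def naive_rise_by_target(year_reached: int) -> float:
    """ASSUMPTION (ours, hypothetical): the rise accrues evenly over the window.

    Real sea level response is non-linear, so this is a placeholder for the
    missing step, not a scientific estimate.
    """
    total_rise = RISE_PER_DEGREE_M * WARMING_C
    fraction = (TARGET_YEAR - year_reached) / RESPONSE_WINDOW_YEARS
    return total_rise * fraction


for year in (2100, 2200):
    print(year, window_end(year), overshoot(year),
          round(naive_rise_by_target(year), 2))
```

Whatever date is chosen for reaching three degrees, the window closes more than a thousand years after 3000, which is the mismatch described above. A serious answer would replace the even pro-rating with a response curve drawn from the scientific literature.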

The next paragraph of Chat GPT’s answer prevaricates wildly and is not very useful. In stating ‘Factors such as the rate of greenhouse gas emissions, how quickly the major ice sheets respond to warming, and the extent of other contributing factors like ocean warming and glacial melt could all influence the eventual sea level rise’, it is partly reiterating the different geophysical processes that mediate between temperature rise and sea level rise, and partly alluding to factors that affect temperature rise itself. This results in a confused bundling of different considerations in a single sentence. Changes in the rate of greenhouse gas emissions are a known determinant factor upon temperature rise, but do not by themselves impact the relationship between temperature rise and sea level rise. So they are irrelevant to answering the question, which assumes that enough greenhouse gas emissions will take place to occasion a temperature rise of three degrees Celsius over pre-industrial levels and then asks how much sea level rise will result if that does happen.

The following reference to future mitigation efforts is a relevant consideration. It would have been useful here to be more specific in terms of the types of mitigation that would be needed to reduce the level of sea level rise from a given level of temperature rise. For example, potential measures to stabilise the Antarctic and Greenland ice sheets and contain them from flowing out to warmer waters where they would melt and contribute to sea level rise would have been relevant to mention here.

The final paragraph of Chat GPT’s report acknowledges the uncertainties of all projections for future sea level rise and recommends keeping up to date with the most recent ones. This is fine as general research advice although again it avoids answering the question.

Overall, Chat GPT has struggled to get to grips with the demanding challenge of projecting future sea level rise by a certain date in response to a given level of temperature rise over pre-industrial levels. Its only attempt at projection has failed to match the specified timeframe by a margin of more than 1,000 years. Otherwise, it has mostly fallen back on drawing attention to general considerations of different factors affecting the relationship between temperature rise and sea level rise, at the same time as throwing in the irrelevant consideration of variable greenhouse gas emissions, and ultimately deferred to the IPCC’s latest reports and research without specifying what they say.

Grade: E

6. Describe the botanical and medicinal similarities and differences between the herbs tansy, feverfew and costmary

CATEGORY: BOTANICAL SCIENCE AND MEDICINE

GPT 4.0

Tansy, feverfew, and costmary are all flowering herbs from the Asteraceae family, often used in traditional medicine.
Tansy (Tanacetum vulgare) is a perennial, herbaceous flowering plant with finely divided compound leaves and yellow, button-like flowers. Tansy has been used traditionally for a variety of purposes, including as a digestive aid, for treating intestinal worms, migraines, and rheumatism, and as an insect repellent. However, tansy contains thujone, a toxic substance that can cause serious side effects, such as seizures and liver and brain damage, especially when consumed in large amounts. Therefore, its use as a medicinal plant is not recommended without medical supervision.
Feverfew (Tanacetum parthenium) is another perennial plant with daisy-like flowers and pinnately compound leaves. As its name suggests, it has traditionally been used to reduce fever. Other traditional uses include the treatment of headaches, arthritis, and digestive problems. Feverfew is best known for its potential use in preventing migraines, and some scientific studies support this use. As with many medicinal herbs, feverfew can cause side effects, including mouth ulcers, digestive problems, and can interact with blood-thinning medications.
Costmary (Tanacetum balsamita) is also a perennial herb, with fragrant leaves that have been used for centuries in cooking and medicine. It's been used for digestive disorders, menstrual difficulties, and respiratory conditions, among others. Its leaves are sometimes used to flavor beers and wines, as well. The plant's medicinal effectiveness is not as well studied as that of tansy or feverfew.
So, while all three herbs are part of the Asteraceae family and share some similar uses in traditional medicine, particularly related to digestive issues, they are distinct species with different chemical compositions. These differences in their chemistry can lead to variations in medicinal effectiveness and potential side effects. As with all medicinal herbs, it's important to seek advice from a healthcare provider before using these plants for their potential therapeutic effects.

Analysis:

In assessing the botanical similarities of the three herbs listed, Chat GPT has correctly identified that they are members of the same botanical family, but has neglected to mention that they are all members of the same genus. In taxonomy, the genus is the level of classification between the family and the species. It therefore refers to a much closer degree of relatedness between plants than the family does. Failing to note that costmary, tansy and feverfew are all species of the same genus was a serious omission from Chat GPT’s response to this question.

It would further have been apposite here to give a description of the defining characteristics of the Tanacetum genus, such as the form of the leaves, flowers and fruits.

Each herb has then been fairly clearly described in terms of its form, habits and traditional medicinal uses. Some non-medicinal uses such as in flavouring food have also been thrown in, and these are arguably irrelevant to answering the question and would preferably have been omitted.

What was called for was not an encyclopaedic account of the herbs in general but a direct response to the specific question about their similarities and differences. This challenge has been attempted in the final paragraph of Chat GPT’s response, but has not been very well met. The only specific similarities Chat GPT has identified are that they are members of the same botanical family and all have some digestive uses. It has failed to note as a similarity the fact that two of the three have been used against migraines, although it has separately characterised them thus in its individual descriptions of each herb. Its assessment of their differences is poorer still. It has acknowledged their different chemical compositions without specifying how they differ, and has not contrasted their botanical forms or uses. All in all, its presentation of the differences between the three herbs is much too vague and non-specific to usefully respond to the question, and lacks evidence of detailed analysis.

The final sentence of Chat GPT’s response is nothing more than a disclaimer on medical advice. This may be relevant to protecting OpenAI from opportunistic lawsuits from parties claiming injury caused by experimentation with these herbs, but it does not help to answer the question.

To sum up, Chat GPT has given a narrative description of each herb and its uses that appears to be reasonably accurate as far as it goes, but it has shown lacklustre powers of analysis of the botanical and medical similarities and differences between the herbs and has failed to even spot that they are all of the same genus.

Grade: D

7. When did the use of ebonite bottle-tops begin to be phased out and the use of metal lids phased in?

CATEGORY: HISTORY OF INDUSTRIAL DESIGN

GPT 4.0

Ebonite, also known as vulcanite or hard rubber, was commonly used in various products in the 19th and early 20th centuries due to its durability and resistance to chemicals and electric current. However, information specifically about ebonite bottle-tops and their transition to metal lids is quite limited and may vary by region and specific use case.
Generally speaking, the transition from one type of material to another in the manufacture of products like bottle tops is often driven by a combination of factors including cost, functionality, and changes in manufacturing technology. Metal caps, such as the crown cork or crown cap, were invented by William Painter in 1892 and quickly gained popularity due to their effectiveness at maintaining carbonation in beverages like beer and soda.
As for the phasing out of ebonite bottle-tops, it's difficult to pinpoint a specific time, as the use of materials can vary based on the type of bottle (such as for beer, milk, or medicine), local practices, and other factors. It would be reasonable to assume, however, that the transition likely occurred in the early to mid-20th century, as materials like metal and later plastic became more prevalent in bottle cap manufacturing.
It's worth noting that today, many bottle tops are made from plastic or metal, although there has been a resurgence in the use of cork and other more sustainable materials due to environmental concerns.

Analysis:

This question challenged Chat GPT to locate and assess facts indicating the relative timelines for the use of two different types of material in the industrial design of bottle tops: ebonite and metal, and particularly the timeframe during which metal superseded ebonite in use. This was not an easy question because it is a very specific industrial design topic on which there are few authoritative and dependable online resources.

Chat GPT has managed to establish a limited number of relevant facts here, but has ultimately acknowledged that it cannot access a clear and definite answer or one that applies in all territories.

Among the most relevant facts it has established is that metal caps were invented in 1892 and ‘quickly gained popularity’. This goes a long way towards answering the question of when metal lids began to be phased in. What is less clear from Chat GPT’s sources is when ebonite bottle-tops were phased out. It suggests the early-to-mid 20th century, which accords reasonably well with the limited online sources that address this topic. Some sources suggest that ebonite bottle tops remained in use for certain applications as late as the 1960s or even the early 1970s. However, the question did not require Chat GPT to give their last recorded use, only when they began to be phased out. If it could have causally linked the commencement of the phasing in of metal tops to the beginning of the phasing out of ebonite tops, it might perhaps have reasonably suggested the 1890s for this too. The phasing out of a technology can sometimes span several decades, and ebonite bottle tops may still have been dominant in the commercial bottling marketplace for many years after they began to be phased out and replaced by metal ones.

The first sentence of Chat GPT’s response provides background information on the historical uses of ebonite. This reads more as part of an encyclopaedia entry about ebonite in general than as a direct response to the specific question, but is excusable for intellectual context-setting purposes as part of a short essay-length response, and usefully gives its alternative name of vulcanite, by which it is commonly known. It would, however, have been useful to provide more specific dates showing when ebonite bottle tops came into use. One source indicates that they were invented in 1872 and patented in the early 1880s by Henry Barrett, and that the material ebonite itself had been patented in 1846 by Charles Goodyear. These facts would have helped establish the timeline for the introduction of ebonite bottle-tops much more clearly than the general statement that has been provided about the common use of ebonite as a material in the 19th century.

The second sentence of Chat GPT’s response is somewhat confusingly worded. In stating that ‘information specifically about ebonite bottle-tops and their transition to metal lids is quite limited and may vary by region and specific use case’, it appears to be implying that there either may or may not be more information about ebonite bottle-tops in some regions than in others. This seems an unnecessarily and unhelpfully vague statement to make. Either there is more information about them in some regions than in others, or there isn’t. If there is, then it would be more useful to specify the difference that has been found. For example, if it has found a plethora of sources about their use in Britain but almost none about their use in Russia, it could simply say so. And if more information about their use in any territory has been found, it would be appropriate to specify this information, while pointing out that it only relates to their use in that territory and might not reflect patterns of use in other parts of the world.

The final paragraph of Chat GPT’s response is of doubtful value. Acknowledging that many bottle tops are made of metal or plastic today says nothing about when ebonite began to be phased out or metal phased in. It doesn’t even quantify the proportion of bottle tops made of metal today.

Overall, Chat GPT has struggled to establish sufficient facts to give a committed response to this question, and with the exception of identifying the year when metal bottle tops were invented, it has fallen back on rather imprecise and general statements about different technologies for bottle tops, while throwing in intellectually relevant but disappointingly non-specific disclaimers regarding the degree to which the timing of the phasing out of ebonite and phasing in of metal may have been universal across different countries and for different applications.

Grade: D

**Background image source:** Dpla-fr, CC BY-SA 4.0, via Wikimedia Commons (Cropped

8. List all the home computers released by Sinclair Research from the ZX80 to the Spectrum Plus 3, detailing their years of first issue and their RAM specifications

CATEGORY: COMPUTERS AND PROGRAMMING

GPT 4.0

Sinclair Research, founded by Sir Clive Sinclair, is a British consumer electronics company that played a significant role in the early home computer era. Below is a list of home computers released by Sinclair Research from the ZX80 to the Spectrum Plus 3, including their years of release and their RAM specifications:
Sinclair ZX80 - Released in 1980, the ZX80 came with 1KB of RAM, expandable to 16KB with an external module.
Sinclair ZX81 - Released in 1981, the ZX81 was initially available with 1KB of RAM, but an external 16KB RAM pack was offered as an optional add-on.
Sinclair ZX Spectrum - Released in 1982, the initial ZX Spectrum models came with either 16KB or 48KB of RAM.
Sinclair ZX Spectrum+ - Released in 1984, the ZX Spectrum+ came with 48KB of RAM. It was essentially a ZX Spectrum in a new case with a 'proper' keyboard, as opposed to the rubber keys of the original Spectrum.
Sinclair QL (Quantum Leap) - Also released in 1984, the QL was aimed more at the professional and business markets and came with 128KB of RAM.
Sinclair ZX Spectrum 128 - Released in 1985 in Spain and 1986 in the UK, this model had 128KB of RAM and included a number of enhancements over the original Spectrum, including a built-in sound chip.
Sinclair ZX Spectrum +2 - Released in 1986, after Sinclair Research's computer business was bought by Amstrad. The ZX Spectrum +2 came with a built-in cassette recorder and 128KB of RAM.
Sinclair ZX Spectrum +3 - Released in 1987, the ZX Spectrum +3 featured a built-in 3-inch floppy disk drive and also came with 128KB of RAM.
It's worth noting that the ZX Spectrum models, in particular, were incredibly popular and had a massive impact on the home computer scene in the UK, leading to a thriving ecosystem of software and games that continues to be appreciated by enthusiasts to this day.

Analysis:

This was a straightforward fact-finding question for Chat GPT that did not require complex analysis. It only had to arrange the models of computer in a logical order, and it has appropriately opted to list them in chronological order of their dates of release. However, it has placed the Sinclair QL (launched in January 1984) and the Spectrum+ (launched in October 1984) in the wrong order, an understandable error when it was asked just for the years of release, but still one that a human researcher would have avoided.
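The ordering slip can be reproduced in a few lines of Python. The sketch below is ours and is only illustrative: the `Release` record and variable names are hypothetical, the years are those given in Chat GPT’s answer, and launch months are filled in only for the two 1984 machines, since those are the only months stated in this article. Sorting on the year alone leaves the two 1984 models in whatever order they arrived, whereas sorting on year and month puts the QL first.

```python
from typing import NamedTuple, Optional


class Release(NamedTuple):
    name: str
    year: int
    month: Optional[int]  # None where the article gives only a year


# Models in the order Chat GPT listed them.
chat_gpt_order = [
    Release("ZX80", 1980, None),
    Release("ZX81", 1981, None),
    Release("ZX Spectrum", 1982, None),
    Release("ZX Spectrum+", 1984, 10),   # October 1984
    Release("QL", 1984, 1),              # January 1984
    Release("ZX Spectrum 128", 1985, None),
    Release("ZX Spectrum +2", 1986, None),
    Release("ZX Spectrum +3", 1987, None),
]

# Python's sort is stable, so ties on the year keep their input order.
by_year = sorted(chat_gpt_order, key=lambda r: r.year)

# Adding the month as a tie-breaker (unknown months sort first within a year).
by_year_and_month = sorted(chat_gpt_order, key=lambda r: (r.year, r.month or 0))

for release in by_year_and_month:
    print(release.year, release.name)
```

With year-only data the two 1984 machines are tied, which is why the error is understandable. It takes the finer-grained launch dates to break the tie correctly.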

Appropriately omitted from Chat GPT’s response, as required by the implicit temporal frame of reference of the question, are all models released after the Spectrum +3, including the +2A, +2B, +3B and PC200, which were released in 1988 and 1989. It might have been relevant for Chat GPT to state that the Spectrum +3 was not the last Spectrum computer released.

It was appropriate to mention that Sinclair Research’s computer line-up had been purchased by Amstrad by the time the +2 was launched in 1986, because the original question called for models released by Sinclair Research, and the change of corporate owner needed to be acknowledged.

All Chat GPT’s fact-finding regarding the memory specifications of the computers appears to be essentially correct.

What appears to be less consistent is the program’s choices to volunteer additional information outside the specification of the question about some but not all of the computers. Additional facts are volunteered about the Spectrum+, QL, Spectrum 128, Spectrum +2 and Spectrum +3, but not about the ZX80, ZX81 or Spectrum. Some of the additional facts relate to the inclusion of storage media, others to the design of the keyboard or sound processing system, and others to market positioning. This appears to be an arbitrary miscellany of facts not called for by the original question. It does add some context, but it is unevenly applied and ultimately unsatisfactory. It seems that Chat GPT often finds it hard to resist showing off its encyclopaedic knowledge in areas beyond what the question called for, but when it does so, it does not always do so in even or like measure across the different cases under study.

Still, this was a solid and generally accurate response to a relatively simple question, and shows that when it comes to reproducing facts about the release dates and technical specifications of modern electronic equipment on which adequate information sources are found online, Chat GPT has few difficulties.

Grade: B

Conclusion

In this second instalment of our wide-ranging test undertaken from May to June of 2023 of the capabilities of Chat GPT 4.0, we asked it four further questions designed to assess its comprehension, its identification and selection of facts relevant to answering each question, and its powers of analysis.

Its answers to three of the four questions in this instance showed up its weaknesses in areas of analysis such as extrapolation from scientific rules to scientific projections, comparison and contrast (as distinct from merely encyclopaedic description), and the establishment of historical timelines for specific industrial developments when faced with limited source material. It coped much better with a simple question on the timeline and specification of product releases by a late-20th-century technology company.

In spite of being claimed to exceed the performance of many students in answering university-level examination questions, Chat GPT 4.0 has shown that it excels chiefly at summoning facts from the Internet and recalling them in its own words. Its analytical powers and ability to keep its answers strictly relevant to the questions vary with the subject matter, but most often fall well below what would be expected of an intelligent student with access to high-quality library resources and the ability to comprehend and respond pertinently to relatively specific questions.

After this test was carried out, an update to Chat GPT 4.0 was released in November 2023, extending the cut-off point for its knowledge base to April 2023. This may result in its responses to some questions now differing from those at the time of our test. In future instalments of the test, we shall check what differences there are between its answers in the Spring of this year and its answers since the update, and draw attention to significant changes. Our tests so far show that while it gives somewhat different responses, they do not tend to be significantly improved in quality.

Our criticisms notwithstanding, the AI capabilities of Chat GPT 4.0 are a quantum leap ahead of previous publicly available AI chat routines. It is because it is so good that we have endeavoured to establish its limitations. Our precautionary advice is not to rely on Chat GPT 4.0 as a substitute for independent research and analysis, because so far, a well-trained human mind can do significantly better and give a more nuanced, detailed, accurate and relevant response to questions.

If you are interested in our services, or have any thoughts on this article, please send us a message.
