wiki article

Fetch a Wikipedia article and analyze it to produce ratings based on the reader's background.

Before reading an article, I want a snapshot of it: how interesting it is likely to be and how difficult it will be for me to read. We can get this recommendation from an AI model.

WikiArticle

Let's grab a Wikipedia article to test.



WikiArticle

 WikiArticle (url)

Grab a wikipedia article to analyze.



WikiArticle.intro

 WikiArticle.intro ()

Select an introduction from the WikiArticle.
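
The implementation of `WikiArticle.intro` is not shown on this page. As a rough illustration only, the sketch below selects a title and the paragraphs that come before the first section heading from an HTML string, using the standard library. The names `IntroParser` and `extract_intro` are hypothetical, and both the "stop at the first `<h2>`" rule and the `# ` title prefix are assumptions rather than the behavior of the real class.

```python
from html.parser import HTMLParser


class IntroParser(HTMLParser):
    """Collect the <h1> title and the <p> paragraphs before the first <h2>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._tag = None     # tag currently being captured ("h1" or "p")
        self._buf = []       # text pieces of the tag being captured
        self._done = False   # set once the first <h2> is reached

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "h2":
            self._done = True
        elif tag in ("h1", "p") and self._tag is None:
            self._tag, self._buf = tag, []

    def handle_data(self, data):
        if self._tag and not self._done:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._buf).strip()
        if tag == "h1":
            self.title = text
        elif text:  # skip empty paragraphs
            self.paragraphs.append(text)
        self._tag = None


def extract_intro(html: str) -> str:
    """Return the title and lead paragraphs as Markdown (hypothetical helper)."""
    parser = IntroParser()
    parser.feed(html)
    parts = [f"# {parser.title}"] if parser.title else []
    return "\n\n".join(parts + parser.paragraphs)
```

A real page would first have to be downloaded, and Wikipedia markup is more varied than this sketch handles.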

Given a Wikipedia URL, we can grab the article. Let's take a look at the "Evolution of snake venom" article.

article = WikiArticle("https://en.wikipedia.org/wiki/Evolution_of_snake_venom")
intro = article.intro
Markdown(intro)

Evolution of snake venom

Venom in snakes and some lizards is a form of saliva that has been modified into venom over its evolutionary history.[1] In snakes, venom has evolved to kill or subdue prey, as well as to perform other diet-related functions.[2] While snakes occasionally use their venom in self defense, this is not believed to have had a strong effect on venom evolution.[3] The evolution of venom is thought to be responsible for the enormous expansion of snakes across the globe.[4][5][6]

The evolutionary history of snake venom is a matter of debate. Historically, snake venom was believed to have evolved once, at the base of the Caenophidia, or derived snakes. Molecular studies published beginning in 2006 suggested that venom originated just once among a putative clade of reptiles, called Toxicofera, approximately 170 million years ago.[7] Under this hypothesis, the original toxicoferan venom was a very simple set of proteins that were assembled in a pair of glands. Subsequently, this set of proteins diversified in the various lineages of toxicoferans, including Serpentes, Anguimorpha, and Iguania: several snake lineages also lost the ability to produce venom.[8][9] The Toxicoferan hypothesis was challenged by studies in the mid-2010s, including a 2015 study which found that venom proteins had homologs in many other tissues in the Burmese python.[10][11] The study therefore suggested that venom had evolved independently in different reptile lineages, including once in the Caenophid snakes.[10] Venom containing most extant toxin families is believed to have been present in the last common ancestor of the Caenophidia: these toxins subsequently underwent tremendous diversification, accompanied by changes in the morphology of venom glands and delivery systems.[12]

Snake venom evolution is thought to be driven by an evolutionary arms race between venom proteins and prey physiology.[13] The common mechanism of evolution is thought to be gene duplication followed by natural selection for adaptive traits.[14] The adaptations produced by this process include venom more toxic to specific prey in several lineages,[15][16][17] proteins that pre-digest prey,[18] and a method to track down prey after a bite.[19] These various adaptations of venom have also led to considerable debate about the definition of venom and venomous snakes.[20] Changes in the diet of a lineage have been linked to atrophication of the venom.[8][9]

article.intro contains the title and the paragraphs that precede the table of contents. Since it is Markdown, we can define _repr_markdown_ so that the article displays nicely in a notebook.

@patch
def _repr_markdown_(self:(WikiArticle)):
    return self.intro
article

Evolution of snake venom

Venom in snakes and some lizards is a form of saliva that has been modified into venom over its evolutionary history.[1] In snakes, venom has evolved to kill or subdue prey, as well as to perform other diet-related functions.[2] While snakes occasionally use their venom in self defense, this is not believed to have had a strong effect on venom evolution.[3] The evolution of venom is thought to be responsible for the enormous expansion of snakes across the globe.[4][5][6]

The evolutionary history of snake venom is a matter of debate. Historically, snake venom was believed to have evolved once, at the base of the Caenophidia, or derived snakes. Molecular studies published beginning in 2006 suggested that venom originated just once among a putative clade of reptiles, called Toxicofera, approximately 170 million years ago.[7] Under this hypothesis, the original toxicoferan venom was a very simple set of proteins that were assembled in a pair of glands. Subsequently, this set of proteins diversified in the various lineages of toxicoferans, including Serpentes, Anguimorpha, and Iguania: several snake lineages also lost the ability to produce venom.[8][9] The Toxicoferan hypothesis was challenged by studies in the mid-2010s, including a 2015 study which found that venom proteins had homologs in many other tissues in the Burmese python.[10][11] The study therefore suggested that venom had evolved independently in different reptile lineages, including once in the Caenophid snakes.[10] Venom containing most extant toxin families is believed to have been present in the last common ancestor of the Caenophidia: these toxins subsequently underwent tremendous diversification, accompanied by changes in the morphology of venom glands and delivery systems.[12]

Snake venom evolution is thought to be driven by an evolutionary arms race between venom proteins and prey physiology.[13] The common mechanism of evolution is thought to be gene duplication followed by natural selection for adaptive traits.[14] The adaptations produced by this process include venom more toxic to specific prey in several lineages,[15][16][17] proteins that pre-digest prey,[18] and a method to track down prey after a bite.[19] These various adaptations of venom have also led to considerable debate about the definition of venom and venomous snakes.[20] Changes in the diet of a lineage have been linked to atrophication of the venom.[8][9]

Using Claudette

Using claudette, we can analyze the article. The analysis produces an interest_rating, a difficulty_rating, a list of prerequisites, and an explanation for each.

models
['claude-3-opus-20240229',
 'claude-3-5-sonnet-20241022',
 'claude-3-haiku-20240307',
 'claude-3-5-haiku-20241022']

Using the Haiku models is not recommended here, as they were not reliable for this analysis.

client = Client(models[1])

We will use tool use from claudette. More information is available in the Claude documentation and the claudette documentation. In short, we define a tool for Claude to call so that it returns its result in a structured form.

For us, we want ratings and reasons from the analysis.



ArticleAnalysis

 ArticleAnalysis (interest_rating:int, interest_reason:str,
                  difficulty_rating:int, difficulty_reason:str,
                  prerequisites:list[str], prereq_reason:str)

Analysis of a Wikipedia article for a reader based on the background.

| | Type | Details |
|---|---|---|
| interest_rating | int | Rating 1-10 of how interesting the article is for this reader based on the background |
| interest_reason | str | Markdown explanation for interest rating (max 50 words) for this reader based on the background |
| difficulty_rating | int | Rating 1-10 of how difficult the article is for this reader based on the background |
| difficulty_reason | str | Markdown explanation for difficulty rating (max 50 words) for this reader based on the background |
| prerequisites | list | List of topics reader should know before reading for this reader based on the background |
| prereq_reason | str | Markdown explanation for prerequisites (max 50 words) for this reader based on the background |
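
The fields above could be held in a class along the following lines. This is a minimal sketch that mirrors the documented signature. Using a dataclass is an assumption (the real class may be written differently), and the range check in `__post_init__` is an illustrative addition based on the "Rating 1-10" descriptions, not something the source confirms.

```python
from dataclasses import dataclass


@dataclass
class ArticleAnalysis:
    "Analysis of a Wikipedia article for a reader based on the background."
    interest_rating: int      # 1-10, how interesting for this reader
    interest_reason: str      # Markdown explanation, max 50 words
    difficulty_rating: int    # 1-10, how difficult for this reader
    difficulty_reason: str    # Markdown explanation, max 50 words
    prerequisites: list[str]  # topics to know before reading
    prereq_reason: str        # Markdown explanation, max 50 words

    def __post_init__(self):
        # Assumed validation: reject ratings outside the documented 1-10 range.
        for name in ("interest_rating", "difficulty_rating"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")
```

The generated dataclass `__repr__` gives output in the same `ArticleAnalysis(interest_rating=..., ...)` shape as the printed results further down.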


analyze_article_for_reader

 analyze_article_for_reader (article_text:str, background:str)

Analyze a Wikipedia article for a specific reader background
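
The body of `analyze_article_for_reader` is not shown here. The sketch below only illustrates one way the reader background and the article text could be combined into a single prompt. `build_prompt`, `analyze_with`, and the `call_model` parameter are hypothetical names; `call_model` stands in for whatever actually sends the prompt to Claude (in this article, claudette's tool use), and the prompt wording is an assumption.

```python
from typing import Any, Callable


def build_prompt(article_text: str, background: str) -> str:
    """Combine the reader background and the article into one prompt string."""
    return (
        "Analyze the following Wikipedia article for the reader described below.\n\n"
        f"<background>\n{background.strip()}\n</background>\n\n"
        f"<article>\n{article_text.strip()}\n</article>"
    )


def analyze_with(call_model: Callable[[str], Any],
                 article_text: str, background: str) -> Any:
    """Send the combined prompt through `call_model` and return its result.

    `call_model` is a placeholder for the real model call, so this function
    can be exercised without network access.
    """
    return call_model(build_prompt(article_text, background))
```

Passing the model call in as a parameter keeps the prompt construction easy to check in isolation, for example with a stub that simply returns the prompt it receives.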

backgrounds = {
    'high_school': """Background of the reader:
- High school graduate
- Interested in science but no formal training beyond high school
- Enjoys nature documentaries
- Has basic understanding of how evolution works from school and documentaries
""",
    'college_bio': """Background of the reader:
- A college student
- Familiar with biology, organic chemistry, statistics, immunology, genetics, molecular genetics, molecular biology, and linear algebra.
- Interested in science related to machine learning, statistics, immunology, organic chemistry, genetics, genomics, and bioinformatics.
""",
    'humanities': """Background of the reader:
- English Literature professor
- Interested in narrative and historical developments
- Reads Scientific American occasionally
- No formal science education beyond high school
- Hates science.
""",
    'tech_professional': """Background of the reader:
- Software engineer with computer science degree
- Familiar with complex systems and algorithms
- Reads tech blogs and popular science articles
- Basic understanding of scientific method
""",
    'medical_practitioner': """Background of the reader:
- Primary care physician
- Strong understanding of human anatomy and physiology
- Familiar with pharmacology and toxicology
- Loves national geographic
"""
}
for reader_type, background in backgrounds.items():
    analysis = analyze_article_for_reader(intro, background)
    print(f"\nAnalysis for {reader_type}:")
    print(analysis)

Analysis for high_school:
ArticleAnalysis(interest_rating=7, interest_reason="Connects well with reader's interest in nature documentaries and evolution. The concept of evolutionary arms race between snakes and prey is engaging for someone who enjoys natural science.", difficulty_rating=8, difficulty_reason='Contains complex terminology (Caenophidia, Toxicofera) and molecular concepts. Technical discussion of gene duplication and protein evolution may be challenging without college-level biology.', prerequisites=['Basic evolution concepts', 'Basic cell biology', 'Protein structure basics', 'Scientific method understanding', 'Basic genetics'], prereq_reason='Understanding proteins, genes, and evolutionary mechanisms is crucial to grasp how venom evolved. Basic cell biology helps comprehend how venom glands and proteins function.')

Analysis for college_bio:
ArticleAnalysis(interest_rating=8, interest_reason="Aligns with reader's interests in molecular biology, genetics, and evolution. The molecular aspects of venom evolution, gene duplication, and protein adaptation would appeal to someone interested in genomics and bioinformatics.", difficulty_rating=3, difficulty_reason="Content is accessible given reader's strong background in biology, genetics, and molecular biology. Terms like gene duplication, protein evolution, and molecular studies are familiar concepts.", prerequisites=['Basic evolution concepts', 'Protein structure and function', 'Gene expression', 'Molecular phylogenetics', 'Natural selection'], prereq_reason='Understanding protein evolution, gene duplication, and phylogenetic analysis requires these fundamentals. Reader already has most prerequisites through biology and genetics background.')

Analysis for humanities:
ArticleAnalysis(interest_rating=4, interest_reason='Despite the narrative of evolutionary arms race and historical debate, the heavy focus on molecular biology and technical details may put off a literature professor who dislikes science.', difficulty_rating=8, difficulty_reason='Article contains complex scientific terminology (Caenophidia, Toxicofera, homologs) and molecular concepts that would be challenging without formal science background.', prerequisites=['Basic evolutionary theory', 'Basic molecular biology concepts', 'Understanding of scientific terminology', 'Knowledge of reptile classification'], prereq_reason='These fundamentals are essential to grasp the core concepts of gene duplication, protein evolution, and taxonomic classifications discussed throughout the article.')

Analysis for tech_professional:
ArticleAnalysis(interest_rating=7, interest_reason='The evolutionary algorithm-like process and system complexity would appeal to a software engineer. The concept of gene duplication and adaptation parallels software development patterns.', difficulty_rating=6, difficulty_reason='While the reader can grasp system evolution concepts, the biological terminology and phylogenetic concepts may be challenging without prior knowledge of molecular biology and taxonomy.', prerequisites=['Basic molecular biology concepts', 'Understanding of evolutionary theory', 'Familiarity with phylogenetic classification', 'Knowledge of protein structure basics'], prereq_reason='These topics provide essential context for understanding molecular evolution, protein modification, and taxonomic relationships discussed in the article.')

Analysis for medical_practitioner:
ArticleAnalysis(interest_rating=8, interest_reason='As a physician with toxicology knowledge and National Geographic interest, the evolutionary aspects of venom development and its medical implications would be highly engaging.', difficulty_rating=4, difficulty_reason='Medical background provides strong foundation for understanding biological concepts. Some evolutionary biology terms may be unfamiliar but core concepts are accessible.', prerequisites=['Basic evolutionary biology concepts', 'Understanding of protein synthesis', 'Knowledge of phylogenetic classification', 'Familiarity with molecular biology terms'], prereq_reason='These topics help comprehend the evolutionary mechanisms of venom development, gene duplication, and species classification discussed in the article.')

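It is interesting to see that most readers are expected to enjoy the article on the evolution of snake venom (ratings of 7 or 8); only the humanities reader gets a low interest rating (4).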

Running in parallel

Analyzing articles one by one is quite slow. We can analyze multiple articles in parallel, but doing so is prone to rate_limit_error.



is_interactive

 is_interactive ()

Check if we’re running in an interactive environment (IPython/Jupyter)

is_interactive()
True
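
The source of `is_interactive` is not shown on this page. A common way to make this check, given here as a sketch rather than the actual implementation, relies on the fact that IPython and Jupyter define a `get_ipython` function, which does not exist in a plain Python script.

```python
def is_interactive() -> bool:
    """Return True under IPython/Jupyter, False in a plain Python process."""
    try:
        get_ipython()  # noqa: F821 - only defined by IPython/Jupyter
        return True
    except NameError:
        return False
```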

We use ThreadPoolExecutor when running interactively, and switch to ProcessPoolExecutor when running as a script.



analyze_multiple_articles

 analyze_multiple_articles (articles:List[str], backgrounds:Dict[str,str],
                            max_workers:int=None)

Analyze multiple articles for different reader backgrounds in parallel
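
The following is a minimal sketch of the parallel pattern, not the actual body of `analyze_multiple_articles`. It uses only `ThreadPoolExecutor` (the interactive case), takes the per-article function as a hypothetical `analyze` parameter so it can run without an API key, and stores `None` for any task that raises. The function name `analyze_many` and the `(article index, reader type)` result keys are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List, Optional, Tuple


def analyze_many(articles: List[str],
                 backgrounds: Dict[str, str],
                 analyze: Callable[[str, str], Any],
                 max_workers: Optional[int] = None) -> Dict[Tuple[int, str], Any]:
    """Run `analyze(article, background)` for every pair in a thread pool.

    Results are keyed by (article index, reader type). A failed task
    (for example, a rate-limit error) is recorded as None.
    """
    results: Dict[Tuple[int, str], Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the pair it was submitted for.
        futures = {
            pool.submit(analyze, text, background): (i, reader_type)
            for i, text in enumerate(articles)
            for reader_type, background in backgrounds.items()
        }
        for future in as_completed(futures):
            key = futures[future]
            try:
                results[key] = future.result()
            except Exception:
                results[key] = None
    return results
```

Lowering `max_workers` reduces the number of simultaneous requests, which may help when rate limits are the main source of failures.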

boring_articles = {
    'bureaucracy': WikiArticle("https://en.wikipedia.org/wiki/ISO_216").introduction,  # Paper size standards
    'statistics': WikiArticle("https://en.wikipedia.org/wiki/Analysis_of_variance").introduction,  # Dense statistical methods
    'obscure': WikiArticle("https://en.wikipedia.org/wiki/List_of_writing_systems").introduction,  # Dry list of writing systems
    'methodology': WikiArticle("https://en.wikipedia.org/wiki/ISO_8601").introduction,  # Date/time formatting standards
}
boring_articles
{'bureaucracy': 'ISO 216 is an international standard for paper sizes, used around the world except in North America and parts of Latin America. The standard defines the "A", "B" and "C" series of paper sizes, which includes the A4, the most commonly available paper size worldwide. Two supplementary standards, ISO 217 and ISO 269, define related paper sizes; the ISO 269 "C" series is commonly listed alongside the A and B sizes.\n\nAll ISO 216, ISO 217 and ISO 269 paper sizes (except some envelopes) have the same aspect ratio, √2:1, within rounding to millimetres. This ratio has the unique property that when cut or folded in half widthways, the halves also have the same aspect ratio. Each ISO paper size is one half of the area of the next larger size in the same series.[1]',
 'statistics': 'Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences between groups. It uses F-test by comparing variance between groups and taking noise, or assumed normal distribution of group, into consideration by dividing by variance between elements in a group. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, the ANOVA is used to test the difference between two or more means.',
 'obscure': 'Writing systems are used to record human language, and may be classified according to certain common features.\n\nThe usual name of the script is given first; the name of the languages in which the script is written follows (in brackets), particularly in the case where the language name differs from the script name. Other informative or qualifying annotations for the script may also be provided.',
 'methodology': '2025-01-14T03:36:20+00:00 UTC+00:00 [refresh]\n\nISO 8601 is an international standard covering the worldwide exchange and communication of date and time-related data. It is maintained by the International Organization for Standardization (ISO) and was first published in 1988, with updates in 1991, 2000, 2004, and 2019, and an amendment in 2022.[1] The standard provides a well-defined, unambiguous method of representing calendar dates and times in worldwide communications, especially to avoid misinterpreting numeric dates and times when such data is transferred between countries with different conventions for writing numeric dates and times.\n\nISO\xa08601 applies to these representations and formats: dates, in the Gregorian calendar (including the proleptic Gregorian calendar); times, based on the 24-hour timekeeping system, with optional UTC offset; time intervals; and combinations thereof.[2] The standard does not assign specific meaning to any element of the dates/times represented: the meaning of any element depends on the context of its use. Dates and times represented cannot use words that do not have a specified numerical meaning within the standard (thus excluding names of years in the Chinese calendar), or that do not use computer characters (excludes images or sounds).[2]\n\nIn representations that adhere to the ISO\xa08601 interchange standard, dates and times are arranged such that the greatest temporal term (typically a year) is placed at the left and each successively lesser term is placed to the right of the previous term. Representations must be written in a combination of Arabic numerals and the specific computer characters (such as "‐", ":", "T", "W", "Z") that are assigned specific meanings within the standard; that is, such commonplace descriptors of dates (or parts of dates) as "January", "Thursday", or "New Year\'s Day" are not allowed in interchange representations within the standard.'}

A result of None means that the analysis raised an error, most likely because of the rate limit.

It's good to see that the interest and difficulty ratings differ according to each reader's background.