# wiki article


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

Before reading an article, I want a quick snapshot of it: how
interesting it is likely to be and how difficult it will be for me to
read. We can ask an AI model for this recommendation.

## WikiArticle

Let’s grab a Wikipedia article to test.

------------------------------------------------------------------------

<a
href="https://github.com/galopyz/snackademics/blob/main/snackademics/wiki.py#L16"
target="_blank" style="float:right; font-size:smaller">source</a>

### WikiArticle

>  WikiArticle (url)

*Grab a wikipedia article to analyze.*

------------------------------------------------------------------------

<a
href="https://github.com/galopyz/snackademics/blob/main/snackademics/wiki.py#L34"
target="_blank" style="float:right; font-size:smaller">source</a>

### WikiArticle.intro

>  WikiArticle.intro ()

*Select an introduction from the
[`WikiArticle`](https://galopyz.github.io/snackademics/wiki.html#wikiarticle).*

With a Wikipedia URL, we can grab the article. Let’s take a look at
the Evolution of snake venom article.

``` python
article = WikiArticle("https://en.wikipedia.org/wiki/Evolution_of_snake_venom")
intro = article.intro
Markdown(intro)
```

# Evolution of snake venom

Venom in snakes and some lizards is a form of saliva that has been
modified into venom over its evolutionary history.\[1\] In snakes, venom
has evolved to kill or subdue prey, as well as to perform other
diet-related functions.\[2\] While snakes occasionally use their venom
in self defense, this is not believed to have had a strong effect on
venom evolution.\[3\] The evolution of venom is thought to be
responsible for the enormous expansion of snakes across the
globe.\[4\]\[5\]\[6\]

The evolutionary history of snake venom is a matter of debate.
Historically, snake venom was believed to have evolved once, at the base
of the Caenophidia, or derived snakes. Molecular studies published
beginning in 2006 suggested that venom originated just once among a
putative clade of reptiles, called Toxicofera, approximately 170 million
years ago.\[7\] Under this hypothesis, the original toxicoferan venom
was a very simple set of proteins that were assembled in a pair of
glands. Subsequently, this set of proteins diversified in the various
lineages of toxicoferans, including Serpentes, Anguimorpha, and Iguania:
several snake lineages also lost the ability to produce venom.\[8\]\[9\]
The Toxicoferan hypothesis was challenged by studies in the mid-2010s,
including a 2015 study which found that venom proteins had homologs in
many other tissues in the Burmese python.\[10\]\[11\] The study
therefore suggested that venom had evolved independently in different
reptile lineages, including once in the Caenophid snakes.\[10\] Venom
containing most extant toxin families is believed to have been present
in the last common ancestor of the Caenophidia: these toxins
subsequently underwent tremendous diversification, accompanied by
changes in the morphology of venom glands and delivery systems.\[12\]

Snake venom evolution is thought to be driven by an evolutionary arms
race between venom proteins and prey physiology.\[13\] The common
mechanism of evolution is thought to be gene duplication followed by
natural selection for adaptive traits.\[14\] The adaptations produced by
this process include venom more toxic to specific prey in several
lineages,\[15\]\[16\]\[17\] proteins that pre-digest prey,\[18\] and a
method to track down prey after a bite.\[19\] These various adaptations
of venom have also led to considerable debate about the definition of
venom and venomous snakes.\[20\] Changes in the diet of a lineage have
been linked to atrophication of the venom.\[8\]\[9\]

`article.intro` contains the title and the paragraphs that come before
the table of contents. Since it is Markdown, we can define
`_repr_markdown_` so that the article displays nicely in a notebook.

``` python
@patch
def _repr_markdown_(self:WikiArticle):
    "Render the article as its Markdown introduction in notebooks."
    return self.intro
```

``` python
article
```

# Evolution of snake venom

Venom in snakes and some lizards is a form of saliva that has been
modified into venom over its evolutionary history.\[1\] In snakes, venom
has evolved to kill or subdue prey, as well as to perform other
diet-related functions.\[2\] While snakes occasionally use their venom
in self defense, this is not believed to have had a strong effect on
venom evolution.\[3\] The evolution of venom is thought to be
responsible for the enormous expansion of snakes across the
globe.\[4\]\[5\]\[6\]

The evolutionary history of snake venom is a matter of debate.
Historically, snake venom was believed to have evolved once, at the base
of the Caenophidia, or derived snakes. Molecular studies published
beginning in 2006 suggested that venom originated just once among a
putative clade of reptiles, called Toxicofera, approximately 170 million
years ago.\[7\] Under this hypothesis, the original toxicoferan venom
was a very simple set of proteins that were assembled in a pair of
glands. Subsequently, this set of proteins diversified in the various
lineages of toxicoferans, including Serpentes, Anguimorpha, and Iguania:
several snake lineages also lost the ability to produce venom.\[8\]\[9\]
The Toxicoferan hypothesis was challenged by studies in the mid-2010s,
including a 2015 study which found that venom proteins had homologs in
many other tissues in the Burmese python.\[10\]\[11\] The study
therefore suggested that venom had evolved independently in different
reptile lineages, including once in the Caenophid snakes.\[10\] Venom
containing most extant toxin families is believed to have been present
in the last common ancestor of the Caenophidia: these toxins
subsequently underwent tremendous diversification, accompanied by
changes in the morphology of venom glands and delivery systems.\[12\]

Snake venom evolution is thought to be driven by an evolutionary arms
race between venom proteins and prey physiology.\[13\] The common
mechanism of evolution is thought to be gene duplication followed by
natural selection for adaptive traits.\[14\] The adaptations produced by
this process include venom more toxic to specific prey in several
lineages,\[15\]\[16\]\[17\] proteins that pre-digest prey,\[18\] and a
method to track down prey after a bite.\[19\] These various adaptations
of venom have also led to considerable debate about the definition of
venom and venomous snakes.\[20\] Changes in the diet of a lineage have
been linked to atrophication of the venom.\[8\]\[9\]
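
To make "the paragraphs before the table of contents" concrete, here is a minimal sketch that works on Markdown text rather than on the live page. It assumes the article has already been converted to Markdown and that the first level-2 heading marks where the introduction ends. `intro_from_markdown` is a hypothetical helper for illustration, not the method in `wiki.py`.

``` python
def intro_from_markdown(md: str) -> str:
    """Return the title and paragraphs that precede the first '## ' heading."""
    kept = []
    for line in md.splitlines():
        # Assumption: the first level-2 heading marks the end of the introduction.
        if line.startswith("## "):
            break
        kept.append(line)
    return "\n".join(kept).strip()


sample = """# Evolution of snake venom

Venom in snakes and some lizards is a form of saliva.

## Evolutionary history

Details follow here.
"""

print(intro_from_markdown(sample))
```

The real page structure may differ, so this only illustrates the idea of cutting the text at the first section boundary.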

## Using Claudette

Using `claudette`, we can analyze the article. The analysis produces an
`interest_rating`, a `difficulty_rating`, a list of `prerequisites`, and
a short explanation for each.

``` python
models
```

    ['claude-3-opus-20240229',
     'claude-3-5-sonnet-20241022',
     'claude-3-haiku-20240307',
     'claude-3-5-haiku-20241022']

Using a Haiku model is not recommended here, as its results are not reliable.

``` python
client = Client(models[1])
```

We will use tool use from `claudette`. More information is available in
the [Claude
documentation](https://docs.anthropic.com/en/docs/build-with-claude/tool-use) and
the [claudette documentation](https://claudette.answer.ai/core.html#tool-use).
Basically, we create a tool for Claude to call so that it returns the
result in a structured format.

For us, we want ratings and reasons from the analysis.

------------------------------------------------------------------------

<a
href="https://github.com/galopyz/snackademics/blob/main/snackademics/wiki.py#L48"
target="_blank" style="float:right; font-size:smaller">source</a>

### ArticleAnalysis

>  ArticleAnalysis (interest_rating:int, interest_reason:str,
>                       difficulty_rating:int, difficulty_reason:str,
>                       prerequisites:list[str], prereq_reason:str)

*Analysis of a Wikipedia article for a reader based on the background.*

<table>
<colgroup>
<col style="width: 9%" />
<col style="width: 38%" />
<col style="width: 52%" />
</colgroup>
<thead>
<tr>
<th></th>
<th><strong>Type</strong></th>
<th><strong>Details</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>interest_rating</td>
<td>int</td>
<td>Rating 1-10 of how interesting the article is for this reader based
on the background</td>
</tr>
<tr>
<td>interest_reason</td>
<td>str</td>
<td>Markdown explanation for interest rating (max 50 words) for this
reader based on the background</td>
</tr>
<tr>
<td>difficulty_rating</td>
<td>int</td>
<td>Rating 1-10 of how difficult the article is for this reader based on
the background</td>
</tr>
<tr>
<td>difficulty_reason</td>
<td>str</td>
<td>Markdown explanation for difficulty rating (max 50 words) for this
reader based on the background</td>
</tr>
<tr>
<td>prerequisites</td>
<td>list</td>
<td>List of topics reader should know before reading for this reader
based on the background</td>
</tr>
<tr>
<td>prereq_reason</td>
<td>str</td>
<td>Markdown explanation for prerequisites (max 50 words) for this
reader based on the background</td>
</tr>
</tbody>
</table>

------------------------------------------------------------------------

<a
href="https://github.com/galopyz/snackademics/blob/main/snackademics/wiki.py#L65"
target="_blank" style="float:right; font-size:smaller">source</a>

### analyze_article_for_reader

>  analyze_article_for_reader (article_text:str, background:str)

*Analyze a Wikipedia article for a specific reader background*
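
The linked source has the real implementation. As a rough sketch of the pattern, the example below passes the model call in as a plain callable so that it runs offline. `AnalysisSketch`, `build_prompt`, `analyze_sketch`, and `fake_model` are hypothetical names, and the prompt wording is invented for illustration. With `claudette`, a tool-use call on the client would take the place of `call_model`.

``` python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AnalysisSketch:
    """Hypothetical stand-in mirroring the fields of ArticleAnalysis."""
    interest_rating: int
    interest_reason: str
    difficulty_rating: int
    difficulty_reason: str
    prerequisites: List[str]
    prereq_reason: str

    def __post_init__(self):
        # The documented range for both ratings is 1-10.
        for name in ("interest_rating", "difficulty_rating"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")


def build_prompt(article_text: str, background: str) -> str:
    """Combine reader background and article text (illustrative wording)."""
    return f"{background}\n\nArticle:\n{article_text}\n\nAnalyze this article for the reader."


def analyze_sketch(article_text: str, background: str,
                   call_model: Callable[[str], dict]) -> AnalysisSketch:
    """Send the prompt through call_model and validate the returned fields."""
    fields = call_model(build_prompt(article_text, background))
    return AnalysisSketch(**fields)


def fake_model(prompt: str) -> dict:
    # Stand-in for a real tool-use call, so the sketch runs without an API key.
    return dict(interest_rating=7, interest_reason="Likes nature documentaries.",
                difficulty_rating=8, difficulty_reason="Dense terminology.",
                prerequisites=["Basic genetics"],
                prereq_reason="Needed for gene duplication.")


result = analyze_sketch("Venom in snakes ...",
                        "Background of the reader:\n- High school graduate",
                        fake_model)
print(result.interest_rating, result.difficulty_rating)
```

Validating the ratings on construction is a design choice of this sketch, so that an out-of-range value from the model fails early instead of reaching the reader.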

``` python
backgrounds = {
    'high_school': """Background of the reader:
- High school graduate
- Interested in science but no formal training beyond high school
- Enjoys nature documentaries
- Has basic understanding of how evolution works from school and documentaries
""",
    'college_bio': """Background of the reader:
- A college student
- Familiar with biology, organic chemistry, statistics, immunology, genetics, molecular genetics, molecular biology, and linear algebra.
- Interested in science related to machine learning, statistics, immunology, organic chemistry, genetics, genomics, and bioinformatics.
""",
    'humanities': """Background of the reader:
- English Literature professor
- Interested in narrative and historical developments
- Reads Scientific American occasionally
- No formal science education beyond high school
- Hates science.
""",
    'tech_professional': """Background of the reader:
- Software engineer with computer science degree
- Familiar with complex systems and algorithms
- Reads tech blogs and popular science articles
- Basic understanding of scientific method
""",
    'medical_practitioner': """Background of the reader:
- Primary care physician
- Strong understanding of human anatomy and physiology
- Familiar with pharmacology and toxicology
- Loves national geographic
"""
}
```

``` python
for reader_type, background in backgrounds.items():
    analysis = analyze_article_for_reader(intro, background)
    print(f"\nAnalysis for {reader_type}:")
    print(analysis)
```


    Analysis for high_school:
    ArticleAnalysis(interest_rating=7, interest_reason="Connects well with reader's interest in nature documentaries and evolution. The concept of evolutionary arms race between snakes and prey is engaging for someone who enjoys natural science.", difficulty_rating=8, difficulty_reason='Contains complex terminology (Caenophidia, Toxicofera) and molecular concepts. Technical discussion of gene duplication and protein evolution may be challenging without college-level biology.', prerequisites=['Basic evolution concepts', 'Basic cell biology', 'Protein structure basics', 'Scientific method understanding', 'Basic genetics'], prereq_reason='Understanding proteins, genes, and evolutionary mechanisms is crucial to grasp how venom evolved. Basic cell biology helps comprehend how venom glands and proteins function.')

    Analysis for college_bio:
    ArticleAnalysis(interest_rating=8, interest_reason="Aligns with reader's interests in molecular biology, genetics, and evolution. The molecular aspects of venom evolution, gene duplication, and protein adaptation would appeal to someone interested in genomics and bioinformatics.", difficulty_rating=3, difficulty_reason="Content is accessible given reader's strong background in biology, genetics, and molecular biology. Terms like gene duplication, protein evolution, and molecular studies are familiar concepts.", prerequisites=['Basic evolution concepts', 'Protein structure and function', 'Gene expression', 'Molecular phylogenetics', 'Natural selection'], prereq_reason='Understanding protein evolution, gene duplication, and phylogenetic analysis requires these fundamentals. Reader already has most prerequisites through biology and genetics background.')

    Analysis for humanities:
    ArticleAnalysis(interest_rating=4, interest_reason='Despite the narrative of evolutionary arms race and historical debate, the heavy focus on molecular biology and technical details may put off a literature professor who dislikes science.', difficulty_rating=8, difficulty_reason='Article contains complex scientific terminology (Caenophidia, Toxicofera, homologs) and molecular concepts that would be challenging without formal science background.', prerequisites=['Basic evolutionary theory', 'Basic molecular biology concepts', 'Understanding of scientific terminology', 'Knowledge of reptile classification'], prereq_reason='These fundamentals are essential to grasp the core concepts of gene duplication, protein evolution, and taxonomic classifications discussed throughout the article.')

    Analysis for tech_professional:
    ArticleAnalysis(interest_rating=7, interest_reason='The evolutionary algorithm-like process and system complexity would appeal to a software engineer. The concept of gene duplication and adaptation parallels software development patterns.', difficulty_rating=6, difficulty_reason='While the reader can grasp system evolution concepts, the biological terminology and phylogenetic concepts may be challenging without prior knowledge of molecular biology and taxonomy.', prerequisites=['Basic molecular biology concepts', 'Understanding of evolutionary theory', 'Familiarity with phylogenetic classification', 'Knowledge of protein structure basics'], prereq_reason='These topics provide essential context for understanding molecular evolution, protein modification, and taxonomic relationships discussed in the article.')

    Analysis for medical_practitioner:
    ArticleAnalysis(interest_rating=8, interest_reason='As a physician with toxicology knowledge and National Geographic interest, the evolutionary aspects of venom development and its medical implications would be highly engaging.', difficulty_rating=4, difficulty_reason='Medical background provides strong foundation for understanding biological concepts. Some evolutionary biology terms may be unfamiliar but core concepts are accessible.', prerequisites=['Basic evolutionary biology concepts', 'Understanding of protein synthesis', 'Knowledge of phylogenetic classification', 'Familiarity with molecular biology terms'], prereq_reason='These topics help comprehend the evolutionary mechanisms of venom development, gene duplication, and species classification discussed in the article.')

Interestingly, most readers rate the article 7 or 8 for interest; only
the humanities professor, who dislikes science, rates it low (4).

## Running in parallel

Analyzing articles one by one is quite slow. It is possible to analyze
multiple articles in parallel, but this is prone to `rate_limit_error`.

------------------------------------------------------------------------

<a
href="https://github.com/galopyz/snackademics/blob/main/snackademics/wiki.py#L100"
target="_blank" style="float:right; font-size:smaller">source</a>

### is_interactive

>  is_interactive ()

*Check if we’re running in an interactive environment (IPython/Jupyter)*

``` python
is_interactive()
```

    True

We use `ThreadPoolExecutor` if we are in interactive mode, but we switch
to `ProcessPoolExecutor` when we are running as a script.
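
A minimal sketch of this choice, assuming that interactivity is detected by checking for IPython's `get_ipython` and that a failed job is reported as `None`. `is_interactive_sketch`, `pick_executor`, `run_all`, and `fake_analysis` are hypothetical names, not the functions in `wiki.py`.

``` python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def is_interactive_sketch() -> bool:
    """Guess whether we are inside IPython/Jupyter (one common heuristic)."""
    try:
        get_ipython  # noqa: F821 - only defined when running under IPython
        return True
    except NameError:
        return False


def pick_executor(interactive: bool):
    """Threads in notebooks, processes in scripts."""
    return ThreadPoolExecutor if interactive else ProcessPoolExecutor


def run_all(fn, jobs, executor_cls=ThreadPoolExecutor, max_workers=None):
    """Run fn over jobs in parallel; a job that raises yields None."""
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, job) for job in jobs]
        results = []
        for future in futures:  # submission order, so results line up with jobs
            try:
                results.append(future.result())
            except Exception:
                # For example, a rate-limit error from the API.
                results.append(None)
        return results


def fake_analysis(n: int) -> int:
    # Stand-in for an API call; fails on one input to show the None case.
    if n == 2:
        raise RuntimeError("rate_limit_error")
    return n * 10


print(run_all(fake_analysis, [1, 2, 3]))  # [10, None, 30]
```

Functions defined in a notebook cell can fail to pickle for worker processes, which is one common reason to prefer threads in interactive sessions.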

------------------------------------------------------------------------

<a
href="https://github.com/galopyz/snackademics/blob/main/snackademics/wiki.py#L105"
target="_blank" style="float:right; font-size:smaller">source</a>

### analyze_multiple_articles

>  analyze_multiple_articles (articles:List[str], backgrounds:Dict[str,str],
>                                 max_workers:int=None)

*Analyze multiple articles for different reader backgrounds in parallel*

``` python
boring_articles = {
    'bureaucracy': WikiArticle("https://en.wikipedia.org/wiki/ISO_216").introduction,  # Paper size standards
    'statistics': WikiArticle("https://en.wikipedia.org/wiki/Analysis_of_variance").introduction,  # Dense statistical methods
    'obscure': WikiArticle("https://en.wikipedia.org/wiki/List_of_writing_systems").introduction,  # Dry list of writing systems
    'methodology': WikiArticle("https://en.wikipedia.org/wiki/ISO_8601").introduction,  # Date/time formatting standards
}
```

``` python
boring_articles
```

    {'bureaucracy': 'ISO 216 is an international standard for paper sizes, used around the world except in North America and parts of Latin America. The standard defines the "A", "B" and "C" series of paper sizes, which includes the A4, the most commonly available paper size worldwide. Two supplementary standards, ISO 217 and ISO 269, define related paper sizes; the ISO 269 "C" series is commonly listed alongside the A and B sizes.\n\nAll ISO 216, ISO 217 and ISO 269 paper sizes (except some envelopes) have the same aspect ratio, √2:1, within rounding to millimetres. This ratio has the unique property that when cut or folded in half widthways, the halves also have the same aspect ratio. Each ISO paper size is one half of the area of the next larger size in the same series.[1]',
     'statistics': 'Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences between groups. It uses F-test by comparing variance between groups and taking noise, or assumed normal distribution of group, into consideration by dividing by variance between elements in a group. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, the ANOVA is used to test the difference between two or more means.',
     'obscure': 'Writing systems are used to record human language, and may be classified according to certain common features.\n\nThe usual name of the script is given first; the name of the languages in which the script is written follows (in brackets), particularly in the case where the language name differs from the script name. Other informative or qualifying annotations for the script may also be provided.',
     'methodology': '2025-01-14T03:36:20+00:00 UTC+00:00 [refresh]\n\nISO 8601 is an international standard covering the worldwide exchange and communication of date and time-related data. It is maintained by the International Organization for Standardization (ISO) and was first published in 1988, with updates in 1991, 2000, 2004, and 2019, and an amendment in 2022.[1] The standard provides a well-defined, unambiguous method of representing calendar dates and times in worldwide communications, especially to avoid misinterpreting numeric dates and times when such data is transferred between countries with different conventions for writing numeric dates and times.\n\nISO\xa08601 applies to these representations and formats: dates, in the Gregorian calendar (including the proleptic Gregorian calendar); times, based on the 24-hour timekeeping system, with optional UTC offset; time intervals; and combinations thereof.[2] The standard does not assign specific meaning to any element of the dates/times represented: the meaning of any element depends on the context of its use. Dates and times represented cannot use words that do not have a specified numerical meaning within the standard (thus excluding names of years in the Chinese calendar), or that do not use computer characters (excludes images or sounds).[2]\n\nIn representations that adhere to the ISO\xa08601 interchange standard, dates and times are arranged such that the greatest temporal term (typically a year) is placed at the left and each successively lesser term is placed to the right of the previous term. Representations must be written in a combination of Arabic numerals and the specific computer characters (such as "‐", ":", "T", "W", "Z") that are assigned specific meanings within the standard; that is, such commonplace descriptors of dates (or parts of dates) as "January", "Thursday", or "New Year\'s Day" are not allowed in interchange representations within the standard.'}

In the results of `analyze_multiple_articles`, a `None` entry means that
the analysis raised an error, most likely because of a rate limit.

It’s good to see that readers get different interest and difficulty
ratings depending on their background.
