File:Heaps' Law on "War and Peace".svg

Original file (SVG file, nominally 662 × 491 pixels, file size: 156 KB)

This is a file from the Wikimedia Commons. Information from its description page there is shown below.

Summary

Description	English: Verification of Heaps' law on War and Peace.

```python
import random
import urllib.request

import matplotlib.pyplot as plt
import nltk
import numpy as np

# Download the corpus
url = "http://www.gutenberg.org/files/2600/2600-0.txt"
response = urllib.request.urlopen(url)
long_txt = response.read().decode('utf8')

# Tokenize the text (raw string so that \w is passed to the regex engine as-is)
tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(long_txt.lower())
tokens = tokens[940:]  # drop the first 940 tokens

# Prepare arrays to hold the counts of total words and unique words
total_words = np.arange(1, len(tokens) + 1)
unique_words = np.zeros(len(tokens))

# Count unique words while progressing through the text
word_set = set()
for i, token in enumerate(tokens):
    word_set.add(token)
    unique_words[i] = len(word_set)

# Fit Heaps' law: unique_words = K * total_words ^ beta
log_total_words = np.log(total_words)
log_unique_words = np.log(unique_words)
beta, logK = np.polyfit(log_total_words, log_unique_words, 1)
K = np.exp(logK)

# Print the estimated parameters
print('K:', K)
print('beta:', beta)

# Plot total words vs. unique words
plt.figure(figsize=(8, 6))
plt.plot(total_words, unique_words, label='Empirical Data')
plt.plot(total_words, K * total_words ** beta, '--',
         label=f"Heaps' Law Fit: K={K:.2f}, beta={beta:.2f}")

# Repeat the analysis on the same tokens in random order
tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(long_txt.lower())
tokens = tokens[940:]
random.shuffle(tokens)

# Prepare arrays to hold the counts of total words and unique words
total_words = np.arange(1, len(tokens) + 1)
unique_words = np.zeros(len(tokens))

# Count unique words while progressing through the shuffled text
word_set = set()
for i, token in enumerate(tokens):
    word_set.add(token)
    unique_words[i] = len(word_set)

# Fit Heaps' law: unique_words = K * total_words ^ beta
log_total_words = np.log(total_words)
log_unique_words = np.log(unique_words)
beta, logK = np.polyfit(log_total_words, log_unique_words, 1)
K = np.exp(logK)

# Print the estimated parameters
print('K:', K)
print('beta:', beta)

# Plot total words vs. unique words for the shuffled text
plt.plot(total_words, unique_words, label='Shuffled Empirical Data')
plt.plot(total_words, K * total_words ** beta, '--',
         label=f"Heaps' Law Fit for shuffled data: K={K:.2f}, beta={beta:.2f}")

plt.xlabel('Total Words')
plt.ylabel('Unique Words')
plt.legend()
plt.grid(True)
plt.title('Verification of Heaps\' Law on "War and Peace"')
plt.savefig("war and peace.svg", bbox_inches='tight', format='svg')
```
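The fitting step can be tried without a network connection or NLTK. The following is a minimal sketch, not part of the uploaded script: the helper names `vocabulary_growth` and `fit_heaps` and the sample sentence are illustrative assumptions, and `re.findall` stands in for the NLTK tokenizer. It uses the same idea as the script: count distinct tokens seen so far, then fit a straight line to log(unique words) against log(total words), whose slope is beta and whose intercept is log K.

```python
import re

import numpy as np


def vocabulary_growth(tokens):
    """Return an array whose i-th entry is the number of distinct tokens in tokens[:i + 1]."""
    seen = set()
    growth = np.empty(len(tokens), dtype=float)
    for i, token in enumerate(tokens):
        seen.add(token)
        growth[i] = len(seen)
    return growth


def fit_heaps(growth):
    """Least-squares fit of log V = log K + beta * log n; returns (K, beta)."""
    n = np.arange(1, len(growth) + 1)
    beta, log_k = np.polyfit(np.log(n), np.log(growth), 1)
    return float(np.exp(log_k)), float(beta)


# Hypothetical toy input; far too short for a meaningful estimate,
# it only demonstrates the mechanics.
sample = ("the quick brown fox jumps over the lazy dog "
          "and the dog barks back at the fox")
tokens = re.findall(r"\w+", sample.lower())

growth = vocabulary_growth(tokens)
K, beta = fit_heaps(growth)

print("vocabulary growth:", growth.astype(int).tolist())
print(f"K={K:.2f}, beta={beta:.2f}")
```

On a text this short the fitted values mean little; the sketch only shows that repeated words flatten the growth curve, which is what pushes beta below 1.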
Date	18 July 2023
Source	Own work
Author	Cosmia Nebula

Licensing

I, the copyright holder of this work, hereby publish it under the following license:

This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.

You are free:

to share – to copy, distribute and transmit the work
to remix – to adapt the work

Under the following conditions:

attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

File history

	Date/Time	Thumbnail	Dimensions	User	Comment
current	22:54, 18 July 2023		662 × 491 (156 KB)	Cosmia Nebula	Uploaded while editing "Heaps' law" on en.wikipedia.org

File usage

The following pages on the English Wikipedia use this file (pages on other projects are not listed):

Heaps' law

Metadata

Width	529.210744pt
Height	392.514375pt


