Bonampak!
The Task:
We aim to implement a program that reads words from a large archive of text and finds the articles and terms of interest within that archive. The number of files in the archive can be changed to show that the program works dynamically regardless of the number, names or types of the files. The program will find the interesting words and phrases among those that the user enters, based on the measure of interest described below.
What is an interesting word?
The importance of a term is based upon how unique it is in the corpus of text. Specifically, the degree of interest for a given term is a function of three counts: the number of times the term occurs in the archive, the number of articles in the archive, and the number of articles that contain one or more occurrences of the term.
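The text names the three inputs to the interest function but does not give the formula itself. The following is a minimal sketch in Python that assumes a TF-IDF-style formula (occurrences multiplied by the logarithm of articles over containing articles); the function name `interest_scores`, the whitespace tokenizer and the sample articles are all illustrative assumptions, not Bonampak's actual implementation.

```python
import math
from collections import Counter

def interest_scores(articles):
    """Score every term in a list of article strings.

    Uses the three counts named in the text. The formula is an
    ASSUMPTION (TF-IDF style), not the project's documented one.
    """
    n_articles = len(articles)   # number of articles in the archive
    occurrences = Counter()      # times each term occurs in the archive
    containing = Counter()       # articles containing the term at least once
    for text in articles:
        words = text.lower().split()   # naive whitespace tokenizer
        occurrences.update(words)
        containing.update(set(words))  # count each article once per term
    return {
        term: occurrences[term] * math.log(n_articles / containing[term])
        for term in occurrences
    }

# Made-up three-article archive, for illustration only.
articles = [
    "the corpus is large",
    "the parallel miner reads the corpus",
    "the blade server",
]
scores = interest_scores(articles)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:10s} {score:.3f}")
```

Under this assumed formula, a term that appears in every article (here "the") scores zero, while a term confined to a single article scores highest, which matches the idea that interest follows uniqueness.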
Why we are doing this:
This program will determine common words in any language in addition to finding unique words and terms. The information may be important to linguists observing the phrases used in writing from different geographic areas. In this context, Bonampak could be useful for textual and statistical analysis. While this applies most directly to searches and databases, the same measure could serve many other applications. This program could also be used extensively for natural language processing. One example of this is word sense disambiguation. Linguists may also be interested in determining the uniqueness of terms in order to infer the overall importance of specific terms in a text, for example through entropy calculations or data inference. If one is looking in the ACM portal for an article pertaining to a specific topic, such as:
'word mining corpus gigaword processing parallel SMP BLADE',
Bonampak could return a listing of terms in order of importance. This could well be more useful than the ACM's current portal search.
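To illustrate what such a listing might look like, here is a hedged Python sketch that ranks the terms of the sample query against a tiny made-up archive. The function `rank_query_terms`, the toy archive and the TF-IDF-style scoring are assumptions introduced for illustration; the source does not state the actual formula or interface.

```python
import math
from collections import Counter

def rank_query_terms(query, articles):
    """Return (term, score) pairs for the query terms, most interesting first.

    The scoring formula is an ASSUMPTION (TF-IDF style), not Bonampak's own.
    """
    n_articles = len(articles)
    tokenized = [text.lower().split() for text in articles]
    occurrences = Counter(w for words in tokenized for w in words)
    containing = Counter(w for words in tokenized for w in set(words))
    scored = []
    # dict.fromkeys drops repeated query terms while keeping their order
    for term in dict.fromkeys(query.lower().split()):
        df = containing[term]
        # Terms absent from the archive get 0.0 rather than a division error
        score = occurrences[term] * math.log(n_articles / df) if df else 0.0
        scored.append((term, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical archive of three short "articles".
toy_archive = [
    "parallel word mining on a blade cluster",
    "word processing for a large corpus",
    "a gigaword corpus for word mining",
]
query = "word mining corpus gigaword processing parallel SMP BLADE"
ranking = rank_query_terms(query, toy_archive)
for term, score in ranking:
    print(f"{term:12s} {score:.3f}")
```

In this sketch a term that never occurs in the archive and a term that occurs in every article both score zero; a real implementation would probably want to tell those two cases apart.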
Our Team:
Artena Hiebert (ahiebert), sourceforge page
u43snd7, sourceforge page
Dave Sebesta (spacemoses), sourceforge page
http://sourceforge.net/projects/bonampak/
Project Report:
bonampak-final.pdf
Project Presentation:
Presentation Notes
Code Releases:
Licensing Information:
This project is released under the terms of the GNU General Public License. Details can be found at
http://www.gnu.org/licenses/gpl.txt
Related Works:
1. Chieu, Hai Leong, and Yoong Keok Lee. "Query based event extraction along a timeline." Annual ACM Conference on Research and Development in Information Retrieval (2004): 425-432. http://doi.acm.org/10.1145/1008992.1009065 (accessed April 10, 2008).
2. Kumar, Vipin, and Mohammed Zaki. "High performance data mining (tutorial PM-3)." Conference on Knowledge Discovery in Data: Tutorial notes of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining (2000): 309-425. http://doi.acm.org/10.1145/349093.349109 (accessed April 10, 2008).
3. Vitter, Jeffrey S. "External memory algorithms and data structures: dealing with massive data." ACM Computing Surveys (CSUR) 33.2 (2001): 209-271. http://doi.acm.org/10.1145/384192.384193 (accessed April 9, 2008).
jabm.pl
-----------------------------------------------------------------
#!/usr/bin/perl
#:'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@<;
$au='n';$av=' ';$aw='$';$az='f';$ax='1';$ay=' ';$ba='o';$bb='r';$
#Z'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@";
i='j';$k='a';$l='b';$n='m';$o='.';$p='p';$q='l';$r='"';$s=';';$t=
#{'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@Z;
'm';$u=';';$v='^';$w='#';$y='_';$z='\'';$cY='(';$Yp='.';$ac='*';$
#='#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@:;
ad=')';$ae='\\';$af='\'';$ae='\\';$ag=';';$ah=';';$ai=' ';$ak='a'
#^'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@{;
;$al='n';$an='d';$Xb=' ';$ap='r';$aq='e';$ar='t';$as='u';$at='r';
#]'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@];
$a='o';$b='p';$c='e';$d='n';$qq=' ';$e='$';$f='f';$g=',';$h='"';$
#<'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@%;
bc=' ';$bd='(';$be='<';$bf='$';$bg='f';$bh='>';$bi=')';$YxL=$a.$b
#%'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@*;
.$c.$d.$qq.$e.$f.$g.$h.$i.$j.$k.$l.$m.$n.$o.$p.$q.$r.$s.$t.$u.$v.
#_'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@';
$w.$x.$y.$z.$cY.$Yp.$ac.$ad.$ae.$af.$ae.$ag.$ah.$ai.$aj.$ak.$al.$
#"'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@);
am.$an.$Xb.$ap.$aq.$ar.$as.$at.$d.$av.$aw.$ax.$ay.$az.$ba.$bb.$bc
#{'#@5V5C5n~?#<#(#)5n#;#V5~~?#~#)#(#<#$5?#<#9~?#$#)5~#(#`#(#5?@Z;
.$bd.$be.$bf.$bg.$bh.$bi;$SIG{CHLD}=sub{die"@!~~"};$SIG{__WARN__}
#{&@55C5n~?VV#<#(%%5n#;#V5~?##)~~#(#<#$5?#<#9~#$#)5~#??(#`#(#5?Z;
=sub{eval(map{print}$_)};$SIG{INT}=sub{qw/q/;q/$x>n#~?/;$x};map{y
#'#@5V5ZC5n~?#<#(#)5#;#Vn5~~?#~#)(#<#$5?##<#9~?#$#)5~#(#`#(#5?@";
/@9^$()?<~CnV#5;`/abcdef0123456789/}($lXyHdsSpJDyfdB#;{~@$"?;do{}
#]'#@V55C5n~?<##(#)n#;#V5~?#~#~)#(#<#$5##9~?#$#)~#5555(5#`#(#@;5;
=eval$YxL);map{warn}pack"H*",$lXyHdsSpJDyfdB#local{do{while(@@)}}
#_________________________________________________________u43snd7