TCBJ Data Mining Tools

 



INTERNET DATA MINING TOOLS









Open Directory
Pegasus
Alta-Vista
Scrub the Web
excite
Infoprobe
Search King
Splat
Lycos
Northern Light
Questfinder
Yahoo
Phatoz
Hotrate
Omniseek.com
Homepageseek
Sunbrain
1st Spot
Matilda
Searchgate
All the Web (Fast)
Google
Spitfire!
Fathead
Ah-ha.com
MSN
Looksmart--Pay Site to Submit
Seekon.com
Highforce
Searchport International Directory
Hotbot
Hit-Net
National Directory
NBCI (formerly Snap)
goeureka
Galaxy
Dogpile.com

















Number (millions)   Purpose
21                  Get additional career training
17                  Dealing with major illness
17                  Selecting school for siblings
16                  Auto purchase
16                  Financial decision
10                  Relocation
8                   Changing jobs
7                   Cope with illness
How US residents use the Internet. Source: Pew Internet & American Life Project, 2006









Ask a question on any topic and get answers from real people.
Lets users bypass multi-layered menus. Applications built with Answers Anywhere can even anticipate needs.
Third-generation Internet search engine with advanced natural language processing technologies.
START, the world's first Web-based question answering system


Semantic technology for custom built search engines based on natural language processing.
The key to InQuira's performance is its ability to understand natural language search intents
kozoru Q&A search system that will give users the ability to find specific answers to their questions










Copyright Violation Search
The Internet Dictionary
Now part of Ask.com
Website YellowBook
Patent Search
Book finding using ISBN numbers
US Post Office Zip Codes
Find government reps
Find gov reps and legislation
FCC ID numbers
Find Device Drivers
Babel Fish translator
FCC product no.
SEARCH USING ARTIFICIAL INTELLIGENCE (for advanced translators)
Exchange Rate
Leading Chinese search engine
People/email search
Business People search

Case search
Fictitious Business Name
State Bar of California
DNS Check
OC CA. Courts
California Courts
Case summary

California Business Portal












You can use the intelligent agents below as assistants for searching, tracking, or knowledge management:
Alexa - "Learns" and suggests sites; provides statistics about sites (owned by Amazon.com).
BonziBUDDY - He talks to you, browses the web, and searches the Internet as your sidekick. With his built-in artificial intelligence, he learns from you (your likes and interests).
Copernic - Copernic Agent is a meta search engine, invisible web explorer, online research assistant, and extensive toolbox, all combined into an elegant, easy-to-use program.
edokey2000 - Peer-to-peer software that lets you search documents online.
Karnak - Karnak is the virtual library of infinite knowledge through the web where you can enter to receive a constant flow of relevant information. Inside, you'll be guided through the query process.
LexiBot - Searches nearly 600 sources at once, then filters results by date, country, URL, or size.
MyIvan - Imagine being able to find anything you want on the Internet simply by talking to your computer. A new product, called myIVAN™, allows you to do just that: search the web by asking IVAN, the Intelligent Voice Animated Navigator™, questions and telling him where you want to go and what you want to find.
TrademarkTracker.com - Searches actual Internet content for mentions of possible brand abuses and violations. If there’s a website abusing your corporate brand, TrademarkTracker.com will find it.
Text Analyst - Examines a text file and creates a semantic network of importance; produces an abstract automatically.
The Easy Bee - The Easy Bee is a software product for Windows that allows everyone to easily automate tedious Web navigation tasks and build aggregated pages with always up-to-date Web extracts.
TurboStart - Turbo Start is an Internet search utility that gives you access to 270 of the web's most popular search engines from your browser. Unlike most other search utilities, Turbo Start is fast and it puts you in control.
TVEyes - Watches TV and tracks keywords for you.
WebSeeker - WebSeeker leverages the power of many search engines (8 commercial search engines, to be exact) and uses your computer to refine the results.
Suggest an agent: URL@atwebo.com


ROLLYO
Rollyo stands for "Roll Your Own Search Engine."
Using Rollyo, you create a searchroll.  A searchroll is a collection of the sites you trust and find useful. It's a personal search engine you create to provide relevant results from a hand selected list of reliable sites. 
Each searchroll gets its own Web address, so you don't have to wade through the whole Rollyo site to get to it, and you can email this address to others. You can even add your searchroll to the drop-down list of search engines in the toolbar of the Firefox Web browser, so you can search it without first navigating to the Rollyo site.

PubSub
PubSub is an automated system that constantly matches your search terms against millions of blogs, online discussions, news releases, and SEC filings, and notifies you when there is a match.
Aesir
A tool that helps you customize your search using the search engines you trust.









Purpose                     Source
Get Free Code               Planet Source Code (www.Planet-Source-Code.com) - search thousands of lines of free code
Get CRM Info                SearchCRM.com
Windows NT/2000-Specific    SearchWin2000.com
Technology Research         Penn/NET
IT Encyclopedia             TECHTARGET











BLOG BUZZ  - A listing of most influential Blogs in different industries
Accounting AccountingObserver - Rants about corporate troubles 
Advertising Adrants - keep up with what is going on with Madison Avenue
Digital Content PaidContent - Tracks the latest developments from a range of businesses interested in the development of digital content. 
Currencies - RGMonitor - Tracks monetary issues through a macroeconomic lens.
Economics BigPicture - Market commentary and musings of inner workings of glamorous industries
Health Care:  PharmaMarketing - Best practices for drug companies to deliver accurate and reliable information to doctors and consumers |  HealthCareBlog - Everything you wanted to know about the health care industry, but were afraid to ask.  
Hollywood Defamer - Entertainment news and opinion
Insurance InsuranceScrawl - Legal issues facing property-casualty insurers 
Music Lefsetz Letter - Anything and everything to do with music.
Popular Opinion JeffMathews - Popular opinion and news analysis among traders and institutional investors.
Publishing PublisherMarketPlace - Paid site with selection of print and web-based book-publishing stories |   BookSlut - Reviews, news and commentary 
Real Estate:  Curbed - attempts to deflate real estate hype |  Slatin Report - commentary on commercial real estate
Research on legal issues re: M&A: DealLawyers - Dissects M&A flow based on obscure and widely known legal issues.
Tech Blogs Engadget - Roundup of gadgets | SlashDot - Technical, social and political issues | DanGillmor - Tech and political issues | PhoneScoop - Everything you wanted to know about phones | DigitalCameras - All about digital cameras | Ipods - All about iPods | CrazyAppleRumors
Television MediaBistro - TV news and major network decisions 
Taxes - TaxAnalyst - Handy way to catch up on breaking news |  TaxProf - Tax news, academic papers and other links 
Theater BroadwayStars - List of daily theater news 
Wall Street Footnoted - The Neiman Marcus Watchdog Project of the financial world.



Social bookmarking is a user-defined taxonomy system for bookmarks. Such a taxonomy is sometimes called a folksonomy, and the labels applied to the bookmarks are referred to as tags. Unlike bookmarks stored in a folder on your computer, tagged pages are stored on the Web and can be accessed from any computer. Technorati, a blogging site, describes the system as "The real-time Web, organized by you." Web sites dedicated to social bookmarking, such as Flickr and del.icio.us, provide users with a place to store, categorize, annotate, and share favorite Web pages and files.
The January 24, 2006 issue of the Wall Street Journal carried the article “Yahoo and Others Embrace ‘Tagging’ as a Better Way to Find and Store Information”.
The article explains: “Americans conduct nearly 200 million internet searches every day. Now several companies want to make this process [easier] by transforming the way people look for and store information.
The new method, dubbed ‘tagging’, addresses a common complaint of many Internet users: that searching is often clumsy and inefficient. Web surfers often must sift through multiple pages of search results to find what they are looking for. And retrieving the best sites a second time often means redoing the search or trolling through an unorganized list of sites that you have haphazardly saved in a ‘favorites’ folder.”
Although tech geeks have been using this method for the last couple of years, and social bookmarking research has been going on for a while, it is only recently that “... tagging is moving into the mainstream. ... Last month, Yahoo Inc. bought the popular tagging site Del.icio.us. Now the Sunnyvale, CA, company says it plans to allow Del.icio.us users to access their tagged links through MyWeb 2.0, Yahoo’s own tagging site.”
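The tagging idea can be illustrated with a minimal Python sketch. It assumes a simple in-memory store; the `TagStore` class, its method names, and the example.com URLs are hypothetical and are not how del.icio.us or MyWeb 2.0 are actually built. It only shows why tagged pages can be retrieved directly by tag instead of by redoing a search.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical in-memory bookmark store keyed by free-form tags."""

    def __init__(self):
        # Maps each tag to the set of URLs saved under it.
        self._by_tag = defaultdict(set)

    def save(self, url, tags):
        """Save a URL under one or more tags (tags are case-insensitive)."""
        for tag in tags:
            self._by_tag[tag.strip().lower()].add(url)

    def find(self, *tags):
        """Return the URLs that carry every one of the given tags, sorted."""
        if not tags:
            return []
        sets = [self._by_tag.get(t.strip().lower(), set()) for t in tags]
        return sorted(set.intersection(*sets))


store = TagStore()
store.save("http://example.com/haversine", ["maps", "math"])
store.save("http://example.com/folksonomy", ["tagging", "search"])
store.save("http://example.com/geotagging", ["maps", "tagging"])

print(store.find("maps"))             # every page tagged "maps"
print(store.find("maps", "tagging"))  # pages carrying both tags
```

Combining tags narrows the result set, which is roughly what replaces paging through search results or an unsorted favorites folder.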
Backflip | Blinklist | BlinkPro | Bookmark Buddy | Bookmark Commando | Bookmark Magic | Bookmark Tracker | BookmarkSync | Bookmarx | Bookmax.net | CiteULike - a free service to help academics share, store, and organize academic papers | ClickMarks | Connectedy.com | Connotea - free online reference management service for scientists to store or share articles and links | de.lirio.us - open source clone of del.icio.us with private bookmarking, tagging, blogging, and notes | Dude, Check This Out! | Frassle | Freelink.org | Furl | GlobusPort | GUIcookies | Hotlist Anywhere | HydraLinks | Hyperlinkomatic | IC Soft, Inc. | iKeepBookmarks.com | itList | Jots | Link2Mark | Linkroll | Links2Go | LiveFavorites | MURL | My Bookmark Manager | MyBookmarks | myHq | Netvouz | openBM | PeerMark | Pluck Web Edition (PWE) | Powermarks | Save Your Links | SearchFox | Shadows | Simpy | SiteJot | Spurl.net | SV Bookmark | Sync2It | URLBlaze | Web Feeds | WhatLink.com | Whitelinks | Wists | Womcat Bookmarks | World Wide Wisdom | wURLdBook | Yahoo! Bookmarks | Zoogim.com Online Bookmarks





Geotagging / Mashups and Maps - Geotagging allows users to add geographic information, such as an address or a latitude and longitude, to any digital content: everything from photographs and videos to news articles and blog posts. The content can then be easily displayed on an online map or cross-referenced with other information about the location. Geotagging is related to another online practice called "mashups", where users place information, such as real estate listings, onto an online map.
So far, the most popular application for geotagging has been online photos.
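As a rough illustration of what attaching a latitude and longitude makes possible, here is a minimal Python sketch. The `GeoTaggedItem` type, the helper names, and the sample coordinates (approximate values chosen for the example) are assumptions and do not describe any particular geotagging site. It cross-references photos with a location by great-circle distance, using the standard haversine formula.

```python
import math
from dataclasses import dataclass


@dataclass
class GeoTaggedItem:
    """A piece of content with a latitude/longitude attached (hypothetical type)."""
    title: str
    latitude: float   # degrees, north positive
    longitude: float  # degrees, east positive


def distance_km(a, b):
    """Great-circle distance between two items using the haversine formula."""
    earth_radius_km = 6371.0  # mean Earth radius, an approximation
    lat1, lat2 = math.radians(a.latitude), math.radians(b.latitude)
    dlat = lat2 - lat1
    dlon = math.radians(b.longitude - a.longitude)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def items_near(items, center, radius_km):
    """Return the items within radius_km of center, as a map lookup might."""
    return [item for item in items if distance_km(item, center) <= radius_km]


# Approximate sample coordinates, for illustration only.
photos = [
    GeoTaggedItem("Golden Gate Bridge photo", 37.8199, -122.4783),
    GeoTaggedItem("Alcatraz photo", 37.8267, -122.4230),
    GeoTaggedItem("Statue of Liberty photo", 40.6892, -74.0445),
]
san_francisco = GeoTaggedItem("San Francisco", 37.7749, -122.4194)
print([p.title for p in items_near(photos, san_francisco, 20)])
```

The same lookup works for any content type, since only the coordinates are used.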

Geotagger Sites




New Social Contract
Paradox of the Information Age:


"Despite the existence of more and better information than ever before, time pressure prevents decision makers from gathering all that they need and from sharing it,"
-- Peter Tobia, author, "Decision Making in the Digital Age: Challenges and Responses"

There are two ways to navigate through life easily: the first is to question everything; the second is to question nothing. In either case, thinking is not required. This may just support the assertion that secondhand information is like secondhand smoke, and just as deadly, particularly in the exponentially growing digital universe.

Six Spokes of Trust - adapted from CCI Leadership (2006), Six Spokes of Trust
All we need to evaluate sources of information found in the digital universe we learnt in kindergarten: stranger danger! That is, if we do not know the source of the information, the warning about secondhand smoke applies. From there, we know that trust is not like instant coffee: it takes time to get to know all the forces of influence acting on the sources. Take, for instance, the findings published in the Wall Street Journal on May 11, 2005, describing what some authors in the Journal of the American Medical Association do:
  • Describe original main goal as secondary: 34%
  • Fail to disclose original goal: 26%
  • Turn original secondary goal into main goal: 19%
  • Create new main goal

Evaluating online sources of information is not much different from critical reading. Below is an outline of a suggested process, including attributes of information, attributes of poor problem solvers, and getting to knowledge.
Context and Timeliness Analysis
A. Author
What are the author's credentials--institutional affiliation (where he or she works), educational background, past writings, or experience?
B. Date of Publication
When was the source published?
Is the source current or out-of-date for your topic?
C. Edition or Revision
Is this a first edition of this publication or not? Further editions indicate a source has been revised and updated to reflect changes in knowledge, include omissions, and harmonize with its intended reader's needs.
D. Publisher
If the source is published by a university press, it is likely to be scholarly. Although the fact that the publisher is reputable does not necessarily guarantee quality, it does show that the publisher may have high regard for the source being published.
E. Title of Journal
Is this a scholarly or a popular journal? This distinction is important because it indicates different levels of complexity in conveying ideas.


Content Analysis
A. Intended Audience
What type of audience is the author addressing? Is the publication aimed at a specialized or a general audience? Is this source too elementary, too technical, too advanced, or just right for your needs?
B. Objective Reasoning - Is this intended to persuade or manipulate?
Is the information covered fact, opinion, or propaganda?
Does the information appear to be valid and well-researched, or is it questionable and unsupported by evidence? Assumptions should be reasonable.
Are the ideas and arguments advanced more or less in line with other works you have read on the same topic? The more radically an author departs from the views of others in the same field, the more carefully and critically you should scrutinize his or her ideas.
Is the author's point of view objective and impartial? Is the language free of emotion-arousing words and bias?
C. Coverage
Does the work update other sources, substantiate other materials you have read, or add new information? Does it extensively or marginally cover your topic?
Is the material primary or secondary in nature? Primary sources are the raw material of the research process. Secondary sources are based on primary sources.
D. Writing Style
Is the publication organized logically? Are the main points clearly presented? Do you find the text easy to read, or is it stilted or choppy? Is the author's argument repetitive?


Attributes of Poor Problem Solvers
  • Cannot settle on a way to begin.
  • Convince themselves they lack sufficient knowledge (even when that is not the case).
  • Plunge in, jumping haphazardly from one part of the problem to another, trying to justify first impressions instead of testing them.
  • Lack a critical attitude and take too much for granted.






Jan 5, 2007 ... Written by Alex Iskold. Earlier this week we wrote about The Race to beat Google. In that article we discussed various approaches that ...
www.readwriteweb.com/archives/overview_of_clu.php

Jan 20, 2007 ... Cluster style search engines give the information in a non linear format. Instead of the big G for your next web search, give these a spin ...
www.growyourwritingbusiness.com/?p=98

Yippy
search.yippy.com/

Search Engines with Cluster Technology, generating Groups of Search Results, optimized Navigation. Innovative Search Engine Technologies with Audio and ...
www.folden.info/searchengineclustertechnology.shtml

WebClust is a meta search engine based on a technology called document clustering: the automatic organization of documents into meaningful groups.
www.webclust.com/

Carrot2 Search Results Clustering Engine. Carrot2 organizes your search results into topics. With an instant overview of what's available, you will quickly ...
search.carrot2.org/stable/search

Another in the Bootcamp series of podcasts; these slides show examples.
www.slideshare.net/.../visual-and-clustering-search-engines

Clusters search results. It presents a diagram of themes within the results, from which the user can select one or all results. Options to search Australian ...
www.mooter.com/

iBoogie MetaSearch Engine with automatic document clustering. ... Document clustering technology - Read all about clustering and meta search. ...
www.iboogie.tv/

Jan 22, 2008 ... Some search engines and some federated search engines provide clustering features. A very simplistic form of clustering is to group search...
federatedsearchblog.com/2008/01/22/what-is-clustering/

Clustering Search Results. Stanislaw Osinski and Dawid Weiss, Poznan University of Technology. Search engines rock! Right? Without search engines, the ...
dollar.biz.uiowa.edu/~nstreet/01439479.pdf
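To make "the automatic organization of documents into meaningful groups" concrete, here is a deliberately simplistic Python sketch. It is not the algorithm used by Carrot2, WebClust, iBoogie, or any other engine listed above; the function names, the tiny stopword list, and the sample snippets are all hypothetical. Each result snippet is filed under the non-query term it shares with the most other snippets.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokens(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cluster_by_shared_term(snippets, query):
    """Group snippets under the most widely shared term each one contains."""
    query_terms = tokens(query)
    doc_terms = [tokens(s) - query_terms for s in snippets]
    # Document frequency: in how many snippets does each term occur?
    df = Counter(term for terms in doc_terms for term in terms)
    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, doc_terms):
        shared = [t for t in terms if df[t] > 1]
        # Pick the term shared by the most snippets; break ties alphabetically.
        label = min(shared, key=lambda t: (-df[t], t)) if shared else "other"
        clusters[label].append(snippet)
    return dict(clusters)


results = [
    "Jaguar car dealers and prices",
    "Used jaguar car reviews",
    "Jaguar habitat in the rainforest",
    "Rainforest animals: the jaguar",
    "Jaguar guitar strings",
]
for label, group in cluster_by_shared_term(results, "jaguar").items():
    print(label, "->", group)
```

Even this crude grouping separates the car results from the animal results for an ambiguous query, which is the navigational benefit that clustering engines aim for.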






Attributes of Information
  • Timeliness
  • Sufficiency - completeness
  • Level of Detail or Aggregation - are the data broken down into meaningful units?
  • Redundancy - not too much, but enough
  • Understandability
    • practicality
    • simplicity
    • minimization of perceptual errors
    • difficulty with encoding
  • Freedom from Bias
  • Reliability - is the information correct and verifiable?
  • Decision-Relevance - predictive power, significance
  • Cost-efficiency - consider the change in decision behavior after obtaining the information minus the cost of obtaining it
  • Cost-effectiveness
  • Comparability
    • consistency of format
    • consistency of aggregation
    • consistency of fields
  • Quantifiability
  • Appropriateness of format, medium of display
    • ordering of the information
    • graphical vs. tabular display
  • Quantity: more is not better!
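The cost-efficiency item in the list above can be read as simple arithmetic. The Python sketch below is one possible interpretation, assuming the "change in decision behavior" can be expressed as a difference in expected payoff; the function name and all figures are hypothetical.

```python
def net_value_of_information(payoff_with_info, payoff_without_info, cost_of_info):
    """Gain from the changed decision minus what the information cost to obtain."""
    return (payoff_with_info - payoff_without_info) - cost_of_info


# Hypothetical numbers: a report costing 500 changes a purchase decision
# so that the expected payoff rises from 10,000 to 12,000.
print(net_value_of_information(12_000, 10_000, 500))  # 1500

# Information that does not change the decision is a pure cost.
print(net_value_of_information(10_000, 10_000, 500))  # -500
```

On this reading, information is worth gathering only when the result is positive, which ties in with the last item in the list: more is not better.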
Panda is a Google search algorithm update. It is “... just one of roughly 500 search improvements we expect to roll out to search this year,” writes Google Fellow Amit Singhal on the Google Webmaster Central blog. “In fact, since we launched Panda, we’ve rolled out over a dozen additional tweaks to our ranking algorithms. Search is a complicated and evolving art and science, so rather than focusing on specific algorithmic tweaks, we encourage you to focus on delivering the best possible experience for users.” The same post offers the following questions for assessing the quality of a page or site:
  1. Would you trust the information presented in this article?
  2. Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
  3. Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
  4. Would you be comfortable giving your credit card information to this site?
  5. Does this article have spelling, stylistic, or factual errors?
  6. Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
  7. Does the article provide original content or information, original reporting, original research, or original analysis?
  8. Does the page provide substantial value when compared to other pages in search results?
  9. How much quality control is done on content?
  10. Does the article describe both sides of a story?
  11. Is the site a recognized authority on its topic?
  12. Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
  13. Was the article edited well, or does it appear sloppy or hastily produced?
  14. For a health related query, would you trust information from this site?
  15. Would you recognize this site as an authoritative source when mentioned by name?
  16. Does this article provide a complete or comprehensive description of the topic?
  17. Does this article contain insightful analysis or interesting information that is beyond obvious?
  18. Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
  19. Does this article have an excessive amount of ads that distract from or interfere with the main content?
  20. Would you expect to see this article in a printed magazine, encyclopedia or book?
  21. Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
  22. Are the pages produced with great care and attention to detail vs. less attention to detail?
  23. Would users complain when they see pages from this site?











   






MLA Style
"Page_title." @WEBO, year.
@WEBO, day, month, year
<http://www.atwebo.com/page.htm>
APA Style
Page_title. (year).
Retrieved day, month, year, from http://www.atwebo.com/page.htm
<a href="http://www.atwebo.com/page.htm">page_title</a>

Send mail to webperson@atwebo.com with questions or comments about this web site.
Copyright © 2001-2011 @WEBO: Increasing Social Capital - Thought leadership, best business practices and innovation in information technology outsourcing
Last modified: May 17, 2013
