Wednesday, July 27, 2011

Clearwell Enron 'Topics'

Continuing my look at semantic network analysis of the Enron data, I've been exploring Clearwell's 'Topics' feature, which, like a lot of Clearwell's features, does much of the work for you in a few clicks.

For your perusal, here is the entire Topics list of the Enron corpus, sorted as Clearwell supplies it, and here are the top 10, sorted by occurrence frequency:
Topic | Terms | # of Documents
Margin Call | Call, Conference, Conference call, Margin, number, Monday, Margin Call, account, funds, requirements, market, time, amount, equity, Weekly Call, year, Hours, stock, wire, portfolio | 1538
intended recipient | recipient, intended recipient, affiliate, contract, sender, basis, party, sender or reply, Thanks, Message, privileged material, basis of a contract, use, relevant affiliate, sole use, RESUME, reply, Report, opportunity, people | 1238
staff meeting | Staff, Meeting, staff meeting, MONDAY, ENW Staff, location, MORNING, CAO Staff, January, Tuesday, MONDAY MORNING STAFF, CHANGE, ETS STAFF, Reminder, ETS MONDAY MORNING, MORNING STAFF, ETS MONDAY MORNING STAFF, ETS MONDAY, week, Teams | 1204
Enron Stock | employees, company, Enron, Stock, Enron Stock, Fund, millions, retirement, Proceeds, Demand Ken Lay, Demand, Enron Stock Sales, Lay, Sales, Ken Lay, Stock Sales, New York Times, energy crisis last, last, underhanded dealings | 1151
experiences | interview, Schedule, questions, Enron, time, form, date, system, employees, communication, answers, experts, students, conversation, experiences, video, clip, process, position, Guide | 1027
Conference Call | Call, Conf, times, Conference Call, Conference, comments, GIR Conf, decision, Wednesday, Practices, Business, Summary, Beeson Conf, One, BPs, BP Conf, Meeting Summary, Business Practices, Gas Conf, Links Meeting Summary | 993
Dow Jones Index | Person, Unknown Person, Dates, HourAhead hour, hour, Start Date, schedule, Subject, 2001 Subject, file, Message, Index, Kate, Dow Jones Index, Index Prices, Jones Index, djenergy Subject, EPMI Index, EPMI Index Prices, Prices | 896
Repeat parent | parent, Repeat, Repeat parent, Date, Description, ENTRY, CALENDAR ENTRY, CALENDAR, Standard Time, Central Standard Time, Time, INVITATION Description, INVITATION, Meeting Dates, Mtg, Russell, call, OFFICE, Buchanan, Stacey | 807
option premium | OPTION, models, premium, option premium, spread, Index, Insurance, stock, HOUR, library, volatility, Email, baskets, average, Yes, price, tax, Digital Options, Index Option, State | 801
Access Request | Request, Access, Date, Access Request, act, Read, email, Switch, Drop, POLR Request, data, approver, form, ERCOT, data approver, Switch Request, switch date, Customer, period, end | 734

I would like to understand more fully how Clearwell derives these topics, and to compare its process with that of other tools, such as Gensim.
Here is Clearwell's writeup on its 'Topics' analysis process.
I will continue posting updates to this project as it develops.