**Newsle is building the future of news alerts.**

A future where the information you need gets delivered to you.

We believe that web services should save you time, not waste your time.

We’re a consumer technology company, tackling cutting-edge technical problems and delivering actionable information to our users.

Over the last two years, we’ve developed a state-of-the-art person disambiguation system, built a news indexer that crawls millions of articles a day, and designed an application that’s loved by users around the world.

Our team is based in San Francisco and comes from around the world, including Azerbaijan, Israel, Germany, New Hampshire, and California. Our priorities are simple: reorganize news around people, and have fun doing it.

We’re funded by Conde Nast, Maveron, DFJ, Bloomberg Beta, Transmedia Capital, Lerer Ventures, and SV Angel.

At Newsle, you can work on all aspects of the product, from building and scaling the core NLP algorithms to generating newsfeeds and pushing them to the frontend.

We’re considering engineers for all levels of our stack, but are particularly interested in machine learning or full-stack engineers. Competitive salary/equity & unlimited vacation.

Read more about our openings at newsle.com/jobs.


*Optimizing the memory footprint of a classifier used here at Newsle sent us down a rabbit hole of rewriting a basic Scipy function in Cython, something that only became a problem when our high-dimensional text spaces grew to a cartoonish size thanks to the hashing trick. Here I motivate the hashing trick, show how we use sparse matrix-vector multiplication for text classification, and walk through how we derived and wrote the new implementation.*

**The Hashing Trick: Why, How, What**

In many information retrieval, natural language processing, or machine learning contexts, it is standard to work with large spaces of words/n-grams, sometimes on the order of a few million unique observations. This bag-of-words model treats a document as a point in a high-dimensional space: a sparse vector whose nonzero entries line up with the terms that document contains.

Creating this corpus model is typically done in a stateful, incremental way, where terms are assigned a coordinate in the order in which that word is seen. For instance, the sentence “The dog is a nice dog, but cats aren’t so nice.” might be tokenized, filtered for stopwords, counted, and embedded into its own space as the vector (2, 2, 1), where the ordered dimensions correspond to {dog, nice, cat}.
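The stateful scheme can be sketched in a few lines (a toy illustration, not Newsle's pipeline):

```python
def build_vocabulary(tokens):
    """Assign each new term the next free coordinate, in first-seen order."""
    vocab = {}
    for t in tokens:
        if t not in vocab:
            vocab[t] = len(vocab)
    return vocab

# "The dog is a nice dog, but cats aren't so nice.", post-tokenization/stopwording
tokens = ["dog", "nice", "dog", "cat", "nice"]
vocab = build_vocabulary(tokens)   # {"dog": 0, "nice": 1, "cat": 2}

counts = [0] * len(vocab)
for t in tokens:
    counts[vocab[t]] += 1
# counts == [2, 2, 1], the vector (2, 2, 1) in {dog, nice, cat} order
```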

Keeping this ordering around in a production environment (usually in the form of a hash table or dictionary with constant lookup time) can have a huge memory footprint. This also requires that we process the corpus in serial, or share state across processes, complicating parallelization. Enter the hash trick.

Rather than derive the feature coordinates from the order seen, we can generate a column index for a given string with a hash function, mapping to enough (prime-numbered) buckets to avoid collisions even for a corpus with a million unique terms. For this exercise, we’ll use MurmurHash3 (MMH3). Choosing a 32-bit hash and modding out the sign to ensure we have sensible column indices, our domain of possible column indices runs up to 2^{31} – 1 (which is also, incidentally, the eighth Mersenne prime!). On a corpus of nearly 1M Newsle industry bigrams (explained below), we encounter just 17 collisions, so we’ll call this “good enough.”
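The mapping itself is tiny. The post uses MMH3; as a dependency-free sketch of the same idea, here MD5 stands in for MurmurHash3 (any well-mixed hash illustrates the point):

```python
import hashlib

MAX_INT = 2**31 - 1  # the eighth Mersenne prime, as noted above

def hashed_column(term):
    """Map a term straight to a column index in [0, 2^31 - 1).
    Sketch only: MD5 stands in here for MurmurHash3 (mmh3.hash)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "little", signed=True)  # a signed 32-bit value
    return h % MAX_INT  # mod out the sign so the index is non-negative

cols = {t: hashed_column(t) for t in ["dog", "nice", "cat"]}
```

No dictionary, no shared state: any process can compute a term's coordinate independently.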

**Hashing Trick in Practice: Scipy Sparse and Text Classifiers**

Let’s try a simple classification task. We have a corpus of bigrams used across news articles flagged as representing one of nearly two hundred industries (e.g., banking, software, management consulting, fisheries), with each industry compactly represented by the centroid of its many representative articles. We can then assert that a new query document is described by whichever centroid is closest to it, defining closeness by cosine similarity. The underlying space can then be thought of as a Voronoi tessellation shattered into a few hundred pieces (each a cell centered on a centroid vector), and this method is often called a “nearest-centroid” (or “1NN”) classifier.

Classification is then as simple as erecting a matrix representation of the corpus, called M (with rows as industry centroid vectors), l_{2}-normalizing its rows, and constructing a vector of the new document, called v, with the same method of (TF-IDF) weighting and with unit l_{2}-norm. Computing cosine similarity against every category is then just computing the matrix-vector multiplication

r = Mv.

Arg-maxing on r will yield the closest industry, and our prediction. Let’s begin!
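As a warm-up, the whole recipe (normalize the centroid rows, normalize the query, multiply, arg-max) can be sketched on a dense toy example with hypothetical labels and vectors:

```python
from math import sqrt

def l2_normalize(v):
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def nearest_centroid(centroids, doc):
    """centroids: {label: raw centroid vector}. With unit-norm rows,
    cosine similarity is just a dot product, so we arg-max inner products."""
    q = l2_normalize(doc)
    scores = {label: sum(a * b for a, b in zip(l2_normalize(c), q))
              for label, c in centroids.items()}
    return max(scores, key=scores.get)

# hypothetical three-term space and two industries
centroids = {"wireless": [4.0, 1.0, 0.0], "fisheries": [0.0, 1.0, 5.0]}
print(nearest_centroid(centroids, [3.0, 1.0, 0.5]))  # prints: wireless
```

The real pipeline below does exactly this, only with sparse TF-IDF vectors in place of the dense toys.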

Let’s take our article to be a recent New York Times piece on BlackBerry’s announcement of its new phones and operating system. Without feature-hashing, we’d construct our vector by computing TF-IDF weights on tokenized terms by referencing a dictionary for both the index and inverse-document frequency (IDF) lookup. Here we use Python’s popular Scipy Sparse module, a library that allows us to efficiently store matrices with mostly zero entries across a variety of sparse data representations, while supporting many important linear algebraic methods (e.g., back-substitutions, factorizations, and eigendecompositions by the Lanczos algorithm).

```python
from scipy import sparse
from math import sqrt

tfs = nlp.tokenize_and_count(article)
seen_tfs = {term: tf for term, tf in tfs.iteritems() if term in feature_lookup}
rows = [feature_lookup[term] for term in seen_tfs.iterkeys()]
columns = [0 for _ in xrange(len(rows))]
tfidfs = [tf * idf[term] for term, tf in seen_tfs.iteritems()]
norm = sqrt(sum(w**2 for w in tfidfs))
normed_tfidfs = [w / norm for w in tfidfs]
v = sparse.csc_matrix((normed_tfidfs, (rows, columns)), shape=(M.shape[1], 1))
r = M * v
classification = industries[r.toarray().argmax()]
```

Here our classifier labels the Times article as being about “Wireless.” The next closest guesses were “Telecommunications” and “Consumer Electronics”. Pretty good, classifier! Let’s now assume we want this script to run across many child processes, with the memory footprint on each as small as possible, and that we’d like to **use the hashing trick instead of a dictionary for the feature lookup**.

Sparse CSR (compressed sparse row) data structures will play nice if the column indices can fit into a numpy.int32 data type, so a signed 32-bit hash function will work well here. We merely need to replace the matrix M with a hashed counterpart M_hashed, for which the column indices now live in {0, …, 2^{31} - 1} instead of {0, …, ~10^{6}}. The data structure is the same size in memory as before, but now we need only have a hash function on hand instead of an additional hash table. Let’s make the new vector the same as before, this time drawing on a word hash, and pulling the IDFs from a C++-implemented MARISA trie. So our naive implementation of this procedure might be:

```python
import mmh3
from math import sqrt, log

MAX_INT = 2147483647

def word_hash(s):
    return mmh3.hash(s) % MAX_INT

def calc_tfidf(tf, df, N=200):
    return tf * log(N / (1.0 + df))

tfs = newsle_nlp.tokenize_and_count(article)
hashed_tfs = {word_hash(term): tf for term, tf in tfs.iteritems()}
rows = hashed_tfs.keys()
columns = [0 for _ in xrange(len(rows))]
tfidfs = [calc_tfidf(tf, df=df_trie.get(h_term, [(0,)])[0][0])
          for h_term, tf in hashed_tfs.iteritems()]
norm = sqrt(sum(w**2 for w in tfidfs))
normed_tfidfs = [w / norm for w in tfidfs]
v_hashed = sparse.csc_matrix((normed_tfidfs, (rows, columns)), shape=(MAX_INT, 1))
r = M_hashed * v_hashed
classification = industries[r.toarray().argmax()]
```

**This code will fail!** The problem is in the penultimate line. It was natural for us to structure the matrix `M_hashed` as a CSR data structure (there is sparse data in every row, but not in every column) and the vector `v_hashed` as a CSC (Compressed Sparse Column: it’s a single column vector, but nearly 2^{31} of its rows are zero). The problem is in how Scipy Sparse does multiplication: it requires both matrices in the operation `A*B` to be either CSR or CSC, and will convert the second to the format of the first if this is not the case. **Forcing our sparse vector v_hashed to become CSR, however, will blow up our memory**.

To see why, let’s try a toy example. Consider the following vector in R^{15}, which has nonzero entries in only four places. One way of constructing this sparse vector is just by passing in the ordered coordinates:

```python
I, J, V = [2, 5, 6, 12], [0, 0, 0, 0], [0.1, 0.2, 0.3, 0.9]
v = sparse.csc_matrix((V, (I, J)), shape=(15, 1))
```

The CSC vector we constructed is now uniquely determined by three core numpy arrays: `v.data`, the nonzero values; `v.indices`, the rows corresponding to the values in the previous array; and `v.indptr`, a pointer into the previous two arrays, telling us where the values and row-indices for the contents of the i^{th} column lie.

```
>>> print v.todense().T
[[ 0.   0.   0.1  0.   0.   0.2  0.3  0.   0.   0.   0.   0.   0.9  0.   0. ]]
>>> print v.data, v.indices, v.indptr
[ 0.1  0.2  0.3  0.9] [ 2  5  6 12] [0 4]
```
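To convince ourselves these three arrays really do pin down the vector, here is a small hand-rolled sketch (not part of Scipy) that rebuilds the dense column from them:

```python
def csc_to_dense_column(data, indices, indptr, n_rows):
    """Rebuild a one-column dense vector from its CSC arrays; the slice
    indptr[j]:indptr[j+1] brackets the data/indices for column j."""
    dense = [0.0] * n_rows
    for k in range(indptr[0], indptr[1]):  # single column, so j = 0
        dense[indices[k]] = data[k]
    return dense

col = csc_to_dense_column([0.1, 0.2, 0.3, 0.9], [2, 5, 6, 12], [0, 4], 15)
# nonzeros land at rows 2, 5, 6, and 12, matching v.todense().T
```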

CSR is a very similar data structure to CSC, but its `indices` array points to the nonzero values’ columns, and its `indptr` gains an entry for every row. For a column vector, you can imagine this to be redundant:

```
>>> v_csr = v.tocsr()
>>> print v_csr.todense().T
[[ 0.   0.   0.1  0.   0.   0.2  0.3  0.   0.   0.   0.   0.   0.9  0.   0. ]]
>>> print v_csr.data, v_csr.indices, v_csr.indptr
[ 0.1  0.2  0.3  0.9] [0 0 0 0] [0 0 0 1 1 1 2 3 3 3 3 3 3 4 4 4]
```

And so it is! Though this vector is the same mathematical object as it would be in a CSC representation, its row pointer is forced to have an entry for every (nominally) new row in the vector. That means one entry per row: an array as long as the vector itself, defeating the point of a sparse structure. Now you can imagine why our classification code above failed: coercing `v_hashed` to CSR will construct a structure whose `indptr` array has nearly 2^{31} numpy integers, and that’s going to cause a segfault.
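A quick back-of-the-envelope confirms the blow-up, even assuming a compact 4 bytes per `indptr` entry:

```python
# CSR's indptr needs (n_rows + 1) entries, no matter how sparse the data is.
n_rows = 2**31 - 1                        # one nominal row per possible hash bucket
gigabytes = (n_rows + 1) * 4.0 / 2**30    # 4 bytes per 32-bit indptr entry
print(gigabytes)  # prints: 8.0
```

Eight gigabytes for the row pointer alone, for a vector with a few hundred nonzeros.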

So, we clearly need a CSR times CSC multiplication method. Not finding one out there on the internet, we will write one of our own.

**Sparse CSR/CSC Multiplication in Python**

Computing this sparse matrix/vector product means we want to scan the rows of our matrix; wherever a row shares a nonzero column with the vector factor, we multiply the matching entries, and the sum of all such products is the inner product for that row. Trying out this naive logic as Pythonically as possible:

```python
from scipy import sparse
import numpy as np

def pythonic_mult(M, v):
    assert isinstance(M, sparse.csr.csr_matrix), "matrix M must be CSR format."
    assert isinstance(v, sparse.csc.csc_matrix), "vector v must be CSC format."
    assert M.shape[1] == v.shape[0], "Inner dimensions must agree."
    assert v.shape[1] == 1, "v must be a column vector."
    v_values = dict(zip(v.indices, v.data))
    num_rows = M.shape[0]
    res = np.zeros(num_rows)
    for i in xrange(num_rows):
        a, b = M.indptr[i], M.indptr[i+1]
        M_indices_data = zip(M.indices[a:b], M.data[a:b])
        res[i] = sum(val * v_values.get(k, 0.0) for k, val in M_indices_data)
    return res
```

Testing this out…

```
>>> res_1 = pythonic_mult(M_hash, v_hash)
>>> %timeit res_1 = pythonic_mult(M_hash, v_hash)
1 loops, best of 3: 5.48 s per loop
>>> print industries[res_1.argmax()]
Wireless
```

Well, **we got the right answer and didn’t segfault, but the running time is just terrible**. Let’s start over. We know we’re only going to use the columns of `M_hash` that are nonzero in the rows of `v_hash`, so why don’t we just scan through the structure of `M_hash`, throw away all that we don’t need, and multiply the result by the dense array left over in `v_hash`? Let’s give that a try:

```python
from itertools import izip

def pythonic_mult_2(M, v):
    assert isinstance(M, sparse.csr.csr_matrix), "matrix M must be CSR format."
    assert isinstance(v, sparse.csc.csc_matrix), "vector v must be CSC format."
    assert M.shape[1] == v.shape[0], "Inner dimensions must agree."
    assert v.shape[1] == 1, "v must be a column vector."
    # map each column index kept from v to its position in the reduced space
    kept_columns = {x: i for i, x in enumerate(v.indices)}
    num_rows = M.shape[0]
    indices, data, indptr = [], [], [0]
    for i in xrange(num_rows):
        a, b = M.indptr[i], M.indptr[i+1]
        for index, d in izip(M.indices[a:b], M.data[a:b]):
            if index in kept_columns:
                indices.append(kept_columns[index])
                data.append(d)
        indptr.append(len(data))
    red_mat = sparse.csr_matrix((np.array(data), np.array(indices), np.array(indptr)),
                                shape=(M.shape[0], len(kept_columns)))
    return red_mat.dot(v.data)
```

```
>>> res_2 = pythonic_mult_2(M_hash, v_hash)
>>> %timeit res_2 = pythonic_mult_2(M_hash, v_hash)
1 loops, best of 3: 778 ms per loop
>>> print industries[res_2.argmax()]
Wireless
>>> print np.allclose(res_1, res_2)
True
```

Our answers agree, and this was substantially faster than the first try. But Scipy’s implementation in the unhashed space takes only 28 milliseconds on the same machine. Time to get closer to the metal.

**Sparse CSR/CSC Multiplication in Cython**

Cython is a superset of Python that can compile almost-standard Python to high-performance C with very little effort. This is my first attempt at optimizing code with Cython, so I first wrote a matrix-vector multiplication as I might have in MATLAB or C: using pointers and flags. For a given row of the matrix, I’ll iterate over its `indices` array, as well as the indices array of the vector. Assuming both are sorted (something you’ll have to check when you pull these arrays from the Scipy Sparse structure), I’ll step one pointer forward whenever the other is ahead, and whenever their indices agree, I know the two arrays have an entry in common, and I’ll increment the inner product by the product of the corresponding values. Simple logic flow, if a bit convoluted in its book-keeping:

```python
import numpy as np
from scipy import sparse

def py_ptr_multiply(m_indptr, m_indices, m_data, v_indices, v_data):
    """
    ASSUMPTION: CSR structure of input matrix has sorted indices.
    m_indptr, matrix's pointer to row start in indices/data
    m_indices, non-negative column indices for matrix
    m_data, non-negative data values for matrix
    v_indices, non-negative column indices for vector
    v_data, non-negative data values for vector
    """
    M = m_indptr.shape[0] - 1
    v_nnz = v_indices.shape[0]
    output_vector = np.empty(M)
    for count in range(M):
        inner_product = 0.0
        v_pointer = 0
        increase_v = 0
        exhausted_v = 0
        v_index = v_indices[v_pointer]
        row_start = m_indptr[count]
        row_end = m_indptr[count+1]
        for m_pointer in range(row_start, row_end):
            if exhausted_v == 1:
                exhausted_v = 0
                break
            increase_m = 0
            while increase_m == 0:
                if increase_v == 1:
                    v_pointer = v_pointer + 1
                    if v_pointer >= v_nnz:
                        exhausted_v = 1
                        break
                    v_index = v_indices[v_pointer]
                    increase_v = 0
                col_index = m_indices[m_pointer]
                if col_index < v_index:
                    increase_m = 1
                    continue
                elif col_index == v_index:
                    inner_product = inner_product + m_data[m_pointer]*v_data[v_pointer]
                    increase_v = 1
                    increase_m = 1
                elif col_index > v_index:
                    increase_v = 1
        output_vector[count] = inner_product
    return output_vector
```
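Stripped of the row loop and the flag bookkeeping, the core of this routine is just a two-pointer merge of two sorted sparse vectors:

```python
def sparse_dot(a_idx, a_val, b_idx, b_val):
    """Inner product of two sorted sparse vectors via a two-pointer merge."""
    i = j = 0
    total = 0.0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:      # indices agree: accumulate the product
            total += a_val[i] * b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:     # a is behind: advance a
            i += 1
        else:                         # b is behind: advance b
            j += 1
    return total

print(sparse_dot([2, 5, 6], [1.0, 2.0, 3.0], [5, 6, 12], [0.5, 0.5, 0.9]))  # prints: 2.5
```

The full routine above applies exactly this merge once per matrix row; the flags simply keep the two pointers in sync inside the outer `for` loop.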

Testing this out, we see it’s a bit slower than our Pythonic attempts before:

```
>>> assert M_hash.has_sorted_indices, "M must have sorted indices along its rows."
>>> assert v_hash.has_sorted_indices, "v must have sorted indices along its column."
>>> m_indptr, m_indices, m_data = M_hash.indptr, M_hash.indices, M_hash.data
>>> v_indices, v_data = v_hash.indices, v_hash.data
>>> res_3 = py_ptr_multiply(m_indptr, m_indices, m_data, v_indices, v_data)
>>> print industries[res_3.argmax()]
Wireless
>>> %timeit res_3 = py_ptr_multiply(m_indptr, m_indices, m_data, v_indices, v_data)
1 loops, best of 3: 1.12 s per loop
```

Saving this as a separate file in the .pyx format, we can compile it without any changes, and then import it directly into a Python script or within the shell:

```
>>> from cy_ptr_multiply_1 import ptr_multiply_1
>>> res_4 = ptr_multiply_1(m_indptr, m_indices, m_data, v_indices, v_data)
>>> print industries[res_4.argmax()]
Wireless
>>> %timeit res_4 = ptr_multiply_1(m_indptr, m_indices, m_data, v_indices, v_data)
1 loops, best of 3: 512 ms per loop
```

**It’s twice as fast with no changes!** But let’s be smart. Cython has special support for NumPy arrays (by way of `cimport numpy`), and we can declare types on our variables and arrays. We can additionally tell Cython to avoid checking for improper indices on arrays by way of a boundscheck decorator (which means you’ll need to check your input!):

```cython
import numpy as np
from scipy import sparse
cimport numpy as np
cimport cython

DTYPE_INT = np.int32
DTYPE_FLT = np.float64
ctypedef np.int32_t DTYPE_INT_t
ctypedef np.float64_t DTYPE_FLT_t

@cython.boundscheck(False)
def sp_matrix_vector_rmult(np.ndarray[DTYPE_INT_t] m_indptr,
                           np.ndarray[DTYPE_INT_t] m_indices,
                           np.ndarray[DTYPE_FLT_t] m_data,
                           np.ndarray[DTYPE_INT_t] v_indices,
                           np.ndarray[DTYPE_FLT_t] v_data):
    """
    ASSUMPTION: CSR structure of input matrix has sorted indices.
    m_indptr, matrix's pointer to row start in indices/data
    m_indices, non-negative column indices for matrix
    m_data, non-negative data values for matrix
    v_indices, non-negative column indices for vector
    v_data, non-negative data values for vector
    """
    assert m_indptr.dtype == DTYPE_INT
    assert m_indices.dtype == DTYPE_INT
    assert m_data.dtype == DTYPE_FLT
    assert v_indices.dtype == DTYPE_INT
    assert v_data.dtype == DTYPE_FLT

    cdef int M = m_indptr.shape[0] - 1
    cdef int v_nnz = v_indices.shape[0]
    cdef np.ndarray[DTYPE_FLT_t] output_vector = np.empty(M, dtype=DTYPE_FLT)
    cdef int count, v_pointer, increase_v, exhausted_v, v_index, row_start
    cdef int row_end, m_pointer, increase_m, col_index
    cdef DTYPE_FLT_t inner_product

    for count in range(M):
        inner_product = 0.0
        v_pointer = 0
        increase_v = 0
        exhausted_v = 0
        v_index = v_indices[v_pointer]
        row_start = m_indptr[count]
        row_end = m_indptr[count+1]
        for m_pointer in range(row_start, row_end):
            if exhausted_v == 1:
                exhausted_v = 0
                break
            increase_m = 0
            while increase_m == 0:
                if increase_v == 1:
                    v_pointer = v_pointer + 1
                    if v_pointer >= v_nnz:
                        exhausted_v = 1
                        break
                    v_index = v_indices[v_pointer]
                    increase_v = 0
                col_index = m_indices[m_pointer]
                if col_index < v_index:
                    increase_m = 1
                    continue
                elif col_index == v_index:
                    inner_product = inner_product + m_data[m_pointer]*v_data[v_pointer]
                    increase_v = 1
                    increase_m = 1
                elif col_index > v_index:
                    increase_v = 1
        output_vector[count] = inner_product
    return output_vector
```

Compiling and running this code, we see **our Cython-aided implementation of sparse-matrix multiplication is actually twice as fast as the Scipy-computed multiplication** of the same matrices in the unhashed space! And there are no memory-bloating, inefficient CSR-to-CSC conversions in the process.

```
# testing out type-declared Cython method on hash-tricked data structures
>>> from newsle.nlp.linalg.sparse import sp_matrix_vector_rmult
>>> print M_hash.shape, v_hash.shape
(144, 2147483647) (2147483647, 1)
>>> print M_hash.nnz, v_hash.nnz
3211379 464
>>> res_5 = sp_matrix_vector_rmult(m_indptr, m_indices, m_data, v_indices, v_data)
>>> print industries[res_5.argmax()]
Wireless
>>> %timeit res_5 = sp_matrix_vector_rmult(m_indptr, m_indices, m_data, v_indices, v_data)
100 loops, best of 3: 11.1 ms per loop

# the following uses Scipy's own multiplication in the unhashed space
>>> print M.shape, v.shape
(144, 995887) (995887, 1)
>>> print M.nnz, v.nnz
3211379 352
>>> res_0 = M * v
>>> np.allclose(res_0.T.toarray()[0], res_5)
True
>>> %timeit res_0 = M * v
10 loops, best of 3: 27.4 ms per loop
```

(Notice that there are actually more nonzero entries in `v_hash` than in `v`. This is because we no longer have a dictionary for checking whether a word has been seen before. If a term isn’t in the corpus, though, we needn’t worry about it in the inner product, as those terms vanish.)

*Erich is Newsle’s sole Machine Learning Engineer. He works on fun problems of entity disambiguation, story clustering, and topic modeling. Follow him on Twitter: @erich_owens*


**David Cohen **- The name “David Cohen” is almost synonymous with Boulder’s tech scene. David is the CEO and Co-Founder of the well-known startup accelerator TechStars, with mentors including Foursquare’s Dennis Crowley, Tumblr’s David Karp, and other heavy hitters. TechStars alumni include Lore, Graphic.ly, SendGrid, and several other rising stars. To learn from one of the best, check out this video of David’s advice for startup community leaders.

**Jim Franklin **- Jim is the CEO of SendGrid, a cloud email infrastructure alumnus of TechStars. SendGrid is growing fast, hitting 60,000 users a few months ago; so fast that they have grown too big for the Boulder office, prompting the opening of a satellite in Denver. On a personal note, we use SendGrid to send email alerts to our users, letting them know when their friends make the news; SendGrid is highly recommended.

**Jud Valeski** - Jud’s company, Gnip, is “the largest provider of social media data to the enterprise.” They slice and dice data like the best of them, recently adding the ability to sort Twitter streams by country code, user location, time zone, language, number of followers, and other data points. (And congrats to Jud for being named a finalist in Ernst & Young’s 2012 Entrepreneur of the Year awards.)

**Laura Marriott** - Laura, CEO of NeoMedia Technologies, is placing her bets on QR codes and mobile barcodes. And if the company’s Q2 stats are any indication of the growing market, NeoMedia is well-poised to remain an industry leader as mobile barcodes continue to increase in popularity worldwide. Laura is ready for the challenge, always expanding on her business model; just last month, NeoMedia licensed their portfolio of over 74 patents to Microsoft.

**Niel Robertson** - Niel is the CEO of Trada, self-described as “the world’s first and only crowdsourced online advertising services marketplace.” As the company grows, it has been adding key hires to its marketing & sales teams in an effort to focus on the mid-market segment of paid search. Niel is also a Co-Founder of tenXer, a personal productivity solution, which announced $3 million in series B funding last month.

—

**Who are your favorite CEOs in Boulder?**


**Seth Priebatsch** - Seth, CEO of SCVNGR, has been working hard to position his LevelUp app at the top of the crowded mobile payment space. His most recent change allows users to contribute to a charitable cause via the app. Seth must be doing something right, as LevelUp is close to its one millionth transaction and recently increased its funding to $21 million.

**Bettina Hein** - In addition to being one of L’Oreal’s 2012 USA Women in Digital “NEXT Generation Award” winners, Bettina is the CEO of Pixability, a video marketing company. She recently gave her insight to BostInno about how to make a video go viral. Her take on the issue is that having the goal of going viral often sets you up for failure.

**Dave Kerpen** – Dave is the CEO of Likeable Media, a social media & word-of-mouth marketing agency based in Boston & NYC. He was recently quoted in an article about the presidential candidates’ social media presence (which is a hot topic these days, with Obama’s AMA on Reddit and Ryan’s post on Quora). Dave also gave some advice that would come in handy for any recent graduate still looking for work in his Forbes article, “5 Essential Tips To Make Your Social Profiles Resume-Ready.”

**Brian Halligan** – Brian, who coined the term “inbound marketing,” is the CEO of HubSpot, a marketing software company. Just two weeks ago, Brian and Co-Founder Dharmesh Shah unveiled Hubspot 3. The pair claimed that the update would feature “Amazon.com-like personalization achievable for the rank-and-file businesses that power our economy.”

**Vanessa Green** – Vanessa, CEO of OnChip, appeared earlier this week at an MIT entrepreneurship event which featured a student accelerator competition, a direct rebuttal to Peter Thiel’s announcement that he’d pay students to drop out of college. Vanessa’s advice to the student competitors was to “show up and keep showing up. Take advantage of the ecosystem.” And congrats to Vanessa, as investors are showing up in support of OnChip: the company recently raised $2.4 million in funding.

**Stephanie Kaplan** – Stephanie, CEO of Her Campus, spoke about the benefits of deferred admission for MBA students in a recent US News article. She was accepted into Harvard’s 2+2 program as a senior in college, but not before winning a case competition for her company, Her Campus, which is “a collegiette’s guide to life.” Stephanie’s wisdom extends beyond college, though, shown in her advice on how to encourage innovation without leading to burnout; she claims it’s about setting lofty goals and pulling people away from their usual to-do lists.

**Who are your favorite Boston-based CEOs? What city would you like to see featured in the next Movers & Shakers?**


Seattle is often hailed as a haven of tech startups. After the success of Amazon, many other founders have started their own ventures in The Emerald City. After covering mom bloggers, dad bloggers, and higher education professionals in our Movers & Shakers series, we decided to go a little more high-tech and introduce some of our favorite startup CEOs. And we couldn’t think of a better city to start with than Seattle.

**Paul Thelen** - It’s no secret that Zynga is treading difficult waters at the moment. After an assortment of troubles, including losing their COO, some analysts have declared this a prime opportunity for another online & mobile games company to take the spotlight. And if Paul has his way, his company Big Fish Games will be the one for the job. He gave up the role of CEO four years ago to Jeremy Lewis, but recently reclaimed the position. Now moving quickly, Paul is betting big on Big Fish Casino, an app that allows UK players to win real money, and Big Fish Unlimited, a cloud-based gaming service.

**Adam Schoenfeld** – Adam made headlines a couple weeks ago when his measurement & analytics company, Simply Measured, added Big Fuel to its impressive list of agency clients which already includes Edelman, Ogilvy, and others. He was also recently quoted in Mashable and Yahoo, among other publications, citing data his company pulled showing that Instagram is an up-and-coming platform for brands, with 40 of the Top 100 brands using the photo sharing app.

**Darrell Cavens** – Darrell, a former SVP at Blue Nile, now has his own powerful ecommerce company: Zulily, a daily deal site featuring products for babies and kids. Congrats are in order for Darrell and the rest of the Zulily team, as they recently passed 5 million members. He’s also slated to speak at Startup Day next month in Bellevue.

**T.A. McCann **- T.A. is well known for his former position as CEO of Gist, but is now taking on the challenge of VP of BBM at Research in Motion. In fact, as he moves forward with his work at RIM, the company announced last week that Gist would be permanently shut down next month. His original idea won’t completely disappear, though, as BlackBerry 10 announced that next year they will have the functionality to aggregate info from a contact’s blog posts, tweets, and other profiles into a single page on their device.

**Andy Liu** – Andy, CEO of BuddyTV, made headlines this summer when he helped users navigate the broadcast of the Olympics. Users had the ability to use BuddyTV’s “Olympics Quicklist,” which sorted events by channel & time, and even alerted you when your favorite event was about to start. It’s of no use now that the games are over, but it’s still an interesting feature to read about, and perhaps it will carry into the next Olympics.

**Adrian Aoun** – Adrian, CEO of Wavii, a news feed startup built around topics, saw an opportunity out of the Olympics as well. More specifically, he knew he could solve the problem many dubbed as #NBCFail. “NBC started having their fail moment,” Adrian says, “Well, we have that data.” It shouldn’t be shocking that Wavii reported spikes of Olympics related traffic around lunch time and in the afternoon, hours before NBC reported event results.

**Lara Feltin** - Lara is CEO and Co-Founder of Biznik, self-described as a site for “business networking that doesn’t suck” – a community of support for independent business people. Earlier this year, Biznik took a stance against spam and phony accounts by switching from a freemium model to a pay-only model. With a lot of writers asking questions like “would you ever pay for Twitter or Facebook?,” it seems Lara saw this coming and acted ahead of the curve.

**Keith Krach** - Keith is CEO of DocuSign, which I have to admit is one of my favorite iPad apps; it allows you to fill out and sign documents all on your touch screen. (And the signatures are recognized legally by the government, making the paper-free process much easier than scanning, faxing, or mailing.) Keith has reason to celebrate these days, as Google Ventures joined DocuSign’s impressive list of investors, bringing the company’s total funding up to about $114 million. That’s not all he’s been up to: the company also recently added Mary Meeker, General Partner at Kleiner Perkins Caufield & Byers, to its Board of Directors.

**Who are your other favorite Seattle-based startup CEOs? Also, what city do you want to see us feature next?**


Another Olympics has come and gone. While it may have felt like it went by in a flash, there have been many headlines that will remain prominent for years. Whether that’s broken records, broken dreams, or performances that nearly brought down the stadium, the London 2012 Summer Olympic games were certainly newsworthy. Let’s take a look at some of the biggest stories:

**Gabby Douglas Wins Gold** – Perhaps one of the greatest stories to come out of this year’s Olympic Games is Gabrielle Douglas, the 16-year-old American who became the first African American to win the all-around title at the Olympics. Not only that, she’s become NBC’s “most clicked” athlete, even beating Michael Phelps. (But one of the most viral internet stories to come out of London 2012 is Douglas’s teammate McKayla Maroney, who is “not impressed.”)

**Usain Bolt Breaks Records** – Gabby Douglas isn’t the only one rewriting history. Bolt led his relay team to a gold medal and a new world record in the 4x100m event. (So maybe we should cut him some slack for partying until 6am, right?)

**Ryan Lochte Takes the Spotlight** - Much attention was given to Ryan Lochte, the American swimmer, after his impressive showing this year. A lot of speculation surrounds Lochte’s future: Will we see him next on *Dancing With the Stars*? How about his own reality TV show?

**Opening & Closing Ceremonies Showcase London** – While Danny Boyle‘s opening ceremonies were highly regarded for displaying historic literary, cultural, and political triumphs, some argued they were outdone by the closing ceremonies. And that’s a tough act to beat, as the closing ceremonies was a rocking concert featuring the Spice Girls, The Who, One Direction, Jessie J, Eric Idle, and others.

**Sarah Attar is Not the Last** - Even though she finished last in her event, Sarah Attar made history by becoming the first female track and field athlete to represent Saudi Arabia. She was congratulated by a crowd of 80,000 people on their feet cheering for her.

**What were your favorite stories of London 2012?**


**Bruce Sallan **- Fitting with our decision to profile mom bloggers before dad bloggers, Bruce wrote a piece about how moms often take the spotlight in the parental blogging sphere. Read his comparison of mom bloggers & dad bloggers and decide for yourself who reigns supreme.

**Michael Sheehan** - Michael, AKA “High Tech Dad,” writes a blog where “technology and fatherhood collide.” He recently wrote a piece about a lot of *phishy* sites that are taking advantage of London 2012 hype and attempting to scam people doing Olympics-related searches.

**Mike Adamick** – Mike, the writer of “Cry It Out,” has a very serious question posted on *SFGate* for all the adults out there – which way do you tie your rabbits?

**Matt Logelin** – Matt, author of “Matt, Liz, and Madeline,” has been making news lately because his *New York Times* bestselling book, *Two Kisses for Maddy*, has been optioned by Lifetime TV. If the deal goes forward, his book will be adapted into a TV movie by the co-creator of “Friends” and the producers of “The Lucky One.”

**Dan Pearce** – I specifically saved Dan for last. He is better known as “Single Dad Laughing,” but to many, he is a controversial figure in the dad blogging world. Lisa Belkin of the *Huffington Post* wrote a great piece titled “The Latest Battle in the Dad Blog War” that shows what other parent blogs have to say about Dan: they claim he’s *too* staged, his stories *too* easy. (I’ll admit I’ve thought that from time to time, like when he was trapped on a mountain yet had several great pictures of the experience. Whose first instinct, when feeling stranded and physically tormented, is “hey friend, take a bunch of pictures of this so I can turn it into a five-post series”?) But to Dan’s credit, I read all five parts. I’m not here to judge, just to share who makes the news; and Dan certainly stirs up quite a story.

**Who are your favorite newsworthy dads?**


**Erik Qualman** – Qualman, a social media consultant known for his book *Socialnomics*, is also a professor at Hult International Business School. Qualman was mentioned in this story about viral videos, which touts his Social Media Revolution web series as a strongly educational video, on the same plane as TED talks.

**Mark Schaefer** – Schaefer’s accomplishments include writing two books, being named to Forbes’ “Power 50” list of social media influencers, and serving as an adjunct professor at Rutgers University. Schaefer recently wrote a piece for Influencer Marketing Review about social influence. He recounts that at a conference he was introduced with his Klout score and number of Twitter followers, but with no mention that he holds two graduate degrees or teaches at a university. Have we reached a turning point in what identifies us: education or follower count? Read Mark’s opinion to see what he thinks.

**Sree Sreenivasan** – Sree is a professor and Dean of Student Affairs at Columbia Journalism School. He’s also a social media blogger for CNET News, where he recently wrote about how to keep kids safe online. He frequently appears on lists of people to follow, including AdAge’s 25 media people on Twitter. He also uses his influence to help non-profits, which we love.

**Administrators**

**Renu Khator** – The President of the University of Houston is one of the only university presidents on Twitter. She recently made headlines when she publicly announced that she wouldn’t be taking on the role of President of Purdue University, though she was believed to be the front-runner for the position.

**Feniosky Peña-Mora** – Up until recently, Peña-Mora was serving as the Dean of Columbia University’s school of engineering. Faculty resistance and public criticism ultimately caused Peña-Mora to step down. In a similar story, Alejandro Zaera was recently named the Dean of Architecture at Princeton, despite public outcry from a majority of graduate students in the school’s programs. Each story demonstrates how different universities respond to criticism from within their respective communities.

**Jenna Johnson** – Working for the *Washington Post*, Johnson is a respected education writer. A recent piece (also written by Anita Kumar, Daniel de Vise, and Paul Schwartzman) provides extensive coverage of the President of the University of Virginia being ousted and reinstated over the course of 18 days. Playing out like a Hollywood movie about corporate loyalties and power struggles, this piece alone is reason enough to follow the headlines that Johnson writes.

**Christine Armario** – A reporter for the Associated Press, Armario covers the U.S. Department of Education. She also writes about trends in education; after the infamous video of the bullied bus monitor, Armario wrote a piece about the rising issue of students bullying teachers and administrators.

**What other higher education professionals shape the news?**


**Jenny Lawson** – Also known as The Bloggess, Jenny recently published her “mostly true memoir,” *Let’s Pretend This Never Happened*. A couple of months after its release, the book is still a hot seller and frequently appears on Top 10 best-seller lists.

**Leah Segedie** – The creator of Mamavation (and owner of Bookieboo) was put in a “face-off” against fellow mom blogger Audrey McClelland on the topic of putting your child on a diet. Which mom do you agree with?

**Jessica Gottlieb** – Never afraid to speak up, Jessica weighed in on the “Cool Whip Controversy” that set the blogosphere ablaze. Mom bloggers can find themselves in the line of fire, and Jessica’s point is a good wake-up call for all aspiring big-time bloggers.

**Kristen Howerton** – Kristen not only writes her own blog, Rage Against the Minivan, but she also contributes to Huffington Post. (Remember how Newsle can help you follow journalists too?) Kristen recently wrote a piece about celebrities adopting African American babies and the reasoning behind it. (For lighter reading, check out her review of Disney Pixar’s “Brave” from a parental point of view.)

**Catherine Connors** – Catherine is a dual force to be reckoned with in the blogging world: she not only runs the blog Her Bad Mother, but also works for Babble, a popular blogging network. Take a look at her opinion on the question every mom has: “Can a mom have it all?”

**Who are your favorite newsworthy moms?**


For example, let’s say you’re the PR Manager of a startup building a mobile payments system. Here’s what you can do with Newsle:

Having a grasp on how the market feels about your company is critical to managing a successful PR campaign, and sometimes Google Alerts doesn’t offer enough insight. If you happen to work for LevelUp, a service originally created by SCVNGR that recently pivoted to be a loyalty/payment app, then the last sentence of this story on GigaOM might clue you in to where you stand. After announcing LevelUp’s newest funding and explaining what the company does, the author concludes by saying that LevelUp’s service might be “confusing for now, seeing so many options in the market.” From a strategic standpoint, the PR Manager can find a way to carve a more identifiable niche. As a follow-up, he or she could subscribe to all stories written by the author, Ryan Kim, to see if he ever changes his mind. (On a side note, Kim writes many stories about mobile applications, so he’s a good person to follow for anyone interested in that field.)

In the mobile payments sector, Jack Dorsey currently reigns supreme. Any PR Manager would be wise to know what Dorsey is cooking up at his company, Square. Over the past few weeks, Square has been investing more in the Android operating system, adding as many as one Android engineer every week. Chances are, if you’re competing against Jack, knowing what Square is investing in will help you communicate with your stakeholders.

If you work for a mobile payments company, you probably already had a good idea about Square’s overall strategy. But you might occasionally miss a story about other external factors in your industry. For example, Congress has been debating over which federal agency should have authority over mobile payments and their policies regarding security of data.

In another trend, companies that previously weren’t too involved in technology are getting their hands dirty with mobile payments. I personally love the Starbucks app, and apparently so do many of their customers. As of April of this year, over 45 million payments had been made using Starbucks’ platform. The coffee giant was recently joined by Burger King, which just this month began testing a mobile payment application in 50 of its stores. So perhaps mobile payment developers should fear a hamburger joint and a coffee shop more than the tech company started by the former CEO of Twitter.

These tactics apply to more than just mobile payment companies, though. Any industry professional can benefit from Newsle alerts, whether you work for a video game developer (so you could follow Larry Frum, who writes about gaming for CNN Tech) or an online retailer (you might want to follow Tony Hsieh, CEO of Zappos).
