
Reverb Developer Blog

Enabling OAuth With Swagger

by Erin  |  Posted March 21, 2014

Ever wanted to expose your API sandbox with OAuth 2? Here’s a short guide on enabling the Implicit or Bearer flow with swagger.

Swagger is an interface to your API, and it’s only appropriate that required authentication is described in that interface. We therefore model the OAuth2 implicit flow in the Resource Listing:

[Code screenshot in the original post: the "authorizations" block of the Resource Listing]
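
Here is a rough sketch of what such a block looks like in a Swagger 1.2 Resource Listing (the scope names, descriptions, and login URL below are illustrative, not copied from the original screenshot):

"authorizations": {
  "oauth2": {
    "type": "oauth2",
    "scopes": [
      { "scope": "write:pets", "description": "Modify pets in your account" },
      { "scope": "read:pets", "description": "Read your pets" }
    ],
    "grantTypes": {
      "implicit": {
        "loginEndpoint": { "url": "http://petstore.swagger.wordnik.com/oauth/dialog" },
        "tokenName": "access_token"
      }
    }
  }
}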

In the above, we first enumerate the scopes that this API may request permission for. The scopes are listed with descriptions, and in this sample are all about pets.

Next, the grant types for this API are described. This example is only using the implicit type, which you can read more about here:

The OAuth 2.0 Authorization Framework (The long version)

A more practical explanation

This tells us some important info, specifically where the user will be logging in, and when they do, what the token name will be upon success.

We add some information on each API, describing what’s required to access them. For example:

[Code screenshot in the original post: an operation in the API Declaration with its required OAuth2 scope]
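
Schematically, the operation carries an authorizations block along these lines (a sketch, not the exact JSON from the screenshot):

{
  "path": "/pet",
  "operations": [{
    "method": "POST",
    "summary": "Add a new pet to the store",
    "nickname": "addPet",
    "type": "void",
    "authorizations": {
      "oauth2": [{ "scope": "write:pets", "description": "modify pets in your account" }]
    }
  }]
}
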
Note that here, the method to add a pet is secured with oauth2 and requires the write:pets scope to be authorized by the end user.

Finally, we need to enable the (tiny) OAuth swagger library:

[Code screenshot: including swagger-ui's OAuth support library]
And we need to configure swagger-ui with our auth information:

[Code screenshot: initializing swagger-ui with the OAuth client information]
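
In the swagger-ui distribution of that era, this meant calling its initOAuth helper with your client details, roughly as follows (the client id, realm, and app name are placeholders):

// called after swagger-ui has loaded; the values here are placeholders
initOAuth({
  clientId: "your-client-id",
  realm: "your-realm",
  appName: "petstore sample"
});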

That's it! Now when a user wants to access any API which has declared OAuth2 authorization to be required, they will be presented with a login button. Note that it's your job (as the API provider) to enforce the security—Swagger is just acting as a conduit for the tokens. You can try it out on our petstore sample and even test your application against the OAuth server on the petstore. That server will allow any callback URL so you can do local testing.

The Swagger implementation for Java can automatically create the JSON elements above. In a Java-based project, you can simply define the authorization information like so:

[Code screenshot: defining the OAuth2 grant types and scopes in Java]

And annotate your methods:

[Code screenshot: annotating a method with the scope it requires]

Swagger is Open Source, and the specification can be read here:

Swagger 1.2 Specification

The petstore sample source can be found here:

Petstore Sample App

In a future blog post, we’ll describe how to enable OAuth2 authorization flow (aka 3-legged OAuth) with Swagger.


Making Wordnik Faster

Every team and company accumulates at least a little technical debt. The trick is figuring out how to address it.

When Chiao joined Reverb, we had the opportunity and the frontend muscle to clean up some of the technical debt our products had accumulated. Wordnik.com especially hadn’t been receiving the love it deserved, so we committed to a six-week project to make the frontend codebase faster and cleaner.

The main problems we wanted to tackle with this rewrite were frontend performance and codebase complexity.

Moving Wordnik.com from Rails to Node.js was fun, fast, and felicitous

Fast

This was our first production deployment of Node.js, so first we did some quick load tests. We tested Express by rendering simple EJS templates — it easily returned the HTML at over 900 req/s on an EC2 small instance. This gave us the confidence to move our servers from EC2 medium instances to EC2 small, reducing the frontend hosting costs by 75%.
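
The load-test setup amounted to a minimal Express app rendering an EJS template, something like this sketch (the route and template names are illustrative, not our actual code):

var express = require("express");
var app = express();

app.set("view engine", "ejs");

// render a simple EJS template per request, as in the load test
app.get("/words/:word", function (req, res) {
  res.render("word", { word: req.params.word });
});

app.listen(3000);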

We use Pingdom to track the time it takes to deliver the initial HTML file at wordnik.com/words/cat from different locations around the globe. Switching from Rails to Node.js reduced this measurement by 90%. This measurement isn’t a fair technology comparison by any means (we also reduced some underused functionality of the site), but the dramatic difference was surprising to us.

Simple

We were also able to make the codebase and local environment setup much easier. A member of our NLP team interested in Node.js was immediately able to understand the code and get a local instance of the new site running, and was able to submit a pull request fixing a bug minutes after checking out the repository.

We liked the design philosophy of the npm site, especially the unceremonious MVC part. The publicly available Wordnik API returns beautifully structured data, so we don't need models. We used request and async to communicate with the API, so most of what the frontend is doing is routing and applying the proper templates.
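
A typical route, then, looks roughly like the sketch below: fetch what the page needs from the Wordnik API with request and async, and hand the results straight to a template (the endpoint paths and template name are illustrative):

var request = require("request");
var async = require("async");

app.get("/words/:word", function (req, res, next) {
  var base = "http://api.wordnik.com/v4/word.json/" + encodeURIComponent(req.params.word);
  async.parallel({
    definitions: function (cb) {
      request({ url: base + "/definitions", json: true }, function (err, resp, body) {
        cb(err, body);
      });
    },
    examples: function (cb) {
      request({ url: base + "/examples", json: true }, function (err, resp, body) {
        cb(err, body);
      });
    }
  }, function (err, results) {
    if (err) return next(err);
    // no models: the API data goes straight into the template
    res.render("word", results);
  });
});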

Flexible

The npm site's ability to combine modules into your own custom stack, with exactly the feature set desired and no unnecessary complexity, is beautiful. The cornucopia of different ways to solve problems, along with the hilarity of the Node community, makes this a fun way to build stuff. This set of hugely popular npm modules provides an extremely convenient way to build a website on top of an API: request, async, express, EJS, EJS-locals.

There are a variety of npm template solutions which can be rendered on the server or the client, which is incredibly freeing. We could easily modify the mix of modules rendered in either location to immediately render the most important page modules for users and search engines while deferring less important content.

We were initially too aggressive in moving content to the client. In our Reverb for Publishers product we had seen the Googlebot making requests that would have only occurred if JavaScript was being executed, so we thought that search engines would index content that was appended to a page via client side JS.

Our experience was that this is (sadly) not the case. We took a hit on search engine referrals until we moved our related words module back to server-side rendering. It wasn’t immediately obvious that this content would benefit our ranking, but one of our NLP experts, Will Fitzgerald, explained how this was important for the main word indexing.

Get Everything out of the Way of Your Page Rendering

What happens on your website before the visitor sees the main content? We were previously making a separate request for a file containing all the CSS required for the entire site. We’re now sending the main page content and all the CSS needed to display it inline, which allows the browser to render the main page body before additional assets are retrieved. We push all JavaScript to the bottom of the page so the browser can display the main content before we begin to add additional flourishes.

God Bless PageSpeed

A ton of basic web performance practices are taken care of by the PageSpeed module, available for both Apache and Nginx. We're now running Nginx in front of Node for all of our sites to handle static asset delivery. We're taking advantage of automatic JS and CSS concatenation and minification, which was previously managed by Grunt tasks as part of the build process.

The mix of PageSpeed options is amazing: images can be optimized on the fly, not only shrunk but converted to a more efficient filetype, or changed to a data URI.

The PageSpeed module even has an option to automatically inline the CSS required for your page, which, as we've seen, is a nice performance win. However, when I first enabled this option, I saw Nginx adding a huge amount of JS to the page in order to figure out the proper list of classes whose CSS needed to be inlined. This was so much text that it was simpler for us to just have Node include the CSS for the appropriate modules in the document head on Wordnik's word pages rather than asking Nginx to figure it out on the fly.

Post jQuery

This version of the site dropped jQuery because it was no longer necessary. The benefit that project provided for client side development is immeasurable and I’m so grateful for its existence. But now we’re dealing with a pretty standards-compliant mix of browsers, the most prevalent of which are Mobile Safari on iOS7, Chrome, and Firefox. We don’t see any significant traffic from Internet Explorer versions below 9 or the Android Browser running on an OS version below 4.

[Chart: browser breakdown of Wordnik.com traffic]

Purposely not relying on jQuery has helped keep us aware of the expanding capabilities of the browsers we are dealing with. A former Wordnik employee, Zeke, wrote a beautiful and helpful jQuery/vanilla JavaScript Rosetta Stone.

We move elements around with CSS3 transforms, which is simple and keeps the required computation off the render thread, helping avoid jankiness.

Testing

Erin McKean led the functional testing effort, evaluating libraries and settling on CasperJS. We have a commit hook which kicks off a suite of tests covering important page interactions against our local site instance. Errors in client-side code or in our Node.js code, as well as inappropriate API responses, are revealed here.
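
A test in that suite looks something like this sketch (the URL and selector are hypothetical):

casper.test.begin("word page renders definitions", 2, function (test) {
  casper.start("http://localhost:3000/words/cat", function () {
    test.assertHttpStatus(200);
    test.assertExists(".definition"); // hypothetical selector
  });
  casper.run(function () {
    test.done();
  });
});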

For frontend work, being able to trigger a test suite which actually interacts with your site the way a user would provides a lot of peace of mind.

Result

According to Google Analytics (GA), the entire Wordnik site now has a page load time of under three seconds. When loaded internally, the DOMLoad event fires in less than one second. Keep in mind that Real User Monitoring numbers like those collected by GA are always dramatically higher than your own internal measurements, and they tell the real story of your users' experience.

We are very happy with our transition from Rails to Node.js for Wordnik.com. Although we reduced the total feature set of the site as part of the rewrite, the user response has been positive, mostly because of the faster page loads (in the Rails version, very long lists were loading slowly, or even timing out entirely).

With Node.js, we also feel confident in our ability to continue to develop the site in shorter, more modular bursts, without as much head-scratching as with the previous Rails code. Because most of the team has at least a passing familiarity with JavaScript, it’s also easier for people not directly tasked with Wordnik.com work to contribute.

 


Partitioning MongoDB Data on the Fly

by Tony Tam  |  Posted December 19, 2013

Let’s say you start a project with MongoDB. It is (and probably should be) simple and small. It starts getting some traction and expands. Your single server is now behind a load balancer and your EC2 instance sizes are getting larger and larger.

Someone reminds you to start backing up your data, and you switch to running replica sets. People are using and loving your service, and downtime isn't an option. Your mongodump snapshots are getting huge, and it's clear that you're going to have data-scale issues.

Browsing any of the hundreds of articles about MongoDB can teach you a ton of best practices from people who have been to war with growth. But what do you do when your system starts getting too big to change without a day of downtime?

Let’s put some things in perspective:

  • Copying 100 GB of data on AWS will take hours
  • Writing 100 GB of data with mongodump will take hours
  • Restoring 100 GB of data with mongorestore could take many, many hours, depending on the index rebuilding process
  • It's possible to have inconsistent data when performing mongodump, as records could be added as you're writing files
  • If you're on AWS + EBS, you can use the awesome EBS snapshot feature, but to do so, you need to lock the database during the snapshot process, which could take hours

To partition your data with the standard MongoDB toolset, significant downtime is unavoidable: you'll either need to write a bunch of application logic, or get creative with some third-party tools. This is a problem we've hit at Reverb more than once, and the tools and technique below are the exact same ones we used to migrate across datacenters (see From the Cloud and Back).

Now let’s assume that your data growth is such that you need to split up your database into many, smaller databases. This is a very common thing to do when optimizing MongoDB—smaller, discrete MongoDB instances are much easier to manage and scale, especially on cheap cloud servers. So now your app, which looked like this –

[Diagram: the application writing to a single MongoDB cluster]

will look like this –

[Diagram: the application writing to several smaller MongoDB clusters]

We again have a number of options. But let’s assume that downtime isn’t one of them, and you have a reasonably large MongoDB deployment, so the time to move data is going to be in the hours. Luckily there are mechanisms to work with the MongoDB oplog along with some (free!) open-source tools to facilitate this.

From the software point of view, we're going to assume that the servers are going to be updated with new logic, one after the other, and that they're behind a reasonable load balancer. We'll also assume that during the updates, one server can handle the traffic and load, which should be safe, given that you've been able to update the software without outages in the past.

So during the update period, there could be writes sent to the old service, which is writing to the old, single MongoDB cluster. After the first server is updated, there's a period of time where both the old and the new code are writing, until the second machine is updated as well.

For the database, we want to first create a new cluster to take the subset of data from our main cluster. There’s initially no data—this is easy, and well documented.

Now, getting the data ready. We're going to do this in five steps—all of which apply just to the collections that we're MOVING:

  1. Store all write operations to disk
  2. Perform a Mongodump on the old cluster to copy collections
  3. Restore the data with mongorestore
  4. Apply the write operations from disk
  5. Make a real-time replication of write operations from the old database to the new one

Let’s walk through each of these steps.

MongoDB has an oplog for all operations—this is how secondaries are kept in sync with the master server. The standard MongoDB java driver has the ability to read this oplog, and we at Wordnik have written some very simple utilities to work with the oplog.

Writing the oplog operations to disk simply means each write operation will be stored in a file. This is done as follows:


wget http://oss.sonatype.org/content/repositories/releases/com/wordnik/mongo-admin-utils-distribution/1.2.0/mongo-admin-utils-distribution-1.2.0-distribution.zip

unzip mongo-admin-utils-distribution-1.2.0-distribution.zip

./bin/run.sh com.wordnik.system.mongodb.IncrementalBackupUtil -c database_a.collection_1,database_a.collection_2 -o oplog-files -h 1.2.3.4:27017

This will connect to your MongoDB cluster at IP address 1.2.3.4 on port 27017 and start writing the operations on collection_1 and collection_2 from the database named "database_a" into a folder called "oplog-files". You'll see a series of files written to that folder. Note that, by default, the entire oplog will be written to this directory, and you probably don't need all of it.

To set the point in time to start reading from, you can simply create a file called "last_timestamp.txt" in the target directory. In that file, put the Unix timestamp (in seconds) that you want to read from:

cat oplog-files/last_timestamp.txt
1386725439|0

Note the "|0" at the end.

Keep this process running!

Now dump the collections you want. This is easy—just use the standard mongodump command:

mongodump -h 1.2.3.4:27017 -d database_a -c collection_1
mongodump -h 1.2.3.4:27017 -d database_a -c collection_2

This will write the two collections that we’re interested in into a folder at ./dump/database_a/

Remember that because we're writing all oplog operations to disk, we're covered for any writes that arrive while the dump is running.

Now we restore those dump files into our new database. Again, standard MongoDB administration:

mongorestore -d database_a ./dump/database_a

We’re almost there. You can now stop the oplog copying and write the files into the new server. The graceful way to stop the oplog tail program is to create a stop file –

touch stop.txt

– and it’ll flush all data to disk then exit. Note that in the output directory (oplog-files) the last_timestamp.txt file has been updated with the last record it read from the server—don’t delete this file, we’ll need it later. We can now restore those files to the new server:

./bin/run.sh com.wordnik.system.mongodb.ReplayUtil -i oplog-files -h localhost:27017

This will apply—in order—the files that we wrote during the oplog tailing process. Restoring these files will apply all operations up to the timestamp at which you stopped the file copy process. It can take a little while to run, depending on how much data you have.

Finally, we’re ready to complete the last step, which is syncing the remaining and future operations from the old master. First, you want to copy the last_timestamp.txt file into the working directory of the oss-tools:

cp oplog-files/last_timestamp.txt .

Then start replicating all operations from that timestamp onwards to the new server:

./bin/run.sh com.wordnik.system.mongodb.ReplicationUtil -h 1.2.3.4:27017 -H localhost:27017 -c database_a.collection_1,database_a.collection_2

Voila! You have now made an exact copy of data from one cluster to another. You should now run some queries on both of the databases to ensure replication is working correctly. You should also verify collection counts between instances, as shown below.
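
A quick way to verify the counts is to compare them from the mongo shell, for example:

// run against both the old cluster and the new one
db = db.getSiblingDB("database_a");
db.collection_1.count();
db.collection_2.count();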

Finally, we’re ready to update our application. You can do this by simply shutting down the services one at a time and updating them. You should verify that your application logic is writing to the new cluster. Once done, stop the replication utility! You’re now safe to drop the collections that you’ve moved to the new server.

Using replication utilities creates a number of opportunities for interesting deployments. If you look at the source code for the utilities on github:

https://github.com/wordnik/wordnik-oss/tree/master/modules/mongo-admin-utils

You'll see it's very easy to modify them. So you could, for instance, replicate production data to a development environment and anonymize email addresses in the process.

I hope that’s helpful and that it lets you continue to grow your MongoDB deployment without downtime!



The Reverb App, Powered by Wordnik API and Swagger

by Tony Tam  |  Posted December 13, 2013

You may have heard the news: Reverb recently released its Reverb app for iPad.

The Reverb app leverages Wordnik’s fundamental understanding of words and content, and combines it with state-of-the-art NLP and computing to build a personalized, dynamic Word Wall.

Understanding this content requires a complex pipeline of processing and literally hundreds of servers. To say APIs are important here would be a huge understatement—the Reverb app is powered by over 35 different service types and roughly 500 different API operations.

[Screenshot: a personalized Word Wall in the Reverb app]

The Word Wall above changes constantly based on thousands of factors, requiring orchestration across over a dozen service types. The same Wordnik APIs from our public API are used throughout our infrastructure.

So how does Reverb leverage Swagger in this complex deployment scenario? Let’s take a look.

[Screenshot: a definition view in the Reverb app]

We’ve seen the lovely documentation that accompanies REST APIs that are Swagger-enabled. Explaining the behavior of your API is critical and now a prerequisite to getting adoption from developers. APIs are everywhere and the easiest ones to understand and use have the best chance at adoption! For example, look at Callfire or the US Government.

But how else can an API description help? You may not know about code generation via Swagger-codegen. It’s a powerful way to use a machine-readable API description with language-specific templates to produce clients in a variety of languages. That same templating layer can be used to produce static documentation—including HTML, PDFs, and other formats. A machine-readable web is necessary to remove tedious tasks from developers, and expandable, open-source tools can enable truly magic levels of productivity.

Early on, Reverb moved to a microservice architecture, where chunks of logic are broken up into discrete servers. You can read tons about this style of development, which has been evangelized by Netflix over the last couple of years. In short, it's a very efficient way to scale a development team and allow for different server languages, frameworks, and styles. As long as the microservice exposes a consistent interface, the caller doesn't really care how it was built. It just expects a certain quality of service in making requests.

Consider a system like this:

[Diagram: a single monolithic REST service]

We split it up into microservices like this:

[Diagram: the same system split into discrete REST microservices]

We can then scale by adding instances of the heavily-loaded services, where our consumer (the upstream service) can load-balance across the target services automatically with a standard HTTP load balancer –

[Diagram: scaling the heavily loaded services behind a standard HTTP load balancer]

– or with a custom client that is aware of all the target servers:

[Diagram: scaling with a custom client that is aware of all the target servers]

So that's great. But remember that the clients need to communicate with the services, and many services means many clients, not to mention many services to understand. That's where Swagger fits in perfectly.

At Reverb, we have a Netflix-inspired build and deployment process where the “deployable unit” of software is a server. In our case, the server is an Amazon AMI which is versioned and available to launch at any instant. During that process, a compatible client library is built and made available for prospective service consumers.

Pausing for a minute and looking at this step from a higher vantage point, the process is effectively making the remote service a “library” which is invoked by the consumer. The consumer doesn’t need to know all the details about how it was built or deployed. It just needs to know the right way to communicate with it.

We have now effectively decoupled all implementation from the logic of the service—in fact, at Reverb, we run micro services on five different frameworks in three different languages. The consumers don’t care about those details! They call, get responses, and perform their logic.

You can expand this style of programming and deployment outside your infrastructure. The API world allows amazing things to be done, and just as simply as making an API call (which can translate to a simple function call), you can be invoking extremely complicated and valuable logic for your end-users.

And of course this process can be very simple or quite complicated. We've seen folks hosting the Swagger declarations on github (both public and private), which means the code generation process can be triggered with commit hooks. We've also seen people generate the Swagger declarations during the build process, fire up a server, and call it with the codegen. In the end, there are different workflows for both your language and your team. You have options.

As for the client, it's a very simple process to include it in a target project as a dependency. For Scala, via SBT:

[Screenshot: SBT dependency declaration for the generated client]

 For Java, via Maven:

[Screenshot: Maven dependency declaration for the generated client]

For Node.js:

[Screenshot: npm dependency declaration for the generated client]
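
For the Node.js case, for example, this boils down to an entry in package.json (the client name and version below are placeholders, since the generated client's name depends on your project):

"dependencies": {
  "petstore-api-client": "1.0.0"
}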

For Objective-C:

[Screenshot: Objective-C dependency setup for the generated client]

The point is, mature programming languages have simple facilities for bringing dependencies into your project. The versioning between the service and client helps keep things compatible, and if the server were to (gasp!) have a signature change, your Swagger-generated client will reflect it (hopefully that turns into an error before your service is launched). Even (especially) languages which are not type-safe can benefit from having a server description: does the structure match what you're expecting?

Of course now that we have split up a logical server into n discrete services, we have a much more complicated deployment process. The Swagger metadata can help simplify the deployment process immensely, making a near zero-touch model for clients to communicate with the right version of service. We’ll dig into how that is handled at Reverb in a follow-up post.

While Swagger does “Hello World” well, it’s definitely not designed as a hello-world only system. Swagger excels at complex and real-world deployments.



Wordnik API in the Wild: Spoonflower

by Angela Tung  |  Posted July 23, 2013

[Image: Spoonflower logo]

Reverb founder (and sewing enthusiast) Erin McKean has long been a fan of Spoonflower, a site that allows people to design, print and sell their own fabric, wallpaper, decals and gift wrap. When Erin found out Spoonflower was looking to use the Wordnik API to build their new tagging feature, it was like a match made in sartorial heaven.

Today we talked to Spoonflower web developer Stephanie Anton to find out more about their new tagging feature and how the site is using the Wordnik API.

How did you find out about the Wordnik API? Why did you choose it?

I went looking for a gem or API that could give me synonyms of a supplied word. I tried a couple other options, but Erin was so helpful with all my questions and really helped jump start the project. Once I found out that she had heard of Spoonflower and ordered fabric from us, I knew we had to use Wordnik.

Did you find the API first, and then decide to make the tagging feature, or did you already have the tagging feature in mind and find the Wordnik API?

The new tagging feature allows a user to add a tag and see a tag strength. Tag strength is a factor of how many designs sold on Spoonflower share this tag and how much the tag is searched for on Spoonflower.

When I was developing the feature, I saw some of the tags I was adding had a pretty low score and I wasn’t sure what other tag to use instead. That’s when I thought to add in a synonym generator to give people suggestions and started searching for something that would fill that need. We posted a screen shot and explanation of the changes on our blog.

What surprised you most about the API?

I was excited about the different kinds of synonyms that Wordnik had available such as related words and same context. Sometimes a direct synonym didn’t really work when describing a design, but the other options allowed us to bring back a better list to supply options to our customers.

What other APIs do you use? What frameworks/languages, etc.?

We use the GitHub and Zendesk APIs for back-end administrative needs, but this is the first API we have used for the benefit of our customers. We have used several different kinds of Ruby gems (Spoonflower is developed using Ruby on Rails) and plugins.

What advice do you have for others using the API?

Make sure to completely explore the developer website! I missed the ability to curl at first, but once I found that it was smooth sailing.

Final question: Swagger, best thing since sliced bread, or best thing ever?

I’d never used Swagger before, but it was incredibly helpful when I was figuring out exactly what I wanted to bring back.



The Fastest Way to Connect to an API with JavaScript

by Tony Tam  |  Posted July 18, 2013

[Image: Swagger logo]

Let’s cut to the chase. JavaScript is everywhere. APIs are exploding. And swagger-js is the fastest way to connect from JavaScript to your API, period.

Show me the code!

Let’s take two examples, starting with the browser.

[Code screenshot: connecting swagger-js to an API in the browser]
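
In the browser this looked roughly like the following (an approximation of the swagger.js API of the time, not the exact screenshot):

// assumes swagger.js is already included on the page
window.swagger = new SwaggerApi({
  url: "http://petstore.swagger.wordnik.com/api/api-docs.json",
  success: function () {
    if (swagger.ready === true) {
      // the client has built itself from the Resource Listing; call an operation by nickname
      swagger.apis.pet.getPetById({ petId: 1 }, function (res) {
        console.log(res);
      });
    }
  }
});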

Done!

Now consider a connection from nodejs to another API.

[Code screenshot: connecting to an API from Node.js]
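
From Node.js it was much the same; roughly (the module name and constructor here are assumptions about the then-current swagger-js packaging):

var swagger = require("swagger-client"); // module name is an assumption; adjust to your install

var client = new swagger.SwaggerApi({
  url: "http://petstore.swagger.wordnik.com/api/api-docs.json",
  success: function () {
    client.apis.pet.getPetById({ petId: 1 }, function (res) {
      console.log(res);
    });
  }
});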

That was easy.

Let's peel back the onion a bit. Initializing the Swagger client does a couple of things. First, it calls the API and reads the Resource Listing (the list of all APIs on the server). Next, it reads each API Declaration and "builds itself". Let's pause.

To understand the relationship between the Resource Listing and the API Declaration in Swagger, you can read the wiki. Or to put it simply, the Resource Listing is like a site map for the APIs on the system. The API Declaration is the description of each API itself. Maybe you don’t want to know about all APIs on the system? You can point swagger-js directly at one of the API Declarations when initializing it:

[Code screenshot: pointing swagger-js directly at a single API Declaration]

The scope of the client will be much smaller (just this API), but that may be all you need. That's up to the developer.

Next, what does "build itself" mean? Specifically, the JSON structure of the API (as described in the Swagger format) is read. The HTTP method, parameters, response types, content types, etc., are all described in the JSON, which tells swagger-js exactly what it needs in order to make calls. Don't believe it? Try a different API! How about the US Government's Federal Agency Directory?

[Code screenshot: connecting to the Federal Agency Directory API with a success callback]

Note this time we specified a success callback (because async JavaScript is the right thing to do). Of course, just logging to the console isn't very interesting, but this shows how the library works.

Now let's pretend that we don't know what this API even does. What are the operations I can call? What are the parameters? When using the Node REPL, it's easy to see (the same goes for the developer console in Chrome).

[Code screenshot: inspecting the available operations and their parameters in the REPL]

This tells us something important! There’s an identifier flag required to make this call. So we can call it, passing a (valid) identifier flag:

[Code screenshot: calling the operation with a valid identifier flag]

This is super powerful: not only do you know what APIs you can call, you can ask what the parameters are. Calling the API is also easy, and customizing our requests is a breeze.

Another example. Let's say that your resource isn't wide open; it requires some fancy authorization strategy. We'll cover OAuth in a follow-up post, but for now, let's assume you have to pass "some" header in every request. Luckily, this is built into swagger-js.

[Code screenshot: adding a header to every request]
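
The mechanism swagger-js shipped for this was an authorizations registry plus an ApiKeyAuthorization helper. Schematically, in the browser build (the header name and value are placeholders):

// register an authorization that adds a header to every outgoing request
authorizations.add("my-header-auth",
  new ApiKeyAuthorization("some-header", "secret-value", "header"));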

How about sending data? Surely there is more to the programmable web than just GET operations, right? Keep in mind that we never said to send a GET; the swagger-js client "knew" to call GET because of the Swagger JSON specification. But here's how you'd pass data to a server with the same convenience.

[Code screenshot: sending a POST with a string body]
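
Passing data looked about like this; the operation nickname comes from the Swagger JSON, and swagger-js already knows it is a POST (the pet payload is illustrative):

var pet = JSON.stringify({ id: 100, name: "dorothy", status: "available" });

// swagger-js reads from the spec that addPet is a POST; we just supply the body string
swagger.apis.pet.addPet({ body: pet }, function (res) {
  console.log(res);
});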

That was easy! Here, we passed the body as a string (swagger-js doesn't make assumptions about the content being objects). It does, however, default to a content type of application/json. How about XML? Remember XML?

[Code screenshot: sending the request as XML]

So this is all neat and fancy. But one could say, "this only lets me connect to a Swagger-enabled API." This is indeed correct. Swagger JSON is a technique to describe the API, and with it, you can have a client for "free". But what if your API isn't Swagger-enabled?

Remember that you can document virtually any API with Swagger, even if it’s not your own. So as quickly as you can hand-code a JavaScript client, you can build a Swagger JSON specification, which of course doesn’t need to live on the remote server. Like the folks at ApiHub have shown, you can document an API regardless of where it’s hosted.

Give swagger-js a shot (it’s FOSS, as you would hope):

https://github.com/wordnik/swagger-js

And you can read more about the spec here:

https://github.com/wordnik/swagger-core/wiki



Integrating in a New Company: How I Made Myself at Home

by Chiao-Yen Dyi  |  Posted July 1, 2013

[Photo: Hello Kitty]

On my first day at Reverb, I was terrified.

I had been at my previous company for six years and was very comfortably established. I knew who to go to for help, I had my lunch crew, and everyone knew about my Hello Kitty obsession.

But it took me a while to get there. It was a month before I started talking to people, six months before I began eating lunch with others, and a year before I revealed my love for the bow-wearing cat. Now, at Reverb, I had to start all over. This was when my coworker said to me: “You aren’t who you were six years ago.”

He was right. I didn’t have to wait so long to feel at home at my new job. I could do something about it. But what?

I started brainstorming ideas. What made my previous job so memorable? What made me happy and want to go to work every day? It was the sense of home – the amazing people and my Hello Kitty decorated desk. But how could I reproduce that in my new place?

Getting to know you

My first step was getting to know my coworkers. But how? Reverb is full of coffee addicts so I started by inviting people along on coffee runs. That gives you just enough time to talk to someone but not so much time that it becomes awkward. But that wasn’t enough because not everyone drank coffee.

That was when it occurred to me that the best way to most people’s hearts is through their stomachs. We have a pantry of yummy snacks, so it had to be homemade. I decided to bake some [ridiculously delicious - Ed.] chocolate chip cookie encrusted Oreos and bring them in.

The responses were great. My co-workers started to know who I was and I was able to have interactions with even more people.

Lunching with the ladies

One of the hardest things for women in a tech company is that there are so few of us. Since college, I’ve always been one of the only girls in the room and thought maybe other people felt the same way. I suggested we have a “Ladies of Reverb” lunch. Everyone loved it and we all bonded over some delicious Mediterranean food.

Decorating my desk

My next goal was to make my new place more homey. It occurred to me that when people look at my Sanrio-swag, it encourages them to start a conversation and get to know me. I thought, What if everyone at work decorated their desks too? It would give us all a chance to get to know each other and I would have an excuse to Hello Kitty-fy my work area.

[Keep your eye out for an upcoming post revealing how everyone, thanks to Chiao, Reverberated their work spaces. - Ed.]

Solving problems

On one of the coffee runs, we were discussing how the Peet’s card (a pre-paid card that everyone can use) goes missing all the time. “I bet if we put it in a Hello Kitty holder,” I said, “nobody would lose it.”

People laughed and said, “Sure, go ahead.” That night, using some old conference card holders, my Hello Kitty scrapbook kit, and my old Hello Kitty phone case, I made two Hello Kitty coffee cardholders. The next day, everyone was shocked. I couldn’t figure out if this was a shock of approval or dismay. Regardless, we now have two beautiful Hello Kitty cardholders and the cards rarely go missing.

Keep disrupting

Five months ago, the CTO at my previous job said: "You're disruptive. You see a problem, you figure out ways to solve it. Keep taking risks, keep trying different things. What's the worst that can happen? Someone says no? Who cares?"

This new-found confidence is the difference between me six years ago and me now. Luckily for me, I keep working with people who are open to all my crazy suggestions and willing to try different things, which encourages me to keep being disruptive and try different things as well. Out of every hundred ideas I have, at least one will be useful. I’m sure of it.

[Photo: CC BY 2.0 by Arthur John Picton]


Have Yourself a New-Fashioned Barn Raising

by Erin  |  Posted May 30, 2013

CC BY-NC photo by JoseJose

Last Friday at Reverb we had a barn-raising.

No, we didn’t raise an actual barn (although that would have been fun, too). Instead, we took a day that we had set aside for hackathon time, and decided to work on communally-voted-on features for our existing products.

Why did we do this instead of a free-play hack day? Well, although our past hack days led to some cool stuff, one or two days just wasn’t enough for each project to go from zero to fully-deployed, and Reverb devs really like to ship. Add that to everyone having one or more “pet” features that he or she would like to see implemented as quickly as possible in our current products, and the path seemed clear: why not take a day to work together on some big things that were 1) cool and 2) more shippable, but that weren’t on the immediate product plan?

Thanks to Ramesh and Ivan, our Directors of Engineering and all-around coordinators, we decided to give it a shot. Everyone suggested (on a wiki page) some barns to raise (both product features and infrastructure improvements). Once all the suggestions were in (more than 30!) we took some time in the daily standup to let each proposer explain why his or her barn should get raised. After a quick offline vote, we had eight candidate projects, and an exciting Friday ahead of us.

The immediate advantages of our barn-raising day were these:

  • lots of enthusiasm! the projects suggested were scratching real “itches” we all had for our products
  • by working from our current architecture, APIs, and framework, we were able to “stand on the shoulders of giants” — many of the projects were just taking some existing functionality a bit deeper
  • more experimentation than we get in a normal workflow: lots of UI variation and some internal evaluation tools
  • taking a day to do rapid prototyping of multiple new ideas will make our sprint planning more efficient down the road — we now have a better idea of how long some of these new features will take to finalize

There are some things you can do in a hackday that are less-suited to a barn-raising — for instance, trying out a completely new technology or framework is a bit harder when you have to integrate with existing products.

What will we do differently next time? Well, as in a real barn-raising, we’ll have the whole community (both devs and non-devs) suggest barns to be raised. We may also get a few more donuts.
[Photo: pink donuts]


Json4s Performance Round

by Katie  |  Posted May 7, 2013

 

[Image: JSON logo]

At Reverb, we're big users of Scala. To support the language we've been helping with libraries, including Json4s. Admittedly, Json4s has had some problems, so we've been working to update it.

With the last release of Json4s, we worked a lot on performance and on improving the correctness of the reflection engine. This started out as checking how viable Scala 2.10's reflection would be for our purposes. The short answer is that it wasn't all that well suited to what we wanted from it. Also, it doesn't work with Scala 2.9. ;)

Our new reflection layer now does a lot better at detecting deeply nested Scala value types, like an Int inside generic constructs. Overall this reflection is slower than the one originally inherited from lift-json, but because we can cache descriptors much more effectively, we avoid redundant work and it turns out to be faster than the pre-3.2.x series.

Our reflection layer is built around the idea that classes don't change at runtime, though there are some cases where it can't cache the descriptor (top-level generic types). It's still built on a combination of Java reflection, Scala manifests, and ScalaSig. We've also added experimental support for case classes defined inside other classes, but this probably needs more cooking.

The major focus of the 3.2 series was improving performance and binary compatibility. In the inherited implementation, `Formats` plays a big role but was implemented as a series of vals without explicit return types, so whenever something was added to those, it broke binary compatibility with the previous version. They are now defined as defs, and in most places we now use explicit return types on public methods.

Last but not least, the performance improvements. Overall we're seeing a performance increase of between 4x and 10x compared with the previous version of Json4s. We got there by using YourKit to identify the hotspots and drive timings down. In many cases, using a more appropriate data structure gave us better results.

Something I've learned from this exercise is that scala.collection.immutable.List is a really, really fast data structure: it gives better results than java.util.LinkedList if you use it for its prepend functionality. If you really have to append, or need to reach the last item in your collection in constant time, you want scala.collection.mutable.MutableList. Making more use of instance sharing for StringBuilders and java.io.Writer implementations was also important and often improved speed noticeably.

So how are we doing with respect to Jackson in terms of performance? We're now in the same range, about 2-3x slower, but a lot more correct.

Some numbers on the original state of affairs: https://github.com/json4s/json4s/blob/master/benchresults-lift.md#no-type-hints. And a relatively recently run benchmark over the new codebase: https://github.com/json4s/json4s/blob/master/benchresults-beans.md.

We at Reverb are committed to improving Json4s. Let us know if you have any suggestions. We're always happy to see pull requests from the community.



[Photo: Doris Day]

We admit it: mistakes were made.

The front end of Reverb for Publishers (RFP) is a JavaScript application that can display links relevant to the page you are viewing, concepts derived from the current page, or recent trending articles on your site. This is powered by Reverb APIs, which ingest the site content, analyze the text, provide recommendations, and monitor traffic. The bulk of our users are running WordPress-powered sites, but you can install this free plugin on any platform.

While RFP now performs well and increases page views by presenting related links on thousands of sites, we did do some dumb things during development. Here's what not to do when creating an application that's supposed to run in someone else's webpage.

Don’t require interaction

Initial versions of our application had fancy interactivity we thought was inviting. We tested several radically different designs which required rollover to bring up more details or had stuff following you around the screen on the sidebar.

But when we gathered usage statistics the results were clear: the simplest implementations always won. It’s better to display all the information cleanly right away and not expect to entice viewers to mouse over or interact in any way for more information.

Don’t expect users to use configuration options

We built a lot of optional functionality into our application which could only be accessed via administration screens. Tracking the usage of this showed us that it was extremely rare for people who installed our widget to even open the configuration options much less experiment with all the extra functionality we made available.

The lesson learned here was that we could please finicky customers by providing advanced features but with new installs you live and die by your default configuration. Most people will just evaluate your app on those features and uninstall if it doesn’t fulfill their expectations.

Don’t think your in-house testing has any relation to real user experiences

When we started monitoring the aggregate speed of our application on real users, the results were shocking. Our little widget was reporting a load time of several seconds which we never noticed during development or in-house testing on live sites. Instrumenting the code execution path exposed some serious bottlenecks we had created.

One assumption we made was that jQuery was probably installed on a lot of the sites so the first thing we did was check for a relatively recent version and only load it if necessary, delaying subsequent code execution until it was available. We saw this process adding over a full second to the load time in aggregate real user data. Any synchronous operation like this caused a visible delay in our aggregate load time data.
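
The check itself was simple; the cost was the synchronous wait it introduced. Something along these lines (the CDN URL is illustrative, and the real check also looked at the jQuery version):

function withJQuery(callback) {
  // use the host page's jQuery if it's already there...
  if (window.jQuery) {
    return callback(window.jQuery);
  }
  // ...otherwise fetch it, delaying the rest of our code until it has loaded
  var script = document.createElement("script");
  script.src = "//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js";
  script.onload = function () {
    callback(window.jQuery);
  };
  document.getElementsByTagName("head")[0].appendChild(script);
}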

We started by using Mixpanel for reporting which was a really nice way to be able to record and view the data in real time. We eventually moved to an in-house solution when we were able to reproduce the important Mixpanel features so we could report directly to our APIs and could remove the Mixpanel library from the client side code we were delivering to everybody.

Don’t depend on way too many libraries

The initial version of our application included several open source libraries which allowed us to develop features quickly. But when we got serious about performance, we found that removing all these libraries allowed us to reduce total app size by 70%. Admittedly we went overboard during early development including not only jQuery but Atmosphere, Backbone, Underscore, etc. We still use EJS style templates but we precompile them during our build process and then no longer need to include a templating library in the client side code.

In order to support our target browser set (Android 2.3+, iOS4+, IE8+, last 3 versions of FF and Chrome) after dropping jQuery we needed to change our selector syntax to use querySelector instead of $:

element = document.querySelector('.wordnik_discovery');

In order to support IE8, which still represents 5% of traffic, we needed to handle the lack of addEventListener:

addEventListener: function (el, eventType, handler) {
  if (el.addEventListener) {
    el.addEventListener(eventType, handler, false);
  } else if (el.attachEvent) {
    el.attachEvent('on' + eventType, handler);
  }
}

Don’t use defenseless CSS

Our design strategy with our widget was to let the host site's styles shine through, so we made sure not to override colors or type attributes. This worked pretty well and we were able to fit into a variety of designs smoothly. There were, of course, some styles that we needed to be able to control, and we were careful to tie important layout features, like the float layout of template classes, back to an ID reference for maximum specificity. But we weren't careful enough. Site owners will do some things that even in retrospect seem difficult to have anticipated:

a { white-space: pre-wrap; }

This seems like a pretty strange rule, but we actually encountered a site with this applied. Here is an example of the consequences. I overrode the pre-wrap with white-space: nowrap on the middle item:

Don’t expect users to install updates

Although WordPress users are presented with update notifications when they view their plugins and updating is a one click process, a significant percentage of our userbase has better things to do with their time. In order to actually fix problems we encountered we designed our widget to load the main application code from our servers asynchronously on domready so we were able to push updates to our entire install base.

Don’t deliver unnecessary templates

Fully JavaScript-powered apps are able to take advantage of beautiful templating engines, but they usually deliver all the possible templates to the client whether or not they get used. In our case we were delivering all the optional templates when we knew a site could only choose a single display option per widget.

To avoid delivering unnecessary JavaScript template code, we're currently moving toward having the API response provide HTML fragments instead of pure data in a JSON object. Shifting the creation of the HTML fragment to the server allows us to slim down the delivered JavaScript app code by removing all the previously client-side templates. This would have created some difficulties if we weren't able to use similar JS templating code server side, but we can execute Mustache in a variety of languages, and we get even better templating options by using Node.js to render EJS-style templates, which provide the flexibility I personally find most comfortable.
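
The server-side rendering itself is unremarkable; here is a sketch with Mustache under Node.js (the template and data are illustrative):

var Mustache = require("mustache");

// build the HTML fragment on the server and return it in the API response
var fragment = Mustache.render(
  '<li class="wordnik_related"><a href="{{url}}">{{title}}</a></li>',
  { url: "http://example.com/article", title: "A related article" }
);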

Don’t have insufficient JavaScript encapsulation

WordPress can be a bit of a bad neighborhood, front-end-wise, so it is a wonderful place to see how your code will fare in unusual situations. It is not uncommon to see JavaScript errors caused by WordPress plugins on live sites, whether from the plugin's front end code or a mishandled configuration. We were pretty careful to do no harm to the sites where we were installed, so we didn't add to the confusion. I'm not saying we were blameless, but we followed recommended practices to keep our JavaScript unobtrusive. Everything is wrapped in an immediately invoked function and our app is a single global object with all the required functions attached to it.

(function () {
  var WRC = window._WRC = { version: "0.6.5" };
  _WRC.extend(_WRC, {
    DOM: { /* ... */ }
  });
})();

[Photo Credit: CC BY 2.0 by velvettangerine]
