Sunday, December 20, 2015

Google App Engine cron jobs run as no user

Google App Engine (GAE) is a great platform to use when developing just about any new application.  I use it for a couple of personal projects, as well as for some fire department apps (dispatch and inventory management).

Today I'd like to talk about two things: users and cron jobs.

Users

One of the really convenient things about GAE is that it has built-in support for Google authentication (no surprise there).  This means that you can let GAE take care of your sign-in system (and trust me, handling the single-sign-on 3-way handshake isn't all that fun).  With GAE, you can easily go one of two routes:
  1. Certain paths in your web application are automatically required to have a logged-in user.  If someone tries to access a path without being logged in, GAE will redirect him to a sign-in window and then bring him back when he's done.
  2. Server-side, you can ask GAE for log-in and log-out URLs, and you can direct a user to these at any time to have him log in or out.
I personally prefer the second route because there's no strange redirection and all of my endpoints behave as I expect them to.  For example, if I have a JSON REST API, a user (who has not signed in yet) can access the endpoint and be given a normal 400-level error (in JSON) instead of being redirected to a Google sign-in page.  My REST clients much prefer this.

To see if a user is signed in, a Java Servlet can call the "getCurrentUser()" method of the "UserService" instance.  If the result is null, then no one is logged in.  Otherwise, you get a few helpful methods to tell you about the user:
  1. "getEmail()" (on the "User"); this returns the user's e-mail address.
  2. "getUserId()" (on the "User"); this returns a unique ID for the user (an opaque string).
  3. "isUserAdmin()" (on the "UserService"); this returns whether or not the user is an administrator of the GAE application.
For my apps, user authentication is whitelist style.  I check the "@" portion of the e-mail address to see if the user is in one of the domains that I care about (I typically build apps internal to an organization that uses Google's mail system), and I check the administrator status to grant administrator powers to my admins.

If someone tries to access a sensitive API endpoint without being logged in appropriately, I'll send back a 400-level error stating that the user is not signed in with an appropriate account.
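As a rough sketch of that whitelist check, here is a small helper that pulls the domain out of an e-mail address and looks it up in a set of allowed domains.  The class name "DomainWhitelist" and the domain "example.org" are made up for illustration; in a real Servlet, the e-mail address would come from "user.getEmail()".

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: decides whether an e-mail address belongs to a whitelisted domain. */
final class DomainWhitelist {
    /** The allowed domains; these are expected to be lower-case (for example, "example.org"). */
    private final Set<String> allowedDomains;

    DomainWhitelist(Set<String> allowedDomains) {
        this.allowedDomains = allowedDomains;
    }

    /** Returns true if the text after the last "@" is one of the allowed domains. */
    boolean isAllowed(String email) {
        if (email == null) {
            return false;
        }
        int at = email.lastIndexOf('@');
        if (at < 0 || at == email.length() - 1) {
            // No "@" at all, or nothing after it.
            return false;
        }
        // Domain names are case-insensitive, so normalize before comparing.
        String domain = email.substring(at + 1).toLowerCase(Locale.ROOT);
        return allowedDomains.contains(domain);
    }
}
```

If "isAllowed" returns false for the signed-in user, that's when I send back the 400-level error.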

Pretty easy stuff.

UserService userService = UserServiceFactory.getUserService();
User user = userService.getCurrentUser();
if( user == null ) {
   // There is no user logged in.
} else {
   // The user is logged in.
   System.out.println( "User is logged in: " + userService.isUserLoggedIn() );
   System.out.println( "User is administrator: " + userService.isUserAdmin() );
   System.out.println( "User:" );
   System.out.println( "   Auth Domain: " + user.getAuthDomain() );
   System.out.println( "   E-mail: " + user.getEmail() );
   System.out.println( "   Federated ID: " + user.getFederatedIdentity() );
   System.out.println( "   Nickname: " + user.getNickname() );
   System.out.println( "   User ID: " + user.getUserId() );
}

Warning: you cannot call "userService.isUserAdmin()" if the user is not already logged in.  If you try to, then it will throw an exception.

Cron Jobs

Another thing that GAE can do is schedule cron jobs.  Basically, these are page requests that are scheduled like normal Linux "cron" jobs.  So if you need to have some task performed regularly, create an endpoint for it and schedule a job to access that endpoint.

Cron jobs act as if an administrator is making the request, so they can access all paths with "admin" requirements.  However, you cannot check this using "userService.isUserAdmin()" because cron jobs do not run as any particular user.

To determine if a request is coming from the cron scheduler, you have to check for the "X-Appengine-Cron" header.  This header cannot be faked (except by admins); if you try to set this header, GAE will quietly remove it by the time that it gets to your Servlet.

To detect a cron job, you have to check for the header and make sure that its value is "true".

String cronHeader = request.getHeader("X-Appengine-Cron");
if( "true".equals( cronHeader ) ) {
   log.info( "Cron service is making this request." );
}

Ultimately, if I'm checking to see whether a user is allowed to access a particular section, I go through these steps:
  1. (Assume no access at all.)
  2. If no user is logged in:
    1. Is the "X-Appengine-Cron" header set to "true"?  If so, then allow administrative access.
  3. If a user is logged in:
    1. Is the user a GAE admin of the application?  If so, then allow administrative access.
    2. Is the user's e-mail domain in the whitelist of basic user access?  If so, then allow basic access.
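
Those steps can be sketched as a single function.  This is only an illustration, not my production code: the "AccessCheck" class, the "Access" enum, and the parameter names are made up, and the GAE calls are replaced with plain parameters so that the logic stands on its own.  In a real Servlet, the values would come from "userService.isUserLoggedIn()", "userService.isUserAdmin()" (only called when a user is logged in), "user.getEmail()", and "request.getHeader("X-Appengine-Cron")".

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the access decision described in the steps above. */
final class AccessCheck {
    /** Access levels, from least to most privileged. */
    enum Access { NONE, BASIC, ADMIN }

    private AccessCheck() {}

    static Access decide(boolean userLoggedIn,
                         boolean userIsAdmin,
                         String email,
                         String cronHeader,
                         Set<String> allowedDomains) {
        if (!userLoggedIn) {
            // No user: the only way in is the cron scheduler's header.
            return "true".equals(cronHeader) ? Access.ADMIN : Access.NONE;
        }
        if (userIsAdmin) {
            return Access.ADMIN;
        }
        if (email != null) {
            int at = email.lastIndexOf('@');
            if (at >= 0) {
                // Compare the lower-cased domain against the (lower-case) whitelist.
                String domain = email.substring(at + 1).toLowerCase(Locale.ROOT);
                if (allowedDomains.contains(domain)) {
                    return Access.BASIC;
                }
            }
        }
        // Step 1: assume no access at all.
        return Access.NONE;
    }
}
```

Note that the cron header is only consulted when no user is logged in, which matches the order of the steps.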

Monday, September 14, 2015

Fixing the default Ubuntu snmpd configuration

SNMP is super helpful for performance and health monitoring of any production equipment.  It's lightweight, easy to understand, and very resilient when Bad Things happen to the network.  If you're not monitoring your production equipment with SNMP, then you probably should look into that right away (we use SevOne NMS at work).

Getting "snmpd", the Linux SNMP daemon, up and running on Ubuntu is simply a matter of installing "snmpd":
sudo apt-get install snmpd;

Or is it?

Default configuration woes

Logging

By default, Ubuntu wants to log literally everything that "snmpd" does to syslog.  While I love the enthusiasm, this quickly leads to overflowing logs and the headache around them (plus it makes it impossible to find any event that's actually important).

How many times do you want to see messages like this in your logs?
Sep 11 16:48:23 your-server snmpd[19552]: Connection from UDP: [192.168.59.101]:49867->[10.129.11.219]
Sep 11 16:48:23 snmpd[19552]: last message repeated 199 times


The logging options are specified on the "snmpd" command line, and are thus configured in "/etc/default/snmpd".

The default logging settings are:
-Lsd

"-L" is for the logging options.  "s" is for syslog.  "d" is for the daemon facility.

What we want are these settings:
-LS 4 d

"-L" again is for logging options.  Capital "S" is for a priority-filtered syslog, with "4" being "warning-level or higher".  Again, "d" is for the daemon facility.

Port access

By default, Ubuntu locks down SNMP access to "localhost", so it's 100% useless from a monitoring perspective.  While I respect the security-mindedness displayed here, I need my boxes to actually respond to requests.

The access options are specified in the "snmpd.conf" file, which is located here: "/etc/snmp/snmpd.conf".

At the top of the file, there is a configuration item called "agentAddress".  By default, this limits requests to those originating locally.
agentAddress udp:127.0.0.1:161

There is usually a line following it that's commented out, and that's the one that we want.  Get rid of the line above and make sure that this one is enabled:
agentAddress udp:161,udp6:[::1]:161

This makes sure that any requests to port 161 (the standard SNMP port) will be allowed.

Permissions

Yes, yes, we should all be using SNMPv3's great user-based access-control mechanism, but for an internal-to-the-company server that can't be reached from the Internet, we can often afford to be lax.  And hey, I'm not stopping you from setting up SNMPv3 access control.  Go nuts.

Here, we're going to allow the community string of "public" to access everything about the box (but not make any changes at all).

The default configuration allows "public" to see some basic system information, but that's not good enough:
rocommunity public default -V systemonly

Get rid of that line and replace it with one that doesn't have the "systemonly" restriction:
rocommunity public

Restart "snmpd" and you'll be ready to respond to SNMP requests from your local management station.
sudo service snmpd restart;

Sunday, September 13, 2015

Google App Engine and Google Authentication

One of the things that I love about the year 2015 is "cloud computing"; in particular, I love that I can hand Google App Engine a Java project and it will host it, handle redundancy, auto-scale it, provide me a database, and do just about everything else that I could ever want.

However, security is still a major concern, and there are some applications that I work on where "leaking" some private data onto the public Internet is bad news.

User authentication

There are lots of ways to authenticate users at this point.  Years ago, we did everything ourselves (remember those days?).  Each application had its own user database with varying degrees of security, and passwords were being stolen and sold all the time.  Now we have things like OAuth, where we can pass off user authentication to other systems (which we have to trust), so we don't have to store anything more than an e-mail address.  If the OAuth server says that that e-mail address is legit and logged in, then it's legit and logged in.  This saves us time and money, since who wants to build and maintain a user authentication layer, anyway?

Google App Engine provides a pretty easy and awesome way to lock down your application to "signed in" users.  Here, "signed in" users can be one of two things:
  1. Any user on Earth with a Google account; or
  2. Any user in your Google Apps domain.
If you have a Google Apps domain, then with a couple of tweaks to both the domain and the app, Google will make sure that only those people in the domain can log in to the app.  Otherwise, you'll have to check for a particular domain from your logged-in users in your REST API calls (your app is RESTful, right?).  No biggie, either way.

(In this article, I'll be talking about the Java version of the Google App Engine SDK, so any files or calls will be related to that.)

To set up your app to force everything to require a logged-in user, you just need to update "web.xml" and add a "security-constraint" (obviously, you can play a lot with this):
<security-constraint>
   <web-resource-collection>
      <web-resource-name>site</web-resource-name>
      <url-pattern>/*</url-pattern>
   </web-resource-collection>
   <auth-constraint>
      <role-name>*</role-name>
   </auth-constraint>
</security-constraint>

The "url-pattern" of "/*" means "MATCH ALL THE URLS!", and the "role-name" of "*" means "all users signed in with a Google account".

I can't log out!

I recently put together an application for a group where I don't own the Google Apps domain, so I couldn't set up the sweet domain-level restriction at the app level.  Like I said, it's no big deal, so at the top of my API class was a check for "@myspecialdomain.org" in the user's e-mail address (remember, Google already verified that it was a Google account, so I just needed to make sure that it was in the right domain).

I asked the group to try out the application, and half loved it; the other half couldn't see any data.  It turned out those people logged in with their personal accounts (not their organization-specific accounts), and now were stuck in an application where they had no access to any data.

What I needed was a "switch accounts" button (like all of the other Google applications), or at least a way to invalidate the user's session so that they could log in again (and use their organization account, this time).  Google App Engine's built-in authentication system does some magic, so the usual Google authentication guides don't apply.  I tried all of the stuff that people online were talking about (invalidating sessions, deleting cookies, etc.), but none of it worked, and it seemed like a hack anyway.

It turns out that Google App Engine provides a "UserService" class to help deal with this kind of problem.  In particular, you may request a login URL and a logout URL.  Since my whole app requires the user to be logged in, all I needed to do was log the user out, and he would be immediately redirected to the Google login/pick-account screen again.

I added a simple API call to return some information about the current user (so I could show the "Logged in as ..." message), and that call returned an additional property for the logout URL (remember, this URL is generated by Google App Engine and isn't trivial).  When my UI loads, it makes an AJAX call for the user information, grabs the logout URL, and provides a "Logout" link for any users that need to log out.

Problem solved.

Here's how to get the logout URL:
UserService userService = UserServiceFactory.getUserService();
String logoutUrl = userService.createLogoutURL("/");

The "/" argument to "createLogoutURL" is where the user should be redirected once he logs out.  In this case, I'm sending him right back to the main screen of the app, where he will be asked to log in again.

Additional resources

  1. For more information on "web.xml" within Google App Engine, see this document.
  2. To learn how to set up authentication with Google App Engine, see this article.

Beware the AppScale firewall

If you haven't already looked into AppScale for your company's internal application needs, then you may want to spend some time looking into it.  In short, it's an open-source implementation of Google App Engine that can run on a "private" cloud.  Why is this cool?  Well, it lets me use all of the power and convenience of Google App Engine apps without having to put my app in the public cloud (high latency and billing), instead letting me use all the resources that I want in my company data center.

tl;dr: AppScale runs "iptables" on its own, so if you want to run an additional service (such as SNMP) on a node, then you'll have to configure AppScale to allow it.

My goal: SNMP monitoring

At work, I'm setting up an AppScale cluster to serve some internal applications.  The first thing that any server needs to do, once online, is provide performance statistics (via SNMP) to our performance management tool (in our case, we use SevOne NMS).

Typically, this is the world's easiest task:
  1. Install "snmpd" (apt-get install snmpd).
  2. Allow "snmpd" to respond to remote requests (duh).
  3. Fix Ubuntu's terribly verbose SNMP logging defaults.
Unfortunately, I fought with this for an hour because, no matter what I did, "snmpd" would not respond to any requests from my management tool, and nothing in the logs said why.  For reasons not perfectly clear to me, AppScale (for Ubuntu) runs on Ubuntu 12.04, so I thought that maybe there was some ancient security measure in place that I had forgotten about over the years.

I eventually stumbled on "iptables" as a culprit (it's never first on my list, but probably should be).  I ran "iptables -L -n" to list the current "iptables" rules, and sure enough, the system had some:
Chain INPUT (policy ACCEPT)
target     prot opt source               destination        
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0          
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:80
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:443
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:1080
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:1443
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:2812
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:5222
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:5555
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:6106
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpts:8080:8100
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpts:4380:4400
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:17443
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:4343
ACCEPT     all  --  10.129.11.219        0.0.0.0/0          
ACCEPT     all  --  10.129.11.221        0.0.0.0/0          
DROP       all  --  0.0.0.0/0            0.0.0.0/0          

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination        

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

However, no amount of "iptables" magic would allow me to get the system to respond to SNMP requests.  I'd add a rule, and it might respond for a few seconds, but after that, my SNMP requests would time out again.  My rule?  Gone.

The AppScale firewall

It turns out that AppScale maintains the "iptables" setup for the box, and any change that you make will quickly be reverted by it.  This doesn't, in principle, bother me except that it's not really documented anywhere.  The only real mention of it is the Performance Tuning document, and even then, it's just a quick mention in order to get HAProxy stats from the box.

The AppScale firewall configuration lives in "appscale/firewall.conf" (the default installation guide had me put the "appscale" directory in "/root", so the file was located in "/root/appscale/firewall.conf" for me).  Once I saw what was going on, it was simply a matter of making a quick change to the file and waiting a few seconds (AppScale periodically re-reads the file and makes any changes live).

To tell AppScale to allow SNMP requests, I simply had to add the following line after the other "iptables -A" lines:
iptables -A INPUT -p udp -m udp --dport 161 -j ACCEPT # SNMP

Problem solved.

Additional resources

  1. The current default version of AppScale's "firewall.conf" can be found here.

Wednesday, November 27, 2013

Chrome/Chromium, Roboto, and the horrible text nightmare

I began doing Android development a year or so ago, and the first thing that I noticed was that the official Google Android documentation looked absolutely horrible in Chromium (and Chrome too).  I tried updating fonts, disabling fonts, and installing new fonts, and none of that helped.  On other people's computers, the text looked fine.  In Firefox on my own computer, the text looked fine.  On my computer, in Chromium: total crap.

One of the directions that I researched was around the Roboto font (which apparently was released with Android Ice Cream Sandwich).  However, it turns out that the font itself has nothing to do with the problem.

Today, I finally figured it out (and fixed it!), and my Internet (and Android development) experience has been much, much better.

First, let me tell you that I've been running Kubuntu (the KDE Ubuntu variety) this whole time.  So, this covers Kubuntu 11.10, Kubuntu 12.04, Kubuntu 12.10, Kubuntu 13.04, and Kubuntu 13.10.  I had the problem with all of them.

Here's what it looks like:


To make a very long story short, I had to enable anti-aliasing for fonts for the entire system.  I typically set all of my graphics settings to the lowest possible levels in order to not have to see stupid animations or other things that slow down my experience just to be on par with Windows, and it looks like font anti-aliasing is one of the settings that got turned off.

So, a quick tweak to the drop down box here:

And everything now looks nice and pretty.  Here's the exact same page from before, this time with system-wide anti-aliasing enabled:


Problem solved.

Monday, October 28, 2013

Beware the void* trap

I'd like to take a little bit of time today to share a problem that took me numerous days to track down and solve because it was so obscure and stealthy.  At work, we use ZeroMQ to handle any inter-process communication, and our primary language is C++.  I had begun a fairly intense project to remove ZeroMQ from the intra-process communications of a particular daemon, leaving it only for when we need to cross process boundaries.  Basically, once the messages that we want get into our daemon, there is no reason to pass them off to threads and such by serializing them to byte arrays (since that's what a ZeroMQ message is) when they could be added (as objects) to lists, passed around to thread pools, etc.

As I was nearing the completion of the project, I found that all outbound communication from one of the sections of code seemed to be lost.  This was strange because the inbound communication was handled properly.  I had refactored both directions, so it was quite possible that I had messed something up.  However, no matter how many times I checked the logic, no matter how many log statements I made at each line, nothing looked amiss.  I struggled for days trying to understand why the "send()" calls seemed to be going nowhere, and the answer shocked me.

Here is a simplistic version of the code that I had.  Basically, this code receives (from somewhere else in the program) a list of messages to send to a ZeroMQ socket as a single atomic message.  This is accomplished by adding the "ZMQ_SNDMORE" ("send more") flag to the call.
void sendMessageToEndpoint1( std::list<zmq::message_t*>& messages ) {
   while( messages.size() > 0 ) {
      //! This is the ZeroMQ message that we're going to send.
      //! It has been prepared for us elsewhere.
      zmq::message_t* message = messages.front();
      // Remove the message from the list.
      // The size of this list is now the number of remaining messages to send.
      messages.pop_front();
      
      //! This contains the flags for the "send" operation.  The only flag that
      //! we're actually going to set is whether or not we have more messages
      //! coming, and that's for every message except the last one.
      int flags = messages.size() > 0 ? ZMQ_SNDMORE : 0;
      // Send the message to the endpoint.
      // Note that "endpoint1" is of type "zmq::socket_t*".
      endpoint1->send( message, flags );
      
      // We can now delete the message.
      delete message;
   }
}

I promise you that I had logged something before and after every statement, and everything was exactly as I expected it to be.  No exceptions were thrown.  There were no compiler warnings.  But no client on the other side of "endpoint1" ever got any of the messages that were being sent.  This drove me crazy.

The answer is that I was passing the wrong thing to "zmq::socket_t::send()".  Unlike the "recv()" ("receive") call, which takes a pointer to a "zmq::message_t", the "send()" call merely takes a reference to a "zmq::message_t".  Clearly this is a type error, so the compiler should have caught it.  But it didn't.

Here's the signature of the "send()" function.  Basically, it sends a message with some optional flags.
bool send( zmq::message_t& message, int flags = 0 );

I was sending it a "zmq::message_t*" and "int", so the compiler should have reported an error, since the type of the first argument was incorrect.  However, no error (or warning) was printed, and it compiled fine.  Even stranger, nothing bad happened when I called "send()".  Nothing good happened, either, but the code ran with the only strange symptom being that my "send()" calls seemed to do nothing.  The client on the other side of the ZeroMQ socket simply never received the message.

So, what's up with that?

It turns out that there is another "send()" function, one that takes three parameters.  It sends an arbitrary number of bytes with some optional flags.
size_t send( void* buffer, size_t length, int flags = 0 );

And there's the rub.

We've already established that the first "send()" function shouldn't work.  But here's a second "send()" function that does meet our signature.  As for the first parameter, a "zmq::message_t*" will be implicitly converted to "void*" in C++.  As for the second parameter, "int" will be implicitly converted to "size_t", which is just an unsigned integral type.  As for the third parameter, it is not specified, so it'll be set to its default of zero.

This second "send()" is clearly not what I wanted to use, but the compiler doesn't know that I thought that the function required a pointer to a message, not a reference to it.  Since "ZMQ_SNDMORE" is defined to be the number 2, this call to "send()" only attempts to transmit two bytes.  And because a "zmq::message_t" is certainly larger than two bytes (it is actually at least 32 bytes), the data to copy, from the second "send()" function's perspective, is always present.  This means that in addition to not getting any warnings or errors, I am also guaranteed to have this code never crash, since it will always send the first two bytes of the "zmq::message_t" structure.

Naturally, the fix was to send the dereferenced version of the message, and everything worked fine after that.  The moral of the story here is to watch out for implicit "void*" conversion.  And if you are making a library that accepts a byte buffer for reading/writing purposes, please set the type of that buffer to some byte-oriented type, such as "char*" or "uint8_t*".  These would require explicit casts, thus preventing accidental use as in my case.

Sunday, April 21, 2013

Don't let std::stringstream.str().c_str() happen to you

If you're coming to C++ from C, then you will quickly learn to love std::stringstream.  These things let you quickly build out a (possibly huge) string by just tacking on string literals or any other variables to the end.  It's useful for building on-the-fly SQL queries or constructing configuration or connection strings that involve numbers (such as port numbers), since you don't have to pre-define a buffer of known length and snprintf onto the end of it and check for length issues and such.

And you'll love std::string, since that'll save you countless "strdup" calls and null checks.  std::string also has some extra powers that make him way more useful than character buffer manipulation, but still less amazing (and heavy) than std::stringstream.

Anyway, you'll also quickly find that most functions don't accept std::stringstream or std::string; rather, they accept "const char*", which is fine by me.  In fact, std::string has a "c_str" function that will return just such a pointer, and std::stringstream has a "str" function that will return a std::string, so that's great, right?

Yes, absolutely.

But watch out!

But watch out for this:
//! This is our string stream; we're just going to put something
//! in it for fun.  This example will use a made-up connection string.
std::stringstream myStringStream;
// Set up the "connection string"; note for example purposes that
// these could be variables of any type; much like the thing at the
// end is an integer.
myStringStream << "tcp://" << "localhost" << ":" << 9001;

// Create a character pointer so that another function can use it.
const char* myPointer = myStringStream.str().c_str();

// Use that in some function.
someCStyleFunction( myPointer );

Did you see the problem?

When "myPointer" was created, it called "c_str" on a string that was only alive for the duration of that line.  After that line is over, the string that generated the character pointer has been deleted; thus, the pointer to its data is invalid.

Valgrind will complain about this as accessing some memory that was deleted by the destructor of std::string, but you'll probably be too confused to realize what's going on.

In a single-threaded situation, you might be able to slide by without noticing this because nothing has used that memory just yet.  However, in a multi-threaded situation, that memory is essentially instantly whisked up by other threads for other uses.  And now your character pointer points to random other data.  Welcome to what might be hours of troubleshooting and debugging.

The proper solution

Since "c_str" returns a pointer to the internal buffer of a std::string, and since you don't have to free it, it means that the character pointer that it returns is only valid for the lifetime of the std::string that it came from.

Our earlier example could be addressed in one of two ways.

The sneaky way

Don't let the std::string go out of scope by ending the line.  The "str" function's result, a std::string, won't be cleaned up until after "someCStyleFunction" completes, so this gets around the problem.  However, later expansion or debugging of the code might inadvertently re-introduce it.  Avoid this method.
//! This is our string stream; we're just going to put something
//! in it for fun.  This example will use a made-up connection string.
std::stringstream myStringStream;
// Set up the "connection string"; note for example purposes that
// these could be variables of any type; much like the thing at the
// end is an integer.
myStringStream << "tcp://" << "localhost" << ":" << 9001;

someCStyleFunction( myStringStream.str().c_str() );

The classy way

Actually store the std::string so that it goes out of scope when you want it to.  This makes it clear what the string is for and what its scope is.
//! This is our string stream; we're just going to put something
//! in it for fun.  This example will use a made-up connection string.
std::stringstream myStringStream;
// Set up the "connection string"; note for example purposes that
// these could be variables of any type; much like the thing at the
// end is an integer.
myStringStream << "tcp://" << "localhost" << ":" << 9001;

//! This is the string that we have created with our string stream.
std::string myString = myStringStream.str();

// Create a character pointer so that another function can use it.
const char* myPointer = myString.c_str();

// Use that in some function.
someCStyleFunction( myPointer );

Hopefully this might save you some time.  I spent hours researching the thread-safety of the STL for my current g++ version and was led down all kinds of wrong paths for a simple, simple scoping issue.