Wednesday, January 4, 2017

Performance of N-Body Simulation Styles in Scala

I spend a lot of time doing N-body simulations. My working simulation code is written in C++. The reason for that is simple: speed. It is serious number-crunching code, and even written in C++, runs can take months to complete. I would love to do these simulations in some other language that I enjoy writing code in more, but I'm not willing to give up all that much speed to make that happen. A number of years ago a student and I explored the performance of a Java version of the code, and we found that we could get performance within 10-20% of the C++ version, but doing so required doing a number of things in a way that made the Java almost as challenging to work with as the C++. These days, I'd really love to be able to convert my code to Scala, as I find it much more enjoyable to code in, but I'd have to be certain that I can get performance close to what I'm getting in C++. My guess is that in the end I'll wind up updating my C++ code to use features of C++11, 14, and 17, but I still like to explore the possibilities of using a different language.

There are lots of ways that you can write N-body simulations in Scala. Some of them take advantage of the expressivity of the language to produce shorter code that is easier to read and generally less bug prone. However, there are reasons to believe that those approaches might also tend to be slower. For a while I've been meaning to explore different approaches to see how much of an impact it really has on the speed of execution. While there aren't all that many people who do N-body simulations, my expectation is that the results of these performance tests will apply to many other numerically intensive applications.

Array of Mutable Classes

In C++, one represents particles with a struct and makes sequences of these using arrays, vectors, or valarrays. Everything is generally mutable. This first version of the code in Scala is written to mirror the C++ code. Of course, there is a huge difference in the memory model. When you make a sequence of a simple struct in C++, the entire memory for all elements is laid out in a single contiguous block. The way this code is written in Scala, the array is an array of references to instances of MutableBody and each of those contains two references to instances of MVect3. This matters because caching is vitally important on modern computers and contiguous memory accesses have much better caching performance. In this simple example, odds are that because all of the objects are allocated in order at the beginning of the program, they wind up being contiguous in the heap, but there are still multiple levels of references that have to be followed, which are likely to impose some performance cost.
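A minimal sketch of this style, assuming a MutableBody that holds position and velocity MVect3 fields plus a mass, and with a simple first-order integration step standing in for the real code, looks something like this:

object MutableSim {
  class MVect3(var x: Double, var y: Double, var z: Double)

  class MutableBody(val p: MVect3, val v: MVect3, val mass: Double)

  // One step of a simple first-order integrator; everything is mutated in place.
  def step(bodies: Array[MutableBody], dt: Double): Unit = {
    val acc = Array.fill(bodies.length)(new MVect3(0.0, 0.0, 0.0))
    for (i <- bodies.indices; j <- i + 1 until bodies.length) {
      val dx = bodies(j).p.x - bodies(i).p.x
      val dy = bodies(j).p.y - bodies(i).p.y
      val dz = bodies(j).p.z - bodies(i).p.z
      val dist = math.sqrt(dx * dx + dy * dy + dz * dz)
      val mag = 1.0 / (dist * dist * dist)
      acc(i).x += bodies(j).mass * mag * dx
      acc(i).y += bodies(j).mass * mag * dy
      acc(i).z += bodies(j).mass * mag * dz
      acc(j).x -= bodies(i).mass * mag * dx
      acc(j).y -= bodies(i).mass * mag * dy
      acc(j).z -= bodies(i).mass * mag * dz
    }
    for (i <- bodies.indices) {
      val b = bodies(i)
      b.v.x += acc(i).x * dt; b.v.y += acc(i).y * dt; b.v.z += acc(i).z * dt
      b.p.x += b.v.x * dt; b.p.y += b.v.y * dt; b.p.z += b.v.z * dt
    }
  }
}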

This code really does look similar to what the C++ code for this type of thing looks like, unless you happen to have a library set up for SIMD vector operations or expression templates for the 3-D vectors. Unfortunately, that isn't a good thing. Not only is this code verbose, but I can say from experience that it is bug prone. It is just too easy to get one of those x, y, or z fields typed wrong, and the resulting error is very difficult to diagnose and track down.

Array of Immutable Classes

Given the capabilities of Scala, a much "better" way to write this is to use immutable classes and add symbolic methods to a 3-D vector class so that you can simplify the math expressions and make them less error prone. Of course, this change means that you are doing a lot of memory allocation making small objects for all of the 3-D vectors.
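A sketch of this approach, assuming a case class Vect3 with symbolic methods and an ImmutableBody that holds position, velocity, and mass, might look like the following:

object ImmutableSim {
  case class Vect3(x: Double, y: Double, z: Double) {
    def +(v: Vect3) = Vect3(x + v.x, y + v.y, z + v.z)
    def -(v: Vect3) = Vect3(x - v.x, y - v.y, z - v.z)
    def *(c: Double) = Vect3(x * c, y * c, z * c)
    def magnitude = math.sqrt(x * x + y * y + z * z)
  }

  case class ImmutableBody(p: Vect3, v: Vect3, mass: Double)

  // The array is still mutated, but each element is replaced by a new
  // immutable instance instead of being modified in place.
  def step(bodies: Array[ImmutableBody], dt: Double): Unit = {
    val acc = Array.fill(bodies.length)(Vect3(0, 0, 0))
    for (i <- bodies.indices; j <- i + 1 until bodies.length) {
      val d = bodies(j).p - bodies(i).p
      val dist = d.magnitude
      val mag = 1.0 / (dist * dist * dist)
      acc(i) = acc(i) + d * (bodies(j).mass * mag)
      acc(j) = acc(j) - d * (bodies(i).mass * mag)
    }
    for (i <- bodies.indices) {
      val b = bodies(i)
      val newV = b.v + acc(i) * dt
      bodies(i) = ImmutableBody(b.p + newV * dt, newV, b.mass)
    }
  }
}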

Clearly this code is shorter and more readable. It is still using the same basic approach with a mutable array for storage, but the ability to use mathematical operators on vectors improves the code significantly. It is worth noting that in the mutable version you can make operations like +=, but given the math that is being done, they aren't all that useful unless you are making temporary values to store scaled versions and such.

Arrays with Value Class Wrapper

Now we are going to get a bit more interesting and produce a version that tries to give us contiguous memory without hurting program readability and without making lots of temporaries. Basically, this is an attempt to make a version of the code that has the best chance of being speed competitive with the C++ code while still being reasonably readable. The key part of this code is where we make a number of arrays of doubles and then define a value class. The value class is a newer feature of the Scala language that is stack allocated and has no more overhead than a primitive, while allowing you to have nice OO syntax. Instances of it are stored as just primitives on the stack and the methods wind up being static methods in the JVM, so there is no memory or speed overhead of object allocation. In order to use a value class, we have to put the whole thing in an object declaration and not a class declaration. Value classes can't go inside of other classes because then they would be non-static inner classes, and that would mean they have the overhead of keeping a reference to their enclosing instance. They could go at the top level, outside of all other declarations, but then they wouldn't have access to the arrays. Putting both the arrays and the value class in an object declaration gives us what we need here.
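A sketch of this arrangement, with the arrays and the value class together in one object (the names and the simple integrator here are placeholders for the real code), might look like this:

object ValueClassSim {
  val numBodies = 1000
  // Flat, contiguous arrays of primitives hold all of the particle data.
  // (Filling in initial positions, velocities, and masses is omitted here.)
  val px, py, pz = new Array[Double](numBodies)
  val vx, vy, vz = new Array[Double](numBodies)
  val mass = new Array[Double](numBodies)

  // A value class that wraps just an index: no allocation, but OO-style access.
  class Body(val i: Int) extends AnyVal {
    def x: Double = px(i)
    def x_=(d: Double): Unit = px(i) = d
    def y: Double = py(i)
    def y_=(d: Double): Unit = py(i) = d
    def z: Double = pz(i)
    def z_=(d: Double): Unit = pz(i) = d
    def m: Double = mass(i)
  }

  def step(dt: Double): Unit = {
    val ax, ay, az = new Array[Double](numBodies)
    var i = 0
    while (i < numBodies) {
      var j = i + 1
      while (j < numBodies) {
        val dx = px(j) - px(i); val dy = py(j) - py(i); val dz = pz(j) - pz(i)
        val dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        val mag = 1.0 / (dist * dist * dist)
        ax(i) += mass(j) * mag * dx; ay(i) += mass(j) * mag * dy; az(i) += mass(j) * mag * dz
        ax(j) -= mass(i) * mag * dx; ay(j) -= mass(i) * mag * dy; az(j) -= mass(i) * mag * dz
        j += 1
      }
      i += 1
    }
    i = 0
    while (i < numBodies) {
      val b = new Body(i)
      vx(i) += ax(i) * dt; vy(i) += ay(i) * dt; vz(i) += az(i) * dt
      b.x += vx(i) * dt; b.y += vy(i) * dt; b.z += vz(i) * dt
      i += 1
    }
  }
}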

Unfortunately, value classes are limited in Scala because they aren't really supported by the JVM at this time. The main limitation is that they can only have a single primitive value in them. There are plans to add value types to Java and the JVM. Once that happens, which probably will be Java 10, then you could make classes with three doubles in them that are stack allocated and arrays of them would be memory contiguous, giving you behavior much more similar to C++. Until that time, code like that above can give you a reasonable approximation.

Purely Functional Approaches

Of course, the real key to Scala is the functional aspects, something that none of the above examples took advantage of. The version that used immutable classes still mutated an array that stored references to those objects. I tried two different approaches to doing things functionally that are shown in the code below. The first approach is pretty much just like our immutable class version, but it uses a Vector and the updated method. Even without speed tests, we can feel confident that this is not an efficient approach, but it is pretty much required if we want to do the minimum number of distance calculations, where each pair is only visited once. The second version, called forSim2, does twice as many distance calculations, but this allows it to be written in a much more succinct manner because we produce a full sequence with the accelerations on each particle without ever mutating them. As it happens, this is the approach that we need to take if we parallelize the code to prevent race conditions. I will explore that more in a future blog post.
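A sketch of the two approaches, reusing the Vect3 and ImmutableBody definitions from the immutable sketch above (the method bodies are stand-ins for the real code), might look something like this:

object FunctionalSim {
  import ImmutableSim.{Vect3, ImmutableBody}

  // Approach 1: visit each pair once and thread the accelerations through a
  // Vector using updated.
  def forSim(bodies: Vector[ImmutableBody], dt: Double): Vector[ImmutableBody] = {
    val pairs = for (i <- bodies.indices; j <- i + 1 until bodies.length) yield (i, j)
    val acc = pairs.foldLeft(Vector.fill(bodies.length)(Vect3(0, 0, 0))) { case (a, (i, j)) =>
      val d = bodies(j).p - bodies(i).p
      val dist = d.magnitude
      val mag = 1.0 / (dist * dist * dist)
      a.updated(i, a(i) + d * (bodies(j).mass * mag))
       .updated(j, a(j) - d * (bodies(i).mass * mag))
    }
    bodies.indices.map { i =>
      val b = bodies(i)
      val newV = b.v + acc(i) * dt
      ImmutableBody(b.p + newV * dt, newV, b.mass)
    }.toVector
  }

  // Approach 2 (forSim2): fold over all the other bodies for each body. This
  // does twice as many distance calculations, but nothing is ever updated.
  def forSim2(bodies: Vector[ImmutableBody], dt: Double): Vector[ImmutableBody] =
    bodies.indices.toVector.map { i =>
      val b = bodies(i)
      val acc = bodies.indices.foldLeft(Vect3(0, 0, 0)) { (a, j) =>
        if (j == i) a else {
          val d = bodies(j).p - b.p
          val dist = d.magnitude
          a + d * (bodies(j).mass / (dist * dist * dist))
        }
      }
      val newV = b.v + acc * dt
      ImmutableBody(b.p + newV * dt, newV, b.mass)
    }
}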

This code uses the ImmutableBody and the immutable Vect3 class shown earlier. That makes the code more compact than the mutable versions. Those unfamiliar with fold methods might find forSim2 difficult to read, but in general a fold can be used to replace a loop that would accumulate a value in a mutable way. Note that like the previous code, this is put in an object, but for completely different reasons. This code isn't stateful, so there is no reason to have a class. The state is created by the initBodies method and then passed into the forSim methods, which return modified values. This code never stores the state locally. This is a significant difference from the earlier code and should probably be expected for a functional approach.

Timing Results

So how do these different versions perform? We tested all of them using Scala 2.12.1 both with no optimization flags and with -opt:l:classpath -opt:closure-invocations -opt:simplify-jumps -opt:copy-propagation -opt:redundant-casts -opt:box-unbox -opt:nullness-tracking -opt:inline-global. The timing results are shown in the following table.

Optimization   Style             Average Time [s]   rms [s]
None           Value Class       0.704              0.004
None           Mutable Class     0.904              0.006
None           Immutable Class   1.266              0.020
None           Functional 1      8.33               0.09
None           Functional 2      2.343              0.048
Optimized      Value Class       0.679              0.001
Optimized      Mutable Class     0.682              0.001
Optimized      Immutable Class   1.191              0.026
Optimized      Functional 1      8.55               0.11
Optimized      Functional 2      2.325              0.023

It is good to see that in many ways, my intuitions as to which versions would be fastest proved correct. Without optimization, the version that uses a value class and contiguous blocks of memory in arrays is clearly the fastest, followed by the mutable classes, then the immutable classes, then the purely functional approaches at the end. Note that the second functional approach takes nearly twice as long as the immutable class version. Since it is doing twice as many distance calculations, this indicates that the overhead of being functional is small and the speed difference is basically due to reorganizing the math. I will note again that this reorganization is also required for parallelization, so I would speculate that in parallelized versions, the second functional approach will be comparable to the immutable class version. The first functional approach is a clear loser. Calling updated so frequently on the Vector type is clearly inefficient. I will note though that I tried changing Vector to Array, and the resulting code was so slow for the first functional approach that I never saw it complete. On the other hand, using an array, even if it isn't mutated, seemed to slightly improve performance for the second functional approach.

It is also interesting to note that optimization benefited the mutable class version significantly, making it nearly comparable to the value class version.

Comparison to C++

We will close out this post with a comparison to C++. After all, we can't know if it is reasonable to use Scala for numerical work if we don't explore how it compares. I wrote a version of the code in C++ that was basically an adaptation of the mutable class version using a std::valarray for storage. I compiled it using g++ with the -Ofast compiler flag. I also made a separate application in Scala that only ran the value class version, and had both the C++ and the Scala time 1000 steps of integration. The result was that the C++ generally completed in 6.22 seconds and the Scala completed in 6.88 seconds. Both were consistent across multiple runs. So the final result is a 10% advantage for C++. That isn't a huge advantage, and my guess is that most applications would be happy to live with it for the advantages they get in coding in Scala over C++ in terms of developer productivity. Granted, that was with g++; I expect that using the Intel compiler, or some other highly optimizing (and not free or open source) compiler, would produce even faster executables for C++.

For me, the bottom line is that if you hear people say that Scala (or Java and the JVM) is slow, there are good odds that they haven't really tested it against other languages/platforms. For straight number crunching applications, feel free to point them to this blog post to show them that the difference in speed really doesn't have to be all that large. My guess is that using macros, it would also be possible to create Scala code that has the readability of the immutable Vect3 class with symbolic methods, but without the speed cost of allocating lots of temporary values. This would be akin to expression templates, which serve that purpose in C++. Maybe I'll take a stab at creating that code and write a blog post about it as well. I also look forward to the Java 10 JVM adding value types, as I believe they have the potential to significantly improve the performance of numerical code in all JVM languages.

Other Code

Here is the code for C++ and the timing in Scala in case anyone wants to be able to run everything.

C++ Code

Scala Timing

Sunday, January 1, 2017

Performance of Scala for Loops

One of the interesting features of the Scala programming language is that the for loop is basically syntactic sugar for calls to collection methods like foreach, map, filter, and flatMap. They are often called for-comprehensions and they have a lot more flexibility than what you get from a standard for loop in most languages or for-each loops in the languages that have them. The fact that they get compiled to calls to these other methods also means that Scala for-loops can be used on things that aren't collections like Options and Futures.
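For example, a for comprehension with a guard is roughly equivalent to a chain of calls to these methods (the compiler actually uses withFilter rather than filter for the guard):

object ForDesugar {
  val a = for (x <- List(1, 2, 3); if x % 2 == 1; y <- List(10, 20)) yield x * y
  val b = List(1, 2, 3).withFilter(x => x % 2 == 1).flatMap(x => List(10, 20).map(y => x * y))
  // Both a and b are List(10, 20, 30, 60).
}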

Unfortunately, this flexibility has often come at a price. As I noted in an earlier post, Loop Performance and Local Variables, using a for loop produced code that was significantly slower than using while loops. That was back in 2013, and I was using Scala 2.10. With the release of Scala 2.12, I really wanted to see if this was still the case. The primary change in 2.12 was to make Scala compile to Java 8 bytecode using all the new features of the Java 8 JVM. One of the main additions in Java 8 was lambda expressions. Since the foreach, map, filter, and flatMap are higher order methods that take functions, compiling to Java 8 lambdas seemed like it might improve performance. This post looks at testing that hypothesis.

Previous Code

We start by repeating the previous test that was written to look at where variables are declared. I took the same code as used before and simply ran it with three different versions of Scala, both with and without optimization. The following table shows the results.

Version   Optimization   Loop    Var Loc   Average Time [ns]   Time Deviation [ns]
2.10.6    None           for     In        3.410               0.141
2.10.6    None           for     Out       3.700               0.293
2.10.6    None           while   In        2.823               0.270
2.10.6    None           while   Out       2.861               0.311
2.10.6    -optimize      for     In        3.712               0.588
2.10.6    -optimize      for     Out       3.856               0.653
2.10.6    -optimize      while   In        2.895               0.252
2.10.6    -optimize      while   Out       2.881               0.279
2.11.8    None           for     In        3.221               0.351
2.11.8    None           for     Out       3.666               0.397
2.11.8    None           while   In        3.023               0.510
2.11.8    None           while   Out       2.848               0.243
2.11.8    -optimize      for     In        3.389               0.402
2.11.8    -optimize      for     Out       3.014               0.120
2.11.8    -optimize      while   In        2.858               0.287
2.11.8    -optimize      while   Out       2.848               0.254
2.12.1    None           for     In        3.154               0.354
2.12.1    None           for     Out       3.456               0.407
2.12.1    None           while   In        2.765               0.167
2.12.1    None           while   Out       2.751               0.139
2.12.1    See Below      for     In        3.261               0.321
2.12.1    See Below      for     Out       3.279               0.549
2.12.1    See Below      while   In        3.141               0.207
2.12.1    See Below      while   Out       3.164               0.264

All runs were done using Java 1.8.0_111 for the runtime. For 2.12, they added a lot of different optimization flags to the compiler. The values used for the timings in this post are -opt:l:classpath -opt:closure-invocations -opt:simplify-jumps -opt:copy-propagation -opt:redundant-casts -opt:box-unbox -opt:nullness-tracking -opt:inline-global. There is enough scatter here that it is hard to draw really strong conclusions. It appears that the while loop still has an advantage, but the percent difference in speed seems smaller across all the "current" compilers than what had been seen back in 2013. I put current in quotes because while 2.10 is older, 2.10.6 is a fairly recent release and the Scala team backports things when it makes sense, so there are good odds that 2.10.6 is incorporating optimizations of the for loop that weren't present in the earlier version of 2.10 I had been using in 2013.

N-Body Simulation

The problem of building multiplication tables was rather contrived as a simple example that worked well for testing the declaration locations of variables. If people are going to actually make their code uglier putting in while loops in place of for loops, it would be good to see if it matters on a somewhat more realistic example. For this I decided to do a simple first-order numerical integrator of bodies using gravity. This is a problem that involves a lot of number crunching in loops and which happens to be at least related to things that I write for my research, so it seemed like a good place to test performance.

The code used for this test is shown below. For the purposes of this post, what really matters is the forSim and whileSim methods. These have multiple loops including one area where they are triply nested. I store all the values in mutable arrays and then use a value class to access the elements in an object-oriented way. I chose this approach as there is minimal overhead from object allocation, potentially better cache performance, and I have a feeling that it is faster than other approaches, though testing that is a matter for later posts.
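A sketch of the two methods, with the particle data in flat arrays and the value-class wrapper left out for brevity (the initial conditions here are arbitrary placeholders), might look like this:

object LoopStyles {
  val n = 1000
  val px, py, pz, vx, vy, vz, ax, ay, az = new Array[Double](n)
  val mass = Array.fill(n)(1.0 / n)
  // Arbitrary initial positions so the sketch actually runs.
  for (i <- 0 until n) {
    px(i) = util.Random.nextDouble(); py(i) = util.Random.nextDouble(); pz(i) = util.Random.nextDouble()
  }

  // for-comprehension version: triply nested over steps, i, and j.
  def forSim(steps: Int, dt: Double): Unit = {
    for (_ <- 0 until steps) {
      for (i <- 0 until n) { ax(i) = 0.0; ay(i) = 0.0; az(i) = 0.0 }
      for (i <- 0 until n; j <- i + 1 until n) {
        val dx = px(j) - px(i); val dy = py(j) - py(i); val dz = pz(j) - pz(i)
        val dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        val mag = 1.0 / (dist * dist * dist)
        ax(i) += mass(j) * mag * dx; ay(i) += mass(j) * mag * dy; az(i) += mass(j) * mag * dz
        ax(j) -= mass(i) * mag * dx; ay(j) -= mass(i) * mag * dy; az(j) -= mass(i) * mag * dz
      }
      for (i <- 0 until n) {
        vx(i) += ax(i) * dt; vy(i) += ay(i) * dt; vz(i) += az(i) * dt
        px(i) += vx(i) * dt; py(i) += vy(i) * dt; pz(i) += vz(i) * dt
      }
    }
  }

  // while-loop version of exactly the same computation.
  def whileSim(steps: Int, dt: Double): Unit = {
    var s = 0
    while (s < steps) {
      var i = 0
      while (i < n) { ax(i) = 0.0; ay(i) = 0.0; az(i) = 0.0; i += 1 }
      i = 0
      while (i < n) {
        var j = i + 1
        while (j < n) {
          val dx = px(j) - px(i); val dy = py(j) - py(i); val dz = pz(j) - pz(i)
          val dist = math.sqrt(dx * dx + dy * dy + dz * dz)
          val mag = 1.0 / (dist * dist * dist)
          ax(i) += mass(j) * mag * dx; ay(i) += mass(j) * mag * dy; az(i) += mass(j) * mag * dz
          ax(j) -= mass(i) * mag * dx; ay(j) -= mass(i) * mag * dy; az(j) -= mass(i) * mag * dz
          j += 1
        }
        i += 1
      }
      i = 0
      while (i < n) {
        vx(i) += ax(i) * dt; vy(i) += ay(i) * dt; vz(i) += az(i) * dt
        px(i) += vx(i) * dt; py(i) += vy(i) * dt; pz(i) += vz(i) * dt
        i += 1
      }
      s += 1
    }
  }
}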

Here is a table giving the timing results for this code against the same three compilers.

Version   Optimization   Loop    Average Time [s]   Time Deviation [s]
2.10.6    None           for     0.666              0.002
2.10.6    None           while   0.667              0.029
2.10.6    -optimize      for     0.660              0.012
2.10.6    -optimize      while   0.658              0.001
2.11.8    None           for     0.716              0.009
2.11.8    None           while   0.669              0.007
2.11.8    -optimize      for     0.675              0.006
2.11.8    -optimize      while   0.656              0.001
2.12.1    None           for     0.699              0.003
2.12.1    None           while   0.683              0.001
2.12.1    See Above      for     0.676              0.001
2.12.1    See Above      while   0.683              0.003

Note that for this code, there is very little difference between a for loop and a while loop. These tests were very stable in their timing results and while building up the tests I ran them multiple times and found little variation. It really doesn't appear that 2.12 did anything to help with the difference between for and while loops in either of these examples, but in this one, there really isn't a significant difference in any version. What does that mean? As with so many things dealing with performance, you should write clean code that runs first. Once you have that, and you are tweaking things for performance, you might consider changing your inner-most loops from for loops to while loops, but it is quite possible that it won't matter.

I also feel compelled to note that the for loop version is much easier to parallelize than the while loop version because of the ease of switching to a parallel collection. I haven't done it here as one must make some alterations to prevent race conditions, but that is something that I might also explore in a future post.

Variables in for Comprehensions

There is one caveat to the conclusion that for loops don't hurt performance in the larger example. In the forSim method shown above, the variables pi and pj are both declared inside the innermost loop. The for comprehension in Scala allows variables to be declared in the "header" section of the loop. When I first wrote this code, I declared pi between the two generators and pj right after the generator for j. One wouldn't think that this would matter much, but it did. Having the declarations up in the header instead of the body caused this code to run roughly 2.5-3x slower than when they were put as the first lines in the body of the for loop. I don't have an explanation for this behavior, and I haven't explored the generated bytecode to see what might be causing it. However, based on this result, it is probably worth not using the variable declaration capabilities of for comprehensions if performance is critical to your application.
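A sketch of the two placements, using a hypothetical value class Body over a flat array, might look like this; the only difference is where pi and pj are declared:

object HeaderVsBody {
  val n = 1000
  val px = Array.tabulate(n)(_.toDouble)

  class Body(val i: Int) extends AnyVal { def x: Double = px(i) }

  // Declarations in the for-comprehension header (the slower form in my tests).
  def headerDecls(): Double = {
    var sum = 0.0
    for (i <- 0 until n; pi = new Body(i); j <- i + 1 until n; pj = new Body(j)) {
      sum += pi.x - pj.x
    }
    sum
  }

  // The same loop with the declarations moved to the top of the body.
  def bodyDecls(): Double = {
    var sum = 0.0
    for (i <- 0 until n; j <- i + 1 until n) {
      val pi = new Body(i)
      val pj = new Body(j)
      sum += pi.x - pj.x
    }
    sum
  }
}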

Friday, July 1, 2016

The Value of pass-by-name Semantics

One of the nice features of the Scala language is the ability to use pass-by-name semantics. This isn't something that was invented for Scala. I remember first encountering it while studying for the CS subject GRE back in the mid-1990s. However, it is a feature that isn't very common in modern languages. For those who aren't familiar with this passing style, an argument that is passed-by-name is not evaluated when the function/method is called. Instead, a "thunk" is passed into the function/method. This "thunk" is basically a closure over the code with no arguments. It is evaluated each time that the value is used in the function/method. If it helps, you could think of it as being a lazy argument that doesn't cache its value, so it is re-evaluated each time. This might not sound like much, but it enables a lot of the powerful features of the Scala language, especially in regards to creating DSLs.
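As a quick sketch of the mechanism, a by-name parameter in Scala is marked with => and is re-evaluated at each use in the body:

object ByNameDemo {
  // body is passed by name: it is not evaluated at the call site, and it runs
  // again every time it is referenced inside the method.
  def twice(body: => Unit): Unit = {
    body
    body
  }

  def main(args: Array[String]): Unit = {
    var n = 0
    twice { n += 1; println(s"evaluated, n = $n") }
    // Prints the message twice; a by-value argument would have been evaluated
    // exactly once, before the call.
  }
}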

Today I was asked about an ideal way of doing something in Java. The problem involves reading through a file and adding values to a Map. After some testing, it was discovered that these files include duplicate keys. Since the Java libraries don't include a multimap, the solution is to use a Map<String, List<Integer>>. The person working on this code was trying to find a succinct way of writing the code that will add values to this Map. Of course, the first time a key is found, you have to create a new List and add the one value. Subsequent matches on that key need to add the value to the List that is already there. Here is the straightforward approach to doing this.

  if (map.containsKey(key)) {
    map.get(key).add(value);
  } else {
    List<Integer> list = new ArrayList<>();
    list.add(value);
    map.put(key, list);
  }

It's not really all that long or complex, but the question was, can it be improved upon, especially since Java 8 added some new, more expressive, functional features. If I were doing this in Scala, I would use get on the Map to get back an Option and work with that, or I could use getOrElse on the Map directly. It happens that Java 8 added Optional as well as a getOrDefault method on their Map type. As a result, we can write the following code in Java 8.

  map.put(key, map.getOrDefault(key, new ArrayList<Integer>()));
  map.get(key).add(value);

This is perfectly fine code in many ways. It compiles. It runs. It does what you are expecting. People familiar with this library will even find it easy to read. However, there is a detail in here that isn't immediately obvious that makes this code rather wasteful with memory. (That really just means that your garbage collector is going to be invoked more often than it really needs to be.) The problem is that arguments in Java are always evaluated and then passed by value (for object types, that value is a reference). The key is that the new ArrayList<Integer>() is going to be evaluated every time this code is executed, even if the key is already in the Map. On the other hand, in Scala, the getOrElse method passes the second argument by name. As a result, it will never be evaluated unless needed. So when you do something like this, you are not doing extra work to calculate a default unless you actually need it. Granted, the library writers for Java 8 could have opted to make the second argument of getOrDefault a function or something like it that could be passed as a lambda expression. However, they didn't. Probably because doing so would have made the calling syntax uglier. Pass-by-name does not have that drawback.
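For comparison, a sketch of the Scala side, using a mutable Map and ArrayBuffer as stand-ins for the real types, shows how the by-name default of getOrElse only runs when the key is missing:

import scala.collection.mutable

object MultiMapDemo {
  val map = mutable.Map[String, mutable.ArrayBuffer[Int]]()

  def addValue(key: String, value: Int): Unit = {
    // The second argument of getOrElse is by-name, so this block (and the
    // buffer allocation inside it) only runs when the key is absent.
    val list = map.getOrElse(key, {
      val fresh = mutable.ArrayBuffer[Int]()
      map(key) = fresh
      fresh
    })
    list += value
  }
}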

Of course, pass-by-name has a lot of other handy uses in Scala. It is probably the most important feature when it comes to Scala's ability to implement Domain Specific Languages and create libraries that function like language extensions. This was the first time, though, that I can recall really feeling the lack of this feature in some other language. Other newer languages that I have looked at a bit, Rust and Kotlin come to mind, seem to have skipped over this feature as well. Perhaps it is time other languages began to bring it back into standard usage.

Friday, February 12, 2016

Are College Students Customers?

I'm writing this blog post in response to Students are not customers; Are they?. I actually have a fairly simple view of this issue. The student sitting in front of me today is not my customer. The person that student will be in 5-50 years is my customer. From the student perspective, education is an investment in your future self. From the college perspective, alumni are the real test of whether or not the college is doing what it should. Alumni who are successful in life benefit the University in many ways, going well beyond their financial support.

Taking this viewpoint, my goal is not to make my current students happy. My goal is to equip them for later success in life. I want to make certain that when they graduate, they have the skills to enter the workforce and do what they are hoping to do. That isn't going to happen if I coddle them and give everyone good grades. It means that I need to challenge them, but also support them. I get to spend time with my students and come to know them. I can gauge how much assistance different students need to master the material. I need to impart to them the knowledge and skills that I think will put them ahead of others in the workforce when they first enter. I also need to try to develop their ability to teach themselves, so that they will continue to learn throughout their careers and keep themselves relevant.

I appreciate the small, liberal arts setting that I teach in, because I think that it really allows me to do what is best for my students. It also forces my students to pick up the other skills that they will need for their future, ones that I'm probably not the ideal person to teach. I stress to prospective students, and occasionally to current students, how important communication skills and critical thinking are going to be in their careers. I focus mostly on teaching them how to break down problems and communicate the solution method to a computer to make the computer solve the problems. The successful alumnus isn't going to just sit in front of a computer and code all day for 40 years after graduation. They should advance to being team leads, managers, even vice presidents and CTOs. Those jobs also require knowing how to communicate with humans in both written and spoken form. Many of my current students don't like that they have to take a bunch of courses outside of CS, but the people that I think of as our customers, the future selves of these students, inevitably come to appreciate the breadth of their education as much as they appreciate the rigors of the technical aspects of my personal instruction.

As a final benefit of this approach to "students as customers", I get to tell students who do complain about courses being hard (with a hint of sarcasm) that I am just looking out for their future selves. If I have to tear down the current version to build a better one, then so be it. ;)

Monday, December 7, 2015

A Recursively Enumerable Grammar for Binary Addition

As part of my 2nd semester CS course, I introduce students to the Chomsky hierarchy of grammars. We only go into details about regular grammars and the main learning objective is regular expressions, but I talk about context-free, context-sensitive, and recursively enumerable as well. Part of the reason is to show that there are limits to what you can do with grammars at each level, at least up to recursively enumerable, RE. I tell them that RE grammars are Turing complete, so they can compute anything that any computer could. I often use the example of a factorial because it is a fairly simple function that they are used to coding that doesn't seem at all like it would be doable with grammars.

This year I had a student ask about that on the questions they give me at the end of each class. He said that he couldn't see how a grammar could calculate factorial. I went to bed that night trying to think of how to illustrate that a grammar could do math. I knew that I wasn't likely to write factorial, but I figured that starting with addition might be enough to show that you can build up more complex math. The next question was how to represent the numbers. Addition in strings that use a unary format is simple, so I decided to go to binary. As an assignment in the 1st semester class I have students implement addition, multiplication, and exponentiation using only recursion, successor, and predecessor. I decided to take the approach here that I would do for addition there. We say that a+b = (a+1)+(b-1) and have a base case when b is 0. It turns out that you can implement successor and predecessor on binary strings fairly easily.
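As a sketch of the idea in Scala rather than as a grammar, the recursion that the productions mimic is just this:

object SuccPredAdd {
  def succ(n: Int): Int = n + 1
  def pred(n: Int): Int = n - 1

  // a + b = (a + 1) + (b - 1), stopping when b reaches 0.
  def add(a: Int, b: Int): Int =
    if (b == 0) a else add(succ(a), pred(b))

  def main(args: Array[String]): Unit =
    println(add(5, 6)) // 11, matching the trace of the grammar below
}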

I put my code for doing this in a GitHub Gist that you can see at the link below. There are 14 productions in the grammar. In addition to the '0', '1', and '+' that I need to represent binary addition, I bracket the numbers with 'N' so that I can tell when I am at the beginning or end of a number. There are a few productions that are used to "clean up" at the end so that we get a single number in the string. There are only three productions that are used to create the successor functionality, which is done by putting in an 'S' character. The harder part is getting predecessor, because the 'P' character needs to be inserted at the right end of the second number, and when it is done, we have to have that information sent back to the plus sign to say that the current operation is done. So while there are only two productions labeled as predecessor, the left and right message productions are also part of making the predecessor work.

Gist of the Code

To see how this works, I ran it and entered the numbers 5 and 6. You can see the output of that here. The first line shows that we are adding 5 and 6. Each line below that is the application of a single production. The ones that don't have a ' after the plus sign are completed steps in the addition algorithm. The last line is printing the final result, which you can see is eleven (1011) written in binary.
N101N+N110N
N101SN+'NL110N
N101SN+'N1L10N
N10S0N+'N1L10N
N10S0N+'N11L0N
N10S0N+'N110LN
N10S0N+'N110PN
N110N+'N110PN
N110N+'N11P1N
N110N+'N1R01N
N110N+'NR101N
N110N+N101N
N110SN+'NL101N
N110SN+'N1L01N
N111N+'N1L01N
N111N+'N10L1N
N111N+'N101LN
N111N+'N101PN
N111N+'N10R0N
N111N+'N1R00N
N111N+'NR100N
N111N+N100N
N111SN+'NL100N
N111SN+'N1L00N
N111SN+'N10L0N
N11S0N+'N10L0N
N11S0N+'N100LN
N1S00N+'N100LN
N1S00N+'N100PN
NS000N+'N100PN
NS000N+'N10P1N
N1000N+'N10P1N
N1000N+'N1P11N
N1000N+'NR011N
N1000N+N011N
N1000N+N11N
N1000SN+'NL11N
N1000SN+'N1L1N
N1001N+'N1L1N
N1001N+'N11LN
N1001N+'N11PN
N1001N+'N1R0N
N1001N+'NR10N
N1001N+N10N
N1001SN+'NL10N
N1001SN+'N1L0N
N1001SN+'N10LN
N1001SN+'N10PN
N100S0N+'N10PN
N100S0N+'N1P1N
N100S0N+'NR01N
N1010N+'NR01N
N1010N+N01N
N1010N+N1N
N1010SN+'NL1N
N1011N+'NL1N
N1011N+'N1LN
N1011N+'N1PN
N1011N+'NR0N
N1011N+N0N
N1011N+NN
N1011N
N1011N

Wednesday, February 25, 2015

Straw at Saturn's Keeler Gap

The term "straw", in relation to planetary rings, was coined during the orbital insertion of Cassini at Saturn when images like the following, available from the JPL site, were taken with high resolution because the probe was so close to Saturn and the rings. The "ripples" that run through this image diagonally are density waves. In the bottom left corner there is a large density wave that has course structures in it. Someone in the room when these images came down thought that they looked like straw, and the name stuck.

Just a bit before Cassini arrived at Saturn, I had been doing simulations of the Encke gap that showed extremely large self-gravity wakes forming near the gap edge. These gravity wakes form as the Pan wakes damp out downstream from the moon. They were described in an Icarus paper (Web supplement with movies). The figure below comes from that paper, and the top four panels in the right column show some frames where these oversized wakes appeared in four different simulations. Since that time, "straw" type features have been seen at the edge of the B ring as well, but I am not aware of any observations of them at the Encke gap. Perhaps the end of life mission will manage to fix that with higher resolution images of the rings as Cassini dips inside the D ring.

It is natural for gravity wakes to form in these simulations. The frames on the left column show the situation early in the simulation when the clumping is from normal sized gravity wakes. The surface density of the simulations varies by row, and generally increases from the bottom row to the top. The size of the gravity wakes is a direct function of the surface density. Notice that the size of the clumps is much larger at the inner edge of the right column cells than it is anywhere in the left column.

This last year, Nicole Albers at the Laboratory for Atmospheric and Space Physics (LASP) found some interesting features in Cassini UVIS occultations of the Keeler gap region. These features are basically holes in the ring material roughly a kilometer in radial extent that are located just a bit inside of the edge of the gap itself. So you have the gap, with effectively no material, followed by a few kilometers of ring material, followed by a kilometer or so of no material, then back to the normal rings. These holes are found in the region a few degrees downstream from Daphnis, the moon that maintains the Keeler gap. (I hope to do a systematic search for these things myself using public data from the rings PDS node. I'll put up another blog post on that so I can make figures without infringing on what Nicole has done.) Some of these holes also appeared around the Encke gap.

I've been simulating the Keeler gap for a while, so I was wondering if I might see something similar in simulations. The problem was that I needed to do a big simulation to get to the region where these were seen by Cassini. My hypothesis was that straw might form near the edge of the Keeler gap, and the regions between those oversized gravity wakes would be cleared out enough to appear as a hole. To test this hypothesis, I have been running a very large simulation of the Keeler gap involving 80 million particles with collisions and particle self-gravity. At this time, the simulation has been running for about three months, and it is a bit over half way done. The animation below shows a large-scale view of the simulation. It is using a gray scale to show the geometric optical depth. That is just a ratio of how much area the particles in a small region cover divided by the total area of that small region. This isn't exactly what one would see looking at this material in the rings, but it is a reasonable approximation.

There is a lot going on in this plot. The coordinate system is centered on the average position of Daphnis. The particles drift from right to left with a drift speed that is proportional to their radial distance from the moon. When they go past the moon, the gravity of the moon pulls them onto eccentric orbits. This leads to the wavy structures that are visible. Daphnis has a fairly eccentric orbit relative to the width of the gap, so the magnitude of how much it pulls on the particles varies with time. The fact that Daphnis has a non-negligible eccentricity and inclination makes this part of the rings difficult to simulate. I plan to write a different post describing how these simulations are done, compared to the Encke gap, where those values can be set to zero without losing too much information about the system. The combination of the wavy motion and the shear that results from more distant particles drifting faster leads to compression regions that we call moon wakes. Since Daphnis is the moon, I will sometimes call them Daphnis wakes. The collisions in these wakes dissipate the forced eccentricity caused by the gravitational pull of the moon, so the waviness decreases as you go to the left after passing by the moon.

The first question is, does straw form? If we believe from the Encke gap simulations that straw is just long, stringy gravity wakes that have grown over-large due to systematic motions of the population of particles, then this is equivalent to asking if large gravity wakes form near the edge of the Keeler gap in this simulation. The answer to this question turns out to be yes. It took 12 orbits downstream from the moon for them to begin to appear, but the figure below, which shows a region ~16 orbits after passing the moon, clearly shows that there are large gravity wakes. Going back to the figure above, if you look at the last few wake peaks, those just past 2000 km downstream from the moon, on the inner edge, there is coarse graininess that was not present in the earlier wakes. This appears to be what those large wakes manifest as in the binned data.

This simulation still needs to go another 10 orbits or so further downstream before it gets to the point where the first "holes" have been seen in occultations. That should take about another two months given the current speed at which the simulation is running. At this point we can say that straw definitely forms near the edge of the Keeler gap, but it is very unclear if the straw can cause the observed holes.

Thursday, May 15, 2014

Strong Typing vs. Weak Typing vs. Static Typing vs. Dynamic Typing

This post is basically a little rant of mine related to misuse of terms related to typing. In particular, I get frustrated when people use the term "strongly typed" when what they really mean is "statically typed". There are also many situations where people will say that something is "weakly typed" when what they really mean is "dynamically typed". The key to understanding these things is that there are two different orthogonal issues here. The first is strong vs. weak typing and the second is static vs. dynamic type checking. Programming languages can exist anywhere in the space defined by these two independent axes.

Strong vs. Weak Typing

A strongly typed language is one where every statement that is executed must follow the proper rules of the type system. A weakly typed language is one where this isn't the case. It turns out that almost all of the languages people are working with these days are strongly typed. People often don't think of languages like Python as strongly typed, but just try doing something that isn't allowed with a value in Python. The program will run, but only up until that point. Then it will crash. The same thing is true for pretty much all scripting languages. It is also true for most compiled languages.

Now, different languages might be more liberal with what is "allowed" by the type system. Languages like Perl and JavaScript take great liberties in converting things so that they "work". They might implicitly convert strings to numbers and back as needed for different expressions. However, those are the operations that are defined by the language for those types. One can come up with many reasons why doing these things is bad in large software development projects, but that is why they are scripting languages. Being liberal with type conversions doesn't really mean the same thing as saying that the language has weak typing.

So then what languages do actually have weak typing? What languages allow you to do things that are completely outside of the type system? The simplest answer to this is C/C++. To illustrate this, consider the following code, which compiles without complaint under g++ version 4.7.3 using the -Wall and -pedantic options.
#include <iostream>
using namespace std;

int main() {
    double a = 5.0;
    double *d = &a;
    cout << a << endl;
    int *b = (int*)d;
    cout << b[0] << endl;
    cout << b[1] << endl;
    b[0] = 99;
    b[1] = 98;
    cout << b[0] << endl;
    cout << b[1] << endl;
    cout << a << endl;

    return 0;
}
Granted, this is something that would be considered very poor C++ code, because you should never do C-style casts in C++. However, that cast from double* to int* is perfectly valid in both C and C++, even though it breaks the type system. Indeed, any code that you see in the C standard libraries that takes a void* works by virtue of the fact that the type system is weak and can be broken. In case you are wondering, the output for this code on my machine happens to be the following.
5
0
1075052544
99
98
2.07956e-312
This is allowed because there are times in system level programming when you have real pointers into memory and you really just want to force the type that you are dealing with at that address. Such activities are dangerous and generally not cross-platform compatible, but when you are doing system level programming you are programming for a particular system and don't really expect it to work on others.

In a certain sense, languages are either strongly typed or they are weakly typed. Either the language allows you to break the type system with something like the code shown above, or it doesn't. However, even languages that let you break the type system can discourage it. I'm sure that many C++ programmers would be unhappy with the characterization that C++ is weakly typed. The above code is proof that it is true, but a lot of effort has gone into making it so that the proper style of code in the newer releases of C++ is strongly typed. Well written C++ code will almost certainly be done using a subset of the language that is strongly typed. They just happen to have this hole in the type system for reasons of backward compatibility and for those rare situations when system level programming really does require that you treat a block of memory as something that it isn't. I should also note that doing this with floating point numbers, as shown in my example, would be much less common than doing it with integer types of different sizes.

Static vs. Dynamic Type Checking

The more interesting aspects of programming languages have nothing to do with strong vs. weak typing. They instead center on the way in which a language does type checking. The static vs. dynamic type checking distinction focuses on whether types are checked at compile time or at runtime. A purely statically typed language would make certain that every type was correct at compile time, and no checks would ever be done at runtime. A purely dynamically typed language doesn't check anything related to types at compile time, and all of those checks are done at runtime. (I use the term "compile time" loosely here because dynamically typed languages can be interpreted and might never go through a formal compilation step.)

Most of the arguments that people get into regarding type systems focus on this static vs. dynamic distinction and the way it is implemented. Proponents of statically type checked languages argue that they provide a certain level of safety in that the compiler does a formal proof that the types in the program are correct. Proponents of dynamically typed languages argue that the type systems associated with static type checking are overly restrictive and add a burden to the programming process. While I sit strongly on the side of static type checking, the purpose of this post is to draw distinctions in terminology, not to get involved in that debate, which has been hashed out in many places.

Where Languages Stand

In some ways, where a language fits into this depends a fair bit on how you program it. That isn't true for every language, though. Some languages have a very specific location in the plane of these two axes. Even for the languages that don't, we can probably say some general things about the standard styles in them. If you follow normal style rules, few languages are going to be weakly typed in practice. C is probably the weakest, simply because the primary method of creating polymorphism in C involves casting to void* and losing type information. If you do this improperly, you won't find out about it until runtime, and the result could be a runtime error where your program terminates, or it could be a logic error where the program simply outputs garbage. In general, languages you use today other than C/C++ are probably strongly typed. Any violation of the type system causes an error either at compile time or at runtime.

The static vs. dynamic axis is much more interesting. Scripting languages and functional languages derived from LISP often sit completely on the dynamic type checking extreme. If your code contains an expression like sqrt("hi"), things will run just fine until that line is actually executed. This is very typical for languages that use traditional, dynamic duck-typing.

There are also some languages that fall at the other extreme of the spectrum, where absolutely everything has passed through type checks at compile time and nothing remains to be done at runtime. Many of the languages in the ML family, including SML and Haskell, fall here. Those languages typically have type inference, where the types of values are inferred from their usage and those inferred types are checked for correctness. Older procedural languages, such as Pascal, also fell at this extreme. The lack of OO and inheritance hierarchies made the type systems of these languages very straightforward.

Most of the languages that people think of as being statically typed actually include some dynamic typing as well. This is because they include things that simply can't be checked at compile time. In OO languages, you run into this when you have to downcast to a more specific type. As a general rule, this is considered a bad thing to do. In C++ the dynamic_cast is there because they need it, but it is intended to look ugly. In Java, unfortunately, the early versions of the language, before generics were introduced in 1.5, required a lot of downcasting. The collections were collections of type Object and every time you took something out of a collection you needed to cast it to a more specific type. This is ugly and error prone. Hence, the addition of generics. My own favorite language, Scala, includes an asInstanceOf method in the Any type, but use of it is strongly discouraged. (Just search for it on StackOverflow and you'll see how much it is discouraged.) The approach that is recommended in Scala is to use a match expression. This is still a runtime type check, so even well written Scala includes features that involve dynamic type checking. This is a feature of OO languages because of their type hierarchies and the occasional need to downcast. The ML family languages avoid that by simply not having type hierarchies.
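A minimal sketch of that recommended style looks like this:

object TypeCheckDemo {
  // A match expression performs the runtime type check instead of an explicit
  // asInstanceOf downcast.
  def describe(x: Any): String = x match {
    case i: Int    => s"an Int: $i"
    case s: String => s"a String of length ${s.length}"
    case _         => "something else"
  }
}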

Summary

So that wraps up my discussion of different terms for type systems. Next time you want to call something "strongly typed", think a bit and consider if what you really mean isn't "statically typed". A lot of the time, that is what people really mean.