jonmon89

DropQuest 2012!!!

Posted by jonmon89 on May 13, 2012
Posted in: Uncategorized.

DropQuest is amazing! I believe last year was the first year that Dropbox ran DropQuest. I missed it last year, but this year I made certain to participate. DropQuest is a series of puzzles (each harder than the last). Each puzzle uses various sites from the internet and tests your critical thinking skills. For instance, the last puzzle this year was a crossword. Once the crossword was finished, there were two clues hidden within it, and those two clues were the secret behind the final answer.

While I am sure some participants enjoy DropQuest for the chance to solve difficult puzzles, most come for the prizes. This year over 100 prizes were given out, depending on how fast you finished all the puzzles. Every prize level included some extra free storage for your Dropbox account, on top of the extra space that could be earned during DropQuest itself. I didn’t win any of the prizes, but I did earn an extra 1GB for my Dropbox account. I cannot wait until next year’s event. Perhaps if I am still keeping up this blog, that post will be longer, as I will have something to compare against. Right now I am just excited and in puzzle-solving mode!

CS373-W: Post 14

Posted by jonmon89 on December 5, 2011
Posted in: Software Engineering.

I really enjoyed CS373. I had used Assembla for other class projects, but never with a partner. I also had never used git (I am more of a Subversion man, but git is rubbing off on me). The topics of the course were also pretty interesting. While some of them, such as the basic programming material at the beginning of the class, seemed like things we should already have known, they were a nice refresher. I have taken a class taught by Downing before (I took CS315 the last semester he taught it, I believe), so some of the topics seemed like review (for instance, the L-values and R-values material). He really makes sure that the students who take his class know the basics of programming. I would recommend that everyone take one of his classes before graduating from UT.

I really enjoyed that we went over another language like Python. I have used Python a lot before, but going over how the language works showed me I still have a lot to learn. While we didn’t do much with Haskell, it was still nice to have another language to compare to. Haskell is one of those weird languages that is not used much, but it is nice to know a little bit of it just to expand your thinking and see another approach to solving the same problem, even if it is harder to implement in.

The projects were also really fun, especially the last one. While the last 3 projects were pretty stressful (at least the 5th one), I still had a lot of fun working in the big group and learning all the different tools. I had never programmed such a large site (I am not a good web developer), but after working with our front-end guy I feel that my knowledge of HTML and web frameworks has expanded greatly. I may even work on expanding my personal webpage when I have the time.

Last but not least, I really enjoy Downing’s teaching style (well, the daily quizzes were pretty random sometimes). He really makes sure the class is active and engaged with what he is teaching by asking individual people questions and walking through the steps if someone has trouble. I wish more classes at UT were like that instead of straight lecture and note-taking. I know that not all classes can be like that, but more should try to be more active.

MapReduce Blog

Posted by jonmon89 on December 1, 2011
Posted in: Computer Science, Software Engineering.

Processing large amounts of data efficiently and reliably has always been a concern in computing. Companies and researchers always want to run larger and larger data sets, get the results back quickly, and have some level of assurance that the results are accurate. Parallel processing arose from this need. As data sets become larger and larger, bigger and bigger parallel machines are being developed and used. The more data that can be processed at once, the faster the results can be returned. Many abstractions have been developed to make programming these machines easier, whether it be a multi-threaded model, a distributed memory model, or, more recently, accelerators (general-purpose graphics processing units or Intel’s MIC processor [2]). Another model, developed by Google, is MapReduce [1].

MapReduce is based on a very simple concept. The model takes a set of key/value pairs as input and produces a set of key/value pairs as output. The model is broken into two steps: map and reduce. The map stage takes the input key/value pairs and produces a set of intermediate key/value pairs. These intermediate pairs are then passed to the reduce stage, which produces the output set of key/value pairs. Sometimes the reduce stage will produce only a single key/value pair (e.g. a summation of the ints in a list).
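To make the two stages concrete, here is a minimal single-threaded sketch of the model as a word count. It is only an illustration of the map and reduce steps described above, not Google's implementation; the class name `WordCountSketch` and its methods are made up for this example, and it assumes Java 8 or later.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of any MapReduce library.
final class WordCountSketch {

    // Map stage: one input pair (line number, line text) becomes
    // a list of intermediate pairs (word, 1).
    static List<Map.Entry<String, Integer>> map(int lineNumber, String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word.toLowerCase(), 1));
            }
        }
        return pairs;
    }

    // Reduce stage: one key plus all of its intermediate values
    // becomes a single output value (here, their sum).
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: runs map on every input, groups the intermediate pairs
    // by key, then runs reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            for (Map.Entry<String, Integer> pair : map(i, lines.get(i))) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            output.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return output;
    }
}
```

Calling `WordCountSketch.run` on a list of lines returns one output pair per distinct word. The grouping step in the middle is what a real implementation does between the two stages.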

That is the model, but MapReduce is meant to process data in parallel on large compute clusters, so where does that happen? It happens in how MapReduce is implemented. The model is just a way of thinking about how to structure your data and what output to expect. The implementation of the model automatically parallelizes the computation. All map and reduce tasks are scheduled to execute in parallel. If a task has dependencies, it must wait until the tasks it depends on have completed before executing. A master service is started which does the scheduling and data management. The master service also sets up the worker processes that process the data on the compute nodes. These worker processes stay alive for the duration of the computation. A worker reads the data it has been assigned by the master service, processes it, and does a local write of the intermediate key/value pairs. Another worker then does a remote read of this intermediate data, processes it, and outputs the final key/value pairs.
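The scheduling idea can be sketched on a single machine with a thread pool standing in for the cluster. This is only an analogy under stated assumptions: the "master" is the `run` method, the "workers" are pool threads, and the local write and remote read are replaced by in-memory `Future` results. The class name `ParallelSumSketch` and the even/odd keys are invented for the example, which assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical single-process analogy of the master/worker scheme.
final class ParallelSumSketch {

    // Map task: one input split becomes intermediate pairs
    // ("even" or "odd", partial sum of that split).
    static Map<String, Long> mapTask(List<Integer> split) {
        Map<String, Long> partial = new TreeMap<>();
        for (int n : split) {
            partial.merge(n % 2 == 0 ? "even" : "odd", (long) n, Long::sum);
        }
        return partial;
    }

    // Reduce task: all partial sums for one key become the final sum.
    static Long reduceTask(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    // "Master": schedules the map tasks, waits for them, then schedules
    // one reduce task per key and collects the output pairs.
    static Map<String, Long> run(List<List<Integer>> splits, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, Long>>> mapResults = new ArrayList<>();
            for (List<Integer> split : splits) {
                mapResults.add(pool.submit(() -> mapTask(split)));
            }
            // Reduce depends on every map task, so block on each Future
            // while grouping the intermediate values by key.
            Map<String, List<Long>> grouped = new TreeMap<>();
            for (Future<Map<String, Long>> result : mapResults) {
                for (Map.Entry<String, Long> e : result.get().entrySet()) {
                    grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                           .add(e.getValue());
                }
            }
            Map<String, Future<Long>> reduceResults = new TreeMap<>();
            for (Map.Entry<String, List<Long>> g : grouped.entrySet()) {
                List<Long> values = g.getValue();
                reduceResults.put(g.getKey(), pool.submit(() -> reduceTask(values)));
            }
            Map<String, Long> output = new TreeMap<>();
            for (Map.Entry<String, Future<Long>> r : reduceResults.entrySet()) {
                output.put(r.getKey(), r.getValue().get());
            }
            return output;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A real implementation also has to move data between machines and cope with failed workers, none of which this sketch attempts.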

I have been lucky enough to work on a project that is developing a language that uses the MapReduce model for data processing. The language is called Swift and it is presented in [3]. Swift is a scripting language based on the MapReduce model, and its functionality is very general. The master service can be run on your local machine (e.g. your laptop or workstation), the data to be processed can reside on some remote data store as long as it is accessible to the outside world (like the UTCS servers), and the data can be processed on some parallel cluster (such as TACC’s Ranger compute cluster). Swift handles all the data management and can use several different protocols, such as ftp, sftp, gridftp, scp, and http (read only). It is still in the early stages (but close to a major milestone release of 1.0) and it is moving along nicely. The homepage for the project is [4].

  1. http://usenix.org/events/osdi04/tech/full_papers/dean/dean.pdf
  2. http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html
  3. http://www.ci.uchicago.edu/swift/wwwdev/papers/SwiftLanguageForDistributedParallelScripting.pdf
  4. http://www.ci.uchicago.edu/swift/main/

CS373-W: Post 13

Posted by jonmon89 on November 29, 2011
Posted in: Software Engineering.

Last week was Thanksgiving, so we only had one class. The lectures are over and we are now presenting the websites we have all been working on these past couple of weeks. The first group was Team Xtreme. They gave a solid presentation in my opinion. I wish they had spent more time on the features they added to make their site good, though. Talking more about the auto-spell-checking bit or some of the other features they may have implemented would have been really good. I think they spent too much time on the buzzwords they coined and not enough time on their features. I could be wrong though. Perhaps that is how the presentation should be and I just want to know how everyone implemented their sites. The next team was Supernova. I like their website, but the presentation could have been better. The computer they used ran out of bandwidth on the UT network right before class. I think that if they had had the bandwidth, their presentation would have been better. I guess what we learned from watching this presentation is that a backup plan must always be ready in case the main approach fails, especially with presentations.

CS373-W: Post 12

Posted by jonmon89 on November 21, 2011
Posted in: Software Engineering.

I was only able to attend one class last week, as I left for Seattle on Wednesday for an interview with Amazon, so I am not really sure how class went. Project #6 is over though. These last couple of projects have been interesting and quite fun. I had never worked on developing a website, so I feel like I learned a lot. I cannot wait for the presentation. I am not sure if we can modify the website any more before the presentation, but there are some things I would like to go fix and spruce up, nothing major though. It will be interesting to see how the other groups present their websites.

Overall I think I did pretty well at the Amazon interview. I do not think I messed many things up, and again they were really interested in this software engineering class we are all taking. They seemed very impressed by the topics we are going over and the big WC project. Seattle was nice as well. It was a nice change of weather from the warm fall we are having. I got to meet up with some people I worked with during my internships at Argonne National Laboratory, as the big supercomputing conference (SC’11) was also in town last week.

It seems, though, that we went over some pretty interesting topics last week. I had been waiting for the lecture on the Factory topic, so I am pretty bummed that I missed it. I guess I will have to go over the posted code and read up on it myself. At least I have a starting point to play with. The code that Downing posts is pretty helpful for getting a feel for how the execution flow works. The end of the semester is in sight though.

CS373-W: Post 11

Posted by jonmon89 on November 15, 2011
Posted in: Uncategorized.

A bit late this time. Projects have been dominating all my time lately. These past couple of weeks have been pretty interesting in class. We have been going over inheritance and dynamic binding in Java and Python. The inner workings of programming languages are one of my interests (I have many, but this is probably #3). It is always interesting to see how different languages choose to implement the same thing. This past week I wrote a post with performance numbers from testing dynamic binding. I have yet to find a case where Java cannot inline a method call that uses dynamic binding. Every case I try seems to finish in a couple of milliseconds (although these are very simple tests). I will find one eventually, but it seems Java does a much better job at compiler optimizations than I thought.

The last project is going much smoother than project #5. For some reason we had a lot of trouble with project #5. We spent time working on it almost every day but still had some problems in the end. Maybe we were not communicating enough, because as a group we are communicating with each other a lot more on this last project and things are being checked off our checklist at a very quick pace. I cannot wait until we finish the site and get to take a step back and look at the work we have done. I have given a couple of presentations for work, so I am not too freaked out about giving one on what we have done. Being graded by our peers seems unsettling to me, though. I thought maybe the TA would be doing the grading since he is the “customer”.

Final Performance Tests

Posted by jonmon89 on November 8, 2011
Posted in: Java.

Today in class we went over the final keyword again and some method inlining scenarios. I was curious whether Java (I compiled with javac 1.6.0_26) was able to do inlining (the inlining itself happens at run time in the JVM’s JIT compiler, not in javac). Turns out it can, and it does it better than expected. We went over 4 scenarios for inlining in Java:

  1. The class is final
  2. The method is final
  3. The method is private
  4. The method is static

We also went over when Java cannot inline methods:

  1. Dynamic loading
  2. The method is too complicated (for example, the method contains a loop).

Turns out if the method is “simple enough”, dynamic loading isn’t even a problem. I say “simple enough” because the methods I am testing only return constant values. They do not manipulate any data or take any arguments.

I tested (1), (2), and (4) from above. I didn’t bother with (3) because it didn’t really fit how I tested each case (plus I figured the idea came across with the other 3). Each test consists of Java objects and a test file. I have a separate test file for each case to make sure that I get a fresh JVM on each run for fairness (I don’t want anything cached in the JVM, or the garbage collector running when I don’t want it to). I also run a warm-up loop at the start of each JVM (this is common when collecting timing numbers). This flushes out whatever may be in the cache and gets the system moving. Each test loops from 0 to Integer.MAX_VALUE, and the time reported is in milliseconds, which I obtained using System.currentTimeMillis(); you can read the code for more details on how it was implemented. Here are the numbers:

Reference Test
-------------
Duration=5ms

Class final test
----------------
Duration=6ms

Static test
-----------
Duration=5ms

Method final test
-----------------
Duration=6ms

Dynamic loading test
--------------------
Duration=5ms

Loop inline test
----------------
Duration=176ms

The reference test just calls a method that is neither static nor final, in a class that is not final either. This is how most methods in Java are written. As you can see, the reference test takes about the same time as the other “optimized” tests. This is because the JVM can see that this method can also be inlined because it is “simple enough”. This “simple enough” also explains the dynamic loading result. More extensive tests need to be done to gain full insight into how sophisticated Java’s JIT compiler is in terms of optimizations, but this is a starting point. However, the loop test shows a case where the call is not optimized nearly as well: when the method itself contains a loop.
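For readers who do not want to clone the repository, the timing approach described above looks roughly like the sketch below. It is not the code from the repository: all class and method names are made up, the four cases share one JVM here instead of each getting a fresh one, and each loop accumulates the return value so the calls are at least used.

```java
// Hypothetical stand-ins for the classes under test.
class PlainCall {                 // reference: nothing is final or static
    int value() { return 1; }
}

final class FinalClassCall {      // scenario 1: the class is final
    int value() { return 1; }
}

class FinalMethodCall {           // scenario 2: the method is final
    final int value() { return 1; }
}

class StaticCall {                // scenario 4: the method is static
    static int value() { return 1; }
}

final class InlineTimingSketch {

    // Each timer returns {elapsed milliseconds, sum of the returned values}.
    static long[] timePlain(int iterations) {
        PlainCall c = new PlainCall();
        long sum = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) { sum += c.value(); }
        return new long[] { System.currentTimeMillis() - start, sum };
    }

    static long[] timeFinalClass(int iterations) {
        FinalClassCall c = new FinalClassCall();
        long sum = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) { sum += c.value(); }
        return new long[] { System.currentTimeMillis() - start, sum };
    }

    static long[] timeFinalMethod(int iterations) {
        FinalMethodCall c = new FinalMethodCall();
        long sum = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) { sum += c.value(); }
        return new long[] { System.currentTimeMillis() - start, sum };
    }

    static long[] timeStatic(int iterations) {
        long sum = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) { sum += StaticCall.value(); }
        return new long[] { System.currentTimeMillis() - start, sum };
    }

    public static void main(String[] args) {
        // Warm-up pass so the JIT gets a chance to compile the hot loops.
        int warmUp = 1_000_000;
        timePlain(warmUp);
        timeFinalClass(warmUp);
        timeFinalMethod(warmUp);
        timeStatic(warmUp);

        int n = Integer.MAX_VALUE;
        System.out.println("Reference test:    Duration=" + timePlain(n)[0] + "ms");
        System.out.println("Class final test:  Duration=" + timeFinalClass(n)[0] + "ms");
        System.out.println("Method final test: Duration=" + timeFinalMethod(n)[0] + "ms");
        System.out.println("Static test:       Duration=" + timeStatic(n)[0] + "ms");
    }
}
```

The durations will differ by machine and JVM version, so treat any output as relative rather than comparable to the numbers above.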

All tests were run on a 64-bit UTCS server (hasselblad.cs.utexas.edu). The source files can be downloaded with the command:

  • git clone git://github.com/jonmon89/Java_Performance.git

Running ‘make run’ after downloading the files will compile and run each test (the makefile isn’t my best work, but it worked for these tests).

A quick article about Java’s method inlining is here:

  • http://java.sun.com/developer/technicalArticles/Networking/HotSpot/inlining.html

CS373-W: Post 10

Posted by jonmon89 on November 7, 2011
Posted in: Software Engineering.

While WC2 was an interesting and good project, my group struggled at the end. Every time we met as a group we got a lot of work done towards finishing the project. It was not until Wednesday, when we started putting the project together, that we found problems. We stayed in the lab until 3:00am on Thursday trying to complete the project. We ended up heading home for sleep and came back Thursday afternoon to finish. In the end we figured out that the parser was broken. We didn’t have time to fix the parser, but we tested everything else and turned it in. We are meeting tomorrow to fix the parser, and things should plug in nicely after that.

Besides the major problems we experienced in WC2, I thought class went pretty well this week. I have been able to keep up with the readings in the second half of the semester and they are pretty interesting. I am especially interested in the refactoring book. I firmly believe that the readability of code is just as important as the code working efficiently, so the book is pretty interesting to read.

I also had an interview with a company here in Austin this past week. One of the technical questions was the six degrees of Kevin Bacon problem. I screwed up the initial part of the problem. The interviewers had me draw out the graph that explained the problem, and I saw instantly that they were asking me to implement breadth-first search. I then stupidly implemented depth-first search, but they got a kick out of it because apparently that happens more often than you would think. Overall the interview went great. We spent a good deal of time talking about this class specifically and the WC project we are all working on. They seemed highly interested in how we are approaching the problem and what software engineering techniques we are using. Short story…everyone should take this class from Downing before graduating. The next interview is in two weeks at Amazon. Hopefully it goes as smoothly as this past one did.
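For the record, here is roughly what the breadth-first version looks like. This is a sketch written after the fact, not what I wrote in the interview; the class name `BaconNumberSketch` and the co-star map are assumptions made for the example. Breadth-first search is the right fit because it visits actors in order of increasing distance, so the first time it reaches the target it has found the shortest chain. Depth-first search can reach the target along a longer path first.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration class for the six degrees problem.
final class BaconNumberSketch {

    // Returns the number of co-star links on the shortest chain from
    // start to target, or -1 if the two actors are not connected.
    // costars maps each actor to the actors they have appeared with.
    static int degrees(Map<String, List<String>> costars, String start, String target) {
        Map<String, Integer> distance = new HashMap<>();
        Queue<String> frontier = new ArrayDeque<>();
        distance.put(start, 0);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String actor = frontier.remove();
            if (actor.equals(target)) {
                return distance.get(actor);
            }
            List<String> neighbors =
                    costars.getOrDefault(actor, Collections.<String>emptyList());
            for (String next : neighbors) {
                // Only enqueue actors that have not been seen yet, so each
                // actor keeps the shortest distance found for them.
                if (!distance.containsKey(next)) {
                    distance.put(next, distance.get(actor) + 1);
                    frontier.add(next);
                }
            }
        }
        return -1;
    }
}
```

Swapping the queue for a stack turns this into depth-first search, which still finds a chain if one exists but not necessarily the shortest one.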

CS373-W: Post 9

Posted by jonmon89 on October 31, 2011
Posted in: Uncategorized.

This week has been all about project #5. But on the subject of project #4, after the 11am class there was a discussion with the TA about what some of the groups produced and what the TA wanted. The two could not have been farther apart. We talked back and forth for about 30 minutes and finally came to a conclusion: do not assume anything on this project. Do not make assumptions about what the TA (who in this case is the customer) wants based on how Professor Downing explained the project. If you make assumptions, more than likely they are wrong, and that is not the correct way to write software. If there is anything fuzzy or not explained well enough, ask for clarification, no matter how small the problem.

Project #5 has been an experience. It is a great project. There are so many tools and so much 3rd party software out there that we need to be careful which ones we use. The last thing we want is for our code to be completely unreadable and hard to change when project #6 comes around. I am curious what project #6 will be. It seems like in project #5 we will have most, if not all, of the functionality that Professor Downing said the site would have when he described these projects at the beginning of project #4.

We also got our results back for test #2. I thought I did much better on that test than the grade I received. I was able to answer all the questions and I was pretty sure I had the right solutions. I know I messed up on one question, but it was more of a syntax error than a logic error, and one that would have been easily fixed with a quick Google search.

CS373-W: Post 8

Posted by jonmon89 on October 24, 2011
Posted in: Software Engineering.

This week was about finishing the first part of the world crisis project and the first exam. The world crisis project is going to be very interesting. A lot of interesting topics arise from the project. Google App Engine is very cool. It seems pretty powerful. The next part is going to be interesting because instead of using static HTML pages we will be moving to dynamic HTML pages based on the XML data that is provided. I have done a lot of XML parsing with different parsers in Python in a research group on campus, so I am hoping this part will not be too difficult.

The test was interesting as well. I felt rushed in the programming portion. I feel like I answered the coding questions correctly, but if there had been one less programming question (maybe get rid of the zip problem), I would have felt much more comfortable. It would have given me more time to think about the other problems and to run some test cases in my head.

This week we also went over database design and MySQL syntax. There is a lot more to databases than I thought. I have only worked with databases once, during the internship I had over the summer. I did not have to do any design; I just had to use the information that was already in there, so actually going over design is helpful in seeing the decisions the group made.
