We’re wrapping up week 4 at the Flatiron School and… I’m starting to work on a fun gem idea! Yes, I’ve been inspired by my classmates to take the plunge and build my own Ruby gem.
But first things first. I need data in order for my gem to be useful. So earlier today (well, yesterday, at this point) — I proceeded to scrape some data. And in the process, I discovered a neat pair of methods that work very nicely together: .zip and Hash[].
WHAT DOES .ZIP DO?
The .zip method basically joins two arrays together, pairing elements from each that have matching indexes and returning the whole thing as a big array that contains mini-arrays populated with the newly paired elements. You set one of the arrays as the receiver of the method, which then takes the second array as an argument. It looks something like this:
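Here's a minimal sketch using made-up artist and song arrays (the names and data are placeholders, not the real scraped results). It also shows the second half of the trick: handing the zipped array to Hash[].

```ruby
# Made-up sample data, purely for illustration
artists = ["Artist A", "Artist B", "Artist C"]
songs   = ["Song One", "Song Two", "Song Three"]

# .zip pairs up the elements that share an index
pairs = artists.zip(songs)
# => [["Artist A", "Song One"], ["Artist B", "Song Two"], ["Artist C", "Song Three"]]

# Hash[] turns that array of pairs into key-value pairs
chart = Hash[pairs]
# => {"Artist A"=>"Song One", "Artist B"=>"Song Two", "Artist C"=>"Song Three"}
```

The receiver (artists) supplies the keys and the argument (songs) supplies the values.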
How crazy awesome is that?! It returned a hash and populated it with key-value pairs made out of the elements from each array that have matching index numbers.
So the .zip and Hash[] methods, together, proved very useful indeed as I scraped data from Billboard.com and was able to get artists and songs, separately, to look like a numbered list of the pair, artist - song. And that’s all I’m saying about my gem-in-progress, for now!
I <3 Ruby.
KEY TAKEAWAYS
You can join two arrays together and return matching-index pairs using array1.zip(array2)
You can create a hash populated with key-value pairs made out of two arrays using Hash[array1.zip(array2)]
This is something you’re going to hear a lot over the course of your Ruby journey: “NORMALIZE YOUR DATA”. But what does this mean?
Between humans, the concept of “normal” is typically (and overly-simplistically) understood to connote, “not weird”. As in, if you’re normal, then by default you are not a total weirdo (/ˈwi(ə)rdō/ 1. A person regarded as being very strange or eccentric 2. A deranged, potentially dangerous person. I suppose, however, that derangement and danger are not limited to just people).
In Ruby, on the other hand, “normal” (or rather, “normalized”) carries a very different meaning. The latter is mostly used to describe data (i.e., strings, numbers, etc.) that has been transformed into a consistent form so that it is now easier to compare it against other data. It’s kind of like assuring that nothing will be “lost in translation”. In fact, it’d be a little like having a handy translator on hand (ha! no pun inten…) that can translate, in real time, whatever you’re saying out loud in English to another language. And this might save your @$$ when you’re trying to communicate to a room full of non-English speakers. IN FACT, when we humans translate stuff, we are, essentially, NORMALIZING it!
Actually, I just thought of the best analogy of all: it’s making things “apples to apples”. We couldn’t run around comparing apples to oranges now, could we?!
SOME COMMON METHODS USED TO ‘NORMALIZE’ DATA:
.upcase (upper-cases every letter in a string)
.downcase (lower-cases every letter in a string)
.reverse (reverses a string)
.sort (sorts an array in ascending order, by default)
.split() (splits a string into an array, per whatever you specify as an argument)
.gsub() (substitutes all instances of one thing with another)
SOME COMMON RUBY PROBLEMS THAT REQUIRE ‘NORMALIZATION’:
“Does Some_Array Contain an Anagram of a Given Word?” (can be nicely solved using .sort)
“Is This_Sentence a Palindrome?” (might need a little .downcase, .gsub and .reverse action)
“Deaf Grandma!” (a rare one that calls upon .upcase!)
Dealing with user input in general via ‘gets.chomp’ or ‘gets.strip’ (always add a .downcase for good measure!)
Why that last piece of advice? Because not every user on the planet can be trusted to input his or her data consistently, every time. So ‘normalizing’ this input by calling .downcase on it ensures that you’ve accounted for any capitalization inconsistencies. You can now better compare apples to apples.
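To make that concrete, here's a rough sketch of two of the problems above. The method names (normalize, palindrome?, anagrams?) are ones I made up for illustration, not part of any exercise's official solution.

```ruby
# Ignore capitalization and strip everything except letters and digits
def normalize(string)
  string.downcase.gsub(/[^a-z0-9]/, "")
end

# A palindrome reads the same forwards and backwards once normalized
def palindrome?(sentence)
  cleaned = normalize(sentence)
  cleaned == cleaned.reverse
end

# Two words are anagrams if their sorted letters match
def anagrams?(word_a, word_b)
  normalize(word_a).chars.sort == normalize(word_b).chars.sort
end

palindrome?("A man, a plan, a canal: Panama") # => true
palindrome?("Hello, world")                   # => false
anagrams?("Listen", "Silent")                 # => true
```

Without the normalize step, "Listen" and "Silent" would fail the comparison just because of the capital letters.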
KEY TAKEAWAYS
‘Normalizing’ your data means transforming its appearance slightly so that your program knows it’s comparing apples to apples
Some common methods you can use to normalize data include: .upcase .downcase .reverse .sort .split .gsub
I wanted to write up a short addendum to my previous post, “Ruby: Just Set It. Just Say It” and show you a simple example of refactoring.
What is refactoring, you ask?
In my mind, it’s a little like editing your writing. Your first draft might be an incoherent mess, but at least you got your thoughts out. From there, it’s your responsibility as a writer to clean up your work so that it best conveys what you mean to say FROM YOUR READERS’ PERSPECTIVE. Likewise, refactoring your code is a kind of clean-up process. It shouldn’t change your program’s external behavior, i.e., the final output shouldn’t be changing, but HOW your program gets there should have effectively changed.
I guess this is a little different than editing your writing, in that editing is meant to better serve the consumer (the reader); refactoring code, on the other hand, doesn’t yield any new outcome for the consumer (the user). However, refactoring does make your code more readable to OTHER CODERS. Which can ultimately mean that future de-bugging efforts will be less painful. That, and you should take pride in your code! Make it as readable, logical, and elegant as possible. You wouldn’t do any differently in your writing, would you??
With that said, here’s a side-by-side comparison of my code for the pigeon organizer exercise, pre and post refactoring:
The outcome is the same either way, but the refactored version saves me a few lines of code and ultimately captures the logic behind the solution better.
What the ||= operator does, in a nutshell, is evaluate whatever’s on the LEFT side of the operator first. If that value is ‘truthy’ (anything other than nil or false), nothing gets assigned and the program proceeds to the next line of code, and so on. If the value is nil or false, then whatever’s on the RIGHT side of the operator is assigned as the value of the left side.
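Here's a small sketch of ||= in action. The data is made up and only loosely shaped like the pigeon exercise, so treat it as an illustration rather than the actual solution.

```ruby
# Hypothetical data, loosely shaped like the pigeon exercise
pigeon_list = {}

# First pass: pigeon_list["Theo"] is nil, so the right side gets assigned
pigeon_list["Theo"] ||= {}
pigeon_list["Theo"][:color] ||= []
pigeon_list["Theo"][:color] << "purple"

# Second pass: both values already exist, so ||= leaves them alone
pigeon_list["Theo"] ||= {}
pigeon_list["Theo"][:color] ||= []
pigeon_list["Theo"][:color] << "grey"

pigeon_list # => {"Theo"=>{:color=>["purple", "grey"]}}

# The long-hand equivalent of a single ||= line
pigeon_list["Queenie"] = {} unless pigeon_list["Queenie"]
```

Each ||= line stands in for a small ‘if it’s not there yet, create it’ check.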
Cool, right?!
KEY TAKEAWAYS
Write code that works first. Then refactor it.
Refactoring shouldn’t change the final outcome of your program; it should just help your program get there more efficiently, and make your code more readable to OTHER CODERS
The ||= (“or or equals”) operator is a cool alternative to using ‘if else’ logic, if you intend for some part of your program to handle new assignment (i.e., values to keys, etc.)
When we’re fluent in a language, we take for granted everything about it that feels intuitive. Ever feel that way when asked something specific about English? Say you had to conjure up some intelligent explanation for why certain verbs conjugate irregularly while others don’t. You’d probably say something like, “I don’t know! I just know that they do. I’ve always known. If you know English, it’s like… intuitive.”
You just know.
Well, until you decide to pick up a foreign language and start the language-learning process from scratch, you have no idea how valuable and hard-earned this intuition is. To have a SENSE of how a language works, how you can play by its rules but still bend it to your will and imagination… this is what fluency feels like. And it feels awesome, because you can really express yourself.
The alternative, of course, can feel really really frustrating — and this has been a prevalent feeling so far in my quest to learn to ‘speak’ Ruby. Everything feels unnatural. I find myself writing most of my code out in the form of pseudo-code, wishfully hoping all the steps I’ve typed out in plain English will magically transform themselves into Ruby. Come on, you mean the computer CAN’T interpret that? WHY THE HELL NOT.
Anyway. Rather than throw a tantrum (and believe me, sometimes nothing sounds better than a good tantrum), I suppose it’d be more productive for me to share a few Ruby intuition-isms (?) that I’ve finally picked up on these last few days.
HOW DO YOU DEAL WITH `.collect`?!
By this, I don’t mean how do you USE .collect, per se. Just to be clear though, .collect iterates over a data-structure (array or hash) and returns a newly-generated array that holds whatever your code block returned for each item. Cool. What used to really puzzle me, however, was how the hell you further accessed and/or passed around the fruits of this .collected array’s labors. For example, look at this (admittedly wonky-looking) method that uses ‘.collect’:
    # ===== Apple-picker using '.collect' ====== #

    def apple_picker(array)
      array.collect do |fruit|
        fruit if fruit == "apple"
      end.compact
    end
See how I tacked on a .compact LITERALLY on the end of the ‘end’? I know — IT LOOKS SO WEIRD. But the real point is, it looks like it’s over once I wrap up my ‘do’ and ‘end’. Kaput. No more access to whatever that entire block of code just returned. WHAT IF I WANTED TO DO MORE? Like, how can I make the .collect’s return value a THING? Oh, #NBD. Avi finally drilled home for me today this concept: thats_what_variables_are_for = “storing and pointing to things!” So look. If I wanted to store the fruits of .collect’s labor somewhere, I literally JUST SET THE WHOLE BLOCK AS = TO A VARIABLE. Like so:
    # ===== Apple-picker using '.collect' ====== #

    def apple_picker(array)
      some_variable = array.collect do |fruit| # <-- LOOK! I JUST SET THE 'EQUAL TO' BIT RIGHT HERE
        fruit if fruit == "apple"
      end.compact
    end
That’s it! It’s that easy. And now, some_variable effectively ‘houses’ the result of .collect.
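And once the result lives in a variable, you can keep working with it. A quick sketch (apple_count is a name I made up for illustration):

```ruby
# Hypothetical follow-up: store the .collect result, then do more with it
def apple_count(array)
  apples = array.collect { |fruit| fruit if fruit == "apple" }.compact
  apples.length # the stored array is still available after the block ends
end

apple_count(["apple", "orange", "apple"]) # => 2
```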
Oh, and by the way, since end.compact looks silly, here’s a better way to write that using curly braces:
    # ===== Apple-picker using '.collect' ====== #

    def apple_picker(array)
      array.collect { |fruit| fruit if fruit == "apple" }.compact
    end
HOW DO YOU, LIKE, CHECK IF SOMETHING ALREADY EXISTS?
This one was a huge epiphany for me. Like HUGE. Of epic proportion. Because NOT knowing this stumped me on basically every problem I had to solve that required ‘leveling out’ my iteration. Look at this first attempt to organize a hash of pigeons, for example, without explicitly ‘leveling out’:
    pigeon_list = {}
    pigeon_names = []

    pigeon_data.each do |descriptors, details_hash|
      pigeon_names = details_hash.each_value.collect do |name|
        name
      end.flatten.uniq
    end

    pigeon_data.each do |descriptors, details_hash|
      details_hash.each do |qualities, names_array|
        names_array.each do |name|
          if pigeon_names.include?(name) # <-- THE ISSUE IS AROUND HERE
            pigeon_list[name] = {descriptors => qualities}
          end
        end
      end
    end

    puts pigeon_list
In the above code, every time a pigeon’s name comes up again, pigeon_list[name] = … assigns a brand-new hash and over-writes whatever was already stored for that pigeon, i.e., it keeps re-populating my hash with over-writes, rather than ‘leveling out’ and catching all the data. The problem, I discovered, was that I NEEDED TO ACCOUNT FOR WHETHER OR NOT SOMETHING ALREADY EXISTS. And how do you do that? YOU JUST STATE IT. Here it is in action below, in the refactored solution:
    pigeon_list = {}

    pigeon_data.each do |property, property_hash|
      property_hash.each do |values, birds_array|
        birds_array.each do |name|
          if pigeon_list[name] # <-- TO CHECK IF IT EXISTS? JUST STATE IT
            if pigeon_list[name][property] # <-- RIGHT HERE I ASK AGAIN IF SOMETHING EXISTS
              pigeon_list[name][property] << values
            else
              # wrapped in an array so the << above always has an array to push onto
              pigeon_list[name][property] = [values]
            end
          else
            pigeon_list[name] = {property => [values]}
          end
        end
      end
    end

    p pigeon_list
See how easy that is? Your instinct, based on English, might be to make up some random ‘questions’ to ask the entity like if pigeon_list[name].exists? or .already_exists? — but THIS IS RUBY. You don’t need to ask any questions. YOU LITERALLY JUST STATE THE ENTITY.
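Why does this work? Looking up a key that isn't in a hash returns nil, and nil counts as ‘false’ in an if. A tiny sketch with made-up data:

```ruby
# Made-up data for illustration
pigeon_list = { "Theo" => { color: ["purple"] } }

pigeon_list["Theo"]   # => {:color=>["purple"]}  (truthy)
pigeon_list["Nobody"] # => nil                   (falsy)

status = if pigeon_list["Nobody"]
  "already there"
else
  "brand new"
end
status # => "brand new"

# One catch: a key whose value is false or nil also looks 'missing' this way.
# Hash#key? asks specifically whether the key is there.
flags = { "seen" => false }
flags["seen"]      # => false
flags.key?("seen") # => true
```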
So simple. But mind still blown.
KEY TAKEAWAYS
Ruby is a pretty English-y language, but it’s not ACTUALLY English (so don’t expect such)
To ‘catch’ the return value(s) of a .collect, simply SET THE ENTIRE BLOCK AS EQUAL TO A VARIABLE (and now it lives in the variable!)
To ‘level’ out your iteration and avoid re-populating with over-writes, you need to check if certain entities (i.e., an array) exist already… and how to check? YOU JUST STATE THE ENTITY ITSELF
The parallels between good (English) writing and good code continue to abound, and one that really stood out to me today was the idea that methods are like paragraphs.
Just think about it.
Any work of writing worth your dime (screenplays and plays don’t really count in this example) — be it an op-ed piece, a serious article, or a novel — can be broken down into smaller and smaller units until you’re staring at individual letters and punctuation marks. Now, a slightly more meaningful unit of work than individual letters and characters is a sentence, and I’d say the next unit up after a sentence is by and large the most meaningful of all: the paragraph.
Aside from needing to be well-written in general, conceptually, good paragraphs should adhere to a single topic. That is the point of a paragraph: to broach, discuss, debate, or refute ONE MAIN IDEA. And if you find yourself babbling about new or irrelevant ideas, it’s probably time for — you guessed it — a new paragraph. Paragraphs help us compartmentalize our thoughts so we can build upon them in a logical manner; they’re kind of like lego pieces that we can eventually stack together to build something cool. Conversely, how could you build anything without these pieces?? I HAVE NO IDEA. How miffed would you be if you had to build a house with one massive brick? Or how pissed have I been when previous writing students submitted “papers” that were comprised of ONE MASSIVE BLOCK OF TEXT?
Paragraphs make your work — as both the writer AND the reader — more manageable. And I think this is such a key concept in writing code, as well.
Methods, like paragraphs, are units of work. And these units of work need to be kept manageable, so as to make your life as a programmer (and potential reader of other programmers’ code) less painful. How are you going to debug effectively if you’re staring at one massive block of code? How can you tell where your errors truly begin and end if your entire program depends on ONE GIANT CHAIN OF METHODS?? I don’t know.
Even if you’re still having a hard time grasping how crucial this idea is, or if you don’t know the first thing about writing code, any layperson with a pair of eyes can see the stark differences between the following two examples of code.
    class Jukebox
      attr_accessor :songs

      def initialize(songs)
        @songs = songs.sort
      end

      def call
        puts "Welcome to Pop Jukebox!"
        puts "Enter a command to continue. Type 'help' for a list of commands."
        exit = false
        while exit == false do
          command = gets.strip.downcase
          case command
          when 'help'
            puts "Enter one of the following commands: help, play, list, exit"
            puts "\n"
          when 'list'
            puts self.songs
            puts "\n"
            puts "Type play to proceed"
            puts "\n"
          when 'play'
            puts "Would you like to play by song title or artist?"
            puts "\n"
            song_title_or_artist = gets.strip.downcase
            puts "\n"
            case song_title_or_artist
            when 'song title'
              puts list_song_titles
              puts "Please select a song"
              puts "\n"
              song_choice = gets.strip.downcase
              play_song_choice(song_choice)
            when 'artist'
              puts list_artists
              puts "Please select an artist"
              artist_choice = gets.strip.downcase
              list_artist_songs(artist_choice)
              song_choice = gets.strip.downcase
              play_song_choice(song_choice)
            else
              puts "Enter one of the following commands: help, play, list, exit"
            end
          when 'exit'
            puts "\n"
            puts "Goodbye! Nice to know ya."
            exit = true
          else
            puts "Say whaaaaaa?"
          end
        end
      end

      def list_artists
        artists = self.songs.collect do |song|
          song.split(" - ")[0]
        end.uniq
      end

      def list_song_titles
        song_titles = self.songs.collect do |song|
          song.split(" - ")[1]
        end.sort
      end

      def play_song_choice(song_choice)
        self.songs.select do |song|
          if song.split(" - ")[1].downcase == song_choice
            puts "Now playing #{song}!"
          end
        end
      end

      def list_artist_songs(artist_choice)
        self.songs.select do |song|
          if song.split(" - ")[0].downcase == artist_choice
            # then just show that artist's song
            puts song.split(" - ")[1]
          end
        end
        puts "Select song"
      end
    end
I mean, just from a VISUAL standpoint, it doesn’t take rocket science to see that one of the two versions seems to be less… verbose. It seems to rely on small, focused methods that each serve a single purpose and are accessed/called upon by other methods, when needed. This is achieved through something called method delegation, where methods pass responsibility along from one to the next, initiating different parts of the program swiftly and efficiently.
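To show the idea in miniature, here's a rough sketch of small methods handing work off to each other. MiniJukebox and its method names are made up for illustration, and it returns strings instead of reading from the keyboard so it's easy to try out.

```ruby
# Hypothetical mini-jukebox; class and method names are made up
class MiniJukebox
  def initialize(songs)
    @songs = songs.sort
  end

  # One job only: decide which small method handles the command
  def handle(command)
    case command.strip.downcase
    when "help" then help_message
    when "list" then song_list
    else unknown_message
    end
  end

  private

  # Each helper below has a single 'topic sentence'
  def help_message
    "Commands: help, list"
  end

  def song_list
    @songs.join("\n")
  end

  def unknown_message
    "Say whaaaaaa?"
  end
end

jukebox = MiniJukebox.new(["B Artist - Song Two", "A Artist - Song One"])
jukebox.handle("LIST") # => "A Artist - Song One\nB Artist - Song Two"
jukebox.handle("huh?") # => "Say whaaaaaa?"
```

If the list ever prints wrong, there's exactly one three-line method to go look at.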
Of course, the responsibility of the fledgling Rubyist is to focus on ‘making it work’ first, then worry about ‘making it beautiful’. But somehow, I think considering form and function both is a valuable approach as well. Because if you’re thinking in small pieces, and you KNOW that writing code in small pieces is a good thing, then you should have no trouble… writing code in small pieces.
KEY TAKEAWAYS
Methods are like paragraphs — give them one ‘topic sentence’, i.e function, at a time
A paper written as one big block of text is egregious… and so is code that relies on one unwieldy method
Make your methods small and focused — you can always call them in other methods!