Tuesday, December 21, 2021

Ruby and MS Excel Woes: How I Learned Too Much about UTF8 BOMs

Description 

What could go wrong?  I was trying to export an Excel spreadsheet to a CSV for an anonymous Fortune 500 quasi-government/quasi-private corporation.


| Asset ID | Software Asset Name | Other Column 1 | Other Column 2 |
| SAN12344 | Some App            |                |                |
| SAN8272  | Some Other App      |                |                |



Code

Can you spot the bug? 😜  No exceptions are thrown, but the resulting `inventory_list` is an empty array.  Unit tests with some manually generated CSVs pass.
csv = CSV.open(path, headers: true, skip_blanks: true)
@inventory_list = csv
  .to_a
  .map(&:to_h)
  .select { |row| row.key?('Asset ID') }
  .map { |row| row.transform_keys { |k| k && k.strip } }
  .map { |row| row.transform_values { |v| v&.strip } }
  .map { |row| row.delete_if { |k, v| k.nil? || k.empty? || v.nil? || v.empty? } }


Fix

csv = CSV.open(path, 'r:bom|utf-8', headers: true, skip_blanks: true)

Catch the change?

Root Cause

$ xxd software_inventory_list_bom_spec.csv | head -1                 

00000000: efbb bf41 7373 6574 2049 442c 2053 6f66  ...Asset ID, Sof


Turns out the file has a UTF-8 BOM (byte order mark)!  I had only heard of these, and what little I knew was that a BOM is supposed to be handled transparently by a language's file-parsing library, if it supports UTF-8.

And, it turns out, MS Excel (2016, to be more specific) inserts a BOM if you select the "CSV UTF8" option in "Save As"!

Both vim and Notepad++ auto-detected the BOM and hid it, so it never occurred to me that my Ruby code might be the culprit.

Well, after more hours of debugging than I care to admit (the lag of ssh through Chrome, through Remote Desktop, is pretty high), it turns out that by default (at least on Amazon Linux 2, Ruby 2.8), the BOM is not auto-detected and is simply included as part of the "Asset ID" column name.

The bytes 0xEF, 0xBB, 0xBF were included in the string!  And worse, irb didn't even render them as characters!  It didn't render them at all, so you can't tell by printing the strings to the console that they're different.  The only way to tell that the string "Asset ID" had those three extra bytes at the beginning was to do a byte count. 🤦
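For anyone who wants to see the invisible bytes for themselves, here's a minimal sketch of that byte-count check. The `BOM` constant and the `byte_report` helper are names made up for illustration, and the header string is built by hand rather than read from the real file; `delete_prefix` assumes Ruby 2.5 or newer.

```ruby
BOM = "\uFEFF" # U+FEFF, which UTF-8 encodes as the bytes EF BB BF

# Hypothetical helper: summarize a string so invisible leading bytes show up.
def byte_report(str)
  {
    chars: str.length,
    bytes: str.bytesize,
    first_bytes: str.bytes.first(3).map { |b| format('%02x', b) }.join(' ')
  }
end

# What the first header looks like when the file is read without BOM handling.
header = "#{BOM}Asset ID"

puts byte_report('Asset ID')[:bytes]         # 8
puts byte_report(header)[:bytes]             # 11
puts byte_report(header)[:first_bytes]       # ef bb bf
puts header == 'Asset ID'                    # false
puts header.delete_prefix(BOM) == 'Asset ID' # true
```

The comparison on the second-to-last line is why `row.key?('Asset ID')` quietly matched nothing.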

It turns out, if you ask Excel to export a CSV by using the "CSV UTF8" option, it'll insert a BOM, BUT it won't insert a BOM if you use the "CSV" option, the one that doesn't mention UTF8.  What confusing UI...


If you want to insert a BOM in a file for testing (and, say, make a unit test out of it like I did 😼 ), there's a vim command for it:

vim -e -s +"set bomb|set encoding=utf-8|wq" filename
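If you'd rather build the fixture inside the test itself, something like the following sketch should work. It isn't the spec from my project: `write_bom_csv` and `headers_for` are hypothetical helper names, the CSV content is a cut-down version of the table above, and the mode `'r:utf-8'` is spelled out on the assumption that it matches what a UTF-8 locale gives you by default.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical test helper: write CSV text to a temp file with a UTF-8 BOM in front.
def write_bom_csv(body)
  file = Tempfile.new(['inventory_bom', '.csv'])
  file.binmode
  file.write("\xEF\xBB\xBF".b, body.b)
  file.close
  file
end

# Return the header names CSV sees when the file is opened with the given mode.
def headers_for(path, mode)
  CSV.open(path, mode, headers: true, skip_blanks: true) { |csv| csv.first.headers }
end

fixture = write_bom_csv("Asset ID,Software Asset Name\nSAN12344,Some App\n")

# Without BOM handling, the BOM is glued onto the first header.
puts headers_for(fixture.path, 'r:utf-8').first == 'Asset ID'     # false
# With 'bom|utf-8', Ruby strips the BOM before CSV ever sees it.
puts headers_for(fixture.path, 'r:bom|utf-8').first == 'Asset ID' # true

fixture.unlink
```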


Take Aways

  • A CSV seems simple, like a pumpkin: nice and smooth on the outside, but pretty gnarly on the inside.
  • When in MS Excel, make sure you know whether you're exporting a CSV with or without a BOM.
    • the `file` utility is helpful for checking
  • Ruby doesn't automatically handle a BOM.  You have to ask it to check whether it sees a BOM, which seems like a strange default.
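
If the `file` utility isn't handy, the same check is easy to approximate in Ruby by peeking at the first three bytes. This is only a sketch: `UTF8_BOM` and `utf8_bom?` are hypothetical names, and it only looks for the UTF-8 BOM, not the UTF-16 or UTF-32 ones.

```ruby
require 'tempfile'

UTF8_BOM = "\xEF\xBB\xBF".b # the three BOM bytes, as a binary string

# Hypothetical helper: true when the file at `path` starts with a UTF-8 BOM.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM
end

# Try it on a throwaway file that starts with a BOM.
Tempfile.create('bom_check') do |f|
  f.binmode
  f.write(UTF8_BOM, 'Asset ID,Software Asset Name'.b)
  f.flush
  puts utf8_bom?(f.path) # true
end
```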

Extra Reading

  • https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
  • https://estl.tech/of-ruby-and-hidden-csv-characters-ef482c679b35
  • https://stackoverflow.com/q/543225/423943
  • https://theonemanitdepartment.wordpress.com/2014/12/15/the-absolute-minimum-everyone-working-with-data-absolutely-positively-must-know-about-file-types-encoding-delimiters-and-data-types-no-excuses/

Tuesday, April 25, 2017

Why I Started Lying in Tech Interviews--And You Should Too!



"Do you have experience with the Microsoft Word?"

I didn't start lying at this point, although I did giggle a little to myself quietly at the poor grammar coming from an HR drone.  I politely pointed out that, with a Bachelor's in Computer Science and two years of corporate IT experience, I had written many English papers, many lab reports, and many corporate reports in Microsoft Word.


"Have you ever taught a class in Microsoft Excel?"

I still remember the exact moment that I first lied during an interview.  This was it.  At this point in my career, I had more than 30 credits of teaching experience in higher ed.  I had taught classes in databases, Java, C++, and even algebra.  I even had a Master's in Computer Science at this point.

No, technically, I had never taught a class in Excel.  What did I say?  What do you think I said?

"Yes--twice."


"Imagine an infinite chess board."

Oh, please!  You can probably tell how dismayed I was upon hearing this question.  This was supposedly from a company that specializes in interviewing (*cough*, Karat, *cough*).

It was an algorithms question: you were tasked with figuring out the algebraic formula for the location of a knight, or some such nonsense.

Business value?  0.
Easy to grade?  Yes.

I would argue that actually playing chess with someone and asking them to explain their thinking would give you more insight into their understanding of algorithms.  But that kind of thing is time-consuming, and it's much easier to ask someone to imagine an infinite chess board.

I don't fill out the stupid tests anymore


For my last round of job searches, I took about 8 hours of online "entrance" exams, from companies ranging from Karat to TekSystems.  The questions ranged from basic Java questions that are easily googleable, to useless "infinite chess board" puzzles, to obscure details of the Java 8 DateTime API.

If you don't cheat by plugging the code examples into your IDE or just googling for answers, you'll get a low score and the non-technical recruiter will think you're an idiot.

And how many interviews did I get from those exams?  Zero.  Although I did get a lot of compliments on how highly I scored.

I would have been better off writing cover letters or making more phone calls.  Never again.

Misleading Job Descriptions


Ever look back on the job description after you've worked at a job?  Or at an incompetent colleague's job description, and wonder how they got the job in the first place?

One time, I got handed a BlackBerry for off-hours support a few weeks into a job for a certain cable company.  I asked, "What is this?"  They said, "It's your turn."  It wasn't in the job description, and they conveniently forgot to mention it during the interview.

You realize the job description isn't thought through very well, and/or is poorly worded (sometimes intentionally), and if that's gonna keep you from even interviewing for the job you want, well, two can play that game.



Bad interviewers


Let's be honest: most of us are bad interviewers.  We've never gotten any training, so we cargo-cult it and just ask the questions we were asked.  At one startup I worked for, we went through (i.e., hired and fired) two employees in the year after I started!

"What would you do if you were 6 inches tall and were thrown in a blender that's gonna turn on in 15 seconds?"

Those questions were pretty popular in the '90s and early '00s, but thankfully people have mostly stopped asking them, although I still get the occasional question about buildings with infinite floors and similar nonsense.

Conclusion

I feel like the problem has gotten worse, with many more toolkits, buzzwords, and technologies than even 10 years ago.  (Do you know Kafka?  You should know Kafka.  And Node.js.)

You can certainly get a job not knowing these things, but not one of the good ones.

I think the problem is deep enough, with bad interviewers asking bad questions, that the only solution is for everyone to lie.  Eventually, once the number of lies reaches a tipping point, interviewers will need to be trained to ask better questions.

Tuesday, November 15, 2016

What's the Difference Between a Computer Programmer and Software Developer? Trick Question

You could argue there's a difference between Web Developers and Software Developers.  One is primarily front end, and the other is a combination, but primarily back end.  (Even this distinction is slowly going away as JavaScript enters the back end via Node.js.)

But Computer Programmers and Software Developers?  I can't think of a difference.

Enter the United States Department of Labor (technically the BLS, the Bureau of Labor Statistics):


| Occupation | Job Summary | Entry-Level Education | 2015 Median Pay |
| Computer and Information Research Scientists | Computer and information research scientists invent and design new approaches to computing technology and find innovative uses for existing technology. They study and solve complex problems in computing for business, medicine, science, and other fields. | Doctoral or professional degree | $110,620 |
| Computer Network Architects | Computer network architects design and build data communication networks, including local area networks (LANs), wide area networks (WANs), and intranets. These networks range from small connections between two offices to next-generation networking capabilities such as a cloud infrastructure that serves multiple customers. | Bachelor's degree | $100,240 |
| Computer Programmers | Computer programmers write and test code that allows computer applications and software programs to function properly. They turn the program designs created by software developers and engineers into instructions that a computer can follow. | Bachelor's degree | $79,530 |
| Computer Support Specialists | Computer support specialists provide help and advice to people and organizations using computer software or equipment. Some, called computer network support specialists, support information technology (IT) employees within their organization. Others, called computer user support specialists, assist non-IT users who are having computer problems. | See How to Become One | $51,470 |
| Computer Systems Analysts | Computer systems analysts study an organization’s current computer systems and procedures and design information systems solutions to help the organization operate more efficiently and effectively. They bring business and information technology (IT) together by understanding the needs and limitations of both. | Bachelor's degree | $85,800 |
| Database Administrators | Database administrators (DBAs) use specialized software to store and organize data, such as financial information and customer shipping records. They make sure that data are available to users and are secure from unauthorized access. | Bachelor's degree | $81,710 |
| Information Security Analysts | Information security analysts plan and carry out security measures to protect an organization’s computer networks and systems. Their responsibilities are continually expanding as the number of cyberattacks increases. | Bachelor's degree | $90,120 |
| Network and Computer Systems Administrators | Computer networks are critical parts of almost every organization. Network and computer systems administrators are responsible for the day-to-day operation of these networks. | Bachelor's degree | $77,810 |
| Software Developers | Software developers are the creative minds behind computer programs. Some develop the applications that allow people to do specific tasks on a computer or another device. Others develop the underlying systems that run the devices or that control networks. | Bachelor's degree | $100,690 |
| Web Developers | Web developers design and create websites. They are responsible for the look of the site. They are also responsible for the site’s technical aspects, such as its performance and capacity, which are measures of a website’s speed and how much traffic the site can handle. In addition, web developers may create content for the site. | Associate's degree | $64,970 |

http://www.bls.gov/ooh/computer-and-information-technology/home.htm

Boy, where do I start?

  • Not only is this distinction very dubious to begin with, but they managed to come up with different median salaries!  Guess who makes more money, software developers or computer programmers?  Software developers make $100,690 per year, versus $79,530 per year for computer programmers.  WTF?
  • They predict job growth of 17% for software developers, and a decline of 8% for computer programmers!
  • They don't even include a category for "software engineer", which is the most common job title nowadays.
I hope some poor college student isn't using this data to make a career choice.  No wonder people don't trust unemployment numbers.

Wednesday, July 13, 2016

BufferBloat Be Gone! (or How I Learned to Love OpenWrt)

Before OpenWrt / SQM:





After OpenWrt / SQM:

NOTE: the slower speeds are my fault.  I haven't completely optimized the settings (i.e., download speed and upload speed under Basic Settings).  Worth it, though!

Friday, July 31, 2015

Snippet :: HOWTO Make JAXB Generate Joda-Time Classes

The key is using a bindings XML file with a javaType element.

pom.xml

<plugin>
    <groupId>org.apache.cxf</groupId>
    <artifactId>cxf-codegen-plugin</artifactId>
    <version>3.1.1</version>
    <executions>
        <execution>
            <id>generate-sources</id>
            <phase>generate-sources</phase>
            <configuration>
                <sourceRoot>${project.build.directory}/generated-zuora</sourceRoot>
                <wsdlOptions>
                    <wsdlOption>
                        <wsdl>
                            ${basedir}/src/main/resources/xml-schemas/zuora/zuora.a.69.0.wsdl
                        </wsdl>
                        <wsdlLocation>classpath:/xml-schemas/zuora/zuora.a.69.0.wsdl</wsdlLocation>
                        <bindingFiles>
                            <bindingFile>${basedir}/src/main/resources/xml-schemas/zuora/bindings.xml</bindingFile>
                        </bindingFiles>
                    </wsdlOption>
                </wsdlOptions>
                <!-- generate toString()'s too -->
                <extraarg>-xjc-Xts</extraarg>
            </configuration>
            <goals>
                <goal>wsdl2java</goal>
            </goals>
        </execution>
    </executions>
</plugin>

bindings.xml

<?xml version="1.0" encoding="UTF-8"?>
<jaxb:bindings version="2.1" 
    xmlns:jaxb="http://java.sun.com/xml/ns/jaxb" 
    xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!--
        so JAXB doesn't wrap everything in JAXBElement<..>
        http://stackoverflow.com/a/4583912/423943
    -->
    <jaxb:globalBindings generateElementProperty="false">
        <!-- use Joda-Time LocalDate for parsing xs:date -->
        <jaxb:javaType name="org.joda.time.LocalDate" xmlType="xs:date" parseMethod="org.joda.time.LocalDate.parse"/>
        <!-- use Joda-Time DateTime for parsing xs:dateTime -->
        <jaxb:javaType name="org.joda.time.DateTime" xmlType="xs:dateTime" parseMethod="org.joda.time.DateTime.parse"/>
    </jaxb:globalBindings>
</jaxb:bindings> 

Monday, April 13, 2015