how to split one text file into multiple *.txt files

September 18, 2018 - Reading time: 4 minutes

You can use the split program from GNU coreutils:

split -b 1M -d  file.txt file

Note that both M and MB are accepted, but the sizes differ: MB means 1000 * 1000 bytes, while M means 1024 * 1024.

If you want to split by line count instead, use the -l option.
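A quick sketch (the file and prefix names here are made up): splitting ten lines into chunks of four gives three pieces, the last one short.

```shell
# Make a 10-line sample file, then split it 4 lines at a time.
seq 10 > lines10.txt
split -l 4 -d lines10.txt part.   # -d gives numeric suffixes: part.00, part.01, ...
wc -l part.*                      # 4, 4 and 2 lines
```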

Update:

lines=$(( ($(wc -l < file.txt) + 11) / 12 )) ; split -l "$lines" -d file.txt file

Another solution, suggested by Kirill, is something like the following:

split -n l/12 file.txt

Note that this is a lowercase letter l, not the digit one. split -n accepts several forms: N, K/N, l/N, l/K/N, r/N, r/K/N.
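For example (file names invented, GNU split assumed), splitting ten lines into three files without breaking any line:

```shell
# l/3: three output files, and no line is ever split across two of them.
seq 10 > nums.txt
split -n l/3 nums.txt chunk_             # creates chunk_aa, chunk_ab, chunk_ac
cat chunk_a* | cmp - nums.txt && echo "reassembles cleanly"
```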

Using Bash:

# Read all lines, then spread them across 12 files as evenly as possible.
readarray -t lines < file.txt
count=${#lines[@]}

for i in "${!lines[@]}"; do
    # map the 0-based line index i to an output file index from 1 to 12
    index=$(( (i * 12 - 1) / count + 1 ))
    echo "${lines[i]}" >> "file${index}.txt"
done

Using AWK:

awk '{
    a[NR] = $0
}
END {
    for (i = 1; i in a; ++i) {
        x = int((i * 12 - 1) / NR + 1)
        print a[i] > "file" x ".txt"
    }
}' file.txt

Unlike split, this approach makes the line counts of the output files as even as possible.
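To check the evenness claim, feed it 25 lines: each of the 12 output files gets two or three lines, never a bigger spread (the file names match the script above; the redirection target is parenthesised for portability across awk implementations):

```shell
# 25 lines into 12 files: sizes can differ by at most one line.
seq 25 > file.txt
awk '{ a[NR] = $0 }
END {
    for (i = 1; i in a; ++i) {
        x = int((i * 12 - 1) / NR + 1)
        print a[i] > ("file" x ".txt")
    }
}' file.txt
wc -l file[0-9]*.txt
```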


how to repack an epub file from command line

September 9, 2018 - Reading time: 5 minutes
To unzip the epub, move the ePub to a folder, cd to it then simply:
unzip MyEbook.epub
To zip up an epub:
1. zip -X MyNewEbook.epub mimetype
2. zip -rg MyNewEbook.epub META-INF -x \*.DS_Store
3. zip -rg MyNewEbook.epub OEBPS -x \*.DS_Store
Some explanation is necessary here. We start each line with two flags:
-r (recursive)
Move down through any directories/folders recursively, ensuring that everything in the specified folders gets included.
-g (grow file)
Append to the existing archive instead of creating a new one, so the mimetype entry added in step 1 stays first.

Even more concise:
# to zip up the epub, cd into its folder, then simply:
zip -rX ../my.epub mimetype META-INF/ OEBPS/

Without -X you could get the following when validating it with EpubCheck:

ERROR: my.epub: Mimetype entry must not have an extra field in its ZIP header

If mimetype is not the first in the epub file EpubCheck prints the following:

ERROR: my.epub: Mimetype entry missing or not the first in archive

To use the name of the current folder as the epub filename:
zip -rX "../$(basename "$(realpath .)").epub" mimetype $(ls|xargs echo|sed 's/mimetype//g')

This version also worked better for epubs that don't use an OEBPS folder. I don't know whether omitting that folder is valid per the standard, but I found examples of it missing in the wild.



concatenate multiple files but include filename as section headers

April 2, 2017 - Reading time: ~1 minute
tail -n +1 file1.txt file2.txt file3.txt

When given more than one file, tail prints a ==> filename <== header before each one.

grep "" *.txt

The empty pattern matches every line, and grep prefixes each line with its filename when searching multiple files.

This should do the trick as well:

find . -type f -print -exec cat {} \;

Means:

find    = linux `find` command finds filenames, see `man find` for more info
.       = in current directory
-type f = only files, not directories
-print  = show found file
-exec   = additionally execute another linux command
cat     = linux `cat` command, see `man cat`, displays file contents
{}      = placeholder for the currently found filename
\;      = tells `find` that the command given to -exec ends here

You can further combine searches with Boolean operators such as -and or -or; see also find -ls.
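For instance (directory and file names invented), matching .txt or .md files while still restricting to regular files:

```shell
# -type f restricts to files; the parenthesised group ORs the two name tests.
mkdir -p demo/sub
touch demo/a.txt demo/sub/b.md demo/c.log
find demo -type f \( -name '*.txt' -or -name '*.md' \)
```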


batch-convert document formats

February 8, 2017 - Reading time: 4 minutes

Install the latest LibreOffice and its libraries
On most Ubuntu versions LibreOffice is installed by default; just upgrade it to the latest version.

  • apt-get install libreoffice

Install unoconv
unoconv is a command line utility that can convert any document format (doc, docx, odt, ods, xls, xlsx) that LibreOffice can import to any format (xml, pdf, doc, docx, odt, ods, xls, xlsx) that LibreOffice can export.

  • apt-get install unoconv
Installing LibreOffice alone is enough for converting documents, but for some reason unoconv does a better job (despite using the same libraries) when the output is consumed by other applications; its results are more widely accepted.

Batch Convert Document to csv, pdf, jpg, docx, xlsx, odt, or ods

To batch convert documents, work in a terminal: put the documents to be converted in a folder (it can contain several sub-folders), then do the following:

Synopsis:

  • unoconv [options] to-file from-file

Convert single xls format to pdf

  • unoconv -f pdf some-document.xls

Convert single png format to jpg

  • unoconv -f jpg some-document.png

Batch Convert docx format to pdf

  • unoconv -f pdf *.docx

Batch Convert xlsx format to ods

  • unoconv -f ods *.xlsx

Batch Convert csv format to xlsx

  • unoconv -f xlsx *.csv

Batch Convert csv format to ods

  • unoconv -f ods *.csv

You can change the options, source, and destination format according to your needs. Some of the document formats supported by this application are pdf, odf, odt, ods, xls, xlsx, doc, docx, rtf, ppt, pptx, csv, png, jpg, bmp, and svg.
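unoconv itself does not descend into sub-folders, so one way to handle a nested tree is to feed it files from find. This is only a sketch: the .docx-to-pdf pairing is an example, and the guard lets the loop run even on machines where unoconv is missing.

```shell
# Recursively convert every .docx under the current tree to PDF.
find . -type f -name '*.docx' -print0 |
while IFS= read -r -d '' doc; do
    if command -v unoconv >/dev/null 2>&1; then
        unoconv -f pdf "$doc"
    else
        echo "would convert: $doc"   # unoconv not installed here
    fi
done
```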

download all links from a site and save to a text file

June 11, 2016 - Reading time: ~1 minute

wget is not designed for this. You can however parse its output to get what you want:

$ wget http://aligajani.com -O - 2>/dev/null | grep -oP 'href="\Khttp:.+?"' | sed 's/"//' | grep -v facebook > file.txt
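The same pipeline can be checked offline on a hand-made page (the URLs below are placeholders):

```shell
# A tiny page with one keeper link and one facebook link to filter out.
cat > page.html <<'EOF'
<a href="http://example.com/one">one</a>
<a href="http://facebook.com/x">fb</a>
EOF
# \K drops the href=" prefix; sed strips the trailing quote.
grep -oP 'href="\Khttp:.+?"' page.html | sed 's/"//' | grep -v facebook
```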

You could use lynx for this:

lynx -dump -listonly http://aligajani.com | grep -v facebook.com > file.txt

Both commands above list the links of a single page only. To download a site recursively:

wget -r -p -k http://website

or

wget -r -p -k --wait=#SECONDS http://website

The second form is for websites that may block you for downloading too quickly; hammering a site too fast can also degrade its service, so use the --wait version in most circumstances to be courteous. Everything will be placed in a folder named after the website, created in whatever directory your terminal is in when you run the command.