Commons:Bots/Work requests
This is a page for requesting work to be done by a bot. This is an appropriate place to simply put ideas for bots. However be aware of various tools available to all users which can be used to accomplish the work without the need for a bot:
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days.
Add Template:User category (CL)
Please add {{User category |1=Chris Light}} to the user categories of @Chris Light. This ensures they are in Category:User categories (flat list) and marked as hidden categories.
To find them: https://petscan.wmflabs.org/?psid=28135244
I fixed the few inclusions that weren't user categories (or shouldn't be): sample. A few are still to fix:
- Category:Smith_Type_IV_Truss
- Category:Indiana_SR_42_Over_Eel_River
- Category:Grd_Promenade_(Hot_Spings_NP)
- Category:Pizza_King_(Brookston)
- Category:Porter_(Rea)_Cemetery_Historic_marker
- Category:Hazards_(LTNP)
About 60 are currently not hidden categories: sample. They will be once the template is added. Enhancing999 (talk) 09:49, 29 April 2024 (UTC)
- This can be easily done using AutoWikiBrowser (see example diff). Getting the categories into AWB is quite easy - in Petscan, select "Wiki" as output format and save the result as txt file. Now, in AWB, you can use that file to generate a list by selecting "Text file (UTF-8)" as source. The only problem is that a bot-enabled account would be useful for this, otherwise you'll have to manually confirm 4540 edits... Fl.schmitt (talk) 09:57, 14 July 2024 (UTC)
- I have a bot account at DaxBot, I can take on this work. Let me know -- DaxServer (talk) 14:33, 14 July 2024 (UTC)
- @DaxServer: Would be great - thanks a lot! I'm not sure if a bot additionally needs AWB access. I've got AWB access, but my bot was approved for different tasks... Anyway, after AWB approval, creating the task in AWB is very easy - since the only modification required is prepending the User category, no regex, no replacements or deletions. Fl.schmitt (talk) 15:22, 14 July 2024 (UTC)
- @Fl.schmitt My bot was approved for different tasks too, so I will need to apply for approval for this one-time task. I'd use the replace script (https://doc.wikimedia.org/pywikibot/stable/scripts/main.html#replace-script) to replace __HIDDENCAT__ with {{User category|Chris Light}}. If the template is already inserted, I'll skip. Here is the command: pwb replace -subcatsr:"Image by Chris Light" -summary:"Summary goes here" -ns:Category -excepttext:"{{User category" "__HIDDENCAT__" "{{User category|1=Chris Light}}"
- I'll check the petscan query after the run and handle the missed ones. How does this sound? -- DaxServer (talk) 17:34, 14 July 2024 (UTC)
- @DaxServer: Sounds great, thanks a lot for the explanation - i'm still learning! I didn't think about using one of the pwb standard scripts - that's really useful! Fl.schmitt (talk) 19:05, 14 July 2024 (UTC)
- Cool. Filed it Commons:Bots/Requests/DaxBot (4) -- DaxServer (talk) 19:35, 14 July 2024 (UTC)
- I added -titleregexnot:"C(hris )?Light" to the command so that categories like Category:Hazards_(LTNP) are skipped. -- DaxServer (talk) 15:22, 15 July 2024 (UTC)
- Sounds good to me. I've been making corrections as I return to existing categories. If it can be automated, great. You've got my approval. Chris Light (talk) 21:51, 14 July 2024 (UTC)
- Test run is done (edits) -- DaxServer (talk) 15:15, 15 July 2024 (UTC)
- I've been running the bot since yesterday at ~3 edits/minute. It should take a couple more days to finish. I'll ping once it is done. -- DaxServer (talk) 12:47, 27 July 2024 (UTC)
- @Enhancing999 It seems to have finished already. About 50 or so need some manual intervention. If you'd be able to verify some random cats and check if all is good, we can mark this resolved. -- DaxServer (talk) 13:26, 28 July 2024 (UTC)
- Thanks. I will check once the Petscan query updates. Enhancing999 (talk) 13:46, 28 July 2024 (UTC)
Somehow petscan isn't updating.
- Special:Search/Category: intitle:"CLight" -hastemplate:"User category"
- Special:Search/Category: intitle:"Chris Light" -hastemplate:"User category"
show that it's mostly done. Eventually some more cleanup should be done, but at least now most categories are in the "user categories" tree.
Thanks for this! Enhancing999 (talk) 10:45, 29 July 2024 (UTC)
- PS: Petscan updated shortly after or before my post above. It's fixed for all but 5 categories. Enhancing999 (talk) 18:32, 29 July 2024 (UTC)
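As a rough illustration of the replacement discussed in this thread (not the bot's actual code), the following Python sketch applies the same rule to a wikitext string: swap __HIDDENCAT__ for the user-category template, and leave the page alone if it already contains "{{User category". The function name `add_user_category` is hypothetical, and working on plain strings instead of going through Pywikibot is an assumption made to keep the example self-contained.

```python
HIDDENCAT = "__HIDDENCAT__"
TEMPLATE = "{{User category|1=Chris Light}}"


def add_user_category(wikitext: str) -> str:
    """Return wikitext with __HIDDENCAT__ replaced by the user-category template.

    Mirrors the -excepttext guard of the command in this thread: pages that
    already contain "{{User category" are returned unchanged.
    """
    if "{{User category" in wikitext:
        return wikitext
    # Pages without __HIDDENCAT__ also come back unchanged; those are the
    # ones a follow-up query would have to catch.
    return wikitext.replace(HIDDENCAT, TEMPLATE)
```

Pages that have neither the magic word nor the template are untouched by this rule, which may be why a manual pass over the remaining categories was still needed.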
DGJ file descriptions from Flickr
Files like this have some duplicated text and content that is of interest mainly to Flickr users. In the past, I cleaned up some pages myself, but there are actually plenty of them, see Special:Search/"PLEASE, NO invitations or self promotions, THEY WILL BE DELETED." (about 10000). Also Category:Dennis G. Jarvis and Special:Search/"Dennis G. Jarvis" (27000).
Possibly Creator:Dennis G. Jarvis template could be added at the same time. Enhancing999 (talk) 07:38, 5 May 2024 (UTC)
- @Enhancing999 Do you have example diffs of the cleanup? -- DaxServer (talk) 19:40, 14 July 2024 (UTC)
- I omitted a sample, wanting to leave it to the bot operator. In the meantime, I made Creator:Dennis G. Jarvis.
- Maybe this: diff
- deletion of part of the "Description" in "en": "PLEASE, NO invitations or self promotions, THEY WILL BE DELETED. My photos are FREE to use, just give me credit and it would be nice if you let me know, thanks."
- deletion of the notice twice in "Permission": "PLEASE, no multi invitations or self promotion in your comments, THEY WILL BE DELETED. My photos are FREE for anyone to use, just give me credit and it would be nice if you let me know, thanks - NONE OF MY PICTURES ARE HDR."
- addition of Creator:Dennis G. Jarvis
- Enhancing999 (talk) 15:05, 16 July 2024 (UTC)
- Is there a template we could use for the "it would be nice if you let me know"-part? Enhancing999 (talk) 15:13, 16 July 2024 (UTC)
- I personally do not know, I'd suggest asking in the Commons:Village pump - someone might know. -- DaxServer (talk) 15:35, 16 July 2024 (UTC)
- I went through the templates, but only found a few user specific ones, Special:Search/Template: let me know finds some.
- We could solve this by adding something to Creator:Dennis_G._Jarvis. Makes it also easier to update if needed. Enhancing999 (talk) 09:22, 22 July 2024 (UTC)
- I added it at Special:Diff/905603933. Enhancing999 (talk) 11:04, 1 August 2024 (UTC)
- Thanks @Enhancing999. I'd also remove the link to Creator's Flickr as it is covered by Wikidata [1] -- DaxServer (talk) 15:20, 16 July 2024 (UTC)
- @Enhancing999 Here are the test runs. Here is the Pywikibot command I used:
pwb replace -recursive -nocase -summary:"DaxBot Task #6 test run" -search:"PLEASE, NO invitations or self promotions, THEY WILL BE DELETED." -ns:File -grep:"\[https://www\.flickr\.com/people/22490717@N02 Dennis Jarvis] from Halifax, Canada" -regex "\[https://www\.flickr\.com/people/22490717@N02 Dennis Jarvis] from Halifax, Canada" "{{Creator:Dennis G. Jarvis}}" "Quote from photographer on numerous files|((Quote )?{{Quote\|)?PLEASE, ?\*?(no )?(multi )?invitations((, glitters)? or self promotions?,?| \(none is better\))?( in your comments(,|\. Thanks\.))?( THEY WILL BE DELETED\. *My photos are FREE( for anyone)? to use, just give me credit and it would be nice if you let me know, thanks\.?( - NONE OF MY PICTURES ARE HDR\.)?)?( I AM POSTING MANY DO NOT FEEL YOU HAVE TO COMMENT ON ALL - JUST ENJOY.)?(}})?\n*" "" "=( *)\n+( *)\|" "=\1\n\2|" "1=\s+" "1="
- The regex matches these texts so far:
- Quote from photographer on numerous files
- {{Quote|PLEASE, no multi invitations or self promotion in your comments, THEY WILL BE DELETED. My photos are FREE for anyone to use, just give me credit and it would be nice if you let me know, thanks - NONE OF MY PICTURES ARE HDR.}}
- Quote {{Quote|PLEASE, no multi invitations in your comments. Thanks. I AM POSTING MANY DO NOT FEEL YOU HAVE TO COMMENT ON ALL - JUST ENJOY.}}
- PLEASE, no multi invitations, glitters or self promotion in your comments, THEY WILL BE DELETED. My photos are FREE for anyone to use, just give me credit and it would be nice if you let me know, thanks - NONE OF MY PICTURES ARE HDR.
- PLEASE, no multi invitations (none is better) in your comments. Thanks.
- PLEASE, NO invitations or self promotions, THEY WILL BE DELETED. My photos are FREE to use, just give me credit and it would be nice if you let me know, thanks.
- PLEASE, *NO invitations or self promotion in your comments, THEY WILL BE DELETED. My photos are FREE for anyone to use, just give me credit and it would be nice if you let me know, thanks - NONE OF MY PICTURES ARE HDR.
- PLEASE,invitations or self promotion in your comments, THEY WILL BE DELETED. My photos are FREE for anyone to use, just give me credit and it would be nice if you let me know, thanks - NONE OF MY PICTURES ARE HDR.
- If you find other texts, please let me know. -- DaxServer (talk) 11:38, 20 July 2024 (UTC)
- Thanks. Looks good.
- I went through the output of the test run. It seems it caught almost all of it. 3407985788 has "PLEASE, no multi invitations (none is better) in your comments. Thanks. " in the description left.
- Possibly it could also add {{en|1= }} around the description if {{En}} isn't present. Enhancing999 (talk) 09:07, 22 July 2024 (UTC)
- Nice catch! I'll add the en template, can you confirm if all the descriptions are in English? -- DaxServer (talk) 09:25, 22 July 2024 (UTC)
- In the sample, I think all were, plus, as far as I recall in the ones I categorized for Lucerne. I think it's a reasonable assumption. Enhancing999 (talk) 09:29, 22 July 2024 (UTC)
- Some already have {{En}}. Enhancing999 (talk) 09:30, 22 July 2024 (UTC)
- Below some searches with current number of results (just to compare later).
- Special:Search/"I AM POSTING MANY DO NOT FEEL YOU HAVE TO COMMENT ON ALL" Jarvis: 740
- Special:Search/"no multi invitations" Jarvis: 13,132
- Special:Search/"self promotion in your comments" Jarvis: 12,780
- Special:Search/"THEY WILL BE DELETED" Jarvis: 19,148
- Special:Search/"NONE OF MY PICTURES ARE HDR" Jarvis: 12,684
- Special:Search/insource:"let me know" Jarvis: 19,558
- {{En}} missing:
- Enhancing999 (talk) 11:32, 22 July 2024 (UTC)
- BTW I'd use {{en|1=some description}} instead of {{en|some description}} to avoid it breaking when the text includes a "=". Enhancing999 (talk) 10:47, 29 July 2024 (UTC)
- Bot request filed: Commons:Bots/Requests/DaxBot (6) -- DaxServer (talk) 11:47, 20 July 2024 (UTC)
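To make it easier to check further variants of the notice, here is a minimal Python sketch that tests sample strings against the pattern from the command in this thread. The pattern is copied from that command with the literal braces escaped. The names `NOTICE`, `SAMPLES` and `is_notice` are hypothetical, and using `re.IGNORECASE` to stand in for the `-nocase` option is an assumption.

```python
import re

# Pattern copied from the pwb command in this thread; braces are escaped.
NOTICE = re.compile(
    r"Quote from photographer on numerous files|"
    r"((Quote )?\{\{Quote\|)?PLEASE, ?\*?(no )?(multi )?invitations"
    r"((, glitters)? or self promotions?,?| \(none is better\))?"
    r"( in your comments(,|\. Thanks\.))?"
    r"( THEY WILL BE DELETED\. *My photos are FREE( for anyone)? to use, "
    r"just give me credit and it would be nice if you let me know, thanks\.?"
    r"( - NONE OF MY PICTURES ARE HDR\.)?)?"
    r"( I AM POSTING MANY DO NOT FEEL YOU HAVE TO COMMENT ON ALL - JUST ENJOY.)?"
    r"(\}\})?\n*",
    re.IGNORECASE,  # assumption: stands in for -nocase
)

# A few of the variants listed in the thread.
SAMPLES = [
    "Quote from photographer on numerous files",
    "PLEASE, no multi invitations (none is better) in your comments. Thanks.",
    "PLEASE, NO invitations or self promotions, THEY WILL BE DELETED. "
    "My photos are FREE to use, just give me credit and it would be nice "
    "if you let me know, thanks.",
    "Quote {{Quote|PLEASE, no multi invitations in your comments. Thanks. "
    "I AM POSTING MANY DO NOT FEEL YOU HAVE TO COMMENT ON ALL - JUST ENJOY.}}",
]


def is_notice(text: str) -> bool:
    """True if the whole string is one of the known notice variants."""
    return NOTICE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(is_notice(sample), sample[:50])
```

A newly found variant can be appended to `SAMPLES` to see whether the pattern already covers it before the bot is rerun.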
Hidden categories added as Category:Hidden categories
Hidden categories is a system category added by __HIDDENCAT__.
However, some files and even categories add it as regular categories: [[Category:Hidden categories]]
To find some: [2] (currently 468 in category namespace). Enhancing999 (talk) 13:22, 2 June 2024 (UTC)
- I've reduced the numbers with Com:Cat-a-lot. The rest probably should be gone through manually. Jonteemil (talk) 23:04, 3 June 2024 (UTC)
- Shouldn't they be replaced with __HIDDENCAT__? This finds those lacking that. Enhancing999 (talk) 23:18, 3 June 2024 (UTC)
- I'm not sure all 128 categories really should be hidden. That's why I suggest they be gone through manually. Jonteemil (talk) 11:52, 4 June 2024 (UTC)
- Currently 54 hits. Support fixing this. [[Category:Hidden categories]] should NOT appear. — Preceding unsigned comment added by Taylor 49 (talk • contribs) 14:12, 26 June 2024 (UTC)
- I think this is done now. I've edited most of the remaining 43 categories using AWB. I was unsure about Category:Vector files with non-modifiable text since there, Category:Hidden categories is used as a piped link.
- {{Section resolved|Fl.schmitt (talk) 10:36, 14 July 2024 (UTC)}} Fl.schmitt (talk) 10:36, 14 July 2024 (UTC)
Thanks for the help. I had done a few as well. While doing the change manually helps with adding more precise categories (like {{Source category}} or {{Usercat}}), I don't see an issue with systematically converting all uses going forward. Since July 14, a new use has been added: [3]. Maybe a bot that runs daily could include it too. Enhancing999 (talk) 11:54, 16 July 2024 (UTC)
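If such a conversion were automated, the text change itself could look roughly like the Python sketch below. It is only an illustration under assumptions: the function name `convert_hidden_category` is hypothetical, it handles only the plain [[Category:Hidden categories]] form (optionally with a sort key), and it deliberately leaves links written with a leading colon untouched, which is how a piped link to a category is normally written. Whether a given category should be hidden at all still needs human judgement, as noted earlier in this thread.

```python
import re

# Matches [[Category:Hidden categories]] with an optional sort key, but not
# [[:Category:Hidden categories|...]] (a leading colon makes it a plain link).
EXPLICIT_CAT = re.compile(
    r"\[\[\s*(?i:Category)\s*:\s*[Hh]idden[ _]categories\s*(\|[^\]]*)?\]\]\n?"
)
MAGIC_WORD = "__HIDDENCAT__"


def convert_hidden_category(wikitext: str) -> str:
    """Replace an explicit Hidden categories link with the magic word."""
    if not EXPLICIT_CAT.search(wikitext):
        return wikitext
    stripped = EXPLICIT_CAT.sub("", wikitext)
    if MAGIC_WORD in stripped:
        # Already hidden via the magic word; only drop the redundant link.
        return stripped
    return stripped.rstrip("\n") + "\n" + MAGIC_WORD + "\n"
```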
sorting files
Please help me sort files in the subcategories of Category:Photographs in the Golestan Palace Library by number. Sort keys should have three digits as there might be more than a hundred files in each album. Hanooz 15:18, 7 June 2024 (UTC)
- @Hanooz: this seems to be done, too - is this correct? If not, please comment. Thank you! Fl.schmitt (talk) 10:36, 14 July 2024 (UTC)
- It's not, unfortunately. Hanooz 16:10, 14 July 2024 (UTC)
- OK, i've removed the resolved template (sorry, i didn't understand first that you want to sort the files inside the subcategories, not into the categories...). --Fl.schmitt (talk) 17:07, 14 July 2024 (UTC)
- @Hanooz Is this the format - [4] [5] ? -- DaxServer (talk) 19:39, 14 July 2024 (UTC)
- 008.2 (or 008-2) for File:Golestan Palace Album No. 100-8.2.jpg and 008.1 (or 008-1) for File:Golestan Palace Album No. 100-8.1.jpg. What comes after the dot (1 or 2) is recto/verso. Hanooz 19:59, 14 July 2024 (UTC)
- @Hanooz Here is what I gather: https://commons.wikimedia.org/w/index.php?title=User:DaxServer/sandbox&oldid=899125871 from Petscan https://petscan.wmcloud.org/?psid=28923652 I omitted the first few which do not have the pattern "Golestan_Palace_Album_No._" in the title. Please edit them manually setting the desired sortkey. If the table looks good, I can file for the bot and can do the edits. Let me know -- DaxServer (talk) 13:58, 15 July 2024 (UTC)
- Looks great to me. Thanks. Hanooz 16:00, 15 July 2024 (UTC)
- Filed Commons:Bots/Requests/DaxBot (5) -- DaxServer (talk) 20:46, 15 July 2024 (UTC)
- @Hanooz I believe the sorting is done. Can you verify and mark this resolved? -- DaxServer (talk) 13:33, 28 July 2024 (UTC)
- Yes. Thank you for your assistance. Hanooz 14:07, 28 July 2024 (UTC)
- Pleasure! -- DaxServer (talk) 17:08, 28 July 2024 (UTC)
- Resolved
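The sort-key format agreed on in this thread can be sketched in Python as follows. This is an illustrative assumption, not the code the bot ran: the function name `sort_key` is hypothetical, and the title pattern is inferred from the two example file names given in the thread (album number, a hyphen, the item number, and an optional recto/verso suffix after a dot).

```python
import re
from typing import Optional

# Title pattern inferred from the examples in the thread (an assumption).
TITLE = re.compile(
    r"Golestan[ _]Palace[ _]Album[ _]No\.[ _](?P<album>\d+)-(?P<num>\d+)"
    r"(?:\.(?P<side>\d+))?\.\w+$"
)


def sort_key(title: str) -> Optional[str]:
    """Return a zero-padded sort key such as '008.2', or None if no match."""
    match = TITLE.search(title)
    if match is None:
        return None  # titles without the pattern need a manual sort key
    key = match.group("num").zfill(3)  # three digits: 8 -> 008
    if match.group("side"):
        key += "." + match.group("side")  # recto/verso suffix
    return key
```

Titles that do not follow the pattern return None, which corresponds to the files that were left for manual editing.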
Revert additions to Category:History by Mitte27
Thousands of uncategorized files were added to the already-bloated Category:History. All of the edits I found were on 31 May 2024. Could someone please automatically revert these edits? Thanks. Cryptic-waveform (talk) 20:55, 24 June 2024 (UTC)
- I don't think it's a good idea to return it. My idea was to then move the files from "Category:History" to more specific categories. --Mitte27 (talk) 09:59, 25 June 2024 (UTC)
- The current status is that thousands of files that were correctly marked as Uncategorized, and therefore easily visible to contributors doing a first round of categorization, are now erroneously categorized in a top-level category. Cryptic-waveform (talk) 13:04, 25 June 2024 (UTC)
- @Mitte27: so when do you plan to move the images to more specific categories? This is clearly not an indefinite solution. —Matrix(!) {user - talk? - uselesscontributions} 18:55, 26 June 2024 (UTC)
- I sorted out some photos related to the history of Russia/USSR, but I have little understanding of American history, and most of the photos in the category are related to it. In any case, this category is better than none. --Mitte27 (talk) 22:29, 26 June 2024 (UTC)
- There is no reason to ever place files into extremely broad categories like Category:History. Please do not remove {{Uncategorized}} unless you are able to accurately place a file either in the most specific categories available or in a dedicated cleanup category. Pi.1415926535 (talk) 00:22, 27 June 2024 (UTC)
- You could just use cat-a-lot. I don't think adding all LOC or NARA images to "History" by default is a good idea. Enhancing999 (talk) 11:06, 30 June 2024 (UTC)
Convert Category:Photographs by Carol M. Highsmith to JPEG
Category:Photographs by Carol M. Highsmith is an excellent Library of Congress collection of very good images. Unfortunately, all those images are in TIFF format, which means that the average file size is 100-300 MB, which is incredibly large. It causes long loading times of even the preview image (let alone the actual file), and the TIFF file format is not supported by most browsers or general applications. Wikipedia discourages using TIFF files for those reasons, and this reduces the likelihood of those excellent images being used.
Therefore, some bot should convert those TIFFs to JPEGs, copy the descriptions/categories and make sure the files reference each other. Further, the categories from the TIFF files should be replaced with Category:LC TIF images with categorized JPGs. TheImaCow (talk) 21:59, 30 June 2024 (UTC)
- @TheImaCow Thanks for finding this. I've filed for a bot Commons:Bots/Requests/ImageConverterBot -- DaxServer (talk) 15:13, 1 July 2024 (UTC)
- I didn't expect someone to reply to this so quickly, thank you!
- I came across this series via Category:Aerial photographs of the United States and subcats, which contains many poorly categorized images from this collection. TheImaCow (talk) 16:40, 1 July 2024 (UTC)
- LCCN2013631230.tif shows a jpg and several jpg-sizes are offered. Is this really needed? Enhancing999 (talk) 18:44, 1 July 2024 (UTC)
- Hmm, I didn't notice that. It seems it is not necessary after all -- DaxServer (talk) 20:56, 1 July 2024 (UTC)
- Maybe I'm blind, but where are those files offered? It's not the "Download/Use this file/Email a link" bar, all resolutions there only download the same low-quality preview generated by the Mediawiki software (which is shown on the file description page) TheImaCow (talk) 21:19, 1 July 2024 (UTC)
- Below the image, there is a line:
- "Size of this JPG preview of this TIF file: 800 × 533 pixels. Other resolutions: 320 × 213 pixels | 640 × 427 pixels | 1,024 × 683 pixels | 1,280 × 853 pixels | 2,560 × 1,707 pixels | 6,144 × 4,096 pixels."
- The last one matches the tiff. Enhancing999 (talk) 21:52, 1 July 2024 (UTC)
- Oh thanks I see. However this is very obscure and when embedding the file anywhere, it will always refer to the TIF version - so a separate JPG should probably still be uploaded, like the 220,000 other TIF files in Category:LC TIF images with categorized JPGs (or the 58,000 Category:NARA TIF images with categorized JPGs)
- But I don't have strong opinions on this. TheImaCow (talk) 22:11, 1 July 2024 (UTC)
- Loading a file to test this -- DaxServer (talk) 05:51, 2 July 2024 (UTC)
- Possibly support for tiffs was less developed when they were uploaded. I wonder how all those thousands of duplicates are curated and how much volunteer time is lost by handling two instead of just one copy of every image. WMF recently expressed their view on hosting files on Commons that aren't used on WMF sites [6]. Enhancing999 (talk) 09:22, 2 July 2024 (UTC)
- Well, in theory, those TIF duplicates shouldn't need any curation, as they are supposed to be dumped into the massive categories mentioned above, and only linked from the description of the maintained JPG version.
- The use of TIF is something I think is generally not needed for 99.9% of files, modern compression is more than good enough.
- (I don't oppose eventually getting rid of the TIF duplicates, but there is not even consensus to delete de-facto duplicates where one version is rotated differently by single degrees, or random low quality TIF scans of generic text documents, where the same scans are also uploaded as JPG, so forget it) TheImaCow (talk) 13:28, 2 July 2024 (UTC)
- Oddly, I can't figure out which one of the two maps is correct ;) Did you nominate the wrong one? For the text ones, I'd have nominated the jpg ones. The assumption that deletion doesn't save anything is incorrect: deletion reduces curation (even if theoretically none is needed, it still happens and wastes volunteer time), limits spamming of Special:search, can even save storage space as files can be purged (from non-public view) or won't be exported twice when requested.
- As technology changes, I think views on this evolve. NARA's approach might have been the ideal 15 years ago, but other GLAMS that started only more recently use different approaches. Enhancing999 (talk) 12:13, 3 July 2024 (UTC)
- Not sure what you mean. Both maps are exactly the same. JPG ones nominated instead? Ideally someone uploads a PDF and 307 files are replaced with one in the correct format for documents. I never said I oppose deletions, I said the exact opposite.
- The NARA approach has actually changed - there have been at least two bulk uploads, one in 2011 and the other 2019.
- The 2011 one uploaded nearly every image twice - one TIF+one JPG. The 2019 one uploaded only JPGs.
- Looking at the NARA catalogue, files uploaded earlier often have TIF, JPG and sometimes GIF versions for download. Images uploaded in 2019, presumably digitized later, have only high-resolution JPGs for download. TheImaCow (talk) 18:33, 3 July 2024 (UTC)
- It's better to have the lossless files than a JPEG, as you can always make a JPEG from a lossless file, but you can't make a lossless from a JPEG. Still, while we shouldn't delete the TIFFs, we should make JPEG options. Adam Cuerden (talk) 08:52, 4 July 2024 (UTC)
- If we want to offer lossless files in reasonable sizes (2MB vs 200MB), we might want to consider offering PNGs instead of JPEGs -- DaxServer (talk) 08:57, 4 July 2024 (UTC)
- @DaxServer: Please don't, PNG images look fuzzy when scaled down (due to design decisions discussed in phab:T192744) on WMF projects. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 12:59, 13 July 2024 (UTC)
What should be done now? Is there any reason to not do what has been already done successfully with hundreds of thousands of NARA/LOC files? TheImaCow (talk) 11:57, 24 July 2024 (UTC)
- What leads you to describe it as "successful"? How many edits had to be made because we have the same file twice? Enhancing999 (talk) 12:02, 24 July 2024 (UTC)
- "successful" because the TIF versions are dumped into the massive TIF categories linked above and linked in the "other_versions=" parameter at the information template of their respective JPG version, in case anyone needs them. JPG versions are maintained, TIF versions are just there. And there hasn't been much of an issue with it as far as I can tell.
- Please show some necessary manual edits that had to be done twice, because when done right, there aren't any. TheImaCow (talk) 17:02, 24 July 2024 (UTC)
- Any edit on the version you consider secondary is a waste of curation energy. Wouldn't professionally managed archives clean this up beforehand rather than waste our volunteers' time to clean it up?
- Sometimes I wonder if they employ uploaders paid by the number of files uploaded. We seem to end up with books added page by page in duplicate copy from what should be a single djvu document. Enhancing999 (talk) 18:20, 24 July 2024 (UTC)
- "Any edit on the version you consider secondary is as waste of curation energy." - Yes obviously and I fully agree on that. That's why there are categories like Category:LC TIF images with categorized JPGs - these images have JPG versions which are being maintained, and the TIF files in this category are uncategorized besides being in that category/referenced from the JPG file description page. This means that there is no need to ever do any edits on TIF files in that category, as only the "categorized JPGs" are maintained.
- (Topic Paid by files uploaded: I don't think so, this is simply the format those scans are stored in, and with proper software to handle this, there isn't anything wrong - but Commons doesn't have the software, and I fully support efforts to convert single page uploads of books into PDF. The core problem is that Commons software is not designed to handle the same media in multiple file formats, like e.g. the Internet Archive which offers texts for download in countless different file formats from the same description page (random example.)) TheImaCow (talk) 20:03, 24 July 2024 (UTC)
- This seems like an over-optimization. Things work, they are a bit slower because someone thought having a very high resolution in a very badly compressible format was desirable. The side effect of very large originals is that it takes a while before the thumbnail is ready. But for 99.9% of the people that isn't a problem. Thumbnails are cached. If images are used, you therefore never have to wait for the thumbnail. You are in the .1% of people (Curators) looking at things that are NOT used. It's acceptable to wait a second in that case. —TheDJ (talk • contribs) 13:47, 24 July 2024 (UTC)
- "someone thought it was desirable" - The Library of Congress used an archival format to archive the images, but we are not the LOC, and have different goals - so we should use file formats better suited for our uses.
- The motto of this site is "freely usable media" - this also includes not having to have very fast internet connections or special programs to process or even fully view the file (at least edge/firefox cannot show raw .tif files in the browser, e.g. when trying to zoom in further), and this list could be expanded endlessly. We shouldn't forget our actual end users, who in general have much less knowledge about dealing with such file types, or generally anything.
- Another issue not yet mentioned is that TIFF files are not indexed by Google Image Search and presumably other search engines, which is bad for obvious reasons. (Search for site:commons.wikimedia.org carol highsmith, and there are only a couple hundred images which have been manually converted to JPG, but not a single one of the 30,000 TIFs; appending filetype:TIFF doesn't return anything at all.) TheImaCow (talk) 17:02, 24 July 2024 (UTC)
- If Google is broken then we don't want users at Commons having to fix it. Enhancing999 (talk) 18:22, 24 July 2024 (UTC)
- This is fully intentional, as demonstrated plenty of times, TIFF is a format simply not suitable for general web use. TheImaCow (talk) 20:05, 24 July 2024 (UTC)
- Weirdly, it appears to remove jpgs, see #c-Enhancing999-20240701184400-TheImaCow-20240630215900. Enhancing999 (talk) 10:38, 25 July 2024 (UTC)
- Comment Yes, on Commons, we need JPEG. If the source may not be available in the long term, we should also upload the original TIFF versions, but that's not the case here. Be sure to link both versions. That was not done for some other files uploaded from LOC or NARA, and it is not a mess. Yann (talk) 18:09, 24 July 2024 (UTC)
Template:Unknown
The IP range 64.189.18.0/24 has been blocked for removing the template:Unknown many times (example). I was reverting tens of these edits, but there appear to be hundreds. Could a bot operator revert these edits? Wikiwerner (talk) 17:18, 15 July 2024 (UTC)
- @Wikiwerner: Done, I rolled back 90 of them in these edits for you. I didn't need a bot, just the rollback right and en:User:Writ Keeper/Scripts/massRollback and the associated .js. Thanks for asking. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 12:48, 16 July 2024 (UTC)
- Thank you very much. I have just browsed all remaining edits not containing the 'reverted' tag and undid these when appropriate. Wikiwerner (talk) 14:46, 16 July 2024 (UTC)
- @Wikiwerner: You're welcome. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 14:57, 16 July 2024 (UTC)
Cities in Finland and China by month
Hello! I would like to ask you to automatically create categories for the distribution of cities in Finland and China by month. There are corresponding templates: {{MonthinFinlandbycity}} and {{MonthinChinabycity}}. MasterRus21thCentury (talk) 07:22, 17 July 2024 (UTC)
- Hi @MasterRus21thCentury Could you explain it a bit more? Thanks! -- DaxServer (talk) 21:26, 2 August 2024 (UTC)
Remove extraneous "I, " in author param of PD-self
@Pelikana noticed that there are a lot of erroneous uses of {{PD-self}} which insert "I, " before the author. Could someone please replace "{{PD-self|author=I, " with "{{PD-self|author=" in the following pages? —CalendulaAsteraceae (talk • contribs) 06:52, 24 July 2024 (UTC)
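The requested replacement can be sketched as a single regular expression. This is only an illustration, not the method any bot operator actually used: the function name `strip_first_person_prefix` is hypothetical, and the tolerance for extra whitespace and a lower-case first letter in the template name is an assumption about how the template appears in practice.

```python
import re

# Matches the start of {{PD-self|author=I, ...}} and captures everything up to
# the "=" sign so that it can be kept. Whitespace tolerance is an assumption.
PD_SELF_I = re.compile(r"(\{\{\s*[Pp]D-self\s*\|\s*author\s*=\s*)I,\s*")


def strip_first_person_prefix(wikitext: str) -> str:
    """Remove a leading 'I, ' from the author parameter of {{PD-self}}."""
    return PD_SELF_I.sub(r"\1", wikitext)


if __name__ == "__main__":
    print(strip_first_person_prefix("{{PD-self|author=I, JohnDoe}}"))
```

Because the pattern requires the comma after "I", an author name that merely starts with the letter I (for example "Ian") is left untouched.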
- I can do that -- DaxServer (talk) 09:46, 24 July 2024 (UTC)
- @CalendulaAsteraceae and @Pelikana: "I, " is there to make the assertion first person. — 🇺🇦Jeff G. ツ please ping or talk to me🇺🇦 10:21, 24 July 2024 (UTC)
- Hi @Jeff Yes, it obviously used to be there to make the assertion first person. But I think at some point the text lines were changed and now IMHO it is a displaced element, plus it is very odd that it is the only untranslated text element in the template, at least in these use cases. Do you mean to say the results are completely correct this way and need no change? Both lines seem grammatically faulty to me: "... door de auteur, I, JohnDoe" ("... by the author, I, JohnDoe") and "I, JohnDoe allows ...". The last one should read (in Dutch) "Ik, JohnDoe sta ...." It should not read "I, JohnDoe staat ... " because this line starts in first person and ends in third person. In later uses (after 2007-2008) the "I, " no longer appears in the templates, it seems. Peli (talk) 10:52, 24 July 2024 (UTC)
- Indeed. The template uses {{int:Wm-license-pd-author-with-author-text}}, which produces the text "This work has been released into the public domain by its author, $1. This applies worldwide." The appropriate way to make this first person would be to edit the page on TranslateWiki (well, the English one needs to be changed in MW code, but for other languages this is where you'd edit it), not to manually put "I, " in the author parameter. —CalendulaAsteraceae (talk • contribs) 20:50, 24 July 2024 (UTC)
- I think it is a good idea to add "I, " as a prefix if the uploader is also the work's creator. Please don't replace that. For example, it may not be clear to many, or people may only or first check the author field, where this is useful metadata, especially if the author name is different from the username, in which case they would also need to check the license template. Prototyperspective (talk) 12:04, 25 July 2024 (UTC)
- This is a good thing to handle in {{PD-self}} (which is a template only intended to be used by the uploader). Adding it manually means it's a huge pain to update if the wording of the template changes, and also doesn't work with internationalization. Right now {{int:Wm-license-pd-author-with-author-text|I, Calendula}} produces "This work has been released into the public domain by its author, I, Calendula. This applies worldwide." in English, which is ungrammatical and frankly silly. If I switch my display language to Spanish, it instead produces "Este trabajo ha sido liberado al dominio público por su autor, I, Calendula. Esto aplica para todo el mundo.", which is even worse. If you want to change the wording of {{PD-self}}, probably the way to go is switching in the template from int:Wm-license-pd-author-with-author-text to something like int:Wm-license-pd-author-self-text that incorporates the author's name. —CalendulaAsteraceae (talk • contribs) 19:28, 25 July 2024 (UTC)
- I couldn't find an existing piece of text, so I submitted a feature request at phabricator:T371057. I think that further discussion of updates to the text of {{PD-self}} should go to the template talk page, and also that this bot request should go ahead because manually adding "I, " before the author's name is a terrible way to make the template first-person. —CalendulaAsteraceae (talk • contribs) 20:28, 25 July 2024 (UTC)
- You're absolutely right. Sorry, I misunderstood. It's not really clear in your initial post that this would be added to the template instead. Prototyperspective (talk) 21:02, 25 July 2024 (UTC)
Images with borders (MTC)
Many images in Category:Independence Day 2019 in Brasília have a border. Sample: File:Comemoração da Independência do Brasil (48700486098).jpg
These should be added to Category:Images with borders. Possibly the same applies to more files from the same MTC Flickr stream. Enhancing999 (talk) 11:18, 27 July 2024 (UTC)
- Assuming the border always carries the www.mctic.gov.br website mark at the bottom left, here's what I thought of: load the image with OpenCV and extract the bottom-left part, use Tesseract to OCR the website text, then do a sequence match between the extracted text and the website string. If the similarity is high enough, the file can be categorized.
- Here is a sample code: https://www.kaggle.com/code/daxserver/detecting-borders-from-brazil-mtc-flickr-images/ -- DaxServer (talk) 17:25, 27 July 2024 (UTC)
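The approach described above can be sketched roughly as follows. This is not the code from the linked notebook: the function names, the 0.8 threshold and the crop proportions are all assumptions chosen for illustration, and the OCR step needs the third-party packages `opencv-python` and `pytesseract` plus a local Tesseract install.

```python
from difflib import SequenceMatcher

WATERMARK = "www.mctic.gov.br"
THRESHOLD = 0.8  # assumed cut-off; would need tuning on real files


def watermark_similarity(ocr_text: str, target: str = WATERMARK) -> float:
    """Similarity (0..1) between OCR output and the expected website mark."""
    cleaned = "".join(ocr_text.lower().split())  # drop whitespace, ignore case
    return SequenceMatcher(None, cleaned, target).ratio()


def has_border_mark(ocr_text: str, threshold: float = THRESHOLD) -> bool:
    """True if the OCR text is close enough to the website string."""
    return watermark_similarity(ocr_text) >= threshold


def read_bottom_left_text(path: str) -> str:
    """OCR the bottom-left corner of an image (crop size is an assumption)."""
    import cv2  # imported here so the matching helpers work without OpenCV
    import pytesseract

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(path)
    height, width = image.shape[:2]
    corner = image[int(height * 0.85):, : width // 2]  # lowest 15%, left half
    gray = cv2.cvtColor(corner, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray)
```

A file would then be a candidate for Category:Images with borders when `has_border_mark(read_bottom_left_text(path))` is true. Fuzzy matching rather than exact comparison is used so that small OCR slips do not cause misses.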
- I did some screening on Category:Independence Day 2019 in Brasília by changing the background color of the page. It appears that there are a few images without a border. The ones I checked were all from other Brazilian government agencies. Sample: File:07 09 2019 - Desfile 7 de setembro. (50751888331).jpg.
- The magic border locator of the crop tool does work fairly reliably on these images. Sample: https://croptool.toolforge.org/?site=undefined&title=Comemora%C3%A7%C3%A3o%20da%20Independ%C3%AAncia%20do%20Brasil%20(48700486098).jpg&page=undefined
- The only problem with directly cropping them seems to be that the file description pages don't include all details from the borders. Enhancing999 (talk) 10:53, 29 July 2024 (UTC)
- The magic borders module is interesting. Perhaps we can employ that to detect a border. I'll do some tests -- DaxServer (talk) 13:57, 29 July 2024 (UTC)
Missing "-" in coordinates (MX)
Some of the Mexico images by a former contributor show locations in Asia [7]. This seems to be due to a missing "-" in the coordinates. Sample fix: Special:Diff/905615293. I fixed a few myself. Enhancing999 (talk) 10:18, 1 August 2024 (UTC)
- Done ~74 edits, I did it with pwb replace using my normal account -- DaxServer (talk) 12:08, 1 August 2024 (UTC)
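For illustration, the kind of replacement involved might look like the sketch below. It is not the actual `pwb replace` invocation used above. It assumes the coordinates sit in a decimal {{Location|lat|lon}} template, and that only positive longitudes between 86 and 119 (roughly the span of Mexico with the sign dropped) should be negated. The function name is hypothetical.

```python
import re

# {{Location|<lat>|<lon>...}} in decimal form; the template name and layout
# are assumptions about how the affected pages store coordinates.
LOCATION = re.compile(
    r"(\{\{\s*[Ll]ocation\s*\|\s*-?\d+(?:\.\d+)?\s*\|\s*)(\d+(?:\.\d+)?)"
)


def fix_missing_minus(wikitext: str) -> str:
    """Negate an unsigned longitude that would fall in Mexico if negative."""

    def negate(match: re.Match) -> str:
        longitude = float(match.group(2))
        if 86.0 <= longitude <= 119.0:
            return f"{match.group(1)}-{match.group(2)}"
        return match.group(0)  # leave anything outside the range alone

    return LOCATION.sub(negate, wikitext)


if __name__ == "__main__":
    print(fix_missing_minus("{{Location|19.4326|99.1332}}"))
```

Longitudes that already carry a minus sign do not match the pattern, so running the replacement twice does not change a page again.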
Add OCR output to jpg
From the discussion at VP/T, I found a solution to a problem identified earlier: we frequently have images of streets and other subjects with some text in them. Sometimes this text is of interest, but it's not necessarily included in the filename or description.
https://ocr.wmcloud.org/ would make it possible to extract such text and make it editable on Commons.
Ideally a bot would go through new uploads (and also some maintenance category for older files) and run https://ocr.wmcloud.org/ on them. The output (if any) could be added to the file description page, either with a template or as structured data.
Sample file (input image not reproduced here). Output:
- "PER PONTEM AD FORTUNAM GOURNAY-SUR-MARNE RUE DES LAURIERS"
Enhancing999 (talk) 15:16, 2 August 2024 (UTC)
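A bot along these lines would need two small helpers: one to form the OCR request for a file and one to decide whether there is any output worth saving. The sketch below is only an assumption-laden illustration. The `/api` endpoint path and the `engine`, `image` and `langs[]` parameter names are assumptions about the tool's interface that should be checked against its documentation. The function names and the example image URL are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

OCR_API = "https://ocr.wmcloud.org/api"  # assumed endpoint path, not verified


def build_ocr_request(image_url: str, lang: str = "fr",
                      engine: str = "tesseract") -> str:
    """Build a request URL; the parameter names are assumptions."""
    query = urlencode({"engine": engine, "image": image_url, "langs[]": lang})
    return f"{OCR_API}?{query}"


def clean_ocr_text(raw: str) -> Optional[str]:
    """Collapse line breaks and runs of spaces; None means nothing to add."""
    text = " ".join(raw.split())
    return text or None


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    print(build_ocr_request("https://upload.wikimedia.org/example.jpg"))
    print(clean_ocr_text("PER PONTEM\nAD FORTUNAM\nGOURNAY-SUR-MARNE\nRUE DES LAURIERS"))
```

Returning `None` for empty results mirrors the "output (if any)" condition above, so that files without legible text are simply skipped.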
Move "Historical images of" to "History of"
Per the note at Category:Historical images by country (as the conclusion from Commons:Categories for discussion/2019/09/Category:Historical images), the content of the categories at Special:PrefixIndex/Category:Historical images of should be moved to "History of". This seems to involve more than 10'000 categories, see PetScan:29034509. I think the resulting redirects could afterwards be tagged for speedy deletion. Enhancing999 (talk) 18:59, 2 August 2024 (UTC)
- i don't think it's a good idea to handle this problem without human supervision.
- i would rather do these instead:
- prohibit new categories with the word from being created.
- let users slowly move the files to the appropriate categories (by time).
- RZuo (talk) 20:42, 2 August 2024 (UTC)
- "history of ..." is not any better. everything is history. RZuo (talk) 20:43, 2 August 2024 (UTC)
- There is just no way this can be done manually. If there are cases you think would be problematic, please state them here. Enhancing999 (talk) 20:56, 2 August 2024 (UTC)